{"componentChunkName":"component---src-templates-blog-post-tsx","path":"/best-practices-of-orchestrating-python-and-r-code-in-ml-projects","result":{"data":{"markdownRemark":{"id":"c8fa7dff-46a0-5a96-89a3-7f3033768ca3","excerpt":"<p>Besides Git and shell scripting, additional tools have been developed to facilitate the\ndevelopment of predictive models in multi-language…</p>","html":"<p>Besides Git and shell scripting, additional tools have been developed to facilitate the\ndevelopment of predictive models in multi-language environments. For fast data\nexchange between R and Python, let’s use the binary data file format\n<a href=\"https://blog.rstudio.com/2016/03/29/feather/\">Feather</a>. Another\nlanguage-agnostic tool, <a href=\"http://dvc.org\">DVC</a>, can make the research reproducible, so let’s\nuse DVC to orchestrate R and Python code instead of regular shell scripts.</p>\n<h2>Machine learning with R and Python</h2>\n<p>Both R and Python have powerful libraries/packages for predictive\nmodeling. Algorithms for classification or regression are usually\nimplemented in both languages; some scientists use R while others\nprefer Python. In the example explained in the previous\n<a href=\"https://blog.dataversioncontrol.com/r-code-and-reproducible-model-development-with-dvc-1507a0e3687b\">tutorial</a>,\nthe target variable was binary and logistic regression was used as the training\nalgorithm. Another algorithm that could be used for prediction is the\npopular <a href=\"https://en.wikipedia.org/wiki/Random_forest\">Random Forest algorithm</a>,\nwhich is implemented in both programming languages. 
For performance reasons, it\nwas decided that the Random Forest classifier should be implemented in Python (it\nshows better performance than the random forest package in R).</p>\n<h2>R example used for DVC demo</h2>\n<p>We will use the same example from the previous blog\n<a href=\"https://blog.dataversioncontrol.com/r-code-and-reproducible-model-development-with-dvc-1507a0e3687b\">story</a>,\nadd some Python code, and explain how Feather and DVC can simplify the\ndevelopment process in this combined environment.</p>\n<p>Let’s briefly recall the R code from the previous tutorial:</p>\n<p><html><head></head><body><span class=\"gatsby-resp-image-wrapper\" style=\"position: relative; display: block; margin-left: auto; margin-right: auto;  max-width: 335px;\">\n      <a class=\"gatsby-resp-image-link\" href=\"/static/68824bc8c4ac0c84edf737da9f1bfa01/9be56/r-jobs.png\" style=\"display: block\" target=\"_blank\" rel=\"noopener\">\n    <span class=\"gatsby-resp-image-background-image\" style=\"padding-bottom: 78.65671641791046%; position: relative; bottom: 0; left: 0; background-image: 
url(&#x27;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAQCAYAAAAWGF8bAAAACXBIWXMAAAsSAAALEgHS3X78AAADs0lEQVQ4y22UXVAbVRTH70y77GzVwVl19MUnp/rmm31xnNGxdcqItAOD+qC+MU3F2nxtMULIB0lKEgqjEFoSyBJKS5qQhFKVggntlI9WGpJAdpOUJ18qmVZJcIb1KV7PvSlQRx9+c8499//fvXvuSdDZkTTWiBmsHc1g3R5rdC2MrWFHJI+7owUaz0fz+JvxdawV9zUkEsgzCEg9sqroRtOK5ZqkmINZxQTQHOgKSUp3NKe4pwo7zlh+h6wtoarOCnFXawvLyrlARtH4UwrqDqY4dyjNXbqR5S5OZ7nvY2vc0A8SNwDxE9tNrmXsIfv60ZbnD7/7BX/EnGVPmH7ijIEVzj+T43w/StQ3CL7eyQxHnoVOX1oxtnrvG1uH/s1X3qSxfTzTaRhf79B5Fx1nBhJua0ju6ICaZmTV+CXVPK2vgtzxEu6b38bueBn3zpdxTwKg+TYevruD/fcU7FvewT7IAyt/YfEXBQ/c+ZNqiJf4LiTKNBKQIyILzmhOsIerEXomnIcaiabguuCI5PQdl1NtlolMW+dEVt95ldSq+7awBMhUb5+UBQeAGqxxvt6a4E/a5vl6S4L/0Byn8SOoNdvj/AdOqfbNk8Krbxxtea25f6O20fYzfwL2GoAmxy2+yX6Laom/EXJ0dni1ohPTFa2YopCcADdWgZurwKkr8Oa/4RRArmK8ul7R+lN7+n1PinoQNLgIV1+EMSh2hSWaw3js1jZdsVwRPmvTNVX4DR5Kc/g8qt/3kbpUbBvLFJFhaInRDS4wmoEFRri4yHzrXWLM/ruMeyLJqPtma9Dp4oFnDr9fi155i0ef/nqgwRCrcV25z/QGk0wPaAygV/ffYbSeBepFX/uSHjilR/2EM7CGkaFAOzymoNSvH77nM4jLYldI7teLqb19gvopLwEZYw+x6fompRNwzj7CF+YeY9fsY9wX/x17F8vYc3sL+5dhjJbK+Lv5P7Dr5iPcAxo3YJ7e3PMTyE9PBQ2lQLNV1mtZFfRKZZ+UVA4A+nOq/fJqK8EclE7ZwxLd6wYNXJZKP5qmvl0/+twxx35smWHf006xdW3T7IuNoyw65mOfrfezz9UNsujt5EH0wjs8eunIy6j5wcFDxz1sTd0waLzsIdAQ3/Fz02yTaYb9zD7HkrEpacV0Cd70H2AcSvAHsOWMymU40Tbc8JYQ+H8tjE8JxqaENOIaNlyRsSP6AFsnC7gLsEcK2BzKY0s4j51TG9h1fYNGoiE1W6Sqsz6BrNsnZKwPZPE/BKccZhNrfagAAAAASUVORK5CYII=&#x27;); background-size: cover; display: block;\"></span>\n  <picture>\n        <source srcset=\"/static/68824bc8c4ac0c84edf737da9f1bfa01/c54d4/r-jobs.webp 175w, /static/68824bc8c4ac0c84edf737da9f1bfa01/a3432/r-jobs.webp 350w, /static/68824bc8c4ac0c84edf737da9f1bfa01/6cceb/r-jobs.webp 670w\" sizes=\"(max-width: 670px) 100vw, 670px\" type=\"image/webp\">\n        <source srcset=\"/static/68824bc8c4ac0c84edf737da9f1bfa01/17006/r-jobs.png 175w, /static/68824bc8c4ac0c84edf737da9f1bfa01/d6f3f/r-jobs.png 350w, /static/68824bc8c4ac0c84edf737da9f1bfa01/9be56/r-jobs.png 670w\" sizes=\"(max-width: 
670px) 100vw, 670px\" type=\"image/png\">\n        <img class=\"gatsby-resp-image-image\" src=\"/static/68824bc8c4ac0c84edf737da9f1bfa01/9be56/r-jobs.png\" alt=\"R Jobs\" title=\"R Jobs\" loading=\"lazy\" style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\">\n      </picture>\n  </a>\n    </span></body></html><em>R Jobs</em></p>\n<p>The input data are Stack Overflow posts stored in an XML file. Predictive variables are\ncreated from the post text: the relative importance\n(<a href=\"https://en.wikipedia.org/wiki/Tf%E2%80%93idf\">tf-idf</a>) of words across all\navailable posts is calculated. The target is predicted from the tf-idf matrices, and\nlasso logistic regression is used to predict the binary output. AUC is\ncalculated on the test set and used as the evaluation metric.</p>\n<p>Instead of using logistic regression in R, we will write Python jobs that\nuse random forest as the training model. Train_model.R and evaluate.R\nwill be replaced with the corresponding Python jobs.</p>\n<p>The R code can be seen\n<a href=\"https://blog.dataversioncontrol.com/r-code-and-reproducible-model-development-with-dvc-1507a0e3687b\">here</a>.</p>\n<p>The code for <html><head></head><body><code class=\"language-text\">train_model_Python.py</code></body></html> is presented below:</p>\n<p><html><head></head><body><div id=\"gist73527556\" class=\"gist\">\n    <div class=\"gist-file\">\n      <div class=\"gist-data\">\n        <div class=\"js-gist-file-update-container js-task-list-container file-box\">\n  <div id=\"file-train_model_python-py\" class=\"file\">\n    \n\n  <div itemprop=\"text\" class=\"Box-body p-0 blob-wrapper data type-python\">\n      \n<table class=\"highlight tab-size js-file-line-container\" data-tab-size=\"8\">\n      <tbody><tr>\n        <td id=\"file-train_model_python-py-L1\" class=\"blob-num js-line-number\" data-line-number=\"1\"></td>\n        <td id=\"file-train_model_python-py-LC1\" class=\"blob-code blob-code-inner 
js-file-line\"><span class=\"pl-k\">import</span> <span class=\"pl-s1\">numpy</span> <span class=\"pl-k\">as</span> <span class=\"pl-s1\">np</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L2\" class=\"blob-num js-line-number\" data-line-number=\"2\"></td>\n        <td id=\"file-train_model_python-py-LC2\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">from</span> <span class=\"pl-s1\">sklearn</span>.<span class=\"pl-s1\">ensemble</span> <span class=\"pl-k\">import</span> <span class=\"pl-v\">RandomForestClassifier</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L3\" class=\"blob-num js-line-number\" data-line-number=\"3\"></td>\n        <td id=\"file-train_model_python-py-LC3\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">import</span> <span class=\"pl-s1\">sys</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L4\" class=\"blob-num js-line-number\" data-line-number=\"4\"></td>\n        <td id=\"file-train_model_python-py-LC4\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">try</span>: <span class=\"pl-k\">import</span> <span class=\"pl-s1\">cPickle</span> <span class=\"pl-k\">as</span> <span class=\"pl-s1\">pickle</span>   <span class=\"pl-c\"># python2</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L5\" class=\"blob-num js-line-number\" data-line-number=\"5\"></td>\n        <td id=\"file-train_model_python-py-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">except</span>: <span class=\"pl-k\">import</span> <span class=\"pl-s1\">pickle</span>           <span class=\"pl-c\"># python3</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L6\" class=\"blob-num js-line-number\" data-line-number=\"6\"></td>\n        <td id=\"file-train_model_python-py-LC6\" class=\"blob-code blob-code-inner js-file-line\"><span 
class=\"pl-k\">from</span> <span class=\"pl-s1\">scipy</span> <span class=\"pl-k\">import</span> <span class=\"pl-s1\">sparse</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L7\" class=\"blob-num js-line-number\" data-line-number=\"7\"></td>\n        <td id=\"file-train_model_python-py-LC7\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">from</span> <span class=\"pl-s1\">numpy</span> <span class=\"pl-k\">import</span> <span class=\"pl-s1\">loadtxt</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L8\" class=\"blob-num js-line-number\" data-line-number=\"8\"></td>\n        <td id=\"file-train_model_python-py-LC8\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">import</span> <span class=\"pl-s1\">feather</span> <span class=\"pl-k\">as</span> <span class=\"pl-s1\">ft</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L9\" class=\"blob-num js-line-number\" data-line-number=\"9\"></td>\n        <td id=\"file-train_model_python-py-LC9\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L10\" class=\"blob-num js-line-number\" data-line-number=\"10\"></td>\n        <td id=\"file-train_model_python-py-LC10\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">if</span> <span class=\"pl-en\">len</span>(<span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>) <span class=\"pl-c1\">!=</span> <span class=\"pl-c1\">4</span>:</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L11\" class=\"blob-num js-line-number\" data-line-number=\"11\"></td>\n        <td id=\"file-train_model_python-py-LC11\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">stderr</span>.<span class=\"pl-en\">write</span>(<span class=\"pl-s\">'Arguments error. 
Usage:<span class=\"pl-cce\">\\n</span>'</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L12\" class=\"blob-num js-line-number\" data-line-number=\"12\"></td>\n        <td id=\"file-train_model_python-py-LC12\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">stderr</span>.<span class=\"pl-en\">write</span>(<span class=\"pl-s\">'<span class=\"pl-cce\">\\t</span>python train_model_Python.py INPUT_MATRIX_FILE SEED OUTPUT_MODEL_FILE<span class=\"pl-cce\">\\n</span>'</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L13\" class=\"blob-num js-line-number\" data-line-number=\"13\"></td>\n        <td id=\"file-train_model_python-py-LC13\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sys</span>.<span class=\"pl-en\">exit</span>(<span class=\"pl-c1\">1</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L14\" class=\"blob-num js-line-number\" data-line-number=\"14\"></td>\n        <td id=\"file-train_model_python-py-LC14\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L15\" class=\"blob-num js-line-number\" data-line-number=\"15\"></td>\n        <td id=\"file-train_model_python-py-LC15\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">input</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>[<span class=\"pl-c1\">1</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L16\" class=\"blob-num js-line-number\" data-line-number=\"16\"></td>\n        <td id=\"file-train_model_python-py-LC16\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">seed</span> <span class=\"pl-c1\">=</span> <span class=\"pl-en\">int</span>(<span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>[<span 
class=\"pl-c1\">2</span>])</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L17\" class=\"blob-num js-line-number\" data-line-number=\"17\"></td>\n        <td id=\"file-train_model_python-py-LC17\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">output</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>[<span class=\"pl-c1\">3</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L18\" class=\"blob-num js-line-number\" data-line-number=\"18\"></td>\n        <td id=\"file-train_model_python-py-LC18\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L19\" class=\"blob-num js-line-number\" data-line-number=\"19\"></td>\n        <td id=\"file-train_model_python-py-LC19\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">df</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">ft</span>.<span class=\"pl-en\">read_dataframe</span>(<span class=\"pl-s1\">input</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L20\" class=\"blob-num js-line-number\" data-line-number=\"20\"></td>\n        <td id=\"file-train_model_python-py-LC20\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">labels</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">df</span>.<span class=\"pl-s1\">loc</span>[:,<span class=\"pl-s\">'label'</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L21\" class=\"blob-num js-line-number\" data-line-number=\"21\"></td>\n        <td id=\"file-train_model_python-py-LC21\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">df</span>.<span class=\"pl-s1\">loc</span>[:, <span class=\"pl-s1\">df</span>.<span class=\"pl-s1\">columns</span> <span 
class=\"pl-c1\">!=</span> <span class=\"pl-s\">'label'</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L22\" class=\"blob-num js-line-number\" data-line-number=\"22\"></td>\n        <td id=\"file-train_model_python-py-LC22\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L23\" class=\"blob-num js-line-number\" data-line-number=\"23\"></td>\n        <td id=\"file-train_model_python-py-LC23\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">clf</span> <span class=\"pl-c1\">=</span> <span class=\"pl-v\">RandomForestClassifier</span>(<span class=\"pl-s1\">n_estimators</span><span class=\"pl-c1\">=</span><span class=\"pl-c1\">100</span>, <span class=\"pl-s1\">n_jobs</span><span class=\"pl-c1\">=</span><span class=\"pl-c1\">2</span>, <span class=\"pl-s1\">random_state</span><span class=\"pl-c1\">=</span><span class=\"pl-s1\">seed</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L24\" class=\"blob-num js-line-number\" data-line-number=\"24\"></td>\n        <td id=\"file-train_model_python-py-LC24\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">clf</span>.<span class=\"pl-en\">fit</span>(<span class=\"pl-s1\">x</span>, <span class=\"pl-s1\">labels</span>)  <span class=\"pl-c\"># labels is a 1-D Series; .ix was removed in pandas 1.0</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L25\" class=\"blob-num js-line-number\" data-line-number=\"25\"></td>\n        <td id=\"file-train_model_python-py-LC25\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L26\" class=\"blob-num js-line-number\" data-line-number=\"26\"></td>\n        <td id=\"file-train_model_python-py-LC26\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">with</span> <span class=\"pl-en\">open</span>(<span 
class=\"pl-s1\">output</span>, <span class=\"pl-s\">'wb'</span>) <span class=\"pl-k\">as</span> <span class=\"pl-s1\">fd</span>:</td>\n      </tr>\n      <tr>\n        <td id=\"file-train_model_python-py-L27\" class=\"blob-num js-line-number\" data-line-number=\"27\"></td>\n        <td id=\"file-train_model_python-py-LC27\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">pickle</span>.<span class=\"pl-en\">dump</span>(<span class=\"pl-s1\">clf</span>, <span class=\"pl-s1\">fd</span>)</td>\n      </tr>\n</tbody></table>\n\n\n  </div>\n\n  </div>\n</div>\n\n      </div>\n      <div class=\"gist-meta\">\n        <a href=\"https://gist.github.com/Zoldin/b312897cc492608feef1eaeae7f6eabc/raw/8dad0f69067945b9b84f8d90a8cdbe52694e36f8/train_model_Python.py\" style=\"float:right\">view raw</a>\n        <a href=\"https://gist.github.com/Zoldin/b312897cc492608feef1eaeae7f6eabc#file-train_model_python-py\">train_model_Python.py</a>\n        hosted with ❤ by <a href=\"https://github.com\">GitHub</a>\n      </div>\n    </div>\n</div></body></html></p>\n<p>Also here we are adding code for <html><head></head><body><code class=\"language-text\">evaluation_python_model.py</code></body></html>:</p>\n<p><html><head></head><body><div id=\"gist73527649\" class=\"gist\">\n    <div class=\"gist-file\">\n      <div class=\"gist-data\">\n        <div class=\"js-gist-file-update-container js-task-list-container file-box\">\n  <div id=\"file-evaluation_python_model-py\" class=\"file\">\n    \n\n  <div itemprop=\"text\" class=\"Box-body p-0 blob-wrapper data type-python\">\n      \n<table class=\"highlight tab-size js-file-line-container\" data-tab-size=\"8\">\n      <tbody><tr>\n        <td id=\"file-evaluation_python_model-py-L1\" class=\"blob-num js-line-number\" data-line-number=\"1\"></td>\n        <td id=\"file-evaluation_python_model-py-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">from</span> <span 
class=\"pl-s1\">sklearn</span>.<span class=\"pl-s1\">metrics</span> <span class=\"pl-k\">import</span> <span class=\"pl-s1\">precision_recall_curve</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L2\" class=\"blob-num js-line-number\" data-line-number=\"2\"></td>\n        <td id=\"file-evaluation_python_model-py-LC2\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">import</span> <span class=\"pl-s1\">sys</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L3\" class=\"blob-num js-line-number\" data-line-number=\"3\"></td>\n        <td id=\"file-evaluation_python_model-py-LC3\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">import</span> <span class=\"pl-s1\">sklearn</span>.<span class=\"pl-s1\">metrics</span> <span class=\"pl-k\">as</span> <span class=\"pl-s1\">metrics</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L4\" class=\"blob-num js-line-number\" data-line-number=\"4\"></td>\n        <td id=\"file-evaluation_python_model-py-LC4\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">from</span> <span class=\"pl-s1\">scipy</span> <span class=\"pl-k\">import</span> <span class=\"pl-s1\">sparse</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L5\" class=\"blob-num js-line-number\" data-line-number=\"5\"></td>\n        <td id=\"file-evaluation_python_model-py-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">from</span> <span class=\"pl-s1\">numpy</span> <span class=\"pl-k\">import</span> <span class=\"pl-s1\">loadtxt</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L6\" class=\"blob-num js-line-number\" data-line-number=\"6\"></td>\n        <td id=\"file-evaluation_python_model-py-LC6\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">try</span>: <span 
class=\"pl-k\">import</span> <span class=\"pl-s1\">cPickle</span> <span class=\"pl-k\">as</span> <span class=\"pl-s1\">pickle</span>   <span class=\"pl-c\"># python2</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L7\" class=\"blob-num js-line-number\" data-line-number=\"7\"></td>\n        <td id=\"file-evaluation_python_model-py-LC7\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">except</span>: <span class=\"pl-k\">import</span> <span class=\"pl-s1\">pickle</span>           <span class=\"pl-c\"># python3</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L8\" class=\"blob-num js-line-number\" data-line-number=\"8\"></td>\n        <td id=\"file-evaluation_python_model-py-LC8\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">import</span> <span class=\"pl-s1\">feather</span> <span class=\"pl-k\">as</span> <span class=\"pl-s1\">ft</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L9\" class=\"blob-num js-line-number\" data-line-number=\"9\"></td>\n        <td id=\"file-evaluation_python_model-py-LC9\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L10\" class=\"blob-num js-line-number\" data-line-number=\"10\"></td>\n        <td id=\"file-evaluation_python_model-py-LC10\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">if</span> <span class=\"pl-en\">len</span>(<span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>) <span class=\"pl-c1\">!=</span> <span class=\"pl-c1\">4</span>:</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L11\" class=\"blob-num js-line-number\" data-line-number=\"11\"></td>\n        <td id=\"file-evaluation_python_model-py-LC11\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sys</span>.<span 
class=\"pl-s1\">stderr</span>.<span class=\"pl-en\">write</span>(<span class=\"pl-s\">'Arguments error. Usage:<span class=\"pl-cce\">\\n</span>'</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L12\" class=\"blob-num js-line-number\" data-line-number=\"12\"></td>\n        <td id=\"file-evaluation_python_model-py-LC12\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">stderr</span>.<span class=\"pl-en\">write</span>(<span class=\"pl-s\">'<span class=\"pl-cce\">\\t</span>python evaluation_python_model.py MODEL_FILE TEST_MATRIX METRICS_FILE<span class=\"pl-cce\">\\n</span>'</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L13\" class=\"blob-num js-line-number\" data-line-number=\"13\"></td>\n        <td id=\"file-evaluation_python_model-py-LC13\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">sys</span>.<span class=\"pl-en\">exit</span>(<span class=\"pl-c1\">1</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L14\" class=\"blob-num js-line-number\" data-line-number=\"14\"></td>\n        <td id=\"file-evaluation_python_model-py-LC14\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L15\" class=\"blob-num js-line-number\" data-line-number=\"15\"></td>\n        <td id=\"file-evaluation_python_model-py-LC15\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">model_file</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>[<span class=\"pl-c1\">1</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L16\" class=\"blob-num js-line-number\" data-line-number=\"16\"></td>\n        <td id=\"file-evaluation_python_model-py-LC16\" class=\"blob-code blob-code-inner js-file-line\"><span 
class=\"pl-s1\">test_matrix_file</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>[<span class=\"pl-c1\">2</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L17\" class=\"blob-num js-line-number\" data-line-number=\"17\"></td>\n        <td id=\"file-evaluation_python_model-py-LC17\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">metrics_file</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">sys</span>.<span class=\"pl-s1\">argv</span>[<span class=\"pl-c1\">3</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L18\" class=\"blob-num js-line-number\" data-line-number=\"18\"></td>\n        <td id=\"file-evaluation_python_model-py-LC18\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L19\" class=\"blob-num js-line-number\" data-line-number=\"19\"></td>\n        <td id=\"file-evaluation_python_model-py-LC19\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">with</span> <span class=\"pl-en\">open</span>(<span class=\"pl-s1\">model_file</span>, <span class=\"pl-s\">'rb'</span>) <span class=\"pl-k\">as</span> <span class=\"pl-s1\">fd</span>:</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L20\" class=\"blob-num js-line-number\" data-line-number=\"20\"></td>\n        <td id=\"file-evaluation_python_model-py-LC20\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">model</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">pickle</span>.<span class=\"pl-en\">load</span>(<span class=\"pl-s1\">fd</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L21\" class=\"blob-num js-line-number\" data-line-number=\"21\"></td>\n        <td id=\"file-evaluation_python_model-py-LC21\" class=\"blob-code blob-code-inner 
js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L22\" class=\"blob-num js-line-number\" data-line-number=\"22\"></td>\n        <td id=\"file-evaluation_python_model-py-LC22\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">df</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">ft</span>.<span class=\"pl-en\">read_dataframe</span>(<span class=\"pl-s1\">test_matrix_file</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L23\" class=\"blob-num js-line-number\" data-line-number=\"23\"></td>\n        <td id=\"file-evaluation_python_model-py-LC23\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">labels</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">df</span>.<span class=\"pl-s1\">loc</span>[:,<span class=\"pl-s\">'label'</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L24\" class=\"blob-num js-line-number\" data-line-number=\"24\"></td>\n        <td id=\"file-evaluation_python_model-py-LC24\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">x</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">df</span>.<span class=\"pl-s1\">loc</span>[:, <span class=\"pl-s1\">df</span>.<span class=\"pl-s1\">columns</span> <span class=\"pl-c1\">!=</span> <span class=\"pl-s\">'label'</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L25\" class=\"blob-num js-line-number\" data-line-number=\"25\"></td>\n        <td id=\"file-evaluation_python_model-py-LC25\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">predictions_by_class</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">model</span>.<span class=\"pl-en\">predict_proba</span>(<span class=\"pl-s1\">x</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L26\" class=\"blob-num js-line-number\" 
data-line-number=\"26\"></td>\n        <td id=\"file-evaluation_python_model-py-LC26\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">predictions</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">predictions_by_class</span>[:,<span class=\"pl-c1\">1</span>]</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L27\" class=\"blob-num js-line-number\" data-line-number=\"27\"></td>\n        <td id=\"file-evaluation_python_model-py-LC27\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L28\" class=\"blob-num js-line-number\" data-line-number=\"28\"></td>\n        <td id=\"file-evaluation_python_model-py-LC28\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">precision</span>, <span class=\"pl-s1\">recall</span>, <span class=\"pl-s1\">thresholds</span> <span class=\"pl-c1\">=</span> <span class=\"pl-en\">precision_recall_curve</span>(<span class=\"pl-s1\">labels</span>, <span class=\"pl-s1\">predictions</span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L29\" class=\"blob-num js-line-number\" data-line-number=\"29\"></td>\n        <td id=\"file-evaluation_python_model-py-LC29\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L30\" class=\"blob-num js-line-number\" data-line-number=\"30\"></td>\n        <td id=\"file-evaluation_python_model-py-LC30\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-s1\">auc</span> <span class=\"pl-c1\">=</span> <span class=\"pl-s1\">metrics</span>.<span class=\"pl-en\">auc</span>(<span class=\"pl-s1\">recall</span>, <span class=\"pl-s1\">precision</span>)  <span class=\"pl-c\"># area under the precision-recall curve</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L31\" class=\"blob-num js-line-number\" 
data-line-number=\"31\"></td>\n        <td id=\"file-evaluation_python_model-py-LC31\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\">#print('AUC={}'.format(metrics.auc(recall, precision)))</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L32\" class=\"blob-num js-line-number\" data-line-number=\"32\"></td>\n        <td id=\"file-evaluation_python_model-py-LC32\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">with</span> <span class=\"pl-en\">open</span>(<span class=\"pl-s1\">metrics_file</span>, <span class=\"pl-s\">'w'</span>) <span class=\"pl-k\">as</span> <span class=\"pl-s1\">fd</span>:</td>\n      </tr>\n      <tr>\n        <td id=\"file-evaluation_python_model-py-L33\" class=\"blob-num js-line-number\" data-line-number=\"33\"></td>\n        <td id=\"file-evaluation_python_model-py-LC33\" class=\"blob-code blob-code-inner js-file-line\">    <span class=\"pl-s1\">fd</span>.<span class=\"pl-en\">write</span>(<span class=\"pl-s\">'AUC: {:4f}<span class=\"pl-cce\">\\n</span>'</span>.<span class=\"pl-en\">format</span>(<span class=\"pl-s1\">auc</span>))</td>\n      </tr>\n</tbody></table>\n\n\n  </div>\n\n  </div>\n</div>\n\n      </div>\n      <div class=\"gist-meta\">\n        <a href=\"https://gist.github.com/Zoldin/9eef13632d0a9039fe9b0dba376516a4/raw/8b8837f0d5640e0c208ea1c4910d655d933b9bd0/evaluation_python_model.py\" style=\"float:right\">view raw</a>\n        <a href=\"https://gist.github.com/Zoldin/9eef13632d0a9039fe9b0dba376516a4#file-evaluation_python_model-py\">evaluation_python_model.py</a>\n        hosted with ❤ by <a href=\"https://github.com\">GitHub</a>\n      </div>\n    </div>\n</div></body></html></p>\n<p>Let’s download the necessary R and Python code from above (clone the\n<a href=\"https://github.com/Zoldin/R_AND_DVC\">GitHub</a> repository):</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code 
class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token command\">mkdir</span> R_DVC_GITHUB_CODE\n</span><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token command\">cd</span> R_DVC_GITHUB_CODE\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token git\">git clone</span> https://github.com/Zoldin/R_AND_DVC</span></code></pre></div></body></html>\n<p>The dependency graph of this data science project looks like this:</p>\n<p><html><head></head><body><span class=\"gatsby-resp-image-wrapper\" style=\"position: relative; display: block; margin-left: auto; margin-right: auto;  max-width: 250.5px;\">\n      <a class=\"gatsby-resp-image-link\" href=\"/static/fbd7192868b16c9a421107083e2dd45b/f55b8/our-dependency-graph.png\" style=\"display: block\" target=\"_blank\" rel=\"noopener\">\n    <span class=\"gatsby-resp-image-background-image\" style=\"padding-bottom: 199.20159680638722%; position: relative; bottom: 0; left: 0; background-image: 
url(&#x27;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAoCAYAAAD+MdrbAAAACXBIWXMAAAsSAAALEgHS3X78AAAIoklEQVRIx4VWC1BTZxb+gygBIuEVEgh5kgd5kZvcPCDEEAIFAmJEoAK2dSoCCQ+LqAUF5SEPFRBBIAUlylbXF9aKb1EEJIC0VWtbca2P1VnbnZ3W2c7YnZ3dWe+GewVLDfbPnPn+nP+c757z/+f89wLwm1HYOYZifGsLiiHN77pzOqIJar2Sp9YauRq1XhC2ZoWP8MAKgqqmzMXeiwDN7s2orbl1GDgdM6ToKAA4/73ePoStgCir45FYZf7+lEaCr2+Ht+eMyXun2oEue9/bySwddnyh1U4psd4klbZ973f0GUIwZFmgzC224L4HiOfH1im/wk47ucg67jNtX/P5Q2BpG3mTsOgVYVHHWMQ663hokdUeUvjJKOeDLTZFmrlBn1HcpskobtWaWy6HFndPMvP22qNyW4bQaM17ht4kLGi3Y5F2jGnMbdcZea0jQfmd40HLs6tV6TnVmpQ1VdqU3AY4u/EyPb99jJzTel2bu2eIMO2T13LtTUKHM7bBHWOEgs5xRkHnGN3SPsKoHERIWVu6hea2Qcbm/p/JBdYxOrpuHafErWlHfXJ2X3O+jzOkvx9uAPg5wNXZ2uq6887JohsxsviWPhzwQEBiR7V3wpFaogI20HTRyUJd9DLWkvJt+OSebeTEljOu+p13cLk91ajPssbrr4kSEhJQ/Nohhk3WOQ9J7FKS9MfciDG7ggMM7X5+CQcFpKKTIveZ9YyNtSDNqEfnJpPptWN8fDwACILOqTVfenJrxySc+ls8RvNTNtf2LyZrVb2BVXISYvX8m8Gq/5bDqx0T8+pvMGb8TWtLXhE5SIzvm9G5Jt2yYCAOuEoqL0H8XTf58py9ZHXaBpViRYlMnrYJUpnWRahNhUskWdUCfscPhLDtw4b0LR0+9Urggkb70YcYKc+0DEUSlebLdAe+gs4nwcwau4JVep7Pz7epuAUHFdz8Awq+eb9akGvVsEtOhgVXjnIFe6ZgsTaGRAnwCZj2F6rluNk0l/Dk3nK2iAjReb7hnHB3Yvm4J7X1rpfnMcTd5/D/PKZlcR/iju9H8JTOhwRSw73FkugiH2lgKBHmKPwi2fEeKJHRdsbN2H40RJf0LkeTlB6iTc7k6FNX09O6jtM3Hz2w0Fk17NgX6GLab6bpTDp6eGIsV5P0Toh2WRwtsbuYiZ3y/j6i7sxVf8PhE/5Rtk9JhlODAfE9ZxajizZHBp87ZL1DSh1y7FVGXSUETd8aUvTx9/wTDq0iRZ9aRUq05RMBOJ03++TlALhkm0yLEQ7AIRDAfV8qc31cInSdsIgXTq4PX/RtiWLhZL544YONIldkOXYQf05NxxfpzYvmpJDQfRzorL0A2ncBIDYB4NeP+HIaxgI5dXYKu9ZOYd1FFlOMZoi6ajuXdQUhTus59aMUbsNIEK9hyHuaI9KajzP2WF6TchomUWTWTXiL68bUwppRJr9+ki0uPSWActoixFk1eunqHTppTlt4aMUAV1A3wRbW2lmsqlEDuWJkwRsbzam8giJj1zd+4oYJJX/7hF/Ijjtk4cY+GpSxVSlbuVXhEGVYZrWMXXY5iFX7BTmkZtyfse16DG3rMNrjyHTFTFDHQbpsJU6mNQGVXOWSYDTixaUn2KLNp0WiLf0CYcW5UH7TLZbo46NiqPYqX9Z6hy3fPhAqrzovlFddEMN119Bu0ehMuGR45dwoSV7+3jQKleysVP4LHAflRC/xcwuco5gKmlqQCWUxKBGsQJaKQw5RC6jBagEzFV5LQ4QIDmC5zPE5AT1bIFNoqUFaMTlQwqazlEJGqFrCyJCt8QbPKc8XDXKuKQ6GHeL2Sj4N6YEOcvdLDwqGhCMQQkfQvekLPwmSkhLBdTV2Rb0IfuHdH3pW2h72Cdcm6WV3Sru5B6Be
8W3+N3wgyzqCBUFD8Iga8RiX3iAiXMQNVa6KBgiMRTc49RjFHxk/YvY+CA7xQxbeXnrby5HJIoSFOOkqL+ADFgH8b1VdYGJ2/mXwVyj+Svt1VhdVqp+1D6fGAHBW9pfIU/I7okH4mbCC3xHbLDoWeQa+JzkN3ZVelD2STBt2466gJSEkC8HLwJeo8zUD9g5Zl7fOozmr2XFgCHgJOdYOw6OWj1RVS6oUHfrNqpbYMrg5dqOiwViuaItzPCgVTc/xi1+eip2qDLudLRbLDHo4BDcb8lXlE+Ul1X3BVdUj4WnlLYlNdlFhk1+Cu+FzEb2yQTkgA2IogHAfIgUgoIKCOcY63oz5ZueEfzQYdDYnhMchzPzHb8OuvaK8dc4JBxSPXC6q7uEuKO/h2pM+Q28QQX0Yib6HTYuHjSSdPJa/RJpI82+keJMbA1ni3TLU2ViWOH+EYdIwFAN3BWHYRMUHNlI5/rsDuCGNHFHwTrowoInMozQGole92253kLshb4bQ/a0pO8jm/M8ozvRI2ZAyWxqeeUTwCP4JFGYUv44wfx5CShPWlrhuAJQVGtQoJcfo+v7aLM+UwmR8as5yr/g0g0dUSqR79Aqte0yK3j0rM4tQ01r59oOYgv+D4k8CRzeIEBdH56D3nZyqYqcyPiBO1yQS5tDDiMtMWc07LkX+FSunyB/wA6qHuj+pBzUH1APakxFfKavDrEu74LPaI+H28C712cjT6juGYeUzNtqayqfO0x4Mf4rijYhfvC7Iv4svgxp05dCu6HJpk65SvidqB9Sl2ipt0a2XV+r3yc4bJ+GfxdP2w/A8hEeS7qHYZBrEXYx5Sh2J+ydtOO558JDhefBg+jP2wNK/MYdinwfb415Qr7zzd2Z/zH20Rs/FPpo/7ZvCJ84XmGC6bFx+r74v+mV+sn7DAwAOrcc+Iv+B4GBjqr+RDFFN4igqzIZoOn4ELTlIQY4RRZGE1c3YV1h7Mrikf/yWdmvZib24vvjMhbqtXMdcvymJbSlSyrPzObxss4RZUqqjl1ckBTXVop1A2VmNG76Lpfx/QzntZYIry2kAAAAASUVORK5CYII=&#x27;); background-size: cover; display: block;\"></span>\n  <picture>\n        <source srcset=\"/static/fbd7192868b16c9a421107083e2dd45b/c54d4/our-dependency-graph.webp 175w, /static/fbd7192868b16c9a421107083e2dd45b/a3432/our-dependency-graph.webp 350w, /static/fbd7192868b16c9a421107083e2dd45b/52d01/our-dependency-graph.webp 501w\" sizes=\"(max-width: 501px) 100vw, 501px\" type=\"image/webp\">\n        <source srcset=\"/static/fbd7192868b16c9a421107083e2dd45b/17006/our-dependency-graph.png 175w, /static/fbd7192868b16c9a421107083e2dd45b/d6f3f/our-dependency-graph.png 350w, /static/fbd7192868b16c9a421107083e2dd45b/f55b8/our-dependency-graph.png 501w\" sizes=\"(max-width: 501px) 100vw, 501px\" type=\"image/png\">\n        <img class=\"gatsby-resp-image-image\" src=\"/static/fbd7192868b16c9a421107083e2dd45b/f55b8/our-dependency-graph.png\" alt=\"R (marked red) and Python (marked pink) jobs in one 
project\" title=\"R (marked red) and Python (marked pink) jobs in one project\" loading=\"lazy\" style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\">\n      </picture>\n  </a>\n    </span></body></html><em>R\n(marked red) and Python (marked pink) jobs in one project</em></p>\n<p>Now let’s see how the Feather API and DVC reproducibility can speed up and\nsimplify the process flow.</p>\n<h2>Feather API</h2>\n<p>The Feather API is designed to improve the interchange of data and metadata between R and\nPython. It provides fast import/export of data frames between the two environments\nand preserves metadata (such as column types), which is an improvement over data exchange via\ncsv/txt files. In our example, a Python job will read a binary input file\nthat was produced in R with the Feather API.</p>\n<p>Let’s install the Feather library in both environments.</p>\n<p>For Python 3 on Linux, you can install it from the command line with pip3:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token command\">sudo</span> pip3 <span class=\"token function\">install</span> feather-format</span></code></pre></div></body></html>\n<p>For R, install the feather package:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"r\"><pre class=\"language-r\"><code class=\"language-r\">install.packages<span class=\"token punctuation\">(</span><span class=\"token string\">\"feather\"</span><span class=\"token punctuation\">)</span></code></pre></div></body></html>\n<p>After a successful installation, we can use Feather for data exchange.</p>\n<p>Below is the R code that exports data frames with Feather (featurization.R):</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"r\"><pre class=\"language-r\"><code class=\"language-r\">library<span class=\"token 
punctuation\">(</span>feather<span class=\"token punctuation\">)</span>\n\nwrite_feather<span class=\"token punctuation\">(</span>dtm_train_tfidf<span class=\"token punctuation\">,</span>args<span class=\"token punctuation\">[</span><span class=\"token number\">3</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">)</span>\nwrite_feather<span class=\"token punctuation\">(</span>dtm_test_tfidf<span class=\"token punctuation\">,</span>args<span class=\"token punctuation\">[</span><span class=\"token number\">4</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">)</span>\nprint<span class=\"token punctuation\">(</span><span class=\"token string\">\"Two data frame were created with Feather - one for train and one for test data set\"</span><span class=\"token punctuation\">)</span></code></pre></div></body></html>\n<p>Python code for reading the Feather binary input files (train_model_python.py):</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\"><span class=\"token keyword\">import</span> sys\n<span class=\"token keyword\">import</span> feather <span class=\"token keyword\">as</span> ft\n\n<span class=\"token builtin\">input</span> <span class=\"token operator\">=</span> sys<span class=\"token punctuation\">.</span>argv<span class=\"token punctuation\">[</span><span class=\"token number\">1</span><span class=\"token punctuation\">]</span>\ndf <span class=\"token operator\">=</span> ft<span class=\"token punctuation\">.</span>read_dataframe<span class=\"token punctuation\">(</span><span class=\"token builtin\">input</span><span class=\"token punctuation\">)</span></code></pre></div></body></html>\n<h2>Dependency graph with R and Python combined</h2>\n<p>The next question is: why do we need DVC, why not\njust use shell scripting? 
DVC automatically derives the dependencies between the\nsteps and builds\n<a href=\"https://en.wikipedia.org/wiki/Directed_acyclic_graph\">the dependency graph (DAG)</a>\ntransparently to the user. The graph is used to reproduce only the parts of your\npipeline that were affected by recent changes, so we don’t have to keep track of\nwhich steps need to be repeated after each change.</p>\n<p>First, with the <html><head></head><body><code class=\"language-text\">dvc run</code></body></html> command, we execute all jobs related to our\nmodel development. In that phase, DVC records the dependencies that will be used in\nthe reproducibility phase:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc import</span> https://s3-us-west-2.amazonaws.com/dvc-share/so/25K/Posts.xml.tgz <span class=\"token punctuation\">\\</span>\n            data/\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> <span class=\"token function\">tar</span> zxf data/Posts.xml.tgz -C data/\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> Rscript code/parsingxml.R <span class=\"token punctuation\">\\</span>\n                  data/Posts.xml data/Posts.csv\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> Rscript code/train_test_spliting.R <span class=\"token punctuation\">\\</span>\n                  data/Posts.csv <span class=\"token number\">0.33</span> <span class=\"token number\">20170426</span> <span class=\"token punctuation\">\\</span>\n                  data/train_post.csv data/test_post.csv\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc 
run</span> Rscript code/featurization.R <span class=\"token punctuation\">\\</span>\n                  data/train_post.csv <span class=\"token punctuation\">\\</span>\n                  data/test_post.csv data/matrix_train.feather <span class=\"token punctuation\">\\</span>\n                  data/matrix_test.feather\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> python3 code/train_model_python.py <span class=\"token punctuation\">\\</span>\n                  data/matrix_train.feather <span class=\"token punctuation\">\\</span>\n                  <span class=\"token number\">20170426</span> data/model.p\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> python3 code/evaluate_python_mdl.py <span class=\"token punctuation\">\\</span>\n                  data/model.p data/matrix_test.feather <span class=\"token punctuation\">\\</span>\n                  data/evaluation_python.txt</span></code></pre></div></body></html>\n<p>After these commands, the jobs are executed and included in the DAG. 
The result (the AUC\nmetric) is written to the evaluation_python.txt file:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token command\">cat</span> data/evaluation_python.txt\n</span>AUC: 0.741432</code></pre></div></body></html>\n<p>It is possible to improve this result by tuning the random forest algorithm.</p>\n<p>We can increase the number of trees in the random forest classifier from 100 to\n500:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"python\"><pre class=\"language-python\"><code class=\"language-python\">clf <span class=\"token operator\">=</span> RandomForestClassifier<span class=\"token punctuation\">(</span>n_estimators<span class=\"token operator\">=</span><span class=\"token number\">500</span><span class=\"token punctuation\">,</span>\n                             n_jobs<span class=\"token operator\">=</span><span class=\"token number\">2</span><span class=\"token punctuation\">,</span>\n                             random_state<span class=\"token operator\">=</span>seed<span class=\"token punctuation\">)</span>\nclf<span class=\"token punctuation\">.</span>fit<span class=\"token punctuation\">(</span>x<span class=\"token punctuation\">,</span> labels<span class=\"token punctuation\">)</span></code></pre></div></body></html>\n<p>Once the changes (in <html><head></head><body><code class=\"language-text\">train_model_python.py</code></body></html>) are committed, the <html><head></head><body><code class=\"language-text\">dvc repro</code></body></html> command re-executes all\njobs needed to reproduce <html><head></head><body><code class=\"language-text\">evaluation_python.txt</code></body></html>. 
We\ndon’t need to worry about which jobs to run or in which order.</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token git\">git add</span> <span class=\"token builtin class-name\">.</span>\n</span><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token git\">git commit</span>\n</span>[master a65f346] Random forest classifier — more trees added\n    1 file changed, 1 insertion(+), 1 deletion(-)\n\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc repro</span> data/evaluation_python.txt\n</span>\nReproducing run command for data item data/model.p. Args: python3 code/train_model_python.py data/matrix_train.txt 20170426 data/model.p\nReproducing run command for data item data/evaluation_python.txt. Args: python3 code/evaluate_python_mdl.py data/model.p data/matrix_test.txt data/evaluation_python.txt\nData item “data/evaluation_python.txt” was reproduced.</code></pre></div></body></html>\n<p>Besides code versioning, DVC also takes care of data versioning. 
For example, if we\nchange the data sets <html><head></head><body><code class=\"language-text\">train_post.csv</code></body></html> and <html><head></head><body><code class=\"language-text\">test_post.csv</code></body></html> (by using a different splitting\nratio), DVC will detect that the data sets have changed, and <html><head></head><body><code class=\"language-text\">dvc repro</code></body></html> will re-execute\nall jobs needed to rebuild evaluation_python.txt.</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> Rscript code/train_test_spliting.R <span class=\"token punctuation\">\\</span>\n                  data/Posts.csv <span class=\"token number\">0.15</span> <span class=\"token number\">20170426</span> <span class=\"token punctuation\">\\</span>\n                  data/train_post.csv <span class=\"token punctuation\">\\</span>\n                  data/test_post.csv</span></code></pre></div></body></html>\n<p>Re-executed jobs are marked in red:</p>\n<p><html><head></head><body><span class=\"gatsby-resp-image-wrapper\" style=\"position: relative; display: block; margin-left: auto; margin-right: auto;  max-width: 250.5px;\">\n      <a class=\"gatsby-resp-image-link\" href=\"/static/10053d985ed8b13cfb9b560ee5d2cc37/f55b8/re-executed-jobs.png\" style=\"display: block\" target=\"_blank\" rel=\"noopener\">\n    <span class=\"gatsby-resp-image-background-image\" style=\"padding-bottom: 199.4011976047904%; position: relative; bottom: 0; left: 0; background-image: 
url(&#x27;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAoCAYAAAD+MdrbAAAACXBIWXMAAAsSAAALEgHS3X78AAAIcklEQVRIx4VWC1ST5xn+QkQTQiCX/89/SfLnnj/kBoSEawgJFwuIgSoWBXRFKwID6rzNa6LobG0rVkFRQFartuK9R+tWnVOHyFad7qw9azs70bVunR5PndOeVYVvyR+iUmL3nvOd9/0uz/O93/u97/f/ADwjTR0DjC7a/DajNRvLufqtubFpboc+3Vmky0zPSbDMniI09kyNTVvjizrSAwHd0sysLfP/CkSUpm0XnnbmAxbSJhTGrgLx1hYtSi0RIvhbfJFgKz8mvGTVkVPgudI4QlbffoHT2NGPL9h+GV265UvxezdgrHtGXXLlinek+7+AvMUdn4mbAvON2wYEwfXr9v0Z1G7+3VjC5u0DYZ3x6vYBurnjgqZ5R7/2J8t32qfVveae8bO2zOnzNzvr3z5FL+i6qGzYeiG3aMWHjLf5Sz4YS1jffj7saeYrm84p5rSelc7dcp701qxOnTpnTWZpjT+rdO5rtpc3fETN3dyH1W45n1O84kNeEONZdHQsYW1bP6Nf2XKeP6/9gqK2vT8APKdYfnIILV+8zThn02nFosO3sNr289S8tn7F3LZ+PIx9cc2vI8cxTPpD4QAgDih2pLmY/E2RybJeD5Hltx5gAQDBxHafoGDv2nh7Sq48O9drdHm8KudKP7e4ayXm3nCM7Vx/Oaqhs4LBuPy/eUpUPKmY0YPBWCzePmoTb3c6OvEwJz5vg1wysQMVT95tQpYfS+CE58sWvwnmVxQwdnV19VNgSUlJwCnI2BL/x7Fyf5+VWH2JFq+/piG23lWRFevy8OaDSUj7f5SY/4pO5uuzKFsGqDB+9kJ/yCj8LwSTaxpCbk//KfvsFE60yXcyWbX+kp6e1YrZyuanJpbOTzaXLUi2eZvTU0oaXPRLfoPo9cFYQ8u53Epfh2BPlTkqiK/xNYVItd4yRmOUUqTmAaGx4+9y/S8GHIYVH9Hm5l1pxld3241NuxzmpnfSrA2dWYYlH1h16wb0xra/2hKduSglw9AgPiUtg/XkmC59ksCmtQiSFLQ4U5fKxfx/4Kna/hInPgS5+L6hGHw/jEEDtuAE5Cg6B2Nla//ET3HXCRMJWmDTpyE5lIfLEE365fEJxe29GtfkGdrM4nJ1VskMTc5Ls6kpnfsVS97vGRcpG37ebWZ7O+up7FKXIq3Qo80sKVC7XiyUFXY1K0OvS/eheM/xs4hnb6/Y1fMuMungSXR6Ry8/OPdvdXkggXggeA4IJgDIDWHK2utj83urUe+BCnHJ/irEc2gmWtLdGAeEh54mZVEgcStyXHHgRCsL3L0aBU7/NhqcOhkNeg9Eg/d7x4PDR6PBgUPR4PSpaHANMhfhXr6Ns6nKP37UEQo7D4C8be+ChY2LmJh+g+Oi+yIRcV8gwO7HxWEB7/h+Pi/pPT5PA9kg/kF8HH5PJMK/RcXkbYkoPsgxd20pq6irPkRYWl0FoFDI2FAkEkKJJG2Iz1fA2FjVIIIYDlPajJ0ypXsPpXYdlanS7whEWsjnq4b5fCUUi91QSkaNCfQdPFTnkMcTQ4JwQBRFApvgNwiC2qEzObq0RnuX3uzYpTUl38NIKUTEOEQQSWDz3Ec4/rTGf6+hQUVKFstDZ4Jspzuqfl4d55rJrLqp1xv/SdP0bb1eD3VqxecalfG2ktJBJaX6h0ZNX9NoDYMajfEWTcuDPNMS01ne1BdGeylDRUKjTiOJ+HJAyArn7CjRachR/S+UWnZlcoaSzCgitam5uNbukaltbtVMSwYVgLMYrjBnODwIPi7b5qJ0KfmYgk6iNKm5igS7Wz3LnhcHvpXKx5/T0La9Frtuj9Wm3pXk0O5KdCT0GS1WqNEwid3nLQVzZs8GZxKTGMJ7pFR4PMFi7Uh06HYnp2l2BjHWFMvntFEL
Hod3J0QTYKKJ22fQxUEVweTV/uC4J4eZH75ymdFfKUKJDfmABUWc6KszK+OgRRcNFeIIVaVSioBBx3l26Otn7D+S8pCHUtmTsRkbW5+sb0TUAHwnFmffJkkzxHDTPgwvOIOTzruk1HoXw5IfCYXW4MJPRwBmFAXDUiljXywqYnTLyhUxPfX1rGB8h+12AL7EyYY3TLbsPdoEzw6DtaBDayrYrLcWdWvNhf8SIuXhy3A57AyBNcnNaJ/PF9YxYTsUGwRxfEeQCQ8x3HhHJrNco/X263p9ylWdPuMrrTZ5XHJifOCALLi1C1zHMObWh9ls4F+1KjLh/xO9Qq2zagz8cP9+HFO6YGlLS2TCRwgS9YAgWQ8IgvUYlTA1+Ugml0AUocrMqWhaej5tySyWBUpMMEwQqm+syQzu2LTy53tYW1vL6McEEUoJkpzwGCc0EMd198Qi0wMcN0IM0w3hOBKO6dXe3jAh1+db9fwjDuP4qP4sj4fb7HJNCPeP4DirorISLJw6JfT3tXJljG+1/8fJhgKpcVOpZC6At3TpuOIFC3g1y5Zxqurq+Ckzq2OsVVUca3U1N2X6dG5NzcuxMFKNj3oHuKHvDWSz2YGl0SMNFElQnV8oEAbsqEAbF2xhTPfIT+pYL1E0RIaiMQ8xLOczQpr5CSF13sSI1IMSYtIVFHPewMn0i1JF1h1Ekgfj49UMTiBgRfZOLB55uYWCWziRt1ZtdL6ponPeUOidrRqTaweltb+l0Gcv01lcZzD5C5AXY2TCxI97DqFCEXoEDAksKJVKIYbKIIHJICqWQlOCCqqUCigJjEkQMjCu+F4mY3444UhJRpTrak3kCbkUC7yQY74f30skP3IhJAmGRryEMSBqicGKsp1eWVJ2MWUyOyirLUc53lVGpDjyJY8wgrm9v+FSMDySw5EvhqJGjo9EfS2V5XwsV5acUOmSNlrs8h69mb4kU2YPSqnJgTRj8uwhTrCGPv2EwfwPlt2a2JOlvaMAAAAASUVORK5CYII=&#x27;); background-size: cover; display: block;\"></span>\n  <picture>\n        <source srcset=\"/static/10053d985ed8b13cfb9b560ee5d2cc37/c54d4/re-executed-jobs.webp 175w, /static/10053d985ed8b13cfb9b560ee5d2cc37/a3432/re-executed-jobs.webp 350w, /static/10053d985ed8b13cfb9b560ee5d2cc37/52d01/re-executed-jobs.webp 501w\" sizes=\"(max-width: 501px) 100vw, 501px\" type=\"image/webp\">\n        <source srcset=\"/static/10053d985ed8b13cfb9b560ee5d2cc37/17006/re-executed-jobs.png 175w, /static/10053d985ed8b13cfb9b560ee5d2cc37/d6f3f/re-executed-jobs.png 350w, /static/10053d985ed8b13cfb9b560ee5d2cc37/f55b8/re-executed-jobs.png 501w\" sizes=\"(max-width: 501px) 100vw, 501px\" type=\"image/png\">\n        <img class=\"gatsby-resp-image-image\" src=\"/static/10053d985ed8b13cfb9b560ee5d2cc37/f55b8/re-executed-jobs.png\" alt=\"re executed jobs\" title=\"re executed jobs\" loading=\"lazy\" 
style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\">\n      </picture>\n  </a>\n    </span></body></html></p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> Rscript code/train_test_spliting.R <span class=\"token punctuation\">\\</span>\n                  data/Posts.csv <span class=\"token number\">0.15</span> <span class=\"token number\">20170426</span> <span class=\"token punctuation\">\\</span>\n                  data/train_post.csv <span class=\"token punctuation\">\\</span>\n                  data/test_post.csv\n</span>\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc repro</span> data/evaluation_python.txt\n</span>\nReproducing run command for data item data/matrix_train.txt. Args: Rscript --vanilla code/featurization.R data/train_post.csv data/test_post.csv data/matrix_train.txt data/matrix_test.txt\nReproducing run command for data item data/model.p. Args: python3 code/train_model_python.py data/matrix_train.txt 20170426 data/model.p\nReproducing run command for data item data/evaluation_python.txt. Args: python3 code/evaluate_python_mdl.py data/model.p data/matrix_test.txt data/evaluation_python.txt\n\nData item “data/evaluation_python.txt” was reproduced.\n\n<span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token command\">cat</span> data/evaluation_python.txt\n</span>AUC: 0.793145</code></pre></div></body></html>\n<p>The new AUC is 0.793145, an improvement over the previous\niteration.</p>\n<h2>Summary</h2>\n<p>In data science projects, R and Python are often used together.\nAdditional tools besides Git and shell scripting have been developed to facilitate the\ndevelopment of predictive models in multi-language environments. 
Using a data\nversion control system for reproducibility and Feather for data interoperability\nhelps you orchestrate R and Python code in a single environment.</p>","timeToRead":10,"fields":{"slug":"/best-practices-of-orchestrating-python-and-r-code-in-ml-projects"},"frontmatter":{"title":"Best practices of orchestrating Python and R code in ML projects","date":"September 26, 2017","description":"What is the best way to integrate R and Python languages in one data science\nproject? What are the best practices?\n","descriptionLong":"Today, data scientists are generally divided between two languages: some prefer\nR, some prefer Python. I will try to answer the question: “What is\nthe best way to integrate both languages in one data science project? What are\nthe best practices?”\n","tags":["R","Python","DVC","Tutorial","Best Practices"],"commentsUrl":"https://discuss.dvc.org/t/best-practices-of-orchestrating-python-and-r-code-in-ml-projects/295","author":{"childMarkdownRemark":{"frontmatter":{"name":"Marija 
Ilić","avatar":{"childImageSharp":{"fixed":{"base64":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAIAAAAC64paAAAACXBIWXMAAAsSAAALEgHS3X78AAAEkElEQVQ4yy3Q+VPSeRzHcXZnZzs0w8IjQYNUEhRF5VYQEAUEb0XFXEtyK7LD1U6PstTWXPPKQctUvgffk+/3C4jpmLaNre1O287sVLM/9M/sl21nnvP57fF+fefLYej1SPglG8OsU2yhDSYUfb64dG/g9tPpaRhESJwKMVGCCJGxaByjMDSIBggExjkoQhI4zQZC2OoauLS41NXRUSDOykw5np8p1BUVOcrNY0ODCBgA12DAH1hbhVZXwJUXwMqyn8NeCpIhFiOsB+G2xrrkw9+nHzkkST4q5sULEg5mcA/li/gjt/oxGPEDCASi7Ak/e2gN5mAETZAMQTA4Ti/6FjWy01pRSnOxuFMnaSgU2XMFFVKhnM8rleYszi/AcBD0oxCAQn4EXAtw8P8w+5IhxnuhWy8WXKkoGnQUPWzQjLpKx5t0DxpKrlQoNIIkT1sbFqAgPwoDGPuCawgHwykWk8EQAIFGrcIpP3nHUTjRpF7wVPku1fu67LNu46TbdEYpYceXZp8hEAkB2FfPLlPsLPuHJ8bGcwS8OoVwsFY15dYv9zSu9nUt97jmz9nnOp19ZpWcnzTcd4dCmQCAxfLjMcyOhyMb173eDO6B+iLRnSr1zy2mF94WuK975dqZha66uWZHb5lSwou/2N5BozQC4hgbQMU+m10OhaPtjXWC+G/blJKrZtWPZlWvreS5p3nSXdNTaRh2Vl4tVUq4h886q0mYxGACZ4NimGYxRa83VFVmxH/XbVK4i7Jq5DkdOvlDh/6GWeeQ5Xo0xf2mUtUJbndtLQEFiQBNwDQRCH3FDEWtn3W1SHiJ16zaQZtqqMZ006IdNhTer9DesOhHK00jVZVGwbHrrW00thFEIhS6TiPrMcxGkqGBvn7hce45tWym2rzQartrko8aVU9s+vma8uWm6vHaujJB0oi3J0q/YvBomNwMYy85KEaxsX52Zp6feswmTp+0mx5Xlw2aiydtloVq67zTDJ1p7TWbSkRpvvHHL5ndCLkZpbY3g9v/4wBKrSxDytzMYn7iaHXJA6vqlqZgwlA2bTXN1ZcDHnetJLuyMC8CEFvM663Q7nb49U741xhG0CBGME9nZnTSdDEvrkuTNWJVDGgL76vVTyzGFbfzXo1Zyk3otJS+IiO7kb030b29zd/ebu5zAmgQRggMZ/oun5fz4zJ58caspEsl0keO0plGy7MW+y/1FpMoNT+F222QUz7fu53377Z+f7/z5/vdDxwQIhCUXpietioluamHRYkHzDknvOUFvWV5Y3bdmK2kQ56dz0swZqa68oVD7c37oa2P+58/7n369PYzBwDxAIR21lcpTibK+Amnjh00ZiXfcmpv2JUDTu1dm7JZLixMPWIVn7BlJrcUZE+dv7hPbP6z9/eXv75w/H5kdGhYdyo5P+WgmHdAEPeNLO3QRVPeSINhxGV82KT/QZWtSztiPXXckHbUlZvuUcsnr/xE+YBPbz5wnk7N9V+6YFVIDbkitThDJuSrT6d7q3QTZ2unPI3TnqbrDn27VtauyXMppb12w2Vr2ezN2+Qi/MfG/r98UBJBtoZjogAAAABJRU5ErkJggg==","width":40,"height":40,"src":"/static/9add844328fd47c78f5df2bdfe1c56a3/4d3a4/marija_ilic.png","srcSet":"/static/9add844328fd47c78f5df2bdfe1c56a3/4d3a4/marija_ilic.png 1x,\n/static/9add844328fd47c78f5df2bdfe1c56a3/4c8bc/marija_ilic.png 
1.5x,\n/static/9add844328fd47c78f5df2bdfe1c56a3/c0e17/marija_ilic.png 2x","srcWebp":"/static/9add844328fd47c78f5df2bdfe1c56a3/e145b/marija_ilic.webp","srcSetWebp":"/static/9add844328fd47c78f5df2bdfe1c56a3/e145b/marija_ilic.webp 1x,\n/static/9add844328fd47c78f5df2bdfe1c56a3/0d42c/marija_ilic.webp 1.5x,\n/static/9add844328fd47c78f5df2bdfe1c56a3/f46db/marija_ilic.webp 2x"}}}}}},"picture":{"childImageSharp":{"fluid":{"base64":"data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAJABQDASIAAhEBAxEB/8QAGAAAAgMAAAAAAAAAAAAAAAAAAAMBAgX/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAAHVU+BhcP/EABoQAAIDAQEAAAAAAAAAAAAAAAECAxMzABH/2gAIAQEAAQUCNgawyKvoWXI69//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8BP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8BP//EAB4QAAIBAwUAAAAAAAAAAAAAAAABAhAREjFBYXFy/9oACAEBAAY/AtsBqCV+WLLUl0R80//EAB4QAQABAwUBAAAAAAAAAAAAAAERABAhMUFhcaHB/9oACAEBAAE/IcmLGssNB0C4wTz1RKQ7ot3rfLf/2gAMAwEAAgADAAAAEPPP/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPxA//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPxA//8QAHxABAAIBAwUAAAAAAAAAAAAAAQARIRBRoTFBcYHw/9oACAEBAAE/EMQYtOPnt7iblFoA6DAq3IYCmV65nNz5u+j/2Q==","aspectRatio":2.115702479338843,"src":"/static/11e326249ba3e8223ebe073afdc81f9b/6fdf8/post-image.jpg","srcSet":"/static/11e326249ba3e8223ebe073afdc81f9b/9fc73/post-image.jpg 213w,\n/static/11e326249ba3e8223ebe073afdc81f9b/ee221/post-image.jpg 425w,\n/static/11e326249ba3e8223ebe073afdc81f9b/6fdf8/post-image.jpg 850w,\n/static/11e326249ba3e8223ebe073afdc81f9b/88a70/post-image.jpg 1275w,\n/static/11e326249ba3e8223ebe073afdc81f9b/15ae8/post-image.jpg 1700w,\n/static/11e326249ba3e8223ebe073afdc81f9b/0f962/post-image.jpg 2048w","srcWebp":"/static/11e326249ba3e8223ebe073afdc81f9b/5c1d9/post-image.webp","srcSetWebp":"/static/11e326249ba3e8223ebe073afdc81f9b/99b2d/post-image.webp 
213w,\n/static/11e326249ba3e8223ebe073afdc81f9b/23220/post-image.webp 425w,\n/static/11e326249ba3e8223ebe073afdc81f9b/5c1d9/post-image.webp 850w,\n/static/11e326249ba3e8223ebe073afdc81f9b/5e720/post-image.webp 1275w,\n/static/11e326249ba3e8223ebe073afdc81f9b/35cfd/post-image.webp 1700w,\n/static/11e326249ba3e8223ebe073afdc81f9b/96ec1/post-image.webp 2048w","sizes":"(max-width: 850px) 100vw, 850px","presentationWidth":850}}},"pictureComment":"Image was taken from intersog.com"}}},"pageContext":{"next":{"fields":{"slug":"/ml-best-practices-in-pytorch-dev-conf-2018"},"frontmatter":{"title":"ML best practices in PyTorch dev conf 2018"}},"previous":{"fields":{"slug":"/ml-model-ensembling-with-fast-iterations"},"frontmatter":{"title":"ML Model Ensembling with Fast Iterations"}},"currentPage":16,"slug":"/best-practices-of-orchestrating-python-and-r-code-in-ml-projects"}}}