{"componentChunkName":"component---src-templates-blog-post-tsx","path":"/ml-model-ensembling-with-fast-iterations","result":{"data":{"markdownRemark":{"id":"71804b8a-8af6-5736-b558-fdf6f724c4af","excerpt":"<p>In a model ensembling setup, the final prediction is a composite of predictions\nfrom individual machine learning algorithms. To make the…</p>","html":"<p>In a model ensembling setup, the final prediction is a composite of predictions\nfrom individual machine learning algorithms. To make the best model composite,\nyou have to try dozens of combinations of weights for the model set, and finding\nthe best one takes a lot of time. That is why iteration speed is\ncrucial in ML model ensembling. We are going to make our research\nreproducible by using the <a href=\"http://dvc.org\">Data Version Control</a> tool\n(<a href=\"http://dvc.org\">DVC</a>). It lets you quickly re-run and replicate\nthe ML prediction result by executing a single command, <code class=\"language-text\">dvc repro</code>.</p>\n<p>As we will demonstrate, DVC helps tackle the common technical\nchallenges of building pipelines for ensemble learning.</p>\n<h2>Project Overview</h2>\n<p>In this case, we will build an R-based solution to a\nsupervised-learning regression problem: predicting wine sales for the\n<a href=\"https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/\">Predict Wine Sales</a>\nKaggle competition.</p>\n<p>An ensemble prediction methodology will be used in the project. 
The weighted\nensemble of three models will be implemented, trained, and predicted from\n(namely, these are Linear Regression, <html><head></head><body><code class=\"language-text\">GBM</code></body></html>, and <html><head></head><body><code class=\"language-text\">XGBoost</code></body></html>).</p>\n<p><html><head></head><body><span class=\"gatsby-resp-image-wrapper\" style=\"position: relative; display: block; margin-left: auto; margin-right: auto;  max-width: 435px;\">\n      <a class=\"gatsby-resp-image-link\" href=\"/static/eb9050a712d4a3f7fd006686b1f41fe2/93314/ensemble-prediction-methodology.png\" style=\"display: block\" target=\"_blank\" rel=\"noopener\">\n    <span class=\"gatsby-resp-image-background-image\" style=\"padding-bottom: 68.62068965517241%; position: relative; bottom: 0; left: 0; background-image: url(&#x27;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAIAAACgpqunAAAACXBIWXMAAAsSAAALEgHS3X78AAACPklEQVQoz12SSW/TQBTH8yV76ZEDFw5UiEOFkBASqir1xAWJj1AhlXSjVNDQJaRLGtIlJe5GS53YcdzE22ye5Q2euE1T3mH83pv/b/625xX0fSCE4yTRI8EYC8JISjnaRAhRyvK8kD8opaX1jd3qvuO6WQkA2bpTrf7c2b28/DPsdH3/24/1X4dHcWxsCnk3SdDK19XPxWLTsnJpZrhZLs8vLFS2d4aw3WotLi8tLi13PO/BWQhxenFdPzkPwvBOCnBzY+8fNp22M4QTlNQbpwfHTUrJPTzY0MjRne2H0iRC+VUAdV+aPjg1CNt5aWAh4cDT1TO7Vq/sudpDRp0eH+HSKvk0RTfWuPU76/Spe9TbO2l+bPyds8JDLpmBAwpv1tKJYu/lFzKxhNevMwcVvnt7NT5mPX3WHh+L3s9kslrn+3TlydTW5HTl+Yf6i5D5hfx7iNK4b+OrEgYtpXEGlkLoQ2MWohBSczdKcQKctDdJcMpAalAGTqVGQsdhD3lnCc9K0EoqgkUS8ZuqJAiYgYVKieK0vUWjS6JSqVIDJ6lOmLLd224sQgbZQVpwGQR9z3daPRbFKow0ZAYRYrHdT9xeF7MIy9jACjThQKSmoUsCezBQKvvVmkf6fFZLrLXpSZBUxLRbo2lAJVYgC6PXoANLO2WDcom6EnkcXZxnCfalkgOBwPp6RYN8NJ7DyA9JHF5625p/dTL32ipONsozHXwrBtuPxIX/0QGtBCQuj900dljspNkr3DmPjpDW/wDEnAvQG1Y76gAAAABJRU5ErkJggg==&#x27;); background-size: cover; display: block;\"></span>\n  <picture>\n        <source srcset=\"/static/eb9050a712d4a3f7fd006686b1f41fe2/c54d4/ensemble-prediction-methodology.webp 175w, 
/static/eb9050a712d4a3f7fd006686b1f41fe2/a3432/ensemble-prediction-methodology.webp 350w, /static/eb9050a712d4a3f7fd006686b1f41fe2/426ac/ensemble-prediction-methodology.webp 700w, /static/eb9050a712d4a3f7fd006686b1f41fe2/bf818/ensemble-prediction-methodology.webp 870w\" sizes=\"(max-width: 700px) 100vw, 700px\" type=\"image/webp\">\n        <source srcset=\"/static/eb9050a712d4a3f7fd006686b1f41fe2/17006/ensemble-prediction-methodology.png 175w, /static/eb9050a712d4a3f7fd006686b1f41fe2/d6f3f/ensemble-prediction-methodology.png 350w, /static/eb9050a712d4a3f7fd006686b1f41fe2/69344/ensemble-prediction-methodology.png 700w, /static/eb9050a712d4a3f7fd006686b1f41fe2/93314/ensemble-prediction-methodology.png 870w\" sizes=\"(max-width: 700px) 100vw, 700px\" type=\"image/png\">\n        <img class=\"gatsby-resp-image-image\" src=\"/static/eb9050a712d4a3f7fd006686b1f41fe2/69344/ensemble-prediction-methodology.png\" alt=\"ensemble prediction methodology\" title=\"ensemble prediction methodology\" loading=\"lazy\" style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\">\n      </picture>\n  </a>\n    </span></body></html></p>\n<p>If properly designed and used, ensemble prediction can perform much better than\nthe predictions of the individual machine learning models composing the ensemble.</p>\n<p>Prediction results will be delivered as an output CSV file in the format\nspecified in the requirements of the\n<a href=\"https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/\">Predict Wine Sales</a>\nKaggle competition (the so-called Kaggle submission file).</p>\n<h2>Important Pre-Requisites</h2>\n<p>In order to try the materials of this\n<a href=\"https://github.com/gvyshnya/DVC_R_Ensemble\">repository</a> in your environment,\nthe following software should be installed on your machine:</p>\n<ul>\n<li><strong><em>Python 3</em></strong> runtime environment for your OS (it is required to run DVC\ncommands in the batch 
files)</li>\n<li><strong><em>DVC</em></strong> itself (you can install it as a Python package by running the\nstandard command in your command line prompt: <code class=\"language-text\">pip install dvc</code>)</li>\n<li><strong><em>R</em></strong> <strong><em>3.4.x</em></strong> runtime environment for your OS</li>\n<li><strong><em>git</em></strong> command-line client application for your OS</li>\n</ul>\n<h2>Technical Challenges</h2>\n<p>The technical challenges of building the ML pipeline for this project were to\nmeet the business requirements below:</p>\n<ul>\n<li>Ability to conditionally trigger execution of 3 different ML prediction models</li>\n<li>Ability to conditionally trigger model ensemble prediction based on\npredictions of those 3 individual models</li>\n<li>Ability to specify weights of each of the individual model predictions in the\nensemble</li>\n<li>Quick redeployment and re-run of the ML pipeline upon frequent\nreconfiguration and model tweaks</li>\n<li>Reproducibility of the pipeline and forecasting results across multiple\nmachines and team members</li>\n</ul>\n<p>The next sections explain how these challenges are addressed in the\ndesign of the ML pipeline for this project.</p>\n<h2>ML Pipeline</h2>\n<p>The ML pipeline for this project is presented in the diagram below:</p>\n<p><html><head></head><body><span class=\"gatsby-resp-image-wrapper\" style=\"position: relative; display: block; margin-left: auto; margin-right: auto;  max-width: 365.5px;\">\n      <a class=\"gatsby-resp-image-link\" href=\"/static/9cf20fd774b97331a5c6e17a1e92115b/b2a6b/ml-pipeline.png\" style=\"display: block\" target=\"_blank\" rel=\"noopener\">\n    <span class=\"gatsby-resp-image-background-image\" style=\"padding-bottom: 74.00820793433653%; position: relative; bottom: 0; left: 0; background-image: 
url(&#x27;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAYAAADkmO9VAAAACXBIWXMAAAsSAAALEgHS3X78AAADbUlEQVQ4y41UTWwbVRB+diIVDpUqilROSKBeAKlnLjQIAUIccqGAQLSihwYhOPYCFwRSCEVpJBCoFIWKlkIvgEMLTpzUsb32/tj747XXrrPZ9c/uev23/iH22oljO8PuOiSHSlXnafVmZ+Z9mvlm3kPDZuRbGAowaBJCJXPzRHnz70inFISKvDKsZlcG3QoGVdnrQZaoWc5lyH+hB8qoRSzDTgL6daKuCbdOGTkfa5bDIPEekOIe6DcoaOvrK3Ysw/CoV157MKCpeY7CFj49agSesf9hRD8He+xUM3/npYHhPw0j5kXokycdH+TRsLqEHloAll1NxXfxX82/WJGXf2wUVq+01LvXWopvxvbHY6uoW/Y7sTGpgxKFbZRUdxCb7SIPXhyD9Gt+1DdCE7OXFi1AZUIXvXXo8dCr4dCt4gCDFFhcJuzYqz/9iXolL1qLG/clk1S2XYeZ1VYP9Ubw/YERnO1ovi+2y3c/260G5qAVOmP7ht0MEhiHTrSEq+5bvtSRX/7hjrzx0WW3bSubMAYJF8gDwJjGfMBo3HzwHjaPy+RXbDG+QKn0u2NKAG3UwDkcz/deFSujvGzARqY0SlObnRcOQDiNdnYRIhOcwjRKuzpITRE2jAw0oA6JIpe0/efOXnDfDrCTzpmceSat9iDE5UEsD4DPmYuHWSnEgU4XqBlOpedJKXKJzlNfWvoCo0bfsX2/Lv3munrztgNIpGpvJnJtWCPFYUoxgc93fjgAiRbwQ/ACeT6WI2exdGCW2Ax/Tsn4nGVzOLzw8Yzr6++vT45L7k5L1b1tuQZG1gBIF3cX7is5anonOJWtt2ELxPqG85nWsmhwuvz2W2fdn8x9OlnZI11/YNwjN7zc47+HCsc8Ien4OrN5FOCb/TKV/aYcQ4jX2YvpqrBI56JX4hr7naVfixcZZw7fO3/O/VADHdUxpLerjs7q5LOpOn/aL/heZjRySqhxU0yROPl/LF9hTiUr3Gt8iXmd12NPO9038cdgh56GXuRJJwgXQ85+R/7ZTYgRPqknIJzGICQEIVUWAEsHnUF94qnjj1ISQa8xq0BnY8CpzGXb3i4FsEHTvvMBcb8R4y5/eOMVl1UmZY9LOIUNI/fCA7klAavQzhNz4nmXO5ajQngGt3hlLUB6zrZv6evXh60YWNc08B8bAFkvKDFRaQAAAABJRU5ErkJggg==&#x27;); background-size: cover; display: block;\"></span>\n  <picture>\n        <source srcset=\"/static/9cf20fd774b97331a5c6e17a1e92115b/c54d4/ml-pipeline.webp 175w, /static/9cf20fd774b97331a5c6e17a1e92115b/a3432/ml-pipeline.webp 350w, /static/9cf20fd774b97331a5c6e17a1e92115b/426ac/ml-pipeline.webp 700w, /static/9cf20fd774b97331a5c6e17a1e92115b/feeb6/ml-pipeline.webp 731w\" sizes=\"(max-width: 700px) 100vw, 700px\" type=\"image/webp\">\n        <source srcset=\"/static/9cf20fd774b97331a5c6e17a1e92115b/17006/ml-pipeline.png 175w, /static/9cf20fd774b97331a5c6e17a1e92115b/d6f3f/ml-pipeline.png 350w, /static/9cf20fd774b97331a5c6e17a1e92115b/69344/ml-pipeline.png 700w, 
/static/9cf20fd774b97331a5c6e17a1e92115b/b2a6b/ml-pipeline.png 731w\" sizes=\"(max-width: 700px) 100vw, 700px\" type=\"image/png\">\n        <img class=\"gatsby-resp-image-image\" src=\"/static/9cf20fd774b97331a5c6e17a1e92115b/69344/ml-pipeline.png\" alt=\"ml pipeline\" title=\"ml pipeline\" loading=\"lazy\" style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\">\n      </picture>\n  </a>\n    </span></body></html></p>\n<p>As you can see, the essential implementation of the solution is as follows:</p>\n<ul>\n<li><a href=\"https://gist.github.com/gvyshnya/443424775b0150baac774cc6cf3cb1cc\"><code class=\"language-text\">preprocessing.R</code></a>\nhandles data manipulation and pre-processing (reading the training\nand testing data sets, removing outliers, imputing NAs, etc.) and stores the\nrefined training and testing data as new files for the model scripts to reuse</li>\n<li>3 model scripts implement training and forecasting algorithms for each of the\nmodels selected for this project\n(<a href=\"https://gist.github.com/gvyshnya/7ec76316c24bc1b4f595ef1256f52d3a\"><code class=\"language-text\">LR.R</code></a>,\n<a href=\"https://gist.github.com/gvyshnya/50e5ea3efa9771d2e7cc121c2f1a04e4\"><code class=\"language-text\">GBM.R</code></a>,\n<a href=\"https://gist.github.com/gvyshnya/2e5799863f02fec652c194020da82dd3\"><code class=\"language-text\">xgboost.R</code></a>)</li>\n<li><a href=\"https://gist.github.com/gvyshnya/84379d6a68fd085fe3a26aabad453e55\"><code class=\"language-text\">ensemble.R</code></a>\nis responsible for the weighted ensemble prediction and the final output of\nthe Kaggle submission file</li>\n<li><code class=\"language-text\">config.R</code> is responsible for all of 
the conditional logic switches needed in\nthe pipeline (to do so, it is sourced by all of the modeling and ensemble\nprediction scripts)</li>\n</ul>\n<p>A special note on the lack of feature engineering in this project: it\nwas an intentional decision based on the specifics of the dataset. The\nexisting features were quite instrumental in predicting the target values ‘as is’.\nTherefore it was decided to follow the well-known\n<a href=\"https://en.wikipedia.org/wiki/Pareto_principle\">Pareto principle</a> (interpreted\nas “<strong><em>20% of efforts address 80% of issues</em></strong>”, in this case) and not to spend\nmore time on it.</p>\n<p><strong><em>Note</em></strong>: all <code class=\"language-text\">R</code> and batch files mentioned throughout this blog post are\navailable online in a separate GitHub\n<a href=\"https://github.com/gvyshnya/DVC_R_Ensemble\">repository</a>. You will also be able\nto review more details on the implementation of each of the machine learning\nprediction models there.</p>\n<h3>Pipeline Configuration Management</h3>\n<p>All of the essential tweaks to the conditional machine learning pipeline for this\nproject are managed by a configuration file. For ease of use across the solution,\nit was implemented as an R code module (<code class=\"language-text\">config.R</code>), to be included in all model\ntraining and forecasting scripts. 
Thus the respective parameters (assigned as R\nvariables) are read by the runnable scripts, and the conditional logic\nthere is triggered accordingly.</p>\n<p>This file is not intended to be run from a command line (unlike the rest of the R\nscripts in the project).</p>\n<p><html><head></head><body><div id=\"gist73938264\" class=\"gist\">\n    <div class=\"gist-file\">\n      <div class=\"gist-data\">\n        <div class=\"js-gist-file-update-container js-task-list-container file-box\">\n  <div id=\"file-config-r\" class=\"file\">\n    \n\n  <div itemprop=\"text\" class=\"Box-body p-0 blob-wrapper data type-r\">\n      \n<table class=\"highlight tab-size js-file-line-container\" data-tab-size=\"8\">\n      <tbody><tr>\n        <td id=\"file-config-r-L1\" class=\"blob-num js-line-number\" data-line-number=\"1\"></td>\n        <td id=\"file-config-r-LC1\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> Competition: https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L2\" class=\"blob-num js-line-number\" data-line-number=\"2\"></td>\n        <td id=\"file-config-r-LC2\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> This is a configuration file for the entire solution </span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L3\" class=\"blob-num js-line-number\" data-line-number=\"3\"></td>\n        <td id=\"file-config-r-LC3\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L4\" class=\"blob-num js-line-number\" data-line-number=\"4\"></td>\n        <td id=\"file-config-r-LC4\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> LR.R specific settings</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L5\" class=\"blob-num js-line-number\" 
data-line-number=\"5\"></td>\n        <td id=\"file-config-r-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-smi\">cfg_run_LR</span> <span class=\"pl-k\">&#x3C;-</span> <span class=\"pl-c1\">1</span> <span class=\"pl-c\"><span class=\"pl-c\">#</span> if set to 0, LR model will not fit, and its prediction will not be calculated in the batch mode</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L6\" class=\"blob-num js-line-number\" data-line-number=\"6\"></td>\n        <td id=\"file-config-r-LC6\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L7\" class=\"blob-num js-line-number\" data-line-number=\"7\"></td>\n        <td id=\"file-config-r-LC7\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> GBM.R specific settings</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L8\" class=\"blob-num js-line-number\" data-line-number=\"8\"></td>\n        <td id=\"file-config-r-LC8\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-smi\">cfg_run_GBM</span> <span class=\"pl-k\">&#x3C;-</span> <span class=\"pl-c1\">1</span> <span class=\"pl-c\"><span class=\"pl-c\">#</span> if set to 0, GBM model will not fit, and its prediction will not be calculated in the batch mode</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L9\" class=\"blob-num js-line-number\" data-line-number=\"9\"></td>\n        <td id=\"file-config-r-LC9\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L10\" class=\"blob-num js-line-number\" data-line-number=\"10\"></td>\n        <td id=\"file-config-r-LC10\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> xgboost.R specific settings</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L11\" class=\"blob-num 
js-line-number\" data-line-number=\"11\"></td>\n        <td id=\"file-config-r-LC11\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-smi\">cfg_run_xgboost</span> <span class=\"pl-k\">&#x3C;-</span> <span class=\"pl-c1\">1</span> <span class=\"pl-c\"><span class=\"pl-c\">#</span> if set to 0, xgboost model will not fit, and its prediction will not be calculated in the batch mode</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L12\" class=\"blob-num js-line-number\" data-line-number=\"12\"></td>\n        <td id=\"file-config-r-LC12\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L13\" class=\"blob-num js-line-number\" data-line-number=\"13\"></td>\n        <td id=\"file-config-r-LC13\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> ensemble.R specific settings</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L14\" class=\"blob-num js-line-number\" data-line-number=\"14\"></td>\n        <td id=\"file-config-r-LC14\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-smi\">cfg_run_ensemble</span> <span class=\"pl-k\">&#x3C;-</span> <span class=\"pl-c1\">1</span> <span class=\"pl-c\"><span class=\"pl-c\">#</span> if set to 0, the ensemble will not predict, and ensemble prediction will not be created</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L15\" class=\"blob-num js-line-number\" data-line-number=\"15\"></td>\n        <td id=\"file-config-r-LC15\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L16\" class=\"blob-num js-line-number\" data-line-number=\"16\"></td>\n        <td id=\"file-config-r-LC16\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> ensemble components</span></td>\n      </tr>\n      <tr>\n        <td 
id=\"file-config-r-L17\" class=\"blob-num js-line-number\" data-line-number=\"17\"></td>\n        <td id=\"file-config-r-LC17\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-smi\">cfg_model_predictions</span> <span class=\"pl-k\">&#x3C;-</span> c(<span class=\"pl-s\"><span class=\"pl-pds\">\"</span>data/submission_LR.csv<span class=\"pl-pds\">\"</span></span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"</span>data/submission_GBM.csv<span class=\"pl-pds\">\"</span></span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"</span>data/submission_XGBOOST.csv<span class=\"pl-pds\">\"</span></span>)</td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L18\" class=\"blob-num js-line-number\" data-line-number=\"18\"></td>\n        <td id=\"file-config-r-LC18\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-c\"><span class=\"pl-c\">#</span> element weights mapped to the cfg_model_predictions elements above</span></td>\n      </tr>\n      <tr>\n        <td id=\"file-config-r-L19\" class=\"blob-num js-line-number\" data-line-number=\"19\"></td>\n        <td id=\"file-config-r-LC19\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-smi\">cfg_model_weights</span> <span class=\"pl-k\">&#x3C;-</span> c(<span class=\"pl-c1\">1</span>,<span class=\"pl-c1\">1</span>,<span class=\"pl-c1\">1</span>) <span class=\"pl-c\"><span class=\"pl-c\">#</span> weights of predictions of the models in the ensemble</span></td>\n      </tr>\n</tbody></table>\n\n\n  </div>\n\n  </div>\n</div>\n\n      </div>\n      <div class=\"gist-meta\">\n        <a href=\"https://gist.github.com/gvyshnya/918e94b06ebf222f6bb56ed26a5f44ee/raw/e274919657607fdfd67a2fb6354e40ff0c4173e9/config.R\" style=\"float:right\">view raw</a>\n        <a href=\"https://gist.github.com/gvyshnya/918e94b06ebf222f6bb56ed26a5f44ee#file-config-r\">config.R</a>\n        hosted with ❤ by <a href=\"https://github.com\">GitHub</a>\n      </div>\n    
</div>\n</div></body></html></p>\n<h3>Why Do We Need DVC?</h3>\n<p>As we all know, there is no way to build the ideal ML model with sound\nprediction accuracy from the very beginning. You will have to continuously\nadjust your algorithm/model implementations based on the cross-validation\nappraisal until the results are good enough. This is especially true in\nensemble learning, where you have to constantly tweak not only the parameters of the\nindividual prediction models but also the settings of the ensemble itself:</p>\n<ul>\n<li>changing ensemble composition (adding or removing individual prediction\nmodels)</li>\n<li>changing model prediction weights in the resulting ensemble prediction</li>\n</ul>\n<p>Under such conditions, DVC will help you manage your ensemble ML pipeline in\na really solid manner. Let’s consider the following real-world scenario:</p>\n<ul>\n<li>Your team member changes the settings of the <code class=\"language-text\">GBM</code> model and resubmits its\nimplementation (this is emulated by the commit\n<a href=\"https://github.com/gvyshnya/DVC_R_Ensemble/commit/27825d0732f72f07e7e4e48548ddb8a8604103f0\">#8604103f0</a>,\nchecksum <code class=\"language-text\">27825d0</code>)</li>\n<li>You rerun the entire ML pipeline on your computer to get the newest\npredictions from <code class=\"language-text\">GBM</code> as well as the updated final ensemble prediction</li>\n<li>The prediction results still appear not to be optimal, so someone\nchanges the weights of individual models in the ensemble, assigning <code class=\"language-text\">GBM</code>\na higher weight vs. 
<code class=\"language-text\">xgboost</code> and <code class=\"language-text\">LR</code></li>\n<li>After the ensemble setup changes are committed (and the updated <code class=\"language-text\">config.R</code> appears in\nthe repository, as emulated by the commit\n<a href=\"https://github.com/gvyshnya/DVC_R_Ensemble/commit/5bcbe115afcb24886abb4734ff2da42eb97612ce\">#eb97612ce</a>,\nchecksum <code class=\"language-text\">5bcbe11</code>), you re-run the model predictions and the final ensemble\nprediction on your machine once again</li>\n</ul>\n<p>All you need to do to handle the changes above is to keep running\nyour <strong>DVC</strong> commands per the script developed (see the section below). You do\nnot have to remember or explicitly know the changes made to the project\ncodebase or its pipeline configuration. <strong>DVC</strong> will automatically check out\nthe latest changes from the repo and make sure it runs only those steps in\nthe pipeline that were affected by the recent changes in the code modules.</p>\n<h3>Orchestrating the Pipeline: DVC Command File</h3>\n<p>After developing the individual R scripts needed by the different steps of our Machine\nLearning pipeline, we orchestrate them together using DVC.</p>\n<p>Below is a batch file illustrating how DVC manages the steps of the machine learning\nprocess for this project:</p>\n<p><html><head></head><body><div id=\"gist73940214\" class=\"gist\">\n    <div class=\"gist-file\">\n      <div class=\"gist-data\">\n        <div class=\"js-gist-file-update-container js-task-list-container file-box\">\n  <div id=\"file-dvc-bat\" class=\"file\">\n    \n\n  <div itemprop=\"text\" class=\"Box-body p-0 blob-wrapper data type-batchfile\">\n      \n<table class=\"highlight tab-size js-file-line-container\" data-tab-size=\"8\">\n      <tbody><tr>\n        <td 
id=\"file-dvc-bat-L1\" class=\"blob-num js-line-number\" data-line-number=\"1\"></td>\n        <td id=\"file-dvc-bat-LC1\" class=\"blob-code blob-code-inner js-file-line\"># This is a DVC-based script to manage machine-learning pipeline <span class=\"pl-k\">for</span> a project per</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L2\" class=\"blob-num js-line-number\" data-line-number=\"2\"></td>\n        <td id=\"file-dvc-bat-LC2\" class=\"blob-code blob-code-inner js-file-line\"># https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L3\" class=\"blob-num js-line-number\" data-line-number=\"3\"></td>\n        <td id=\"file-dvc-bat-LC3\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L4\" class=\"blob-num js-line-number\" data-line-number=\"4\"></td>\n        <td id=\"file-dvc-bat-LC4\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">mkdir</span> R_DVC_GITHUB_CODE</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L5\" class=\"blob-num js-line-number\" data-line-number=\"5\"></td>\n        <td id=\"file-dvc-bat-LC5\" class=\"blob-code blob-code-inner js-file-line\"><span class=\"pl-k\">cd</span> R_DVC_GITHUB_CODE</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L6\" class=\"blob-num js-line-number\" data-line-number=\"6\"></td>\n        <td id=\"file-dvc-bat-LC6\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L7\" class=\"blob-num js-line-number\" data-line-number=\"7\"></td>\n        <td id=\"file-dvc-bat-LC7\" class=\"blob-code blob-code-inner js-file-line\"># clone the github repo with the code</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L8\" class=\"blob-num js-line-number\" data-line-number=\"8\"></td>\n        <td id=\"file-dvc-bat-LC8\" class=\"blob-code blob-code-inner js-file-line\">git clone 
https://github.com/gvyshnya/DVC_R_Ensemble</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L9\" class=\"blob-num js-line-number\" data-line-number=\"9\"></td>\n        <td id=\"file-dvc-bat-LC9\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L10\" class=\"blob-num js-line-number\" data-line-number=\"10\"></td>\n        <td id=\"file-dvc-bat-LC10\" class=\"blob-code blob-code-inner js-file-line\"># initialize DVC</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L11\" class=\"blob-num js-line-number\" data-line-number=\"11\"></td>\n        <td id=\"file-dvc-bat-LC11\" class=\"blob-code blob-code-inner js-file-line\">$ dvc init</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L12\" class=\"blob-num js-line-number\" data-line-number=\"12\"></td>\n        <td id=\"file-dvc-bat-LC12\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L13\" class=\"blob-num js-line-number\" data-line-number=\"13\"></td>\n        <td id=\"file-dvc-bat-LC13\" class=\"blob-code blob-code-inner js-file-line\"># import data</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L14\" class=\"blob-num js-line-number\" data-line-number=\"14\"></td>\n        <td id=\"file-dvc-bat-LC14\" class=\"blob-code blob-code-inner js-file-line\">$ dvc import https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/download/wine.csv data/</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L15\" class=\"blob-num js-line-number\" data-line-number=\"15\"></td>\n        <td id=\"file-dvc-bat-LC15\" class=\"blob-code blob-code-inner js-file-line\">$ dvc import https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/download/wine_test.csv data/</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L16\" class=\"blob-num js-line-number\" data-line-number=\"16\"></td>\n        <td id=\"file-dvc-bat-LC16\" class=\"blob-code 
blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L17\" class=\"blob-num js-line-number\" data-line-number=\"17\"></td>\n        <td id=\"file-dvc-bat-LC17\" class=\"blob-code blob-code-inner js-file-line\"># run data pre-processing</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L18\" class=\"blob-num js-line-number\" data-line-number=\"18\"></td>\n        <td id=\"file-dvc-bat-LC18\" class=\"blob-code blob-code-inner js-file-line\">$ dvc run Rscript --vanilla code/preprocessing.R data/wine.csv data/wine_test.csv data/training_imputed.csv data/testing_imputed.csv</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L19\" class=\"blob-num js-line-number\" data-line-number=\"19\"></td>\n        <td id=\"file-dvc-bat-LC19\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L20\" class=\"blob-num js-line-number\" data-line-number=\"20\"></td>\n        <td id=\"file-dvc-bat-LC20\" class=\"blob-code blob-code-inner js-file-line\"># run LR model fit and forecasting</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L21\" class=\"blob-num js-line-number\" data-line-number=\"21\"></td>\n        <td id=\"file-dvc-bat-LC21\" class=\"blob-code blob-code-inner js-file-line\">$ dvc run Rscript --vanilla code/LR.R data/training_imputed.csv data/testing_imputed.csv 0.7 <span class=\"pl-c1\">825</span> data/submission_LR.csv code/config.R</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L22\" class=\"blob-num js-line-number\" data-line-number=\"22\"></td>\n        <td id=\"file-dvc-bat-LC22\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L23\" class=\"blob-num js-line-number\" data-line-number=\"23\"></td>\n        <td id=\"file-dvc-bat-LC23\" class=\"blob-code blob-code-inner js-file-line\"># run GBM model fit and forecasting</td>\n      </tr>\n      <tr>\n        
<td id=\"file-dvc-bat-L24\" class=\"blob-num js-line-number\" data-line-number=\"24\"></td>\n        <td id=\"file-dvc-bat-LC24\" class=\"blob-code blob-code-inner js-file-line\">$ dvc run Rscript --vanilla code/GBM.R data/training_imputed.csv data/testing_imputed.csv <span class=\"pl-c1\">5000</span> <span class=\"pl-c1\">10</span> <span class=\"pl-c1\">4</span> <span class=\"pl-c1\">25</span> data/submission_GBM.csv code/config.R</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L25\" class=\"blob-num js-line-number\" data-line-number=\"25\"></td>\n        <td id=\"file-dvc-bat-LC25\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L26\" class=\"blob-num js-line-number\" data-line-number=\"26\"></td>\n        <td id=\"file-dvc-bat-LC26\" class=\"blob-code blob-code-inner js-file-line\"># run XGBOOST model fit and forecasting</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L27\" class=\"blob-num js-line-number\" data-line-number=\"27\"></td>\n        <td id=\"file-dvc-bat-LC27\" class=\"blob-code blob-code-inner js-file-line\">$ dvc run Rscript --vanilla code/xgboost.R data/training_imputed.csv data/testing_imputed.csv <span class=\"pl-c1\">1000</span> <span class=\"pl-c1\">10</span> 0.0001 1.0 data/submission_XGBOOST.csv code/config.R</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L28\" class=\"blob-num js-line-number\" data-line-number=\"28\"></td>\n        <td id=\"file-dvc-bat-LC28\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L29\" class=\"blob-num js-line-number\" data-line-number=\"29\"></td>\n        <td id=\"file-dvc-bat-LC29\" class=\"blob-code blob-code-inner js-file-line\"># prepare ensemble submission</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L30\" class=\"blob-num js-line-number\" data-line-number=\"30\"></td>\n        <td id=\"file-dvc-bat-LC30\" class=\"blob-code 
blob-code-inner js-file-line\"># Note: please make sure to <span class=\"pl-k\">edit</span> your code/config.R to <span class=\"pl-k\">set</span> up the references to the predictions from each model according</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L31\" class=\"blob-num js-line-number\" data-line-number=\"31\"></td>\n        <td id=\"file-dvc-bat-LC31\" class=\"blob-code blob-code-inner js-file-line\"># to the names of output files on the steps above</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-bat-L32\" class=\"blob-num js-line-number\" data-line-number=\"32\"></td>\n        <td id=\"file-dvc-bat-LC32\" class=\"blob-code blob-code-inner js-file-line\">$ dvc run Rscript --vanilla code/ensemble.R data/submission_ensemble.csv code/config.R</td>\n      </tr>\n</tbody></table>\n\n\n  </div>\n\n  </div>\n</div>\n\n      </div>\n      <div class=\"gist-meta\">\n        <a href=\"https://gist.github.com/gvyshnya/7f1b8262e3eb7a8b3c16dbfd8cf98644/raw/4818eab6c2f99722110a37c7d2c509c78ce4240a/dvc.bat\" style=\"float:right\">view raw</a>\n        <a href=\"https://gist.github.com/gvyshnya/7f1b8262e3eb7a8b3c16dbfd8cf98644#file-dvc-bat\">dvc.bat</a>\n        hosted with ❤ by <a href=\"https://github.com\">GitHub</a>\n      </div>\n    </div>\n</div></body></html></p>\n<p>If you later edit the ensemble configuration in <html><head></head><body><code class=\"language-text\">code/config.R</code></body></html>, you\ncan rely on the automatic dependency resolution and tracking in DVC to rebuild\nthe ensemble prediction as follows:</p>\n<p><html><head></head><body><div id=\"gist74997297\" class=\"gist\">\n    <div class=\"gist-file\">\n      <div class=\"gist-data\">\n        <div class=\"js-gist-file-update-container js-task-list-container file-box\">\n  <div id=\"file-dvc-repro-code\" class=\"file\">\n    \n\n  <div itemprop=\"text\" class=\"Box-body p-0 blob-wrapper data type-text\">\n      \n<table 
class=\"highlight tab-size js-file-line-container\" data-tab-size=\"8\">\n      <tbody><tr>\n        <td id=\"file-dvc-repro-code-L1\" class=\"blob-num js-line-number\" data-line-number=\"1\"></td>\n        <td id=\"file-dvc-repro-code-LC1\" class=\"blob-code blob-code-inner js-file-line\"># Improve ensemble configuration</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-repro-code-L2\" class=\"blob-num js-line-number\" data-line-number=\"2\"></td>\n        <td id=\"file-dvc-repro-code-LC2\" class=\"blob-code blob-code-inner js-file-line\">$ vi code/config.R</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-repro-code-L3\" class=\"blob-num js-line-number\" data-line-number=\"3\"></td>\n        <td id=\"file-dvc-repro-code-LC3\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-repro-code-L4\" class=\"blob-num js-line-number\" data-line-number=\"4\"></td>\n        <td id=\"file-dvc-repro-code-LC4\" class=\"blob-code blob-code-inner js-file-line\"># Commit all the changes.</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-repro-code-L5\" class=\"blob-num js-line-number\" data-line-number=\"5\"></td>\n        <td id=\"file-dvc-repro-code-LC5\" class=\"blob-code blob-code-inner js-file-line\">$ git commit -am \"Updated weights of the models in the ensemble\"</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-repro-code-L6\" class=\"blob-num js-line-number\" data-line-number=\"6\"></td>\n        <td id=\"file-dvc-repro-code-LC6\" class=\"blob-code blob-code-inner js-file-line\">\n</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-repro-code-L7\" class=\"blob-num js-line-number\" data-line-number=\"7\"></td>\n        <td id=\"file-dvc-repro-code-LC7\" class=\"blob-code blob-code-inner js-file-line\"># Reproduce the ensemble prediction</td>\n      </tr>\n      <tr>\n        <td id=\"file-dvc-repro-code-L8\" class=\"blob-num js-line-number\" data-line-number=\"8\"></td>\n        
<td id=\"file-dvc-repro-code-LC8\" class=\"blob-code blob-code-inner js-file-line\">$ dvc repro data/submission_ensemble.csv</td>\n      </tr>\n</tbody></table>\n\n\n  </div>\n\n  </div>\n</div>\n\n      </div>\n      <div class=\"gist-meta\">\n        <a href=\"https://gist.github.com/gvyshnya/9d80e51ba3d7aa5bd37d100ed82376ee/raw/4367adacf7f6d78ad223289c52737588441fabcb/dvc%20repro%20code\" style=\"float:right\">view raw</a>\n        <a href=\"https://gist.github.com/gvyshnya/9d80e51ba3d7aa5bd37d100ed82376ee#file-dvc-repro-code\">dvc repro code</a>\n        hosted with ❤ by <a href=\"https://github.com\">GitHub</a>\n      </div>\n    </div>\n</div></body></html></p>\n<h2>Summary</h2>\n<p>In this blog post, we worked through the process of building an ensemble\nprediction pipeline using DVC. The key features of that pipeline were\nas follows:</p>\n<ul>\n<li><strong><em>reproducibility</em></strong>: everybody on the team can run it in their own environment</li>\n<li><strong><em>separation of data and code</em></strong>: this ensures that everyone always runs the\nlatest versions of the pipeline jobs with the most up-to-date ‘golden copy’ of\nthe training and testing data sets</li>\n</ul>\n<p>A helpful side effect of using DVC is that you no longer need to keep in mind what\nchanged at every step of modifying your project scripts or the pipeline\nconfiguration. Because DVC maintains the dependency graph (DAG) automatically,\nit re-runs only the steps affected by a particular\nchange within the pipeline job setup. 
This, in turn, makes it possible to\nquickly iterate through the entire ML pipeline.</p>\n<blockquote>\n<p>Because DVC brings proven engineering practices to often suboptimal and messy ML\nprocesses and helps a typical Data Science project team eliminate a\nbig chunk of common\n<a href=\"https://blog.dataversioncontrol.com/data-version-control-in-analytics-devops-paradigm-35a880e99133\">DevOps overheads</a>,\nI have found it extremely useful on industrial data science and\npredictive analytics projects.</p>\n</blockquote>\n<h2>Further Reading</h2>\n<ol>\n<li><a href=\"https://en.wikipedia.org/wiki/Ensemble_learning\">Ensemble Learning and Prediction Introduction</a></li>\n<li><a href=\"https://blog.dataversioncontrol.com/data-version-control-beta-release-iterative-machine-learning-a7faf7c8be67\">Using DVC in Machine Learning projects in Python</a></li>\n<li><a href=\"https://blog.dataversioncontrol.com/r-code-and-reproducible-model-development-with-dvc-1507a0e3687b\">Using DVC in Machine Learning projects in R</a></li>\n<li><a href=\"https://mlwave.com/kaggle-ensembling-guide/\">Kaggle Ensembling Guide</a></li>\n</ol>","timeToRead":12,"fields":{"slug":"/ml-model-ensembling-with-fast-iterations"},"frontmatter":{"title":"ML Model Ensembling with Fast Iterations","date":"August 23, 2017","description":"Here we'll talk about tools that help tackle common technical challenges of\nbuilding pipelines for ensemble learning.\n","descriptionLong":"In many real-world Machine Learning projects, there is a need to ensemble\ncomplex models as well as maintain pipelines. 
As we will demonstrate, DVC is a\ngood tool that helps tackling common technical challenges of building\npipelines for the ensemble learning.\n","tags":["Best Practices","DVC","Model Ensembling","R"],"commentsUrl":"https://discuss.dvc.org/t/ml-model-ensembling-with-fast-iterations/296","author":{"childMarkdownRemark":{"frontmatter":{"name":"George Vyshnya","avatar":{"childImageSharp":{"fixed":{"base64":"data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAUABQDASIAAhEBAxEB/8QAGQABAAMBAQAAAAAAAAAAAAAAAAIEBQMG/8QAFwEAAwEAAAAAAAAAAAAAAAAAAQIDAP/aAAwDAQACEAMQAAABjC9VJ6sc87GoTp50Zf/EABoQAAIDAQEAAAAAAAAAAAAAAAECAAMREhP/2gAIAQEAAQUCYPpFlZCkznFsI8+zO9tcZTs//8QAGREAAgMBAAAAAAAAAAAAAAAAAAECEBJB/9oACAEDAQE/AVBZMnK//8QAGhEAAgIDAAAAAAAAAAAAAAAAAAECEBESQf/aAAgBAgEBPwFzeTdna//EAB0QAAICAgMBAAAAAAAAAAAAAAABAhESIRAxQXH/2gAIAQEABj8CUYd1Ys3aZpWZdfCT2abIQpJZeE37XH//xAAaEAEBAQEBAQEAAAAAAAAAAAABEQBhQSEx/9oACAEBAAE/IWkCqc0EET55i645mSq94yEldwZBnHXOzwji/RDlw5v/2gAMAwEAAgADAAAAEKAoQ//EABcRAQEBAQAAAAAAAAAAAAAAAAABESH/2gAIAQMBAT8QuMpemP/EABcRAAMBAAAAAAAAAAAAAAAAAAABESH/2gAIAQIBAT8Q16O2DFZ//8QAHBABAQADAAMBAAAAAAAAAAAAAREAITFBUXGx/9oACAEBAAE/ELAwC8WBvziOXvYnEgcxZl9ZsjGNVeg/cVTAJDHyT5heJcYLgaB20Sy1uXG0xWawIKa1wz//2Q==","width":40,"height":40,"src":"/static/226395429650b032ac92f5ecf1410e9b/d83e5/george_vyshnya.jpg","srcSet":"/static/226395429650b032ac92f5ecf1410e9b/d83e5/george_vyshnya.jpg 1x,\n/static/226395429650b032ac92f5ecf1410e9b/58860/george_vyshnya.jpg 1.5x,\n/static/226395429650b032ac92f5ecf1410e9b/90ac5/george_vyshnya.jpg 2x","srcWebp":"/static/226395429650b032ac92f5ecf1410e9b/e145b/george_vyshnya.webp","srcSetWebp":"/static/226395429650b032ac92f5ecf1410e9b/e145b/george_vyshnya.webp 1x,\n/static/226395429650b032ac92f5ecf1410e9b/0d42c/george_vyshnya.webp 1.5x,\n/static/226395429650b032ac92f5ecf1410e9b/f46db/george_vyshnya.webp 
2x"}}}}}},"picture":{"childImageSharp":{"fluid":{"base64":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAANCAIAAAAmMtkJAAAACXBIWXMAAAsSAAALEgHS3X78AAACzElEQVQoz2NgAAMuLi4xUVEgEhcXFxYWlpCQ4Obm4efnB4oLCPDz8fEJCwmLiIiIiooCSXZ2doguBiYmJiCpoaHh7+sXGRERGhwQFhqWk5Xl7uZsY21laKjr5+vv6+0VHhYSERbu6+MTFRkJVAzUwsjICDeCmRUE2Ll45FlZmDnYZfn4VYBcHh4TNlZudg4BDk4xFhYmFjBgZmZmwATsrDzsHApAtzAxC7KySQF9w8tnwcrCzc4uwcUlA3EjCmBkANnOziWoaehnapvMxSMrp+bOL6LJKwp0G6OIrL2wjK2YnJ2YtKWAoBLEpQgHM4A1c3CJiEiayqiHKBvlq5okyhvlajtUswp4SmvHKxslqZimKepFSiqHsnCowLWAfcvIxM8rLyNvJ6MZo2TRrOXQpqwboufSoGxdKmoxQ0wnTd00BahTQTtE365aRrdQRMpFXFyHiZEVElR8yqqOmnq+1g5p5taJVs7phtZp7kF1ciZZtp61Jr6tOrZ5hhZJemYJuoZBytq+IvKOalruAgJKDAzMDMx8Nmz8hly8ejw8GqISJkISVspavspaoZL+KzTcq9Wj14nqpMiqeIrL2otIWwvJ+3LJhTAwybHwmDAwKjKwSkezm3Rz2s/hdZ7PYzNZxG+doFmTlMNE28IzZvGTbQv2qwcuEnOeKuy6iNd2hpjnOk6HhSllC2fNXe8VkM/AIp3BarOYOfwMc9Fnltb/rF1/uOq/6fe8jypaVtS0Jad6tWHVGe4p/5k7/rCV/2CIfGhdee30pWf+SfOPHr0AdHYcs3ofs91GpvCLjHmvmBp+ABFH+3eRqf/lCx8o1n7k6frD0PiDseYnU9ozBu+TEh6bFq05P2/ZieqamQxMLL7M0nXMenOZXXcyJd5lrPvHVPiO33cxU8sbhrLPDAW/mCJOM4cfYcz/zhRymcVhG4PSLPeI+RvX7VJS8gEA2lypI2fPxLQAAAAASUVORK5CYII=","aspectRatio":1.5023474178403755,"src":"/static/3b64f2fc12bafbf137cc1d329f611ec5/286b3/post-image.png","srcSet":"/static/3b64f2fc12bafbf137cc1d329f611ec5/1f44b/post-image.png 213w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/3e433/post-image.png 425w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/286b3/post-image.png 850w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/9a739/post-image.png 1275w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/c47cc/post-image.png 1700w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/3897f/post-image.png 1920w","srcWebp":"/static/3b64f2fc12bafbf137cc1d329f611ec5/5c1d9/post-image.webp","srcSetWebp":"/static/3b64f2fc12bafbf137cc1d329f611ec5/99b2d/post-image.webp 213w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/23220/post-image.webp 425w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/5c1d9/post-image.webp 
850w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/5e720/post-image.webp 1275w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/35cfd/post-image.webp 1700w,\n/static/3b64f2fc12bafbf137cc1d329f611ec5/25f09/post-image.webp 1920w","sizes":"(max-width: 850px) 100vw, 850px","presentationWidth":850}}},"pictureComment":null}}},"pageContext":{"next":{"fields":{"slug":"/best-practices-of-orchestrating-python-and-r-code-in-ml-projects"},"frontmatter":{"title":"Best practices of orchestrating Python and R code in ML projects"}},"previous":{"fields":{"slug":"/data-version-control-in-analytics-devops-paradigm"},"frontmatter":{"title":"Data Version Control in Analytics DevOps Paradigm"}},"currentPage":17,"slug":"/ml-model-ensembling-with-fast-iterations"}}}