{"componentChunkName":"component---src-templates-blog-post-tsx","path":"/march-19-dvc-heartbeat","result":{"data":{"markdownRemark":{"id":"c739ad83-e7c1-56d2-942f-e330105e502b","excerpt":"<p>This is the very first issue of the DVC❤️Heartbeat. Every month we will be\nsharing our news, findings, interesting reads, community…</p>","html":"<p>This is the very first issue of the DVC❤️Heartbeat. Every month we will be\nsharing our news, findings, interesting reads, community takeaways, and\neverything along the way.</p>\n<p>Some of those are related to our brainchild <a href=\"https://dvc.org\">DVC</a> and its\njourney. The others are a collection of exciting stories and ideas centered\naround ML best practices and workflow.</p>\n<h2>News and links</h2>\n<p>We read a ton of articles and posts every day and here are a few that caught our\neye. Well-written, offering a different perspective and definitely worth\nchecking.</p>\n<ul>\n<li><strong><a href=\"https://veekaybee.github.io/2019/02/13/data-science-is-different/\">Data science is different now</a>\nby <a href=\"https://veekaybee.github.io/\">Vicki Boykis</a></strong></li>\n</ul>\n<p><html><head></head><body><html><head></head><body><section class=\"elp-content-holder\">\n      <a href=\"https://veekaybee.github.io/2019/02/13/data-science-is-different/\" class=\"external-link-preview\">\n          <div class=\"elp-description-holder\">\n            <h4 class=\"elp-title\">Data science is different now</h4>\n            <div class=\"elp-description\">Woman holding a balance, Vermeer 1664 What do you think of when you read the phrase 'data science'? 
It's probably some…</div>\n            <div class=\"elp-link\">veekaybee.github.io</div>\n          </div>\n           <div class=\"elp-image-holder\">\n                <img src=\"/uploads/images/2019-03-05/data-science-is-different-now.png\" alt=\"Data science is different now\">\n            </div>\n      </a>\n    </section>\n    </body></html></body></html></p>\n<blockquote>\n<p>What is becoming clear is that, in the late stage of the hype cycle, data\nscience is asymptotically moving closer to engineering, and the\n<a href=\"https://www.youtube.com/watch?v=frQeK8xo9Ls\">skills that data scientists need</a>\nmoving forward are less visualization and statistics-based, and\n<a href=\"https://tech.trivago.com/2018/12/03/teardown-rebuild-migrating-from-hive-to-pyspark/\">more in line with traditional computer science curricula</a>.</p>\n</blockquote>\n<ul>\n<li><strong><a href=\"https://emilygorcenski.com/post/data-versioning/\">Data Versioning</a> by\n<a href=\"https://emilygorcenski.com/\">Emily F. Gorcenski</a></strong></li>\n</ul>\n<p><html><head></head><body><html><head></head><body><section class=\"elp-content-holder\">\n      <a href=\"https://emilygorcenski.com/post/data-versioning/\" class=\"external-link-preview\">\n          <div class=\"elp-description-holder\">\n            <h4 class=\"elp-title\">Data Versioning</h4>\n            <div class=\"elp-description\">Productionizing machine learning/AI/data science is a challenge. Not only are the outputs of machine-learning…</div>\n            <div class=\"elp-link\">emilygorcenski.com</div>\n          </div>\n           <div class=\"elp-image-holder\">\n                <img src=\"/uploads/images/2019-03-05/data-versioning.jpeg\" alt=\"Data Versioning\">\n            </div>\n      </a>\n    </section>\n    </body></html></body></html></p>\n<blockquote>\n<p>I want to explore how the degrees of freedom in versioning machine learning\nsystems poses a unique challenge. 
I’ll identify four key axes on which machine\nlearning systems have a notion of version, along with some brief\nrecommendations for how to simplify this a bit.</p>\n</blockquote>\n<ul>\n<li><strong><a href=\"https://blog.mi.hdm-stuttgart.de/index.php/2019/02/26/reproducibility-in-ml/\">Reproducibility in Machine Learning</a>\nby <a href=\"https://blog.mi.hdm-stuttgart.de/index.php/author/pf023/\">Pascal Fecht</a></strong></li>\n</ul>\n<p><html><head></head><body><html><head></head><body><section class=\"elp-content-holder\">\n      <a href=\"https://blog.mi.hdm-stuttgart.de/index.php/2019/02/26/reproducibility-in-ml/\" class=\"external-link-preview\">\n          <div class=\"elp-description-holder\">\n            <h4 class=\"elp-title\">Reproducibility in Machine Learning | Computer Science Blog</h4>\n            <div class=\"elp-description\">The rise of Machine Learning has led to changes across all areas of computer science. From a very abstract point of…</div>\n            <div class=\"elp-link\">blog.mi.hdm-stuttgart.de</div>\n          </div>\n           <div class=\"elp-image-holder\">\n                <img src=\"/uploads/images/2019-03-05/reproducibility-in-machine-learning.jpeg\" alt=\"Reproducibility in Machine Learning | Computer Science Blog\">\n            </div>\n      </a>\n    </section>\n    </body></html></body></html></p>\n<blockquote>\n<p>…the objective of this post is not to philosophize about the dangers and\ndark sides of AI. In fact, this post aims to work out common challenges in\nreproducibility for machine learning and shows programming differences to\nother areas of Computer Science. Secondly, we will see practices and workflows\nto create a higher grade of reproducibility in machine learning algorithms.</p>\n</blockquote>\n<html><head></head><body><hr></body></html>\n<h2>Discord gems</h2>\n<p>There are lots of hidden gems in our Discord community discussions. 
Sometimes\nthey are scattered all over the channels and hard to track down.</p>\n<p>We will be sifting through the issues and discussions and sharing the most\ninteresting takeaways.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/485586884165107734/541622187296161816\">Edit and define DVC files manually, in a Makefile style</a></h3>\n<p>There is no separate guide for that, but it is very straightforward. See the\n<a href=\"https://dvc.org/doc/user-guide/dvc-file-format\">DVC file format</a> description\nfor what a DVC-file looks like inside in general. All <html><head></head><body><code class=\"language-text\">dvc add</code></body></html> or <html><head></head><body><code class=\"language-text\">dvc run</code></body></html> does is\ncompute the <html><head></head><body><code class=\"language-text\">md5</code></body></html> fields in it, that is all. You could write your DVC-file by hand\nand then run <html><head></head><body><code class=\"language-text\">dvc repro</code></body></html>, which will run the command (if any) and compute all needed\nchecksums; <a href=\"https://discordapp.com/channels/485586884165107732/485586884165107734/541622187296161816\">read more</a>.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/485586884165107734/547424240677158915\">Best practices to define the code dependencies</a></h3>\n<p>There’s a ton of code in that project, and it’s very non-trivial to define the\ncode dependencies for my training stage: there are a lot of imports going on,\nthe training code is distributed across many modules,\n<a href=\"https://discordapp.com/channels/485586884165107732/485586884165107734/547424240677158915\">read more</a></p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/485586884165107734/548495589428428801\">Azure data lake support</a></h3>\n<p>DVC officially only supports regular Azure blob storage. 
Gen1 Data Lake should\nbe accessible by the same interface, so configuring a regular Azure remote for\nDVC should work. It seems that Gen2 Data Lake\n<a href=\"https://discordapp.com/channels/485586884165107732/485586884165107734/550546413197590539\">has disabled</a>\nthe blob API. If you know more details about the difference between Gen1 and Gen2,\nfeel free to join <a href=\"https://dvc.org/chat\">our community</a> and share this\nknowledge.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/542390986299539459\">What licence is DVC released under</a></h3>\n<p>Apache 2.0. One of the <a href=\"https://opensource.org/licenses\">most common</a> and\npermissive OSS licences.</p>\n<h3>Q: Setting up an S3-compatible remote</h3>\n<p>(<a href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/543445798868746278\">LocalStack</a>,\n<a href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/541466951474479115\">Wasabi</a>)</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc remote add</span> upstream s3://my-bucket\n</span><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc remote modify</span> upstream region REGION_NAME\n</span><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc remote modify</span> upstream endpointurl <span class=\"token operator\">&#x3C;</span>url<span class=\"token operator\">></span></span></code></pre></div></body></html>\n<p>For details, find and click <html><head></head><body><code class=\"language-text\">S3 API compatible storage</code></body></html> on\n<a href=\"https://dvc.org/doc/commands-reference/remote-add\">this page</a>.</p>\n<h3>Q: <a 
href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/543914550173368332\">Why does DVC create and update the <html><head></head><body><code class=\"language-text\">.gitignore</code></body></html> file?</a></h3>\n<p>It adds the data files that are tracked by DVC there, so that you don’t\naccidentally add them to Git as well. You can open it with a file editor of your\nliking and see your data files listed there.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/545562334983356426\">Managing data and pipelines with DVC on HDFS</a></h3>\n<p>With DVC, you could connect your data sources from HDFS with your pipeline in\nyour local project by simply specifying them as external dependencies. For\nexample, let’s say your script <html><head></head><body><code class=\"language-text\">process.cmd</code></body></html> works on an input file on HDFS and\nthen downloads a result to your local workspace. With DVC it could look\nsomething like:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> -d hdfs://example.com/home/shared/input <span class=\"token punctuation\">\\</span>\n          -d process.cmd <span class=\"token punctuation\">\\</span>\n          -o output process.cmd</span></code></pre></div></body></html>\n<p><a href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/545562334983356426\">read more</a>.</p>\n<html><head></head><body><hr></body></html>\n<p>If you have any questions, concerns or ideas, let us know\n<a href=\"https://dvc.org/support\">here</a> and our stellar team will get back to you in no\ntime.</p>","timeToRead":5,"fields":{"slug":"/march-19-dvc-heartbeat"},"frontmatter":{"title":"March ’19 DVC❤️Heartbeat","date":"March 05, 2019","description":"The very first issue of the DVC 
Heartbeat! News, links, Discord discussions\nfrom the community.\n","descriptionLong":"Every month we are sharing here our news, findings, interesting reads,\ncommunity takeaways, and everything along the way.\nSome of those are related to our brainchild DVC and its journey. The others\nare a collection of exciting stories and ideas centered around ML best\npractices and workflow.\n","tags":["Heartbeat","Discord Gems","DVC"],"commentsUrl":"https://discuss.dvc.org/t/march-19-dvc-heartbeat/293","author":{"childMarkdownRemark":{"frontmatter":{"name":"Svetlana Grinchenko","avatar":{"childImageSharp":{"fixed":{"base64":"data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAUABQDASIAAhEBAxEB/8QAGAABAAMBAAAAAAAAAAAAAAAAAAIDBQT/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAAHRz7x3qhnQCoH/xAAaEAACAwEBAAAAAAAAAAAAAAABAgMEMxIU/9oACAEBAAEFArBJZ0aqyN0s+trCvhYchTKzj0SRD//EABQRAQAAAAAAAAAAAAAAAAAAACD/2gAIAQMBAT8BH//EABQRAQAAAAAAAAAAAAAAAAAAACD/2gAIAQIBAT8BH//EAB0QAAICAgMBAAAAAAAAAAAAAAECABExQQMSIVH/2gAIAQEABj8CXiVuvbcDK9jYMDDcRyLBFTHpIixRflQjAHyBVwJ//8QAGxABAAMAAwEAAAAAAAAAAAAAAQARQSExUXH/2gAIAQEAAT8ht1LLoIkF4B6wLgOVH8uRVjiTT5AoqBr2L1odagwijiyf/9oADAMBAAIAAwAAABBjDwD/xAAUEQEAAAAAAAAAAAAAAAAAAAAg/9oACAEDAQE/EB//xAAUEQEAAAAAAAAAAAAAAAAAAAAg/9oACAECAQE/EB//xAAeEAACAwADAAMAAAAAAAAAAAABEQAhMUFRYXGBkf/aAAgBAQABPxCmAzsDr5hIjqwzzzqjPahi5r8IQVrj2MCG47GtJ+o5KCJ7t+zbocIBE8rYP/mqRYTv5EF8QZIHU//Z","width":40,"height":40,"src":"/static/fcc8502faa36f9a989fa0651c3c21653/d83e5/svetlana_grinchenko.jpg","srcSet":"/static/fcc8502faa36f9a989fa0651c3c21653/d83e5/svetlana_grinchenko.jpg 1x,\n/static/fcc8502faa36f9a989fa0651c3c21653/58860/svetlana_grinchenko.jpg 1.5x,\n/static/fcc8502faa36f9a989fa0651c3c21653/90ac5/svetlana_grinchenko.jpg 
2x","srcWebp":"/static/fcc8502faa36f9a989fa0651c3c21653/e145b/svetlana_grinchenko.webp","srcSetWebp":"/static/fcc8502faa36f9a989fa0651c3c21653/e145b/svetlana_grinchenko.webp 1x,\n/static/fcc8502faa36f9a989fa0651c3c21653/0d42c/svetlana_grinchenko.webp 1.5x,\n/static/fcc8502faa36f9a989fa0651c3c21653/f46db/svetlana_grinchenko.webp 2x"}}}}}},"picture":{"childImageSharp":{"fluid":{"base64":"data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAANABQDASIAAhEBAxEB/8QAGAAAAwEBAAAAAAAAAAAAAAAAAAQFAQP/xAAUAQEAAAAAAAAAAAAAAAAAAAAC/9oADAMBAAIQAxAAAAFjlrYcspCH/8QAGxABAAICAwAAAAAAAAAAAAAAAgEDACEREiL/2gAIAQEAAQUCuHY8I5ZPo2zhWpW//8QAFREBAQAAAAAAAAAAAAAAAAAAEDH/2gAIAQMBAT8Bh//EABURAQEAAAAAAAAAAAAAAAAAABAx/9oACAECAQE/Aaf/xAAdEAABAgcAAAAAAAAAAAAAAAAAAQIRICExMlGR/9oACAEBAAY/AoZO2JTklj//xAAaEAADAAMBAAAAAAAAAAAAAAAAAREhMWFR/9oACAEBAAE/IYrPoxRSkvYpl6R4UNupUVtZ/9oADAMBAAIAAwAAABD3L//EABYRAQEBAAAAAAAAAAAAAAAAAAERAP/aAAgBAwEBPxCuJW7/xAAXEQEBAQEAAAAAAAAAAAAAAAABABEh/9oACAECAQE/EMFHC//EAB0QAQACAgIDAAAAAAAAAAAAAAEAESExQVFhgeH/2gAIAQEAAT8QBxFHiPRKImtBTezG+44SJRZVZc6i0APLmHZmFpiDoC7X5P/Z","aspectRatio":1.500280112044818,"src":"/static/70b24f4f954576937ebc3bc84e01679b/6fdf8/post-image.jpg","srcSet":"/static/70b24f4f954576937ebc3bc84e01679b/9fc73/post-image.jpg 213w,\n/static/70b24f4f954576937ebc3bc84e01679b/ee221/post-image.jpg 425w,\n/static/70b24f4f954576937ebc3bc84e01679b/6fdf8/post-image.jpg 850w,\n/static/70b24f4f954576937ebc3bc84e01679b/88a70/post-image.jpg 1275w,\n/static/70b24f4f954576937ebc3bc84e01679b/15ae8/post-image.jpg 1700w,\n/static/70b24f4f954576937ebc3bc84e01679b/44ebd/post-image.jpg 2678w","srcWebp":"/static/70b24f4f954576937ebc3bc84e01679b/5c1d9/post-image.webp","srcSetWebp":"/static/70b24f4f954576937ebc3bc84e01679b/99b2d/post-image.webp 213w,\n/static/70b24f4f954576937ebc3bc84e01679b/23220/post-image.webp 
425w,\n/static/70b24f4f954576937ebc3bc84e01679b/5c1d9/post-image.webp 850w,\n/static/70b24f4f954576937ebc3bc84e01679b/5e720/post-image.webp 1275w,\n/static/70b24f4f954576937ebc3bc84e01679b/35cfd/post-image.webp 1700w,\n/static/70b24f4f954576937ebc3bc84e01679b/d664d/post-image.webp 2678w","sizes":"(max-width: 850px) 100vw, 850px","presentationWidth":850}}},"pictureComment":null}}},"pageContext":{"next":{"fields":{"slug":"/april-19-dvc-heartbeat"},"frontmatter":{"title":"April ’19 DVC❤️Heartbeat"}},"previous":{"fields":{"slug":"/ml-best-practices-in-pytorch-dev-conf-2018"},"frontmatter":{"title":"ML best practices in PyTorch dev conf 2018"}},"currentPage":14,"slug":"/march-19-dvc-heartbeat"}}}