{"componentChunkName":"component---src-templates-blog-post-tsx","path":"/january-20-community-gems","result":{"data":{"markdownRemark":{"id":"098433f5-e16b-51a9-9c71-900b725126a4","excerpt":"<h2>Discord gems</h2>\n<p>There’s a lot of action in our Discord channel these days. Ruslan, DVC’s core\nmaintainer, said it best with a gif.</p>\n<p><html><head></head><body><blockquote class=\"twitter-tweet\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">How it feels when <a href=\"https://twitter.com/DVCorg\">@DVCorg</a> team is handling multiple conversations on Discord at the same time. <a href=\"https://t.co/QrLusdWYml\">https://t.co/QrLusdWYml</a></p>— 🦉 Ruslan Kuprieiev (@rkuprieiev) <a href=\"https://twitter.com/rkuprieiev/status/1144008869414342658\">June 26, 2019</a></blockquote></body></html></p>\n<p>It’s a lot…</p>","html":"<h2>Discord gems</h2>\n<p>There’s a lot of action in our Discord channel these days. Ruslan, DVC’s core\nmaintainer, said it best with a gif.</p>\n<p><html><head></head><body><blockquote class=\"twitter-tweet\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">How it feels when <a href=\"https://twitter.com/DVCorg\">@DVCorg</a> team is handling multiple conversations on Discord at the same time. <a href=\"https://t.co/QrLusdWYml\">https://t.co/QrLusdWYml</a></p>— 🦉 Ruslan Kuprieiev (@rkuprieiev) <a href=\"https://twitter.com/rkuprieiev/status/1144008869414342658\">June 26, 2019</a></blockquote></body></html></p>\n<p>It’s a lot to keep up with, so here are some highlights. 
We think these are\nuseful, good-to-know, and interesting conversations between DVC developers and\nusers.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/563406153334128681/657590900754612284\">What pros does DVC have compared to Git LFS?</a></h3>\n<p>For an in-depth answer, check out this\n<a href=\"https://stackoverflow.com/questions/58541260/difference-between-git-lfs-and-dvc\">Stack Overflow discussion</a>.\nBut in brief, with DVC you don’t need a special server, and you can use nearly\nany kind of storage (S3, Google Cloud Storage, Azure Blobs, your own server,\netc.) without a fuss. There are also no limits on the size of the data that you\ncan store, unlike with GitHub. With Git LFS, there are some general LFS server\nlimits, too. DVC has additional features for sharing your data (e.g.,\n<html><head></head><body><code class=\"language-text\">dvc import</code></body></html>) and has pipeline support, so it does much more than LFS. Plus, we\nhave flexible and quick checkouts, as we utilize different link types (reflinks,\nsymlinks, and hardlinks). We think there are lots of advantages; of course, the\nusefulness will depend on your particular needs.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/563406153334128681/656016145119182849\">How do I use DVC with SSH remote storage?</a> I usually connect with a .pem key file. 
How do I do the same with DVC?</h3>\n<p>DVC is built to work with the SSH protocol to access remote storage (we provide\nsome\n<a href=\"https://dvc.org/doc/user-guide/external-dependencies#ssh\">examples in our official documentation</a>).\nWhen SSH requires a key file, try this:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc remote modify</span> myremote keyfile <span class=\"token operator\">&#x3C;</span>path to *.pem<span class=\"token operator\">></span></span></code></pre></div></body></html>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/563406153334128681/651098762466426891\">If you train a TensorFlow model that creates multiple checkpoint files, how do you establish them as dependencies in the DVC pipeline?</a></h3>\n<p>You can specify a directory as a dependency/output in your DVC pipeline, and\nstore checkpointed models in that directory. It might look like this:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> <span class=\"token punctuation\">\\</span>\n     -f train.dvc <span class=\"token punctuation\">\\</span>\n     -d data <span class=\"token punctuation\">\\</span>\n     -d train.py <span class=\"token punctuation\">\\</span>\n     -o models python code/train.py</span></code></pre></div></body></html>\n<p>where <html><head></head><body><code class=\"language-text\">models</code></body></html> is a directory created for checkpoint files. If you would like to\npreserve your models in the data directory, though, then you would need to\nspecify them one by one. 
You can do this with bash:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc run</span> <span class=\"token variable\"><span class=\"token variable\">$(</span><span class=\"token keyword\">for</span> <span class=\"token for-or-select variable\">file</span> <span class=\"token keyword\">in</span> data/*.gz<span class=\"token punctuation\">;</span> <span class=\"token keyword\">do</span> <span class=\"token builtin class-name\">echo</span> -d $file<span class=\"token punctuation\">;</span> <span class=\"token keyword\">done</span><span class=\"token variable\">)</span></span></span></code></pre></div></body></html>\n<p>Be careful, though: if you declare checkpoint files to be an output of the DVC\npipeline, you won’t be able to re-run the pipeline using those checkpoint files\nto initialize weights for model training. This would introduce circularity, as\nyour output would become your input.</p>\n<p>Also keep in mind that whenever you re-run a pipeline with <html><head></head><body><code class=\"language-text\">dvc repro</code></body></html>, outputs\nare deleted and then regenerated. If you don’t wish to automatically delete\noutputs, there is a <html><head></head><body><code class=\"language-text\">--persist</code></body></html> flag (see discussion\n<a href=\"https://github.com/iterative/dvc/issues/1214\">here</a> and\n<a href=\"https://github.com/iterative/dvc/issues/1884\">here</a>), although we don’t\ncurrently provide technical support for it.</p>\n<p>Finally, remember that setting something as a dependency (<html><head></head><body><code class=\"language-text\">-d</code></body></html>) doesn’t mean it\nis automatically tracked by DVC. 
So remember to <html><head></head><body><code class=\"language-text\">dvc add</code></body></html> data files in the\nbeginning!</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/655012135973158942\">Is it possible to use the same cache directory for multiple DVC repos that are used in parallel?</a> Or do I need external software to prevent potential race conditions?</h3>\n<p>This is absolutely possible, and you don’t need any external software to safely\nuse multiple DVC repos in parallel. With DVC, cache operations are atomic. The\nonly exception is cleaning the cache with <html><head></head><body><code class=\"language-text\">dvc gc</code></body></html>, which you should only run\nwhen no one else is working on a shared project that is referenced in your cache\n(and also, be sure to use the <html><head></head><body><code class=\"language-text\">--projects</code></body></html> flag\n<a href=\"https://dvc.org/doc/command-reference/gc\">as described in our docs</a>). For more\nabout using multiple DVC repos in parallel, check out some discussions\n<a href=\"https://discuss.dvc.org/t/setup-dvc-to-work-with-shared-data-on-nas-server/180\">here</a>\nand <a href=\"https://dvc.org/doc/use-cases/shared-development-server\">here</a>.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/485596304961962003/652380507832844328\">What are some strategies for reproducibility if parts of our model training pipeline are run on our organization’s HPC?</a></h3>\n<p>Using DVC for version control is entirely compatible with using remote computing\nresources, like high-performance computing (HPC), in your model training\npipeline. We think a great example of using DVC with parallel computing is\nprovided by <a href=\"http://www.peterfogh.dk/\">Peter Fogh</a>. Take a\n<a href=\"https://github.com/PeterFogh/dvc_dask_use_case\">look at his repo</a> for a\ndetailed use case. 
Please keep us posted about how HPC works in your pipeline,\nas we’ll be eager to pass on any insights to the community.</p>\n<h3>Q: Say I have a Git repository with multiple projects inside (one classification, one object detection, etc.). <a href=\"https://discordapp.com/channels/485586884165107732/563406153334128681/646760832616890408\">Is it possible to tell DVC to just pull data for one particular project?</a></h3>\n<p>Absolutely. DVC supports pulling data from different DVC-files. An example would\nbe having two project subdirectories in your Git repo, <html><head></head><body><code class=\"language-text\">classification</code></body></html> and\n<html><head></head><body><code class=\"language-text\">detection</code></body></html>. You could use <html><head></head><body><code class=\"language-text\">dvc pull -R classification</code></body></html> to only pull files in\nthat project to your workspace.</p>\n<p>If you prefer to be even more granular, you can <html><head></head><body><code class=\"language-text\">dvc add</code></body></html> files individually.\nThen you can use <html><head></head><body><code class=\"language-text\">dvc pull &#x3C;filename>.dvc</code></body></html> to retrieve the outputs specified\nonly by that file.</p>\n<h3>Q: <a href=\"https://discordapp.com/channels/485586884165107732/563406153334128681/623234659098296348\">Is it possible to set an S3 remote without the use of AWS credentials with DVC?</a> I want to publicly host a dataset so that everybody who clones my code repo can just run <html><head></head><body><code class=\"language-text\">dvc pull</code></body></html> to fetch the dataset.</h3>\n<p>Yes, and we love the idea of publicly hosting a dataset. There are a few ways to\ndo it with DVC. 
We use one method in our own DVC project repository on GitHub.\nIf you run <html><head></head><body><code class=\"language-text\">git clone https://github.com/iterative/dvc</code></body></html> and then <html><head></head><body><code class=\"language-text\">dvc pull</code></body></html>,\nyou’ll see that DVC is downloading data from an HTTP repository, which is\nactually just an S3 repository that we’ve granted public HTTP read-access to.</p>\n<p>So you would need to configure two remotes in your config file, each pointing to\nthe same S3 bucket through different protocols. Like this:</p>\n<html><head></head><body><div class=\"gatsby-highlight\" data-language=\"dvc\"><pre class=\"language-dvc\"><code class=\"language-dvc\"><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc remote add</span> -d --local myremote s3://bucket/path\n</span><span class=\"token line\"><span class=\"token input\">$ </span><span class=\"token dvc\">dvc remote add</span> -d mypublicremote http://s3-external-1.amazonaws.com/bucket/path</span></code></pre></div></body></html>\n<p>Here’s why this works: the <html><head></head><body><code class=\"language-text\">-d</code></body></html> flag sets the default remote, and the <html><head></head><body><code class=\"language-text\">--local</code></body></html>\nflag creates a set of configuration preferences that will override the global\nsettings when DVC commands are run locally and won’t be shared through Git (you\ncan read more about this\n<a href=\"https://dvc.org/doc/command-reference/remote/add#remote-add\">in our docs</a>).</p>\n<p>This means that even though you and users from the public are accessing the\nstored dataset by different protocols (S3 and HTTP), you’ll all run the same\ncommand: <html><head></head><body><code class=\"language-text\">dvc pull</code></body></html>.</p>","timeToRead":5,"fields":{"slug":"/january-20-community-gems"},"frontmatter":{"title":"January '20 Community 
Gems","date":"January 20, 2020","description":"Great discussions and technical Q&A's from our users.\n","descriptionLong":"Every month we share news, findings, interesting reads,\ncommunity takeaways, and everything else along the way.\nSome of those are related to our brainchild DVC and its journey. The others\nare a collection of exciting stories and ideas centered around ML best\npractices and workflow.\n","tags":["Discord","DVC","Gems"],"commentsUrl":"https://discuss.dvc.org/t/january-20-community-gems/315","author":{"childMarkdownRemark":{"frontmatter":{"name":"Elle O'Brien","avatar":{"childImageSharp":{"fixed":{"base64":"data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAUABQDASIAAhEBAxEB/8QAGQABAAIDAAAAAAAAAAAAAAAAAAMFAgQG/8QAFQEBAQAAAAAAAAAAAAAAAAAAAgP/2gAMAwEAAhADEAAAAZmtNOlyjIcrZgpiEP/EABsQAQACAgMAAAAAAAAAAAAAAAIBAxIhABEz/9oACAEBAAEFArV0dVzy942N41GdSpSt8A0D/8QAFxEAAwEAAAAAAAAAAAAAAAAAAQIgIf/aAAgBAwEBPwFRkf/EABURAQEAAAAAAAAAAAAAAAAAAAEg/9oACAECAQE/AWP/xAAfEAACAQIHAAAAAAAAAAAAAAAAARACEQMSUVJhcYH/2gAIAQEABj8CS3MzUexhvQ7h3FwWTP/EAB0QAAMBAAEFAAAAAAAAAAAAAAABESFRMUFxsdH/2gAIAQEAAT8hsbi0nApKKKoujQjiDTN+15X0ovgcWhPSElmTuf/aAAwDAQACAAMAAAAQHOi9/8QAGBEAAwEBAAAAAAAAAAAAAAAAAAExEBH/2gAIAQMBAT8QRRwUz//EABcRAAMBAAAAAAAAAAAAAAAAAAEQITH/2gAIAQIBAT8QOo6v/8QAHRABAAMBAAIDAAAAAAAAAAAAAQARITFxkUFhsf/aAAgBAQABPxB89x9U6Sn6EA69v7iEs5LB0aDWr38jGAHlgPfSAs0pXqIzhQ8QFPkh4ORpypLXVhnif//Z","width":40,"height":40,"src":"/static/1614906361c7d460137741db062e0c7e/d83e5/elle_obrien.jpg","srcSet":"/static/1614906361c7d460137741db062e0c7e/d83e5/elle_obrien.jpg 1x,\n/static/1614906361c7d460137741db062e0c7e/58860/elle_obrien.jpg 1.5x,\n/static/1614906361c7d460137741db062e0c7e/90ac5/elle_obrien.jpg 
2x","srcWebp":"/static/1614906361c7d460137741db062e0c7e/e145b/elle_obrien.webp","srcSetWebp":"/static/1614906361c7d460137741db062e0c7e/e145b/elle_obrien.webp 1x,\n/static/1614906361c7d460137741db062e0c7e/0d42c/elle_obrien.webp 1.5x,\n/static/1614906361c7d460137741db062e0c7e/f46db/elle_obrien.webp 2x"}}}}}},"picture":{"childImageSharp":{"fluid":{"base64":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAAsSAAALEgHS3X78AAACT0lEQVQozzVTWU9TYRC9vwVaNk2ALtDS7XaltNCdtkBCaZrSLWkLbWkbCMSfZWLUxAf/AQkm+mZiXNAXNYp6nDPKw8l8M/Pd+c7MnGvU63VEIhH4/X4kEgns7+/DNE0EBTs7OyiXy5pjLJ/PK3hmjLlUKoVgMKixSqUCo1QqIRAIYGtrC+FwGJFoFFFBXHxaxjc3NxGPxxGLxRT07+39tyyayWRgDAYDfSkUCuHo6AhlecDn9SIoF70eD8yACZfLBbfLDZ/PB/9/BIRhUjqq1WpamExPTk5gDIdDLWiz2dBsNpEQJksLC3CtrSGfzaCYTaOUEyvYlfNuLgufPPRwaQlOux2N42Osr68jnU5jNBrB4EwY8AorwhRmi/PzqAnbD1/vcH0r+PQTN5/vcHP7A+Li8uIccxYL1hwOZU/mtLlcDka1WsX29rY62WwWMVnQnNWKZq2K6y/A03fA47d/8Fztb7z6Djy6vIB1dla74LckRctaxnQ61ZadTie63S7SMguLXG7Xa3j9DXj2EXh5Czx5D7yQ85tfUvDqCpaZGW293W5rh8ViEaxltFot3ZhD6HNLEdn0g8VFmDL04emp4lSGPR78w0hmnpAl8A5bZldsl1tWhpQNNUVZMKhblFnSri4vY3VlGQ5ZmF2wQl/gcbt11p6NDZ0fZUOfujW4atKlbA4ODtDr9ZBMJlWDFVlMv9/Xx4iGqKDT6WhHzPNM2TDHGXJkxtnZGRqNhrJkYDKZ4PDwUB+gRukXCgVlwbvj8Vg/5niYo1T4p+3t7WnuL5MHxLTWXBdhAAAAAElFTkSuQmCC","aspectRatio":1.7804154302670623,"src":"/static/4c384504c720b784d8f8c817bea289d9/286b3/Community_Gems.png","srcSet":"/static/4c384504c720b784d8f8c817bea289d9/1f44b/Community_Gems.png 213w,\n/static/4c384504c720b784d8f8c817bea289d9/3e433/Community_Gems.png 425w,\n/static/4c384504c720b784d8f8c817bea289d9/286b3/Community_Gems.png 850w,\n/static/4c384504c720b784d8f8c817bea289d9/9a739/Community_Gems.png 1275w,\n/static/4c384504c720b784d8f8c817bea289d9/c47cc/Community_Gems.png 1700w,\n/static/4c384504c720b784d8f8c817bea289d9/8bc52/Community_Gems.png 2400w","srcWebp":"/static/4c384504c720b784d8f8c817bea289d9/5c1d9/Community_Gems.webp","srcSetWebp":"/static/4c384504c720b784d8f8c817bea289d9/99b2d/Community_Gems.webp 
213w,\n/static/4c384504c720b784d8f8c817bea289d9/23220/Community_Gems.webp 425w,\n/static/4c384504c720b784d8f8c817bea289d9/5c1d9/Community_Gems.webp 850w,\n/static/4c384504c720b784d8f8c817bea289d9/5e720/Community_Gems.webp 1275w,\n/static/4c384504c720b784d8f8c817bea289d9/35cfd/Community_Gems.webp 1700w,\n/static/4c384504c720b784d8f8c817bea289d9/f1e40/Community_Gems.webp 2400w","sizes":"(max-width: 850px) 100vw, 850px","presentationWidth":850}}},"pictureComment":null}}},"pageContext":{"next":{"fields":{"slug":"/gsoc-ideas-2020"},"frontmatter":{"title":"Join DVC for Google Summer of Code 2020"}},"previous":{"fields":{"slug":"/january-20-dvc-heartbeat"},"frontmatter":{"title":"January '20 DVC❤️Heartbeat"}},"currentPage":3,"slug":"/january-20-community-gems"}}}