There are large CSV files, and then there are CSV files like the ones Max Ogden (@denormalize) is familiar with through his work with academic and scientific researchers. In one case, involving a team of astronomers cataloguing every star they could see, we're talking 600 million lines of CSV and hundreds of gigabytes of data.
These sorts of massive research datasets require a specific tool to manage, and Ogden has helped build it with the Dat Project. Dat is an open source, decentralized tool for versioning and syncing changes to data across distributed sets, which makes it especially useful for research data that needs to be published and archived.
In this episode, The New Builders brings you a conversation between Max and guest host Bradley Holt (@BradleyHolt), recorded live during Offline Camp California (along with a full slate of interviews). They talk about Max's longstanding work with Apache® CouchDB™ and how Couch helped him build single-page apps "before that was a thing" (2:02), what makes Dat ideal for storing and moving research data (3:30), how a data collection project evolved into his Cat Mapper work (9:37), what to expect at v3 of CSV Conf (10:26), and why Offline First could solve the rampant problem of 404 errors in scientific data (13:03).
Watch Max's passion talk from Offline Camp, "Acquiring Grant Funding for Open Source Projects," to learn tips and tricks for obtaining grant funding for scientific and open source projects.
You can find new episodes of The New Builders on developerWorks TV and SoundCloud. Find out more about IBM Cloud Data Services at IBM.biz/forbuilders. Contact hosts Doug Flora and Jim Young on Twitter (@DSFlora, @JW_Young) or email (email@example.com, firstname.lastname@example.org). The show’s music is provided by School for Robots. Check them out at schoolforrobots.bandcamp.com!