Videos from JuliaCon are now available online
2018
Previous editions: 2017 | 2016 | 2015 | 2014
Josh Day



Scalable Data Science with JuliaDB and OnlineStats

JuliaDB is a distributed database for high performance analytics that allows you to load large multi-file datasets across multiple processes, index the data for fast queries, and save tables to disk for quick reloading. JuliaDB is 100% Julia, so user-defined functions are compiled and you can efficiently store any data type. OnlineStats is a Julia package that provides online algorithms for statistics, machine learning, and big data visualization. Online algorithms update estimators one observation at a time in a single pass, making it unnecessary that your data fit in memory. JuliaDB interfaces with OnlineStats to provide a scalable analytical framework that does the heavy lifting for you. The same operations used on toy datasets can therefore be efficiently executed on a cluster without changing your code. This combination provides a powerful workflow for dealing with both large and small datasets. This talk will demonstrate how you can take advantage of these packages to implement scalable analytics over your own datasets.

Speaker's bio