JuliaCon 2018

How JuliaDB works

JuliaDB is a pure Julia analytical database. It makes loading large datasets and playing with them easy and fast. To do that, JuliaDB needs to support a number of features: relational database operations, fast parsing of text files, parallel computing, and data storage and compression. This talk is a bottom-up look at the construction of JuliaDB. We will talk about the scope and implementation of the underlying building-block packages, namely IndexedTables, TextParse, Dagger, OnlineStats and PooledArrays. This talk should help beginners get started using JuliaDB, as well as hacking on it!

Why we built JuliaDB
Brief introduction to IndexedTables
Table construction
Benefits of Indexing
NDSparse view of a table
Selecting the data you need
Selection semantics
Aggregation
groupreduce vs groupby
Many aggregations at once
Use of selections in grouping
OnlineStats
Joins
Use of selection
About parallelism in JuliaDB
Representation of a distributed dataset
How parallel operations work
Demo of parallel speedup on a 100 GB dataset
Plotting
Demo of partitionplot on a big dataset
Demo of PlugAndPlot.jl and Sputnik.jl working with JuliaDB
Some useful trivia
Representation of String arrays
Representation of PooledArrays
The role of sortperm
Using the full power of Julia
User-defined types and functions
Performance comparisons

Speaker's bio

I’m a Julia programmer at Julia Computing.