How JuliaDB works
JuliaDB is a pure Julia analytical database. It makes loading large datasets and playing with them easy and fast. JuliaDB needs to support a number of features: relational database operations, fast parsing of text files, parallel computing, and data storage and compression.
This talk is a bottom-up look at the construction of JuliaDB. We will talk about the scope and implementation of the underlying building-block packages, namely IndexedTables, TextParse, Dagger, OnlineStats and PooledArrays. This talk should help beginners get started using JuliaDB, as well as hacking on it!
- Why we built JuliaDB
- Brief introduction to IndexedTables
- Table construction
- Benefits of Indexing
- NDSparse view of a table
- Selecting the data you need
- Selection semantics
- Aggregation
- groupreduce vs groupby
- Many aggregations at once
- Use of selections in grouping
- OnlineStats
- Joins
- Use of selection
- About parallelism in JuliaDB
- Representation of a distributed dataset
- How parallel operations work
- Demo of parallel speedup on a 100 GB dataset
- Plotting
- Demo of partitionplot on a big dataset
- Demo of PlugAndPlot.jl and Sputnik.jl working with JuliaDB
- Some useful trivia
- Representation of String arrays
- Representation of PooledArrays
- The role of sortperm
- Using the full power of Julia
- User defined types and functions
- Performance comparisons
Speaker's bio
I’m a Julia programmer at Julia Computing.