Julia Computing
We at Julia Computing are working on a scheduler for running large DAGs (directed acyclic graphs) on many nodes in parallel. Though the concepts it is being built on are generic, it currently leverages on the existing Julia package Dagger (for representing computation DAGs) and is intended to be used with JuliaRun (to deploy Julia applications in production at scale). Based on experiences in running large scale DAG computations with JuliaDB, this talk will present problems we encountered and our approach to tackle them. In particular: - minimizing data transfer across process and node boundaries - inter-process data transfer without blocking computation - avoiding single point scheduler bottleneck - graceful failure handling