Capillaries 1.1.12: distributed batch processing with Go

kleineshertz · November 24, 2023, 7:09pm

Capillaries is a distributed batch data processing framework. It uses Cassandra as a data store and RabbitMQ for task scheduling, implements basic relational algebra, and allows very flexible row-based data transforms.

It is somewhat similar to Apache Spark, but there are some key features that make it stand out:

- written in Go; uses Go expressions for simple one-liner data transforms and Python for complex calculations;
- declarative, formalized data transform process configuration, with an explicit DAG as part of the configuration;
- each transform produces a persistent Cassandra table that is used as a source for further transforms, or for troubleshooting;
- the focus is on delivering production-quality data, so intermediate results can be signed off or re-calculated by an operator if needed.
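
To make the "explicit DAG" point a bit more concrete, here is a minimal sketch in plain Go of what declaring transforms with their dependencies, and deriving a valid run order from them, can look like. This is not the Capillaries configuration format or API: the `node` type, the `topoOrder` function, and the table names are all hypothetical and exist only for illustration. It uses a standard topological sort (Kahn's algorithm).

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical transform step (not a Capillaries type): it reads
// the tables produced by the nodes listed in dependsOn and writes one table
// named after itself.
type node struct {
	name      string
	dependsOn []string
}

// topoOrder returns node names so that every node appears after all of its
// dependencies (Kahn's algorithm). It fails on duplicate names, unknown
// dependencies, and cycles.
func topoOrder(nodes []node) ([]string, error) {
	indegree := make(map[string]int, len(nodes))
	dependents := make(map[string][]string)
	for _, n := range nodes {
		if _, dup := indegree[n.name]; dup {
			return nil, fmt.Errorf("duplicate node %q", n.name)
		}
		indegree[n.name] = 0
	}
	for _, n := range nodes {
		for _, dep := range n.dependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("node %q depends on unknown node %q", n.name, dep)
			}
			indegree[n.name]++
			dependents[dep] = append(dependents[dep], n.name)
		}
	}

	// Nodes with no unmet dependencies can run; sorting keeps the order stable.
	var ready []string
	for name, d := range indegree {
		if d == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready)

	order := make([]string, 0, len(nodes))
	for len(ready) > 0 {
		cur := ready[0]
		ready = ready[1:]
		order = append(order, cur)
		for _, next := range dependents[cur] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
		sort.Strings(ready)
	}
	if len(order) != len(nodes) {
		return nil, fmt.Errorf("cycle detected in DAG definition")
	}
	return order, nil
}

func main() {
	// Hypothetical pipeline: two readers, a join, and an aggregation.
	nodes := []node{
		{name: "totals_by_customer", dependsOn: []string{"join_orders_items"}},
		{name: "join_orders_items", dependsOn: []string{"read_orders", "read_items"}},
		{name: "read_orders"},
		{name: "read_items"},
	}
	order, err := topoOrder(nodes)
	if err != nil {
		fmt.Println("invalid DAG:", err)
		return
	}
	for i, name := range order {
		fmt.Printf("%d. %s\n", i+1, name)
	}
}
```

Because every step is named and its inputs are declared up front, a broken definition (a cycle or a missing dependency) can be rejected before any data is processed. Each intermediate result also has a stable identity that an operator could inspect or re-run.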

The GitHub readme has instructions on how to run a 100% Docker-based demo and see Capillaries in action within minutes.

Version 1.1.12 adds minor improvements that make it possible to obtain tangible performance results and cost estimates for a ~800-core AWS-based test environment: Capillaries: ARK portfolio performance calculation at scale.
