A comprehensive list of open-source big data tools, rearranged from this paper: https://cambridgeservicealliance….
Data Ingestion
- Hydrograph: capitalone/Hydrograph
- NSQ: https://github.com/nsqio/nsq
- Metamorphosis: https://github.com/killme2008/Me…
- Jafka: https://github.com/adyliu/jafka
- Disque: https://github.com/antirez/disque
- OpenMessaging: openmessaging/openmessaging-java
- VerneMQ: https://github.com/erlio/vernemq
- Cherami-Server-Client: https://github.com/uber/cherami-…
- Machinery: https://github.com/RichardKnop/m…
- Suro: Netflix/suro
- Logstash: elastic/logstash
- Apache Chukwa: http://chukwa.apache.org/
- Apache Flume: Apache Flume
- Apache Gobblin: Apache Gobblin
- Apache Kafka: http://kafka.apache.org/
- Apache NiFi: Apache NiFi
- Apache Pulsar: http://pulsar.incubator.apache.org
- Apache RocketMQ: http://rocketmq.incubator.apache…
- Apache Sqoop: http://sqoop.apache.org/
Data Pre-processing
- OpenRefine: OpenRefine/OpenRefine
- DataCleaner: datacleaner/DataCleaner
- Talend Open Studio: Talend/tbd-studio-se
- WhereHows: linkedin/WhereHows
- StreamSets Data Collector: streamsets/datacollector
- CKAN: ckan/ckan
- Boom Filters: https://github.com/tylertreat/Bo…
- Apache AsterixDB: Apache AsterixDB
- Apache Avro: Apache Avro
- Apache CarbonData: CarbonData
- Apache Griffin: Apache Griffin
Storage
- ClickHouse: yandex/ClickHouse
- IndexR: shunfei/indexr
- Smart Storage Management: Intel-bigdata/SSM
- GridDB: https://github.com/griddb/griddb…
- Druid: druid-io/druid
- Redis: antirez/redis
- TiDB: pingcap/tidb
- Titan: thinkaurelius/titan
- OpenTSDB: OpenTSDB/opentsdb
- TiKV: pingcap/tikv
- Crate: crate/crate
- RQLite: rqlite/rqlite
- ActorDB: biokoda/actordb
- JanusGraph: JanusGraph/janusgraph
- AtlasDB: palantir/atlasdb
- CurioDB: stephenmcd/curiodb
- Ceres: graphite-project/ceres
- RethinkDB: rethinkdb/rethinkdb
- Tera: baidu/tera
- Scylla: scylladb/scylla
- Dgraph: dgraph-io/dgraph
- Bolt: boltdb/bolt
- BuntDB: https://github.com/tidwall/buntdb
- Voldemort: voldemort/voldemort
- SummitDB: https://github.com/tidwall/summitdb
- Riak: https://github.com/basho/riak_kv
- H-Store: apavlo/h-store
- ElephantDB: nathanmarz/elephantdb
- Apache Accumulo: Apache Accumulo
- Apache Cassandra: Apache Cassandra
- Apache CouchDB: http://couchdb.apache.org/
- Apache Gora: Apache Gora™
- Apache HBase: Apache HBase
- Apache ORC: Apache ORC
- Apache Parquet: Apache Parquet
- Apache Rya: http://rya.incubator.apache.org/
- Apache S2Graph: http://s2graph.incubator.apache….
Distributed File System
- Ceph: ceph/ceph
- Baidu File System: baidu/bfs
- SeaweedFS: chrislusf/seaweedfs
- GlusterFS: gluster/glusterfs
- QFS: quantcast/qfs
- XtreemFS: xtreemfs/xtreemfs
- Hyperdrive: mafintosh/hyperdrive
- Ambry: linkedin/ambry
- LizardFS: lizardfs/lizardfs
- FastDFS: happyfish100/fastdfs
- MooseFS: moosefs/moosefs
- Alluxio: Alluxio/alluxio
Data Analysis
- Aperture Tiles: unchartedsoftware/aperture-tiles
- PrestoDB: prestodb/presto
- Simba: InitialDLab/Simba
- GeoMesa: locationtech/geomesa
- FlashX: https://github.com/flashxio/FlashX
- MOA: https://github.com/Waikato/moa
- Squall: epfldata/squall
- RapidMiner: https://github.com/rapidminer/ra…
- Esper: espertechinc/esper
- Drools: kiegroup/drools
- Mondrian: pentaho/mondrian
- Godot: nodejitsu/godot
- TensorFlow: https://github.com/tensorflow/te…
- MLPack: https://github.com/mlpack/mlpack
- Conjecture: https://github.com/etsy/Conjecture
- Photon-ML: https://github.com/linkedin/phot…
- DMLC: https://github.com/dmlc/dmlc-core
- H2O: https://github.com/h2oai/h2o-3
- DSSTNE: https://github.com/amzn/amazon-d…
- Angel: https://github.com/Tencent/angel
- Oryx: https://github.com/OryxProject/oryx
- Fregata: https://github.com/TalkingData/F…
- Zen: https://github.com/cloudml/zen
- BenchML: https://github.com/szilard/bench…
- Cascalog: nathanmarz/cascalog
- Cascading: Cascading/cascading
- Scalding: twitter/scalding
- Jubatus: https://github.com/jubatus/jubatus
- PipelineDB: https://github.com/pipelinedb/pi…
- StreamCQL: https://github.com/HuaweiBigData…
- Apache Calcite: Apache Calcite
- Apache Drill: http://drill.apache.org/
- Apache HAWQ: Apache HAWQ®
- Apache Horn: HORN Project
- Apache Hive: http://hive.apache.org/
- Apache Hivemall: http://hivemall.incubator.apache…
- Apache Impala: http://impala.incubator.apache.org/
- Apache Kudu: http://kudu.apache.org/
- Apache Kylin: http://kylin.apache.org/
- Apache Lens: http://lens.apache.org/
- Apache MADlib: http://madlib.apache.org
- Apache Mahout: http://mahout.apache.org/
- Apache MetaModel: Apache MetaModel
- Apache MRQL: A Query Processing and Optimization System
- Apache Trafodion: http://trafodion.incubator.apach…
- Apache Phoenix: Apache Phoenix
- Apache Pig: http://pig.apache.org/
- Apache SAMOA: http://samoa.incubator.apache.org/
- Apache SINGA: http://singa.incubator.apache.org/
- Apache VXQuery: http://vxquery.apache.org/
- Apache SystemML: http://systemml.apache.org/
- Apache Tajo: http://tajo.apache.org/
Distributed Architecture
- Pentaho: pentaho/big-data-plugin
- Thrill: https://github.com/thrill/thrill
- HPCC: https://github.com/hpcc-systems/…
- JStorm: https://github.com/alibaba/jstorm
- Riemann: https://github.com/riemann/riemann
- Tigon: https://github.com/caskdata/tigon
- Riko: https://github.com/nerevu/riko
- SensorBee: sensorbee/sensorbee
- Automi: vladimirvivien/automi
- Goka: lovoo/goka
- Spring Cloud Data Flow: spring-cloud/spring-cloud-dataflow
- GraphJet: twitter/GraphJet
- PigPen: Netflix/PigPen
- Disco: discoproject/disco
- Infovore: paulhoule/infovore
- Gleam: chrislusf/gleam
- Glow: chrislusf/glow
- Parkour: damballa/parkour
- Onyx: onyx-platform/onyx
- Summingbird: twitter/summingbird
- Hydra: addthis/hydra
- Apache Apex: Apache Apex
- Apache Beam: Apache Beam
- Apache DataFu: http://datafu.incubator.apache.org/
- Apache Falcon: http://falcon.apache.org/
- Apache Flink: http://flink.apache.org/
- Apache Gearpump: Apache Gearpump
- Apache Giraph: Apache Giraph
- Apache Hadoop: Apache™ Hadoop®
- Apache Hama: Hama
- Apache Heron: Heron
- Apache Ignite: http://ignite.apache.org/
- Apache Samza: http://samza.apache.org/
- Apache Spark: http://spark.apache.org/
- Apache Storm: http://storm.apache.org/
Visualization
- Lumify: lumifyio/lumify
- Plywood: implydata/plywood
- Kibana: elastic/kibana
- Airpal: https://github.com/airbnb/airpal
- Bokeh: https://github.com/bokeh/bokeh
- Apache Zeppelin: http://zeppelin.apache.org/
Security & Governance
- HiBench: intel-hadoop/HiBench
- Spring XD: spring-projects/spring-xd
- Redisson: redisson/redisson
- Akka: https://github.com/akka/akka
- Mist: https://github.com/Hydrosphereda…
- Secor: https://github.com/pinterest/secor
- Elephant-Bird: https://github.com/twitter/eleph…
- Streaming Benchmark: https://github.com/yahoo/streami…
- Apache Ambari: Ambari
- Apache Atlas: Data Governance and Metadata framework for Hadoop
- Apache Bigtop: Apache Bigtop
- Apache BookKeeper: Apache BookKeeper™
- Apache Curator: http://curator.apache.org/
- Apache Eagle: http://eagle.apache.org/
- Apache Geode: Apache Geode
- Apache HTrace: http://htrace.incubator.apache.org/
- Apache Kerby: http://directory.apache.org/kerby/
- Apache Milagro: Milagro
- Apache Metron: Apache Metron
- Apache OODT: Apache OODT
- Apache Ranger: http://ranger.apache.org/
- Apache Spot: http://spot.incubator.apache.org/
- Apache Sentry: http://sentry.apache.org/
- Apache Thrift: http://thrift.apache.org
- Apache ZooKeeper: http://zookeeper.apache.org/
Cluster Management
- Apache Aurora: http://aurora.apache.org
- Azkaban: azkaban/azkaban
- Genie-Netflix: Genie by Netflix OSS
- Chronos: mesos/chronos
- Kubernetes: kubernetes/kubernetes
- Tron: Yelp/Tron
- Vitess: https://github.com/youtube/vitess
- Schedoscope: https://github.com/ottogroup/sch…
- Luigi: https://github.com/spotify/luigi
- Serf: https://github.com/hashicorp/serf
- Finagle: https://github.com/twitter/finagle
- Fenzo: Netflix/Fenzo
- Apache Airavata: Apache Airavata
- Apache CloudStack: http://cloudstack.apache.org
- Apache Helix: Apache Helix
- Apache Mesos: Apache Mesos
- Apache Myriad: Apache Myriad
- Apache REEF: http://reef.apache.org/
- Apache Slider: http://slider.incubator.apache.org/
- Apache Tez: http://tez.apache.org/
- Apache Twill: http://twill.apache.org/
- Apache Oozie: Apache Oozie
Application
- Elasticsearch: elastic/elasticsearch
- KillrWeather: killrweather/killrweather
- Refarch: https://github.com/awslabs/lambd…
- Dat-Node: datproject/dat-node
- Redash: https://github.com/getredash/redash
- Rakam-IO: https://github.com/rakam-io/rakam
- Countly: https://github.com/Countly/count…
- Kapacitor: https://github.com/influxdata/ka…
- Apache Lucene: https://lucene.apache.org/core/
- Apache Nutch: Apache Nutch™
- Apache Solr: http://lucene.apache.org/solr/
Support
- StreamAlert: https://github.com/airbnb/stream…
- Finagle: https://github.com/twitter/finagle
- Apache Bahir: Apache Bahir
- Apache Crunch: http://crunch.apache.org/
- Apache Edgent: http://edgent.incubator.apache.org/
- Apache Fluo: Apache Fluo
- Apache Knox: http://knox.apache.org/
- Apache River: http://river.apache.org/
- Apache Tephra: http://tephra.incubator.apache.org/
- Apache Omid: Apache Omid
- Apache OpenWhisk: Apache OpenWhisk