Introduction

Apache Cassandra is an open-source NoSQL platform designed from the ground up to handle concurrent requests with fast writes and to provide low-latency responses for widely distributed users. DataStax Enterprise (DSE), built on Cassandra, provides integrated batch analytics so that users can apply Hadoop tools such as Hive, Pig, and Mahout to analyze Cassandra data. In addition, DSE's integrated search (built on Solr) delivers features such as full-text search, faceting, hit highlighting, and more.

There are more than 100 NoSQL data stores in the software industry today. Cassandra has emerged as one of the leaders in this congested arena, thanks to distinct capabilities that include easy scalability and ease of use.

Cassandra's architecture is a best-of-both-worlds combination of two proven technologies: Google BigTable and Amazon Dynamo.

Amazon Dynamo is a proprietary key-value store developed at Amazon. Its key design requirements are high performance and high availability as data grows continuously. To meet them, Dynamo has to support scalable, performant architectures that tolerate machine failures transparently, and it replicates data and automatically shards it across multiple machines.
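To make replication and autosharding concrete, here is a minimal sketch in Python. It assumes a consistent-hash ring, where each key is stored on the node that follows its hash position plus the next few nodes. The `token` helper, the `HashRing` class, and the node names are hypothetical and exist only for illustration; this is not the actual Dynamo or Cassandra implementation.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (hypothetical helper)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: each key lives on `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=3):
        if replicas > len(nodes):
            raise ValueError("replicas cannot exceed the number of nodes")
        self.replicas = replicas
        # Sort nodes by ring position so lookups can use binary search.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def nodes_for(self, key):
        """Return the nodes holding `key`: the owner plus its successors."""
        # First node clockwise from the key's position, wrapping around.
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replicas)
        ]


# Assumed node names, for illustration only.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
placement = ring.nodes_for("user:42")
print(placement)
```

Because placement depends only on the key's hash and the node positions, any machine can compute where a key lives without a central directory, and each key remains readable from another replica when one machine fails.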

Google BigTable is the underlying data store and data model behind many popular Google applications that we use daily, including Gmail, YouTube, Orkut, and Google Analytics. BigTable is designed to scale up to petabytes (PB) of data and to serve the real-time operations that web-scale applications require.

Who uses Cassandra?

Today, Cassandra powers hundreds of large web-scale applications for well-known names such as Netflix, eBay, Facebook, Twitter, Cisco, Comcast, Disney, Ericsson, Instagram, IHG, Intuit, NASA, PBS Kids, Travelocity, and others whose services most of us use.

Netflix is the poster child for Cassandra usage as a scalable cloud-based web store, handling 60+ TB of data. Netflix is also the biggest single source of Internet traffic and the largest pure cloud service in North America. It has shared a publicly available benchmark demonstrating Cassandra's near-linear scalability.

Many more production users of Cassandra can be found at PlanetCassandra.org, a website dedicated to Cassandra information that lists more than 400 Cassandra users.

Why Cassandra?

The whole motivation for the integrated Hadoop support in DataStax Enterprise (with Spark/Shark expected in the next release, due next month) is to avoid ETL altogether. With Spark/Shark integrated, you will be able to run more "real-time" analytics on the data.

Created by Billie D on 2017/10/26 08:33