Feb. 21, 2012 SVForum Pervasive Database Decisions

On February 21, 2012 in Palo Alto at SAP, SVForum’s Business Intelligence SIG Chair Corrinne Kahler introduced John Akred of Accenture. His topic was “Pervasive Data-Based Decisions.” Despite talk of SQL versus NoSQL, the rise of big data does not mean the end of relational databases. Experts recognize that the two worlds must coexist. Structured or unstructured, data is still data and needs to be turned into useful information. Big data is like the ocean: there is a lot of water, but it must be processed before you can drink any of it. That means even more demand for database ETL (extract, transform, load) professionals willing to dive in.

Big data will be incorporated into existing data structures, adding value through better context. GPS tracking data from delivery trucks gives insight into employee productivity and customer satisfaction. Tracking customer behavior reveals how customers make purchasing decisions.

Akred thinks people need to understand the difference between geometric and linear scalability. Most relational databases scale geometrically: as data grows, the costs of processing and storage rise geometrically. With big data technologies like Hadoop, Cassandra or Amazon’s DynamoDB, scaling is linear, and the first terabyte of storage and processing power costs the same as the last.
Tools created by Aster Data, Greenplum and Microsoft SQL Server Azure can then bring this linear scalability to the relational world.
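To make the distinction concrete, here is a small worked example in Python. The talk gave no numbers, so the first-terabyte cost (`BASE_COST`) and the growth factor (`GROWTH`) below are hypothetical values chosen only for illustration, and "geometric" is modeled as each additional terabyte costing a constant factor more than the one before it.

```python
# Hypothetical illustration of linear vs. geometric cost scaling.
# BASE_COST and GROWTH are made-up numbers, not figures from the talk.

BASE_COST = 1000.0  # hypothetical cost of the first terabyte (arbitrary units)
GROWTH = 1.5        # hypothetical factor by which each extra terabyte gets pricier


def linear_cost(terabytes: int) -> float:
    """Total cost when every terabyte costs the same (commodity scale-out)."""
    return BASE_COST * terabytes


def geometric_cost(terabytes: int) -> float:
    """Total cost when the n-th terabyte costs BASE_COST * GROWTH**(n - 1)."""
    return sum(BASE_COST * GROWTH ** n for n in range(terabytes))


def marginal_costs(terabytes: int) -> tuple[float, float]:
    """Cost of the last terabyte under the (linear, geometric) models."""
    return BASE_COST, BASE_COST * GROWTH ** (terabytes - 1)


if __name__ == "__main__":
    for tb in (1, 5, 10):
        last_lin, last_geo = marginal_costs(tb)
        print(f"{tb:>2} TB  linear total={linear_cost(tb):>10.0f}  "
              f"geometric total={geometric_cost(tb):>10.0f}  "
              f"last TB: {last_lin:.0f} vs {last_geo:.0f}")
```

With these made-up parameters, the tenth terabyte costs about 38 times as much as the first under the geometric model (1.5 to the ninth power is roughly 38.4), while under the linear model it costs exactly the same as the first.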

In short, companies are building better funnels for the fire hose of data heading your way.

Copyright 2012 DJ Cline All rights reserved.

Apr. 22, 2009 SDF Facebook Cassandra

On April 22, 2009 in Palo Alto, SDForum’s SAM SIG hosted engineers Avinash Lakshman and Prashant Malik. Lakshman came from Amazon and Malik from Microsoft. Together they are working on something they call the Cassandra Project at Facebook.

Cassandra is a distributed storage system for managing structured data, designed to scale to a very large size across many commodity servers with no single point of failure. Reliability at massive scale is a major challenge, because service outages can have a significant negative impact. Cassandra runs on top of an infrastructure of hundreds of nodes, possibly spread across different data centers. At this scale, small and large components fail continuously, and the way Cassandra manages persistent state in the face of these failures drives the reliability and scalability of the software systems that rely on it. Cassandra achieves the goals of scalability, high performance, high availability and wide applicability. It shares many design and implementation strategies with databases, but it does not support a full relational data model. Instead, it provides clients with a simple data model that supports dynamic control over data layout and format.
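The summary does not say how Cassandra spreads data over commodity servers without a single point of failure. Cassandra's published design partitions data with consistent hashing: nodes and row keys are hashed onto a ring, each key is stored on the first node clockwise from its position, and copies go to the following nodes. The Python sketch below illustrates that idea under simplifying assumptions (MD5 tokens, one ring position per node, replicas on consecutive successors, hypothetical node names such as `node0`); it is not Cassandra's actual code.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key or node name to a position on the ring (MD5 chosen for this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified consistent-hashing ring; not Cassandra's actual implementation."""

    def __init__(self, nodes: list[str], replication_factor: int = 3) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication_factor cannot exceed the number of nodes")
        self.rf = replication_factor
        # Sort nodes by token so the ring can be walked clockwise.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str) -> list[str]:
        """Nodes storing `key`: the first node at or after the key's token
        (wrapping past the end of the ring), then the next rf - 1 nodes clockwise."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.rf)]

    def live_replicas(self, key: str, down: set[str]) -> list[str]:
        """Replicas of `key` that can still serve it when the nodes in `down` have failed."""
        return [n for n in self.replicas(key) if n not in down]


if __name__ == "__main__":
    # Hypothetical six-node cluster keeping three copies of every row.
    ring = Ring([f"node{i}" for i in range(6)], replication_factor=3)
    owners = ring.replicas("user:42")
    print("replicas:", owners)
    # Losing any single node still leaves two copies of the row reachable.
    print("after first replica fails:", ring.live_replicas("user:42", down={owners[0]}))
```

Because every node plays the same role and each row lives on several nodes, no single machine's failure makes data unavailable; adding a node also only takes over the keys between it and its predecessor on the ring, rather than forcing a full reshuffle.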


Copyright 2009 DJ Cline All rights reserved.