
Sept. 21, 2010 SDF Analytics: SQL or NoSQL

On September 21, 2010, at SAP in Palo Alto, the SDForum Business Intelligence SIG hosted a presentation by SenSage’s Richard Taylor, “Analytics: SQL or NoSQL.” Taylor’s research in parallel and distributed computing, from his early days at Cambridge through his work for DEC, Data-Cache, RedBrick Systems, Informix, and IBM, is well known to experts in the business intelligence community. That is why the room was packed when he chose to talk about the newest challenge to relational databases: the NoSQL movement.

SQL started as SEQUEL in 1974 and evolved into the language we know today. Adopted by Oracle, it became the standard for relational databases, which use schemas, multi-version concurrency control, isolation levels, and analytic extensions to deal with the complexity of structured data. The relational model created a world of normalized data stored in rows and columns, where tables are selected, projected, or joined using primary and foreign keys. It handled transaction processing very well, but complicated cases became repetitive, and scaling was difficult.
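To make selection, projection, and joining on keys concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The `customers` and `orders` tables, their columns, and their rows are hypothetical; they only illustrate a normalized schema in which a foreign key links one table to another.

```python
import sqlite3


def select_project_join():
    """Run one query that selects, projects, and joins two normalized tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces declared foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    cur = conn.cursor()
    # Normalized schema: each customer is stored once; orders refer to it by key.
    cur.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            city TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      REAL NOT NULL
        );
    """)
    cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, "Ada", "Palo Alto"), (2, "Ben", "Cambridge")])
    cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)])
    # Selection: the WHERE clause keeps only some rows.
    # Projection: the SELECT list keeps only some columns.
    # Join: the foreign key orders.customer_id matches the primary key customers.id.
    rows = cur.execute("""
        SELECT c.name, o.amount
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.id
        WHERE o.amount > 20
        ORDER BY o.id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(select_project_join())  # [('Ada', 25.0), ('Ada', 40.0)]
```

The schema does the organizing here: the query never repeats customer details, because the join reassembles them from the key.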

By 2000, the rise of unstructured data on the web had created new levels of complexity and the need for a new approach. The term NoSQL, reintroduced by Eric Evans in June 2009, names a movement seen in the development of Google’s Bigtable, Amazon’s Dynamo, and Facebook’s Cassandra. All of these store data as tuples in what is effectively one table: a structured key, with a column and a timestamp, paired with an unstructured value. Processing uses two functions, map and reduce. Map takes a tuple as input and outputs a list of tuples. Reduce takes a key and a list of values as input and outputs a list or a tuple. You specify the cluster, the input, and the tuple stores, and the framework does the rest. There is no need to normalize large amounts of semi-structured data, and the approach is cheaper to implement, but it still requires some programming ability. Nor is there any schema or model to guide work with historical data.
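The map and reduce signatures described above can be sketched in a single Python process. This is a minimal illustration, not Taylor’s example or any framework’s API: the word-count task and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the grouping step stands in for the shuffle that a real framework performs across a cluster.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

KV = tuple[Any, Any]  # a (key, value) tuple; this syntax needs Python 3.9+


def map_fn(record: KV) -> list[KV]:
    """Map: one input tuple (line_no, text) -> a list of (word, 1) tuples."""
    _, text = record
    return [(word.lower(), 1) for word in text.split()]


def reduce_fn(key: Any, values: list[Any]) -> KV:
    """Reduce: a key plus all of its values -> one output tuple (word, count)."""
    return (key, sum(values))


def run_mapreduce(records: Iterable[KV],
                  mapper: Callable[[KV], list[KV]],
                  reducer: Callable[[Any, list[Any]], KV]) -> list[KV]:
    # Map phase: apply the mapper to every input tuple.
    intermediate = [pair for record in records for pair in mapper(record)]
    # Shuffle phase: group intermediate values by key (the framework's job).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key, sorted for
    # deterministic output (this assumes the keys are mutually comparable).
    return [reducer(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    lines = [(1, "SQL or NoSQL"), (2, "NoSQL or SQL or both")]
    print(run_mapreduce(lines, map_fn, reduce_fn))
    # [('both', 1), ('nosql', 2), ('or', 3), ('sql', 2)]
```

Note what is missing compared with the SQL sketch: no schema describes the input, so the mapper’s code is the only place where the structure of each record is defined.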

Taylor gave examples of how SQL and NoSQL would handle the same problems. Each approach had its advantages and disadvantages. I urge you to read Taylor’s work and listen to him speak on this subject.

Frankly, I would still want an experienced database developer with a strong background in SQL to deal with NoSQL, because only such a developer would be able to sense when something was wrong. Big data is no place for amateurs.

Note: A delegation from Peru was in the audience.

Copyright 2010 DJ Cline. All rights reserved.