On October 27, 2010, at LinkedIn in Mountain View, the SDForum SAM SIG hosted Chris Riccomini, Senior Data Scientist on LinkedIn’s Product Analytics team, to talk about “Scalable Analytical Processing with LinkedIn’s Avatara.”
Formerly of PayPal, Riccomini worked on LinkedIn’s “People You May Know” feature and “Who’s Viewed My Profile” product. He talked about the competing priorities of high throughput and low latency, and the solution of pairing Hadoop with Voldemort. LinkedIn needed something that would support offline aggregation, event-time aggregation and query-time aggregation. It had to run through a shared Map/Reduce interface to power customer-facing data products. He described a layered structure from top to bottom. On the top layer is the engine. Below that is the cube, and then the cube query. At the bottom are three elements: transform, aggregator and comparator.
The result was Avatara, a real-time, scalable Online Analytical Processing (OLAP) system that is already in production. It supports SQL-like operations such as select, count, where, group, having, order and limit. Riccomini described the architecture and implementation of its storage and aggregation layers.
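To make that list of operations a little more concrete, here is a minimal Python sketch of a SQL-ish query builder running over a handful of in-memory rows. This is not Avatara's API or anything shown in the talk: the class name `SqlishQuery`, its methods and the sample profile-view data are all made up for illustration, and a real system would run the aggregation over Hadoop output and a key-value store rather than a Python list.

```python
from collections import defaultdict


class SqlishQuery:
    """Hypothetical fluent query builder; NOT Avatara's actual API."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._where = None
        self._group_key = None
        self._having = None
        self._ordered = False
        self._descending = False
        self._limit = None

    def where(self, predicate):
        # Keep only rows for which predicate(row) is true.
        self._where = predicate
        return self

    def group_by(self, key):
        # Group rows by the value stored under this key.
        self._group_key = key
        return self

    def having(self, predicate):
        # Keep only groups whose count satisfies predicate(count).
        self._having = predicate
        return self

    def order_by_count(self, descending=True):
        self._ordered = True
        self._descending = descending
        return self

    def limit(self, n):
        self._limit = n
        return self

    def count(self):
        """Run the query and return a list of (group value, count) pairs."""
        if self._group_key is None:
            raise ValueError("group_by() must be called before count()")
        counts = defaultdict(int)
        for row in self._rows:
            if self._where is None or self._where(row):
                counts[row[self._group_key]] += 1
        result = list(counts.items())
        if self._having is not None:
            result = [(k, c) for k, c in result if self._having(c)]
        if self._ordered:
            result.sort(key=lambda pair: pair[1], reverse=self._descending)
        if self._limit is not None:
            result = result[: self._limit]
        return result


# Made-up profile-view events: who viewed a profile, and on which day.
views = [
    {"viewer_industry": "software", "day": 1},
    {"viewer_industry": "software", "day": 2},
    {"viewer_industry": "software", "day": 2},
    {"viewer_industry": "software", "day": 3},
    {"viewer_industry": "finance", "day": 0},
    {"viewer_industry": "finance", "day": 1},
    {"viewer_industry": "finance", "day": 2},
    {"viewer_industry": "retail", "day": 1},
]

# Roughly: SELECT viewer_industry, COUNT(*) ... WHERE day >= 1
#          GROUP BY viewer_industry HAVING COUNT(*) >= 2
#          ORDER BY COUNT(*) DESC LIMIT 2
top = (
    SqlishQuery(views)
    .where(lambda row: row["day"] >= 1)
    .group_by("viewer_industry")
    .having(lambda c: c >= 2)
    .order_by_count(descending=True)
    .limit(2)
    .count()
)
print(top)
```

The chained calls read much like the SQL statement in the comment, which is presumably the appeal of a SQL-ish builder even when the storage underneath is not a relational database.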
One new term I heard was AvatarSQLishBuilder. Apparently, even in a NoSQL environment, the code should still have the look and structure of SQL. My advice for anyone heading into Hadoop territory is to take an experienced SQL database developer with you. Java is not enough in this Wild West show.
Another new term is the Yahoo! Cloud Serving Benchmark (YCSB). This is a framework for comparing the performance of various cloud data-serving systems. I thought they were talking about yogurt. More explanation is at:
Richard Taylor was there and has written a splendid article about it at: