Tag Archives: Map/Reduce

Aug. 7, 2013 Hive Hadoop MapReduce And SQL


On Wednesday, August 7, 2013 in Sunnyvale at NetApp, The Hive held an event discussing Big Data, Hadoop, Hive, MapReduce, Pig and SQL. Raghu Ramakrishnan of Microsoft moderated panelists Justin Erickson of Cloudera, Alan Gates of Hortonworks, Sausheel Kaushik of Pivotal, Priyank Patel of Teradata Aster, and Tomer Shiran of MapR. Grabbing large batches of data with MapReduce is fine, but businesses still want SQL for interactive and real-time queries. The result will be a hybrid of new and old strategies. The best strategy is to hire an experienced SQL developer with a strong ETL background and let them learn the new tools. You will get the information you need when you need it.


Copyright 2013 DJ Cline All rights reserved.

 

Apr. 9, 2010 SDF Analytics Revolution


On Friday, April 9, 2010, in Mountain View at the Microsoft Auditorium, SDForum held “The Analytics Revolution Conference.” The fact that you can now do large-scale analytics changes the way you model and run your company.

Dan’l Lewin of Microsoft gave the welcome and introduced the opening keynote speaker, Ronny Kohavi of Microsoft, formerly of Amazon. Kohavi presented “Online Controlled Experiments: Listening to the Customers, not to the HiPPO.” HiPPO stands for the Highest Paid Person’s Opinion, and while that person may sign the paychecks, it is the customer who sends them the money. If you don’t properly analyze the data, you will miss important cues that drive more sales. Ask what you are optimizing for.
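The talk summary above does not say how the experiments were analyzed. As a rough sketch only, one common way to compare a control page (A) against a variant (B) is a two-proportion z-test on conversion counts. The function name and the numbers in the usage line are made up for illustration and are not from the talk.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and variant (B).

    Returns (z, p_value), where p_value is the two-sided p-value
    under the normal approximation with a pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 visitors convert on A, 150 of 1000 on B.
    z, p = two_proportion_z(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Whatever test is used, the metric has to be chosen before the experiment runs, which is the point of asking what you are optimizing for.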

David Steier of PricewaterhouseCoopers moderated panelists DJ Patil of LinkedIn, Ken Rudin of Zynga, Neel Sundaresan of eBay and Kevin Weil of Twitter. They discussed “Competing on Analytics at the Highest Level.” The demand for professionals with solid database development skills is increasing. Look for people with experience in Oracle data warehousing, SQL, Cloudera, Vertica, Tableau, Hadoop, Pig and Memcached. Start budgeting and being very nice to the database people you hire.

Sanjay Poonen of SAP gave the second keynote presentation “Leading the Analytics Revolution.” You can now do analytics from mobile devices like the iPhone using SAP apps.

Owen Thomas of VentureBeat moderated panelists Amr Awadallah of Cloudera, Joshua Klahr of Yahoo, James Phillips of NorthScale and Joydeep Sen Sarma of Facebook. They discussed “Analyzing Big Data.” Cloud computing frees you from poorly structured datasets tied to old hardware. Learn Hadoop and MapReduce to process big data, awesome data and stupendous amounts of data.

Before and during lunch there were short pitches from exhibitors and startups like Karmasphere, Accept Software, Agilis Solutions, Aster Data, CTPartners, Dyyno, Execustaff, IBM, KXEN, Medallia and MergerTech.

Peter Norvig of Google gave the third keynote presentation “The Unreasonable Effectiveness of Data.” Believe it or not, more data means better results. The closer two points are to each other, the more likely they are to share the same characteristics. The original picture of the Mona Lisa will be at the center of a cluster.
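The idea that nearby points tend to share characteristics is the basis of nearest-neighbor methods. The sketch below is only an illustration of that idea, not something shown in the keynote. The function name, the points and the labels are all hypothetical.

```python
import math

def nearest_label(query, labeled_points):
    """Return the label of the labeled point closest to `query`.

    `labeled_points` is a list of (point, label) pairs, where each
    point is a tuple of numbers. Distance is plain Euclidean distance.
    """
    _point, label = min(
        labeled_points,
        key=lambda item: math.dist(query, item[0]),
    )
    return label

if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for two groups of images.
    examples = [
        ((0.0, 0.0), "portrait"),
        ((1.0, 0.5), "portrait"),
        ((10.0, 10.0), "landscape"),
        ((9.0, 11.0), "landscape"),
    ]
    print(nearest_label((0.5, 0.2), examples))
    print(nearest_label((9.5, 9.0), examples))
```

With more labeled examples, a new point is more likely to land near one that resembles it, which is one way to read the claim that more data means better results.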

Brett Sheppard of BigDataNews.com moderated panelists Jie Cheng of Acxiom, Vispi Daver of Sierra Ventures, Peter Farago of Flurry, Tom McLaughlin of Accept Software and Jeff Minich of CalmSea. They discussed “New Frontiers for Analytics.” The breakthroughs in analytics are speeding up business cycles.

Jeff Kreulen of IBM gave the fourth keynote presentation “Analytics: An Applied Researcher’s Perspective.”

Harold Yu of Orrick, Herrington & Sutcliffe LLP moderated panelists Stacey Curry Bishop of Scale Ventures, Asheem Chandna of Greylock, Kevin Efrusy of Accel, Sumeet Jain of CMEA and Lars Leckie of Hummer Winblad. They discussed “The Investor Perspective.” They don’t want to invest in anything that will quickly become a generic commodity. Companies want more than a small incremental lift. They want analytics to give them a dramatic change in the way they do business.

Jaap Suermondt of HP Labs gave the fifth keynote presentation “Research in Analytics for Operational Impact at HP.” A commitment to R&D at HP is producing clear improvements to everyday operations.


Video of the conference can be seen at:

www.dyyno.com/sdforum

Copyright 2010 DJ Cline All rights reserved.

Aug. 18, 2009 SDF Business Intelligence in the Cloud


On August 18, 2009 in Palo Alto at SAP, SDForum presented “Cutting Edge Business Intelligence in the Cloud” with Lenin Gali of ShareThis. ShareThis has a widget that allows people to share what they find on the web with others on their social network. It doesn’t matter if it is Facebook, Twitter, MySpace, or LinkedIn. Their clients include Fox Media, UsMagazine, Wired, ESPN, and movies.com. They built their IT on Amazon EC2, Cascading, Hadoop, Hive and MicroStrategy. They use Aster Data for their data warehouse.

If you come from a traditional database IT background, I guarantee that you have never seen an operation like this. Cascading is a processing API for Hadoop clusters. There are pipes, flows, branches and groups. You get event notification, can write scripts and can control processing at the tuple level. Hive is the data warehouse built on top of Hadoop. It supports simple SQL-style queries through HQL. You can build custom map/reduce jobs for complex analytics. You can still make ad hoc queries against large data sets. The Aster Data DW in the cloud runs on scalable commodity hardware with a Massively Parallel Processing (MPP) architecture. It uses SQL, Map/Reduce, JDBC and ODBC, and is compatible with Extract, Transform and Load (ETL) tools. The Aster Data architecture uses PostgreSQL and has a beehive hierarchy. Queens control the cluster and hold metadata while workers process and store the data. If the queen fails, it is replaced immediately.
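For readers new to map/reduce, here is a minimal sketch of the pattern in plain Python. It does not use the Hadoop, Cascading or Hive APIs, and it is not ShareThis code. The event records and function names are invented for illustration. The job counts shares per social network in three steps: map each event to a (key, 1) pair, group the pairs by key, then reduce each group to a sum.

```python
from collections import defaultdict

# Hypothetical share events as (page, network) tuples. Invented data.
EVENTS = [
    ("wired.com/a", "Twitter"),
    ("espn.com/b", "Facebook"),
    ("wired.com/a", "Facebook"),
    ("movies.com/c", "Twitter"),
    ("espn.com/b", "Facebook"),
]

def map_phase(events):
    """Emit one (network, 1) pair per share event."""
    for _page, network in events:
        yield network, 1

def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def count_shares(events):
    """Run the three phases in order on an in-memory list of events."""
    return reduce_phase(shuffle(map_phase(events)))

if __name__ == "__main__":
    print(count_shares(EVENTS))
```

On a real cluster the map and reduce steps run in parallel across many machines, and the framework handles the grouping. A simple count like this one is the kind of query a SQL-style layer can express as a GROUP BY, while custom map/reduce jobs are kept for analytics that do not fit that shape.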

They think that all of this is easier to use and lowers their costs. They keep their headcount down and their revenue up. It works for them. The question is whether it will work elsewhere.


Copyright 2009 DJ Cline All rights reserved.