Tag Archives: Machine Learning

Nov. 3, 2010 SDF Machine Learning And Mahout

On November 3, 2010, in Palo Alto at Cubberley Community Center, SDForum’s Semantic Web SIG hosted “Make Machines Learn for Us.” The speakers were Ted Dunning of MapR Technologies and Evgeniy Gabrilovich of Yahoo Research. The dramatic increase in computer network speed and capacity in the cloud makes new strategies for machine learning possible.

Ted Dunning talked about Mahout, the open source tool for large-scale machine learning. The 0.4 release has a new classification framework that can be deployed in production settings for high-volume training and classification tasks such as ad quality estimates, detailed offer targeting and fraud detection. Mahout provides early production-quality scalable data mining. New classifiers trained with Stochastic Gradient Descent (SGD) and new training APIs allow industrial-scale classification.
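To make the SGD idea concrete, here is a minimal sketch of a logistic regression classifier trained one example at a time. It is an illustration only, not Mahout's API: the function names, the learning rate and the toy "fraud" data are all assumptions made up for this example, and a real deployment would handle far more features and examples.

```python
import math
import random


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow on extreme values.
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(examples, n_features, learning_rate=0.1, epochs=50, seed=0):
    """Fit logistic regression weights, updating after every single example."""
    rng = random.Random(seed)
    weights = [0.0] * (n_features + 1)  # the last slot is the bias term
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for features, label in data:
            x = list(features) + [1.0]
            predicted = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            error = label - predicted
            # Nudge each weight in the direction that reduces the error.
            for i, xi in enumerate(x):
                weights[i] += learning_rate * error * xi
    return weights


def predict(weights, features):
    """Return the estimated probability that the label is 1."""
    x = list(features) + [1.0]
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy data: (features, label), where 1 means "fraudulent".
TOY_DATA = [
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
    ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1),
]

weights = train_sgd(TOY_DATA, n_features=2)
print(predict(weights, (0.9, 0.9)) > 0.5)  # high-valued features look fraudulent
print(predict(weights, (0.1, 0.1)) < 0.5)  # low-valued features look legitimate
```

Because the weights change after each example rather than after a full pass over the data, this style of training suits streams of data too large to hold in memory, which is the kind of high-volume setting the talk described.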

Evgeniy Gabrilovich from Yahoo Research presented “Machine learning in computational advertising: algorithms and applications.” Most Internet services make money from advertising, and the growing field of “computational advertising” studies all the various elements of an ad. Gabrilovich explained how machine-learning techniques are used to design evaluation strategies. Web search results enrich the representation of a query, using the Web as a repository of relevant query-specific knowledge. The results are then used to understand which ads work for particular users.

Copyright 2010 DJ Cline All rights reserved.

Oct. 13, 2010 SDF IQ Engines

On October 13, 2010, in Palo Alto at Pillsbury Winthrop, the SDForum Emerging Technology SIG hosted Gerry Pesavento, CEO and Co-Founder of IQ Engines, and Pierre Garrigues, Director of Research and Development. They talked about “Trends in Visual Intelligence.”

They explained how their image recognition engine takes advantage of human crowdsourcing from the millions of mobile devices taking billions of pictures every day around the world. If someone takes a picture and does not tag it with identifying data about the content, the camera merely assigns the image a number, which does not help much.

Pesavento said, “The mobile camera is evolving to an ‘intelligent visual sensor’ to power mobile visual search, vision for the blind, photo labeling and augmented reality.” IQ Engines sorts through images from mobile devices, a user-driven strategy of working from images that people are already interested in rather than from large libraries of stock images. Putting human recognition in a real-time loop to assist machine learning dramatically speeds up improvements in image recognition accuracy. The better a person identifies the image, the higher their ranking. The key is their scalable any-image recognition engine. All this is easier with the growth of the cloud and new database and analytics tools.

The most interesting development to me will be the new high-definition three-dimensional digital cameras. Soon many mobile devices will have two cameras to give the kind of images reminiscent of stereoscopic images from the 1800s or View-Master images from the 1900s. Until then I will have to take two pictures of the same stationary object a few inches apart and process them together later. (Now you know why I do that funny move when I take your picture. Always thinking ahead.)

Copyright 2010 DJ Cline All rights reserved.

Apr. 21, 2009 SDF Apache Mahout

On April 21, 2009 at SAP in Palo Alto, SDForum’s Business Intelligence SIG hosted “BI Over Petabytes: Meet Apache Mahout” by Jeff Eastman. Suzanne Hoffman of Star Analytics talked about what she learned at the Gartner conference. Performance management is making a comeback as people try to make better use of the information they may already have. The leaders in BI are IBM Cognos, Microsoft and Oracle. One visionary is TIBCO.

Eastman described machine learning as a subfield of artificial intelligence concerned with algorithms that let computers improve their performance through experience. It is used in search clustering, knowledge management, mapping social networks, transforming taxonomies, analyzing markets, filtering unwanted e-mail and detecting fraud.

The Apache Mahout project is dedicated to producing open source machine learning tools on the Apache Hadoop distributed computing platform, which orchestrates thousands of computers to analyze huge volumes of data in reasonable time. Mahout currently offers highly scalable programs for classifying (is this spam?), clustering (are these similar?), recommending (if you like X you might also like Y) and other tasks that can improve their performance by learning from past experiences. Coupled with cost-effective cloud computing infrastructures such as Amazon’s EC2/S3, this means that it is now practical for even small companies to distill Business Intelligence from Internet-sized datasets. The world needs scalable implementations of machine learning under an open license, and that is what Mahout aims to provide.
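The "if you like X you might also like Y" idea can be sketched with simple co-occurrence counting: items that often appear together in the same user's history are recommended for each other. This is a single-machine toy under stated assumptions, not Mahout's API or its Hadoop implementation; the function names and the purchase histories below are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(histories):
    """Count how often each pair of items shows up in the same history."""
    counts = defaultdict(lambda: defaultdict(int))
    for history in histories:
        # Deduplicate so an item repeated in one history is counted once.
        for a, b in combinations(sorted(set(history)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, liked_item, top_n=2):
    """Return the items seen most often alongside liked_item."""
    neighbours = counts.get(liked_item, {})
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories, one list per user.
HISTORY = [
    ["camera", "tripod", "memory card"],
    ["camera", "memory card"],
    ["camera", "tripod"],
    ["laptop", "memory card"],
    ["camera", "memory card", "laptop"],
]

counts = cooccurrence_counts(HISTORY)
print(recommend(counts, "camera"))
```

Counting pairs is work that splits naturally across many machines, since each history can be processed independently and the partial counts summed afterward, which is one reason this family of methods fits a platform like Hadoop.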

Copyright 2009 DJ Cline All rights reserved.