Scaling Intelligent Data Analysis with Apache Mahout
With the growing amount of digital data at developers' fingertips today, the need for scalable software to analyse and make sense of this raw data becomes ever more pressing. Apache Mahout offers scalable implementations of algorithms for data mining and machine learning. Possible use cases include:
- Recommending items to users, e.g. introducing users of a social network to people they "might know".
- Classifying content into pre-defined categories, e.g. e-mails into spam and non-spam folders.
- Clustering published articles into groups of similar articles on a common topic.
- and many more that will be covered in more detail in the presentation.
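To make the first use case concrete, the idea behind a recommender can be sketched as user-based collaborative filtering: find users with similar tastes and suggest items they rated highly. The following is a minimal illustrative sketch of that technique, not Mahout's actual API; the ratings data and function names are made up for the example.

```python
# Minimal sketch of user-based collaborative filtering, the technique behind
# many recommenders. Illustrative only -- this is not Apache Mahout's API.
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"item1": 5.0, "item2": 3.0, "item3": 4.0},
    "bob":   {"item1": 4.0, "item2": 3.5, "item4": 5.0},
    "carol": {"item2": 2.0, "item3": 5.0, "item4": 1.0},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=2):
    """Score items the user has not seen by similarity-weighted
    ratings from other users, and return the best-scoring ones."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        if sim <= 0:
            continue
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((s / weights[i], i) for i, s in scores.items()),
                    reverse=True)
    return [item for _, item in ranked[:top_n]]

print(recommend("alice", ratings))  # → ['item4']
```

Mahout implements this kind of logic at scale: the similarity computations are expressed as Hadoop jobs so they can run over data sets that do not fit on a single machine.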
All implementations in Mahout have a special focus on scalability. Scalability here first means a "scalable community": a sustainable group of developers and users who help newcomers with their problems while still actively driving project development. Scalable also means a commercially friendly license that facilitates a variety of business models. And of course it means scalable in the amount of data to process: Apache Mahout is easy to start with, yet scales to growing data volumes through its use of Apache Hadoop.
After motivating the need for machine learning, the talk gives an overview of Apache Mahout, including a deep dive into one of its algorithms. It shows the tremendous improvements made in the recent past, including the addition of several algorithms and various performance improvements. Last but not least, Apache Mahout graduated to a top-level project this year.
Thursday, 11 November 2010, from 14:00 to 15:00