Machine learning 101
Machine learning uses run the gamut from game playing to fraud detection to stock-market analysis. It's used to build systems like those at Netflix and Amazon that recommend products to users based on past purchases, or systems that find all of the similar news articles on a given day. It can also be used to categorize Web pages automatically according to genre (sports, economy, war, and so on) or to mark e-mail messages as spam. The uses of machine learning are more numerous than I can cover in this article. If you're interested in exploring the field in more depth, I encourage you to refer to the Resources.
Several approaches to machine learning are used to solve problems. I'll focus on the two most commonly used ones — supervised andunsupervised learning — because they are the main ones supported by Mahout.
Supervised learning is tasked with learning a function from labeled training data in order to predict the value of any valid input. Common examples of supervised learning include classifying e-mail messages as spam, labeling Web pages according to their genre, and recognizing handwriting. Many algorithms are used to create supervised learners, the most common being neural networks, Support Vector Machines (SVMs), and Naive Bayes classifiers.
Unsupervised learning, as you might guess, is tasked with making sense of data without any examples of what is correct or incorrect. It is most commonly used for clustering similar input into logical groups. It also can be used to reduce the number of dimensions in a data set in order to focus on only the most useful attributes, or to detect trends. Common approaches to unsupervised learning include k-Means, hierarchical clustering, and self-organizing maps.
For this article, I'll focus on three specific machine-learning tasks that Mahout currently implements. They also happen to be three areas that are quite commonly used in real applications:
- Collaborative filtering
- Clustering
- Categorization
I'll take a deeper look at each of these tasks at the conceptual level before exploring their implementations in Mahout.
Hiç yorum yok:
Yorum Gönder