SC Conference - Activity Details
Text Mining on a Grid Environment
Presenter:
Valeriana Gomes Roncero
(COPPE UFRJ)
Doctoral Research Showcase Session
Thursday, 03:45PM - 04:00PM
Room 17A/17B
Abstract:
One key difficulty with text classification learning algorithms is that they require many hand-labeled documents to learn accurately. In this study, we propose to use a combination of Expectation-Maximization (EM) and a naive Bayes classifier on a grid environment. This combination is based on a mixture of multinomials, which is commonly used in text classification. Naive Bayes is a probabilistic approach to inductive learning. It estimates the a posteriori probability that a document belongs to a class given the observed feature values of the document, assuming the features are independent given the class. The class with the maximum a posteriori probability is assigned to the document. Expectation-Maximization (EM) is a class of iterative algorithms for maximum likelihood or maximum a posteriori estimation in problems with incomplete data, such as unlabeled documents. Text mining methods for classification are time-consuming, so running them on grid infrastructure can bring significant benefits.
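
The combination described in the abstract can be illustrated with a minimal single-machine sketch. It is not the presenter's implementation and does not show the grid distribution, which the abstract does not detail. It assumes documents are lists of word tokens, classes are integers, and Laplace smoothing is used; all function names (`train`, `posterior`, `em_naive_bayes`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def train(docs, resp, vocab, n_classes):
    """M-step: estimate log priors and log word probabilities from soft class weights."""
    prior = [1.0] * n_classes  # Laplace-smoothed (assumed) class counts
    counts = [Counter() for _ in range(n_classes)]
    for doc, weights in zip(docs, resp):
        for c, w in enumerate(weights):
            prior[c] += w
            for word in doc:
                counts[c][word] += w
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_word = []
    for c in range(n_classes):
        denom = sum(counts[c].values()) + len(vocab)
        log_word.append({w: math.log((counts[c][w] + 1.0) / denom) for w in vocab})
    return log_prior, log_word


def posterior(doc, log_prior, log_word):
    """E-step for one document: P(class | doc), treating words as independent given the class."""
    scores = [lp + sum(lw[w] for w in doc if w in lw)
              for lp, lw in zip(log_prior, log_word)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_naive_bayes(labeled, unlabeled, n_classes, iterations=10):
    """Fit naive Bayes on labeled docs, then refine it with EM over unlabeled docs."""
    vocab = sorted({w for doc, _ in labeled for w in doc}
                   | {w for doc in unlabeled for w in doc})
    lab_docs = [doc for doc, _ in labeled]
    # Labeled documents keep fixed, one-hot class weights.
    lab_resp = [[1.0 if c == y else 0.0 for c in range(n_classes)]
                for _, y in labeled]
    model = train(lab_docs, lab_resp, vocab, n_classes)
    for _ in range(iterations):
        # E-step: soft-label every unlabeled document with the current model.
        unl_resp = [posterior(doc, *model) for doc in unlabeled]
        # M-step: re-estimate parameters from labeled and soft-labeled documents.
        model = train(lab_docs + unlabeled, lab_resp + unl_resp, vocab, n_classes)
    return model


def classify(doc, model):
    """Assign the class with the maximum a posteriori probability."""
    probs = posterior(doc, *model)
    return max(range(len(probs)), key=probs.__getitem__)
```

In this sketch the E-step over unlabeled documents is independent per document, which is one plausible place where work could be split across grid nodes; that partitioning is an assumption, not something the abstract states.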