Artificial General Intelligence



AGIRI 20-Apr 06, 10:22 PM · Post # 1
Abstract posted for the May 20-21 AGI Workshop:

Mining Semantic Information From Unstructured Data
By Michael Ross, Science Applications International Corporation

PPT: Mining Semantic Information From Unstructured Data

MICCE is a general-purpose algorithm, based on Clustering-By-Committee (CBC), for mining semantic data. The algorithm discovers semantic classes by grouping items that have similar features or that appear in similar contexts. Instances of ambiguous items can then be assigned to classes. MICCE is designed to work in any domain that contains structural and classification ambiguities. With domain-specific plugins, it can be applied to linguistic, visual, auditory, spatial, or mixed data. In addition, MICCE can be applied to the results of other algorithms and can use external ontologies. This flexibility allows it to use its own classifications, or other new information, in a feedback loop that iteratively improves results.
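To make "grouping items that appear in similar contexts" concrete, here is a minimal sketch. It is not MICCE and not CBC itself: it swaps in a plain greedy clustering over bag-of-words context vectors compared by cosine similarity. The function names, the window size, and the similarity threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def greedy_cluster(vectors, threshold=0.5):
    """Assign each item to the most similar existing cluster, or start a new one.

    A simplified stand-in for CBC (assumption): there are no committees here,
    only a running centroid per cluster.
    """
    clusters = []  # list of (centroid Counter, member list)
    for word in sorted(vectors):  # sorted for a deterministic order
        vec = vectors[word]
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append((Counter(vec), [word]))
        else:
            best[0].update(vec)  # fold the new member into the centroid
            best[1].append(word)
    return [members for _, members in clusters]
```

In this sketch, `context_vectors` plays the role of a text-domain plugin: any other function that maps items to feature counts could feed `greedy_cluster` in the same way.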

As an example of the algorithm's behavior in a text domain, MICCE may discover two semantic classes for the noun 'plant.' These classes correspond to different meanings, or senses, of 'plant,' and each is represented by a set of words that occur in similar contexts, e.g., {plant, flower, shrub, tree} and {plant, factory, mill, warehouse}. Each sense is associated with a representation of the contexts in which it occurs. Once these senses are discovered, occurrences of the word 'plant' can be disambiguated: MICCE compares an occurrence's context with each candidate sense's context representation and assigns the occurrence to the best candidate. In this way, the documents (and individual words in the documents) can be tagged with semantic data. Furthermore, because MICCE can handle multiple types of context representation, the disambiguation data can be fed back into subsequent iterations of the algorithm. For instance, if an occurrence of 'plant' is disambiguated to the sense {plant, flower, shrub, tree}, this provides a more detailed description of the context for nearby words such as 'soil' or 'garden.' Such words can then be disambiguated more accurately, and syntactic parse ambiguities involving them may be resolved more accurately as well. In non-text domains, such as visual/spatial data, the algorithm could similarly discover classes of objects based on their contexts and/or features, and assign instances to classes. Thus, although MICCE is currently being tested only on sequences of words, the algorithm can be understood more generally as a method for bootstrapping high-level representations from low-level data in any domain with structural and classification ambiguities.
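The disambiguation step described above can be sketched as follows. This is an illustration under stated assumptions, not the MICCE implementation: the sense labels, the context profiles in `SENSES`, the window size, and the use of cosine similarity over bag-of-words counts are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical context representations for the two senses of 'plant'.
# The counts are invented for illustration only.
SENSES = {
    "plant/flora": Counter({"soil": 3, "garden": 2, "water": 2, "grow": 1}),
    "plant/facility": Counter({"workers": 3, "production": 2, "steel": 1}),
}


def disambiguate(tokens, index, senses, window=3):
    """Score the occurrence at `index` against each candidate sense.

    Returns (best_sense, scores); best_sense is None when no sense shares
    any context word with the occurrence.
    """
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    context = Counter(
        tok for j, tok in enumerate(tokens[lo:hi], start=lo) if j != index
    )
    scores = {name: cosine(context, profile) for name, profile in senses.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return None, scores
    return best, scores


tokens = "the plant needs water and rich soil".split()
sense, scores = disambiguate(tokens, tokens.index("plant"), SENSES)
```

A feedback loop in the spirit of the abstract could replace the token 'plant' with its assigned sense label before rebuilding context counts for nearby words, so that later passes see the richer context.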

At SAIC, MICCE is being developed to discover and disambiguate the referents of names (for example, determining whether "Michael Jordan" refers to the famous basketball player or to another individual with the same name). Interim results are presented, along with a discussion of applications and future development directions.

Important Links:
Main Workshop Website: http://www.agiri.org/workshop
Directions/Hotel: http://www.agiri.org/directions.htm
Workshop Schedule: http://www.agiri.org/schedule.htm
Printable Version / Handout: http://www.agiri.org/workshop/AGIRI_Workshop_2006.pdf



AGIRI - Artificial General Intelligence Research Institute
Copyright © 2001-2007 :: User Agreement :: Discussion Lists :: Contact