Semantic Analysis Method #4: Concept and Named Entity Extraction
Junyi Xie
Think about whether each tool performed its task well: a. If you had been asked to perform the same task yourself, would you have selected the same terms? Would you have included some that the tool skipped over? Left out some that it extracted?
I think Open Calais did a better job of concept and entity extraction than the Stanford Named Entity Tagger. I used a science news article from the NIH website to test both tools. Stanford only tagged general categories such as “ORGANIZATION”, “LOCATION”, and “PERSON”, which are not very useful for extracting the key meaning of the text, and it missed some important scientific terms related to the topic. Open Calais tagged more categories, such as “INDUSTRY TERM”, “MEDICAL CONDITION”, “PUBLISHED MEDIUM”, and “TECHNOLOGY”, which are helpful for interpreting the content. Open Calais also lists the relevance of each tagged term. I would probably pick terms similar to those Open Calais selected, and I would add a few more keywords such as “lung”, “death”, and “vaccine”. Besides, I don’t think people’s names need to be extracted here.
Based on your evaluation of the tools’ performance: a. What are some tasks where this type of semantic technology would add value to a knowledge organization? b. If so, to which parts of the Knowledge Life Cycle would they add value?
Open Calais and the Stanford NER automatically create rich semantic metadata for content. Concept and named entity extraction tools go well beyond classic entity identification and return the facts and events hidden within text. They provide basic definitions for common terms and support the interoperability of content. I think concept and named entity extraction technology adds the most value at the “Knowledge Capture” stage of the Knowledge Life Cycle.
Are these tools ready for “the Semantic Future”? a. Do they produce usable output in both human-readable and machine-readable formats? b. If so, describe at least one “Web 3.0” application that this output would allow one to develop.
These tools lay the groundwork for further semantic analysis. Open Calais uses semantic technologies such as RDF, entity disambiguation, GeoNames, NLP, ontologies, and OWL, which make its analysis results both human- and machine-readable. This technology supports the development of social computing in Web 3.0, for example concept-based search, and also helps machines understand the web of meaning and knowledge.
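As a rough sketch of what such machine-readable output might look like, the hypothetical Python snippet below serializes extracted entities as simple RDF-style triples in JSON. The entity list, relevance scores, and example.org URIs are all invented for illustration, not actual Open Calais output:

```python
import json

# Hypothetical entities, in the shape a tool like Open Calais might return:
# each has a surface form, an entity type, and a relevance score.
entities = [
    {"name": "NIH", "type": "Organization", "relevance": 0.8},
    {"name": "lung cancer", "type": "MedicalCondition", "relevance": 0.9},
]

def to_triples(doc_uri, entities):
    """Turn tagged entities into subject-predicate-object triples."""
    triples = []
    for e in entities:
        entity_uri = "http://example.org/entity/" + e["name"].replace(" ", "_")
        triples.append((doc_uri, "mentions", entity_uri))
        triples.append((entity_uri, "rdf:type", e["type"]))
        triples.append((entity_uri, "relevance", e["relevance"]))
    return triples

# JSON is readable by humans; the triples are consumable by machines.
triples = to_triples("http://example.org/doc/1", entities)
print(json.dumps(triples, indent=2))
```

A Web 3.0 application such as concept-based search could load triples like these into a graph store and answer queries by entity type rather than by keyword.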
We have only looked at the online demos here, but neither the open source download of the Stanford tool nor the API for Open Calais provides any greater access (just much more functionality). a. What do you think of the level of access that these tools give to the user for viewing the algorithms (and underlying rule sets) used for running these extractions? b. If you were using these tools for your organization, what options do you think you would have if you wanted to increase the tool’s accuracy or identify additional types of entities?
I think both tools hide their algorithms behind the screen. If I were to use these tools to analyze professional texts for my organization, I think I would need to add more specialized vocabularies to increase their sensitivity to certain entities and contexts.
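A minimal sketch of that vocabulary idea, assuming a custom gazetteer (a term-to-type dictionary); the terms and entity types below are invented for illustration and are not part of either tool’s actual rule set:

```python
# A custom domain vocabulary (gazetteer) mapping terms to entity types;
# these terms and types are hypothetical examples.
CUSTOM_VOCAB = {
    "lung": "AnatomicalStructure",
    "vaccine": "MedicalTreatment",
    "mRNA": "Technology",
}

def extract_custom_entities(text, vocab=CUSTOM_VOCAB):
    """Tag every vocabulary term that appears in the text (case-insensitive)."""
    lowered = text.lower()
    return {term: etype for term, etype in vocab.items()
            if term.lower() in lowered}

tagged = extract_custom_entities("The new mRNA vaccine protects lung tissue.")
# tagged == {"lung": "AnatomicalStructure",
#            "vaccine": "MedicalTreatment",
#            "mRNA": "Technology"}
```

Running a pass like this alongside a general-purpose tagger would let an organization surface domain terms, such as the scientific vocabulary both tools missed in the NIH article, without access to the tools’ internal algorithms.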