IBM Japan
Tokyo Research Lab


Language Translation

  
 

Linguistic Annotation

LAL (Linguistic Annotation Language)

Natural language texts contain many ambiguities that are difficult for natural language processing (NLP) systems to resolve. Some remain unresolved because NLP technology is still immature, but others can be resolved only by the writer, who knows the intended meaning. Consider the following example:

a small computer company
This phrase has two interpretations: "a company that produces small computers" and "a small company that produces computers." Given a linguistic annotation like the one below, an NLP program can recognize that the latter is correct.
a small <seg>computer company</seg>
In this example, the text between <seg> and </seg> is treated as a single unit (a phrase), so "small" modifies the whole phrase "computer company" rather than "computer" alone.
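Here is a minimal sketch of how a program might consume this annotation. The Python function below splits annotated text into units, treating each `<seg>` element as one unit and splitting all other text on whitespace. It assumes the fragment becomes well-formed XML once wrapped in a root element, so characters such as `&` must already be escaped. The function name `units` is hypothetical and is not part of any LAL tooling.

```python
import xml.etree.ElementTree as ET


def units(annotated: str) -> list[str]:
    """Split annotated text into units: each <seg> element becomes one
    unit, and text outside any <seg> is split on whitespace."""
    # The fragment has no root element, so wrap it to make it parseable XML.
    root = ET.fromstring(f"<root>{annotated}</root>")
    result = (root.text or "").split()
    for child in root:
        if child.tag == "seg":
            # Keep the whole phrase together, normalizing inner whitespace.
            result.append(" ".join("".join(child.itertext()).split()))
        else:
            # Any other element is split into words like ordinary text.
            result.extend("".join(child.itertext()).split())
        # Text after the element (its tail) is ordinary text again.
        result.extend((child.tail or "").split())
    return result


print(units("a small <seg>computer company</seg>"))
# ['a', 'small', 'computer company']
```

Nested `<seg>` elements are flattened into their outermost segment here. A fuller implementation would need to decide how nested phrases should be represented.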

We have developed LAL (Linguistic Annotation Language), an XML-based tag set for annotating linguistic information. For instance, LAL includes the following tags:

s ... specifies the scope of a sentence.
w ... specifies a single word.
seg ... specifies a phrase.
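To show how the three tags fit together, the sketch below builds an LAL-style annotation with Python's `xml.etree.ElementTree`. It wraps the sentence in `s`, every token in `w`, and each chosen token span in `seg`. The input is the sentence used in the example below. Several details are illustrative assumptions, because the page does not specify LAL's schema:

- the namespace URI `http://example.org/lal`,
- the helper names `lal` and `annotate`,
- the rule that `w` may nest inside `seg`.

Punctuation handling is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the page does not state LAL's actual namespace.
LAL_NS = "http://example.org/lal"
ET.register_namespace("lal", LAL_NS)


def lal(tag: str) -> str:
    """Return the ElementTree qualified name for an LAL tag."""
    return f"{{{LAL_NS}}}{tag}"


def annotate(tokens: list[str], segs: list[tuple[int, int]]) -> ET.Element:
    """Wrap a tokenized sentence in s, each token in w, and each half-open
    token span [start, end) from segs in seg.

    Assumes (hypothetically) that spans do not overlap and that w may
    nest inside seg.
    """
    for start, end in segs:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span {(start, end)}")
    seg_end = dict(segs)  # start index -> end index (exclusive)
    sentence = ET.Element(lal("s"))
    prev = None  # previous sibling; its tail carries the separating space
    i = 0
    while i < len(tokens):
        if prev is not None:
            prev.tail = " "
        if i in seg_end:
            # Open a phrase and put one w element per token inside it.
            seg = ET.SubElement(sentence, lal("seg"))
            inner = None
            for token in tokens[i:seg_end[i]]:
                if inner is not None:
                    inner.tail = " "
                inner = ET.SubElement(seg, lal("w"))
                inner.text = token
            prev, i = seg, seg_end[i]
        else:
            # A token outside any phrase becomes a direct child of s.
            prev = ET.SubElement(sentence, lal("w"))
            prev.text = tokens[i]
            i += 1
    return sentence


tokens = "IBM announced a new computer system for children with voice function".split()
s = annotate(tokens, [(2, 8)])  # tokens 2..7: "a new computer system for children"
print(ET.tostring(s, encoding="unicode"))
```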
The following shows an example of LAL annotation.
IBM announced <lal:seg>a new computer system for children</lal:seg> with voice function.
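Syntactically, the phrase "with voice function" could modify either "children" or "system." The seg tag marks "a new computer system for children" as a single phrase, so "with voice function" cannot attach to "children" inside it; the annotation specifies that it modifies "system."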
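The `lal:` prefix in this example implies an XML namespace, and a parser will reject the text until that prefix is declared. The sketch below wraps the fragment in a root element that declares a placeholder namespace (`http://example.org/lal` is hypothetical). It then reports each top-level `lal:seg` phrase together with the text that follows it. That following text is where a modifier such as "with voice function" sits, outside the closed phrase. The function name `segs_with_following` is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the example omits the xmlns:lal declaration.
LAL_NS = "http://example.org/lal"
SEG = f"{{{LAL_NS}}}seg"


def segs_with_following(annotated: str) -> list[tuple[str, str]]:
    """Return (phrase, following text) for each top-level lal:seg element.

    A modifier in the following text cannot reach inside the phrase,
    so it attaches to the segment as a whole.
    """
    # Declare the lal prefix on a wrapper root so the fragment parses.
    root = ET.fromstring(f'<doc xmlns:lal="{LAL_NS}">{annotated}</doc>')
    pairs = []
    for child in root:
        if child.tag == SEG:
            phrase = " ".join("".join(child.itertext()).split())
            pairs.append((phrase, (child.tail or "").strip()))
    return pairs


sentence = ("IBM announced <lal:seg>a new computer system for children"
            "</lal:seg> with voice function.")
print(segs_with_following(sentence))
# [('a new computer system for children', 'with voice function.')]
```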

Inserting these tags by hand is difficult and tedious for end users, so we have developed a GUI-based LAL tagging editor to facilitate the annotation work.

If this annotation framework is widely adopted and many documents carry linguistic annotations, we expect NLP programs such as machine translation and automatic summarization systems to become much more useful to people.

  
 