Ozora API

At Ozora Research, we've put a lot of work into building specialized linguistic and textual knowledge into our system. By connecting to our APIs, you can take advantage of that work in your own NLP tools.

To use the API, you package your text document in a simple XML format and submit it to one of our HTTP endpoints. All of the APIs use the same input format. Our system responds with another XML document containing the requested data. If you are interested in using our API, please Contact Us for more details.
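
The request flow described above can be sketched as follows. This is only an illustration: the endpoint URL and the XML element names (`document`, `text`) are placeholders invented for the example, not the actual Ozora input format, which is available on request. The sketch builds the HTTP request but does not send it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the real URL is supplied by Ozora.
ENDPOINT = "https://api.example.com/parser"


def build_request_body(text: str) -> bytes:
    """Wrap a plain-text document in a small XML envelope (assumed layout)."""
    root = ET.Element("document")           # assumed element name
    ET.SubElement(root, "text").text = text  # assumed element name
    # ElementTree escapes characters such as '&' and '<' for us.
    return ET.tostring(root, encoding="utf-8")


def build_request(text: str, url: str = ENDPOINT) -> urllib.request.Request:
    """Prepare an HTTP POST carrying the XML body; nothing is sent here."""
    return urllib.request.Request(
        url,
        data=build_request_body(text),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def parse_response(xml_bytes: bytes) -> ET.Element:
    """Parse the XML document returned by the service into an element tree."""
    return ET.fromstring(xml_bytes)


request = build_request("Mr. Obama met reporters & staff.")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(request)`, with the response bytes passed to `parse_response`. Because every API uses the same input format, only the URL would change between endpoints.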

Sentence Parser API

The parser API allows users to analyze the grammatical content of text documents, searching for grammatical relationships such as "subject", "object", or "owner". Understanding grammatical information is crucial for extracting the precise meaning of text in a programmatic way.

The Ozora system parses the submitted sentences and returns an XML document containing a representation of the resulting structure. For a visualization of the Link Grammar structure, try the Sentence Parser demo here.

Entity Lookup API

As part of our general project of understanding text, we put a lot of effort into developing, curating, and maintaining an extensive database of Named Entities (also known as Proper Nouns). Often the most interesting pieces of information in a text document relate to the Entities that the document discusses.

Entity Lookup is challenging for several reasons. First, the same Entity can appear in many different string forms. A newspaper article might refer to the president as "Barack H. Obama", "Mr. Obama", or just "Obama". Determining which Entity a particular string form refers to is a non-trivial task that depends on context (which Entities were referred to previously in the document?), general knowledge of English usage (what nicknames are commonly associated with the name "Robert"?), and knowledge of specific details (what is Ted Cruz's full name?).
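
To make the role of context concrete, here is a toy sketch of resolving string forms to Entities. It is not how the Ozora system works internally: the alias table, the entity identifiers, and the "most recently mentioned candidate wins" rule are all assumptions chosen to keep the example short.

```python
from typing import Dict, List, Optional

# Toy alias table (hypothetical data): each string form maps to the
# identifiers of the Entities it could refer to.
ALIASES: Dict[str, List[str]] = {
    "Barack H. Obama": ["barack_obama"],
    "Mr. Obama": ["barack_obama"],
    "Mrs. Obama": ["michelle_obama"],
    "Obama": ["barack_obama", "michelle_obama"],
}


def resolve(mention: str, seen: List[str]) -> Optional[str]:
    """Resolve one mention, using previously seen Entities to break ties."""
    candidates = ALIASES.get(mention, [])
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous (or unknown): prefer the candidate mentioned most recently.
    for entity in reversed(seen):
        if entity in candidates:
            return entity
    return None  # no context available to decide


def resolve_document(mentions: List[str]) -> List[Optional[str]]:
    """Resolve mentions in document order, accumulating context as we go."""
    seen: List[str] = []
    resolved: List[Optional[str]] = []
    for mention in mentions:
        entity = resolve(mention, seen)
        if entity is not None:
            seen.append(entity)
        resolved.append(entity)
    return resolved


print(resolve_document(["Barack H. Obama", "Obama"]))
print(resolve_document(["Mrs. Obama", "Obama"]))
```

The same string "Obama" resolves to a different Entity in each call because the earlier mention changes the context. A bare "Obama" with no prior mention is left unresolved in this sketch.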

Note that the full problem of Named Entity Recognition actually involves two distinct tasks: 1) determining that a string refers to some proper noun, and 2) identifying the particular Entity, and associated information, that the string form represents. Many other NLP systems only attempt to solve problem 1.

Stemming API

The Ozora text analysis system also includes sophisticated tools for stemming and entity recognition. Stemming is the process of determining the root form of a word that has been formed through a procedure of morphological inflection. For example, the word "unoperationalizable" is formed from the root word "operate" by applying a prefix and four suffixes (-ion, -al, -ize, -able). The system also knows about alternate spellings, so it will identify "recognisable" as being the British English version of the word Americans spell "recognizable".
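
The "unoperationalizable" example can be traced with a small affix-peeling sketch. This is a deliberately naive illustration, not the Ozora stemmer: the rule list, the tiny root lexicon, and the spelling-variant table are assumptions that cover only the two words discussed above.

```python
from typing import List, Tuple

# Hypothetical toy data covering only the examples in the text.
ROOTS = {"operate", "recognize"}
PREFIXES = ["un"]
# (suffix as it appears in the word, text restored after removing it)
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("able", "e"),  # operationalizable -> operationalize
    ("ize", ""),    # operationalize    -> operational
    ("al", ""),     # operational       -> operation
    ("ion", "e"),   # operation         -> operate
]
SPELLING_VARIANTS = {"recognisable": "recognizable"}


def stem(word: str) -> Tuple[str, List[str]]:
    """Return (root, affixes removed); unknown words come back unchanged."""
    word = SPELLING_VARIANTS.get(word, word)  # normalize spelling first
    current = word
    removed: List[str] = []
    for prefix in PREFIXES:
        if current.startswith(prefix) and current not in ROOTS:
            current = current[len(prefix):]
            removed.append(prefix + "-")
    changed = True
    while current not in ROOTS and changed:
        changed = False
        for suffix, restore in SUFFIX_RULES:
            # Require a few characters to remain so short words are untouched.
            if current.endswith(suffix) and len(current) > len(suffix) + 2:
                current = current[: -len(suffix)] + restore
                removed.append("-" + suffix)
                changed = True
                break
    if current in ROOTS:
        return current, removed
    return word, []  # no known root reached: leave the word alone


print(stem("unoperationalizable"))
print(stem("recognisable"))
```

The suffixes are reported in the order they are peeled off, outermost first, which is the reverse of the order in which the word was built up. Checking against a root lexicon is what stops the rules from mangling words like "table" or "union".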

Entity recognition is the problem of identifying proper nouns in text, and resolving different variants of the string form of the entity to the same item. As you can see in the search examples above, the system knows that the string "FDA" is the acronym version of "Food and Drug Administration". Similarly, the system will recognize "FHLMC", "Freddie Mac", and "Federal Home Loan Mortgage Corp." as strings that all refer to the same entity. Our system contains a large database of these entities, and we are constantly expanding the database.

This technology can be useful for companies that want to do their own NLP projects in-house, but don't want to spend a lot of time on stemming and entity recognition tools. Many NLP algorithms use word-count feature vectors as their main input. These word vectors are often very high-dimensional: by default, they have one dimension per distinct word in the corpus. The performance of the NLP algorithms can be improved by using stemming to intelligently compress the feature vectors. It is better to use a single dimension for the root verb "run" than to use separate dimensions for each conjugated form ("ran", "running", "runs").
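
The effect of stemming on feature vectors can be seen in a few lines. This is a minimal sketch with a hand-written stem table standing in for a real stemmer; the table and the sample sentence are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical stem table; in practice this mapping would come from a stemmer.
STEMS: Dict[str, str] = {"ran": "run", "running": "run", "runs": "run"}


def count_features(
    tokens: Iterable[str], stems: Optional[Dict[str, str]] = None
) -> Counter:
    """Count word occurrences, mapping each token to its stem when known."""
    stems = stems or {}
    return Counter(stems.get(token, token) for token in tokens)


tokens = "she runs daily he ran yesterday they are running now".split()

raw = count_features(tokens)             # one dimension per distinct word
stemmed = count_features(tokens, STEMS)  # inflected forms share a dimension

print(len(raw), "dimensions without stemming")
print(len(stemmed), "dimensions with stemming")
print("count for 'run':", stemmed["run"])
```

Without stemming, "runs", "ran", and "running" each occupy their own dimension with a count of one. With stemming they collapse into a single "run" dimension that carries all three occurrences, so the vector is both shorter and less sparse.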