Advanced Sentence Search

Ozora's text analysis tools enable several advanced search techniques. Unlike traditional search methods, which typically return whole documents, these methods all operate on individual sentences.

Grammar Based Search

This search technique allows users to find sentences that match a particular grammatical structure. For example, suppose you want to find information about instances when the FDA has rejected a new drug. You can enter the phrase "FDA rejected the drug" into the search field. The system will parse the phrase and determine that the main verb is "reject", which is linked to "FDA" in the subject role and to "drug" in the object role.

The system then transforms the parsed sentence into a database query, and runs it against our pre-parsed corpus database. It finds other sentences where the main verb is "reject", the subject is "FDA", and the object is "drug". The results look like this:

  • The Food and Drug Administration rejected the renal cancer drug in June. (article)
  • The FDA had previously rejected the drug, citing issues with the company's manufacturing plant. (article)
  • Durect's post-operative pain relief drug, Posidur, is also under the spotlight after the U.S. Food and Drug Administration rejected the drug in February, indicating that additional safety studies would be needed. (article)
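
The matching step described above can be illustrated with a minimal Python sketch. It is not Ozora's implementation: the ParsedSentence schema, the in-memory CORPUS list, and the grammar_search function are all hypothetical stand-ins for the pre-parsed corpus database and the query that runs against it. The sketch also assumes that subjects have already been normalized, so that "The Food and Drug Administration" is stored as "FDA".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedSentence:
    """One pre-parsed corpus sentence (hypothetical schema, for illustration)."""
    text: str
    verb: str     # lemma of the main verb, e.g. "reject"
    subject: str  # normalized subject, e.g. "FDA"
    obj: str      # head noun of the object, e.g. "drug"


# Toy stand-in for the pre-parsed corpus database.
CORPUS = [
    ParsedSentence(
        "The Food and Drug Administration rejected the renal cancer drug in June.",
        verb="reject", subject="FDA", obj="drug",
    ),
    ParsedSentence(
        "The FDA had previously rejected the drug, citing issues with the "
        "company's manufacturing plant.",
        verb="reject", subject="FDA", obj="drug",
    ),
    # Invented non-matching sentence, included only for contrast.
    ParsedSentence(
        "The FDA approved the drug last year.",
        verb="approve", subject="FDA", obj="drug",
    ),
]


def grammar_search(corpus, verb, subject, obj):
    """Return the text of sentences whose verb, subject, and object all match."""
    return [
        s.text for s in corpus
        if (s.verb, s.subject, s.obj) == (verb, subject, obj)
    ]


# The parsed form of the query phrase "FDA rejected the drug".
hits = grammar_search(CORPUS, verb="reject", subject="FDA", obj="drug")
for text in hits:
    print(text)
```

Because the comparison uses grammatical roles rather than raw keywords, the invented "approved" sentence is excluded even though it contains both "FDA" and "drug".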

You can try out the Grammar Based Search here.

Simple Entity Lookup

Entity Recognition is an important component of the Ozora text analysis technology. Entities, or proper nouns, refer to specific people, places, companies, or other types of unique objects.

Sometimes users want to find sentences that refer to particular entities. The challenge is that the same Entity can appear in several different forms. For example, a newspaper article might refer to Hillary Rodham Clinton as "Mrs Clinton". Similarly, the acronym "FBI" is often used to refer to the Federal Bureau of Investigation. The Ozora system knows how to resolve different Entity strings to the same unique item.

The Simple Entity Lookup tool allows users to specify an Entity and search for sentences that refer to it. The system will return sentences containing any form of the Entity string. For example, searching for the Entity "Angela Merkel" will return the following hits. Notice how the text string representing the name is different in each sentence:

  • We can't allow the opponents of Europe's direction to set the speed and tone of European policy, said Roth, a member of Merkel's Social Democratic Party coalition partner. (article)
  • With its support for traditional families and its bracing talk on crime and immigration, it is selling itself as a conservative party of the kind the CDU was before Mrs Merkel took charge. (article)
  • But German Chancellor Angela Merkel, who has taken a lead in unsuccessful negotiations with Putin, stressed: "Sanctions... can only be lifted if the reasons for them change." (article)
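
The idea of resolving different strings to one Entity can be sketched in a few lines of Python. This is an illustrative assumption about how such a lookup might work, not a description of Ozora's Entity Recognition: the ALIASES table, find_entities, and entity_lookup are hypothetical names, and the naive substring matching used here would be far too crude for real text.

```python
# Hypothetical alias table: surface string -> canonical Entity name.
ALIASES = {
    "Angela Merkel": "Angela Merkel",
    "Mrs Merkel": "Angela Merkel",
    "Merkel": "Angela Merkel",
    "FBI": "Federal Bureau of Investigation",
    "Federal Bureau of Investigation": "Federal Bureau of Investigation",
}


def find_entities(sentence):
    """Return the canonical Entities mentioned in a sentence.

    Uses naive substring matching, purely for illustration.
    """
    return {
        canonical for alias, canonical in ALIASES.items()
        if alias in sentence
    }


def entity_lookup(sentences, entity):
    """Return the sentences that mention the given canonical Entity."""
    return [s for s in sentences if entity in find_entities(s)]


SENTENCES = [
    "With its support for traditional families and its bracing talk on crime "
    "and immigration, it is selling itself as a conservative party of the "
    "kind the CDU was before Mrs Merkel took charge.",
    "But German Chancellor Angela Merkel, who has taken a lead in "
    "unsuccessful negotiations with Putin, stressed: \"Sanctions... can only "
    "be lifted if the reasons for them change.\"",
    # Invented sentence about a different Entity, included only for contrast.
    "The FBI opened an investigation.",
]

results = entity_lookup(SENTENCES, "Angela Merkel")
for sentence in results:
    print(sentence)
```

Both Merkel sentences are returned even though one says "Mrs Merkel" and the other "Angela Merkel", because each surface string maps to the same canonical name.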

Try the Simple Entity Lookup here.

Entity Co-Occurrence Search

Sometimes it is interesting to observe the links and relationships between different Entities. The Entity Co-Occurrence Search allows users to discover these relationships and then examine the sentences that describe them.

This process works in two phases. First, the user selects an Entity to use as a starting point. The system then finds all other Entities that are linked to the initial one, and ranks them by the number of hits. The user can then scan this list, which often contains interesting new information. For example, running a query for "Donald Trump", we find that he has a link to "Eric Schneiderman" (wikipedia), and to "Jared Kushner" (wikipedia).

Examining the link to Schneiderman, we find the following sentences:

  • It ran a long, unflattering article about New York's attorney general, Eric T. Schneiderman, that critics claimed was retribution for a lawsuit Mr. Schneiderman's office was pursuing against Mr. Trump's education business. (article)
  • Suspicions that the article was ordered up as retribution on Mr. Schneiderman were fueled in part by a message Mr. Trump had posted on Twitter in December, after Vanity Fair published a long article on the investigation. (article)

Examining the link to Kushner, we find these sentences:

  • Mr. Kushner, the son-in-law of Donald Trump, hired a former chairwoman of the Landmarks Preservation Commission as an architect on the project. (article)
  • "We have confidence in the Israeli economy and its growth potential," said Kushner, who is married to Donald Trump's daughter, Ivanka. (article)

So the Ozora system can act as a high-level bridge between articles, allowing users not just to discover relationships, but also to find the specific documents that describe them.
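
The two-phase process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Ozora's implementation: the INDEX structure, linked_entities, and link_sentences are hypothetical, the Entity labels attached to each sentence were assigned by hand, and a "hit" is assumed to mean one sentence in which both Entities appear.

```python
from collections import Counter

# Hypothetical index: each sentence paired with the canonical Entities it
# mentions. Labels are hand-assigned and cover people only.
INDEX = [
    ("It ran a long, unflattering article about New York's attorney general, "
     "Eric T. Schneiderman, that critics claimed was retribution for a "
     "lawsuit Mr. Schneiderman's office was pursuing against Mr. Trump's "
     "education business.",
     {"Donald Trump", "Eric Schneiderman"}),
    ("Suspicions that the article was ordered up as retribution on "
     "Mr. Schneiderman were fueled in part by a message Mr. Trump had posted "
     "on Twitter in December, after Vanity Fair published a long article on "
     "the investigation.",
     {"Donald Trump", "Eric Schneiderman"}),
    ("Mr. Kushner, the son-in-law of Donald Trump, hired a former chairwoman "
     "of the Landmarks Preservation Commission as an architect on the "
     "project.",
     {"Donald Trump", "Jared Kushner"}),
    ("\"We have confidence in the Israeli economy and its growth potential,\" "
     "said Kushner, who is married to Donald Trump's daughter, Ivanka.",
     {"Donald Trump", "Jared Kushner", "Ivanka Trump"}),
]


def linked_entities(index, start):
    """Phase 1: rank the Entities that co-occur with `start` by hit count."""
    counts = Counter()
    for _text, entities in index:
        if start in entities:
            counts.update(entities - {start})
    return counts.most_common()


def link_sentences(index, a, b):
    """Phase 2: return the sentences that mention both Entities."""
    return [text for text, entities in index if a in entities and b in entities]


ranked = linked_entities(INDEX, "Donald Trump")
for name, hit_count in ranked:
    print(name, hit_count)

for text in link_sentences(INDEX, "Donald Trump", "Jared Kushner"):
    print(text)
```

Counting per sentence rather than per document keeps each hit tied to a specific piece of text, so the user can move directly from a ranked link to the sentences behind it.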

You can try out the Entity Co-Occurrence Search here.