Semantic repositories

A repository is simply a place for collecting and preserving things. In this sense, libraries, museums, and warehouses are all repositories.

In computer science, a repository is a storage facility that is used to collect and maintain information. In this sense, a database is a repository. A spreadsheet or a flat file may also be a repository, depending on its intended use.

What is a Semantic Repository?

A Semantic Repository organizes information according to its meaning, rather than according to its raw content.

For example, consider the statement: “Some snarks are boojums.”

A conventional Web site or search engine’s database represents this statement as a sequence of four words: some, snarks, are, and boojums. If it is fairly sophisticated, it may record facts like these:

  • Snarks is the plural of snark, and boojums is the plural of boojum.
  • Snark and boojum (according to a dictionary) are nouns, and are is a verb.
  • Snark and boojum occurred in the same sentence, separated by one word.
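To make this word-level view concrete, here is a minimal Python sketch of the kind of processing such a database might do. The toy `LEXICON` stands in for "a dictionary", and it and the function names are hypothetical. The plural handling is deliberately naive. Nothing here captures what the sentence means.

```python
import re

# Hypothetical toy lexicon standing in for a real dictionary.
LEXICON = {"snark": "noun", "boojum": "noun", "are": "verb", "some": "determiner"}

def tokenize(sentence: str) -> list[str]:
    """Split a sentence into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z]+", sentence.lower())

def singular(word: str) -> str:
    """Naive plural stripping: 'snarks' -> 'snark' only if 'snark' is in the lexicon."""
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1]
    return word

def word_facts(sentence: str) -> dict:
    """Record word-level facts: the words, their singular forms, and parts of speech."""
    words = tokenize(sentence)
    lemmas = [singular(w) for w in words]
    return {
        "words": words,
        "lemmas": lemmas,
        "parts_of_speech": {lemma: LEXICON.get(lemma, "unknown") for lemma in lemmas},
    }

def gap_between(sentence: str, a: str, b: str) -> int:
    """Number of words separating the first occurrences of lemmas a and b."""
    lemmas = [singular(w) for w in tokenize(sentence)]
    return abs(lemmas.index(b) - lemmas.index(a)) - 1

if __name__ == "__main__":
    print(word_facts("Some snarks are boojums."))
    print(gap_between("Some snarks are boojums.", "snark", "boojum"))
```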

But this database doesn’t record any information about what the statement means.

In contrast, a Semantic Repository represents the statement with facts like these:

  • There is a class of things called snarks and a class of things called boojums.
  • Some snarks (but apparently not all) are boojums.
  • Some boojums (maybe all, or maybe not) are snarks.
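One way to picture this is to store the statement as a structured record about two classes rather than as four words. The sketch below is a minimal illustration under that assumption. `SomeStatement` and `facts_from` are hypothetical names, and this is not necessarily how any particular Semantic Repository stores its data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SomeStatement:
    """Hypothetical record for a sentence of the form 'Some <subject> are <predicate>'."""
    subject: str    # class named by the subject, e.g. "snarks"
    predicate: str  # class named by the predicate, e.g. "boojums"

def facts_from(stmt: SomeStatement) -> list[str]:
    """List the facts that a 'some' statement asserts about its two classes."""
    return [
        # The statement presupposes that both classes exist.
        f"There is a class of things called {stmt.subject} and a class of things called {stmt.predicate}.",
        # It asserts an overlap but does not say the whole subject class is included.
        f"Some {stmt.subject} (not necessarily all) are {stmt.predicate}.",
        # Overlap is symmetric, so the reverse also holds, again without saying 'all'.
        f"Some {stmt.predicate} (maybe all, or maybe not) are {stmt.subject}.",
    ]

if __name__ == "__main__":
    for fact in facts_from(SomeStatement("snarks", "boojums")):
        print(fact)
```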

If the conventional database contains other statements about snarks and boojums, the search engine that uses it might be able to combine them and derive facts like these:

  • Almost all of the references to boojums also refer to snarks.
  • Only a few of the references to snarks also refer to boojums.
  • Almost all of the references to both snarks and boojums contain the phrase “For the snark was a boojum, you see,” written as a quotation.
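Facts like these are word statistics, not meaning. A minimal Python sketch of how they might be computed follows. The function names and the sample documents are made up for illustration, and the substring matching is deliberately naive.

```python
def mentions(doc: str, term: str) -> bool:
    """Naive case-insensitive substring match (so 'snark' also matches 'snarky')."""
    return term.lower() in doc.lower()

def cooccurrence_rate(docs: list[str], given: str, also: str) -> float:
    """Fraction of documents mentioning `given` that also mention `also`."""
    with_given = [d for d in docs if mentions(d, given)]
    if not with_given:
        return 0.0
    return sum(mentions(d, also) for d in with_given) / len(with_given)

if __name__ == "__main__":
    # Made-up sample documents, for illustration only.
    docs = [
        "For the Snark was a Boojum, you see.",
        "A boojum is a particularly dangerous snark.",
        "The Snark missile was tested in the 1950s.",
        "Hunting the Snark is a long voyage.",
    ]
    print(cooccurrence_rate(docs, "boojum", "snark"))  # 1.0
    print(cooccurrence_rate(docs, "snark", "boojum"))  # 0.5
    both = [d for d in docs if mentions(d, "snark") and mentions(d, "boojum")]
    print(cooccurrence_rate(both, "snark", "For the snark was a boojum"))  # 0.5
```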

Given the limited information in its database, the conventional Web site or search engine provides facts about the words “snark” and “boojum,” but not about the things they represent. In contrast, a Web site or search engine that uses a Semantic Repository can combine related statements to provide information about the things themselves. For example, given a large group of statements about snarks and boojums, it could derive facts like these:

  • Snarks and boojums were originally mentioned in The Hunting of the Snark, a nonsense poem written by Lewis Carroll.
  • Snarks and boojums are not real.
  • Snark is also the name of an American intercontinental cruise missile developed in the 1950s.
  • Snarky is also used as an adjective meaning “sarcastic, or overly critical.”

How can a Semantic Repository benefit an end user?

The difference between a Semantic Repository and a conventional database is clearest when both are used to answer the kinds of questions that search engine users typically ask.

For example, suppose a user asks each of the two search engines described here: “What percentage of boojums are snarks?”

The Web site or search engine that uses a conventional database simply looks for references to boojum and snark and tries to eliminate the ones that appear to be incidental or redundant. It returns a list of references that refer to both boojums and snarks, perhaps applying some special-case rules to give priority to the entries most likely to contain the information that the user wants.
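This keyword approach can be sketched in a few lines of Python. The sketch assumes that "incidental" means a document that does not mention every search term and that "redundant" means a copy differing only in whitespace. The function name and the ranking rule (more mentions rank higher) are hypothetical stand-ins for a real engine's special-case rules.

```python
def keyword_search(docs: list[str], terms: list[str]) -> list[str]:
    """Return documents mentioning every term, duplicates removed, most mentions first."""
    seen: set[str] = set()
    results: list[str] = []
    for doc in docs:
        text = doc.lower()
        if not all(t.lower() in text for t in terms):
            continue  # treated as incidental: does not mention every term
        key = " ".join(text.split())  # normalize whitespace to spot redundant copies
        if key in seen:
            continue
        seen.add(key)
        results.append(doc)
    # Hypothetical ranking rule: more total mentions of the terms ranks higher.
    results.sort(key=lambda d: sum(d.lower().count(t.lower()) for t in terms), reverse=True)
    return results
```

Note that the function never looks at what the documents say about boojums and snarks. It only counts and compares strings.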

The Web site or search engine that uses the Semantic Repository analyzes the question and derives its meaning, just as it did for each statement it incorporated in the Semantic Repository. By relating the meaning of the query to the meaning of the various statements in the Semantic Repository, it may return results like these:

  • The Semantic Repository does not contain information about the percentage of boojums that are snarks.
  • Snark and boojum are imaginary creatures that first appeared in The Hunting of the Snark, a nonsense poem by Lewis Carroll first published in 1876.
  • Some snarks are boojums. Therefore at least some boojums are snarks.
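The third answer relies on a simple logical rule: "some A are B" implies "some B are A." The following minimal Python sketch shows how a repository might apply that rule when answering the question. It assumes that statements are stored as hypothetical `(quantifier, subject, predicate)` tuples and that percentages are kept in a separate dictionary. These names and structures are illustrative only.

```python
# Hypothetical representation: (quantifier, subject_class, predicate_class),
# e.g. ("some", "snarks", "boojums") for "Some snarks are boojums."
Statement = tuple[str, str, str]

def infer(statements: set[Statement]) -> set[Statement]:
    """Add statements implied by conversion: 'some A are B' implies 'some B are A'.

    One pass is enough, because converting a converted statement gives back the original.
    """
    derived = set(statements)
    for quantifier, a, b in statements:
        if quantifier == "some":
            derived.add(("some", b, a))
    return derived

def answer_percentage(
    subject: str,
    predicate: str,
    statements: set[Statement],
    percentages: dict[tuple[str, str], float],
) -> list[str]:
    """Answer 'What percentage of <subject> are <predicate>?' from stored facts."""
    answers = []
    pct = percentages.get((subject, predicate))
    if pct is None:
        answers.append(
            f"The repository does not contain information about the percentage of {subject} that are {predicate}."
        )
    else:
        answers.append(f"{pct:g}% of {subject} are {predicate}.")
    known = infer(statements)
    if ("all", subject, predicate) in known:
        answers.append(f"All {subject} are {predicate}.")
    elif ("some", subject, predicate) in known:
        answers.append(f"At least some {subject} are {predicate}.")
    return answers

if __name__ == "__main__":
    stored = {("some", "snarks", "boojums")}
    for line in answer_percentage("boojums", "snarks", stored, {}):
        print(line)
```

The sketch converts only "some" statements. "All snarks are boojums" would not imply "all boojums are snarks."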

The Semantic Repository and its search engine are not alive, of course, so they do not literally understand what queries and Semantic Repository statements mean. By analyzing semantic content, though, they can often produce answers that are just as valuable as if they did!

What is special about Syneural’s Semantic Repository?

Semantic Repositories and software to manage them have been available for many years, but most of them have been either experimental systems with little practical use, or sophisticated systems that require advanced training to set up and can only be applied to specialized problems.

Syneural’s Semantic Repository is the first that can acquire useful information from natural language text written by non-technical users and derive useful results from a large body of heterogeneous information. It represents a major advance over prior state-of-the-art search engine technologies.

Syneural’s first real-world application of its Semantic Repository technology is TownSource, a family of Web sites that collect and share information about neighborhoods, towns, and other real-world communities.