A natural language interface for querying and analyzing interlinked datasets

by Dezhao Song, Frank Schilder, Charese Smiley, Chris Brew, Tom Zielund, Hiroko Bretz, Robert Martin, Chris Dale, John Duprey, Tim Miller, and Johanna Harrison
Abstract Tech Flow


Currently, the dominant technology for providing non-technical users with access to Linked Data is keyword-based search. This is problematic because keywords are often inadequate as a means for expressing user intent. In addition, while a structured query language can provide convenient access to the information needed by advanced analytics, unstructured keyword-based search cannot meet this extremely common need. This makes it harder than necessary for non-technical users to generate analytics. We address these difficulties by developing a natural language-based system that allows non-technical users to create well-formed questions.

Our system, called TR Discover, maps from a fragment of English into an intermediate First Order Logic representation, which is in turn mapped into SPARQL or SQL. The mapping from natural language to logic makes crucial use of a feature-based grammar with full formal semantics. The fragment of English covered by the natural language grammar is domain specific and tuned to the kinds of questions that the system can handle. Because users will not necessarily know what the coverage of the system is, TR Discover offers a novel auto-suggest mechanism that can help users to construct well-formed and useful natural language questions. TR Discover was developed for future use with Thomson Reuters Cortellis, which is an existing product built on top of a linked data system targeting the pharmaceutical domain.

Currently, users access it via a keyword-based query interface. We report results and performance measures for TR Discover on Cortellis, and in addition, to demonstrate the portability of the system, on the QALD-4 dataset, which is associated with a public shared task. We show that the system is usable and portable, and report on the relative performance of queries using SQL and SPARQL back ends.


Dezhao Song, Frank Schilder, Charese Smiley, Chris Brew, Tom Zielund, Hiroko Bretz, Robert Martin, Chris Dale, John Duprey, Tim Miller, and Johanna Harrison, TR Discover: A Natural Language Interface for Querying and Analyzing Interlinked Datasets, The Semantic Web - ISWC 2015, Lecture Notes in Computer Science, vol. 9367, Springer International Publishing, 2015, pp. 21–37 (English).

Related Articles

Previous systems for natural language questions over complex linked datasets require the user to enter a complete and wellformed question, and present the answers as raw lists of entities.

A novel yet simple method is proposed to exercise in stock market by following successful stock market expert in social media.

This book constitutes the refereed postproceedings of two workshops held at the 5th International Conference on Social Informatics, SocInfo 2013, in Kyoto, Japan.