Developing a Comprehensive Patent Related Information Retrieval Tool

Article excerpt

Abstract

In recent years, the volume of regulatory and related information available online has grown massively. This information is distributed across many different domains, which makes the data difficult to access and manage. This paper proposes a framework to access information across two such domains: patents and court cases. The framework is designed to boost the value of a set of patents based on information available in court cases by identifying and cross-referencing information common to the two domains. We test our framework by constructing a use case involving the hormone erythropoietin, for which we gather a corpus of 1150 patents (including 135 closely related patents) and 30 court cases. Challenges associated with such integration and future plans are briefly discussed.

Keywords: Patents, Court cases, USPTO, Search, Information retrieval

1 Introduction

The government creates and enforces laws and regulations at various levels. At the topmost level are the federal laws passed by Congress, which cover a wide range of areas, including science and technology. These laws are codified in the United States Code (U.S.C.). Broad power is given to administrative agencies, such as the Food and Drug Administration (FDA), the Federal Communications Commission (FCC) and the United States Patent and Trademark Office (USPTO), to create and enforce rules and regulations that then appear in the relevant chapters of the Code of Federal Regulations (C.F.R.). Huge amounts of information pertaining to science and technology are buried in this system and distributed across various incompatible and sometimes disconnected domains. These domains can be broadly classified into laws, regulations, the documents of the administrative agencies, the documents generated by the court system, and other scientific and technological literature. Comprehensive regulatory knowledge on a particular topic is typically spread across several of these disparate domains. For example, a company working in the field of Global System for Mobile Communications (GSM) would likely need to know about existing patents, court litigation involving any of these patents, its competitors' work, and the relevant scientific literature. All of this information is available in different domains, namely (a) the administrative agency (USPTO in this case), (b) the federal court system, (c) the pertinent laws and regulations, and (d) the scientific literature. Retrieving information or knowledge relating to GSM therefore requires thorough study of documents across all these domains. With the explosive growth of regulatory and related information in recent years, and in the absence of smart tools, such study has become a laborious task involving many hours of manual cross-referencing across different domains. There is a need to integrate these diverse sources of information and to provide a common interface that can search and correlate information across the various domains.
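To make the idea of correlating two domains concrete, the following is a minimal sketch and not the framework described in this paper. It assumes that court case texts cite patents in a form such as "U.S. Patent No. 1,234,567" and links each case to the patents in a small corpus. The function names, the regular expression, and the toy data are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumed citation forms: "U.S. Patent No. 1,234,567" or "US Patent 1234567".
PATENT_RE = re.compile(
    r"U\.?S\.?\s+Patent\s+(?:No\.?\s*)?(\d{1,2}(?:,\d{3}){2}|\d{7,8})",
    re.IGNORECASE,
)


def extract_patent_numbers(text):
    """Return the set of patent numbers cited in text, with commas removed."""
    return {m.group(1).replace(",", "") for m in PATENT_RE.finditer(text)}


def cross_reference(patents, cases):
    """Map each known patent number to the sorted ids of cases that cite it."""
    index = defaultdict(list)
    for case_id, text in cases.items():
        for number in extract_patent_numbers(text):
            if number in patents:  # ignore patents outside the corpus
                index[number].append(case_id)
    return {number: sorted(ids) for number, ids in index.items()}


# Toy data, invented for illustration only (not taken from the paper's corpus).
patents = {"1234567": "Hypothetical patent A", "7654321": "Hypothetical patent B"}
cases = {
    "case-001": "The plaintiff asserts U.S. Patent No. 1,234,567 against the defendant.",
    "case-002": "Both US Patent 1234567 and U.S. Patent No. 9,999,999 are at issue.",
}

links = cross_reference(patents, cases)
print(links)
```

A real system would also have to handle the other citation styles found in court documents, such as abbreviated references like "the '567 patent", which this sketch does not attempt.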

Recent years have seen tremendous growth in research and development in science and technology, and an emphasis on obtaining intellectual property protection for one's innovations. In 2009, 485,312 patent applications were filed with the USPTO (Site 1). PubMed, a biomedical literature database, comprises over 19 million records, including MEDLINE citations. Searching for relevant information across these domains is a non-trivial task for two major reasons:

1. The domains are incompatible - The information in these domains is stored and expressed in different document formats, some of which are difficult to process computationally.

2. The domains are highly distributed - The domains and their sub-domains are spread across many databases. For example, there are 94 federal judicial districts and 13 Courts of Appeal in the U. …