Intellixir: A Patent and Literature Analysis Product: Q&A with Developer Jean-Michel Careil
Hutcherson, Mark, Online
INTELLIXIR (www.intellixir.com) is an innovative patent and science literature analysis system. Launched commercially in 2002, it debuted in North America 6 years later, in 2008, during The Patent Information Users Group, Inc. (PIUG) annual conference. I stumbled across it there--and I did in fact wonder if INTELLIXIR was an elixir.
To clarify, the system is not an interface to a comprehensive patent or literature database--although users can search against uploaded data sets. INTELLIXIR's purpose is to supply a chest of tools that enable probing imported data and converting findings into visible rather than text formats. The goal is to extract and discover collaborators, competitors, and emerging technologies.
Standing there, glassy-eyed from a day of looking and listening at the conference, my heart raced some as I noticed INTELLIXIR's response rate, sharp imagery, and apparent usability. Five minutes on, succumbing to the eye candy, I found myself pondering: Maybe someone had finally figured out how to ease our data analysis pains in a way that made some sense--with simplicity. INTELLIXIR was quick to map and dissect cited patents, expose technology patenting trends, and reveal inventor working groups. It ranked, graphed, and charted inventors, assignees, and concepts. It allowed me to group assignee subsidiaries and company names with different spellings. It allowed downloading into Excel the data supporting the illustrations.
Having stepped away from the allure of the first encounter, I decided I wanted to learn more, to try to clarify whether my infatuation could be justified. It took some time, but I finally plucked up the courage and asked Jean-Michel Careil, developer and INTELLIXIR founder, for an interview that would set the record straight. An edited transcript of our conversation follows.
Mark: Jean-Michel, how did you become involved in the field of programming, and more specifically, how long have you been making analytics software?
Jean-Michel: It all began in 1991, when I was in the French atomic agency (CEA; Commissariat à l'Énergie Atomique), where I was producing a dashboard and reporting system for the Environmental Radiological Control department. I was working a lot with Excel (and its father, Multiplan ... I am so old) and Word. This experience showed me how important and interesting it is to make efficient graphical representations to help management make the right decisions.
The project began in CEA with SIMBAD, initiated by Patrick Baldit. Patrick wanted to create a web application that could allow scientists to navigate easily among text and graphical representations to help them to find relevant information. It was in 1996-1997, the beginning of web-database applications. We met in 1998 in Cadarache, a CEA research plant in southeast France, and we decided to develop this project together with other colleagues: Pierre Mahler, Jean-Louis Emeric, and Sylvie Gibert. Patrick has presented this application several times during conferences, and some in the industry wanted to try it.
In 2001, we decided to create a spinoff with this project. Although in 2002 our work as a group ended, I continued to develop INTELLIXIR on my own--but the others are still friends of mine, I promise you.
Mark: Were those nuclear energy conferences where Patrick was demonstrating the software?
Jean-Michel: It was particularly during SCIP [Strategic and Competitive Intelligence Professionals] meetings. Patrick also presented SIMBAD during the 1998 conference of the Veille Strategique Scientifique & Technologique, where delegates come mainly from universities.
Mark: How did you decide to target patent information, and when did INTELLIXIR become a dream that you wanted to pursue?
Jean-Michel: We decided to communicate more on patent analysis in 2008, discovering the Patent Information [Users] Group and attending the Annual Conference. Patents move more money in organizations, so it is easier to find a budget in these departments. But a large number of our users use both patent and nonpatent literature with our tool.
INTELLIXIR became, perhaps not a dream, but a promising project when we signed three big contracts in the same year, 2005. And the most positive thing, for me, is the testimonies we receive each year during the User Day [when] people, coming from large companies, discuss their work. It is very satisfying for me. It means that users find something relevant in INTELLIXIR. In our business, where the production is fully virtual, such testimonies are precious, at least to get up every morning.
Mark: What is User Day?
Jean-Michel: User Days happen each year in the south of France. The first one took place 2 years ago, and there were 21 users. Last year there were 38. We are expecting 60 attendees for the 2011 User Day. During User Day, three or four clients voluntarily present their work with INTELLIXIR, then roundtables allow them to talk together about graphical representations, text mining, sourcing, and methodology. We finish the day by presenting the upcoming release of INTELLIXIR. People really appreciate this day because it is quite technical, pragmatic. We do not want to be commercial during this meeting. It is really the users' day. Next year I want to organize a North America User Day--Chicago would be a good place for that--and a European English User Day, probably in Brussels.
Mark: One of INTELLIXIR's company graphics states, "INTELLIXIR: L'Infometrie Decisionnelle," which I have loosely translated into English as "INTELLIXIR: Decisive Infometrics." What are the particular advantages to INTELLIXIR's metrics compared to other products? And is my translation acceptable?
Jean-Michel: I tried to translate it myself, but I never found something as good as what you propose! We use "INTELLIXIR: Leading Your Way to Discovery." "Infometrics" is perhaps too obscure.
But your translation highlights what we want to do--something efficient, pragmatic, "decisive." We don't want to add information to information but to help detect something that will change a direction or a decision in our client's organization.
To do that we based our work on two external factors:
* Information professionals, who prepare the sets of documents to analyze, avoiding "garbage in, garbage out."
* Experts of the client organization who drive the analysis and decide which information is relevant or not. I do not believe that a tool can decide the relevancy of documents instead of an expert. You can help her or him, showing the best candidates, but at the end, she or he decides.
INFORMATION UNCOVERED BY INTELLIXIR
Mark: What types of information is INTELLIXIR able to uncover that should compel researchers to use INTELLIXIR?
Jean-Michel: Two kinds of information:
* Global aspects, trends: Where the competition is going, which technology is emerging, what collaboration exists between players, who are the experts.
* Document references: Graphical representation and an internal search engine which allow our users to find very relevant documents using a nonlinear method. Instead of reading documents one by one, they can use several angles by which to visualize the content of a lot of documents and detect the most relevant ones.
Mark: In 2008, INTELLIXIR teamed with Questel-Orbit to leverage INTELLIXIR's statistical analysis capabilities with Questel's QPAT product. INTELLIXIR also powers the statistical analysis behind Orbit.com, which began offering a web-based interface to the FamPat database in 2009. I understand INTELLIXIR can process nonpatent literature, but is the software analysis available through QPAT and Orbit.com only for analyzing patent information, and not journal literature?
Jean-Michel: Yes, it is. The Questel statistical module analyzes only Questel data. So, currently, only patents.
Mark: Are enhancements to INTELLIXIR immediately transferable to Questel products?
Jean-Michel: Yes, we try. In fact, enhancements come both from Questel users and INTELLIXIR. But Questel users and INTELLIXIR users are not strictly the same. INTELLIXIR users merge different sources and spend more time on our stand-alone application to make a report--up to several weeks. Most Questel users need to have a result in a couple of minutes. The challenge is different. But of course there are overlaps.
Mark: For the foreseeable future, will INTELLIXIR continue in its relationship with Questel?
Jean-Michel: I hope so! It is a very fruitful collaboration for both companies. I really appreciate being able to work so confidentially with Charles [Besson] and his team.
PARSING THE PROCESS
Mark: When using INTELLIXIR as a stand-alone product, it can accept structured data from a variety of sources, including database aggregators like Dialog and Questel, and individual databases like Derwent, PatBase, and MEDLINE. Users access INTELLIXIR and upload their patent data to INTELLIXIR using a standard web browser interface.
That said, and with the understanding that INTELLIXIR is designed to help discover relationships in data, how many patents in a given collection undergoing analysis can the system work with comfortably?
Jean-Michel: Ideally, between 100 and 5,000. Of course, it depends on what you want to see and on whether the documents are focused on one specific subject. If they want to discover something new, we encourage our users to work on bigger sets of documents in INTELLIXIR than was possible before. Our tool can help them navigate through the noise to find the relevant signals.
Mark: How many patents can the system be pushed to handle?
Jean-Michel: Our biggest database had 60,000 patents without claims. But the main issue with large databases is the assignee grouping: With 60,000 patents, you can have nearly 40,000 different assignee labels to standardize. Even if we have an efficient feature to do this, you need time. And time is currently harder to find than money in a lot of organizations.
But sometimes users want to analyze huge data sets because their needs or their goals are not really defined. Or they are anxious: They are afraid of missing something. A question like "What's happening in nanotechnology?" could result in attempting to analyze 100,000 documents or more. But it is smarter to split sets into subtechnologies, particularly if you want to detect small changes that are indicators of emerging technologies.
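To make the assignee-grouping workload concrete, here is a minimal sketch of standardizing assignee labels. It is not INTELLIXIR's feature: the `normalize` and `group_assignees` functions, the suffix list, and the company names are all hypothetical, chosen only to show why thousands of label variants take time to reconcile.

```python
import re
from collections import Counter

# Hypothetical list of legal suffixes to ignore; a real list would be much longer.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "sa", "gmbh"}

def normalize(label):
    """Lowercase the label, drop punctuation, and strip common legal suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", label.lower())
    words = [w for w in cleaned.split() if w not in LEGAL_SUFFIXES]
    return " ".join(words)

def group_assignees(labels, manual_groups=None):
    """Count documents per assignee group.

    manual_groups maps a normalized label to the group an analyst chose
    for it (for example, a subsidiary folded into its parent company).
    """
    manual_groups = manual_groups or {}
    counts = Counter()
    for label in labels:
        key = normalize(label)
        counts[manual_groups.get(key, key)] += 1
    return counts

# Invented labels: three spellings of one company, one subsidiary, one other firm.
labels = ["Acme Corp.", "ACME Corporation", "Acme, Inc.", "Acme Research Ltd", "Other Co"]
counts = group_assignees(labels, manual_groups={"acme research": "acme"})
print(counts)
```

Automatic normalization merges the spelling variants, but folding the subsidiary into its parent still requires a human decision, and that manual step is the part that grows with the number of labels.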
Mark: Will the system work at the family level, as well as the individual patent level? Are there options for the user to decide whether to analyze data based on individual patents or families?
Jean-Michel: We prefer to work with families (representative members and family information), to avoid the redundancies of content. But from time to time, some of our users analyze individual patents.
Mark: How would the system manage a single U.S. patent from, for example, MicroPatent, when including it into a collection previously formed using Derwent families?
Jean-Michel: Poorly. The INTELLIXIR System needs consistent information. So, if you want to merge patent sources, my advice is to work with a tool like BizInt to choose abstracts, claims, and other fields among different sources, and then export to INTELLIXIR.
Mark: You are referring to BizInt as a tool that allows data from different patent sources to be combined and normalized, so users could then export a set of data from BizInt to INTELLIXIR in .csv format?
Mark: Regarding duplicates, how would the system deal with similar families, for example, from INPADOC and Derwent, which may contain some similar and some different equivalents?
Jean-Michel: Same answer. We deduplicate your data using patent numbers. If the same number appears in different families, we are going to miss information.
Mark: What do you mean by "miss information"?
Jean-Michel: If an INPADOC Family A contains the patent US123 and the patent FR345, and a Derwent Family B patent US123 and patent DE346, we consider the second family as a duplicate because there is a common patent between both families. So we miss the DE346 patent family.
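The behavior described above can be illustrated with a short sketch. This is an assumption-laden illustration of deduplication by shared patent number, not INTELLIXIR's actual code; the function name `dedupe_families` and the data layout are hypothetical, and the patent numbers are the ones used in the example above.

```python
def dedupe_families(families):
    """Keep a family only if it shares no patent number with a family already kept.

    families is a list of (family_name, [patent_numbers]) pairs, in load order.
    Returns the kept family names and the set of patent numbers retained.
    """
    seen_numbers = set()
    kept = []
    for name, numbers in families:
        if seen_numbers & set(numbers):
            # A shared number marks the whole family as a duplicate,
            # so its other members are dropped along with it.
            continue
        kept.append(name)
        seen_numbers.update(numbers)
    return kept, seen_numbers

families = [
    ("INPADOC Family A", ["US123", "FR345"]),
    ("Derwent Family B", ["US123", "DE346"]),
]
kept, retained = dedupe_families(families)
print(kept)
print(sorted(retained))
```

Because Family B shares US123 with Family A, the whole family is skipped and DE346 never reaches the retained set, which is the lost information in question.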
Mark: Most analysis is probably performed on bibliographic information--the information available on the equivalent of the front page of a patent, such as assignees, inventors, classifications, and priority dates.
However, are there times when the system analyzes text? In those cases, does it primarily analyze the text of the title, abstract, and claims? Can a user elect to analyze full text, for example a patent specification?
Jean-Michel: INTELLIXIR analyzes titles, abstracts, and claims, and we want to add the description in a future version. INTELLIXIR is based on bibliometrics. We can analyze full text, but at the expense of performance. It is not our specialty.
Mark: Are there ways in which INTELLIXIR allows non-text or chemical structures to be analyzed?
Jean-Michel: If chemical structures are well-indexed, using codes for example, we will be able to analyze them like any metadata in the document. But this indexing is not easy to find in all documents.
Mark: Can you give readers an idea of how the system parses data--that is, what is happening to a patent as it is processed by INTELLIXIR?
Jean-Michel: When the documents are stored on our system, data are stored in a database and are indexed by an internal search engine. So the system offers two ways to access the information: graphical representations created from statistics, and via an internal search engine.
Mark: When you say two ways to access information, do you mean: first, as graphics which can be exported as an image, and secondly, as the raw data which can be exported in Excel?
Jean-Michel: Users can export these data. But in fact I mean that users can, first, search for a patent in the classic way using our internal search engine, and second, find a patent by clicking on the graphs we provide, because each graph is linked to the documents used to produce it: If you see a link between two organizations, you can click on the link to reach and read the documents shared by both organizations (co-publications or co-patents). And this is the more interesting way.
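One way to picture a graph whose links lead back to documents is the sketch below. It is only an illustration under assumed data: the `build_links` function, the document identifiers, and the organization names are hypothetical and do not describe how INTELLIXIR stores its graphs.

```python
from collections import defaultdict
from itertools import combinations

def build_links(documents):
    """Map each pair of organizations to the documents they share.

    documents maps a document ID to the organizations named on it.
    Each link keeps its supporting document IDs, so selecting a link
    leads straight to the co-publications or co-patents behind it.
    """
    links = defaultdict(list)
    for doc_id, organizations in documents.items():
        # Sort so that (A, B) and (B, A) are stored as the same link.
        for pair in combinations(sorted(set(organizations)), 2):
            links[pair].append(doc_id)
    return dict(links)

# Invented records for illustration.
documents = {
    "DOC1": ["Org A", "Org B"],
    "DOC2": ["Org B", "Org A", "Org C"],
    "DOC3": ["Org C"],
}
links = build_links(documents)
print(links[("Org A", "Org B")])
```

Keeping the document list on each link also supports the kind of verification described later in the interview: any figure on a graph can be checked against the records that produced it.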
FILTERS AND OTHER PARAMETERS
Mark: To process data imported from any particular database, INTELLIXIR uses filters. What are filters?
Jean-Michel: When you want to analyze "structured" information exports from different databases, you need to standardize the format of each document to identify each piece of metadata perfectly. Lots of solutions allow users to manipulate the format themselves, so they can analyze all the documents they want. But the issue is that this process is too user-specific because some parameter choices can change with the person. Therefore, when using such systems, if the person in charge of this work can't do it for any reason, the process is down; similarly, if the person doing this work changes, the results can change. This was one of the main points we wanted to solve since the beginning with INTELLIXIR, to make the process more "industrial," more reliable and consistent. So we develop filters to handle direct exports from databases. No manipulation or change is needed. And if your export contains several formats from different databases (like Dialog or STN), we develop a single filter to handle this file.
Mark: Can you give an example of a parameter that can be arbitrarily chosen in other systems that will create problems with results?
Jean-Michel: If the "Authors" field contains the author names and the names of organizations, one user could extract correctly the organization data, while a colleague could decide that it is too complex an undertaking to bother doing. The process depends on the capacity of users to make decisions about the format of the data that they must analyze, and sometimes it is quite tricky.
Mark: It seems like filters give you the flexibility to allow users to import data from just about any source they require. My impression is that some analytics packages, perhaps most, have limited numbers of sources from which users can choose--does this seem true to you?
Jean-Michel: Usually our users have access from one to six different sources. They make occasional or recurring searches using these sources, and INTELLIXIR is integrated in this process.
Mark: For instance, users could have filters that work with Derwent data from STN, patent data from Minesoft's PatBase, and IFI Claims data that originates from Dialog? Similarly, they could have a filter for processing data that they combine and export from BizInt?
Jean-Michel: Absolutely. Usually our clients have one main patent source, and several article sources. We are trying to handle all the sources of our clients to allow them to analyze their exports directly, without any file manipulation.
Mark: How important is it for an analytics company like INTELLIXIR to demonstrate that the system maintains data integrity? Do you have standard methods to test the system? Are there standardized methods used by analytics developers to test their systems? Do you find that your clients are interested in learning about this issue?
I ask the question because whenever I have worked with analytics systems, I am concerned that such systems could drop data, or manipulate it in such a way as to provide inaccurate results. My assumption is that data suppliers and analysis vendors would have the means to test their systems as a quality assurance measure.
Jean-Michel: The data are stored in a database on the INTELLIXIR server. So the integrity is quite preserved because there isn't any other process that could access and modify these data.
On another level, graphical representations are always linked to the documents on which they are based. So users can verify the data consistency easily by clicking on the graphs.
THE BUSINESS OF ANALYSIS
Mark: Are there nonpatent markets that INTELLIXIR is pursuing in analytics?
Jean-Michel: Half of our users analyze nonpatent literature. But we have stayed in sci-tech information. We have made a trial with marketing information, but I think that the need is different because an answer can be found in one title more easily in that domain than in sci-tech, where you have to verify the concept and cross-reference with other metadata.
We have thought about legal information--analyzing court information for example. But it is not our business, so it is hard to move there.
Mark: Like most other businesses, has the analytics software industry felt the repercussions of the economic crisis that began in 2007-2008?
Jean-Michel: We started in the North America market in 2008. Good timing, wasn't it? Of course, nothing happened for us in North America until the end of 2009.
Mark: What are your biggest challenges for INTELLIXIR now?
Jean-Michel: Since the beginning, we have never lost a client. All of them have renewed their subscriptions. Our challenge is to continue not to disappoint our current users while attracting new ones.
Mark: Are there improvements to INTELLIXIR that you are working on now that you can discuss?
Jean-Michel: We just released version 7 in March 2011, which offers better performance, a more accurate search engine that offers new proximity operators and directional proximity operators, new statistics, and a dynamic report management tool. We have also improved the "ExpertLIXIR" module, a way to allow experts to rate and comment on documents. And I hope that the fall 2011 version (we are trying to produce a new version every spring and fall) will come with something very new. But it is not ready. New features are launched only if they bring real improvement, a real help for our users in their work--not only because it is beautiful or technologically satisfying.
Mark: Lastly, what is INTELLIXIR's pricing model?
Jean-Michel: You can start with us step by step, beginning with a free trial on our demo server. Then an Evaluation Pack allows you to work for 3 months in real conditions with the INTELLIXIR system for 5,000 [euro] [about $7,118]. And if it is relevant, you will be able to subscribe annually for between 20,000 [euro] [about $28,471] and 30,000 [euro] [about $42,707], depending on the number of filters you want. There is no limit on the number of users and no limit on the number of uses.
Product name: INTELLIXIR
Tagline: Leading Your Way to Discovery
Latest release: Version 7 (March 2011)
Problem it solves: Mapping structured patent and nonpatent information to visually detect trends, collaborations, players, and relevant prior art.
Data formats: All structured information in tagged, tabular, or XML formats
Launch date: 2002
Company site: www.intellixir.com
Ownership: Fabienne Careil and Jean-Michel Careil
Author's disclosure: I had the opportunity to test INTELLIXIR 2 years ago and was reintroduced to INTELLIXIR on a trial basis through Orbit.com. Neither my employer nor I currently have a business relationship with INTELLIXIR or Jean-Michel Careil.
Mark Hutcherson (email@example.com) is a research analyst at Threshold Information, Inc. From 2004 to 2010 he was editor of the Patent Information Users Group PIUG Newsletter, and he continues to author the newsletter column PatentAddict.…
Publication information: Article title: Intellixir: A Patent and Literature Analysis Product: Q&A with Developer Jean-Michel Careil. Contributors: Hutcherson, Mark - Author. Magazine title: Online. Volume: 35. Issue: 5 Publication date: September-October 2011. Page number: 20+. © 2009 Information Today, Inc. COPYRIGHT 2011 Gale Group.
This material is protected by copyright and, with the exception of fair use, may not be further copied, distributed or transmitted in any form or by any means.