Using patent classes to search for concepts

Patent classes allow patent examiners and others to code patents according to their content. For example, patent classes make it clear whether a patent relates to agriculture, data processing, medical applications, chemistry, etc. (for more details, see Cooperative Patent Classification). This makes such a classification system very useful for grouping documents by concept.

Unfortunately, patent classes are widely available only for patent documents: every patent is assigned one or more patent classes, depending on its contents. No comparable classification is widely available for other documents, such as scientific papers, financial or investment news, blog posts, and other web content.

Because patent classes offer a useful, wide-coverage, and internationally standardized way of classifying content by concept, and because no such classification is available for anything but patents, we decided to build an automatic classifier. This classifier uses advanced self-learning technologies such as recurrent neural networks. It assigns patent classes to all incoming documents that are not patents (e.g. scientific papers, news, blog posts, financial events). To our knowledge, this is the first time such a classifier has been made available in a commercial standard software product.
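The post does not describe how the classifier turns its model outputs into class labels. A common approach in multi-label classification, sketched here purely as an assumption, is to score every class independently, keep all classes above a threshold, and fall back to the single best class so that each document gets at least one label, just as every patent carries one or more classes. The function name, the 0.5 threshold, and the scores below are hypothetical; the model producing the scores (an RNN or anything else) is not shown.

```python
from typing import Dict, List


def assign_classes(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Turn independent per-class scores into one or more class labels (hypothetical helper)."""
    if not scores:
        return []
    # Multi-label: keep every class whose score reaches the threshold.
    selected = [cls for cls, p in scores.items() if p >= threshold]
    if not selected:
        # Fall back to the single best class so every document gets at least one label.
        selected = [max(scores, key=scores.get)]
    # Most confident class first.
    return sorted(selected, key=lambda cls: scores[cls], reverse=True)


# Made-up scores for a news article about sensor-equipped waste containers.
labels = assign_classes({"B65F": 0.81, "G06Q": 0.62, "A01B": 0.03})
# labels == ["B65F", "G06Q"]
```

Thresholding keeps the assignment open-ended: a document can land in as many classes as its content supports, rather than being forced into exactly one.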

To start, we ran a subset of our database through the new classifier. The results are available via mergeflow.net. Over the coming weeks and months, we will classify more data and add user interface functionality to our platform that makes more direct use of patent classes.

Here are some examples of what you can do with our new classifier, combined with Mergeflow’s other analytics capabilities:

Organize companies and other entities by concept

Let’s say we are interested in “smart city” and would like to know which subtopics it includes and which SMEs are active in them. We start by searching for “smart city” and zoom in on companies data (Mergeflow has a data repository of ca. 250,000 crawled SME websites). To get a first overview, we use a patent class tag cloud:

This tag cloud, generated by our new classifier, shows the topics within “smart city” that various SMEs address. The bigger the font, the more companies. Perhaps unsurprisingly, many companies appear to be active in data processing and transmission. But let’s say we now want to see the companies in two subtopics: (1) waste collection and removal (highlighted in red in the tag cloud) and (2) traffic control (highlighted in green). Clicking on the waste collection and removal tag shows us the companies in this space:

  • Ecube Labs provides data-driven waste management solutions.
  • binee collects and recycles used electronic devices.
  • Experfy provides, among other IoT solutions, detection of trash levels in containers to optimize trash collection routes.
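The smart city tag cloud scales font size with the number of companies per patent class. The post does not say how sizes are computed, so the following is a minimal sketch assuming linear scaling between a minimum and a maximum point size. The function name, the 10 to 36 pt range, and the company data are hypothetical; the CPC symbols are real (G06F digital data processing, H04L transmission of digital information, B65F refuse collection and removal, G08G traffic control, H04W wireless networks) but their assignment to companies here is invented.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_cloud_sizes(company_classes: Dict[str, Iterable[str]],
                    min_pt: int = 10, max_pt: int = 36) -> Dict[str, int]:
    """Map each patent class to a font size that grows with its company count."""
    counts: Counter = Counter()
    for classes in company_classes.values():
        counts.update(set(classes))  # each company counts at most once per class
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # all counts equal: avoid dividing by zero
    return {cls: round(min_pt + (n - lo) * (max_pt - min_pt) / span)
            for cls, n in counts.items()}


# Hypothetical companies and class tags, for illustration only.
companies = {
    "company_a": ["G06F", "H04L"],
    "company_b": ["G06F", "B65F"],
    "company_c": ["G06F", "G08G", "H04W"],
    "company_d": ["H04L", "G08G"],
}
sizes = tag_cloud_sizes(companies)
# sizes["G06F"] == 36 (3 companies), sizes["H04L"] == 23 (2), sizes["B65F"] == 10 (1)
```

Counting distinct companies rather than documents matches the post’s description (“the more companies”); a company with many web pages in one class still contributes only once to that class.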

Next, to explore the traffic control companies, we use a new interactive visualization in Mergeflow: the Sankey chart. Here, the chart maps companies (left side) against patent classes (right side), which helps us organize the result set further. For example, it lets us zoom in on traffic control companies that also work on wireless networks (Future Intelligence, Libelium, and Rajant).
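A Sankey chart like the one described for traffic control companies needs weighted links from companies to patent classes. Below is a minimal sketch of how such links could be derived from classified documents, assuming (hypothetically) that a link’s weight is the number of a company’s documents tagged with that class; the post does not state how Mergeflow weights its links. The second helper reproduces the “traffic control and wireless networks” zoom by keeping companies linked to both G08G and H04W. All names and data are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str, int]  # (company, patent class, weight)


def sankey_links(docs: Iterable[Tuple[str, Iterable[str]]]) -> List[Link]:
    """Aggregate per-document tags into weighted company -> class links.

    Each input item is (company, classes assigned to one of its documents);
    a link's weight counts how many of the company's documents carry that class.
    """
    weights: Counter = Counter()
    for company, classes in docs:
        for cls in set(classes):  # count a class at most once per document
            weights[(company, cls)] += 1
    return sorted((company, cls, n) for (company, cls), n in weights.items())


def companies_linked_to_all(links: Iterable[Link], *classes: str) -> List[str]:
    """Companies that have a link to every one of the given classes."""
    linked: Dict[str, Set[str]] = {}
    for company, cls, _ in links:
        linked.setdefault(company, set()).add(cls)
    need = set(classes)
    return sorted(company for company, got in linked.items() if need <= got)


# Hypothetical classified documents: (company, classes of one document).
docs = [
    ("company_a", ["G08G", "H04W"]),
    ("company_a", ["G08G"]),
    ("company_b", ["G08G", "G06F"]),
    ("company_c", ["H04W"]),
]
links = sankey_links(docs)
# links[0] == ("company_a", "G08G", 2)
traffic_and_wireless = companies_linked_to_all(links, "G08G", "H04W")
# traffic_and_wireless == ["company_a"]
```

The (source, target, value) triples are the generic input shape for Sankey diagrams; a charting library would map them onto its own node and link format.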

Search for concepts rather than just for keywords

For concept search, we will use a different example. Let’s assume we are interested in VC-funded machine learning companies in the medical field. To find them, we can first search for machine learning companies and use Mergeflow’s VC funding event extractor to get all VC-funded machine learning companies. Among these, we then need to identify the companies in the medical field.

To find the medical companies, rather than rattling off every medical term we can think of and building a huge monster query, we can simply use an appropriate patent class. This patent class stands for an entire concept model that our classifier has learned from data. In this case, we use A61* (medical or veterinary science, including all its subclasses). Here is a screenshot of some results, showing the date, company name, funding amount, and a short description for each company.
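The A61* query works as a wildcard over class symbols: any class starting with A61 (A61B, A61K, and so on) counts as medical. A minimal sketch of that filtering step, using Python’s standard `fnmatch.fnmatchcase` for shell-style wildcards, might look like this. The `FundingEvent` record, its fields, and all values are hypothetical stand-ins for the columns in the screenshot, not Mergeflow’s data model.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple


class FundingEvent(NamedTuple):
    company: str
    amount_musd: float   # funding amount in millions of USD
    description: str
    classes: List[str]   # CPC symbols assigned by a classifier


def filter_by_class(events: List[FundingEvent], pattern: str) -> List[FundingEvent]:
    """Keep events where at least one class matches a wildcard pattern such as "A61*"."""
    # fnmatchcase: shell-style wildcards, case-sensitive on every platform.
    return [e for e in events if any(fnmatchcase(c, pattern) for c in e.classes)]


# Hypothetical VC-funded machine learning companies (made-up values).
events = [
    FundingEvent("company_a", 12.0, "ML for radiology image analysis", ["G06N", "A61B"]),
    FundingEvent("company_b", 5.5, "ML for retail demand forecasting", ["G06Q"]),
    FundingEvent("company_c", 30.0, "ML-driven drug discovery", ["A61K", "G16H"]),
]
medical = filter_by_class(events, "A61*")
# [e.company for e in medical] == ["company_a", "company_c"]
```

Note that neither matching description contains the word “medical”: the class, not a keyword list, carries the concept. A plain `startswith("A61")` would behave the same here; `fnmatchcase` simply mirrors the wildcard syntax used in the query.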

Cross-correlate patents and other contents

Patents already come with patent classes. Now that non-patent documents have patent classes as well, we can cross-reference patents and non-patent documents much more easily. For example, let’s say we want to explore how CRISPR, a relatively new gene-editing method, is being used in making or treating food. We can search patents for CRISPR and the patent class A23 (foods or foodstuffs). Using the class A23 saves us the hassle of writing a big query that includes all kinds of food-related search terms. For patents, the patent examiners have already done the work of labeling documents with A23; for non-patents, our classifier now does this.

Searching patents for CRISPR and A23 returns some companies, mostly large ones such as Danisco, DSM, or DuPont Nutrition Biosciences. Now, one question could be: how do we find companies that use CRISPR but do not say so in their patents?

This is where our Companies data repository comes in. It contains the websites of ca. 250,000 companies (mostly SMEs) worldwide, which are now also tagged with patent classes by our new classifier. This lets us search these companies for CRISPR and A23 (food) as well, and we find additional companies, e.g. Applied Stemcell, Clara Foods, Eligo Bioscience, and Xcode Life. We can then go back to patents and search more generally for patents held by these companies. This lets us identify relevant inventors at these companies, for example. You can see the result in the screenshot below.

The examples above are only a start. Next, we will run more data through our classifier, keep improving the classifier itself, and add more UI functionality that lets you use patent classes more directly.
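The CRISPR example combines four steps: run the same keyword-plus-class query over patents and over company websites, keep the companies that only the website search found, then look up all patents held by those companies to collect their inventors. The sketch below illustrates that set logic under simplifying assumptions: plain substring matching stands in for real search, class matching is a prefix test on A23, and the `Patent` and `CompanySite` records, the helper names, and all data are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Patent(NamedTuple):
    assignee: str
    text: str
    classes: Set[str]      # examiner-assigned CPC symbols
    inventors: List[str]


class CompanySite(NamedTuple):
    company: str
    text: str
    classes: Set[str]      # assigned by a classifier in this sketch


def matches(text: str, classes: Set[str], keyword: str, class_prefix: str) -> bool:
    """Keyword must occur in the text and at least one class must fall under the prefix."""
    return keyword.lower() in text.lower() and any(c.startswith(class_prefix) for c in classes)


def additional_companies_with_inventors(patents: List[Patent], sites: List[CompanySite],
                                        keyword: str, class_prefix: str) -> Dict[str, List[str]]:
    # Step 1: companies found via patents (keyword + class).
    via_patents = {p.assignee for p in patents if matches(p.text, p.classes, keyword, class_prefix)}
    # Step 2: companies found via websites with the same query.
    via_sites = {s.company for s in sites if matches(s.text, s.classes, keyword, class_prefix)}
    # Step 3: keep only the companies the patent search missed.
    additional = via_sites - via_patents
    # Step 4: all patents held by those companies (no keyword or class filter) -> inventors.
    return {company: sorted({inv for p in patents if p.assignee == company for inv in p.inventors})
            for company in sorted(additional)}


# Hypothetical data, for illustration only.
patents = [
    Patent("big_corp", "CRISPR-modified starter culture", {"A23C"}, ["Inventor 1"]),
    Patent("startup_x", "Method for producing egg-white protein", {"A23J"}, ["Inventor 2", "Inventor 3"]),
    Patent("startup_y", "Bacteriophage delivery vector", {"C12N"}, ["Inventor 4"]),
]
sites = [
    CompanySite("big_corp", "CRISPR for dairy cultures", {"A23C"}),
    CompanySite("startup_x", "We apply CRISPR to food proteins", {"A23J"}),
    CompanySite("startup_z", "CRISPR-based diagnostics", {"G01N"}),
]
result = additional_companies_with_inventors(patents, sites, "CRISPR", "A23")
# result == {"startup_x": ["Inventor 2", "Inventor 3"]}
```

Here startup_x mentions CRISPR only on its website, so the patent query misses it; the unfiltered patent lookup in step 4 then still surfaces its inventors.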