Topic Triage: A new tool for exploring topic emergence with Mergeflow

Earlier this year, we launched the Topic Matrix, which lets you generate size-growth matrices to trace technologies and other topics over time.

Over the last several months, we used the Topic Matrix in several innovation strategy projects with our customers. In these projects, we found that analyzing and visualizing topic dynamics over time and across various signals (patents, R&D, investments, news, and blogs) has great potential for revealing non-trivial innovation strategy insights. But we also found that the design of the matrix itself sometimes made the results hard to interpret, because reading it often required too many levels of abstraction.

So we put our heads together to come up with a tool that maintains the analytical power of the matrix but is easier to interpret. In conversations with our customers, we agreed that we would like to replace the matrix with something that meets the following criteria:

  1. Cross-sectional: consider patents, R&D, VC investments, news, and blogs. As we saw in the matrix, such a cross-sectional approach lets us distinguish topics that are active in just one area (R&D for instance) from topics that are active across a range of signals. It also lets us see the sequence of activities (e.g. R&D first, then VC investments, or vice versa).
  2. Flexible: the tool has to work for any query, ranging from technologies to applications to companies and products.
  3. Easier to interpret: the new tool should be easier to read than the matrix, i.e. it should require fewer abstractions by the user in order to interpret the results.

We have just released the result: the Topic Triage tool, which replaces the matrix. The data that go into the Topic Triage tool are the same as those that went into the matrix, but the Topic Triage visualization is more straightforward because it requires less abstraction. Below I describe how it works.

The visualization

If we only wanted to visualize the relative share or size of topics at one point in time, we could use a pie chart, like this:

If we want to see how things change over time, we could use a series of pie charts, like this:

2015:

2016:

2017:

While this may work with just two topics, it quickly gets hard to read with more topics. So we decided to use a stacked area chart instead (the chart below shows the same data as the pie charts above):
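To make the idea behind the stacked area chart concrete, here is a minimal sketch of how per-year shares turn into stacked bands. The share values are hypothetical and only serve as an illustration (they are not the data behind the charts in this post), and the function name `stacked_bands` is made up for this example.

```python
from itertools import accumulate

# Hypothetical shares (in percent) for two topics; not data from this post.
shares_by_year = {
    2015: {"Topic A": 60.0, "Topic B": 40.0},
    2016: {"Topic A": 50.0, "Topic B": 50.0},
    2017: {"Topic A": 30.0, "Topic B": 70.0},
}


def stacked_bands(shares_by_year, topics):
    """Return, per topic, the (lower, upper) edge of its band for each year."""
    bands = {topic: [] for topic in topics}
    for year in sorted(shares_by_year):
        values = [shares_by_year[year].get(topic, 0.0) for topic in topics]
        uppers = list(accumulate(values))  # running total = top edge of each band
        lowers = [0.0] + uppers[:-1]       # each band starts where the previous one ends
        for topic, lower, upper in zip(topics, lowers, uppers):
            bands[topic].append((lower, upper))
    return bands


bands = stacked_bands(shares_by_year, ["Topic A", "Topic B"])
for topic, edges in bands.items():
    print(topic, edges)
```

Each pie chart becomes one vertical slice of the area chart: the height of a topic's band in a given year is its share in that year, and the bands together always add up to 100%.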

The data

Shares

As mentioned above, the data that go into the Topic Triage tool are the same as those that went into the matrix.

For scientific publications, technology blogs, industry news, and patents, we use the number of hits of each topic. If, for example, Topic A has 1,000 hits in patents and Topic B has 3,000 hits, then the share of Topic A is 25% (1,000 / (1,000+3,000)), and the share of Topic B is 75% (3,000 / (1,000+3,000)).

For investments, we consider the investment sums for each topic. For example, if Topic A is associated with $30 million in VC investments and Topic B is associated with $70 million, then Topic A has a share of 30% ($30 million / ($30 million + $70 million)), and Topic B has a share of 70% ($70 million / ($30 million + $70 million)).
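The share calculation is the same for hit counts and for investment sums, so it can be sketched with a single helper. This is an illustrative sketch, not Mergeflow's actual code: the function name `shares` is made up, and returning zero shares when no topic has any activity is an assumption on my part.

```python
def shares(values):
    """Map each topic to its fraction of the total across all topics."""
    total = sum(values.values())
    if total == 0:
        # Assumption: if no topic has any activity in this signal, report zero shares.
        return {topic: 0.0 for topic in values}
    return {topic: value / total for topic, value in values.items()}


patent_hits = {"Topic A": 1_000, "Topic B": 3_000}  # number of hits
vc_sums = {"Topic A": 30, "Topic B": 70}            # investment sums in $ million

print(shares(patent_hits))  # {'Topic A': 0.25, 'Topic B': 0.75}
print(shares(vc_sums))      # {'Topic A': 0.3, 'Topic B': 0.7}
```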

In order to calculate an overall share of a topic across all signals, we simply use the arithmetic mean. For example, if a topic has shares of…

  • 20% scientific publications
  • 10% technology blogs
  • 30% VC investments
  • 5% patents
  • 30% industry news

…it has an overall share of (20% + 10% + 30% + 5% + 30%) / 5 = 19%.
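In code, the averaging step for this example could look as follows. This is a minimal sketch using the numbers from the example above; the variable names are made up for illustration.

```python
from statistics import mean

# Shares (in percent) of one topic in each signal, taken from the example above.
signal_shares = {
    "scientific publications": 20.0,
    "technology blogs": 10.0,
    "VC investments": 30.0,
    "patents": 5.0,
    "industry news": 30.0,
}

# Overall share = unweighted arithmetic mean over all signals.
overall_share = mean(list(signal_shares.values()))
print(overall_share)  # 19.0
```

Because the mean is unweighted, every signal counts equally, no matter how many documents or how much money stands behind it.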

Growth rates

We calculate growth as a compound quarterly growth rate (CQGR). The formula is the same as for CAGR, except that the period is a quarter instead of a year. When looking at a time series of share values, we start calculating CQGR at the first value that is not zero. This helps us avoid the nasty “from zero to more than zero” problem. What do I mean by this? Well, formally, the growth rate in such cases is infinite. But we think that this does not capture what we want here. For instance, growth in investments from 0 to $10 million would be infinite, just as growth from 0 to $100 million would be, even though the second jump is ten times larger. This does not seem right to us. So, until we have a plausible solution, we skip leading zeros when calculating growth rates.
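Here is a minimal sketch of this calculation, assuming the input is a list of quarterly values in chronological order. The function name `cqgr` and the choice to return `None` when fewer than two values remain after skipping leading zeros are assumptions made for this illustration, not a description of Mergeflow's implementation.

```python
def cqgr(series):
    """Compound quarterly growth rate of a chronological list of quarterly values.

    Leading zeros are skipped, so the calculation starts at the first non-zero
    value. Returns None if fewer than two values remain (an assumption of this
    sketch).
    """
    start = next((i for i, value in enumerate(series) if value != 0), None)
    if start is None or start == len(series) - 1:
        return None
    first, last = series[start], series[-1]
    periods = len(series) - 1 - start  # number of quarter-to-quarter steps
    # Same formula as CAGR, with quarters as the period.
    return (last / first) ** (1 / periods) - 1


# Hypothetical share values: two empty quarters, then growth of 20% per quarter.
rate = cqgr([0, 0, 10, 12, 14.4])
print(f"{rate:.1%}")  # 20.0%
```

Without the skipping step, the first value would be 0 and the division `last / first` would fail, which is the “from zero to more than zero” problem in practice.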

Let’s see an example!

Yes. A while ago, we compared the wearables makers Fitbit and Jawbone. Let’s pick up this example again. In our new Topic Triage tool, the overall chart for these two searches looks like this:

In this screenshot, we can see that from 2014 to 2017, Fitbit grew, and Jawbone got smaller.

Let’s look at the individual signals for each topic now:

In the Topic Triage tool, you can select which signals to include. By default, all signals are included, like in the screenshot above. But you may want to exclude one or several signals, depending on your use case.

Here are the detailed charts for Fitbit and Jawbone:

From this you can see that Fitbit grew mostly in Scientific Publications. By contrast, Jawbone shows negative growth in Scientific Publications and in Industry News, whereas it grew slightly in VC investments:

The charts are interactive. This lets you explore the underlying data. For example, let’s say you want to zoom in on Fitbit’s growth in Scientific Publications:

Clicking on the Scientific Publications part of the chart opens a new tab with the underlying data. In this case you can see the most relevant experts in a tag cloud, followed by the most recent documents:

What’s next?

At the moment we are working on several case studies that use the Topic Triage tool. We will publish the case studies at https://thescope.tech/, our new technology and innovation news portal.

We will also give more thought to the “from zero to more than zero” growth problem mentioned above. If you have any suggestions, we would be glad to discuss them with you!

 


Also published on Medium.