Some web sites contain interesting information or updates but do not offer feeds or APIs. Here we describe how to create a feed for such a site, add the feed to mergeflow as a source, and analyze its content. We already posted a very short article on this topic a while ago (http://blog.mergeflow.com/2014/10/kimonolabs-how-to-get-updates-from-websites-that-have-no-web-feeds-or-api/). This time we go into more detail.
As an example web site with interesting updates but no feeds or APIs, we selected http://www.baypat.de/en/. This is the web site of “Bayerische Patentallianz”, a Bavarian government organization that hosts interesting technology offers that come out of Bavarian research institutions (e.g. universities). They do other things too at BayPat, but here we are interested in their technology offers (cf. http://www.baypat.de/en/technologyoffers):
We use an easy-to-use tool from our friends at kimonolabs (https://www.kimonolabs.com/) to create a web feed for this page:
The web feed from kimono will then allow us to add BayPat technology offers (existing ones and new updates) as a source to mergeflow.
Creating the web feed with kimono
After installing the kimono extension in your Chrome browser (we recommend you do this as it makes life easier downstream), open the web page of interest, in our case www.baypat.de/en/technologyoffers. Then click on the kimono icon at the top right of your browser.
When the kimono extension has started, choose “title” as the first data type and select the headlines of the list items in the text below.
Then add a second data type, “description”, and select the corresponding description passages:
Now, in order to get all existing technology offers from BayPat and not just those on the first page, use kimono’s pagination function and browse through all existing pages:
Then finish your API (which will deliver the web feed) by clicking on “done”. Choose all settings as shown below:
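To make the result of these steps more concrete, here is a minimal sketch of what conceptually happens to the extracted data: the “title” and “description” records from all paginated pages are merged and turned into a web feed. This is not kimono’s implementation. The record layout, the `link` field, the example.com URLs, and the function names `merge_pages` and `build_rss` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two result pages of extracted list items.
# The field names ("title", "description", "link") are assumptions.
PAGES = [
    [
        {"title": "Example offer A", "description": "Summary of offer A.",
         "link": "http://www.example.com/offers/a"},
        {"title": "Example offer B", "description": "Summary of offer B.",
         "link": "http://www.example.com/offers/b"},
    ],
    [
        {"title": "Example offer C", "description": "Summary of offer C.",
         "link": "http://www.example.com/offers/c"},
        # The same offer may show up on more than one page.
        {"title": "Example offer A", "description": "Summary of offer A.",
         "link": "http://www.example.com/offers/a"},
    ],
]


def merge_pages(pages):
    """Flatten paginated results into one list, dropping duplicate links."""
    seen = set()
    merged = []
    for page in pages:
        for record in page:
            if record["link"] in seen:
                continue
            seen.add(record["link"])
            merged.append(record)
    return merged


def build_rss(records, feed_title, feed_link):
    """Serialize the records as a simple RSS 2.0 document (returned as str)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Feed built from scraped list items"
    for record in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "description").text = record["description"]
        ET.SubElement(item, "link").text = record["link"]
    return ET.tostring(rss, encoding="unicode")


records = merge_pages(PAGES)
rss_xml = build_rss(records, "Technology offers (example)",
                    "http://www.example.com/offers")
print(rss_xml)
```

Dropping duplicates by link is one possible design choice; it keeps an offer from appearing twice in the feed if it is listed on more than one page.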
Adding the kimono web feed to mergeflow
Now, add this new web feed to the mergeflow custom repository of your choice (for information on custom repositories, please see http://blog.mergeflow.com/2015/01/custom-repository/):
Analyzing the content
Now you can start analyzing the retrieved documents, using mergeflow’s analytics. Once you add a feed to mergeflow as a source, mergeflow automatically identifies organizations, technologies, locations, and other objects in the contents delivered by the feed. You can, for instance, use a relationship graph to explore how these objects relate to each other (for more information on how to use mergeflow’s relationship graphs, please see http://blog.mergeflow.com/2015/01/relationship-graphs/):
The relationship graph suggests that many of BayPat’s technology offers are related to health care (as evidenced by the high number of “Disease” nodes in the graph). For instance, one technology offer from the field of ophthalmology describes a new coating for after-cataract intraocular lenses (cf. http://www.baypat.de/en/technologyoffers?tech_ang=1547).
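For readers who want a feel for the idea behind such a relationship graph, the following sketch builds a small co-occurrence graph: two objects are connected whenever they are mentioned in the same document. This is only an illustration of the general principle, not a description of how mergeflow computes its graphs. The documents, the entity names, and the function names `cooccurrence_edges` and `nodes_by_type` are made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: for each document, the (type, name) objects found in it.
DOCUMENTS = [
    [("Disease", "cataract"), ("Technology", "lens coating"),
     ("Organization", "University A")],
    [("Disease", "cataract"), ("Disease", "glaucoma"),
     ("Organization", "University A")],
    [("Disease", "diabetes"), ("Technology", "biosensor")],
]


def cooccurrence_edges(documents):
    """Count how often each pair of objects appears in the same document."""
    edges = Counter()
    for entities in documents:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def nodes_by_type(documents):
    """Count the distinct nodes of each object type across all documents."""
    distinct = set()
    for entities in documents:
        distinct.update(entities)
    return Counter(entity_type for entity_type, _ in distinct)


edges = cooccurrence_edges(DOCUMENTS)
types = nodes_by_type(DOCUMENTS)

# Strongest relationships first, then the number of nodes per type.
for (a, b), weight in edges.most_common(3):
    print(a, "<->", b, "weight", weight)
print(dict(types))
```

In this toy data, “Disease” is the most frequent node type, which mirrors the kind of observation made above about the BayPat technology offers.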