Scraping the International Finance Corporation’s database with R

20 minutes read
Code with pipes representing a web scraping workflow in R

This scraper was used for the paper “The Political Economy of International Finance Corporation Lending” by Dreher and Richert.

The International Finance Corporation maintains a database of
the projects it has helped to finance. The following data should be extracted for each project:

  • Project number
  • URL
  • Region
  • The dates at which the project was signed and approved
  • Cost, risk management, guarantee, loan
  • Information about the project’s sponsor
  • The main location of the involved enterprise

The main challenge in scraping the IFC’s database is that project data was entered
inconsistently and that page layouts differ. For example, the latest projects, such as RCBC Bond, have tabs at the end of the page that
contain some of the important information, such as cost and contact data. For that
project, the cost is given only as running text, which
is returned as is because it would be extremely hard to interpret the text’s
meaning automatically. The same applies to the “project sponsor and major
shareholders” section. Additionally, some of the later projects have
a table in the cost tab that contains risk management, guarantee, loan, and
equity values; see e.g. Ineco SME 2015.
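
The fallback between the two cost layouts could be sketched roughly as follows. This is a minimal sketch, not the scraper from the post: it assumes the rvest package, and the `#cost` selector, the function name `extract_cost`, and the toy HTML snippets are hypothetical stand-ins for the real IFC page structure.

```r
library(rvest)

# Return the cost information of a project page.
# If the (hypothetical) cost tab holds a table, parse it into a data frame;
# otherwise hand back the running text unchanged.
extract_cost <- function(page) {
  tab <- html_element(page, "#cost")  # "#cost" is an assumed selector
  if (inherits(tab, "xml_missing")) {
    return(list(type = "missing", value = NA))
  }
  tbl <- html_element(tab, "table")
  if (!inherits(tbl, "xml_missing")) {
    return(list(type = "table", value = html_table(tbl)))
  }
  list(type = "text", value = html_text2(tab))
}

# Toy pages with made-up placeholder content, one per layout
page_table <- read_html(
  '<div id="cost"><table>
     <tr><th>Product</th><th>Amount</th></tr>
     <tr><td>Loan</td><td>1</td></tr>
   </table></div>'
)
page_text <- read_html(
  '<div id="cost">Placeholder sentence describing the project cost.</div>'
)

res_table <- extract_cost(page_table)
res_text  <- extract_cost(page_text)
```

Returning the layout type alongside the value makes it easy to separate the projects that need manual review (running text) from those that were parsed into a table.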

Extracting the main location of the enterprise is a bit tricky. This piece of
information is taken from the first entry in the “contact” tab, so the scraper
has to be able to recognize country names. The countrycode
package can be used for this purpose: it contains a data frame of the world’s country names, so
the scraper can search for occurrences of these names.
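
One way to search a contact entry for country names is sketched below. It is an illustration under assumptions rather than the post's actual code: the helper name `find_country` is hypothetical, and the default list of names is assumed to come from the `country.name.en` column of `countrycode::codelist`. A custom vector of names can be passed instead, in which case the package is not needed.

```r
# Return the first country name that occurs in a block of contact text,
# or NA if none is found. Matching is case-insensitive and literal.
find_country <- function(text,
                         country_names = countrycode::codelist$country.name.en) {
  country_names <- country_names[!is.na(country_names)]
  haystack <- tolower(text)

  # Position of each name in the text; -1 means "not found"
  pos <- vapply(
    country_names,
    function(nm) as.integer(regexpr(tolower(nm), haystack, fixed = TRUE)),
    integer(1)
  )

  hits <- pos > 0
  if (!any(hits)) {
    return(NA_character_)
  }

  # Prefer the earliest match; on ties prefer the longer name,
  # so that "Nigeria" wins over "Niger"
  candidates <- country_names[hits]
  best <- order(pos[hits], -nchar(candidates))[1]
  candidates[best]
}

# Usage with a small hand-written list of names
find_country("Head office, Lagos, Nigeria", c("Niger", "Nigeria", "India"))
```

Plain substring search can still produce false positives, for example when a country name appears inside a company or street name, so the results are worth spot-checking.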

So, in order to get started, we need the URLs of all projects in the database.

EBRD: A summary of spatial distribution of projects and project size

9 minutes read
A map depicting project sizes per country

This post summarizes the energy-related projects that were financed by the
European Bank for Reconstruction and Development. The necessary data was
downloaded from the publicly available
project database using a
web scraper written in R that is also available on GitHub. You do not
need to run that code to reproduce this document:
the resulting data is available as projects_costscurr.csv.
The code that produces the graphs can be found in the
Rmd file of this project.
