This scraper was used for the paper “The Political Economy of International Finance Corporation Lending” by Dreher and Richert.
The International Finance Corporation (IFC) offers a database of the projects
it has helped to finance. For every project, the following data should be extracted:
- Project number
- The dates on which the project was signed and approved
- Cost, risk management, guarantee, loan
- Information about the project’s sponsor
- The main location of the involved enterprise
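One way to hold these fields is a small record type per project. The sketch below is a Python illustration under our own assumptions: the class name `IFCProject`, its field names and the project number `"12345"` are hypothetical, not taken from the original scraper. Amounts are kept as strings because some pages give cost information only as running text, and an `equity` field is included because some cost tables also list equity values.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class IFCProject:
    """One scraped project record (hypothetical field names)."""

    project_number: str
    date_signed: Optional[str] = None    # raw date string as shown on the page
    date_approved: Optional[str] = None
    # Amounts are kept as text: some pages only give them as running text.
    cost: Optional[str] = None
    risk_management: Optional[str] = None
    guarantee: Optional[str] = None
    loan: Optional[str] = None
    equity: Optional[str] = None         # present in some cost-tab tables
    sponsor: Optional[str] = None        # sponsor/shareholder section, kept as is
    location: Optional[str] = None       # country found in the first contact entry


# Hypothetical example record; fields not found on a page stay None.
record = IFCProject(project_number="12345", loan="US$ 10.0 million")
row = asdict(record)  # plain dict, e.g. for writing one CSV row per project
```

Keeping every field optional except the project number lets the scraper fill in whatever a given page layout provides and leave the rest empty.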
The main challenge in scraping the IFC’s database is the inconsistent way in
which project data was entered, together with differing page layouts. For
example, the latest projects, such as RCBC Bond, have tabs at the end of the page
that contain some of the important information, such as cost and contact data.
For RCBC Bond, the project cost is given only as running text, which is returned
as is because it would be extremely hard to interpret its meaning automatically.
The same applies to the “project sponsor and major shareholders” section.
Additionally, some of the later projects have a table in the cost tab that lists
risk management, guarantee, loan and equity values (see, e.g., Ineco SME 2015).
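The two cost layouts (a table versus running text) can be handled with a simple fallback. This is a minimal sketch using only Python’s standard `html.parser`, assuming the cost tab’s HTML has already been downloaded and that a cost table has one row per item, with the label in the first cell and the amount in the second. That row layout, the label spellings in `COST_LABELS`, and the names `CostTabParser` and `parse_cost_tab` are assumptions for illustration, not the actual IFC markup.

```python
from html.parser import HTMLParser

# Assumed row labels in the cost-tab table; adjust to the spellings on the real pages.
COST_LABELS = ("risk management", "guarantee", "loan", "equity")


class CostTabParser(HTMLParser):
    """Collects table rows as lists of cell texts, plus all text for the fallback."""

    def __init__(self):
        super().__init__()
        self.rows = []        # list of rows, each a list of cell strings
        self.text_parts = []  # every text chunk in the fragment
        self._row = None      # row currently being built
        self._cell = None     # text chunks of the cell currently being built

    def _close_cell(self):
        # Finish the open cell (if any), collapsing internal whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)
        self.text_parts.append(data)


def parse_cost_tab(html: str) -> dict:
    """Return labelled amounts if a cost table is found, else the running text."""
    parser = CostTabParser()
    parser.feed(html)
    parser.close()
    values = {}
    for row in parser.rows:
        if len(row) >= 2:
            label = row[0].lower()
            for wanted in COST_LABELS:
                if label.startswith(wanted):
                    values[wanted] = row[1]
                    break
    if values:
        return values
    # No usable table: return the text unchanged apart from collapsed whitespace.
    return {"cost_text": " ".join(" ".join(parser.text_parts).split())}
```

Returning the running text rather than trying to pull numbers out of it matches the approach described above: the free-text cost is passed through for manual interpretation.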
Extracting the main location of the enterprise is somewhat tricky. This
information is taken from the first entry in the “contact” tab, so the scraper
has to be able to recognize a country name in that entry. For this purpose, the
package that contains a data frame of the world’s country names can be used, so
that the scraper can search for occurrences of these names.
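The name search itself can be sketched in Python as follows. This is an illustration, not the package mentioned above: a short hypothetical `COUNTRY_NAMES` list stands in for its data frame of country names, and `make_country_finder` is a helper name of our own. All names are combined into one case-insensitive regular expression; word boundaries keep “Niger” from matching inside “Nigeria”, and listing longer names first means “Dominican Republic” is preferred over “Dominica” at the same position.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for the package's data frame of country names.
COUNTRY_NAMES = [
    "Armenia", "Dominica", "Dominican Republic", "Guinea", "India",
    "Niger", "Nigeria", "Papua New Guinea", "Philippines",
]


def make_country_finder(names: Iterable[str]) -> Callable[[str], Optional[str]]:
    """Build a function that returns the first country name found in a text."""
    lookup = {name.lower(): name for name in names}
    # Longer names come first in the alternation, so at a given position
    # "Dominican Republic" is tried before "Dominica".
    ordered = sorted(lookup.values(), key=len, reverse=True)
    # \b stops "Niger" from matching the start of "Nigeria".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )

    def find_country(text: str) -> Optional[str]:
        match = pattern.search(text)  # leftmost match in the text
        if match is None:
            return None
        found = match.group(0)
        # Map back to the canonical spelling from the list.
        return lookup.get(found.lower(), found)

    return find_country


find_country = make_country_finder(COUNTRY_NAMES)
```

Because this is plain text matching, a contact entry that mentions a country name in another role (for instance in a street or company name) would be misassigned; restricting the search to the first contact entry reduces but does not remove that risk.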
To get started, we therefore need the URLs of all projects in the database.
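Collecting those URLs amounts to pulling project links out of the database’s listing pages. The sketch below assumes a listing page has already been downloaded as HTML; the `example.org` base URL and the `/project/` path fragment used to recognize project links are placeholders, since the actual URL structure of the IFC database is not described here. It uses only `html.parser` and `urllib.parse.urljoin` from the standard library.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_project_urls(html: str, base_url: str, marker: str = "/project/") -> List[str]:
    """Return unique links containing `marker` (a placeholder pattern), in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    seen = set()
    urls = []
    for link in collector.links:
        if marker in link and link not in seen:
            seen.add(link)
            urls.append(link)
    return urls
```

If the database spreads its results over several listing pages, the same function would be applied to each page and the results concatenated.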