Scraping the International Finance Corporation’s database with R


This scraper was used for the paper The Political Economy of International Finance Corporation Lending by Dreher and Richert. The database is a little hard to scrape automatically (as we will see), but apparently the scraping results made the manual work of going through the database a lot easier.

The International Finance Corporation offers a database that contains
the projects it helped to finance. The data that should be extracted for every project are:

  • Project number
  • URL
  • Region
  • The dates at which the project was signed and approved
  • Cost, risk management, guarantee, loan
  • Information about the project’s sponsor
  • The main location of the involved enterprise

The main challenge in scraping the IFC’s database is the messy way in which the project
data was entered, along with the differing page layouts. For example, the latest projects like RCBC Bond have tabs at the end of the page that
contain some of the important information like cost and contact data. The
aforementioned project only gives the project cost as running text, which
will be returned as is, because it would be extremely hard to automatically
evaluate the text’s meaning. The same applies to the “project sponsor and major
shareholders” section. Additionally, some of the later projects have
a table in the cost tab that contains risk management, guarantee, loan and
equity values, see e.g. Ineco SME 2015.

Extracting the main location of the enterprise is a bit tricky. This piece of
information will be gathered from the first entry in the “contact” tab. The scraper
will have to be able to recognize the country name. For this purpose, the countrycode
package can be used: it contains a data frame of the world’s country names, so
the scraper can search for occurrences of these names.

So, in order to get started, we need the URLs of all projects in the database.
If we click through the pages of the database, we notice the URL scheme:

  • The variables “doccount” and “count” remain constant. Every page contains 100 projects.
  • The variable “page” is equal to the page number
  • The variable “start” refers to the number of the first project on the page, e.g.
    101 for page 2.

A function to generate URLs of all 68 pages of the database could look like this:

generatePageURLs <- function(pages) {
    databasePageURLs <- rep(NA, times = pages)
    for (p in 1:pages) {
        databasePageURLs[p] <- paste0("http://ifcextapps.ifc.org/",
                                      "ifcext/spiwebsite1.nsf/",
                                      "frmshowview?openform&view=CRUDate&",
                                      "start=", 100 * p - 99,
                                      "&count=100&page=", p,
                                      "&doccount=6770")
    }
    return(databasePageURLs)
}
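
The arithmetic behind the scheme can also be written without a loop. The following is only a sketch, assuming the URL scheme is exactly as described above; buildPageURLs and perPage are names made up for this illustration.

```r
# Vectorized sketch of the URL scheme (hypothetical helper, not from the scraper)
buildPageURLs <- function(pages, perPage = 100, doccount = 6770) {
    p <- seq_len(pages)
    # First project on page p: 1, 101, 201, ...
    start <- perPage * (p - 1) + 1
    paste0("http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/",
           "frmshowview?openform&view=CRUDate&",
           "start=", start,
           "&count=", perPage, "&page=", p,
           "&doccount=", doccount)
}

urls <- buildPageURLs(3)
```

Because paste0() is vectorized, one call builds the URLs of all pages at once.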

Next, we have to find the URLs of all projects that one of those pages contains.
All URLs that link to projects contain the string ?opendocument which can be
used to extract these URLs from the HTML:

getProjLinksFromPage <- function(URL) {
    require(XML)
    doc <- htmlParse(URL)
    links <- xpathSApply(doc, "//a/@href")
    links <- links[grep(x = tolower(links), pattern = "opendocument")]
    links <- paste0("http://ifcextapps.ifc.org", links)
    free(doc)
    return(links)
}
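
To see what the filter step does in isolation, here is a small sketch with made-up href values (the paths are placeholders, not real IFC links). Lowercasing first makes the match independent of whether a link says ?OpenDocument or ?opendocument.

```r
# Hypothetical href values as they might come out of xpathSApply()
hrefs <- c("/ifcext/spiwebsite1.nsf/abc/123?OpenDocument",
           "/ifcext/spiwebsite1.nsf/frmshowview?openform&page=2",
           "/ifcext/spiwebsite1.nsf/abc/456?opendocument")

# Keep only links whose lowercased form contains "opendocument"
isProject <- grepl("opendocument", tolower(hrefs), fixed = TRUE)
projectLinks <- paste0("http://ifcextapps.ifc.org", hrefs[isProject])
```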

The XPath statement looks for all links on the page and the regular expression selects all
that contain opendocument. On Windows (my experience is with Win 10) there are
encoding problems which would later lead to garbled characters in the scraped text.
The following function converts
the encoding from UTF-8 to the native encoding and also replaces line breaks (\n) with spaces. str_trim() from the
stringr package can also be used to strip leading and trailing line breaks.
There were no problems on Ubuntu, but the function doesn’t do any harm there.

cleanEnc <- function(string) {
    stringClean <- iconv(string, from = "UTF-8")
    stringClean <- gsub(x = stringClean, pattern = "\\n+", replacement = " ")
    return(stringClean)
}
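
A quick check of the line-break handling, as a sketch on a made-up string. It also shows that trimming alone (here with base R's trimws()) only touches the ends of a string, so line breaks in the middle still need the gsub() step.

```r
x <- "\nEcobank Ghana Ltd\n\nAccra, Ghana\n"

# Same substitution as in cleanEnc(): runs of line breaks become one space
collapsed <- gsub(x = x, pattern = "\\n+", replacement = " ")

# Trimming only removes whitespace at the start and end
trimmedOnly <- trimws(x)

# Combining both gives a clean single-line string
clean <- trimws(collapsed)
```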

In addition to the fact that only some pages contain tabs at the end of the
page, the tabs that are present differ from project to project. We need a function
that looks for these tabs and reads their names so we can look for the cost and
contacts tabs. Since the number of tabs is unknown, the function uses a while loop:

getTabNames <- function(projectHTML) {
    # Get the names of all tabs (#tab1, #tab2, ...) on the project page;
    # the cost and contacts tabs are looked up among them later
    require(rvest)
    require(stringr)
    latestTab <- c()
    tabNames <- c()
    tabNr <- 1
    # Stop loop if latestTab is character(0)
    while (!identical(latestTab, character(0))) {
        tabToTry <- paste0('#tab', tabNr)
        latestTab <- projectHTML %>%
            html_nodes(tabToTry) %>%
            html_text(trim = T) %>%
            cleanEnc() # Doesn't always work here, why?
        tabNames <- c(tabNames, latestTab)
        tabNr <- tabNr + 1
    }
    return(tabNames)
}
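
The stopping rule can be tried out without a web page. In this sketch, lookupTab() is a made-up stand-in for the html_nodes()/html_text() pipeline: like that pipeline, it returns character(0) when nothing matches the id.

```r
# Made-up tab labels standing in for a parsed project page
fakePage <- c(tab1 = "Overview", tab2 = "Cost", tab3 = "Contacts")

# Hypothetical stand-in for the rvest lookup; character(0) if the id is missing
lookupTab <- function(id) unname(fakePage[names(fakePage) == id])

tabNames <- character(0)
tabNr <- 1
repeat {
    latestTab <- lookupTab(paste0("tab", tabNr))
    if (length(latestTab) == 0) break  # first missing id ends the search
    tabNames <- c(tabNames, latestTab)
    tabNr <- tabNr + 1
}
```

The search relies on the tabs being numbered consecutively: a gap in the numbering would end it early.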

Now we are ready to define the main function that will gather the project information.
The code is rather long but I guess it is quite self-explanatory. It needs
the tab names and the HTML of the project page. The workhorses are html_nodes
and html_text from the rvest package as well as regular expressions.
The XPaths were generated using
SelectorGadget for Chrome as suggested in this tutorial.

There is a lot of “if-else” going on which is due to the messiness of the
project pages. The scraper has to be robust to the different layouts that were
mentioned in the beginning and to missing tabs or headlines. Usually, the function
selects a section of the page using html_nodes and html_text which results
in a character vector that contains headlines and the following text. It then
looks for a specific headline and saves the text that follows, which is in the next
element of the character vector.
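
The “find the headline, take the next element” pattern looks like this in isolation. The cell texts and the helper name textAfter are made up for the sketch.

```r
# Made-up cell texts, alternating headline and content
cells <- c("Region", "East Asia and the Pacific",
           "Previous Events", "Approved: May 16, 2008")

# Hypothetical helper: return the element following a headline
textAfter <- function(cells, headline) {
    pos <- grep(paste0("^", headline), tolower(cells))
    if (length(pos) == 1) cells[pos + 1] else "Not found"
}

region <- textAfter(cells, "region")
noSponsor <- textAfter(cells, "sponsor")
```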

Some pieces of information, like the project number and the dates at which
the project was signed and approved, are contained in the part of the website
above the tabs and can be extracted by looking for “region” or “previous events”
(which contains the dates) at the beginning of a character element. The project
number is always first.
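
The date handling can be illustrated on its own. The input string below is an assumption about what a “previous events” entry looks like (month name, day, year); only the parsing steps mirror the function.

```r
# Month names are parsed according to LC_TIME, so switch to "C" temporarily
originalLocale <- Sys.getlocale(category = "LC_TIME")
Sys.setlocale("LC_TIME", "C")

previousEvents <- "Approved: May 16, 2008 Signed: May 21, 2008"  # assumed format
eventsSplit <- unlist(strsplit(previousEvents, " "))

# The three tokens after "Approved:" are month, day and year
apprPos <- grep("approved:", tolower(eventsSplit))
approved <- paste(eventsSplit[apprPos + 1:3], collapse = " ")  # "May 16, 2008"
approved <- gsub("[ ,]+", "-", approved)                       # "May-16-2008"
approved <- as.character(as.Date(approved, format = "%B-%d-%Y"))

Sys.setlocale("LC_TIME", originalLocale)
```

Without the locale switch, %B would expect month names in the language of the operating system.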

As was already mentioned, the last part of the function uses countrycode_data$country.name
from the countrycode package to match any of the text in the first entry of
the contacts tab to a country name. Again, the function has to be robust to
finding none, one, or multiple country names and to traps like matching
‘Dominica’ when the contact details contain ‘Dominican Republic’.
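
The ‘Dominica’ trap can be sketched with word boundaries. This is an illustration, not the matching code of the scraper: the country vector is a made-up stand-in for the countrycode data, and the sketch assumes that the names contain no regex metacharacters. Note that inside square brackets a `$` is a literal dollar sign, so an end-of-string condition has to be expressed outside a character class, for example with \\b.

```r
# Hypothetical stand-in for the list of country names
countries <- c("Dominica", "Dominican Republic", "Ghana")

# Return all names that occur as whole words in the text
matchCountries <- function(text, countries) {
    hits <- vapply(countries, function(cn) {
        # \\b marks a word boundary, which also matches at the end of the string
        grepl(paste0("\\b", tolower(cn), "\\b"), tolower(text), perl = TRUE)
    }, logical(1))
    names(hits)[hits]
}

domRep <- matchCountries("Santo Domingo, Dominican Republic", countries)
ghana <- matchCountries("Accra, Ghana", countries)
```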

getProjectInfo <- function(projectHTML, tabNames) {
    require(rvest)
    require(stringr)
    require(countrycode)
    originalLocale <- Sys.getlocale(category = "LC_TIME")
    # For date conversion irrespective of OS language
    Sys.setlocale("LC_TIME", "C")

    # Get project NR
    projectNr <- projectHTML %>%
        html_nodes(xpath = "//*[contains(concat( ' ', @class, ' ' ), concat( ' ', 'dataCell', ' ' ))]") %>%
        html_text(trim = T) %>%
        head(n = 1) %>%
        str_trim()
    projectHeading <- projectHTML %>%
        html_nodes("td") %>%
        html_text() %>%
        str_trim()

    # Get Region
    regionPosition <- grep(x = tolower(projectHeading), pattern = "^region.?$")
    if (length(regionPosition) > 1) {
        warning(paste("Multiple occurrences of region found"))
    }
    if (length(regionPosition) == 1) {
        projectRegion <- projectHeading[regionPosition + 1]
    } else projectRegion <- "Not found"

    # Get 'previous events'
    previousEventsPosition <- grep(x = tolower(projectHeading),
                                   pattern = "^previous")
    if (length(previousEventsPosition) == 1) {
        previousEvents <- projectHeading[previousEventsPosition + 1]
        previousEvents <- cleanEnc(previousEvents)
    } else previousEvents <- "Not found"

    # Store Signed and Approved
    if (previousEvents != "Not found") {
        eventsSplit <- unlist(strsplit(previousEvents, " "))
        apprPos <- grep(tolower(eventsSplit), pattern = "approved:")
        if (length(apprPos) == 1) {
            Approved <- paste(eventsSplit[apprPos + 1:3], collapse = " ")
            Approved <- gsub(Approved, pattern = "[ |,]+", replacement = "-")
            Approved <- as.Date(Approved, format = "%B-%d-%Y")
            Approved <- as.character(Approved) # To prevent coercion to num
        } else {
            Approved <- "'Approved' not among previous events"
        }
        signedPos <- grep(tolower(eventsSplit), pattern = "signed:")
        if (length(signedPos) == 1) {
            Signed <- paste(eventsSplit[signedPos + 1:3], collapse = " ")
            Signed <- gsub(Signed, pattern = "[ |,]+", replacement = "-")
            Signed <- as.Date(Signed, format = "%B-%d-%Y")
            Signed <- as.character(Signed) # To prevent coercion to num
        } else {
            Signed <- "'Signed' not among previous events"
        }
    } else {
        Approved <- Signed <- "Not found"
    }

    # Extract content of the cost tab
    costTabNr <- grep(tabNames, pattern = '[C|c]ost')
    # If length == 0 no cost tab was found
    if (length(costTabNr) == 0) {
        info <- data.frame(matrix(rep("No cost tab", times = 4), nrow = 1),
                           stringsAsFactors = F)
        colnames(info) <- c("Risk Management", "Guarantee", "Loan", "Equity")
        sponsor <- "No cost tab"
        totalCost <- "No cost tab"
    } else {
        xp <- paste0("//*[(@id = 'divtab", costTabNr, "')]//td")
        tabContent <- projectHTML %>%
            html_nodes(xpath = xp) %>%
            html_text()

        # Store content of "general" cost info field
        fieldname <- "total project cost and amount and nature of ifc's investment"
        fieldposition <- grep(x = tolower(tabContent),
                              pattern = paste0("^", fieldname))
        if (length(fieldposition) > 1) {
            warning(paste("Multiple occurrences of", fieldname, "found"))
        }
        if (length(fieldposition) == 1) {
            totalCost <- tabContent[fieldposition + 1]
            totalCost <- cleanEnc(totalCost)
        } else totalCost <- "Not found"

        # Store content of sponsor field
        fieldname <- "project sponsor"
        fieldposition <- grep(x = tolower(tabContent),
                              pattern = paste0("^", fieldname))
        if (length(fieldposition) > 1) {
            warning(paste("Multiple occurrences of", fieldname, "found"))
        }
        if (length(fieldposition) == 1) {
            sponsor <- tabContent[fieldposition + 1]
            sponsor <- cleanEnc(sponsor)
        } else sponsor <- "Not found"

        # Store table of project infos (not on every page)
        info <- data.frame(matrix(NA, nrow = 1, ncol = 4), stringsAsFactors = F)
        colnames(info) <- c("Risk Management", "Guarantee", "Loan", "Equity")
        info[1, ] <- sapply(colnames(info), function(x) {
            pat <- paste0("^", x)
            categoryIndex <- grep(tabContent, pattern = pat)
            if (length(categoryIndex) >= 1) {
                cellValue <- tabContent[categoryIndex + 1]
                cellValue <- cleanEnc(cellValue)
                return(cellValue)
            } else return("Not found")
        })
    }

    # Extract content of the contact tab
    contactTabNr <- grep(tabNames, pattern = '[C|c]ontacts')
    # If length == 0 no contacts tab was found
    if (length(contactTabNr) == 0) {
        enterpriseBase <- "Not found"
    } else {
        xp <- paste0("//*[(@id = 'divtab", contactTabNr, "')]//td")
        tabContent <- projectHTML %>%
            html_nodes(xpath = xp) %>%
            html_text(trim = T) %>%
            cleanEnc()

        # Store content of "for inquiries..." field
        fieldname <- "for inquiries about the project, contact:"
        fieldposition <- which(tolower(tabContent) == fieldname)
        if (length(fieldposition) >= 1) {
            enterpriseBase <- tabContent[fieldposition + 1]
            enterpriseBase <- sapply(countrycode_data$country.name, function(x) {
                # There should be a space or end of line etc. after the country name
                # otherwise e.g. Dominica is found if country = Dominican Rep.
                # Note: inside [...] the $ is a literal dollar sign, so a name at
                # the very end of the text is not matched by this pattern
                pat <- paste0(tolower(x), "[ |$|,|\\.|:]")
                countryFound <- grep(tolower(enterpriseBase), pattern = pat)
            })
            enterpriseBase <- unlist(enterpriseBase)
            enterpriseBase <- names(enterpriseBase)
            if (length(enterpriseBase) > 1) {
                enterpriseBase <- paste(enterpriseBase, collapse = " and/or ")
            }
            if (is.null(enterpriseBase)) {
                enterpriseBase <- "Not found or country name unknown"
            }
        } else enterpriseBase <- "Not found"
    }

    Sys.setlocale("LC_TIME", originalLocale)
    return(list(projectNr = projectNr,
                projectRegion = projectRegion,
                sponsor = sponsor,
                Approved = Approved,
                Signed = Signed,
                enterpriseBase = enterpriseBase,
                totalProjectCost = totalCost,
                riskManagement = info$`Risk Management`,
                guarantee = info$Guarantee,
                loan = info$Loan,
                equity = info$Equity
    ))
}

This post is a little similar to my post about the EBRD’s database but, at least currently,
this scraper is still functional as the webpage has not changed.

The getProjectInfo() function can now be incorporated in a loop that
accesses all database pages and gets the information of every project. A practical
issue is that with close to 6800 projects to load, sooner or later a
read_html() call will probably fail due to a timeout or temporary connection problems.
To keep such a failure from aborting the whole run, the following function is used.
It simply tries a given number of times to evaluate a call (passed as a string)
and assigns the result to a specified
object in the parent frame:

keepTrying <- function(target, call, maxTries = 5, timeout = 5,
                       noTermination = F, silent = F) {
    stopifnot(is.character(target) & is.character(call))
    result <- NA
    class(result) <- "try-error"
    nTries <- 0
    while ("try-error" %in% class(result)) {
        result <- try(eval(parse(text = call)), silent = silent)
        if ("try-error" %in% class(result)) {
            nTries <- nTries + 1
            if (nTries >= maxTries & noTermination) {
                break
            } else if (nTries >= maxTries & !noTermination) {
                stop ("maxTries reached")
            }
        }
        if ("try-error" %in% class(result)) {
            if (silent == F) message(paste("Trying again in", timeout, "sec."))
            Sys.sleep(timeout)
        }
    }
    assign(x = target,
           value = result,
           envir = parent.frame())
}
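
A variant of the same idea, sketched with a function argument instead of a string that is parsed and evaluated. retryCall and makeFlaky are made-up names; unlike keepTrying, the sketch returns the result instead of assigning it in the parent frame, and it hands back the last error condition if all attempts fail.

```r
# Call f() until it succeeds or maxTries is reached (hypothetical helper)
retryCall <- function(f, maxTries = 5, timeout = 0) {
    result <- NULL
    for (i in seq_len(maxTries)) {
        result <- tryCatch(f(), error = function(e) e)
        if (!inherits(result, "error")) return(result)
        if (i < maxTries) Sys.sleep(timeout)  # wait before the next attempt
    }
    result  # the last error condition, left to the caller to inspect
}

# Test function that fails a given number of times before it succeeds
makeFlaky <- function(failures) {
    n <- 0
    function() {
        n <<- n + 1
        if (n <= failures) stop("temporary failure")
        n
    }
}

out <- retryCall(makeFlaky(2), maxTries = 5)
failed <- retryCall(function() stop("dead link"), maxTries = 2)
```

The caller can then check inherits(failed, "error") in the same way the loop checks for a try-error.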

Here’s the loop for scraping the project data. It displays percent completed
and gives some status messages during run time. There’s actually one dead link
in the project database so the loop has to be robust to that: If keepTrying
returns a try-error the error message will be saved in the output.

library(rvest) # provides read_html(), in case functions.R does not load it
source("functions.R")
source("keepTrying.R")
# Look up manually at
# http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/frmShowView?openform&view=CRUDate&start=1&count=100&page=1
pages <- 68
databasePageURLs <- generatePageURLs(pages)
projectList <- list()
nProjects <- 100 * pages # approximate number of projects to load
nLoaded <- 1
# Elapsed time for one database page with a timeout between 2 and 5 sec.:
# about 6.5 minutes
for (page in 1:(length(databasePageURLs))) {
    # To limit stress to the site
    Sys.sleep(runif(1, min = 2, max = 5))
    cat(paste0("Getting project links from database page ", page, "..."))
    projLinks <- getProjLinksFromPage(databasePageURLs[page])
    cat(" done. \n")
    for (pnr in seq_along(projLinks)) {
        # To limit stress to the site
        Sys.sleep(runif(1, min = 2, max = 5))
        cat(paste0(round(nLoaded / nProjects, 4) * 100, "% ",
                   Sys.time(), " Downloading and reading... "))

        keepTrying("projectHTML",
                   "read_html(projLinks[pnr], encoding = 'Latin-1')",
                   maxTries = 10, timeout = 10, noTermination = T, silent = F)

        if ("try-error" %in% class(projectHTML)) { # = download error
            errMessage <- attributes(projectHTML)$condition$message
            projInfo <- rep(list(errMessage), 12)
            names(projInfo) <- names(projectList[[1]])
            projInfo$URL <- projLinks[pnr]
        } else {
            tabNames <- getTabNames(projectHTML)
            projInfo <- getProjectInfo(projectHTML, tabNames)
            projInfo$URL <- projLinks[pnr]
        }

        projectList[[length(projectList) + 1]] <- projInfo

        nLoaded <- nLoaded + 1
        cat(paste0(" project ", projInfo$projectNr, " done. \n"))
    }
    # Safety measure: Store list after finishing a page
    save(projectList, file = "projectList_backup.RData")
}

result <- lapply(projectList, unlist)
result <- do.call(rbind, result) # We don't want warnings here!
result <- data.frame(result, stringsAsFactors = F)
result$projectNr <- as.numeric(result$projectNr) # Coerces 'Not found' to NA
result$projectNr[is.na(result$projectNr)] <- "error"
result <- result[order(result$projectNr, decreasing = F), ]
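
One side effect of the last three lines is worth seeing on toy data (the values below are made up): writing "error" into the column turns it back into character, so order() sorts the project numbers as text rather than as numbers. If numeric order is wanted, one option, which is not what the code above does, is to order by the numeric version of the column.

```r
projectNr <- c("25757", "8648", "Not found")

nr <- suppressWarnings(as.numeric(projectNr))  # "Not found" becomes NA
nr[is.na(nr)] <- "error"                       # column is character again

# Sorted as text: "25757" comes before "8648"
asText <- nr[order(nr)]

# Sorted by numeric value, non-numeric entries last
asNumbers <- nr[order(suppressWarnings(as.numeric(nr)), na.last = TRUE)]
```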

This is what a row of the result can look like (some are rather empty of course):

## projectNr: 25757
## projectRegion: Not found
## sponsor: Ecobank Ghana Ltd (EBG) Ecobank Ghana (EBG) was incorporated in January 1989 as a private limited liability company.  Initially licensed to operate as a merchant bank by the Bank of Ghana, it began operating in February 1990.  EBG has grown consistently over the years to become one of the leading banks in Ghana and a well-recognized brand in the Ghanaian banking industry.  EBG acquired a universal banking license in 2003 and has since expanded its geographical reach and broadened its scope of financial services.  It is now the fourth largest bank in Ghana with 8.4% of the market in terms of asset size and a network of 25 branches.  Its shares were listed on the Ghana Stock Exchange in July 2006.  EBG is a 87.47% owned subsidiary of Ecobank Transnational International (ETI), one of the largest banking franchises in Sub-Saharan Africa head-quartered in Lome, Republic of Togo.
## Approved: 2008-06-04
## Signed: 2008-11-27
## enterpriseBase: Ghana
## totalProjectCost: The proposed project will be implemented in Ghana under the IDA-IFC program for Sub-Saharan Africa and aims to encourage selected commercial banks to lend to SMEs. The RSF of $6.25 million will support a portfolio of up to $12.5 million. The proposed RSF is a risk-sharing facility developed under the IDA-IFC program for Sub-Saharan Africa and aims to encourage selected commercial banks to lend to SMEs.  The RSF, issued by IFC as facility agent for its account and for the account of the GoG, would cover up to 50% of the loan principal amount of the bank’s portfolio of selected SME loans, thus encouraging the bank to lend whilst ensuring good credit standards since the bank still retains 50% of the risk on the portfolio.
## riskManagement:  
## guarantee: 6.28
## loan:  
## equity:  
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/ea0597dbb41f97db852576ba000e2ab2?OpenDocument

Another peculiarity of this database is that the projects can have multiple pages
so that the data from multiple pages has to be combined. We can of course use
the project number to merge the rows of the output. I’m also converting the
“not found” messages to NA.

resultMerged <- result
NA_strings <- c("Not found", "No cost tab", "'Approved' not among previous events",
                "'Signed' not among previous events", " ")
resultMerged <- apply(resultMerged, 2, function(x) {
    temp <- x
    temp[temp %in% NA_strings] <- NA
    temp <- str_trim(temp)
    return(temp)
})
resultMerged <- data.frame(resultMerged, stringsAsFactors = F)
resultMerged$projectNr <- as.numeric(resultMerged$projectNr)
resultMerged$projectNr[is.na(resultMerged$projectNr)] <- "error"
# Put both URLs (if possible) into URL column
UrlPerProject <- sapply(resultMerged$projectNr,
                        function(x) resultMerged[resultMerged$projectNr == x, "URL"])
UrlPerProject <- sapply(UrlPerProject, function(x) paste(sort(x), collapse = " "))
UrlPerProject <- data.frame(projectNr = as.character(names(UrlPerProject)),
                            URL = UrlPerProject, stringsAsFactors = F)
resultMerged$URL <- NULL
resultMerged$URL <- sapply(resultMerged$projectNr, function(x) {
    insert <- UrlPerProject[UrlPerProject$projectNr == x, "URL"]
    insert <- sort(unique(insert))
    insert <- paste(insert, collapse = " ")
    return(insert)
})
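
The same URL collapsing can be sketched more compactly with tapply(). The data frame below is made up and only has the two columns involved.

```r
toy <- data.frame(projectNr = c("25748", "25748", "25751"),
                  URL = c("http://example.org/b", "http://example.org/a",
                          "http://example.org/c"),
                  stringsAsFactors = FALSE)

# One string per project number, containing all of its URLs
urlPerProject <- tapply(toy$URL, toy$projectNr,
                        function(u) paste(sort(unique(u)), collapse = " "))

# Write the combined string back to every row of the project
toy$URL <- as.vector(urlPerProject[toy$projectNr])
```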

This is ‘real world’ data which suffers from the fact that it was obviously
entered by humans who had some freedom in deciding what to write. We’ll have
to check whether there are conflicts in the column values of a project if that project
has multiple pages in the database.

projectNumbers <- unique(resultMerged$projectNr)
projectNumbers <- projectNumbers[projectNumbers != "error"]
projWithConflicts <- c()
for (i in seq_along(projectNumbers)) {
    tempdat <- resultMerged[resultMerged$projectNr == projectNumbers[i], ]
    if (nrow(tempdat) > 1) {
        if (nrow(tempdat) > 2) {
            print(paste("Project", projectNumbers[i], "has more than 2 pages"))
        }
        for (col in 1:ncol(tempdat)){
            tempcol <- na.omit(tempdat[, col])
            if (length(tempcol) > 1) {
                if (!identical(tempcol[1], tempcol[2])) {
                    print(paste("Project", projectNumbers[i], "Conflict of values"))
                    projWithConflicts <- c(projWithConflicts, projectNumbers[i])
                }
            }
        }
    }
}
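
As a sketch of the same check in a more compact form, the following counts the distinct non-NA values per column for each project. The toy data and the function name hasConflict are made up; unlike the loop above, it compares all pages of a project rather than only the first two.

```r
toy <- data.frame(projectNr = c("1", "1", "2", "2"),
                  region = c("WORLD", "Africa", "Asia", NA),
                  loan = c("30", "30", NA, "15"),
                  stringsAsFactors = FALSE)

# TRUE if any column holds more than one distinct non-NA value
hasConflict <- function(rows) {
    any(vapply(rows, function(col) length(unique(na.omit(col))) > 1,
               logical(1)))
}

conflicts <- vapply(split(toy, toy$projectNr), hasConflict, logical(1))
projWithConflicts <- names(conflicts)[conflicts]
```

Here only project 1 is flagged, because its region differs between the two pages.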

There are 7 projects with conflicts: 26504, 27091, 27373, 27373, 28741, 37243, and 8648.
Sometimes the region was changed, e.g. from WORLD to a continent or something was
added to the cost field. These conflicts will have to be resolved manually.

For now, we are keeping the first value if a project has multiple values in a
column that are not NA (and if there are multiple ones those should normally
be equal):

resultMerged <- aggregate(x = resultMerged,
          by = list(resultMerged$projectNr),
          FUN = function(x) na.omit(x)[1])[,-1]
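
What this call does can be seen on a small made-up table: for every project number, each column is reduced to its first non-NA value, or NA if there is none.

```r
toy <- data.frame(projectNr = c("1", "1", "2"),
                  region = c(NA, "Asia", "Africa"),
                  loan = c("30", NA, NA),
                  stringsAsFactors = FALSE)

# One row per project; [, -1] drops the grouping column added by aggregate()
merged <- aggregate(x = toy,
                    by = list(toy$projectNr),
                    FUN = function(x) na.omit(x)[1])[, -1]
```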

Lastly, here are some projects from the final table. This scraper manages to scrape most of
the information, but still has some problems scraping older projects whose pages
‘look different’. This is still work in progress and I will update this page
accordingly.

## projectNr: 25748
## projectRegion: East Asia and the Pacific
## sponsor: No cost tab
## Approved: 2008-05-16
## Signed: 2008-05-21
## enterpriseBase: Not found
## totalProjectCost: No cost tab
## riskManagement: No cost tab
## guarantee: No cost tab
## loan: No cost tab
## equity: No cost tab
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/c0f797370bbeebe7852576ba000e2b98?OpenDocument
## ------------------------------
## projectNr: 25748
## projectRegion: Not found
## sponsor: Nature is currently wholly-owned by Mr. Se Hok Pan and his family.  Mr. Se is one of the leading entrepreneurs in China’s wood manufacturing sector and holds several important positions in industry associations and has over 20 years of experience in wood flooring industry since 1987 when he and his family started the wood flooring trading business in Shunde.  Mr. Se is the majority shareholder owner of the company; other minority shareholders include Mr. Se’s wife, his brother and two other founding shareholders.
## Approved: 2008-05-16
## Signed: 2008-05-21
## enterpriseBase: Not found or country name unknown
## totalProjectCost: The cost of the Project is estimated at $120 million.  IFC’s proposed investment includes a quasi-equity/equity investment of $20 million and a long-term loan of up to $30 million.   
## riskManagement:  
## guarantee:  
## loan: 30
## equity: 20
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/3298db8d9ed575a0852576ba000e2b91?OpenDocument
## ------------------------------
## projectNr: 25751
## projectRegion: Not found
## sponsor: Injazat Capital Limited is a venture capital & private equity fund management and advisory services company operating from Dubai, United Arab Emirates with fifteen investment professionals.  Injazat is registered under the laws of Dubai International Financial Centre (DIFC), regulated by Dubai Financial Services Authority (DFSA), and is owned by Gulf Finance House, a private Bahrain-based investment bank, Islamic Corporation for the Development of the Private Sector, the private sector arm of the Islamic Development Bank, and the principals of the Manager.  Injazat previously made 11 investments through a predecessor fund, of which three are fully realized.
## Approved: 2007-05-16
## Signed: 2007-05-30
## enterpriseBase: Not found or country name unknown
## totalProjectCost: It is proposed that IFC invest up to $15 million, up to 20% of total commitments to the Fund.  
## riskManagement:  
## guarantee:  
## loan:  
## equity: 15
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/1f1bd41dc36a6593852576ba000e297c?OpenDocument
## ------------------------------
## projectNr: 25757
## projectRegion: Not found
## sponsor: Ecobank Ghana Ltd (EBG) Ecobank Ghana (EBG) was incorporated in January 1989 as a private limited liability company.  Initially licensed to operate as a merchant bank by the Bank of Ghana, it began operating in February 1990.  EBG has grown consistently over the years to become one of the leading banks in Ghana and a well-recognized brand in the Ghanaian banking industry.  EBG acquired a universal banking license in 2003 and has since expanded its geographical reach and broadened its scope of financial services.  It is now the fourth largest bank in Ghana with 8.4% of the market in terms of asset size and a network of 25 branches.  Its shares were listed on the Ghana Stock Exchange in July 2006.  EBG is a 87.47% owned subsidiary of Ecobank Transnational International (ETI), one of the largest banking franchises in Sub-Saharan Africa head-quartered in Lome, Republic of Togo.
## Approved: 2008-06-04
## Signed: 2008-11-27
## enterpriseBase: Ghana
## totalProjectCost: The proposed project will be implemented in Ghana under the IDA-IFC program for Sub-Saharan Africa and aims to encourage selected commercial banks to lend to SMEs. The RSF of $6.25 million will support a portfolio of up to $12.5 million. The proposed RSF is a risk-sharing facility developed under the IDA-IFC program for Sub-Saharan Africa and aims to encourage selected commercial banks to lend to SMEs.  The RSF, issued by IFC as facility agent for its account and for the account of the GoG, would cover up to 50% of the loan principal amount of the bank’s portfolio of selected SME loans, thus encouraging the bank to lend whilst ensuring good credit standards since the bank still retains 50% of the risk on the portfolio.
## riskManagement:  
## guarantee: 6.28
## loan:  
## equity:  
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/ea0597dbb41f97db852576ba000e2ab2?OpenDocument
## ------------------------------
## projectNr: 25761
## projectRegion: Not found
## sponsor: - Merchant Bank Ghana Ltd (MBG) Merchant Bank (Ghana) Limited (MBG) began operations in March 1972 and was the first merchant bank in Ghana.  It is currently the 5th largest bank in Ghana in terms of total assets, with a market share of deposits around 7.0%. The Bank has played a significant role in the development of the country’s merchant banking industry - achievements include:  - establishment of one of the first hire purchase and leasing companies (Merban Leasing),  - promotion and formation of the first discount house in Ghana – Consolidated Discount House,  - sponsorship and provision of registrar services to about 50% of the companies on the GSE,  - arrangement of the first housing finance institution in the country (now HFC Bank).   In 2003, MBG became a universal bank and re-aligned its operations in three business lines: Retail, Corporate and Investment.  In 2005, MBG created an SME unit as part of a strategic re-orientation of the bank and is one of the fastest growing banks in the country driven by its deposit growth and focus on the SME sector.  The Major Shareholders of MBG are Social Security National Investment Trust (SSNIT), the dominant social insurance scheme in Ghana and State Insurance Company (SIC), the leading insurance company in Ghana. SSNIT holds 68.75% and SIC 18.75% of MBG’s shares. 
## Approved: Not found
## Signed: Not found
## enterpriseBase: Ghana
## totalProjectCost: The proposed project will be implemented in Ghana under the IDA-IFC program for Sub-Saharan Africa and aims to encourage selected commercial banks to lend to SMEs. The LOC will lead to new lending of $7.5 million.
## riskManagement: Not found
## guarantee: Not found
## loan: Not found
## equity: Not found
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/97b319b47e3661f3852576ba000e2a8f?OpenDocument
## ------------------------------
## projectNr: 25762
## projectRegion: Not found
## sponsor: Unik’s largest shareholder is the Rio Bravo Group, a leading, Brazilian alternative asset management firm.  Other shareholders include senior management of the firm, as well as the Santa Cruz Group.
## Approved: 2008-05-29
## Signed: 2008-05-29
## enterpriseBase: Brazil
## totalProjectCost: The total investment associated with this project will be BRL 16 million (approximately US$8 million at current exchange rates).  IFC will invest BRL 3 million of cash for an equity stake in Unik, and will also provide Unik with BRL 5 million of debt which is convertible into an additional equity stake of the company if certain operating performance targets are met in the future.  IFC’s investment and debt package will be accompanied by a BRL 3 million cash investment in the firm by Unik’s main shareholder (Rio Bravo Group), as well as a BRL 5 million investment package of equity and debt on the part of IIC that is undertaken on the same terms as IFC’s. The project will allow Unik to complete a comprehensive recapitalization of its liabilities, thereby positioning it to more effectively pursue various product development, growth, and expansion opportunities.
## riskManagement:  
## guarantee:  
## loan: 2.34
## equity: 1.4
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/374aefb43b2af716852576c10080cd3a?OpenDocument
## ------------------------------
## projectNr: 25763
## projectRegion: Not found
## sponsor: The Sponsors are two Nigerian entrepreneurs, Koye Edu (50%) and Gbolly Balogun (50%).  Mr. Edu, a lawyer specializing in commercial and intellectual property matters, worked for Irving & Bonnar (an ex-IFC local counsel) in association with Bentley Edu & Co. for ten years before starting his own firm, Jackson, Etti & Edu, in 1996 with two other partners.  Mr. Balogun is an entrepreneur and property developer/manager with a good track record. The Sponsors started MPL in 1996 with a primary focus in property development and asset management.  
## Approved: 2011-05-13
## Signed: 2011-06-17
## enterpriseBase: Nigeria
## totalProjectCost: The total Project cost is estimated at $37.4 million. The proposed IFC investment is $7.5 million senior loan in AMHL for IFC’s own account and $7.4 million equity in MPL for IFC’s own account.
## riskManagement:  
## guarantee:  
## loan: 7.5
## equity: 7.4
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/1a0c9059cf6ea37685257847000037b2?OpenDocument
## ------------------------------
## projectNr: 25763
## projectRegion: Sub-Saharan Africa
## sponsor: No cost tab
## Approved: 2011-05-13
## Signed: 2011-06-17
## enterpriseBase: Not found
## totalProjectCost: No cost tab
## riskManagement: No cost tab
## guarantee: No cost tab
## loan: No cost tab
## equity: No cost tab
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/58408e97042769e18525784500762408?OpenDocument
## ------------------------------
## projectNr: 25765
## projectRegion: Not found
## sponsor: Bauducco’s controlling interest is owned by Pandurata Participacoes S.A. and Bedece Comercio e Participacoes Ltda., the Bauducco family’s holding companies.
## Approved: 2007-06-22
## Signed: 2007-06-27
## enterpriseBase: Brazil
## totalProjectCost: The proposed IFC investment in the project is a $30 million A Loan.
## riskManagement:  
## guarantee:  
## loan: 30
## equity:  
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/4a75989f085f070a852576ba000e29d2?OpenDocument
## ------------------------------
## projectNr: 25765
## projectRegion: Latin America and the Caribbean
## sponsor: No cost tab
## Approved: 2007-06-22
## Signed: 2007-06-27
## enterpriseBase: Not found
## totalProjectCost: No cost tab
## riskManagement: No cost tab
## guarantee: No cost tab
## loan: No cost tab
## equity: No cost tab
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/fc16f087f4d003e1852576ba000e29d3?OpenDocument
## ------------------------------
## projectNr: 25766
## projectRegion: Not found
## sponsor: The Trust Bank Limited was established in November 1, 1996 to take over the assets and liabilities of the liquidated Meridien BIAO Bank. It is currently owned by six institutions: Belgolaise Bank (35%), Social Security & National Insurance Trust (33.09%),  Holding Cofipa (10%), FMO (10%), African Tiger Mutual Fund (6%), and Ghana Reinsurance Company Limited (5.91%). TTB, a licensed commercial bank, provides general financial intermediation with a focus on universal retail banking services. The core of TTB’s customers are in the middle-tier market segment, mainly SMEs and corporate bodies engaged in sectors such as agribusiness, commerce, construction, manufacturing and services. TTB is considered one of the most SME friendly banks in Ghana. It is also one of the few banks that lends to schools; its school portfolio is currently the best performing of all its SME sectors.
## Approved: 2007-05-11
## Signed: 2007-05-17
## enterpriseBase: Not found or country name unknown
## totalProjectCost: This transaction would be structured as a risk-sharing guarantee with TTB. IFC’s guarantee will cover 50% of the principal credit losses that are in excess of a 5% first loss threshold up to a maximum of 22 billion cedis (equivalent of $2.4 million) with respect to the credit performance of a pool of loans to schools originated by TTB. The Bank’s first-loss position would have to be fully exhausted before IFC would have to pay any guarantee claims under the program. The portfolio is expected to reach a size of 46 billion cedis ($5.1 million) over the next 24 months. Individual loan sizes are expected to range from 180 million cedis ($20,000) to 2.7 billion cedis ($300,000). Under the proposed structure, IFC would reimburse the Bank for half of the principal credit losses of the underlying loans (but only when credit losses exceed 5%).  IFC’s 47.5% share of the second loss guarantee represents our maximum potential liability under this structure.  IFC’s partial guarantee would be denominated in local currency. 
## riskManagement:  
## guarantee: 2.37
## loan:  
## equity:  
## URL: http://ifcextapps.ifc.org/ifcext/spiwebsite1.nsf/651aeb16abd09c1f8525797d006976ba/edfeb169024de23b852576ba000e29c4?OpenDocument
## ------------------------------
