Exploratory Analysis of the Rossmann store sales data from Kaggle

14 minutes read
[Figure: a scatterplot with smoothing]

This is an exploratory analysis in R of the Rossmann Store Sales data, which is available on Kaggle. The data set isn’t huge, but the speedup from using data.table is noticeable. It is also nice to work with unmasked data, because the variables can be interpreted directly.

Read in the data:

library(data.table)
library(zoo)
library(forecast)
library(ggplot2)
test <- fread("test.csv")
train <- fread("train.csv")
store <- fread("store.csv")

Let’s have a first look at the train set:

str(train)
## Classes 'data.table' and 'data.frame':   1017209 obs. of  9 variables:
##  $ Store        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ DayOfWeek    : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Date         : chr  "2015-07-31" "2015-07-31" "2015-07-31" "2015-07-31" ...
##  $ Sales        : int  5263 6064 8314 13995 4822 5651 15344 8492 8565 7185 ...
##  $ Customers    : int  555 625 821 1498 559 589 1414 833 687 681 ...
##  $ Open         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Promo        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ StateHoliday : chr  "0" "0" "0" "0" ...
##  $ SchoolHoliday: int  1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, ".internal.selfref")=<externalptr>
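The str() output shows that Date is read in as a character column. As a minimal sketch of a typical next step, the snippet below converts the dates and computes mean sales per day of week with data.table. It uses a small made-up table (toy) that only mirrors some of the column names above, not the actual data, and the aggregation is an illustration rather than the analysis carried out in the post.

```r
library(data.table)

# Toy stand-in for `train`: same column names as in the str() output,
# but the rows are invented for illustration only.
toy <- data.table(
  Store     = c(1L, 2L, 1L, 2L),
  DayOfWeek = c(5L, 5L, 4L, 4L),
  Date      = c("2015-07-31", "2015-07-31", "2015-07-30", "2015-07-30"),
  Sales     = c(5263L, 6064L, 5020L, 0L),
  Open      = c(1L, 1L, 1L, 0L)
)

# Convert the character dates to Date class, updating the column by reference.
toy[, Date := as.Date(Date)]

# Mean sales per day of week, using only rows where the store was open.
mean_sales <- toy[Open == 1, .(MeanSales = mean(Sales)), by = DayOfWeek]
print(mean_sales)
```

Filtering on Open == 1 before averaging keeps closed days, which have zero sales, from pulling the mean down.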

…and at the test set:

Read more

EBRD: A summary of spatial distribution of projects and project size

9 minutes read
[Figure: a map depicting project sizes per country]

This post summarizes the energy-related projects financed by the
European Bank for Reconstruction and Development (EBRD). The data was
downloaded from the bank's publicly available project database with a
web scraper written in R, which is also available on GitHub. You do not
need to run the scraper to reproduce the document: the resulting data is
available as projects_costscurr.csv. The code that produces the graphs
can be found in the Rmd file of this project.

Read more