
Data Analytics for Finance I & II


Gruber P.

Course director



Prerequisites: Programming in Finance and Economics I; statistics at master's level.
The R programming language (together with a bit of SQL and Linux) will be used for most of this course. Students are free to use other languages for the assignments.


Tukey (1962) defines Data Analysis to be “Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”

The goal of this course is to provide the students with the tools and thinking framework to accomplish these tasks with the help of the computer along the entire tool chain of financial data science: from obtaining data to organizing and merging it to analyzing and visualizing it.

Description / Program
The course is organized in weekly sessions of four hours. There are two parts:

Part 1: Financial Data – first half of the semester

  1. Introduction to Financial Data Science
    Nature and theory of data, data generating process, measurement, data types
  2. Obtaining Financial Data
    Standard datasets: Compustat, CRSP, OptionMetrics, Markit, FactSet, Bloomberg
    Alternative datasets: Quandl, microblogging, search engines, blockchain
    Historic datasets
    How to find or create missing data
    A recap of data APIs
  3. The problems with data
    Data encodings and documentation
    Quality and consistency checks
  4. The economics of data
    Flow vs. stock, P vs. Q data
    The data business
  5. Database design
    Design principles for databases
    Dimensioning a database server
  6. Data merging and relational databases
  7. Exploratory data analysis
    Robust and non-robust statistics
    Aggregation, subsampling
    Introduction to Nonparametric statistics
  8. Advanced datacentric econometrics
    Assumptions about data and their verification (seasonality, ADF, out-of-sample R^2, …)
    Advanced methods for dealing with problems/limitations in the data (bootstrap, MIDAS, …)
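
As a small illustration of the "robust and non-robust statistics" topic in session 7, the sketch below compares the mean and standard deviation with the median and the median absolute deviation (MAD) on a series that contains one erroneous value. The return figures are hypothetical and made up for this example, and the helper `mad` is not part of the course material. Python is used here only for illustration; the course itself works mainly in R.

```python
import statistics

# Hypothetical daily returns in percent; the last value is a deliberate data error.
returns = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 25.0]


def mad(values):
    """Median absolute deviation: a robust measure of spread."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


# Non-robust statistics: both are pulled strongly by the single outlier.
mean_value = statistics.mean(returns)
stdev_value = statistics.stdev(returns)

# Robust statistics: both stay close to the bulk of the data.
median_value = statistics.median(returns)
mad_value = mad(returns)

print(f"mean={mean_value:.3f}  stdev={stdev_value:.3f}")
print(f"median={median_value:.3f}  MAD={mad_value:.3f}")
```

The single bad observation dominates the mean and the standard deviation, while the median and the MAD barely react to it. This is why robust statistics are a useful first check during exploratory data analysis.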

Part 2: Data visualization (not only) for financial data – second half of the semester

  1. Visualization theory
    Perception and aesthetics, color, the grammar of graphics
  2. Static visualizations
    Bar charts, scatter plots, pie charts, line charts, Sankey diagrams, parallel coordinate plots
  3. Statistical visualizations
    Box and violin plots, Q-Q plots, histograms, tree maps, forest plots, autocorrelograms, Lorenz curves, Venn diagrams
  4. Data maps
    Dot distribution maps, heat maps, choropleths and alternative maps: cartograms, grid and hexagon maps, statebins
  5. Interactive visualizations
    Basic web technology, user interaction, R Shiny
  6. Visualizations in R
    ggplot2 and Shiny

Additional topics (time permitting)

  • The data economy: data as raw material and product, business models, licensing, open data
  • Data in research: sharing and publishing data sets, case studies in the value of (new) datasets
  • Managing a data science project
  • Storytelling with data
  • Alternative plots: Chernoff faces, trees and dendrograms
  • Copyright, GDPR (privacy)

Learning Method / Style of Lessons

This course will take students from theory to practice in three steps. New topics are introduced in short lectures, which are followed by learning-by-doing in PC labs. Students finally apply their new knowledge in individual work, which is collected and presented in a student portfolio.

Exam Style
40% – Written midterm exam
60% – Portfolio
Students create portfolios of about 20 pages from their individual work throughout the semester, including:

  • Discussions of data sets and/or methods
  • Discussions of a paper from the literature
  • Data visualizations

Requested Material

Students are required to bring a laptop with R and RStudio installed to every class. Students will also benefit from continuing to use the Linux data server that they set up in Programming in Finance and Economics II.


Tukey, J.W. (1962): The Future of Data Analysis, The Annals of Mathematical Statistics 33(1), pp. 1-67