
Data Analytics for Finance I & II


Gruber P.

Course director


Prerequisites
Programming in Finance and Economics I, Statistics at master level
The R programming language (together with a bit of SQL and Linux) will be used for most of this course. Students are free to use other languages for the assignments.

Tukey (1962) defines data analysis as “Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”
The goal of this course is to provide students with the tools and the conceptual framework to carry out these tasks.

Description / Program
The course is organized in weekly four-hour sessions. It has two parts:

Part 1: Financial Data – first half of the semester

  1. The provenance and nature of financial data
    Random variables, measurement, data generation, data types
  2. Managing data
    Designing a financial database, ETL, data encodings, quality checks, relations, data at scale, documentation
  3. Preparing datasets
    Merging and aggregating data
  4. Standard data sets and sources in finance
    Compustat, CRSP, OptionMetrics, Markit, FactSet, Bloomberg
  5. Alternative and historic data sets
    Quandl, microblogging, search engines, blockchain
  6. Exploratory data analysis
    Robust and non-robust descriptive statistics, hypothesis generation, verification of assumptions
  7. Advanced statistical methods
    Dealing with non-rectangular and non-numerical data

Part 2: Data visualization (not only) for financial data – second half of the semester

  1. Visualization theory
    Perception and aesthetics, color, the grammar of graphics
  2. Static visualizations
    Bar charts, scatter plots, pie charts, line charts, Sankey diagrams, parallel coordinate plots
  3. Statistical visualizations
    Box and violin plots, Q-Q plots, histograms, tree maps, forest plots, autocorrelograms, Lorenz curves, Venn diagrams
  4. Data maps
    Dot distribution maps, heat maps, choropleths and alternative maps: cartograms, grid and hexagon maps, statebins
  5. Interactive visualizations
    Basic web technology, user interaction, R Shiny
  6. Visualizations in R
    ggplot2 and Shiny

Additional topics (time permitting)

  • The data economy: data as raw material and product, business models, licensing, open data
  • Data in research: sharing and publishing data sets, case studies in the value of (new) datasets
  • Managing a data science project
  • Storytelling with data
  • Alternative plots: Chernoff faces, trees and dendrograms
  • Copyright, GDPR (privacy)

Learning Method / Style of Lessons
This course takes students from theory to practice in three steps. New topics are introduced in short lectures, followed by learning-by-doing in PC labs. Finally, students apply their new knowledge in individual work, which is collected and presented in a student portfolio.
Compliant with COVID-19 guidelines 

Exam Style
40% – Written midterm exam
60% – Portfolio
Throughout the semester, students compile a portfolio of approx. 20 pages from their individual work, including:

  • Discussions of data sets and/or methods
  • Discussions of a paper from the literature
  • Data visualizations

Class participation is a mandatory component of the course grade.

Requested Material
Students are required to bring a laptop with R and RStudio installed to every class. Students will also benefit from continuing to use the Linux data server that they set up in Programming in Finance and Economics II.

Bibliography
Tukey, J.W. (1962): The Future of Data Analysis. The Annals of Mathematical Statistics, 33(1), pp. 1-67.
Tukey, J.W. (1977): Exploratory Data Analysis. Reading, MA: Addison-Wesley.
Additional resources will be provided in the first class.