Datasets for Computational Education

List made by Chris Piech, October 2019

World Bank education data

Hour of Code data

Code from half a million students working through Code.org's Hour of Code curriculum in 2013 (10 problems).

Code Studio data

Data from Mike and Chris's paper on Rubric Sampling.
Talk to us

KDD Cup challenge

The KDD Cup 2010 challenge asks you to predict student performance on mathematical problems from logs of student interaction with intelligent tutoring systems.

Duolingo dataset


Coursera ML LogReg dataset

Data from the Codewebs paper and from Functional Variability of a Million MOOC Submissions.
Talk to us

PyramidSnapshot dataset

Citizenship Test questions

Google Trends

GitHub JavaSmall

Chris downloaded the public Java repositories on GitHub, excluding the very large projects.
Talk to us

Gradescope assignment scores

Grade data from 6,607 assignments submitted on Gradescope across 2,748 different courses. Used in the paper "Grades Are Not Normal."
Talk to us

AI2 Reasoning Challenge (ARC)

A large collection of grade-school science questions.

Want more?

Talk to Chris! He is happy to help you make a data request.

© Stanford 2019 | Website designed and made by Chris Piech.