World Bank education data:

Dataset: Hour of Code data:

Code from half a million students learning on Code.org's Hour of Code curriculum in 2013 (10 problems).

Dataset: Code Studio data:

Data from Mike + Chris' paper on Rubric Sampling.

KDD Cup challenge

This year's challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems.

Duolingo dataset


Coursera ML LogReg dataset

Data from the Codewebs paper and Functional Variability of a Million MOOC submissions.

PyramidSnapshot dataset

Citizenship Test questions

Google Trends

GitHub JavaSmall

Chris downloaded the public Java repositories on GitHub (minus the massive projects).

Gradescope assignment scores

Grade data from 6,607 assignments on Gradescope, submitted to 2,748 different courses. Used in the "Grades are not Normal" paper.

AI2 Reasoning Challenge (ARC)

Many science questions.

Code:

