Datasets for Computational Education

List made by Chris Oct 2019


World Bank education data:

Dataset: https://www.kaggle.com/theworldbank/education-statistics

Code.org Hour of Code data:

Code from half a million students learning on Code.org's Hour of Code curriculum in 2013 (10 problems).
Dataset: https://code.org/research

Code.org Code Studio data:

Data from Mike + Chris' paper on Rubric Sampling
Talk to us

KDD Cup challenge

The challenge asks you to predict student performance on mathematical problems from logs of student interaction with intelligent tutoring systems.
Dataset: https://pslcdatashop.web.cmu.edu/KDDCup/

Duolingo dataset

Dataset: http://sharedtask.duolingo.com/

Coursera ML LogReg dataset

Data from the Codewebs paper and the paper on Functional Variability of a Million MOOC submissions.
Talk to us

PyramidSnapshot dataset

Dataset: http://stanford.edu/~cpiech/pyramidsnapshot/challenge.html

Citizenship Test questions

Dataset: https://www.microsoft.com/en-us/download/details.aspx?id=52397

Google Trends

Dataset: trends.google.com

GitHub JavaSmall

Chris downloaded the public Java repositories on GitHub (minus the massive projects).
Talk to us

Gradescope assignment scores

Grade data from 6,607 assignments on Gradescope, submitted to 2,748 different courses. Used in the Grades are not Normal paper.
Talk to us

AI2 Reasoning Challenge (ARC)

Many science questions.
Dataset: http://data.allenai.org/arc/
Code: https://github.com/allenai/aristo-mini

Want more?

Talk to Chris! He would be happy to help you make a data request.


© Stanford 2019 | Website designed and made by Chris Piech.