Frontiers: AI-Based Privacy Preservation in Education
Welcome to Frontiers - a series where we bring top researchers, engineers, designers, and leaders working at the cutting edge of various fields to go deep on their work with the Manifold Community.
For this talk, our speaker was Langdon Holmes. Langdon is a Ph.D. student at Vanderbilt University, specializing in language learning and educational technology. His research focuses on learner language development, learning analytics, and open science. His master's thesis explored non-adjacent word combinations known as collocations. He is also part of the learning analytics team at the National AI Institute for Adult Learning and Online Education (AI-ALOE).
Check out more of Langdon's work on his website:
https://langdonholmes.info/
Abstract
Education is increasingly taking place in technologically mediated settings, making it easier to collect student data that would provide significant value to the educational research community. However, much of this data is not available due to concerns about protecting student privacy. Deidentifying that data would be, in some cases, sufficient to permit data sharing among researchers and even public release. However, manual deidentification at scale is extremely time-consuming and difficult. Automated deidentification is a promising solution but continues to be challenging for unstructured data such as student writing. In this talk, I will discuss our efforts to develop automated deidentification systems that are appropriate for student writing. I will then introduce a new open-source dataset called the Cleaned Repository of Annotated Personally Identifiable Information (CRAPII), which includes over 20,000 student essays that have been annotated for personally identifiable information (PII). In order to promote the development of automated deidentification methods, we recently hosted an open data science competition in which over 2,000 teams of data scientists competed to develop deidentification algorithms using CRAPII. I will close by sharing the results from the competition and highlighting possible future directions for the deidentification of student writing.
We’re growing our core team and pursuing new projects. If you’re interested in working together, see our website for active initiatives and open positions, join the conversation on Discord, and check out our GitHub.
If you want to see more of our updates as we work to explore and advance the field of Intelligent Systems, follow us on Twitter and LinkedIn!