How do students respond to AI-generated questions?

The use of artificial intelligence (AI) tools in higher education has skyrocketed in the past year, but what does the research tell us about how students are responding to this technology?

VitalSource’s team of learning scientists recently conducted the largest known empirical evaluation of AI-generated questions to date, using data collected from nearly one million unique questions, more than 300,000 students, and over seven million total question attempts between January 2022 and May 2023. VitalSource, a leading education technology solutions provider, sought to study the performance of automatically generated (AG) questions at large scale to better understand emerging patterns in student behavior, laying the groundwork for the future of AG formative questions.

This paper, which was peer-reviewed and presented in 2023 at the 24th International Conference on Artificial Intelligence in Education, evaluated the performance of five types of AG questions to learn more about how students interact with different question types on tests and homework. The five question types (fill-in-the-blank, matching, multiple choice, free response, and self-graded submit-and-compare) were incorporated into digital textbooks using VitalSource’s free, AI-powered learning tool, Bookshelf CoachMe®.

Bookshelf CoachMe incorporates all five question types to provide variation in how students practice and process new content knowledge. This process, known as formative practice, gives students immediate feedback and unlimited answer attempts, and is known to increase learning gains when incorporated into the primary learning material.[1][2]

Key findings from VitalSource’s study:

  1. Question type is related to difficulty: recognition-type questions are generally easier than recall-type questions.
  2. Only about 12% of students input “non-genuine” responses to fill-in-the-blank questions, and nearly half of those students persist in answering until they enter the correct response.
  3. In a classroom environment, the difficulty index for every question type increases relative to the aggregated data, and all persistence rates exceed 90%, indicating that students behave differently when formative practice is incorporated into course expectations.
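To make the metrics in these findings concrete, here is a minimal sketch of how a difficulty index and a persistence rate could be computed from a log of question attempts. The field names, sample data, and metric definitions (difficulty as the share of first attempts answered correctly, persistence as the share of initially incorrect students who eventually answer correctly) are illustrative assumptions, not VitalSource's actual pipeline.

```python
from collections import defaultdict

# Hypothetical attempt log: (student_id, question_id, attempt_number, is_correct)
attempts = [
    ("s1", "q1", 1, False), ("s1", "q1", 2, True),  # s1 misses, then persists
    ("s2", "q1", 1, True),                          # s2 correct on first try
    ("s3", "q1", 1, False),                         # s3 gives up after one miss
]

def question_metrics(attempts):
    first_try = {}                     # (student, question) -> correct on attempt 1?
    ever_correct = defaultdict(bool)   # (student, question) -> eventually correct?
    for student, question, n, correct in attempts:
        key = (student, question)
        if n == 1:
            first_try[key] = correct
        ever_correct[key] |= correct

    # Difficulty index (classical p-value): share of first attempts correct.
    difficulty = sum(first_try.values()) / len(first_try)

    # Persistence: of students who missed the first attempt, the share who
    # kept answering until they got it right.
    missed = [k for k, ok in first_try.items() if not ok]
    persistence = (
        sum(ever_correct[k] for k in missed) / len(missed) if missed else 1.0
    )
    return difficulty, persistence

difficulty, persistence = question_metrics(attempts)
```

With this toy log, one of three students answers correctly on the first attempt (difficulty index of about 0.33), and one of the two students who missed persists to a correct answer (persistence rate of 0.5).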

“This research helps set benchmarks for performance metrics of automatically generated questions using student data and provides valuable insight into student learning behavior,” said Rachel Van Campenhout, EdD, Senior Research Scientist at VitalSource and author of the paper. “These analyses help us to continually improve our learning tools for students.”

VitalSource’s award-winning, AI-powered study coach, Bookshelf CoachMe, launched in 2021. Based on the proven learning science principle known as the Doer Effect, its AI-generated questions help students study more effectively, build confidence, and deepen subject matter expertise. Its Content Improvement Service monitors and evaluates response data and automatically replaces underperforming questions; it is the first system of its kind to continually improve formative practice in real time.

To learn more, read about Bookshelf CoachMe here. For the science behind Bookshelf CoachMe, visit the VitalSource Learning Science Research Center. Information about VitalSource is available on the company’s website.


  1. Koedinger, K., McLaughlin, E., Jia, J., & Bier, N. (2016). Is the doer effect a causal relationship? How can we tell and why it’s important. Learning Analytics and Knowledge, 388–397.
  2. Van Campenhout, R., Jerome, B., & Johnson, B. G. (2023). The Doer Effect at Scale: Investigating Correlation and Causation Across Seven Courses. The 13th International Learning Analytics and Knowledge Conference (LAK 2023), 357–365.

