By Steven Just, Ed.D.
Validating exams is a critical part of the learning, assessment, and certification process. Creating a valid exam involves several steps. The first is defining the scope of the exam (content validity). Once the scope is defined, we then need to write questions that map to that content.
Frequently, for high-stakes life sciences exams, that scope is the PI (prescribing information) for one of the company’s products. Often, by the time a PI exam is being validated, the questions already exist, typically written by a supplier or a medical writer.
Writing valid questions isn’t rocket science, but it is a science, and you would be amazed (or perhaps not) at how many question writers have not mastered that science. So, the first task is often rewriting the existing questions to conform to the rules of valid question writing.
Enter ChatGPT
Over the past few months, like many of you, I have been reading about the astonishing capabilities of the artificial intelligence chatbot ChatGPT. It can write essays and poetry, score in the top 10% on the bar exam, and more.
So, I was curious: Can ChatGPT write valid multiple-choice questions? Has it mastered the science of question writing?
I selected a product (no need to name it) and asked ChatGPT to write a valid question about its indications. Here’s what it came up with:
What is “Product 1” used to treat?
- Breast cancer
- Lung cancer
- Prostate cancer
- Leukemia
I then asked it to generate a question about the dosing schedule of another product. Its question:
What is the recommended dosing schedule of “Product 2” for the treatment of “indication”?
- 2,500 IU/m² intravenous infusion once a week
- 10,000 IU/m² intravenous infusion once every two weeks
- 25,000 IU/m² intravenous injection once a week
- 25,000 IU/m² intravenous infusion once every three weeks
Note: I didn’t seed the question with the indication. ChatGPT selected one of the product’s indications on its own.
The Test Results
My reaction? These are pretty good questions. Perfectly valid. For example:
- In question 1, all choices are plausible.
- In question 2, the numeric choices are in ascending order.
- In both questions, the stems are short and direct, with no superfluous information.
All these attributes conform to the rules of valid question writing.
Admittedly, these are fairly basic questions, written at the lowest level of Bloom’s Taxonomy. But the truth is that most PI exam questions are written at that level.
Can ChatGPT write higher level questions? More experimentation will help us answer that question.
Or maybe we can just ask ChatGPT.
Steven Just Ed.D. is the CEO and principal consultant of Princeton Metrics. Dr. Just can be reached at sjust@princetonmetrics.com.