AI-proof your questions. Integrate AI into the task. Reform your grading with oral defense and process artifacts. A complete, research-backed playbook for the modern classroom.
The goal isn't to outrun AI — it's to assess what AI can't do: your students' reasoning, voice, judgement, and ability to defend their choices in real time. This masterclass shows you how, in 5 practical modules.
The single most important move: tell students exactly how much AI is allowed — on each task, in writing, before they start. "No AI" and "AI everywhere" are both policies. Ambiguity is not.
Based on the framework by Perkins, Furze, Roe & MacVaugh (2024). Pick one level per assessment and name it on the task sheet.
Assessment completed entirely without AI assistance. Protects foundational skill demonstration.
Formative checks, foundational literacy/numeracy, invigilated exams, early-unit diagnostics.
In-class, handwritten, or on a locked-down device. Never set this as a take-home task.
AI for ideation, outlines, feedback on drafts. The final product must be the student's own words.
Brainstorming, structuring arguments, grammar checks, summarising source material.
Not permitted: generating sentences or paragraphs that appear in the final work. Students must submit a log of their prompts.
AI co-authors specific sections. Student directs, edits, and is accountable for every claim.
AI-generated drafts the student substantially rewrites, code scaffolds, translation, image generation.
Citation of AI use + reflection on what was accepted, rejected, and revised.
AI does most of the production. Student is assessed on prompting, curation, and critique.
Prompt quality, selection between AI outputs, fact-checking, ethical judgement.
"Use AI to draft 3 marketing pitches — critique which is strongest and why, in 400 words of your own."
Students build with AI — custom GPTs, agents, multi-tool workflows — as the subject of assessment itself.
Capstone projects, EPQ, IB extended essay, senior design, AP research.
Originality of the system designed, not the AI output it produces.
No AI permitted. Use of AI is academic misconduct. Often used for in-class exams and foundational skill checks.
AI permitted for specified steps only — brainstorming, editing, language help. All uses must be cited and reflected on.
AI use expected and encouraged. Graded on how well students direct, critique, and build upon AI outputs.
Never leave AI policy to "department norms" — put the traffic-light colour, a one-sentence explanation, and the citation requirement on every single assignment brief. Students should not have to guess.
AI-detection software is unreliable and biased against multilingual students. Stop trying to catch AI. Design tasks where AI can't silently do the work for students in the first place.
Require references to a specific Tuesday lab, Mr. Ahmed's anecdote, or slide 14 of the deck. AI can't know what happened in your room.
Tie prompts to your city, school sports day, yesterday's headline, or data your students collected themselves.
Grade the revision history — Google Docs version log, drafts, marginal notes — not just the final file.
Students take home a draft; next class they revise it in the room based on peer feedback. Stage 2 is the graded one.
For high-stakes skill checks: return to paper. Pair with short, frequent assessments rather than one big exam.
Ask for a physical model, labelled sketch, whiteboard photo, or hand-annotated diagram as a required component.
"Connect this concept to a decision you made last term." An LLM can't convincingly fabricate lived experience.
Ask for direct quotes with page numbers from specific texts. Fabricated ("hallucinated") citations are the #1 AI tell.
Base prompts on last month's news, a recent school event, or data you generated yesterday. LLMs fumble on very recent context.
Any written submission is followed by a 3-5 min interview. See Module 4 for the full viva voce playbook.
"Why did you choose approach A over B?" AI produces answers; students need to defend choices.
Group projects with individual contribution tracking (Docs suggestion mode, GitHub commits). Each student defends their slice.
Tools like Turnitin's AI detector and GPTZero produce false positives — especially against non-native English writers and neurodivergent students. Use them only as a conversation starter, never as proof of misconduct.
Some of the most powerful assessments in 2026 require AI. The skill being tested is not production — it's direction, critique, and judgement. Here are 9 assessment patterns to use today.
The patterns below are not workarounds. They demand more critical thinking than the essays they replace — students must evaluate, refute, fact-check, and reason against a confident, articulate adversary on every task. Designed well, AI doesn't kill critical thinking. It puts students in a daily debate with one of the most persuasive arguers on the planet.
Give students an AI-generated essay, code, or lab report with deliberate errors. They annotate errors, explain why they're wrong, and rewrite correctly.
Domain knowledge, critical reading, and error detection — higher-order Bloom's.
Students submit a problem + their best prompt + the resulting AI output + a justification of why this prompt worked better than a naive one.
Clarity of instruction, task decomposition, specificity — real professional skills.
Generate 3 AI outputs on the same problem. Students rank them, justify the ranking against a rubric, and improve the best one.
Evaluative judgement, rubric literacy, revision skills.
Students use AI to produce a research brief, then must verify every single citation against primary sources — flagging hallucinations with evidence.
Information literacy, source evaluation, epistemic caution.
Students submit their full conversation with the AI as the primary artefact — you grade the questions they asked, how they pushed back, and what they rejected.
Metacognition, iterative thinking, learning-by-dialogue.
Students design a custom GPT / Gem / project that teaches a younger student, tutors a concept, or solves a niche problem. Grade the system design and instructions.
Pedagogical thinking, systems design, audience awareness, original application.
Student picks a position. AI is prompted to argue the strongest possible counter. Student submits the full debate transcript + a 200-word post-mortem on which AI arguments forced a concession, which they rebutted, and where the AI used sophistry.
Argumentation, intellectual humility, rebuttal craft, recognising fallacies.
Students systematically probe an AI for failure modes — surfacing one verifiable hallucination, one bias (gender, cultural, geographic), and one factual error in their subject area, with primary-source evidence for each. This is the work AI labs literally pay people to do.
Domain expertise, ethical reasoning, evidence collection, healthy scepticism.
Project the AI on the board. Give it the assignment live. Students annotate the streaming output in real time — flagging weak claims, missing evidence, biased framing, and great phrasing. Run it as a 10-minute timed sprint; the student who flags the most defensible issues wins.
Speed of critical reading, content mastery, recognition of AI tells (over-confidence, fake citations, vague hedging).
If your task requires AI, you must provide equal access. Use school-licensed tools, provide shared accounts, or give in-class time. Never assume students have paid plans at home.
Borrow from Italian universities, PhD defenses, and IB orals: the viva voce (Latin for "living voice") — a short interview where students defend their work aloud. It's the single most effective anti-AI measure — and it turns weak work into a coaching moment.
Student sits with you (or a panel), you ask unprepared questions about their submitted work, and you grade the conversation — not just the paper.
Even if AI wrote the essay, the student must understand it to defend it. A 60-second follow-up question separates learning from outsourcing.
Treat the written submission as the starter. The viva makes the mark. Students who can't defend their work don't own it.
Pick 3-5 of these at random per student. Keep the tone warm but precise.
| Criterion | Emerging (1-2) | Proficient (3) | Exemplary (4) |
|---|---|---|---|
| Ownership of content | Recites memorised phrases; confused when probed. | Explains main ideas confidently in own words. | Re-explains, reframes, and improves on the fly. |
| Evidence & sourcing | Cannot locate or justify sources. | Names sources and summarises them accurately. | Weighs sources against each other; flags weaknesses. |
| Response to challenge | Becomes defensive; repeats earlier claims. | Accepts feedback; adjusts position reasonably. | Generates counterarguments unprompted; thinks aloud. |
| AI transparency | Vague or evasive about AI use. | Names tools and steps where AI was used. | Explains what AI got wrong and how they fixed it. |
5-10 min. Best for capstones, IB orals, senior projects.
3-min walkthrough of their work + one surprise question by reply.
Student defends to 3 peers using your rubric. Scales to 30+ students.
Pull 5 students at random for a 2-min mini-viva after each submission.
Don't try to viva every student on every task. Viva one in three, randomly chosen after submissions are in. The possibility of being called changes the behaviour of the whole class.
Every time a student uses AI, they should do two things: cite it (like a source) and reflect on it (like a co-author). Reflections themselves must be AI-proofed — otherwise students will just ask the AI to write the reflection.
In-text: (OpenAI, 2026). Include the prompt in an appendix.
Quote the prompt, name the tool version, date it, link it.
A structured 150-word statement students attach to their work. Use these four prompts:
Paste your best 2-3 prompts verbatim.
Which AI suggestions made it into the final piece unchanged?
Name one thing the AI got wrong, misleading, or biased.
One sentence on how using AI changed your thinking about the topic.
Students will absolutely use AI to write an "AI reflection" if you let them. Use one of these four strategies:
Take the first 10 minutes of the lesson after the submission is due. Students handwrite the reflection on an index card. No devices.
Student records a 60-second Flip/Loom video answering the four prompts — verbal fillers and pauses reveal real thinking.
Students submit dated screenshots of their AI chat(s). Chat history on ChatGPT, Claude, and Gemini is timestamped — very hard to fake.
Students interview each other for 3 minutes using the four prompts, then write each other's reflection. Surfaces understanding, not just output.
Everything from this masterclass, on one card. Put it above your desk while you redesign your assessments.
Declare AI level (1-5) and traffic-light colour on every task.
Anchor tasks to this week's class, local context, and personal experience.
Assess prompting, critique, and fact-checking — not production.
Viva voce 1-in-3 students at random. Weight it 30-50% of the grade.
Cite AI + 150-word handwritten reflection on ask / accept / reject / learn.