
Ofsted reform should focus on inspection reliability first

Our new research casts doubt on the reliability of 'Inadequate' Ofsted judgements, explains Sam Sims, but any attempt at reform should aim to understand this better


2 Feb 2023, 17:30


School inspections add value to data-driven school performance metrics by sending an experienced educator to collect first-hand evidence from inside a school. The human element of an Ofsted visit is a feature, not a bug.

But each inspector comes with their own unique set of experiences and priorities. This can lead to inconsistency. Two inspectors might reach different conclusions about the same school.

Given that perfect reliability is not desirable, how much reliability should we expect?

The American Educational Research Association argues that the higher the stakes of an assessment, the more reliable it should be. Big decisions require reliable judgements.

It is well known that Ofsted ‘Inadequate’ judgements can lead to school closures or heads losing their jobs. So when it comes to the lowest Ofsted judgement, we should expect high reliability.

Christian Bokhove, John Jerrim and I have just released new Nuffield-funded research comparing the judgements reached by 1,376 different inspectors across 35,751 schools between 2012 and 2019.

We found that primary schools assigned a female lead inspector are around one-third more likely to receive an ‘Inadequate’ judgement. Just under 6 per cent of judgements reached by female inspectors were ‘Inadequate’, compared with 4.5 per cent of those reached by male inspectors.
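The "one-third more likely" figure is simply the ratio of the two raw rates. The short Python sketch below shows that arithmetic. The function name `relative_likelihood` is hypothetical, and 6.0 is used as an upper bound because the exact rate for female inspectors is only given as "just under 6 per cent". Note that this raw ratio does not reproduce the study's adjusted comparison, which also accounts for prior rating, exam results, absence, intake and region.

```python
def relative_likelihood(rate_a: float, rate_b: float) -> float:
    """Return how many times more common an outcome is in group A than in group B.

    Hypothetical helper for illustration; rates can be in any unit (here, per cent)
    as long as both use the same one.
    """
    if rate_b <= 0:
        raise ValueError("baseline rate must be positive")
    return rate_a / rate_b


# Assumed inputs: 6.0 is an upper bound for "just under 6 per cent".
female_rate = 6.0  # per cent of judgements 'Inadequate', female lead inspectors
male_rate = 4.5    # per cent of judgements 'Inadequate', male lead inspectors

ratio = relative_likelihood(female_rate, male_rate)
print(f"Relative likelihood: {ratio:.2f}")               # 1.33
print(f"Percentage increase: {(ratio - 1) * 100:.0f}%")  # 33%
```

A value just below 6, such as 5.9, gives a ratio of roughly 1.31, which is still "around one-third" higher.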

Maybe female inspectors tend to get sent to weaker schools? But we found that this pattern held even when we compared male and female inspectors sent to inspect schools with the same prior Ofsted inspection rating, exam results, levels of pupil absences, pupil intake, and in the same region of the country.

‘Inadequate’ judgements may not be reliable

Of course, we can’t definitively establish that there were no differences between the schools to which male and female lead inspectors were assigned. Maybe there were subtle differences – visible to the inspectors, but not in our data.

The only way to definitively establish the reliability of Ofsted inspections is to send two Ofsted inspectors to the same school, and check whether they agree. Indeed, you may remember that Ofsted did just such a study back in 2016 and found that the two inspectors tended to agree.
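To make the two-inspector-one-school idea concrete, the Python sketch below computes the simplest possible summary: the share of schools where both inspectors reached the same grade. The function name and the five schools' judgements are hypothetical, made up for illustration only, and are not data from Ofsted or from our study. Raw percentage agreement also ignores the agreement two inspectors would reach by chance, which measures such as Cohen's kappa are designed to correct for.

```python
from typing import Sequence


def percent_agreement(first: Sequence[str], second: Sequence[str]) -> float:
    """Share of schools (0 to 100) where both inspectors gave the same grade.

    Hypothetical helper: position i in each sequence is the same school.
    """
    if len(first) != len(second):
        raise ValueError("each school needs a judgement from both inspectors")
    if not first:
        raise ValueError("no schools to compare")
    matches = sum(a == b for a, b in zip(first, second))
    return 100 * matches / len(first)


# Made-up judgements for five schools, for illustration only
inspector_1 = ["Good", "Good", "Inadequate", "Requires Improvement", "Good"]
inspector_2 = ["Good", "Requires Improvement", "Inadequate", "Requires Improvement", "Good"]

print(percent_agreement(inspector_1, inspector_2))  # 80.0
```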

But this research had some important limitations. Crucially, the inspected schools were all previously rated ‘Good’, meaning they were subject to a short inspection in which the presumption was that they remained ‘Good’ unless proven otherwise. The inspections were also conducted by more senior inspectors, known as HMIs.

At the time, Amanda Spielman described this study as a “first step” and said that Ofsted should “routinely be looking at issues of consistency and reliability”. Ofsted has conducted a range of research since then, but none of it has repeated the gold-standard two-inspector-one-school design.

Above all, there has been no research on the critical ‘Inadequate’ judgements. These are big decisions, yet we have no evidence that they are reliable. Indeed, our new research provides some evidence that they may not be.

Spielman’s term as Chief Inspector comes to an end in January 2024. And current polling suggests the government may lose power in the general election soon after. This creates a window of opportunity for modernising Ofsted. But what should be done?

Labour has recently dropped its Corbyn-era policy of abolishing Ofsted, promising instead to reform the inspectorate and focus it more directly on school improvement. Retaining Ofsted will likely be popular with parents. But Bridget Phillipson was heckled by teachers when she announced the plan at a union conference this week.

I would advise the shadow secretary of state to announce a series of new Ofsted reliability studies. These should use the gold-standard two-inspector-one-school methodology, and there should be four of them, one for schools in each of the four inspection grades: ‘Outstanding’, ‘Good’, ‘Requires Improvement’ and ‘Inadequate’.

This would likely be popular with teachers, who demand to know whether the methods by which they are held to account are reliable. It should also appeal to parents, who would learn how much weight to place on inspection judgements.

Importantly, the results would also provide the information policymakers need to make an informed decision about whether we have struck the right balance between the consequences of inspections and their reliability.


One comment

  1. “Big decisions require reliable judgements.”

    The same is true of GCSE, AS and A level grades, for which reliability is even more important: being awarded the wrong grade can be life-changing. That happened in August 2022 to about 23,000 students who received certificates showing grade 3, a fail, when, had a senior examiner marked their scripts, they would have been awarded grade 4, a pass.

    Unreliable grades do great damage, as discussed in FE Week a few days ago https://feweek.co.uk/gcse-re-sits-wrong-grades-drain-students-and-resources/