Notes on Making Good Progress – Part 4

Sometimes I just get carried away a bit. I managed to get an early copy of Daisy Christodoulou’s new book on assessment, Making Good Progress. I read it, and I made notes. It seemed a shame to do nothing with them, so I decided to publish them as blogs (6 of them, as it was about 6000 words). They are only mildly annotated. I think they are fair and balanced, but you will only think so if you aren’t expecting either an effusive ‘oh, it’s the most important book ever’ or a dismissive ‘it is absolutely useless’. I’ve encountered both in Twitter discussions.



This part addresses chapters 6 and 7.

Chapter 6 describes the first of the alternative models: a model of progression. I think it makes perfect sense to link summative and formative assessments, and I also applaud the suggestion that textbooks, or even digital textbooks, could play a larger role in the English curriculum. Here I have been influenced by my Dutch background, where using textbooks (for maths, for example, my subject) is quite normal; there also is ample research on textbooks from other countries. ‘Progression’ also seems to refer to starting with basic skills and ‘progressing’ to subsequent phases. I immediately think of (sequences of) task design, worked examples, fading of feedback, scaffolding, etc. These are all common elements of instructional design and multimedia learning, and they remain unmentioned. I think it’s good that the idea of ‘progression’ is made accessible for the average teacher, but I do wonder whether this is a missed opportunity: teachers can be helped in designing their lessons, even in the domain of assessment. This is followed by some interesting threats to validity, including teaching to the test. I thought the author’s description of a progression model made sense; I imagine it is what humans have done over the centuries while designing curricula. ‘Measuring the progression’ (p. 155) repeats the assumption that if you are interested in generic skills (I agree with Christodoulou that that’s not enough) you will grade frequently. To my mind this is a bit of a rhetorical trick to make lovers of generic skills complicit in a testing regime. It is interesting that Christodoulou mentions the word ‘multidimensional’, because later on I will point to it as one of the summative shortcomings of comparative judgement, which promotes a holistic judgement over separate elements. Of course I agree with the advice that we “need different scales and different types of assessment” (p. 159), and I also like the marathon analogy. But I do wonder what is new about that advice.

Then it’s onwards to principles for choosing the right assessments in chapter 7. To improve formative assessment, several elements are promoted: specificity, frequency, repetition, and recording raw marks. I like how multiple-choice questions are ‘reinstated’ as useful. I do think their advantages are exaggerated, especially because the subject context is disregarded, as are multiple-choice ‘guessing’ strategies. It is notable that Christodoulou goes into the latter criticism and explains how the drawbacks could be mitigated; it would have been good if this had also been done for the more subjective essays. The maths example on p. 167 is fair enough, but there technically (even with marking) is no reason not to make this an open question, which could even provide feedback. It also would be useful to distinguish between the different types of knowledge that should underpin questions. I think it is perfectly fine to give multiple-choice questions a firm place in diagnostics (or even diagnostic databases, of which there already are many), but the author could highlight the cutting-edge potential more as well. Maybe it’s most useful not to say that one question type is ‘best suited’ but simply to say that one needs to ensure that the inferences drawn from the questions are valid; in other words, their validity. ‘Validity’ seems to be a term that underpins a lot of the author’s thinking, which makes it a shame that it wasn’t treated more elaborately in chapter 3. I like how the testing effect, and Roediger and Karpicke’s work, features from page 169, as well as desirable difficulties (Bjork) and spaced and distributed practice. These are all very relevant and indeed could inform teachers in organising their assessments better.


Notes on Making Good Progress – Part 3



This part addresses chapters 4 and 5.
Chapter 4 critiques descriptor-based assessment. It is important here to distinguish between a bad implementation of a good policy and simply a bad policy. The chapter starts by describing ‘assessment with levels’. I notice that the author often uses reading examples, which in principle is fine, but the danger is that we too quickly assume the points apply to all subjects. I think the chapter does a good job of describing the drawbacks of descriptor-based systems. I do, however, feel that some of these drawbacks are no less prominent in the alternatives presented later. I also get the feeling that apples and oranges are sometimes compared in the ‘descriptive, not analytic’ section, because there is no reason not to simply do both. The comment on ‘generic, not specific’ feedback is spot on, but again there is no reason not to then do both: generic AND more specific feedback, in my opinion. Actually, throughout the book I feel that the novice/expert distinction that had so skilfully been exposed is not taken into account on many of the pages. As reviews of feedback use have shown, the type (and timing) of feedback interacts with levels of expertise. The examples of different questions seem tied to their specific goal: on page 94, for instance, the question on Stalin can be an excellent multiple-choice question on certain knowledge, but if it were more about the relationships between certain events, a multiple-choice question might give the game away too easily. The same holds for equations: multiple-choice questions do not make sense if your aim is to assess equation-solving skill, but they would make sense if you want to see whether pupils can judge the correctness of given solutions. I think there is some confusion about reliability and validity here, most prevalent in the example on fractions. Yes, the descriptor on fractions is general, but that is often part of a necessarily somewhat vague set of descriptors in a curriculum.
What Christodoulou then gives as an example (page 99) seems to be more about the validity and reliability of tests and assessments. Decades of psychometric research have provided insight into how to reliably improve assessment for summative purposes; it feels as if this is under-emphasised. Also, descriptor systems can be made more precise with mark schemes and exemplars (as, by the way, is later presented in the comparative judgement context). A pattern in the book seems to be that

  1. the author provides some good critiques of the drawbacks of existing practices,
  2. but then does not mention research on mitigating those drawbacks,
  3. nevertheless makes a case for change with a ‘solution’,
  4. but does not discuss how these solutions improve on the drawbacks and/or introduce other drawbacks.

This could lead to a situation where readers nod along with the critique but then incorrectly assume the proposed solutions will solve it. I think it is admirable to describe the challenges in this accessible way, but I would have preferred a more balanced approach. As a case in point, take the ‘bias and stereotyping’ of page 104. This is a real challenge, and rightly seen as a point to address in descriptor-based assessment. Yet, as said before, there are ways to mitigate these drawbacks. Instead, the case is made that reform is necessary, and later in the book a ‘solution’ is given that still uses teacher judgements, only ‘simpler’. But holistic judgement is not simple per se; it is only simple if you have a short, uni-dimensional judgement to make, and the condemnation of teacher judgement wasn’t about that, it was about complex judgements. In my view the proposal only ‘pretends’ to be a solution for these well-observed challenges.

Chapter 5 critically assesses another assessment type, namely exam-based assessment. The somewhat exaggerated style is exemplified by the first sentence: “we saw that descriptor-based assessment struggles to produce valid formative and summative information”. The chapter first links the exam model to chapter 3’s distinction between the quality and the difficulty model. I am not convinced by the arguments that then try to explain why exam-based (summative) assessments are difficult to use for formative purposes. Sure, they are samples from a domain, but one can simply collect all summative questions on a certain topic or subject to make valid inferences. Sure, questions differ in difficulty, but there are ways to analyse that difficulty. The comments on pages 120 and 121 are fair (it is hard to say why an answer is right or wrong), but I can’t help thinking that the ‘solutions’ provided later on with comparative judgement, which uses Rasch analysis and ‘just correct or incorrect’, suffer from the same problem (granted, they are presented as a ‘summative’ solution). With maths exams there are mark schemes, so a more fine-grained analysis *is* possible for formative purposes. The chapter *does* provide a nice insight into the difficulties regarding marking and judgement. A third problem, it is suggested, is that marks aren’t designed to measure formative progress. Again, the book asks some good critical questions but ultimately sends out the message too strongly that old practices are bad. From page 130 the author argues there are issues with the summative affordances of exams as well. I think this section, again with the fractions examples, exaggerates the ‘non-validity’ of exams. Testing agencies have developed a raft of tools to keep exams valid over the years and between samples. Once more, the challenges and difficulties are described well, but ways to mitigate them are under-mentioned. Further, the suggested ‘modular’ approach is good, but is it really new?
The next four chapters are about alternative systems.


Notes on Making Good Progress – Part 2




This part addresses chapters 2 and 3.

As I think the two approaches in chapter 1 are a bit of a caricature, I wonder whether this continues in chapter 2. At least there are some good examples of people who, in my view, employ an exaggerated view of the ‘generic skills’ approach. Generic skills are rooted in domain knowledge. Yet it is *not* the case that you will have to re-learn certain skills again and again in every domain. A good example from my own area of expertise, maths, is spatial/mental rotation skill: there is a (limited) amount of transfer within groups of domains, tightly linked to schema building. It is therefore unhelpful to present a binary choice here. What *is* good is to make people aware that generic-skills courses *need* some domain knowledge. The chapter uses quite a lot of quotes, some in the ‘7 myths’ style of using Ofsted excerpts. Although I like this empirical element, and I even think there’s something in some of the claims, it would have helped if the quotes felt less like cherry-picking. The fact that generic skills are mentioned is no evidence that they necessarily are a focal point of teaching. In fact, generic skills seem to be presented as an almost ‘automatic’ outcome of ‘just teaching’ in the deliberate-practice approach; so even in an approach the author seems to prefer, generic skills will probably be mentioned. The chapter of course goes on to name-check ‘cognitive psychology’ and Adriaan de Groot (as in Hirsch’s latest book). It is good that this research is tabulated, and the addition of ‘not easily transferable’ already shows a bit more nuance (p. 33). Schemas are mentioned, and it is good that ‘acquiring mental models’ is put centre stage, rather than ‘less cognitive load for working memory is best’. I felt these pages showed a wide range of references, though quite dated ones. I wholeheartedly agreed with the conclusion on page 37 that ‘specifics matter’, i.e. domain knowledge.
It is telling that in discussing this, other ‘nuanced’ words appear, for example on page 38 when Christodoulou says ‘when the content changes significantly, skill does not transfer’. The interesting question then, to my mind, is when content is ‘significantly different’. My feeling is that this threshold is often set far too low by some and far too high by others. It would be good to discuss the ‘grey area’, just like the ‘grey area’ in going from novice to expert.

The section concludes with a plea for knowledge, practice etc., with which I very much agree. It becomes the prelude to a section on deliberate practice, an interesting section with a role for Ericsson’s work. Practice is extremely important; I do wonder, though, whether the distinction between performance and deliberate practice is more mixed than presented. Originally, the discussion about deliberate practice seemed to revolve around ‘effort versus talent’. This meta-review suggests there is more to becoming an expert. Yes, you practise specific tasks, but I think it is perfectly normal to also ‘perform’ early on, whether in a test, a chess game or a concert, or even to watch an expert and mimic how they do it. Not with the idea that you instantly become an expert, but with the idea that it all contributes to your path towards *more* expertise and solidifies schemas. Especially the claim that ‘performance’ places too big a burden on working memory does not seem to be supported by a lot of evidence; it is not true that you can’t learn from reflecting on ‘performance’, as many post-match analyses show. Of course, one reason for this might be that the examples all come from the performing arts and sports, arguably more constrained by the component skills leading to the ‘performance skill’, but after all it’s not me introducing these examples. The sentence at the bottom of page 41, “even if pupils manage to struggle through a difficult problem and perform well on it, there is a good chance that they will not have learnt much from the experience”, in my view plays on the semantics of ‘difficult problem’. I wonder why ‘there is a good chance’ this is the case. It also poses interesting questions regarding falsifiability: if a student does well on a post-test and shows a big gain in an experimental research setting, maybe they haven’t learnt anything; maybe they just performed well. By now, I have seen enough of the over-relied-upon Kirschner, Sweller and Clark paper.
Bjork’s ‘over-learning’ is an interesting addition. I would agree it can be good to over-learn, but we need to think about the magnitude (hours, days, weeks?), and unfortunately there is no mention of expertise reversal, where performance gets worse. On pages 42 and 43 I thought we would get to the crux (and difference) in the aims of tasks, because I agree that those are key in learning. While acquiring mental schemas the cognitive load does not have to be minimal, just as long as those schemas are taught; in assessments you don’t want the cognitive load to be too high, because then you will fail your assessment. The chapter finishes with an ‘alternative method’ as a ‘model of progression’. I am not sure why this is called an ‘alternative’, because it sounds as if it has been around for ages; it even echoes Bruner’s scaffolding (oh no!). The attention to peer- and self-assessment is interesting, but I’m not sure direct instruction methods really incorporate them, at least not in the often narrow terminology used in the edu-blogosphere, although I have seen a broadening of the definition through ‘explicit instruction’. I’m sure some will point out that this is that oft-ridiculed progressive behaviour of not understanding the definitions 😉 In sum, a useful chapter, with a bit too much of a false choice.

The start of chapter 3 puzzles me a bit, because it begins by explaining how summative and formative functions are on a continuum. I agree with that, and find it at odds with Wiliam’s foreword, in which he seemed to concede that the functions need to be separated. The chapter discusses the concepts of validity and reliability. I am not completely sure I agree with the formulation that validity only pertains to the inferences we make based on the test results, but I haven’t read Koretz. There are many types of validity and threats to validity, and I would say it *also* is important that a test simply measures what it purports to measure (construct validity); the many sides of the term deserve more discussion. The comment on sampling is an important one. With reliability, I think the example of a 1 kg bag of flour is an awkward choice, as it suggests a measure can only be reliable if, in this case, it shows 1 kg. This is not so: a scale that consistently measures, say, 100 grams over would still be a reliable scale, just not a valid one for the construct measured (mass). Reliability also isn’t an ‘aspect of validity’. When discussing unreliability it would have been helpful to be more precise about the ‘confidence bands’, and perhaps measurement error. I get the feeling that the author wants to convey the message that measurements often are unreliable, but maybe I’m wrong. I very much like the pages (p. 64) on the quality and difficulty models; I agree that both come with a trade-off between validity and reliability. There is a raft of literature on reliability and validity, of which Christodoulou chose only a little. As a whole, the chapter makes some useful links between summative and formative assessment.
However, the example on page 70 is not chosen very well (and again note that there are many long quotes from other sources; more paraphrasing would be helpful), as in my view the first example (5a + 2b) *can* be a summative question if pupils are more expert (e.g. maths undergraduates). I like how Christodoulou tries to combine summative and formative assessment in the end, but I wonder what new baggage we have picked up to make that happen.
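To make the flour-scale point above concrete, here is a minimal sketch (with invented readings) of how a measure can be perfectly reliable without being valid:

```python
import statistics

# Hypothetical repeated weighings of a 1 kg bag of flour on a scale that
# consistently reads about 100 g over: small spread, large bias.
true_mass = 1000  # grams
readings = [1099, 1101, 1100, 1098, 1102, 1100]

spread = statistics.stdev(readings)           # small -> reliable (consistent)
bias = statistics.mean(readings) - true_mass  # large -> not valid for mass

print(f"spread: {spread:.1f} g, bias: {bias:.1f} g")  # spread: 1.4 g, bias: 100.0 g
```

A tight spread with a large bias is exactly the ‘consistently 100 grams over’ scale: reliable, but not measuring the construct it claims to.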


Notes on Making Good Progress – Part 1



This part addresses the foreword, introduction and chapter 1.

I have been following the English education blogosphere for some time now. Daisy Christodoulou might be best known for her book ‘7 myths about education’ (and for winning University Challenge with her team). ‘7 myths’ was a decent book with some nice, accessible writing, especially useful because it gave knowledge a bit more attention again. Points for improvement: in my view they weren’t really 7 myths (three were variations on another myth), the empirical backing was a bit one-sided, and there was an error in quoting (revised) Bloom. But anyway: a fresh voice and some good ideas; bring it on, now in a new book on assessment.

The foreword of the book (again) is by Dylan Wiliam, best known perhaps for his ‘formative assessment’ work with Paul Black. After all the government malarkey on assessment, with ‘assessment after levels’, he rightly emphasises the timeliness of the book: schools can make new assessment systems. Of course it is telling that a book needs to address this; it could be argued, especially when a government is keen to point at top-performing PISA countries, that such an assessment system could be designed by a government. We now hear this more and more, but only after the old system was finished off, opening the way to all kinds of empirically less grounded and tested practices. The foreword ends with a statement I am not convinced by, namely that formative and summative assessment might have to be kept apart. For instance, it is perfectly acceptable to use worked examples from old summative assessments in a formative way; one could argue that both summative and formative assessments draw from the same source. In fact, in one of the promoted types of assessment, comparative judgement, one piece of advice seems to be to use exemplars so students know what teachers are looking for: a summative and formative mix.

One thing that immediately strikes me is that I love the formatting. The book has a nice layout and a good structure, even if the polygon diagrams throughout perhaps suggest more structure than there is (who hasn’t used triangles? 😉). Contrary to ‘7 myths’, each chapter really seems to tackle a separate issue, rather than the same issue in a different guise. The reference lists at the beginning are quite extensive, though for people who know the blogosphere a bit one-sided (Oates, Hirsch etc.). Later chapters have fewer references, and that is a shame, because the second half is far more constructive and less ‘this and this is bad’ (more on that later). I can agree with a lot of the criticisms in the first half, and even with the drawbacks of ‘levels’, but I am less convinced that some of the proposed alternatives will be an improvement. More evidence would have helped there.

The book starts with an introduction. Unfortunately the introduction immediately sets the tone, and in an un-evidenced way: “In the UK, teacher training courses and the Office for Standards in Education, Children’s Services and Skills (Ofsted) encouraged independent project-based learning, promoted the teaching of transferable skills, and made bold claims about how the Internet [can] replace memory.” I find that a gross generalisation. Of course I know about the Robinsons and Mitras of the world, and there probably *are* people in those organisations (and outside them) saying this, but is it rife? It is a pattern that also was apparent in ‘7 myths’. The sentences after that, with ‘pupils learn best with direct instruction’ (no: novice pupils do, and it can even backfire with better pupils, so-called expertise reversal) and ‘independent projects overwhelm our limited working memories’ (no: this depends on the amount of germane load or, if you will, element interactivity), in my view are caricatures of the scientific evidence. In debates this has often been parried with the claim that it is reasonable to simplify things this way. I’m not sure; my feeling is that this is actually how new myths take hold. Luckily, what follows is a good explanation and problem statement for the book; it is good to tackle the topic of assessment.

Chapter 1 starts with a focus on Assessment for Learning (AfL). I think the analysis of why AfL failed, partly focussing on the role and types of feedback, is a good one. Black and Wiliam themselves emphasised the pivotal role of feedback, in that it needed to lead to a change in the students’ behaviour; this did not seem to happen well enough. On page 21 it is ironic, given what follows in later chapters, that Christodoulou writes “When government get their hands on anything involving the word ‘assessment’, they want it to be about high-stakes monitoring and tracking, not low-stakes diagnostics.” I feel that when Nick Gibb embraces ‘comparative judgement’, this is exactly what is happening. The analysis then continues, on page 23, by sketching two broad approaches to developing skills: the ‘generic skills’ and ‘deliberate practice’ methods. I had the well-known ‘false dichotomy’ feeling here. By adding words like ‘generic’, and by linking one approach to ‘project-based’ learning, I felt there clearly was an ‘agenda’ to make one approach ‘wrong’ and one ‘correct’. It even goes as far as to say, on page 26, that the ‘generic skills’ method leads to more focus on exam tasks, with no real support for this supposition. Actually, some deliberate-practice methods focus on ‘worked examples’, where using, and working with, exam tasks would be reasonable. I agree that approaches should be discussed, by the way, but, as with so many discussions on the web, not in a dichotomous way if the evidence points to more nuance.


Thoughts on Comparative Judgement

This is just a quick post to collect some thoughts on (Adaptive) Comparative Judgement. Recently it seems to have gained a lot of traction in the UK education blogosphere, to the extent that even the minister and experts for the Education Select Committee are mentioning it, and the NAHT mention it too. The technique, originally due to Thurstone and, in its adaptive form, to Pollitt, is technically great, but I think the advantages might be exaggerated, certainly on a national, summative scale. Nevertheless, I hardly see any critical voices, except for this blog, which is excellent on the topic. Being an arch-sceptic, I thought I’d write down some of my thoughts.
  • Firstly, it is important to take into account the nature of the tasks being assessed, for example the subject. For some subjects, mark schemes are perfectly fine and pretty reliable (more on what we mean by that later); it is with more subjective tasks that there are challenges. So maths is perhaps less of a problem than, say, an English essay. This Ofqual review is sometimes referenced, and it gives a far more nuanced position on the reliability of marking, although it bases a lot on Meadows and Billington (2005).
  • But even then, as the Ofqual review of marking reliability shows, there are decades of procedures and actions you can take to increase validity and reliability. It seems as if the idea has taken hold that we have marked unreliably for decades. The question is not whether marking is unreliable or not, but whether the reliability was enough (note that this also requires a good conception of what reliability and validity are, and I’m not always sure of that).
  • The ‘enough’ question partly depends on what you want to do with the assessment, I guess. The higher the stakes, the more important reliability is. One could even see a ‘cut’ depending on whether you use the assessment for formative or summative purposes. I do not see enough discussion of these aspects.
  • In the discussion it also is important to say what ‘reliability’ means anyway. Agreeing on a rank (“I think A is better than B”) is different to agreeing on a quantification of the mark (“A is 80, B is 60”). To compare like with like you need to use the same type of measure.
  • Some CJ literature has shown that some of the challenges of traditional marking of course still apply: for example the influence of the length of the assessed work, or its multidimensional nature (note the potential subject differences again). The assumption that you just say which is the better work, holistically, and that this then leads to a lot of (statistical) agreement seems tricky to me. Even if your conception of the “better piece of writing” is clear enough to “go with your gut instinct” (is it? can’t groups engage in groupthink?), it still is important to know what to look for, e.g. originality, spelling, handwriting style, certainly if at some point you want to give students feedback or exemplars of higher-scoring work. This also touches on the ‘summative/formative’ issue.
  • Which leads me to think that the ‘old’ situation is painted too much as ‘not good enough’ and the new one as improving many things, in the summative sense.
  • Certainly if we take into account claims like ‘more efficient’, ‘less time’ and ‘reduces workload’, I think it’s too facile to say we can get all that AND more reliability.
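For readers unfamiliar with the machinery behind the points above: the pairwise ‘A beats B’ judgements are typically fitted with a Thurstone/Bradley-Terry-style model (adaptive CJ tools use a closely related Rasch formulation). A minimal sketch, with invented judgements and item names, of how such comparisons become a scale:

```python
from collections import defaultdict

# Invented pairwise judgements: (winner, loser) from hypothetical judges.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"),
               ("B", "A"), ("C", "B"), ("A", "C")]

items = sorted({x for pair in comparisons for x in pair})
wins = defaultdict(int)
meetings = defaultdict(int)  # times each pair was compared
for w, l in comparisons:
    wins[w] += 1
    meetings[frozenset((w, l))] += 1

# Zermelo's iterative scheme for the Bradley-Terry maximum likelihood:
# p_i <- wins_i / sum_j n_ij / (p_i + p_j), then normalise.
p = {i: 1.0 for i in items}
for _ in range(200):
    p_new = {i: wins[i] / sum(meetings[frozenset((i, j))] / (p[i] + p[j])
                              for j in items if j != i)
             for i in items}
    total = sum(p_new.values())
    p = {i: v / total for i, v in p_new.items()}

ranking = sorted(items, key=p.get, reverse=True)
print(ranking)  # items ordered by estimated quality
```

Note that this only converges to a sensible scale when the comparison graph is connected and every script has both wins and losses; an unbeaten script sends its estimate off to infinity. That is exactly the kind of practical detail the headline efficiency claims tend to gloss over.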

I think the developments around ‘comparative judgement’ at a potential policy level are going far too quickly, and are based on some misconceptions about validity and reliability and, maybe most of all, on purported time savings and workload reduction. In my view ‘more efficient’ and ‘more reliable’ aren’t reasons to want CJ (or rather ACJ), as these advantages over traditional assessment might hold only for a small part of summative assessment, namely work that is relatively short, uni-dimensional and subjective (e.g. a short English essay). And even then it pays to check the costs involved, from scanning work to training courses to the judging time itself. Simply suggesting that at some point those costs will all disappear, leaving you with a nice 30-second comparison, is not really painting the full picture. We always need to ask that age-old social media question: ‘is it worth the opportunity cost?’. This does not mean we couldn’t continue piloting and testing (producing evidence on) exactly the issues I’ve just mentioned. And if we’re doing that anyway, we might actually look at some applications that seem more promising to me than a limited, national, summative one:

  • As a tool for Continuing Professional Development with teams of teachers, to increase awareness of marking practices. We actually did this sometimes with our department when I was teaching in the Netherlands.
  • Meso-moderation: at the level of groups of schools.
  • Experimenting with assessment types not yet used, for example open-ended questions in maths.
  • Peer assessment. The evidence base could be linked to emerging research on peer assessment and formative practices.

Finally, what I would always like is for pilots and experiments (no policy changes please!) to be facilitated, without promoting paid services at this point. Apart from No More Marking (who I think have started to charge?), the open-source platform featured by the Belgian project D-PAC might be interesting.


Routes into teaching: the new IFS report

Update: added a little bit on the ‘leadership’ aspect and made the wording more precise.

I covered the costs of routes into teaching before. Two reports have been released, one by the Institute for Fiscal Studies (funded by the Nuffield Foundation) and one by Teach First and Education Datalab. I must say, it doesn’t seem a coincidence that Teach First commissioned another report and released it exactly on the day the IFS report was published. I can understand why, because together the reports give the impression that: TF is expensive; has low retention (saying it is higher in year 2 is strange, as the Teach First programme lasts two years); its teachers do not stay on in challenging schools; BUT the ones who do stay end up in leadership positions and on higher salaries. Both reports are interesting reading and I applaud the transparency behind them. What was even more interesting, though, was the social media and press flurry around them. In this post, I thought it would be good to tabulate the numerous tweets and comment on several first-hand press releases.

First, a blog by Education Datalab, which was involved in both studies. It first describes the worse retention but then makes a case for the good aspects of Teach First. Although I think these good aspects should not be undervalued, I did have questions about some of the points highlighted, including some errors.

The error concerns the reporting of the benefits. The data was not from headteachers, as stated in the blog, but from secondary subject leaders (see my older blog on the report it draws from). In addition, we could also present the views of the ‘secondary ITT coordinators’, which show higher or at least comparable perceptions of benefit, except for salaried SD.

I also wonder where the ‘much more likely to continue to work in schools in challenging circumstances’ comes from, as the report seems to say that this may be the case after 3 years but that it is reversed after 5 years. There is an additional graph (Figure 4) based on Free School Meals, but that also shows a shift from higher percentages of FSM towards lower percentages of FSM. I think there should be genuine concern about this, if the idea is that the ‘disadvantaged’ are helped. In any case, the migratory patterns of trainees need further scrutiny.

Finally, the ‘seven times’ is based on data from different cohorts. The text talks about different numbers from the table; I would say it’s 4 against 25, which of course is not seven times (a minor point). The text does mention that other routes spend one year less on the job market, but I agree that this is hard to account for.

However, what I do miss is some critical reflection on the nature of these positions. Sure, more are in leadership positions, but are they within their original MAT? In schools within MATs? In newly founded free schools? Given the objective regarding ‘disadvantaged students’, a bit more analysis seems needed before one could say that objective is reached most effectively through ‘leadership’. It certainly isn’t through teaching, as the IFS report already established that fewer were teaching. The establishment of charities seems a less convincing route to reduced inequality. Given the difficulty of recruiting school leaders I can see a place for this, by the way, but we should ask whether the much larger investment of public funds is worth the leadership it develops AND whether it really ends up helping the disadvantaged. I, for one, have always said that not a lot of pressure comes out of Teach First to argue for systemic actions to address poverty and inequality. Also, the argument that TF-ers would otherwise not have gone into education could be cynically parried with “and 60-70% don’t stay, because they move on to other pastures”. Is the return on investment really worth it, just looking at the expense? (And not, in my view, the emotive argument that they are such fine teachers.)

Of course Teach First also had its own series of press releases. Understandably, they liked to stress the ‘leadership’ aspect more than cost and retention. But they also had a post asking for more investment in research into teacher training routes. I thought the press release was a bit too defensive, to be honest. It starts off by basically saying that the comparison had not been fair, which surprised me, because previously, when the IFS had used what I consider a strange way to calculate Teach First’s larger benefits, there had been no complaints. Towards the end this claim is actually repeated.

It is important to say first that I agree more transparency about these costs and benefits is needed across the board. Of course part of that transparency is supplied by the report. Nevertheless, there still might be information that is unknown (for example, upfront I was personally wondering what part of the cost was actually covered by third-party donations) and we need to realise that. The text then goes on to emphasise the ‘leadership benefits’ and again suggests it had ‘not been a fair comparison’ without actually explaining why. One aspect, it seems, concerns long-term teacher quality. Although I agree with that statement, it seems a bit strange to lead with what the study did *not* study, nor was asked to study. I have no doubt that Teach First provides good-quality provision, yet it needs to be offset against the cost, just like any provision. I do know that ‘good’ and ‘outstanding’ are relatively unhelpful descriptors, as most provisions *must* have level 3 and 4 trainees to survive (as far as I know).

Finally, then, the findings of the IFS report are addressed. I think saying the programme is ‘three years’ (by including recruitment) is a bit strange. Initially there were some thoughts that previous calculations did not take into account the fact that a TF or SD trainee teaches immediately in the first year, but on page 19 it is clear this has been accounted for.

Of course it is true that TF trainees do more than just a PGCE and QTS, namely ‘leadership’, and that those teachers who stay do so effectively, but I’m not sure that is the core aim of such a programme.

The second point, regarding recruitment costs, seems fair, but it needs to be said that TF asks schools for a recruitment fee as well. SD fees were also taken into account but are much lower. I don’t think HEI fees were included, but I would expect those to be lower as well, as universities can make use of extensive PR departments anyway. Overall, though, this might lower the total cost (see later on).

Another point made concerned the donations.

It is correct to say this is ‘not a cost for the taxpayer’, of course, although it does make sense to look at all costs to evaluate the ‘value for money’. After all, if we were to state that TF produces better outcomes, then this might be caused by more money being pumped into their trainees. Looking again at the net funding:

It is sensible to ask what of that *is* public and what is not. Looking at the direct grants from the NCTL, we can consult the annual reports and conclude that Teach First received around £40 million in 2014-15 to cater for 1,685 trainees, which is around £24k per trainee (the 2013-14 figures, £34 million against 1,426 trainees, are similar).
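To make the ‘around 24k’ figure transparent, the back-of-the-envelope division can be written out (using the approximate grant and trainee numbers quoted above):

```python
# Rough per-trainee NCTL grant, using the approximate figures above.
grant_14_15, trainees_14_15 = 40_000_000, 1685   # GBP, 2014-15
grant_13_14, trainees_13_14 = 34_000_000, 1426   # GBP, 2013-14

per_trainee_14_15 = grant_14_15 / trainees_14_15
per_trainee_13_14 = grant_13_14 / trainees_13_14

print(round(per_trainee_14_15))  # about 23,739 -> "around 24k"
print(round(per_trainee_13_14))  # about 23,843 -> "about similar"
```

Both years land just under £24k per trainee, before adding the school-paid recruitment fee discussed below.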

If we now also take into account the direct costs to schools, the upfront recruitment fee of £4k, then it seems that the ‘voluntary contribution per trainee’ is rather low. What is difficult, of course, is to unpick which money is actually used internally for what, so let’s also look at the total income of Teach First for 2013-14 and 2014-15 respectively (this is from the 2014-15 annual report for Teach First):



Simply dividing the total ‘Corporate, trusts and other contributions’ by the total number of new trainees per year yields an amount of £4,400-£4,800. Of course not all of that might go towards training. According to the online appendices to the first report, voluntary contributions for Teach First are £1,200. Offsetting all of this against the aforementioned costs makes a difference, but even if we subtracted all the donations and recruitment fees, the costs would stay high. I would even say it is a bit disingenuous to focus attention on these cost types, as it suggests the analysis is poor (even though it is readily cited in places where the outcome seems more favourable). The cost must be discussed, not downplayed. The last point of the Teach First press release concerns the bursaries. This, of course, is a valid point from the viewpoint of the student. I think the absurdly high costs of the bursary programmes certainly need to be taken into account. But these bursaries are not a ‘cost of the programme’; rather, they are a stimulus for individuals. I think that money can be better spent to attract teachers.

The press release finishes with:



It is interesting that Teach First uses ‘four years’, because, as mentioned previously, the IFS report seems to indicate that the pattern has changed after 5 years. The last point is a variation of the incorrect reporting mentioned previously, namely that schools mentioned the benefit; they didn’t: it was subject leaders, and the ITT coordinators in the schools gave a different picture. In an older blog I already criticised the emphasis on ‘value of benefit’ in the 2014 report.

After reading all these sources I would say:

  • Teach First is much more expensive than other routes, even after taking into account school fees, recruitment, first year teaching and donations;
  • Teach First has a much stronger leadership focus than other routes;
  • Retention in Teach First is worse than other routes;
  • There is a shift from TFers working more in ‘requiring improvement’ and ‘inadequate’ schools towards ‘good’ and ‘outstanding’ from year 3 to year 5;
  • The ‘benefit’ from the 2014 IFS report is misreported;

I think Teach First is a valuable route into teaching, with passionate leadership and alumni ambassadors (important: criticising cost is not criticising individuals), but it is important to evaluate the overall cost of such a programme (per trainee). Certainly at a time when both provider-led teacher training and School Direct programmes have to train with vastly smaller amounts of money (for example £9k for HEIs, although they normally pay part of this to schools for mentoring), it is realistic to look at the ‘added value’ for education. Maybe that is ‘leadership’. Maybe that is ‘helping the disadvantaged’. But even if we think those need to be addressed, it doesn’t help if retention is low and teachers end up in better schools. Rather than saying ‘not a fair comparison’, it would be best to address these aspects head-on.


researchED Amsterdam presentation (Dutch)

This is the researchED presentation I gave on 30 January 2016 in Amsterdam. Some English-language terms have been left in. The references are added at the end.



As some may know, I’ve had an interest in technology for maths for quite some time now, so I am very aware of the developments. One of the latest offerings from Desmos, the wonderful online graphing calculator, is their ‘activities’. Although every maths teacher should stay critical about integrating ‘as is’ activities into their classrooms, I also think they should be aware of this fairly new feature. That is why I flagged it up during some ‘maths and technology’ sessions I ran for the maths PGCE, but always as critical consumers.

One of the latest offerings is the Marbleslides activities. I first read about them on Dan Meyer’s blog. There are several versions, with linear functions, parabolas and more. As always the software is slick, and there is no doubt the ‘marble effect’ is pretty neat. It reminds me of a combination of ‘Shooting balls‘ (linear functions, Freudenthal Institute, progressive tasks), ‘Green globs‘ (functions through the globs) and the gravity aspects of Cinderella. It has already been possible to author series of tasks with the latter widget. I first tried the ‘marbleslide-lines‘. The goals of the activity are:

The activity starts off with some instruction on its use. Many questions arise:

  1. Why do the marbles start at (0,7) ?
  2. Are the centers of the stars ‘points’? (this becomes important later on)
  3. Why several marbles? Why not one?
  4. Why do the marbles have gravity?
  5. How much gravity is it? 9.8 m/s^2?

Clicking launch makes the marbles fall, and because they fall through the stars, ‘success’ is indicated.


I am already thinking: so is the point to get through the points or through the stars? And if gravity is at play, does that mean lines do not extend upwards? Anyway, I continue to the second page, where I need to fix something. What is noticeable: (1) yes, the marbles again start at (0,7); (2) the line has a restricted domain; (3) the star to the right is ‘off line’. I’m not much more informed about the coordinates of the star, which leads me to assume they don’t really matter: it must be about collecting them. ‘Launching’ shows the marbles picking up only two of the stars (for the movie, see here).

The line has to be extended. The instruction is “change one number in the row below to fix the marble slide”. A couple of things here: what is there to fix? Is something really broken? The formula has a domain restriction; do we really want the terminology of this domain being ‘broken’? I removed the domain restriction, making it ‘just’ a normal line. But it doesn’t collect all the stars, so ‘no success’. Restricted to x<12: no. For x<9 the marbles shoot over. With x<7 there is success.

This is very much trial and error, partly caused by the gravity aspect.

On page 4 there is a more conventional approach: there is a line with a slope. The prior knowledge indicated in the notes mentions y=mx+b should be known: “Students who know a little something about domain, range, and slope-intercept form for lines (y=mx+b)”. I wonder why this terminology is not used then. Again the formulation of the task is “change one number in the row below to fix the marble slide”.

Because it’s relatively conventional, I guess the slope is meant. But am I meant to guesstimate? Or use the coordinates? Does it matter? I first tried 1 (yes, I know that’s incorrect) and just kept on adjusting.


0.5 seems ok, but 0.45 is ok as well, even 0.43. 0.56 does as well, but 0.57 misses a star because the line runs above it. May I adjust the intercept? I can, so this again promotes trial and error over thinking beforehand, in my opinion. In addition, it does not instill precision.
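That band of accepted slopes can be sketched in a few lines. The check below is a guess at the mechanism, not Desmos’s actual code, and the star position, collection radius and intercept are hypothetical numbers chosen for illustration:

```python
# Hypothetical success check: does the line y = m*x + b pass within
# `radius` (vertical distance) of the star centre?
def collects(m, b=0.0, star=(10.0, 5.0), radius=0.7):
    x, y = star
    return abs(m * x + b - y) <= radius

# Slopes anywhere near 0.5 all "succeed"; only clearly-off lines fail.
# So trial and error homes in on a range of accepted answers rather
# than the one exact slope through the star's centre.
for m in (0.45, 0.5, 0.55, 0.3, 0.8):
    print(m, collects(m))
```

Any tolerance-based check like this accepts an interval of slopes, which is exactly why guess-and-adjust works and why precision is never demanded.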

On page 5 the same thing but now for the intercept.

I’m still curious why the terminology of y=mx+b isn’t used. I guess -2 is expected as the nicest fit, but I can go as far as -2.7 and still get ‘success’, yet -1.4 is ‘no success’. This could be spotted by the teacher, of course (well, we can turn any confusion into an interesting discussion). It is interesting to see that the marbles now start from higher up, by the way. The gravity question becomes more pertinent. How much gravity? And there is a bounce: surely the bounce is bigger if the gravity or height is greater? Or not? Apart from the neat animation, what does it add?

Then on page 6 we move to stars that are not on one line (surely too quick?). Again there are several answers, which in my opinion keeps feeding the idea that points (well, stars) do not uniquely define a line.

From page 7, predictions are asked for when numbers are changed. There is still no sign of the terminology. It is a nice feature that the complete Desmos tool can then be used to check the answers. This is about functions, unlike the marbles section. Why is the domain still restricted, though? Throughout the tasks it seems as if domain and range are modified to suit the task, rather than being a property of functions. Granted, a small attempt to address this is made on page 11.

On page 13 the stars are back again. My first attempt with whole numbers is exactly right: y=2x+4 {x<5}. Some marbles fell to the right of the line, though; nevertheless, there was ‘success’. But there was also success on page 14 like this:

From page 15 there are challenges. The instruction says to “Try to complete them with ONLY linear equations and as FEW linear equations as possible.” These are a lot of fun, but I struggle to see the link to the slope-intercept form y=mx+b. It is not mentioned explicitly; there is no real attempt to link to the terminology. I fear it will remain a fun activity with a lot of creative trial and error.

I’ve also looked at the parabolas activity. The same features are apparent here: functions are collections of points (rather: stars), and functions have to be found that go through them. The assertion is that transformations of graphs are somewhat addressed concurrently, but the trial-and-error aspect makes me doubt this. It also distracts from general properties of graphs like roots, symmetry, minima and maxima. I can see a role for playful estimates, but in my opinion they must be anchored in proper terminology, precision and properties of graphs. Furthermore, I was sometimes inclined to just use lines. There was no feedback that this was not permitted; one could even say a line is also a polynomial, so why wouldn’t I? The trial-and-error nature might further incentivise these creative solutions. Great, of course, if you know transformations already, but not if the activities are meant to strengthen skills and understanding (did I ever say they go hand in hand? 🙂).

Some of these aspects might be mitigated by the editing feature that will be released soon, but surely not every answer to fundamental but friendly critique can be “do it yourself”? Another nice feature, also present in other software, is that you can see student work. Yet I feel that, with some of these fundamental issues not properly addressed, misconceptions might arise. I think the marble animation is at risk of obfuscating what the tasks should be about. It might lead to more engagement (fun!), but if it does not lead to learning, or even leads to misconceptions, is that helpful? Firstly, I think the scaffolding of tasks should be more extensive, with a clear link to maths content. Secondly, I would reconsider the confusion between ‘points on a line’ and ‘stars to collect’. I hope Desmos can iron out some of these issues, because one thing is sure: the falling-marble effect remains a joy to behold. Pedagogically, though, I think it needs to be developed further as it stands.


Thoughts on the HE green paper

I was asked to give my thoughts on the HE green paper. Here they are.

  • I agree with a strong emphasis on teaching.
  • This can be improved considerably but it is a caricature to suggest that teaching is not valued.
  • A stronger emphasis on teaching should not mean that it is just another extra obligation. It must be understood that if we want more emphasis on teaching, perhaps there must be less emphasis on research. A nightmare scenario would be that the TEF adds to the bureaucracy of HEIs.
  • In other words, it should not be another REF, but then just for teaching. Certainly if we want a cross-over between research and teaching (p. 20), especially important perhaps for an Education School, then this also means appreciating what can’t be done.
  • Measuring ‘teaching quality’ is difficult. Using multi-modal approaches is better than simple metrics. I think chapter 3 on the TEF addresses this quite reasonably.
  • Teaching is a collaborative affair in which teachers (staff) are experts who ‘teach’ students: teachers together with students. An overly student-centred approach (aimed at what students want) is not desirable; students do not always know best. Metrics like the NSS are very arbitrary and do not correlate strongly with teaching quality. Research shows that the variance in surveys like these lies more at the student level than at the institutional level; in other words, differences within HEIs are much larger than between HEIs. This means that comments like the one on p. 19 (point 7) are unwarranted, as hardly any variance in student experience and engagement can be explained at the HEI level. I appreciate that this can be mitigated by using multiple sources for determining ‘teaching quality’, but this must really be key. In addition, good teachers already listen to students.
  • The link between ‘teaching quality’ and raising fees is undesirable. The comment on p. 19 about ‘value for money’ is subjective, as ‘value for money’ can also mean that fees should be lowered (this would be a good idea), thus increasing the ‘value for money’. Yet in this context it is used to argue the value should be increased. Given the high scores for the NSS, this seems strange. Further, the HEPI-HEA research also says (p. 9): “Unsurprisingly, when asked about their top three priorities for institutional expenditure, 48% of students chose ‘reducing fee levels’. However, four further clear priorities emerge, each chosen by over one-third of students: increasing teaching hours, decreasing class sizes, better training for lecturers and better learning facilities.” The current report seems to have chosen, somewhat one-sidedly, only a few of these items.
  • There is another inconsistency in all of this: if teaching quality for students is improved, partly based on student judgements, I presume, students are ‘rewarded’ with higher fees. This seems very paradoxical, very ‘anti market’, and could also set teachers against students.
  • GPA seems more appropriate than the current ‘banding’, but the real problem is that student achievement is conflated with ‘teaching quality’. Certainly with a strong contribution from student evaluations, I doubt a new system will do away with the tendency towards higher grades; the new system even seems to incentivise this. I would prefer actions that really address the root causes. Luckily, the limitations are acknowledged on p. 26 (point #41).
  • On participation, the document does not convey any sense of the wider, systemic reasons for a lack of participation, for example socio-economic inequalities. Of course this document is about HE, but some acknowledgement of this would have been good.
  • The changes in the market do not acknowledge that there is no real market. Like many semi-public arrangements, this will combine the worst of both worlds.
  • On the education structure: students should not be at the centre; students AND teachers should be at the centre. In point 4, staff are sorely missed. In addition, ‘market principles’ are very central. It also says ‘affordable’, which seems counter to the fee developments of the last decade. The name ‘Office for Students’ fails to acknowledge that HE is a joint affair.
  • The changes in the architecture seem very reasonable, but the question needs to be asked: “what does this restructuring really solve?”. It sounds like the rebranding exercises the private sector often does: old wine in new wineskins. The costs of such reforms are often underestimated (e.g. IT costs).

[Dutch post] Maatwerk: the anglicisation of education

Last June I sent the response below to someone at the VO-raad. It sums up well how I think about the current developments towards a so-called ‘maatwerk’ (tailored) diploma: “My argument, however, is that there is a very real danger that the wish to deliver maatwerk, in my view prompted by a relatively small set of anecdotal horror stories, will lead to ‘English conditions’. In other words, that we put a major strength of the Dutch system at risk for an ideal that will never be achieved.”

I am trying to formulate some things that occurred to me during all the Twitter discussions. I have not paid too much attention to the precise wording and simply started writing; it may be incoherent in places. Broadly speaking, it comes down to my fear that the Dutch system is becoming too ‘Anglo-Saxon’: more selection and meritocracy. I think it is hard to deny that, economically, the UK and US are more unequal, and this also applies to education. With all due caveats (particularly regarding the validity of the PISAs, PIAACs and TIMSSes of this world), one aspect of the Netherlands is that the spread of scores is smaller; in other words, the weaker pupils still do fairly well in the Netherlands. In the UK this is not the case, even though ‘on paper’ it should have many of the advantages that are now being praised so highly. Let me list a few things:

  1. The idea that England (I say UK, but Scotland is quite different) offers more maatwerk stems, I think, from the ‘old system’ of O and A levels: the romantic image of the 10-year-old maths genius who can already go to university. Apart from the various other challenges this brings, in practice it is not so romantic. On paper there are ‘all kinds of opportunities’ to move to higher levels; in practice this is used to shift the responsibility onto children when it does not work out. After all, you *can* move up to a higher level, but you don’t make it. So in my view this actually creates more inequality (see also the following points).
  2. Testing in the UK is, on paper, postponed much longer. At the end of primary school there are the SATs, but these are not used for secondary school placement; they exist only to hold primary schools ‘to account’. Secondary school places are allocated on the basis of ‘catchment’, in short, where you live. This is of course already strongly linked to socio-economic status: the better-off live in areas with generally better schools, partly because catchment also (indirectly) determines rents and house prices. This leads to streets where one side, with identical houses, is £200 a month more expensive than the other. But it gets stranger: on paper everyone starts secondary school ‘equal’, but to provide ‘maatwerk’ for ‘the better pupil’ you can take some subjects at a ‘higher level’, in which you are offered more content. This then carries through into secondary school where, again on paper, ‘everyone is equal’, but children are placed into so-called ‘sets’ after a few months. This is really just ability grouping, with the idea that you can ‘move up’ per subject if you have a talent for it. On paper, maatwerk again. But that too does not work in practice: you hardly ever escape the lower sets, because the social-pedagogical climate is not optimal (roughly: nobody pays attention, because everyone radiates ‘we are in the lowest set’). In addition, not all the content is taught (Opportunity to Learn). So you fall behind, become demotivated, the lower sets stay condemned to lower, and the higher sets go higher. Apparent maatwerk leads to inequality.
  3. It has often been suggested on Twitter, and then people say ‘but we don’t want the lower level to be accepted; we want that if someone can do a bit more (I think of Rosenmuller’s maths-at-VWO example), this should also be possible’. I think this will happen much less often than what, in my view, is far more the trend: the examples of the VWO pupil who is not good at languages, is at risk of failing, and so takes languages at a lower level. The clearest system for me remains a uniform diploma, noting that it has long stopped being uniform, with profiles, optional subjects and even existing maatwerk arrangements. That is already enough maatwerk, and I think the vast majority of pupils can get along with it just fine. I have therefore also suggested before that the size of ‘the problem’ should first be made clear. That is not a few anecdotes here and there. It also requires making clear that the current system is crippling, which does not mean an anecdote here and there that someone could not do what he or she wanted (and moreover, there will be a tendency to blame ‘the system’). I think the ‘problem’ is not that big.
  4. League tables and rankings. We are working towards a system of unequal diplomas (because maatwerk). This system will lead to many differences at the end of secondary school, as now in England with GCSEs. In England there are then even more differences at the end of A-levels, and then there is higher education, where the A-level grades are decisive. In spring, prospective students apply and receive so-called ‘offers’, for example ‘if you get three A*s (A*A*A*) we would love to have you at Oxford’. Sometimes these are even ‘unconditional’. Sometimes you also have to take an additional test or have an interview (usually at the better universities; at ours this is already the case for teacher training, for example). The prestigious, higher-ranked universities and courses can set higher requirements. Take my university (Russell Group, say sub-top, top 20): engineering AAA, most courses AAB, the somewhat less popular ABB, but not lower. Then you sit your A-level exams. If you make it, you can take up your ‘offer’ and go to the university in question. If you don’t, you still have to find a place in September (this is called ‘clearing’), where courses try to fill their remaining places. The academic year starts on 1 October, partly for that reason. I see it as follows: because there are so many differences in the A-level ‘packages’ (everyone can take a different combination), there is maatwerk. But because of this, no uniform admission requirement can be set. Universities will select. Maatwerk leads to selection. Selection leads to competition. That is what awaits the Netherlands if ‘maatwerk’ comes. That is not a good thing.

Now, I realise that in this whole account I have said several times that it ‘does not work in practice’. The thought may therefore be: ‘but it is a nice idea, so we will simply do it differently and better, and it will turn out fine’. My argument, however, is that there is a very real danger that the wish to deliver maatwerk, in my view prompted by a relatively small set of anecdotal horror stories, will lead to ‘English conditions’. In other words, that we put a major strength of the Dutch system at risk for an ideal that will never be achieved.

None of this alters the fact that (i) the better pupil could be served better, and (ii) maatwerk can sometimes be very useful. On (i): personally I think this is more a question of mentality. I have never quite understood how we can hold anyone other than ‘the better pupil’ in secondary education responsible for whether their potential ‘really comes out’. If a teenager, understandably, cannot bring himself to act, then he must work on that, supported or not by parents, schools and so on. But he or she must do it, not the actors around him or her. That is not a denial of the role of education; education ‘merely’ has to keep doing what it should be good at: teaching. On (ii): here the law could perhaps be loosened a little. But that is not a system change. Making moving up and stacking qualifications easier again, something the inspectorate also notes (but then, oddly, instead of saying ‘that is bad’, it also jumps on the maatwerk bandwagon), is a much quicker and less risky way to keep accessibility good while still delivering more maatwerk. It also places the responsibility with the pupil, not the education system.