Sometimes I just get carried away a bit. I managed to get an early copy of Daisy Christodoulou’s new book on assessment called Making Good Progress. I read it, and I made notes. It seems a bit of a shame to do nothing with them, so I decided to publish them as blogs (6 of them as it was about 6000 words). They are only mildly annotated. I think they are fair and balanced, but you will only think so if you aren’t expecting an incredulous ‘oh, it’s the most important book ever’ or ‘it is absolutely useless’. I’ve encountered both in Twitter discussions.
PART 2 – SKILLS AND RELIABILITY
This part addresses chapters 2 and 3.
As I think the two approaches in chapter 1 are a bit of a caricature, I wonder whether this continues in Chapter 2. At least there are some good examples of people who, in my view, utilise an exaggerated view of the ‘generic skills’ approach. Generic skills are rooted in domain knowledge. Yet, it is *not* the case that you will have to re-learn certain skills again and again in every domain. A good example is in my area of expertise maths and spatial/mental rotation skills. There is a (limited) amount of transfer within groups of domains. This is tightly linked to schema building etc. It is therefore unhelpful to present a binary choice here. What *is* good is to make people aware that generic skills courses *need* some domain knowledge. The chapter uses quite a lot of quotes, some from the ‘7 myths’ approach of using Ofsted excerpts. Although I like this empirical element, and I even think there’s something in some of the claims, it would have helped if the quotes were something less ‘cherrypicking’ like. The fact that generic skills are mentioned are no evidence that they necessarily are a focal point of teaching. In fact, generic skills seem to be presented as an almost ‘automatic’ outcome of ‘just teaching’ in the deliberate-practice approach. So even in an approach the author seems to prefer, generic skills will probably be mentioned. The chapter of course goes on to name-checking ‘cognitive psychology’ and Adriaan de Groot (like in Hirsch’s latest book). It is good that this research is tabulated, and the addition of ‘not easily transferable’ already shows a bit more nuance (p. 33). Schemas are mentioned and that it is good that ‘acquiring mental models’ is put central, and not ‘less cognitive load for working memory is best’. I felt these pages showed a wide range of references, though quite dated. I wholeheartedly agreed with the conclusion on page 37 that ‘specifics matter’ i.e. domain knowledge. It is telling that in discussing this, other ‘nuanced’ words appear, for example on page 38 when Christodoulou says ‘when the content changes significantly, skill does not transfer’. The interesting question then, in my mind, is when content is ‘significantly different’. My feeling is that this threshold is often sought far too low by some, and far too high by others. It would be good to discuss the ‘grey area’, just like the ‘grey area’ in going from a novice to an expert.
The section concludes with a plea for knowledge, practice etc. with which I very much agree. It becomes the prelude to a section on deliberate practice. It is an interesting section with a role for Ericsson’s work. Practice is extremely important; I do wonder, though whether the distinction performance and deliberate practice is more mixed than presented. Originally, the discussion about deliberate practice seemed to revolve around ‘effort versus talent’. This meta-review suggests there are more things in becoming an expert. Yes you practice specific tasks but I think it is perfectly normal to early on also ‘perform’, whether it is a test, a chess game or a concert. Or even look at an expert and see how they do (mimic). Not with the idea that you instantly become an expert but the idea that it all contributes to your path towards *more* expertise and solidify schema. Especially the link to ‘performance’ being too big a burden on working memory does not seem to be supported by a lot of evidence. It is not true that you can’t learn from reflecting on ‘performance’ as many post-match analyses show. Of course, one reason for this might be that the examples all are from ‘performing arts’ and sports, arguably more restrained by component skills leading to the ‘performance skill’, but after all it’s not me introducing these examples. At the bottom of page 41: “even if pupils manage to struggle through a difficult problem and perform well on it, there is a good chance that they will not have learnt much from the experience.” in my view plays semantics with the word ‘difficult problem’. I wonder why ‘there is a good chance’ this is the case. It also poses interesting questions regarding falsifiability, after all if a student does well on a post-test and has a big gain in an experimental research setting, maybe they haven’t learnt anything? Maybe they just performed well. By now, I have seen enough from the over-relied on Kirschner, Sweller and Clark paper. Bjork’s ‘over-learning’ is an interesting addition: I would agree it can be good to over-learn but we need to think about the magnitude (hours, days, weeks?) and unfortunately there is no mention of expertise reversal, worse performance. On page 42 and 43 I thought we would get to the crux (and difference) in aims of tasks, because I agree that those are key in learning. While acquiring mental schemas the cognitive load does not have to be minimal, just as long as those schemas are taught. In assessments you don’t want the cognitive load to be too high because you will fail your assessment. The chapter finishes with an ‘alternative method’ as a ‘model of progression’. I am not sure why this is called an ‘alternative’ because it sounds as if it has been around for ages. It even echoes Bruner’s scaffolding (oh no!). The attention to peer- and self-assessment is interesting, but I’m not sure if direct instruction methods really incorporate them, at least not in the often narrow terminology used in the edu blogosphere. Although I have seen a broadening of the definition through ‘explicit instruction’. I’m sure some will point out, that oft ridiculed progressive behaviour, of not understanding the definitions 😉 In sum, a useful chapter with a bit too much of a false choice.
The start of chapter 3 puzzles me a bit because it starts by explaining how summative and formative functions are on a continuum. I agree with that, and find it at odds with Wiliam’s foreword, in which he seemed to confess that the functions need to be separated. The chapter discusses the concepts of validity and reliability. I am not completely sure I agree with the formulation that validity only pertains to the inferences we make based on the test results, but I haven’t read Koretz. There are many types of validity and threats to validity, and I would say it *also* is important that a test simply measures what it purports to measure (construct validity); the many sides of the term should be discussed more. The comment on sampling is an important one. With reliability, I think the example with a bag of flour of 1 kg is an awkward choice, as it suggests a measure can only be reliable -in this case- as it shows 1 kg. This is not the case, scales that consistently measure say 100 grams over would still be a reliable scale, just not valid for the construct measured (mass). Reliability also isn’t an ‘aspect of validity’. When discussing unreliability it would have been helpful to have been more precise with explaining the ‘confidence bands’, and perhaps measurement errors. I get the feeling that the author wants to convey the message that measurements often are unreliable, but maybe I’m wrong. I very much like the pages (p. 64) on the quality and difficulty model; I agree that both models are accompanied by a trade-off between validity and reliability. There is a raft of literature on reliability and validity, Christodoulou chose only a few. As a whole, the chapter makes some useful links with summative and formative assessment. However, the example on page 70 is not chosen very well (and again note that there are many long quotes from other sources, more paraphrasing would be helpful), as in my view the first example (5a + 2b) *can* be a summative question if pupils are more expert (e.g. maths undergraduates). I like how Christodoulou tries to combine summative and formative assessments in the end, but wonder what new baggage we have learnt to make that happen.