Education Education Research

researchEd national conference

On 9 September 2017 I gave a talk at the national researchEd conference in London. The presentation was about how mythbusting might lead to new myths. The presentation covered the following:

  • I started by explaining how myths might come about, by referencing some papers about neuromyths.
  • I then used the case of iron in spinach to illustrate how criticising myths can lead to new myths (paper by Rekdal).
  • I gave examples of some themes that are in danger of becoming new myths.
  • I concluded that it is important to read a lot, stay critical and observe nuance. No false dichotomies please.

I will endeavor to write this up at some point. Slides below.

Education Research Math Education Tools

Seminar at Loughborough University

Dr. Christian Bokhove recently gave an invited seminar at Loughborough University:

Using technology to support mathematics education and research

Christian received his PhD in 2011 at Utrecht University and is a lecturer at the University of Southampton. In this talk Christian will present a wide spectrum of research initiatives that all involve the use of technology to support mathematics education itself and research into mathematics education. It will cover (i) design principles for algebra software, with an emphasis on automated feedback, (ii) the evolution from fragmented technology to coherent digital books, (iii) the use of technology to measure and develop Mental Rotation Skills, and (iv) the use of computer science techniques to study the development of mathematics education policy.

The talk referenced several articles Dr. Bokhove has authored over the years, for example:

  • Bokhove, C., & Drijvers, P. (2012). Effects of a digital intervention on the development of algebraic expertise. Computers & Education, 58(1), 197-208. doi:10.1016/j.compedu.2011.08.010
  • Bokhove, C. (in press). Using technology for digital maths textbooks: More than the sum of the parts. International Journal for Technology in Mathematics Education.
  • Bokhove, C., & Redhead, E. (2017). Training mental rotation skills to improve spatial ability. Online proceedings of the BSRLM, 36(3).
  • Bokhove, C. (2016). Exploring classroom interaction with dynamic social network analysis. International Journal of Research & Method in Education. doi:10.1080/1743727X.2016.1192116
  • Bokhove, C., & Drijvers, P. (2010). Digital tools for algebra education: criteria and evaluation. International Journal of Computers for Mathematical Learning, 15(1), 45-62. Online first. doi:10.1007/s10758-010-9162-x

Education Education Research

Hirsch: the case of France


I wanted to do a relatively quick post on something I have been looking at in some tweets. It relates to part of Hirsch’s book, about which I had already written. I think it’s quite clear that I like Hirsch’s emphasis on the ‘low achieving’, although we probably disagree on the role ‘systemic unfairness’ plays in schooling. This post, though, focuses on one of the pivotal examples Hirsch presents to argue that a skills-oriented curriculum, in contrast to a knowledge-based curriculum, increases unfairness: the case of France from 1987 to 2007 (Loi Jospin). I could probably fill pages on ‘knowledge’ versus ‘skills’ (aren’t skills just practical knowledge?) but let’s just assume that these labels are wholly justified. I will also assume, although I find the justification lacking, that Hirsch is right when he says that the amount of funding, buildings etc. did *not* have an influence on this. I think it’s quite difficult to ascribe changes to a change of curriculum alone, even though I grant him that the curriculum change in France has been vast.

I tried to track down the data Hirsch used. In Appendix II Hirsch refers to documents from the DEPP. The data seems to come from a French department, and coincidentally 2015 data has recently (November 2016) been added to the database. The data is tabulated in this document on page 3. One of the headers states that ‘social inequality is still apparent but remains stable’.

The raw data indicates ‘errors’ made (the column Moyenne denotes the mean number of errors per sub-group, Ecart-type denotes the standard deviation). I did not look at the detail of the standardised tests themselves. The document mentions some limitations, for example that labels have changed a bit over time, but also the massive increase in people not wanting to give the social class of the parents (PCS de la personne responsable d’eleve).

Compared with the graph in Hirsch’s book two things can be seen:

  1. There seem to be more SES (Socio-Economic Status) categories in the data than in the book. My French is a bit rusty but I think at least one large category, probably Employes, is missing. I think that is strange.
  2. The gaps between the different groups seem to have diminished between 2007 and 2015, or, looking from 1987 to 2015, there does not seem to be an SES gap increase, i.e. no increase in unfairness.

To go a little bit further than just ‘having a look’ I then proceeded to create some graphs. I did not create Z-values and I wonder why Hirsch did, as the ‘number of errors’ in the test used is quite a standardised measure. I also tried to use the supplied ‘standard deviations’ to try and replicate the Z-values, but could not get all the numbers matched. Here is the graph Hirsch did, but now with only the errors:


Based on this graph (sorry, I know, labels are missing etc., and I’m embarrassed to say I used Excel) one could indeed conclude that from 1987 to 2007 gaps have increased, although the gap between ‘White collar and shopkeepers’ (‘Artisans, commercants’) and ‘Profession intermediaires’ has decreased. As mentioned before, maybe it’s my command of the French language that conflicts with this. I also plotted the same graph but now with all categories.

It seems as if the picture conveyed on page 145 of Hirsch’s book is far less pronounced. Of course, some of these categories are quite small, but in any case, one of the five largest groups has not been included in Hirsch’s book. I wonder what causes this discrepancy; it seems implausible that the use of Z-scores could explain that difference, but I’m open to being proven wrong.
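As a rough illustration of what converting mean errors to Z-values does, here is a minimal Python sketch. I do not know how Hirsch computed his Z-values, so the formula (group mean minus overall mean, divided by the overall standard deviation), the function name and all the numbers are assumptions made up for illustration; none of this is DEPP data.

```python
def z_scores(group_means, overall_mean, overall_sd):
    """Standardise each group's mean number of errors (assumed formula)."""
    return {
        group: (value - overall_mean) / overall_sd
        for group, value in group_means.items()
    }


# Hypothetical mean number of errors per group in a single year.
year = {"group_1": 6.0, "group_2": 10.0, "group_3": 14.0}

# Hypothetical overall mean and standard deviation for that year.
z = z_scores(year, overall_mean=10.0, overall_sd=8.0)

# Gap between the highest and lowest group, in errors and in Z units.
raw_gap = year["group_3"] - year["group_1"]
z_gap = z["group_3"] - z["group_1"]

print("gap in errors:", raw_gap)
print("gap in Z units:", z_gap)

# The ranking of the groups is the same before and after standardising.
print(sorted(year, key=year.get) == sorted(z, key=z.get))
```

Within a single year this is only a linear rescaling, so the order of the groups cannot change and no category can disappear. Across years, the size of a gap in Z units also depends on that year's standard deviation.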

The categories were the first element I wanted to look at; the second is the 2015 data. I plotted the new graphs with 2015 data included. I did this for both the graph with only 5 categories and the graph with all categories.

It is clear that more errors are being made over the years, but the claim of increased unfairness (i.e. increasing gaps between the different socio-economic strata) seems hard to maintain. Certainly the argument that if we ‘trace the lines backward in time’ we would find positive equity effects (see below) of what Hirsch calls the ‘knowledge curriculum’ seems unlikely. Like the DEPP themselves state, I think it is hard to maintain that the unfairness has increased, at least based on this French data.

Education Research

Social Network Analysis: applications for education research

Today, with Dr. Chris Downey, I gave a talk on applications of Social Network Analysis in education research. Slides below.

Education Research

Explaining a grade

I’m constantly challenging myself with regard to Comparative Judgement. In a first blog I explained why I think there might be some better reasons to use it than ‘efficient’, ‘workload’ and ‘more reliable’. I extended (and repeated) this in this book review. To me, professional development, moderation and formative feedback seem much more promising. However, I think many teachers, especially English teachers, are simply so disenchanted with KS2 English writing that they frankly see anything as an improvement. They are willing to replace the current summative SATs with a different summative assessment. In the meantime I have seen several examples where I would say that teachers tried to strike a balance between summative and formative elements. Good.

Recently, though, there is one particular aspect I have been thinking about some more, and that is the challenge of justifying a grade to one of your pupils. Is the ‘Comparative Judgement’ (CJ) process, as a process that assigns grades to (subjective) work, not in danger of delegating the justification for the mark? If you are a pupil’s teacher and you give a certain mark, you know why this is the case. You go into a moderation meeting, knowing why you gave a particular mark. You might feel it is a bit subjective, and also that the criteria are ambiguous, but at least you have professional ownership of that mark. I expect that you can probably explain the mark to a reasonable degree. Even after moderation, you probably know why that particular piece scored what it scored. What about CJ? The judgement is holistic (at least in the version promoted as saving time and so reducing workload). The grade is based on ‘the collective’ judgement of many judges. There is no feedback as to the *why* for individual pieces of writing. So what is the justification for a particular grade? What will you tell your pupil? Maybe you feel simply referring to the collective opinion of a set of experts is enough, but surely we would want to be somewhat more precise in this?

One way to tackle this, it has been suggested, is a bank of annotated exemplars. It is not always clear whether these are meant for teachers and students or just teachers. If it’s just teachers then I guess we still have the same problem as before, in that pupils will not know about the *why* of a grade. If, however, they are used as exemplifications of when you get a higher or lower grade, I also think it’s a bit of wishful thinking that pupils (but even teachers!) will simply scrutinise a pack of annotated examples, and will then extract where they can improve. It is ironic that this seems to incorporate a form of ‘discovery’: “discover the criteria we holistically and collectively used, and therefore don’t know ourselves but hey we are experts, to judge yourself what is important”. I predict that very swiftly, teachers will be formulating criteria again: ‘this exemplar had only few SPaG errors, was well structured but not very original, and therefore scored higher than this other one that had a similar amount of SPaG errors and structure but was highly original’. Always in comparative perspective, of course: an absolute mark, although assigned in a summative assessment, could only be justified in relative terms. I continue to think that many of the challenges correctly diagnosed with descriptors and criterion-based assessments will continue to exist, but now with a myth that assessment is very easy: you just compare with others and holistically judge them. Rather than think this, I think it is better to appreciate that assessment is hard and to take in more general conceptions of what constitutes reliability and validity.


Chronicles of local politics

I’ve been wanting to chronicle my time in local politics in the Netherlands for quite some time, especially because I think some of my behaviours and viewpoints regarding politics and debate have been influenced by it. So here goes.

It must have been 2002 when I first got politically active, while still living in Purmerend, a town just north of Amsterdam. It was at a time when Pim Fortuyn (the politician who was later murdered) was making a name for himself. I liked his style (he was a very eloquent politician who had written some books) and he had, in my view, an eclectic mix of left-and-right points. Especially his analysis of the governments in the 80s spoke to me. However, I did not agree with many of his more ‘classical liberal’ viewpoints. At the time, many political parties, from left to right, were attacking him, sometimes with derogatory references to Germany in the 30s. I was appalled by that; one party did not do this and instead wrote a booklet in which they addressed the arguments in the book. That was when I decided to become active for the SP. I continued doing so when I moved back to the town where I was brought up, Enkhuizen.

In the council

The historic council room of Enkhuizen

It was our intention to join the council elections in 2006, although there were strict rules for taking part in them. The council in Enkhuizen has 17 seats, and like nationally there is a representative system. In 2006, for the first time, we obtained two seats in the council; I was ‘the number 2’ and obtained a seat. Although the SP politically is quite left, I would say that locally this was less important than local points. Our ‘profile’ consisted of good affordable housing, local healthcare, that sort of thing. Unfortunately in (I think) 2008 the ‘number 1’ stepped out of the party and continued for herself, making me the only councilor (and therefore leader) of the local SP. Of course behind it there was a local chapter and the rule was that the members set out the course, and the councilor(s) would then follow that line. This, of course, is quite relevant to explain why I can’t stand MPs that feel they can *not* follow their members. Sure, you might disagree with viewpoints, and I think it’s an illusion to think you never will, but surely there must be some basis of democracy. In a party system like in the Netherlands people vote for parties because of the parties’ viewpoints. Only in some rare cases do individuals have enough clout to attract enough votes to get a seat. But in 99% of the cases votes are based on party choices. I know this is different in the UK but nevertheless, it has instilled in me a firm belief that members decide, political reps follow. If they feel bad about it, they either (i) keep silent, (ii) arrange freedom to vote (if it does not undermine the party’s image), or (iii) go away and give the seat to someone else. It was interesting to learn that in the UK giving up a seat would automatically trigger by-elections.

From 2008 to 2010 I was the only councilor and we built a new party. There were some big issues in Enkhuizen at the time, the two most prominent ones being parking permits for the inner town, and the sale of an historical hospital and healthcare. Our viewpoints on both topics gave us quite a unique profile. In addition, I think that as a councilor I made a point of being very accessible: I would always react to queries from citizens and took them seriously. Mind you, contrary to what political opponents would say, this did not mean ‘doing what citizens wanted me to do’. There are certain principles and policies that follow from the political party and those are key. This is something I would expect any councilor to do. You’re not there for yourself but to represent your voters. During that time there was a minority coalition with three ‘governors’ from the parties NE, VVD-D66 and CDA that got backing from the smaller Christian party (CU-SGP).

| Party | Votes 2010 (seats) | Votes 2006 | Typology |
| --- | --- | --- | --- |
| Nieuw Enkhuizen (NE) | 1212 (3 seats) | 2206 | Local party |
| PvdA | 1244 (3 seats) | 1990 | Social-democrats (Labour) |
| VVD-D66 | 1259 (3 seats) | 1007 | Liberals (Tories plus LibDem) |
| SP | 1593 (3 seats) | 1006 | Socialist Party (in practical terms more slightly left of social-democratic) |
| CDA | 1060 (2 seats) | 778 | Conservative Christian (nationally often governed) |
| CU-SGP | 446 (1 seat) | 764 | Christian religion |
| GROENLINKS | 379 (1 seat) | 421 | Green party |
| Lijst Quasten | 442 (1 seat) | | Local party |

The 2010 campaign

During 2009/2010 we led a very successful (local) campaign in which I was the front-man, and we managed to get three seats (almost four). Apart from our ‘national’ profiles (the national image of the parties obviously plays a large role locally as well), two local issues stood out in that time:

1. Paid parking permits. The governor responsible for parking matters had plans to introduce paid parking permits in the old inner town of Enkhuizen. The plan was not thought out well, and as with all plans the main question was “how is this plan going to address the aim to reduce cars in the inner city?”. This question had never been answered. We were very much opposed to these plans as they would just cost money with no noticeable effect. We devised our own plan and then started a poster campaign, distributing A4 window posters against paid parking permits. It was a big success. Some citizens put the posters behind their windows, but more importantly our party became known as a party that fought against what many inner city citizens saw as an unattractive development. The governor in question was not very amused by this success, eventually even calling us ‘liars’ and starting a counter-campaign.

The old Snouck van Loosen hospital

2. A new health centre.

The old town of Enkhuizen has an old ‘Snouck van Loosen’ hospital. This hospital was in use by several health providers, including a walk-in center of the nearby larger general hospital. As the maintenance costs of the building were very high, the governors and council decided that it should be redeveloped. During the development of the plans two serious providers remained: one was a collaboration of the hospital with the local organisation responsible for council homes and healthcare, another was a property developer with experience in building healthcare centers. For us, the guarantee of healthcare, and especially the hospital, remaining in Enkhuizen was key. The property developer could not guarantee this but offered more money. The governors at the time chose the property developer. In addition to the healthcare concerns, the people living around the development site were afraid that planning permission would affect their homes negatively. At the point of the elections it was clear that the responsible governor (from VVD-D66) would continue while some other political parties, including us, were against.

Photograph in the newspapers; in the back our campaign poster against parking charges.

In the table I’ve put in a fairly simple ‘typology’ of the parties, knowing full well that it doesn’t do justice to all of it, certainly not in a local context. What I think was very important was the diverse ‘media strategy’ we had. Twitter was up and coming, and I got a mention in a national magazine as one of the most prolific local politicians on Twitter. We handed out a lot of flyers on the street and door-to-door (we did that throughout the year, by the way, which added to an image that we also cared after the elections). We also did ads in the local newspapers, where we would simply show our voting history (and that of others) on the main themes. Obviously we kept an updated website with sometimes an interactive addition; I used my programming skills to, for example, design a ‘fun’ space invaders game with the key politicians. Finally, there also were hustings, in which we kept on highlighting our key themes.

The day of the election

The day of the elections was one of the strangest days I’ve ever seen. The campaign was over. I can’t really remember what I did that day, but I do know that, contrary to most political parties, we started off the election night in the place we always had our meetings. We wanted to organise a nice get-together with all the people who had helped during the campaign. After 9pm, when the ballot closed, we went to the old city hall, or more precisely the neighbouring tavern, where the results would be announced. In this packed room we would await the Mayor who would announce the results. The atmosphere was feverish and some people who had seen the counting whispered in my ear that we were doing really well. As we had gone down from 2 seats to 1 because of 1 person leaving the party, I thought it would be great if we would be back on two. I remember watching the big screen when the votes were announced. Our bar shot up highest: we had received the most votes of all the political parties. It seemed the room (and also the Mayor) was stunned. It was generally expected that the old minority coalition, as the liberals nationally were winning quite some seats, would get a majority (later, it would turn out that the governor had even already written a draft coalition document for the occasion). But unexpectedly they ended up on eight again, and we had taken most votes. It really was quite special. In retrospect I think I could already see that evening that coalition building had begun. I could not participate of course, as everyone wanted to congratulate me on our victory. I knew the next morning coalition talks would begin.


Now came one of the hardest times in my life, both mentally and politically. But to understand this phase best I first need to say a bit more about the political context and system. The Enkhuizen council has 17 seats. By tradition, after elections the largest political party takes the lead in assembling a majority coalition (so of at least 9 seats), and that was the SP. We really tried to approach these negotiations as neutrally as possible, although some might’ve thought we were full of resentment. I can honestly say we weren’t. Sure, it was clear that the ‘old coalition’ wanted to continue, and that our victory had complicated things, but some (naive?) idealism meant that we thought that after the elections, the campaigns were over and we could simply use logical argument and debate to get a good coalition that did justice to the election result. This meant that the first step was to meet all the parties. With a standard script we met with representatives of all the other parties to get clear what they expected. Minutes of all the 1-2 hour meetings were drawn up and published openly. Based on this we wrote an initial ‘exploration’ document, suggesting the most logical programmatic way forward. Even in the first round one could already see some friction on the big election points. Although initially the idea was to only have formal meetings and a completely transparent process, it became clear that sometimes informal meetings were necessary. The advantage of these is that there is a bit less tension; the disadvantage is that things can be said that cannot be used later on, as they were not recorded. After a first round, taking into account programmatic agreement and the winning/losing parties, we first tried to build a core coalition with VVD-D66 and CDA. This came as a bit of a shock to parties on the center left as ideologically they felt they were closer to the SP. However, they had lost votes, and they would not budge on one of the main election points, parking. We did feel that the other point might become problematic with the other parties, but this was not voiced as resolutely (yet). It seems useful to mention that in (Dutch) politics parties that walk away from negotiations can be depicted as not taking responsibility. This meant that part of the process was to *never* do that, and to let others leave, if possible.
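The seat arithmetic behind these negotiations can be sketched in a few lines of Python. The seat numbers come from the 2010 results table above; the variable and function names are mine and purely illustrative.

```python
# Seats per party after the 2010 election (from the results table).
SEATS_2010 = {
    "NE": 3,
    "PvdA": 3,
    "VVD-D66": 3,
    "SP": 3,
    "CDA": 2,
    "CU-SGP": 1,
    "GROENLINKS": 1,
    "Lijst Quasten": 1,
}

TOTAL_SEATS = sum(SEATS_2010.values())  # the council has 17 seats
MAJORITY = TOTAL_SEATS // 2 + 1         # at least 9 seats are needed


def coalition_seats(parties):
    """Add up the seats of the given parties."""
    return sum(SEATS_2010[party] for party in parties)


def has_majority(parties):
    """True if the parties together reach a council majority."""
    return coalition_seats(parties) >= MAJORITY


old_coalition = ["NE", "VVD-D66", "CDA"]
old_coalition_with_support = old_coalition + ["CU-SGP"]
core_attempt = ["SP", "VVD-D66", "CDA"]

for label, parties in [
    ("old coalition", old_coalition),
    ("old coalition + CU-SGP", old_coalition_with_support),
    ("SP + VVD-D66 + CDA", core_attempt),
]:
    print(label, coalition_seats(parties), has_majority(parties))
```

The old coalition on its own stays one seat short, which is why the single CU-SGP seat mattered so much, and why a core of SP, VVD-D66 and CDA still needed at least one more party.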

An inappropriate offer

Coalition agreement. Note how we crossed out the CDA, who recanted 2 hours before the formal council meeting.

Basically, if now only one more party would join, we would have the ‘old coalition’ complemented with us, the SP. This was not what we wanted, as it was quite clear that a large group of people wanted some changes to be made. We expected that this fourth party, NE, would increasingly present themselves with viewpoints we wanted to hear, to force this ‘old coalition plus us’ on us. They even went to the newspaper to declare that they would be perfectly willing to govern with the party they had accused of ‘lies’ before. Our hesitance to include them was not taken well and was seen as resentment on our part; this was unfair in my view, as it was quite clear that there simply were incompatibilities between us. The negotiations with those parties faltered and we regrouped to try and get an agreement on the center-left. One challenge was that we needed at least one party from the ‘old coalition’. We did extra rounds with the party that was closest, and normally quite pragmatic, the CDA. By giving them several points, making it hard to refuse, we managed to get an agreement. Literally later that day, CDA unfortunately recanted their agreement. I never really found out what had happened. Their story was that one of them had encountered *some* members of *one of the other parties* and that they had said things that had made them doubt our true intentions with the agreement. But I think it had more to do with the ‘old coalition’ and their agreement to not let each other go. To set this in context, there were blocks on the political left and right, and a solitary seat ‘in the middle’ (CU-SGP). Previously, they had given support to the old coalition. I think they were hoping this would happen again; this was not an impossibility. To be honest, I was shattered at that point. Weeks of well-intended and idealistic work, in my view, became the victim of political games. What we did do is make clear to the outside world that it wasn’t us that had ‘let go’ but others. This was, for example, done by releasing a nicely formatted ‘coalition agreement’ with CDA crossed out, making clear that there *was* an agreement. I gave several interviews outlining the process, and made sure our website gave a full account of what had happened.

Ironically, it then was the ‘second largest party’, VVD-D66, the party that led the ‘old coalition’, that took over the negotiations. It was clear that the parties of the ‘old coalition’ knew each other very well and agreed about many points very swiftly. Of course, not taking the lead any more meant that we could not get many of our points in. In fact, we were quite resistant to any ritual games of ‘neutral negotiations’. It was quite clear what everyone wanted, it was all in the minutes of the public negotiations, and essentially it boiled down to whom the one-seat party CU-SGP would support to make a majority. Unfortunately they felt they had to support ‘continuity’ and went for the ‘old coalition’. That was that. In parallel, though, there was another problem. At one point the VVD-D66 governor invited me over for a frank conversation about the new health centre. In this conversation he confided facts about the centre that had previously been denied when we asked direct questions. My feeling was that this was an inappropriate offer: by confiding this to me, we could become part of the ‘inner circle’ and ultimately become part of the local government. This was not something we were willing to accept; after all, in our view, the governor had lied to the council. We gave him an ultimatum: either come clean himself within a couple of days, or we would send out a message and ask questions about this during the next council meeting. This still drives me: some principles are non-negotiable. We could have governed but the price, accepting lies about a project that concerned the citizens of Enkhuizen, was unacceptable.

How it ended

It was this council meeting that became the focal point of two things. Firstly, it would concretise the establishment of the old coalition (and not us, although we were the largest party). Secondly, we would bring out the news about the lies. Of course, we were accused of being ‘bad losers’, and even of having jeopardised the good relationships in the council, because we had taken so long to negotiate. I guess that disappointed me the most; rather than confess any manipulation had taken place, it was all *our fault*. Luckily, I think we had enough goodwill, and explained the events so clearly in the media, that not many citizens seemed to believe these ‘alternative facts’. Remarkably, the coalition agreement they had made contained elements that were non-negotiable when the ‘old coalition’ was negotiating with us.

Dutch heading in newspaper when I left for Southampton.

So that is how the negotiations ended. We were in opposition and of course still were going strong (although I had to pause my activities for some months later on). I emigrated with my family to the UK in 2012 and I’m very happy to see that the local chapter has gone from strength to strength, being by far the biggest party in the current council with 4 seats, as well as a governor. I still value my time as councilor a lot.



Hirsch – notes on ‘Why knowledge matters’

This is a quick post with some thoughts on E.D. Hirsch jr.’s latest book ‘Why knowledge matters’. Of course Hirsch is mentioned a lot by the ‘knowledge oriented’ blogosphere, and I can see why. I also had the feeling his message was somewhat distorted though, and so set out to read the book. It is an ‘ideas rich’ book that could have been presented a bit more coherently, which makes it hard to summarise. Nevertheless, there were numerous interesting points (my interpretation of course).

Importantly, we need to acknowledge the large role the US context plays; Hirsch is truly interested in inequality and the US has a lot of it. It especially became clear that he was far more communitarian than sometimes depicted. Rather than individualism, he favours community and that is something that appeals to me a lot. He does not seem overly attached to a *certain* curriculum or certain systems and structures, just as long as there is a coherent, knowledge-oriented curriculum. He gives several favourable examples from Japan where the system is not like the charter system. I find this slightly ironic in the English context, because I feel that a lot of the communitarian aspect has actually been undermined in the last five years with systems and structures, like academies and free schools, even allowing schools to deviate from a national curriculum. Of course, I know the reactions to this, namely that the curriculum wasn’t fit for purpose, but given the communitarian ideals behind Hirsch’s thoughts I wonder whether getting rid of one, and changing the system so it becomes more fragmented, to then start a campaign to let everyone adopt one particular vision, really is a communitarian thing to do. My feeling is it actually has caused less ‘overall’ community within England, although within certain sub-cultures there is more.

Hirsch is critical of the interpretation of the Coleman Report as ‘it’s not the schools’. I can understand that; it seems that especially in the US a coherent curriculum was not on the mind. Devolving responsibility from education, in my view, isn’t a good thing. Yet, I now see the opposite: education as the ‘great equaliser’, allowing governments to get away with not addressing systemic inequality. I think there is plenty of evidence that shows that different levels contribute to inequality (or equity): the individual level, family SES, teachers and schools, but also country-level policies.

Another interesting aspect was the numerous times that France featured. I thought the narrative of the introduction of the Jospin law was quite compelling with regard to the decrease in France’s achievement. A weaker point was the impact it had on increasing inequality. I base this on the latest measurement done in France:

The communitarian aspect returns again, and I appreciate that Hirsch tries to detach the developments from a political colour. He does this, for example, by contrasting what I would call the center-left developments in France with what I would call the center-right developments in Sweden. I would not, though, say it’s non-political: for both countries the communitarian ideal of a coherent (knowledge) curriculum was undermined. In one by generic skills ideals, in another by system changes (friskolar; Sweden experts, correct me if I’m wrong).

Quite some space in the book is devoted to ‘educationally invalid testing’. It builds on what is described in the introduction regarding ‘generic skills’. Hirsch really seems to have big problems with the term ‘skills’ and at a certain point (p. 13) says “Think how significantly our view of schooling might change if suddenly policy makers, instead of using the term skill, had to use the more accurate, knowledge-drenched term expertise.”. I can see how one would start to dislike an opaque term like ‘skills’ when people use it interchangeably for all sorts of things. But what’s in a name? If we would redefine skills, as for example Ohlsson does, to ‘practical knowledge’, I’m not sure if that really makes a difference. Also, the term ‘expertise’, according to Hirsch, might be ‘knowledge-drenched’, but in becoming an expert surely one needs to practice. We can stop using ‘skills’ in describing practice, but I feel this is more semantics: people favour generic skills, so let’s get rid of the word.

The best part of a whole chapter (one) is then used to describe how in the US the use of reading tests is educationally invalid. I’m not sure how much of it can be ascribed to the US situation, but I do sense some overlap with other educational jurisdictions. Hirsch at first seemed to suggest that high stakes were best removed, to allow teachers to pay more attention to the ‘long arc of knowledge acquisition’. I don’t, however, think this should be read as Hirsch being against testing per se, as long as the tests are ‘based on good, knowledge-based standards’ (p. 33). I find Hirsch slightly inconsistent here, apart from the ever-present ‘coherent knowledge based curriculum and standards’.

Hirsch is, rightly so, against scapegoating teachers and goes into Value Added Models. I think it makes sense, and it made me think of the balanced American Statistical Association statement on Value Added. The links, plus a balanced evaluation, can be found here. In another chapter, Hirsch covers the phenomenon of ‘fadeout’, which is challenging for every programme. Some took his mention of Direct Instruction and Success for All to be criticism of direct instruction (small letters) but it’s more the Engelmann style (capital letters). Project Follow Through makes another appearance, as do the Reggio Emilia schools as an example of ‘naturalistic’. It is interesting, though, that he mentions that all programmes suffer fadeout; it seems to be the reason why he wants a long-term coherent curriculum. I think that makes sense, but I think it does make it hard to do evaluations. Hirsch mentions he’s not very interested in, for example, Randomised Controlled Trials. I understand his position but this does contrast with the Core Knowledge evidence base, which is rather mixed.

In sum, I enjoyed the themes in this book, although they are delivered in a fragmented way. I think Hirsch’s aims regarding equality are genuine and noteworthy, and he is clearly fed up with teachers getting the blame. I think he really focuses on ‘a coherent knowledge curriculum’ and not, as some seem to think, systems and structures. I think his dislike of ‘skills’ being abused has been taken too far though. At first it seems he’s against testing but he’s not, as he wouldn’t mind knowledge tests. Interesting ideas, I hope we take them in, and not just pick what suits.


Notes on Making Good Progress – Summary blog

Just because I had written extensive notes, I thought I’d just post them in a series of blogs. All blogs together in this pdf (I might have made some slight changes over time in the blogs, which are not in the pdf).

Part 1 – foreword, introduction, chapter 1
Part 2 – chapters 2 and 3
Part 3 – chapters 4 and 5
Part 4 – chapters 6 and 7
Part 5 – chapter 8
Part 6 – chapter 9 and conclusion

In conclusion, I think that if a teacher wants to read a timely book with a lot of interesting content on assessment, they do well to read this one. They should, however, read it with the frame of mind that in places the situation is presented somewhat one-sidedly, in my view too negative about the ‘old’ situation and too positive about alternative models. Teachers can profit from that, but it can also mean that they miss out on decades of unmentioned research on curriculum, psychometrics and assessment. I would therefore encourage them to (i) read the book, (ii) follow up the references and (iii) also read a bit wider. Of course, one cannot write a 1000-page ‘accessible’ book but given the number of footnotes a bit more depth in some places would have been good. Particular points are:

  • Yes, the implementation of Assessment for Learning (AfL) has been problematic. The book says something about the importance of feedback but does not cover enough of the prior research.
  • I recognise the generic versus specific domain skills discussion but in my view it is presented in a too dichotomous way. There is more than Willingham, for example Sternberg and Roediger on critical thinking. In addition, linking it to leading to certain assessment practices (e.g. teaching to the test) is unevidenced. There also exist fair criticisms of deliberate practice.
  • The introduction of a quality and difficulty model is useful but again rather binary.
  • Reliability and validity are covered but only quite superficially (types of validity, threats to validity etc.), and reliability -in my view- is not covered correctly (the example with 1kg on a scale is an example of reliable AND valid and does not tease out the essential test-retest characteristic of reliability).
  • Yes, there are problems with descriptor-based assessments but there is a raft of research addressing their validity and reliability.
  • The progression model makes sense but haven’t people been doing this for decades? (e.g. in good textbooks).
  • The attention given to the testing effect, spaced practice, multiple choice questions is well done.
  • Comparative Judgement is worth examining (critically), but (i) no silver bullet, (ii) probably only applicable for niche objectives, (iii) several pressing questions still to ask, (iv) maybe its strength lies even more in the formative realm.
  • The proposed integrated system describes what already is in place, with a plea to collaborate. This is good but we must realise that the fact it has not worked out over the years is, in my view, mainly a funding issue.
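
To illustrate the reliability point in the list above, here is a minimal sketch with made-up numbers (the readings and the helper `summarise` are hypothetical, not taken from the book). It treats the spread of repeated readings as a rough stand-in for test-retest reliability, and the distance of the average reading from the true weight as a crude stand-in for (lack of) validity. Validity is of course much broader than bias, so this only shows that the two can come apart.

```python
from statistics import mean, pstdev

TRUE_WEIGHT_KG = 1.0

# Hypothetical repeated readings of the same 1 kg weight on three scales.
readings = {
    "consistent but biased": [1.20, 1.21, 1.19, 1.20, 1.20],
    "unbiased but noisy": [0.80, 1.25, 0.95, 1.15, 0.85],
    "consistent and unbiased": [1.00, 1.01, 0.99, 1.00, 1.00],
}


def summarise(values, true_value=TRUE_WEIGHT_KG):
    """Return (bias, spread) for a list of repeated readings."""
    bias = mean(values) - true_value   # how far off the readings are on average
    spread = pstdev(values)            # how much repeated readings disagree
    return bias, spread


summary = {name: summarise(values) for name, values in readings.items()}

for name, (bias, spread) in summary.items():
    print(f"{name}: bias = {bias:+.3f} kg, spread = {spread:.3f} kg")
```

The first scale gives nearly the same reading every time (small spread) but is consistently off, the second is right on average but its repeated readings disagree a lot. Only the third resembles the 1kg example, where the two properties coincide.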

One might wonder ‘why mention this, it’s great that this topic gets some attention?’ but I simply have to refer to what the author states towards the end of the book. ‘Assessment is a form of measurement’ and ‘flawed ideas about assessment have encouraged flawed classroom practice’ (p. 212). If these are the main aims behind the book, then it surely increases awareness of this, but without covering the basics more, I fear we don’t get the complete picture. Overall, I would say it’s an interesting, good book, but not outstanding. 3.5*.


Notes on Making Good Progress – Conclusion

Sometimes I just get carried away a bit. I managed to get an early copy of Daisy Christodoulou’s new book on assessment called Making Good Progress. I read it, and I made notes. It seems a bit of a shame to do nothing with them, so I decided to publish them as blogs (6 of them as it was about 6000 words). They are only mildly annotated. I think they are fair and balanced, but you will only think so if you aren’t expecting an incredulous ‘oh, it’s the most important book ever’ or ‘it is absolutely useless’. I’ve encountered both in Twitter discussions.


This part addresses chapter 9 and the conclusion.
Finally, chapter 9 tries to tie several things together in one ‘integrated assessment system’. There are no references in this chapter. Many elements have already been discussed. For example, the ‘progression model’ which did not seem to offer really new insights (at least to me). Lesson plans and schemes of work appear out of the blue, together with ‘curriculum’. I agree that textbooks would be most helpful here. Another element is a ‘formative item bank’. Again, very useful, and there are already plenty out there. I am not sure if the summative item bank would need to be a different bank; it is rather the way the items are used and compiled in valid, rigorous summative assessments that needs scrutiny. I felt the ‘summative item bank’ for the quality model was far too much geared towards comparative judgement, an approach that in my view has limited scope; descriptor-based assessments can still play a role, especially in relation to exemplars. What the model *does* emphasise is that an assessment system should draw from several summative and formative sources, perhaps contradicting earlier parts of the book a little. This is also expressed on page 206 with the benefits (coherence, pupil ownership with adaptivity and gamification, self-improving with more adaptivity). Ultimately, though, I am left with the feeling that all these elements are already readily understood and even in place. Christodoulou seems to realise and state this on page 207, “Every individual element of this system exists already”, but does not address *how* organisations could come to an ‘unprecedented collaboration’. Maybe the challenge is that so many people *have* already tried and failed. Ideally I would have wanted the author to have touched on the costs for the resources as well. Many item banks cost money, GL and CEM assessments cost money, No More Marking is not free, textbooks and exam boards charge money. All in all, with a funding squeeze, it is unrealistic not to address the costs.

The conclusion in the book is rather meagre with three pages. There is some repetition and bold claims again: ‘flawed ideas about assessment have encouraged flawed classroom practice’. I think this caricatures the situation. Sure, there are flawed practices but one could also say -in the quest for valid and reliable assessments- there always are flaws, even in some of the solutions Christodoulou proposes. Rather than exaggerate by calling practices flawed, it is better to look at how practices can be improved. Christodoulou has some suggestions that should be taken seriously, but also critically evaluated in light of the wide body of research on assessment.


Notes on Making Good Progress – Part 5




This part addresses chapter 8.
Chapter 8 addresses the topic for which I started reading this book in the first place: improving summative assessments through comparative judgement (CJ). This previous post, which I wrote right after reading this chapter, asks some questions about CJ. The chapter starts by repeating some features for summative assessments. The first is ‘standard tasks in standard conditions’. The subsequent section, however, isn’t really about that but about ‘marker reliability’ (p. 182). The distinction between the previously described difficulty model and quality model is useful. It is clear to me that essays and such are harder to mark objectively, even with a (detailed) rubric. It is pertinent to describe the difference between absolute and relative judgements. However, when the author concludes “research into marking accuracy…distortions and biases” she again disregards ways to mitigate these issues, even while the referenced Ofqual report does mention them. Indeed, many of the distortions are ‘frustrating’ judgements, and therefore a big danger of rubrics. I, however, find it strange that this risky point of rubrics is disregarded when comparative judgement, it is suggested, can work with ‘exemplars’. As Christodoulou pointed out on p. 149 there is a danger that students work towards those exemplars. I saw this often in some of the Master modules I taught: a well-scoring exemplar’s structure of sub-headings was (awfully) applied by many of the students, as if they thought that adopting those headers surely had to result in top marks. So in a sense I agree with the author’s critique, I just don’t see how the proposed alternative isn’t just as flawed.
Then comparative judgement is described as very promising. It is notable that most examples of its affordances feature English essays. It also is notable that ‘extended writing’ is mentioned, while some examples are notably shorter. The process of CJ is described neatly. I think the ‘which one is better’ is glossed over, i.e. ‘in what way’? I also think more effort could have been put into describing the algorithm that ‘combines all those judgements, work out the rank order of all the scripts and associate a mark for each one’ (p. 187). The algorithm is part of the reason why reliability is high: inter-rater reliability can be compared with regard to rank orders; I am not sure if the criticised traditional method is based on rank orders. For instance, if one rater says 45% and another 50% it seems reasonable to say that both raters did not agree. Yet, if we just look at the rank order they might have agreed that one was better than the other. As CJ simply looks at those comparisons, reliability is high. But it’s not comparing like with like. A similar process with one marker and 30 scripts would involve ordering scripts, not marking them. This makes me think of several challenges that are mentioned in this older AQA report. I don’t think these challenges have been addressed or discussed yet.
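
To make the two points above a bit more concrete, here is a minimal sketch. The first part shows how two markers can disagree on every absolute mark and still agree fully on the rank order. The second part assumes a Bradley-Terry model to combine pairwise judgements into a scale; that is a common choice for pairwise comparison data, but the book does not spell out the algorithm, so this is my assumption and not necessarily what any CJ tool does. The function `bradley_terry`, the script labels, the marks and the judgements are all hypothetical.

```python
from collections import defaultdict

# Part 1: absolute marks versus rank order (hypothetical marks in %).
marker_1 = {"A": 45, "B": 50, "C": 62}
marker_2 = {"A": 50, "B": 55, "C": 70}


def rank_order(marks):
    """Scripts ordered from highest to lowest mark."""
    return sorted(marks, key=marks.get, reverse=True)


exact_matches = sum(marker_1[s] == marker_2[s] for s in marker_1)
same_order = rank_order(marker_1) == rank_order(marker_2)


# Part 2: combining pairwise judgements (assumed Bradley-Terry model).
def bradley_terry(judgements, iterations=200):
    """Estimate a strength per script from (winner, loser) pairs.

    Uses the classic iterative update for the Bradley-Terry model.
    Assumes every script wins and loses at least once.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        updated = {}
        for s in scripts:
            denom = sum(
                pair_counts[frozenset((s, other))] / (strength[s] + strength[other])
                for other in scripts
                if other != s
            )
            updated[s] = wins[s] / denom
        total = sum(updated.values())
        # Normalise so the strengths sum to 1.
        strength = {s: v / total for s, v in updated.items()}
    return strength


# Hypothetical judgements: each pair of scripts is compared three times.
judgements = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"), ("C", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
]
strengths = bradley_terry(judgements)
ranking = sorted(strengths, key=strengths.get, reverse=True)

print("identical marks:", exact_matches, "| same rank order:", same_order)
print("CJ ranking:", ranking)
```

Note that the judges in the second part do not even agree with each other on every pair; the model still returns one rank order. How that order is then turned into a mark for each script is a further step that this sketch leaves out.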

I think it is also interesting that Christodoulou correctly contrasts with (p. 187) ‘traditional moderation methods’. Ah, so not the assessment method per se, but the moderation. The Jones et al. article is referenced, but the book fails to mention how the literature also mentions several caveats, e.g. multidimensionality and length of the assessment. The mentioning of ‘tacit knowledge’ is fine but it is not necessarily tacit knowledge that improves reliability, in my view. It can be collective bias. I think it’s quite a stretch to actually see the lack of feedback for the scripts as an advantage, because it ‘separates out the grading process from the formative process’. It even distributes the grading process over a large group of people; to a student it can be seen as an anonymous procedure in the background. Who does the student turn to if they want to know why they got the mark they received? Sure, post-hoc you can analyse misfit, but can you really say -as classroom teacher- you ‘own’ the judgement? Maybe that is the reason why it is seen as an advantage, but one can rightly so say the exact opposite. It is interesting to note that the Belgian D-PAC process actually seems to embrace the formative feedback element CJ affords. The section ends with ‘significant gain of being able to grade essays more reliably and quickly than previous’. I think the ‘reliably’ should be seen in the context of ‘rank ordering’, length of the work, and multidimensionality. ‘Quickly’ could be seen in more than just the pairwise comparisons (it is clear that they are short, if only a ‘holistic’ judgement is needed); but the collective time needed often surpasses the ‘traditional’ approach. ‘Opportunity cost’ comes to mind if we are talking about summative purposes through CJ. I am disappointed that these elements are not covered a bit more. The section, however, ends with what I *would* see as one big affordance of CJ: CJ as a way to CPD and awareness of summative and formative marking practices.
But this is something different than a complete overhaul of (summative) assessment, because of the limitations:

  • Needs to be a subjective task (quality model, because otherwise there are more reliable methods)
  • Can’t be too long (holistic judgement would most probably not suffice)
  • Can’t be multidimensional (holistic judgement would most probably not suffice)

That’s quite a narrow field of application. And with the desire to stay in the summative realm, in England, summative KS2 not-too-extended writing seems to be the only candidate (see for formative suggestions the previous blog on CJ). But careful:

In my opinion, page 188 also repeats a false choice regarding rubrics, as the described ‘exemplars’ can also be used with rubrics, not only with CJ. We do that in the aforementioned Master’s module (with the disadvantage that it becomes a target for students). So although I agree this would be ‘extremely useful’, it actually is not CJ. Another unmentioned element is that CJ could be linked to peer assessment. To return to page 105 where bias is seen as human nature, one could argue that a statistical model is used to pave over human bias. In my opinion, this does not mean it’s not there, it’s just masked.

The second half of the chapter addresses curriculum-linked assessments. I don’t understand the purpose of mentioning GL assessment, CEM and NFER apart from using the unrealistic nature of using them to argue ‘we need something in-between’ summative and formative, and then to argue ‘curriculum linked’. As in previous chapters, good points are raised but it feels like the purported solutions aren’t really solutions; the problems are used to argue *something* must change but not so much why the suggested changes would really make a difference. For example, the plea for ‘scaled scores’ is nice but I would suggest that only people who know how to deal with them should use scaling; simply applying a scaling algorithm might also distort (think of some of the age-related assessments used in EEF studies, or PISA rankings).