Education Education Research

researchEd national conference

On 9 September 2017 I gave a talk at the national researchEd conference in London. The presentation was about how mythbusting might lead to new myths. The presentation covered the following:

  • I started by explaining how myths might come about, by referencing some papers about neuromyths.
  • I then used the case of iron in spinach to illustrate how criticising myths can lead to new myths (paper by Rekdal).
  • I gave examples of some themes that are in danger of becoming new myths.
  • I concluded that it is important to read a lot, stay critical and observe nuance. No false dichotomies please.

I will endeavor to write this up at some point. Slides below.

Education Research Math Education Tools

Seminar at Loughborough University

Dr. Christian Bokhove recently gave an invited seminar at Loughborough University:

Using technology to support mathematics education and research

Christian received his PhD in 2011 at Utrecht University and is a lecturer at the University of Southampton. In this talk Christian will present a wide spectrum of research initiatives that all involve the use of technology to support mathematics education itself and research into mathematics education. It will cover (i) design principles for algebra software, with an emphasis on automated feedback, (ii) the evolution from fragmented technology to coherent digital books, (iii) the use of technology to measure and develop Mental Rotation Skills, and (iv) the use of computer science techniques to study the development of mathematics education policy.

The talk referenced several articles Dr. Bokhove has authored over the years, for example:

  • Bokhove, C., & Drijvers, P. (2012). Effects of a digital intervention on the development of algebraic expertise. Computers & Education, 58(1), 197-208. doi:10.1016/j.compedu.2011.08.010
  • Bokhove, C. (in press). Using technology for digital maths textbooks: More than the sum of the parts. International Journal for Technology in Mathematics Education.
  • Bokhove, C., & Redhead, E. (2017). Training mental rotation skills to improve spatial ability. Online proceedings of the BSRLM, 36(3).
  • Bokhove, C. (2016). Exploring classroom interaction with dynamic social network analysis. International Journal of Research & Method in Education. doi:10.1080/1743727X.2016.1192116
  • Bokhove, C., & Drijvers, P. (2010). Digital tools for algebra education: criteria and evaluation. International Journal of Computers for Mathematical Learning, 15(1), 45-62. Online first. doi:10.1007/s10758-010-9162-x

Education Research

Economic papers about education (CPB part 2)

This is a follow-up to this post, in which I unpicked one part of a large education review. In that post I covered aspects of papers by Vardardottir, Kim, Wang and Duflo. In this post I cover another paper in that section (page 201).

Booij, A.S., E. Leuven and H. Oosterbeek, 2015, Ability Peer Effects in University: Evidence from a Randomized Experiment, IZA Discussion Paper 8769.
This is a discussion paper from the IZA series. This is a high quality series of working papers, but this -of course- is not yet a peer-reviewed journal version. Maybe there is one at the moment but clearly this version was used for the review. Previously I had already noticed there could be considerable differences between working papers and the final version, just see Vardardottir’s evaluation of Duflo et al.’s paper.
The paper concerns undergraduate economics students. Of course a first observation would be that it might be difficult to generalize beyond ‘economics undergraduates from a general university in the Netherlands’. Towards the end it is however argued that together with other papers (Duflo, Carrell) a pattern of results is emerging. The first main result is in Table 4.
The columns show how the models were built. Column (1) has the basic model with only the mean of peers’ Grade Point Average (GPA) and ‘randomization controls’ are included. Column (2) adds controls like ‘gender’, ‘age’ and ‘professional college’. Column (3) adds the Standard Deviation (SD) of peers’ GPA in a tutorial group. Columns (1) to (3) do not show any effect. Only in column (4), where non-linear terms and an interaction are added, some significant variables appear. This can be seen by the **. The main result seems rather borderline, but ok, in the context of ability grouping it is Table 5 that is more interesting.
In that table different tracking scenarios are studied. The first column is overall effects compared to ‘mixed’, so this looks at the ‘system’ as a whole. Columns (2) to (4) show the differentiated effects. From this table I would deduce:
  • In two-way tracking lower ability gain a little bit (10% significance in my book is not significant), higher ability gain a little bit (borderline 5%)
  • Three way tracking: middle and low gain some, high doesn’t.
  • Track Low: low gains, middle more (hypothesis less held back?), high doesn’t.
  • Track Middle: only middle gains (low slightly negative but not significant!)
  • Separate high ability: no one gains.

This is roughly the same as what is described in the article on page 20. The paper then also addresses average grade and dropout. Actually, the paper goes into many more things (teachers, for example) which I will not cover. It is interesting to look at the conclusions, and especially the abstract. I think the abstract follows from the data, although I would not have said “students of low and medium ability gain on average 0.2 SD units of achievement from switching from ability mixing to three-way tracking.” because it seems 0.20 and 0.18 respectively (so 19% as mentioned in the main body text). Only a minor quibble, which after querying, I heard has been changed in the final version. I found the discussion very limited. It is noted that in different contexts (Duflo, Carrell) roughly similar results are obtained (but see my notes on Duflo).

Overall, I find this an interesting paper which does what it says on the tin (bar some tiny comments). Together with my previous comments, though, I would still be wary about the specific contexts.



Education Research

Unpicking economic papers: a paper on direct instruction

The title of this paper by Schwerdt and Wuppermann, “Is traditional teaching really all that bad?”, makes clear that it sets out to show it isn’t. And without this paper I would have said the same thing, simply because I wouldn’t deny that ‘direct instruction’ has had a rough treatment in the last decades.

There are several versions of this paper on SSRN and other repositories. The published version is from ‘Economics of Education Review’, and this immediately shows why I have included it. With the advent of economics papers, some have preferred to use this paper rather than a more sociological, psychological or education research approach.


The literature review is, as often the case in my opinion in economics papers, a bit shallow. The study uses TIMSS 2003 year 8 data (I don’t know why they didn’t use 2007 data).

I find the wording “We standardize the test scores for each subject to be mean 0 and standard deviation 1.” a bit strange because the TIMSS dataset, as in later years, does not really have ‘test scores per subject’ because subjects do not make all the assessment items.

Instead, there are five so-called ‘plausible values’. Not using them might underestimate the standard error, which might lead to results being significant more swiftly. This variable is the outcome; another variable comes from question 20.

The distinction between instruction and problem solving is based on three of these items: b is seen as direct instruction, c and d together as problem solving (note that one of course does mention ‘guidance’). There is an emphasis on ‘new material’ so I can see why these are chosen. Of course the use of percentages means that an absolute norm is not apparent, but I can see how lecture%/(lecture%+problemsolving%) denotes a ratio of lecturing. The other five elements are together used as control. Mean imputation was used (I can agree that the imputation method probably did not make a difference) and sample weights (also good, contrary to no plausible values).
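
The ratio described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the authors' code: the function name and the example percentages are made up, and I assume items c and d are simply added together.

```python
def lecture_share(lecture_pct: float, problem_solving_pct: float) -> float:
    """Share of lecturing within lecturing plus problem-solving time.

    lecture_pct: percentage of lesson time for item b (hypothetical input).
    problem_solving_pct: summed percentage for items c and d (assumption).
    """
    total = lecture_pct + problem_solving_pct
    if total <= 0:
        raise ValueError("no time reported for lecturing or problem solving")
    return lecture_pct / total


# Hypothetical teacher: 30% lecturing, 10% + 10% problem solving.
share = lecture_share(30.0, 10.0 + 10.0)
print(share)
```

Note that a teacher reporting 30% and 20% gets the same share as one reporting 15% and 10%, which is the ‘no absolute norm’ point made above.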

Table 1 in the paper tabulates all the variables and shows some differences between maths and science teachers, for example in the intensity of lecture style teaching. The paper then proposes a model “standard education production function”. In all the result tables we can certainly see the standard p=.10 and again with large N’s this, to me, seems unreasonable. A key result is in Table 4:

The first line is the lecture style teaching variable. Columns 1 and 3 show that Math is significant (but keep in mind, at 5% with high N; however, 0.514 does sound quite high) and Science is not. Columns 2 and 4 then have the same result but now by taking into account school sorting based on unobservable characteristics of students through inclusion of fixed school effects. I find the pooling a bit strange, and it reminds me of the EEF pooling of maths mastery for primary and secondary to gain statistically significant results. Yes, here too, both subjects then yield significant results. Together with the plausible values issue I would be cautious.

Table 5 extends the analysis.

The same pattern arises. The key variable is significant at the questionable 10% level (column 1) and a bit stronger after adding confounding variables (at the 5% level, but again with high N). The article notes that over the columns the variable is quite constant, but also that it’s lower than the Table 4 results, showing that there are school effects.

There is a footnote on page 373 that might have received a bit more attention. I find the reporting a bit strange because the first line indicates that the variable ranges from 0.11 to 0.14, not 0.14 to 0.1 (and why go from a larger to a smaller number, is this a typo?). Overall, 1% of an SD seems very low. I think the discussion that follows is interesting and adds some thoughts. I thought it was interesting that the authors say “Our results, therefore, do not call for more lecture style teaching in general. The results rather imply that simply reducing the amount of lecture style teaching and substituting it with more in-class problem solving without concern for how this is implemented is unlikely to raise overall student achievement in math and science.” Well, that does seem a balanced conclusion, indeed. And again, a strong feature of most economic papers, the robustness checks are good.

In conclusion, I found this an interesting use of a TIMSS variable. Perhaps it could be repeated with 2011 data, and now include all five plausible values (perhaps a source of error). Nevertheless, although I think strong conclusions in favour of lecturing could be debated, likewise it could be said that there also are no negative effects of it: there’s nothing wrong with lecturing!
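
To make the plausible values point concrete, here is a minimal sketch of how estimates from the five plausible values are commonly combined (Rubin's rules for multiply imputed data). The numbers are made up and the function name is hypothetical. I assume the analysis has already been run once per plausible value, giving one coefficient and one sampling variance each time; how that sampling variance is obtained (TIMSS uses replication methods) is outside this sketch.

```python
import math
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Pool one estimate per plausible value into an estimate and a standard error."""
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    pooled = mean(estimates)            # average of the per-PV estimates
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance between PVs (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Made-up coefficients for five plausible values, each with sampling variance 0.0004.
estimates = [0.10, 0.12, 0.14, 0.11, 0.13]
variances = [0.0004] * 5
pooled, se = combine_plausible_values(estimates, variances)
print(pooled, se)
```

Using a single score would give a standard error of only the square root of the sampling variance. The pooled standard error also carries the between-value component, so it can only be equal or larger, which is why ignoring plausible values can make results look significant too easily.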

Education Research

Unpicking economic papers: a paper on behaviour

One of the papers that made a viral appearance on Twitter is a paper on behaviour in the classroom. Maybe it’s because of the heightened interest in behaviour, for example demonstrated in the DfE’s appointment of Tom Bennett, and behaviour having a prominent place in the Carter Review.

Carrell, S E, M Hoekstra and E Kuka (2016) “The long-run effects of disruptive peers”, NBER Working Paper 22042. link.


The paper contends that misbehaviour (actually, domestic violence) of pupils in a classroom apparently leads to large sums of money that people will miss out on later in life. There are, as always, some contextual questions of course: the paper is about the USA, and it seems to link domestic violence with classroom behaviour. But I don’t want to focus on that, I want to focus on the main result in the abstract: “Results show that exposure to a disruptive peer in classes of 25 during elementary school reduces earnings at age 26 by 3 to 4 percent. We estimate that differential exposure to children linked to domestic violence explains 5 to 6 percent of the rich-poor earnings gap in our data, and that removing one disruptive peer from a classroom for one year would raise the present discounted value of classmates’ future earnings by $100,000.”

It’s perfectly sensible to look at peer effects of behaviour of course, but monetising it -especially with a back of envelope calculation (actual wording in the paper!)- is on very shaky ground. The paper respectively looks at the impact on test scores (table 3), college attendance and degree attainment (table 4), and labor outcomes (table 5). The latter is also the one reported in the abstract.

There are some interesting observations here. The abstract’s result is mentioned in the paper: “Estimates across columns (3) through (8) in Panel A indicate that elementary school exposure to one additional disruptive student in a class of 25 reduces earnings by between 3 and 4 percent. All estimates are significant at the 10 percent level, and all but one is significant at the 5 percent level.” The fact economists would even want to use 10% (with such a large N) is already strange to me. Even 5% is tricky with those numbers. However, the main headline in the abstract can be confirmed. But have a look at panel C. It seems there is a difference between ‘reported’ and ‘unreported’ Domestic Violence. Actually, reported DV has a (non-significant) positive effect. Where was that in the abstract? Rather than a conclusion along the lines of whether DV was reported or not, the conclusion only focuses on the negative effects of *unreported* DV. I think it would be fairer to make a case for better signalling and monitoring of DV, so that negative effects of unreported DV are countered; after all, there are no negative effects on peers when reported.
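
Why I am uneasy about 10% (or even 5%) thresholds with a large N can be shown with a small sketch. It is not taken from the paper: I assume a simple two-group comparison with equal group sizes and a normal approximation, and the effect size of 0.02 SD is made up. The function name is hypothetical.

```python
import math


def two_sided_p(effect_sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a standardized mean difference between two equal groups.

    Normal approximation: z = d * sqrt(n / 2).
    """
    z = effect_sd * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


# The same tiny (hypothetical) effect of 0.02 SD at increasing sample sizes.
for n in (100, 1_000, 10_000, 100_000):
    print(n, two_sided_p(0.02, n))
```

The effect does not change across the rows, only the sample size does. With enough observations almost any non-zero difference passes a 10% or 5% threshold, so the size of the effect deserves more attention than the stars.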



Education Research Math Education MathEd

Slides from researchEd maths and science

Presentation for researchED maths and science on June 11th 2016.

References are at the end (there might be some extra references from slides that were removed later on, but these are interesting too 🙂).

Interested in discussing, contact me at or on Twitter @cbokhove

Education Education Research Games ICT Math Education MathEd Tools

Games in maths education

This is a translation of a review that appeared a while back in Dutch in the journal of the Mathematical Society (KWG) in the Netherlands. I wasn’t able to always check the original English wording in the book.

Computer games for Maths

Christian Bokhove, University of Southampton, United Kingdom

Recently, Keith Devlin (Stanford University), known for his newsletter Devlin’s Angle and popularisation of maths, released a computer game (app for the iPad) with his company Innertubegames called Wuzzit Trouble. The game purports to address linear Diophantine equations, without actually calling them that, and to build on principles from Devlin’s book on computer games and mathematics (Devlin, 2011), in which Devlin explains why computer games are an ‘ideal’ medium for teaching maths in secondary education. In twelve chapters the book discusses topics like street maths in Brazil, mathematical thinking, computer games, how these could contribute to the learning of maths, and concludes with some recommendations for successful educational computer games. The book has two aims: 1. To start a discussion in the world of maths education about the potential for games in education. 2. To convince the reader that well designed games will play an important role in our future maths education, especially in secondary education. In my opinion, Devlin succeeds in the first aim simply by writing a book about the topic. The second aim is less successful.

Firstly, Devlin uses a somewhat unclear definition of ‘mathematical thinking’: at first it’s ‘simplifying’, then ‘what a mathematician does’, and then something else yet again. Devlin remains quite tentative in his claims and undermines some of his initial statements later on in the book. Although this is appropriate it does weaken some of the arguments. The book subsequently feels like a set of disjointed claims that mainly serve to support the main claim of the book: computer games matter. A second point I noted is that the book seems very much aimed at the US. The book describes many challenges in US education that, in my view, might be less relevant for Europe. The US emphasis also might explain the extensive use of superlatives like an ‘ideal medium’. With these one would expect a good support of claims with evidence. This is not always the case, for example when Devlin claims that “to young players who have grown up in era of multimedia multitasking, this is no problem at all” (p. 141) or “In fact, technology has now rendered obsolete much of what teachers used to do” (p. 181). Devlin’s experiences with World of Warcraft are interesting but anecdotal and one-sided, as there are many more types of games. It also shows that the world of games changes quickly, a disadvantage of a paper book from 2011.

Devlin has written an original, but not very evidenced, book on a topic that will become more and more relevant over time. As an avid gamer myself I can see how computer games have conquered the world. It would be great if mathematics could tap into a fraction of the motivation, resources and concentration they might offer. It’s clear to me this can only happen with careful and rigorous research.

Devlin, Keith. (2011). Mathematics Education for a New Era: Video Games as a Medium for Learning.

Education Research

Some work presented in the last months

Some work was presented in the last months.

At Sunbelt XXXV I presented this work on classroom interaction and Social Network Analysis:

At ICTMT and PME my colleague presented our work on c-books.

Education Research

Predatory journals

More and more I’m being confronted with questions about journal publications. I devote some words to it in a session for our MSc programme in the module ‘Understanding Education Research’ and recently, in a panel discussion at our local PGR conference, there were questions about how to judge a journal’s reputation. Note that in answering this question I certainly don’t want to be a ‘snob’, i.e. to suggest that only the conventional and traditional publication methods suffice. Actually, developments on blogging and Open Access are positive changes, in my view. Unfortunately there also is a darker side to all of this:

One place where I always look first when it comes to ‘vanity press’ and predatory journals is Beall’s List, which is “a list of questionable, scholarly open-access publishers.”. What I like about this list is that they are rather sensible about how to use the list: “We recommend that scholars read the available reviews, assessments and descriptions provided here, and then decide for themselves whether they want to submit articles, serve as editors or on editorial boards.”. The list of criteria for determining predatory open access journals is clear as well. One thing you can do is use the search function to see if a journal or publisher gets a mention. This is exactly what I did recently with some high profile research. I was surprised to find out articles were indeed published in such journals.

The first example is this high profile article mentioned in the Times Educational Supplement. It references a press release from Mitra’s university:  

The journal title did not ring a bell so I checked Beall’s list, and yes, the journal and publisher are mentioned in this article on the list. Just a quick glance, also at the comments, should make most scholars think twice about publishing here, certainly if it is ‘groundbreaking’ stuff. This is not to say that the articles per se are bad (although methodologically there is much to criticise as well, maybe later, although this blog does a good job at concisely flagging up some issues) but I am worried that high profile professors are publishing in journals like these (assuming it was done with the authors’ agreement; predatory journals sometimes just steal content to bump up their reputation). In the case of this person it has happened before, in 2012, when the ‘Center of Promoting Ideas’ (this name would be enough for me to not want to appear in their publications) published this article in a journal, which is also on Beall’s list. It is poignant that an Icelandic scholar really got into problems because of this. Some other examples: this article, CIR world also features on Beall’s list (Council for Innovative Research, again a name which raises suspicion by itself).


These publications serve as examples that even high end professors could fall victim to predatory journals. I do not mean that in a judgemental way; it shows that more education on the world of predatory journals is needed. Although I must admit, there might be some naivety at play here: experienced scholars should know that ‘positive reviews only’, ‘dubious publishing fees’ and ‘unrealistic publication turnovers’ are very suspicious. Early Career Researchers often are targets of predatory journals and it therefore is important to be aware of this ‘dark side’ of Open Access publishing. Beall’s list covers these but recently there also are more and more ‘non open access’ journals that might be a bit dubious as well. In many cases it’s quite a challenge to judge the trustworthiness of publications. Certainly if in social sciences we would want to go away from the hegemony of the five big publishers, there is a lot to be gained in general skills to judge literature. Now, everyone has their own judgements to make when it comes to where they want to publish, but I would be very concerned about publishing in any journal (and for any publisher) on Beall’s list.


A response to a W3C priorities blog

This blogpost is a comment on this interesting white paper by Crispin Weston and Pierre Danet about setting W3C priorities. I first thought to comment on the site but it became rather lengthy so seemed more logical to post it here. Since coming to work in the United Kingdom I have not really been involved in standards for education, but the topic triggered my previous experiences. As from quite early on, 2004-ish, I did my fair share of ‘standards’ work (SCORM packages, programming the SCORM module in moodle, assessment standards for Maths via SURF) I thought it would be good to comment on the blog in more detail, not only because I disagree with some of it, but more importantly, I think any new development in this area *must* learn from previous experiences. I thought long and hard before writing this piece because I don’t want to come over, or be dismissed, as someone against innovation per se. But I must admit, even when I lived in the Netherlands, I did not really feel people really wanted too much innovation.

The context of the document

I hope that some of the claims and associations made in the first paragraph(s) will be reworded or evidenced more. At the moment one sentence combining underperforming education, PISA, MOOCs and elite courses seems far-fetched. I also wonder whether a statement saying that it’s a good time for 1:1 because touch is easy-to-use isn’t a bit too ‘easy’. On evidence I would also say that, when we run with the assumption that technology can do both good and bad, there might not be general evidence of the impact of technology on learning outcomes because it is seen as a means to an end. Like any tool, means are used correctly and incorrectly.

Three barriers

The authors of the piece see three key barriers to the emergence of a dynamic ed-tech market (why the word ‘market’?):
  • The lack of interoperability of systems and content that would allow different sorts of instructional software to work together in an integrated environment.
  • The lack of discoverability of innovative new products.
  • The institutional conservatism of many state-run education systems, which are often resistant to innovation and uncomfortable with the use of data as a management tool.
I certainly agree with interoperability (or lack thereof), which *is* the main topic of the article, as being a big barrier (some more comments later on). The second one, discoverability, is not really defined but if it leads to the conclusion that the W3C would be a good organisation to connect different players, then that’s fine as well. However, the article then primarily emphasizes how the W3C should “work with governments that are prepared to address the institutional conservatism to be found in their own formal education system.”. This is in line with the third barrier the authors define, adding that these systems “are often resistant to innovation”. I think such statements are not warranted and rather unproductive. It also neglects the role, at least on the topic of adoption of standards and interoperability, of companies that, in my view, in the last decades systematically have undermined standards, either intentionally or through mis-management and neglect. In my view this thread runs from even HTML3.2 (Netscape vs Internet Explorer) to the current ePub3, Kindle and iBooks endeavours.


The paper then goes on to requirements analysis. In general terms this section is ok, and I certainly see a lot of good things in Analytics and Games. I do, however, miss some analysis there. How are these aspects effective in “other businesses”? Why is that the case? What characteristics make it this way? What’s to say it would work in education? And, crucially, why and how would you adopt a whole paradigm? To use this then to argue that little has been done, and subsequently to propose how to go forward, has a bit of a ‘light touch’.


What I do find interesting and appropriate is the conclusion that interoperability between the two strands, let’s concisely call them analytics and content, is an important aspect. So although I think the analysis falls short, I think there would be no harm in having good interoperability; it could even potentially serve as a catalyst. But that’s not something new, of course. Getting a standard is a ‘different cookie’ (Sorry, a Dutch joke). I also like the ambition to outsmart the proprietary market and be there first.
Having used SCORM myself and even having modded the SCORM module in Moodle so that it made more use of the SCORM specification, I think the lessons to learn from it are not complete. One only needs to look at software that can create SCORM packages like Reload/Weload, Xerte, but also commercial packages to see that it has been possible to make rich content. So I’m not really sure whether it’s the lack of standardisation and tools why it has not really taken off. When I extended the SCORM module most users did not really care, they just wanted a simple scoring mechanism. But now as well: when I see current adaptive systems they are mainly multiple choice and scores. When I look at game mechanisms they mainly are Points, Badges and Leaderboards. To me, that indicates users might not really want more sophistication (yet). Now I understand this might be seen as a chicken/egg issue i.e. when we finally *can* make sophisticated stuff it *will* happen. Perhaps this is true (although history tells us otherwise) but it certainly would be smart to analyze the situation more deeply. Not in the least with regard to the role of the current edtech industry who, in my view, have sometimes frustrated attempts to formulate standards.
This also brings me to a general frustration with the fact that millions have been spent on attempts to write specifications on standards and, even worse, metadata. Over the years this has resulted in thousands of pages of documentation. Why will it be different this time? I feel that before setting out on yet another such journey, that question needs to be answered extensively. The description of the SCORM standard shows that we are dealing with experts. Given what I said previously I think there are more important reasons for SCORM’s waning than others. Apart from asking ourselves which factors played a role, we also need to ask ourselves how this will be prevented this time. I also wondered whether there still was any scope in assessment standards like QTI. A thread, in my view, through almost all standards is the mis-management and frustration by organisations and enterprises. If W3C leads the process, that is at least a strong start. How far W3C can confront the largest enterprises in the world, I don’t know.
A second risk remains hardware and software. Hardware and software become obsolete or deprecated. Every time it happens we are told there are good reasons for it, often inefficiency or security (e.g. java, but that’s also, again, mis-management and perhaps personal feuds), but in any case: who’s to say this won’t happen again? In my opinion it certainly ties in again with the corporate aspect. The W3C should be strong enough to address it.

SCORM was under-utilized

From a technical point of view I have always thought the CMI had not been used as well as possible. I agree that it was partly because of the SCORM specification but also unfamiliarity with it, for example the ‘suspendState’ option. A package could communicate quite a lot of information through this field, needing two things: (1) packages that would create a suspendState, (2) a VLE that would actually read that state. I remember being involved in the Galois Project, a maths algebra project, where we tried to do just that. The former was done by creating a system that could produce algebra SCORM packages which utilized the suspendState for storage. The latter indeed required reprogramming a VLE, which I did for Moodle’s SCORM module. The annotated SCORM module was plugged on the Moodle forums. As said previously, most people simply did not see the point in doing such a thing. Now, this was just a hack, but it did lead me to believe that there’s more going on in the education world, in that technology is (and probably should be) linked to the preferred didactics/pedagogy. So: maybe we don’t even need a lot of sophistication. Why am I saying this: I think it would be good to use some Rapid Application Development here: get some working stuff out there first, rather than slave for years on specs and metadata.
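
To illustrate the two halves of the ‘suspendState’ idea described above (a package that writes a state, a VLE that reads it back), here is a minimal sketch. It is not the Galois Project code. A real SCORM package would do this in JavaScript through the run-time API, where the corresponding data model element is, as far as I know, cmi.suspend_data. The function names, the example state and the 4096 character limit (often quoted for SCORM 1.2) are assumptions for illustration only.

```python
import json

# Assumed size limit for the suspend field; check the specification you target.
MAX_SUSPEND_CHARS = 4096


def pack_state(state: dict) -> str:
    """Package side: serialize the exercise state into one compact string."""
    text = json.dumps(state, separators=(",", ":"), sort_keys=True)
    if len(text) > MAX_SUSPEND_CHARS:
        raise ValueError("state too large for the suspend field")
    return text


def unpack_state(text: str) -> dict:
    """VLE side: read the stored string back into a structure it can report on."""
    return json.loads(text) if text else {}


# Hypothetical algebra exercise state: the steps a student has entered so far.
state = {"exercise": "eq-12", "steps": ["2x+3=7", "2x=4"], "hints_used": 1}
stored = pack_state(state)
restored = unpack_state(stored)
print(restored["steps"])
```

The point of the sketch is that the field can carry the whole solution path rather than only a score, provided the VLE bothers to read it.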


Having said all that, I do think, given the developments, that a follow-up for SCORM is needed. And also that it is warranted that W3C would look into something like that. It is smart, I think, to take the theme of connectivity rather than content, to make it more W3C. It also provides a good reason to include Analytics. The fact that the authors mention privacy and data protection acknowledges awareness of the politics involved with such an initiative. So overall I think this is a good initiative, but I ask for attention to the following:
  • Traction and commitment with enterprise. How to prevent frustration of the process?
  • Rather get technology working quickly than endless specification and metadata mashing.
  • Promote a more sophisticated use of technology as well.
  • Either refrain from sweeping statements about ‘conservatism’ in education and focus only on interoperability, OR get more evidence for the various claims (I doubt you will find this).