Big data and Ed, Dec 2013

31st December

Many popular books and articles about big data cite the volume of data produced per day, convert these large numbers into some physical equivalent, and then often compare that to how many return trips to the Moon the equivalent would make. Yes. Lots of data. Included in this volume, I suspect, are all things digital, e.g. phone calls, text messages etc. A second point is that while some sets of big data are generated directly by human activity, e.g. click counts on a web page, a lot is generated indirectly, e.g. via algorithms and via inscription-generating machines like telescopes and all the other myriad devices that will soon populate the surface of the planet1.

26th December

I've been mulling the problems associated with designing, building and running big systems. I think it is useful to distinguish the case of a lot of data about a tightly focussed topic, e.g. genetic sequencing for a large number of species of plants, animals and other living things. For education there are a number of possible sources, some of which are regarded as big in the old sense, e.g. PISA data. The kinds of applications found in online business (tracking user behaviour, choices and so on, and providing real-time feedback to the user and the vendor) are a long way from the mimicking of them in education analytics.

There needs to be a step-by-step comparison here. Educational analytics is wishful naming. Business analytics must deliver the right information or the business fails. Educational data carries nothing like those consequences; it is soft in a way that data in the business world is not.

23rd December

How much of the big data euphoria is for what is really old wine in new bottles? Another section of the CFP is also indicative of the old wine stuff:

Forms of analysis in education – statewide and global systems – are increasingly governed by the huge size of data sets in the order of exabytes (EB, 10^18: 1 EB = 1000000000000000000 B = 10^18 bytes = 1000 petabytes = 1 billion gigabytes) that present problems of data capture, storage, analysis and presentation. Data sets have grown in size because information is collected by ubiquitous information-sensing mobile devices, aerial sensory technologies and global digital systems. Serious questions emerge concerning who should own and have access to these big data initiatives, Another issue concerns the fact that we know little about ‘underlying empirical micro-processes that lead to the emergence of the[se] typical network characteristics of Big Data’.[1] Some analysts are suggesting that big data in online learning will provide the predictive tools they need to improve learning outcomes for personalized learning: ‘By designing a curriculum that collects data at every step of the student learning process, universities can address student needs with customized modules, assignments, feedback and learning trees in the curriculum that will promote better and richer learning.’[2]
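The unit conversions in the CFP's parenthesis can be spelled out in a small sketch. The decimal (SI) prefixes are what the CFP's own figures imply; the dictionary and function names are mine, for illustration only.

```python
# Decimal (SI) byte units, as implied by the CFP's own figures.
SI_BYTES = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}

def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal byte units."""
    return amount * SI_BYTES[from_unit] / SI_BYTES[to_unit]

print(convert(1, "EB", "PB"))  # petabytes in one exabyte
print(convert(1, "EB", "GB"))  # gigabytes in one exabyte
```

On these definitions one exabyte comes out as 1000 petabytes and one billion gigabytes, which matches the CFP's statement.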

The dream of machine reason has been hovering over education for a long time. Really stupid. All machines can assess is what machines are good at. It makes little sense to teach kids what machines are good at now or soon will be2.

These logics maintain the classical separation between human and machine. It takes us nowhere. Learning analytics3 is being sold as the new silver bullet. Another touch of solutionism4. So there is an option here to run the standard crit. of digital wishful naming/thinking over this stuff or to do the more interesting poke into the politics of algorithms.

Having a look at the Coursera course run by Ryan Baker from Columbia. He points to the PSLC datashop5. The early rationale for collecting all of this data appears to be because we can. What are claimed to be derived from clicks and time on page are characteristics of students like boredom and frustration! It seems to be more like automated telephone systems on steroids6.

So, I am thinking that the crit of this stuff is largely like talking to a wall. It is likely to be more interesting and useful to explore the algorithm side of this which will inevitably get to the banal data that is big data in education.

The other quote in the CFP that prompts an amused reaction, for me at least, is Andreas Schleicher's claim:

Big data is the foundation on which education can reinvent its business model and build the coalition of governments, businesses, and social entrepreneurs that can bring together the evidence, innovation and resources to make lifelong learning a reality for all. So the next educational superpower might be the one that can combine the hierarchy of institutions with the power of collaborative information flows and social networks.

The logic behind this assertion is that the current set of data reporting generates controversy and adversarial responses. The solution, you guessed it, BD7!

When I first came across posts like this, I dismissed them as just more of the same digital hype. Someone buying something online and a student answering a question online are not the same thing. It's not that I think better feedback and so on is unimportant, but equating a purchasing decision with what a student ticks as the answer to a multiple choice question is plain silly.

Potentially more interesting lines of enquiry are in relation to new ways of working in the sciences8. Some of the interesting responses from education sources9 are a good deal more measured than the nonsense from bureaucrats who want to manage bigger data sets.

Random quote

Bill Buxton, Principal researcher, Microsoft, Toronto, Canada
I subscribe to Melvin Kranzberg’s second law of technology: invention is the mother of necessity. Although technologies are created to fulfil needs, each also creates them; the next generation of technologies will deliver the promises of what we already have. The history of communication technologies over the past century tells me that anything that’s going to impact on the next ten years is going to be ten years old already. (The components that made Google possible ten years ago were already there ten years earlier, with the creation of the web.)

21st December

The invitation to think about a paper for Policy Futures in Education came from Radhika Gorur on the 19th. I went to the journal site to get a sense of what they publish. Reassured to see papers written about conversations after a few bottles of good red wine. It looks like education inching towards the stuff I am most interested in and concerned about so maybe a good place to pop other scribbles10. But it is one of those luxury journals. The other good 'find' was an issue dedicated to public pedagogy11. Not so much in the way I am fumbling with but still potentially useful grist.

The other stuff to collect is from the CFP. Then I'll sift through my DevonThink collection of big data stuff. It's funny but the CFP looks a bit like a homework assignment written by students asked to go see what implications there are for big data and education:

This special issue of Policy Futures in Education will investigate big data in education and learning analytics. Possible topics include:
- Big data and education policy
- Big data and the implications for education research
- Big data and edu-business
- Big data and schooling in democracies
- Big data and knowledge production
- Big data and school systems
- Big data and the purposes of schooling.

Big Data is a dodgy enough label as is. It will be interesting to see how this plays out. It is easy to cite, as the CFP helpfully does, how big some of the data sets are. How do just plain folks relate to numbers of this magnitude? It's a bit like contemplating the number of stars in our galaxy. How big is big? Education is not new to this turf. Most folk are familiar with the data gathering of international studies like PISA12.

100GB is enough to store at least the basic demographic information—age, sex, income, ethnicity, language, religion, housing status, and location, packed in a 128-bit record— for every living human being on the planet. This would create a table of 6.75 billion rows and maybe 10 columns. Should that still be considered “big data?” It depends, of course, on what you’re trying to do with it13.
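A quick check of the arithmetic in that quote, as a sketch. The 128-bit record and the 6.75 billion rows come from the quote; reading "GB" as either decimal gigabytes (10^9 bytes) or binary gibibytes (2^30 bytes) is my assumption, and the function name is only for illustration.

```python
def table_size_bytes(rows: int, record_bits: int) -> int:
    """Total size in bytes of a table of fixed-width records."""
    return rows * record_bits // 8

ROWS = 6_750_000_000   # rows, from the quote
RECORD_BITS = 128      # record width, from the quote

size = table_size_bytes(ROWS, RECORD_BITS)
print(f"{size / 10**9:.1f} GB (decimal, assumed reading)")
print(f"{size / 2**30:.1f} GiB (binary, assumed reading)")
```

The total is 108,000,000,000 bytes: 108 GB in decimal units, or roughly 100.6 GiB in binary units, so the quoted 100GB is in the right neighbourhood if it is read in binary units.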

Just gathering bits and pieces. R reckons that a paper by Savage & Burrows14 triggered the CFP. Have to sift my collection of papers/files in, on and around 'big data'. Mike Savage has been doing some useful stuff around methods and the rise of 'the digital'15.

A few initial questions/thoughts.

The issue/problem is being framed as large sets of data. To me, there are a couple of issues here. The first is the methods of assembling such data sets. Some, particularly those in education, are collected manually, while many of the other, more sophisticated collections derive from harvesting things like purchases from credit cards, click-throughs etc. Then there is the meshing of large sets. Probably more here. What to me is really more important is the role of machines/code in the manipulation of data and in its collection, refinement and representation. So much of this work has been given over to machines. You could imagine some time in the future where all you needed was the set of algorithms, access to the data and a mouse. Black boxes R us.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License