Storing and archiving data


When I was doing my own PhD, I had a filing cabinet with three or four drawers, and even then I had hundreds of photocopies of academic papers stacked in small piles according to theme and relevance to the section that I was writing about next. My raw research data, however, was compactly contained in electronic format in the form of tables and graphs; row after row of numbers on spreadsheets which could be tabulated and correlated in any format that I desired. When I left the department, the files were archived for a few years, and then I suspect they were all dumped when the department moved to another building on another campus.

Now, when I generate research data, it is almost entirely in electronic format, and it is automatically stored in several places. I have my personal space in the memory banks of the university computing system, and this space is automatically backed up overnight. I also usually back up to my own cloud space, so that I can access the data wherever and whenever I want. Usually, I also store data for individual projects on a separate memory stick or portable hard drive. The digital age means that after two or three clicks, I can be assured that copies of my data are safely held in four or five independent locations. Research students can simultaneously share data with a colleague or supervisor in a different part of the world without even leaving their own desk.

This is only the tip of the iceberg, however, because the production of digital data raises almost as many questions as it provides innovative opportunities. There needs to be an early discussion in the supervisory team, for instance, about not simply which data will be stored, but where it will be stored, for how long, and who will have access to it. This is not simply an issue of security, although security, confidentiality, and appropriate use of the data will certainly figure in the discussion. There is a growing awareness that when public money is used to fund research, there needs to be a transparent return on public interest. Initially this has meant that research results, reports, and journal articles should be made freely available to the public. This is being extended in the next Research Excellence Framework in the UK to insist that if the journal article is not already published as an open resource, it needs to be added as an open resource to the digital repository of the relevant institution. But there’s more.

The argument has been extended to include the research data generated by the public funding, so the datasets themselves are tending to become open and shared property. Whether the data is numbers, interviews, audio recordings, photographs, or other recordable results, the likelihood is that the data being gathered by a researcher today is going to be a shared resource tomorrow. It will be possible for other researchers, in subsequent years, to access your raw data, perhaps combine it with other raw data, and re-analyse, re-interpret, and publish their conclusions. It now matters a great deal more exactly who can gain access to your research data, and for what purposes. As the law currently stands, a bona fide researcher can have access to open datasets for up to ten years after they have been deposited. But here is the catch – if a researcher accesses this data after nine years, the open-access clock is automatically re-set for a further ten years. This means that data which is being collected and digitally stored just now might still be openly available long after the initial researcher has moved on from that research topic, perhaps changed institutions, changed careers, maybe even passed away. The raw data of open access digital resources may now have a lifetime longer than the career-span of many individual researchers. So think carefully about what you gather, how you organise and store it, and what your legacy of research data will be!

Recording data


Firstly, I’m aware that I have broken the first ‘rule’ of blogging, which is to keep the posts short, and keep them coming regularly, but I had a bit of a hiatus due to other interests and demands over the summer. Hopefully, I can now get back on track.

Starting to record the new data which is being gathered as part of a research project, whether a long-term study like a PhD, or a quick toe-in-the-water project, is the most crucial, but perhaps the subtlest stage of the research. If you gather too little data, the project may flounder even before it gets started; too much data, and a metaphoric mountain of results can be generated by cross-correlation and individual analysis, which can paralyse a project almost as quickly as having no data at all. Then there is the question of what is the “right” data? How will I know it when I see it? In reality, the answer is likely to differ for every individual project, just as the methods of data gathering do. The correct procedure, of course, is to recognise that recording the correct data is integrally dependent on selecting the correct research methodology, and on carefully selecting how the data will be collected, coded, and stored in the future.

One of the most impressive records of research data that I can remember is from a scientist who was studying birds of prey, and his handwriting in an old notebook recorded what seemed to me to be almost every conceivable factor which might influence nesting success, including several factors that I, personally, would never have begun to consider relevant. He was of course correct, for it is often the correlations with hidden, and often apparently spurious, information which lead to the really stunning breakthroughs in research projects. There are many different ways of recording the research data that you might collect, and there is no one-size-fits-all solution. If you are interviewing people, there is a choice between taking notes, audio recording, or video recording; all these methods have their advantages and disadvantages. Taking notes is less obtrusive, but can also be distracting for the researcher. Audio recording can be done easily with a digital recorder, or a suitable app on a smart-phone, but some people may be more guarded in their responses when they are being recorded, and there is also the problematic issue of what to do with all the data you have gathered. Gathering a huge mass of data can be attractive, but it needs to be proportionate to the scale of the project, because there is little point in generating a mountain of data if 80% is left unanalysed and unused. Great care needs to be taken to strike a balance between collecting a good data-set which provides rich possibilities for future analysis, and de-motivating your participants by presenting them with a huge questionnaire or over-long interviews. Similar constraints apply when conducting laboratory experiments, fieldwork, or desk-top studies.

Finally, in addition to having to consider your recording requirements in terms of how you propose to codify and analyse the potential results (there is little point in collecting data so randomly that it cannot be interrogated effectively), there are the issues of long-term storage and access to the data. The research supervisor has a crucial role here, not simply in helping to shape what the research student proposes to gather, or how that might be analysed and interpreted, but in providing the continuity which may extend over several decades and overlap with numerous related research student projects. In an increasingly digital and open educational society, not simply the research results, but also the raw research data, are becoming more open and accessible. It is becoming more possible and more likely that scholars coming after you will read not just your conclusions, but also your original data recording notes, so think carefully about what you collect and how you record it!

The things other people say…

Some light-hearted relief over the summer months, and still on the topic of PhD supervision, here are a few blogs that are worth dipping into:

Get a Life, PhD (just what it says!)

The Thesis Whisperer (source of lots of good advice)

Good, practical tips (from someone who has been through it)

And, last but not least, just to illustrate that there is always someone worse off than yourself, take a look at these comments which academics have had from reviewers of their article submissions. Some of them are very, very funny…

Pilot studies


Before rushing off to take the final leap into the swimming-pool of the main data gathering exercise, it is usually advisable to conduct a quick reality-check. In some form or other, a short pilot study, which samples just a small part of each data gathering method, is a useful activity at this stage. Depending on the diversity of the selected data-gathering methods to be used in the main study, it could mean asking 3 or 4 people to complete a questionnaire, or trying out the interview questions on a few “volunteers”, or perhaps conducting a trial run of a bench experiment, just to make sure that things progress in practice as smoothly as they have been envisioned in theory. Either way, a pilot study can do several things. In the first place, it allows the supervisor to observe just how much thought, care, and background research has been already conducted in the formulation of the research methodology of this study. There may be some opportunity for improving the methods, or there might simply be a reassurance that things have been well-planned… so far. Feedback at this design stage may avoid making elementary mistakes, or designing a method which will lead to incorrect or misleading results.

For the research student, the pilot study can have multiple benefits. The reassurance of the supervisor is useful, but the feedback from the pilot participants can be even more critical. This is the time when slightly ambiguous questions can be reworded, and research methods can then be tweaked to make sure that they do what it is hoped that they should do. If a participant reports that the wording of a question is difficult to understand, or that there is no relevant category of response, this suggests that other people in the larger study will encounter the same difficulties. The error created will become multiplied when the full study progresses, and may become significant. The fault in the misunderstanding lies with the researcher, not with the participants being questioned. It is up to the researcher to construct questions which are unbiased, not leading towards a particular response, and are clearly understandable by participants in the sample population. Similarly, with experimental design, if the experiment has a fault in its design, it is much better to find out at this stage through a short pilot study, than to run the experiments several hundred times before finding out that there is a problem.

Writing up a description of the pilot study is an integral part of the methodology chapter in the dissertation. If changes were made to improve the design of the main research survey, then this is a good place to note the changes and justify them; even if no changes were made, it is the place to demonstrate that the researcher has not simply woken up one morning and plucked a research design idea from thin air. Demonstrate that thought and care have been invested in this. Even the experience of codifying and analysing a few results from the pilot study might give the researcher (and the supervisor) a good sense of the ease (or difficulty) which the final main data-set will present, and allow for a simplification or clarification as appropriate. It is a huge mistake to seek a “short-cut” by avoiding pilot studies!

Getting research ethics approval


Teaching research ethics is almost impossible. Teaching someone about ethics is a different matter, but unless a person actually understands why ethical standards are essential, then everything else is fruitless. It is relatively straightforward to present examples of good ethical practice (and what happens when this practice is ignored) but this simply underpins the implementation of the ethical standards, not the need for them. Fortunately, there are lots of detailed guidelines and professional codes describing the expectations of ethical behaviour, many of them readily available on the web. I say “lots” because the ethical standards vary widely in content and detail, dependent on the subject discipline, the research methods employed, the level of study, and several other factors. This might sound vague, but think about it. There will be a different level of scrutiny required if a researcher seeks access to the confidential medical files of patients, rather than simply asking patients to respond to a few verbal questions. There will be different standards again if the researcher plans to work with animals, or children, or vulnerable adults with diminished responsibility. There is also an ethical code for internet-mediated research, although this is new, variable, and highly contextual, so it is an evolving set of guidelines. Despite these differences, the purpose of research ethics is the same in each case – namely to prevent causing harm to the participants, to preserve their dignity (for example their right to anonymity) and to enable them to withdraw from the study without any undue pressure or penalty.

For these reasons, there is a crucial stage between deciding on what research methods are to be adopted for a study, and the commencement of data collection. This crucial stage is where the researcher submits the details of the design, methodology, and any issues relating to the collection and storage of data, for approval by the university ethics committee. Only after ethical clearance has been approved can the student begin to collect data. Failure to obtain approval before data is collected may result in the university deciding that this data is not admissible for inclusion in the study. If there have been any severe breaches of ethical responsibility, the study may be terminated or the student de-registered. For this reason, the ethical approval of a student research project is a gate-keeper stage of every study.

Fortunately, most research projects have fairly straightforward ethical requirements which are easily satisfied in full. A lot of the ethical safeguards might be regarded as “simply common sense” (and so they are) but you might be surprised by the number of times people say “Oh, there are no ethical issues with my research!” This is almost certainly wrong. Even the issue of whether the researcher with half-formed ideas should be “wasting” the time of an interviewee who almost certainly has something better, perhaps crucial, to do, is an ethical issue. For these reasons, seeking ethical approval for research should be a serious matter, but not something to fret unduly about, if the researcher has properly thought through the research design. Once the ethical approval has been obtained, the researcher is able to jump out of the starting blocks to engage with data collection, and this is where the real fun part starts.

What methods will help to answer the research question?


This is where it gets hard, not simply because the research student is venturing out into the unknown, but also because selecting the methods through which the research will be conducted will differ hugely between cultures, between disciplines, and between subjects within disciplines. There is no one-size-fits-all template which will allow a pick-and-choose approach to selecting the most appropriate methods. In one sense, this is an easy step, because it will probably be pretty obvious from the outset what methods will be needed in order to answer the research question(s). Almost all academic research methods will involve reading, either to follow-up on what has already been said about the topic or to put it into a wider context. After that, the methods might include interviews, experiments, observations, questionnaires, focus groups, and a host of other activities which will change in emphasis from discipline to discipline. Getting the “correct” mixture of these methods is what will determine the methodology, that is, the system of methods for further research.

Here is where high technology can come in. I say “high” technology because even using a pen-and-paper or driving a car to conduct an interview is using technology, but of course we generally mean computer-based technology. In educational circles you will frequently hear the assertion that “the technology should never lead!”. This is certainly true, to an extent, but not entirely. For instance, if there are two (or more) ways to record research data, and one way entails using a high-technology solution which makes it easier, more flexible and/or more secure, then surely most sensible people would vote for the use of the technology. Examples might include the use of RefME to compile the dissertation reference list and store it on the cloud; using Mendeley to store the articles online; the use of SurveyMonkey to conduct a questionnaire online rather than face-to-face, giving time-flexibility, wider geographic coverage, and the ability to utilise automatic data analysis and presentation tools; the use of a free voice-recorder smartphone app to record interviews… The list could go on and on.

A crucial factor in all of this is to consider carefully – right at the start – how these methods will allow you to analyse and hopefully make sense of the data which will be gathered. It makes little sense jumping off a high-point without knowing, even approximately, where you might land. Similarly, it makes little sense to gather mountains of data without any idea how to begin to make sense of it. The supervisor should be able to give some clear directions, but ultimately each situation, each carefully worded question, is slightly different, and will have different constraints on time, resources, and abilities, so the student will need to be fully comfortable with the methodology before even starting the research. Prior studies in a similar area can help to provide some direction, but the precise mixture needs to be decided for each individual research project.

Writing the methodology


When starting a PhD, there is often a great mystique surrounding the selection (and writing-up) of the proposed methodology. It is important to remember that the term “methodology” means more than simply describing the methods that are intended to be used for the collection of research data; it is the constructed system of methods proposed, and how they interact. Importantly, in order to understand the data which might be generated by the research, it is critical to first understand the rules which govern the various research methods selected, their strengths and their limitations. The selection of a variety of methods will enable the researcher to gather different types of data, and to look at the research area from complementary angles. As always, it is the role of the supervisor to help the research student put together the best methodology for the research project, that is to say, the best combination of methods through which the student proposes to gather new data on the topic. In most circumstances the supervisor will already have an established preference for one or more methods. It might be necessary to include a second, or third, supervisor who has expertise in a complementary or different set of methods, particularly for multi-disciplinary research.

There are many ways of gathering research data, but broadly they can be divided into three major methodological approaches; these are quantitative, qualitative, and mixed methods. I do not propose to go into much more detail here – there are whole volumes written on even the specific sub-categories of these approaches – but briefly, quantitative research explores through the measurement of phenomena, while a qualitative researcher looks for the emergence of themes or patterns in the evidence provided. A “mixed-methods” approach is not simply a randomly constructed “a-bit-of-one-and-a-bit-of-the-other” style, but it does use both qualitative and quantitative analysis to provide complementary perspectives on the same research topic.

The reason that so much early attention is given to establishing the methodology of the proposed research project is partly because the confirmation of the methodology will determine how the researcher looks at the world emerging through the data, and partly because it will condition the forms of analysis, the reliability, and the compatibility of the research data produced. Any fool can go out and collect data, but getting hold of the type of data which will allow reasonably reliable conclusions to be established is a different matter. In some cases, the choice will be easy. There may be a very limited number of tried-and-tested ways in which an experiment can be constructed, or there might be a very similar study already published, the replication of which in the new subject area might facilitate a useful extension and comparison of knowledge. The supervisor may even have pioneered a particular combination of methods over a long research career and therefore be in a position to give the research student advice on very practical issues, as well as the theory. The literature review is, of course, one element of the methods of research, and the published academic records will likely reveal a quite precise range of options to follow. In any event, it is worth thinking hard right at this stage, in order to avoid false starts and perhaps false data later on.

e-learning, networking, and the UHI
