Storing and archiving data


When I was doing my own PhD, I had a filing cabinet with three or four drawers, and even then I had hundreds of photocopies of academic papers stacked in small piles according to theme and relevance to the section that I was writing next. My raw research data, however, was compactly held in electronic format as tables and graphs: row after row of numbers on spreadsheets which could be tabulated and correlated in any way that I desired. When I left the department, the files were archived for a few years, and then I suspect they were all dumped when the department moved to another building on another campus.

Now, when I generate research data, it is almost entirely in electronic format, and it is automatically stored in several places. I have my personal space on the university computing system, and this space is automatically backed up overnight. I also usually back up to my own cloud space, so that I can access the data wherever and whenever I want. Usually, I also store data for individual projects on a separate memory stick or portable hard drive. The digital age means that after two or three clicks, I can be assured that copies of my data are safely held in four or five independent locations. Research students can share data instantly with a colleague or supervisor in a different part of the world without even leaving their own desk.

This is only the tip of the iceberg, however, because the production of digital data raises almost as many questions as it provides innovative opportunities. There needs to be an early discussion in the supervisory team, for instance, about not simply which data will be stored, but where it will be stored, for how long, and who will have access to it. This is not simply an issue of security, although security, confidentiality, and appropriate use of the data will certainly figure in the discussion. There is a growing awareness that when public money is used to fund research, there needs to be a transparent return to the public. Initially this has meant that research results, reports, and journal articles should be made freely available to the public. This is being extended in the next Research Excellence Framework in the UK to insist that if the journal article is not already published as an open-access resource, it needs to be added as one to the digital repository of the relevant institution. But there’s more.

The argument has been extended to include the research data generated by the public funding, so the datasets themselves are tending to become open and shared property. Whether the data is numbers, interviews, audio recordings, photographs, or other recordable results, the data being gathered by a researcher today is probably going to be a shared resource tomorrow. It will be possible for other researchers, in subsequent years, to access your raw data, perhaps combine it with other raw data, and re-analyse, re-interpret, and publish their conclusions. It now matters a great deal more exactly who can gain access to your research data, and for what purposes. As the law currently stands, a bona fide researcher can have access to open datasets for up to ten years after they have been deposited. But here is the catch: if a researcher accesses this data after nine years, the open-access clock is automatically re-set for a further ten years. This means that data which is being collected and digitally stored just now may still be openly available long after the initial researcher has moved on from that research topic, perhaps changed institutions, changed careers, maybe even passed away. The raw data of open-access digital resources may now have a lifetime longer than the career-span of many individual researchers. So think carefully about what you gather, how you organise and store it, and what your legacy of research data will be!


