Data analysis

Data analysis is one of the stages of research where the supervisory team can make a significant contribution to assisting the research student. Obviously, the student still needs to do the work for themselves, but this is a stage in the PhD process where the depth and breadth of the supervisors' experience should shine through, helping the student to make sense of a complex set of tasks and making those tasks a bit simpler to complete. Having collected a mass of data, and perhaps even coded and categorised it according to exacting and laborious protocols and methods of analysis, the student needs to understand what this data is actually saying. This will be a simpler task for some projects than for others, depending on the amount of data collected, the form in which it was collected, how detailed or exact the observations or calculations are, and what methods for codifying or interpreting the raw data have been employed in the research methodology.

To the beginner, this might seem straightforward, but there is no “one way” to analyse data, because there are many, many different forms of data. Data might be collected at different levels of granularity and accuracy, and embody different assumptions and methodological approaches. At some early stage of the analysis it is usually a good idea for the research student to sit down with the supervisory team, spread the collected data out on a table, and look at it together. The student needs to identify a number of key attributes of this data. What does the data actually indicate? How robust is it? What is its accuracy and what are its limitations? Are there any correlations (positive or negative)? And so on, leading to the penultimate question: what does this data actually tell us? Hopefully there will be a new insight, or a discovery, which will at least in part address the initial research question(s). The final question is then: how should I present this information so that it is understandable to others who have not been involved in this research, and how transferable is this knowledge to other (present and future) researchers in this general subject area? The ability to repeat, re-combine, and re-use data (and the results of its analysis) is particularly useful because it enables contrast and comparison with other projects in similar or related subject areas.

It may seem a bit ironic, or perhaps a bit of a paradox, given that we are often seeking to reveal hitherto unknown facts and “answers” to high-level research questions, but it is usually better to be less ambitious in interpreting the results, and absolutely sure of the reliability and credibility of the data, than to propose over-ambitious conclusions (exciting though they may seem) which are based on sketchy evidence and on correlations which are skating on thin ice. If the research can be shown to produce solid, incontrovertible evidence, regardless of how large or small the breakthrough, then this advance can be built upon by subsequent researchers. If the conclusions are like a house built on thin ice, then there is always a doubt about the credibility of the data or the results, and the value of the research output is diminished. With data analysis, it always pays to check, cross-check, consider the constraints and limitations, then double-check each stage before you venture to draw conclusions to share with a wider audience.
