2  What is Data?

Before we can explore what we mean by ‘data management’, we should start by discussing what we mean by archaeological ‘data’. Archaeology as a discipline spans the humanities, social sciences and physical sciences. As such, archaeologists are used to dealing with a variety of different data types.

The term data is defined in the Oxford Dictionary as:

“facts or information, especially when examined and used to find out things or to make decisions”

What we consider data in archaeological research is actually quite broadly defined. In a basic sense, data is that which can be incorporated into different research outputs. In reality, any and all information that we create and use to examine things as part of our research could, and should, be considered data.

2.1 For the Humanities

In the humanities there is sometimes confusion among researchers about what they would or would not consider ‘data’, particularly between sub-disciplines. Where data is known and defined - in a scholarly sense - it tends to be concentrated in the humanities GLAM sector (Galleries, Libraries, Archives and Museums).

For the article listed below, published in the Journal of Documentation, a group of Italian researchers interviewed 19 researchers from the Department of Classical Philology and Italian Studies at the University of Bologna.

Gualandi, B., Pareschi, L. and Peroni, S. (2023), “What do we mean by “data”? A proposed classification of data types in the arts and humanities”, Journal of Documentation, Vol. 79 No. 7, pp. 51-71. DOI: https://doi.org/10.1108/JD-07-2022-0146

The purpose of the research was to shed light on the definition of the word “data” in the humanities domain. The graphic below shows the groupings defined in the article, 13 different types in all, and provides a good assemblage of the different types of data that we create and collect in the humanities.

A picture of a word cloud denoting different types of humanities data

Some of these data types are relatively obvious and applicable to all researchers, such as publications, primary sources, documentation and standards and guidance.

Some others are more digitally based, such as catalogues/databases, websites, software, digital representations (e.g. photos, 3D models) and digital infrastructures (e.g. a 3D model viewer or mobile app).

Other data types are perhaps less obvious, such as events (e.g. conferences, exhibitions, webinars, guided tours or training sessions), born-digital artefacts (which differ by discipline - more below) and corpora, which are searchable databases of language samples used specifically for linguistic research.

2.2 For Archaeology

Unfortunately, similar research has not yet been undertaken for archaeology as a discipline. However, below is an attempt at a similar mapping exercise for archaeology, using research on archaeological data collection published in Advances in Archaeological Practice (referenced below), and input crowdsourced from participants in the in-person version of this data management training course.

Heilen, M., & Manney, S. (2023). Refining Archaeological Data Collection and Management. Advances in Archaeological Practice, 11(1), 1-10. DOI: https://doi.org/10.1017/aap.2022.41

A picture of a word cloud denoting different types of archaeology data

Archaeological data types

Some of the data types above are ones that we also see in the humanities more generally, including Primary Sources, Publications and Grey literature reports.

Other types are more specific to archaeological investigation, including Stratigraphic data, Excavation records and Geophysical survey, while others relate more to historical periods, such as Architecture and Oral histories.

Interestingly, there are a number of data types that relate specifically to digital sources, including 3D models, Geospatial data and Remote sensing.

What we are finding is that, over the last ten years, archaeological data is increasingly born digital, by which I mean that the digital records are the only records we create as part of archaeological research, with no supporting physical medium such as paper records.

The purpose of this exercise is to highlight the importance of proper data management, but also to broaden our understanding of what constitutes data and what data may be useful for ourselves and others. Our research data is not just the outputs or end products of our research projects, but also includes much of the data that is produced along the way.

2.3 The Research Data Life Cycle

A graphic denoting the different stages of the research data life cycle

The Research Life Cycle: The Turing Way Community, & Scriberia. (2023). Illustrations from The Turing Way: Shared under CC-BY 4.0 for reuse. Zenodo. DOI: https://doi.org/10.5281/zenodo.7587336

Thinking about how we document or manage all of these different types of data can be quite overwhelming for many researchers. A useful way to visualise the flow of data within a research project is to consider how it forms part of what is called the ‘research data life cycle’.

As shown in the image above, the research life cycle reflects the realisation that most research is not a discrete, linear project, despite the way we design our projects, but part of a larger, long-term cycle. Over our research careers one project feeds into the next, building upon past knowledge and, vitally, our research data. Let’s look at the stages in more detail.

2.3.1 The Early Stages

Most researchers will be familiar with the early stages of the research life cycle, including data collection, data processing, and data study and analysis. Already in these early stages we can envisage how data is being transformed from raw data, to processed data, to (probably) some sort of table or visualisation. Equally, the next step, data publishing and access, is likely also familiar - the equivalent of writing the paper or book from your research and getting it published.
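To make the transformation from raw data, to processed data, to a summary table a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the finds records, the column names (`context`, `material`, `count`) and the function names `process` and `summarise` are hypothetical and do not come from any real project or tool.

```python
from collections import Counter

# Hypothetical raw finds records, as they might be typed up from context sheets.
raw_records = [
    {"context": "101", "material": " Pottery ", "count": "12"},
    {"context": "101", "material": "bone", "count": "3"},
    {"context": "102", "material": "pottery", "count": "7"},
    {"context": "102", "material": "Flint", "count": ""},  # count not recorded
]


def process(records):
    """Clean raw records: tidy the text, convert counts to integers,
    and keep missing counts explicit as None rather than guessing."""
    processed = []
    for rec in records:
        processed.append({
            "context": rec["context"].strip(),
            "material": rec["material"].strip().lower(),
            "count": int(rec["count"]) if rec["count"].strip() else None,
        })
    return processed


def summarise(records):
    """Analyse processed records: total finds per material,
    skipping records where the count was not recorded."""
    totals = Counter()
    for rec in records:
        if rec["count"] is not None:
            totals[rec["material"]] += rec["count"]
    return dict(totals)


processed_records = process(raw_records)       # processed data
summary_table = summarise(processed_records)   # table for analysis or publication
```

Note that each stage produces its own dataset (`raw_records`, `processed_records`, `summary_table`), and each of these could be considered research data worth documenting, not only the final table.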

2.3.2 The Later Stages

The next steps are generally less well understood but vitally important: data preservation and data reuse. Following completion of your research, these stages ensure the long-term preservation of your datasets, or at least a subset of them (more later). Data reuse is an attempt to increase the impact of your research data, possibly by transforming your data into an accessible format and by making it as reusable as possible, both for others and for your own future research.

These steps are important because, by organising and managing your data for the preservation and reuse stages, you can make best use of this data to drive and direct the next stages of your research: research ideas, and research data planning and design.
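As one possible illustration of transforming data into an accessible, reusable format, the sketch below writes a small dataset as plain CSV text and pairs it with a simple data dictionary describing each column. This is only an assumption about what a reusable format might look like: the records, the column descriptions and the function name `to_csv_text` are all hypothetical, and CSV is just one of several open formats you might choose.

```python
import csv
import io
import json

# Hypothetical processed dataset, ready to be shared.
records = [
    {"context": "101", "material": "pottery", "count": 12},
    {"context": "102", "material": "flint", "count": None},  # count not recorded
]

# Hypothetical data dictionary: documents what each column means,
# so that others (and your future self) can reuse the data.
data_dictionary = {
    "context": "Excavation context number",
    "material": "Material category, lower case",
    "count": "Number of finds; empty where not recorded",
}


def to_csv_text(rows, columns):
    """Return the rows as CSV text, writing missing values as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: "" if row[col] is None else row[col] for col in columns})
    return buffer.getvalue()


csv_text = to_csv_text(records, list(data_dictionary))
dictionary_text = json.dumps(data_dictionary, indent=2)
```

In practice the two pieces of text would be saved side by side, for example as a `.csv` file and an accompanying documentation file, so that the data and its description travel together.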

The video below by the UK Data Service provides a useful overview of the research data life cycle, including the actions associated with each of the stages listed above.