Making Data on Microscopic Archaeological Plant Remains More Findable, Accessible, Interoperable and Reusable
Working to create a global open phytoliths community to improve data sharing.
What are phytoliths?
Phytoliths are microscopic plant remains made of inorganic silica that are formed within living plant cells through the uptake of groundwater. These remains are incredibly robust and therefore persist in sediments for hundreds of thousands and even millions of years. Phytoliths are a useful environmental proxy when preservation of organic plant remains on archaeological sites is poor.
Better stewardship of data will lead to better research
Archaeological data is a rich resource but we must consider how to best preserve it for future research. Our datasets have the potential to be reused multiple times, which makes their collection much more cost effective. However, data can only be reused if it is carefully documented and archived so future researchers can fully understand it.
Our project ‘Increasing the FAIRness of phytolith data’, commonly known as the ‘FAIR Phytoliths Project’, is a collaboration led by Historic England and University of Pompeu Fabra in Barcelona, and also includes members from the Spanish National Research Council and Texas A&M University. The project is a case study of how a particular discipline of scientific archaeology might improve the management, publishing and archiving of data.
Our aim is to increase use of the ‘FAIR’ data principles (Wilkinson et al. 2016) in phytolith research.
‘FAIR’ stands for findable, accessible, interoperable and reusable, and these principles constitute a set of data stewardship guidelines to increase the sustainability of research data.
Essentially these principles help researchers to publish and archive data to a higher quality in terms of transparency and accessibility. Data can then be fully validated by reviewers when submitted for publication and it also allows other researchers the opportunity to fully understand the data, increasing its reuse potential for future research.
We have four main goals for the ‘FAIR Phytoliths Project’:
- To find out more about current data sharing and opinions of open science practices in our community by conducting a survey
- To complete a ‘FAIR’ assessment of existing phytolith data from two regions: Europe and South America
- To offer training in ‘FAIR’ data and open science tools to the phytolith community and related disciplines
- To draw up ‘FAIR’ data guidelines for existing and future phytolith data. Other communities can use them to think about how they could do this for their own disciplines.
Achieving these goals will start to improve the quality of phytolith data sharing, which in turn will lead to an increase in the quality of research in this discipline. Creating ‘FAIR’ data is one of the practices encompassed in what is termed Open Science (also known as Open Research), which aims to open up research so that it is more reproducible, transparent, reusable, collaborative, accountable, and accessible, both to other researchers and to wider society. Embracing open science practices, and therefore improving data sharing and the sharing of research more broadly, enables greater confidence in the applications of phytolith research in archaeological and palaeoecological studies.
Why is an assessment of data sharing important in this discipline?
In the last two decades, phytolith research has evolved quickly and is used in a wide range of scientific disciplines including archaeobotany, palaeobotany, geology and plant physiology. Phytolith analysis can be used to answer research questions about past environmental and landscape changes as well as inform us about past dietary and agricultural regimes.
With the expansion of this discipline has come an increase in publications and in the data associated with them. Methods and techniques have been developed in different research groups and this has led to considerable differences in how data is produced, analysed and published. In a study of phytolith research publications conducted in 2020 (Karoune 2022), I found a clear lack of data sharing (only 53% shared data in any format), and data reusability was very low, at just 4%. In the ‘FAIR Phytoliths Project’ we are looking in more detail at how phytolith data is published and assessing it against the ‘FAIR’ principles. This will let us understand how to approach phytolith data sharing to make it more sustainable.
Working with multiple organisations
Because multiple organisations are involved in this project, we adopted a fully open approach to working together. We initially trained all of our core team members in using tools such as GitHub, a collaboration platform built on the open-source Git version-control system that can be used as a digital workspace for developing data science projects. This approach has allowed us to work collaboratively on the different project work packages from different locations and time zones.
Our project is funded by 'European Open Science Cloud Life', which is the life sciences hub for the European Open Science Cloud. European Open Science Cloud Life is building a digital space for life science research and we are one of eleven projects across Europe demonstrating how the ‘FAIR’ data principles can be implemented in specific research domains.
Working with European Open Science Cloud Life has enabled us to draw on experts to tackle specific challenges in our work. One of these challenges is the interoperability of phytolith data (interoperability is the ability to connect and merge datasets). In phytolith research the sticking point is the way that phytoliths are named by researchers. To be able to reuse data, we need to understand what the name given to each type of phytolith means. One way to approach this is to develop a standardised vocabulary (also called a nomenclature) for these remains. We have two standardised nomenclatures for phytoliths (Madella et al. 2005 and Neumann et al. 2019); however, many phytolith researchers do not use them, as they prefer their own systems of naming phytoliths that are more specific to their regional floras.
We found in our project’s ‘FAIR’ assessment that, of the 100 published articles we examined, only 27 fully used one of the standardised nomenclatures. This means that the majority of phytolith datasets in our study are difficult to fully understand and therefore not interoperable.
We have therefore been looking at ways to improve interoperability of phytolith data and this has led us to experts from the European Molecular Biology Laboratory-European Bioinformatics Institute, which specialises in standardised vocabularies and ontologies. With their help, we have started to develop an ontology (a classification system) for phytoliths that does not force researchers to change their habits of naming phytoliths but draws connections between different naming systems so we can understand the relationship between different names.
Once completed, the ontology will allow phytolith researchers to take another researcher's dataset and transform the names to match their own naming conventions, making data sharing and reuse more feasible.
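To illustrate the idea of connecting naming systems rather than replacing them, here is a minimal Python sketch. It is not the project's ontology or code: the scheme names, the phytolith labels, the concept identifiers and the function names are all hypothetical, chosen only to show how two researchers' labels could be linked through a shared concept so that counts recorded under one naming system can be read in another.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical example data: each naming scheme maps its own labels to a
# shared concept identifier. None of these names or IDs come from the
# project's ontology; they are placeholders for illustration only.
SCHEME_TO_CONCEPT: Dict[str, Dict[str, str]] = {
    "scheme_a": {"bilobate": "CONCEPT:0001", "rondel": "CONCEPT:0002"},
    "scheme_b": {"dumbbell": "CONCEPT:0001", "short conical": "CONCEPT:0002"},
}


def translate(name: str, source: str, target: str) -> Optional[str]:
    """Return the target scheme's label for `name`, or None if unmapped."""
    concept = SCHEME_TO_CONCEPT[source].get(name.strip().lower())
    if concept is None:
        return None
    # Look for a label in the target scheme that points at the same concept.
    for label, concept_id in SCHEME_TO_CONCEPT[target].items():
        if concept_id == concept:
            return label
    return None


def translate_counts(
    counts: Dict[str, int], source: str, target: str
) -> Tuple[Dict[str, int], List[str]]:
    """Relabel a table of phytolith counts and report any unmapped names."""
    translated: Dict[str, int] = {}
    unmapped: List[str] = []
    for name, count in counts.items():
        new_name = translate(name, source, target)
        if new_name is None:
            # Keep track of names with no known equivalent instead of guessing.
            unmapped.append(name)
        else:
            translated[new_name] = translated.get(new_name, 0) + count
    return translated, unmapped


if __name__ == "__main__":
    sample = {"dumbbell": 12, "short conical": 5, "unknown type": 3}
    print(translate_counts(sample, "scheme_b", "scheme_a"))
```

In this sketch, names with no known equivalent are reported separately rather than dropped or forced into a category, so the person reusing the data can see exactly what could not be connected.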
Global community development
Another important aspect of our project has been to develop a community of researchers interested in this approach: the Open Phytoliths Community. For our project to have an impact, it is important to involve our whole community from the start, consulting with them to co-create our ‘FAIR’ guidelines and providing training in open research skills.
Training is key to changing research culture to be more open.
Practising open research requires the development of new skills such as sustainable data management, open publishing, and computational skills. Through training, our community can learn of the benefits of open research and hopefully then want to start using these practices in their own work.
We therefore created the International Committee on Open Phytolith Science, which is a standing committee of the International Phytolith Society. We now have 12 members covering 5 continents - North America, South America, Asia, Africa, and Europe.
We are working together on initiatives such as the ‘FAIR Phytoliths Project’, the phytolith ontology and an open publishing guide. We are also currently running a series of workshops on Open Research Skills, which is funded by another European Open Science Cloud Life grant. This series provides workshops in multiple languages to be fully inclusive and accessible to our whole community.
The work of the ‘FAIR Phytoliths Project’ and our wider international committee is having an impact on the wider archaeological community. We are working with projects such as the ‘Rewilding Later Prehistory Project’ (https://rewilding.oxfordarchaeology.com/) to contribute to workshops, and we are providing training that is open to researchers in other disciplines. We see our project as a case study for other related disciplines to consider how to implement the ‘FAIR’ data principles, and the results of this project can be considered more widely as part of Historic England’s new digital strategy.
Acknowledgements
The author would like to thank the members of the ‘FAIR Phytoliths Project’ team (Carla Lancelotti, Celine Kerfant, Juan José García-Granero, Javier Ruiz-Pérez and Marco Madella) as well as the other members of the International Committee on Open Phytolith Science.
Thanks also go to Historic England’s Investigative Science Team for hosting this project, especially Gill Campbell, Matt Canti and Jen Heathcote.