FAIR data principles for research data
One of the main objectives of good RDM practice during the research project is to enable the long-term preservation of FAIR data objects after the research results have been published and/or the research project has ended. By preserving data for the long term, it becomes possible (1) to reproduce the findings of a study at a later stage and (2) to re-use the data for new research purposes.
The guiding principle should be that the data are as open as possible, and as closed as necessary.
Long-term preservation of the data is in itself not sufficient to turn the data into a valuable and citable research output that is on par with publications. Barely documented data can be stored for decades on the private server of a research department, gathering dust and steadily sinking into oblivion, while evolving file formats and software render them unreadable for machine and researcher alike.
Detailed documentation of your data collection processes (as part of your general data management) can help to ensure that any selection of data is clear, readable and can be made available to others. Preregistration of your methodology can also help to prevent threats to research integrity.

Cartoon by Patrick Hochstenbach under a Creative Commons CC BY-SA 4.0 license
FAIR Principles
This is where the FAIR guiding principles for scientific data management and stewardship enter the picture. The FAIR principles were initially conceived for research data (Wilkinson et al., 2016), but are also being applied to more specific types of research outputs such as software (Lamprecht et al., 2019). Data can be turned into FAIR objects, which makes them ‘exploitable’ for the broader research community in the long run. FAIR stands for Findable, Accessible, Interoperable and Re-usable.
- Findable
Ideally, data and accompanying documentation/metadata are made findable by both humans and ICT systems. Concretely, findability is typically ensured by means of ‘discovery metadata’ that are available via a data search engine such as DataCite. If you search for the data on the search engine, the associated discovery metadata, including the name(s) of the data creator(s), the subject of the data etc., appear among the search results. Commonly, the discovery metadata include a persistent identifier (e.g. DOI, handle, etc.) that directs you to the landing page where the (non-sensitive) data are available for download (the first sketch after this list shows how such metadata can be retrieved programmatically). Note that findability comes in different flavors. Data published on a personal website or a specific project website are to some extent findable, but not in any meaningful, structural way.
- Accessible
The access conditions for the data are well-defined and supported by an appropriate license (e.g. a Creative Commons license for open data). Data are published in open access when possible, but restricted/closed access is applied in the case of sensitive data (e.g. personal data). Although sensitive data are not made publicly available in open access, they can often still be re-used by other researchers, albeit via a more complex procedure that safeguards the rights of the data subjects and ensures data security. Note that the discovery metadata referring to the sensitive data can still be publicly available, even though the data themselves are not.
- Interoperable
Interoperable data are data that can be combined with other datasets by humans as well as ICT systems, and that carry no unnecessary legal obstacles (e.g. an open-access license with overly complex restrictions). Additionally, the data can easily interoperate with automated analysis workflows or other applications. It is also important that the documentation/metadata accompanying the data adhere as closely as possible to discipline-specific standards, for instance by using ‘controlled vocabularies’, and can be encoded in a standardised, structured format in order to make them machine-readable (see the second sketch after this list). Examples of generic metadata standards are Dublin Core and the DataCite Metadata Schema.
- Re-usable
The three pillars ‘findable’, ‘accessible’ and ‘interoperable’ are all necessary prerequisites to make the data eventually re-usable and interpretable by other researchers. Particularly important is the documentation/metadata that accompany the data, e.g. a codebook that explains the different variables or an explanation of how the data were collected (see the third sketch after this list). Without adequate documentation, data are generally difficult to interpret, which obviously hampers re-use. Note that, if the data are sensitive, re-use is not impossible, but it has to comply with stringent conditions stipulated in a ‘data use agreement’.
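To make the findability (and accessibility) points concrete, here is a minimal sketch of retrieving a dataset's discovery metadata from DataCite's public REST API. The DOI used is a placeholder, not a real dataset; substitute any DataCite-registered DOI.

```python
# Minimal lookup of discovery metadata via the DataCite REST API.
import requests

doi = "10.1234/example-dataset"  # hypothetical DOI, for illustration only
resp = requests.get(
    f"https://api.datacite.org/dois/{doi}",
    headers={"Accept": "application/vnd.api+json"},
    timeout=30,
)
resp.raise_for_status()
attrs = resp.json()["data"]["attributes"]

# Typical discovery metadata fields registered with a DOI:
print("creators:", [c.get("name") for c in attrs.get("creators", [])])
print("title:   ", attrs.get("titles", [{}])[0].get("title"))
print("year:    ", attrs.get("publicationYear"))
print("license: ", [r.get("rightsUri") for r in attrs.get("rightsList", [])])
print("landing: ", attrs.get("url"))  # page the persistent identifier resolves to
```

Note that the license information (`rightsList`) is part of the same record, which is how well-defined access conditions travel with the discovery metadata.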
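For interoperability, a minimal sketch of encoding metadata in a standardised, machine-readable format: a Dublin Core record serialised as XML using only Python's standard library. The field values are invented for illustration.

```python
# Emit a minimal Dublin Core metadata record as XML.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("metadata")
for element, value in [
    ("title", "Survey of commuting behaviour, 2022"),
    ("creator", "Doe, Jane"),
    ("subject", "transportation"),  # ideally a term from a controlled vocabulary
    ("date", "2022-11-15"),
    ("type", "Dataset"),
    ("identifier", "https://doi.org/10.1234/example-dataset"),  # placeholder DOI
    ("rights", "https://creativecommons.org/licenses/by/4.0/"),
]:
    ET.SubElement(record, f"{{{DC}}}{element}").text = value

print(ET.tostring(record, encoding="unicode"))
```

Because the element set and encoding are standardised, both humans and ICT systems can parse such a record without dataset-specific knowledge.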
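For re-usability, a sketch of a small machine-readable codebook accompanying a dataset. The dataset name, variables and descriptions are all hypothetical.

```python
# Write a machine-readable codebook documenting the variables in a dataset.
import json

codebook = {
    "dataset": "commuting_survey_2022.csv",  # hypothetical file name
    "variables": {
        "resp_id": {"type": "integer", "description": "Unique respondent identifier"},
        "commute_min": {"type": "integer", "unit": "minutes",
                        "description": "One-way commuting time"},
        "mode": {"type": "categorical",
                 "levels": {"1": "car", "2": "bicycle", "3": "public transport"},
                 "description": "Main mode of transport"},
    },
    "collection": "Online survey, fielded March 2022; see accompanying methods document",
}

with open("codebook.json", "w", encoding="utf-8") as f:
    json.dump(codebook, f, indent=2)
```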
Data Repositories
Researchers should hone their FAIRification skills in order to make the data that they collect or generate as FAIR as possible. However, they are not alone in this endeavour. In addition to the supporting RDM services at research institutions, there is also a pivotal role for the ‘trustworthy repository’ where the research data are ultimately archived. For example, the data repository can be well connected to the broader data ecosystem, enhancing the findability of the archived data, and can provide the infrastructure to implement certain metadata standards, improving interoperability.
- Re3data is a registry of institutional, disciplinary and interdisciplinary research data repositories worldwide.
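Re3data can also be queried programmatically. The sketch below lists registered repositories via what we understand to be re3data's public v1 API; the endpoint and XML element names are assumptions to verify against the API documentation at https://www.re3data.org/api/doc.

```python
# List repositories from the re3data registry via its public API (assumed v1 endpoint).
import xml.etree.ElementTree as ET
import requests

resp = requests.get("https://www.re3data.org/api/v1/repositories", timeout=60)
resp.raise_for_status()

root = ET.fromstring(resp.content)
for repo in root.findall("repository")[:5]:  # first five entries only
    # "id" and "name" element names are assumptions based on the v1 XML listing
    print(repo.findtext("id"), "-", repo.findtext("name"))
```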
When to think about this?
As stated before: “RDM includes all steps before, during and after the project”, which means you have to handle your research data correctly throughout the whole research data lifecycle to ensure the quality and integrity of your research. A data management plan (DMP) is the best tool to help you do just that.