Understanding Open Data
Everything you need to know about making your research data open and FAIR
Here at F1000Research, we’re big advocates for open data. We believe that sharing research data can accelerate the pace of discovery, provide credit and recognition for authors, and even improve public trust in research (but more on that later).
We know the 21st-century researcher has lots to think about – not least securing grant funding, conducting high-quality research, and maximizing impact after publication. We’re asking you to add one more thing to this list: open data. Far from being another hoop to jump through, sharing your research data can bring a whole host of benefits to every stage of your research journey.
On this page, we’ll walk you through the what, why, and how of data sharing, shining a light on how open data can help you and your research community. Keep reading for information and resources designed to answer your key questions, including:
- What is open data?
- How to make your data FAIR
- Why choose open data? What are the benefits?
- Data collection tips and tricks
- How to prepare your data when submitting to F1000Research
What is Open Data?
First things first: let's cover the basics of open data.
In a nutshell: open data is data that is available for everyone to access, use, and share.
For researchers, this refers to any information or materials that have been collected or created as part of your research project – such as survey results, gene sequences, software, code, neuro-images, even audio files. In research, open data practices are also known as ‘data sharing’.
There are some cases where data sharing is not appropriate for legal, ethical, data protection, or confidentiality reasons. We recommend researchers strive to make their data as open as possible, and as closed as necessary. In other words, restrict access to your data only where openly sharing it is not possible.
What is FAIR Data?
The FAIR Guiding Principles, published in Scientific Data in 2016, offer a framework for research data management designed to maximize the reuse of data and support open data practices.
FAIR data is Findable, Accessible, Interoperable, and Reusable. FAIR data goes beyond open data, aiming to make the data itself more useful and user-friendly, rather than simply 'open'. At F1000Research, we endorse the FAIR guidelines as part of our Open Data Policy.
Why Choose Open Data?
What are the benefits of open data, for researchers, research, and society?
Choosing open data helps others to replicate your study and validate your results. As such, open data is a fundamental requirement for reproducibility and transparency. These are two things we’re big fans of at F1000Research, because they have an impact not just on individual researchers, but on the research ecosystem as a whole, and on wider society.
When the data underlying academic research is made open, it makes it easier to question, share, replicate, validate, confirm, and build upon the evidence which underpins the results.
Benefits for Researchers
Boost the credibility of your research
Open data enables replication and validation of your research, which in turn boosts its credibility and robustness. Sharing your data openly makes your entire research project more transparent (and helps you satisfy funder requirements, to boot).
Enhance the visibility of your work
Increase the discoverability of your research by reciprocally linking your article and its related datasets. Plus, describing your data with rich, meaningful, machine-readable metadata makes it easy for humans (and computers!) to find and use.
Progress in your career
Researchers can benefit from increased credit and recognition for their outputs by sharing their research data, which in turn may lead to increased opportunities for collaboration – even across disciplines. Plus, one 2019 study suggests that open data can generate up to 25% more citations!
Develop a better understanding of your field
Open data supports learning and enables a deeper, richer understanding of the research topic – this is particularly useful in teaching, as students are able to interrogate raw research data for themselves.
Benefits for Research
Beyond benefiting individual researchers, choosing open data has wide-reaching implications for the research community as a whole. It accelerates the pace of research by reducing unnecessary experiments and enabling faster discovery. This streamlining of the research workflow reduces inefficiencies and supports reproducibility and transparency. All of this combines to increase public trust in science and support the wider research agenda.
Benefits for Society
Open data ultimately enables better real-world impact from academic research, which has many benefits for wider society – from driving innovation in technology, to better evidence-based policy-making, and even economic benefits. In addition, open data improves not just public access to and involvement in research, but also public understanding of research and the value it provides.
How to Share your Research Data
So we’ve covered what open data is, and how choosing open data can benefit you as a researcher. But how do you actually do it?
Open data can’t be an afterthought. It’s essential to know at the outset of your research project if you’ll be making your data open, so that you can plan accordingly.
Create a detailed Data Management Plan (DMP) at the start of your project and keep this updated throughout. Your DMP is a living document that will change and grow over the course of your research lifecycle.
A good DMP has benefits beyond simply supporting open data. It will help you find, organize and understand your data better throughout the research process, improve efficiency by reducing unnecessary duplication (e.g. re-collecting data), and even provide continuity in the event of staff turnover.
When it comes to data collection and analysis, there may be discipline-specific (and repository-specific!) guidelines you need to comply with to ensure your data is FAIR. Make sure you’ve done your research, and have a clear understanding of best practice in your field. If you need support with this step, reach out to your institution’s Data Steward for guidance.
Data Collection Tips & Tricks
Data collection can seem daunting, especially for early career researchers (ECRs). Check out these handy tips and tricks to help you navigate this tricky topic, with advice on:
- Ensuring reproducibility
- Collaboration for data collection
- Maximizing data reuse
Open Data on F1000Research
How to share your data in line with our Open Data Policy and Data Guidelines.
Submitting to F1000Research? (Great choice, by the way.) Before you submit your article, make sure your research data complies with our progressive Open Data Policy, and that you’ve prepared your data according to our stringent Data Guidelines.
About our Open Data Policy
All articles published on F1000Research that report original results should include a Data Availability Statement: this is a short section of text providing citations to repositories that host the data underlying your results, together with details of any software used to process results.
Failure to provide your research data openly is likely to result in your submission being rejected, although there are a few exceptions:
- Ethics and security: where data access must be restricted for ethical or security reasons
- Data protection: where human data cannot be de-identified, so data cannot be shared in order to protect patient/participant privacy
- Large data: where data is too large to be feasibly hosted by a recommended repository
- Third party data: where data has been obtained from a third party, and restrictions apply to the availability of the dataset
In all cases where the data cannot be shared openly, authors should provide detailed instructions for readers on how to apply for access to the data. These instructions should be included in the Data Availability Statement for the article.
F1000Research authors must make all their data, including extended data, openly available. Extended data are additional materials that support the key claims made in your article, but are not absolutely required to follow the study design and analysis. Examples include questionnaires, images, or tables, which some journals may refer to as 'Supplementary Materials'. If there is any code required for processing or replication, this should be included within your extended data. For submission to F1000Research, this data needs to be uploaded to an approved online repository, alongside any data underlying your results.
Creating research software?
You’ve come to the right place. We’re pretty much trailblazers when it comes to software, as one of the first publishers asking for it to be made open alongside the rest of your data back in 2015. Even today, not all publishing platforms or journals require your software to be made openly available, but we think differently. At F1000Research, we know that open software is just as important as open data when it comes to ensuring reproducibility.
So, what exactly are our requirements?
For submission to F1000Research, any novel software should be written in an open source programming language, and its source code made openly available on a recognized Version Control System (VCS) platform like GitHub. We also ask for an archived version at the time of submission, hosted in a structured repository like Zenodo. Your source code must be assigned an open license, ideally an OSI-approved license.
Include software in your Data Availability Statement under a ‘Software Availability’ heading; here, you should list the repository and license under which the software can be used.
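As a rough illustration of the information such a statement carries, the Python sketch below assembles one from a list of deposits. The field names, the wording, the DOI, and the repository URL are all hypothetical placeholders, not F1000Research's required format; the Data Guidelines remain the reference for the actual layout.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    """One repository-hosted item; every field here is an illustrative assumption."""
    repository: str
    title: str
    identifier: str  # DOI or URL (placeholder values are used below)
    license: str


def availability_statement(datasets, software=()):
    """Build a plain-text statement with an optional 'Software Availability' part."""
    lines = ["Data Availability"]
    if not datasets:
        # Hypothetical wording for articles with no associated data.
        lines.append("No data are associated with this article.")
    for d in datasets:
        lines.append(f"{d.repository}: {d.title}. {d.identifier} (license: {d.license})")
    if software:
        lines.append("Software Availability")
        for s in software:
            lines.append(f"{s.repository}: {s.title}. {s.identifier} (license: {s.license})")
    return "\n".join(lines)


statement = availability_statement(
    datasets=[Deposit("Zenodo", "Survey responses", "https://doi.org/10.xxxx/placeholder", "CC0")],
    software=[Deposit("GitHub", "analysis-scripts", "https://example.org/placeholder-repo", "MIT")],
)
print(statement)
```

Keeping the repository, identifier, and license together for each item makes it harder to forget one of them when the statement is written up by hand.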
How to Write a Data Availability Statement
Not sure exactly what to include in your Statement, or how it should be formatted?
Don't worry – we've pulled together a quick guide which walks you through every step of the process. Download the guide now to find out:
- What is a Data Availability Statement?
- What kinds of data need to be covered by the Statement?
- How to cite repository-hosted data
- When and how to reference research software
- How to reference third party data
- Examples of Data Availability Statements on F1000Research
If you have questions about how to write your Data Availability Statement, you can always get in touch with our Editorial team, who will be happy to help!
4 Steps to Open Data
We've broken our comprehensive Data Guidelines down into four simple steps.
1. Prepare your data for sharing
This step is the most time-consuming, but also the most important. Firstly, consider how to make your data as open as possible, and as closed as necessary. Are there any ethical or security issues around sharing your data? Do you need to anonymize your dataset to protect patient or participant privacy? If you’re unsure, reach out to the F1000Research Editorial team for advice.
Are there subject-specific data standards relevant to your research? If so, make sure your data meets these standards, and that you label your files according to discipline-specific best practice. If your dataset includes spreadsheets with large tables, follow our simple Do’s and Don’ts to maximize its accessibility and reusability.
Finally, ensure that details of any software required to view your datasets are included – if you’ve coded the software yourself, the code should be made openly available too.
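One common preparatory step is replacing direct identifiers before a dataset is shared. The Python sketch below is a minimal example, assuming a tabular dataset with a hypothetical participant_id column and a hypothetical name column: it swaps the ID for a keyed hash and drops the name. This is pseudonymization only. It does not by itself guarantee that participants cannot be re-identified, so check it against your ethics approval and data protection requirements.

```python
import hashlib
import hmac


def pseudonymize(rows, secret_key, id_field="participant_id", drop_fields=("name",)):
    """Return copies of rows with the ID replaced by a keyed hash and listed fields removed.

    Field names are illustrative assumptions. Keep secret_key private and
    never deposit it alongside the data.
    """
    cleaned = []
    for row in rows:
        digest = hmac.new(
            secret_key, str(row[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        new_row = {k: v for k, v in row.items() if k not in drop_fields}
        # Shortened for readability; a shorter code makes collisions more likely.
        new_row[id_field] = digest[:12]
        cleaned.append(new_row)
    return cleaned


rows = [
    {"participant_id": "P001", "name": "Example Person", "score": 7},
    {"participant_id": "P002", "name": "Another Person", "score": 4},
]
shared = pseudonymize(rows, secret_key=b"replace-with-a-private-key")
print(shared)
```

Using a keyed hash rather than a plain one means the same participant always maps to the same code, while someone without the key cannot rebuild the mapping simply by hashing a list of known IDs.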
2. Select a repository
Your datasets should be deposited in a stable and recognized open repository, under a CC0 license. Your community might have a recognized repository, and some data types (such as genetic sequences or protein structures) have specific data banks they should be deposited in. Struggling to decide which repository is right for your research? Our Data Guidelines include a comprehensive list of F1000Research-approved repositories, or download our handy guide.
3. Add a Data Availability Statement to your article
On F1000Research, all articles must include a Data Availability Statement, even when there is no data associated with the article. This statement helps your reviewers and readers find and access the data underlying your results. Not sure how? Read our guidance on how to write this.
4. Link your datasets to your article
Once your article is published, update your repository project with the DOI for your article. Linking your data and your article in this way means they are reciprocally connected, ensuring you receive credit for your work.
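The reciprocal link can be pictured as a small addition to the dataset's metadata record. The Python sketch below is illustrative only: the field names (related_identifiers, relation, identifier) are assumptions loosely modeled on how repositories commonly describe related works, and the DOI is a placeholder. Check your repository's own documentation for the fields it actually accepts.

```python
import json


def link_article(dataset_metadata, article_doi):
    """Return a copy of the metadata with the article DOI added as a related work.

    The keys used here are hypothetical, not a specific repository's schema.
    """
    updated = dict(dataset_metadata)
    related = list(updated.get("related_identifiers", []))
    entry = {"relation": "isSupplementTo", "identifier": article_doi}
    if entry not in related:  # avoid duplicate links on repeated updates
        related.append(entry)
    updated["related_identifiers"] = related
    return updated


record = {"title": "Survey responses", "license": "CC0"}
record = link_article(record, "10.xxxx/placeholder-article-doi")
print(json.dumps(record, indent=2))
```

In practice most repositories let you make this update through the record's edit form, so no code is needed; the sketch only shows what information the link consists of.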
So that's it – four simple steps to open data on F1000Research! Ready to publish? Submit your research now.