Public domain image from Unsplash
This is Lesson 2 of the DataONE Data Management learning series. This lesson covers Data Management: Data Sharing.
The topics covered in this lesson include: the role of data sharing within the lifecycle, the value of data sharing, concerns about data sharing, and methods for making data sharable.
After completing this lesson, you will be able to recognize the benefits of sharing scientific data, address concerns about sharing data, outline a process for making data sharable, and identify mechanisms for sharing data.
Data sharing should be addressed throughout the data lifecycle.
Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
Several stages require critical attention to ensure effective data sharing
Step | Action |
---|---|
Describe | document the data content, character and process |
Deposit | store the data in a location from which it can be accessed |
Preserve | select storage formats and media with long term use in mind |
Discover | publish information about the data so that others can find it |
Effective data sharing requires careful thought during each stage of the data development process, including describing, depositing, preserving, and making the data discoverable.
Data sharing requires effort, resources, and faith in others. Why do it?
For the benefit of the public, the research sponsor, the research community, and the researcher.
CC image by Jessica Lucia on Flickr
Why expend the extra effort to share data? Because it benefits the public, the research sponsor, the research community and, perhaps most importantly, the researcher.
A better informed public yields better decision making with regard to:
CC image by falonyates on Flickr
How does the public benefit from shared research? The more informed the public, the better able they are to understand and contribute to effective public and personal decisions.
Australian Bureau of Statistics, National Statistical Service (2009) A good practice guide to sharing your data with others, Vers. 1. http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002FD676?opendocument
Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
Why do research sponsors encourage data sharing? Because sponsors have an obligation to maximize the investment of research dollars.
Data sharing enhances the value of the research investment by enabling external reviewers to verify the project performance metrics and outcomes. This not only increases the credibility of the data but also spurs new research that can build upon the initial investment and advance the science rather than duplicate expenditures.
Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
Piwowar, H.A. (2011). A new task for NSF reviewers: Recognizing the value of data reuse. http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/
Access to related research enables community members to:
CC image by Lawrence Berkeley National Laboratory on Flickr
The scientific community as a whole also benefits from sharing among researchers. Data sharing allows researchers to build upon one another's work and to further, rather than duplicate, the science by exploring new findings or combining findings into meta-analyses that cannot be performed with individual datasets. In sharing data, the scientific community expands both individual perspectives and the collective comprehension.
Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009). http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf
Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007/s12021-008-9009-y
National Institutes of Health (NIH) (2003). NIH Data Sharing Policy and Implementation Guidelines.
Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing, September 2010. (http://works.bepress.com/borgman/238/)
Access to related research enables community members to (cont’d):
Access to related research enables members of the scientific community to better reproduce, compare and assess methods and results. Scientists are able to learn from one another and educate new researchers as to the most current and significant findings.
Scientists that share data gain the benefit of:
CC image by SLU Madrid Campus on Flickr
And finally, how does the individual researcher benefit from data sharing? When scientists share their data, they gain recognition as an authoritative source and respect as a wise investment for research dollars. When data are exposed, feedback from the broader community can be used to improve the quality and presentation of the data. Shared data also create more opportunities for data exchange and for networking with peers and potential collaborators.
Even if the value of data sharing is recognized, concerns remain as to the impacts of increased data exposure.
Public domain image by succo on pixabay
Researchers may worry that the data will be taken out of context, misinterpreted or used inappropriately.
Problem | Solution |
---|---|
inappropriate use due to misunderstanding of research purpose or parameters | ? |
security and confidentiality of sensitive data | ? |
lack of acknowledgement / credit | ? |
loss of advantage when competing for research dollars | ? |
Researchers may worry that the data will be taken out of context, misinterpreted or used inappropriately. They may also be concerned about maintaining the confidentiality and security of sensitive data. Business concerns may arise as well. Will data users give proper credit and acknowledgement to the scientist? Will the scientist lose a competitive advantage by sharing this valuable resource?
Each of these issues can, in great part, be addressed by providing rich data documentation known as ‘metadata’.
Problem | Solution |
---|---|
inappropriate use due to misunderstanding of research purpose or parameters | metadata |
security and confidentiality of sensitive data | metadata |
lack of acknowledgement / credit | metadata |
loss of advantage when competing for research dollars | metadata |
The metadata does NOT contain the data.
Problem | Solution |
---|---|
inappropriate use due to misunderstanding of research purpose or parameters | provide rich Abstract, Purpose, Use Constraints and Supplemental Information where needed |
security and confidentiality of sensitive data | Use Constraints specify who may access the data and how |
Consider audiences, usability, versions.
Problem | Solution |
---|---|
lack of acknowledgement / credit | specify a required data citation within the Use Constraints |
loss of advantage when competing for research dollars | create second, public version with generalized Data Processing Description |
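The metadata elements named in the tables above can be sketched as a simple record. This is only an illustration under stated assumptions, not a real metadata standard or library: the class `DatasetMetadata`, its field names, and the helpers `with_required_citation` and `public_version` are hypothetical, chosen to mirror the elements listed above (Abstract, Purpose, Use Constraints, Supplemental Information, Data Processing Description). The sketch shows how a required citation might be stated in the Use Constraints and how a second, public version might carry a generalized processing description, while the record itself holds no data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical record mirroring the metadata elements named above.

    It describes the data; it does not contain the data.
    """
    title: str
    abstract: str
    purpose: str
    use_constraints: str
    data_processing_description: str
    supplemental_information: str = ""


def with_required_citation(meta: DatasetMetadata, citation: str) -> DatasetMetadata:
    """Return a copy whose Use Constraints state the citation users must give."""
    constraints = f"{meta.use_constraints} Required citation: {citation}".strip()
    return replace(meta, use_constraints=constraints)


def public_version(meta: DatasetMetadata, generalized_processing: str) -> DatasetMetadata:
    """Return a second, public copy with a generalized processing description."""
    return replace(meta, data_processing_description=generalized_processing)


# Illustrative values only; none of this describes a real dataset.
internal = DatasetMetadata(
    title="Example stream temperature survey",
    abstract="Hourly water temperature at hypothetical monitoring sites.",
    purpose="Collected to study seasonal temperature patterns.",
    use_constraints="Not for regulatory use without contacting the author.",
    data_processing_description="Full calibration steps and model parameters.",
)

cited = with_required_citation(
    internal, "Doe, J. (2016). Example stream temperature survey."
)
public = public_version(cited, "Sensor readings were calibrated and quality checked.")

print(public.use_constraints)
print(public.data_processing_description)
```

Because the record is immutable, the detailed internal version is left unchanged when the public copy is derived from it, which matches the idea of maintaining two versions for different audiences.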
Shared data should align with the FAIR principles:
FAIR | Principle | How to check |
---|---|---|
F | Findable | Do the data have a unique ID? Are they discoverable via web search? |
A | Accessible | Can the data be accessed without logging in or paywalls? |
I | Interoperable | Can the data be used by humans and computers without special software? |
R | Re-usable | Do the data have sufficient metadata and clear reuse policies? |
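The "How to check" questions in the table above can be turned into a small self-assessment. The sketch below is an assumption-laden illustration, not an official FAIR evaluation tool: the class `SharedDataset`, its fields, and the function `fair_checklist` are hypothetical names, and each check simply restates one question from the table as a yes/no answer supplied by the data owner.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SharedDataset:
    """Hypothetical self-reported answers to the questions in the table."""
    identifier: str            # unique ID such as a DOI; empty if none assigned
    indexed_for_search: bool   # discoverable via web search
    open_access: bool          # no login or paywall needed
    open_format: bool          # usable without special software
    has_metadata: bool         # sufficient descriptive metadata
    reuse_policy: str          # license or use constraints; empty if missing


def fair_checklist(ds: SharedDataset) -> Dict[str, bool]:
    """Map each principle in the table to a pass/fail answer."""
    return {
        "Findable": bool(ds.identifier) and ds.indexed_for_search,
        "Accessible": ds.open_access,
        "Interoperable": ds.open_format,
        "Re-usable": ds.has_metadata and bool(ds.reuse_policy),
    }


def fair_gaps(ds: SharedDataset) -> List[str]:
    """List the principles that still need attention."""
    return [name for name, ok in fair_checklist(ds).items() if not ok]


# Illustrative dataset: has an ID and open access, but is stored in a
# proprietary format and has no stated reuse policy.
example = SharedDataset(
    identifier="doi:10.0000/example",  # placeholder, not a real DOI
    indexed_for_search=True,
    open_access=True,
    open_format=False,
    has_metadata=True,
    reuse_policy="",
)

print(fair_gaps(example))
```

A checklist like this only records the answers it is given; whether a dataset really meets each principle still depends on an honest review of the data and its metadata.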
Let's see how to put these principles into practice in the next slides...
The primary goal of sharing data is to support reuse, by you and others. In this light, here are "Nine simple ways to make it easier to (re)use your data" [1]
See this GREAT paper for more information about each of these important aspects of sharing data in a useful way.
In 2003, a group of scientists from the National Institutes of Health, the Food and Drug Administration, drug and medical imaging industries, universities, and nonprofit groups joined in a collaborative effort to find the biological markers that show the progression of Alzheimer’s disease in the human brain.
The goal of this project was to do research on a massive scale, sharing all of the data and making them accessible to anyone in the world with a computer.
Dr. John Trojanowski, an Alzheimer’s researcher at the University of Pennsylvania, stated, “It’s not science the way most of us have practiced it in our careers. But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be made public immediately.” http://www.nytimes.com/2010/08/13/health/research/13alzheimer.html
[Our “Data in Real Life” segment features a news report on this research. (?)]
Public domain image from Unsplash
In summary, data sharing adds value to the data. As such, it is the responsibility of the researcher to share their data. Metadata should be created for data resources to support data accountability, liability and usability. Research sponsors expect, and increasingly require, data to be shared. This concludes Lesson 2.
To assess your learning on the content presented in this lesson, proceed to the next slide and take the quiz.
White, Ethan P., Elita Baldridge, Zachary T. Brym, Kenneth J. Locey, Daniel J. McGlinn, and Sarah R. Supp. ‘Nine Simple Ways to Make It Easier to (re)use Your Data’. Ideas in Ecology and Evolution 6, no. 2 (30 August 2013). doi:10.4033/iee.v6i2.4608.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. ‘Good Enough Practices in Scientific Computing’. arXiv:1609.00037 [cs], 31 August 2016. http://arxiv.org/abs/1609.00037.
Gil, Yolanda, Cédric H. David, Ibrahim Demir, Bakinam T. Essawy, Robinson W. Fulweiler, Jonathan L. Goodall, Leif Karlstrom, et al. ‘Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance’. Earth and Space Science, 1 July 2016, 2015EA000136. doi:10.1002/2015EA000136.
Force11 FAIR Principles for data sharing
More stuff:
Participate in our GitHub repo: https://github.com/DataONEorg/Education
Suggested citation: DataONE Education Module: Data Sharing. DataONE. Retrieved November 12, 2016. From https://dataoneorg.github.io/Education/
Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.