Provenance information enables datasets to be linked to the software and analysis code that created them and used them in research. It allows users to trace new and ongoing uses of data and provides rich information about the origins of data, which ultimately supports reproducible research workflows. Prov-a-thon is a two-day workshop designed to advance practical approaches to incorporating provenance information into tools and workflows that are useful in earth, environmental, and archaeological research domains.
Lowndes, J. S. S., B. D. Best, C. Scarborough, J. C. Afflerbach, M. R. Frazier, C. C. O’Hara, N. Jiang, and B. S. Halpern. 2017. Our path to better science in less time using open data science tools. Nature Ecology & Evolution 1:0160.
Marwick, B. 2017. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24:424–450.
McPhillips, T., Song, T., Kolisnik, T., Aulenbach, S., Belhajjame, K., Bocinsky, R.K., Cao, Y., Cheney, J., Chirigati, F., Dey, S. and Freire, J., 2015. YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts. International Journal of Digital Curation, 10(1), pp.298-313.
Cao, Y., Jones, C., Cuevas-Vicenttín, V., Jones, M.B., Ludäscher, B., McPhillips, T., Missier, P., Schwalm, C., Slaughter, P., Vieglais, D. and Walker, L., 2016, June. DataONE: A Data Federation with Provenance Support. (extended preprint) In International Provenance and Annotation Workshop (pp. 230-234). Springer.
Ludäscher B, Chard K, Gaffney N, Jones M, Nabrzyski J, Stodden V, Turk M, Capturing the “Whole Tale” of Computational Research: Reproducibility in Computing Environments, Science Gateways Workshop, San Diego, 2016.
0715 - 0800 Breakfast
0800 - 1000 Welcome and Overviews (Room: Tamaya ABC)
0800 - 0825 Overview of DataONE (Bill Michener, DataONE)
0825 - 0845 Overview of Provenance (Bertram Ludäscher, UIUC)
0845 - 0945 Overview of the Status of Provenance Tools (Matt Jones, NCEAS)
Review of provenance-related tools & systems (ProvONE, Prov in R + MATLAB, YW, Prov Entry UI, Whole Tale)
Demo of the DataONE provenance display system
Demo of the Prov Entry UI
0945 - 1000 Goals of Prov-a-thon (Dave Vieglais, DataONE)
1000 - 1030 Break
1030 - 1200 Introductions and Lightning Talks (Room: Eagle AB)
1030 - 1050 Around the room introductions (Amber Budden, DataONE)
1050 - 1150 Lightning talks: Provenance and Reproducible Workflows (Kyle Bocinsky, Whole Tale)
1150 - 1200 Agenda review (Matt Jones)
1200 - 1300 Lunch
1300 - 1445 Provenance Tools I (Room: Eagle AB)
1300 - 1400 Intro to the DataONE R provenance tools (Matt)
dataone, datapack, recordr
1400 - 1445 Intro to YesWorkflow (Bertram)
1445 - 1515 Break
1515 - 1700 Provenance Tools II (Room: Eagle AB)
1515 - 1700 Intro to the Whole Tale web tool (Matt & Bertram)
0715 - 0800 Breakfast
0800 - 1000 Breakout Groups: Archaeology (Room: Eagle A), Environmental Science (Room: Eagle B)
Environmental Science (Jones)
Breakout agenda planning
Hands-on provenance metadata writing activities, troubleshooting, and usability
Identify future development directions (DataONE/YW/WT/rrtools/others?)
Discussion of barriers to reproducibility in environmental sciences
Planning for advocacy for reproducible research approaches in environmental science
Archaeology (Bocinsky)
Hands-on with WT/rrtools/opencontext/dataone — Building tales
Discussion of barriers to reproducibility in archaeology (generalizable to other disciplines; ideas below)
Lack of training in computational methods/reproducibility
Persistence of data hoarding/siloing
Data sensitivity & archaeological looting
Few “sticks” from journals/funding agencies/professional societies
Few “carrots” from journals/peers/tenure committees/funding agencies
Archaeology Goals (Bocinsky):
Tool assessment/usability feedback (YW/WT)
Identify future development directions (DataONE/YW/WT/rrtools/others?)
Create provenance records (DataONE)
Identify ways to promote reproducibility in the communities
Identify next steps/plans for further collaboration
How To Do Archaeological Science Using R book status update (Ben/Matt/Paulina/Kyle)
Intro and feedback on rrtools package (Ben)
Plan further advocacy for reproducibility in archaeology (ideas below)
SAA committees (publications/curriculum)
SAA events/forums/workshops?
Collaborations with open journals?
1000 - 1030 Break
1030 - 1200 Continued Breakout Sessions
1200 - 1300 Lunch
1300 - 1445 Continued Breakout Sessions
1445 - 1515 Break
1515 - 1700 Plenary: Reproducibility and Provenance for Science
Report back from breakout groups (10 minutes each)
Moderated Discussion (Kyle & Matt):
Evangelism and advocacy for reproducible research in general
The tool landscape supporting reproducible research
Next steps
Roadmap for DataONE and Whole Tale tool development
How to get buy-in from data contributors at the user level
How to build a community pulling towards the same reproducible research goals
What we collectively want to do next
Conclusion and establishing a report-back mechanism
0730 - 0800 Breakfast
0800 - 1100 Review, Reflections, Follow-up
1100 - 1200 Lunch (boxes)