provathon-2017

Prov-a-thon Day 2, Friday 2017-09-01

Ecology Breakout

https://docs.google.com/document/d/1UND3RohNBRIwmLbZdTTfom2KOAo-dohsBOteaqOttqU

https://dataoneorg.github.io/provathon-2017/

CB: Challenge: finding the example that motivates the need for provenance metadata. It’s clear why writing clear descriptive metadata is empowering; we lack the same motivating story for provenance. MJ: Why provenance, and use cases.

JL: There would be benefit in meeting earlier with Archaeology, since there are shared questions / stories with respect to the ‘why’

CJ: Nice to have both the prose documentation and the script provenance

Josh: Would love to avoid parallel processes and instead expand Rmd

JA: When does the story start? What’s the most useful starting point?

MJ: Linking to publications. Are publications derived from data or do they contain the data?

CB: Is provenance the future of citations?

Topics

  1. Why provenance

    1. Use cases: OHI, NOAA

    2. What is useful?

    3. Who uses it?

  2. Merge ProvONE and Rmd/prose methods

  3. References to “other people’s data”

  4. Citations as provenance

  5. Tool compatibility

Why Provenance

OHI Use Case

Archaeology Breakout

[Prov-a-thon Day 1 Notes]

Agenda Ideas

SAA Board

SAA Publications Committee

Future SAA session on peer reviewing to raise awareness about lifting standards

Appeal to NSF program director (John Yellen), “the main professional organisation says we must do …”

rrtools package demo: Ben Marwick

Reproducibility round robin evaluation exercise:

How to reach archaeologists who are not us, to help them learn about this stuff:

Afternoon

Plenary Session:

Kyle B reporting

Ecology group reporting:

General Discussion:

  1. Prioritization

  2. Community Outreach

    1. Sponsor incubation of actual research outcomes (e.g. workshops)

    2. Highlight examples in the community

    3. Archaeology group brainstormed: committee work (ESA, ..). Discussed motions to propose to the boards of societies; discussed a bottom-up approach to behavioural change; workshops for training new people (not sure what that looks like); try to reproduce archaeological research, which we anticipate will be eye-opening. Junior scholars: create materials for this cohort?

    4. What has been successful thus far in the community? And what hasn’t been successful? Showcase examples.

    5. People adopt things if they are easy to adopt. Support ‘leaders’ in producing provenance metadata to facilitate the spread of knowledge.

    6. Use other forms of communication (blogs, social media, etc.) for high exposure.

    7. Ted talk on DataONE

    8. Drop the word provenance from materials and talk about reproducible research? Citable data? Workflows? Provenance is different from reproducibility, and the distinction is useful for communication. However, is the distinction important to people who don’t yet understand reproducibility?

Next Steps

  1. Workshop report out

  2. WholeTale working groups

  3. Exposing data via WholeTale blogs

  4. Recycle award - sponsored prize for someone who reuses data. Through WholeTale?

  5. Follow through on getting use cases (in a collaborative fashion?) completed

    1. Coordinate through GitHub?

  6. Submission review process for reproducible packages. Community-developed criteria.

  7. GitHub badges

  8. Increase visibility through development / improvement of R packages.

  9. Integrate provenance (YesWorkflow?) support into RStudio and Rmd

  10. R tools, data repositories, provenance interoperability

  11. Feedback on tools through various mechanisms

  12. Linking out to other objects. Introduce a ‘scoring’ system for availability of data. (Could extend the scoring to an author-level metric as a check when entering provenance data.)

  13. Develop a recommendation (with examples as rationale) for DOI use. What the DOI should resolve to.

  14. Quick follow up survey.
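Item 9 above proposes integrating provenance (possibly YesWorkflow) support into RStudio and Rmd. As a rough sketch of what that might look like, here is a small, hypothetical example of YesWorkflow-style `@begin`/`@in`/`@out` comment annotations (shown in Python for illustration; the step and file names are invented). Because the provenance is declared in ordinary comments, a tool like YesWorkflow can recover a prospective workflow graph from the script without executing it, while the script itself remains runnable as-is.

```python
# Hypothetical sketch of YesWorkflow-style annotations: provenance structure
# lives in comments, so the script runs unchanged.

# @begin clean_and_summarize
# @in raw_csv @desc hypothetical file of raw site observations
# @out summary @desc per-site mean values

# @begin load_data
# @in raw_csv
# @out records
records = [
    {"site": "A", "value": 1.0},
    {"site": "A", "value": 3.0},
    {"site": "B", "value": 2.0},
]  # stand-in for actually reading raw_csv
# @end load_data

# @begin summarize
# @in records
# @out summary
grouped = {}
for r in records:
    grouped.setdefault(r["site"], []).append(r["value"])
summary = {site: sum(vals) / len(vals) for site, vals in grouped.items()}
# @end summarize

print(summary)
# @end clean_and_summarize
```

The same annotation style works in R comments, which is what an RStudio/Rmd integration would presumably rely on.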

Actions

  1. Carl nominated Ben to establish a submission review model. Carl and Matt to support.

  2. Josh will develop a use case and communicate via Slack

  3. Peter to explore YesWorkflow as a stand-alone package that RecordR can use

  4. Ben to explore interoperability of tools. Kyle will support.

  5. Matthew is willing to test and guinea-pig things as needed.

  6. Bertram will lead the report.

  7. DataONE to explore how to link out to DOIs, recognising variation in ‘class’ of objects.