The DataONE system should log various interactions and operations in the system
to provide operational status information about the entire system, to report on
specific node operations, and to inform DataONE participants (users,
contributors, administrators) about their specific domain of interest in the
system. For example, a contributor might like to monitor use of their data and
where it is being replicated to. The methods MNCore.getLogRecords()
and
CNCore.getLogRecords()
provide outward facing services for retrieving
log information from member and coordinating nodes respectively.
Logging is described in use cases 16, 17, 18, 20, and potentially 19.
UC 16 specifies “all CRUD operations on metadata and data are logged at each node”
UC 17 indicates that all CRUD operation logs should be aggregated at the CNs
UC 18 indicates that a MN can retrieve aggregated logs about content that originated from that MN.
UC 20 indicates that a data owner (original contributor, delegated owner) can retrieve aggregated logs about objects they own.
UC 19 indicates that anyone can retrieve general use information for any object in DataONE.
The performance metrics survey results from the Leadership Team specify (at least) the following metrics should be captured. Items that may be captured from the CI portion of the project are indicated by !!!.
Size and Diversity of DataONE Data, Metadata, and Investigator Toolkit Holdings
!!! Data volume – total size of data holdings in DataONE Member Nodes and Coordinating Nodes
- Recorded in:
Total (including replicas) data volume + unique object data volume
Sysmeta
!!! Number of metadata records – quantity of metadata records held at a Coordinating Node (note: the concept of a record may vary across metadata standards)
science metadata only
!!! Number of data sets held by Member Nodes – (note: may be less meaningful as an absolute value because of the immense variability in data set granularity, but probably still useful in tracking the shape of the data accumulation curve)
system metadata
!!! Number and types of software tools included in the Investigator Toolkit
!!
Mechanism to register external uses / implementations
HTTP user-agent
!!! Number of different metadata schemas supported – (note: more metadata schemas is not necessarily better)
object format from sysmeta
DataONE System Capacity
!!! Number of Member Nodes
registry
!!! Total storage capacity at Member Nodes
per member node
total
part of MN registration and capabilities
!!! Geographic coverage of Member Nodes – continents, regions, and countries covered
could potentially be part of capabilities (record physical location)
!!! Number of Coordinating Nodes
constant = 3
general expansion over five years to all countries.
!!! Total storage capacity at Coordinating Nodes
disk space
!!! Geographic coverage of Coordinating Nodes – continents, regions, and countries covered
capabilities / metadata
DataONE Usage Statistics
!!! (CN LOG) Number of web hits on DataONE portal
standard web hits from logs
aggregated across CNs
!!! (CN LOG) Number of DataONE users – (note: recording of individual IP addresses may be most readily implemented; requiring users to login to Member Nodes is not presently required)
Any users of MNs = D1 user
Eventually can use authn subsystem
!!! (supporting web site logs) Number of downloads of tools from the Investigator Toolkit –> “Number of times tools are downloaded”
Log analysis
Text needs to be clarified. ITK isn’t a place
supporting information (videos, documents, …)
!!! (CN LOG) Number of metadata catalog searches completed – (note: over time it may also be desirable to assess precision and recall of incoming searches)
part of the Mercury log (in mysql)
!!! (MN, CN LOG) Number of DataONE datasets downloaded (daily, weekly, monthly, annually) – (note: this may be straightforward if included in specifications for Member Node data, impossible otherwise)
Member node access logs - need to be aggregated
What was pulled through D1.get() vs the native mechanisms of the MN
!!! (MN, CN LOG) Most frequently downloaded datasets
Same as 16
Reliability and System Performance
!!! (CN heartbeat logs) Uptime (availability) of Coordinating Nodes
Need a monitoring service in addition to the CN service
also need to consider geographic accessibility (users)
!!! (MN heartbeat logs) Uptime (availability) of Member Nodes – (note: tracked if heartbeat is fully implemented)
Same as 18
!!! Server response time
REST service performance
Define a bunch of test queries that can be executed in parallel for load testing.
!!! Response time of user interface
Time for “page load” vs. number of concurrent users
Time for specific operations (test queries, test renderings, …)
Community Engagement
Baseline assessment of scientists completed
Number of repeat assessments of scientists completed
Baseline assessment of other stakeholders completed
Number of repeat assessments of other stakeholders completed
Number of DataONE Partnership Agreements established
Education and Outreach
Number of education modules developed and/or accessible through DataONE
Number of times education modules are downloaded
Number of best practices guides developed and/or accessible through DataONE
Number of times best practices guides are downloaded
Number of training sessions or workshops offered (e.g., at meetings)
Number of workshop participants
Number of people in DataONE International Users Group
Sustainability
Amount of revenue (including in-kind support) generated annually to support DataONE
Diversity of revenue streams – e.g., government, private foundations, commercial for-profit sector, etc.
Number of projects and partners collaborating with DataONE or leveraging DataONE infrastructure
The following bullets represent the union of logging information indicated in the use cases and the metrics that can be reported from the logs. The information logged and suitable summarization and extraction procedures need to be identified to ensure the following items can be addressed:
all CRUD operations on metadata and data are logged at each node
all CRUD operation logs should be aggregated at the CNs
an MN can retrieve aggregated logs about content that originated from that MN.
A data owner (original contributor, delegated owner) can retrieve aggregated logs about objects they own.
Anyone can retrieve general use information for any object in DataONE.
(metric) Number of web hits on DataONE portal
(metric) Number of DataONE users
(metric) Number of downloads of tools from the Investigator Toolkit (From the download site logs)
(metric) Number of metadata catalog searches completed
(metric) Number of DataONE datasets downloaded (daily, weekly, monthly, annually)
(metric) Most frequently downloaded datasets