The planning and design for DataONE cybersecurity is predicated on the fact that DataONE is a diverse collaboration of researchers, data providers, institutions, coordinating nodes, member nodes, data collections, and other infrastructure components. As such, DataONE is inherently a virtual organization (VO) that spans many physical organizations and administrative domains. The approach that DataONE follows for cybersecurity is that all sensitive operations and data/metadata resources within the DataONE VO require users to register and authenticate with the DataONE system before their rights to the operation or resource are evaluated. DataONE supports authentication through CILogon, including both its InCommon partners such as universities and OAuth providers such as Google. Thus, any given user may have multiple identities that are verified through the authentication mechanisms of multiple providers, and these identities can then be mapped to one another to manage changes as users change institutions and roles. Once a user is authenticated and mapped to their DataONE identity, they can establish a session to pass credentials to DataONE services by either 1) passing a CILogon or DataONE-issued X.509 certificate to the service provider as part of SSL negotiation, or 2) providing an authentication token in HTTP requests made to the DataONE nodes.
To this end, three services support identity management and authentication:
Identity Management - user account registration, identity mapping, group management, and management of authentication tokens
Authentication - establishing the identity of a user
Authenticated Session Management - establishing a timed-session that identifies a DataONE user
Identity Management for DataONE addresses the need to identify users that request the use of services and data/metadata resources within DataONE. DataONE recognizes that not all services and resources require user identification, so anonymous access to certain services and resources is possible using a Public identity. DataONE provides services for users to register their identity with DataONE in a user account, creating a unique DataONE identifier along with other attributes about that user. This account information may be used for authorization and for logging DataONE transactions.
Users may have multiple identities as a result of distributed research endeavors at different participating organizations or changes in organizational affiliation. Because of this, the DataONE Identity Management service supports user identity mappings, which allow users to authenticate using any one of their identities and still be recognized as the same DataONE identity. When a DataONE authenticated session begins, information pertaining to the user’s identity is available for authorization purposes, including a listing of all mapped identities associated with that user. These mapped identities serve equally well for authorization decisions: within DataONE access control policies, a reference to any mapped identity is equivalent to a reference to any other of the user’s identities.
DataONE also supports the use of ORCID identifiers both for authentication and within Access Policies. In this scenario, a user can either map their ORCID identifier to an existing CILogon Distinguished Name, or they can use ORCID as the primary mechanism for logging into DataONE, in which case the ORCID is the user’s primary identifier.
In addition, the Identity Management service provides a system for users to create, store, and modify groups of users that can be used in access control directives. Only the user creating a group will be allowed to delete the group or to change the group’s membership. Service APIs for group management are outlined below in the Identity Management Service section.
The Internet2 project has defined a product called Grouper that is a standalone group management utility and that publishes a web interface for interacting with the service. It allows users and organizations to create and manage groups. See Grouper and earlier work from Internet2 on group management through their MACE project.
Principals are users, groups of users, and system services within the DataONE system. They need to be represented in access policies, authentication sessions, and other places within the system.
The values for identifiers representing principals are unique, persistent, non-reassignable strings. Within those constraints, it is useful to use a common convention for representing and scoping these principal names. In DataONE, Principals are currently represented in one of several forms, although future forms may add to these as needed:
X.509 Distinguished Names (DN)
Examples of the syntax for the representation of principals in a DN include:
CN=Matt Jones A729,O=Google,C=US,DC=cilogon,DC=org
Identifiers from ORCID are compatible with ISNI and take either a bare form or are embedded in an orcid.org URL:
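The two ORCID forms mentioned above (bare and embedded in an orcid.org URL) can be reduced to a single form before comparison. The following is a minimal sketch, not DataONE code: the function names are hypothetical, the choice of the bare form as output is arbitrary, and the check character follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers.

```python
import re

# A bare ORCID iD: four dash-separated groups of four characters,
# where only the final character may be "X".
_ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(value: str) -> str:
    """Return the bare form of an ORCID iD given either the bare or URL form.

    Hypothetical helper: raises ValueError for malformed input or a bad
    check character.
    """
    bare = re.sub(r"^https?://orcid\.org/", "", value.strip())
    if not _ORCID_RE.match(bare):
        raise ValueError(f"not an ORCID iD: {value!r}")
    digits = bare.replace("-", "")
    if orcid_check_digit(digits[:15]) != digits[15]:
        raise ValueError(f"bad ORCID check character: {value!r}")
    return bare
```

With this sketch, both `0000-0002-1825-0097` and `https://orcid.org/0000-0002-1825-0097` normalize to the same string, so either form could be matched against the same access policy entry.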
Within DataONE, values of
Types.Subject can be represented as the string
form of a Distinguished Name (DN) as defined in RFC4514. Distinguished
Names are composed of a sequence of Relative Distinguished Names (RDNs), each of
which is composed of an attribute type and a value. Subjects are serialized to
strings with attribute types in upper case (a DataONE convention); case is
preserved for all values. RDNs are separated by commas, and ordering is
preserved. Values must be converted to strings following the encoding rules in
section 2.4 of RFC4514. In summary, Subjects in DataONE are often represented as
Distinguished Names with the additional constraint that attribute types are in
upper case.
This approach enables simple string comparison to provide accurate results within the DataONE infrastructure and services and is fully compatible with existing services that utilize Distinguished Names as defined in RFC4510.
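As an illustration of the serialization convention described above, the sketch below upper-cases attribute types while leaving values and RDN order untouched, so that two spellings of the same DN compare equal as plain strings. It is a simplified, hypothetical helper: it assumes the input is already a valid RFC4514 string, it does not handle multi-valued RDNs joined with `+`, and it does not handle an escaped backslash immediately before a comma.

```python
import re

# Separators that are NOT escaped with a preceding backslash.
_UNESCAPED_COMMA = re.compile(r"(?<!\\),")
_UNESCAPED_EQUALS = re.compile(r"(?<!\\)=")


def normalize_subject(dn: str) -> str:
    """Upper-case attribute types; preserve values and RDN ordering.

    Raises ValueError if an RDN has no '=' separator.
    """
    rdns = []
    for rdn in _UNESCAPED_COMMA.split(dn):
        attr_type, value = _UNESCAPED_EQUALS.split(rdn, maxsplit=1)
        rdns.append(f"{attr_type.strip().upper()}={value}")
    return ",".join(rdns)


def same_subject(a: str, b: str) -> bool:
    """Compare two subjects by simple string equality after normalization."""
    return normalize_subject(a) == normalize_subject(b)
```

Because values keep their case, `CN=Matt Jones` and `CN=matt jones` remain different subjects under this sketch; only the attribute type is folded.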
In addition to named users, access policies can refer to several special symbolic groups of users that do not need to be explicitly enumerated, but define classes of people in the system. The reserved symbolic principals are:
Verified authenticated users
A user who has a valid authentication token and an ‘isVerified’ flag.
This designation is used to ensure that users are in fact who they claim to be. These accounts have originated from trusted affiliate organizations or identity services or have been manually verified by an administrator. The identity information when logged during read operations should be fully trusted.
Represented using the special principal verifiedUser
Any user who has a valid authentication token is considered a member of the authenticated users group. This designation can be used in particular to require that user identity has been established, but not necessarily verified as accurate. Authenticated users may be restricted from certain [read] operations depending on the data owners’ policy regarding access for untrusted identities.
Represented using the special principal authenticatedUser
The Public user represents any user accessing services that does not have a valid session token, plus all of those who do have a valid token. If a token is found to be invalid, the user’s privileges are immediately lowered to those of the symbolic ‘public’ user. For create, update, and delete operations, this typically means that the user has insufficient privileges to access the service. At times providers may want to provide public read access to resources.
Represented using the special principal public
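The three symbolic principals nest: every caller is `public`, every caller with a valid token is also `authenticatedUser`, and a caller whose account carries the verified flag is additionally `verifiedUser`. A minimal sketch of that membership rule follows; the function and parameter names are hypothetical, and only the three principal strings come from the text above.

```python
from typing import Set


def symbolic_principals(has_valid_token: bool, is_verified: bool = False) -> Set[str]:
    """Return the symbolic principals that apply to a caller.

    has_valid_token: the caller presented a token that validated.
    is_verified: the caller's account carries the 'isVerified' flag.
    """
    principals = {"public"}  # applies with or without a valid token
    if has_valid_token:
        principals.add("authenticatedUser")
        if is_verified:
            principals.add("verifiedUser")
    return principals
```

A caller whose token fails validation is treated as `has_valid_token=False`, which reduces them to `public` only, matching the behavior described for invalid tokens.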
The DataONE Identity Management service provides individuals with the ability to register a DataONE user account with the system and to set information into their profile. This process creates a new identifier value for the user that uniquely identifies them in DataONE from other DataONE users. This identifier is critical because it associates the user with an authenticated session for use when requesting services and or data/metadata resources from the DataONE nodes.
The general application flow for a user to use DataONE services is to first log
into CILogon or ORCID with an identity of their choice, then to register that identity
with DataONE by calling the
CN_auth.registerAccount() service. An authorized
third party (such as a site manager) can then call
to verify that the real name, email address, and other biographical information
about the Person are correct. A user with more than one Identity can call
CN_auth.mapIdentity() to link those two identities together as equivalent
identities. Once this registration process is complete, future authentication
steps with CILogon will produce X.509 certificates that contain this
biographical and account information in the returned certificate, all of which
can be used by services to make authorization decisions.
Register an identity with the DataONE IdentityService. When a user attempts to use a given identity at DataONE, the user must first register the identity and provide biographical information including their real name, real email address, and other identifying attributes. Takes a Person description including principal, givenName, familyName, and email address as input (other elements from the Person description such as isMemberOf, equivalentIdentity, and verifiedBy are ignored during registration because these elements are populated by other services).
Verify that a Person is an accurate portrayal of the real-life name and identity of the named individual.
Create an equivalence mapping between the identities listed for the users authenticated and represented by session1 and session2.
Confirm an equivalence mapping between the identities listed for the users authenticated and represented by session1 and session2.
Get the information about a Person, their equivalent identities, and the Groups to which they belong.
Query for a matching set of users, groups, and systems.
Create a named group of users. Throws IdentifierNotUnique if the group name is already in use.
Add the listed array of members to the named group, if and only if the user represented in token originally created the group.
Remove the listed array of members from the named group, if and only if the user represented in token originally created the group or is an equivalent identity of the user who created the group.
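The group rules listed above (unique group names, creator-only additions, and removals by the creator or an equivalent identity) can be modeled in memory as follows. This is a sketch under stated assumptions rather than the service implementation: the class, method, and parameter names are hypothetical, `NotAuthorized` is an illustrative exception name, and a new group is assumed to start with no members.

```python
class IdentifierNotUnique(Exception):
    """Raised when a group name is already in use."""


class NotAuthorized(Exception):
    """Illustrative error for callers who may not edit a group."""


class GroupRegistry:
    def __init__(self):
        self._creator = {}  # group name -> subject that created it
        self._members = {}  # group name -> set of member subjects

    def create_group(self, caller, group_name):
        if group_name in self._creator:
            raise IdentifierNotUnique(group_name)
        self._creator[group_name] = caller
        self._members[group_name] = set()  # assumption: starts empty

    def add_group_members(self, caller, group_name, members):
        # Only the subject that originally created the group may add members.
        if caller != self._creator[group_name]:
            raise NotAuthorized(f"{caller} did not create {group_name}")
        self._members[group_name].update(members)

    def remove_group_members(self, caller, group_name, members,
                             equivalent_identities=()):
        # The creator, or a caller mapped to the creator's identity, may remove.
        creator = self._creator[group_name]
        if caller != creator and creator not in equivalent_identities:
            raise NotAuthorized(f"{caller} may not edit {group_name}")
        self._members[group_name].difference_update(members)

    def members(self, group_name):
        return frozenset(self._members[group_name])
```

Here `equivalent_identities` stands for the identities already mapped to the caller; in the real service that information would come from the caller's session rather than from an argument.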
Figure 1. Identity Service is used to register an existing identity with DataONE. In this example, the same user has two distinct pre-existing identities. We register the primary identity with DataONE. We then request that a secondary identity is mapped to the same Types.Subject. The user must then confirm this equivalence between the two identities.
Figure 2. Identity Service is used to register an existing identity with DataONE. In this example, the user has an identity affiliation that is not initially trusted. We register the identity with DataONE as unverified. An administrator needs to verify the Person details before the identity is considered fully verified. Some DataONE actions will be restricted until verification is completed.
Figure 3. Identity Service is used to manage groups. The group creator is initially the only user able to add and remove group members. List editing permissions must be granted for other group members to edit the group.
DataONE is working closely with the CILogon project to streamline and incorporate user identities that originate from academic and commercial institutions in the U.S. that are members of the InCommon federation or through more globally accessible identity providers like Google, Facebook, and Yahoo!. CILogon acts as an intermediary broker of “short-lived” identity assertions that are made by users verifying their identity through their home institution or identity provider service. These assertions are converted by CILogon into a longer-lived and more commonly recognized X.509 identity certificate, which can then be reused a number of times when interacting with DataONE. A benefit of adopting identity management through CILogon is that users who regularly identify themselves through their home institution or other identity service will now be able to access DataONE resources without yet another identity to manage.
The DataONE Authentication Service provides a set of services for validating the identity of users and services and then establishing limited-duration sessions that are represented by a cryptographically signed X.509 certificate or an authentication API key. A single session is always associated with a single user. The Authentication Service uses various methods to validate the identity of a user in the system, and then produces SubjectInfo and potentially a session certificate in the form of an X.509 certificate that contains the relevant properties for that session.
The CILogon service supports authentication by redirecting authentication requests to a pre-approved list of Identity Providers associated with users’ home institutions. The main source of these Identity Providers is the set of institutions that are members of the InCommon federation. Users only need to authenticate with their home institution, which protects user credentials: third-party clients and services never handle those credentials, which are passed only to the user’s trusted institutional provider.
In general, the user will initiate the request for a session from DataONE through either a dedicated DataONE desktop application or through a web-browser connected to a DataONE web server. After contacting CILogon, the user will be redirected to their institutional provider, which in turn will certify that the user successfully authenticated to CILogon. CILogon then will contact the DataONE Identity service to gather biographical attributes and additional identity attributes such as group memberships and equivalent identities, and produce a limited-duration X.509 certificate containing these attributes. This certificate represents a “valid session” via the digitally signed “authentication token” (see below) that is generated by DataONE upon authenticating the user by one of the above mechanisms. The certificate is then returned to the user for subsequent interactions with DataONE, and can be provided to services that need the identity information necessary to perform authorization processing.
DataONE web clients will likely use CILogon Portal Delegation (http://www.cilogon.org/portal-delegation) to manage user certificates (rather than the browser). The portal acts as a proxy for the user when interacting with underlying DataONE services that require authentication or authorization. Instead of direct browser-based certificate management, the portal requests and stores user certificates and opaquely presents them to DataONE. This does require development of an extra web application “layer” that provides user session/certificate management in conjunction with the defined DataONE services.
Obtaining a CILogon X.509 certificate requires that the user authenticate through the CILogon InCommon identity services as outlined in Figure 4.
A proposal for mapping existing KNB accounts is included at the bottom of Figure 4.
Figure 4. Detailed sequence of events for authentication through CILogon - Client authentication through the CILogon service; CILogon, using Shibboleth, requests a SAML authentication through a registered Identity Provider (IdP); the IdP confirms identity and returns a SAML response to CILogon; the client continues the process, and the portal delegate requests a certificate from CILogon; CILogon generates an X.509 certificate and returns it to the portal for use with DataONE.
The CILogon X.509 certificate provides a portable credential that binds a user’s public key to their distinguished name or another significant identifier (e.g., email) that is stored in the “subject” field of the certificate. Once generated, the CILogon X.509 certificate has a specified span of time in which it is considered valid; this information is stored in the “valid not before” and “valid not after” fields of the certificate.
Processing the CILogon X.509 certificate requires a verification exchange between the service provider (DataONE) and the external user. Upon receiving a service request from the user, the DataONE service provider will first determine that the user’s CILogon X.509 certificate sent with the request is valid (i.e., verify issuer signature and confirm valid date span) and then use the attributes in the certificate to make authorization decisions regarding the request.
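The date-span part of the check described above can be sketched as a simple comparison against the certificate's “valid not before” and “valid not after” values. This sketch assumes those two timestamps have already been extracted from the certificate as timezone-aware datetimes; the function name is hypothetical, and verification of the issuer signature, which is equally required, is not shown.

```python
from datetime import datetime, timezone


def within_validity(not_before, not_after, now=None):
    """Return True if `now` falls inside the certificate's validity span.

    All arguments are timezone-aware datetimes; `now` defaults to the
    current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return not_before <= now <= not_after
```

A service would reject the request when this returns False, before looking at any attributes carried in the certificate.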
Figure 5. Authentication and session management assuming that the CN only runs an Identity service, and that the CILogon server runs the session management service as part of the authentication process.
Some clients have difficulty handling X.509 certificates, and so the DataONE V2 API also supports the use of access tokens as a mechanism for clients to identify themselves for API calls. In this scenario, the user authenticates with the DataONE Portal and then obtains an access token directly from it. The client then passes the access token to service providers using an HTTP “Authorization” header. The service provider (e.g., an MN) must then validate the access token to ensure that the request is coming from a host with a valid token and that the request is not being repeated via session hijacking.
DataONE considered different scenarios for the structure and meaning of these access tokens, largely following the OAuth 1.0 and OAuth 2.0 specifications. These two specifications propose related but different mechanisms for generating and using access tokens. Decision making fell along two axes: 1) what an access token represents, and 2) how a service provider validates an access token. The main choices, in order of both increasing security and increasing complexity, were:
HMAC tokens with shared symmetric keys
RSA-SHA1 tokens using RSA private/public keypairs
We ultimately decided to use the simpler Bearer Token approach outlined below:
Bearer tokens are unique string values issued by an authentication server that show that a client is authorized for access; anyone holding the bearer token can use it to gain access to a service. Service providers validate the token via either a call to the central authorization service or cryptographically using the public key that corresponds to the private key with which the authentication service signed the token. These tokens must be used with TLS, and care must be taken to not leak the tokens via e.g. logs, client storage locations, or via MITM attacks.
Pros: simple to implement on client, passed directly in Authorization header
Cons: passed with every request, simple to capture, can be used to steal a session via replay attacks, may require centralized validation of tokens, requires TLS.
Figure 6. Scenarios for using OAuth access tokens in HTTP Authorization headers as a mechanism to establish client identity and authorization. The user authenticates with the DataONE Portal to establish their identity, and the portal issues a client-specific token. For Bearer Tokens, the Portal issues a signed, reusable access token that is sent to service providers and verified using an asymmetric public key.
When clients wish to use an authentication token, they will include the token value in the Authorization request header:
Authorization: Bearer <token_value>
This is consistent with the OAuth 2 specification that states how Bearer tokens should be transmitted to the service. Service providers should inspect the request for this token and use it when a client X.509 certificate is not present.
When services wish to verify the JSON Web Token (JWT), they should use the public certificate of the token issuing service (i.e., the CN’s public server certificate). A utility is included in the d1_portal project with example code below.
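To make the header format concrete, the sketch below shows one way a client might attach the token and one way a service might pull it back out of the header value. It uses only the Python standard library; the function names are hypothetical and any URL passed in is a placeholder. No network call is made, and the token is not validated here.

```python
import urllib.request


def build_request(url, token):
    """Client side: create a request carrying the token as a Bearer credential."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def extract_bearer_token(header_value):
    """Service side: return the token from an Authorization header value.

    Returns None when the header is missing, uses another scheme, or has
    no token, in which case the service would fall back to other
    credentials or to public access.
    """
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()
```

Because anyone holding a bearer token can use it, a client following this pattern should only send such requests over TLS and should avoid logging the header.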
Because we wish to support many different identity providers, there is no single “login” method. Different identity providers will inherently utilize different methods for having users prove their identity. Initially, DataONE will target these avenues for authentication:
CILogon - Users can continue to authenticate using CILogon identities much like they do today. The portal keeps track of their successful authentication from whichever IdP they choose, and will issue an authentication token for that user.
To begin the CILogon authorization process in the CN portal, users can navigate to:
ORCID - Using a similar workflow as with CILogon, users can opt to authenticate using their ORCID accounts using OAuth 2.0. After successfully authenticating with the ORCID identity provider, an authentication token can be retrieved from the portal. To begin the ORCID authorization process in the CN portal, users can navigate to:
LDAP - Existing Ecoinformatics account holders can opt to continue using those identities to authenticate with the DataONE portal. This is a more direct method of authentication and is meant to bridge the gap between our legacy account system and the newer SSO options available above. To authenticate using LDAP, users can post to:
The following parameters are required (*) or supported:
username* - the full LDAP DN
password* - the user's LDAP password
target - any URI to be redirected to upon successful authentication
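The parameters listed above could be assembled into a POST request along the following lines. This is a sketch only: the function name is hypothetical, the endpoint must be supplied by the caller, and form encoding of the body is an assumption rather than something the text specifies. The request is built but not sent.

```python
import urllib.request
from urllib.parse import urlencode


def build_ldap_login(endpoint, username, password, target=None):
    """Build (but do not send) a POST with the documented login parameters.

    username: the full LDAP DN (required)
    password: the user's LDAP password (required)
    target:   optional URI to be redirected to after successful login
    """
    params = {"username": username, "password": password}
    if target is not None:
        params["target"] = target
    # Assumption: parameters are sent as a form-encoded request body.
    body = urlencode(params).encode("ascii")
    return urllib.request.Request(endpoint, data=body, method="POST")
```

Since the body carries a password, a request like this should only ever be sent to an HTTPS endpoint.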
In all cases, users will retrieve an authentication token from the CN Portal after they have successfully authenticated with their IdP of choice and been redirected back to the CN Portal. The auth token endpoint is:
Note that the same browser session used to authenticate should be used to retrieve the authentication token.
For DataONE, identity management and verification are only the first step in ensuring system-wide security. Many service calls within DataONE will require authentication of the caller to create an authenticated session with a limited duration for access to DataONE services. The process of authentication for most users will begin with identity verification and downloading of the X.509 certificate from CILogon. This download will often happen from within a local desktop DataONE application, which is acting on behalf of the user and can then use the certificate to represent the authenticated session when it makes requests to DataONE service providers. Both the desktop application and the DataONE service provider can verify (1) that the certificate originated from CILogon and (2) that the owner of the certificate, the user, is the actual party requesting authentication with DataONE (user identity verification is performed as a prerequisite of issuing the certificate).
Passed from DataONE system to DataONE system, such as making requests from a client application to a Member Node, the certificate is a reference to an authenticated session that contains all the necessary information identifying the user of the original service call and other attributes used to determine authorization in the DataONE system. The certificate itself will have a short “time to live”, thereby limiting the duration of malicious activity if a rogue application or user were to intercept the certificate. The certificate will also have limited applicability, in that it will be intended to be used from a particular host location on the internet, and have other restrictions that prevent it from being broadly used as a surrogate for the user.
Services internal to the DataONE system may operate autonomously to perform maintenance tasks or other asynchronous activities that are not bound to a particular user. In these cases, a certificate will still be generated, but without the prerequisite identity verification. Such certificates will have a special system identity that signifies that the holder is a “trusted” principal of the DataONE system. In most instances, this certificate will serve identically to one generated during the authentication process of a normal user.
For web clients, we can use the CILogon portal delegation approach. Note that the CN and portal are assumed to be on the same server.
Figure 7. Authenticated “read” going through CN. The browser is the client with no certificate. The portal keeps the user’s client certificate. The CN looks up the client certificate using the client cookie. The CN includes the client certificate in the request that is sent to Metacat. Object is returned to the client as though it was retrieved directly with a certificate.
This is a deprecated scenario that describes the use of a separate Session Service.
Types.AuthToken is a unique identifier that is affiliated
with and specifies the authentication session associated with a particular
request. DataONE AuthToken References are UUID values that are created by the
DataONE Session Service when a client requests that a session be established. A
client requests that a session be established from a particular Internet
Protocol address, and all service requests associated with that session MUST
originate from that Internet Protocol address.
The DataONE AuthToken reference is a unique identifier that references a session that has been established for the purposes of interacting with particular DataONE service providers. DataONE AuthTokens are generally passed in the header of an HTTP request to a service, thereby supporting clients that utilize authentication and those that do not, as well as Member Nodes that support authentication and those that don’t. Any Member Nodes or clients that do not support authentication and access control will simply ignore the presence of the AuthToken in the HTTP header information if one is present.
The DataONE HTTP header containing the AuthToken has the name ‘x-AuthToken’ and contains an identifier value that is a UUID URN; for example, one might send the header:
This session reference is used to indicate the session that should be used for requests, and has limited duration based on the session expiration time. AuthToken references refer to sessions that have limited duration and other constraints on their validity, and these constraints MUST be validated by service providers.
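As an illustration of the header format described above, the sketch below mints a UUID URN of the kind a session service might issue and checks that a received value has that shape. The function names are hypothetical and the sketch belongs to the deprecated separate Session Service scenario; a well-formed value says nothing about whether the session it references is still valid.

```python
import uuid


def new_auth_token_header():
    """Return an x-AuthToken header carrying a freshly generated UUID URN."""
    return {"x-AuthToken": uuid.uuid4().urn}


def is_uuid_urn(value):
    """Return True if value looks like 'urn:uuid:<uuid>'."""
    if not value.lower().startswith("urn:uuid:"):
        return False
    try:
        uuid.UUID(value)  # uuid.UUID accepts the 'urn:uuid:' prefix
    except ValueError:
        return False
    return True
```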
If a Node or other DataONE service provider receives a service request with the
DataONE x-AuthToken header, then the service SHOULD retrieve the associated
SAML.Assertion data in order to confirm that the client has
appropriately authenticated with the DataONE session service. If the service
needs to make authorization decisions, the service MUST validate the
associated session data, check validity constraints on the session, and then
proceed to make authorization decisions.
While making authorization decisions, the service should apply any AccessPolicy rules that reference the identifier for the Principal, any identifier in the ‘equivalentIdentity’ attributes in the session, any groups that are referenced in an ‘isMemberOf’ attribute in the session, and any policies that reference the DataONE ‘AuthenticatedUser’ or ‘VerifiedUser’ identities. All of these identities are valid identities for the authenticated session.
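The rule above amounts to collecting every identity that is valid for the session and then checking whether any access rule names one of them. A minimal sketch follows. The dictionary layout for sessions and rules, the `isVerified` key, and the function names are assumptions for illustration; the symbolic principals use the spellings given earlier in this document (`authenticatedUser`, `verifiedUser`, `public`), and `verifiedUser` is only included when the session is flagged as verified.

```python
def session_subjects(session):
    """Collect all identities that are valid for an authenticated session."""
    subjects = {session["subject"], "public", "authenticatedUser"}
    subjects.update(session.get("equivalentIdentity", []))
    subjects.update(session.get("isMemberOf", []))
    if session.get("isVerified"):
        subjects.add("verifiedUser")
    return subjects


def is_allowed(access_policy, session, permission):
    """Return True if any rule grants `permission` to one of the session's subjects.

    access_policy: list of {"subjects": [...], "permissions": [...]} rules.
    Permission hierarchies (e.g. write implying read) are not modeled.
    """
    subjects = session_subjects(session)
    return any(
        permission in rule["permissions"] and bool(subjects & set(rule["subjects"]))
        for rule in access_policy
    )
```

For example, a rule that grants `read` to a group is satisfied by any session listing that group under `isMemberOf`, regardless of which of the user's mapped identities was used to log in.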
If a Member Node or Coordinating Node receives an AuthToken that is invalid, cannot
be found using the
CN_auth.getAuthSession() method, or is determined
to not be satisfying the constraints of the session (such as wrong source IP
Address), then the service MUST return an
If a Member Node or Coordinating Node receives a service request in which there
is no x-AuthToken header, or if the header is empty, then the request should be
considered to be validated as the DataONE ‘Public’ user. This user may be denied
access to certain services as determined by appropriate access policies, or it
may be granted access to services when appropriate (e.g., to perform a
MN_read.get() operation on a data set marked for Public read access).
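The handling rules above can be summarized in a short sketch: a missing or empty header means the Public user, while an unknown, expired, or wrong-address token is an error. Everything named here is hypothetical: the `sessions` dictionary stands in for a session lookup, `InvalidSession` is a placeholder for whatever error the service actually returns, and the header lookup is case-sensitive for brevity although HTTP header names are not.

```python
from datetime import datetime, timezone

PUBLIC = "public"


class InvalidSession(Exception):
    """Placeholder error for a token that fails validation."""


def resolve_subject(headers, client_ip, sessions, now=None):
    """Map a request's x-AuthToken header to the subject it represents.

    sessions: dict of token -> {"subject", "address", "expires"}.
    """
    token = (headers.get("x-AuthToken") or "").strip()
    if not token:
        return PUBLIC  # no header, or an empty one
    session = sessions.get(token)
    if session is None:
        raise InvalidSession("token not found")
    if now is None:
        now = datetime.now(timezone.utc)
    if now >= session["expires"]:
        raise InvalidSession("session expired")
    if session["address"] != client_ip:
        raise InvalidSession("request came from a different address")
    return session["subject"]
```

The returned subject would then be passed on to the access policy check, where the Public user may still be granted read access to openly available objects.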
Metadata about authenticated sessions are represented as a
SAML.Assertion. Details of the fields to be included in a
SAML.Assertion include Subject, Address, givenName, sn, mail,
equivalentIdentity, and group membership, among other fields. These fields are
all mapped to SAML2 Assertion elements, as illustrated in the following example
of an authenticated session represented by a SAML Assertion. Note that these
SAML Assertion messages are returned when Member Nodes and Coordinating Nodes
make calls to
SPProvidedID="CN=Some User,O=University One,C=US">
<saml:SubjectConfirmationData Address="10.0.45.21" />
<!-- Note: One might also use X509 certs to authenticate, in which case the
context class would be:
/DC=org/DC=cilogon/C=US/O=ProtectNetwork/CN=Matthew Jones A332
Figure 8. Authentication and session management assuming that the CN runs a separate SessionService that creates and tracks sessions. This is an alternative scenario based on the idea that CILogon may not be able to make calls to the Identity Service, in which case the separate Session Service would need to be created.