ACLs and permissions

Permissions follow a CRUD model. They are controlled by two things: an instance configuration value, which is set at collection level, and the _access field present in every record. The _access field is the first part of every document schema and looks like:

{
  "_access": {
    "type": "object",
    "properties": {
      "owner":{
        "type": "keyword"
      },
      "read": {
        "type": "keyword"
      },
      "update": {
        "type": "keyword"
      },
      "delete": {
        "type": "keyword"
      }
    }
  }
}

Clients authenticate with an API key (a token), which is obtained in the Web UI at https://<host:port>/account/settings/applications/ under applications, in the Personal access token section.

Warning

Please store the token in a safe place. After you have saved it, it will not be shown again.

Single document access

Read, update and delete follow the same pattern. The client must be authenticated and allowed to perform the corresponding action (e.g. update) on the document, which means that one of the client's egroups must be present in the corresponding _access field. For example, for the following document:

"_access": {
  "delete": ["egroup-two@cern.ch"], 
  "owner": ["egroup-three@cern.ch"], 
  "read": ["egroup-one@cern.ch", "egroup-two@cern.ch"], 
  "update": ["egroup-one@cern.ch", "egroup-two@cern.ch"]
},

A user belonging to the egroup egroup-one@cern.ch can only perform a read (GET) or an update (PUT or PATCH) on the document; they cannot delete it. A DELETE operation can be performed by users of the egroup-two@cern.ch egroup. In addition, users of egroup-three@cern.ch are owners of the document, meaning that they can perform every possible operation on it.
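The rule described in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not the service's implementation: the can_perform helper and the METHOD_TO_FIELD mapping are hypothetical names, and the check is assumed to be a plain membership test of the client's egroups against the matching _access list, with owners allowed everything.

```python
# Hypothetical mapping from HTTP method to the _access field that guards it.
METHOD_TO_FIELD = {
    "GET": "read",
    "PUT": "update",
    "PATCH": "update",
    "DELETE": "delete",
}


def can_perform(method, user_egroups, access):
    """Return True if any of the user's egroups allows `method` on the document."""
    egroups = set(user_egroups)
    # Owners can perform every operation.
    if egroups & set(access.get("owner", [])):
        return True
    field = METHOD_TO_FIELD[method]
    return bool(egroups & set(access.get(field, [])))


# The _access object from the example document above.
access = {
    "delete": ["egroup-two@cern.ch"],
    "owner": ["egroup-three@cern.ch"],
    "read": ["egroup-one@cern.ch", "egroup-two@cern.ch"],
    "update": ["egroup-one@cern.ch", "egroup-two@cern.ch"],
}

print(can_perform("PATCH", ["egroup-one@cern.ch"], access))     # True
print(can_perform("DELETE", ["egroup-one@cern.ch"], access))    # False
print(can_perform("DELETE", ["egroup-three@cern.ch"], access))  # True
```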

For these operations an access token needs to be sent in the headers as Authorization:Bearer <ACCESS_TOKEN>. For example:

curl -X GET -H 'Content-Type: application/json' -H 'Accept: application/json' \
    -H 'Authorization:Bearer <ACCESS_TOKEN>' -i 'https://<host:port>/api/record/5'
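For clients written in Python, the same request could be prepared with the standard library roughly as follows. The build_record_request helper and the search.example.org host are placeholders chosen for this sketch; the request is only built here, not sent.

```python
import urllib.request


def build_record_request(base_url, record_id, access_token, method="GET"):
    """Build (but do not send) an authenticated request for a single record."""
    return urllib.request.Request(
        url=f"{base_url}/api/record/{record_id}",
        method=method,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


# Placeholder host and token; replace them with your instance and your own token.
request = build_record_request("https://search.example.org", 5, "<ACCESS_TOKEN>")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```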

Search and filtering

Search is not treated as a normal read. In this case, along with the access token, the egroups that are allowed to read must be sent in the access parameter. First, the system checks that the client is authenticated, meaning that the token is valid. Afterwards, it passes the egroups through the cern_filter. This filter applies the following rules:

public | read_restricted | write_restricted | delete_restricted | owner_restricted

A document is considered public if the _access/read field is not present, so you do not have to add it when submitting public documents. The _access field of a public document would look like:

"_access": {
  "delete": ["egroup-two@cern.ch"], 
  "owner": ["egroup-three@cern.ch"],  
  "update": ["egroup-one@cern.ch", "egroup-two@cern.ch"]
},

To get full access to all your own documents, you can either add an egroup you belong to to the owner field of _access, or use the superuser account.

Moreover, you can give access to third parties. To do so, authenticate with your own token and send their egroups; the response will contain only those documents that pass the filter for the given egroups.

The first method (owner field or superuser account) is aimed at users who want to search their own documents, while the second method is aimed at services that want to provide search capabilities to third parties.
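As a rough mental model, the filter can be pictured as the function below. This is a sketch under assumptions, not the actual cern_filter: it reads the rule "public | read_restricted | write_restricted | delete_restricted | owner_restricted" as "the document is public, or one of the sent egroups appears in its read, update, delete or owner list". The passes_filter name and the sample documents are made up for illustration.

```python
ACCESS_FIELDS = ("read", "update", "delete", "owner")


def passes_filter(access, egroups):
    """Assumed reading of the rule: public, or listed in any _access field."""
    # Public: the document has no _access/read field at all.
    if "read" not in access:
        return True
    requested = set(egroups)
    return any(requested & set(access.get(field, [])) for field in ACCESS_FIELDS)


# Made-up documents: one public, two read-restricted.
documents = [
    {"id": 1, "_access": {"owner": ["egroup-three@cern.ch"]}},
    {"id": 2, "_access": {"read": ["egroup-one@cern.ch"], "owner": ["egroup-three@cern.ch"]}},
    {"id": 3, "_access": {"read": ["egroup-two@cern.ch"], "owner": ["egroup-three@cern.ch"]}},
]

visible = [d["id"] for d in documents if passes_filter(d["_access"], ["egroup-one@cern.ch"])]
print(visible)  # [1, 2]
```

Sending your own owner egroup returns all three documents in this example, while sending a third party's egroups returns only the public document plus whatever those egroups are listed in.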

Info

We have used egroups in the filters, but they also work with plain strings that denote access levels (e.g. public and private). You can think of this as an exact string match. If you do not wish to use egroups, please say so when requesting the instance.

Why is this done this way?

Imagine the following scenario: you own a system in which you store documents from users and want to provide search capabilities over them. You send the documents to your search instance using your token. Then you query, and the instance returns the relevant documents according to the requested size and page. This works if the documents are public. However, how should we handle the cases where a document has, for example, read restrictions? There are some options:

You give every single user who wants to make a query access to your search instance, and they have to specify their access token when doing so (in your platform). This is inconvenient for existing platforms (it changes the way they are developed) and leaves room for many security breaches.
We return the most relevant documents for the search, but what if those are restricted for that user? For example, user A queries for test text and the 10 most relevant results are returned, but the user does not have access to 8 of them. Then you would have to query for the next 8, and what if the user cannot access some of those either? What if this happens for the 1 million most relevant documents? As can be seen, this is not optimal: it requires many queries and the development of algorithms to deal with it.
The solution is to send the access rights of that user in the access parameter and let the search instance take care of all of this for you. You can, therefore, understand access control as a filtering operation.
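The difference between the second and third options can be illustrated with a small simulation. It is a toy model with made-up data and hypothetical helper names (post_filter_page, filtered_page): the first function fetches ranked pages and drops what the user cannot read, counting the queries needed, while the second filters by access rights before paginating.

```python
def post_filter_page(ranked_docs, user_egroups, size):
    """Fetch `size` hits at a time and drop unreadable ones; count the queries."""
    results, queries, offset = [], 0, 0
    while len(results) < size and offset < len(ranked_docs):
        page = ranked_docs[offset:offset + size]
        queries += 1
        offset += size
        results += [d for d in page if set(d["read"]) & set(user_egroups)]
    return results[:size], queries


def filtered_page(ranked_docs, user_egroups, size):
    """Filter by access rights first, then paginate: a single query."""
    allowed = [d for d in ranked_docs if set(d["read"]) & set(user_egroups)]
    return allowed[:size], 1


# 100 documents ranked by relevance; only every 5th one is readable by egroup-a.
ranked = [{"id": i, "read": ["egroup-a"] if i % 5 == 0 else ["egroup-b"]} for i in range(100)]

slow, slow_queries = post_filter_page(ranked, ["egroup-a"], size=10)
fast, fast_queries = filtered_page(ranked, ["egroup-a"], size=10)
print(slow_queries, fast_queries)  # 5 1
```

Both approaches return the same ten documents here, but post-filtering needs five round trips, and the number grows as the share of readable documents shrinks.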

An example of a search query for a third-party user (still performed by you; keep your token safe or generate new ones for the users) is:

curl -k -X GET -H 'Content-Type: application/json' -H 'Accept: application/json' \
    -H 'Authorization:Bearer <ACCESS_TOKEN>' \
    -i 'https://<host:port>/api/records/?access=egroup-one,egroup-two'
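If the query is assembled programmatically, the access parameter is simply the comma-separated list of egroups. A minimal sketch, assuming a hypothetical build_search_url helper and a placeholder host:

```python
from urllib.parse import urlencode


def build_search_url(base_url, egroups):
    """Return the records search URL with the egroups joined into `access`."""
    query = {"access": ",".join(egroups)}
    # Keep commas literal so the URL has the same shape as the curl example.
    return f"{base_url}/api/records/?{urlencode(query, safe=',')}"


url = build_search_url("https://search.example.org", ["egroup-one", "egroup-two"])
print(url)
# https://search.example.org/api/records/?access=egroup-one,egroup-two
```

The Authorization header still carries your own token; only the access parameter changes per third-party user.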

This access parameter can also be used for the operations mentioned in the "Single document access" section, where it is optional. If the parameter is not set, access rights work as described before. If it is set, those operations use the egroups sent in the parameter for access rights, which makes them behave like a search (in terms of access rights).

Summary table

Field \ Action   LIST   GET   EDIT   DELETE
Read             X      X     -      -
Update           X      X     X      -
Delete           X      X     X      X
Doc Owner        X      X     X      X
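The table can also be read as a lookup from _access field to granted actions. The sketch below encodes it literally; the GRANTS mapping and the allowed_actions helper are hypothetical names, and the sample _access object is made up.

```python
# Actions granted by membership in each _access field, as listed in the table.
GRANTS = {
    "read": {"LIST", "GET"},
    "update": {"LIST", "GET", "EDIT"},
    "delete": {"LIST", "GET", "EDIT", "DELETE"},
    "owner": {"LIST", "GET", "EDIT", "DELETE"},
}


def allowed_actions(access, egroups):
    """Union of the actions granted by every field that lists one of `egroups`."""
    requested = set(egroups)
    actions = set()
    for field, granted in GRANTS.items():
        if requested & set(access.get(field, [])):
            actions |= granted
    return actions


sample_access = {"read": ["a"], "update": ["b"], "delete": ["c"], "owner": ["d"]}
print(sorted(allowed_actions(sample_access, ["b"])))  # ['EDIT', 'GET', 'LIST']
```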

Document creation

Document creation is also treated differently. The user still needs to be authenticated, but in this case ownership and permissions are not checked at document level; they are configurable through an environment variable.

Defined standards

We provide automatic authorization extraction from exchanged tokens if you adhere to the ACL standards - for the search endpoint only (/api/records).

Currently we support automatic mapping of users (email, UPN and CERN ID) and groups. For the service to match them, you must set either the user prefix or the group prefix, as shown in the following example:

{
  "_access": {
    "read": [
      "group:my-group", 
      "user:upn",
      "user:cern-id",
      "user:e-mail@cern.ch"
    ],
    ...
  },
  "_data": { ... }
}

Additionally, users can keep simple text (without prefixes) in order to manage access via the access parameter.
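To make the prefix convention concrete, the sketch below builds the prefixed strings an identity could match and compares them with a document's read list. It is not the service's extraction code: identity_to_acl_entries and matches_read are hypothetical helpers, and the identity values are invented.

```python
def identity_to_acl_entries(upn, cern_id, email, groups):
    """Build the prefixed ACL strings a given identity could match."""
    entries = {f"user:{value}" for value in (upn, cern_id, email)}
    entries |= {f"group:{group}" for group in groups}
    return entries


def matches_read(access, entries):
    """True if any prefixed entry appears in the document's read list."""
    return bool(entries & set(access.get("read", [])))


# Invented identity and document, for illustration only.
doc_access = {"read": ["group:my-group", "user:jdoe"]}
entries = identity_to_acl_entries("jdoe", "123456", "jane.doe@cern.ch", ["other-group"])
print(matches_read(doc_access, entries))  # True
```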

Other automatic ACLs are the following roles:

role:search-admin (has admin privileges in the instance, you might want to add this role to all access fields for easy access)
role:search-user (any user, any level of assurance; requires authentication).

And upon request (minimum levels of assurance):

role:social-account
role:verified-external
role:hep-trusted
role:edugain-with-sirtifi
role:cern

In other words, a user who matches role:cern will also match role:social-account, role:verified-external, and so on.
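One way to picture this is as an ordered list in which matching a role implies matching every role before it. The sketch assumes that the roles above are listed from lowest to highest level of assurance, which the text suggests but does not state outright; LOA_ROLES and implied_roles are hypothetical names.

```python
# Assumption: ordered from lowest to highest level of assurance.
LOA_ROLES = [
    "role:social-account",
    "role:verified-external",
    "role:hep-trusted",
    "role:edugain-with-sirtifi",
    "role:cern",
]


def implied_roles(role):
    """Return `role` together with every lower-assurance role it also matches."""
    index = LOA_ROLES.index(role)
    return LOA_ROLES[: index + 1]


print(implied_roles("role:verified-external"))
# ['role:social-account', 'role:verified-external']
```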

Limitations:

We currently do not support automatic mapping to email aliases, so they should be avoided.
During ACL extraction all tokens are lowercased in order to avoid issues with capitalization. To guarantee a match, either set all ACLs in lowercase or set up a normalizer.
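If you choose the first option, the ACLs can be lowercased on the client side before a document is submitted. A minimal sketch, with lowercase_access as a hypothetical helper and invented entries:

```python
def lowercase_access(access):
    """Return a copy of an _access object with every entry lowercased."""
    return {field: [entry.lower() for entry in entries] for field, entries in access.items()}


mixed_case = {"read": ["group:My-Group", "user:Jane.Doe@cern.ch"], "owner": ["user:JDOE"]}
print(lowercase_access(mixed_case))
# {'read': ['group:my-group', 'user:jane.doe@cern.ch'], 'owner': ['user:jdoe']}
```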