TD.1.2.A Security Design for the IPF

Marcus Roberts, George Coulouris, Jean Dollimore

Status of this document

Final Version of Deliverable TD.1.2.A Security Design for the IPF

Last Modified: June 19th 1998 – Marcus Roberts

Introduction

PerDiS deliverable TD1.1.A proposed a set of security design goals for PerDiS and gave a rationale for them. It also defined requirements inferred from the Virtual Enterprise model for organisations engaging in cooperative work. It then proposed a trust model for a virtual enterprise sharing PerDiS memory, the position of security mechanisms in the PerDiS architecture, the design of an access control system and mechanisms for securing the communication between PerDiS components.

This document takes these models and designs as a basis for a design for the implementation of security for the IPF.

1. Executive Summary

The PerDiS security model is centred on the concept of Role Based Access Control. Users’ activities are defined as taking place inside tasks. A task is a group activity where the group members are working towards a common goal. An aim of PerDiS is to allow data sharing between people working for different organisations, where this grouping of people and organisations forms a Virtual Enterprise.

Rather than explicitly allocating access rights to individuals, a role based access control system defines the roles that people may perform in a task. In PerDiS, there is also the notion of object categories. Rather than assigning access rights to objects individually, the creator of a task defines categories that objects may belong to. Access control is based on the access rights afforded to a role for a particular category. The collection of roles that form a task, and the access rights assigned, may be defined in terms of a general task, for example the task of setting an exam. In PerDiS, such a definition is contained in a task template.

A task template is used in the instantiation of an actual task, for example "setting the maths exam". When a new task is created, a corresponding task object is created that stores information about the task. Initially this task object will contain the definitions of object categories and roles’ access rights as defined in the task template.
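The relationship between templates, tasks, roles and categories can be illustrated with a minimal sketch. The class and function names (TaskTemplate, TaskObject, instantiate, allowed) and the right names "read" and "write" are assumptions made for illustration only; they are not part of the PerDiS API, and certificates are ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTemplate:
    """Generic task definition: access rights per (role, category)."""
    name: str
    rights: dict  # (role, category) -> set of rights, e.g. {"read", "write"}


@dataclass
class TaskObject:
    """An actual task; initially holds the template's definitions."""
    name: str
    rights: dict
    role_of: dict = field(default_factory=dict)  # user -> role in this task


def instantiate(template, task_name):
    # Copy the rights so later changes to the task leave the template intact.
    return TaskObject(task_name, {k: set(v) for k, v in template.rights.items()})


def allowed(task, user, category, right):
    # Access depends only on the user's role and the object's category.
    role = task.role_of.get(user)
    return right in task.rights.get((role, category), set())


exam_template = TaskTemplate("setting an exam", {
    ("setter", "draft_paper"): {"read", "write"},
    ("checker", "draft_paper"): {"read"},
})
maths = instantiate(exam_template, "setting the maths exam")
maths.role_of["alice"] = "setter"
maths.role_of["bob"] = "checker"
```

In this sketch alice may write objects in the draft_paper category, bob may only read them, and a user with no role in the task is refused everything.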

The person acting as task manager is given responsibility for assigning users to roles. This is done by the creation of a RoleInTask certificate. The certificate is simply a statement, digitally signed by the task manager, stating that User U has been assigned to role R in task T. Users may (if allowed) delegate their roles to other users through the use of delegation certificates, similar in form to RoleInTask certificates. These certificates are stored in the task object.

As well as providing access control, PerDiS provides for secure communication between PerDiS Daemons (PDs). Through the use of public and private key cryptography PerDiS can ensure the integrity of message contents, authenticate the origin of messages, authenticate users’ identities, and if required ensure the secrecy of message content.

The main components of the PerDiS security architecture are shown in Figure 1 below:

 

The Access Control Manager applies Role Based access control to requests to retrieve or update data sent to the PD.

The Secure Communications Manager uses cryptographic techniques to provide for secure communication between PDs.

The Template Tool and Task Management Tool are user tools that allow task managers to define task templates and manage instances of tasks. More information on these tools may be found in [tools].

The Security Shell is a user application from within which PerDiS applications are executed. The security shell is described in more detail later.

The certificate database refers to the collection of certificate servers that provide access to public key certificates. These certificates allow for the verification of a user's claimed public and private key pair.
Access control is applied in two places - at the boundary of communication between ULL and PD, and at the boundary of communication between PD and PD.  Access control at the ULL/PD boundary is at cluster granularity.  A request to access information inside a cluster is immediately either denied or granted (although later requests may be rejected if the user attempts to exceed their access rights).   Access control at the PD/PD (inter-cache) boundary is applied at the request level - i.e. every request must be evaluated, although the results of previous evaluations of requests from the same source may be stored and used to speed up the process.

Communication between a ULL and PD is not secured, as it is expected to occur between processes on the same host or on the same LAN.  Communication between PDs is secured on a configurable basis.  In particular, the notion of trust domains allows for weaker security but more efficient communication between nodes within the same administration domain, and for the use of stronger protection on communication between nodes over open networks.

A 'lazy' cryptography protocol has been designed to minimise the use of public key cryptography.  When communication occurs within a trust domain, the shared domain key will automatically be used.  When communication occurs between two PDs in different domains, the lazy approach ensures that public key cryptography is only used where necessary, and then only to establish a shared session key.

The credentials signed by the task manager to allocate roles to users will be stored just as any other data in the PerDiS system.  When evaluating a request, the source of credentials is configurable, depending on the sensitivity of information in the cluster.  The PD may retrieve the credentials itself from the task object (and the user may provide hints as to which credentials apply to speed up the process), which allows for immediate revocation of credentials.

The security attributes for a cluster will be held in a separate security cluster, with there being a corresponding security cluster for each application cluster.  Application clusters and security clusters are the same except that an application cluster contains a reference to its security cluster in its meta-information whilst a security cluster refers to itself.  Modification of the security cluster is controlled by the special attribute listing the principals who may modify it.  Altering this list requires a submission to the home site for re-signing of the information contained in it.

Access control and secure communication are integrated into the PerDiS platform at the stub interface level.  This allows for the encryption and decryption of information to occur transparently to the rest of the platform.  Access control is applied to requests as they arrive at the PD, allowing the security manager to reject them before they are seen by the rest of the platform.  However, in the case of requests for data from a PD, because of the ability of requests to migrate independently of the usual request path, these requests are not checked for security until the cache is about to execute them.  This requires the cache to be able to 'roll back' the release of a lock in case of a refusal, but otherwise minimises the impact of security on the programming and design of the cache.

2. Threats to Security

Before describing the design for an implementation of the security model it is useful to describe the threats the model is designed to protect against.

Open Networks

As a distributed system PerDiS requires communication to occur between a number of system components, the majority of which will be executing on different host systems. Communication between these components will occur over a variety of networks such as local area networks for data sharing within a single building; or increasingly over Metropolitan or Wide Area networks for physically distributed organisations or cooperative work involving several organisations.

Whilst intra-organisation data communication may be protected through the use of private intra-nets, this is not a generally satisfactory solution for the working environments envisaged for PerDiS. Establishing an intranet, particularly for a single cooperative venture, is complex and expensive. Also, whilst several organisations may work together on one task, they remain competitors. Any such intranet would need to be very carefully configured to ensure that external organisations were not granted access to resources external to the joint venture. In many cases, such as cooperation between organisations already using public networks such as the Internet, the continued use of such networks is a cheaper and less complex option. In either case, we should be aware that unless physical access to the network can be controlled, eavesdropping on and tampering with data in transit cannot be avoided without the use of extra security features.

Technologies do already exist for the secure transmission of information over open networks, for example PGP. PerDiS differs in its aim to provide a transparent security service, whereby communication is secured automatically as necessary. For example, a feature of the PerDiS security model described later is one of security domains, where different strengths of protection can be applied automatically depending on the perceived level of danger.

Virtual Enterprise

The PerDiS security model is aimed at supporting cooperative work between organisations forming a virtual enterprise. As has already been noted, these organisations may be working together on one task, but remain competitors on others. For this reason standard security methods are not adequate. Allocating users outside of an organisation user accounts within an organisation may grant them access (either directly or indirectly) to information to which they should not have access.

In addition, existing security models are usually applicable only within a single ‘protection domain’. A cooperative enterprise should allow all aspects of work to be shared cooperatively, including administration. However, it is also important to be able to differentiate between the responsibilities of the virtual organisation and the real organisations. For example, particular organisations may have a legal responsibility for data they share with others within the virtual organisation. For this reason PerDiS includes the notion of home sites, so that an organisation can maintain complete control over a set of data if necessary.

Again, PerDiS intends to provide an integrated security service, where security is provided automatically as necessary. Whilst users may choose to use technologies such as PGP to secure data, the users may not be the owners of the data. PerDiS allows the owner of the data to specify the security required.

3. Components of the security design

The following sections describe the various components making up the PerDiS security architecture, and contain some discussions on the problems to be solved, and possible solutions that could be used.

3.1 Cluster security attributes

A key component of the PerDiS security model is the protection of data at a cluster level, through the use of secure communications and role based access control. Each cluster has associated with it a collection of security attributes: the Access Control List (ACL) for the cluster specifying who can access the data in the cluster, whether the cluster should be encrypted, and other security information. The security attributes also specify who can modify the security attributes themselves. The ACL for the security attributes differs from that of the data stored in the cluster; having the right to edit the data does not automatically confer the right to allow others to edit the data.

One of the most important points to note is that we must be able to authenticate the validity of the security attributes. Unless the security attributes are obtained directly from the home site of the cluster, we must be able to verify that the security attributes supplied were generated or modified only by a principal entitled to do so.

Signing the attributes allows us to determine which principal last modified them. However, we must determine if this principal had the rights to modify them. As the attributes themselves specify which principals have this right, a plausible attack is to add a fraudulent principal to the principals specified in the attributes, and re-sign the information with the new principal's key. Any later recipient of such re-signed security information would be unable to detect the fraudulent alteration. The next section describes how this may be avoided.

Security Attributes

Unless the storage location of security attributes can be determined independently, either the security attributes or a reference to them must be contained inside the cluster. Each cluster has a home site that is ultimately responsible for the persistent storage and protection of the contents of the cluster. The identity of the home site can be simply derived from the URL identifying the cluster, where the URL takes the form

pds://home.site/cluster_id

As the home site has ultimate responsibility for the cluster, we grant it ultimate responsibility for allocating access rights for the data and the security attributes of a cluster.
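As a minimal sketch, the home site can be extracted from a cluster URL with the Python standard library. The function names home_site and cluster_id are illustrative assumptions, not PerDiS identifiers.

```python
from urllib.parse import urlsplit


def home_site(cluster_url):
    """Return the home site named in a pds:// cluster URL."""
    parts = urlsplit(cluster_url)
    if parts.scheme != "pds" or not parts.hostname:
        raise ValueError(f"not a PerDiS cluster URL: {cluster_url!r}")
    return parts.hostname


def cluster_id(cluster_url):
    """Return the cluster identifier (the path without its leading slash)."""
    return urlsplit(cluster_url).path.lstrip("/")
```

For the URL form given above, home_site returns "home.site" and cluster_id returns "cluster_id".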

The starting point for validating security information is thus the signature of the home site. (A home site is a principal, and as such has a public/private key pair. Signing by a home site may be performed automatically or manually by the 'super user' of the site.)

The next sections describe how access control may be applied on the access control data itself.

Creating and maintaining security attributes

When creating the cluster the user may specify a category from a task template or task object. In this case, the ACL of the cluster is defined accordingly. If no category is specified, the creating user may choose to be given Read or WriteRead access to the cluster; alternatively the ACL may be left empty.

When a PD receives security information for a cluster from another PD (other than the home site) it must ensure that the information it has been passed is valid. It can validate the security information as follows:
  1. Determine the identity of the home site from the URL.
  2. Verify that the list signature [1] was made by the home site.
  3. If the list is valid, check that the content signature [2] was created by the home site or by a principal specified in the list.
  4. If it was not, reject the security information.
  5. Otherwise, authenticate the content signature accordingly.
Having verified the validity of the security attributes, the recipient PD can use the information to evaluate the request for the data cluster.
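The validation steps above can be sketched as follows. This is an illustration under stated assumptions rather than the IPF implementation: keyed hashes (HMAC) stand in for public-key signatures so that the example runs with the standard library alone, the KEYS registry stands in for certificate lookup, and the dictionary field names are invented for the example.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

# Stand-in key registry (assumption): a real system would hold key pairs and
# verify with public keys obtained from certificate servers.
KEYS = {"home.site": b"k-home", "alice": b"k-alice", "mallory": b"k-mallory"}


def sign(principal, data):
    return hmac.new(KEYS[principal], data, hashlib.sha256).hexdigest()


def verify(principal, data, signature):
    key = KEYS.get(principal)
    if key is None:
        return False
    expected = hmac.new(key, data, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def encode_list(principals):
    return ",".join(sorted(principals)).encode()


def validate(attrs):
    home = urlsplit(attrs["cluster"]).hostname                # step 1
    if not verify(home, encode_list(attrs["editors"]), attrs["list_sig"]):
        return False                                          # step 2 failed
    signer = attrs["content_signer"]
    if signer != home and signer not in attrs["editors"]:     # steps 3 and 4
        return False
    return verify(signer, attrs["content"], attrs["content_sig"])  # step 5


acl = b"setter:read,write;checker:read"
good = {
    "cluster": "pds://home.site/cluster_id",
    "editors": ["alice"],
    "list_sig": sign("home.site", encode_list(["alice"])),
    "content": acl,
    "content_signer": "alice",
    "content_sig": sign("alice", acl),
}
# The attack described earlier: add a principal to the list and re-sign
# everything with that principal's own key.
forged = dict(
    good,
    editors=["alice", "mallory"],
    list_sig=sign("mallory", encode_list(["alice", "mallory"])),
    content_signer="mallory",
    content_sig=sign("mallory", acl),
)
```

Here validate(good) succeeds, while validate(forged) fails at step 2 because the altered list no longer carries the home site's signature.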

If a request to update the security information is received, the receiving PD authenticates the principal behind the request and then allows it only if that principal is in the list of principals allowed to alter the ACL.

Any principal who has been designated security information editing rights may wish to delegate those rights to other principals. They may do this by signing certificates delegating this authority much in the same way as roles may be delegated.

To retain the flexibility of the role based access control for controlling access to security attributes as well as data, principals listed as having authority to modify security attributes may be specified as named principals or as roles.

Storing a cluster's security attributes

The security model defines the cluster as the unit of protection for data. It follows that any access control rules should be applicable at the cluster level. Storage of security rights in the same cluster as application data is therefore not possible, because the access control applicable to security attributes and application data is different. If security attributes were stored in the same cluster as application data, we would have two applicable security policies within the same cluster.

Similarly, if the security information for more than one cluster were stored in a single cluster, and the sets of security information had different editing rights, different access control policies would apply within the same cluster.

The security design proposes that the security attributes for a data cluster are held in a separate security cluster, and that there is a unitary relationship between data clusters and security clusters.

Propagating changes to the security attributes

Although changes to security attributes are expected to occur only infrequently, it is important that the changes are propagated to any PD performing access control using the attributes.

As security attributes are stored in a cluster in the same way as any other data, a daemon may access the information using the DSM model provided by PerDiS. When the attributes are modified, any copies of the attributes in existence are invalidated through the DSM, allowing new copies to be propagated to the appropriate PDs.

3.2 Principal Credentials

Two pieces of information are required to perform access control on a request. The access control list has been described previously. In this section we discuss credentials - an association between a role in a task and a named principal. The validity of this association is demonstrated by attaching the digital signature of the task manager to the credential. We generally think of credentials as being certificates, although this may not be the case.

As a user's credentials are needed for the access control process, we have two options for providing them. Either the user can provide the credentials along with their request (through message passing) or the PD assessing the request can obtain the information using the DSM mechanism provided by PerDiS itself.

A major motivating factor in the design for credentials is the need for an ability to revoke credentials. If a user is no longer participating in a task, or they believe their security has been compromised, there must be a method for removing the association between the user and roles in the task. The credential certificates created by the task manager are stored in the task object. This is a normal PerDiS object that may be accessed using the DSM mechanism.

One method of handling credentials requires the user to take responsibility for managing their credentials. When a user wishes to run a PerDiS based application, they start their security shell. Having specified the task they wish to work in, their shell obtains a copy of their credential certificates from the relevant task object. When an application is started, the security information collected by the security shell is passed to the application's ULL. The ULL passes the credentials along with any requests it makes to the local PD. The credentials are passed with the request to the PD performing the access control.

With this method, timely revocation of a user's credentials is difficult. The task manager can revoke a credential by removing the certificate from the task object. However, if the user still has a copy of the credential they can continue to present it and play the role specified until the certificate expires. As certificates are issued manually, the expiry time is likely to be large to avoid the need for the task manager to frequently re-sign them. If a certificate's life span is for days or weeks, then the revoked user will continue to have access for a significantly long time.

Rather than expecting the user to present their credentials, we could use DSM to share the credential information between PDs, in much the same way as security attributes for clusters are shared. Now when a request is received, the PD is responsible for obtaining the user’s credentials from the task object.

Such an approach allows immediate revocation of credentials. As soon as the task manager removed the credential from the task object, all other PDs would be made aware of the change.

Such an approach requires more work to be done by the PD evaluating the request. In the first method the PD must simply evaluate the credentials presented by the requestor. In the second method, the PD must do the work of locating the appropriate credentials. For this reason, the user will still initially obtain their credentials from the task object through their security shell. This allows the user to select the role they wish to play in the task (if they are playing more than one). This is advantageous and perhaps necessary to prevent a user accessing or manipulating data accidentally if they have greater access rights in a different role to the one they think they are playing. For example, a user who plays the role of a database administrator may wish to access the database in a role with fewer privileges to avoid accidentally altering data.

Now when a PD receives a request it will receive information about the role and task the user claims to be playing. This allows the PD to obtain the necessary credentials much more efficiently from the task object.
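The difference between the two methods with respect to revocation can be shown with a small sketch. It assumes a task object modelled as an in-memory set of (user, role) pairs and leaves out signatures, expiry times and the DSM itself; all names are illustrative.

```python
class TaskObject:
    """Holds the credentials issued by the task manager (signatures omitted)."""

    def __init__(self, name):
        self.name = name
        self._credentials = set()  # (user, role) pairs

    def assign(self, user, role):
        self._credentials.add((user, role))

    def revoke(self, user, role):
        self._credentials.discard((user, role))

    def has_credential(self, user, role):
        return (user, role) in self._credentials


def check_presented(presented, user, role):
    """First method: the PD trusts the copy of the credential the user holds."""
    return (user, role) in presented


def check_via_task_object(task, user, role_hint):
    """Second method: the PD consults the shared task object, using the role
    the user claims to be playing as a hint to narrow the lookup."""
    return task.has_credential(user, role_hint)


task = TaskObject("setting the maths exam")
task.assign("alice", "setter")
alice_copy = {("alice", "setter")}   # copy fetched earlier by the security shell
task.revoke("alice", "setter")       # the task manager revokes the role
```

After the revocation, check_presented(alice_copy, "alice", "setter") still succeeds, whereas check_via_task_object(task, "alice", "setter") fails at once.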

3.3 Secure Communications

In an open network environment (both within and between organisations) the following security techniques should be used whenever secure communication is required:
 
Signed messages
  • To ensure the integrity of message contents
  • To authenticate the origin of messages
  • To authenticate the identity of the user responsible for a request, which is essential for the protection of objects
Encrypted messages
  • To ensure the secrecy of message content
Nonces or timestamps
  • To ensure freshness of messages

The cryptographic basis for these techniques is well established. In general, no shared secret key is available when communication commences, so it is convenient to use public key cryptography. But its computational cost (whether it is used for signing or encryption) is known to be about 100 times greater than secret key cryptography. For this reason, the use of public key cryptography is usually restricted to the negotiation of secure communication channels based on secret keys.

Trust Domains

Islands of trust arise in the Internet because trust in computer systems and their software is largely based on confidence in the local administrators of systems and their managers. Each region of local trust often corresponds to a corporate Intranet.

Cooperative working and object sharing extends across the boundaries of local trust domains. PerDiS aims to support several companies collaborating in an engineering or construction project, where this cooperation corresponds to a ‘virtual enterprise’. When this occurs, there is a need for authenticated and secure communication both within local trust domains and more widely between trust domains.

A local trust domain is simply a set of network nodes between which sufficient trust exists to justify the use of a single shared secret key to protect communication between them (for signing and for encryption). This level of trust is likely to be based on knowledge of hardware and software environments and the system management policies applied to the computers in the domain. For example, most companies and organisations assume that the hardware, operating systems and middleware are managed and validated by a system manager who takes precautions that are sufficient to provide an acceptable level of protection against penetration by impostors or the introduction of Trojan horses by external attackers.

When cooperation extends across the boundary of a local domain (e.g. outside of a single organisation) it is more difficult to establish a trust domain. We may have trust in the specific individuals with whom we wish to cooperate, but it is often impractical to establish trust in the entire system environment and management policies of another organisation. It is more appropriate to base trust on the public keys of the individuals concerned.

The security design for the IPF handles secure communication between PDs as follows:

The secret key shared between the nodes in a local domain is changed relatively infrequently so that key negotiation costs are factored across all the communication occurring within a domain.

For communication between trust domains there will usually be two principals involved – the principal who requested the lock or data, and the principal who gives the PD the authority to provide locks or data, i.e. the last principal to update the data. As it is likely that many messages will be exchanged between the two PDs representing the principals (for example, the locking of several pages within the same cluster) it is undesirable to use only public-key based cryptography. For this reason, upon first contact between the two PDs when representing the pair of principals, public-key cryptography is used to negotiate a shared secret key. Future communication between the principals via the same PDs will use this shared key. Negotiating a key in this way is more expensive than using an existing domain key, so its use must be restricted strictly to those cases where it is really needed.

Migration and key negotiation

In systems such as PerDiS where migration can occur the cost of maintaining secure communication is an important issue. Migration can lead to the establishment of more secure channels than are strictly necessary unless care is taken.

In PerDiS, when data migrates its location and hence the trust domain in which it is currently located may not be known by the requesting node. In particular, the following patterns of communication may occur:

In all cases, when the message eventually reaches the location of the target data, direct communication is established with the original sender for the reply and any subsequent interactions.

When one of the patterns of communication described above occurs, the following problem arises: what keys should be used to authenticate requests? For destinations in the local trust domain, we would prefer to use the already established secret key shared between PDs in that domain. For destinations outside of the local trust domain, we must set up a secure channel, negotiating a shared session key based on the public keys of the principals at each end. The latter is costly and we should avoid any scheme that involves setting up a secure channel unnecessarily.
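The key-selection rule can be sketched as below. This is a simplified illustration: it assumes the trust domain of the target PD is already known when the key is chosen, and it replaces the public-key handshake with a placeholder that returns a random key. The class name KeySelector and its fields are hypothetical.

```python
import secrets


class KeySelector:
    """Chooses the cheapest adequate key for talking to a remote PD."""

    def __init__(self, domain, domain_key, domain_of):
        self.domain = domain
        self.domain_key = domain_key
        self.domain_of = domain_of   # PD name -> trust domain (assumed known)
        self.session_keys = {}       # (local principal, remote principal, PD) -> key
        self.negotiations = 0        # counts the expensive handshakes performed

    def _negotiate(self, p_local, p_remote, remote_pd):
        # Placeholder for a public-key authentication and key-agreement step.
        self.negotiations += 1
        return secrets.token_bytes(32)

    def key_for(self, p_local, p_remote, remote_pd):
        # Same trust domain: reuse the shared domain key, no negotiation.
        if self.domain_of.get(remote_pd) == self.domain:
            return self.domain_key
        # Different domain: negotiate once per pair of principals and PD,
        # then reuse the session key for later messages.
        channel = (p_local, p_remote, remote_pd)
        if channel not in self.session_keys:
            self.session_keys[channel] = self._negotiate(*channel)
        return self.session_keys[channel]


selector = KeySelector("uni", b"uni-domain-key", {"pd1": "uni", "pd2": "corp"})
```

Requests to pd1 use the domain key and trigger no negotiation; repeated requests to pd2 on behalf of the same principals trigger exactly one.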

The design for the IPF resolves the problem as follows:

For authentication and (optional) decryption of the initial request and the return of results, the remainder of the protocol depends on the location of the target PD.

Protocol

The protocol is described in the tables below, showing the states in the protocol. The following notation is used:

[M]Kdomain Message M signed with the shared secret key for the sender’s local trust domain

{D}KP1P2 Message D encrypted with the secret key shared between principals P1 and P2

The protocol shows how credential information may be passed along with the protocol messages. However, passing credentials in this way is optional – they may be shared using shared memory as described in Section 3.2.

In all the cases below, a nonce is used to protect against a replay attack, where a request is eavesdropped and then sent to the node again at a later time. Usually, the recipient of the request will carry out the checking of the nonce, which allows a replayed request to be rejected immediately. However, because a resource can migrate, it may migrate away from the node where its nonce history has been constructed. If this can occur, it is necessary for the sender of a request to maintain a nonce history, and for a recipient of a request to return the nonce to the sender for verification (for example steps 2 and 3 in the two cases below). However, if nonce histories do migrate with the resource, there is no need for the sender to maintain a nonce history, or for the recipient to return the nonce to the sender for validation.
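Both forms of nonce checking can be sketched in a few lines. The class names are illustrative, and the sketch keeps nonces in memory indefinitely, which a real implementation would need to bound.

```python
import secrets


class NonceHistory:
    """Recipient-side check: reject any nonce that has been seen before."""

    def __init__(self):
        self._seen = set()

    def fresh(self, nonce):
        if nonce in self._seen:
            return False          # replayed request
        self._seen.add(nonce)
        return True


class Sender:
    """Sender-side check, for use when the recipient's nonce history may not
    migrate with the resource: the sender confirms each nonce at most once."""

    def __init__(self):
        self._outstanding = set()

    def new_nonce(self):
        nonce = secrets.token_hex(16)
        self._outstanding.add(nonce)
        return nonce

    def confirm(self, nonce):
        if nonce in self._outstanding:
            self._outstanding.remove(nonce)
            return True
        return False              # never issued, or already confirmed
```

With the sender-side form, a replayed request fails even at a node that has never seen the nonce, because the sender refuses to confirm it a second time.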

Communication within a trust domain
 
 
| Step | Header | Message | Notes |
|---|---|---|---|
| 1 | A → B | [request, A, P, N]Kdomain | A sends a request to B, specifying the request, source A, principal P and nonce N, signed with the shared domain key. Knowing Kdomain, B can validate that the request originates from a trusted node. The nonce is checked to avoid a replay attack. |
| 2 | B → A | P, N | B requests P's credentials from A. B also sends the nonce for verification. |
| 3 | A → B | CP + [N]Kdomain | A returns the requested credentials to B. Upon receiving the signed nonce, B knows A has authenticated the request. |
| 4 | B → A | {D}Kdomain | B sends a message containing the requested data to A. The message is encrypted with the shared key so that A can be sure the data has come from a trusted node, and so that B can be sure it is received by a trusted node. |
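The first three steps of this exchange can be simulated with a keyed hash playing the part of a signature made with the shared domain key. This is a sketch under assumptions: the message field names, the JSON encoding and the representation of credentials as a role name are invented for the example, and the encryption in step 4 is omitted because the Python standard library provides no cipher.

```python
import hashlib
import hmac
import json
import secrets

K_DOMAIN = b"shared-domain-key"   # assumption: distributed out of band


def tag(key, fields):
    payload = json.dumps(fields, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def signed(key, fields):
    return {"fields": fields, "mac": tag(key, fields)}


def valid(key, msg):
    return hmac.compare_digest(tag(key, msg["fields"]), msg["mac"])


outstanding = set()   # node A's record of nonces it has issued


def a_request(request, principal):
    """Step 1: A builds [request, A, P, N]Kdomain."""
    nonce = secrets.token_hex(8)
    outstanding.add(nonce)
    return signed(K_DOMAIN, {"request": request, "source": "A",
                             "principal": principal, "nonce": nonce})


def a_confirm(principal, nonce, credentials):
    """Step 3: A returns P's credentials and the signed nonce, but only for
    a nonce it actually issued and has not yet confirmed."""
    if nonce not in outstanding:
        return None
    outstanding.discard(nonce)
    return {"credentials": credentials.get(principal),
            "nonce_sig": signed(K_DOMAIN, {"nonce": nonce})}


def b_accepts(msg1):
    """B's check on message 1: was it signed with the domain key?"""
    return valid(K_DOMAIN, msg1)


def b_authorise(msg1, msg3, required_role):
    """B's check on message 3 before releasing data in step 4."""
    return (msg3 is not None
            and valid(K_DOMAIN, msg3["nonce_sig"])
            and msg3["nonce_sig"]["fields"]["nonce"] == msg1["fields"]["nonce"]
            and msg3["credentials"] == required_role)
```

A request altered in transit fails b_accepts, and a replayed request fails because a_confirm returns nothing for a nonce that has already been confirmed.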

 

The node receiving the request may always trust the requesting node to have applied appropriate credential checks. This exchange may also be used between trusting nodes for requests to the same resource, where credentials have been supplied previously using the above mechanism.
 
 
| Step | Header | Message | Notes |
|---|---|---|---|
| 1 | A → B | [request, A, P, N]Kdomain | A sends a request to B, specifying the request, source A, principal P and nonce N, signed with the shared domain key. B can validate the nonce locally, but the request may be a replay after the object has migrated. Knowing Kdomain, B can validate that the request originates from a trusted node. |
| 2 | B → A | N | B requests A to validate the nonce. |
| 3 | A → B | [N]Kdomain | Upon receiving the signed nonce, B knows A has authenticated the request. |
| 4 | B → A | {D}Kdomain | B sends a message containing the requested data to A. The message is encrypted with the shared key so that A can be sure the data has come from a trusted node, and so that B can be sure it is received by a trusted node. |

Inter-domain communication
 
 
| Step | Header | Message | Notes |
|---|---|---|---|
| 1 | A → B | [request, A, P1, N]Kdomain | A sends a request to B, specifying the request, source A, principal P1 and nonce N, signed with the domain key. B does not share the domain key, so cannot validate the request. Neither does B yet believe the authenticity of principal P1. |
| 2 | B → A | establish KP1P2 | Using a public-key authentication protocol, A and B authenticate the identity of P1 and P2, and establish a session key. There are several methods for achieving this – see the SSL Handshake protocol for further details. |
| 3 | B → A | [request, A, P1, N]Kdomain, CredentialsP2 | B asks A for a new version of the request signed with the new key, sending the original signed request and the credentials of the current holder of the object. A (optionally) checks the returned nonce. |
| 4 | A → B | [request, N]KP1P2 + CredentialsP1 | A returns the original request and nonce signed with the session key, so validating the request to B. B now knows the request is valid, and is made on behalf of principal P1. B evaluates P1's credentials to check whether P1 may access the resource. |
| 5 | B → A | {data}KP1P2 | B sends the data to A, encrypted with the established session key. A knows the data has come from the principal at B associated with the session key. |

In this protocol, the receiver B returns the entire message 1 to A. Returning the entire message allows A to check the signature and the nonce before returning the same message contents signed with the new shared session key. This avoids the following attack, in which E intercepts communications between A and B, where A and B are in different trust domains (Figure X).

  1. A sends a valid request which is intercepted by E.
  2. E formulates a new request, using the nonce supplied by A, and sends it to B
  3. B returns the nonce to A (via E). A sees a valid nonce and signs it.
  4. A returns the signed nonce to B (via E if necessary) validating E’s improper request.

By including the request with the nonce, A can ensure the request is valid, and this request is returned signed to B. Hence E’s improper request can be detected and rejected.
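The difference between confirming a bare nonce and confirming the request together with its nonce can be sketched as follows. The session key, the nonce value and the request strings are made up for the example, and a keyed hash stands in for signing with the session key.

```python
import hashlib
import hmac


def mac(key, text):
    return hmac.new(key, text.encode(), hashlib.sha256).hexdigest()


K_SESSION = b"session-key-P1P2"      # assumed already negotiated
issued = {"n-42": "read page 7"}     # A's record: nonce -> request it made


def a_sign_nonce_only(nonce):
    """Weak variant: A confirms any nonce it issued, without seeing which
    request the nonce is now attached to."""
    return mac(K_SESSION, nonce) if nonce in issued else None


def a_sign_request_and_nonce(request, nonce):
    """Variant used in the protocol: A confirms only the exact request that
    it made with that nonce."""
    if issued.get(nonce) != request:
        return None
    return mac(K_SESSION, request + "|" + nonce)


# E substitutes its own request but reuses the nonce taken from A's message.
forged_request = "delete page 7"
```

With the weak variant A would sign the nonce and so validate E's request; with the full variant a_sign_request_and_nonce(forged_request, "n-42") returns nothing, and only A's genuine request is confirmed.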

Subsequent requests for that resource from the same principal to the same node

The first message is sent to an unknown destination, as before, but the following protocol applies if it eventually arrives at the same destination as a previous request.
 

   
1. A → B: [request, A, P1, N]Kdomain. A sends a request to B, specifying the request, source A, principal P1 and nonce N, signed in the domain key.

B does not share the domain key, so cannot validate the request. Nor does B yet believe the authenticity of principal P1.

2. B → A: [request, A, P1, N]Kdomain. B sends the request and nonce to A, to allow A to authenticate the request.

A already knows that the principal associated with the shared key has the right to own the resource, as verified previously.

3. A → B: [request, N]P1P2. If the nonce is valid (i.e. A did make that request), it is returned to B signed in the shared key.

B now knows the request is valid, as A (speaking for P1) has told it so. B already knows that P1 may access the resource from the evaluation for the previous request.

4. B → A: {data}P1P2. B sends the data to A, encrypted in the established session key.

A knows the data has come from the principal at B associated with the session key. B knows that only the principal associated with the session key at A will be able to read the data.

Attack scenarios

The above protocol ensures that only those nodes with the authority to formulate a request can cause the invocation of a request and/or the return of data from another node. Simple replays of requests will fail because of the nonce check.

The greater dangers come from a combination of masquerading and replay. In the majority of cases a combination of masquerading and replay will not obtain anything the attacker could not have seen by eavesdropping alone, and the data obtained will be encrypted. However, some attacks are still possible:

Denial of service through masquerading and replay of invocation requests

Consider an invocation request by A for "increment the total by 2 and return the new value" on a resource at B, where total is a value local to the resource at B. E can intercept A’s messages and masquerade as A to B.

This attack occurs on an inter-domain communication, once a shared key has been established between A and B (i.e. on subsequent invocations between A and B).

1. A sends its request and a nonce to B, but it is intercepted by E.

2. E sends the request to B.

3. B requests authentication of the request from A.

4. E passes the authentication request to A and returns the authenticated request to B.

5. B makes the invocation, and returns the value.

6. E intercepts and destroys the returned value.

A will become aware that its request to B has failed, but will not know the state of the resource at B.

This is a denial of service, but such failures could occur at other times due to network or host failure. Therefore if such invocations are to be made we must assume they will occur inside a transactional model, which can be used to recover to a consistent state. Invocations where a change of state in the resource at B is not linked to the value returned to A will not be affected – the invocation will occur, it will just have been instigated by another party.

(For example, "increment the total by one": the total is simply incremented by one, whichever of A or E sends the request.)

Data transfer inside a trust domain

If we trust the integrity of our network against eavesdropping, we may want to merely sign data rather than encrypt it. (Integrity against eavesdropping but not masquerading may be possible if the network is behind a gateway but not a firewall).

The computational advantage of this is not certain – encryption of a block of data with a shared key probably compares favourably to digesting and signing the same block.

However, if the trusted network allows masquerade attacks, an attack is possible if the request leaves the trust domain via a forwarding chain. The diagram below shows the proper route of A → B → C with the data returned to A. However, in the scenario, E intercepts the request, and sends it to C whilst masquerading as A.

If C sends the nonce to A for verification, it will in fact send it to E, which is masquerading as A. E forwards the verification request to A as itself (after all, it may be the holder of the resource), asking A to sign the nonce. E can then send the signed nonce on to C. C verifies the signature and returns the data signed (rather than encrypted) to E. E has obtained the data. If nonce checking is only done locally, E receives the data immediately without having to intercept the nonce check.

Hence, if data is to be sent only signed rather than encrypted, the trusted network should operate behind a firewall that prevents masquerade attacks.

4. A security scenario

The following describes a typical access scenario in PerDiS, and aims to highlight the components, processes and data involved in providing security for PerDiS.

Before starting a PerDiS based application, a user must first start their security shell. The shell presents the user with a list of tasks in which they currently hold roles, and allows them to select a task, or to import a new one. Once a task has been selected, the user then selects the role they wish to play within that task. The security shell extracts the user’s credentials from the relevant task object. The user can then start an application that will enjoy access rights to clusters according to their role.

When the application starts, the user’s security shell passes security information to the application’s ULL. The ULL connects to the local PD, and establishes its identity to the PD. At some point the user’s private key is passed to the PD. The key is used by the PD to sign and encrypt requests and data on behalf of the user.

Eventually the user’s application will generate a request to retrieve or store data, or take a lock.  The request (along with any credential hints) is passed to the local PD. As it is expected that a ULL and PD will be running on the same workstation (or connected over a local area network) the communication is protected only through the security provided by a TCP connection. (However, cryptographic protection may be simply added if the corresponding decrease in performance is acceptable).

On the first request for access to a cluster, the PD will have no information about the security to be applied for the cluster. If the security attributes for the cluster are held locally, the PD has free access to them and obtains them from its own storage. However, if the security attributes are held remotely, the local PD must obtain the security information from a remote PD. The remote PD will apply access control to requests for security information: only those principals who have access to the data in a cluster can read that cluster’s security attributes. (Hence the request for security information is made on behalf of the requesting principal, using their identity and credentials, not using the PD’s identity.) Because the access rights for the data and security attributes of a cluster are the same, if a remote PD rejects a request for the security information, we know the request for the data will also fail. If the security information is returned, then the local PD must evaluate the information to determine if the user has Read or ReadWrite access.

When the local PD has the security information, it can apply the access control for the data. This occurs whether the data is local or remote. If the data is local, once the check has been passed, the data can be sent to the ULL. If the data is remote, performing the check ensures that fraudulent requests aren’t passed on to the remote PD. The request is then sent to the appropriate remote PD, which will apply the same access control procedures as described above.

Once it has obtained the security attributes, the PD has two tasks. First, it must obtain the user’s credentials from the task object and compare them to the access rights specified in the ACL. If the role specified by the user has been granted the access rights requested, the first stage of access control has been passed. Second, the PD must confirm the validity of the credentials. To do this, it must check that the credentials are authentic, and that the user presenting the credentials is who they claim to be. To check the credentials, the PD must identify the task manager who signed the RoleInTask certificate, and then check the validity of the signatures on the certificates. Confirmation of the user’s identity depends on whether the request is local or remote. If the request is from a local ULL, the identity of the user has already been established when the ULL first connected to the PD. Otherwise, if the request is from a remote PD, the principal making the request signs it. This signature can be used to prove that the principal made the request. In both cases the PD consults a certificate database server to obtain the public key certificates for the principals, which it uses to verify signatures made with the corresponding private keys. If these tests are passed, access is granted at the level indicated in the ACL.
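
The evaluation just described can be sketched as a single function. This is an illustrative outline under stated assumptions, not the PerDiS code: the `Access` levels, the fields of `RoleInTask`, and the `verify_signature` callback (which stands for the public-key check against the task manager's certificate) are hypothetical, and the requester's identity is assumed to have been established already, either at ULL connection time or from the signature on a remote request.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict


class Access(IntEnum):
    NONE = 0
    READ = 1
    READ_WRITE = 2


@dataclass(frozen=True)
class RoleInTask:
    """Hypothetical shape of a RoleInTask certificate."""
    user: str
    role: str
    task: str
    task_manager: str
    signature: bytes


def evaluate_access(
    acl: Dict[str, Access],
    cert: RoleInTask,
    requester: str,
    requested: Access,
    verify_signature: Callable[[RoleInTask], bool],
) -> Access:
    """Return the access level granted, or Access.NONE if any check fails."""
    granted = acl.get(cert.role, Access.NONE)
    # Stage 1: has the role been granted the rights requested?
    if requested == Access.NONE or granted < requested:
        return Access.NONE
    # Stage 2a: is the certificate authentic (signed by the task manager)?
    if not verify_signature(cert):
        return Access.NONE
    # Stage 2b: is the presenter the user named in the certificate?
    if requester != cert.user:
        return Access.NONE
    # Access is granted at the level indicated in the ACL.
    return granted
```

The ACL is modelled here as a mapping from role name to access level for one object category; a fuller model would key it by category as well.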

For better performance, the results of the access control evaluation are cached. When a subsequent access request to the same cluster is made, the PD will already know the access rights the user has in that cluster. It now only needs to reconfirm the identity of the user before granting access to the cluster.

If a local request is satisfied from a local cluster, the PD can be trusted to provide the correct information. However, when a request is satisfied by a remote PD, the remote PD must prove its authority to provide the information. The PD must either be the home site of the data, or it must show that a user who had the credentials to do so caused the data to be held by the PD. The PD either signs the data to be returned with its own key, or signs it with the key of the principal that provides the authority for the data.

To provide for the secure communication of both requests and replies between PDs, the communication channels between PDs are encrypted.

5. Naming of users, roles and tasks

Throughout the PerDiS security model we need to be able to uniquely identify principals. We need to be able to name the users assigned roles in tasks, to name these roles, to name the tasks these roles take place in, and to name the task managers with authority for assigning the users to the roles. We also need to be able to name clusters.

It is vital that these names can be assigned in such a manner that every principal is uniquely named, and that the name claimed by a principal can be proven to be correct. Otherwise all of the security model is undermined; for example, if we assign a role to a named user but another user claims the same name, then unless the second name can be shown to be invalid the second user will incorrectly have the same access rights as the first. Such failures could occur intentionally or accidentally.

Clusters are named using URLs of the form pds://home_site/cluster_name. As much of the security relies on establishing the identity of the home site for a cluster, this is clearly an important area to consider. Clearly the name specified must be correct, otherwise we are identifying a different cluster.  However, there is a system level mapping from the name to a server, and this may occur through a variety of methods, most including the use of the DNS service to resolve the home site name to the correct IP address.

The use of signatures to prove the authenticity of messages means that even if we mistakenly connect to the incorrect PD for a cluster, we will be able to detect this.  This is because the certificate obtained for the home site is identified by the site's name, not its IP number.  So, even if we do connect to an incorrect site, the communication will not be signed using the proper key.
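
The name-binding step in this argument can be illustrated with a short sketch. It assumes that the certificate for a home site carries the site name as its subject; the function names and the example host are hypothetical, and verifying the signature itself is left out.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_cluster_url(url: str) -> Tuple[str, str]:
    """Split pds://home_site/cluster_name into (home_site, cluster_name)."""
    parts = urlsplit(url)
    cluster = parts.path.lstrip("/")
    if parts.scheme != "pds" or not parts.netloc or not cluster:
        raise ValueError(f"not a PerDiS cluster URL: {url!r}")
    return parts.netloc, cluster


def is_expected_home_site(url: str, certificate_subject: str) -> bool:
    """True if the certificate names the home site given in the URL.

    The comparison uses the site name from the URL, never the IP address
    that the name happened to resolve to.
    """
    home_site, _ = parse_cluster_url(url)
    return certificate_subject.lower() == home_site.lower()
```

Because the comparison is made against the name in the URL, a misdirected connection (for example after a bad DNS answer) presents a certificate for the wrong name, or no valid signature at all, and is rejected.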

In task naming, the name of a task is the URL for its task object.  This allows us to obtain information about a task, and to identify the task manager of a task to verify credentials.  Again, we can use the securing of communication through the use of signatures to verify that we have connected to the correct PD.

Naming of roles within tasks is always with reference to a task, so need only be unique within the task.

Naming of principals is more interesting, as these will often be humans, and their identity understood by other humans. A variety of naming schemes has been developed. See work on X.509 [x509] and SPKI [spki] for more information.

6. Replication and Security granularity for access control

The unit of protection in the PerDiS security model is the cluster, and so it is logical that access control should be applied at this level.  However, as we have already seen, the granularity of replication allows locks to be taken and data accessed at the level of pages or ranges of bytes.  This raises the issue of different protection policies within a cluster.

It also raises the question of when access control should be applied.

ULL to PD Security

For each cluster that a ULL is accessing, a cluster object exists in both the ULL and the PD.  The process of opening a cluster for access requires the creation of such objects in the ULL and the PD.  Access control can clearly be applied at the opening time to determine if the principal opening the cluster has access rights to the cluster.  If the user has no rights, no objects are created.

However, because the unit of replication is at a finer granularity than the cluster, many requests for data in the cluster may be made through the cluster object.  The application may also want a mixture of access to data in the cluster, some for write access and some for read access.

To support this, when a cluster is opened, the maximum access rights a user has in that cluster are evaluated and then stored in the cluster objects.  Any further requests for data from the same cluster are then evaluated against these stored rights.

For example, when a user opens a cluster, they may be evaluated as having only Read access to the cluster.  As long as they continue to access data in the cluster under read locks they will succeed.   However, if they then try to take a write lock, their request will be denied.
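
A minimal sketch of this behaviour is given below. The `ClusterObject` class, its method names and the `Access` levels are assumptions made for illustration; how the maximum rights are computed at open time is outside the sketch.

```python
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    NONE = 0
    READ = 1
    READ_WRITE = 2


class ClusterObject:
    """Hypothetical cluster object holding the rights evaluated at open time."""

    def __init__(self, name: str, max_access: Access):
        self.name = name
        self.max_access = max_access

    def may_take_lock(self, write: bool) -> bool:
        """Check a lock request against the stored maximum rights."""
        needed = Access.READ_WRITE if write else Access.READ
        return self.max_access >= needed


def open_cluster(name: str, max_access: Access) -> Optional[ClusterObject]:
    """Create a cluster object only if the principal has some right."""
    if max_access == Access.NONE:
        return None  # no rights, so no object is created
    return ClusterObject(name, max_access)
```

A user evaluated as having only Read access gets a cluster object, succeeds on read locks, and is refused when asking for a write lock, without the full evaluation being repeated.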

PD to PD Security

Caches in PerDiS communicate at a sub-cluster level of granularity.  Because the unit of replication may be smaller than a cluster, sub parts of a cluster may be stored at several PDs.  For this reason, there is no notion of opening a cluster between PDs as there is at the ULL-PD level.  Instead, the PD simply requests the segment of the cluster it requires.

For this reason, access control must be applied to every request received.  This means that the identity of the principal making the request must be established for each request message.  Whilst the access rights also need to be evaluated for each request, a similar scheme to the ULL-PD method can be used to make the process more efficient.  Upon the first request for a segment of data in a cluster, the full access control mechanism is used.  However, in case further requests for data in the same cluster are received, the result of the evaluation is cached by the PD (in the same way as the maximum access rights are stored in the cluster object for the ULL request).  Future requests for data in the same cluster can be evaluated using this cached result.
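
The per-request check with a cached evaluation might look like the following sketch. The class and parameter names are hypothetical: `full_evaluation` stands for the complete access control mechanism, and `signature_ok` stands for the result of verifying the principal's signature on the request message.

```python
from typing import Callable, Dict, Tuple


class RemoteRequestGuard:
    """Hypothetical PD-side check applied to each request from another PD."""

    def __init__(self, full_evaluation: Callable[[str, str], str]):
        # full_evaluation(principal, cluster) returns "none", "read" or "readwrite".
        self.full_evaluation = full_evaluation
        self.cache: Dict[Tuple[str, str], str] = {}

    def allow(self, principal: str, cluster: str,
              write: bool, signature_ok: bool) -> bool:
        # Identity is established for every message, cached or not.
        if not signature_ok:
            return False
        key = (principal, cluster)
        if key not in self.cache:
            # First request for this cluster: run the full mechanism once.
            self.cache[key] = self.full_evaluation(principal, cluster)
        rights = self.cache[key]
        return rights == "readwrite" or (rights == "read" and not write)
```

The sketch does not show how cached results are discarded when access rights are revoked; any real cache would need to be invalidated when the ACL or the credentials change.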

7. Summary

The design describes how the PerDiS security model may be applied for the PerDiS platform.

The security information necessary for access control (access control lists and user credentials) is stored and accessed using the DSM mechanism provided by PerDiS itself.  This allows us to ensure that consistent and up to date information is always available to the access control process, and allows us to perform timely revocation of access rights.

The use of public-key cryptography is kept to a minimum through the use of security domains, whereby different strengths of security can be specified, and through a dynamic protocol that negotiates shared session keys using public-key cryptography only where necessary.

Whilst the unit of protection is the cluster, the needs introduced by having different levels of granularity for replication have been considered.
 

8. References

[spki] There is a set of Internet drafts at http://www.ietf.org/ids.by.wg/spki.html
[x509] CCITT and ISO: "Information Processing Systems - Open Systems Interconnection - The Directory", CCITT X.500-X.521, 1998
[tools] http://www.perdis.esprit.ec.org/deliverables/docs/T.D.1.1/B/tools.html