TSD white paper
Services for Sensitive Data (TSD)
2023-12-11
Abstract
TSD is a special purpose eInfrastructure for sensitive data. It is a multi-tenant platform-as-a-service (PaaS) offering secure login, remote access to virtual machines, APIs for application development, and user support. This is offered to customers such as research projects, clinics, students, centres of excellence, private companies and more. TSD processes data on behalf of its customers based on the establishment of data processing agreements and commercial agreements. Customers in turn, as data owners, are responsible for establishing their own legal basis for data processing, approved by recognised bodies such as ethical approval committees and data protection officers. TSD develops a set of core research services and APIs in-house, and consumes infrastructure services from other parts of the IT department at UiO. Other parts of the organisation, in turn, consume TSD APIs in order to build research services according to user needs. Keywords: PaaS, Remote Login, APIs, Sensitive Data
1. Introduction to TSD
The TSD (Services for Sensitive Data) system was developed at the University Centre for IT (USIT), University of Oslo (UiO) between 2011 and 2014, and first launched in May 2014. The service is based on the pilot TSD 1.0 developed during 2008-2011, and is under continuous development. This white paper describes the status of the system to date.
TSD has been designed to offer research services that comply with the GDPR's policies for research on sensitive data. The majority of the hosted research projects use health information that is directly or indirectly identifiable. In addition, the service hosts data from UiO and other institutions that is involved in research and subject to other strict confidentiality regulations. There are also some exceptions for initiatives that need TSD for non-research work, such as clinics, or as a backend for apps from commercial companies.
TSD is a special purpose eInfrastructure, running on an on-premises cloud at UiO. It provides storage, virtual machines, databases, private physical servers, High Performance Computing (HPC), APIs and integrated web services in a secure environment. Every project has its own set of virtual machines (VMs) inside TSD as a virtual workspace. Additional services are also available, enabling data collection, dynamic digital consent management, custom native apps, data publication and sharing, and the deployment and hosting of custom APIs. TSD's goal is to be a platform that provides the services and tools needed to work efficiently with research on sensitive data, including sharing, publishing, collecting and analysing such data.
Users of TSD are required to access and work on their data via virtual workspaces, and there are technical and administrative restrictions regarding downloading data to local facilities. All users must comply with the code of conduct, and all user requirements.
2. High level system description
TSD users access their data via secure remote login to VMs. Users are tied to specific projects, and each project has its own set of VMs. Data is made available to project VMs based on the storage system's network ACLs. Data is kept within the system by several means: blocking copy-paste (from TSD to the client), drive mounting, printing, and USB forwarding. There will always be an unavoidable risk of data leakage by means of screenshots and photography by an authenticated and authorised user.
By providing VMs with sufficient computing power and high internal bandwidth, TSD provides a flexible and familiar work environment to research projects. High performance computing needs can be met in three ways: the first is to increase the number of CPUs and the amount of RAM on the VMs for medium to large jobs; the second is to use the TSD HPC resource instead of running jobs on the VM; the last is to install one or more physical servers (“app-nodes”) within a specific TSD project network for such specific needs.
The reasoning behind this setup is that it is fairly easy to apply a firewall on the outside of the storage, VM-hotel and HPC, thereby securing TSD towards external threats. The firewall guards the system by allowing only authenticated and authorised access. All research projects reside within their own segmented network, either a VLAN or a micro-segmented network region. This gives the system several layers of security and it gives strict separation between TSD projects.
A subset of research data is exposed via API, to web services such as the data publication service. Projects can also set up their own integrations with APIs, given that they anchor such integrations in their own DPIA and risk assessment. TSD provides a self service portal for the management of user credentials, project access control, and resources. In addition a data portal allows upload and download of data.
3. TSD login
User login
All user logins, to either web services or virtual machines, require two-factor authentication. User creation and credential management for passwords and one-time codes are self-service as far as possible. Self-service is enabled either by leveraging trusted third-party national electronic identity solutions, or by using TSD's protocol for foreign user registration. If neither of these protocols can be used, operators can create accounts and/or set credentials and distribute them via approved methods.
Web services integrate with TSD's OpenID Connect provider, which performs password checks against TSD's IAM database and one-time code checks via the RADIUS server. Virtual machines integrate with Active Directory (using Kerberos) and RADIUS.
Operator login
TSD operators can log in to Linux VMs via the TSD jumphost, a strictly controlled administrative host with extended internal network access. UiO disaster recovery staff can log in to Linux VMs via the disaster recovery jumphost. Both jumphosts require two-factor authentication, UiO employment, signature of the TSD NDA, and explicitly granted access. For operational robustness, a different login system is used than for project users.
4. Networking and firewalling
All of TSD's network traffic (ingress and egress) is routed through a set of redundant Cisco switches. A minimal set of ACLs on these switches regulates access to network equipment. Layer 3 firewalling is applied by two sets of redundant FreeBSD routers: one set governs internal traffic and IPv6 egress, and another set governs IPv4 ingress and egress.
TSD uses an RFC 1918 (IETF (1996)) private address range internally. Layer 2 traffic is segmented into VLANs, and each VLAN has one or more IP subnets assigned to it. Firewall rules applied on the FreeBSD gateway routers regulate inter-VLAN traffic in addition to ingress and egress traffic.
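As a concrete illustration of RFC 1918 membership, the check can be sketched with Python's standard `ipaddress` module. The example addresses below are generic placeholders, not TSD's actual subnets.

```python
import ipaddress

# The three RFC 1918 private address ranges.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)

print(is_rfc1918("10.12.34.56"))  # an internal-style address -> True
print(is_rfc1918("8.8.8.8"))      # a public address -> False
```

A rule set like TSD's then only has to reason about which of these internal subnets may talk to which, plus the explicitly permitted ingress and egress paths.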
TSD's project separation is implemented in a multilayered and hybrid fashion. The administrative project has 11 VLANs with multiple subnets each. Virtual machines which are provisioned in the administrative project run on a separate VMware ESXi cluster. This protects against potential hypervisor attacks launched from project VMs, which run on their own VMware ESXi cluster. TSD projects which do not need any inter-VM communication are provisioned in a shared VLAN and are sandboxed from one another via VMware's NSX-T microsegmentation firewall, which is applied on the VM's virtual network interface. Other projects are provisioned in VLANs separate from one another.
To protect against TCP-over-DNS attacks TSD runs Request Policy Zones (RPZ) on its DNS providers. This limits which DNS names can be resolved inside the TSD network to an approved list.
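An allow-list RPZ of this kind can be sketched as a BIND-style policy zone. The fragment below is purely illustrative: the zone metadata and the allow-listed name are placeholders, not TSD's actual configuration.

```
$TTL 60
@                     SOA   localhost. admin.localhost. (1 3600 600 86400 60)
                      NS    localhost.
; approved names (and their subdomains) resolve normally
mirror.example.org    CNAME rpz-passthru.
*.mirror.example.org  CNAME rpz-passthru.
; every other name returns NXDOMAIN
*                     CNAME .
```

The `rpz-passthru.` target lets listed names through unchanged, while the final wildcard rewrites everything else to NXDOMAIN, closing the DNS-tunnelling channel for unlisted domains.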
The Cisco, FreeBSD, and DNS RPZ config files are managed in version control, changes are applied subject to review, and changes are regularly audited at the change council.
5. VMware Horizon Infrastructure
TSD supports Windows Server and Red Hat Linux, available via VMware Horizon View and web clients, both of which require two-factor authentication. The login methods use TSD Active Directory (AD) with Kerberos for password verification and the TSD RADIUS server for one-time code checks.
The servers that host view.tsd.usit.no are VMware appliances and consist of combined firewall/web servers that manage traffic and user login. Once the user is authenticated and authorised, the View framework talks to a connection server. This server is a coordinator that keeps track of which VM resources and applications a specific user is allowed to connect to. This information is then presented to the user, after which they can log in to their project server in a full desktop environment.
7. Domain Controller, Active Directory and LDAP
Active Directory (AD) and LDAP services are running on a set of redundant virtual machines. These services only serve TSD and reside on the inside of the firewall.
8. Provisioning users and virtual machines
TSD has a specially developed Identity and Access Management (IAM) system that stores reference information on persons, users, groups, projects, institutions and access control rules. User and group data is synchronised to AD in an event-driven fashion, using the LDAP interface. The IAM API is exposed to the self service portal, through which users can control their own credentials and the access control settings within their own projects. The API is also available to operators via a command-line tool.
Virtual machine management in TSD uses an instance of MREG, the UiO machine register. MREG feeds the resource provisioning system with information about which resources a project and its users should have available. Resource provisioning uses VMware tools, the IBM ESS API, and similar interfaces.
9. Project and user creation
The steps for setting up a project with users for a PI are as follows. The PI logs in to nettskjema.uio.no to apply for a new TSD project, using ID-porten, the national IdP (identity provider) for Norway. PIs who are not Norwegian citizens must contact TSD by email for the time being. Information about the PI is fetched from Norway's name registry. Further, a copy of the legal clearance for the research project (REK or equivalent) and a signed Data Handler Agreement (DHA) must be uploaded with the application. The project is then created, along with the person record of the PI.
Project users must log in to TSD's self service portal using ID-porten to apply for membership in a specific project. The PI can then either approve or deny the membership application, and assign the user rights to log in to project VMs. An application authorised by a PI triggers auto-generation of the user account. Users can then set their credentials using BankID on the self service portal.
Foreign user creation is a self-service process initiated by the PI along with the user.
10. One-time passwords
Authentication occurs via passwords and one-time passwords (OTPs). TSD has implemented an OTP API which has both an HTTP and a RADIUS interface. Management is done via HTTP, while checking OTPs is done via RADIUS. Both TOTP and HOTP are supported. The same one-time code applies to all user accounts connected to a person.
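As a minimal sketch of the underlying mechanism (not TSD's actual implementation), an RFC 4226 HOTP value can be computed with Python's standard library; RFC 6238 TOTP is the same computation with the counter derived from the current time:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, dynamically truncated."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with a time-based counter."""
    return hotp(secret, int(time.time()) // step, digits)

# RFC 4226 test vector: secret "12345678901234567890", counter 0 -> "755224"
print(hotp(b"12345678901234567890", 0))
```

In a deployment like the one described, the HTTP interface would manage each person's shared secret, while the RADIUS side would recompute the code and compare it at login time.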
11. Management network in TSD
All computers used in this setup have one or more management network cards (used here for separate management VLANs). These are connected to a central server that acts as the management server for the management network. Since all logins go through one of the connection servers or jumphosts, there must be a way to access the network in case all jumphosts are down. To deal with such an unlikely situation, TSD has a physical setup in the server room that can be used to access the central servers and thereafter the management network.
12. Virtual machine hotel
The virtualisation system used is based on VMware. The virtual machines are either Linux or Windows. Provisioning is done based on the master database of TSD and performed by standard VMware components. OS disks are either served from the TSD block storage on Fibre Channel or from the internal VSAN on the VM hosts.
13. Storage
TSD file storage is a physically separated deployment of IBM ESS (8+ PiB) running GPFS and exporting data to projects as NFSv4 with Kerberos and SMBv3 with Kerberos. Further, the GPFS file system is presented directly to the Colossus cluster over InfiniBand. Physical access to the storage server room is restricted.
Each project has the following standard folders:
• /data/no-backup/ (work area for temporary files)
• /data/durable/ (shared project work area for data that must be backed up)
• /home/username[1-n] (only accessible by the single user)
There is one /shared directory that everyone can read, used mainly for open data and software.
Failed disk drives are physically destroyed by a commercial disk crusher that makes reconstruction of data impossible.
14. HTTP API
TSD supports data transfer using the HTTPS protocol, via the TSD API. All API components are accessible only by means of specific firewall rules at the IP layer. Traffic is routed through a transparent TCP proxy, and HTTP requests to the different web services are proxied by an HTTP reverse proxy server.
TSD supports OpenID Connect for user authentication, and OAuth 2.0 basic authentication for machine-to-machine integrations. The API uses centralised HTTP request authorisation based on client and user attributes. Access control rules are managed in the IAM system.
The File API is designed for file import and export, and the handling of JSON data. TSD has developed and released an open source API command-line client (TSD (2023)) for file transfers.
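The open-source command-line client is the supported tool; purely as an illustration of token-authenticated file import over HTTPS, such a request can be sketched with Python's standard library. The host name, URL path, and token below are hypothetical placeholders, not the File API's actual interface.

```python
import urllib.request

API_BASE = "https://api.tsd.example"  # hypothetical host, not the real endpoint

def build_upload_request(project: str, filename: str,
                         data: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated file-import request."""
    url = f"{API_BASE}/v1/{project}/files/{filename}"  # illustrative path only
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/octet-stream")
    return req

req = build_upload_request("p11", "results.csv", b"a,b\n1,2\n", "TOKEN")
print(req.get_method(), req.full_url)
# sending would be: urllib.request.urlopen(req)
```

In practice the bearer token would be obtained from the OpenID Connect / OAuth 2.0 flow described above, and the reverse proxy plus centralised authorisation would validate it before the request reaches the File API.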
15. Databases
TSD provides PostgreSQL and MSSQL hosting. Each database service uses its own VM provisioned in the appropriate project. All DB traffic is encrypted, and DB IP ACLs control traffic.
16. High Performance Computing
Colossus is an HPC cluster with administration frontends and a number of compute nodes. It has Slurm job scheduling, an InfiniBand high speed interconnect, and a GPFS file system.
Hardening of the compute nodes also prevents user processes from seeing other users’ processes (e.g. through ‘ps’), preventing data leaks via process names and command-line options that may contain sensitive data. There is no option to log in to compute nodes. Job scripts are submitted from the project’s own virtual machines to the job scheduler, Slurm, which allocates resources and runs the job on the cluster.
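A job submitted this way is described by a batch script. The fragment below is only a sketch: the account name, resource numbers, and module name are placeholders, not Colossus's actual values.

```
#!/bin/bash
# Illustrative Slurm job script; account and module names are placeholders.
#SBATCH --job-name=example-job
#SBATCH --account=pNN            # the project's compute account (placeholder)
#SBATCH --time=01:00:00          # wall-clock limit
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G

module load R/4.2.1              # placeholder module name
Rscript analysis.R
```

The script would be submitted from a project VM with `sbatch jobscript.sh`; Slurm then queues it and runs it on the compute nodes without the user ever logging in to them.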
17. Backup
Backup is done with an isolated part of the UiO backup system based on Commvault that resides within TSD, but with the data (dedicated disk and shared tape) placed in a different building. The only part that is shared with UiO is the tape library, where TSD has its own partition. All data that is written to tape is encrypted, and the encryption is done inside TSD before data is written to the Commvault system.
18. Physical security
All TSD hardware is kept within USIT server-rooms where only a very limited number of trusted USIT and UiO real-estate department employees are granted access (card and code).
Access to these two rooms requires passing through two doors (card and code) into the server-room corridor. The server-room corridor and the area between the first two doors are under video surveillance 24 hours a day, every day of the year. All hardware is on dual separate power circuits (except HPC), and special server-room fire-extinguisher systems are in place. One exception is the backup system, where there is no video surveillance, but access still requires card and code for both the corridor and the server room. All data cables between the storage and compute rooms, and to the backup system, run on either dedicated hardware or on switches that reside behind two-factor authentication.
19. Monitoring and antivirus
Monitoring of the system is based on the USIT log systems Nivlheim and Zabbix. The monitoring uses client-side agents to check the current status of machines, disks and processes, and log info is exported out of TSD to the regular UiO log handling system. The monitoring is designed so that project data cannot be transported hidden in logfiles or reports. All monitoring is initiated on the inside of TSD, reporting to the outside Zabbix service via a proxy. TSD also runs the standard UiO virus and malware checks on all Windows servers.
20. Software, software provisioning and licensing
The software portfolio available on the virtual machines includes basic office and statistical software: MS Office / OpenOffice, SAS, Matlab, Stata, SPSS, R, etc. Project admins can install software on Windows via SCCM, through which official packages and packages built by USIT are distributed. On Windows computers AppLocker is enabled to disable software installation by users who are not project admins.
Linux-users are able to install software in their /project or /home areas as long as this does not require administrative rights. Linux servers have access to official Redhat software repos. If an rpm requires admin rights, then staff can install it on behalf of users. Linux users can also access easybuild modules which are built for the HPC cluster.
Server management software is distributed to Linux hosts via CFEngine, which is controlled by configuration kept in version control.
TSD maintains software mirrors for Python, R, and Stata packages, which can be used from both Windows and Linux.
Users can bring their own site licenses to use licensed software, and in some cases online activations are performed by setting up license proxies.
21. Survey data collection
TSD has enabled easy data collection through the online self-service questionnaire system Nettskjema, which is also hosted by USIT. This enables easy encrypted data collection that is compliant with privacy regulations. Nettskjema is integrated with ID-porten, so respondents may submit a form anonymously, or authenticated via FEIDE login, a token, MinID, or BankID. This allows great flexibility. Nettskjema delivers data to TSD using the TSD API. Further, TSD enables smartphone and tablet apps to deliver data through the API. TSD produces datasets which are automatically updated in real time from the collected survey data.
22. Tablet and phone apps
TSD can be the secure backend to any app, all data traffic between the app and TSD is done using the APIs, and USIT offers app development. USIT does not vouch for the security of apps not developed in-house, but can act as a backend for any app.
23. Containers in TSD
Currently TSD serves some research projects with private VMs for Galaxy portal services and for running containers (Singularity and Docker).
24. Video and sound in TSD
Video and sound playback in TSD works on regular Windows servers through the PCoIP protocol and the BLAST Extreme protocol. For high-performance video usage (video editing or multiple high-resolution, time-synced video streams) TSD has a dedicated Virtual Desktop setup with special vGPU and PCoIP accelerator cards, and a VSAN to support the video storage.
25. Integration architecture
The main idea behind integrations in TSD is to use REST APIs for data exchange between components, with the central role-based IAM controlling access to the APIs. Strong emphasis is placed on avoiding man-in-the-middle attack vectors. Further, the integration architecture utilises a message queue to enable triggered actions and event-based system changes and updates rather than batch updates.
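As a conceptual sketch of event-based updates (not TSD's actual message-queue code), each change can be applied to the target system as its event arrives, instead of rebuilding state in a periodic batch. The event schema below is invented for illustration.

```python
import queue

def apply_events(q: queue.Queue, memberships: dict) -> None:
    """Drain the queue, applying each membership event as it arrives."""
    while not q.empty():
        event = q.get()
        members = memberships.setdefault(event["project"], set())
        if event["op"] == "add_member":
            members.add(event["user"])
        elif event["op"] == "remove_member":
            members.discard(event["user"])

memberships = {}
q = queue.Queue()
q.put({"op": "add_member", "project": "p11", "user": "p11-alice"})
q.put({"op": "add_member", "project": "p11", "user": "p11-bob"})
q.put({"op": "remove_member", "project": "p11", "user": "p11-alice"})
apply_events(q, memberships)
print(memberships)  # {'p11': {'p11-bob'}}
```

The benefit over batch synchronisation is latency: a granted or revoked membership takes effect as soon as its event is consumed, rather than at the next scheduled run.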
26. Self service portal
TSD has established a self-service portal where users can do the following:
• Apply for membership in a project
• Grant or deny users a membership in a project
• (Re)set their password and one-time code
• Grant export rights
• Grant/review group privileges
• Generate upload links
• Review personal data
• Audit file exports
27. Dynamic digital consent system
TSD has developed a system based on eSignatures that enables a project to use Nettskjema as the basis for digital consent collection. The consent portal, available outside TSD, gives consenters an overview of their consents (across all TSD projects), with the ability to revoke any consent. Another portal, designed for researchers, is available inside TSD and gives an overview of all collected consents.
28. Publication portal
TSD has enabled a system for tagging a file with a Norwegian personal ID number (PID); the holder of this PID can subsequently log in using their BankID and see the files they have been granted access to.
29. Risk evaluation
TSD has been through a thorough evaluation by the chief of IT security at UiO. The risk assessment document is available on request from tsd-drift@usit.uio.no. The security assessment of TSD is a continuous process, and the risk evaluation is updated whenever a significant change has to be made in the infrastructure.
TSD has also undergone penetration testing by an internationally recognised IT security expert. The penetration testing attempted i) an illicit login without valid user credentials and ii) illicit access to the data of a given project by a legitimate user of another project. None of the targeted attacks were successful.
30. Paperwork and how to contact TSD
When sensitive data in the custody of a given organisation (the Data Controller) is to be handled by another organisation (the Data Processor), a DHA must be made. In TSD's case, all non-UiO projects must have a signed DHA before project registration and usage can be completed. The agreement must be signed by someone at both organisations with sufficient privileges to do so. Additionally, we have made a “lighter” version of the data handler agreement for internal use at UiO. This is done to ensure that the principal investigator (PI) leading a research project using TSD understands the system, their responsibilities, best practices and the system risk evaluation.
TSD offers the platform as a product for sale, and there is a standard agreement for regulating such sales and cost.
TSD has a Code of Conduct / Rules of Engagement document that all users must comply with.
TSD has a privacy declaration that all users, sys-admins and the system must comply with. Additionally all TSD sys-admins are governmental employees and thus under NDAs etc according to general Norwegian law.
References
IETF. 1996. “Address Allocation for Private Internets.” https://datatracker.ietf.org/doc/html/rfc1918.
TSD. 2023. “TSD API Client.” https://github.com/unioslo/tsd-api-client.