GA4GH Beacon project


  1. Why a Beacon?
  2. What is Beacon?
  3. Beacon v2 scope
    1. The Beacon v2 Framework
    2. The Beacon v2 Model
    3. I want to deploy a Beacon: how does this affect me?
  4. Beacon v2 security
    1. What are the general security principles for Beacon?
    2. How is security actually implemented when I deploy a Beacon?
    3. How do I test a Beacon without having to go through complex security matters (yet)?
  5. Acknowledgements
    1. Beacon partners
    2. Beacon early implementers

Why a Beacon?

Beaconize hospitals
Figure 1. A schematic representation of how Beacon works. (A) Beacon API implementation and (B) A Beacon query and aggregated response

A Beacon is a simple genomic variant discovery tool that aggregates genomics datasets from around the world under one umbrella. The Beacon Project is developed under a Global Alliance for Genomics and Health (GA4GH) initiative for the federated discovery of genomic data in biomedical research and clinical applications. One of the main bottlenecks in human genomics research is the lack of data. Genomic data is identifiable and therefore needs to be protected. However, without data security infrastructure and good health data practices, clinicians and researchers often do not share their data at all, which further slows progress in research. In an era of personalised medicine and inclusive diagnostic, prognostic and therapeutic strategies, we simply cannot afford to keep the data locked in. The Beacon API aims to solve this problem by enabling searches for genomic variants and associated information without jeopardising the privacy of the dataset. This way, any hospital or research entity can choose to ‘beaconize’ its omics dataset without compromising the privacy or the ownership of the dataset, thus helping the worldwide community of researchers and assisting science through the power of data.

What is Beacon?

Beacon is an API (sometimes extended with a user interface) that allows for data discovery of genomic and phenoclinic data.

Originally, the Beacon protocol (versions 0 and 1) allowed researchers to find out whether a specific genomic mutation is present or absent in a set of data, drawn either from patients with a given disease or from the general population (Figure 2). Examples can be found in the ELIXIR Beacon network page.
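To make the presence/absence idea concrete, here is a minimal Python sketch. It is not the Beacon API itself: the Variant type, the beacon_exists function and the toy dataset (including its positions and bases, which are arbitrary values, not real variant data) are hypothetical names chosen for illustration, and a real Beacon answers over HTTP rather than from an in-memory set.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A specific genomic mutation: where it is and what changed."""
    reference_name: str   # chromosome, e.g. "17"
    start: int            # position of the variant
    reference_bases: str
    alternate_bases: str


# Hypothetical toy dataset standing in for the variants a Beacon holds.
# The values are arbitrary placeholders, not real observations.
DATASET = frozenset({
    Variant("17", 12345, "G", "A"),
    Variant("13", 67890, "C", "T"),
})


def beacon_exists(query: Variant, dataset: frozenset = DATASET) -> dict:
    """Answer only whether the variant is present, never who carries it."""
    return {"exists": query in dataset}


print(beacon_exists(Variant("17", 12345, "G", "A")))  # {'exists': True}
print(beacon_exists(Variant("1", 999, "A", "T")))     # {'exists': False}
```

The response carries a single boolean, which is what lets a data holder advertise that a variant has been observed without exposing any individual-level record.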

Beacon v1
Figure 2. Schematic example of a Beacon query (up to version 1)

Version 2 (v2) of the Beacon protocol was accepted as a GA4GH standard in spring 2022. It includes, among other changes:

Beacon v2 Network Specification
Figure 3. Schematic example of a Beacon query (version 2)

Beacon v2 scope

The Beacon v2 is based on a two-part concept, with the following documents covering essential aspects of the specification:

In principle, this concept allows different Models (for domains outside the Beacon v2 realm, e.g. an “Imaging Beacon”) to be built using the same Framework. However, in the current context of Beacon v2, we consider the two elements interdependent and likely to be updated together for subsequent major versions (e.g. from v2 to v3).

The Beacon v2 Framework

If Beacon v2 were a language, the Framework would be the Syntax. It is the structure upon which the whole API is built. Handling the Framework to deploy your own Beacon requires experience with APIs.

The Framework repo includes the elements that are common to all Beacons:

The Beacon v2 Model

The Model is the Semantics of Beacon v2. It covers the different entities and details arising from clinical requirements. Check out the Documentation for Beacon v2 Model’s default schema.

Beacon v2 model
Figure 4. Schematic representation of the Beacon v2 logical Model

The Model defines the following entities (the links lead to the field descriptions):

I want to deploy a Beacon: how does this affect me?

If you do not have extensive experience in development and APIs, you might want to deploy a Beacon instance. A Beacon instance is simply an implementation of the Beacon Model that follows the rules stated by the Beacon Framework.

In that case, you do not need to clone the Framework repo; you only need to copy (or clone) the Beacon Model and adapt it to your specific instance. You will find plenty of references to the Framework in your copy of the Model, and you will use the JSON schemas in the Framework to validate that the structure of both your requests and your responses complies with the Beacon Framework. The Beacon verifier tool can help with this validation.

That said, there are several solutions for implementing a Beacon, and the right one depends on many factors, such as your current data management solution, your IT resources and the time available. Please contact Lauren Fromont, who will put you in touch with our Beacon Dev team.

Beacon v2 security

An implementation of a Beacon must implement the Global Alliance for Genomics and Health (GA4GH) Beacon standard. At the time of writing (January 2022), the v2 standard was undergoing the GA4GH approval process, which requires approval by both the Regulatory and Ethics and the Data Security foundational workstreams.

What are the general security principles for Beacon?

The Beacon uses a three-tiered access model: anonymous, registered, and controlled access. A Beacon that supports anonymous access responds to queries irrespective of their source. For a Beacon to respond to a query at the registered tier, the user must identify themselves to the Beacon, for example by using an ELIXIR identity. For a Beacon to respond to a controlled access query, the user must have applied for, and been granted, access to the Beacon (or to data derived from one or more individuals within the Beacon) before sending the query. Note that a Beacon may contain datasets (or collections of individuals) whose data is only accessible at specified tiers within the Beacon.

This tiered access model allows the owner or controller of a Beacon to determine which responses are returned to whom, depending on the query and on the user making the request, for example to ensure that the response respects the consent under which the data were collected. The ELIXIR Beacon network supports Beacons that respond at different tiers; for example, only Beacons that answer anonymous queries need to respond to an anonymous request.

As part of the ELIXIR 2019-21 Beacon Network Implementation Study, deliverable D3.3 describes security best practice for users interested in deploying or running a Beacon and for users who govern data hosted within a Beacon, together with the requirements for adding a Beacon to the ELIXIR Beacon network. As the Beacon standard extends in v2 towards supporting phenotype and range queries, the tiered access model becomes more important for ensuring that the Beacon response is appropriate to the underlying data.
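The tier logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular Beacon implements access control: the names Tier, user_tier and datasets_that_respond and the example dataset ids are hypothetical, and the sketch assumes the tiers are ordered so that a user cleared for a higher tier also sees data offered at lower tiers.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three access tiers, ordered from least to most restricted."""
    ANONYMOUS = 0
    REGISTERED = 1
    CONTROLLED = 2


def user_tier(is_authenticated: bool, granted_datasets: set, dataset_id: str) -> Tier:
    """Highest tier a user can claim for one dataset."""
    if not is_authenticated:
        return Tier.ANONYMOUS
    if dataset_id in granted_datasets:
        return Tier.CONTROLLED   # access was applied for and granted beforehand
    return Tier.REGISTERED       # identified, but no specific grant


def datasets_that_respond(dataset_tiers: dict, is_authenticated: bool,
                          granted_datasets: set) -> list:
    """Return the ids of datasets whose required tier the user satisfies."""
    return sorted(
        dataset_id
        for dataset_id, required in dataset_tiers.items()
        if user_tier(is_authenticated, granted_datasets, dataset_id) >= required
    )


# Hypothetical datasets, each accessible only from a given tier upwards.
DATASET_TIERS = {
    "population-frequencies": Tier.ANONYMOUS,
    "disease-cohort": Tier.REGISTERED,
    "rare-disease-families": Tier.CONTROLLED,
}

print(datasets_that_respond(DATASET_TIERS, False, set()))
print(datasets_that_respond(DATASET_TIERS, True, set()))
print(datasets_that_respond(DATASET_TIERS, True, {"rare-disease-families"}))
```

Keeping the required tier per dataset, rather than per Beacon, reflects the note above that a single Beacon may hold datasets that are only accessible at specified tiers.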

How is security actually implemented when I deploy a Beacon?

Security attributes are part of the Beacon v2 Framework. The file beaconConfiguration.json defines the schema of the JSON file that holds the core aspects of a Beacon instance configuration. Its third section defines the security settings:

securityAttributes: Configuration of the security aspects of the Beacon. By default, a Beacon that does not declare these configuration settings returns boolean (true/false) responses, and only if the user is authenticated and explicitly authorized to access the Beacon resources. While this is the safest set of settings, it is also the least informative and is therefore not recommended unless the Beacon shares very sensitive information. Non-sensitive Beacons should preferably opt for a record and PUBLIC combination.
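The default behaviour described above can be sketched as follows. The key names defaultGranularity and securityLevels and the CONTROLLED level are assumptions made for this sketch; only securityAttributes, boolean, record and PUBLIC appear in the text, so check beaconConfiguration.json in the Framework for the actual schema. The effective_security helper is hypothetical.

```python
import json

# Safest settings described in the text: boolean answers, authorised users only.
# Key names and the "CONTROLLED" value are assumptions for this sketch.
SAFEST_DEFAULTS = {
    "defaultGranularity": "boolean",
    "securityLevels": ["CONTROLLED"],
}


def effective_security(configuration: dict) -> dict:
    """Fill in the safest defaults for any security setting left undeclared."""
    declared = configuration.get("securityAttributes", {})
    return {**SAFEST_DEFAULTS, **declared}


# A Beacon that declares nothing falls back to the restrictive defaults.
print(json.dumps(effective_security({})))

# A non-sensitive Beacon opting for the record and PUBLIC combination.
open_beacon = {
    "securityAttributes": {
        "defaultGranularity": "record",
        "securityLevels": ["PUBLIC"],
    }
}
print(json.dumps(effective_security(open_beacon)))
```

Falling back to the most restrictive values means that a Beacon deployed with an incomplete configuration errs on the side of disclosing less.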

How do I test a Beacon without having to go through complex security matters (yet)?

As a Beacon is designed to support the discoverability of controlled access datasets, it is recommended that synthetic or artificial data be used for testing and for the initial deployment of Beacon instances. Using synthetic data ensures that the full functionality of a Beacon can be tested and/or demonstrated without the risk of exposing data from individuals. Beyond testing or demonstrating a deployment, synthetic data should also be used for development, for example when adding new features. These data can additionally be used to demonstrate the access levels and the data governance procedures for loading data into a Beacon, which helps build trust with data controllers or data access committees who may be considering doing so.

An example dataset that contains chromosome-specific VCF files is hosted at EGA under dataset accession EGAD00001006673. While this dataset requires a user to log in to get access, the EGA test user can access it.
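For quick local experiments, a small artificial file can also be generated on the fly. The sketch below is one possible way to do this and is unrelated to the EGA dataset: the synthetic_vcf function is hypothetical, it emits only random single-nucleotide variants with the eight fixed VCF columns, and it makes no attempt to resemble real allele distributions.

```python
import random

BASES = "ACGT"


def synthetic_vcf(chromosome: str, n_variants: int, seed: int = 0) -> str:
    """Build a tiny VCF text containing random single-nucleotide variants."""
    rng = random.Random(seed)  # seeded so that test runs are reproducible
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Distinct positions in ascending order.
    positions = sorted(rng.sample(range(1, 1_000_000), n_variants))
    for pos in positions:
        ref = rng.choice(BASES)
        alt = rng.choice([base for base in BASES if base != ref])
        # "." marks a missing value in the ID, QUAL, FILTER and INFO columns.
        lines.append(f"{chromosome}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


print(synthetic_vcf("21", 3))
```

Because no record is derived from a real individual, a file like this can be loaded into a test Beacon at any access tier without a data access agreement.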

Acknowledgements

Beacon partners

In 2020, the GA4GH Beacon group started a set of meetings and interviews with GA4GH Driver Projects and with ELIXIR partners in order to determine the scope of the next generation Beacon. The goal was to be useful without breaking the simplicity that made Beacon version 1 successful. Interviews were conducted with the following GA4GH Driver Projects:

Some ELIXIR partners were also interviewed, namely Café Variome, FPS, RD-Connect, CINECA, and DisGeNET. Among the members of ELIXIR Spain TransBioNet and Bioinformatics in Barcelona, a set of Catalan hospitals (e.g. Hospital Clinic) are exploring how to use Beacons within their genomic diagnostics teams and how to share diagnoses between hospitals.

Beacon early implementers

At the time of submission, at the end of 2021, five Beacons were already implemented in the Beacon Service Registry. The “early implementers” actively participated in refining the Framework, as they were responsible for spotting any issues they encountered with the Framework or Model.

 

