GA4GH Beacon project


  1. Why a Beacon?
  2. What is Beacon?
  3. Beacon v2 scope
    1. The Beacon v2 Framework
    2. The Beacon v2 Model
    3. I want to deploy a Beacon: how does this affect me?
  4. Beacon v2 security
    1. What are the general security principles for Beacon?
    2. How is security actually implemented when I deploy a Beacon?
    3. How do I test a Beacon without having to go through complex security matters (yet)?
  5. Acknowledgements
    1. Beacon partners
    2. Beacon early implementers

Why a Beacon?

Beaconize hospitals
Figure 1. A schematic representation of how Beacon works. (A) Beacon API implementation and (B) A Beacon query and aggregated response

A Beacon is a simple discovery tool for genomic variants that brings genomic datasets from around the world under one umbrella. The Beacon Project is developed under a Global Alliance for Genomics and Health (GA4GH) initiative for the federated discovery of genomic data in biomedical research and clinical applications. One of the main bottlenecks in human genomics research is the lack of data. Genomic data are identifiable and therefore need to be protected. However, because data security infrastructure and good health data practices are often lacking, clinicians and researchers tend not to share their data at all, which further slows progress in research. To promote personalised medicine and inclusive diagnostic, prognostic and therapeutic strategies, we cannot afford to keep the data locked in. The Beacon API aims to solve this problem by enabling searches for genomic variants and associated information without jeopardising the privacy of the dataset. Any hospital or research entity can therefore choose to ‘beaconize’ its omics dataset without compromising the privacy or ownership of that dataset, helping the worldwide community of researchers and assisting science through the power of data.

What is Beacon?

Beacon is an API (sometimes extended with a user interface) that allows for data discovery of genomic and phenoclinic data.

Originally, the Beacon protocol (versions 0 and 1) allowed researchers to find out whether a specific genomic mutation was present or absent in a set of data, drawn either from patients with a given disease or from the general population (Figure 2). Examples can be found in the ELIXIR Beacon network page.

Beacon v1
Figure 2. Schematic example of a Beacon query (up to version 1)
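
The presence/absence query described above can be illustrated with a minimal Python sketch. It is not the Beacon API itself: the `Variant` class, the `beacon_exists` function, the toy `DATASET` and the 0-based coordinate convention are all assumptions made for illustration. The point it shows is that the answer reveals only whether a matching variant exists, never the underlying records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single-nucleotide variant (hypothetical structure for this sketch)."""
    reference_name: str   # chromosome, e.g. "17"
    start: int            # position; 0-based is an assumption of this sketch
    reference_bases: str
    alternate_bases: str


# Toy data standing in for a protected dataset (made-up values).
DATASET = {
    Variant("17", 43045628, "G", "A"),
    Variant("1", 1000, "C", "T"),
}


def beacon_exists(dataset: set, query: Variant) -> dict:
    """Answer a presence/absence query without returning any record."""
    return {"exists": query in dataset}
```

Calling `beacon_exists(DATASET, Variant("17", 43045628, "G", "A"))` returns a dictionary with a single boolean, which is all a version 1 style query needs to expose.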

Version 2 (v2) of the Beacon protocol was accepted as a GA4GH standard in spring 2022. It includes, among other changes:

Beacon v2 Network Specification
Figure 3. Schematic example of a Beacon query (version 2)

Beacon v2 scope

The Beacon v2 is based on a two-part concept, with the following documents covering essential aspects of the specification:

In principle, this concept allows for different Models (in other domains outside of the Beacon v2 realm, e.g. “Imaging Beacon”) to be built using the same Framework. However, in the current context of Beacon v2, we consider the two elements interdependent and likely to be updated together for subsequent major versions (e.g. from v2 to v3).

The Beacon v2 Framework

If Beacon v2 were a language, the Framework would be the Syntax. It is the structure upon which the whole API is built. Handling the Framework to deploy your own Beacon requires experience with APIs. The Framework is mostly relevant for developers.

The Framework repo includes the elements that are common to all Beacons:

The Beacon v2 Model

The Model is the Semantics of Beacon v2. It covers the different entities and details arising from clinical requirements. The Model has been developed for biomedical stakeholders. Check out the Documentation for Beacon v2 Model’s default schema.

Beacon v2 model
Figure 4. Schematic representation of the Beacon v2 logical Model

You can find out more information about Datasets, Cohorts, Genomic variations, Individuals, Biosamples, Analyses and Runs on the Beacon Documentation website.

I want to deploy a Beacon: how does this affect me?

If you do not have extensive experience in development and APIs, you might want to deploy a Beacon instance. A Beacon instance is simply an implementation of a Beacon Model that follows the rules stated by the Beacon Framework.

That said, there are several solutions for Beacon implementation, and the right one depends on many factors, such as your current solution for data management, your IT resources and your available time. Please contact Lauren Fromont, who will put you in touch with our Beacon Dev team.

Beacon v2 security

An implementation of a Beacon must implement the Global Alliance for Genomics and Health (GA4GH) Beacon standard. The v2 standard has been approved by both the Regulatory and Ethics and the Data Security foundational workstreams.

What are the general security principles for Beacon?

The Beacon uses a 3-tiered access model - anonymous, registered, and controlled access:

Note that a Beacon may contain datasets (or collections of individuals) whose data is only accessible at specified tiers within the Beacon. This tiered access model allows the owner or controller of a Beacon to determine which responses are returned to whom, depending on the query and the user who is making the request, for example to ensure that the response respects the consent under which the data were collected. The ELIXIR Beacon network supports Beacons which respond at different tiers. For example, only Beacons that can answer anonymous queries need to respond to an anonymous request.
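
The tiered model above can be sketched in a few lines of Python. This is an illustration only, not part of the Beacon specification: the `Tier` enum, the dataset names in `DATASET_TIERS` and the `visible_datasets` function are hypothetical, and the sketch assumes that controlled-access datasets are granted one by one while the lower tiers are ordered.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three access tiers, ordered from least to most restricted."""
    ANONYMOUS = 0
    REGISTERED = 1
    CONTROLLED = 2


# Hypothetical datasets, each with the minimum tier needed to query it.
DATASET_TIERS = {
    "public-freq": Tier.ANONYMOUS,
    "cohort-a": Tier.REGISTERED,
    "rare-disease": Tier.CONTROLLED,
}


def visible_datasets(user_tier, granted=frozenset()):
    """Return the datasets that may contribute to a response for this user.

    Assumption of this sketch: a controlled dataset is visible only if it
    was explicitly granted to the user; other datasets follow tier order.
    """
    visible = []
    for name, required in DATASET_TIERS.items():
        if required == Tier.CONTROLLED:
            allowed = name in granted
        else:
            allowed = user_tier >= required
        if allowed:
            visible.append(name)
    return sorted(visible)
```

In this sketch an anonymous request is answered only from `public-freq`, which mirrors the idea that a Beacon with no anonymous-tier data simply has nothing to say to an anonymous query.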

As part of deliverable D3.3 of the ELIXIR 2019-21 Beacon Network Implementation Study, a document has been written that describes security best practice for users interested in deploying or running a Beacon and for users who govern data hosted within a Beacon, as well as the requirements for adding a Beacon to the ELIXIR Beacon network. As the Beacon standard extends in v2 towards supporting phenotype and range queries, the tiered access model becomes more important to ensure that the Beacon response is appropriate to the underlying data.

How is security actually implemented when I deploy a Beacon?

Security attributes are part of the Beacon v2 Framework. The file beaconConfiguration.json defines the schema of the JSON file that includes core aspects of a Beacon instance configuration. Its third section, called securityAttributes, defines the security settings of the instance.

Check out the securityAttributes section on the Beacon Documentation website.
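
As a rough illustration of how a deployment might read that section, the sketch below parses a small configuration fragment. The field names `defaultGranularity` and `securityLevels`, and their example values, are assumptions here and should be checked against the published schema. `CONFIG_TEXT` and `read_security` are hypothetical names introduced only for this example.

```python
import json

# Hypothetical configuration fragment; field names and values are
# assumptions to verify against the Beacon v2 Framework schema.
CONFIG_TEXT = """
{
  "securityAttributes": {
    "defaultGranularity": "boolean",
    "securityLevels": ["PUBLIC", "REGISTERED", "CONTROLLED"]
  }
}
"""


def read_security(config_text):
    """Return (default granularity, security levels) from a config document."""
    attrs = json.loads(config_text).get("securityAttributes", {})
    return attrs.get("defaultGranularity"), attrs.get("securityLevels", [])
```

Reading the section defensively with `.get` means a configuration without securityAttributes yields `(None, [])` rather than an exception, which a deployment could treat as "not configured".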

How do I test a Beacon without having to go through complex security matters (yet)?

As a Beacon is designed to support data discoverability of controlled access datasets, it is recommended that synthetic or artificial data are used for testing and initial deployment of Beacon instances. Synthetic data are important for testing because they ensure that the full functionality of a Beacon can be tested or demonstrated without risk of exposing data from individuals. In addition to testing or demonstrating a deployment, synthetic data should be used for development, for example when adding new features. These data can also be used to demonstrate the access levels and data governance procedures for loading data to a Beacon, in order to build trust with data controllers or data access committees who may be considering loading data to a Beacon. An example dataset that contains chromosome-specific VCF files is hosted at EGA under dataset accession EGAD00001006673. While this dataset requires a user to log in to get access, the EGA test user can access this dataset.
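
For quick local experiments, a small synthetic file can also be generated rather than downloaded. The sketch below is one possible approach, not a recommended tool: the function name `synthetic_vcf`, its parameters and the position range are assumptions, and the output is a sites-only VCF with random single-nucleotide variants that carry no biological meaning.

```python
import random

BASES = "ACGT"


def synthetic_vcf(n_variants, chrom="22", seed=0):
    """Build a minimal sites-only VCF text with random SNVs.

    The data are entirely artificial; a fixed seed makes runs repeatable.
    """
    rng = random.Random(seed)
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Distinct positions, sorted as VCF consumers usually expect.
    positions = sorted(rng.sample(range(1, 1_000_000), n_variants))
    for pos in positions:
        ref = rng.choice(BASES)
        alt = rng.choice([b for b in BASES if b != ref])
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"
```

Writing `synthetic_vcf(100)` to a file gives a test input that can be loaded into a Beacon instance without touching data from real individuals.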

Acknowledgements

Beacon partners

In 2020, the GA4GH Beacon group started a set of meetings and interviews with GA4GH Driver Projects and with ELIXIR partners in order to determine the scope of the next generation Beacon. The goal was to be useful without breaking the simplicity that made Beacon version 1 successful. Interviews were conducted with the following GA4GH Driver Projects:

Some ELIXIR partners were also interviewed, namely Café Variome, FPS, RD-Connect, CINECA, and DisGeNET. Among ELIXIR Spain TransBioNet and Bioinformatics in Barcelona members, a set of Catalan hospitals (e.g. Hospital Clinic) are exploring how to use Beacons within their genomic diagnostics teams and how to share diagnoses between hospitals.

Beacon early implementers

At the time of submission at the end of 2021, five Beacons were already implemented in the Beacon Service Registry. The “early implementers” actively participated in refining the Framework as they were responsible for spotting any issue they might encounter with the Framework or Model.

 

