Why are Beacons important to global genomic data sharing and why is global genomic data sharing important in the first place?

Beacons present an easy-to-implement strategy for determining whether an institution has genomic data in its data set that would be useful to share.

How is sharing data through Beacon different from traditional medical practice?

In current medical practice, doctors commonly share basic, de-identified information about patients’ conditions and genetic variations in the hope of finding a matching second case that could be therapeutically transformative. This practice is a long-standing tradition in the medical community, but it is neither standardized nor optimized. The Beacon Project turns that informal transaction into a more efficient and effective search that can be a key initial step for learning from valuable genomic information.

Since much of that information is currently collected and held in silos, researchers lack the broad-scale ability to know whether similar data exist that could provide important learning when combined with their own data.

The Beacon Project was created both to test the willingness of organizations to share at the most basic, yes/no level and to allow researchers to find out whether a specific allele exists in other data sets. Now that a Beacon Network has been created, researchers can discover valuable information within organizations that have lit Beacons. This is often a necessary first step toward aggregating and learning from large-scale genomic data.

What do institutions commit to when they join the Beacon Network?

When an institution “lights” a Beacon, it demonstrates its willingness and ability to share data.

Precise Variants

Precise variants are those with annotated sequences of length 0 to n, using the bases [ATCGN] in the referenceBases and alternateBases attributes (N for “not specified”).
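
As an illustration, the constraint on referenceBases and alternateBases can be checked with a simple pattern. This is a minimal sketch, not part of the Beacon specification: the function name is_valid_bases is hypothetical, and it assumes bases are given in upper case.

```python
import re

# Assumption: bases are upper-case and drawn only from A, T, C, G, N.
# The "*" quantifier allows sequences of length 0 to n.
_BASES_PATTERN = re.compile(r"[ATCGN]*")


def is_valid_bases(value: str) -> bool:
    """Return True if value consists only of the characters A, T, C, G, N.

    Hypothetical helper for checking a referenceBases or alternateBases
    value; an empty string (length 0) is accepted.
    """
    return _BASES_PATTERN.fullmatch(value) is not None
```

Under these assumptions, is_valid_bases("ACGTN") and is_valid_bases("") are accepted, while a value such as "ACU" is rejected.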

What is a Beacon?

A Beacon is an online web service that allows users to query an institution’s databases to determine whether they contain a genetic variant of interest. The query is structured as a yes/no question of the form: “Do you have any genomes with an ‘X’ at position Y on chromosome Z?”
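
The yes/no nature of the query can be sketched with a toy in-memory data set. This is only an illustration of the question being asked, not the Beacon API itself: the Variant type, the beacon_answer function, and the sample record are all hypothetical.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical record: an observed base at a position on a chromosome."""
    chromosome: str
    position: int
    base: str


def beacon_answer(dataset: Set[Variant], chromosome: str,
                  position: int, base: str) -> bool:
    """Answer "Do you have any genomes with base X at position Y on chromosome Z?"

    Only True or False is returned; nothing else about the data is revealed.
    """
    return Variant(chromosome, position, base) in dataset


# Made-up example data, for illustration only.
example_dataset = {Variant("1", 1000, "T")}
```

Here beacon_answer(example_dataset, "1", 1000, "T") answers yes, and any other combination answers no.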

What is the Beacon Project doing to mitigate the possibility and likelihood of reidentification through Beacons?

Any public data resource containing “human derived” data has a potential for re-identification attacks, that is, the identification of individuals contributing data to the resource.

Since its inception, the Beacon Project has been actively working with experts in GA4GH’s membership to address potential privacy concerns.

Range Queries and Structural Variants

The adoption of

  • variantType
  • startMin, startMax
  • endMin, endMax

into the Beacon v0.4 release enabled the execution of

  • range queries
  • fuzzy position matching
  • arbitrary sequence variants (though limited to supported and documented vocabularies)
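
One plausible reading of fuzzy position matching with these parameters can be sketched as follows. The matching rule here is an assumption made for illustration (inclusive bounds on a stored variant's start and end, plus an exact match on variantType); the StoredVariant type and the matches_fuzzy function are hypothetical, and the Beacon specification remains the authority on the exact semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVariant:
    """Hypothetical stored structural variant."""
    variant_type: str  # e.g. "DEL" or "DUP", from a documented vocabulary
    start: int
    end: int


def matches_fuzzy(variant: StoredVariant, variant_type: str,
                  start_min: int, start_max: int,
                  end_min: int, end_max: int) -> bool:
    """Return True if the variant's type matches and its breakpoints fall
    within the requested windows.

    Assumption: all four bounds are treated as inclusive.
    """
    return (
        variant.variant_type == variant_type
        and start_min <= variant.start <= start_max
        and end_min <= variant.end <= end_max
    )
```

For example, a stored deletion with start 1050 and end 2050 would match a query for "DEL" with startMin 1000, startMax 1100, endMin 2000 and endMax 2100, but not the same query for "DUP".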

Coordinate use for Beacon Queries

The coordinate system that should be used throughout GA4GH standards is 0-based, half-open.
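
Many tools and file formats report positions as 1-based, inclusive coordinates, so a conversion is often needed before querying. The sketch below shows that conversion; the function names are hypothetical and chosen for illustration.

```python
from typing import Tuple


def to_zero_based_half_open(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive interval [start, end] to a
    0-based, half-open interval [start - 1, end).
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def interval_length(start: int, end: int) -> int:
    """Length of a 0-based, half-open interval [start, end)."""
    return end - start
```

With this convention, the single base at 1-based position 100 becomes the interval from 99 to 100, and the length of any interval is simply end minus start.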

