The effects of applying cell-suppression and perturbation to aggregated genetic data

Athos Antoniades, John Keane, Aristos Aristodimou, Christa Philipou, Andreas Constantinou, Christos Georgousopoulos, Federica Tozzi, Kyriacos Kyriacou, Andreas Hadjisavvas, Maria Loizidou, Christiana Demetriou, Constantinos Pattichis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The key test for confidence in any association discovered within the medical domain is replication testing, that is, the ability of the association to be detected in independent populations. At the same time, to increase the likelihood of discovering statistically significant associations, there is a clear need to increase the statistical power of any given study. A key way to increase statistical power is to include as many subjects as possible who match a study's inclusion criteria. Many researchers have therefore attempted to merge data from multiple independent sources, sites or studies that share the same inclusion criteria, creating a much larger study with substantially more statistical power. For these approaches to work, however, data from multiple sites must be made available to a single analysis. This practice is significantly limited by the need to respect legal and ethical requirements that are often complicated, ambiguous and inconsistent across countries. The common way to merge data is therefore to share aggregated data rather than subjects' personal data. Aggregated data, however, can in some cases still be reverse engineered. Traditionally, therefore, cells of the aggregated data that contain small values have been suppressed, and some or all of the aggregated data have been perturbed, adding noise that inhibits any attempt to identify personal information about a specific person or sub-group in the original data. In this paper we study the effects of cell-suppression and perturbation on the results of the data analysis. Each approach is examined on its own and in combination, using the typical settings documented in the literature. The tests are based on a real dataset that looks for associations between phenotypes and genetic markers. 
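To make the two disclosure-control steps concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a 2×3 case/control-by-genotype count table, a suppression threshold of 5, uniform integer noise of at most ±2 as the perturbation scheme, and a Pearson chi-square test as the downstream association analysis. All counts, and the names `suppress_small_cells`, `perturb`, `fill_suppressed` and `chi_square_2x3`, are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

# A contingency table of counts; None marks a suppressed cell.
Table = List[List[Optional[int]]]


def suppress_small_cells(table: Table, threshold: int = 5) -> Table:
    """Cell suppression: hide every count below `threshold` (replaced by None).

    The default threshold of 5 is an illustrative assumption."""
    return [[None if c is None or c < threshold else c for c in row]
            for row in table]


def perturb(table: Table, max_noise: int, seed: int = 0) -> Table:
    """Perturbation: add integer noise drawn uniformly from
    [-max_noise, max_noise] to each released cell, clamping at zero.
    Suppressed cells stay suppressed. This is one simple scheme among many."""
    rng = random.Random(seed)
    return [[None if c is None
             else max(0, c + rng.randint(-max_noise, max_noise))
             for c in row]
            for row in table]


def fill_suppressed(table: Table, guess: int) -> List[List[int]]:
    """Hypothetical analyst step: replace each suppressed cell with a guess."""
    return [[guess if c is None else c for c in row] for row in table]


def chi_square_2x3(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square test for a 2x3 table (rows: cases/controls,
    columns: genotypes AA/Aa/aa). With (2-1)*(3-1) = 2 degrees of freedom
    the chi-square survival function is exactly exp(-stat / 2)."""
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2x3 table")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    if 0 in row_tot or 0 in col_tot:
        raise ValueError("a row or column total is zero")
    n = sum(row_tot)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


if __name__ == "__main__":
    # Invented counts for illustration only; not data from the paper.
    original = [[30, 20, 3],
                [10, 20, 4]]
    suppressed = suppress_small_cells(original, threshold=5)
    released = perturb(suppressed, max_noise=2, seed=1)

    # The analyst cannot see suppressed values, so try two plausible
    # values below the threshold (0 is skipped: it would empty a column).
    scenarios = {
        "original": original,
        "suppressed, guess 1": fill_suppressed(suppressed, 1),
        "suppressed, guess 4": fill_suppressed(suppressed, 4),
        "suppressed + perturbed, guess 1": fill_suppressed(released, 1),
    }
    for name, table in scenarios.items():
        stat, p = chi_square_2x3(table)
        print(f"{name:32s} chi2={stat:6.2f} p={p:.4f}")
```

Because suppressed values are unknown to whoever receives the aggregated table, the sketch brackets them with two guesses rather than committing to a single imputation. The fixed `seed` only makes the noise reproducible for demonstration; a real release would draw unpredictable noise.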
This work is part of the Linked2Safety project that aims to dynamically interconnect distributed patients' data to better enable medical research efforts, whilst respecting patients' anonymity, as well as European and national legislation.

Original language: English
Title of host publication: IEEE 12th International Conference on BioInformatics and BioEngineering, BIBE 2012
Pages: 644-649
Number of pages: 6
Publication status: Published - 2012
Externally published: Yes
Event: 12th IEEE International Conference on BioInformatics and BioEngineering, BIBE 2012 - Larnaca, Cyprus
Duration: 11 Nov 2012 to 13 Nov 2012

Publication series

Name: IEEE 12th International Conference on BioInformatics and BioEngineering, BIBE 2012

Other

Other: 12th IEEE International Conference on BioInformatics and BioEngineering, BIBE 2012
Country/Territory: Cyprus
City: Larnaca
Period: 11/11/12 to 13/11/12

Keywords

  • Aggregated Data
  • Anonymisation
  • Cell-suppression
  • Noise
  • Perturbation
