
Survey about sharing health data online

September 23, 2016



Funded by the Wellcome Trust on behalf of the Global Alliance for Genomics and Health

Christoph Bock on ‘Preserve personal freedom in networked societies’

September 1, 2016


“We do not protect data because the data would take harm; rather, we seek to protect the rights and well-being of individuals who might be harmed by certain uses of their data. This observation could hold the key to protecting personal freedom in a world of evaporating privacy.”

Christoph Bock is a principal investigator at the CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences in Vienna. He is also a Project Leader at Genom Austria, a member of the Global Network of Personal Genome Projects. His thoughtful commentary on data protection sheds light on the true dangers of privacy loss, and offers suggestions for how to deal with its potential impact.

Link to article:


Job opening at PGP-UK (London)

June 6, 2016

Global PGP Network member PGP-UK (based in London) seeks a postdoc in bioinformatics, statistics or computer science to start Oct 1 and focus on complex trait analysis. They are looking for someone with experience in integrative analysis of multidimensional data, plus a track record of leading a project from conception to publication.

Sevgi Umur gave a brief update on PGP-UK at the 2016 GET Conference. Watch now

Exploring the Harvard PGP Dataset with Untap

November 19, 2015

Recently, my co-worker Abram Connelly scraped the phenotypes in the Harvard Personal Genome Project and made them available in a small SQLite database that anyone can download. He also built a small webapp around the database where people can play around with the data directly in their browser.

The current webapp is linked at the top of this page (its first page has a link to a gzip of the database).



The dataset consists of people who have completed the enrollment process and were free to upload their own data for public release and to answer the surveys online (but not necessarily people who have donated samples, had the samples sequenced, or had the samples released publicly).

To put it concretely, there are around 4000 people enrolled, and around 200 people with whole genome sequences that were sequenced, interpreted, and returned by Harvard PGP as of August 2015 (though keep your ears open for upcoming news). Participants may also have whole genomes sequenced independently and then elect to upload and donate the data to the Harvard PGP.


Returning to the webapp, there are a few default tabs where you can, for example, explore what year PGP participants were born and see that our population is mostly young folks…


“Summary” Pre-Packaged View of Allergies of Participants

…or with two clicks see what allergies are most common in PGP participants. Note that this is a quick scrape of the Tapestry database and no clean-up has been done, so you’ll notice allergies being listed twice with different spellings.
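As a rough illustration of the clean-up that has not been done, here is a minimal Python sketch that merges allergy names differing only in capitalization or spacing before counting them. The sample strings are made up, and the helper names `normalize` and `count_allergies` are hypothetical; true misspellings would need fuzzier matching than this.

```python
from collections import Counter

def normalize(name):
    """Lowercase a free-text allergy name and collapse runs of whitespace."""
    return " ".join(name.lower().split())

def count_allergies(names):
    """Count allergy names after normalization, so near-duplicates merge."""
    return Counter(normalize(n) for n in names)

# Made-up survey answers, not taken from the PGP data.
raw = ["Penicillin", "penicillin ", "House  dust", "house dust", "Cats"]
counts = count_allergies(raw)
print(counts.most_common())
```

With the sample list above, the two spellings of penicillin and of house dust each collapse into a single entry.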

SQL Queries for Participants with “Oak” Allergies

On the “queries” tab, you can query the SQL database and see the results in neat table form in your browser.
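The same kind of query can be run outside the browser once the database is downloaded. The sketch below uses Python's built-in `sqlite3` module against a tiny in-memory stand-in. The table name `allergies` and its columns `participant_id` and `allergy` are assumptions made for illustration and are probably not the real Untap schema, so check the actual table names before adapting it.

```python
import sqlite3

def find_allergy(conn, term):
    """Return (participant_id, allergy) rows whose allergy text contains `term`."""
    query = (
        "SELECT participant_id, allergy FROM allergies "
        "WHERE allergy LIKE ? ORDER BY participant_id, allergy"
    )
    # Parameter binding keeps the search term out of the SQL string itself.
    return conn.execute(query, (f"%{term}%",)).fetchall()

# Toy stand-in for the downloaded database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allergies (participant_id TEXT, allergy TEXT)")
conn.executemany(
    "INSERT INTO allergies VALUES (?, ?)",
    [("p1", "Oak pollen"), ("p2", "Penicillin"), ("p3", "poison oak"), ("p4", "Cats")],
)
oak_rows = find_allergy(conn, "oak")
print(oak_rows)
```

SQLite's `LIKE` ignores case for ASCII letters by default, so both “Oak pollen” and “poison oak” match here. To run it against the real file, replace `":memory:"` with the path to the unzipped database and adjust the table and column names.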

Additionally, there are some pre-packaged but interactive visualizations, where you can edit the text and have the graph update to reflect your changes / newly requested data.

For instance, here’s a display of the participant gender ratio at different ages, which I modified to display information about allergies in different age buckets.

before, displaying gender of participants



and after, displaying penicillin and house dust allergies



Obligatory cat statistics

Although one could hope that this graph shows that PGP participants are not more likely to develop allergies to cats as they grow older, we have a lot more younger participants and this is absolute and not percent frequency, so we might have to say the data points to the opposite. Sad!

(Disclaimer: Just for fun, no real thought put into this analysis :] )
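To make the absolute-versus-percent point concrete, here is a small Python sketch that divides the number of allergic participants in each age bucket by the total number of participants in that bucket. All of the numbers are invented for illustration and say nothing about the actual PGP data; `percent_by_bucket` is a hypothetical helper.

```python
def percent_by_bucket(allergic, totals):
    """Convert absolute counts per age bucket into percentages of that bucket."""
    return {
        bucket: 100.0 * allergic.get(bucket, 0) / total
        for bucket, total in totals.items()
        if total > 0  # skip empty buckets to avoid dividing by zero
    }

# Made-up numbers for illustration only; not taken from the PGP data.
totals = {"20-29": 400, "30-39": 200, "60-69": 50}
cat_allergic = {"20-29": 40, "30-39": 30, "60-69": 10}
rates = percent_by_bucket(cat_allergic, totals)
print(rates)
```

In this toy example the youngest bucket has the most allergic people in absolute terms (40) but the lowest rate (10%), while the oldest bucket has the fewest (10) but the highest rate (20%).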


Ever wanted a public genotype + phenotype dataset? The Harvard PGP has you covered!

We have phenotype surveys galore (including a recently released one that includes blood type and eye color), with responses available in CSV form. The questions on the survey forms are available on GitHub for now.

I hope you all enjoy! The source code for Untap is on GitHub, and Abram welcomes feature requests and issue reports. We hope this is beneficial to the GA4GH working groups specifically and other researchers in general.

Joining as a PGP Volunteer

November 3, 2015

I’m Nancy. I recently joined the Harvard Personal Genome Project as a volunteer. I think I’ve joined at a great time, when the Harvard PGP has the world’s largest public dataset that has whole genome sequences linked with phenotypes.

I’m excited to join what I view as an effort that addresses the inherent ethical issues in genomics research: genomes are as individual as a fingerprint, and to stretch the analogy a bit, a smudged fingerprint (de-identified data) or a summary of a large number of fingerprints (aggregation) is only so useful, especially as the rise of precision medicine has us targeting smaller and smaller subsets of the population.

I think there are many challenges in the HPGP right now, among them challenges in funding and staffing, which contribute to a lot of frustration on the part of participants, many of whom have donated blood and saliva samples and waited months and even years without a returned sample from us.

As I’ve worked with the HPGP staff over the last few months, I’ve come to see that every last one of the staff members is working extremely hard to get samples sequenced and genomes returned. However, none of us work on HPGP full-time, and we also rely on donated effort from other organizations, such as sequencing centers (which we’re very grateful for!). Although our pace may seem slow, I’m still really impressed by how much work has been done already.

I also like to brainstorm about the future. A future where, among other things, you might be able to check on the status of your genome à la Domino’s Pizza instead of having to email us and wait for us to laboriously reply to the many emails we get each week.

(just kidding).

On that note, happy fall everyone!

–Nancy Ouyang

PGP & the Critical Assessment of Genome Interpretation

September 9, 2015


We’re thrilled to announce that data from the Harvard Personal Genome Project is being used in a challenge this year presented by the Critical Assessment of Genome Interpretation (CAGI). CAGI challenges test the ability of researchers to interpret genome data and make phenotype predictions.

PGP data is uniquely valuable for these challenges as it is completely “open source”: the algorithms and data can be completely open. In this challenge, experimenters are asked to predict matching phenotype profiles for a set of genomes. To read more about the challenge, follow this link to CAGI’s website:

The 2015 Harvard PGP conference

August 31, 2015

Next month, the Harvard Personal Genome Project will hold its annual U.S. conference (MindEx 2015) and labs events (PG-Palooza) in Cambridge, MA. The conference will take place on Saturday, September 12 at Harvard University’s famed Sanders Theatre. PG-Palooza labs will be held on Sunday, September 13 at the Cambridge Innovation Center. Thanks to the generosity of our sponsors, all PGP participants will be admitted to both MindEx and PG-Palooza for free!

In years past, the PGP was featured at the GET Conference. This year, the GET Conference is going international. It will take place in Vienna (Sept 17-19) and will feature Genom Austria and other members of the growing international PGP consortium.

For this year’s U.S. MindEx conference, the Harvard PGP is working together with the Mind First Foundation, and a focus of the conference will be the mental realm: mind and brain, cognition and behavior. Still, as in previous years, the U.S. conference and labs will provide its established focus on open source genomics and citizen participatory science.

To register as a PGP participant for MindEx, please click here to visit the MindEx and PG-Palooza page at the Harvard PGP website (you’ll need to log in to your account), and click on the “Participate” button at the bottom of the page, or go straight to the appropriate Eventbrite page. We recently made all registration free, so simply use Public Registration. At the conference we’ll register you separately for PG-Palooza, which is open only to those enrolled in the PGP.

More about MindEx and PG-Palooza

Conference speakers will include PGP founder and Harvard Professor Dr. George Church, Dr. Ron Kessler (Harvard Medical School), Dr. Martine Rothblatt (United Therapeutics), Dr. Ed Boyden (MIT Synthetic Neurobiology Group), Dr. Richard Wrangham (Harvard), Dr. Madeleine Price Ball (PGP Harvard and Open Humans Project), Dr. Sasha Wait Zaranek (PGP Harvard and Curoverse), Dr. Jordan Smoller (Broad Institute, Harvard Medical, Massachusetts General Hospital), best-selling psychology author David McRaney, gut microbiome experts Justine Debelius and Dr. Siavosh Rezvan Behbahani, and more. PG-Palooza will feature presentations and collections of specimens and data by the Harvard PGP, American Gut, uBiome, LifeNaut, MindModeling@Home, H-Scan, and more!

For additional details about the conference, labs, speakers, venues, hotels, directions and maps, visit the MindEx conference pages on the Mind First Foundation website.

We hope to see you there!