VIDEO: Future of Genomics, David Altshuler at 2014 GET Conference
David Altshuler presents his talk, “Future of Genomics,” as part of the 2014 GET Conference.
The following is a guest post from Alan and Priscilla Oppenheimer.
If you are enrolled in PGP Harvard, you probably received a recent email that mentioned a survey that we, the Alan & Priscilla Oppenheimer Foundation, are inviting you to take. We’d like to share more about who we are and why we’re inviting PGP Harvard participants to take this survey. Although this survey is limited to PGP Harvard participants, we invite others to keep reading. Big changes are ahead that will start affecting us all!
About our foundation
We are a small science-focused family foundation, started in 2007. We knew we were small, but we still wanted to think big. When we became aware of Dr. Church’s new Personal Genome Project, we realized that it provided a great opportunity for a foundation like ours to make a big difference. We felt quite privileged when Dr. Church and his team said we could work with them, helping out where we could.
A few of the areas in which we feel we have made a difference include:
- prototyping the current sequencing effort by sponsoring one of the first genomes beyond the original PGP 10
- creating the initial study guide which helped potential PGP participants learn about genomics and pass the entrance exam (a predecessor to the current one)
- helping out with a number of aspects of the GET conferences
- and, most recently, planning and putting together the current survey.
Our faith in the PGP in particular and personalized health in general has been validated through a number of recent developments, President Obama’s newly announced Precision Medicine initiative being the most visible. Also, as indicated in the recent email, it’s great to see that the PGP has been able to send out almost all submitted enrollee blood samples for sequencing, that the project has spread from Harvard to Canada, the UK, Austria, and beyond, and that it has spun off important related efforts such as Open Humans.
About our survey
As the cost of a complete human genome sequence falls towards the $1000 mark, and such sequencing begins to become commonplace, it’s now time to ask the gratifying but difficult question: “What’s next?” For the foundation, the answer is related to understanding what our now-obtainable complete sequence means. Helping to address this question has always been an underlying goal of the PGP, but it is only with recent successes that we have been able to begin focusing on it.
The current survey is our attempt to understand the ways in which PGP enrollees (and by extension many others worldwide) want to try to learn about, explore and understand their genomes. With that data in hand we can then focus our limited resources on one or two key tools to aid in that exploration. If you’re enrolled in PGP, we’d thus very much appreciate your taking our 10-minute survey.
Thank you for your time and your interest in personal genomics.
Alan and Priscilla Oppenheimer
The Alan & Priscilla Oppenheimer Foundation
http://www.oppenheimerfoundation.org
PGP Harvard updates – including a new “real name” option
Some updates about PGP Harvard: (1) we’ve added a new feature to the website that allows participants to share their real name, and (2) we have more whole genomes on the way!
Our new “real name” feature
The Harvard Personal Genome Project has always emphasized that the genetic data our participants publicly share is “identifiable”. This means, even if you remove your name from the data, it’s possible for someone to determine your identity. Almost 4,000 people have enrolled knowing that privacy cannot be guaranteed, and many of them are proudly public about their data.
However, to an outside viewer, the data looks anonymous! PGP Harvard’s profiles have random identifiers (huID numbers). Even we on the staff are often unsure whether a participant considers their name to be publicly associated with the profile or not. Sometimes participants do things that seem to indicate they believe their information is public: including their real name in an upload, uploading a photograph, or mentioning their participant ID in another forum. Until now there has been no way for a participant to explicitly choose to associate their name with their data on our website [1].
We’d like the project to look less anonymous and we want to let participants be clear about when they consider their name to be a public fact associated with their data. So we’ve added to the website a feature that allows a participant to associate their real name. (This is based on their first and last name in our system, which they signed the consent form with.)
To share your real name as a PGP Harvard participant: (1) log in to your account on my.pgp-hms.org, (2) select “Public Profile” from the “Participate” menu, and (3) edit the “Real Name” section at the top of this page.
More genomes coming
In addition to providing the real names feature to PGP participants, we are also working on processing a new data set received from Complete Genomics, the company responsible for most of the sequencing done by PGP Harvard.
This data comes from around 200 blood samples collected in the past year and a half, including at the 2013 GET conference. At this point most of these genomes have been sequenced and are waiting to be analyzed and approved. We hope to start releasing these to participants soon.
Participants will have a 30-day period to review their data and decide whether or not to withdraw. For everyone who remains a participant, the data will then become public. We look forward to sharing this data and expanding our public resource!
—
[1] There are many participants who have publicly associated their names with their profiles, most notably the first ten participants in PGP Harvard (the “PGP-10”). However, these associations weren’t made within the participant website, but in other contexts (e.g. conferences, news articles, press releases, blog posts, etc.).
December blood sampling in San Diego and St Louis for PGP-Harvard
PGP Harvard is planning two more blood collection events. These events will take place in San Diego, CA on December 16, and in St. Louis, MO on December 29.
PGP Harvard participants who have completed the PGP Participant Survey and all twelve trait surveys are invited to apply to donate blood. Importantly, this event is NOT for those who already have a genome or gave blood at GET2013, GET2014, or at recent Boston or Mountain View collection events.
To apply, please log in to your participant account at my.pgp-hms.org and visit the San Diego collection event page or the St. Louis collection event page. You can complete surveys (or check if you’ve already done them) by visiting the trait surveys page.
Genom Austria Launches as fourth member of the Global Network of Personal Genome Projects
We are delighted to announce the launch yesterday of Genom Austria, the fourth member of the Global Network of Personal Genome Projects! This research study is a joint project of the CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, the Medical University of Vienna, and PersonalGenomes.org. Check out the team.
They launched having already sequenced the whole genomes of two volunteers and plan to enroll and sequence a total of 20 volunteers in the first year. With the addition of Genom Austria, the global network now has member sites at leading institutions in the United States, Canada, the United Kingdom and Austria!
Read the press release (PDF).
PersonalGenomes.org is hiring! We are a start-up nonprofit, transforming big ideas about participatory research and open data into resources that can benefit everyone’s health. We are looking for people who are passionate about our mission and excited by the opportunity to work with amazing people all over the globe. We have several open positions. Please check them out and share them with family and friends looking for new opportunities:
http://personalgenomes.theresumator.com/apply
PGP Harvard blood collection event: Boston, Sept 20 (Saturday)
PGP Harvard is planning another weekend blood collection event in Boston. The event will take place at Harvard Medical School Saturday, September 20, 10am-4pm.
PGP Harvard participants who have completed the PGP Participant Survey and all twelve trait surveys are invited to apply to donate blood. Importantly, this event is NOT for those who already have a genome or gave blood at GET2013, GET2014, or at recent Boston or Mountain View collection events.
To apply, please log in to your participant account at my.pgp-hms.org and visit the collection event page. You can complete surveys (or check if you’ve already done them) by visiting the trait surveys page.
Comments on GA4GH Data Sharing Draft
The following is a copy of our comments as submitted through the online interface at genomicsandhealth.org.
These comments pertain to the International Code of Conduct for Genomic and Health-Related Data Sharing – DRAFT # 6, produced by the Regulatory and Ethics Working Group of the Global Alliance for Genomics and Health. That draft document can be found at this URL: http://genomicsandhealth.org/our-work/work-products/international-code-conduct-genomic-and-health-related-data-sharing-draft-6
Our most important points are the first two. The first suggests an explicit mandate to inform individuals, families, and communities regarding identifiability of their data. The second suggests individuals, families, and communities from whom data is derived also be considered as potential data sharing recipients.
These comments come from the following Personal Genome Project (PGP)-associated contributors:
- Misha Angrist (PersonalGenomes.org Board Member)
- Madeleine P Ball (PGP Harvard, Director of Research & PersonalGenomes.org staff member)
- Stephan Beck (PGP United Kingdom, Director)
- Jason R Bobe (PersonalGenomes.org Executive Director & PGP Harvard, Director of Community)
- Michael F Chou (PGP Harvard, Director of Human Subjects Research)
- George M Church (PGP Harvard, Principal Investigator & PersonalGenomes.org President)
- Preston W Estep (PGP Harvard, Director of Gerontology and Director of Collections)
- Rifat Hamoudi (PGP United Kingdom, Computational Analysis and Development Leader)
- Ryan Phelan (PersonalGenomes.org Board Member)
- Jane Kaye (PGP United Kingdom, Ethics and Social Implications Leader)
- Jeantine E Lunshof (PGP Harvard, Ethics Consultant)
- Michelle N Meyer (PersonalGenomes.org Board Member)
- Stephen W Scherer (PGP Canada, Principal Investigator)
- Alexander Wait Zaranek (PGP Harvard, Director of Informatics)
1. We strongly suggest explicitly stating participants be informed about identifiability.
(Section 4, Guidelines 4.2)
To respect individuals, families, and communities, and to foster trust and integrity, we strongly believe the foundational principles should mean that individuals, families, and communities be informed about the identifiability of data relating to them. In particular, participants should be informed of the inherent identifiability of an individual from their genome, or from genotype profiling of multiple loci in their genome. To make this clear, section 4.2 of the guidelines:
4.2 Informing individuals, families and communities about the use and exchange of data relating to them, depending on the nature of the data.
Could be changed to specifically mention identifiability:
4.2 Informing individuals, families and communities about the use and exchange of data relating to them, including its identifiability, depending on the nature of the data.
2. We strongly suggest reciprocal consideration of data sharing to and from individuals.
(Section 4, Guidelines 5.2)
To respect individuals, families, and communities, and to foster trust and reciprocity, we strongly believe the foundational principles should mean that individuals, families, and communities from whom data are derived also be considered as potential data sharing recipients. To reflect this, section 5.2 of the guidelines could be updated to also describe consideration of the risks of data sharing to/with individuals, families, and communities (in addition to on/about):
5.2 Considering the realistic harms and benefits of data sharing on individuals, families and communities, including opportunity costs.
To also state “with” individuals, families, and communities:
5.2 Considering the realistic harms and benefits of data sharing on and with individuals, families and communities, including opportunity costs associated with both sharing and not sharing.
Additional Recommendations
3. We suggest avoiding some terms with markedly variable legal meaning.
(Preamble & Section 1)
There are a couple of terms in the draft that have meanings that vary considerably depending on country and legal context. Because this document is intended to convey global policy, we suggest avoiding these terms and, if appropriate, replacing them with terms which avoid unintended or inconsistent legal interpretation.
The first of these is the phrase “moral interests”. One interpretation of this is as “moral rights”, a term that, to our knowledge, varies markedly in its legal meaning. While we recognize the phrase “moral interests” reflects language in Article 27 of the UDHR, we recommend possibly avoiding it to reduce divergent understandings of the meaning of this document.
The other phrase with variable legal meaning is the term “good faith”. As with “moral rights”, in some countries and legal contexts “good faith” has a concrete legal meaning and can be breached. In other contexts, it is an appeal for fair behavior with no legal force.
4. We wonder if there is an expectation that this code may be binding, beyond the signees?
(Section 2)
If not generally binding or enforceable, we suggest changing the phrase:
This code applies to
To state:
This code can potentially be applied to
5. We suggest wording changes to the founding principles.
(Section 3)
The third foundational principle refers to what seems like two principles that aren’t strongly related: “advancing research” and “fair distribution of [research] benefits”. Also, because genomics research is often not related to health (e.g. ancestry), emphasis on “health and wellbeing” as a principle in themselves (the first principle) could be seen as implicitly excluding these fields of research. We suggest stronger emphasis of “research and scientific knowledge” would be more inclusive. Because “health and wellbeing” seem more related to “fair distribution of benefits”, we suggest rewording the foundational principles from:
1. Promote Health and Wellbeing
2. Respect Individuals, Families and Communities
3. Advance Research and the Fair Distribution of Benefits
4. Foster Trust, Integrity and Reciprocity
To instead be:
1. Advance Research and Scientific Knowledge
2. Respect Individuals, Families and Communities
3. Promote Health, Wellbeing, and Fair Distribution of Benefits
4. Foster Trust, Integrity and Reciprocity
6. We suggest explicitly recognizing donors as actors in consent.
(Section 4)
In keeping with the second foundational principle (respect for individuals, families, and communities), we suggest explicitly naming “donors” as those who are giving consent in this sentence:
This Code applies to data that has been consented to for use and/or approved therefor by competent authorities.
To state:
This Code applies to data that has been consented to by donors (or their legal representatives) for use and/or approved therefor by competent authorities.
7. We suggest specifying data provenance trace to the data source.
(Section 4, Guidelines 2.1)
To enable investigators to ensure that their data has been generated from well-consented sources, we recommend updating the phrase:
…tracking the chain of data exchange.
to state:
…tracking the chain of data exchange to its source.
8. We suggest avoiding potentially implying that perfect data security can be achieved.
(Section 4, Guidelines 3.3)
Because perfect data security is not achievable, we recommend changing the phrase:
Installing strict data security measures to prevent unauthorized access, data loss and misuse….
To state:
Installing strict data security measures to mitigate the risk of unauthorized access, data loss and misuse….
9. We suggest clarifying Part 5 of the Guidelines to communicate balancing of risk and benefit.
(Section 4, Guidelines 5)
The title for this section, “Minimizing Harm and Maximizing Benefits”, refers to two very different extremes in decision-making. To communicate balancing consideration, we recommend changing the phrase and title:
minimizing harm and maximizing benefits
To instead be:
risk-benefit analysis
It was also unclear to us what outcomes would be considered as potential harms or benefits; it might also be helpful to give examples of these.
June 21 (Sat) Boston: PGP Harvard blood sampling
We’ve collected blood in Boston before at the GET conference, but attending the event isn’t always possible for local residents, so we’ve decided to hold a blood collection event on a weekend. We’re planning a sample collection at Harvard Medical School next Saturday June 21st, 10am-4pm.
PGP Harvard participants who have completed the PGP Participant Survey and all twelve trait surveys are invited to apply to donate blood. Also, this is for folks that aren’t already in the sequencing pipeline – no need to attend if you already have a genome or gave blood at GET2013 or GET2014. To apply, please log in to your participant account at my.pgp-hms.org and visit the collection event page. You can complete surveys (or check if you’ve already done them) by visiting the trait surveys page.
PGP Harvard data in Google Cloud Storage
At PGP Harvard our participants are, by and large, very enthusiastic about understanding genetics and their own genomes. Many participants are programmers or researchers, and often both! It should come as no surprise that our staff are often asked “can I see more of the raw data?”

Some drives our genomes arrived on. Porsche design! That’s how you know it’s quality.
© 2012 Alexander Wait Zaranek, CC-BY license.
We’ve always wanted the entire “raw data” to be public, for participants and researchers alike. One issue that stymied us was the intractable size of the data: this sort of data is typically shipped on terabyte disks. I’m happy to share that we now have an answer and a place to find the data, although accessing it requires some familiarity with a command line interface and maybe a smidge of programming.
The full data sets PGP Harvard received from Complete Genomics are now shared on a public bucket on Google Cloud Storage, using credits generously donated by Google. Data is organized by huID.
The bucket: gs://pgp-harvard-data-public
To access the bucket, you should read about installing and using gsutil.
Some example commands
List contents of bucket top level:
gsutil ls gs://pgp-harvard-data-public
Recursively list contents of hu011C57 directory, with date and file size details:
gsutil ls -Rl gs://pgp-harvard-data-public/hu011C57
Download/copy the var file from hu011C57 Complete Genomics data to your current directory (234 MB):
gsutil cp gs://pgp-harvard-data-public/hu011C57/GS000018120-DID/GS000015172-ASM/GS01669-DNA_B05/ASM/var-GS000015172-ASM.tsv.bz2 .
With multi-threading and recursion, copy the hu011C57 directory to your current directory. (40.8 GB):
gsutil -m cp -R gs://pgp-harvard-data-public/hu011C57 .
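If you expect to repeat these commands for several huIDs, a small script can assemble them for you. The following is only a minimal sketch: the function names (ls_command, copy_command, run) are hypothetical helpers made up for illustration, and it assumes gsutil is already installed and on your PATH. It uses only the gsutil invocations shown above, and by default it prints each command rather than running it.

```python
import shlex
import subprocess

# Bucket name taken from the post above.
BUCKET = "gs://pgp-harvard-data-public"


def ls_command(huid=None, recursive=False):
    """Build a `gsutil ls` command for the whole bucket or one huID directory."""
    target = BUCKET if huid is None else f"{BUCKET}/{huid}"
    cmd = ["gsutil", "ls"]
    if recursive:
        # -Rl: recursive listing with date and file size details
        cmd.append("-Rl")
    cmd.append(target)
    return cmd


def copy_command(huid, dest="."):
    """Build a multi-threaded, recursive copy of one huID directory."""
    return ["gsutil", "-m", "cp", "-R", f"{BUCKET}/{huid}", dest]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(shlex.quote(part) for part in cmd))
    if not dry_run:
        # Requires gsutil to be installed; raises if the command fails.
        subprocess.run(cmd, check=True)
```

For example, run(copy_command("hu011C57")) prints the recursive copy command from above without downloading anything; pass dry_run=False once you are sure you have the disk space.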
Use a Google Compute Engine VM to analyze the data
You can also access this data using virtual machines in the Google Compute Engine – this could save you a lot of disk space! Once you have a virtual machine you can, for example, use the Python Client Library to automatically access data.
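As a rough illustration of programmatic access, here is a minimal sketch that lists one participant's files and totals their size. It rests on several assumptions: it uses the google-cloud-storage package (which may not be the same client library the post refers to), it assumes the bucket still allows anonymous read access, and the helper names (huid_prefix, summarize, list_participant_objects) are hypothetical.

```python
# Bucket name taken from the post above (without the gs:// scheme).
BUCKET_NAME = "pgp-harvard-data-public"


def huid_prefix(huid):
    """Object-name prefix for one participant's directory, e.g. 'hu011C57/'."""
    return huid.rstrip("/") + "/"


def summarize(blobs):
    """Return (object_count, total_bytes) for objects exposing a .size attribute."""
    count = 0
    total = 0
    for blob in blobs:
        count += 1
        total += blob.size or 0  # treat a missing size as 0 bytes
    return count, total


def list_participant_objects(huid):
    """Iterate over the objects stored under one huID (needs network access)."""
    # Imported here so the helpers above work without the package installed.
    # Assumption: `pip install google-cloud-storage` has been run.
    from google.cloud import storage

    # Anonymous client: assumes the bucket is publicly readable.
    client = storage.Client.create_anonymous_client()
    return client.list_blobs(BUCKET_NAME, prefix=huid_prefix(huid))
```

Calling summarize(list_participant_objects("hu011C57")) on a virtual machine would then give the number of files and total bytes for that participant without copying anything to local disk.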