To help overcome the bottlenecks that limit the development of diagnostic and therapeutic products, academic and industrial researchers, patient organizations and charities, and regulatory and funding institutions should redefine the basis for sharing the knowledge collected in large-scale clinical and experimental studies.
Advances in biological and medical research and their translation into diagnostic, prognostic and therapeutic tools are relying increasingly on partnerships between the academic (government and university) and industrial (biotechnology, pharmaceutical and technology) sectors, with essential participation and support from patient organizations and charities. Despite these concerted efforts and the promises of genomics and systems biology, over the past two decades the cost of research and development has continuously escalated, while the number of truly novel drugs coming to market has constantly declined. To a large extent, this has been addressed in the private sector through mergers and outsourcing, with downsizing of the research and development workforce. These trends cannot be sustained further as they threaten the economic viability of the healthcare system worldwide. At a time of global crisis, it is crucial to identify how we can overcome these hurdles. I argue here that we should look again at how knowledge can be shared between all the stakeholders, redefining the frontier between what can be the subject matter of valuable intellectual property rights and what is the basic knowledge that should be made freely available to all.
This is not a new issue. It was hotly debated at the beginning of the Human Genome Project (HGP), and for its entire duration in relation to competition between the public and private sectors. I suggested early on that the nucleic acid sequences collected on a genome scale should be considered as elements of description insufficient to warrant property rights by themselves in the absence of a genuine invention and should thus be placed in the public domain. A similar attitude was taken by the participants of the HGP in 1996, as expressed in the 'Bermuda rules', with the result that the openly accessible reference human genome sequence is now the common basis for current research. These proposals contributed to the 'Universal declaration on the human genome and human rights' adopted by the United Nations and its Educational, Scientific and Cultural Organization (UNESCO) in 1997-1998, which stated about the human genome: "In a symbolic sense, it is the heritage of humanity", and "The human genome in its natural state shall not give rise to financial gains." The issue was, and remains to a large extent, how best to balance general and particular interests to sustain basic research while promoting efficient healthcare product development. This has been discussed extensively on ethical, legal and social grounds, and the counterproductive underuse of scarce resources when they are protected by excessive intellectual property rights (the tragedy of the 'anticommons') has been pointed out; these discussions have led to proposals to establish patent pools to facilitate development of diagnostic tests.
The recent advent and rapid development of new generations of very-high-throughput DNA sequencing methods now make it possible to foresee that in the next few years the sequencing and assembly of thousands of human genomes (and transcriptomes) will be achievable at a cost of $1,000 each or less, which is a projected decrease of almost a million-fold in less than ten years. Without the availability of the reference sequence, such astonishing advances would not be possible. With each sequencing run delivering information on the scale of the entire GenBank, it is clear that data quality assessment and analysis are becoming the limiting steps, beyond the capability of single individuals or groups. Similar trends can be anticipated for proteins and metabolites when reference proteomes and metabolomes also become available in the coming years. Public electronic repositories for these large-scale datasets, together with standards and open access publications for their description, have been important developments in the past decade for ensuring that they become available for further studies. However, despite requirements by prominent journals and funding agencies for submission of primary data as a condition for publication and financial support, recent surveys indicate variable compliance with these rules in both academia and industry. There is clearly room for significant improvements in this area if researchers are to take the best advantage of the large datasets produced.
The same issues of data quality and availability are becoming prominent in the assembly of ever-increasing patient cohorts for the purpose of clinical trials and genome-wide genetic association studies, now often reaching tens of thousands of samples. Great efforts have been made, initially in developed countries, to establish standards of good practice for informed consent, clinical trial registration and sample collection and storage in biobanks, and these are now being enforced in newly industrialized countries such as China and India. Although these are welcome developments, much also remains to be done to ensure that these essential resources are used to the best advantage of the patients themselves, and to use genomics and bioinformatics to sustain the development of systems biology and medicine. The issues are many and complex, given the sensitive status of human material with respect to legal rules and practices that can vary substantially from country to country. International harmonization of health regulations and intellectual property rights is ongoing; this is necessary but insufficient to overcome the major bottlenecks in the development of healthcare products, and it will take time to mature and adapt to the rapid pace of technology development. All stakeholders should work together to identify topics and areas in which joint actions would improve the situation significantly in the short term.
I would like to suggest that one such topic is the status and availability of large amounts of underexploited experimental and clinical data in public and private laboratories. In many cases, these existing databases have been developed for a specific purpose, with a focus on a small number of biological elements. With the shift from targeted to global analyses, most of the data collected are not exploited at all, although they could be relevant in another context. It must be recognized that the high potential value of these datasets relies to a large extent on the quality of the biological and clinical annotations, which becomes significant only if the experimental data are properly collected and described. When that is the case, the added value will come from provision of the combined data for further analysis by other experts addressing related and complementary questions. Recognizing this as a topic for sharing of knowledge between academic and industrial partners and establishing data warehouses with agreed open access rules would be a significant step in this direction. It will hopefully be discussed intensely in the columns of Genome Medicine.