Zenome: a decentralized blockchain-driven database of genomic information

January 27, 2018


The Zenome project is a decentralized blockchain-driven database of genomic information.


Industry 4.0 is the name given to the current trend of automation, scaling, and data exchange in manufacturing technologies. It includes artificial intelligence, virtual reality, the Internet of Things, and Big Data analysis. Genomics is a vivid representative of Industry 4.0, and it requires solving many urgent problems, such as storing and analyzing Big Data while preserving public access for researchers and privacy for individuals. Currently, the genome industry suffers from inequality: most personal genomic data is concentrated in the data centres of genomic corporations, government, scientific and medical institutions, and pharmaceutical companies. Moreover, access to personal genomic data is legally limited, and individuals have no means of managing or sharing their own genomic data.


This monopolization of genomic data dramatically inhibits development in a number of scientific and medical fields. Meanwhile, the development of cryptocurrencies and blockchain-based technologies is significantly transforming many economic domains. Applying a blockchain approach can be a lifeline for the development of personal genomics: it will make each person the owner of his or her genetic data.



This platform lets you manage your genomic data while maintaining privacy, and lets you profit from selling access to different parts of your genome. It will establish equal conditions for drug development and for the progress of scientific and medical technologies.


Zenome is a new economic environment based on genomic data and blockchain technology.


The implementation of our conceptual model will address the following challenges:


• Creating an infrastructure for storing big genomic data in a distributed database

• Providing open access to millions of human genomes worldwide with privacy protection

• Giving each person the opportunity to participate in scientific and clinical research and to profit from it

• Stimulating the advancement of genomic sciences in developing countries and the de-monopolization of genomic data in developed countries


Genomic data interpretation: machine learning application


The use of machine learning algorithms to assess the risks of multifactorial diseases is being extensively investigated, but thus far, due to the lack of a sufficient number of training samples, existing mathematical models developed by biological scientists outperform machine learning approaches. However, machine learning is already being used to predict certain complex characteristics of the human body. An example is the prediction of appearance in the work of Craig Venter and colleagues. Their work involved analyzing the genomes and approximately 30,000 facial data points from several thousand volunteers. From these data, training samples for machine learning algorithms were built, and dependencies between genomic traits and individual appearance were determined. As a result of this work, machines have learned to accurately reconstruct a person’s appearance from his or her genomic data.


The results of this project enable the prediction of the appearance of a criminal or of an unborn child during the early stages of pregnancy. By obtaining a blood sample from a pregnant woman and extracting fetal DNA from the blood, the appearance of an unborn child on his or her 18th birthday can be accurately predicted. To implement this project, Craig Venter recruited one of the best machine learning specialists from Google, Franz Och, a star computer scientist known as the chief architect of Google Translate.


Currently, machine learning is not widely used for diseases, as very large and correctly structured samples are needed for training. The creation of a comprehensive database of human genomes, together with detailed questionnaires reflecting individuals’ health statuses, can spur the development of machine learning in genomics and will result in high predictive accuracy in determining the risk of disease development. At the same time, these data will be public and available to all users of the system, excluding the possibility of their monopolization. This availability is extremely important, as the concentration of large amounts of data in corporations’ databases will result in monopolies in the field of genomic machine learning.


Personal genomic information is very sensitive for many people.


However, many people do not fully understand that, based on their genomic information, it is possible to determine their lifespan, propensity to make emotional decisions (manipulability of decision-making), likelihood of developing various mental diseases and risk of sudden death due to, for example, heart arrhythmia. Such information could be disadvantageous for job recruitment, election participation, and medical insurance pricing.


There is also the possibility that a bad actor who knows the sequence of a genome could leave fragments of DNA identical to that genome at, for example, the location of a terrorist act in order to frame or falsely accuse someone. A person whose genomic information is known could also be denied medical treatment (or required to pay a higher fee) or barred from obtaining a desired job.


Corporations and governments could deliberately influence one’s decisions and purchases using their knowledge of “weaknesses” in one’s genomic information. Thus, the protection of genomic data privacy is necessary to protect the equal rights of various categories of people. At the same time, some studies have shown that individuals can be identified from their anonymized genomes.


Moreover, some companies (http://www.humanlongevity.com/media/) possess machine learning-based algorithms that can accurately reconstruct the appearance of an individual using only his or her genomic data.


The right to own genomic data?


Currently, there is no legislative definition of the right to possess one’s own genetic information. In some developed countries, including the USA, Germany, and Austria, citizens do not have the right to access and possess their genetic data in the context of its interpretation.


Instead, an agent is needed: a physician or medical center with the right to provide such information. This path is used by the companies Pathway Genomics in the USA and CeGaT in Germany (http://www.cegat.de/en/). To undertake genetic analysis, one needs the advice of a physician who can act as a provider of genetic testing, and only this physician has the right to interpret the information provided by the analysis. In the USA, there are service providers in the field of “genetics for fun,” such as the companies 23andMe and Ancestry.com, which sell genetic tests directly to the end customer, but these companies can only provide information on ethnic origin and certain health-related characteristics (for example, sports characteristics) and lack permission to provide most medically valuable information.


These restrictions, imposed by regulators such as the FDA, do not hinder the ability of 23andMe to sell access to genetic data to large pharmaceutical companies. Some such deals are known: a deal was made with Genentech (a subsidiary of the pharmaceutical giant Roche) for $60 million for a study of Parkinson’s disease, and a deal was made with another large pharmaceutical company, Pfizer, for a study of inflammatory bowel diseases (e.g., Crohn’s disease). Some reports also claim that 23andMe has had negotiations with Novartis over an Alzheimer’s disease study. Thus, we currently give large companies the right to manage our genomic information, to store it, and to profit from it. Corporations, behind a veil of good intentions, monopolize genomic big data, and we cannot predict how this monopoly will influence future drug prices and medical discoveries.


