Scientists have only begun to discover the clinical potential of machine learning in epigenetics. Read on to learn all about it.
In this Article:
- Epigenetics Overview
- Clinical Potential of Epigenetics
- Machine Learning Overview
- Machine Learning Applications in Epigenetics
- Challenges of Using Machine Learning in Epigenetics
- The Future of Machine Learning in Epigenetics
Everything You Need to Know About Machine Learning in Epigenetics
Epigenetics Overview
Epigenetics is a subfield of genetics. It focuses on heritable changes in gene expression that don’t involve changes in the DNA sequence.
The complex interaction between a person’s genotype, age, and lifestyle can influence gene expression. There are four categories of epigenetic changes:
- DNA methylation
- RNA-centered mechanisms
- Histone modifications
- Chromatin conformation
Among these, most studies focus on DNA methylation when investigating epigenetic changes in mammals.
What is DNA methylation? This refers to a biological process where methyl groups are added to DNA molecules. This process regulates gene expression.
To measure epigenetic changes, researchers usually examine methylation at cytosine-guanine dinucleotides, known as CpG sites. CpG sites sometimes occur in clusters, which scientists call CpG islands.
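To make the idea of CpG sites concrete, here is a minimal Python sketch that locates them in a DNA string and summarizes how CpG-rich the sequence is. The example sequence is made up, the function names are hypothetical, and the observed/expected ratio is a commonly used summary statistic that this article does not itself describe. Real CpG island detection applies further criteria (such as minimum length) that are not shown here.

```python
def cpg_positions(seq):
    """Return 0-based positions where a C is immediately followed by a G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_stats(seq):
    """Summarize GC content and the observed/expected CpG ratio of a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = len(cpg_positions(seq))
    gc_content = (c + g) / n if n else 0.0
    # Observed/expected ratio: the CpG count compared with the count that the
    # numbers of C and G bases alone would predict (assumption: not from the article).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"cpg_sites": cpg, "gc_content": gc_content, "obs_exp": obs_exp}


example = "ATCGCGTACGGATTACGCG"  # invented sequence, for illustration only
print(cpg_positions(example))
print(cpg_stats(example))
```

A stretch of sequence with high GC content and a high observed/expected ratio is the kind of region that would be a candidate CpG island.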
Clinical Potential of Epigenetics
A person’s genetics rarely changes, but environmental influences can impact epigenetics. That’s why some scientists believe that it’s more useful to consider epigenetics than genetics alone when diagnosing and treating individuals.
On top of that, studies show that epigenetics can mediate between negative environmental influences and the onset of disease. Scientists are eager to harness the clinical potential of epigenetics to help improve early diagnosis and intervention.
Obesity, cardiovascular diseases, and various cancers are examples of diseases linked to DNA methylation. One important contribution of epigenetics is the identification of cancer biomarkers.
In fact, some of these biomarkers are even FDA-approved. An example of an FDA-approved cancer biomarker is SEPT9 for colorectal cancer.
Following the discovery of the SEPT9 biomarker, there’s now a commercial kit that can be used to screen for colorectal cancer using a blood sample. These kinds of applications show the clinical potential of epigenetics.
To identify biomarkers linked to diseases, though, researchers need to find patterns in large amounts of patient, hospital, and administrative data. Unfortunately, manually looking for these patterns would take a substantial amount of time and resources. This is why scientists are now turning to machine learning as a tool to help uncover these patterns.
Machine Learning Overview
Machine learning (ML) is a subdiscipline of artificial intelligence (AI). ML allows computers to learn from data so that they can process new data and predict the outcome of future events.
Machine learning has been around since the 1950s, but technological advances over the last two decades have made it much easier to process large data sets. It’s an excellent tool for data-rich fields like genetics and epigenetics.
ML uses algorithms to process data. There are three types of ML algorithmic approaches:
- Unsupervised learning – These types of algorithms help identify relationships within a data set. After the algorithm identifies relationships, the researchers need to figure out the importance or relevance of the relationships.
- Supervised learning – These types of algorithms start with predetermined labels that provide the scope for their predictions.
- Deep learning – These types of algorithms can perform both supervised and unsupervised learning tasks. They can work with unstructured and unlabeled data.
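The difference between supervised and unsupervised learning can be sketched with a toy example. The Python below is only an illustration under stated assumptions: the methylation values and labels are invented, the function names are hypothetical, and the two methods shown (a nearest-centroid classifier and a one-dimensional two-group k-means) were chosen for brevity rather than taken from any study mentioned here.

```python
from statistics import mean

# Hypothetical methylation levels (0 = unmethylated, 1 = fully methylated)
# at a single CpG site; all values and labels are invented for illustration.
labeled = [(0.12, "healthy"), (0.18, "healthy"), (0.22, "healthy"),
           (0.71, "disease"), (0.80, "disease"), (0.77, "disease")]


def fit_centroids(samples):
    """Supervised: use the known labels to compute one mean value per class."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Assign a new sample to the class whose mean is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2).

    Assumes the input contains at least two distinct values.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low, near_high = [], []
        for v in values:
            (near_high if abs(v - high) < abs(v - low) else near_low).append(v)
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)


centroids = fit_centroids(labeled)
print(predict(centroids, 0.65))               # uses the labels it was given
print(two_means([v for v, _ in labeled]))     # finds groups without labels
```

The supervised function can name the class of a new sample because it was told the labels up front. The unsupervised function only reports that two groups exist, and it is left to the researcher to decide what those groups mean.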
Researchers usually follow three steps to develop and test algorithms:
- Step 1: Prepare Data. This entails pre-processing data to remove or fix incomplete entries.
- Step 2: Establish Data sets. Researchers establish three data sets: training data, test data, and validation data.
- Step 3: Run Sets. In the training data set, the parameters are tested to optimize the algorithm. In the test data set, the algorithm is evaluated. Finally, in the validation data set, the algorithm is tested on a different data source.
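The three steps above can be sketched in a few lines of Python. This is a simplified illustration rather than a recommended pipeline: the cohort is synthetic, the record layout and function names are hypothetical, the 70/30 split is an arbitrary choice, and incomplete entries are simply dropped rather than fixed.

```python
import random


def prepare(records):
    """Step 1: drop records with a missing (None) measurement or label."""
    return [r for r in records
            if r["label"] is not None and None not in r["features"]]


def split(records, train_fraction=0.7, seed=0):
    """Step 2: shuffle a copy of the records, then divide it into two sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical cohort: each record holds methylation values and a diagnosis.
cohort = [{"features": [0.1 * (i % 7), 0.05 * (i % 11)], "label": i % 2}
          for i in range(20)]
cohort.append({"features": [0.4, None], "label": 1})     # incomplete entry
cohort.append({"features": [0.3, 0.2], "label": None})   # missing label

clean = prepare(cohort)
train, test = split(clean)
# Step 3 would tune the algorithm on `train`, evaluate it on `test`, and then
# check it against a validation set drawn from a different data source.
print(len(clean), len(train), len(test))
```

Fixing the random seed keeps the split reproducible, which makes it easier to compare algorithms on exactly the same training and test records.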
Depending on data availability, it’s not always possible to validate the algorithm on a different data source. When this happens, researchers run a k-fold cross-validation instead.
In k-fold cross-validation, they randomly split the available data into k groups of roughly equal size, called folds. Each fold takes a turn as the held-out data while the algorithm is trained on the remaining folds, and the results are averaged across all k rounds.
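In its usual formulation, k-fold cross-validation divides the data into k groups and lets each group serve once as the held-out set. The Python sketch below shows that rotation. The function names, the toy data, and the threshold "model" are all hypothetical and exist only to make the example runnable.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, held_out_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k groups of near-equal size
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def cross_validate(samples, k, fit, score):
    """Average a model's score over the k train/held-out rounds."""
    scores = []
    for train_idx, held_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx])
        scores.append(score(model, [samples[i] for i in held_idx]))
    return sum(scores) / len(scores)


# Toy data: (methylation value, label) pairs, invented for illustration.
data = ([(0.10 + 0.01 * i, 0) for i in range(10)]
        + [(0.70 + 0.01 * i, 1) for i in range(10)])


def fit(train):
    """Toy 'model': a threshold halfway between the two class means."""
    zeros = [v for v, y in train if y == 0]
    ones = [v for v, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def score(threshold, held_out):
    """Fraction of held-out samples placed on the correct side of the threshold."""
    return sum((v > threshold) == bool(y) for v, y in held_out) / len(held_out)


print(cross_validate(data, k=5, fit=fit, score=score))
```

Because every sample is held out exactly once, the averaged score uses all of the data for evaluation without ever scoring a model on samples it was trained on.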
Machine Learning Applications in Epigenetics
Epigenetic data works well with ML approaches for the following reasons:
- DNA methylation is stable over time, so it can provide a reliable measure of the genome’s chemical composition within a specific timeframe.
- Availability of data-rich repositories. There are large-scale consortiums that can provide enough data sets to run the necessary analyses.
- Ease of obtaining DNA methylation samples. A small blood sample is enough to generate an individual’s DNA methylation profile.
All three ML algorithmic approaches can be used in epigenetics, but supervised learning is the most widely used for epigenetic data. It has been used to classify various diseases, such as:
- Metastatic brain tumors
- Prostate cancer
- Coronary heart disease
- Neurodevelopmental syndromes
- Central nervous system tumors
Unsupervised learning approaches can also be used on epigenetic data. They’re mostly used to differentiate the DNA methylation patterns of healthy individuals from those of individuals with specific diseases. In one study, researchers used unsupervised learning to detect the differences in DNA methylation patterns between the subtypes of breast cancer brain metastases.
Deep learning approaches can be used on supervised and unsupervised tasks when the data set becomes more layered, and researchers introduce more labels to consider. For instance, deep learning was used to help identify breast density scores on mammogram images to help predict breast cancer.
Each ML approach has a lot to offer to the practice of clinical epigenetics. However, there are inherent limitations and challenges to consider.
Challenges of Using Machine Learning in Epigenetics
Regardless of the field of application, ML approaches are best used when you have a lot of representative data. Here are some of the challenges researchers have encountered when using ML on epigenetic data:
- It may be challenging to obtain large data sets for rare diseases. On top of that, some large data sets are not available for public use.
- Many epigenetic data sets have fewer samples than variables. This makes models prone to overfitting and makes it more difficult to establish statistically reliable predictions.
- There may be CpG sites that are linked to more than a single gene. These non-linear relationships may make it trickier to generate predictions.
- There’s also an inherent issue with prediction bias. Some models are biased towards certain populations over others. Researchers need to work on building more representative data sets that include minorities.
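One simple way to check for the prediction bias described above is to report accuracy separately for each population group instead of as a single overall number. The Python sketch below assumes a hypothetical list of (group, true label, predicted label) records with invented values and placeholder group names.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return prediction accuracy separately for each population group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical (group, true label, predicted label) triples, invented here.
results = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]
print(accuracy_by_group(results))
```

In this made-up case the overall accuracy of 62.5% hides the fact that the model is right every time for one group and only a quarter of the time for the other.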
The Future of Machine Learning in Epigenetics
More studies are reporting a link between epigenetic changes and diseases. ML applications offer an opportunity to improve the diagnosis and treatment of these diseases.
As technology advances, ML evolves. Scientists continue to learn how to optimize it for the field of epigenetics.
Which applications of machine learning in epigenetics are you most interested in? Share your thoughts with us in the comments section below.
- Clinical Epigenetics: Machine learning and clinical epigenetics: a review of challenges for diagnosis and classification