Using machine learning to detect early-stage cancers
Diagnosing cancer early on can improve a patient’s treatment and prognosis. But detecting cancer in its first stage can be difficult, and current screening methods often require invasive procedures or expensive imaging equipment to identify the initial signs of disease.
But now, an international group of medical researchers — including a UC Berkeley team led by Xin Guo, the Coleman Fung Chair Professor in the Department of Industrial Engineering and Operations Research — has developed a new method that can help detect cancer from a simple blood test, well before the first symptoms are present.
In their study, recently published in Nature Biomedical Engineering, the researchers paired a machine learning algorithm with a novel sequencing method to detect extremely low concentrations of tumor DNA in blood samples.
In addition to plasma, platelets and white and red blood cells, blood contains degraded fragments of DNA called cell-free DNA. As blood circulates in the body, it picks up traces of this DNA from different organs. In individuals who have cancer, some of this cell-free DNA — called circulating tumor DNA (ctDNA) — comes from tumors.
Typically, to detect ctDNA in blood, scientists use a genetic analysis method called deep methylation sequencing. But this method produces a lot of data, in part because ctDNA isn’t the only DNA present.
The amount of ctDNA varies depending on the location and number of cancerous cells present. For instance, if the cancer is in organs with a lot of blood circulation or there are more cancer cells present, more ctDNA is picked up by the blood flow.
Weeding through large amounts of data for signs of ctDNA is already a challenge, but the data from deep methylation sequencing is further complicated by limitations of the technique that cause damage to the DNA and errors in the signal.
Because of this, scientists have struggled to find analysis methods sensitive enough to detect low concentrations of ctDNA. But Guo had the perfect solution.
“The whole thing gets very noisy. The question is, how can you actually use these very fragmented signals in a way in which you can detect the signals for cancer? That’s what we actually are doing: We use machine learning and train an algorithm to try to find the cancer signals,” said Guo.
She was already using machine learning to develop a screening technique for a disease that causes vision loss and was excited to find another way to apply it in medicine. At first, she was skeptical that such low concentrations of ctDNA could be found, especially since the researchers had chosen lung cancer as their test case. Lung cancer has particularly low ctDNA concentrations, due to minimal blood flow through the lungs and the low number of cancerous cells at an early stage. But as the researchers set to work, they were pleasantly surprised by the results.
“Statistically, it’s very difficult, and we were very lucky to find the optimal algorithm to do that,” said Guo. “It’s really amazing that we could.”
After choosing a common machine learning algorithm called a support vector machine (SVM), Guo and graduate student researcher Chengju Wu tailored it to their purpose using deep methylation sequencing data from 569 patients. They fine-tuned the algorithm until it was able to sift through thousands of data points and locate the ones they needed, despite the noisy signal. Guo likened this process to carefully pointing a torch in the right direction to illuminate what you’re looking for.
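For readers curious about what training an SVM on noisy data looks like, here is a minimal sketch. It is not the study's model: the article does not describe the features, kernel or tuning the researchers used. The sketch assumes a linear SVM trained by subgradient descent on the hinge loss, and it runs on synthetic data in which two features carry a signal and five are pure noise. Every name and parameter value here (make_samples, train_linear_svm, lam, eta and so on) is hypothetical.

```python
import random

def make_samples(n, rng, n_noise=5, shift=1.5):
    """Synthetic stand-in data: 2 informative features plus n_noise noise features."""
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])  # hypothetical labels: +1 = cancer, -1 = no cancer
        informative = [rng.gauss(label * shift, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def decision(w, b, x):
    """Signed score of sample x relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=30, seed=0):
    """Linear soft-margin SVM via stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1
            # The regularization term shrinks the weights on every step.
            w = [wj * (1 - eta * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient: move the boundary away from this sample.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

rng = random.Random(42)
X_train, y_train = make_samples(200, rng)
X_test, y_test = make_samples(200, rng)

w, b = train_linear_svm(X_train, y_train)
predictions = [1 if decision(w, b, x) >= 0 else -1 for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

After training, the weights on the two informative features should be much larger than those on the noise features, which is the sense in which the classifier learns where to look despite the noise.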
Once the algorithm was trained, the researchers applied it to a sequencing method they developed — called enhanced linear-splinter amplification sequencing or ELSA-seq — that boosts the data signal. They then tested it against other sequencing methods on 308 patients with cancer and 261 patients without cancer.
The results showed that their method outperformed all others: it detected nearly twice as many patients with cancer as another common sequencing method and could detect ctDNA at concentrations as low as 1 in 10,000. The researchers accurately detected cancer in 52% of the patients with early-stage cancer and 81% of those with late-stage cancer, with 96% specificity.
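The percentages above are sensitivity (the share of patients with cancer that the test correctly flags) and specificity (the share of patients without cancer that it correctly clears). The short sketch below shows how each is calculated. The counts are hypothetical round numbers chosen only to reproduce the reported percentages; the article does not give the underlying patient counts for each group.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of patients with cancer that the test correctly flags."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of patients without cancer that the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts per 100 patients, not figures from the study.
early_stage_sensitivity = sensitivity(true_positives=52, false_negatives=48)
overall_specificity = specificity(true_negatives=96, false_positives=4)

print(f"sensitivity: {early_stage_sensitivity:.0%}")
print(f"specificity: {overall_specificity:.0%}")
```

A specificity of 96% means that about 4 in every 100 people without cancer would still receive a positive result, which is why both figures are usually reported together.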
Guo hopes to further improve their method by tailoring the algorithm to differentiate between cancerous and benign tumors. She believes that machine learning can be a great benefit to medicine.
“Medical data is where machine learning has the brightest application and the most impact for the future,” said Guo. “We’re two different, distinct fields — and this marriage can really save a lot of lives.”
Other co-authors include researchers from Peking Union Medical College and the Chinese Academy of Medical Sciences, Burning Rock Biotech and Shanghai Chest Hospital.