Algorithms similar to those used by Netflix, Amazon and Facebook have shown the ability to decipher the ‘biological language’ of cancer, Alzheimer’s and other neurodegenerative diseases.
Researchers trained a large-scale language model with a recommendation AI to look at what happens when something goes wrong with proteins and leads to the development of disease.
The researchers, from St John’s College at the University of Cambridge, programmed the algorithm to learn the language of shapeshifting droplets of proteins found in cells in order to understand their function and malfunction.
By learning these protein droplets’ language, the team hopes to ‘correct the grammatical mistakes inside cells that cause disease.’
Professor Tuomas Knowles, a Fellow at St John’s College, said: ‘Any defects connected with these protein droplets can lead to diseases such as cancer.
‘This is why bringing natural language processing technology into research into the molecular origins of protein malfunction is vital if we want to be able to correct the grammatical mistakes inside cells that cause disease.’
Machine-learning technology has made waves in the tech industry – Netflix uses it to recommend series, Facebook uses it to suggest friends and Amazon’s Alexa uses it to recognize people by their voice.
Now the medical world is adopting the technology in ways that could save lives.
Pictured: protein condensates forming inside living cells
‘Bringing machine-learning technology into research into neurodegenerative diseases and cancer is an absolute game-changer,’ said Knowles, who is the lead author of the study.
‘Ultimately, the aim will be to use artificial intelligence to develop targeted drugs to dramatically ease symptoms or to prevent dementia happening at all.’
Dr Kadi Liis Saar, first author of the paper and a Research Fellow at St John’s College, was tasked with training the large-scale language model to uncover the proteins’ secrets.
She said: ‘The human body is home to thousands and thousands of proteins and scientists don’t yet know the function of many of them. We asked a neural network based language model to learn the language of proteins.
‘We specifically asked the program to learn the language of shapeshifting biomolecular condensates – droplets of proteins found in cells – that scientists really need to understand to crack the language of biological function and malfunction that causes cancer and neurodegenerative diseases like Alzheimer’s.
‘We found it could learn, without being explicitly told, what scientists have already discovered about the language of proteins over decades of research.’
Proteins play many key roles in the body, with most of their work done in cells: they are needed for the structure, function and regulation of the body’s tissues and organs.
Alzheimer’s, Parkinson’s and Huntington’s diseases are three of the most common neurodegenerative diseases, but scientists believe there are several hundred such diseases in total.
In Alzheimer’s disease, which affects 50 million people worldwide, proteins go rogue, form clumps and kill healthy nerve cells.
A healthy brain has a quality control system that effectively disposes of these potentially dangerous masses of proteins, known as aggregates.
Scientists now think that some disordered proteins also form liquid-like droplets called condensates, which have no membrane and merge freely with each other.
Unlike protein aggregates, which are irreversible, protein condensates can form and reform, and are often compared to blobs of shapeshifting wax in lava lamps.
‘Protein condensates have recently attracted a lot of attention in the scientific world because they control key events in the cell such as gene expression – how our DNA is converted into proteins – and protein synthesis – how the cells make proteins,’ Knowles said.
Machine-learning technology is developing at a rapid pace thanks to the growing availability of data, increased computing power and technical advances that have created more powerful algorithms.
‘We fed the algorithm all of the data held on the known proteins so it could learn and predict the language of proteins in the same way these models learn about human language and how WhatsApp knows how to suggest words for you to use,’ Dr Saar said.
‘Then we were able to ask it about the specific grammar that leads only some proteins to form condensates inside cells. It is a very challenging problem and unlocking it will help us learn the rules of the language of disease.’
Further use of machine learning could transform future cancer and neurodegenerative disease research.
Discoveries could be made that go beyond what scientists currently know and speculate about diseases, and potentially even beyond what the human brain can understand without the help of machine learning.
‘Machine-learning can be free of the limitations of what researchers think are the targets for scientific exploration and it will mean new connections will be found that we have not even conceived of yet,’ Dr Saar explained.
‘It is really very exciting indeed.’