The method could enable preventive treatment that delays the onset of psychosis.
Using machine learning, researchers found that speaking with low semantic density (vagueness) and using words associated with sound more frequently can predict, with 93% accuracy, whether an at-risk person will develop psychosis, according to a study published in npj Schizophrenia.
The results suggested that higher-than-normal use of sound-related words, combined with more frequent use of words with similar meanings (that is, low semantic density), signaled that psychosis was likely to develop.
“It was previously known that subtle features of future psychosis are present in people’s language, but we’ve used machine learning to actually uncover hidden details about those features,” said senior author Phillip Wolff, Ph.D., a psychology professor at Emory University.
Co-author Elaine Walker, Ph.D., of Emory’s psychology department, said that if individuals at risk of psychosis can be identified earlier, preventive interventions could be used to potentially reverse the deficits.
“There are good data showing that treatments like cognitive-behavioral therapy can delay onset and perhaps even reduce the occurrence of psychosis,” Walker said.
Wolff, Walker and first author Neguine Rezaii, M.D., of the neurology department at Massachusetts General Hospital, used machine learning to establish “norms” for conversational language. To do this, they fed the software conversations from 30,000 Reddit users.
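One simple way to turn a large reference corpus into a “norm” is to measure some feature for every reference speaker and then express a new speaker’s value as a z-score, i.e. how many standard deviations it sits above or below the reference mean. The sketch below is an assumption for illustration, not the study’s procedure: the `SOUND_WORDS` lexicon and the helper names `sound_word_rate` and `z_score` are hypothetical.

```python
import statistics

# Hypothetical sound-related lexicon; not the word list used in the study.
SOUND_WORDS = {"voice", "voices", "sound", "hear", "whisper", "loud", "noise"}


def sound_word_rate(words):
    """Fraction of tokens that appear in the hypothetical sound lexicon."""
    if not words:
        return 0.0
    return sum(1 for w in words if w.lower() in SOUND_WORDS) / len(words)


def z_score(value, reference_values):
    """Distance of `value` from the reference mean, in standard deviations.

    `reference_values` stands in for per-speaker rates computed on a
    reference corpus; statistics.stdev needs at least two of them.
    """
    mean = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)
    return (value - mean) / sd


if __name__ == "__main__":
    reference = [0.01, 0.02, 0.03]  # toy per-speaker rates
    participant = sound_word_rate(["I", "hear", "a", "voice"])
    print(participant, z_score(participant, reference))
```

A large positive z-score would correspond to the “higher-than-normal” use of sound-related words described above.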
These conversations were then fed into Word2Vec, an algorithm that converts words into vectors. Each word is assigned a location in a semantic space based on the contexts in which it appears, which serves as a proxy for its meaning. Words with similar meanings end up closer together than words with different meanings.
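“Closer together” is usually measured with cosine similarity, the cosine of the angle between two word vectors. The minimal sketch below uses made-up three-dimensional vectors (`TOY_VECTORS` is hypothetical); real Word2Vec embeddings are learned from text and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings, chosen by hand for illustration only.
TOY_VECTORS = {
    "voice": [0.9, 0.1, 0.2],
    "sound": [0.8, 0.2, 0.25],
    "table": [0.1, 0.9, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    for word in ("sound", "table"):
        sim = cosine_similarity(TOY_VECTORS["voice"], TOY_VECTORS[word])
        print(f"voice vs {word}: {sim:.3f}")
```

With these toy values, “voice” scores far higher against “sound” than against “table”, mirroring how related words cluster in the semantic space.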
The researchers also developed a “vector unpacking” program to analyze the semantic density of word usage. This helped the research team quantify how much information was packed into each sentence.
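The article does not spell out how vector unpacking works, so the sketch below is a deliberately crude stand-in and NOT the study’s algorithm. It assumes that a sentence is “dense” when each content word contributes a distinct meaning: a word counts as a new meaning only if its vector is not highly similar to a meaning already counted, and density is distinct meanings divided by word count. The function name `semantic_density_proxy` and the `threshold=0.9` cutoff are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_density_proxy(word_vectors, threshold=0.9):
    """Hypothetical proxy for semantic density (not vector unpacking).

    Greedily keeps a word's vector as a new 'meaning' only if its cosine
    similarity to every meaning kept so far is below `threshold`, then
    returns meanings / words. Near 1.0: each word adds new content.
    Lower: many words repeat similar meanings (vaguer speech).
    """
    if not word_vectors:
        return 0.0
    meanings = []
    for vec in word_vectors:
        if all(cosine_similarity(vec, m) < threshold for m in meanings):
            meanings.append(vec)
    return len(meanings) / len(word_vectors)


if __name__ == "__main__":
    dense = [[1.0, 0.0], [0.0, 1.0]]                 # two unrelated words
    vague = [[1.0, 0.0], [0.99, 0.1], [1.0, 0.05]]   # three near-synonyms
    print(semantic_density_proxy(dense), semantic_density_proxy(vague))
```

Under this proxy, the “near-synonym” sentence scores lower than the sentence of unrelated words, which matches the intuition that low semantic density means saying little with many words.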
Wolff and his team then collected speech samples from 40 participants in the North American Prodrome Longitudinal Study (NAPLS) at Emory University. NAPLS is designed to increase understanding of mental health concerns in young people and to prevent the development of more serious mental illness. Thirty participants were included in the second phase of the study, and the remaining 10, in the third phase, were used to test the model.
A member of the research team transcribed video recordings of all 40 interviews. The semantic density and content analyses began with several pre-processing steps.
First, the researcher separated the participant’s speech from the interviewer’s and then identified individual sentences and part-of-speech categories, using the Stanford Parser to automatically detect all sentences in a string of text. To focus on the meaning of the text, the researcher reduced the participant’s and interviewer’s sentences to content words only: nouns, verbs, adjectives and adverbs.
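Once each token carries a part-of-speech tag, reducing a sentence to content words is a simple filter. The sketch below assumes Penn Treebank tags (the tag set the Stanford tools use for English), where nouns start with `NN`, verbs with `VB`, adjectives with `JJ` and adverbs with `RB`; the input format and the name `content_words` are hypothetical.

```python
# Penn Treebank tag prefixes for nouns, verbs, adjectives and adverbs.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")


def content_words(tagged_sentence):
    """Keep tokens whose tag marks a noun, verb, adjective or adverb.

    `tagged_sentence` is a list of (word, tag) pairs, e.g. the output of a
    part-of-speech tagger. str.startswith accepts a tuple of prefixes, so
    NNS, VBD, JJR, RBS etc. are all kept, while DT, IN, PRP, WRB are dropped.
    """
    return [word for word, tag in tagged_sentence
            if tag.startswith(CONTENT_TAG_PREFIXES)]


if __name__ == "__main__":
    sentence = [("The", "DT"), ("voices", "NNS"), ("were", "VBD"),
                ("very", "RB"), ("loud", "JJ"), (".", ".")]
    print(content_words(sentence))
```

Note that a purely tag-based filter keeps auxiliary verbs such as “were”; whether the study removed them is not stated here.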
Finally, the words were lemmatized, or reduced to their uninflected base forms, using the WordNetLemmatizer from the Natural Language Toolkit (NLTK).
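NLTK’s `WordNetLemmatizer.lemmatize(word, pos)` expects a WordNet part-of-speech code (`"n"`, `"v"`, `"a"`, `"r"`) rather than a Penn Treebank tag, so tagged words are typically mapped first. The sketch below shows that mapping; the helper names are hypothetical, and the lemmatizer is passed in as a function so the example runs without NLTK installed.

```python
from typing import Callable, List, Optional, Tuple

# Penn Treebank tag prefix -> WordNet POS code
# ("n" noun, "v" verb, "a" adjective, "r" adverb).
_PENN_TO_WORDNET = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Return the WordNet POS code for a Penn Treebank tag, or None."""
    return _PENN_TO_WORDNET.get(tag[:2])


def lemmatize_tagged(
    tagged: List[Tuple[str, str]],
    lemmatize: Callable[[str, str], str],
) -> List[str]:
    """Lemmatize content words; `lemmatize` takes (word, wordnet_pos)."""
    lemmas = []
    for word, tag in tagged:
        pos = penn_to_wordnet(tag)
        if pos is not None:  # function words have no WordNet POS; skip them
            lemmas.append(lemmatize(word.lower(), pos))
    return lemmas
```

With NLTK installed and its WordNet data downloaded (`nltk.download("wordnet")`), one could pass `WordNetLemmatizer().lemmatize` as the `lemmatize` argument; for example, `WordNetLemmatizer().lemmatize("voices", "n")` should return `"voice"`.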
The research team then compared these baseline samples with longitudinal data on whether each participant later converted to psychosis.
“This research is interesting not just for its potential to reveal more about mental illness, but for understanding how the mind works — how it puts ideas together,” Wolff said. “Machine-learning technology is advancing so rapidly that it’s giving us tools to data mine the human mind.”