MOUNTAIN VIEW, Calif. – According to Google researchers, automatic speech recognition (ASR) technology could soon serve as a transcription tool that makes documenting patient–doctor conversations easier.
Products employing such technologies include Google Home, Google Translate, and Google Assistant. In one of the latest proof-of-concept studies, researchers described their experience developing two ASR methodologies intended for multi-speaker medical conversations. They concluded that both models could help streamline the workflows of medical practitioners.
In the study, the researchers wrote that doctors spend approximately 6 hours of their 11-hour workday in the EHR and 1.5 hours documenting. This burden contributes to rising burnout rates and a shrinking number of primary care physicians. The researchers stated that ASR technologies would accelerate transcription, making them immensely useful in the field.
ASR is considered a foundational technology that makes information summarization and extraction faster and easier, relieving the burden of traditional documentation. However, most current ASR systems designed for medical transcription are limited to doctors' dictations, which involve only a single speaker.
Conversations between patients and doctors, by contrast, are considerably harder to transcribe, particularly because of overlapping dialogue and varied speech patterns.
Because of these difficulties, the researchers developed two new ASR methodologies that support multi-participant conversations. The first is a connectionist temporal classification (CTC) model, which focuses on the sequence and placement of individual phonetic units, learning how those units align with the audio.
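To make the idea concrete, here is a minimal sketch of training against a CTC objective using PyTorch's nn.CTCLoss; the dimensions and the stand-in acoustic output are placeholders, not details from the Google study.

```python
import torch
import torch.nn as nn

# Placeholder sizes, not from the study: 50 audio frames, batch of 2,
# 30 output classes (phonetic units plus the CTC "blank" symbol at index 0).
T, N, C = 50, 2, 30
log_probs = torch.randn(T, N, C).log_softmax(dim=2)       # stand-in for acoustic-model output
targets = torch.randint(1, C, (N, 10), dtype=torch.long)  # reference unit sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

# CTC sums over every valid alignment of the target units to the audio frames,
# so the model learns placement without needing frame-level labels.
ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```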
The second system is the listen, attend, and spell (LAS) model, a multi-part neural network that transcribes speech one character at a time, choosing each character based on the audio and on its prior predictions.
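A toy illustration of that listen/attend/spell structure, again in PyTorch with made-up dimensions rather than the architecture from the study:

```python
import torch
import torch.nn as nn

class TinyLAS(nn.Module):
    """Toy LAS-style model; all sizes are illustrative, not the study's."""
    def __init__(self, n_mels=80, hidden=128, n_chars=32):
        super().__init__()
        self.listener = nn.LSTM(n_mels, hidden, batch_first=True)          # "listen": encode audio
        self.attend = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.embed = nn.Embedding(n_chars, hidden)
        self.speller = nn.LSTM(hidden * 2, hidden, batch_first=True)       # "spell": decode characters
        self.out = nn.Linear(hidden, n_chars)

    def forward(self, mels, prev_chars):
        enc, _ = self.listener(mels)             # (batch, frames, hidden)
        dec_in = self.embed(prev_chars)          # previously emitted characters
        ctx, _ = self.attend(dec_in, enc, enc)   # "attend": focus on relevant audio
        dec, _ = self.speller(torch.cat([dec_in, ctx], dim=-1))
        return self.out(dec)                     # logits for the next character

model = TinyLAS()
mels = torch.randn(1, 200, 80)               # 200 frames of 80-dim audio features
prev = torch.zeros(1, 5, dtype=torch.long)   # 5 characters emitted so far
logits = model(mels, prev)                   # shape (1, 5, 32)
```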
Each ASR model was trained on more than 14,000 hours of anonymized medical conversations, data that, according to the researchers, required significant time and effort to align and clean properly.
That data cleaning proved vital to the success of the CTC model, which achieved a 20.1 percent word error rate. The researchers' analysis showed that most mistakes occurred at the beginning and end of conversations.
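Word error rate is the standard ASR accuracy metric: the minimum number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference, divided by the reference length. A minimal sketch (the example sentences are invented, not from the study):

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    r, h = ref.split(), hyp.split()
    # Classic dynamic-programming edit distance, computed over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / len(r)

# One substitution ("two" -> "to") across four reference words: WER = 0.25
print(word_error_rate("take two tablets daily", "take to tablets daily"))
```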
The LAS model, on the other hand, proved more resilient to noisy or misaligned data, reaching a word error rate of 18.3 percent, with errors that rarely involved medical terms. Overall, the system achieved 98.2 percent recall for drug names mentioned during the conversations.
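Recall here measures the fraction of drug names actually spoken that appear in the machine transcript; a sketch with invented drug names, not data from the study:

```python
def recall(mentioned: set, transcribed: set) -> float:
    """Fraction of drug names in the reference that the ASR output recovered."""
    return len(mentioned & transcribed) / len(mentioned)

# Invented example: the transcript recovers two of the three drugs mentioned.
mentioned = {"metformin", "lisinopril", "atorvastatin"}
transcribed = {"metformin", "lisinopril", "ibuprofen"}
print(recall(mentioned, transcribed))  # ~0.67
```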
Since the mistakes of both models rarely involved medical terms, the researchers concluded that these ASR technologies could deliver accurate, high-quality results.
Two of the study's authors, product manager Katherine Chou and software engineer Chung-Cheng Chiu, said in a Google Research Blog post that they would be working with physicians at Stanford University to explore how ASR technologies such as these two models can lessen physicians' daily burdens.
They added that the team hopes these advancements will not only ease the documentation burden on doctors but also let them give patients more thorough medical attention.