Automation of audio to text for medical transcription of cardiology clinic letters

Python

MS SQL Server

Windows OS

Project Description

The aim of this project is to automate the transcription of audio dictation files into patient clinic letters. Currently, each clinic letter is created in Microsoft Word with the patient and doctor information, and the dictation audio is then typed by hand into the body of the letter.

The process takes approximately 15 minutes per letter.

Approach and thought process

The goals of the project are to reduce the amount of time it takes to complete the process and increase the accuracy of the transcribed content.

Dragon software can transcribe audio to text but costs $1200 per year. At a low volume of usage, that cost outweighs the benefit.
External API-based speech recognition and NLP services are out of the question because of patient privacy.
One solution is a drag-and-drop interface that uses AI to transcribe the content. A database to better manage the doctor and patient information is also desirable.

Quick sketches

Process flow diagram

Component diagram

Planning

Phase 1

Create a machine learning model that takes in as input an audio dictation and outputs the corresponding transcribed text.

Tasks:

Investigate different modelling approaches. One option is supervised learning, using Fourier transforms to extract features from the audio and a Long Short-Term Memory (LSTM) network to model them.
Investigate using CTC (Connectionist Temporal Classification) as the loss function, with a beam search algorithm for decoding.
Investigate data sets and tools for training. Mozilla Common Voice is an option for speech data, and KenLM is an option for building the language model.
Existing audio files and transcribed content will need to be extracted and compiled for input into the training process.
- Cardiologist 1: 72 weeks @ approx. 20 minutes per week = 24 hours of content
- Cardiologist 2: 72 weeks @ approx. 1 hour per fortnight = 36 hours of content
Document the software. Create and execute tests.
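
The extraction and compilation task above could be scripted instead of being done one file at a time. The following is a minimal sketch only. It assumes each dictation has been saved as a `.wav` file with a same-named `.txt` file holding the letter body in one folder; that layout, the function name `build_manifest`, and the CSV column names are assumptions for illustration, and the letter text would first need to be exported from Word.

```python
import csv
import wave
from pathlib import Path


def build_manifest(data_dir, manifest_path):
    """Pair each .wav with a same-named .txt and write a CSV manifest.

    Returns the total duration of the paired audio in hours.
    The folder layout and column names are assumptions for this sketch.
    """
    data_dir = Path(data_dir)
    total_seconds = 0.0
    with open(manifest_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["audio_path", "duration_s", "transcript"])
        for wav_path in sorted(data_dir.glob("*.wav")):
            txt_path = wav_path.with_suffix(".txt")
            if not txt_path.exists():
                continue  # skip audio that has no transcript yet
            with wave.open(str(wav_path), "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
            # Collapse line breaks and repeated spaces into single spaces.
            transcript = " ".join(txt_path.read_text(encoding="utf-8").split())
            writer.writerow([str(wav_path), f"{duration:.2f}", transcript])
            total_seconds += duration
    return total_seconds / 3600
```

The returned total gives a quick check of how much of the estimated 24 + 36 = 60 hours of content has actually been paired with a transcript.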

Considerations:

LSTMs are a better approach than plain Recurrent Neural Networks (RNNs). The problem with plain RNNs is that gradients vanish when backpropagating through long sequences.
An LSTM is itself a type of RNN. It adds a cell state controlled by input, forget, and output gates, so the network can learn what to keep and what to discard, which preserves gradients over many time steps.
Used as a language model, an LSTM predicts the next token (word or otherwise) from a seed sequence. Used as an acoustic model, it predicts character probabilities for each audio frame.
A beam search algorithm keeps the most probable partial transcriptions at each step, ranked by cumulative probability, up to a specified beam width. Beam search is an alternative to a greedy approach, which only follows the single highest-probability option at each step.
Need to think about the integration of an acoustic model and a language model in the architecture. The acoustic model takes audio as input and outputs probabilities over an alphabet at each time step. The language model helps turn those probabilities into plausible words of the language.
Is there a way to convert the 60 hours of content into the required format using a programmatic approach rather than processing it manually one at a time?
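
The difference between greedy and beam search decoding noted above can be shown on a toy example. This is a generic beam search sketch, not the CTC-specific prefix beam search, which would also merge candidates that collapse to the same text. The probabilities in `TOY_MODEL` are made-up numbers and the function name is an assumption for illustration.

```python
import math

# Toy conditional model (made-up numbers): prefix -> next-token probabilities.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(model, steps, beam_width):
    """Return (sequence, probability) pairs, best first."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the beam_width highest-scoring prefixes.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return [(seq, math.exp(score)) for seq, score in beams]


# A beam width of 1 is the greedy algorithm.
greedy = beam_search(TOY_MODEL, steps=2, beam_width=1)[0]
beam = beam_search(TOY_MODEL, steps=2, beam_width=2)[0]
```

Here the greedy run commits to "a" at the first step and ends on ("a", "x") with probability 0.6 × 0.55 = 0.33, while a beam width of 2 keeps "b" alive and finds ("b", "x") with probability 0.4 × 0.9 = 0.36.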

Phase 2

Implement an SQL database to better manage patient, medical professional, and letter details.

Tasks:

Create descriptive requirements
Generate an ER diagram from the description
Generate the relational model and create tables
Create a GUI to interact with the database (insert, update, delete transactions)
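
A possible starting point for the relational model is sketched below. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; the project targets MS SQL Server, where the data types and connection code would differ. All table names, column names, and sample values are placeholders, not the final design.

```python
import sqlite3

# Hypothetical schema: one letter links one patient to one doctor.
SCHEMA = """
CREATE TABLE doctor (
    doctor_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    date_of_birth TEXT
);
CREATE TABLE letter (
    letter_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id),
    clinic_date TEXT NOT NULL,
    body TEXT NOT NULL DEFAULT ''
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# Insert (placeholder values only)
conn.execute("INSERT INTO doctor (full_name) VALUES (?)", ("Dr Example",))
conn.execute(
    "INSERT INTO patient (full_name, date_of_birth) VALUES (?, ?)",
    ("Test Patient", "1950-01-01"),
)
conn.execute(
    "INSERT INTO letter (patient_id, doctor_id, clinic_date) VALUES (?, ?, ?)",
    (1, 1, "2024-01-15"),
)

# Update
conn.execute("UPDATE letter SET body = ? WHERE letter_id = ?", ("Draft text", 1))

# Read back the letter with its patient and doctor names
rows = conn.execute(
    """
    SELECT p.full_name, d.full_name, l.body
    FROM letter AS l
    JOIN patient AS p ON p.patient_id = l.patient_id
    JOIN doctor AS d ON d.doctor_id = l.doctor_id
    """
).fetchall()

# Delete
conn.execute("DELETE FROM letter WHERE letter_id = ?", (1,))
```

The parameterised queries (`?` placeholders) are the part most worth carrying over to the GUI layer, since they keep user-entered values out of the SQL text.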

Phase 3

Create a library tool of complex medical terminology and best practices for medical transcription

Tasks:

Create an architecture that best supports the user in finding and updating the library items
Create a copy-and-paste interface that lets the user copy and paste certain special characters
Design and implement the relevant SQL tables required
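
Finding library items could tolerate misspellings, since the user may not know how a term is spelled. The sketch below uses the standard library's `difflib.get_close_matches` for approximate lookup. The term list, the function name `find_terms`, and the 0.6 cutoff are assumptions for illustration; the real terms would come from the SQL tables.

```python
import difflib

# Placeholder entries; the real library would be loaded from the database.
TERMS = ["fibrillation", "tachycardia", "echocardiogram", "stenosis", "bradycardia"]


def find_terms(query, terms=TERMS, limit=3):
    """Return up to `limit` library terms that closely match `query`."""
    # Compare in lower case, but return the terms as stored.
    lowered = {term.lower(): term for term in terms}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=0.6
    )
    return [lowered[match] for match in matches]
```

For example, `find_terms("fibrilation")` returns a list whose first entry is "fibrillation", and a query with no close match returns an empty list.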

Phase 4

Create an interface to compare the final letter with the draft letter and highlight any corrections

Tasks:

Use a string comparison (diff) algorithm and a side-by-side view that highlights differences in characters
Keep track of correction statistics so that accuracy can be monitored and reported over time
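
One way to approach both tasks is with the standard library's `difflib.SequenceMatcher`. The sketch below compares word by word rather than character by character, which tends to read better in a letter; passing the raw strings instead of word lists would give a character-level comparison. The `[[...]]` markers and the function name are assumptions for illustration.

```python
import difflib


def compare_letters(draft, final):
    """Return (marked_up_text, similarity) for two letter texts.

    Words added or changed in the final letter are wrapped in [[...]],
    words removed from the draft in [[-...-]]. Similarity is difflib's
    ratio over the two word sequences, from 0.0 to 1.0.
    """
    draft_words, final_words = draft.split(), final.split()
    matcher = difflib.SequenceMatcher(a=draft_words, b=final_words, autojunk=False)
    marked = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked.append(" ".join(final_words[j1:j2]))
        elif tag in ("replace", "insert"):
            marked.append("[[" + " ".join(final_words[j1:j2]) + "]]")
        else:  # "delete": words that appear only in the draft
            marked.append("[[-" + " ".join(draft_words[i1:i2]) + "-]]")
    return " ".join(marked), matcher.ratio()


draft = "The patient has atrial fibrilation and is stable"
final = "The patient has atrial fibrillation and is stable"
marked, similarity = compare_letters(draft, final)
# marked     -> "The patient has atrial [[fibrillation]] and is stable"
# similarity -> 0.875 (7 of 8 words match in each text)
```

Storing the similarity value for each letter would give a simple statistic to track accuracy over time.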

Phase 5

Create a feature to record the time it takes to prepare the letters and the capability to produce a monthly invoice for the billable hours

Tasks:

Create a stopwatch type interface to start and stop the timer
Create a facility to generate a monthly invoice for the total billable hours for that month
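
The timer and invoice logic could be kept separate from the interface so it can be tested on its own. A minimal sketch follows; the class and function names, the session format of (date, seconds) pairs, and the `hourly_rate` parameter are all assumptions for illustration.

```python
import time
from datetime import date


class Stopwatch:
    """Accumulates elapsed seconds across start/stop cycles."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so tests can supply a fake clock
        self._started_at = None
        self.elapsed = 0.0

    def start(self):
        if self._started_at is None:  # ignore a second start while running
            self._started_at = self._clock()

    def stop(self):
        if self._started_at is not None:
            self.elapsed += self._clock() - self._started_at
            self._started_at = None
        return self.elapsed


def monthly_invoice(sessions, year, month, hourly_rate):
    """Total billable hours and amount for one calendar month.

    sessions: iterable of (date, seconds) pairs, one per letter.
    """
    seconds = sum(s for d, s in sessions if (d.year, d.month) == (year, month))
    hours = round(seconds / 3600, 2)
    return {"hours": hours, "amount": round(hours * hourly_rate, 2)}
```

`time.monotonic` is used rather than wall-clock time so that a system clock adjustment during a session does not distort the recorded duration.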

Phase 6

Integrate all components into a single application

Tasks:

Bring all components together into a single application
Implement some statistical reporting