Phase 1
Create a machine learning model that takes in as input an audio dictation and outputs the corresponding transcribed text.
Tasks:
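- Investigation of different approaches is required. One option is a supervised learning approach that applies Fourier transforms to the audio to produce spectrogram features and feeds them to a Long Short-Term Memory (LSTM) network.
- Investigation of Connectionist Temporal Classification (CTC) as the loss function for training, with a beam search algorithm for decoding the network output.
- Investigation of different data sets for training is required. Mozilla Common Voice is an option for speech data. KenLM is a language-modelling toolkit rather than a data set, and is an option for building the language model.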
- Existing audio files and transcribed content will need to be extracted and compiled for input into the training process.
- Cardiologist 1: 72 weeks @ approx. 20 minutes per week = 24 hours of content
- Cardiologist 2: 72 weeks @ approx. 1 hour per fortnight = 36 hours of content
- Document the software. Create and execute tests.
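The Fourier transformation step in the first task can be sketched as a short-time Fourier transform that turns raw audio into a sequence of feature frames for the network. This is only a minimal sketch: it assumes NumPy is available, and the function name `log_spectrogram`, the 25 ms frame length, and the 10 ms hop are illustrative assumptions rather than decisions made in this plan.

```python
import numpy as np

def log_spectrogram(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Return a (frames, frequency_bins) log-magnitude spectrogram.

    frame_ms and hop_ms are assumed values, not project requirements.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Slice the signal into overlapping frames and taper each with a Hann window.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])

    # Fourier transform of each frame; keep magnitudes on a log scale.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitudes + 1e-10)

# Example input: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

features = log_spectrogram(tone, sample_rate)
print(features.shape)  # (98, 201)
```

Each row of `features` is one time step, so the array can be passed to a sequence model one frame at a time.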
Considerations:
- LSTMs are generally a better approach than plain Recurrent Neural Networks (RNNs). The problem with plain RNNs is that gradients vanish when backpropagating through long sequences, so they struggle to learn long-range dependencies.
- LSTMs address this with a gated memory cell. A forget gate controls how much of the previous cell state is kept at each time step, which lets gradients flow across many time steps.
- Used as a language model, an LSTM predicts the next token (word or otherwise) based on a seed sequence of words. Used as an acoustic model, it outputs a probability distribution over the alphabet for each audio frame.
- A beam search algorithm keeps the highest-scoring partial transcriptions at each step, up to a specified beam width, and ranks them by cumulative probability. Beam search is an alternative to a greedy approach, which only keeps the single highest-probability output at each step.
- Need to think about the integration of an acoustic model and a language model in the architecture. The acoustic model takes audio as input and converts it to probabilities over an alphabet. The language model is used during decoding to favour character sequences that form likely words and sentences of the language.
- Is there a way to convert the 60 hours of content into the required format programmatically rather than processing the files manually one at a time?
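The difference between greedy decoding and beam search over CTC output can be illustrated with a small sketch. It assumes the acoustic model has already produced per-frame probabilities over an alphabet whose index 0 is the CTC blank. The function names and the toy probabilities are made up for illustration, and no language model scoring is included.

```python
from collections import defaultdict

def ctc_greedy_decode(probs, alphabet, blank=0):
    """Pick the most likely symbol per frame, then collapse repeats and blanks."""
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)

def ctc_beam_search(probs, alphabet, beam_width=3, blank=0):
    """CTC prefix beam search without a language model."""
    # Each prefix maps to (probability ending in blank, probability ending in non-blank).
    beams = {(): (1.0, 0.0)}
    for step in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            for idx, p in enumerate(step):
                if idx == blank:
                    # A blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_blank + p_non_blank) * p
                elif prefix and prefix[-1] == idx:
                    # A repeated symbol collapses unless a blank separated it.
                    next_beams[prefix][1] += p_non_blank * p
                    next_beams[prefix + (idx,)][1] += p_blank * p
                else:
                    next_beams[prefix + (idx,)][1] += (p_blank + p_non_blank) * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}
    best_prefix = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(alphabet[i] for i in best_prefix)

# Toy example: two frames, alphabet of blank ("-") and "a".
alphabet = ["-", "a"]
probs = [[0.6, 0.4], [0.6, 0.4]]

print(repr(ctc_greedy_decode(probs, alphabet)))  # ''
print(repr(ctc_beam_search(probs, alphabet)))    # 'a'
```

In this toy case greedy decoding follows the single best path (blank, blank) with probability 0.36 and returns an empty string, while beam search sums the three paths that collapse to "a" (0.24 + 0.24 + 0.16 = 0.64) and returns "a".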
Phase 2
Implement an SQL database to better manage patient, medical professional, and letter details.
Tasks:
- Create descriptive requirements
- Generate ER-diagram from description
- Generate the relational model and create tables
- Create a GUI to interact with the database (insert, update, and delete transactions)
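As a starting point for the relational model, the sketch below creates three tables and runs insert, update, and delete transactions against them. It assumes SQLite via Python's built-in `sqlite3` module. All table and column names are hypothetical placeholders until the descriptive requirements and the ER diagram are done.

```python
import sqlite3

# Hypothetical schema; the real tables depend on the requirements and ER diagram.
SCHEMA = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    full_name  TEXT NOT NULL
);
CREATE TABLE professional (
    professional_id INTEGER PRIMARY KEY,
    full_name       TEXT NOT NULL
);
CREATE TABLE letter (
    letter_id       INTEGER PRIMARY KEY,
    patient_id      INTEGER NOT NULL REFERENCES patient(patient_id),
    professional_id INTEGER NOT NULL REFERENCES professional(professional_id),
    dictated_on     TEXT NOT NULL,
    body            TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(SCHEMA)

# Insert and update inside one transaction; "with conn" commits or rolls back.
with conn:
    patient_id = conn.execute(
        "INSERT INTO patient (full_name) VALUES (?)", ("Example Patient",)
    ).lastrowid
    professional_id = conn.execute(
        "INSERT INTO professional (full_name) VALUES (?)", ("Cardiologist 1",)
    ).lastrowid
    letter_id = conn.execute(
        "INSERT INTO letter (patient_id, professional_id, dictated_on, body) "
        "VALUES (?, ?, ?, ?)",
        (patient_id, professional_id, "2024-03-04", "Draft letter text"),
    ).lastrowid
    conn.execute(
        "UPDATE letter SET body = ? WHERE letter_id = ?",
        ("Corrected letter text", letter_id),
    )

rows = conn.execute(
    "SELECT p.full_name, m.full_name, l.body "
    "FROM letter AS l "
    "JOIN patient AS p ON p.patient_id = l.patient_id "
    "JOIN professional AS m ON m.professional_id = l.professional_id"
).fetchall()

# Delete transaction.
with conn:
    conn.execute("DELETE FROM letter WHERE letter_id = ?", (letter_id,))
remaining = conn.execute("SELECT COUNT(*) FROM letter").fetchone()[0]
```

A GUI layer would call the same parameterised statements, which keeps user input out of the SQL text.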
Phase 3
Create a library tool of complex medical terminology and best practices for medical transcription
Tasks:
- Create an architecture that best supports the user in finding and updating the library items
- Create an interface that lets the user copy and paste certain special characters
- Design and implement the relevant SQL tables required
Phase 4
Create an interface to compare the final letter with the draft letter and highlight any corrections
Tasks:
- Use a string comparison (diff) algorithm and a side-by-side view that highlights differences in characters
- Keep track of correction statistics so that accuracy can be monitored and reported over time
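One possible way to find the corrections and produce a statistic is Python's built-in `difflib`, sketched below. The function name `highlight_corrections` and the `[-removed-]` / `{+added+}` markers are illustrative assumptions. The comparison is done on words here for readability; passing the raw strings instead of word lists would give a character-level comparison.

```python
import difflib

def highlight_corrections(draft, final):
    """Return (marked_text, similarity) for a draft letter and its final version.

    Removed draft words are wrapped in [-...-] and added final words in {+...+}.
    similarity is difflib's ratio in the range 0.0 to 1.0.
    """
    draft_words = draft.split()
    final_words = final.split()
    matcher = difflib.SequenceMatcher(a=draft_words, b=final_words, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(draft_words[i1:i2])
            continue
        if i2 > i1:  # words present only in the draft
            parts.append("[-" + " ".join(draft_words[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the final letter
            parts.append("{+" + " ".join(final_words[j1:j2]) + "+}")
    return " ".join(parts), matcher.ratio()

marked, similarity = highlight_corrections(
    "patient has atrial fibrilation today",
    "patient has atrial fibrillation today",
)
print(marked)      # patient has atrial [-fibrilation-] {+fibrillation+} today
print(similarity)  # 0.8
```

Storing the similarity value for each letter, together with a date, would give the data needed to report accuracy over time.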
Phase 5
Create a feature to record the time it takes to prepare the letters and the capability to produce a monthly invoice for the billable hours
Tasks:
- Create a stopwatch type interface to start and stop the timer
- Create a facility to generate a monthly invoice for the total billable hours for that month
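The timer and invoice logic behind such an interface could look like the sketch below. The class and function names, the example dates, and the hourly rate are all hypothetical. Timestamps can be passed in explicitly so that the logic is testable without waiting for real time to pass.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

@dataclass
class TimeEntry:
    started: datetime
    stopped: datetime

    @property
    def duration(self) -> timedelta:
        return self.stopped - self.started

class Stopwatch:
    """Records start/stop pairs as time entries."""

    def __init__(self):
        self.entries = []
        self._started = None

    def start(self, now=None):
        if self._started is not None:
            raise RuntimeError("timer is already running")
        self._started = now or datetime.now()

    def stop(self, now=None):
        if self._started is None:
            raise RuntimeError("timer is not running")
        entry = TimeEntry(self._started, now or datetime.now())
        self.entries.append(entry)
        self._started = None
        return entry

def monthly_invoice(entries, year, month, hourly_rate):
    """Return (billable_hours, amount) for entries started in the given month."""
    total = sum(
        (e.duration for e in entries
         if e.started.year == year and e.started.month == month),
        timedelta(),
    )
    cents = Decimal("0.01")
    hours = (Decimal(total.total_seconds()) / Decimal(3600)).quantize(
        cents, rounding=ROUND_HALF_UP
    )
    amount = (hours * hourly_rate).quantize(cents, rounding=ROUND_HALF_UP)
    return hours, amount

# Illustrative entries: 1 hour and 30 minutes in March, 45 minutes in April.
watch = Stopwatch()
watch.start(datetime(2024, 3, 4, 9, 0))
watch.stop(datetime(2024, 3, 4, 10, 0))
watch.start(datetime(2024, 3, 18, 14, 0))
watch.stop(datetime(2024, 3, 18, 14, 30))
watch.start(datetime(2024, 4, 2, 9, 0))
watch.stop(datetime(2024, 4, 2, 9, 45))

hours, amount = monthly_invoice(watch.entries, 2024, 3, Decimal("40.00"))
print(hours, amount)  # 1.50 60.00
```

`Decimal` is used for the invoice amount to avoid floating-point rounding in currency values.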
Phase 6
Integrate all components into a unified software application
Tasks:
- Bring all components together into a single application
- Implement some statistical reporting