Medical Mad Libs? Not Anymore
SPEECH RECOGNITION AND TRANSCRIPTION TECHNOLOGIES ARE TRANSFORMING THE WAY THE MILITARY HEALTH SYSTEM DOES BUSINESS.
Speech-to-text recognition software wasn’t so hot a few years ago. A simple sentence, much less one with complex medical lingo like “granulomatous,” came out looking like a kid had played Mad Libs with it. The technology still needs work today, but it has improved dramatically, and military hospitals are making serious use of it.
To learn just how that technology is being applied to health and patient care, MMT spoke to Lieutenant Colonel Stephen Yoest, assistant chief of the radiology department at Madigan Army Medical Center in Tacoma, Wash. “We are PACS-based [Picture Archiving and Communication System], so we are an almost entirely digital radiology department. Our PowerScribe voice recognition system is integrated into the workstation. It automatically loads the patient information so that we can just begin dictating on that exam, and it will automatically link what we say up with the [electronic health record (EHR)].”
Yoest said he speaks into a USB microphone, and the system recognizes and displays his speech on the screen in real time. He can use either macros—voice commands for key phrases—for routine examination reports, or he can supplement his reports with free text commentary.
“Say I’m reading a chest X-ray and it’s normal—I could just say ‘chest PA lateral’ and it will automatically load that phrase into the [patient record],” said Yoest. “But suppose there was an incidental finding that I wanted to add. I would just say, for example, ‘Incidentally noted is a 3 mm, sharply-circumscribed, densely-calcified nodule in the right upper lobe… most compatible with an old granulomatous disease. Period.’ It would recognize that, paste it into wherever I have my cursor prepositioned, and I would just click a button to approve it.”
“I like that when I bring the examination up it immediately loads the correct [EHR],” said Yoest. “The alternative is to manually enter the examination number or to use a traditional voice dictation that goes to a transcriptionist and there’s a greater risk of the report that I dictate getting matched to the wrong patient examination. The accuracy of the overall reporting is better because it’s more likely to reflect my intent since I can see it right as I’m dictating. So if I inadvertently say ‘left’ instead of ‘right,’ I can correct it immediately rather than having it come back to me in several days when I may or may not remember whether it was left or right.”
Yoest cautioned that no technology is perfect, however. “There are some minor, idiosyncratic errors. For example, it doesn’t like the word ‘the’ or ‘a’ so much because it confuses those words with people who say ‘uh’ as a verbal pause when they’re talking. Sometimes it will drop those words and I have to go back and reinsert them, but I don’t think the error rate is significantly different than that of the reports I get back from a human transcriptionist who may or may not understand what I’m saying anyway. For me, it’s not an onerous error rate that reduces my efficiency.”
Yoest said that not everyone gets the same results, however. “Some people have better voice recognition than others. We do have the option of sending our reports through the system to a human transcriptionist. We have, for example, one radiologist who’s French, and speaks with a thick accent. His speech recognition is not that good, so he sends everything to the human transcriptionists. There is a menu dropdown box on the application toolbar that you just select to have that work go to the human transcriptionist and then just dictate as usual.
“I think the happiness of individual radiologists with speech recognition technology really depends on how quickly and clearly they speak. I am very happy with it and I think most of the staff here are. But I know there are some in my department and in the professional world who aren’t as happy with a lot of the systems out there. It has to do with how quickly and fluently the radiologist speaks into the system, not the technology itself. If you speak really fast and slur your words together, no technology is going to be very good at recognizing that. The people that I’ve seen that have problems with it fall into that group.”
Yoest said he reads on average about 50 examinations a day, primarily MRIs and CTs. “The main reason I like it is the turnaround time is much faster,” said Yoest. “It’s essentially completed as soon as I finish dictating and close the examination. That report uploads into the hospital information system and it’s immediately available to all the clinicians. So we can read more or less in real time rather than seeing a two- to three-day wait for our reports to get turned around. I think ultimately that system improves the quality of patient care and that’s what’s most important to us.”
Madigan and other military health system (MHS) facilities use the DoD-wide proprietary Armed Forces Health Longitudinal Technology Application (AHLTA) health care information system. Developed in 2004 and managed by the Clinical Information Technology Program Office (CITPO), AHLTA serves active-duty personnel and veterans, and their families.
Robert Weideman, senior vice president of marketing for Nuance, echoed Yoest’s comments. “The VA approached us with the challenge of doctors not wanting to type. They’re used to phone-dialed dictation and hanging up and having someone else listen to that and key it into the patient record. Doctors don’t want to be data entry operators, and that’s where our product, Dragon NaturallySpeaking Medical, comes into play. It overcomes the resistance to those systems.”
“Most errors happen by doctors seeing 100 patients a day and not remembering the patient versus the approved patient record,” Weideman said. “You’re getting the doctor closer to that patient encounter. But as with any profession, doctors work in different ways. We offer broad solutions to allow the doctors to work the way they want—dictating into a recorder or pocket PC, using a cell phone, or live into the application. New doctors want to use a cell phone or a pocket PC.”
Chris Spring, senior product manager for speech recognition for MedQuist, maker of SpeechQ for Radiology, said, “What happens is a lot of physicians will say, ‘You’re turning me into a transcriptionist.’ But if that physician has their report an hour after the tests are run, well, that has a patient care value beyond just the cost savings. Radiologists need and want to be integrated at the desktop level with PACS. It’s about using it as a tool in your workflow to improve patient care.”
MedQuist was ranked first in front-end speech recognition by the KLAS 2006 Speech Recognition Report. Like its competitors, SpeechQ also supports macros and pre-populated keywords. “The track record’s pretty good among our active users. We’ve seen a positive ROI [return on investment] in 8-18 months,” Spring said. “Speech recognition is a tool to lower your reporting costs and turnaround time. We’ve had dramatic cases where facilities were able to go from seven transcriptionists down to 1.5.”
Dale Kivi, vice president of marketing for CyMed, Inc., said that front-end solutions (that is, real-time speech-to-text that is edited right away on the desktop) are the easiest and least expensive option from a technology perspective.
Before the emergence of back-end solutions, he said, companies like Dictaphone and Lanier dominated the voice capture technology portion of the market while direct labor was provided either in-house or through an outsourced service. Back-end solutions as an alternative to human labor have been on the way for the last 10-15 years but have just recently turned the corner in terms of being able to compete with the quality and total process cost of traditional transcription, although results can vary substantially. This struggle for transcription technology superiority resulted in the acquisition of the dominant voice capture technology players by speech recognition leaders. Dictaphone was acquired by Nuance, the Dragon speech engine firm, and Lanier was acquired by rival MedQuist Inc., which is owned by Philips, a company with its own speech recognition technology.
Kivi said cost-conscious facilities can gain access to the more sophisticated back-end model by going with an application service provider (ASP) or Internet-based service model to avoid the capital expense of buying the onsite server. “Back-end speech recognition is not one size fits all. It’s expensive because of all the upfront horsepower needed for voice capture systems, database management platforms, the speech recognition software and the medical discipline lexicon. Back-end systems keep the audio file for each individual dictator, and have the ability to learn based on comparing the final corrections made to the approved report with the draft generated by the speech recognition engine. So the accuracy improves over time without the speaker having to change their habits. That part of technology is the key benefit of back-end recognition since many physicians don’t want to change their dictating habits.”
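The draft-versus-approved comparison Kivi describes can be illustrated with a small sketch. This is not how any vendor's engine actually learns; it only shows the basic idea of lining up a recognition draft against the final approved report and tallying which words the editor had to change. The function name and the sample sentences are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_corrections(draft: str, approved: str) -> Counter:
    """Count word-level substitutions made between a draft and the approved report.

    Hypothetical helper: a real back-end system would work from audio and
    acoustic models, not just from text differences.
    """
    draft_words = draft.lower().split()
    approved_words = approved.lower().split()
    matcher = SequenceMatcher(a=draft_words, b=approved_words, autojunk=False)

    corrections = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" means the editor swapped draft words for different words.
        if tag == "replace":
            heard = " ".join(draft_words[i1:i2])
            meant = " ".join(approved_words[j1:j2])
            corrections[(heard, meant)] += 1
    return corrections


if __name__ == "__main__":
    draft = "nodule in the write upper lobe"
    approved = "nodule in the right upper lobe"
    print(learn_corrections(draft, approved))
```

Accumulating such counts per dictator over many reports is one simple way a system could notice that a given speaker's "right" is repeatedly misheard, without asking the speaker to change anything.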
“Traditionally, physicians would pick up the telephone, key in a patient number, then dictate medical stats like blood pressure,” Weideman said. “The recording would then be stored on an onsite server where it was accessed by transcriptionists who created a first draft transcript that was then returned to the physician for review and approval. In emergency settings, this would happen in a couple of hours ... otherwise, in a couple of days. The labor needed to do that manually costs about $10 billion a year in North America alone.
“So we’ve taken that traditional approach and put speech recognition on the back end to do most of the work, and to allow doctors to behave as they always have. Instead of transcriptionists getting a raw recording, they get a first draft to edit that has already gone through speech recognition. We can save the VA and hospitals 50-80 percent of the processing cost with this approach. You can extrapolate that the more front-end speech recognition that’s used to create an EHR, the greater the cost savings.”
“On the other hand,” said Kivi, “front-end accuracy is dependent on how well the dictator is able to change the way they dictate to suit the way the engine listens. The technology is making tremendous strides, and there are specific disciplines, like radiology, where it can yield great results due to the limited vocabulary and because the physicians dictate in a quiet back room where they’re by themselves.
“When the vocabulary is smaller and the environment is controlled, that’s going to put you into the ‘sweet spot’ for speech recognition accuracy scores. As the technology continues to improve, an increasing percentage of medical work types or disciplines will qualify for that sweet spot, but presently there are limits to its ability to distinguish what is intended speech and what is background noise distortion. In the end it is still a physician by physician decision for who can deliver a positive ROI compared to traditional methods.
“If you don’t get a good first draft, speech recognition can become a very poor purchase decision. Some companies buy it only to see their costs shoot up because it takes time and money to play ‘Where’s Waldo’ with the errors in the report. And that problem can be viewed quite differently if the errors are found by editors or if they need to be corrected directly by the physicians, which is expected with most front-end solutions.
“CyMed’s approach is to optimize the full process cost, quality and turnaround time. We apply speech recognition only where we can get a positive ROI on the first draft. Based on the expected quality scores that we can monitor in the background, we may use manual labor instead. There are a lot of products that convert voice to text, but you’re being shortsighted if you don’t consider how much manual editing is required. The technology might produce 97 percent accuracy but the problem is that the standard acute care medical report has 300 words per page. That means having 10 errors per page and that is not acceptable to anyone in the medical field. Even an entry-level person will only produce about one or two errors per page.
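The arithmetic behind that comparison is easy to check. The sketch below assumes that "97 percent accuracy" means 97 percent of words are correct, which the article does not specify; under that assumption a 300-word page carries about nine errors, in line with the roughly 10 Kivi cites. The function names are illustrative only.

```python
def expected_errors_per_page(word_accuracy: float, words_per_page: int) -> float:
    """Expected wrong words on a page, treating accuracy as a per-word rate."""
    return (1.0 - word_accuracy) * words_per_page


def accuracy_needed(max_errors_per_page: float, words_per_page: int) -> float:
    """Per-word accuracy required to stay at or below a given error count."""
    return 1.0 - max_errors_per_page / words_per_page


if __name__ == "__main__":
    # Figures quoted in the article: 97 percent accuracy, 300 words per page.
    print(round(expected_errors_per_page(0.97, 300), 1))
    # Accuracy an engine would need to match "one or two errors per page".
    print(round(accuracy_needed(2, 300), 4))
```

Read this way, matching an entry-level transcriptionist's two errors per page would take better than 99 percent word accuracy, which is why a seemingly high accuracy score can still mean heavy editing.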
“Health care costs are rising, and controlling costs is most often the driving force for investigating voice recognition. For this reason, buyers need to not lose sight of the full process picture. A typical transcriptionist may cost between eight and ten cents per line to create a report draft while the major back-end players will charge half that to generate their draft. The problem is that the first draft is not the end product. The end product is a 100 percent accurate, properly distributed medical record. The full cost includes the voice capture technology, workflow management system, and speech recognition capital expenses that do not show up in the per-line charges. And that’s before you consider the additional editing costs.”
The important question, Kivi said, is which combination of direct labor and/or technologies provides accurate reports at the lowest cost?
“At CyMed, our approach is to help people decide whether, when considering the complete process and all costs involved, it is in their best interest to buy or rent a solution. Is your technology going to be obsolete long before you pay off the capital investment? How consistent are the dictating habits of your physicians? Will the physicians object to editing their own work? Identifying where the proper balance is, that’s what we do. We help define a complete process and service solution. We won’t save money in one area only to spend more in another.”
Ken Lacy, CIO of Precyse Solutions Inc., offers outsourcing through its PrecyseNet, an Internet-based dictation and transcription system. “If you can generate the same document by streamlining processes with technology and gaining efficiencies, that’s a win. People are focused on speech recognition because it can offer a more cost effective way to do transcription. We’ve decided to use M*Modal’s AnyModal CDS, because it has good natural language recognition. We already had a good workflow so we just integrated it into our toolset. The University of Pittsburgh Medical Center was real excited about M*Modal and they set us up with them.”
Greg Horton, director of product management and marketing with M*Modal, said with most front-end dictation and recognition, physicians have to not only change how they speak, but also do voice editing. “Which is sort of a pain and can slow down your time to document. If you listen to dictation recordings, you’ll hear the doctor give instructions to the medical transcriptionist (MT), or say ‘how are you today?’, or give instructions to build a list. You don’t want the transcription to say ‘build a list,’ you want the list to appear. Our technology is able to handle those conversational aspects, pauses or ‘ums.’ Those sorts of language delays are taken out; we’re adjusting for speaking speed and style, and that’s happening in an automated, software-based way.”
“We’re not selling a server or a software license; we’re offering a technology-driven, cost-effective service. It allows us to keep updating the system without having to patch and release things to customers; it’s all internal.
“AnyModal CDS doesn’t change what the physician does; it’s a service that makes the transcriptionist more efficient. It’s a unique technology for natural language and speech along with an encoded document that is highly shareable.
“The document that we produce is not simply a bunch of text. It’s actually an XML document that is based on the Health Level 7 Clinical Document Architecture (HL7 CDA) standard [for sharing medical documents between EHR systems]. It’s the text that the doctor spoke, but we have organized it into meaningful sections [Subjective Objective Assessment Plan note]. We’re able to pull out data like blood pressure, pulse and temperature; any sort of measurement that’s dictated can be shared as data between systems. We can code all that.
“So our goal is to be able to take the doctor’s dictation, produce a document that can be corrected and validated, and have the output of that be input into an EHR system.
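To make the idea of a structured, shareable document concrete, here is a deliberately simplified sketch. It is not HL7 CDA and does not reflect M*Modal's actual output: the element names (`note`, `section`, `measurement`), the attribute names and the blood-pressure phrasing pattern are all invented for illustration. It shows only the general pattern of keeping the dictated text in named sections while also exposing a dictated measurement as data.

```python
import re
import xml.etree.ElementTree as ET
from typing import Mapping

# Hypothetical phrasing pattern; real dictation varies far more than this.
BP_PATTERN = re.compile(
    r"blood pressure (?:is |of )?(\d{2,3}) over (\d{2,3})", re.IGNORECASE
)


def build_note(sections: Mapping[str, str]) -> ET.Element:
    """Build a toy XML note: one <section> per heading, plus coded vitals."""
    note = ET.Element("note")
    for title, text in sections.items():
        section = ET.SubElement(note, "section", {"title": title})
        # Keep the narrative exactly as dictated.
        ET.SubElement(section, "text").text = text
        # Also expose any recognized blood pressure as structured data.
        match = BP_PATTERN.search(text)
        if match:
            ET.SubElement(section, "measurement", {
                "name": "blood_pressure",
                "systolic": match.group(1),
                "diastolic": match.group(2),
                "unit": "mmHg",
            })
    return note


if __name__ == "__main__":
    note = build_note({
        "Subjective": "Patient reports mild headache.",
        "Objective": "Blood pressure is 120 over 80. Pulse 72.",
    })
    print(ET.tostring(note, encoding="unicode"))
```

A receiving system could read the `measurement` element directly instead of re-parsing the narrative, which is the practical benefit of coding dictated values as data.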
“We host everything on our servers and we have a transaction-based model—we charge per-minute for dictation, you pay by usage.
“We’re doing back-end speech recognition: the physician dictates as usual with a handheld digital recorder or into a telephone; a PC is an option but probably the least used because a lot of physicians are dictating in the home or between meetings with clients and the PC is not always available. Typically, all those recordings then go into a transcription workflow. Maybe that workflow is managed by the hospital, or it’s outsourced to a third party like Precyse. In either situation, the [Web services-based] workflow takes control of the recording to decide what to do with it. One option is to send it for recognition and creation of a draft document that an MT, or potentially the physician, could review.
“We return the document with a score to say how much editing work we think is necessary to complete the document. Our customers have the option of saying, ‘Well, that score requires too much editing, I’m just going to send it on to manual transcription.’ Or [more commonly] they can say, ‘Okay, that’s a good score, I’m going to send it to a medical language specialist who, rather than typing from scratch, listens and makes edits to the draft document.’”
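The routing decision Horton describes amounts to a threshold check on the returned score. The sketch below assumes a score between 0.0 (no editing expected) and 1.0 (retype everything) and an arbitrary cutoff of 0.3; the article gives neither the scale nor any threshold, so both are hypothetical, as are the class and queue names.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    dictation_id: str
    text: str
    edit_effort: float  # assumed scale: 0.0 = clean draft, 1.0 = unusable


def route(draft: Draft, max_edit_effort: float = 0.3) -> str:
    """Pick a workflow queue for a recognized draft based on its score."""
    if draft.edit_effort > max_edit_effort:
        # Too much cleanup expected: type the report from the audio instead.
        return "manual_transcription"
    # Good enough: an editor listens and corrects the existing draft.
    return "edit_draft"


if __name__ == "__main__":
    clean = Draft("rec-001", "Chest PA and lateral: normal.", edit_effort=0.05)
    noisy = Draft("rec-002", "chess pee a and ladder all", edit_effort=0.8)
    print(route(clean), route(noisy))
```

Each customer could set its own cutoff, which matches the article's point that the choice between editing and retyping is left to the transcription provider.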
M*Modal doesn’t do business directly with the military, he said. “Several of the largest transcription outsourcers like Precyse are our customers. If some part of the military had its own transcription platform, they could contact us to do the back-end speech services,” Horton said. Because M*Modal initially learns from physicians’ existing written reports, physicians don’t have to change their dictation habits or train the system. “Our technology can be put in place without the physician seeing it. There’s no training process where the physician has to speak a series of paragraphs. Dictation is the most efficient method of documentation for the physician, so we’re trying to fit into that process rather than change it. We’re trying to make the people supporting it more efficient, by allowing them to edit rather than type.”
“If it takes the physician more time, they either need to be able to see fewer patients or their day gets longer. I think that results in a lot of the resistance of physicians to adopt EHR because it’s an extra burden. No one is going to say, ‘oh, you can take two more patients out of your day.’”
Another company, Focus Infomatics, Inc., has been serving medical facilities since 1999 with traditional (non-voice) transcription and speech recognition outsourcing services. “Our services save costs, reduce TATs [turnaround times] and improve quality,” said Chris Blue, Sr., Focus vice president of sales and operations. “Our dedicated effort to become specialists in support of speech recognition has certainly contributed to our success and stature in the marketplace.”
Recently ranked the number one medical transcription service organization in the country by KLAS, a research and consulting firm specializing in monitoring and reporting the performance of health care professional service firms, Focus currently services over 100 medical facilities nationwide. “With offices located in Boston and Los Angeles, we currently produce over 350 million lines annually, while maintaining an unwavering attention to detail that ensures success,” said Blue.
Focus has a multi-faceted approach that uses voice recognition as a tool to help medical transcriptionists work more rapidly and with increased accuracy. The software builds an inventory of words for each individual physician so that it can recognize that physician’s dictation more precisely. Dictation that cannot be recognized (5-10 percent) is transcribed separately by MTs. The remaining work is recognized and edited, so that a complete and thorough job is performed on every report dictated. Physicians do not change the way they dictate at all. Although the process is seamless to the customer in use, Focus said that great differences will be seen in turnaround time, quality and billing.
“We believe the key to our KLAS success has stemmed from our loyalty to customer attention and the dedication of our MTs,” said Blue. “Focus Infomatics ensures a promise to personalizing your customer experience. This approach not only creates a productive and proactive relationship with our customer base, but further validates our commitment to making sure that transcription runs effectively and efficiently for each individual hospital.”
Recent MHS contract solicitations, though decentralized and disparate according to facility size and needs, for the most part share the same general criteria. The MHS is looking to outsource and use technology to move medical transcription offsite and out of its hospitals and military treatment facilities (MTFs). Contract announcement descriptions for separate VA medical centers in 2006 sought contractors to “electronically transmit all transcribed reports directly into the specified government [EHR] system” and noted that dictated material would include “highly technical terminology” requiring “a comprehensive knowledge of specialized vocabularies.”
Not insignificantly, one of the 2006 postings included a provision that “the contractor is not allowed to move work performed … outside of the United States borders.” Overseas transcription, and medical data security in general, has been in the media spotlight in the wake of September 11 and the Global War on Terrorism (GWOT), as the VA has repeatedly come under fire for security lapses.
In 2002, a whistleblower alleged to the VA and FBI that her employer, MedQuist, the largest medical transcription company in the U.S., was outsourcing medical reports on active-duty soldiers in Afghanistan to transcribers in Pakistan and India. Susan Purdue, a computer systems administrator for the company’s Asheville, N.C., office, alleged in The Asheville Tribune that she had seen a medical report “from an American soldier who had been shot in [the] leg in Kandahar, Afghanistan. He was telling his doctors where he had been, what unit he was with, what weapons he had been firing and said he wanted to get back because his unit was being redeployed to the Korean DMZ.” MedQuist denied the allegations, and later terminated Purdue and closed the Asheville office. Purdue’s claim predated the Sarbanes-Oxley Act of 2002, which shields whistleblowers from retaliation and prevents companies from shredding or otherwise disposing of documents during an inquiry. In an unrelated incident, a Pakistani transcriptionist in 2003 threatened to post University of California at San Francisco Medical Center voice recordings and medical records to the Internet if she were not paid.
These incidents highlight not only the dangers of overseas outsourcing, where transcriptionists’ per-line wages are a fraction of those of their U.S. counterparts, but also its rise as companies try to stay competitive by cutting costs. A 2005 U.S. Air Force Special Operations Command contract solicitation stipulated that businesses should “provide pricing per line of dictation.”
“Everyone is trying to get transcription cheaper,” Horton said. “There’s a lot of downward pricing pressure. I think as customers hear that [vendors are] starting to use speech to improve their performance, customers say, ‘Well then, give me a cut, reduce my costs.’
“There’s some irony in transcription in terms of that. Most health care providing organizations see transcription as a cost burden. But pushing that documentation process to transcription is much cheaper than having the physician do it themselves. So even though it’s more efficient and cost effective there’s this continual drive to push down the costs of transcription.”
Lacy said Precyse’s Internet-based workflow doesn’t “push international work, but to be cost-competitive it’s sometimes necessary. Because of HIPAA and the stories out there we have to pay close attention. We have a security compliance officer, and our process and controls are identical regardless of whether the work is done domestically or internationally.”
Kivi said CyMed, the third-largest medical transcription company in the U.S., was recently acquired by Philippines-based SPi Technologies, with production facilities in Manila and India. “Outsourcing is driven by a continual increase in the need for medical transcription at a time when there is a substantial decrease in the domestic labor force. There’s more work and less people to do it, so you have to either send the work offshore or come up with new technology that requires less labor. That’s why speech recognition is such an attractive alternative, especially where there are reasons to not send work offshore. Some contracts always will—and should—require domestic labor. That’s why industry leaders such as ourselves will continue to invest heavily in the development of speech recognition, while at the same time, maintain a strong pool of domestic transcriptionists for those scenarios where they are still the most cost effective full process option.” ♦





