Encoding information is a key part of I.T.

Do you know what’s in your medical record? Does it contain mistakes or omissions?

The extraordinary response to our April 1 post about data transfer from PatientSite to Google Health (86 comments so far) made us realize that the time has come for patients to take responsibility for their personal medical data. Toward that end, we’ve begun writing about how to understand health data. And that starts with understanding a few basics about I.T. … information technology.

It’s become apparent that it’s not effective for us to just hope our health data systems were intelligently designed and reliably executed. (Hm, that sounds like the financial bailouts … assuming “they know what they’re doing” didn’t work out too well there, did it?)

So let’s get on with it. Prerequisite: read the very short post “The I in IT stands for Information.”



As you probably know, computers can’t actually store information (like the reality that “e-Patient Dave is a hunk”), they only store 1’s and 0’s. The process of converting real information into 1’s and 0’s is called encoding.
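For readers who like to see it happen, here is a minimal Python sketch of turning text into 1’s and 0’s and back again. The function names to_bits and from_bits are made up for this illustration, and UTF-8 is just one agreed encoding among many.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes, then show each byte as eight 1's and 0's."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Reverse the process: groups of 1's and 0's back to bytes, then to text."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode(encoding)


bits = to_bits("Dave")
print(bits)             # 01000100 01100001 01110110 01100101
print(from_bits(bits))  # Dave
```

The computer only ever holds the second line of output's raw material, the bits. "Dave" exists only because both functions agree on what those bits mean.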

The person you see in your camera’s viewfinder is a reality whose image you want to capture (store) so you can view it (retrieve it) later. The camera creates a JPEG file containing 1’s and 0’s that encode the photo in an agreed way. When another program knows how to read the same encoding, the information has been transferred. (Experts will note that JPEG is a “lossy” format that gives up some information, to make the file smaller. That’s right, but it’s not central to this particular topic.)

It’s all about agreement: people agree on the encoding and decoding, so the original intent is preserved. Example: JPEG works as a data format because people (the Joint Photographic Experts Group, JPEG) got together and agreed how the data would be encoded and decoded.
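Here is a small sketch of what happens when that agreement breaks down. It uses two real text encodings that ship with Python (UTF-8 and Latin-1) as stand-ins; the word being stored is just an example, not anything from a medical system.

```python
original = "café"

# The writer encodes the word using UTF-8.
stored = original.encode("utf-8")

# A reader who agrees on UTF-8 gets the original back.
agreed = stored.decode("utf-8")

# A reader who assumes Latin-1 gets the same bytes but a different "reality".
mismatched = stored.decode("latin-1")

print(agreed)      # café
print(mismatched)  # cafÃ©
```

Nothing was damaged in storage. The bytes are identical in both cases; only the agreement about how to read them changed.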



In more complex cases, such as medical information, such groups will kick it up a notch: they agree on a vocabulary, an agreed set of things you can express. Vocabularies always involve a trade-off between completeness and usability: just as with an English dictionary, you can agree on a massive vocabulary, in which it’s time-consuming to find exactly what you want but there’s lots of nuance, or you can agree on a more concise vocabulary.

Some tasks require a rich vocabulary, some don’t. Some vocabularies overlap a lot, some a little. And some people have more use for more words about a particular topic (Wikipedia). So it’s important to choose the right vocabulary for the job.

With that as background, here’s a short write-up of various vocabularies that encode information about various things related to healthcare. I am indebted to our Dr. Danny Sands, who is something of a pioneer in “informatics” (healthcare IT), for the bulk of these descriptions.


Medical data vocabularies

There is no standard “consumer” vocabulary for medical conditions; that’s a work in progress. Below is a partial list of medical vocabularies used by various professionals. The descriptions are concise; there are references where you can read more about each, if you like.

CPT (Current Procedural Terminology) is a set of codes used for procedures, not diagnoses or conditions, including intensity of outpatient visits. The AMA licenses the use of these codes.

SNOMED CT is a clinically meaningful vocabulary. It’s provided without a license fee from the National Library of Medicine to all developers in the US. It’s the closest thing we have to a universal, clinically useful problem-list vocabulary. It’s most commonly used in practice (office) EMRs, not hospitals. But it’s hard to build tools that make SNOMED easy to navigate, and not all systems use it.

BI96 is the controlled vocabulary used for the online medical records at Beth Israel Deaconess Medical Center. It is similar to SNOMED but not as sophisticated. (It was developed at the BI and sent to the National Library of Medicine to be shared as a clinically useful vocabulary.)

NDDF (aka “FDB”) is used by some systems to represent medication information. It is available at a price from First DataBank (FDB), a division of Hearst Publishing.

RxNorm is an incomplete effort by the NLM to create a drug vocabulary that is available to the public for free. It has never been robust enough for general use.

NDC is a way of representing individual prescriptions for inventory management or drug claims.

LOINC represents clinical observations and test results. It was developed at the Regenstrief Institute.


It’s important to realize that there’s no guarantee any given reality will be encoded the same in two different vocabularies. As one of those references says,

“17.7: Comparing coding systems is not easy: Unsurprisingly, the same clinical concept might look very different when coded using different classification systems. …[their origins and histories] inevitably result in the use of different terms for similar concepts.” [emphasis added]

What’s important is agreeing on a vocabulary and selecting the right one for the job.
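To make the translation problem concrete, here is a sketch with two entirely hypothetical vocabularies. Every code and description below is invented for illustration and belongs to no real coding system; the point is only what a "crosswalk" between a rich vocabulary and a concise one tends to look like.

```python
# A rich (hypothetical) vocabulary with fine distinctions.
RICH_VOCAB = {
    "R-100": "suspected wrist fracture",
    "R-101": "confirmed wrist fracture",
    "R-200": "seasonal allergy",
}

# A crosswalk to a concise (hypothetical) vocabulary.
# None means the concise vocabulary has no equivalent at all.
CROSSWALK = {
    "R-100": "C-10",
    "R-101": "C-10",
    "R-200": None,
}


def translate(rich_code: str):
    """Translate a rich code to the concise vocabulary, or None if impossible."""
    return CROSSWALK.get(rich_code)


def possible_sources(concise_code: str) -> list:
    """List every rich code that could have produced this concise code."""
    return sorted(r for r, c in CROSSWALK.items() if c == concise_code)


print(translate("R-100"), translate("R-101"))  # C-10 C-10
print(translate("R-200"))                      # None
print(possible_sources("C-10"))                # ['R-100', 'R-101']
```

Going from rich to concise, two different concepts merge and a third has nowhere to go. Going back the other way, the best anyone can do is guess among the candidates.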


And now we come to ICD.

ICD (International Classification of Diseases; Wikipedia) is the data set that my hospital selected to transmit to Google Health. Specifically they use ICD-9, the ninth edition.

ICD-9: Insurance billing codes. A much smaller vocabulary of conditions and symptoms than SNOMED.

For billing purposes this is theoretically appropriate, since insurers don’t need to know the subtle diagnostic differences that doctors need to understand. But the ICD vocabulary is weaker still, because it completely lacks many conditions, and has no way to encode that you were just checking for something rather than actually having the condition.

And if you can’t encode that fact into the system, there’s no way to “decode” it back out. Result: train wreck.

That’s what happened with my supposed “metastases to brain or spine” – they were checking for brain mets and didn’t find any, but the billing code that ended up in the system couldn’t express that. Same for my “intestinal parasitic infection.”  So then, when those billing codes are transmitted as if they were clinical reality, the result is wrong information.
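For the programmers in the audience, the problem can be sketched in a few lines of Python. This is an illustration under loud assumptions: the ClinicalFact structure, the Status values, and the code "HYPO-001" are all invented here and are not real ICD-9 codes or anything my hospital's systems actually use.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    CONFIRMED = "has the condition"
    RULED_OUT = "tested for it, not found"


@dataclass(frozen=True)
class ClinicalFact:
    condition: str
    status: Status


# Hypothetical billing vocabulary: one made-up code per condition,
# with nowhere to record the status.
BILLING_CODES = {"brain metastases": "HYPO-001"}
CONDITIONS_BY_CODE = {code: name for name, code in BILLING_CODES.items()}


def encode_for_billing(fact: ClinicalFact) -> str:
    """Encode a clinical fact as a billing code; the status is dropped."""
    return BILLING_CODES[fact.condition]


def decode_from_billing(code: str) -> str:
    """Decode a billing code; only the condition name can come back."""
    return CONDITIONS_BY_CODE[code]


confirmed = ClinicalFact("brain metastases", Status.CONFIRMED)
ruled_out = ClinicalFact("brain metastases", Status.RULED_OUT)

# Two very different realities collapse into the same code.
print(encode_for_billing(confirmed) == encode_for_billing(ruled_out))  # True
print(decode_from_billing(encode_for_billing(ruled_out)))  # brain metastases
```

Whoever reads the code back out has no way to tell "ruled out" from "confirmed," because the status never made it into the vocabulary in the first place.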

And since the I in IT stands for information, wrong information=#fail.

Worse, in reality, billing codes are often misapplied by “coders” (clerks), to feed something into a claims payment system that it’ll accept. So the “information” placed into the system might not be something the doctor would ever have said in the first place.

Bottom line: in practice, ICD data is useless as health information.

Btw, if you look at how ICD-9 is meant to be used (to capture disease frequency statistics), and then realize how it is actually used (to submit charges to billing systems, in an uncontrolled environment), it’s chilling. Are all the disease statistics we hear skewed by billing clerks trying to cope with an inadequate vocabulary?

It’s been said that ICD-10 is coming and will be better. Well yeah, ICD-10 was developed in 1992, and it still isn’t used in most systems. (Yes, healthcare technology is that slow to adapt to change!)

And besides, regardless of the vocabulary, if the process for using it isn’t reliable, and data gets inserted into systems with billing intentions, and then someone reads it out thinking it’s clinical information, disasters can – and do – happen.


So, ladies and germs, that’s tonight’s lesson on how information (reality) is encoded … our second lesson on health IT.







26 Responses to “Encoding information is a key part of I.T.”

  1. Here we go again!

    So this time, Dave, you have altogether avoided the use of the confusing term EHR.

    You wrote: “Worse, in reality, billing codes are often misapplied by “coders” (clerks), to feed something into a claims payment system that it’ll accept. So the “information” placed into the system might not be something the doctor would ever have said in the first place.”

    That is EXACTLY why we should not talk about EHR but about ERR (Electronic Reimbursement Record). Then, all of a sudden, the data located in your ERR makes perfect sense. It is doing a fine job of getting the hospital, the doctors and all associated professional costs reimbursed. Who cares if this data is of absolutely no significance and of now proven ZERO benefit, at best? It was never designed to be of benefit to you!

    So, next time you hear of the $19 Billion allocated to the implementation of a solid EHR system across the country, write a letter to your representative and ask them to make sure the next generation of EHR be patient-centric, patient-friendly and above all accurate and designed to let you gain full access to your medical information, put in context.

    This may be the most important topic to improve the prospects of participatory medicine. Implementing current versions of EHR will take us back 20 years, with health professionals in full control of the information you may see about your medical condition. We should make sure it doesn’t happen.

  2. […] post in this series: Encoding information is a key part of I.T. p.s. Separate issue: remember, we haven’t yet gotten into poking around in Google Health […]

  3. ePatientDave says:

    I.T. basics for e-patients, part 2: Data “vocabularies” for healthcare http://is.gd/s65x (Hint: is yr data encoded sanely?)

  4. Yes, Gilles, I avoided using those terms… my goal here is to teach e-patients, and nothing dazzles and disempowers a newcomer like a slew of acronyms of indeterminate meaning.

    IMO, it’s not important to me if someone leaves this post able to say what FPD means; it’s important that they know vocabulary matters and that they know billing codes are not a vocabulary designed for tracking a patient’s clinical status.

    As for which systems do what, personally, I couldn’t tell you if my life depended on it. All I know is that PatientSite does contain clinical information but it also contains, hidden from my view, the billing codes. And it’s the billing codes that got sent to Google.

    I like the idea of ERR or “EBR” (electronic billing records).

  5. Dave, the really crooked part of the story is that ICD codes were designed for public health purposes and not at all for billing purposes. Here is what WHO (World Health Organization, an agency of the United Nations) says about ICD-10:

    The ICD is the international standard diagnostic classification for all general epidemiological, many health management purposes and clinical use. These include the analysis of the general health situation of population groups and monitoring of the incidence and prevalence of diseases and other health problems in relation to other variables such as the characteristics and circumstances of the individuals affected, reimbursement, resource allocation, quality and guidelines.

    It is used to classify diseases and other health problems recorded on many types of health and vital records including death certificates and health records. In addition to enabling the storage and retrieval of diagnostic information for clinical, epidemiological and quality purposes, these records also provide the basis for the compilation of national mortality and morbidity statistics by WHO Member States.

    The fact that the ICD-9 codes were transformed into de facto billing codes in the US explains the strong reluctance to move to ICD-10, even though the rest of the world is getting ready to move to ICD-11 and there are no more new, usable codes available in ICD-9! Just last night I read an estimate of the cost to move from ICD-9 to ICD-10: a mere $400 million! No wonder this entire system is a mess!

    It doesn’t happen often but I am in disagreement with you over what is important here. The arcane and increasingly obscure reimbursement systems put in place since 1988 have a fundamental impact on the cost of care and on the fast-growing national debt due to medical costs. It has become a fantastic way to shift money from the masses to a much smaller number of “experts”. If we want to avoid national bankruptcy over time we will all have to dig into this dry topic to better understand how 17% of our GDP gets swallowed by an unfair, uneven, uncontrolled and dysfunctional healthcare system.

  6. Very nice write up in the Globe too. Congrats!
    The link in case anyone missed it.

    I did a write-up on my blog as well. I thought you did a great job and just put it out there as it was. When you go back into the way coding was done for years, yes, there was some improvising done by doctors, and it was not their fault, as they knew the codes that insurance companies would pay on. For example, regular exams used to be not covered, so they had to create a reason for your visit. That’s a really old problem that pretty much has been fixed, but one that folks may be familiar with, as it happened a lot. You ended up with a diagnosis for something you never had, but the insurance covered it, and now when you get your PHR health records you ask: where did this come from? I never had that…

    You did what we have all been waiting to see, a real life no frills post, as it is, the ups and downs, transparency:) Nice work and I hope you decide to check out HealthVault in the same fashion, you may get even more than 86 comments on the next round:)

  7. Dean Procter says:

    Hi Dave,
    In IT we had an expression GIGO – garbage in – garbage out. Clearly medical ‘records’ are not always medical records, rather they’re billing records. Garbage when it comes to treatment.

    There are many issues with what google proposes, the chief one being why would you need to give them your records, or create any other copy of them, merely to share them with someone you have authorised to treat you?

    Security is a serious issue, not effectively addressed by google or any other proposer of a central storage system.

    A central ‘permission system’ is what is required, as part of a medical system which automatically allows only those with a need, and your permission, to access your records, wherever they are. This might include insurance, VA, government, employer, doctor, doctor’s receptionist, medical researcher, all with different levels of access, but provided through a central ‘permission system’ with your knowledge and permission.

    The core function of the permission system would be authenticating the participants and providing the path to the information, without ever holding the information. It’s fairly obvious that a doctor isn’t interested in the patient’s billing information, rather the clinical information.
    Role based authentication would make it easier, more efficient and minimise confusion by not drawing conclusions from codes, or bills, and providing less distraction for the particular user.

    Moving everyone’s medical data to a central ‘warehouse’ will not improve patient care.

    The primary area where we can effect an improvement in treatment is correctly identifying the patient, then we can locate the required records, provide better accountability for treatment costs, reduce fraud and generally provide better outcomes.

    Reduction of costs through efficiencies in billing and fraud control will improve patient outcomes. Streamlining and securing payment and claims processes and reducing paperwork for medical personnel will improve patient care.

    These are the areas we should be focusing on, not a google-type ad-supported medical records warehouse which provides little improvement in patient care whilst significantly increasing the risk of error, loss of data and privacy, and additional costs for healthcare providers and consumers.

    By all means leave your records where they are, and of course upgrade doctors to electronic systems, but teaching every medical practitioner to speak in codes is a gross waste of education brainpower and resources.
    They already speak a language perfectly adequate to provide treatment. Make the system fit them, not the other way around.
    Best regards.

    DR Procter

  8. Dean,

    Yeah, I first heard GIGO several (ahem) decades ago. :–) You should go back and read my previous post, where I twisted GIGO to “Garbage out [of the old system], garbage in” [to the new one].

    Gilles, you are one step ahead of me. Just this weekend as I thought about this post I was reflecting on “Wait a minute, if this vocabulary was developed for disease statistics, why on earth is it being used for billing??” And indeed, one major missing factor is that the vocabulary has no way to encode “testing for this condition” vs “has this condition” or “suspect he might have this condition.” That would be a pretty fundamental thing to build into something that was designed for accurate billing.

    Dean, what are you talking about with “ad-supported”? I don’t recall seeing ads in Google Health, but besides, I have no difficulty ignoring ads that I don’t care about. Besides, Google’s not the point here. The point is the accuracy of my medical records. (And yours of course.)

  9. Dean Procter says:

    Dave, I was under the impression that advertising is how google is going to be able to generously provide the service, assuming their shareholders eventually want dividends they need to charge some way. As for ignoring ads, sure Dave, you’ve just been diagnosed with disease Z and google knows, so next minute you’ll have the screen telling you “Dave 87% of patients who take DrugA report improvement in symptoms”, Click to buy. I’m sure they’ll link you to a doctor somewhere willing to prescribe it to you if you can’t convince your own.
    Suffering patients are particularly susceptible to such advertising.

    I don’t propose starting with the hardest task first, merely improving the processes where we can without forcing people to learn new things or do more than they do now. Perhaps relieve them of some of the load and they can put some time into arriving at a common language for both treatment and billing. I am not sure that coding is going to be the easiest place to gain efficiencies or improved outcomes in treatment.

    I do agree that there has to be some form of common language for billing, but a common language is not the easiest way to make improvements in health care, and I would seek to pursue the easiest gains first.

    Perhaps I see the priority as who, what, when, why, who paid. The billers obviously see it from another angle. Google Health sees it from the view of how they get their hands on all that valuable data, and get it into their system so you have to visit them to see it, and see the advertisers’ ads.
    That’s the real world of health records, but not a doctor or patient priority.
    The doctors and patients want their accurate records available for treatment, not marketing or research primarily.
    Who is the patient? the doctor? google health has neither the correct diagnosis nor the prescription to satisfy the primary concerns of either.

    I put it to you, just because we have a super modern warehouse with all the data in it doesn’t mean that patient outcomes will be better. It has little effect on the personnel and patients in the real world. To improve patient care we must seek to improve the processes that those personnel participate in, and where the information is stored is of little relevance when you are staring at a screen, what is relevant is whether they are the right records for the right person and that you have the right to access them. If that can be properly established, all else follows. Providers will be able to compete to provide records storage services and if it is easier and more efficient then customers will use their services, but it makes little difference where the records are if the mechanism to share them is in place.
    Everyone is in danger of being distracted. Electronic records make sense, provided they are safely stored and adequately backed up, and google may one day provide a better service than their competitors, but until all the newly created records and the old ones are digitised (impossible) there will be gaps, errors and omissions. Simply tipping all the records into a barrel, albeit a sophisticated search-enabled one, will not suffice.
    While no-one can argue that e-records can improve efficiencies, they can also degrade services in the short term unless properly conceived and executed.
    That means we need to improve our current processes so that whatever we build upon is a solid foundation for achieving our real goals, better outcomes and lower costs.
    While others herd the cats, I’ll streamline their processes so they have a little time to listen to your message and learn new things.

  10. Dave’s story has succeeded in having me do a 180 degree turn.

    I think we’ll be much safer having our health & medical data hosted by Google/Microsoft or whoever ends up being the main provider of cloud-based PHRs than what currently exists.

    At least with Google we’ll end up having some way to control the information, unlike today where we are completely at the mercy of a few people who tell great stories at conferences and would make you believe that current EHR can have great positive impact on the quality of care. As with everything else in Medicine we should adopt a strict “Verify and then Trust” policy regarding all statements made about the capabilities of the existing EHRs. It would be nice to get some idea if they are almost pure EBR (I like Dave’s Electronic Billing Record. It sounds better than Electronic Reimbursement Record) or if they can also be used for more useful purposes.

    It is possible that some of the people involved in the design and implementation of EHRs are following our conversations. It would be nice to see them provide some answers.

    Like everything else in today’s world, in our post G. Bush time, transparency about hospital EHR should be an absolute requirement. Anything else will just provide great opportunities for investigative journalism. There is just too much that rests on having effective EHRs to keep relying on rational ignorance.

  11. SusannahFox says:

    Catch up #3: @ePatientDave unpacks the role of encoded information in health I.T. http://is.gd/s65x

  12. Someone who signs herself SusanF has commented on the blog of John Halamka, my hospital’s CIO. Susan, I can’t reach you (no contact info on the comment), but I hope it’s okay that I further broadcast what you said there. John’s post was titled The Limitations of Administrative Data. Susan’s comment:


    No kidding – the limits. Six degrees of separation is more like it.

    1. The US took a system which was largely created for mortality reporting and made a clinical modification so it would be useful for morbidity too.

    2. It is required to be reported for billing.

    3. Payment policy is designed around the codes.

    4. True use of the codes changes dependent upon the payment policies of the insurance company (not supposed to happen under HIPAA, but check with your billing office).

    5. ICD-9 is now over 30 years old because ICD-10 implementation, which would help, was delayed for a decade because the transition would be too hard.

    6. Now it’s supposed to somehow be used for true clinical condition feedback to the patients.

    It’s like we are trying to use a PC with 64MB of memory to support the EHR. It just won’t work very well.

  13. Dean Procter says:

    I can see the surface level attraction in having someone take care of your records for you, but as with your choice of doctor, perhaps a choice is in order.
    When I put my well-worn security hat on, I note that it is much easier to compromise or break into one place to steal/alter/hijack your medical records than if there are many such repositories of information. I.e., if ‘super-google’ health went down, no-one would get treatment, and if I wanted to access your records I would know where to hack/look. I would rather have the other scenario, where I may not even know where to find them without your co-operation, and I may have to crack several systems in order to get to them without your permission.
    Why I might choose google escapes me, considering there are other well established and proven companies such as IBM and EMC who are better experienced, qualified and equipped to do so. However they obviously haven’t put the PR machine to work in the same way.
    It won’t be the PR machine looking after your records, so best look behind and see what is really happening. You don’t want them to disappear into a cloud.

    One of my first medical records projects was in the 90’s and the goal was to communicate the statistics of procedures performed by doctors particularly in the area of the ‘plumbing’. The overwhelming and alarming conclusion of the data was that there were a lot of doctors performing procedures which they were unqualified to perform, based on the results of tests interpreted by personnel unqualified to interpret them after being administered by others untrained and unqualified to carry out such tests.

    The experience, which was for a world leader in the field to present at a global conference, left me with a strong and lingering fear of medical treatment. Things haven’t changed. I certainly would like to see improvements in medical services.

    Perhaps one good thing about a potential central record data store is that potential patient/victims may at least check whether their doctor is so qualified, and their level of experience in the particular procedure, and whoever trained them, along with the experiences of the patients, even those who perhaps fail to survive the patient experience.

    Of course this will open up a big can of worms for lawyers, liability, marketing tactics etc.
    Far be it from me to suggest that perhaps that would be as good a place to start as any, if you want to start storing medical records and making them publicly available.

    The data on procedures is readily and generally electronically available, is easily sanitised for patient privacy, and would ensure that patients can encourage doctors to be better qualified and educated.

    Perhaps this might be a nice project for google to cut its teeth on. If google can perform that, with, no doubt, much litigation from doctors, then perhaps afterwards they may be given the chance to handle actual patient records.

    Our objective is better patient outcomes for fewer dollars isn’t it?

  14. […] data. And that starts with understanding a few basics about I.T. … information technology.” Article e-Patient Dave, e-Patients.net, 12 April […]

  15. […] (For those who want more details, the ICD-9 billing codes are listed here and the Wikipedia page about ICD codes is here. Our post on other medical data formats is here.) […]

  16. […] in using medical billing codes to infer information about medical conditions.  E-Patient Dave delves more deeply into the medical coding issue on his blog and offers a good round-up of the various codes used by medical providers.  These […]

  17. […] » Encoding information is a key part of I.T. | e-Patients.net […]

  18. Aaron says:

    So how do we get the health information providers to shape up and get them to transmit the information correctly to google? I have also tried this with Quest Diagnostics + Walgreens and I also see some detail errors in the conduits that nobody is interested in taking responsibility for. Even the google health google group seems to be completely unmonitored. Granted, I’m not in the medical IT field (nor do I want to be, knowing this) but I certainly want to be able to assist in any way possible.

  19. […] the hospital is moving to support the SNOMED-CT codes instead.  The patient and his doctor have blogged about their lessons. All well and good. But the larger, business question should be asked at this point. […]

  20. Hi Dave,

    You have a great blog. I found your post very interesting to read. We work with the NHS in the UK and have just launched an additional module to interface with the NHS ESR (Electronic Staff Record) system. I think it is a step in the right direction that the NHS has the facility to allow third-party systems like ours to synchronize with their central database.

    We do not deal with patient records, but my point is that the accessibility of the data is important. I wonder how long it will take Google to launch Google Health in the UK; as far as I know, all my personal health records are still kept on paper.

  21. […] more info on data formats see our post on data vocabularies. Share and […]

  22. Donald Green says:

    An EMR was developed for my office and the vocabulary was developed by allowing free text for 6 to 10 months and then converting it into a usable table from there. By using past ICD codes for these assessments, friendly descriptions were matched with the ICD codes.

    Now 4 fields emerge: a friendly description (the clinician’s preferred language), the ICD-9 code, an additional comment field that is free text with a brief descriptor, and finally room for a longer comment in free text. It is the clinician’s choice which to leave in the record. However, if the ICD-9 code is used it will always appear with a friendly description.

    Program Maintenance allows new additions or editing. Nothing can be removed, however, since vocabulary that is no longer used or is inaccurate may still be tied to a record and must stand. The onus is put on the clinician to keep documentation accurate, not on the EMR itself.

    The system was created to be “held open” until it is complete, but once closed (electronically signed) the user has only the same calendar day to change it. After that it cannot be changed. Of course there is always the option of inserting a new “Encounter” that corrects any past Encounters. This is the equivalent of putting that neat straight line through the record, dating it, and initialing it as in other legal documents.

    Yes, there may be some initial extra effort to customize language to standard vocabularies, but the result, once established, is a more accurate documentation process that can be used by its creator and anyone who is a legitimate entitled caregiver to the patient or the patient themselves.

    I am, however, a strong believer that any data used outside of the relationship that created it must be corroborated in the new circumstance by the different user and the patient involved. Data not produced by the current user should not be taken for granted. e-Patient Dave has already told his story about that one.

  23. Fascinating comment, Don. I love that the vocabulary was developed by capturing what people actually say!

    Interesting that you code using ICD-9. What do you do when you’re testing *for* something, and the diagnosis isn’t present?

