Imagine what would happen if you ran out of numbers. Now imagine how a computer would respond to this, and the precautions needed to properly control such an event. Despite the end of the Mayan calendar on December 21, 2012, the world did not end. Similarly, no apocalypse occurred when Y2K arrived. Y2K, as you may recall, was when the year changed to 2000, and any system that tracked the year as a 2-digit number had to deal with a successive year, stored as 00, that was smaller than the 99 it followed. Just to make things more interesting, financial systems in the same period also had to handle the death of many European currencies and the birth of the euro, which was introduced at a year change of its own (January 1, 1999). Despite all of the preparations, some problems did occur. These included the occasional newborn who was born 100 years old, and past-due notices demanding 100 years of fines.
In the IT world, the more common phenomenon is the need to expand data fields because they are running out of numbers, either by moving to larger numerical fields (such as a short int becoming a long int) or by using alphanumerics where digits had sufficed. After some history covering the breadth of these concerns, I shall settle in on the healthcare impact of the switch from ICD-9 to ICD-10 and how it affected regression testing, keeping in mind that even a defined coding scheme evolves as new diagnoses and procedures need to be added.
The most common problem I hear about is the Year 2038 problem, another calendar issue. A signed 32-bit integer field counts the seconds elapsed since January 1, 1970. Everything is fine until January 19, 2038, when that field overflows and wraps, and like Y2K, we have to deal with a future number that “increments” from a very large value to a far smaller one (in this case a large negative number, which reads as a date in 1901). However, we have 2 decades to sort that out, and most PCs have already made the leap to being 64-bit instead of 32-bit themselves (although this does not magically fix the concern, since stored fields and file formats can remain 32 bits wide). The moral of the story is to always have enough high bits to deal with something that increments.
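To make the wraparound concrete, here is a minimal Python sketch. It assumes the counter is a signed 32-bit field using two's complement; the helper names `wrap_int32` and `to_utc` are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit field can hold


def wrap_int32(value: int) -> int:
    """Reduce an integer to the range of a signed 32-bit field (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds: int) -> datetime:
    """Interpret a count of seconds since the epoch as a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)


# The last second that fits, and what the field reads one tick later.
last_good = to_utc(INT32_MAX)
after_wrap = to_utc(wrap_int32(INT32_MAX + 1))

print(last_good.isoformat())
print(after_wrap.isoformat())
```

The second value lands in December 1901, which is why software that compares or sorts such timestamps can misbehave well before anyone looks at a calendar.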
What happens when you have numbers that are not based on counters? The designers of the Internet Protocol decided to use 32 bits for IP addresses (yes, the same length as the UNIX time counter we just discussed). This scheme, dubbed IPv4, is written as 4 numbers, each in the range 0 to 255, separated by periods. However, the Internet grew much quicker than expected, and companies were handed larger chunks of IP addresses than they needed. The IETF chose to solve this problem by creating a new address format, IPv6, which uses 128 bits, the same size as a 16-byte GUID.
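The difference in scale is easy to see with Python's standard `ipaddress` module. This is only a sketch; the two addresses are taken from the ranges reserved for documentation and are not from the article.

```python
import ipaddress

# Example addresses from the ranges reserved for documentation.
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

# max_prefixlen is the address width in bits for each protocol version.
v4_bits = v4.max_prefixlen
v6_bits = v6.max_prefixlen

# Total number of distinct addresses each width can express.
v4_total = 2 ** v4_bits
v6_total = 2 ** v6_bits

# A dotted quad is just a human-friendly spelling of one 32-bit integer.
v4_as_int = int(v4)

print(v4_bits, v4_total)
print(v6_bits, v6_total)
print(v4_as_int)
```

Any code that stored an address in a 32-bit integer column, or validated it as four dotted numbers, had to change to accept the wider form.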
VIN numbers had a similar problem and would have run out in 2010, but not by running out of numbers. Since 1980, the 10th position of all 17-character Vehicle Identification Numbers has encoded the model year as an alphanumeric character, minus a few characters that are excluded to avoid confusion with other alphanumerics, leaving the ability to clearly define 30 different years. Rather than lengthening VIN numbers beyond 17 characters, it was decided to change the 7th character from being a digit to being a letter (for cars anyway), giving us another 30 years to decide what to do next. As a result, any VIN checker that would have balked at a letter in position 7 needed to be revised. Once again, time is a culprit, always wanting to increase in size beyond what had once been deemed an acceptable maximum. Using Mod_Security, you could implement security filters for the “validate_logon” URL to prevent letter input.
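As a separate illustration of the VIN change (in Python, not a Mod_Security rule), the sketch below decodes the model year from positions 7 and 10. The 30-character alphabet in `YEAR_CODES` reflects my understanding of the year cycle and should be checked against the governing standard. The function name and the placeholder VIN strings are invented for this example.

```python
# Assumed 30-character year alphabet: letters without I, O, Q, U, Z, then 1-9.
YEAR_CODES = "ABCDEFGHJKLMNPRSTVWXY123456789"


def model_year(vin: str) -> int:
    """Decode the model year from a 17-character VIN (illustrative sketch)."""
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError("a VIN must be exactly 17 characters long")

    year_char = vin[9]    # 10th position: year code
    cycle_char = vin[6]   # 7th position: digit = first cycle, letter = second

    if year_char not in YEAR_CODES:
        raise ValueError(f"{year_char!r} is not a valid year code")

    # A letter in position 7 is what an older checker would have rejected.
    base_year = 2010 if cycle_char.isalpha() else 1980
    return base_year + YEAR_CODES.index(year_char)


# Placeholder strings (not real VINs) that differ only in position 7.
old_cycle = model_year("XXXXXX1XXAXXXXXXX")
new_cycle = model_year("XXXXXXBXXAXXXXXXX")
print(old_cycle, new_cycle)
```

The same year code, `A`, resolves to two different years depending on position 7, so a regression suite needs cases on both sides of the rollover.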
Like a stress test approaching its breaking point, we can see that ISBN numbers (those are for books), telephone numbers, UPC numbers and Social Security numbers will all start breaching alternate area codes or prefixes before running out. Fortunately, the timelines for these extend to around 2090-2100 at current rates (if current rates can be trusted). Imagine the complexity of modifying a phone system if an extra digit needed to be added to the length! Or every POS system if SKU numbers became a digit longer. We are talking a programming and testing nightmare!
Pulling from our own case studies, we have one where it took well over 100 years before customer numbers threatened to exceed the defined indexing field. While the fix was easy as a concept, the execution needed to be flawless in every routine that handled this identifier. In this case, a 7-digit field became a 9-digit field.
But as fascinating as all of this background has been, let’s not forget that you began reading this because I promised to discuss migration to newer ICD codes (9 to 10 now, 10 to 11 later). Well, obviously there will be lots of deprecated, altered and new codes to be tested. Equally well known, new codes will be added periodically. Zika is one example: it receives a new ICD-10 diagnosis code of A92.5 (Zika virus disease), a more detailed coding than ICD-9’s 066.3 (mosquito-borne fever NEC) or ICD-10’s A92.8 (other specified mosquito-borne viral fevers). This raises the question of how microcephaly, a potential birth defect related to it, will be dealt with. Like Dewey Decimal, ICD has the advantage of adding codes by going to the right of the decimal point, so there’s always room for new codes.
However, ICD procedure codes made a sleek maneuver in the journey from ICD-9 to ICD-10: the first character gained the ability to be a letter instead of a digit. No other procedure code character before ICD-10 had gained this superpower. The diagnosis codes for ICD-10 enjoy the addition of sixth and seventh digit classification, and the procedure codes greatly increased the number of letters that they could begin with. Also, between ICD-9 and ICD-10, the combined number of diagnoses and procedures jumps from around 17,000 to around 155,000.
Making the conversion between ICD-9 and ICD-10 more interesting, not all codes are capable of conversion, because not all things map one-to-one. This includes some ICD-9 codes that potentially map to multiple ICD-10 codes.
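A lookup that admits these cases has to return something richer than a single code. The Python sketch below shows one way to do it. The crosswalk table is purely illustrative, built from the Zika-related codes named in this article and not taken from any official mapping file. The names `ICD9_TO_ICD10` and `translate` are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative crosswalk only: uses codes mentioned in this article,
# NOT an official ICD-9 to ICD-10 mapping.
ICD9_TO_ICD10: Dict[str, List[str]] = {
    "066.3": ["A92.8", "A92.5"],  # one ICD-9 code, more than one candidate
}


def translate(icd9_code: str) -> Tuple[str, List[str]]:
    """Return a status flag plus every candidate ICD-10 code."""
    targets = ICD9_TO_ICD10.get(icd9_code.strip(), [])
    if not targets:
        status = "unmapped"    # no conversion available
    elif len(targets) == 1:
        status = "exact"       # safe to convert automatically
    else:
        status = "ambiguous"   # needs a rule or a human to choose
    return status, targets


ambiguous_result = translate("066.3")
missing_result = translate("999.99")  # placeholder code absent from the table
print(ambiguous_result)
print(missing_result)
```

Returning the status alongside the candidates forces each caller to decide what to do with ambiguous and unmapped codes, which is where the regression tests earn their keep.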
The complexities in the switch from ICD-9 to ICD-10 include the following:
Therefore, the testing needs to expand to deal with each scenario that I have just mentioned, and each can apply to multiple places in the program. Any time a code is read, added, replaced or transferred, it must not be curtailed at the previous maximum length. Any flagging of unacceptable letters in the codes needs to be cleared, but with new limits put in place for the acceptable letter range at each position in the code. And lastly, a method is needed to handle any code translation, especially for situations with no clear one-to-one matching. Imagine the complication of querying for the existence of a particular code if you need to consider both ICD-9 and ICD-10 where a one-to-one relationship does not exist.
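The length and letter-range checks can be sketched in Python as follows. The two regular expressions are simplified approximations of the diagnosis code shapes and check form only, not whether a code actually exists. The function names, the field widths, and the long placeholder string are all assumptions made for this illustration.

```python
import re
from typing import List

# Simplified shape checks (assumptions, not the official grammars).
ICD9_DX = re.compile(r"(?:\d{3}|V\d{2}|E\d{3})(?:\.\d{1,2})?")
ICD10_DX = re.compile(r"[A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?")


def matching_systems(code: str) -> List[str]:
    """List every coding system whose shape the code fits."""
    code = code.strip().upper()
    systems = []
    if ICD9_DX.fullmatch(code):
        systems.append("ICD-9")
    if ICD10_DX.fullmatch(code):
        systems.append("ICD-10")
    return systems


def store(code: str, field_width: int) -> str:
    """Refuse to store a code that the field would silently truncate."""
    if len(code) > field_width:
        raise ValueError(f"{code!r} exceeds field width {field_width}")
    return code


print(matching_systems("066.3"))   # digits only: fits the ICD-9 shape
print(matching_systems("A92.5"))   # leading letter: fits the ICD-10 shape
print(matching_systems("V10.1"))   # shape-only example that fits both

# "A00.000A" is a shape-only placeholder, not a real diagnosis code.
long_code = "A00.000A"
try:
    store(long_code, field_width=6)   # hypothetical legacy field width
    truncated = False
except ValueError:
    truncated = True
print(truncated)
```

Because some strings fit both shapes, the code value alone cannot always tell you which system it belongs to, so records need to carry the coding system alongside the code.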
ICD-11 is still being defined, but the beta draft has letters in the 2nd and 4th positions for diagnoses. ICD-11 already has Zika virus built in and assigned to 3 different codes: XB01.87 for Zika virus, 1E28 for Zika virus disease, and KA02.1 for Congenital Zika virus infection. According to WHO, as of January 23, 2017, the ICD-11 diagnosis count is around 39,000, which includes 4,000 new terms coming from death certificates. One can expect the procedure code list to grow as well. Congruence between SNOMED and ICD-11 may also become a factor as the definition of ICD-11 continues.