System Safety

We cannot present here a full analysis of system safety and instead refer the interested reader to items in our bibliography and to various web sites that address these issues. You can see what noted system safety expert Nancy Leveson has to say about this case in the excerpt from her article that we provide in the supporting documentation.

However, we have at least learned that in order to understand the safety issues properly, we must look at them at several levels of social complexity, just as the ImpactCS framework suggests.

Safety at the Individual Level

The programmer

Certainly the single individual who did the programming for the Therac-25 had responsibilities as a computing professional. To whom were these responsibilities owed? An obvious first responsibility is to the organization that employed him. A second is to the eventual users of the linear accelerator: the patients. We could certainly add to these two (e.g., the profession, the machine operators), but let’s take these two for the purposes of this analysis.

The programmer’s responsibilities to his employer were more than simply to do as directed by the other designers of the system (or by whoever his immediate superiors were). He had a responsibility to make his superiors aware of the dangers inherent in implementing safety interlocks only in software. Whether this danger was obvious to him or not is an interesting question. Even today many computing professionals place more confidence in the safety of software than is warranted. Software safety was little understood at the time the Therac-25 system was designed.

The point here is that a computer professional is responsible to the employer for using the best available methods to solve the software problems with which he or she is confronted. A variety of professional decisions the programmer made in the Therac-25 design suggest he was lax in this responsibility (e.g., using unprotected memory, improper initialization, lack of appropriate testing). Thus, as an employee, he fell short of the mark in providing his employer with professional work. We cannot know whether this shortcoming was one of lack of knowledge or of poor execution.

In addition to responsibilities to his employer, the programmer clearly had a responsibility to the users of the technology he was designing. In the context of safety, his responsibility was to design software that minimized the likelihood of harm from a dangerous medical device. This obligation to "do no harm" need not mean that software should never be paired with medical linear accelerators. From the perspective of the operator we interviewed, this pairing was a positive benefit in making setup and treatment easier. But it does mean that, to the extent it was within the professional control of the programmer, he should have designed the system to do no harm while providing this positive good. Again, whether the failure to do this was a result of a lack of knowledge or of poor execution, we cannot know. Huff & Brown provide some speculation on this issue, given the state of the art in real-time computing at the time.

To sum up, the programmer had clear responsibilities to both his employer and to the users of the device. He clearly failed in these responsibilities. If we were interested in blame, we could not tell how much blame to assign here. We know nothing of the programmer’s background or training. Thus we cannot know if the programmer knew how poor the software design and testing were. Nor do we have any idea of the internal company dynamics that may have resulted in the lack of testing.

The operators

We noted in the case write-up that the operators of the linear accelerators had a complex combination of responsibilities. Chief among these are responsibilities to their employer and to the patients.

As with the programmer of the system, we know little about the background and training of the operators in this case. But we can at least specify their responsibilities. They were responsible to their employers to operate the machine efficiently, getting all the scheduled treatment done on any particular day. They also had a responsibility to their employer to look after the machine so it could be maintained properly. Finally, they had a responsibility to their employer to operate the machine carefully and not to place patients in danger.

From published accounts of operators’ comments, from our interview of a Therac-4 operator, and from comments made in court documents, it seems clear that none of the operators felt they were placing their patients in any danger when they pressed the button. Thus, we can rule out intentional negligence. But what happened? Leveson suggests that the interface on the console made operators tolerant of error messages, and readily rewarded them for pressing the "proceed" button whenever a minor error appeared. The interface made no distinction between life-threatening errors and minor errors, except that major errors would not allow a "proceed." Given this, it is hard to see how the operators might be responsible for the errors, even though they were the ones to press the key.
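To make the interface point concrete, here is a minimal sketch of a console that does distinguish minor from life-threatening errors. It is an illustration only, not a reconstruction of the Therac-25 console: the names `Severity`, `Fault`, `console_message`, and `may_proceed`, and the fault codes and descriptions, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    MINOR = "minor"
    LIFE_THREATENING = "life-threatening"


@dataclass(frozen=True)
class Fault:
    code: int            # hypothetical numeric code
    description: str     # plain-language explanation shown to the operator
    severity: Severity


def may_proceed(fault: Fault) -> bool:
    """Only minor faults may be dismissed with the 'proceed' key."""
    return fault.severity is Severity.MINOR


def console_message(fault: Fault) -> str:
    """Build a message whose wording depends on the severity of the fault."""
    if fault.severity is Severity.LIFE_THREATENING:
        return (f"STOP: {fault.description} (code {fault.code}). "
                "Treatment suspended; do not proceed.")
    return (f"Notice: {fault.description} (code {fault.code}). "
            "Press proceed to continue.")


# Illustrative faults (invented for this sketch).
minor = Fault(12, "Dose rate slightly below prescribed value", Severity.MINOR)
major = Fault(99, "Beam energy does not match turntable position",
              Severity.LIFE_THREATENING)

for fault in (minor, major):
    print(console_message(fault), "| proceed allowed:", may_proceed(fault))
```

The design choice illustrated is that severity is visible in the wording of the message itself, so an operator who has grown used to dismissing minor notices still sees something different when the fault is dangerous.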

An interesting issue arises because of the current move among operators to become more professionalized. As operators become better trained, are certified, and are more aware of the workings of the machine, they gain prestige, but they also gain responsibility. As they become trained well enough to foresee such errors, their responsibility for them will increase.

Safety at the Group Level

Two organizations at this level need to be considered: the treatment facilities and Atomic Energy of Canada Limited (AECL).

Atomic Energy of Canada Limited

With regard to safety in this case, AECL’s responsibilities in making a medical linear accelerator are owed to a range of parties: its shareholders, its employees, the governments of Canada and the United States, the facilities that bought the machine, and finally the patients who were treated by it. Responsibilities to shareholders and employees are similar, and for this analysis will be considered the same.

Before we look at these specific responsibilities, we will need to understand some of the technical issues involved in the analysis of a system for safety. In this instance, technical knowledge is required to make ethical judgments.

AECL claimed to have done a safety analysis of its machine, but in fact the analysis only shows the likelihood of the system failing because a part wears out. There was apparently no systematic search for design flaws in the software until after the FDA required an analysis. Unfortunately, a system can be highly reliable and yet reliably kill people because of a design flaw. This confusion of reliability analysis with safety analysis is a critical failing on the part of AECL.

Some indication of the motivations behind AECL’s inadequate safety analyses can be gleaned from the way AECL appeared to use probabilities in its analysis. These probabilities seemed to be assigned to quantify and to prove the safety of the system, rather than to identify design flaws. For example, after redesigning the logic to track the microswitches that indicated the position of the turntable, AECL apparently used a sort of Fault Tree Analysis to assert that the safety of the system had been improved by at least five orders of magnitude. This astonishing claim of improvement was applied to the safety of the entire machine. Using probabilities from a Fault Tree Analysis in this way can effectively hide critical design flaws by inflating the perception of reliability and discouraging any further search for design flaws. This hiding of design flaws was a tragic, if unintentional, side effect of the improper use of this analysis.
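The arithmetic behind this kind of claim can be sketched in a few lines. The example below is only an illustration under stated assumptions: every probability is invented, the tree is far simpler than any real analysis, and nothing here reproduces AECL's actual figures. It assumes independent basic events, so an AND gate multiplies probabilities and an OR gate combines them as one minus the product of the complements.

```python
def and_gate(*probs: float) -> float:
    """Probability that all independent basic events occur."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def or_gate(*probs: float) -> float:
    """Probability that at least one independent basic event occurs."""
    none_occur = 1.0
    for p in probs:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur


# Hypothetical per-treatment probabilities (assumptions for illustration).
P_SENSOR_FAIL = 1e-4   # one position check fails through wear-out
P_DESIGN_FLAW = 1e-5   # a software design flaw is triggered

# Wear-out-only view: adding a second, redundant check looks dramatic.
hw_before = P_SENSOR_FAIL
hw_after = and_gate(P_SENSOR_FAIL, P_SENSOR_FAIL)
hw_improvement = hw_before / hw_after

# Whole-system view: the design flaw is a separate branch under an OR gate,
# and redundancy in the position checks does nothing to reduce it.
total_before = or_gate(hw_before, P_DESIGN_FLAW)
total_after = or_gate(hw_after, P_DESIGN_FLAW)
total_improvement = total_before / total_after

print(f"wear-out branch improvement: about {hw_improvement:.0f}x")
print(f"whole-system improvement:    about {total_improvement:.0f}x")
```

With these invented numbers the wear-out branch improves by roughly four orders of magnitude, while the whole system improves by only about one, because the unmodelled design flaw dominates the top event. Quoting the first figure as if it described the entire machine is the kind of inflation described above.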

Thus, in failing to look systematically for design flaws in its software, AECL left itself (and its employees and shareholders) open to liability claims from injured consumers. This is clearly also a failure of its responsibility to patients and to the facilities that bought the Therac-25 machine and were assured there was no way it could hurt patients. This failure must be viewed in the light of prevailing standards (or lack thereof) in system safety at the time of the design and release of the Therac-25.

The Cancer Treatment Facilities

The cancer treatment centers are primarily consumers of a product that is tested, maintained, and certified by others. This product was sold to them with assurances that it could not hurt patients. The facilities have neither the responsibility nor the capability to independently check these systems for safety.

But they do have responsibility for the safe operation and low-level maintenance of the machines once they are in operation. It is clear that at least one facility fell down in this respect. In the first Tyler accident, the video monitor to the room was unplugged and the intercom was out of order. This would not have been a problem if there were no accidents, but there were. One difficulty with the safe operation of systems is that standard maintenance can become tedious and may not seem a necessary component in the safe operation of a regularly used system. In this case, an individual might have been spared a second overdose if the basic communication systems had been working.

We should note, however, the extraordinary efforts of the medical physicist at Tyler in determining the cause of the overdose. This individual effort was supported by the Tyler facility and made possible by the facility’s decision to have a full time physicist on staff. Some evidence of this support comes from the facility’s decision to report the accident to the FDA even though there was no requirement that they do so. Note that the facility decided that their responsibility extended beyond the requirements of the law.

Thus, most facilities had relatively minimal responsibilities in this case and most seemed to fulfill them. The facilities had little power to resolve the problem and depended on AECL and on the FDA’s approval process to protect them and their patients. Perhaps in this dependence they were too optimistic, but it is difficult to see what other choices they might have had.

Safety at the National Level

At the time of the Therac-25 accidents, the Center for Devices and Radiological Health (CDRH) of the FDA was responsible for the oversight of the immense market in radiation-based therapy and diagnostics. As we have seen, most devices for the market (94% in 1984) were approved by "pre-market equivalence" and thus not subjected to stringent testing. The CDRH could not have handled the load of testing all these devices.

Since the rules for the FDA are set by Congress, the FDA’s rules need to be analyzed from the perspective of the responsibilities of Congress. But the FDA’s implementation of those rules is under its own control. Thus we can ask whether the CDRH (as a center in the FDA) should have allowed the Therac-25 to be approved under pre-market equivalence. Without more information this is difficult to determine. The CDRH did seem to follow the case vigorously once it became aware of the Tyler accidents, though there is some evidence that it was reluctant to quickly halt the use of the Therac-25 when the problems became evident. This reluctance may have stemmed from its responsibility not to place an undue burden on manufacturers through excessive caution regarding a product. This tension between responsibilities to manufacturers and industry and responsibilities to patients is always present in decisions by the FDA. Hindsight makes this one seem easy to decide.

One of the problems in this case is that the FDA depended on AECL to notify it of accidents that had occurred. It did not hear directly from the hospitals when the accidents happened. AECL had, at best, a mixed record of notifying the FDA of problems. So perhaps facilities should have been required to report accidents directly (they are today). But the FDA could not make this requirement; it could only enforce existing law. Thus, perhaps it was a responsibility of Congress to enact this law. The counter-argument is that Congress should allow the market to work out these issues. But in this instance, at least, the market was too slow to save the individuals who were killed or injured.

The entanglement of different ethical issues becomes very clear at this level of analysis. Different constituencies will value different things (e.g., personal privacy vs. business freedom). These choices among different values are just as severe at the other levels (e.g., an operator’s responsibility to employer and to patient) but are not as easily seen by the outside observer. Choice and balance among these values become inescapable, however, at the political level.

Safety at the Global Level

Communication between the Canadian Radiation Protection Bureau and the FDA seemed to work reasonably well in this case. These two agencies had responsibilities to their respective governments, to industry in their countries, and to patients in their countries. AECL’s communication with the FDA did not seem to be hampered by its international character. However, this is a case of two relatively similar countries and cultures interacting with each other. Similarities in legal standards and in government oversight made this case easier. This might have been an even less happy story had the two countries differed widely in business culture or legal system.
