Plant personnel learn that oversights in proper maintenance, safety, and operations protocol can be costly
Shortly after midnight on a cold winter morning in the midwestern United States, an extraordinary sequence of unfortunate events culminated in an explosion that destroyed the boiler of a 450-MW coal- and gas-fired power plant, resulting in physical damages and lost revenues estimated to be in excess of $500 million. The sequence of events began years earlier and could have been prevented by any number of reasonable interventions by plant management, plant operators, or plant maintenance personnel. A retrospective analysis of the circumstances and events leading up to the catastrophic explosion provides valuable insight into the potentially disastrous results of failures in management, operations, and maintenance, and in the communication among them.
The process actually began decades earlier when the steam blowdown tank, into which boiler steam is released in the event of an emergency shutdown, was undersized. As a result, the outlet pipe from the steam blowdown tank carried steam instead of condensed water, resulting in damage to the concrete outlet pipe.
A few years before the explosion, a PVC wastewater line was installed in close proximity to the blowdown tank outlet pipe. Some time after installation, the PVC pipe collapsed from the heat of the blowdown outlet pipe, causing sewage backups. Repair personnel ultimately identified the collapsed pipe and replaced a section, but didn't determine the root cause of the pipe collapse.
A few days before the explosion, the wastewater line collapsed again, leading to a toilet backup in the power plant control room the day before the explosion. The contractor hired to clear the toilet backup was unaware of the historical problem and was unable to obtain from plant personnel a copy of any drawings of the wastewater system. Part of the wastewater system was a pumping station (lift station) that pumped through the same damaged wastewater line. Because of this arrangement, a check valve had been installed in the leg of the line extending to the control room toilet. As the sewer contractor attempted to clear the clog, his jetting tool became lodged in the check valve, holding it open. No plant personnel locked out the lift station pump while the check valve was blocked open. A short time later the automatic pump turned on and, because of the open check valve, discharged an estimated 200 gallons of raw sewage through the toilet pipe onto the third-story control room floor.
The electrical signal cables extending from the displays, controls, and indicators in the control room penetrated the floor of the control room extending through the second level to the computers and programmable logic controllers (PLCs) on the first level. Part of the raw sewage that was discharged on the control room floor drained along the cables into the cabinets of the PLC that controlled the fuel safety system (Photo 1). The fuel safety system (FSS) is a highly integrated control system, required by the boiler code (NFPA 85C at that time), that monitors and controls the conditions in the boiler and the status of blowers, igniters, and valves, enforcing specific safety protocols on the system, whether the boiler is firing or not. At the time of the water incursion, the boiler wasn't firing, but the water damaged a number of PLC components, causing the PLC to go into a fault condition. Although the FSS had been contaminated with raw sewage, the plant operators didn't close or lock-out/tag-out the main gas valve for the boiler, as their own procedures required.
The electrical maintenance personnel who were enlisted to assist with the system cleanup and repairs received the call at the end of their eight-hour shift. For various reasons, management assigned to the task three technicians with no prior experience in working on this FSS. The technicians' response to the flood was to wipe the affected devices and blow them dry with compressed air, rather than to follow the recommendations of the National Electrical Manufacturers Association (NEMA) publication “Guidelines for Handling Water Damaged Electrical Equipment,” which would have required them to replace the components and/or return them to the manufacturer for condition assessment. The affected FSS devices included relays, circuit breakers, input/output (I/O) interface cards, and I/O controller cards. One of the devices that failed was part of the electrical path called the “hardwire trip” circuit, which sent a continuous “close” signal to the main gas valve for the plant. The electrical maintenance technicians apparently reset the hardwire trip circuit in the course of their troubleshooting efforts. Although this didn't cause the gas valve to open at that time, it did terminate the “close” signal, which would have overridden any inadvertent “open” signal.
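The override role of the hardwire trip circuit can be illustrated with a minimal sketch. This is a simplified boolean model and not the plant's actual circuit; the function and parameter names are hypothetical, and the only behavior taken from the account above is that an active “close” signal overrides any “open” signal.

```python
def main_gas_valve_opens(plc_open_signal: bool, hardwire_close_active: bool) -> bool:
    """Simplified model: the valve opens only when an "open" signal is present
    AND the hardwire trip circuit is not asserting its continuous "close" signal.
    (Assumed logic for illustration; names are hypothetical.)"""
    return plc_open_signal and not hardwire_close_active


# Hardwire trip intact: an inadvertent "open" signal is overridden.
print(main_gas_valve_opens(plc_open_signal=True, hardwire_close_active=True))

# Hardwire trip reset ("close" signal terminated), no "open" signal yet:
# the valve stays shut, but the protective layer is gone.
print(main_gas_valve_opens(plc_open_signal=False, hardwire_close_active=False))

# Hardwire trip reset and an inadvertent "open" signal arrives: the valve opens.
print(main_gas_valve_opens(plc_open_signal=True, hardwire_close_active=False))
```

In this model, resetting the circuit changes nothing visible at first, which is consistent with the valve remaining closed at the time of the reset.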
Components of the PLC for the fuel safety system were also inundated with the raw sewage. The I/O interface cards provide the electrical connection point for signals coming in from sensors and for signals going out to actuators that control equipment in the plant (Photo 2). The controller cards provide the communications link between the devices in the field and the computer or processor that implements the logic and analysis in the system. Given that there were several racks of I/O cards, the communication between the processor and the field devices is made unique by an address assigned to each of the I/O controller cards. Rack 1 and Rack 2 were the two racks most severely affected by the water incursion.
The I/O racks were still in a faulted condition after the drying process was completed. The technicians proceeded to troubleshoot interface cards in Rack 1, and they eventually succeeded in clearing the faults on Rack 1 by replacing one input card and one output card. Clearing the fault made it possible for communication to resume between the field devices and the processor. Among the devices controlled through Rack 1 were the main gas valve and various control panel alarms in the control room.
After clearing the fault on Rack 1, the technicians worked to clear the faults on Rack 2. They replaced the Rack 2 I/O controller card with a new card, on which they set the proper address. This failed to clear the faults, so they swapped the controller cards of Rack 1 and Rack 2, without changing any addresses, with the intent of verifying the functionality of the new controller card. When Rack 1 powered up without fault, they concluded that the new controller card was functional, but they didn't return the controller cards to the original positions for which they had been addressed. The effect of powering Rack 1 with the incorrect address was to cause the processor to misinterpret the incoming signals from Rack 1 as if they came from the field devices connected to Rack 2, and to misdirect to Rack 1 the outputs intended for the devices connected to Rack 2. As a result, a signal intended to ensure that a valve remained closed was redirected to open the main gas valve about three hours before the explosion. Natural gas began to flow into the burner manifolds, but without the burner valves open, no gas flowed into the boiler itself.
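The misdirection described above can be sketched in a few lines. This is a toy model under stated assumptions, not the plant's actual PLC configuration: the channel numbers, the device names, and the idea that the other valve was held closed by an energized output are all hypothetical. The only facts taken from the account are that the processor addresses each rack by the address set on its controller card and that the main gas valve was wired to Rack 1.

```python
# Physical wiring: the field device on each channel of each rack.
# Channel numbers and device names are hypothetical.
WIRING = {
    1: {0: "main_gas_valve_open_solenoid"},
    2: {0: "other_valve_hold_closed_solenoid"},
}

# Processor output table, keyed by the ADDRESS it believes each rack has.
OUTPUT_TABLE = {
    1: {0: False},  # leave the main gas valve's open solenoid de-energized
    2: {0: True},   # energize the output that holds another valve closed
}


def route_outputs(card_addresses):
    """Return {device: energized}, given {physical_rack: address on its controller card}.

    The processor writes by address; the signal lands on whatever device is
    wired to the physical rack whose controller card carries that address.
    """
    energized = {}
    for rack, address in card_addresses.items():
        for channel, state in OUTPUT_TABLE[address].items():
            energized[WIRING[rack][channel]] = state
    return energized


correct = route_outputs({1: 1, 2: 2})  # each card in the rack it was addressed for
swapped = route_outputs({1: 2, 2: 1})  # cards exchanged without changing addresses

print("correct:", correct)
print("swapped:", swapped)
```

With the cards in their proper racks, the main gas valve output stays de-energized. With the cards exchanged, the "hold closed" output meant for the Rack 2 device lands on the main gas valve channel in Rack 1, which is the kind of redirection the investigation identified.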
In the control room, at the time that the electrical technicians powered up Rack 1 with the Rack 2 controller card, several indications of potential catastrophe were observable by the operations personnel, although they later testified that they either didn't observe or didn't believe the indications that were present. Due to the misdirected signals, several new alarms were annunciated on the PLC screen and the alarm panels. The unintentional opening of the main gas valve caused several pressure gauges and flow indicators, both on the main indicator board and in the computerized data acquisition system, to show the unintended presence of gas. However, because the operators failed to act on the indications of the dangerous condition, no action was taken to prevent the explosion.
Despite the replacement of the controller card for Rack 2, the faults on that I/O rack still weren't cleared. As the electrical maintenance team continued their troubleshooting process, they replaced additional interface cards in Rack 2, finally clearing the fault about 45 minutes after putting the incorrect controller card in Rack 2. As with the error in Rack 1, the incorrectly addressed controller card in Rack 2 caused the PLC to misinterpret and misdirect signals. The result of powering up Rack 2 with the card addressed for Rack 1 was to open one of the burner valves on one side of the boiler and to energize a spark igniter on the opposite side of the boiler. As gas was released into the boiler, the mixture of air and gas finally reached a critical point at the igniter about two and a quarter hours later, igniting an explosion that completely destroyed the boiler and damaged other plant systems.
The ultimate effect of the explosion on the utility was enormous. Nobody was killed or injured, but the destroyed boiler and other plant systems had to be rebuilt. Besides the cost of rebuilding the plant, the utility also had to purchase replacement power on short-term contracts with widely varying prices. The ensuing litigation found that the explosion was primarily caused by the utility. Management bore some responsibility for its failure to recognize the potentially catastrophic effect of the wastewater pipe collapse and for assigning inadequately trained technicians to carry out repairs without engineering support. Operations bore some responsibility for its failure to carry out and enforce procedures like main gas valve lock-out/tag-out during fuel safety system repairs. Operations also failed to recognize and respond to the flow of natural gas and an extensive set of unexpected alarms. Maintenance failed to follow appropriate procedures when carrying out their troubleshooting and repairs, and failed to make sure that the gas system was locked out and tagged out prior to servicing the fuel safety system.
The explosion was the result of a unique set of circumstances and the accumulated effect of many mistakes. Nevertheless, the explosion was entirely preventable. At any point in the process, management, operations, or maintenance could have used care, thought, and adherence to well-designed procedures to interrupt the chain of events that ended in disaster.
Palmer is a senior electrical engineer and Danaher is a senior mechanical engineer, both at Knott Laboratory in Centennial, Colo.