It comes back like a dog with a stick

Tuesday, March 19th, 2019

James Thompson explains the cognitive task of flying a plane like the Boeing 737 Max:

Using James Reason’s explanatory framework (Human Error, 1989), pilots flying the Boeing 737 Max 8 and encountering the opaque workings of MCAS (manoeuvring characteristics augmentation system) are carrying out intentional but mistaken actions: they are trying to pull a plane out of a dive. The plane is in fact climbing away from an airport after takeoff, but a failure in an angle of attack indicator has convinced MCAS that it is in a stall condition. (For extra money, you can buy a second angle of attack indicator, and apparently these two airlines did not do so. For safety, two should be standard at no extra cost.) Accordingly, MCAS puts the nose of the plane down to avoid the stall. The pilot reacts by pulling back the yoke so as to resume upward flight, cognizant of the plain fact that unless he can gain height he is going to die, together with his passengers. His action satisfies MCAS for a short while, and then it comes in again, helpfully trying to prevent a stall (because pulling on the yoke is not enough: the whole tail plane has to be “trimmed” into the proper angle). Pilots are doing what comes naturally to them.
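To make that loop concrete, here is a minimal toy sketch of the behaviour described above: a system that commands nose-down trim whenever a single angle-of-attack reading looks like a stall, backs off briefly when the pilot trims against it, and then re-engages. This is not Boeing's flight control code; the names, thresholds, and timings are illustrative assumptions.

```python
# Toy illustration of the loop described above. NOT real MCAS code:
# every name, threshold, and timing here is an illustrative assumption.

STALL_AOA_DEG = 15.0         # assumed angle-of-attack reading treated as "stall"
NOSE_DOWN_INCREMENT = 2.5    # assumed units of nose-down stabiliser trim per activation
PAUSE_AFTER_PILOT_INPUT = 5  # assumed seconds of quiet after the pilot trims against it

def trim_step(aoa_reading_deg, seconds_since_pilot_input, stabiliser_trim):
    """One control step: trim nose-down if the single sensor reads a stall."""
    if aoa_reading_deg > STALL_AOA_DEG and seconds_since_pilot_input >= PAUSE_AFTER_PILOT_INPUT:
        # The system sees only this one reading: no altitude, no phase of flight.
        stabiliser_trim -= NOSE_DOWN_INCREMENT
    return stabiliser_trim

# A vane that has failed high keeps re-triggering the nose-down command every time
# the pause bought by the pilot's counter-trim expires: the dog brings the stick back.
trim = 0.0
for cycle in range(4):
    trim = trim_step(aoa_reading_deg=22.0, seconds_since_pilot_input=5, stabiliser_trim=trim)
    print(f"cycle {cycle}: stabiliser trim = {trim}")
```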

MCAS is diligently doing as instructed, but is badly designed, relying as it does in this case on a single indicator rather than two, which could identify and resolve discrepancies, and has no common sense about the overall circumstances of the plane. The pilots know that they have just taken off. MCAS, as far as I know, does not “know” that. Again, as far as I know, MCAS does not even know what height the plane is at. (I know that this is not real Artificial Intelligence, but I used it as an illustration of some of the problems which may arise from AI in transport uses.) The pilots respond with “strong-but-wrong” actions (which would be perfectly correct in most circumstances) and MCAS persists with “right-but-wrong” actions because of a severely restricted range of inputs and contextual understanding. Chillingly, it augments a sensor error into a fatal failure. A second sensor and much more training could reduce the impact of this problem, but the inherent instability of the engine/wing configuration remains.
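The point about a second indicator can be sketched in the same toy terms: with two vanes, a simple disagreement check can at least refuse to act on data the sensors themselves contradict. The agreement threshold below is an assumption for illustration, not a certified value.

```python
# Toy cross-check of two angle-of-attack vanes. The disagreement limit is an
# illustrative assumption, not a certified figure.

AOA_DISAGREE_LIMIT_DEG = 5.0

def consensus_aoa(left_vane_deg, right_vane_deg):
    """Return an agreed value, or None to signal 'do not act automatically'."""
    if abs(left_vane_deg - right_vane_deg) <= AOA_DISAGREE_LIMIT_DEG:
        return (left_vane_deg + right_vane_deg) / 2.0
    return None  # discrepancy: inhibit automatic trim and alert the crew

print(consensus_aoa(4.0, 4.5))   # 4.25: both vanes agree, a plausible climb-out value
print(consensus_aoa(22.0, 4.5))  # None: one vane has failed high, take no automatic action
```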

Using Reason’s GEMS system, the pilots made no level 1 slips or lapses in piloting. They had followed the correct procedures and got the plane off the ground properly (once or twice a pilot forgets to put the flaps down at take-off or the wheels down at landing). I think they made no level 2 rule-based errors, because their rule-based reactions were reasonable: they considered the local state information and tried to follow a reasonable rule: avoid crashing into the ground by trying to gain height. They could be accused of a level 3 error: a knowledge-based mistake, but the relevant knowledge was not made available to them. They may have tried to problem-solve by finding a higher level analogy (hard to guess at this, but something like “we have unreliable indicators” or “we have triggered something bad in the autopilot function”) but then they must revert to a mental model of the problem, and think about abstract relations between structure and function, inferring a diagnosis, formulating corrective actions and testing them out. What would that knowledge-based approach entail? Either remembering exactly what should be done in this rare circumstance, or finding the correct page in the manuals to deal with it. Very hard to do when the plane keeps wanting to crash down for unknown reasons shortly after take-off. Somewhat easier when it happens at high altitudes in level flight.

At this point it needs to be pointed out that there is some confusion about how easy it was to switch off MCAS. All the natural actions with the yoke and other controls turn it off, but not permanently. It comes back like a dog with a stick. Worse, it will run to collect a stick you didn’t throw. The correct answer, from the stab trim runaway checklist, is to flick two small switches down into the cut out position. Finding them may be a problem (one does not casually switch things off in a cockpit) and, for those not warned about the issue, the time taken to find out the required arcane procedure may be insufficient at low altitudes, such as after take-off. Understandably, the pilots did not grasp the complexity of this system. They had a secret co-pilot on board, and hadn’t been told.
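The difference between fighting the system and actually cutting it out can be shown in the same toy terms. In the sketch below (again an illustration, not the real logic; the pause length is an assumption), yoke and trim inputs only start a countdown, while the cutout switches remove automatic trim authority for good.

```python
# Toy model of why counter-trimming only pauses the behaviour while the
# stab trim cutout switches end it. Names and timings are assumptions.

class StabTrimChannel:
    def __init__(self):
        self.cutout = False        # the two small switches flicked to the cut out position
        self.pause_cycles = 0      # countdown started by the pilot's counter-trim

    def pilot_counter_trim(self):
        """Pilot trims against the nose-down motion: a temporary reprieve only."""
        self.pause_cycles = 5      # assumed pause before automatic trim resumes

    def set_cutout(self):
        """Stab trim runaway checklist: both switches down to CUTOUT."""
        self.cutout = True         # no electric trim at all from now on

    def automatic_nose_down(self, faulty_stall_signal):
        if self.cutout:
            return False            # it cannot come back
        if self.pause_cycles > 0:
            self.pause_cycles -= 1  # waiting out the pilot's input
            return False
        return faulty_stall_signal  # otherwise it re-engages

channel = StabTrimChannel()
channel.pilot_counter_trim()
print([channel.automatic_nose_down(True) for _ in range(7)])  # five False, then True, True
channel.set_cutout()
print(channel.automatic_nose_down(True))  # False: it finally stays away
```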

Comments

  1. Kirk says:

    I’m gonna go out on a limb and predict that the actual cause of failure on these flights is going to prove to be something other than these early hot takes on the whole thing, and more probably due to the fact that the pilots who were flying them were not actually pilots, but glorified button-pushers that were out of their depth as soon as something unscripted happened.

    This is the essential problem with all of the attempts to create “expert systems” that can substitute for actual experts. The engineers can only design for what they anticipate to have happen, and the environment is always going to throw unexpected circumstances at the system. Without a real pilot like Sullenberger at the helm, wellllll… Shit’s going to go down. As in, nose-dive into the terrain “down”.

  2. Alistair says:

    Agreed.

    I work with Engineers in complex systems. They consistently over-estimate system reliability in the field despite extensive testing.

    The one thing that strikes me across their work is that they have little or no concept of “something we haven’t thought of” beyond the test environment; the ‘unknown unknowns’ of Rumsfeldian lore. They certainly have no respect for such. Entirely new faults are caused by their last attempt to fix another fault or improve performance.

  3. Alistair says:

    One additional, related point; consider Germanwings 9525
    https://en.wikipedia.org/wiki/Germanwings_Flight_9525

    Everyone on that plane was killed by a “safety” improvement: the practice of locking cockpit doors to guard against terrorism. All it did was concentrate risk in another part of the system.

  4. Bob Sykes says:

    Boeing is allowed to self-certify, meaning there are no feds double checking what the Boeing engineers did.

  5. Kirk says:

    Not convinced the Feds would have done any better than Boeing, TBH. The actual problem here is accountability and poor design judgment. Crucify Boeing for what they are at fault for, here, hold the airlines responsible for their part, and leave it at that.

    The more responsibility and power that you lay off on government, the more diffuse actual accountability becomes. Boeing may have made poor design choices, here; let them pay the price.

    That said, I would not rule out the possibility of industrial espionage. This strikes me as something that is subtle enough to be the result of a deniable attack along the lines of the infamous Stuxnet. I would be very curious to know what sort of safeguards are in place, particularly in Indonesia and Ethiopia, with regards to access to avionics. The location might have a lot to do with the crashes; lack of similar failures here in the US might be related to better security for the planes…

  6. CVLR says:

    Bob Sykes: “Boeing is allowed to self-certify, meaning there are no feds double checking what the Boeing engineers did.”

    Say what now?!

  7. Sam J. says:

    They should have tied into the inertial navigation system (INS). If the plane is falling, the INS would know, and it would provide another input that is already on the plane. The INS would know the angle of the plane and the airspeed in any direction. I have no idea what the standard is in planes, but I know some have ring laser gyros that are solid-state, bulletproof, and not likely to err.
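Sam J.’s suggestion amounts to a sanity check of the stall indication against inertial data the aircraft already carries. A rough sketch of that idea (the figures and names are assumptions for illustration, not any certified logic):

```python
# Rough sketch of cross-checking an angle-of-attack "stall" indication against
# the inertial picture. All numbers and names are illustrative assumptions.

def stall_signal_plausible(aoa_deg, pitch_deg, vertical_speed_fpm, altitude_agl_ft):
    """Reject a stall reading that contradicts what the inertial sensors report."""
    if altitude_agl_ft < 1000 and vertical_speed_fpm > 0:
        # Climbing away just after take-off: treat a sudden extreme AoA reading
        # with suspicion rather than commanding nose-down trim.
        return False
    if aoa_deg - pitch_deg > 10 and vertical_speed_fpm > 500:
        # Crude consistency test: AoA far above pitch attitude while still climbing.
        return False
    return True

print(stall_signal_plausible(aoa_deg=22, pitch_deg=12,
                             vertical_speed_fpm=1500, altitude_agl_ft=800))  # False
```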
