It pays to be paranoid
Richard Parry Jones of Ford talks to Dean Palmer about how the company's success has been partly down to finding the right balance between design innovation and reliability
"Finding failure modes early in technology and product development is the most important challenge we face as engineers. Everything else we do is a consequence of our ability to do this." So says Richard Parry Jones, chief technical officer and head of global product development at Ford Motor Company. He continued: "We should assume a failure mode will occur unless there is compelling evidence to the contrary. Do not travel hopefully - you will be disappointed. Failure modes in the field are simply an inevitable consequence of finding failure modes late in the development cycle. It pays to be paranoid."
And if anyone should know about product reliability and design innovation, Parry Jones is top of the list. Not only is he responsible for every technical change to the design of a Ford motor vehicle, reporting directly to the Board, but he is also in charge of more than 30,000 engineers, scientists and designers across the complete range of Ford, Lincoln, Volvo, Jaguar, Mazda, Aston Martin and Land Rover vehicles. In fact, you could say that he has one of the most demanding jobs in the automotive business.
On the subject of reliability versus design innovation, he has some strong views: "UK manufacturing is very good at the innovation stuff but not so good at the reliability side of things. People will not, over the long haul, accept things that are not reliable, however innovative they are. Customers demand reliability, safety and, in the last ten years, environmentally-friendly products. They want to be able to say: 'If I buy this product, it's going to be well engineered, it's going to be reliable, safe, environmentally responsible and it's produced by people who I can trust.'"
The key is failure modes, says Parry Jones, because these slow down technological development. So Ford defines reliability simply as 'failure mode avoidance'. He explained: "Failure modes should therefore be treated as the fundamental quantity in engineering and technical development. They need to be found and counted. It is not the probability of the vehicle breaking down during the useful life of the vehicle or anything like that. What would we do with information like that? First of all, the calculation's probably wrong because most of the assumptions that you need to calculate the probability of failure are invalid: they can't be estimated accurately. Secondly, once you've done that calculation, what do you do with it? Not very useful in our experience. If we use failure mode avoidance, it becomes something we can 'operationalise' and action."
According to Parry Jones, there are five sequential ways in which engineers can create failure modes: "In design and development, we create failure modes. Therefore, our job as engineers is to find them as soon as possible and put countermeasures in place. We can create failure modes at any time in the project. We can create them at the beginning, when we define the mission. These are failure modes due to bad planning - incapability or incapacity. Or we can create failure modes in the definition phase. This would be due to poorly defined requirements. Somewhere between 70 and 80% of all software-related failure modes come into this category.
"A third area where we can create failure modes is characterisation. We can create failure modes by formulating our design solutions badly. For example, having too many moving parts, or making it too difficult to make at a high rate or too difficult to service.
"We can also create failure modes during the optimisation phase. Those we would describe as failure modes of technical execution. Not being able to come up with countermeasures can be for one of two basic reasons. One is by making a mistake. This means we know how to fix the problem, we know how to avoid the failure mode, but we don't bring the knowledge to bear. That's a mistake. The countermeasure for a mistake is simple vigilance. So we make sure we apply what we know to avoid the failure mode.
"The other possibility in the optimisation phase is sensitivity to noises. The countermeasure here is robustness. The reason we distinguish between these two is that robustness failures are much more difficult to detect than mistake-related failures. So you need some very good techniques and some good thinking to examine the sensitivity of the design to noise factors and to design countermeasures where necessary.
"The last area where we can create failure modes is the verification phase, where we can create failure modes due to non-representative testing. Typically what can happen here is that the test is poorly designed and will not contain all of the noise factors, so we will fool ourselves by concluding that we've got a great design because it didn't fail in test. The problem is we didn't design our test to try to make it fail."
But what advice does Parry Jones provide for other manufacturers looking to solve their reliability issues? "On optimisation failure modes, Ford uses health charts. A health chart is a bit of a how-to guide. You know what the performance specification has to be. A health chart gives you a checklist; if you meet the health chart checklist you will probably meet the performance specification. If you don't conform to the health chart, it's a mistake. The countermeasure for mistakes is primarily a question of vigilance. The only way you can counter mistakes is to not make them. To not make them, you need vigilance and vigilance, of course, requires a bit of structure. In large organisations, you need to support individual vigilance with, if you like, collective vigilance. We're all used to continuous learning, continuous research into best practices including design guidelines and standards, but with an enforced deviation process. Deviation should be the exception, not the rule, and should be the subject of extremely intense scrutiny, to make sure the deviation is valid and does not place the delivery of reliability at risk. We have a rule at Ford. If you're going to sign a deviation, you can only sign it if either the standard's going to be changed, because it has been found to be wrong or in need of modification due to experience, or you plan to conform. There is no middle ground, no permanent exception," he added.
Another way to minimise the occurrence of mistakes within a large company, said Parry Jones, is to hold peer group design reviews, bringing people with experience and expertise to bear on a particular design problem. He explained: "People come in with a fresh-eyes view and see mistakes that the person who designed it was perhaps not able to spot. Those can be very effective."
On to sensitivity to noise, he stated that, in automotive engineering, noises can occur due to a number of factors. "We're trying to produce cars at a rate of about one a minute - that's what those of us in the industry generally try to do. That's pretty tough and puts a lot of demands on our processes and our people. We have an uncontrolled demand space. This is not a nuclear power station or a jumbo jet where we can predict the demand space with a high degree of accuracy. We have millions of customers doing millions of different things with their vehicles and it's impossible for us to measure and monitor everything they do. We also have a lot of product complexity including product sharing across platforms. We may well think a component is very reliable because it's been reliable in vehicle 'A'. When we transpose it to vehicle 'B' it has a different surround space and different noise factors which expose new failure modes. Unless we understand them and take countermeasures, these will result in reliability problems. So the major thing to watch out for here is the interaction between systems. This is the typically overlooked noise factor."
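The carry-over risk Parry Jones describes, where a component proven in vehicle 'A' meets new noise factors in vehicle 'B', can be illustrated with a small sketch. The Python below is a hypothetical example, not Ford's method: the `component_stress` model, the two noise spaces and the stress limit are all invented for illustration. It sweeps every combination of noise-factor levels for each vehicle and lists the conditions under which the same component fails.

```python
from itertools import product

STRESS_LIMIT = 100.0  # assumed failure threshold for the toy component

# Hypothetical noise spaces: the same component sees different
# surroundings in each vehicle. All values are illustrative.
NOISE_SPACE = {
    "vehicle_A": {"temperature_c": [-30, 20, 60], "load_factor": [0.8, 1.0, 1.1]},
    "vehicle_B": {"temperature_c": [-30, 20, 110], "load_factor": [0.8, 1.0, 1.3]},
}


def component_stress(temperature_c, load_factor):
    """Toy response model: stress rises with load and with heat above 20 C."""
    thermal = 1.0 + max(temperature_c - 20, 0) / 200
    return 70.0 * load_factor * thermal


def full_factorial(levels):
    """Return every combination of noise-factor levels as a dict."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def failing_conditions(levels):
    """Return the noise conditions under which the stress limit is exceeded."""
    return [c for c in full_factorial(levels)
            if component_stress(**c) > STRESS_LIMIT]


# Sweep each vehicle's noise space with the same, unchanged component.
failures = {name: failing_conditions(levels)
            for name, levels in NOISE_SPACE.items()}

for name, failed in failures.items():
    print(f"{name}: {len(failed)} failing condition(s)")
    for condition in failed:
        print(f"  {condition}")
```

In this toy setup the component survives every condition in vehicle 'A' but fails under some of the hotter, more heavily loaded conditions in vehicle 'B', even though nothing about the component itself has changed. A full-factorial sweep is only one simple way to cover a noise space; it grows quickly as factors are added.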
He also said that over-engineering was another way of overcoming the problems. "This is often what companies do in civil engineering and nuclear engineering - you don't want to mess about so you make it very safe. If you do that in the car industry though, you end up with very expensive cars that are far too heavy and nobody would buy them. So we need to utilise the techniques of robustness to find an efficient solution to reliability. Weight, in turn, drives many of the negative characteristics of a vehicle. So, in the context of product development, the process should begin by creating the potential failure modes. These are inevitable of course because of the complexity of cars and the entropic state of our organisations. Secondly, find the potential failure modes, discover them and adopt countermeasures."
"The main challenge is to minimise the time between creating the potential failure mode and finding it. You then have more time to develop the countermeasure. The total amount of time available in time to market is of course finite. Customers won't wait. Reliability problems in the field which the customers recognise and the media write about are simply the failure modes that this process didn't identify. They're the ones that got away. The customers found them, not the engineers. That's very undesirable. There are other ways of finding them. Lots of late engineering changes are a good indication of ones that got away. The consequence is a lack of engineering attention while everyone fixes the last lot of problems, so the cycle tends to repeat itself."