Key Takeaways

  • Learn about Predictive Maintenance Systems (PMS) to monitor for future system failures and schedule maintenance in advance
  • Explore how you can build a machine learning model to do predictive maintenance of systems
  • Walk through machine learning process steps such as feature engineering, model training, model evaluation and model improvement.
  • See a sample application that uses a NASA engine failure data set to predict engine failure with classification models.

Engine monitoring systems involve using sensors placed in various locations in an aircraft engine to gather information about the engine’s performance.  The sensors provide real-time information to pilots on the operation of the engines and capture data for analysis of the performance of the engine over time.  The data captured reveals important information about the health of the engine.  For example, sensors will monitor how much fuel it takes to make a set amount of power.  Increases over time in the amount of fuel consumed would indicate a degrading of the efficiency of the engine, which means the engine is more expensive to operate and it will need maintenance to restore its efficiency.

Sensors can also detect impending failures and notify both the crew and ground stations. By combining the data from these sensors with advanced analytics, it’s possible to both monitor the aircraft in real time and predict the remaining useful life of an engine component so that maintenance can be scheduled in a timely manner to prevent mechanical failures.

In this article, we explore how to predict the health status of a turbofan aircraft engine from the sensor values collected from various parts of the engine.

A typical aircraft turbofan engine

The turbofan engine is a modern gas turbine engine widely used in aircraft. NASA has published a data set for predicting failures of turbofan engines over time.

 Data set:

The data set includes a time series for each engine. All engines are of the same type, but each starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. There are three operational settings (Setting 1, 2, 3) that can be used to change the performance of each engine. Of the many sensors attached to each engine, the data set includes 21 (IoT1 to IoT21) that collect different measurements of the engine state at runtime and are deemed very important to engine performance.

Over time, each engine develops a fault, which can be seen through the sensor readings. The time series ends some time before the failure.

In the data set, every engine runs for a certain number of hours and then fails. For example, Engine.ID:1 ran for 191 hours and failed at the 192nd hour, and Engine.ID:2 ran for 286 cycles (hours) and failed at the 287th hour. We have sensor details for 100 engines and the number of cycles (hours) each engine ran before it failed.

The data includes the unit (Engine.ID) number, time stamps (cycle), three settings, and readings for 21 sensors.

Data set consists of sensor values of 100 similar Engines at different settings

Data Preparation & Feature Engineering for Prediction:

The objective is to predict when an engine is going to fail, so we created a variable named RUL, the Remaining Useful Life of a particular engine. At Cycle 1 (Hour 1), the RUL for Engine.ID:1 is 191 because it failed at the 192nd hour. At Cycle 192 (Hour 192), its RUL is 0. We can now train a model that takes the sensor values as input and predicts the RUL of an engine.

Our data set is a time series, so the readings are auto-correlated, and the prediction at time “t” is likely to be affected by some time window before “t”. Most of the features we used are based on these time windows. To account for this effect, reduce the noise in the data and improve the predictive power of the model, we created additional variables: moving averages and moving standard deviations of all the sensor values.
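
As a minimal sketch of this step, the RUL can be derived by subtracting each cycle number from the last recorded cycle of the same engine. The field names `engine_id` and `cycle` and the helper `add_rul` are assumptions made for illustration, not names from the original data files.

```python
from collections import defaultdict


def add_rul(rows):
    """Return copies of the rows with an added 'RUL' field.

    rows: list of dicts with 'engine_id' and 'cycle' keys (hypothetical names).
    """
    # Assumption: the last recorded cycle of an engine is its failure cycle.
    last_cycle = defaultdict(int)
    for row in rows:
        engine = row["engine_id"]
        last_cycle[engine] = max(last_cycle[engine], row["cycle"])
    # RUL is the number of cycles left until that last cycle.
    return [{**row, "RUL": last_cycle[row["engine_id"]] - row["cycle"]} for row in rows]


# Engine 1 has 192 recorded cycles, matching the example in the text.
rows = [{"engine_id": 1, "cycle": c} for c in range(1, 193)]
labelled = add_rul(rows)
print(labelled[0]["RUL"], labelled[-1]["RUL"])  # 191 0
```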

Reducing the Noise in the Sensor data by Smoothing

The a1 variable is the moving average of sensor 1 (IoT1) over the last 5 cycles, and the sd1 variable is the moving standard deviation of sensor 1 (IoT1) over the last 5 cycles.
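
The sketch below shows one way such window features could be computed for a single engine's sensor series. The readings are made up, the helper `rolling_features` is hypothetical, and two details are assumptions: the first cycles use a shorter window, and the sample standard deviation is used (0.0 when only one reading is available).

```python
from statistics import mean, stdev


def rolling_features(values, window=5):
    """Trailing moving average and standard deviation of one sensor series."""
    averages, deviations = [], []
    for i in range(len(values)):
        # Readings from the current cycle and up to window - 1 earlier cycles.
        recent = values[max(0, i - window + 1): i + 1]
        averages.append(mean(recent))
        # stdev needs at least two points; 0.0 for a single reading is an assumption.
        deviations.append(stdev(recent) if len(recent) > 1 else 0.0)
    return averages, deviations


iot1 = [10.0, 12.0, 11.0, 13.0, 14.0, 20.0]  # made-up readings for one engine
a1, sd1 = rolling_features(iot1)
print(a1[4], a1[5])  # 12.0 14.0
```

The function should be applied to each engine separately so that a window never mixes readings from two engines.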


To make the output more useful to the business, we derived a new variable, “target”, from the RUL. It has three levels of engine status: Critical, Warning and Normal.

Critical – If RUL (Remaining Useful Life) is less than 15 Hrs/15 Cycles

Warning – If RUL (Remaining Useful Life) is at least 15 but less than 30 Hrs/30 Cycles

Normal – If RUL (Remaining Useful Life) is 30 Hrs/30 Cycles or more

Now, a machine learning algorithm can be trained on this data to predict whether the engine status is Critical, Warning or Normal.
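
The thresholds above can be sketched as a small labelling function. The name `engine_status` is hypothetical, and treating an RUL of exactly 30 as Normal is an assumption, since the rules do not state that boundary explicitly.

```python
def engine_status(rul, critical_below=15, warning_below=30):
    """Map a Remaining Useful Life value (in cycles) to an engine status."""
    if rul < critical_below:
        return "Critical"
    if rul < warning_below:
        return "Warning"
    # Assumption: an RUL of exactly 30 cycles counts as Normal.
    return "Normal"


print([engine_status(r) for r in (0, 14, 15, 29, 191)])
# ['Critical', 'Critical', 'Warning', 'Warning', 'Normal']
```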

A new Target variable named “target” has been created

Data Exploration:

A scatterplot of the variables a11 and a12 indicates that engine performance degrades at lower values of a11 and a12.

Machine failure status
“One of the patterns revealing the status of the Engine”

There might be many patterns like this that indicate the status of the engine. But uncovering these patterns through plots, correlating them and predicting the status of the engine by manually inspecting all the given sensor values is not feasible.

So, we train a machine learning algorithm to learn the possible patterns in the data and predict the status of the engine.

Model Training:

The data set (33,000 readings) is divided into two data sets: Train (60%, 20,000 readings) and Test (40%, 13,000 readings). Two machine learning algorithms, XGBoost and Random Forest, were trained on the sensor values in the Train data set to predict the engine status.

Random Forest is a powerful tree-based algorithm built on the bagging technique, and XGBoost is an optimized implementation of gradient boosting, a boosting technique.

The Test data set of around 13,000 readings was then given to the trained algorithms, i.e. the predictive models, for prediction. The predictive models predicted the engine status as below.
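
A 60/40 split of this kind could be sketched as follows. This is only an illustration: the article does not say how the readings were assigned to the two sets, so the random shuffle, the seed and the helper name `split_readings` are all assumptions.

```python
import random


def split_readings(readings, train_fraction=0.6, seed=42):
    """Shuffle the readings and split them into train and test lists."""
    shuffled = list(readings)
    # A fixed seed makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for 33,000 readings; real rows would hold the sensor values.
train, test = split_readings(range(33000))
print(len(train), len(test))  # 19800 13200
```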

“Predicted values by the Model against the actuals”

The performance of the two Models is plotted below:

Test-Result plot

Evaluating Predictive Models/Model Testing:

In the above line graph, the models have been plotted across 4 metrics: Accuracy, Precision, Recall and F1 score.


Random Forest (RF) has predicted the Engine status with 99.7% accuracy.

Considering accuracy alone can be misleading if the classes are not balanced. Class imbalance happens when some classes are over-represented in the data set. With class imbalance, a model might have high accuracy but still perform poorly on the under-represented classes.
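
A small made-up example illustrates the problem. The label counts below are invented for illustration and are not taken from the engine data set: a model that always predicts “Normal” scores 95% accuracy while finding none of the “Critical” readings.

```python
# Made-up labels: 95 Normal readings and 5 Critical readings.
actual = ["Normal"] * 95 + ["Critical"] * 5
# A trivial model that predicts Normal for every reading.
predicted = ["Normal"] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
critical_found = sum(a == p == "Critical" for a, p in zip(actual, predicted))

print(accuracy)        # 0.95
print(critical_found)  # 0
```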

Recall & Precision:

To avoid this problem, we use precision and recall. Recall is the ratio between the number of positive values that were correctly predicted and the number of positive values that should have been predicted. In other words, it is the accuracy on the positive class (the class of greatest interest among the three classes in our target variable).

In our case, “Critical” is the positive class, and recall tells us how well the model detects the “Critical” status of the engine. Random Forest correctly identifies 55% of the readings whose actual status is “Critical”, so its recall is 55%.

Precision measures how reliable the model’s positive predictions are. It is the ratio between the number of correctly predicted positive values and the number of all values predicted as positive. In our case, if the RF algorithm predicts 100 engine statuses as “Critical”, about 81 of them are correct. So, the precision of the RF algorithm is 80.7%.
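
Both ratios can be computed directly from the actual and predicted labels, as in the sketch below. The helper `precision_recall` is hypothetical and the labels are made up for illustration; they are not the model’s real predictions.

```python
def precision_recall(actual, predicted, positive="Critical"):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 4 Critical readings, 2 of them detected, plus 1 false alarm.
actual = ["Critical"] * 4 + ["Normal"] * 6
predicted = ["Critical", "Critical", "Normal", "Normal", "Critical"] + ["Normal"] * 5
precision, recall = precision_recall(actual, predicted)
print(round(precision, 3), recall)  # 0.667 0.5
```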

F1 score:

The F1 score combines precision and recall into a single number: it is their harmonic mean. In this case, the F1 score for the RF algorithm is 0.65.
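
As a quick check, the reported score can be reproduced from the precision (80.7%) and recall (55%) given above using the standard F1 formula, 2 × precision × recall / (precision + recall). The function name is our own.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.807, 0.55)
print(round(f1, 2))  # 0.65
```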

For accuracy, recall, precision and F1 score, it’s better to have their values close to 1.

Strategies to Improve the Model Performance/Predictive Power:

  1. We can identify the most important variables, those with the highest predictive power, and remove the least important ones. As shown below, we can remove variables such as IoT15 and IoT16 that have zero importance, i.e. no power in predicting the target.


Importance scores of all the variables based on their ability to predict Engine’s Status
  2. We can also create new variables that reveal more information about the “Critical” class, which boosts the model’s ability (recall) to predict the “Critical” class. This is called feature engineering.

Apart from trying other Machine Learning (ML) algorithms, these two techniques can substantially improve the predictive power of the model, and we generally try them in our projects.
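
The first strategy can be sketched as a simple filter over importance scores. The helper `keep_informative` is hypothetical, and the non-zero scores below are invented for illustration; only the zero importance of IoT15 and IoT16 comes from the results described above.

```python
def keep_informative(importances, threshold=0.0):
    """Return the names of variables whose importance exceeds the threshold."""
    return [name for name, score in importances.items() if score > threshold]


# Illustrative scores only; the non-zero values are made up.
importances = {"a11": 0.30, "a12": 0.25, "IoT4": 0.10, "IoT15": 0.0, "IoT16": 0.0}
selected = keep_informative(importances)
print(selected)  # ['a11', 'a12', 'IoT4']
```

The model would then be retrained on the selected variables only, and the metrics compared with the earlier run.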


This aircraft-health-monitoring system predicts the health status of the engine, thus decreasing downtime and ensuring that engines run efficiently.

The solution we developed includes data ingestion, data storage, data processing, and advanced analytics, all essential for building an end-to-end predictive-maintenance solution. And while this example is customized for aircraft engine monitoring, the solution can easily be generalized for other predictive-maintenance scenarios.
