May 27, 2024

Hasan Rasay

Academic rank: Assistant professor
Education: Ph.D. in Industrial Engineering
Phone: 38305005
Faculty: Faculty of Management Engineering


Reinforcement Learning based on Stochastic Dynamic Programming for Condition-based Maintenance of Deteriorating Production Processes
Type: Presentation
Keywords: Maintenance, Markov decision process, dynamic programming, reinforcement learning
Researchers: Hasan Rasay, Farnoosh Naderkhani, Amir Mohammad Golmohammadi


In this paper, a stochastic dynamic programming model is developed for maintenance planning of a deteriorating multistate production system. The quality of the batch/lot of items produced in each stage serves as the condition-monitoring information for condition-based maintenance. The machine has m-1 operational states plus one non-operational state, referred to as the failure state. At the start of each stage, four actions are available to management: (1) renew the system; (2) implement maintenance; (3) continue production; and (4) inspect the system. Maintenance is assumed to be imperfect, meaning that after maintenance the system is restored to a non-worse state with known probabilities. Since the system state evolves in a Markovian manner at the end of each stage, and the quality of the items produced depends on the system state, the system can be modeled as a Markov decision process (MDP). Because an MDP lies at the core of reinforcement learning, it is discussed how, for large-scale problems, the proposed stochastic dynamic programming model can be employed to develop reinforcement learning algorithms. To this end, a Q-learning algorithm is proposed.
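The structure described above can be illustrated with a minimal tabular Q-learning sketch. Everything numeric here is an assumption for demonstration purposes only (the number of states, the deterioration probability, and the cost figures are hypothetical and are not taken from the paper); the sketch only mirrors the abstract's setup of m-1 operational states plus a failure state, four actions, imperfect maintenance restoring the system to a non-worse state, and Markovian deterioration.

```python
import numpy as np

# Hypothetical sketch of tabular Q-learning for the maintenance MDP in the
# abstract. All constants (state count, costs, transition probabilities) are
# illustrative assumptions, not the paper's data.

rng = np.random.default_rng(0)

M = 4                                   # states 0..2 operational, 3 = failure
RENEW, MAINTAIN, PRODUCE, INSPECT = range(4)

# Assumed per-action immediate costs.
COST = {RENEW: 10.0, MAINTAIN: 4.0, PRODUCE: 0.0, INSPECT: 1.0}

def step(s, a):
    """Simulate one production stage; returns (next_state, reward)."""
    if a == RENEW:
        s_next = 0                       # renewal: as good as new
    elif a == MAINTAIN:
        # imperfect maintenance: restored to a non-worse state (assumed uniform)
        s_next = int(rng.integers(0, s + 1))
    else:                                # PRODUCE or INSPECT: Markovian decay
        s_next = min(s + rng.binomial(1, 0.3), M - 1)
    # reward from produced-item quality degrades with the state; failure earns 0
    quality_reward = 5.0 * (1 - s_next / (M - 1))
    return s_next, quality_reward - COST[a]

alpha, gamma, eps = 0.1, 0.95, 0.1       # learning rate, discount, exploration
Q = np.zeros((M, 4))

s = 0
for _ in range(50_000):
    # epsilon-greedy action selection
    a = int(rng.integers(4)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    # off-policy temporal-difference (Q-learning) update
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

policy = Q.argmax(axis=1)                # greedy action per machine state
print(policy)
```

The same Q-table could, in principle, be checked against a stochastic-dynamic-programming (value-iteration) solution of the same small MDP, which is the connection the abstract draws between the two methods.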