May 27, 2024

Hasan Rasay

Academic rank: Assistant professor
Education: Ph.D. in Industrial Engineering
Phone: 38305005
Faculty: Faculty of Management Engineering


A reinforcement learning algorithm for optimal dynamic policies of joint condition-based maintenance and production
Type Presentation
condition-based maintenance; condition-based production; reinforcement learning; Markov decision process
Researchers Hasan Rasay, Fariba Azizi, Mehrnaz Salmani, Farnoosh Naderkhani


This paper develops a joint optimal maintenance and production policy for a production system with adjustable production rates. The system's deterioration rate depends directly on the production rate: a higher production rate results in greater expected deterioration. Deterioration can be controlled through two main actions: (1) scheduling and conducting maintenance, referred to as the maintenance policy; and (2) adjusting the production rate, referred to as the production policy. To determine the optimal action for each system state, the problem is formulated as a Markov decision process (MDP) and solved with a reinforcement learning algorithm, specifically Q-learning. The algorithm's hyperparameters are tuned using the value-iteration algorithm of dynamic programming. The goal is to minimize the expected cost of the system over a finite planning horizon.
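
The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the number of deterioration levels, the two production rates, the deterioration probabilities, the costs, the horizon, and all function names below are hypothetical values chosen only for illustration. The sketch pairs tabular Q-learning on a time-indexed Q-table with exact backward induction (the finite-horizon form of value iteration), which can serve as a benchmark when choosing hyperparameters such as the learning rate.

```python
import random

# All numbers below are hypothetical, chosen only for illustration.
N_STATES = 4                      # deterioration levels 0 (as new) .. 3 (failed)
FAILED = N_STATES - 1
HORIZON = 10                      # finite planning horizon (decision epochs)
MAINTAIN, LOW, HIGH = 0, 1, 2     # maintenance, low rate, high rate
ACTIONS = (MAINTAIN, LOW, HIGH)
DETERIORATE_PROB = {LOW: 0.2, HIGH: 0.5}   # higher rate, faster deterioration
REVENUE = {LOW: 4.0, HIGH: 7.0}
PREVENTIVE_COST, CORRECTIVE_COST, DOWNTIME_COST = 5.0, 15.0, 20.0


def cost(state, action):
    """One-period cost; negative values are net profit."""
    if action == MAINTAIN:
        return CORRECTIVE_COST if state == FAILED else PREVENTIVE_COST
    if state == FAILED:
        return DOWNTIME_COST      # a failed system cannot produce
    return -REVENUE[action]


def transitions(state, action):
    """Return a list of (probability, next_state) pairs."""
    if action == MAINTAIN:
        return [(1.0, 0)]         # maintenance restores the as-new state
    if state == FAILED:
        return [(1.0, FAILED)]
    p = DETERIORATE_PROB[action]
    return [(p, state + 1), (1.0 - p, state)]


def empty_q_table():
    """Q[t][s][a], with an all-zero terminal layer at t = HORIZON."""
    return [[[0.0] * len(ACTIONS) for _ in range(N_STATES)]
            for _ in range(HORIZON + 1)]


def backward_induction():
    """Exact finite-horizon dynamic programming, used as a benchmark."""
    q = empty_q_table()
    for t in range(HORIZON - 1, -1, -1):
        for s in range(N_STATES):
            for a in ACTIONS:
                future = sum(p * min(q[t + 1][s2])
                             for p, s2 in transitions(s, a))
                q[t][s][a] = cost(s, a) + future
    return q


def q_learning(episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (cost minimization)."""
    rng = random.Random(seed)
    q = empty_q_table()
    for _ in range(episodes):
        s = rng.randrange(N_STATES)            # random initial condition
        for t in range(HORIZON):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = min(ACTIONS, key=lambda x: q[t][s][x])
            probs, nexts = zip(*transitions(s, a))
            s2 = rng.choices(nexts, weights=probs)[0]
            target = cost(s, a) + min(q[t + 1][s2])
            q[t][s][a] += alpha * (target - q[t][s][a])
            s = s2
    return q


def greedy_policy(q):
    """Cost-minimizing action for every epoch and state."""
    return [[min(ACTIONS, key=lambda a: q[t][s][a]) for s in range(N_STATES)]
            for t in range(HORIZON)]
```

Comparing `greedy_policy(q_learning(alpha=...))` with `greedy_policy(backward_induction())` for a few candidate values of `alpha` and `epsilon` is one simple way to use the dynamic-programming solution as a tuning reference; the paper's actual tuning procedure may differ.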