This paper develops a joint optimal maintenance and production policy for a
production system that allows for adjustable production rates.
The rate of deterioration of the system is directly related to
the production rate, with higher production rates resulting in
greater expected deterioration. The system’s deterioration can
be controlled through two main actions: (1) scheduling and
conducting maintenance actions, referred to as the maintenance
policy; and (2) adjusting the production rate, referred to as the
production policy. To determine the optimal actions given the
system’s state, a Markov decision process (MDP) is developed
and a reinforcement learning algorithm, specifically a Q-learning
algorithm, is utilized. The algorithm’s hyperparameters are
tuned using a value-iteration algorithm from dynamic programming.
The goal is to minimize the expected cost of the system over a finite
planning horizon.
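
To make the setup concrete, the following is a minimal sketch of tabular Q-learning on a toy finite-horizon deterioration MDP, with backward induction (finite-horizon value iteration) as an exact reference. It is illustrative only and is not the model used in this paper: the number of deterioration levels, the production rates, the wear probabilities, the cost figures, and all function names are assumptions chosen for the example.

```python
import random

# All numbers below are hypothetical and chosen only for illustration.
N_LEVELS = 4            # deterioration levels: 0 (as new) .. 3 (failed)
FAILED = N_LEVELS - 1
HORIZON = 10            # number of decision epochs in the finite horizon
RATES = [1, 2]          # assumed production rates
ACTIONS = [0, 1, 2]     # 0 = maintain, a >= 1 = produce at RATES[a - 1]

def wear_prob(rate):
    """Assumed chance of moving one level up; grows with the production rate."""
    return 0.2 * rate

def immediate_cost(level, action):
    """Cost of one epoch; production revenue enters as a negative cost."""
    if level == FAILED:
        return 20.0                      # corrective maintenance, any action
    if action == 0:
        return 5.0                       # preventive maintenance
    return -3.0 * RATES[action - 1]

def step(level, action, rng):
    """Sample (next_level, cost) for one decision epoch."""
    cost = immediate_cost(level, action)
    if level == FAILED or action == 0:
        return 0, cost                   # maintenance restores the system
    worn = rng.random() < wear_prob(RATES[action - 1])
    return level + (1 if worn else 0), cost

def backward_induction():
    """Exact minimum expected cost-to-go V[t][level] using the known model."""
    V = [[0.0] * N_LEVELS for _ in range(HORIZON + 1)]
    for t in range(HORIZON - 1, -1, -1):
        for level in range(N_LEVELS):
            values = []
            for a in ACTIONS:
                c = immediate_cost(level, a)
                if level == FAILED or a == 0:
                    values.append(c + V[t + 1][0])
                else:
                    p = wear_prob(RATES[a - 1])
                    values.append(c + p * V[t + 1][level + 1]
                                  + (1 - p) * V[t + 1][level])
            V[t][level] = min(values)
    return V

def q_learning(episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning with time-indexed Q[t][level][action] (costs minimized)."""
    rng = random.Random(seed)
    Q = [[[0.0] * len(ACTIONS) for _ in range(N_LEVELS)]
         for _ in range(HORIZON + 1)]    # Q[HORIZON] stays zero (terminal)
    for _ in range(episodes):
        level = rng.randrange(N_LEVELS)  # random initial deterioration level
        for t in range(HORIZON):
            if rng.random() < epsilon:   # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = min(ACTIONS, key=lambda x: Q[t][level][x])
            nxt, cost = step(level, a, rng)
            target = cost + min(Q[t + 1][nxt])
            Q[t][level][a] += alpha * (target - Q[t][level][a])
            level = nxt
    return Q
```

Comparing `min(Q[t][level])` from `q_learning` against `V[t][level]` from `backward_induction` for several choices of `alpha` and `epsilon` is one plausible way to use a dynamic-programming solution as a benchmark when tuning hyperparameters on a small instance; whether this matches the tuning procedure used in the paper is an assumption.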