This paper proposes a maintenance decision-making framework for multi-unit systems using Machine Learning (ML). Speci cally, we propose to use Deep Reinforcement Learning (RL) for a dynamic maintenance model of a multi-unit parallel system that is subject to stochastic degradation and random failures. As each unit deteriorates independently in a three-state ho- mogeneous Markov process, we consider each unit to be in one of three states: healthy, unhealthy, or a failed state. We model the interaction among system states based on the Birth/Birth-Death process. By combining individual com- ponent states, we de ne the overall system state. To minimize costs, we use the Markov Decision Process (MDP) framework to solve the optimal main- tenance policy. We apply the Double Deep Q Networks (DDQN) algorithm to solve the problem, making the proposed RL solution more practical and e ective in terms of time and cost savings than traditional MDP approaches. A numerical example is provided which demonstrates how the RL can be used to nd the optimal maintenance policy for the system under study.