This paper addresses maintenance planning for multi-component systems, focusing on wind turbine farms, which play a vital role in electricity production. Traditional approaches rely on pre-specified thresholds, triggering maintenance actions when conditions such as production rate, system age, or failure state reach set limits. This study instead proposes a dynamic approach that bases maintenance decisions on the actual state of the system in real time. The system is modeled as a large-scale multi-component parallel production system in which each unit can be in one of three states: good, partial failure, or failure. Transitions between states are governed by a continuous-time Markov chain. Because the model tracks the state of every unit, maintenance actions can be scheduled according to the current system state rather than fixed thresholds. To optimize the system's profit over an infinite planning horizon, a Markov Decision Process (MDP) framework is employed. However, because the number of system states grows exponentially with the number of units (up to 3^n joint states for n three-state units), traditional dynamic programming algorithms are impractical for this large-scale MDP. Hence, reinforcement learning algorithms, specifically Q-learning, are used to determine maintenance actions from the current system state. The objective is to maximize the system's profit, accounting for the cost of lost demand, the profit from overproduction, and the cost of maintenance actions. From a practical standpoint, this research offers value to industries that rely on multi-component production systems. Maintenance managers can use its insights to formulate cost-effective strategies that minimize downtime. Moreover, as industries progressively l