Adaptive inventory replenishment using structured reinforcement learning by exploiting prior knowledge
Artificial intelligence (AI) is changing business and the way we work on a day-to-day basis. One key factor behind this progress is that AI can learn effectively and efficiently. Among the various approaches to learning, we focus on a technique that lets AI exploit prior knowledge to improve its learning process.
This paper applies that idea to inventory management. We propose a novel reinforcement learning algorithm that learns efficiently by exploiting the known structural properties of a well-performing policy. For example, K-convexity is the basis of the optimality of an (s, S) policy, so we use K-convexity to design the reinforcement learning algorithm for an inventory replenishment problem. In addition, the proposed algorithm updates the replenishment policy directly rather than learning action-value functions. Thanks to these two design choices, it rapidly characterizes observed demand and adapts the replenishment policy to switching demand. We show that, by exploiting the policy structures, the proposed method provides near-optimal policies for both single-item and multi-item problems.
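To make the (s, S) structure concrete, the following is a minimal Python sketch of the ordering rule and of how the best-performing (s, S) pair can shift when demand switches. It is not the algorithm proposed in the paper: the cost parameters K, h, and p, the zero lead time, the lost-sales assumption, the uniform demand regimes, and the brute-force grid search in best_policy are all hypothetical choices made only for illustration.

```python
import random


def order_quantity(inventory, s, S):
    """(s, S) rule: when inventory falls below s, order up to S; otherwise order nothing."""
    return S - inventory if inventory < s else 0


def simulate_cost(s, S, demands, K=50.0, h=1.0, p=9.0, start=0):
    """Average per-period cost of a fixed (s, S) policy on a demand sequence.

    Assumptions (illustrative only): K is a fixed cost per order, h a holding
    cost per leftover unit, p a penalty per unit of unmet demand; orders
    arrive instantly and unmet demand is lost.
    """
    inventory, total = start, 0.0
    for d in demands:
        q = order_quantity(inventory, s, S)
        if q > 0:
            total += K          # fixed ordering cost
            inventory += q      # zero lead time
        sold = min(inventory, d)
        total += p * (d - sold)  # lost-sales penalty
        inventory -= sold
        total += h * inventory   # holding cost on leftover stock
    return total / len(demands)


def best_policy(demands, grid=range(0, 61, 5)):
    """Brute-force search over (s, S) pairs with s < S (a stand-in for learning)."""
    candidates = [(s, S) for s in grid for S in grid if s < S]
    return min(candidates, key=lambda pair: simulate_cost(pair[0], pair[1], demands))


# Two hypothetical demand regimes, mimicking a switch from low to high demand.
rng = random.Random(0)
low_demand = [rng.randint(0, 10) for _ in range(200)]
high_demand = [rng.randint(10, 30) for _ in range(200)]

print("best (s, S) under low demand: ", best_policy(low_demand))
print("best (s, S) under high demand:", best_policy(high_demand))
```

Because the policy is fully described by the two numbers s and S, an algorithm that respects this structure only has to track two parameters as demand changes, rather than a value for every inventory level and order quantity. The grid search above needs the full demand history of each regime, which is exactly what an adaptive method has to do without.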
Figure 1. Daily sales in a retail shop
Figure 2. Trajectory of the inventory policy
Figures 1 and 2 show that the proposed reinforcement learning algorithms learn a reasonable replenishment policy in response to switching demand. This behavior enables the proposed algorithms to lower inventory costs by approximately 10% to 30% compared with current replenishment practices in the Korean retail industry.
The numerical analysis shows good performance, and the numerical validation confirms the algorithm's operational efficiency in a practical inventory system. The algorithm is particularly promising when the policy must be updated from observations alone, without precise knowledge of non-stationary demand. To examine the applicability of the proposed structured algorithms in the health & beauty retail industry, we conduct a numerical validation for a retail shop in Korea. Compared with current practice, the newly developed algorithm saves approximately 3 million KRW in sunblock inventory costs. Scaled to 150 stores, this points to a substantial annual reduction in sunblock inventory costs. Given the business scale of chain stores, extending comparable savings to the other 10,000 items would further support the practical value of our method.
* Related Article
Hyungjun Park, Dong Gu Choi, Daiki Min, Adaptive inventory replenishment using structured reinforcement learning by exploiting a policy structure, International Journal of Production Economics, Volume 266, 109029, December 2023