
Prioritized Reward of Deep Reinforcement Learning Applied to Mobile Manipulation Reaching Tasks


DOI: 10.23977/jaip.2025.080313

Author(s)

Zunchao Zheng 1

Affiliation(s)

1 Intelligent Process Automation and Robotics Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany

Corresponding Author

Zunchao Zheng

ABSTRACT

In this paper, we apply deep reinforcement learning (DRL) to the task of reaching target positions with a mobile manipulator while coordinating the mobile base and the manipulator arm, and we study how different reward functions affect the success rate and the efficiency of the resulting motion. The reward is defined primarily as a function of the distance between the robot and the goal. We propose principles for building reward functions based on geometric series theory and discuss possible reward forms that combine different elements. We also present a prioritized reward function for mobile manipulation that weights the movements of the robot's different parts, and we further provide a method for choosing these weights. Experiments are carried out in both two-dimensional and three-dimensional collision-free environments, and a related task of passing through an open doorway is evaluated at the end.
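To make the reward design concrete, the sketch below illustrates the kind of distance-based, prioritized reward the abstract describes. It is a minimal illustration only, not the paper's implementation: the function name, the weights w_arm and w_base, and the decay factor are hypothetical, and the geometric decay term stands in for the geometric-series-based shaping discussed above.

    # Minimal sketch of a distance-based, prioritized reward for a mobile
    # manipulator reaching task. The names (w_arm, w_base, decay) are
    # illustrative assumptions, not the paper's exact definitions.
    import numpy as np

    def prioritized_reward(ee_pos, base_pos, goal_pos,
                           w_arm=1.0, w_base=0.5, decay=0.9):
        """Weighted sum of distance-based terms for the arm and the base.

        Each term is bounded in (0, 1] and grows as its distance shrinks,
        so the summed return over an episode stays finite, which is the
        property a geometric-series argument relies on.
        """
        d_arm = np.linalg.norm(np.asarray(ee_pos) - np.asarray(goal_pos))
        # the base moves in the plane, so compare only the x-y components
        d_base = np.linalg.norm(np.asarray(base_pos)[:2]
                                - np.asarray(goal_pos)[:2])
        r_arm = decay ** d_arm    # bounded shaping term for the end-effector
        r_base = decay ** d_base  # bounded shaping term for the mobile base
        return w_arm * r_arm + w_base * r_base

With w_arm > w_base, progress of the end-effector toward the goal dominates the signal, which reflects the idea of weighting the movements of different parts; keeping decay < 1 bounds every per-step term.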

KEYWORDS

Mobile Manipulation, Deep Reinforcement Learning, Reward Engineering

CITE THIS PAPER

Zunchao Zheng, "Prioritized Reward of Deep Reinforcement Learning Applied to Mobile Manipulation Reaching Tasks," Journal of Artificial Intelligence Practice (2025) Vol. 8: 102-114. DOI: http://dx.doi.org/10.23977/jaip.2025.080313.




