I am trying to modify the P-term of the actuator force gain for the ShadowHand environment (actuator XML here: https://github.com/openai/gym/blob/...gym/envs/robotics/assets/hand/shared.xml#L232). I want to duplicate the domain randomization listed in Table 1 of the OpenAI "Learning Dexterous In-Hand Manipulation" paper (https://arxiv.org/pdf/1808.00177.pdf). To do this I wrote the following code:

Code:
for actuator_name in sim.model.actuator_names:
    # Sample a log-uniform scale in [0.75, 1.5], per Table 1 of the paper
    scale = np.exp(np.random.uniform(np.log(0.75), np.log(1.5)))
    actuator_id = sim.model.actuator_name2id(actuator_name)
    sim.model.actuator_gainprm[actuator_id][0] *= scale

This causes no problems when I run a few rollouts locally, but when I run domain randomization at scale, training a machine learning model on thousands of rollouts, training stops partway through and the CPU goes idle without throwing any errors.

Is there something incorrect in how I am modifying the actuator gain that could be destabilizing the environment? Also, is there any situation where the environment can fail silently, without raising errors?
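For context, here is a minimal, MuJoCo-free sketch of the sampling I am doing. The array of nominal P-gains (20 actuators, all set to 1.0) is a hypothetical stand-in for the `sim.model.actuator_gainprm[:, 0]` column, and rescaling from saved nominal values (rather than multiplying the model in place on every rollout, as my code above does) is one variant I have considered:

```python
import numpy as np

def sample_gain_scales(n_actuators, low=0.75, high=1.5, rng=None):
    """Draw one log-uniform multiplicative scale per actuator,
    matching the loguniform(0.75, 1.5) range from Table 1 of the paper."""
    rng = rng if rng is not None else np.random.default_rng()
    return np.exp(rng.uniform(np.log(low), np.log(high), size=n_actuators))

# Hypothetical stand-in for sim.model.actuator_gainprm[:, 0] (the P-term column).
nominal_p_gains = np.full(20, 1.0)

# Rescale from the saved *nominal* values each episode, so the randomization
# never compounds across rollouts.
randomized = nominal_p_gains * sample_gain_scales(len(nominal_p_gains))
```

In the real environment I would copy `actuator_gainprm` once at startup and write `nominal * scale` back each reset, instead of applying `*=` to the live model.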