Vishal Rajput
Sep 17, 2024

Agreed. I believe they used that while training the model. At inference, it is just using the trajectories learnt through RL at training time.
