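1. In PPO (Proximal Policy Optimization), a reinforcement learning algorithm, sampling data means running the current policy in the actual environment while the algorithm is training and collecting the resulting experience (observations, actions, rewards, and so on) into a batch of training data.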
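2. Minibatch training is a deep learning technique for large datasets: the data is split into small batches (minibatches), and the model's parameters are updated after each minibatch rather than once per pass over the entire dataset.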
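The sampling step in item 1 can be sketched as a loop that steps an environment with the current policy for a fixed number of steps and records each transition. This is a minimal sketch under stated assumptions: `ToyEnv`, `policy`, `Batch`, and `collect_rollout` are hypothetical names, the environment is a toy 1-D walk, and the "policy" is a fixed random rule standing in for a neural network. A real PPO implementation would also store quantities such as action log-probabilities and value estimates.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Batch:
    """Hypothetical container for one batch of sampled transitions."""
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    dones: list = field(default_factory=list)

class ToyEnv:
    """Hypothetical 1-D environment: action 1 moves right, 0 moves left; episode ends at +/-3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        done = abs(self.pos) >= 3
        reward = 1.0 if self.pos >= 3 else 0.0  # reward only for reaching the right end
        return self.pos, reward, done

def policy(obs, rng):
    """Hypothetical stand-in for the current policy: moves right with probability 0.7."""
    return 1 if rng.random() < 0.7 else 0

def collect_rollout(env, n_steps, seed=0):
    """Run the current policy for n_steps and gather the transitions into one batch."""
    rng = random.Random(seed)
    batch = Batch()
    obs = env.reset()
    for _ in range(n_steps):
        action = policy(obs, rng)
        next_obs, reward, done = env.step(action)
        batch.observations.append(obs)
        batch.actions.append(action)
        batch.rewards.append(reward)
        batch.dones.append(done)
        # Start a new episode when the current one ends, so sampling continues.
        obs = env.reset() if done else next_obs
    return batch

batch = collect_rollout(ToyEnv(), n_steps=128)
print(len(batch.observations))  # 128
```

The batch size is fixed by `n_steps` rather than by episode boundaries, so one batch may span several episodes; the `dones` flags record where each episode ended.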
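Item 2 can be illustrated with plain minibatch stochastic gradient descent. The sketch below is an assumption for illustration, not a PPO update: it fits a hypothetical linear model `y = w * x + b` to noise-free data, reshuffles the sample indices every epoch, and takes one gradient step per minibatch. The names `minibatches` and `train_linear` and the hyperparameters (`batch_size=32`, `epochs=50`, `lr=0.1`) are all hypothetical.

```python
import random

def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index lists that together cover every sample once."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]

def train_linear(xs, ys, batch_size=32, epochs=50, lr=0.1, seed=0):
    """Fit y = w * x + b by minibatch SGD on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for mb in minibatches(len(xs), batch_size, rng):
            # Gradient of the mean squared error over this minibatch only.
            gw = gb = 0.0
            for i in mb:
                err = (w * xs[i] + b) - ys[i]
                gw += 2 * err * xs[i]
                gb += 2 * err
            gw /= len(mb)
            gb /= len(mb)
            # One parameter update per minibatch, not per full pass.
            w -= lr * gw
            b -= lr * gb
    return w, b

xs = [i / 1000 for i in range(1000)]  # inputs in [0, 1)
ys = [2 * x + 1 for x in xs]          # targets from y = 2x + 1
w, b = train_linear(xs, ys)
print(w, b)
```

With 1,000 samples and a minibatch size of 32, each epoch performs 32 updates (31 full minibatches plus one of 8 samples) instead of a single full-batch update. The same shuffle-and-split pattern can be applied to a batch of sampled transitions such as the one described in item 1.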