This document attempts to briefly present the Proximal Policy Optimization (PPO) algorithm and some experiments with it that can be found online. The following repo seems to be a good resource: here.

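PPO is usually explained within an actor-critic framework. The actor is the policy: it observes the state of the environment and chooses actions. The critic estimates a value function from the rewards the environment returns, and this estimate is used to judge how much better or worse an action turned out than expected (the advantage). The main idea of PPO is to keep each updated policy close to the previous one, so that a single update is less likely to jump to a bad policy that is very different from the original. This is achieved by clipping the probability ratio between the new and the old policy, r = π_new(a|s) / π_old(a|s), to the interval [1 − ε, 1 + ε] before multiplying it by the advantage, and then taking the minimum of the clipped and unclipped terms. And then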
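A minimal sketch in plain Python of the clipped surrogate objective, mean(min(r·A, clip(r, 1 − ε, 1 + ε)·A)), may help make the clipping concrete. The function names, the default ε = 0.2, and the use of a one-step TD error as the critic's advantage estimate are assumptions for illustration only; real implementations (and the repo mentioned above) may compute advantages differently, for example with GAE, and work on tensors rather than lists.

```python
import math
from typing import List, Sequence


def td_advantages(rewards: Sequence[float], values: Sequence[float],
                  next_values: Sequence[float], gamma: float = 0.99) -> List[float]:
    """One-step advantage estimate A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    Assumption: this simple TD error stands in for the critic's feedback;
    terminal-state handling is omitted for brevity.
    """
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def clipped_surrogate(new_logp: Sequence[float], old_logp: Sequence[float],
                      advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp: log pi_new(a|s) under the policy being updated.
    old_logp: log pi_old(a|s) under the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)              # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip to [1 - eps, 1 + eps]
        # Pessimistic choice: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    adv = td_advantages([1.0], [0.5], [1.0], gamma=0.9)
    print(adv)
    print(clipped_surrogate([math.log(1.5)], [0.0], [2.0]))
```

For example, with ε = 0.2, a ratio of 1.5 and an advantage of 2.0 give min(3.0, 2.4) = 2.4, so the benefit of pushing an action's probability up is capped at (1 + ε)·A. Gradient-based implementations usually minimize the negative of this objective.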