Reward Hacking
Reward hacking occurs when optimizing a policy for a proxy goal makes it worse, rather than better, with respect to the true goal.
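Skalse et al. formalize this: a reward function (the true goal) can be hacked by a proxy reward function (the proxy goal) whenever the two strictly disagree on the ranking of at least one pair of policies, that is, one policy has a higher expected return under the proxy but a lower expected return under the true reward.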
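They also show that unhackability is a very strong condition: over the set of all stochastic policies, a proxy is unhackable only if one of the two reward functions is constant. As a result, most prosaic RL settings use hackable proxy goals.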
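
The pairwise criterion can be checked directly on a finite set of policies. The sketch below is only an illustration, not code from Skalse et al.: the function name is_hackable and the return values are hypothetical, and it assumes each policy's expected return under both reward functions is already known.

```python
from itertools import combinations


def is_hackable(true_returns, proxy_returns):
    """Return True if some pair of policies is strictly ranked in
    opposite order by the true reward and the proxy reward.

    true_returns[i] and proxy_returns[i] are the expected returns of
    policy i under the true and proxy reward functions (assumed known).
    """
    for i, j in combinations(range(len(true_returns)), 2):
        true_diff = true_returns[i] - true_returns[j]
        proxy_diff = proxy_returns[i] - proxy_returns[j]
        # Opposite signs: the proxy prefers one policy, the true
        # reward strictly prefers the other.
        if true_diff * proxy_diff < 0:
            return True
    return False


# Hypothetical expected returns for three policies.
true_returns = [1.0, 2.0, 3.0]
proxy_ok = [0.5, 0.5, 0.9]   # coarser than the true reward, never reverses it
proxy_bad = [0.5, 0.9, 0.7]  # prefers policy 1 to policy 2; true reward disagrees

print(is_hackable(true_returns, proxy_ok))
print(is_hackable(true_returns, proxy_bad))
```

Note that a tie under the proxy (as between the first two policies in proxy_ok) does not count as a disagreement here; only a strict reversal does.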