I think a lot of the main points are covered by the other answers, but there are two things to consider:
1) Bandit-based optimization tends to focus on small changes and visit-based measurements. This overvalues short-term behavior and amplifies the impact of variance, which can lead to false conclusions. If the algorithm acts on the data too quickly (which a greedy algorithm usually does), it is likely to lock in false conclusions and increase the regret generated by the test. This is especially true for sites with less data or more variance in population performance.
It can also lead to self-fulfilling behavior, where an initial positive fluctuation is over-weighted because that variation keeps being shown more often over time; the sketch below illustrates both effects.
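To make this concrete, here is a minimal simulation sketch (Python, with made-up conversion rates and an illustrative exploration setting, not any vendor's actual algorithm) of a nearly greedy epsilon-greedy bandit choosing between two variations:

```python
import random

# Minimal sketch of the failure mode in point 1: an epsilon-greedy bandit
# on two Bernoulli "variations". The conversion rates and epsilon are
# illustrative assumptions, not values from any particular testing tool.

TRUE_RATES = {"A": 0.05, "B": 0.04}  # A is genuinely better
EPSILON = 0.05                       # little exploration = very greedy

def run(n_visits=5000, seed=None):
    rng = random.Random(seed)
    pulls = {arm: 0 for arm in TRUE_RATES}
    wins = {arm: 0 for arm in TRUE_RATES}
    regret = 0.0
    best_rate = max(TRUE_RATES.values())

    for _ in range(n_visits):
        if 0 in pulls.values() or rng.random() < EPSILON:
            arm = rng.choice(list(TRUE_RATES))  # explore at random
        else:
            # exploit: serve the arm with the best empirical rate so far
            arm = max(pulls, key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        wins[arm] += rng.random() < TRUE_RATES[arm]
        regret += best_rate - TRUE_RATES[arm]  # expected loss vs. always serving A

    return pulls, regret

# Across seeds, an early lucky streak on B can dominate the empirical means;
# the greedy rule then keeps serving B, which inflates regret and reinforces
# the bad estimate, since A stops getting the traffic needed to correct it.
for seed in range(5):
    pulls, regret = run(seed=seed)
    print(f"seed {seed}: pulls={pulls}, regret={regret:.1f}")
```

With close rates and noisy binary outcomes, some seeds will lock onto the worse arm for most of the run. A fixed-sample A/B test doesn't have this feedback loop, because allocation never depends on interim results.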
2) Any type of optimization depends on the spread of the variations you compare. If you are trying to maximize performance, you need to maximize the number of chances for success (see the rough calculation below). It won't matter whether you choose standard A/B testing or a bandit-based algorithm if the original pool of variations is limited by biases or by common opinion.
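As a rough illustration (the 10% figure is an assumption for the sake of the arithmetic, not data), the chance that at least one variation in the pool is a genuine improvement grows quickly with the number of independently conceived candidates:

```python
# Illustrative assumption: each independently conceived variation has a
# 10% chance of being a genuine improvement over the baseline.
p = 0.10
for n in (1, 3, 5, 10):
    # P(at least one real winner among n candidates) = 1 - (1 - p)^n
    print(n, round(1 - (1 - p) ** n, 2))
```

No allocation algorithm can recover that lost probability if the pool itself was narrowed before the test began.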
In general, bandit-based optimization can produce far superior results to regular A/B testing, but it also exposes organizational problems more. You are handing all decision making over to a system, and a system is only as strong as its weakest points; here, the weakest points are the biases that dictate the inputs and any inability to understand the system or actually cede decisions to it. If your organization can handle this, it is a great move, but if it can't, you are likely to cause more problems than you solve. Like any good tool, you use it in the situations where it provides the most value and not in the ones where it doesn't. Both techniques have their place, and over-reliance on either one severely limits the value generated for your organization.