We consider the multi-armed bandit problem. We show that, when the state space is finite, the dynamic allocation (Gittins) indices can be computed by linear programming methods.
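The following is a minimal sketch of how an LP can produce these indices, written as an illustration rather than the authors' own formulation. It assumes the restart-in-state characterization of the index. Fix a state `i`, and allow the arm in every state `j` either to continue (earn `r[j]`, move by row `P[j]`) or to restart from `i` (earn `r[i]`, move by row `P[i]`). The value `v` of that problem is the least vector satisfying both inequalities, so an LP that minimizes `sum(v)` finds it, and the index is `(1 - beta) * v[i]`. The function names `gittins_index_lp` and `gittins_indices_lp` are hypothetical, and the sketch assumes NumPy and SciPy (`scipy.optimize.linprog`) are installed.

```python
import numpy as np
from scipy.optimize import linprog


def gittins_index_lp(P, r, beta, i):
    """Index of state i for one arm, via an LP for the restart-in-state-i problem.

    P: (n, n) row-stochastic transition matrix; r: length-n one-step rewards;
    beta: discount factor in (0, 1).
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    eye = np.eye(n)
    # "Continue" constraints: v_j >= r_j + beta * P[j] @ v, written as (beta*P - I) v <= -r
    A_cont = beta * P - eye
    b_cont = -r
    # "Restart from i" constraints: v_j >= r_i + beta * P[i] @ v for every state j
    A_restart = beta * np.tile(P[i], (n, 1)) - eye
    b_restart = -np.full(n, r[i])
    res = linprog(
        c=np.ones(n),  # minimizing sum(v) selects the least feasible v, i.e. the optimal value
        A_ub=np.vstack([A_cont, A_restart]),
        b_ub=np.concatenate([b_cont, b_restart]),
        bounds=[(None, None)] * n,  # values may be negative when rewards are
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return (1.0 - beta) * res.x[i]


def gittins_indices_lp(P, r, beta):
    """Solve one LP per state and return the index of every state."""
    return [gittins_index_lp(P, r, beta, i) for i in range(len(r))]


# Usage: state 0 pays 0 and moves to the absorbing state 1, which pays 1 forever.
print(gittins_indices_lp([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], beta=0.9))
```

For this small arm the indices come out at approximately 0.9 for state 0 and 1.0 for state 1 (up to solver tolerance): from state 0 the best achievable discounted reward rate is `beta`, and state 1 earns 1 per step. Solving a separate LP for each state keeps the sketch simple. The finite state space is what keeps each LP finite-dimensional.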