Don Horrell wrote: Thank you for your interesting comment.
Could you explain a bit more about what you mean by a "dynamic environment" please?
So, if Reinforcement Learning can select an advertisement for a user, that sounds similar to classification (of images?). A person walks past a smart advertising board which somehow identifies that person and the "label" is the type of advert that will be displayed for that person.
Or have I got the wrong end of the stick? Or perhaps even the wrong stick?
Cheers
Don.
Yes, the label would be one of a finite number of possible advertisements. The objective here, however, would be for the RL algorithm to optimize the clickthrough rate. In image classification the algorithm is trained on whether each classification is correct, but here the decision is not a binary right or wrong. There is no single correct ad: some ads will result in more clicks (if these are ads on a website) than others, and the goal is to learn which ad a potential customer is most likely to respond to, for example by clicking or buying.
By a dynamic environment I just mean that, again, the decision isn't a correct/incorrect labeling, but a set of actions that lead to more or less of some outcome (points in a game, clicks for ads, money if trading stocks, etc.).
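
To make the ad example a bit more concrete, here is a rough sketch in Python of one simple approach, an epsilon-greedy bandit. This is only an illustration under made-up assumptions, not how any real ad system works: the function name `run_epsilon_greedy`, the click probabilities, and the parameter values are all hypothetical, and the "users" are simulated with random numbers. The point is that the algorithm is never told which ad is "correct"; it only sees whether each ad it showed got a click, and it shifts toward the ads that earn more.

```python
import random


def run_epsilon_greedy(true_ctrs, n_rounds=20000, epsilon=0.1, seed=0):
    """Simulate showing ads and learning from clicks (hypothetical setup).

    true_ctrs: made-up click probabilities, hidden from the learner.
    Returns (shows, clicks): how often each ad was shown and clicked.
    """
    rng = random.Random(seed)
    n_ads = len(true_ctrs)
    shows = [0] * n_ads
    clicks = [0] * n_ads

    def estimated_ctr(i):
        # Ads never shown yet get an optimistic estimate so each is tried.
        return clicks[i] / shows[i] if shows[i] else 1.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: show a random ad to keep gathering information.
            ad = rng.randrange(n_ads)
        else:
            # Exploit: show the ad with the best observed click rate so far.
            ad = max(range(n_ads), key=estimated_ctr)

        # The environment responds with a reward (click or no click),
        # not with a "correct label".
        clicked = rng.random() < true_ctrs[ad]
        shows[ad] += 1
        clicks[ad] += int(clicked)

    return shows, clicks


if __name__ == "__main__":
    shows, clicks = run_epsilon_greedy([0.02, 0.05, 0.20])
    for i, (s, c) in enumerate(zip(shows, clicks)):
        rate = c / s if s else 0.0
        print(f"ad {i}: shown {s} times, observed click rate {rate:.3f}")
```

Compare this with a classifier: there is no training set of (person, right ad) pairs anywhere in the code. The only feedback is the reward that comes back after each action, which is what I meant by the outcome being "more or less" rather than right or wrong.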