I have a question about Q-learning. I am new to deep learning. Every example I have seen deals with a problem where the goal is known and we know how to assign the rewards. My problem is that I don't know where my rewards are. For example, if I have 3 states and 3 actions, the one that satisfies some criterion should get a reward. Does anybody have an example or paper about problems like this, where the agent has no prior knowledge of the environment or the rewards? Thank you.
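To make the setup concrete, here is a minimal sketch of the kind of thing I mean: tabular Q-learning over 3 states and 3 actions, where the agent never sees the reward function itself and only observes the reward returned by the environment. The `step` function and its reward rule are purely hypothetical, just to have something runnable:

```python
import random

# Hypothetical 3-state, 3-action environment. The agent has no model of it;
# it only observes (next_state, reward) samples from step().
N_STATES, N_ACTIONS = 3, 3

def step(state, action):
    """Toy environment (an assumption for illustration): taking action 1
    in state 1 lands in state 2, which satisfies the 'criterion' and pays 1."""
    next_state = (state + action) % N_STATES
    reward = 1.0 if (next_state == 2 and action == 1) else 0.0
    return next_state, reward

# Tabular Q-values, learned purely from sampled rewards.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

random.seed(0)
state = 0
for _ in range(5000):
    # epsilon-greedy: mostly exploit the current Q-table, sometimes explore
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: uses only the observed reward, not the reward function
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state
```

The point of the sketch is that the agent starts with zero knowledge and discovers where the rewards are by exploring; my question is whether this still applies when even I, as the designer, cannot write down the reward rule in advance.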