Dynamic Pricing using Q-learning

I am working on a dynamic pricing project that uses Q-learning, and I am badly stuck on the implementation. I have the algorithm worked out on paper, but I am not sure how to implement it properly. Please help! :disappointed_relieved: I would be happy to discuss the specifics of the project over DM, so if anyone is willing to help, please let me know.