Score
0
Lines
0
Training Progress
AI Decision: Current Action
← Left
→ Right
↻ Rotate
↓ Drop
Q-Values: What the Agent Thinks
Higher Q-value = agent believes this action leads to better future score
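As a rough illustration of how four Q-values become one displayed action, the sketch below picks the action with the highest Q-value. The action order (Left, Right, Rotate, Drop) and the name `greedy_action` are assumptions taken from the order shown on this page, not from the agent's actual code, and any exploration the agent uses during training is left out.

```python
# Assumed action order, mirroring the list shown in the panel above.
ACTIONS = ["Left", "Right", "Rotate", "Drop"]


def greedy_action(q_values):
    """Return the name of the action with the highest Q-value."""
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one Q-value per action")
    # Index of the largest Q-value; ties resolve to the earliest action.
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return ACTIONS[best_index]


if __name__ == "__main__":
    print(greedy_action([0.1, -2.0, 3.5, 1.0]))
```

With the sample values above, Rotate has the largest Q-value, so it is the action that would be highlighted.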
Reward Function
The only human judgment in the system; the agent learned everything else by itself
Line clear: +100
Tetris (4 lines): +800
Per hole: -5
Bumpiness: -0.5
Height: -0.3
Death: -500
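The reward terms above can be combined into a single function, sketched here. Several details are assumptions, since the page lists only the weights: that clears of one to three lines earn +100 per line, that "Height" refers to the maximum column height, and that all terms are summed on each step. The function and parameter names are illustrative.

```python
def reward(lines_cleared, holes, bumpiness, height, died):
    """Combine the listed reward terms into one scalar.

    Assumptions (not stated on the page): 1-3 cleared lines earn
    +100 each, and `height` is the maximum column height.
    """
    if lines_cleared == 4:
        total = 800.0                  # Tetris bonus
    else:
        total = 100.0 * lines_cleared  # assumed per-line bonus
    total -= 5.0 * holes               # penalty per hole
    total -= 0.5 * bumpiness           # penalty for an uneven surface
    total -= 0.3 * height              # penalty for a tall stack
    if died:
        total -= 500.0                 # game-over penalty
    return total


if __name__ == "__main__":
    # One line cleared, 2 holes, bumpiness 4, height 10, still alive.
    print(reward(1, 2, 4, 10, False))
```

Under these assumptions, the sample call works out to 100 - 10 - 2 - 3 = 85.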
Network Architecture
Input: 15 board features
Hidden: 256 neurons
Hidden: 256 neurons
Hidden: 128 neurons
Output: 4 Q-values
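To give a sense of the network's size, the sketch below counts trainable parameters for the layer widths listed above. It assumes plain fully connected layers with one bias per neuron, which the page does not state explicitly. `LAYER_SIZES` and the helper names are illustrative.

```python
# Layer widths as listed: 15 inputs, three hidden layers, 4 outputs.
LAYER_SIZES = [15, 256, 256, 128, 4]


def layer_shapes(sizes):
    """Pair each layer width with the next: (inputs, outputs)."""
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(sizes):
    """Weights plus biases, assuming fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes(sizes))


if __name__ == "__main__":
    for n_in, n_out in layer_shapes(LAYER_SIZES):
        print(f"{n_in:>3} -> {n_out:<3} : {n_in * n_out + n_out} parameters")
    print("total:", parameter_count(LAYER_SIZES))  # total: 103300
```

Under that assumption the network has 103,300 parameters, most of them in the 256-by-256 hidden layer.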
Trained with experience replay + target network stabilisation
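A minimal sketch of the two stabilisation techniques named above, using only the standard library. The buffer capacity, the discount factor `gamma`, and all names here are assumptions made for illustration, and the page does not give the agent's actual hyperparameters. The bootstrapped target uses Q-values from a separate, slowly updated target network, passed in here as a plain list.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest entries drop off

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(reward, target_q_next, done, gamma=0.99):
    """One-step target: r + gamma * max_a Q_target(s', a).

    `target_q_next` holds the target network's Q-values for the next
    state; on terminal transitions the bootstrap term is dropped.
    """
    if done:
        return reward
    return reward + gamma * max(target_q_next)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.push(step, 0, 1.0, step + 1, False)
    print(len(buffer))  # 3
    print(td_target(1.0, [0.5, 2.0, -1.0, 0.0], False, gamma=0.9))
```

Sampling from the buffer breaks the correlation between consecutive frames, while computing targets from a separate network keeps the regression target from shifting on every update.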
15 Input Features
• 10 column heights
• Total holes
• Surface bumpiness
• Max column height
• Average height
• Lines cleared so far
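The 15 features listed above could be computed from a board roughly as follows. This sketch assumes the board is a list of rows (top row first) of 0/1 cells, that a hole is an empty cell with at least one filled cell above it in the same column, and that bumpiness is the sum of absolute height differences between adjacent columns. These definitions are common but not confirmed by the page, and `extract_features` is a hypothetical name.

```python
def extract_features(board, lines_cleared):
    """Return [10 column heights, holes, bumpiness, max height,
    average height, lines cleared] for a 10-column board."""
    rows, cols = len(board), len(board[0])
    heights = []
    holes = 0
    for c in range(cols):
        height = 0
        seen_block = False
        for r in range(rows):              # scan the column top to bottom
            if board[r][c]:
                if not seen_block:
                    height = rows - r      # first filled cell sets height
                    seen_block = True
            elif seen_block:
                holes += 1                 # empty cell under a filled one
        heights.append(height)
    bumpiness = sum(abs(heights[c] - heights[c + 1]) for c in range(cols - 1))
    return heights + [
        holes,
        bumpiness,
        max(heights),
        sum(heights) / cols,
        lines_cleared,
    ]


if __name__ == "__main__":
    board = [[0] * 10 for _ in range(20)]
    board[19][0] = 1   # bottom-left cell
    board[17][0] = 1   # block two rows up, leaving a hole at row 18
    board[19][1] = 1
    print(extract_features(board, lines_cleared=0))
```

For the sample board, column 0 has height 3 with one hole and column 1 has height 1, giving a bumpiness of 3 and an average height of 0.4.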