GPU benchmark batch_size/$ fidelity

Does somebody know why the A4000 and A6000 have a ~35% higher batch_size/$ ratio than the 3090? And why do the 3090 and A4500 show only a marginal difference?
Assuming $ here is the launch price per GPU: the A4500 and the 3090 used to cost ~$2200 and ~$1700 respectively, and the A4500 has 20GB of VRAM to the 3090's 24GB. From the price gap alone I would expect the 3090's batch_size/$ to be ~30% greater than the A4500's, and the A4500's smaller VRAM should widen that gap further. Yet in the PyTorch benchmark, the 3090 and A4500 show almost the same batch_size/$.
For the A6000, priced around $4500 with 48GB of VRAM, the 3090 should still come out ~30% ahead in batch_size/$, but the chart shows the opposite.
3090 vs. A4000: the A4000 comes out ~25% higher, even though, with the A4000 priced around $1250 and carrying 16GB of VRAM, I would expect the 3090 to have ~10% higher batch_size/$.
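
To make the arithmetic explicit, here is the back-of-the-envelope check behind those expectations. It assumes the achievable batch size scales linearly with VRAM and uses the approximate launch prices quoted above; this is only my proxy, not how Lambda computes the metric.

```python
# Proxy for batch_size/$: VRAM (GB) per launch dollar, normalized to the 3090.
# Prices are the approximate launch prices quoted above; assumes the achievable
# batch size scales linearly with VRAM, which is only a rough approximation.
gpus = {
    "RTX 3090":  {"vram_gb": 24, "price_usd": 1700},
    "RTX A4500": {"vram_gb": 20, "price_usd": 2200},
    "RTX A4000": {"vram_gb": 16, "price_usd": 1250},
    "RTX A6000": {"vram_gb": 48, "price_usd": 4500},
}

baseline = gpus["RTX 3090"]["vram_gb"] / gpus["RTX 3090"]["price_usd"]

for name, spec in gpus.items():
    proxy = spec["vram_gb"] / spec["price_usd"]
    print(f"{name}: {proxy:.4f} GB/$ ({proxy / baseline:.2f}x vs 3090)")
```

By this proxy the 3090 should lead every other card on the chart, which is why the measured ratios surprised me.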

Regarding the $ here: it is not clearly stated, but I assume it corresponds to the GPU board alone and not the whole server.
Lambda has one of the best GPU benchmarks around, though I cannot find the methodology for how this and some of the other metrics are computed.
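
In case batch_size here means the largest batch that fits in GPU memory for the benchmarked model, the naive way I would measure it locally is sketched below; the model, input shape, and search bound are placeholders of mine, not Lambda's actual setup.

```python
import torch
import torchvision.models as models  # stand-in model only, not Lambda's benchmark suite


def fits(model, batch_size, input_shape, device="cuda"):
    """True if one forward+backward pass at this batch size fits in GPU memory."""
    try:
        x = torch.randn(batch_size, *input_shape, device=device)
        model(x).sum().backward()
        return True
    except RuntimeError as e:  # a CUDA OOM surfaces as a RuntimeError
        if "out of memory" not in str(e).lower():
            raise
        return False
    finally:
        model.zero_grad(set_to_none=True)
        torch.cuda.empty_cache()


def max_batch_size(model, input_shape, hi=4096):
    """Binary-search the largest batch size that still fits."""
    lo, best = 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(model, mid, input_shape):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best


if __name__ == "__main__":
    net = models.resnet50().cuda()  # placeholder model and input shape
    print(max_batch_size(net, (3, 224, 224)))
```

resnet50 at 224x224 is just an example; whatever the chart reports will obviously depend on the model, input resolution, and precision used, which is exactly why the methodology matters.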