Llama 405B: 506 tokens/second on an H200

Article URL: https://developer.nvidia.com/blog/boosting-llama-3-1-405b-throughput-by-another-1-5x-on-nvidia-h200-tensor-core-gpus-and-nvlink-switch/

Comments URL: https://news.ycombinator.com/item?id=41833287

Points: 1

# Comments: 1