| Model | # HPUs | Precision | Performance | Framework Version |
|---|---|---|---|---|
| MLPerf 4.0 Llama 2 70B Server | 8 | fp8 | 6252 tokens/sec | PyTorch 2.3.1 |
| MLPerf 4.0 Llama 2 70B Offline | 8 | fp8 | 7581 tokens/sec | PyTorch 2.3.1 |
| MLPerf 4.0 Stable Diffusion XL Server | 8 | fp8 | 6.25 queries/sec | PyTorch 2.3.1 |
| MLPerf 4.0 Stable Diffusion XL Offline | 8 | fp8 | 6.48 samples/sec | PyTorch 2.3.1 |
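
The two Llama 2 70B rows can be compared directly. MLPerf's Server scenario imposes latency constraints on each query, whereas Offline does not, so the Server/Offline ratio indicates how much throughput is traded for meeting those bounds. The sketch below is plain arithmetic on the two published numbers; the per-HPU figures assume the load is split evenly across the 8 HPUs, which is a simplification rather than a measured value.

```python
# Llama 2 70B, MLPerf 4.0, 8 HPUs, fp8 (values copied from the table above).
server_tps = 6252.0   # Server scenario, tokens/sec (latency-constrained)
offline_tps = 7581.0  # Offline scenario, tokens/sec (no latency constraint)
NUM_HPUS = 8

# Fraction of Offline throughput retained under Server latency constraints.
server_to_offline = server_tps / offline_tps

# Per-accelerator throughput, ASSUMING an even split across the 8 HPUs.
per_hpu_server = server_tps / NUM_HPUS
per_hpu_offline = offline_tps / NUM_HPUS

print(f"Server/Offline ratio: {server_to_offline:.3f}")
print(f"Per HPU: server {per_hpu_server:.1f}, offline {per_hpu_offline:.1f} tokens/sec")
```

Server retains roughly 82% of Offline throughput for this model.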

| Model | # HPUs | Precision | Input Length | Output Length | Throughput | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| LLaMA 2 7B | 1 | fp8 | 128 | 128 | 12772 tokens/sec | 1230 | Optimum Habana 1.12.1 |
| LLaMA 2 7B | 1 | fp8 | 128 | 2048 | 4787 tokens/sec | 163 | Optimum Habana 1.12.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 128 | 1318 tokens/sec | 94 | Optimum Habana 1.12.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 2048 | 1967 tokens/sec | 81 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 128 | 128 | 17331 tokens/sec | 2429 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 128 | 2048 | 11106 tokens/sec | 289 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 2048 | 128 | 1762 tokens/sec | 179 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 2048 | 2048 | 5379 tokens/sec | 155 | Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 128 | 128 | 2784 tokens/sec | 1750 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 128 | 2048 | 3186 tokens/sec | 750 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 128 | 292 tokens/sec | 95 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 2048 | 1392 tokens/sec | 78 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 128 | 128 | 13112 tokens/sec | 896 | Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 128 | 2048 | 7947 tokens/sec | 120 | Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 2048 | 128 | 1360 tokens/sec | 120 | Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 2048 | 2048 | 3143 tokens/sec | 44 | Optimum Habana 1.12.1 |
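
The LLaMA 2 70B rows run on 2 HPUs while the other models run on 1, so raw throughput is not directly comparable across rows. A minimal sketch that normalizes a few 128/128 rows to tokens/sec per HPU, assuming (loosely) that throughput scales linearly with card count; the `Row` class and `per_hpu` helper are illustrative and not part of Optimum Habana or DeepSpeed.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    hpus: int
    input_len: int
    output_len: int
    tokens_per_sec: float


# A subset of the 128/128 rows from the table above (fp8).
ROWS = [
    Row("LLaMA 2 7B", 1, 128, 128, 12772),
    Row("LLaMA 3 8B", 1, 128, 128, 17331),
    Row("Mistral 7B", 1, 128, 128, 13112),
    Row("LLaMA 2 70B", 2, 128, 128, 2784),
]


def per_hpu(row: Row) -> float:
    """Throughput divided by HPU count; ASSUMES linear scaling across cards."""
    return row.tokens_per_sec / row.hpus


for r in ROWS:
    print(f"{r.model:12s} {r.input_len}/{r.output_len}: "
          f"{per_hpu(r):8.1f} tokens/sec per HPU")
```

Dividing by card count is only a rough normalization: tensor-parallel inference across 2 HPUs also adds communication overhead, so per-HPU numbers for the 70B model understate what a single card does in isolation only if scaling is sublinear.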

| Model | # HPUs | Precision | Input Length | Latency | Batch Size | Framework Version |
|---|---|---|---|---|---|---|
| LLaMA 2 7B | 1 | fp8 | 128 | 7.62 ms | 1 | Optimum Habana 1.12.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 56.31 ms | 1 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 128 | 8.17 ms | 1 | Optimum Habana 1.12.1 |
| LLaMA 3 8B | 1 | fp8 | 2048 | 60.52 ms | 1 | Optimum Habana 1.12.1 |
| LLaMA 2 70B | 8 | fp8 | 128 | 26.93 ms | 1 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| LLaMA 2 70B | 8 | fp8 | 2048 | 116 ms | 1 | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 128 | 10.8 ms | 1 | Optimum Habana 1.12.1 |
| Mistral 7B | 1 | fp8 | 2048 | 92 ms | 1 | Optimum Habana 1.12.1 |

| Model | # HPUs | Precision | Throughput | Latency‡ | Batch Size | Framework Version |
|---|---|---|---|---|---|---|
| Stable Diffusion v2.1 (512x512)** | 1 | bf16 | 1.23 img/sec | 813 ms | 1 | Lightning 2.2.0 |
| Stable Diffusion v2.1 (768x768)** | 1 | bf16 | 0.4 img/sec | 2500 ms | 1 | Lightning 2.2.0 |
| Bert FT (torch.compile) | 1 | bf16 | 814 tokens/sec | 29.48 ms | 24 | |
| Resnet50 (torch.compile) | 1 | bf16 | 17018 img/sec | 15.04 ms | 256 | |
| Unet2D | 1 | bf16 | 7525 img/sec | 8.5 ms | 64 | Lightning 2.3.3 |
| Unet3D | 1 | bf16 | 114.74 img/sec | 17.43 ms | 2 | Lightning 2.3.3 |
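
In this table the latency column appears to equal batch size divided by throughput (for example, 256 / 17018 img/sec ≈ 15.04 ms for Resnet50). The sketch below rederives each latency from the published throughput and batch size so the relationship can be checked. `implied_latency_ms` is a hypothetical helper, and reading latency as per-batch time is an inference from the numbers, not a documented definition.

```python
# (model, throughput per second, reported latency in ms, batch size),
# copied from the table above.
ROWS = [
    ("Stable Diffusion v2.1 (512x512)", 1.23, 813.0, 1),
    ("Stable Diffusion v2.1 (768x768)", 0.4, 2500.0, 1),
    ("Bert FT (torch.compile)", 814.0, 29.48, 24),
    ("Resnet50 (torch.compile)", 17018.0, 15.04, 256),
    ("Unet2D", 7525.0, 8.5, 64),
    ("Unet3D", 114.74, 17.43, 2),
]


def implied_latency_ms(throughput: float, batch: int) -> float:
    """Time to process one batch, ASSUMING latency = batch size / throughput."""
    return 1000.0 * batch / throughput


for name, tput, reported, batch in ROWS:
    derived = implied_latency_ms(tput, batch)
    print(f"{name:34s} reported {reported:8.2f} ms, derived {derived:8.2f} ms")
```

Every row agrees to within rounding, which suggests the throughput figures count samples (images or sequences) per second.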

| Model | # HPUs | Precision | Throughput | Latency | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|
| Bert (Language Modeling) | 1 | bf16 | 89 tokens/sec | 44.94 ms | 4 | language-modeling | Optimum Habana 1.12.1 |
| Bert (Question Answering) | 1 | bf16 | 662 tokens/sec | 12.08 ms | 8 | question-answering | Optimum Habana 1.12.1 |
| Bert (Text Classification) | 1 | bf16 | 1992 tokens/sec | 4.01 ms | 8 | text-classification | Optimum Habana 1.12.1 |
| Bloomz | 8 | bf16 | 37 tokens/sec | 27.02 ms | 1 | text-generation | DeepSpeed 0.14.0, Optimum Habana 1.12.1 |
| BridgeTower | 1 | bf16 | 3224 tokens/sec | 4.96 ms | 16 | contrastive-image-text | Optimum Habana 1.12.1 |
| ESMFold | 1 | bf16 | 2.91 tokens/sec | 343.64 ms | 1 | protein-folding | Optimum Habana 1.12.1 |
| GPT-J | 8 | bf16 | 588 tokens/sec | 6.8 ms | 4 | text-generation | Optimum Habana 1.12.1 |
| MPT-7B 1932 Tokens | 1 | bf16 | 121 tokens/sec | 8.26 ms | 1 | text-generation | Optimum Habana 1.12.1 |
| OPT | 1 | bf16 | 1013 tokens/sec | 0.98 ms | 1 | text-generation | Optimum Habana 1.12.1 |
| StableDiffusion v2.1 (512x512) | 1 | bf16 | 1.35 images/sec | 2962.96 ms | 4 | stable-diffusion | Optimum Habana 1.12.1 |
| StableLM-7B 2048 Tokens | 1 | bf16 | 128 tokens/sec | 7.81 ms | 1 | text-generation | Optimum Habana 1.12.1 |
| StarCoder | 1 | bf16 | 65 tokens/sec | 15.38 ms | 1 | text-generation | Optimum Habana 1.12.1 |
| T5-3B Summarization Greedy | 1 | bf16 | 12.38 tokens/sec | 5331.17 ms | 1 | summarization | Optimum Habana 1.12.1 |
| Wav2vec (Audio Classification) | 1 | bf16 | 1817 tokens/sec | 2.2 ms | 4 | audio-classification | Optimum Habana 1.12.1 |
| Wav2vec (Speech Recognition) | 1 | bf16 | 19.48 tokens/sec | 205.33 ms | 4 | speech-recognition | Optimum Habana 1.12.1 |
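
For nearly every row above, the reported latency matches batch size divided by throughput (for example, 8 / 662 ≈ 12.08 ms for Bert question answering). The exception is T5-3B Summarization Greedy: 1 / 12.38 per second works out to about 81 ms, not the reported 5331.17 ms, possibly because that row measures latency differently or because of a transcription error; the source does not say which. A small sketch that flags such rows, using a subset of the table and an assumed 1% tolerance; `inconsistent_rows` is illustrative only.

```python
# model: (throughput per second, reported latency in ms, batch size),
# a subset of the rows in the table above.
ROWS = {
    "Bert (Language Modeling)": (89, 44.94, 4),
    "Bloomz": (37, 27.02, 1),
    "GPT-J": (588, 6.8, 4),
    "StableDiffusion v2.1 (512x512)": (1.35, 2962.96, 4),
    "T5-3B Summarization Greedy": (12.38, 5331.17, 1),
    "Wav2vec (Speech Recognition)": (19.48, 205.33, 4),
}


def inconsistent_rows(rows, rel_tol=0.01):
    """Names whose latency differs from batch/throughput by more than rel_tol."""
    flagged = []
    for name, (tput, reported_ms, batch) in rows.items():
        derived_ms = 1000.0 * batch / tput
        if abs(derived_ms - reported_ms) / reported_ms > rel_tol:
            flagged.append(name)
    return flagged


print(inconsistent_rows(ROWS))
```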

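| Model | # HPUs | Precision | Throughput | Latency | Batch Size | Framework Version |
|---|---|---|---|---|---|---|
| Bert | 1 | bf16 | 147.6 tokens/sec | 162.6 ms | 24 | |
| Unet2D | 1 | bf16 | 2359 img/sec | 27.13 ms | 64 | Lightning 2.3.3 |
| Unet3D | 1 | bf16 | 29.6 img/sec | 67.56 ms | 2 | Lightning 2.3.3 |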

| Model | # HPUs | Precision | Throughput | Latency | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|
| HF Bert (Language Modeling) | 1 | bf16 | 38.7 tokens/sec | 103.35 ms | 4 | language-modeling | Optimum Habana 1.12.1 |
| HF Bert (Question Answering) | 1 | bf16 | 128.7 tokens/sec | 62.16 ms | 8 | question-answering | Optimum Habana 1.12.1 |
| HF Bert (Text Classification) | 1 | bf16 | 434.2 tokens/sec | 18.42 ms | 8 | text-classification | Optimum Habana 1.12.1 |
| Bart-Greedy | 1 | bf16 | 3.1 tokens/sec | 645.16 ms | 2 | summarization | Optimum Habana 1.12.1 |
| ESMFold | 1 | bf16 | 13.9 tokens/sec | 71.94 ms | 1 | protein-folding | Optimum Habana 1.12.1 |
| StableDiffusion v2.1 (512x512) | 1 | bf16 | 0.4 images/sec | 10000 ms | 4 | text-to-image generation | Optimum Habana 1.12.1 |
| Wav2vec (Audio Classification) | 1 | bf16 | 1287 tokens/sec | 3.1 ms | 4 | audio-classification | Optimum Habana 1.12.1 |