Choose a validated model for reliable serving

Red Hat AI 3

Red Hat AI validated models

Red Hat AI Documentation Team

Abstract

Learn about the validated models that you can use for inference serving with Red Hat AI.

Chapter 1. About Red Hat AI validated models

Red Hat AI validated models have been tested and verified to work correctly across supported hardware and product configurations. These models are available as Hugging Face downloads, as OCI artifact images, and as ModelCar container images. Platform-specific validated models are also available for IBM Spyre on IBM Power and IBM Z systems.

Note

If you are using AI Inference with Podman as part of a RHEL AI deployment, use ModelCar container images or Hugging Face models.

If you are using AI Inference as part of a Red Hat OpenShift AI deployment on OpenShift Container Platform, use OCI artifact images.
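The guidance in this note can be captured as a small lookup. The following is a minimal sketch: the deployment labels (`rhel-ai-podman`, `openshift-ai`) and the function name are hypothetical and chosen only for illustration; they are not product identifiers.

```python
# Hypothetical deployment labels mapped to the model formats that the note recommends.
RECOMMENDED_FORMATS = {
    "rhel-ai-podman": ("ModelCar container image", "Hugging Face model"),
    "openshift-ai": ("OCI artifact image",),
}


def recommended_formats(deployment: str) -> tuple[str, ...]:
    """Return the recommended model formats for a deployment label."""
    try:
        return RECOMMENDED_FORMATS[deployment]
    except KeyError:
        raise ValueError(f"unknown deployment label: {deployment!r}") from None


print(recommended_formats("openshift-ai"))
```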

Red Hat uses GuideLLM for performance benchmarking and Language Model Evaluation Harness for accuracy evaluations.

Explore the Red Hat AI validated models collections on Hugging Face.

Important

AMD GPUs support FP8 (W8A8) and GGUF quantization variant models only. For more information, see Supported hardware in the vLLM documentation.
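When you choose a quantized variant from the tables in this guide, the AMD restriction can be applied as a simple filter. The following is a minimal sketch: the function name and the vendor string are hypothetical, and only the AMD rule stated above is modeled.

```python
# Quantization labels taken from the statement above; the vendor strings are
# hypothetical labels used only in this sketch.
AMD_SUPPORTED_QUANTIZATIONS = {"FP8", "GGUF"}


def usable_variants(variants: list[str], gpu_vendor: str) -> list[str]:
    """Filter quantized variants down to those usable on the given GPU vendor."""
    if gpu_vendor.lower() != "amd":
        # This sketch models only the AMD restriction; other vendors pass through.
        return list(variants)
    return [v for v in variants if v.upper() in AMD_SUPPORTED_QUANTIZATIONS]


# granite-3.1-8b-instruct lists INT4, INT8, and FP8 variants.
print(usable_variants(["INT4", "INT8", "FP8"], "amd"))  # ['FP8']
```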

Chapter 2. Red Hat AI validated models - February 2026

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference.

Table 2.1. Red Hat AI validated models - February 2026 collection

Model | Quantized variants | Hugging Face model cards | Validated on

granite-4.0-h-small

FP8

  • RHAIIS 3.3
  • RHOAI 3.3

granite-4.0-h-tiny

FP8

  • RHAIIS 3.3
  • RHOAI 3.3

Ministral-3-14B-Instruct-2512

None

  • RHAIIS 3.3
  • RHOAI 3.3

Phi-4-reasoning

FP8

  • RHAIIS 3.3
  • RHOAI 3.3

Qwen3-Next-80B-A3B-Instruct

INT4

  • RHAIIS 3.3
  • RHOAI 3.3

Qwen3-VL-235B-A22B-Instruct-NVFP4

None

  • RHAIIS 3.3
  • RHOAI 3.3

Chapter 3. Red Hat AI validated models - January 2026

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference.

Table 3.1. Red Hat AI validated models - January 2026 collection

Model | Quantized variants | Hugging Face model cards | Validated on

Apertus-8B-Instruct-2509

FP8

  • RHAIIS 3.2.5
  • RHOAI 3.2

Mistral-Large-3-675B-Instruct-2512

None

  • RHAIIS 3.2.5
  • RHOAI 3.2

Mistral-Large-3-675B-Instruct-2512-NVFP4

None

  • RHAIIS 3.2.5
  • RHOAI 3.0

NVIDIA-Nemotron-3-Nano-30B-A3B

FP8

  • RHAIIS 3.2.5
  • RHOAI 3.0

Chapter 4. NVFP4 Models

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference.

Table 4.1. NVFP4 Models collection

Model | Quantized variants | Hugging Face model cards | Validated on

Mistral-Large-3-675B-Instruct-2512-NVFP4

None

  • RHAIIS 3.2.5
  • RHOAI 3.0

Qwen3-VL-235B-A22B-Instruct-NVFP4

None

  • RHAIIS 3.3
  • RHOAI 3.3

Chapter 5. Red Hat AI validated models - October 2025 collection

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference.

Table 5.1. Red Hat AI validated models - October 2025 collection

Model | Quantized variants | Hugging Face model card | Validated on

gpt-oss-120b

None

  • RHAIIS 3.2.2
  • RHOAI 2.25

gpt-oss-20b

None

  • RHAIIS 3.2.2
  • RHOAI 2.25

NVIDIA-Nemotron-Nano-9B-v2

INT4, FP8

  • RHAIIS 3.2.2
  • RHOAI 2.25

Qwen3-Coder-480B-A35B-Instruct

FP8

  • RHAIIS 3.2.2
  • RHOAI 2.25

Voxtral-Mini-3B-2507

FP8

  • RHAIIS 3.2.2
  • RHOAI 2.25

whisper-large-v3-turbo

INT4

  • RHAIIS 3.2.2
  • RHOAI 2.25

Chapter 6. Validated models on Hugging Face - September 2025 collection

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference.

Table 6.1. Red Hat AI validated models - September 2025 collection

Model | Quantized variants | Hugging Face model card | Validated on

DeepSeek-R1-0528

INT4

  • RHAIIS 3.2.1
  • RHOAI 2.24

gemma-3n-E4B-it

FP8

  • RHAIIS 3.2.1
  • RHOAI 2.24

Kimi-K2-Instruct

INT4

  • RHAIIS 3.2.1
  • RHOAI 2.24

Qwen3-8B

FP8

  • RHAIIS 3.2.1
  • RHOAI 2.24

Chapter 7. Validated models on Hugging Face - May 2025 collection

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference.

Table 7.1. Red Hat AI validated models - May 2025 collection

Model | Quantized variants | Hugging Face model card | Validated on

gemma-2-9b-it

FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

granite-3.1-8b-base

INT4

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

granite-3.1-8b-instruct

INT4, INT8, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Llama-3.1-8B-Instruct

None

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Llama-3.1-Nemotron-70B-Instruct-HF

FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Llama-3.3-70B-Instruct

INT4, INT8, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Llama-4-Maverick-17B-128E-Instruct

FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Llama-4-Scout-17B-16E-Instruct

INT4, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Meta-Llama-3.1-8B-Instruct

INT4, INT8, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Mistral-Small-24B-Instruct-2501

INT4, INT8, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Mistral-Small-3.1-24B-Instruct-2503

INT4, INT8, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Mixtral-8x7B-Instruct-v0.1

None

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

phi-4

INT4, INT8, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Qwen2.5-7B-Instruct

INT4, INT8, FP8

  • RHAIIS 3.0
  • RHELAI 1.5
  • RHOAI 2.20

Chapter 8. Validated OCI artifact model container images

The following table lists validated OCI artifact model container images available from the Red Hat container registry, including baseline and quantized variants for each supported model.

Table 8.1. Validated OCI artifact model container images

Model | Quantized variants | OCI artifact images

llama-4-scout-17b-16e-instruct

INT4, FP8

  • Baseline: registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5
  • FP8: registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5

llama-4-maverick-17b-128e-instruct

FP8

  • Baseline: registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct:1.5
  • FP8: registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct-fp8:1.5

mistral-small-3-1-24b-instruct-2503

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503:1.5
  • INT4: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5

llama-3-3-70b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/llama-3-3-70b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/llama-3-3-70b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/llama-3-3-70b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/llama-3-3-70b-instruct-fp8-dynamic:1.5

llama-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/llama-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/llama-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/llama-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/llama-3-1-8b-instruct-fp8-dynamic:1.5

granite-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/granite-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/granite-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/granite-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/granite-3-1-8b-instruct-fp8-dynamic:1.5

phi-4

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/phi-4:1.5
  • INT4: registry.redhat.io/rhelai1/phi-4-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/phi-4-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/phi-4-fp8-dynamic:1.5

qwen2-5-7b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/qwen2-5-7b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/qwen2-5-7b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/qwen2-5-7b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/qwen2-5-7b-instruct-fp8-dynamic:1.5

mistral-small-24b-instruct-2501

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501:1.5
  • INT4: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-fp8-dynamic:1.5

mixtral-8x7b-instruct-v0-1

None

  • Baseline: registry.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1:1.4

granite-3-1-8b-base

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/granite-3-1-8b-base-quantized-w4a16:1.5

granite-3.1-8b-starter-v2

None

  • Baseline: registry.redhat.io/rhelai1/granite-3.1-8b-starter-v2:1.5

llama-3-1-nemotron-70b-instruct-hf

FP8

  • Baseline: registry.redhat.io/rhelai1/llama-3-1-nemotron-70b-instruct-hf:1.5
  • FP8: registry.redhat.io/rhelai1/llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5

gemma-2-9b-it

FP8

  • Baseline: registry.redhat.io/rhelai1/gemma-2-9b-it:1.5
  • FP8: registry.redhat.io/rhelai1/gemma-2-9b-it-fp8:1.5

deepseek-r1-0528

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/deepseek-r1-0528-quantized-w4a16:1.5

qwen3-8b

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/qwen3-8b-fp8-dynamic:1.5

kimi-k2-instruct

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/kimi-k2-instruct-quantized-w4a16:1.5

gemma-3n-e4b-it

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/gemma-3n-e4b-it-fp8-dynamic:1.5

gpt-oss-120b

None

  • Baseline: registry.redhat.io/rhelai1/gpt-oss-120b:1.5

gpt-oss-20b

None

  • Baseline: registry.redhat.io/rhelai1/gpt-oss-20b:1.5

qwen3-coder-480b-a35b-instruct

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/qwen3-coder-480b-a35b-instruct-fp8:1.5

whisper-large-v3-turbo

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/whisper-large-v3-turbo-quantized-w4a16:1.5

voxtral-mini-3b-2507

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/voxtral-mini-3b-2507-fp8-dynamic:1.5

nvidia-nemotron-nano-9b-v2

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5

Chapter 9. Validated Red Hat AI ModelCar container images

Table 9.1. Validated Red Hat AI ModelCar container images

Model | Quantized variants | ModelCar images

llama-4-scout-17b-16e-instruct

INT4, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5

llama-4-maverick-17b-128e-instruct

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct-fp8:1.5

mistral-small-3-1-24b-instruct-2503

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5

llama-3-3-70b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-fp8-dynamic:1.5

llama-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-fp8-dynamic:1.5

granite-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-fp8-dynamic:1.5

phi-4

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-phi-4:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-phi-4-fp8-dynamic:1.5

qwen2-5-7b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-fp8-dynamic:1.5

mistral-small-24b-instruct-2501

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-fp8-dynamic:1.5

mixtral-8x7b-instruct-v0-1

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-mixtral-8x7b-instruct-v0-1:1.4

granite-3-1-8b-base

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-base-quantized-w4a16:1.5

granite-3-1-8b-starter-v2

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-starter-v2:1.5

llama-3-1-nemotron-70b-instruct-hf

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5

gemma-2-9b-it

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-gemma-2-9b-it:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-gemma-2-9b-it-fp8:1.5

deepseek-r1-0528

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-deepseek-r1-0528-quantized-w4a16:1.5

qwen3-8b

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8-dynamic:1.5

kimi-k2-instruct

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-kimi-k2-instruct-quantized-w4a16:1.5

gemma-3n-e4b-it

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it-fp8-dynamic:1.5

gpt-oss-120b

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-gpt-oss-120b:1.5

gpt-oss-20b

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-gpt-oss-20b:1.5

qwen3-coder-480b-a35b-instruct

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-qwen3-coder-480b-a35b-instruct-fp8:1.5

whisper-large-v3-turbo

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-whisper-large-v3-turbo-quantized-w4a16:1.5

voxtral-mini-3b-2507

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-voxtral-mini-3b-2507-fp8-dynamic:1.5

nvidia-nemotron-nano-9b-v2

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5

phi-4-reasoning

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhai/modelcar-phi-4-reasoning-fp8-dynamic:3.0

qwen3-vl-235b-a22b-instruct-nvfp4

None

  • Baseline: registry.redhat.io/rhai/modelcar-qwen3-vl-235b-a22b-instruct-nvfp4:3.0

qwen3-next-80b-a3b-instruct

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhai/modelcar-qwen3-next-80b-a3b-instruct-quantized-w4a16:3.0

granite-4-0-h-tiny

FP8

  • Baseline: registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny:3.0
  • FP8: registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0

granite-4-0-h-small

FP8

  • Baseline: registry.redhat.io/rhai/modelcar-granite-4-0-h-small:3.0
  • FP8: registry.redhat.io/rhai/modelcar-granite-4-0-h-small-fp8-dynamic:3.0

mistral-large-3-675b-instruct-2512

None

  • Baseline: registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512:3.0

mistral-large-3-675b-instruct-2512-nvfp4

None

  • Baseline: registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512-nvfp4:3.0

apertus-8b-instruct-2509

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhai/modelcar-apertus-8b-instruct-2509-fp8-dynamic:3.0

nvidia-nemotron-3-nano-30b-a3b

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-nano-30b-a3b-fp8:3.0

ministral-3-14b-instruct-2512

None

  • Baseline: registry.redhat.io/rhai/modelcar-ministral-3-14b-instruct-2512:3.0
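The image references in the OCI artifact and ModelCar tables share a visible shape: registry, namespace, image name, and tag, with the quantized variant encoded as a name suffix and ModelCar images carrying a `modelcar-` prefix. The following is a minimal sketch that splits a reference into those parts. The class and function names are hypothetical, and the suffix-to-variant mapping is inferred from the labels in the tables rather than from a documented naming contract, so treat it as an assumption. The sketch also assumes a three-part path with a tag, without a registry port or digest.

```python
from dataclasses import dataclass

# Name suffixes observed in the image tables, paired with the table's variant label.
VARIANT_SUFFIXES = (
    ("-quantized-w4a16", "INT4"),
    ("-quantized-w8a8", "INT8"),
    ("-fp8-dynamic", "FP8"),
    ("-fp8", "FP8"),
)


@dataclass(frozen=True)
class ModelImage:
    registry: str
    namespace: str
    name: str
    tag: str
    variant: str       # "INT4", "INT8", "FP8", or "Baseline"
    is_modelcar: bool  # True when the image name starts with "modelcar-"


def parse_image(reference: str) -> ModelImage:
    """Split a reference of the form registry/namespace/name:tag into its parts."""
    path, _, tag = reference.rpartition(":")
    if not path:
        raise ValueError(f"missing tag in image reference: {reference!r}")
    registry, namespace, name = path.split("/")
    variant = next(
        (label for suffix, label in VARIANT_SUFFIXES if name.endswith(suffix)),
        "Baseline",
    )
    return ModelImage(registry, namespace, name, tag, variant,
                      name.startswith("modelcar-"))


image = parse_image("registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w4a16:1.5")
print(image.variant, image.is_modelcar, image.tag)
```

Names that end in `-nvfp4` fall through to "Baseline", which matches how the tables label those images.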

Chapter 10. Validated models for x86_64 CPU inference serving

The following large language models have been validated for use with Red Hat AI Inference on x86_64 CPUs.

Table 10.1. Validated models for inferencing with x86_64 CPU

Important

Quantization formats that require GPU-specific kernels, such as Marlin format, are not supported for CPU inference. Use AWQ or GPTQ quantization formats that are compatible with CPU execution.

The following table provides general guidance for approximate system RAM requirements based on model size:

Table 10.2. Memory requirements for inference serving with x86_64 CPU

Model size | Minimum RAM | Recommended RAM
125M - 500M | 8 GB | 16 GB
500M - 1B | 16 GB | 32 GB
1B - 3B | 32 GB | 64 GB

Note

Actual memory usage depends on the model architecture, context length, and batch size. Increase the VLLM_CPU_KVCACHE_SPACE environment variable to allocate more memory for the key-value cache when using longer context lengths.
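The RAM guidance in Table 10.2 can be expressed as a lookup keyed on parameter count. The following is a minimal sketch: the function and constant names are hypothetical, and the handling of sizes that sit exactly on a tier boundary (assigned to the smaller tier) is a choice made for this example. It also shows one way to set `VLLM_CPU_KVCACHE_SPACE`, which vLLM interprets as a size in GiB; the value used is arbitrary and must be set before the server process starts.

```python
import os

# (upper bound on parameter count, minimum RAM in GB, recommended RAM in GB),
# transcribed from Table 10.2. Boundary sizes fall into the smaller tier here,
# which is a choice made for this sketch.
RAM_GUIDANCE = (
    (500_000_000, 8, 16),
    (1_000_000_000, 16, 32),
    (3_000_000_000, 32, 64),
)
SMALLEST_LISTED = 125_000_000


def ram_guidance_gb(parameter_count: int) -> tuple[int, int]:
    """Return (minimum, recommended) system RAM in GB for a model size."""
    if parameter_count < SMALLEST_LISTED:
        raise ValueError("model is smaller than the sizes covered by the table")
    for upper_bound, minimum, recommended in RAM_GUIDANCE:
        if parameter_count <= upper_bound:
            return minimum, recommended
    raise ValueError("model is larger than the sizes covered by the table")


print(ram_guidance_gb(1_500_000_000))  # (32, 64)

# Reserve more KV cache space for longer contexts. The value 8 is an arbitrary
# illustration, not a recommendation.
os.environ["VLLM_CPU_KVCACHE_SPACE"] = "8"
```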

Chapter 11. Validated models for use with IBM Power and IBM Spyre AI accelerators

The following large language models are supported for IBM Power systems with IBM Spyre AI accelerators.

Note

IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, the inference engine in Red Hat AI Inference automatically converts weights to FP16 at startup. No additional configuration is needed.

Table 11.2. Reranker models for use with IBM Spyre AI accelerators

Model | Hugging Face model card
bge-reranker-v2-m3 | BAAI/bge-reranker-v2-m3

Important

Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.

Updating or replacing Python packages in the Red Hat AI Inference Spyre container image is not supported.

Chapter 12. Validated models for use with IBM Z and IBM Spyre AI accelerators

The following large language models are supported for IBM Z systems with IBM Spyre AI accelerators.

Note

IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, the inference engine in Red Hat AI Inference automatically converts weights to FP16 at startup. No additional configuration is needed.

Table 12.1. Decoder models for use with IBM Spyre AI accelerators

Model | Hugging Face model card
granite-3.3-8b-instruct | ibm-granite/granite-3.3-8b-instruct

Important

Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.

Updating or replacing Python packages in the Red Hat AI Inference Spyre container image is not supported.

Chapter 13. Validated models for geospatial inference with TerraTorch

The following IBM and NASA Prithvi geospatial foundation models are validated for use with AI Inference and TerraTorch.

Note

Prithvi-EO-2.0 models use the Vision Transformer (ViT) architecture and require TerraTorch as the model implementation backend. These models accept GeoTIFF imagery as input and return segmentation predictions.

Table 13.1. Prithvi geospatial models for use with TerraTorch

Model | Use case | Hugging Face model card | Validated on
Prithvi-EO-2.0-300M-TL-Sen1Floods11 | Flood detection and mapping | Prithvi-EO-2.0-300M-TL-Sen1Floods11 | RHAIIS 3.3
Prithvi-EO-2.0-300M-BurnScars | Burn scar detection | Prithvi-EO-2.0-300M-BurnScars | RHAIIS 3.3

Explore the IBM and NASA geospatial models collection on Hugging Face.

Important

Prithvi geospatial models are validated for use with NVIDIA CUDA AI accelerators only.

These models require specific vLLM server arguments to function correctly. You must include --skip-tokenizer-init, --enforce-eager, and --enable-mm-embeds when starting the inference server.

For more information, see Serving TerraTorch Models with vLLM.
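One way to avoid omitting a required argument is to assemble the server command programmatically. The following is a minimal sketch, assuming the server is launched through the `vllm serve` entry point. The function name and the model path are hypothetical placeholders, and a real deployment may need further arguments that are not shown here.

```python
import shlex

# Flags named in the text above as required for Prithvi geospatial models.
REQUIRED_FLAGS = ("--skip-tokenizer-init", "--enforce-eager", "--enable-mm-embeds")


def build_serve_command(model: str, extra_args: tuple[str, ...] = ()) -> list[str]:
    """Build a `vllm serve` argument list that always carries the required flags."""
    command = ["vllm", "serve", model, *REQUIRED_FLAGS]
    # Append caller-supplied arguments, skipping any required flag passed twice.
    command += [arg for arg in extra_args if arg not in REQUIRED_FLAGS]
    return command


# The model path is a placeholder; substitute the model location for your deployment.
command = build_serve_command("/models/Prithvi-EO-2.0-300M-TL-Sen1Floods11")
print(shlex.join(command))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.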

Legal Notice

Copyright © Red Hat.
Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license. If you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, LLC. or its subsidiaries in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.
The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.
All other trademarks are the property of their respective owners.