NVIDIA Certified Professional - Generative AI LLMs
Validates professional-level expertise in developing, optimizing, and deploying large language models using NVIDIA platforms, including LLM architecture and transformer internals, prompt engineering with chain-of-thought and few-shot techniques, data preparation and tokenization, model optimization with quantization and pruning, fine-tuning with SFT and PEFT/LoRA, evaluation methodology, GPU acceleration and optimization for distributed training, model deployment with TensorRT-LLM and Triton, production monitoring and reliability, and safety and compliance. The exam covers ten domains: Model Optimization (17%), GPU Acceleration and Optimization (14%), Prompt Engineering (13%), Fine-Tuning (13%), Data Preparation (9%), Model Deployment (9%), Evaluation (7%), Production Monitoring and Reliability (7%), LLM Architecture (6%), and Safety, Ethics, and Compliance (5%). Format: 60-70 multiple-choice questions, 120 minutes, proctored online.
Exam domains
- Model Optimization17%
Compressing and accelerating LLMs with TensorRT-LLM: INT4/INT8/FP8 weight and activation quantization (FP8 on H100 Transformer Engine), PagedAttention KV-cache, FlashAttention, kernel fusion, GQA, and speculative decoding via ModelOpt calibration.
- GPU Acceleration and Optimization14%
Scaling LLM training/inference with NeMo Megatron tensor (TP), pipeline (PP), sequence (SP), and context (CP) parallelism plus FSDP, FP8 training, gradient checkpointing, CUDA Graphs, sharded Adam states, chunked prefill, and in-flight batching.
- Fine-Tuning13%
Adapting models with NeMo Framework via full SFT, PEFT/LoRA, and alignment (RLHF, DPO, SteerLM attribute-prediction and attribute-conditioned SFT). Covers NeMo-Run recipes, AutoModel for HF checkpoints, loss masking, and LoRA-to-NIM merging.
- Prompt Engineering13%
Designing production prompts for NIM-served, OpenAI-compatible LLMs: system/user/assistant roles, few-shot exemplars, Chain-of-Thought, ReAct, structured outputs, query decomposition for RAG, asymmetric NeMo Retriever query prefixes, and function-calling.
Sources
Questions are grounded in 50 references from official and authoritative materials.
- Streamlining Data Processing for Domain Adaptive Pretraining with NVIDIA NeMo Curator | NVIDIA Technical Blog
- Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog
- NVIDIA NIM 1.4 Ready to Deploy with 2.4x Faster Inference | NVIDIA Technical Blog