Skip to Main Content
Learning & Reference30 June 202622 min readAI Prompt Architect

Prompt Engineering Glossary: 100+ AI Prompting Terms Defined (2026)

Prompt Engineering Glossary: 100+ AI Prompting Terms Defined (2026)

The most comprehensive glossary of prompt engineering, context engineering & AI prompting terminology — from zero-shot to agentic orchestration, with definitions, examples & links to in-depth guides for every term.

Published: June 2026 · 22 min read · AI Prompt Architect · ExO Intelligence Council

Why You Need This Glossary

Prompt engineering has evolved from a niche curiosity into a core discipline for anyone building, deploying, or managing AI systems. The field moves fast — terms like context engineering, loop engineering, and reasoning effort didn’t exist eighteen months ago, yet they’re now standard vocabulary in production AI teams.

This glossary consolidates over 115 terms you’ll encounter across the prompt engineering ecosystem in 2026. Every definition includes a plain-English explanation, a practical example where relevant, and links to our in-depth guides so you can go deeper on any topic. Whether you’re preparing for a prompt engineering career, building your portfolio, or learning the fundamentals, this is your single reference.

We maintain this glossary as a living document, updated by the ExO Intelligence Council at AI Prompt Architect as the field evolves. Use the Prompt Scorer alongside this glossary to see how these concepts improve your own prompts.

A

A/B Testing (Prompts)
Definition: A controlled experiment comparing two or more prompt variants to measure which produces better outputs against defined metrics such as accuracy, relevance, and tone. Example: Testing “Summarise this contract” vs “Act as a senior legal counsel. Extract the five most material clauses from this contract” and measuring which scores higher on completeness. Learn more: Prompt A/B Testing Guide
Active Prompting
Definition: A technique where the model is first used to identify uncertain or ambiguous examples, which are then annotated with human-crafted chain-of-thought reasoning and fed back as few-shot exemplars. This creates a targeted, high-quality exemplar set rather than relying on random examples. Related: Few-Shot Prompting, Chain-of-Thought
Adapter Tuning
Definition: A parameter-efficient fine-tuning method that inserts small trainable modules (adapters) between frozen layers of a pre-trained model. Only the adapter weights are updated during training, preserving the original model whilst adding task-specific capability. Related: LoRA, PEFT
Agent Orchestration
Definition: The coordination of multiple AI agents, each with distinct roles and tools, to complete complex multi-step tasks. Orchestration involves routing, handoffs, shared memory, and error recovery between agents. Example: A research agent retrieves data, a summarisation agent condenses it, and a writing agent drafts the final report. Learn more: Agent Orchestration Prompts
Agentic AI
Definition: AI systems that act autonomously to achieve goals by planning, using tools, observing results, and iterating without constant human direction. Agentic architectures combine reasoning models with function calling, loops, and memory. Example: An agent that receives “book the cheapest London-to-Tokyo flight for next Tuesday” and autonomously searches airlines, compares prices, and completes the booking. Learn more: Agentic Prompt Engineering
AGENTS.md
Definition: A configuration file placed in a project repository that defines rules, constraints, and behavioural guidelines for AI coding agents. Similar to Cursor Rules or CLAUDE.md, it shapes how an AI assistant interacts with your codebase. Learn more: AGENTS.md vs CLAUDE.md Comparison
Alignment
Definition: The degree to which an AI model’s outputs match human values, intentions, and safety requirements. Alignment is achieved through training techniques (RLHF, constitutional AI) and runtime controls (system prompts, guardrails). Related: Prompt Safety & Guardrails
Analogical Prompting
Definition: A technique where the model generates its own relevant examples or analogies before solving a problem, rather than being given pre-defined few-shot examples. This self-generated context often improves reasoning on novel tasks. Related: Generated Knowledge Prompting
API (Language Model)
Definition: An Application Programming Interface that allows developers to programmatically send prompts to and receive completions from a language model. APIs define endpoints, authentication, rate limits, and response formats. Learn more: API Prompt Management
Attention Mechanism
Definition: The core component of transformer models that allows the AI to weigh the importance of different parts of the input when generating each output token. Self-attention is why the order and structure of your prompt matters. Example: Placing critical instructions at the beginning or end of a prompt exploits the attention mechanism’s tendency to weight those positions more heavily.
Audio Prompting
Definition: Providing audio inputs (speech, music, environmental sounds) directly to a multimodal model as part of the prompt, enabling tasks like transcription, translation, audio analysis, and voice-conditioned generation. Learn more: Audio Prompting Guide
Auto-Prompting (APE)
Definition: Using an AI model to automatically generate, evaluate, and optimise prompts. Also called Automatic Prompt Engineering. Frameworks like DSPy compile prompts programmatically rather than relying on manual crafting. Learn more: Meta-Prompting Techniques
AWQ (Activation-Aware Weight Quantisation)
Definition: A quantisation method that identifies and preserves the most important model weights based on activation patterns, allowing aggressive compression with minimal quality loss. AWQ typically outperforms round-to-nearest quantisation at equivalent bit widths. Related: GPTQ, GGUF

B

Batch Prompting
Definition: Sending multiple tasks or questions in a single prompt to reduce API calls and improve throughput. The model processes them sequentially within one context window. Example: “Translate each of the following five sentences to French: 1. … 2. …” Related: Reduce LLM API Costs
Benchmark
Definition: A standardised test set used to measure and compare model performance across tasks like reasoning (MMLU), coding (HumanEval), or instruction-following (IFEval). Benchmarks help prompt engineers choose the right model for a given task. Related: Evaluation
Bias (in AI)
Definition: Systematic skew in a model’s outputs caused by imbalances in training data, annotation guidelines, or reward modelling. Bias can manifest as stereotyping, underrepresentation, or factual distortion. Prompt engineers mitigate bias through careful persona definitions, constraint setting, and red teaming.
BYOK (Bring Your Own Key)
Definition: An architecture where users supply their own API keys for AI model providers, ensuring zero markup and full cost transparency. AI Prompt Architect uses BYOK with zero-knowledge encryption so your keys never touch our servers. Learn more: BYOK Explained

C

Chain-of-Thought (CoT) Prompting
Definition: A technique that instructs the model to reason step-by-step before producing a final answer. CoT dramatically improves accuracy on maths, logic, and multi-step reasoning tasks by making the model’s reasoning transparent and verifiable. Example: Adding “Let’s think step by step” to a maths problem can increase accuracy from 17% to 78% on GSM8K. Learn more: Chain-of-Thought Advanced Guide · CoT vs STCO Comparison
Chat Completion
Definition: The API format used by most modern LLMs where conversations are structured as arrays of message objects with roles (system, user, assistant). Each call sends the full conversation history and returns the model’s next response.
Claude Code
Definition: Anthropic’s terminal-based AI coding agent that operates directly in your development environment. Unlike IDE-integrated tools, Claude Code uses the command line for autonomous code generation, debugging, and refactoring. Learn more: Claude Code Prompting Guide
Completion
Definition: The text output generated by a language model in response to a prompt. In older API formats, completions were single text continuations; modern APIs use the chat completion format with structured messages.
Constitutional AI
Definition: An alignment approach developed by Anthropic where the model is trained to critique and revise its own outputs against a set of principles (a “constitution”). This reduces reliance on human feedback for safety tuning. Related: RLHF, DPO
Context Engineering
Definition: The discipline of designing the complete information environment — including system prompts, retrieved documents, tool definitions, and memory — that surrounds an AI model at inference time. Context engineering evolved from prompt engineering as context windows expanded and production systems required sophisticated information management rather than just clever phrasing. Learn more: Context Engineering vs Prompt Engineering
Context Length
Definition: The maximum number of tokens a model can process in a single call, encompassing both input and output. Context length determines how much information you can feed the model at once. In 2026, frontier models offer 128K–2M token contexts. Related: Context Window
Context Window
Definition: The total working memory available to a model during a single interaction, measured in tokens. It includes your system prompt, conversation history, retrieved documents, and the model’s response. Managing the context window efficiently is the core challenge of context engineering. Learn more: Structuring Prompts for Context Windows
Contrastive Prompting
Definition: A technique that improves output quality by providing both positive and negative examples — showing the model what to do and what to avoid. This boundary-setting approach is particularly effective for tone, style, and format tasks. Example: “Write a product description that sounds confident and specific (like Example A), NOT vague and generic (like Example B).”

D

Decoder-Only Model
Definition: A transformer architecture that generates text autoregressively — predicting one token at a time based on all preceding tokens. GPT-4, Claude, Gemini, and most modern LLMs use decoder-only architectures. Related: Encoder-Decoder
Dense Model
Definition: A neural network where every parameter is activated for every input, as opposed to a Mixture of Experts model where only a subset of parameters are active. Dense models are simpler but more computationally expensive at equivalent parameter counts.
Directional Stimulus Prompting
Definition: A technique that includes a small “hint” or directional cue in the prompt to steer the model towards a desired output without fully specifying the answer. The stimulus acts as a gentle guide rather than a rigid constraint. Example: “Summarise this article, particularly focusing on the financial implications” uses “financial implications” as the directional stimulus.
DPO (Direct Preference Optimisation)
Definition: An alignment training technique that directly optimises the model to prefer human-chosen outputs over rejected alternatives, without requiring a separate reward model. DPO simplifies the RLHF pipeline whilst achieving comparable alignment quality.
DSPy
Definition: A framework by Stanford NLP that replaces manual prompt engineering with programmatic “prompt compilation.” Instead of hand-crafting prompts, developers define input/output signatures and let DSPy automatically optimise the prompt through iterative evaluation. Related: Auto-Prompting

E

Embedding
Definition: A numerical vector representation of text where semantic meaning is encoded as coordinates in high-dimensional space. Similar concepts cluster together, enabling semantic search, classification, and RAG retrieval. Learn more: RAG Prompt Engineering
Emotional Prompting
Definition: Adding emotional context or urgency to prompts to influence output quality. Research shows that phrases like “This is very important to my career” can measurably improve model performance on certain tasks by engaging different attention patterns.
Encoder-Decoder Model
Definition: A transformer architecture with separate encoding (understanding input) and decoding (generating output) stages. Used primarily for translation and summarisation tasks. The original T5 and BART models use this architecture. Related: Decoder-Only
Endpoint (API)
Definition: A specific URL path in a language model API that accepts requests and returns responses. Different endpoints serve different functions — e.g., `/v1/chat/completions` for chat, `/v1/embeddings` for vector generation.
Evaluation (Prompt)
Definition: The systematic assessment of prompt quality using defined metrics such as accuracy, relevance, consistency, safety, and cost efficiency. Evaluation frameworks range from simple human scoring to automated LLM-as-judge approaches. Learn more: Prompt Evaluation Metrics
Explainability
Definition: The ability to understand and articulate why a model produced a specific output. Prompt engineers improve explainability by using chain-of-thought reasoning, asking models to cite sources, and structuring outputs with clear rationale sections.

F

Fairness
Definition: The principle that AI outputs should not systematically disadvantage or favour particular demographic groups. Prompt engineers address fairness through balanced persona definitions, diverse test cases, and explicit anti-bias constraints in system prompts. Related: Bias, Safety
Feedback Loop
Definition: A cyclical process where model outputs are evaluated (by humans or automated metrics) and the results are used to improve future prompts, fine-tuning data, or reward models. Feedback loops are essential for continuous prompt quality improvement.
Few-Shot Prompting
Definition: Providing two or more input-output examples within the prompt to demonstrate the desired pattern, format, or reasoning style. Few-shot prompting is one of the most reliable techniques for steering model behaviour without fine-tuning. Example: Showing the model three customer enquiry/response pairs before asking it to handle a new enquiry in the same format. Learn more: Few-Shot Prompting Complete Guide
Fine-Tuning
Definition: The process of further training a pre-trained model on a smaller, task-specific dataset to adapt it for a particular domain, style, or function. Fine-tuning bakes knowledge into the model’s weights, unlike prompting which provides knowledge at runtime. Learn more: Fine-Tuning vs Prompt Engineering
Function Calling
Definition: A model capability that generates structured JSON arguments for predefined functions, enabling AI to interact with external APIs, databases, and tools. The model decides when to call a function and with what parameters. Learn more: AI Agent Architectures & Tool Use

G

Generated Knowledge Prompting
Definition: A two-step technique where the model first generates relevant background knowledge about a topic, then uses that self-generated knowledge as context to answer the actual question. This improves accuracy on knowledge-intensive tasks without external retrieval.
GGUF
Definition: A file format for storing quantised language models, designed by the llama.cpp community. GGUF files enable running large models on consumer hardware by reducing precision from 32-bit to 4-bit or 8-bit representations. Related: Quantisation, GPTQ
GPTQ
Definition: A post-training quantisation algorithm that compresses model weights using one-shot calibration data. GPTQ achieves efficient 4-bit quantisation with minimal accuracy loss and is widely used for GPU-based inference. Related: AWQ, GGUF
Grounding
Definition: Connecting a model’s responses to verified, factual information sources to reduce hallucination. Grounding techniques include RAG, citation requirements, and constraining outputs to provided documents only. Learn more: Reduce LLM Hallucinations
Guardrails
Definition: Programmatic or prompt-level constraints that prevent AI models from producing harmful, off-topic, or policy-violating outputs. Guardrails can be implemented in system prompts, middleware layers, or dedicated safety classifiers. Learn more: Prompt Safety & Guardrails

H

Hallucination
Definition: When a model generates confident-sounding information that is factually incorrect, fabricated, or unsupported by the provided data. Hallucinations are a fundamental challenge in LLM deployment, mitigated through grounding, RAG, and structured prompting frameworks like STCO. Learn more: Stop LLM Hallucination Guide
Hard Prompt
Definition: A discrete, human-readable text prompt as opposed to a soft prompt which operates in the model’s embedding space. Standard prompt engineering works with hard prompts — the actual words and instructions you type.
HITL (Human-in-the-Loop)
Definition: A workflow design where human oversight is integrated at critical decision points in an AI pipeline. HITL ensures quality control, catches errors, and maintains accountability in sensitive or high-stakes applications. Related: Agentic AI, Safety

I

Inference
Definition: The process of running a trained model to generate outputs from inputs. In prompt engineering, inference refers to each API call where your prompt is processed and a completion is returned. Inference costs are measured in tokens processed per second and cost per token.
Instruction Tuning
Definition: A fine-tuning approach where a model is trained on large datasets of instruction-response pairs, teaching it to follow natural language commands reliably. Instruction-tuned models (like ChatGPT, Claude) are dramatically better at following prompts than base models.
Interpretability
Definition: The degree to which a human can understand the internal workings and decision-making process of an AI model. While explainability focuses on outputs, interpretability examines the model’s internal representations and attention patterns.

J

Jailbreaking
Definition: Attempts to bypass a model’s safety guardrails and content policies through adversarial prompting techniques. Jailbreaking exploits weaknesses in instruction following, role-playing, and context management. Understanding jailbreaks is essential for building robust defences. Related: Prompt Injection Defence Guide
JSON Mode
Definition: A model configuration that constrains output to valid JSON, ensuring machine-parseable structured responses. JSON mode eliminates the need for post-processing extraction and reduces format errors in API integrations. Learn more: JSON Output Prompt Engineering

K

Knowledge Distillation
Definition: The process of training a smaller “student” model to replicate the behaviour of a larger “teacher” model. Distillation transfers knowledge into a more efficient model that can run faster and cheaper whilst retaining most of the teacher’s capability. Related: Model Distillation

L

LangChain
Definition: An open-source framework for building applications powered by language models. LangChain provides abstractions for prompt templates, chains, agents, memory, and retrieval, accelerating LLM application development. Related: LlamaIndex, LLM Orchestration
Latency
Definition: The time delay between sending a prompt and receiving the first token of the response (time-to-first-token, or TTFT). Latency is a critical metric for user-facing applications and is influenced by model size, prompt length, and infrastructure. Related: Throughput
LlamaIndex
Definition: A data framework for connecting custom data sources to large language models, specialising in RAG pipelines. LlamaIndex handles data ingestion, indexing, and retrieval, making it simpler to build knowledge-grounded applications.
LLM Orchestration
Definition: The practice of managing and coordinating multiple LLM calls, tools, and data sources within a single application workflow. Orchestration layers handle routing, error recovery, retries, and result aggregation. Learn more: LLM Architecture Templates
LoRA (Low-Rank Adaptation)
Definition: A PEFT technique that freezes the original model weights and injects small trainable matrices into each layer. LoRA dramatically reduces the compute and storage needed for fine-tuning — often requiring less than 1% of the parameters to be trained. Related: QLoRA
Loop Engineering
Definition: The design of iterative feedback cycles within AI agent architectures, where outputs are evaluated, revised, and resubmitted to achieve higher quality through multiple passes. Loop engineering complements context engineering and is essential for production agentic systems.

M

MCP (Model Context Protocol)
Definition: An open standard (originally from Anthropic, now under the Linux Foundation) that provides a universal interface for connecting AI models to external data sources, tools, and services — often described as “USB-C for AI.” MCP standardises how agents access databases, file systems, APIs, and enterprise tools. Learn more: MCP Complete Guide
Meta-Prompting
Definition: Using a model to generate, evaluate, or improve prompts for itself or other models. Meta-prompting creates a recursive loop where AI assists in its own instruction design, often producing prompts that outperform human-crafted alternatives. Learn more: Meta-Prompting Techniques
Mixture of Experts (MoE)
Definition: A model architecture that divides the network into specialised “expert” sub-networks, routing each input to only the most relevant experts. MoE models can have trillions of total parameters whilst only activating a fraction per token, balancing capability with efficiency. Related: Sparse Model
Model Distillation
Definition: Training a smaller, faster model to replicate a larger model’s behaviour on specific tasks. Distillation is a key cost-optimisation strategy: use a frontier model for development and evaluation, then distil into a cheaper model for production deployment. Related: Knowledge Distillation
Multimodal Prompting
Definition: Crafting prompts that combine multiple input types — text, images, audio, video, or documents — to leverage a model’s ability to reason across modalities. Multimodal prompts enable tasks impossible with text alone, such as visual question answering. Learn more: Multimodal Prompting Guide

N

Negative Prompting
Definition: Explicitly telling the model what to avoid, exclude, or not do. In text generation, negative constraints improve focus; in image generation (Midjourney, DALL-E, Flux), negative prompts remove unwanted elements like “blurry, watermark, extra fingers.” Learn more: AI Image Prompts Guide

O

Output Indicator
Definition: The component of a prompt that specifies the desired format, structure, or type of the model’s response. Output indicators range from simple (“respond in bullet points”) to complex (full JSON schemas). In the STCO framework, the Output component defines the deliverable format explicitly.

P

P-Tuning
Definition: A PEFT method that prepends trainable continuous vectors (virtual tokens) to the input, learning task-specific representations in the embedding space. P-Tuning bridges the gap between hard prompts and soft prompts. Related: Prefix Tuning
PEFT (Parameter-Efficient Fine-Tuning)
Definition: A family of techniques that adapt large models to specific tasks by training only a small subset of parameters, rather than the full model. PEFT methods include LoRA, adapter tuning, prefix tuning, and P-tuning.
Persona Prompting
Definition: Assigning the model a specific role, profession, or character to shape its tone, expertise, vocabulary, and perspective. Persona prompting is one of the most accessible and effective prompting techniques. Example: “You are a senior DevOps engineer with 15 years of experience in Kubernetes.” Learn more: Role Prompting Guide
PPO (Proximal Policy Optimisation)
Definition: A reinforcement learning algorithm widely used in RLHF pipelines to optimise model behaviour against a reward model. PPO balances exploration and exploitation, preventing the model from deviating too drastically during alignment training. Related: DPO
Prefix Tuning
Definition: A PEFT technique that prepends trainable continuous vectors to each layer of the model, enabling task-specific adaptation without modifying the base model weights. Related: P-Tuning, Soft Prompt
Production Prompting
Definition: The practice of designing prompts specifically for deployment in live applications, where reliability, consistency, cost efficiency, and security matter as much as output quality. Production prompts require version control, evaluation, and monitoring. Learn more: Production-Ready Prompt Engineering
Prompt
Definition: The input text, instruction, or query provided to an AI model to guide its response. A prompt can range from a single question to a multi-thousand-token document containing system instructions, examples, context, and output specifications. Learn more: How to Write Effective AI Prompts
Prompt Caching
Definition: Storing and reusing previously computed prompt prefixes to reduce latency and cost on repeated or similar API calls. Both OpenAI and Anthropic offer prompt caching, which can reduce costs by up to 90% for prompts with shared system-level context. Learn more: Prompt Caching Optimisation
Prompt Chaining
Definition: Using the output of one prompt as the input for a subsequent prompt, creating a sequential pipeline of LLM operations. Chaining decomposes complex tasks into manageable steps, improving reliability and debuggability. Learn more: Prompt Chaining Advanced Guide
Prompt Compression
Definition: Techniques for reducing prompt length whilst preserving essential meaning, lowering token costs and enabling more content within fixed context windows. Methods include summarisation, deduplication, and algorithmic compression (e.g., LLMLingua). Learn more: Prompt Compression Techniques
Prompt Debugging
Definition: The systematic process of identifying and fixing issues in prompts that cause incorrect, inconsistent, or low-quality outputs. Debugging involves isolating failure modes, testing hypotheses, and iteratively refining prompt components using the STCO diagnostic framework. Learn more: Prompt Debugging Guide
Prompt Engineering
Definition: The practice of designing, structuring, and optimising inputs to AI models to produce accurate, consistent, and useful outputs. In 2026, prompt engineering encompasses structured frameworks like STCO, context engineering, and agentic orchestration. Learn more: What Is Prompt Engineering? · Best Practices 2026
Prompt Engineering Best Practices
Definition: A codified set of guidelines for writing high-quality prompts, including: use clear instructions, provide context, specify output format, include examples, set constraints, and iterate based on evaluation. The STCO framework encapsulates these practices into a repeatable structure. Learn more: Prompt Engineering Best Practices 2026
Prompt Flow
Definition: A visual or programmatic tool for building, testing, and deploying LLM workflows. Prompt flow platforms (e.g., Azure Prompt Flow, LangFlow) provide drag-and-drop interfaces for connecting prompts, tools, and evaluation nodes. Learn more: AI Prompt IDE & LLMOps Tools
Prompt Injection
Definition: A security attack where malicious instructions are embedded in user input to override a model’s system prompt and safety constraints. Prompt injection is the “SQL injection of AI” and is addressed by the SHIELD Framework. Learn more: Prompt Injection Defence Guide
Prompt Leaking
Definition: A security vulnerability where an attacker tricks a model into revealing its system prompt or hidden instructions. Prompt leaking exposes proprietary logic, business rules, and potentially sensitive data embedded in system prompts. Learn more: System Prompt Security Guide
Prompt Library
Definition: An organised collection of reusable, tested prompts categorised by use case, domain, or function. A well-maintained prompt library accelerates development, ensures consistency, and captures institutional knowledge. Learn more: How to Build an AI Prompt Library
Prompt Observability
Definition: The practice of monitoring, logging, and analysing prompt performance in production systems. Observability tools track metrics like latency, token usage, error rates, and output quality over time. Learn more: AI Prompt IDE & LLMOps Tools
Prompt Routing
Definition: Automatically directing prompts to different models, configurations, or processing pipelines based on task complexity, cost constraints, or domain requirements. A router might send simple queries to a fast, cheap model and complex reasoning tasks to a frontier model. Learn more: Prompt Routing Guide
Prompt Template
Definition: A reusable prompt structure with variable placeholders that can be filled with different values at runtime. Templates enforce consistency and are the building blocks of prompt libraries. Learn more: Prompt Template Design Patterns
Prompt Versioning
Definition: Tracking changes to prompts over time using version control principles, enabling rollback, comparison, audit trails, and collaborative development. Essential for production prompting where prompt changes can affect application behaviour. Learn more: Prompt Version Control Guide

Q

QLoRA
Definition: A memory-efficient fine-tuning method that combines quantisation (reducing model precision to 4-bit) with LoRA adapters. QLoRA enables fine-tuning of large models on consumer GPUs by drastically reducing memory requirements. Related: PEFT
Quantisation
Definition: Reducing the numerical precision of model weights (e.g., from 32-bit to 4-bit) to decrease memory usage and improve inference speed. Quantisation enables running large models on consumer hardware with minimal quality degradation. Related: GGUF, GPTQ, AWQ

R

RAG (Retrieval-Augmented Generation)
Definition: An architecture that retrieves relevant documents from external knowledge sources and injects them into the prompt context before generation. RAG grounds the model’s responses in specific, current data rather than relying solely on training knowledge. Learn more: RAG Prompt Engineering · RAG vs Long Context Windows
Rate Limiting
Definition: API-level restrictions on the number of requests or tokens a user can consume within a time period. Rate limits protect providers from abuse and require prompt engineers to optimise for efficiency through batching, caching, and compression.
ReAct (Reasoning + Acting)
Definition: A prompting framework that interleaves reasoning traces (Thought) with tool-calling actions (Action) and their results (Observation) in a TAO loop. ReAct enables models to solve complex tasks by thinking, acting, and learning from results iteratively. Learn more: ReAct Prompting Framework
Reasoning Effort
Definition: A model parameter (introduced by OpenAI for o-series models) that controls how much computational “thinking time” the model spends before responding. Higher reasoning effort improves accuracy on complex tasks but increases latency and cost.
Red Teaming
Definition: The practice of systematically testing AI systems by attempting to provoke harmful, incorrect, or policy-violating outputs through adversarial prompts. Red teaming identifies vulnerabilities in guardrails, alignment, and system prompt defences before deployment. Related: SHIELD Framework
Reflection
Definition: A technique where the model reviews and critiques its own previous output, then generates an improved version. Reflection enables self-correction without human feedback and is a key component of agentic architectures.
Reward Model
Definition: A model trained to score outputs based on human preferences, used in RLHF pipelines to guide the policy model towards more desirable responses. The reward model acts as an automated proxy for human judgement. Related: PPO, DPO
RLHF (Reinforcement Learning from Human Feedback)
Definition: A training paradigm where human evaluators rank model outputs, and these rankings train a reward model that guides further model optimisation via reinforcement learning (PPO). RLHF is the primary technique used to align models like ChatGPT and Claude with human preferences.
Role Prompting
Definition: A specific form of persona prompting that assigns the model a professional role to shape its expertise, vocabulary, and response style. Role prompting is a foundational element of the STCO framework’s Situation component. Learn more: Role Prompting Guide

S

Safety
Definition: The broad discipline of ensuring AI systems operate without causing harm, encompassing alignment, guardrails, red teaming, content filtering, and output monitoring. The SHIELD Framework provides a structured approach to safety auditing. Learn more: Prompt Safety & Guardrails
Self-Ask
Definition: A prompting technique where the model explicitly asks and answers sub-questions before tackling the main question. Self-ask decomposes complex queries into manageable steps, improving accuracy on multi-hop reasoning tasks. Related: Chain-of-Thought
Self-Consistency
Definition: A decoding strategy that generates multiple independent reasoning paths for the same prompt and selects the most common answer. Self-consistency improves reliability by marginalising over diverse reasoning chains rather than relying on a single greedy output.
Semantic Kernel
Definition: Microsoft’s open-source SDK for integrating LLMs into applications, providing abstractions for plugins, planners, and memory. Semantic Kernel supports C#, Python, and Java, and is designed for enterprise AI orchestration. Related: LangChain, LLM Orchestration
Definition: Information retrieval that matches queries based on meaning rather than exact keyword matches, powered by embeddings and vector databases. Semantic search is the retrieval engine behind most RAG implementations.
SHIELD Framework
Definition: A proprietary security-first prompt auditing framework developed by AI Prompt Architect. SHIELD (Safety, Hallucination-prevention, Integrity, Ethics, Legal compliance, Data privacy) provides a systematic six-point checklist for evaluating whether production prompts meet enterprise compliance and safety requirements. Every prompt should be SHIELD-audited before deployment. Learn more: Prompt Safety & Guardrails · AI Prompt Security & Compliance
Soft Prompt
Definition: Continuous, learnable vectors prepended to a model’s input in embedding space, rather than discrete text tokens. Soft prompts are optimised through gradient descent and are not human-readable. Related: Hard Prompt, Prefix Tuning
Sparse Model
Definition: A model architecture where only a subset of parameters are activated for each input, as seen in Mixture of Experts. Sparse models achieve high capability with lower inference costs by routing inputs to specialised sub-networks. Related: Dense Model
STCO Framework
Definition: Developed by the AI Prompt Architect team, STCO (Situation, Task, Constraints, Output) is a four-component prompt structuring framework that eliminates ambiguity by forcing explicit definition of the persona/context (Situation), objective (Task), boundaries and rules (Constraints), and expected deliverable format (Output). STCO is designed for repeatable, production-grade prompting and consistently outperforms unstructured approaches. Learn more: STCO Framework Guide · CoT vs STCO Comparison · Framework Comparison 2026
Step-Back Prompting
Definition: A technique that asks the model to consider a broader, more abstract version of the question before answering the specific query. Stepping back activates higher-level reasoning and domain knowledge. Example: Before answering “What happens to the boiling point of water at 5,000m altitude?”, the model first considers “What are the general principles of how atmospheric pressure affects phase transitions?”
Streaming
Definition: Receiving model output tokens incrementally as they are generated, rather than waiting for the complete response. Streaming reduces perceived latency and enables real-time display of responses in chat interfaces.
Structured Output
Definition: Model responses that conform to a predefined schema (JSON, XML, tables, YAML) rather than free-form text. Structured outputs enable reliable machine parsing and integration with downstream systems. Learn more: Structured Output Prompt Engineering
System Message
Definition: The first message in a chat completion conversation that sets the model’s behaviour, persona, constraints, and operational context. The system message is processed before user messages and has the strongest influence on model behaviour. Related: System Prompt
System Prompt
Definition: The foundational instruction set that defines an AI application’s persona, capabilities, constraints, and tone. System prompts are typically hidden from end users and form the backbone of any production AI deployment. Learn more: System Prompt Guide · Production System Prompts

T

Temperature
Definition: A model parameter (0.0–2.0) that controls output randomness. Low temperature (0.0–0.3) produces deterministic, focused responses ideal for factual tasks; high temperature (0.7–1.5) increases creativity and variation, suitable for brainstorming and creative writing. Related: Top-p, Top-k
Throughput
Definition: The rate at which a model processes tokens, typically measured in tokens per second. Throughput determines how many requests a system can handle concurrently and is a key metric for production capacity planning. Related: Latency
Token
Definition: The fundamental unit of text that a language model processes. A token can be a word, sub-word, or character depending on the tokeniser. On average, one token ≈ 0.75 English words, or roughly four characters. Token counts determine context usage and API costs. Related: Tokenisation
Tokenisation
Definition: The process of splitting text into tokens using a model-specific vocabulary. Different models use different tokenisers (BPE, SentencePiece, tiktoken), so the same text can produce different token counts across models.
Tool Use
Definition: The ability of a model to invoke external tools, APIs, or functions during generation. Tool use extends model capabilities beyond text generation to include search, calculation, code execution, and database queries. Learn more: AI Agent Architectures & Tool Use
Top-k Sampling
Definition: A decoding strategy that restricts the model to choose from only the top k most probable next tokens. Lower k values produce more focused text; higher values increase diversity. Top-k is often used alongside temperature and top-p.
Top-p (Nucleus Sampling)
Definition: A decoding strategy that dynamically selects from the smallest set of tokens whose cumulative probability exceeds a threshold p. Top-p = 0.9 means the model considers the fewest tokens needed to cover 90% of the probability mass. Related: Temperature, Top-k
Transfer Learning
Definition: The foundational machine learning principle behind modern LLMs: training a model on a large general dataset, then adapting it (via fine-tuning or prompting) for specific tasks. All prompt engineering is a form of transfer learning — leveraging pre-trained knowledge through natural language instructions.
Transformer
Definition: The neural network architecture introduced in the 2017 “Attention Is All You Need” paper that underpins virtually all modern language models. Transformers process input in parallel using attention mechanisms, enabling efficient training on massive datasets.
Tree-of-Thought (ToT)
Definition: An advanced reasoning technique where the model explores multiple reasoning branches simultaneously, evaluates each path, and selects the most promising one. ToT extends chain-of-thought from a linear chain to a branching tree, excelling at complex planning and puzzle-solving tasks.

U

User Message
Definition: The message in a chat completion conversation that represents the human’s input or request. User messages are processed after the system message and can include text, images, and other content depending on the model’s capabilities.

V

Vector Database
Definition: A specialised database optimised for storing and querying high-dimensional embedding vectors. Vector databases (Pinecone, Weaviate, Qdrant, ChromaDB) power the retrieval component of RAG systems by enabling fast semantic search over millions of documents.
Vibe Coding
Definition: A software development approach coined by Andrej Karpathy in February 2025. Vibe coding involves describing the desired application behaviour in natural language and letting an AI coding assistant (Cursor, Claude Code, GitHub Copilot) generate the implementation. It prioritises speed and intent over manual code authorship, but carries risks around security, maintainability, and technical debt. Learn more: Vibe Coding Guide
Vision Prompting
Definition: Crafting prompts that include images or visual content alongside text instructions, leveraging multimodal model capabilities. Vision prompting enables tasks like image analysis, chart interpretation, UI testing, and visual question answering. Learn more: Vision Prompting Guide

W

Windowed Attention
Definition: An attention mechanism variant where each token only attends to a fixed-size local window of neighbouring tokens rather than the entire sequence. Windowed attention (used in architectures like Mistral) reduces computational cost from quadratic to linear, enabling longer context processing.

Z

Zero-Shot Prompting
Definition: Asking the model to perform a task without providing any examples, relying entirely on its pre-trained knowledge and instruction-following capability. Zero-shot is the simplest prompting approach and works well on modern frontier models for straightforward tasks. Example: “Classify the following customer review as positive, neutral, or negative: [review text]” Learn more: Zero-Shot Prompting Guide

Frequently Asked Questions

What is prompt engineering?

Prompt engineering is the practice of designing, structuring, and optimising inputs to AI models to produce accurate, consistent, and useful outputs. In 2026, it encompasses structured frameworks like STCO, context engineering, and agentic orchestration. It’s evolved from a casual skill into a core engineering discipline with its own tools, metrics, and career paths.

What is the STCO framework?

STCO (Situation, Task, Constraints, Output) is a four-component prompt structuring framework developed by the AI Prompt Architect team. It reduces ambiguity by forcing explicit definition of the persona and context (Situation), the objective (Task), boundaries and rules (Constraints), and the expected deliverable format (Output). STCO consistently outperforms unstructured prompting in controlled evaluations.

What is the difference between zero-shot and few-shot prompting?

Zero-shot prompting asks the model to perform a task without any examples, relying entirely on its training. Few-shot prompting includes 2–5 examples in the prompt to demonstrate the desired pattern, format, or style. Few-shot generally produces more consistent outputs but uses more tokens.

What is context engineering?

Context engineering is the discipline of designing the complete information environment — including system prompts, retrieved documents, tool definitions, and memory — that surrounds an AI model. It evolved from prompt engineering as context windows expanded and production AI systems became more complex. Think of it as managing the AI’s “RAM” rather than just its “instructions.”

How many prompt engineering terms are there?

This glossary defines over 115 terms spanning prompting techniques, model architecture, training methods, security, evaluation, and operations. For day-to-day use, understanding 20–30 core terms is sufficient. Professional prompt engineers typically master 60–80+ terms to work effectively across the full stack.

What is the SHIELD framework?

SHIELD (Safety, Hallucination-prevention, Integrity, Ethics, Legal compliance, Data privacy) is a security-first prompt auditing framework developed by AI Prompt Architect. It provides a systematic six-point checklist for evaluating whether production prompts meet enterprise compliance and safety requirements before deployment.

What is vibe coding?

Vibe coding is a software development approach coined by Andrej Karpathy in 2025. It involves describing the desired application behaviour in natural language and letting an AI coding assistant generate the implementation, prioritising speed and intent over manual code authorship. While powerful for prototyping, it requires careful review for production use.

Get the Prompt Engineering Playbook

Join 5,000+ developers receiving our weekly deep-dives on structured outputs, RAG optimisation, and advanced AI agent prompting.

glossaryprompt engineeringterminologyAI terms2026reference

Expert in prompt architecture and large language model optimization.

Ready to build better prompts?

Start using AI Prompt Architect for free today.

Get Started Free

Outlines' grammar-guided generation produces valid JSON on every call with 0% retry rate, versus 15% retry rates with un.Outlines, '.txt: Structured Generation with Gramma…