Glossary of Terms

ASR (Automatic Speech Recognition): The technology that converts spoken audio into written text for processing by the AI.
Agentic AI: AI systems capable of planning and executing multi-step tasks autonomously (e.g., searching a database, updating a record, and confirming a result).
AHT (Average Handle Time): The average duration of a customer interaction, including talk time, any hold time, and after-call work.
Bot Failure Rate / Escalation Rate: The percentage of AI-handled interactions that the system was unable to resolve and that required transfer to a human agent. A key measure of the AI's capability ceiling; also called the uncontained rate or transfer rate on some platforms.
CCaaS (Contact Center as a Service): A cloud-based customer experience solution that provides contact center capabilities.
Containment Rate: The percentage of calls fully resolved by an automated system without requiring transfer to a human agent.
Conversation Design: The discipline of crafting natural, effective dialogue flows for AI agents.
DTMF (Dual-Tone Multi-Frequency): The touch-tone keypad signaling system used in traditional telephony, where pressing a button generates a specific pair of audio tones (one low-frequency, one high-frequency). IVR systems historically relied on DTMF input; modern voice AI replaces DTMF menus with natural language understanding.
FCR (First Contact Resolution): The percentage of customer issues fully resolved during the initial interaction — without the customer needing to call back or follow up. A core CX KPI that voice AI deployments are increasingly measured against.
Full-Duplex: Conversational capability where the AI can listen and speak simultaneously, allowing for natural interruptions and turn-taking.
IVR (Interactive Voice Response): Legacy automated telephony systems that typically rely on keypad (DTMF) or simple keyword inputs.
KYC (Know Your Customer): The mandatory process of verifying the identity of a client, common in financial services.
Latency: The delay between a user finishing their sentence and the AI beginning its response.
LLM (Large Language Model): AI models trained on vast amounts of text that provide the "reasoning" and natural language understanding for modern voice AI.
Multimodal: AI interfaces that combine multiple input/output types, such as voice, text, and visual markers on a screen.
NPS (Net Promoter Score): A customer loyalty metric derived from asking "How likely are you to recommend us?" on a 0–10 scale. Customers scoring 9–10 are Promoters, 7–8 are Passives, and 0–6 are Detractors. Net score = % Promoters − % Detractors.
PII (Personally Identifiable Information): Data that can be used to identify a specific individual (e.g., Social Security number, name, date of birth).
RAG (Retrieval-Augmented Generation): An AI architecture that grounds LLM responses by retrieving relevant documents, records, or data at query time before generating an answer. Enables voice AI to answer questions based on up-to-date or proprietary information rather than relying solely on training data.
Semantic Accuracy: The degree to which an AI system correctly understands the meaning and intent of a customer utterance, as distinct from transcription accuracy. A system can transcribe every word correctly and still misinterpret what the customer wants; semantic accuracy measures whether the intent was understood.
STT (Speech-to-Text): The conversion of spoken audio into written text; often used interchangeably with ASR (Automatic Speech Recognition). Some vendors distinguish the terms by pipeline stage; in practice the functions are equivalent.
TTS (Text-to-Speech): The technology that converts the AI's written response into natural-sounding spoken audio.
Warm Transfer: A handoff from an AI system or a human agent to a human agent in which the receiving agent is briefed on the customer's context before taking the call, as opposed to a cold transfer, where the customer must re-explain their situation.
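
Several of the metrics defined above (NPS, Containment Rate, Escalation Rate, AHT) are simple ratios, and a short worked example may make the arithmetic concrete. The Python sketch below is illustrative only: the Interaction record, its field names, and the sample numbers are assumptions made for this example and do not come from any particular platform. It also assumes every interaction in the list was handled by the AI first, and it treats hold time as an optional part of handle time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Interaction:
    """Hypothetical record of one AI-handled call (field names are assumptions)."""
    talk_seconds: float
    after_call_seconds: float
    transferred_to_human: bool
    hold_seconds: float = 0.0  # assumed optional; defaults to no hold time


def net_promoter_score(scores: Sequence[int]) -> float:
    """NPS = % Promoters (9-10) minus % Detractors (0-6), on a 0-10 survey."""
    if not scores:
        raise ValueError("at least one survey score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def containment_rate(interactions: Sequence[Interaction]) -> float:
    """Percentage of interactions resolved without transfer to a human agent."""
    if not interactions:
        raise ValueError("at least one interaction is required")
    contained = sum(1 for i in interactions if not i.transferred_to_human)
    return 100.0 * contained / len(interactions)


def escalation_rate(interactions: Sequence[Interaction]) -> float:
    """Percentage of interactions transferred; the complement of containment."""
    return 100.0 - containment_rate(interactions)


def average_handle_time(interactions: Sequence[Interaction]) -> float:
    """Mean of talk + hold + after-call work per interaction, in seconds."""
    if not interactions:
        raise ValueError("at least one interaction is required")
    total = sum(
        i.talk_seconds + i.hold_seconds + i.after_call_seconds
        for i in interactions
    )
    return total / len(interactions)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    survey = [10, 9, 8, 7, 6, 3, 10, 9, 5, 10]
    calls = [
        Interaction(talk_seconds=120, after_call_seconds=30, transferred_to_human=False),
        Interaction(talk_seconds=300, after_call_seconds=60, transferred_to_human=True),
        Interaction(talk_seconds=90, after_call_seconds=30, transferred_to_human=False),
        Interaction(talk_seconds=180, after_call_seconds=30, transferred_to_human=False),
    ]
    print(net_promoter_score(survey))   # 5 promoters, 3 detractors of 10 -> 20.0
    print(containment_rate(calls))      # 3 of 4 contained -> 75.0
    print(escalation_rate(calls))       # 25.0
    print(average_handle_time(calls))   # 840 seconds / 4 calls -> 210.0
```

Under these assumptions, Containment Rate and Escalation Rate always sum to 100%, which is why some platforms report only one of the two.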

State of Voice AI in Enterprise

Appendix
