

Thai Voice Call Keyword Extraction Using Generative AI
Voice calls remain a key channel for customer service, especially in service-driven industries such as insurance. These conversations hold valuable insights, but much of that information is buried in unstructured audio.
To uncover these insights, we use a keyword-driven approach. The process begins with call recordings between agents and customers. Each speaker’s dialogue is transcribed using speech-to-text. Then, sentence by sentence, a Generative AI model extracts relevant keywords: terms that help analysts quickly understand the conversation’s topic.
Process Flow
Here is the overview diagram of word segmentation and keyword extraction:

Step 1: Speech-to-Text
In this step, recorded conversations between a customer and an agent are converted from audio into text. Each speaker’s dialogue is separated and transcribed individually using a speech-to-text engine. This separation ensures that each sentence retains speaker context (e.g., whether it's from the customer or the agent).
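As a rough illustration of how speaker context can be kept alongside each sentence, the sketch below merges two per-speaker transcripts into one time-ordered dialogue. The `Utterance` type, the `merge_channels` helper, the `(start, text)` segment format, and the sample Thai sentences are all assumptions made for this example; the output format of the actual speech-to-text engine is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str  # "agent" or "customer"
    start: float  # start time in seconds (assumed to be provided by the STT engine)
    text: str     # transcribed sentence


def merge_channels(agent_segments, customer_segments):
    """Interleave per-speaker (start, text) segments into one time-ordered dialogue."""
    utterances = [Utterance("agent", start, text) for start, text in agent_segments]
    utterances += [Utterance("customer", start, text) for start, text in customer_segments]
    return sorted(utterances, key=lambda u: u.start)


# Made-up sample segments, one list per speaker.
dialogue = merge_channels(
    agent_segments=[(0.0, "สวัสดีค่ะ มีอะไรให้ช่วยคะ"), (6.5, "ขอเลขกรมธรรม์ด้วยค่ะ")],
    customer_segments=[(2.1, "ผมต้องการเคลมประกันรถยนต์ครับ")],
)

for u in dialogue:
    print(f"[{u.speaker}] {u.text}")
```

Keeping the speaker label on every sentence means the later keyword extraction step can tell a customer complaint apart from an agent's reply.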
Step 2: Keyword Extraction
Tools: Gemma-3n-e4b
Once the conversation is transcribed, each sentence is passed to an AI model to extract keywords. The model is guided by a custom prompt that simulates the role of a keyword extraction specialist.
The prompt is designed to instruct the AI to focus on terms that help an analyst or supervisor quickly understand and categorize the scenario. These keywords go beyond general vocabulary, highlighting topic-representative terms such as product types, reported issues, or actions mentioned during the conversation. This approach is adaptable and can be adjusted with domain-specific prompts to suit the needs of different industries, ensuring the extracted keywords are relevant to the context.
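The prompt-driven step can be sketched roughly as follows. This is a minimal illustration, not the production prompt: the prompt wording, the JSON-array output convention, and the `build_prompt`, `parse_keywords`, and `extract_keywords` helpers are assumptions for this example. The model call is passed in as a plain `generate` function, and a stand-in is used here so the sketch runs without a model.

```python
import json
from typing import Callable, List

# Hypothetical prompt template; the real prompt wording is not shown in this article.
PROMPT_TEMPLATE = (
    "You are a keyword extraction specialist for a {domain} call center.\n"
    "Extract the keywords that would help an analyst or supervisor quickly "
    "understand and categorize the sentence below, such as product types, "
    "reported issues, or actions.\n"
    "Return only a JSON array of strings.\n\n"
    "Speaker: {speaker}\n"
    "Sentence: {sentence}\n"
)


def build_prompt(sentence: str, speaker: str, domain: str = "insurance") -> str:
    """Fill the template; changing `domain` adapts the prompt to another industry."""
    return PROMPT_TEMPLATE.format(domain=domain, speaker=speaker, sentence=sentence)


def parse_keywords(raw: str) -> List[str]:
    """Parse the model reply as a JSON array of strings; return [] if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [k.strip() for k in data if isinstance(k, str) and k.strip()]


def extract_keywords(sentence: str, speaker: str,
                     generate: Callable[[str], str],
                     domain: str = "insurance") -> List[str]:
    """Run one sentence through the model and return its keywords."""
    return parse_keywords(generate(build_prompt(sentence, speaker, domain)))


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call, returning a fixed reply."""
    return '["เคลม", "ประกันรถยนต์"]'


print(extract_keywords("ผมต้องการเคลมประกันรถยนต์ครับ", "customer", fake_generate))
```

Passing the model in as a function keeps the prompt and parsing logic independent of how the model is served, and the defensive parser avoids failing a whole batch when one reply is not valid JSON.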
Evaluation Methods and Results
To assess the performance of the keyword extraction process, two evaluation methods were used: fuzzy match and exact match. These methods compare the keywords extracted by the AI against a set of reference (human-annotated) keywords to measure how accurately the model identifies relevant terms.
- Fuzzy Match Evaluation
Fuzzy matching allows for partial or approximate matches between predicted and reference keywords. This is useful when the extracted keyword is semantically correct but not an exact string match (e.g., “login issue” vs. “cannot log in”).
- Exact Match Evaluation
This method only counts keywords as correct if they match the reference exactly (character-for-character). It’s stricter and highlights how well the model reproduces the expected output precisely.
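Both evaluation methods can be sketched with one scoring function. The similarity measure and threshold used in the actual evaluation are not specified here, so this example assumes character-level similarity from Python's `difflib.SequenceMatcher` with a threshold of 0.8 for fuzzy matching; the `score` function and the sample keywords are illustrative only. A character-based measure catches near-identical strings but would not catch paraphrases, which need a semantic similarity measure instead.

```python
from difflib import SequenceMatcher


def _is_match(pred: str, ref: str, threshold: float) -> bool:
    """Exact comparison at threshold 1.0, character-level similarity otherwise."""
    if threshold >= 1.0:
        return pred == ref
    return SequenceMatcher(None, pred, ref).ratio() >= threshold


def score(predicted, reference, threshold=1.0):
    """Return (precision, recall, f1); each reference keyword is matched at most once."""
    remaining = list(reference)
    true_positives = 0
    for pred in predicted:
        for ref in remaining:
            if _is_match(pred, ref, threshold):
                remaining.remove(ref)  # safe: we break out of the loop right after
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up keywords for one sentence.
predicted = ["car insurance", "claim", "policy numbers"]
reference = ["car insurance", "policy number", "accident"]

exact = score(predicted, reference)                 # exact match
fuzzy = score(predicted, reference, threshold=0.8)  # fuzzy match (assumed threshold)

print("exact P/R/F1:", exact)
print("fuzzy P/R/F1:", fuzzy)
```

In this example, "policy numbers" counts as correct only under fuzzy matching, which is why the fuzzy scores come out higher than the exact ones. The same functions work unchanged on Thai strings.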
Results
The following results evaluate the performance of a keyword extraction model designed for an insurance call center. The model analyzes sentences from Thai-language call transcripts between agents and customers, extracting keywords that a call center analyst or supervisor would use to quickly understand and categorize the conversation’s scenario.
The evaluation shows that the keyword extraction model performs well overall, especially when flexible matching is allowed:
1. Fuzzy Match results indicate strong performance, with an F1 Score of 72.4%, meaning the model accurately captures relevant keywords even when the phrasing varies slightly from the reference.
- To compare performance, we evaluated our model, Gemma-3n-e4b with the Amity prompt, against KeyBERT, Typhoon, and the baseline Gemma-3n. Our model achieves significantly higher precision, demonstrating its strength in accurately identifying relevant keywords, as shown in the following figure:
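[Figure: comparison of Gemma-3n-e4b with Amity prompt against KeyBERT, Typhoon, and the baseline Gemma-3n]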
2. Exact Match results are lower, with an F1 Score of 61.8% (10.6 percentage points below the fuzzy-match score), showing the model is reasonably accurate in producing precise, word-for-word matches but leaves room for improvement in exact phrasing.
Overall, the model demonstrates good effectiveness in identifying relevant keywords, especially when semantic similarity is considered.
Benefits
- Improved Call Analysis Efficiency
By automatically extracting relevant keywords from each sentence, supervisors and analysts can quickly grasp the main topics of calls without manually reviewing entire transcripts.
- Better Categorization and Tagging
Keywords help classify conversations into meaningful categories such as product issues, customer complaints, or service requests, enabling streamlined case handling and reporting.
- Enhanced Customer Insights
Identifying key terms from customer-agent dialogues reveals common pain points, frequently asked questions, and emerging trends to improve service quality.
- Scalability and Automation
Automating keyword extraction from large volumes of Thai call transcripts enables consistent, scalable analysis without relying on manual effort.
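As one possible way to turn extracted keywords into category tags, the sketch below counts how many of a call's keywords fall into each category. The `CATEGORY_KEYWORDS` lookup table and the `tag_call` helper are hypothetical; a real deployment would define its own categories and vocabulary.

```python
from collections import Counter

# Hypothetical category vocabulary, for illustration only.
CATEGORY_KEYWORDS = {
    "claims": {"claim", "accident", "repair"},
    "policy": {"policy number", "renewal", "premium"},
}


def tag_call(keywords):
    """Count how many extracted keywords fall into each category."""
    counts = Counter()
    for keyword in keywords:
        for category, vocabulary in CATEGORY_KEYWORDS.items():
            if keyword.lower() in vocabulary:
                counts[category] += 1
    return counts


# Categories ordered from most to least frequent for one call.
print(tag_call(["claim", "accident", "premium"]).most_common())
```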
Collaborate and partner with our AI Lab at Amity Solutions.