This hands-on lab introduces participants to Snowflake Cortex AI's ability to extract valuable insights from unstructured documents using large language models. The lab uses examples of call center transcripts stored as PDFs. Participants will explore functions such as PARSE_DOCUMENT, COMPLETE, TRANSLATE, SENTIMENT, ENTITY_SENTIMENT, CLASSIFY_TEXT, SUMMARIZE, and EXTRACT_ANSWER. These tools empower users to parse, translate, analyze, classify, and query unstructured customer support data to uncover sentiment, highlight issues, and summarize conversations at scale.
Download the source code for this lab here.
💡 Tip: Explore this interactive walkthrough to learn how to sign up for a Snowflake account.
💡 Tip: Use the LLM Function Availability page to check which cloud regions are supported.
Create the core Snowflake resources needed to run the AI Lab. This includes a database, warehouse, schemas, and a stage for uploading PDFs.
Download the source code for this step here.
This setup script prepares your Snowflake environment to ingest and process unstructured call center transcripts.
- CREATE DATABASE ensures your lab operates in a clean and isolated environment.
- CREATE WAREHOUSE provisions compute resources for running your queries. It's configured to minimize cost with automatic suspend/resume settings.
- CREATE SCHEMA creates logical namespaces for raw files (RAW) and processed/intermediate data (STAGE).
- CREATE STAGE sets up a secure location to upload PDF documents. It supports directory table creation and uses Snowflake-managed encryption.

This command creates a new database named LLM_CORTEX_DEMO_DB if it doesn't already exist. Using IF NOT EXISTS ensures the script is idempotent and can be rerun safely without causing errors if the database already exists.
CREATE DATABASE IF NOT EXISTS LLM_CORTEX_DEMO_DB;
This step provisions a virtual compute warehouse named USER_STD_XSMALL_WH. It is configured with the following parameters:
- WAREHOUSE_SIZE = XSMALL: small and cost-effective for light workloads.
- WAREHOUSE_TYPE = STANDARD: supports most use cases.
- AUTO_SUSPEND = 60 seconds: pauses automatically after inactivity to save credits.
- AUTO_RESUME = TRUE: resumes automatically when a query is submitted.
- INITIALLY_SUSPENDED = TRUE: starts in a paused state until needed.

CREATE OR REPLACE WAREHOUSE USER_STD_XSMALL_WH
WITH
WAREHOUSE_SIZE = 'XSMALL'
WAREHOUSE_TYPE = 'STANDARD'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE;
Schemas are used to logically separate and organize objects within a database.
- RAW: stores the ingested PDF files and unprocessed content.
- STAGE: used for parsed, structured, or intermediate content.

Using IF NOT EXISTS prevents duplication errors.
CREATE SCHEMA IF NOT EXISTS LLM_CORTEX_DEMO_DB.RAW;
CREATE SCHEMA IF NOT EXISTS LLM_CORTEX_DEMO_DB.STAGE;
This internal stage acts as a Snowflake-managed file storage area. It is configured to:
- Support directory table creation via the DIRECTORY = (ENABLE = TRUE) setting.
- Use Snowflake-managed server-side encryption (SNOWFLAKE_SSE).

You will upload call center PDF transcripts into this stage for processing in later steps.
CREATE OR REPLACE STAGE LLM_CORTEX_DEMO_DB.RAW.INT_STAGE_DOC_RAW
DIRECTORY = ( ENABLE = true )
ENCRYPTION = ( TYPE = 'SNOWFLAKE_SSE' );
Your internal stage LLM_CORTEX_DEMO_DB.RAW.INT_STAGE_DOC_RAW is already set up. In the Snowflake UI, navigate to LLM_CORTEX_DEMO_DB > RAW > Stages, open INT_STAGE_DOC_RAW, and upload the call center PDF transcripts into it.
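To confirm the files landed in the stage before moving on, a minimal check like the following should work (this verification query is an optional addition, assuming the PDFs have already been uploaded):

-- List files currently in the internal stage
LIST @LLM_CORTEX_DEMO_DB.RAW.INT_STAGE_DOC_RAW;

-- Refresh the directory table so newly uploaded files are visible to DIRECTORY(), then query it
ALTER STAGE LLM_CORTEX_DEMO_DB.RAW.INT_STAGE_DOC_RAW REFRESH;
SELECT RELATIVE_PATH, SIZE, LAST_MODIFIED
FROM DIRECTORY('@LLM_CORTEX_DEMO_DB.RAW.INT_STAGE_DOC_RAW');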
Use the PARSE_DOCUMENT() function to extract the textual content from uploaded PDF files stored in the internal stage, and load that content into a structured table. This allows unstructured documents to be processed and queried using SQL.
Download the source code for this step here.
The PARSE_DOCUMENT() function is part of the Snowflake Cortex AI suite. It enables extraction of raw text from documents (PDFs, DOCX, etc.) into a usable SQL format.

In this step, we:
- Create a TRANSCRIPT table to hold the parsed text.
- Run PARSE_DOCUMENT() in LAYOUT mode to preserve formatting.
- Insert the extracted text into the table.

The result is a table with one row per document, containing the filename and its full transcript.
Before running the extraction steps, ensure you're working in the correct context:
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
This table will store the parsed transcript text alongside the original filename. It acts as the primary source for downstream analysis like sentiment, summarization, and classification.
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT (
FILE_NAME STRING,
TRANSCRIPT VARCHAR
);
💡 The VARCHAR type is suitable because Cortex will return the full text as a single block.
This SQL block:
- Lists the PDF files in the stage using the DIRECTORY() table function.
- Calls PARSE_DOCUMENT() to read and convert each file into text.
- Inserts the results into the TRANSCRIPT table.

We use TO_VARCHAR(...:content::string) to safely extract the content portion from the returned JSON variant.
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
SELECT
SPLIT_PART(RELATIVE_PATH, '/', -1) AS FILE_NAME,
TO_VARCHAR
(
SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
'@LLM_CORTEX_DEMO_DB.RAW.INT_STAGE_DOC_RAW',
RELATIVE_PATH,
{ 'mode': 'LAYOUT' }
):content::string
) AS TRANSCRIPT
FROM
DIRECTORY('@LLM_CORTEX_DEMO_DB.RAW.INT_STAGE_DOC_RAW')
WHERE
RELATIVE_PATH LIKE '%.pdf';
Note: LAYOUT mode retains line breaks and layout structure, which is useful for keeping the dialogue or sections readable.
⚠️ Scanned image-based PDFs won't extract correctly. Consider running OCR first if the file contains no embedded text.
Query the table to review which files were parsed and inspect their content:
SELECT FILE_NAME, TRANSCRIPT FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT;
You should see each uploaded file with its corresponding parsed text ready for further analysis.
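As an optional sanity check, you can flag files whose extracted text is empty or suspiciously short, which usually indicates a scanned, image-only PDF. A minimal sketch (the 100-character threshold is an arbitrary assumption):

-- Flag transcripts that came back empty or very short
SELECT FILE_NAME, LENGTH(TRANSCRIPT) AS TRANSCRIPT_LENGTH
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE TRANSCRIPT IS NULL OR LENGTH(TRANSCRIPT) < 100;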
Use the EXTRACT_ANSWER()
function to identify specific information from each call center transcript, such as the caller's name, the date of the call, and its duration. This lets you transform long-form conversational data into structured, queryable columns.
Download the source code for this step here.
The EXTRACT_ANSWER()
function allows you to pose natural-language questions to Cortex and extract a specific answer from a block of text. It is well-suited for pulling discrete facts from unstructured data.
In this step, we:
- Extract CALLER_NAME, CALL_DATE, and CALL_DURATION from each transcript.
- Store the extracted values, together with the original transcript, in a new table.

We also demonstrate how to convert the model output into valid SQL types (DATE, FLOAT, etc.) using TRY_TO_DATE() and TRY_TO_NUMBER().
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
This new table will hold both the raw transcript and extracted values for analysis.
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CALLER (
FILE_NAME VARCHAR,
CALLER_NAME VARCHAR,
CALL_DATE DATE,
CALL_DURATION FLOAT,
TRANSCRIPT VARCHAR
);
💡 Storing both raw and derived data together makes this table self-contained and easy to validate.
We apply three natural-language prompts:
- "What is the name of the caller?"
- "What is the Date of the call?"
- "What is the call duration?"

Each prompt returns an array of potential answers. We extract the top answer using [0]:answer::string and apply data cleaning before type conversion.
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CALLER
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
TRANSCRIPT,
'What is the name of the caller?'
)[0]:answer::string AS CALLER_NAME,
TRY_TO_DATE(REPLACE(SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
TRANSCRIPT,
'What is the Date of the call?'
)[0]:answer::string,' ','')) AS CALL_DATE,
TRY_TO_NUMBER(REPLACE(SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
TRANSCRIPT,
'What is the call duration?'
)[0]:answer::string,' ','')) AS CALL_DURATION,
TRANSCRIPT AS TRANSCRIPT
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT;
Note: The REPLACE(..., ' ', '') call strips stray spaces from the returned answer (e.g., " 5 minutes ") before conversion.
Note: TRY_TO_DATE() and TRY_TO_NUMBER() prevent failures if the answer is blank or not in a valid format.
SELECT * FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CALLER;
This will display the extracted metadata next to the transcript, ready for further enrichment, filtering, or analytics.
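If an extracted value looks off, it can help to inspect the raw array that EXTRACT_ANSWER() returns before the [0]:answer::string projection. A minimal sketch against a single transcript:

-- Inspect the full answer array (each element contains an answer and a score) for one file
SELECT
    FILE_NAME,
    SNOWFLAKE.CORTEX.EXTRACT_ANSWER(TRANSCRIPT, 'What is the name of the caller?') AS RAW_ANSWERS
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
LIMIT 1;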
Use the SUMMARIZE()
function to create a natural-language summary of each call center transcript. Summaries help reduce long conversations into concise descriptions that capture the main themes, customer issues, resolutions, and actions taken.
Download the source code for this step here.
The SUMMARIZE() function is a one-line API that distills the key elements of a transcript. This is useful for quickly reviewing long conversations and surfacing the main themes, issues, and resolutions without reading the full text.

The summaries are generated using the snowflake-arctic model under the hood.
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
We store both the summary and original transcript to preserve traceability.
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SUMMARY (
FILE_NAME VARCHAR,
TRANSCRIPT_SUMMARY VARCHAR,
TRANSCRIPT VARCHAR
);
💡 Saving the original transcript allows you to revise prompts later if needed.
Apply SUMMARIZE()

This query uses the SUMMARIZE() function to process all transcripts and return a concise textual summary.
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SUMMARY
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.SUMMARIZE(TRANSCRIPT) AS TRANSCRIPT_SUMMARY,
TRANSCRIPT
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT;
Note: The model intelligently finds main points, topics, and outcomes without needing a specific prompt.
Note: You can also run this incrementally by filtering new transcripts using a WHERE clause, as shown in the sketch below.
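A minimal incremental-load sketch (assuming a transcript counts as "new" when its FILE_NAME is not yet present in TRANSCRIPT_SUMMARY):

-- Summarize only transcripts that have not been summarized yet
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SUMMARY
SELECT
    T.FILE_NAME,
    SNOWFLAKE.CORTEX.SUMMARIZE(T.TRANSCRIPT) AS TRANSCRIPT_SUMMARY,
    T.TRANSCRIPT
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT T
WHERE T.FILE_NAME NOT IN (
    SELECT FILE_NAME FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SUMMARY
);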
SELECT * FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SUMMARY;
Review the generated summaries to validate their tone and informativeness. These can be used directly in reporting interfaces or to trigger follow-up actions.
Use the SENTIMENT()
function to measure the emotional tone of a call transcript. This function returns a numerical score between -1 and 1, indicating negative, neutral, or positive sentiment. This analysis helps identify frustrated customers or celebrate excellent service.
Download the source code for this step here.
The SENTIMENT()
function processes free-form text and returns a floating-point score:
- -1.0 represents very negative tone (e.g., angry, disappointed)
- 0.0 represents neutral tone (e.g., factual, emotionless)
- +1.0 represents very positive tone (e.g., appreciative, satisfied)

Sentiment scores help in building dashboards for support performance and alerting systems for unhappy customer interactions.
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
This table captures the sentiment value alongside each transcript.
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SENTIMENT (
FILE_NAME VARCHAR,
OVERALL_SENTIMENT FLOAT,
TRANSCRIPT VARCHAR
);
💡 You can later join this table with summaries or caller info to analyze sentiment by date, customer, or issue type.
Apply SENTIMENT() to Transcripts

This query processes every transcript and returns a sentiment score.
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SENTIMENT
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.SENTIMENT(TRANSCRIPT) AS OVERALL_SENTIMENT,
TRANSCRIPT
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT;
Note: Use this data to monitor average sentiment over time or set alerts for very negative interactions; an example query follows below.
Note: This function is fast and works well on large volumes of text with little tuning.
SELECT * FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SENTIMENT;
You should see each file with a numeric sentiment score that reflects the customer's tone throughout the conversation.
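Building on the earlier tip about joining with caller info, here is a sketch that tracks average sentiment per call date and counts strongly negative calls. It assumes TRANSCRIPT_CALLER from the EXTRACT_ANSWER step is populated, and the -0.5 threshold is purely illustrative:

-- Average sentiment per call date, plus a count of strongly negative calls
SELECT
    C.CALL_DATE,
    AVG(S.OVERALL_SENTIMENT)             AS AVG_SENTIMENT,
    COUNT_IF(S.OVERALL_SENTIMENT < -0.5) AS VERY_NEGATIVE_CALLS
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SENTIMENT S
JOIN LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CALLER C USING (FILE_NAME)
GROUP BY C.CALL_DATE
ORDER BY C.CALL_DATE;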
Use the ENTITY_SENTIMENT()
function to assess how specific elements ("Tone of voice", "Issue Resolved", "Follow up action") are discussed in each call transcript. Unlike SENTIMENT(), which provides an overall mood score, ENTITY_SENTIMENT() offers sentiment feedback for specific labeled concepts.
Download the source code for this step here.
The ENTITY_SENTIMENT()
function analyzes a transcript with respect to user-defined entity labels and returns a JSON array. Each entity is associated with:
- a sentiment label: Positive, Neutral, or Negative
- a confidence score: a float indicating model certainty

This is particularly useful for targeted QA review, since each labeled aspect of a call is scored independently.
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
This table stores the structured variant output from the Cortex function.
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_ENTITY_SENTIMENT (
FILE_NAME STRING,
PRODUCT_ENTITY_SENTIMENT VARIANT,
TRANSCRIPT VARIANT
);
We define target labels using ARRAY_CONSTRUCT()
. This example uses three predefined aspects of call quality:
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_ENTITY_SENTIMENT
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.ENTITY_SENTIMENT(
TRANSCRIPT,
ARRAY_CONSTRUCT('Tone of voice', 'Issue Resolved', 'Follow up action')
) AS PRODUCT_ENTITY_SENTIMENT,
TRANSCRIPT AS TRANSCRIPT
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT;
Note: You can customize the list of entities to match internal QA criteria. 💡 Use the VARIANT format here so that the JSON response can be explored and flattened later.
Use LATERAL FLATTEN
to convert the array into row-based output for easy filtering:
SELECT
FILE_NAME,
flattened.value:entity::STRING AS ENTITY,
flattened.value:sentiment::STRING AS SENTIMENT,
flattened.value:confidence::FLOAT AS CONFIDENCE
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_ENTITY_SENTIMENT,
LATERAL FLATTEN(INPUT => PRODUCT_ENTITY_SENTIMENT) AS flattened;
Note: This result shows one row per label per transcript, ideal for QA review and comparison dashboards.
You can transform entity results into columns using aggregation:
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_ENTITY_SENTIMENT_FINAL AS
SELECT
FILE_NAME,
MAX(CASE WHEN category.value:entity::STRING = 'Tone of voice' THEN category.value:sentiment::STRING END) AS TONE_OF_VOICE,
MAX(CASE WHEN category.value:entity::STRING = 'Issue Resolved' THEN category.value:sentiment::STRING END) AS ISSUE_RESOLVED,
MAX(CASE WHEN category.value:entity::STRING = 'Follow up action' THEN category.value:sentiment::STRING END) AS FOLLOW_UP
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_ENTITY_SENTIMENT,
LATERAL FLATTEN(INPUT => PRODUCT_ENTITY_SENTIMENT) AS category
GROUP BY FILE_NAME;
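With the pivoted table in place, a simple follow-up query can surface calls where a specific aspect was negative. A minimal sketch, assuming the returned sentiment labels match the Positive/Neutral/Negative values described above:

-- Find calls where the issue does not appear to have been resolved
SELECT FILE_NAME, TONE_OF_VOICE, FOLLOW_UP
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_ENTITY_SENTIMENT_FINAL
WHERE ISSUE_RESOLVED = 'Negative';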
Use the CLASSIFY_TEXT()
function to categorize each transcript into a predefined set of labels. This helps you organize transcripts by intent or topic, enabling faster filtering, tagging, and operational response.
Download the source code for this step here.
The CLASSIFY_TEXT()
function evaluates a block of text and assigns the most semantically appropriate label from a list of user-provided options.
This is especially helpful for routing calls to the right team, tagging transcripts by topic, and filtering for operational follow-up.

The function returns a label and an optional score indicating how confident the model is in its selection.
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
This table stores the predicted classification label for each transcript.
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CLASSIFICATION (
FILE_NAME VARCHAR,
CALL_CLASSIFICATION VARCHAR,
TRANSCRIPT VARCHAR
);
Run CLASSIFY_TEXT() on Each Transcript

This query classifies each transcript into one of the provided categories using semantic matching.
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CLASSIFICATION
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
TRANSCRIPT,
ARRAY_CONSTRUCT('Report Incident', 'Complaint', 'Follow up')
):label::string AS CALL_CLASSIFICATION,
TRANSCRIPT
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT;
Note: You can expand or modify the label list to suit your support taxonomy.
Optional: extract the confidence score with :score::FLOAT to identify uncertain classifications.
SELECT * FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CLASSIFICATION;
This allows you to segment and route calls based on their classification, improving operational workflows and analytics.
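A quick aggregation over the new table shows how calls are distributed across categories, which is a handy check before wiring up routing rules. A minimal sketch:

-- Count how many calls fall into each classification label
SELECT CALL_CLASSIFICATION, COUNT(*) AS CALL_COUNT
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CLASSIFICATION
GROUP BY CALL_CLASSIFICATION
ORDER BY CALL_COUNT DESC;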
Learn how to use the COMPLETE()
function with tailored prompts to generate structured outputs, summaries, ratings, and red flag detection based on transcript text. This step introduces prompt engineering to control the model's output format and tone.
Download the source code for this step here.
The COMPLETE()
function is one of the most versatile tools in the Snowflake Cortex suite. It supports open-ended prompts and free-form answers, ideal for tasks such as one-sentence summaries, bullet-point triage notes, escalation and red-flag detection, quality scoring, and drafting follow-up emails.

Prompt construction is key. Effective prompts include instruction, format guidance, tone, and the transcript.
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
Summarize a call with one concise sentence, ideal for dashboards or preview cards.
Technique | Prompt Line |
Instruction | Summarize the following call transcript in one sentence: |
Data | TRANSCRIPT |
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic',
'Summarize the following call transcript in one sentence: ' || TRANSCRIPT
) AS CALL_SUMMARY
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
Format call insights as a set of short, professional bullet points for quick triage.
Technique | Prompt Line |
Role / Persona | You are a senior support team lead. |
Instruction | Read the following transcript and summarize it into exactly three bullet points. |
Constraint | Keep each bullet point under 15 words and use a professional tone. |
Format Specification | Use hyphens for each bullet. |
Data | TRANSCRIPT |
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic',
'You are a senior support team lead. ' ||
'Read the following transcript and summarize it into exactly three bullet points. ' ||
'Keep each bullet point under 15 words and use a professional tone. ' ||
'Use hyphens for each bullet. ' ||
'Transcript: ' || TRANSCRIPT
) AS BULLET_SUMMARY
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
Check whether a conversation requires escalation using simple classification logic.
Technique | Prompt Line |
Role / Persona | You are a triage assistant. |
Instruction | Does this call require escalation? Answer YES or NO. |
Constraint | If YES, explain briefly in one sentence why. |
Data | TRANSCRIPT |
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic',
'You are a triage assistant. ' ||
'Does this call require escalation? Answer YES or NO. ' ||
'If YES, explain briefly in one sentence why. ' ||
'Transcript: ' || TRANSCRIPT
) AS ESCALATION_FLAG
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME IN ('audiofile11.pdf', 'audiofile79.pdf');
Rate the call performance on clarity, empathy, and professionalism. Output is formatted to be parsed easily.
Technique | Prompt Line |
Role / Persona | You are a quality control AI. |
Instruction | Rate the call on a scale of 1-5 stars based on clarity, empathy, and professionalism. |
Format Specification | Format as "Score: X/5 - Reason". |
Data | TRANSCRIPT |
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic',
'You are a quality control AI. ' ||
'Rate the call on a scale of 1-5 stars based on clarity, empathy, and professionalism. ' ||
'Format as "Score: X/5 - Reason". ' ||
'Transcript: ' || TRANSCRIPT
) AS QUALITY_SCORE
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
Generate a customer-facing follow-up message after the call. This is helpful for automating agent tasks or QA verification.
Technique | Prompt Line |
Role / Persona | You are a customer service agent. |
Instruction | Based on this transcript, write a short follow-up email under 100 words. |
Tone & Style | Keep it friendly and professional. |
Data | TRANSCRIPT |
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic',
'You are a customer service agent. ' ||
'Based on this transcript, write a short follow-up email under 100 words. ' ||
'Keep it friendly and professional. ' ||
'Transcript: ' || TRANSCRIPT
) AS FOLLOW_UP_EMAIL
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
Identify potential compliance violations or red flags for further review.
Technique | Prompt Line |
Role / Persona | You are a risk compliance assistant. |
Instruction | Read the following transcript and identify any red flags (e.g., threats to cancel, abusive language). |
Constraint | If no red flags are found, return "None". |
Format Specification | Use bullet points for each red flag. |
Data | TRANSCRIPT |
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic',
'You are a risk compliance assistant. ' ||
'Read the following transcript and identify any red flags (e.g., threats to cancel, abusive language). ' ||
'If no red flags are found, return "None". ' ||
'Use bullet points for each red flag. ' ||
'Transcript: ' || TRANSCRIPT
) AS RED_FLAGS
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
💡 The COMPLETE() function gives you full creative control. Prompt clarity and output consistency improve with well-structured instructions and formatting hints.
Apply advanced prompt engineering techniques with the COMPLETE()
function using the multi-message format and model-specific parameters. Learn to:
- Structure prompts with system and user message roles.
- Tune generation parameters such as temperature, top_p, and max_tokens.
- Request structured JSON output with the response_format parameter.
- Inspect response metadata such as the model name and token usage.

💡 Tip: Try experimenting with different models to see how they affect tone, structure, and verbosity. Snowflake currently supports a variety of models including:
- snowflake-arctic
- llama2-70b-chat
- mistral-7b
- gemma-7b-it
- mixtral-8x7b
Download the source code for this step here.
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
This query uses the snowflake-arctic
model and a single message in the COMPLETE()
array to simulate a call quality assistant. It analyzes the transcript and produces:
- A suggested response to the customer.
- Recommended follow-up actions for the agent.
- A brief tone analysis.

You can tune the output using the temperature and max_tokens parameters.
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'snowflake-arctic',
[
{
'role': 'user',
'content': 'You are a call center quality assistant. ' ||
'Based on the following transcript, generate: ' ||
'\n\n1. A suggested response to the customer ' ||
'\n2. Recommended follow-up actions for the agent ' ||
'\n3. A brief tone analysis ' ||
'\n\nTranscript:\n' || TRANSCRIPT
}
],
{
'temperature': 0.5,
'max_tokens': 300
}
) AS CALL_REVIEW_SUMMARY
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
This prompt uses the llama2-70b-chat
model with a structured message array. The system
role sets the context (you are a summarizer), and the user
role passes in the instruction. The result is a two-line summary along with rich metadata for audit and logging.
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat',
[
{
'role': 'system',
'content': 'You are a professional summarizer. Extract key information clearly and concisely.'
},
{
'role': 'user',
'content': 'Summarize this transcript in 1-2 sentences: ' || TRANSCRIPT
}
],
{
'temperature': 0.3,
'top_p': 0.9,
'max_tokens': 200
}
) AS TRANSCRIPT_SUMMARY,
TRANSCRIPT_SUMMARY:choices[0]:messages::string AS SUMMARY_TEXT,
TRY_TO_TIMESTAMP(TRANSCRIPT_SUMMARY:created::string) AS CREATED,
TRANSCRIPT_SUMMARY:model::string AS MODEL,
TRANSCRIPT_SUMMARY:usage:completion_tokens::number AS COMPLETION_TOKENS,
TRANSCRIPT_SUMMARY:usage:prompt_tokens::number AS PROMPT_TOKENS,
TRANSCRIPT_SUMMARY:usage:total_tokens::number AS TOTAL_TOKENS
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
This example uses a multi-message format to generate a complete professional email reply based on the conversation. The system
message sets up tone and structure. The user
message includes the transcript. The output contains a formatted email and tokens metadata.
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat',
[
{
'role': 'system',
'content': 'You are a customer service representative crafting professional email responses. ' ||
'Your goal is to write a polite, clear, and helpful reply to the customer. ' ||
'Focus on being empathetic, addressing the main issue, and including any necessary follow-up steps. ' ||
'Respond in the form of an email, with a subject line, greeting, body, and sign-off.'
},
{
'role': 'user',
'content': 'Please write a professional email response to the following call transcript: ' || TRANSCRIPT
}
],
{
'temperature': 0.4,
'top_p': 0.9,
'max_tokens': 300
}
) AS EMAIL_RESPONSE_JSON,
EMAIL_RESPONSE_JSON:choices[0]:messages::string AS EMAIL_RESPONSE,
TRY_TO_TIMESTAMP(EMAIL_RESPONSE_JSON:created::string) AS CREATED,
EMAIL_RESPONSE_JSON:model::string AS MODEL,
EMAIL_RESPONSE_JSON:usage:completion_tokens::number AS COMPLETION_TOKENS,
EMAIL_RESPONSE_JSON:usage:prompt_tokens::number AS PROMPT_TOKENS,
EMAIL_RESPONSE_JSON:usage:total_tokens::number AS TOTAL_TOKENS
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
In this example, the transcript is transformed into a structured JSON format. It:
- Identifies the caller and the agent and tags each line of dialogue by role, name, and order.
- Stores the result in a VARIANT column (TRANSCRIPT_JSON) for later flattening.

You also explicitly define the schema Snowflake Cortex should follow using the response_format parameter. This schema is written using JSON Schema, a standard format for describing the structure of JSON data. This ensures the output adheres to a predictable structure, making it easier to validate and consume in downstream applications.

Note: Depending on the model, you may need to add "Respond in JSON" to the prompt. For some models (e.g., claude-3-5-sonnet, mistral-large2), add "Respond in JSON" and include a detailed schema in the prompt.

CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_DIALOGUE (
FILE_NAME VARCHAR,
TRANSCRIPT_JSON VARIANT,
TRANSCRIPT VARCHAR
);
INSERT INTO LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_DIALOGUE
SELECT
FILE_NAME,
SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat',
[
{
'role': 'system',
'content': 'You will receive a conversation transcript between the tags <transcript></transcript>. ' ||
'Your task is to: ' ||
'\n1. Identify the caller and the agent. ' ||
'\n2. Convert the transcript into a structured JSON conversation. ' ||
'\n3. Tag each line by role, name, order. ' ||
'\n4. Respond in JSON under a top-level "dialogue" key.'
},
{
'role': 'user',
'content': '<transcript>' || TRANSCRIPT || '</transcript>'
}
],
{
'temperature': 0.3,
'top_p': 0.9,
'response_format': {
'type': 'json',
'schema': {
'type': 'object',
'properties': {
'dialogue': {
'type': 'array',
'items': {
'type': 'object',
'properties': {
'order': { 'type': 'integer' },
'role': { 'type': 'string' },
'name': { 'type': 'string' },
'speech': { 'type': 'string' }
},
'required': ['order', 'role', 'name', 'speech']
}
}
},
'required': ['dialogue']
}
}
}
) AS TRANSCRIPT_JSON,
TRANSCRIPT
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT
WHERE FILE_NAME = 'audiofile11.pdf';
Once you've generated structured JSON dialogue, this step uses LATERAL FLATTEN
to unpack each message into tabular rows for easy querying, filtering, or reporting. This is useful for dashboards and transcript-level analytics.
SELECT
FILE_NAME,
d.value:"role"::STRING AS "ROLE",
d.value:"name"::STRING AS "NAME",
d.value:"speech"::STRING AS "SPEECH",
d.value:"order"::NUMBER AS "ORDER"
FROM
LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_DIALOGUE,
LATERAL FLATTEN(
input => TRANSCRIPT_JSON:"structured_output"[0]:"raw_message":"dialogue"
) AS d;
This final step transforms your structured dialogue JSON into a clean SQL table for visual review, search, or BI dashboard integration.
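If you want to persist that flattened view rather than re-running the FLATTEN each time, one option is to materialize it. A minimal sketch, where the table name TRANSCRIPT_DIALOGUE_FLAT is an assumption and not part of the lab's scripts:

-- Materialize the flattened dialogue for reuse (TRANSCRIPT_DIALOGUE_FLAT is a hypothetical name)
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_DIALOGUE_FLAT AS
SELECT
    FILE_NAME,
    d.value:"order"::NUMBER  AS "ORDER",
    d.value:"role"::STRING   AS "ROLE",
    d.value:"name"::STRING   AS "NAME",
    d.value:"speech"::STRING AS "SPEECH"
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_DIALOGUE,
     LATERAL FLATTEN(input => TRANSCRIPT_JSON:"structured_output"[0]:"raw_message":"dialogue") AS d;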
Bring together all outputs from previous stepsβparsed transcripts, extracted answers, sentiment analysis, summaries, classifications, and completionsβinto a unified, queryable dataset. This is ideal for final reporting, QA review, dashboarding, or export.
Download the source code for this step here.
USE DATABASE LLM_CORTEX_DEMO_DB;
USE SCHEMA STAGE;
USE WAREHOUSE USER_STD_XSMALL_WH;
This SQL script consolidates multiple tables into a comprehensive result that includes everything we've derived:
CREATE OR REPLACE TABLE LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_FINAL AS
SELECT
T.FILE_NAME,
C.CALLER_NAME,
C.CALL_DATE,
C.CALL_DURATION,
S.TRANSCRIPT_SUMMARY,
M.OVERALL_SENTIMENT,
CL.CALL_CLASSIFICATION,
E.PRODUCT_ENTITY_SENTIMENT,
T.TRANSCRIPT
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT T
LEFT JOIN LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CALLER C USING (FILE_NAME)
LEFT JOIN LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SUMMARY S USING (FILE_NAME)
LEFT JOIN LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_SENTIMENT M USING (FILE_NAME)
LEFT JOIN LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_CLASSIFICATION CL USING (FILE_NAME)
LEFT JOIN LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_ENTITY_SENTIMENT E USING (FILE_NAME);
Note: This final table aggregates raw and model-generated data in a single view, enabling unified access for downstream applications.
You can preview results with:
SELECT * FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_FINAL;
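As one example of how the consolidated table can be used, the sketch below breaks average sentiment and call volume down by classification (purely illustrative; adjust the columns to your reporting needs):

-- Example report: call volume and average sentiment per classification
SELECT
    CALL_CLASSIFICATION,
    COUNT(*)               AS CALL_COUNT,
    AVG(OVERALL_SENTIMENT) AS AVG_SENTIMENT
FROM LLM_CORTEX_DEMO_DB.STAGE.TRANSCRIPT_FINAL
GROUP BY CALL_CLASSIFICATION
ORDER BY CALL_COUNT DESC;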
Congratulations on completing the Snowflake Cortex AI for Call Center Transcript Analysis lab!
Throughout this lab, you explored a wide range of Snowflake Cortex LLM functions including:
- PARSE_DOCUMENT(): Extract structured text from unstructured PDF documents
- EXTRACT_ANSWER(): Pull specific fields from call transcripts using natural language
- SUMMARIZE(): Generate concise overviews of long conversations
- SENTIMENT() and ENTITY_SENTIMENT(): Analyze tone and target-specific emotional signals
- CLASSIFY_TEXT(): Categorize transcripts into meaningful labels
- COMPLETE(): Use prompt engineering to create summaries, flags, and structured outputs

These techniques are broadly applicable beyond call center transcripts.
If you're interested in going deeper, consider exploring:
- Using COMPLETE() with additional context for grounded, context-aware answers

Note: If you participated in this lab as part of a Datalab AI training session, you should receive a certified badge of attendance.
Thank you very much for joining us!
Visit us at www.datalab.co.za to learn more about our AI training programs and data analytics solutions.
Stay connected and get updates on new labs by following us on LinkedIn.