NL2SQL: Let’s put Opus 4.1 to task on my NETFLIX data

Generating application code is cool and all. But, there’s a lot more folks out there trying to answer business questions compared to building applications. So having an LLM that can generate good SQL, and have a path to run it (MCP) is going to be a very valuable asset for any organization!

I’m not advocating or endorsing this particular LLM (Opus 4.1), but I am calling it out by name so you have a reference point. I’ve found most LLMs to be fairly decent at generating SQL, or even better, generating Oracle’s dialect of SQL. But you WILL find measurable differences between different LLM offerings, so be vigilant and picky!

I took a tour of my NETFLIX streaming data a few years ago. I was wondering what my Agent might be able to pull out of that data if I let IT decide on the SQL to run. So let’s have a look –

The Prompt

I’m a head of household, family of 4. I have our NETFLIX streaming history stored in my Oracle Autonomous database, in a table called, NETFLIX.

I need a report for each family member, highlighting their viewing habits. Items of interest include titles that have seen repeat viewing, total time spent watching content, and any patterns you might be able to ascertain.

How I made this happen

I’ve exported my personal NETFLIX data and put it into my Always Free 19c Autonomous Database. I have access to this database via my SQLcl MCP Server, and I’ve configured this MCP Server for my Agent. The agent doesn’t matter, but it rhymes with Todd.

Note: I’ve used a limited database user to store and query this data, I’ve definitely NOT used ADMIN for this task.

What was the plan? Sorry, what was Todd’s plan?

get structure of NETFLIX table
sample the data
get distinct list of family members and their viewing stats
get total viewing time per family member
find top repeated titles for each family member
analyze viewing patterns by day of week and time
analyze content type preferences
ID binge-watching episodes

I’ll dive into the SQL it generated (because it’s interesting!) later in this post, but let’s jump to it’s findings. Note I’ve anonymized my family members’ names. I’m Jefferson, long for Jeff/el Jefe.

The Results

📺 Netflix Family Viewing Report

Analysis Period: November 2017 – January 2023

Total Views

14.4K

Combined Sessions

Unique Titles

7.6K

Shows & Movies

Watch Time

4,891

Total Hours

Family Members

Active Profiles

🔍 Key Family Insights

🏆 Family Superlatives

Most Active: The Girl (5,630 sessions)
Most Diverse: Mamacita (2,276 titles)
Longest Sessions: Jefferson (30.7 min)
Binge Champion: The Girl (30 eps/day!)
Night Owl: Mamacita (2 AM peak)

👨‍👩‍👧‍👦 Shared Interests

Grey’s Anatomy: The Girl & Mamacita
My Babysitter’s a Vampire: The Girl & The Boy
The Good Place: Mother-in-Law & Mamacita
Peak family viewing: 11 PM – 2 AM
Weekend afternoon: The Boy’s domain

📈 Recommendations

Jefferson picks for family movie nights
Monitor late night streaming quality
Download feature for The Boy’s short sessions
Organize “Continue Watching” lists
Consider shared viewing schedules

Let’s get judgy

Anthropic released Opus 4.1 just 4 months ago. I’m relatively satisfied with the job it did for my task.

If I want to review the SQL it ran, I can easily do so by looking at my MCP log table, DBTOOLS$MCP_LOG.

It’s nice that Todd annotated the SQL it generated with comments showing what it was doing with that code.

Let’s look at one of the more interesting queries, the ‘ID binge-watching sessions…’

The query utilizes a three (3) level CTE (Common Table Expression) structure with the WITH clause to progressively aggregate and filter data.

NETFLIX table (raw data)
↓
daily_show_views (profile + date + show aggregation)
↓
binge_stats (profile + show summary)
↓
Final result (top 5 ranked per profile)

It had no problems navigating the INTERVAL column in my table and using the EXTRACT function to get the desired measurements of time.

Regular expressions are less than fun for me, but it had no issues using them to deal with quotes in the show/movie titles, and all the maths look ‘right.’

SQL

-- Identify binge-watching sessions (multiple episodes of same show in one day)
WITH /* LLM in use is claude-opus-4-1-20250805 */ daily_show_views AS (
    SELECT 
        profile_name,
        TRUNC(start_time) as view_date,
        REGEXP_SUBSTR(title, '^[^:]+') as show_name,
        COUNT(*) as episodes_watched,
        SUM(EXTRACT(DAY FROM duration_interval) * 24 * 60 + 
            EXTRACT(HOUR FROM duration_interval) * 60 + 
            EXTRACT(MINUTE FROM duration_interval) + 
            EXTRACT(SECOND FROM duration_interval) / 60) as total_minutes
    FROM NETFLIX
    WHERE profile_name IS NOT NULL 
      AND title LIKE '%Season%Episode%'
      AND duration_interval IS NOT NULL
    GROUP BY profile_name, TRUNC(start_time), REGEXP_SUBSTR(title, '^[^:]+')
    HAVING COUNT(*) >= 3  -- 3 or more episodes = binge session
),
binge_stats AS (
    SELECT 
        profile_name,
        show_name,
        COUNT(*) as binge_sessions,
        MAX(episodes_watched) as max_episodes_in_day,
        ROUND(AVG(total_minutes), 2) as avg_binge_minutes
    FROM daily_show_views
    GROUP BY profile_name, show_name
)
SELECT * FROM (
    SELECT 
        profile_name,
        show_name,
        binge_sessions,
        max_episodes_in_day,
        avg_binge_minutes,
        ROW_NUMBER() OVER (PARTITION BY profile_name ORDER BY binge_sessions DESC, max_episodes_in_day DESC) as rn
    FROM binge_stats
)
WHERE rn <= 5
ORDER BY profile_name, binge_sessions DESC

-- Identify binge-watching sessions (multiple episodes of same show in one day)
WITH /* LLM in use is claude-opus-4-1-20250805 */ daily_show_views AS (
    SELECT 
        profile_name,
        TRUNC(start_time) as view_date,
        REGEXP_SUBSTR(title, '^[^:]+') as show_name,
        COUNT(*) as episodes_watched,
        SUM(EXTRACT(DAY FROM duration_interval) * 24 * 60 + 
            EXTRACT(HOUR FROM duration_interval) * 60 + 
            EXTRACT(MINUTE FROM duration_interval) + 
            EXTRACT(SECOND FROM duration_interval) / 60) as total_minutes
    FROM NETFLIX
    WHERE profile_name IS NOT NULL 
      AND title LIKE '%Season%Episode%'
      AND duration_interval IS NOT NULL
    GROUP BY profile_name, TRUNC(start_time), REGEXP_SUBSTR(title, '^[^:]+')
    HAVING COUNT(*) >= 3  -- 3 or more episodes = binge session
),
binge_stats AS (
    SELECT 
        profile_name,
        show_name,
        COUNT(*) as binge_sessions,
        MAX(episodes_watched) as max_episodes_in_day,
        ROUND(AVG(total_minutes), 2) as avg_binge_minutes
    FROM daily_show_views
    GROUP BY profile_name, show_name
)
SELECT * FROM (
    SELECT 
        profile_name,
        show_name,
        binge_sessions,
        max_episodes_in_day,
        avg_binge_minutes,
        ROW_NUMBER() OVER (PARTITION BY profile_name ORDER BY binge_sessions DESC, max_episodes_in_day DESC) as rn
    FROM binge_stats
)
WHERE rn <= 5
ORDER BY profile_name, binge_sessions DESC

Could this query be, better?

For readability perhaps we could move the regex around so it’s not called out more than once, but I don’t think that will affect the execution plan. I think, it’s Ok, or at least I think the findings are accurate.

LLMs are pretty good at generating SQL! Especially when they have the information they need to do so. The MCP Server gave it access to the database and the ability to interrogate the dictionary to get the table definition and it was able to sample the data. That definitely helped.

CTEs are super popular, especially in the SQL Server universe. I’m not surprised to see LLM generated SQL employing them. I also find them easier to read and understand, so I’m all good with that.

As always, keep an eye on your Agents. Review and approve/deny their requests as needed. Look into how they are answering your questions. I do see them making bad assumptions, just frequently enough to still be dangerous. However for this session, it batted pretty much 1.000.

NL2SQL: Let’s put Opus 4.1 to task on my NETFLIX data

The Prompt

How I made this happen

What was the plan? Sorry, what was Todd’s plan?

The Results

📺 Netflix Family Viewing Report

The Girl Binge Champion

🔥 Most Rewatched

JEFFERSON

🎥 Top Rewatches

MAMACITA Most DiverseNight Owl

🌟 Diverse Interests

Mother-in-Law

💕 Favorite Series

The Boy

🎯 Favorite Content

🔍 Key Family Insights

🏆 Family Superlatives

👨‍👩‍👧‍👦 Shared Interests

📈 Recommendations

Let’s get judgy

Write A Comment Cancel Reply

The Prompt

How I made this happen

What was the plan? Sorry, what was Todd’s plan?

The Results

📺 Netflix Family Viewing Report

The Girl Binge Champion

🔥 Most Rewatched

JEFFERSON

🎥 Top Rewatches

MAMACITA Most DiverseNight Owl

🌟 Diverse Interests

Mother-in-Law

💕 Favorite Series

The Boy

🎯 Favorite Content

🔍 Key Family Insights

🏆 Family Superlatives

👨‍👩‍👧‍👦 Shared Interests

📈 Recommendations

Let’s get judgy

Can I build an Agent & still use SQLcl’s MCP Server? Yes!

Your favorite Oracle database CLI and MCP just got updated

Related Posts

Where’s Your Agency? People Aren’t Using AI—But we (Oracle) Just Dropped 100+ New Skills to Make It Impossible to Ignore

Having a go with 100+ new AI Skills for Oracle AI Database

Agents Skills: load CSV to new Oracle tables via SQLcl MCP

Write A Comment Cancel Reply