Generating application code is cool and all. But, there’s a lot more folks out there trying to answer business questions compared to building applications. So having an LLM that can generate good SQL, and have a path to run it (MCP) is going to be a very valuable asset for any organization!

I’m not advocating or endorsing this particular LLM (Opus 4.1), but I am calling it out by name so you have a reference point. I’ve found most LLMs to be fairly decent at generating SQL, or even better, generating Oracle’s dialect of SQL. But you WILL find measurable differences between different LLM offerings, so be vigilant and picky!

I took a tour of my NETFLIX streaming data a few years ago. I was wondering what my Agent might be able to pull out of that data if I let IT decide on the SQL to run. So let’s have a look –

The Prompt

How I made this happen

I’ve exported my personal NETFLIX data and put it into my Always Free 19c Autonomous Database. I have access to this database via my SQLcl MCP Server, and I’ve configured this MCP Server for my Agent. The agent doesn’t matter, but it rhymes with Todd.

Note: I’ve used a limited database user to store and query this data, I’ve definitely NOT used ADMIN for this task.

What was the plan? Sorry, what was Todd’s plan?

  1. get structure of NETFLIX table
  2. sample the data
  3. get distinct list of family members and their viewing stats
  4. get total viewing time per family member
  5. find top repeated titles for each family member
  6. analyze viewing patterns by day of week and time
  7. analyze content type preferences
  8. ID binge-watching episodes

I’ll dive into the SQL it generated (because it’s interesting!) later in this post, but let’s jump to it’s findings. Note I’ve anonymized my family members’ names. I’m Jefferson, long for Jeff/el Jefe.

The Results

📺 Netflix Family Viewing Report

Analysis Period: November 2017 – January 2023

Total Views
14.4K
Combined Sessions
Unique Titles
7.6K
Shows & Movies
Watch Time
4,891
Total Hours
Family Members
5
Active Profiles
🎭

The Girl Binge Champion

The Series Enthusiast – Teen/Young Adult Content

Views
5,630
Hours
1,486
Titles
1,972
Avg Session
15.8m

🔥 Most Rewatched

GO! The Unforgettable Party 42 times
Grey’s Anatomy (97 binge sessions!) 16 eps/day
My Little Pony: Rainbow Rocks 29 times
Peak Time
11 PM – 12 AM
Favorite Days
Mon & Fri
Content Mix
67% Series
🎬

JEFFERSON

The Movie Buff – Quality Films & Prestige TV

Views
2,740
Hours
1,403
Titles
2,009
Avg Session
30.7m

🎥 Top Rewatches

Total Recall 10 times
The West Wing 29 binges
Avengers: Infinity War 9 times
Peak Time
12 – 1 AM
Favorite Days
Sun & Fri
Content Mix
56% Movies
📺

MAMACITA Most DiverseNight Owl

The Variety Viewer – Reality TV to Documentaries

Views
3,571
Hours
1,229
Titles
2,276
Avg Session
20.7m

🌟 Diverse Interests

Great British Baking Show 25+ views
Formula 1 13 binges
Selling Sunset 13 binges
Peak Time
2 AM (!)
Favorite Days
Mon, Thu, Fri
Content Mix
66% Series
❤️

Mother-in-Law

The Romance & Drama Fan – Heartwarming Content

Views
1,022
Hours
423
Titles
714
Avg Session
24.9m

💕 Favorite Series

Heartland 31 binges
Virgin River 14 binges
Hart of Dixie 11 binges
Peak Time
1 AM
Favorite Days
Sat, Tue, Wed
Content Mix
76% Series
🎮

The Boy

The Young Viewer – Animated & Gaming Content

Views
1,477
Hours
350
Titles
657
Avg Session
14.2m

🎯 Favorite Content

LEGO Jurassic World 18 times
We Bare Bears 19 binges
Minecraft: Story Mode 24+ views
Peak Time
2 PM & 8 PM
Favorite Days
Weekends
Content Mix
72% Series

🔍 Key Family Insights

🏆 Family Superlatives

  • Most Active: The Girl (5,630 sessions)
  • Most Diverse: Mamacita (2,276 titles)
  • Longest Sessions: Jefferson (30.7 min)
  • Binge Champion: The Girl (30 eps/day!)
  • Night Owl: Mamacita (2 AM peak)

👨‍👩‍👧‍👦 Shared Interests

  • Grey’s Anatomy: The Girl & Mamacita
  • My Babysitter’s a Vampire: The Girl & The Boy
  • The Good Place: Mother-in-Law & Mamacita
  • Peak family viewing: 11 PM – 2 AM
  • Weekend afternoon: The Boy’s domain

📈 Recommendations

  • Jefferson picks for family movie nights
  • Monitor late night streaming quality
  • Download feature for The Boy’s short sessions
  • Organize “Continue Watching” lists
  • Consider shared viewing schedules

Let’s get judgy

Anthropic released Opus 4.1 just 4 months ago. I’m relatively satisfied with the job it did for my task.

If I want to review the SQL it ran, I can easily do so by looking at my MCP log table, DBTOOLS$MCP_LOG.

It’s nice that Todd annotated the SQL it generated with comments showing what it was doing with that code.

Let’s look at one of the more interesting queries, the ‘ID binge-watching sessions…’

The query utilizes a three (3) level CTE (Common Table Expression) structure with the WITH clause to progressively aggregate and filter data.

NETFLIX table (raw data)

daily_show_views (profile + date + show aggregation)

binge_stats (profile + show summary)

Final result (top 5 ranked per profile)

It had no problems navigating the INTERVAL column in my table and using the EXTRACT function to get the desired measurements of time.

Regular expressions are less than fun for me, but it had no issues using them to deal with quotes in the show/movie titles, and all the maths look ‘right.’

SQL
-- Identify binge-watching sessions (multiple episodes of same show in one day)
WITH /* LLM in use is claude-opus-4-1-20250805 */ daily_show_views AS (
    SELECT 
        profile_name,
        TRUNC(start_time) as view_date,
        REGEXP_SUBSTR(title, '^[^:]+') as show_name,
        COUNT(*) as episodes_watched,
        SUM(EXTRACT(DAY FROM duration_interval) * 24 * 60 + 
            EXTRACT(HOUR FROM duration_interval) * 60 + 
            EXTRACT(MINUTE FROM duration_interval) + 
            EXTRACT(SECOND FROM duration_interval) / 60) as total_minutes
    FROM NETFLIX
    WHERE profile_name IS NOT NULL 
      AND title LIKE '%Season%Episode%'
      AND duration_interval IS NOT NULL
    GROUP BY profile_name, TRUNC(start_time), REGEXP_SUBSTR(title, '^[^:]+')
    HAVING COUNT(*) >= 3  -- 3 or more episodes = binge session
),
binge_stats AS (
    SELECT 
        profile_name,
        show_name,
        COUNT(*) as binge_sessions,
        MAX(episodes_watched) as max_episodes_in_day,
        ROUND(AVG(total_minutes), 2) as avg_binge_minutes
    FROM daily_show_views
    GROUP BY profile_name, show_name
)
SELECT * FROM (
    SELECT 
        profile_name,
        show_name,
        binge_sessions,
        max_episodes_in_day,
        avg_binge_minutes,
        ROW_NUMBER() OVER (PARTITION BY profile_name ORDER BY binge_sessions DESC, max_episodes_in_day DESC) as rn
    FROM binge_stats
)
WHERE rn <= 5
ORDER BY profile_name, binge_sessions DESC

Could this query be, better?

For readability perhaps we could move the regex around so it’s not called out more than once, but I don’t think that will affect the execution plan. I think, it’s Ok, or at least I think the findings are accurate.

LLMs are pretty good at generating SQL! Especially when they have the information they need to do so. The MCP Server gave it access to the database and the ability to interrogate the dictionary to get the table definition and it was able to sample the data. That definitely helped.

CTEs are super popular, especially in the SQL Server universe. I’m not surprised to see LLM generated SQL employing them. I also find them easier to read and understand, so I’m all good with that.

As always, keep an eye on your Agents. Review and approve/deny their requests as needed. Look into how they are answering your questions. I do see them making bad assumptions, just frequently enough to still be dangerous. However for this session, it batted pretty much 1.000.

Author

I'm a Distinguished Product Manager at Oracle. My mission is to help you and your company be more efficient with our database tools.

Write A Comment