In our previous post, I showed how you could easily download your Spotify personal streaming data, and upload it to a new table in a free (Always!) Oracle Cloud database. We can take our JSON document and with a simple drag and drop, get to a state where we can run SQL!
OK, I have a table and some data, now what??
Let’s look briefly at our table. I want to see how it’s defined, and what the data looks like now that I have rows and columns.
We can see this if we right-click on our table in the left side of the SQL worksheet and select ‘Open.’
- PLAYED, a DATE
- ARTISTNAME, a VARCHAR2
- TRACKNAME, also a VARCHAR2
- DURATION, a NUMBER
Let’s ask some questions!
The best way to learn SQL is to come up with some interesting questions YOU have about YOUR data. This is my Spotify listening history for…exactly what time period?
Here’s how I’m going to write this post: I’m going to ask a question, show the SQL, and give a brief overview of how the SQL works. I’m then going to share Doc links that go into the techniques or concepts in more detail.
What is the timeframe, what period of time does this data cover?
PLAYED is a date. We have some functions available to us for dates (and numbers). MIN and MAX are what they sound like. For a range of values, find me the ‘smallest’ and ‘biggest’ values.
Oracle stores dates internally as numbers. So the smaller the number, the farther back in time we’re going.
The third thing I asked for was a math problem, take the max and subtract from it the min. When we subtract (-) two dates, we get a number that represents the number of days between those two dates.
-- about the data, period of time SELECT MIN(played) start_date, MAX(played) end_date, (MAX(played)-MIN(played)) total_days FROM SPOTIFY_STREAMS_2022;
Another cool trick with DATEs. If you have a DATE value and add ‘1’ to it, you’re adding a day, so basically show me next day from this point in time. We’re going to be talking more about DATEs in a bit.
Who were my favorite artists?
This question is fairly straightforward.
COUNT() is a function that returns a count of rows. Supply the * tells it to count EVERYTHING, even when there is a NULL or duplicate value.
But I don’t want just a count of all the rows. I want counts of multiple groups of rows. I want a count of all the times every single artist shows up in my play history.
That’s what the GROUP BY clause does.
I could ask for a count of records by song, or even by point in time. But since I include the ARTISTNAME in the SELECT and GROUP BY fields, I get counts on those ‘groups’ of records in the table.
The ORDER BY tells the database to SORT my results. The ‘1’ is shorthand for the first column in the SELECT, the count(*). The ‘DESC’ is shorthand for ‘descending’ order. So show me the biggest counts, FIRST.
SELECT COUNT(*), ARTISTNAME FROM SPOTIFY_STREAMS_2022 GROUP BY ARTISTNAME ORDER BY 1 DESC;
What was I listening to on Monday mornings?
Ok, this is where it gets fun. If you don’t understand this all at once, that’s OK. I would say even, that that’s expected.
Alright, in Oracle the DATE data type is used to store ‘datetime’ values, as in a specific point in time. That CAN be represented as a day on the calendar followed by the exact hour, minute, and second.
We can be more precise. If you want fractional seconds, local time zone accommodations, and more, then you can use TIMESTAMP. And once you get more comfortable with the topic, I would suggest we always store temporal data as TIMESTAMP WITH LOCAL TIMEZONE, but we’re not ready for that yet.
Since we’re not day trading and only looking at when we played a song, DATE will do.
So we’re going to repeat the COUNT(*), and ARTISTNAME in the SELECT, and we have it again in the GROUP BY clause. Our ORDER BY clause remains the same as well.
But, instead of looking at ALL the data, I only want to look at specific rows. I only want to look at rows that contain dates that happen to fall on a Monday. And not only that, I only want dates that fell on a Monday AND between the hours of 5AM and 10AM.
TO_CHAR() is a function that converts something, in this case a date, into a string. I want to find the dates that fell on a Monday. If I pass ‘DY’ to TO_CHAR for a DATE, I get back a short code for the day of the week. In this case I would get MON vs MONDAY.
Tip/Hint: I’m assuming you’re like me, an English speaker. But you may parles français o hablas español. The strings you want to filter on, will depend on your locale/NLS settings.
select to_char(sysdate-3, 'DY') SHORT_CODE, to_char(sysdate-3, 'DAY') DAY from dual; SHORT_CODE DAY ---------------- ---------- LUN. LUNDI
So my WHERE clause says I only want to count rows that happened on a MONDAY.
Finding Monday Mornings
But how do I sort the morning hours?
For that I need to grab the HOUR component of the day. ‘HH24’ says give me the hour in military time, so 20 = 8PM.
Remember, TO_CHAR returns the DATE as a string. In this case it returns a two digit number. Instead of saying
to_char(played, 'HH24') between 5 and 10
I’m doing this
to_char(played, 'HH24') in ('10', '11', '12', '13', '14', '15', '16')
My logic is thus:
Comparing strings, where those strings happen to be numbers, can cause problems. Oracle knows that ’10’ is smaller than ’11’, but what happens if I ask it to compare ’09’ with ’10?’ Bad things happen, or COULD happen.
So instead I supply a literal list of values I’m OK with.
But wait, why are you using 10 – 16?
When my code is right, but for the wrong reason
Well, our web technology returns all DATEs as timestamps using the UTC timezone.
The words I wrote above are true. But we’re not returning a date with our query. We’re returning a string in the form of a CHAR via the TO_CHAR call!
But as it turns out, my SQL is still right.
Because Spotify delivers it’s data in UTC format. Remember when I said you need to understand your data to really be able to write SQL against it?
So that’s 5 hours ahead of me here on the East Coast. So ’10’ is the same as ’05’ for me, as ’16’ translates to ’11’ for me. So I’m OK counting any song that played on a Monday from 5:00:00 AM – 11:59:59 AM.
SELECT COUNT(*), ARTISTNAME FROM ADMIN.SPOTIFY_STREAMS_2022 WHERE to_char(played, 'DY') = 'MON' AND to_char(played, 'HH24') IN ('10', '11', '12', '13', '14', '15', '16') GROUP BY ARTISTNAME ORDER BY 1 DESC;
Technically it’s till not 100% right. The PLAYED column contains the moment when the song stopped streaming. I would need to subtract the DURATION time from that to get the actual start time of a song, but I don’t care to be THAT accurate.
Well, what about on a Friday after noon?
-- Friday afternoon bands SELECT COUNT(*), ARTISTNAME FROM ADMIN.SPOTIFY_STREAMS_2022 WHERE to_char(played, 'DY') = 'FRI' AND to_char(played, 'HH24') IN ('17', '18', '19', '20', '21', '22', '23') GROUP BY ARTISTNAME ORDER BY 1 DESC;
I’m writing this post at 9:40PM, if I ask for the current time, I find out that the UTC representation of that is…tomorrow morning!
Your tomorrow is my today, yeah? If you haven’t figured it out yet, working with DATEs and temporal data can give you a major headache!
Some real resources for learning SQL
Chris Saxon is our SQL evangelist for Oracle. He runs a twitter account called ‘SQL Daily’ that posts daily SQL tips and tricks!
Want to find consecutive rows with #SQL?— SQL Daily (@sqldaily) January 16, 2023
This links consecutive rows into one group with pattern matching in Oracle Database:
SELECT * FROM … MATCH_RECOGNIZE (
ORDER BY …
PATTERN ( init consecutive* )
DEFINE consecutive AS col = PREV ( col ) + 1
Chris also recommended this free online SQL course:
Confused, need more help, found a bug in my SQL?
Hop on aboard, leave a comment! I’m sure I made a mistake somewhere, esp since it’s 3AM in London as I write this (in Cary, NC).