ThatJeffSmith

Foreign Keys, a Foreign Concept?

Relational databases are powerful for many reasons, but you cannot deny the ‘relational’ part is pretty compelling. Rows or ‘tuples’ in a table are comprised of related data. So for one row in an Employee table, you can assume that each value or column pertains to that particular Employee.

Foreign Keys - Not Just a Cool Toy

Of course you can’t – or rather shouldn’t – put everything about said employee in a single row or table. I hate using absolutes, but in general it’s good practice to normalize your data. Just because Oracle allows you to use 1,000 columns in a table doesn’t mean you should.


On a side note – curious why developers are crazy about the NoSQL movement? Cassandra just announced (story) they will support 2,000,000,000 (that’s 2 BILLION) columns per table.


So odds are your database has more than one table, or at least your application is using more than one table. There is a fair chance that some of these tables will be RELATED. The ANSI SQL standard allows for these relationships to be defined and enforced via the database. To quote the ANSI shopping site –

“Defines the data structures and basic operations on SQL-data. Provides functional capabilities for creating, accessing, maintaining, controlling, and protecting SQL-data.”

The ‘controlling and protecting’ piece of that quote is not accidental. The database is setup to protect the data. A smart DBA once said that protecting the data was the core of their job. Foreign Keys are one of the databases mechanisms for protecting the integrity of the data across tables. For example if you delete a department from your HR system, you want to make sure you don’t have any employees assigned to that department after the fact. Or going the other direction, if you hire a new employee, you want to make sure they belong to a department. A foreign key constraint can be employed to enforce this business rule.

So why do application developers INSIST on hard-coding the relationship into their applications?
Has this ever happened to you? The first day on the job, you get your database credentials, you login to the database to answer a question for someone, and you have no idea how the data is structured? And to make matters worse, there are no foreign keys, so the database doesn’t know either.

And this is not limited to folks of the Oracle persuasion. I see this rampant across the relational database spectrum (yes, you too MSFT SQL Server.)

I have a few theories why this happens:

  • Ignorance. The ‘architect’ doesn’t know any better, a la the database is just a ‘flat file’ for the application to store data in. Users will never access the data directly, they will always access it via the magical application that provides for every conceivable need.
  • Old wives tale: Foreign Keys impact performance. If we implement them, it will slow the application down. We don’t need the database to ‘check’ anything, because the application KNOWS it’s OK.
  • The application will support multiple database providers, so they short-cut the design in the name of flexibility.
  • Poor design. Or no design. Proper design should give you all of the benefits with few downsides of referential integrity in the database.
  • Maybe there was a kernel of truth in some of these assumptions at one point, but I do not see anything that supports those theories today.

    Pity the poor people who have to query this data.
    First you have to dig to find out how the data is related. Most modelling tools will rely on foreign keys to document relationships. Smarter tools will make a best-guess from common-named column names, but this is just a guess.

    Any time you write a query against these related tables, you will need to write the JOIN manually. Your database IDE will be handicapped, and it will take you longer to write your code as a result. Hey, how fun is this to write?

    SELECT ...
    FROM SH.COSTS
           JOIN SH.PROMOTIONS
              ON (COSTS.PROMO_ID = PROMOTIONS.PROMO_ID)
           JOIN SH.TIMES
              ON (COSTS.TIME_ID = TIMES.TIME_ID)
           JOIN SH.PRODUCTS
              ON (COSTS.PROD_ID = PRODUCTS.PROD_ID)
           JOIN SH.CHANNELS
              ON (COSTS.CHANNEL_ID = CHANNELS.CHANNEL_ID)
           JOIN SH.SALES
              ON (SALES.PROMO_ID = PROMOTIONS.PROMO_ID)
           JOIN SH.COUNTRIES
              ON (CUSTOMERS.COUNTRY_ID = COUNTRIES.COUNTRY_ID)
           JOIN SH.CUSTOMERS
              ON (SALES.CUST_ID = CUSTOMERS.CUST_ID)
           JOIN SH.TIMES
              ON (SALES.TIME_ID = TIMES.TIME_ID)
           JOIN SH.PRODUCTS
              ON (SALES.PROD_ID = PRODUCTS.PROD_ID)
           JOIN SH.CHANNELS
              ON (SALES.CHANNEL_ID = CHANNELS.CHANNEL_ID)
    

    The big boys should know better. But they don’t.
    It’s easy to assume that this only happens with emerging companies. But I see it in enterprise class solutions. In some cases they will define some foreign keys, but not all of them. I JUST DON’T GET IT.

    So developers and architects – please explain it to me – why are foreign keys such a foreign concept?

    OK Jeff, we get the point, but what are we supposed to do, redo the entire database design?
    That would make me and all of your customers very happy. It won’t make you any more money, and sadly that is probably going to prevent it from ever happening. But going forward, if you are going to build something, do it right.

    I love you guys.
    I’ve been a little hard some people on this post, and I am more than ready to soften my position if the evidence supports it. I want everyone to remember that ‘I love you guys.’