Tuesday, September 27, 2005

Some thoughts on database design

I occasionally like to read Joel on Software, a sensible site which ought to be required reading for software company managers. However, today I found some things in the Aardvark spec (PDF) which I strongly disagree with.

Microsoft Access has done some terrible damage to the world of database design. By far the most pernicious effect has been the AutoNumber phenomenon. I've lost track of the number of databases I've seen with foolishness like this:

tblCountry (
CountryID autonumber primary key,
CountryCode char(2),
CountryName varchar(50))

What's wrong with this design? Let me give you a hint. Here are two rows from an actual database I've worked with:

65, 'DE', 'Germany'
218, 'DE', 'GERMANY, FEDERAL REPUBLIC OF'

Oh dear, oh dear. A better design for this table is of course:

tblCountry (
CountryCode char(2) primary key,
CountryName varchar(50))

Books have been written about designing tables, but here is a simple rule: start with the fields you need, and see if you can find a candidate key amongst them. If you can't, chances are your table doesn't need a primary key in the first place. Here's an example of such a table:

InvoiceEntries (
InvoiceNumber int references Invoices (InvoiceNumber),
ItemCode char(4) references Items (ItemCode),
Quantity int);

If you really, really need to be able to uniquely identify each tuple (row), then an autonumber PK is an option. But in general I try to avoid them at all costs. Here's the reason: autonumber PKs are meaningless. What does CountryID=143 signify? What country is that? If you can't look at the rows of your table and make sense of all the data in one reading, you're in a bad place as a DBA.

The second bit of lunacy, also attributable to Microsoft in a small way, is the habit of extending Hungarian notation to table naming. I'm talking about tables called tblCountry, tblCustomer, and so forth. Here's another real-life example of what can happen:

Let's say you start off with a table called tblCustomer. Several months or years later you discover that in fact you have two types of customers, corporate and private. At this stage you have several thousand lines of legacy code dealing with tblCustomer, so changing the table isn't really an option.

Fortunately, databases have wonderful things called views and triggers that let you deal with just such a situation in an elegant fashion. What you need to do is create two new tables, say tblCorporateCustomer and tblPrivateCustomer, and then replace tblCustomer with a view onto the union of these new tables. You write triggers to support the legacy operations of inserting, updating and deleting on tblCustomer. This enables your legacy code to continue working seamlessly while you go on to do great things with the new tables.

There's just one fly in the ointment. tblCustomer is no longer a table, it's a view. What's the new guy on the team going to make of a view called tblCustomer? Exactly what purpose does that "tbl" prefix serve? What's the point of differentiating between tables and views in the first place? Confusion to the new guy? Job security for the greybeards? You tell me.