Tuesday, September 27, 2005

Some thoughts on database design

I occasionally like to read Joel on Software, a sensible site which ought to be required reading for software company managers. However, today I found some things in the Aardvark spec (PDF) which I strongly disagree with.

Microsoft Access has done some terrible damage to the world of database design. By far the most pernicious effect has been the AutoNumber phenomenon. I've lost track of the number of databases I've seen with foolishness like this:

tblCountry (
CountryID autonumber primary key,
CountryCode char(2),
CountryName varchar(50))

What's wrong with this design? Let me give you a hint. Here are two rows from an actual database I've worked with:

65, 'DE', 'Germany'
218, 'DE', 'GERMANY, FEDERAL REPUBLIC OF'

Oh dear, oh dear. A better design for this table is of course:

tblCountry (
CountryCode char(2) primary key,
CountryName varchar(50))

Books have been written about designing tables, but here is a simple rule: start with the fields you need, and see if you can find a candidate key amongst them. If you can't, chances are your table doesn't need a primary key in the first place. Here's an example of such a table:

InvoiceEntries (
InvoiceNumber int references Invoices (InvoiceNumber),
ItemCode char(4) references Items (ItemCode),
Quantity int);

If you really, really need to be able to uniquely identify each tuple (row), then an autonumber PK is an option. But in general I try to avoid them at all costs. Here's the reason: autonumber PKs are meaningless. What does CountryID=143 signify? What country is that? If you can't look at the rows of your table and make sense of all the data in one reading, you're in a bad place as a DBA.

The second bit of lunacy, also attributable to Microsoft in a small way, is the habit of extending Hungarian notation to table naming. I'm talking about tables called tblCountry, tblCustomer, and so forth. Here's another real-life example of what can happen:

Let's say you start off with a table called tblCustomer. Several months or years later you discover that in fact you have two types of customers, corporate and private. At this stage you have several thousand lines of legacy code dealing with tblCustomer, so changing the table isn't really an option.

Fortunately, databases have wonderful things called views and triggers that let you deal with just such a situation in an elegant fashion. What you need to do is create two new tables, say tblCorporateCustomer and tblPrivateCustomer, and then replace tblCustomer with a view onto the union of these new tables. You write triggers to support the legacy operations of inserting, updating and deleting on tblCustomer. This enables your legacy code to continue working seamlessly while you go on to do great things with the new tables.

There's just one fly in the ointment. tblCustomer is no longer a table, it's a view. What's the new guy on the team going to make of a view called tblCustomer? Exactly what purpose does that "tbl" prefix serve? What's the point of differentiating between tables and views in the first place? Confusion to the new guy? Job security for the greybeards? You tell me.

9 Comments:

Anonymous Craig said...

I really strongly disagree with your database design suggestion. I think the primary key should *NEVER* have business meaning. What happens when country code changes? Dont say it wont because anything with any business meaning is alway subject to change.

Personally I use GUID's for primary keys these days. I know some people complain they have too much overhead, but all the benefits far outweigh that issue IMO. I dont like AutoInc keys either because they are terrible for multiple database replication. And just when you create a database that you think will never need replication your boss will come along and ask you to merge to databases. What a disaster!

3:53 PM  
Blogger David Turner said...

What happens when the country code changes? Are you suggesting that there's a problem with mutating PKs? Three words: on update cascade. Come on, I can think of much better arguments against natural PKs than that, and I still advocate them.

10:47 AM  
Blogger markhuron4373 said...

I read over your blog, and i found it inquisitive, you may find My Blog interesting. My blog is just about my day to day life, as a park ranger. So please Click Here To Read My Blog

2:23 AM  
Blogger walterprehiem56356510 said...

I read over your blog, and i found it inquisitive, you may find My Blog interesting. So please Click Here To Read My Blog

http://pennystockinvestment.blogspot.com

5:46 PM  
Blogger leanordjackson85798033 said...

Get any Desired College Degree, In less then 2 weeks.

Call this number now 24 hours a day 7 days a week (413) 208-3069

Get these Degrees NOW!!!

"BA", "BSc", "MA", "MSc", "MBA", "PHD",

Get everything within 2 weeks.
100% verifiable, this is a real deal

Act now you owe it to your future.

(413) 208-3069 call now 24 hours a day, 7 days a week.

6:32 PM  
Blogger djli30izxz said...

hey, I just got a free $500.00 Gift Card. you can redeem yours at Abercrombie & Fitch All you have to do to get yours is Click Here to get a $500 free gift card for your backtoschool wardrobe

11:59 AM  
Blogger 0gdsw92vxu said...

hey, I just got a free $500.00 Gift Card. you can redeem yours at Abercrombie & Fitch All you have to do to get yours is Click Here to get a $500 free gift card for your backtoschool wardrobe

10:51 PM  
Blogger ubka7bjnw23pl said...

hey, I just got a free $500.00 Gift Card. you can redeem yours at Abercrombie & Fitch All you have to do to get yours is Click Here to get a $500 free gift card for your backtoschool wardrobe

10:50 AM  
Anonymous Anonymous said...

Interesting, first off. I worked on a project using GUIDs. Simply because the architect thought it added complexity and thus security. Security by obfuscation is not security!

GUIDs were an absolute nightmare! The issue is when developing you generally don't have a UI yet to do everything you want to with a DB. Infact you rarely ever do. So you end up having to paste the GUID then scan the table, find the GUID and go edit the data!

Give me PK 12123 I can easily remember this and quickly fix a problem. The case becomes even more severe when you start using a Guid for your static "TYPE" kind of data - say a customer type field. Now you have 10 entries with GUIDs. Each time you want to enter something you can only do so by copying and pasting the GUID into a sql query.

I am a big fan of numeric PKs. Using a text string as a PK may cause you some performance issues imo. but if you can get away with it then go for it! I'm all for the elegant simple approach!

Brett Mahon

6:59 PM  

Post a Comment

<< Home