OfficeFrustration - View Single Post

#37 September 26th, 2005, 08:54 PM

Roger Carlson wrote:
I don't recall talking about clustered indexes at all.

That's exactly where you are going wrong. For Access/Jet, you need to
start thinking in terms of PK = clustered index.

I also DO
NOT
believe that a clustered index is a requirement for a primary key.

In Access/Jet, you get a clustered index with every PK, whether you
like it or not. That's why you need to choose your PK carefully.

Certainly, EF Codd said nothing about it. This is an implementation issue,
not a design issue.

Your knowledge of database theory seems to be outdated. Some recycling:

Why are we only allowed one PRIMARY KEY per table? Allow me to quote my
old pal Joe Celko:

"In the first papers that Dr. Codd wrote, he talked about
candidate keys -- all the possible keys that exist in a table. Then
you
were to pick one of them to be called the PRIMARY KEY.

"Frankly, this was a hold-over from the days of sequential files --
hey,
Dr. Codd did not come up with the whole RDBMS model all at once. When
we did EDP (Electronic Data Processing) on magnetic tapes in the 1950's

and 1960's, the typical application merged tapes together, so both the
master tape and the transaction tapes had to be sorted on the same key
(account numbers, or whatever). You do not do random seeks on a
magnetic tape.

"Very quickly, The Good Doctor realized that a key is a key, and giving
a
special name to one of them changes nothing about its nature. Primary
keys were then dropped from database theory.

"However, System R and SQL software had been built on top of old file
systems and Dr. Codd's first papers. The PRIMARY KEY was implemented
using the existing keys and indexing methods in the old file systems.
And it has stayed there since.

"Some SQL systems assumed that the PRIMARY KEY would be the preferred
access path and optimized for it."

This last sentence is significant. The question now is: What special
meaning was given to PRIMARY KEY in the Jet implementation of SQL? To
cut a long story short, I'll give you the answer: clustered indexes
i.e. physical ordering on disk. You can only have one physical order
(think paper copy telephone directory: its physical order is fixed)
hence only one PRIMARY KEY. In Jet there is no other way of specifying
the physical ordering for than to use PRIMARY KEY.

If you are using PRIMARY KEY to merely mean NOT NULL UNIQUE then you
are under-utilizing it at best. If you designate a sole autonumber as
PRIMARY KEY you are certainly using it incorrectly because an
incrementing integer (worse, random GUID) makes for a lousy physical
order (think paper copy telephone directory ordered on telephone number
when your primary usage is to retrieve data by last name then first
name). Choosing a bad PRIMARY KEY can result in placing a performance
hit on your database.

If you decide (and I urge you to resist doing so) to use an autonumber
(ID) to force your rows to be unique for uniqueness' sake where you
have no natural key, then put it last in your PRIMARY KEY definition
and put the columns for your clustered index in appropriate order first
e.g.

PRIMARY KEY (last_name, first_name, ID)

This way, the physical order for the table will be rebuilt in last_name
then first_name order with ID merely to satisfy the uniqueness
requirement.

If you are using autonumber as an artificial/surrogate key on
performance grounds e.g. on the basis that a compound natural key is
less efficient for table joins etc, then NOT NULL UNIQUE is sufficient.
But to use an autonumber for efficientcy then take a performance hit by
making it PRIMARY KEY makes no sense!