OfficeFrustration - View Single Post

**BruceM[_2_]** · August 17th, 2007, 01:35 PM posted to microsoft.public.access.tablesdbdesign

"Jamie Collins" wrote in message
s.com...
On 16 Aug, 17:58, "BruceM" wrote:
Regarding autonumber or other arbitrary PKs

Wait up. PRIMARY KEY designation is *always* arbitrary but I think you
meant 'artificial key' (or similar). Do you mean keys in the logical
model or in the physical implementation...?

Yes, I meant "artificial". You have completely lost me when you say the PK
designation is always arbitrary. Are you saying the designation is
arbitrary? I think I use the term PK where you use "unique constraint".
Perhaps there is a difference between the two, but in any case, when I refer
to a PK I mean the field or fields that are used for relationships. The
record's uniqueness is an atttibute of the record aside from the fact that
it has an autonumber or other artificial key. To put it another way, the
autonumber PK is a convenience, not the only thing that is unique about the
record.

understand
that I *never* argued in favor or creating a table for which the "only
unique constraint is defined on a sole autonumber column".

Do I not recall correctly a thread where you had an entity type
'trainer' (natural person) where there was no industry standard
identifier and the compound of the available attributes (company
identifier, person full name) did not yield uniqueness so you omitted
a key from your logical model and used an autonumber PRIMARY KEY in
your Access implementation? Apologies in advance if I've recalled
incorrectly (google groups archive search seem to be broken just now).

No, you misunderstood somewhat, but you have quite a memory. There were
several issues there. There are enough natural fields to ensure uniqueness,
but that would mean about a six-field key. For most purposes the FirstName,
LastName, MI are enough, but do not provide a guarantee. Another issue was
that most training is done by employees (supervisors, etc.), but that some
training is done by outside people. If I use EmployeeID (or a multi-field
natural key) to identify the trainer, I would need to enter outside trainers
into the Employee table (not desirable) or into a Trainer table with similar
fields, but in that case I would need to come up with a single field PK that
would not conflict with the one in the Employee table, or else create
another multi-field PK. Or I could just store the name, with enough detail
that there is no ambiguity about the person's identify, and be done with it.
Since most outside trainers conduct a session or two only, storing the names
in their own table, or creating a record in the Employee table, seems to
make little sense.

Also, note that I was addressing the group rather than you personally;
again, sorry if I was not clear.

If I have a listing of
vendors, what is the *unchanging* unique constraint?

In the logical model, a good key should be *stable*; immutability is
the ideal since but real life is not always ideal.

I too wish all your vendors were issued with an unchanging identifier.
In which sector do to you operate e.g. DUNS number? In mine,
incorporated bodies must be registered with a government agency which
issues a public and unchanging (for all practical purposes)
identifier. Although there are incentives for businesses to
incorporate, I still need to model unincorporated businesses;
fortunately, we have "passing off" laws which makes trading name (yes,
can change) plus trading period a reasonably stable key. I also need
to model businesses from all other jurisdictions for which an
enterprise key of our own design is used.

Some vendors are incorporated bodies, and some are local machinists who
construct tooling. In any case, the typical situation is to enter the
proposed vendor into the database, then to obtain tax ID or other such
information. Needing a tax ID before the vendor can be entered into the
database is not a real-world option.

A "reasonably stable" key worries me. Again, remember that I am talking
about the field or combination of fields that are related to other tables.
I link to an employee table from several databases. If the employee's name
is part of the key, an employee whose name changes means that several
databases now need to be updated. For that matter, when they change the
employee ID number, as happened recently, it becomes rather awkward and
time-consuming.

I've found trusted sources of potentially useful identifiers wary of
revealing details (I found the 'DX Exchange' people in the UK most
unhelpful).

Yes, there are no easy answers, autonumber included.

My main problem with autonumbers is that they do not exist in the
logical model, therefore they are not a logical key. This was the
problem I recalled from your 'trainers' scenario.

A record's
uniqueness is one thing. The means of identifying it in relationships is
another.

In the physical implementation I've no problem with you or anyone
using a so-called surrogate if you also have a natural key or
enterprise key which exists external to the database. I wouldn't
recommend doing it myself for a variety of reasons e.g. doing so makes
data harder to read.

maybe cascading updates of multi-field keys are not a problem. I would
rather avoid them

Ideally (that word again), I think the SQL DBMS should store the key
value only once and use an internal surrogate, in the way you do by
hand with autonumber, to point to it; this way an ON UPDATE CASCADE
action would physically change only one value. Such SQLs exist but
professionally I need use Microsoft's products, at least in the
immediate future; Microsoft doesn't have them so I go without.

So my ideal surrogate would be hidden and I'd still see the real key
values in the referencing tables but that's not the reality for me.
With your way (by hand with autonumber) you either have to work with
the meaningless values or use a JOIN (or three or thirteen) to see the
real values, so it's not for me thanks and good luck to you.

The surrogate key, when there is one, is hidden from view. I can see it if
I choose, but it is not exposed to the user. The user, however, will see
enough detail to know which Jamie Collins they are selecting when there are
two employees with that name. The user can identify the unique record when
needed without ever seeing the surrogate key.

You will not change my mind on the subject, nor I yours.

One thing that tickles me about the 'autonumber PK' advocates it that
they most often use the incremental Long Integer flavour, presumably
because it makes the data easier to read! If I cannot dissuade you
from autonumber, I urge you to choose random: it will improve
concurrency (let's not start on the whole 'physical clustering on
disk' thing), you will be less inclined to expose the autonumber of
users and, having to type the values over and over, you may come to
appreciate a well designed key (fixed width, check digit, etc). It's
'cruel to be kind'; making you type replication IDs (GUID) would just
be cruel g.

I don't care what the autonumber PK looks like, or whether it is random or
sequential or whatever. The choice has nothing whatever to do with making
the data easier to read, because I do not read the autonmber field except
during the development stages. There is no temptation to expose it to the
users. I can imagine limited situations where it would be OK to do so. For
instance, a calls database may use a number as a reference for the call, in
the same way that electronic banking uses an apparently arbitrary number as
a transaction reference. But this is the infrequent exception. I don't
know about replication IDs, so if I am creating a mess for some future
situation about which I have no understanding now, so be it.

Jamie.

--