Autonumbering Primary Keys

#11 March 21st, 2005, 12:47 PM

I'm not sure what you just said. What do you mean by "true key value"? What
is a mapping exercise. If I use employeeID as the PK I am not using a PK
that is likely to be included in OrderBy clauses, although I will group by
employeeID in order to have each employee's record on a separate page or
something like that. In that case I group by employee ID, and order by
FullName, which is a concatenated field in the report's underlying query. In
a company of about 70 employees there have been no duplicates in FullName,
but if there are I will order by something additional. My question to you is
what do you see as the advantage of updating the records if the PK value
changes in nature (as in the example of EmployeeID). Why is it not simpler
and cleaner to avoid that possibility? What would you use as PK in an
employee records database?

"onedaywhen" wrote:

BruceM wrote:
I have heard
of instances where a company changed the format of an EmployeeID
number or
some other number (by appending a 0, for instance, to allow for more
IDs than
the current system allowed). Had EmployeeID been used as the primary
key, or
as part of a combined PK, it could have been pretty messy.

Actually, having the true key value in each of the referencing tables
would make this quite a straightforward mapping exercise. You'd have to
suspend the constraints during the process, of course.

Jamie.

--

#12 March 21st, 2005, 02:06 PM

BruceM wrote:
What do you mean by "true key value"?

I think I meant a 'natural' key. An autonumber is not a key. In your
brand new Employees table with its autonumber PK, insert a row for an
employee and you get ID=1. Then delete the row and insert the exact
same details again. Now you get ID=2. You have two different IDs that
refer to the same employee. Your ID column is not a key.

In
a company of about 70 employees there have been no duplicates in
FullName

You are aware full name is not likely to be unique, that's good.

What would you use as PK in an
employee records database?

In my country, the government likes to keep tabs on its subjects;
something to do with paying tax and qualifying for state benefits, I'm
told. It has a department that issues each individual with a 9
character unique identifier. If a new employee can't tell their ID, the
employer must construct a temporary number using date of birth and a
letter denoting sex and the tax department gets back to them fairly
promptly with the actual ID (AFAIK notifying the tax office of a new
employee is a legal requirement). When an employer takes on two people
within a short period with the same sex and birth date, I guess they
either ask one of them to try to find out their ID (usually found on an
old payslip) or phone to tax office to see if they can tell the actual
IDs. I understand in certain large countries in North America they have
something similar, known as a social security number (SSN).

What
is a mapping exercise

You proposed a theoretical scenario where a column (EmployeeID) needed
to be changed. Copying the data from the old to the new is what I'm
calling a mapping exercise i.e. you have to map (copy) the data from
each row in the old table into the corresponding row in the new table.

what
do you see as the advantage of updating the records if the PK value
changes in nature (as in the example of EmployeeID).

When doing data conversion/mapping exercises, you have to be sure your
actual results correspond to you expected results. I guess there is no
disadvantage to using an integer value to identify each row for these
purposes but I think I would be happier seeing something that was
obviously a key, rather than a simple row identifier.

Why is it not simpler
and cleaner to avoid that possibility?

Let me propose another theoretical scenario: something happens to your
employee table and your autonumber column is lost. You now have an
EarningsHistory table where each employee is identified using an
integer which has no meaning without the original values from your
Employees table. Consider the same EarningsHistory table where each row
had a unique identifier such as the aforementioned SSN, something
verifiable in reality e.g. phone the tax department and ask them which
employee has a certain SSN. Do you now see what I mean when I say I
feel 'happier' using a real key in the referencing tables?

Jamie.

--

#13 March 24th, 2005, 04:51 PM

"onedaywhen" wrote:

BruceM wrote:
What do you mean by "true key value"?

I think I meant a 'natural' key. An autonumber is not a key. In your
brand new Employees table with its autonumber PK, insert a row for an
employee and you get ID=1. Then delete the row and insert the exact
same details again. Now you get ID=2. You have two different IDs that
refer to the same employee. Your ID column is not a key.

I'm missing your point here. If I delete the record that contains
autonumber ID=1 then there is no record containing ID=1. This would be a
problem only if I had related records in other tables.

In
a company of about 70 employees there have been no duplicates in
FullName

You are aware full name is not likely to be unique, that's good.

I was referring to FullName as an Order By field in a report. As a
concatenated field in a query it can't have a PK in any case. Combining
FirstName and LastName as a PK would be risking duplication.

What
is a mapping exercise

You proposed a theoretical scenario where a column (EmployeeID) needed
to be changed. Copying the data from the old to the new is what I'm
calling a mapping exercise i.e. you have to map (copy) the data from
each row in the old table into the corresponding row in the new table.

Got it. I thought you meant something like that, but wasn't sure.

Let me propose another theoretical scenario: something happens to your
employee table and your autonumber column is lost. You now have an
EarningsHistory table where each employee is identified using an
integer which has no meaning without the original values from your
Employees table. Consider the same EarningsHistory table where each row
had a unique identifier such as the aforementioned SSN, something
verifiable in reality e.g. phone the tax department and ask them which
employee has a certain SSN. Do you now see what I mean when I say I
feel 'happier' using a real key in the referencing tables?

Yes, I see your point about being able to reconstruct the database if the
SSN field is lost. Although I don't quite see how a single field can be lost
(a field involved in a relationship can't be deleted until the relationship
is undone) I think I would prefer reconstructing the PK field if it came to
that. As long as I keep the records ordered by the autonumber field I could
add a new autonumber field, then reorder the related table by the FK field
and replace all occurrences of the lowest number with 1, of the second lowest
with 2, etc. (or something like that). This is assuming my backup also went
bad.

I'm not trying to be contentious here. I really do appreciate your taking
the time to reply and to explain in further detail. I have a feeling that
neither of us is likely to change the other's mind, but you do raise some
points worth considering.

Jamie.

--

#14 March 29th, 2005, 10:02 AM

BruceM wrote:
I'm missing your point here. If I delete the record that contains
autonumber ID=1 then there is no record containing ID=1.

The point is SSN always identifies an entity in reality.

I was referring to FullName as an Order By field in a report. As a
concatenated field in a query it can't have a PK in any case.
Combining
FirstName and LastName as a PK would be risking duplication.

what the autonumer (IDENTITY) is for, i.e. to eliminate the possibility
of duplicates, so make your compound PK be (LastName, FirstName, ID) in
that specific order.

I think you are placing the wrong significance on the meaning of
primary key. For example, consider this:

CREATE TABLE Workers (
ID INTEGER IDENTITY(1,1) NOT NULL UNIQUE,
last_name VARCHAR(35) NOT NULL,
first_name VARCHAR(35) NOT NULL,
PRIMARY KEY (last_name, first_name, ID)
)
;
CREATE TABLE Wages (
ID INTEGER NOT NULL
REFERENCES Workers (ID)
ON UPDATE CASCADE
ON DELETE CASCADE,
start_date DATETIME NOT NULL,
end_date DATETIME,
wage CURRENCY NOT NULL,
PRIMARY KEY (start_date, ID)
)
;

In other words, there is no rule (in either Jet or relational theory)
that says a FK in one table must use a PK from another; any key (a.k.a.
unique identifier) will do. In relational theory, a primary key has no
special powers. A PK has special meaning for Jet, though i.e. the
aforementioned physical ordering/clustered indexes.

If you are not using your PK in terms of its special meaning to Jet,
what are you using it for?

Jamie.

--

#15 April 13th, 2005, 02:02 PM

"onedaywhen" wrote:

BruceM wrote:
I'm missing your point here. If I delete the record that contains
autonumber ID=1 then there is no record containing ID=1.

The point is SSN always identifies an entity in reality.

I was referring to FullName as an Order By field in a report. As a
concatenated field in a query it can't have a PK in any case.
Combining
FirstName and LastName as a PK would be risking duplication.

what the autonumer (IDENTITY) is for, i.e. to eliminate the possibility
of duplicates, so make your compound PK be (LastName, FirstName, ID) in
that specific order.

ID already makes it unique. That's good enough for me.

I think you are placing the wrong significance on the meaning of
primary key. For example, consider this:

CREATE TABLE Workers (
ID INTEGER IDENTITY(1,1) NOT NULL UNIQUE,
last_name VARCHAR(35) NOT NULL,
first_name VARCHAR(35) NOT NULL,
PRIMARY KEY (last_name, first_name, ID)
)
;
CREATE TABLE Wages (
ID INTEGER NOT NULL
REFERENCES Workers (ID)
ON UPDATE CASCADE
ON DELETE CASCADE,
start_date DATETIME NOT NULL,
end_date DATETIME,
wage CURRENCY NOT NULL,
PRIMARY KEY (start_date, ID)
)
;

In other words, there is no rule (in either Jet or relational theory)
that says a FK in one table must use a PK from another; any key (a.k.a.
unique identifier) will do. In relational theory, a primary key has no
special powers. A PK has special meaning for Jet, though i.e. the
aforementioned physical ordering/clustered indexes.

What if I want to order the information in some other way. If there is
validity to your statements about physical order on the disk it seems to me
it would only apply to one way of looking at the data (sorted alphabetically
by name). What happens if I want to sort by age or by gender, or whatever?
I use a PK because it is the established method. I am interested in using
Access to meet my needs, and feel no compulsion to first reinvent the wheel.
PK works, so I use it. I don't care about physical order on the disk or
clustered indexes or anything of that sort. If I saw some objective
benchmark data showing me that my databases are experiencing a performance
hit because of my use of autonumber PKs I would rethink what I am doing, but
until then I will concentrate on the end users. If my databases are
suffering in terms of performance it probably has more to do with inefficient
coding and things of that sort than to physical order on the hard drive.

If you are not using your PK in terms of its special meaning to Jet,
what are you using it for?

Because it works. I really do appreciate your taking the time to reply in
such detail, but I am unlikely to change my ways.

Jamie.

--

#16 April 14th, 2005, 09:42 AM

BruceM wrote:
ID already makes it unique. That's good enough for me.

Let me give you a real life example I encountered just last week. I am
converting a database table with FirstName and LastName columns plus
the ubiquitous single column PK incrementing INTEGER autonumber column
named ID. The autonumber takes care of the duplicates so there could
happily be two 'Jean Dupont's in the database. However, a problem
arises when data arrives via an email and a procedure named (something
like) GetIDUsingName kicks in, which queries the database using
LastName = 'Dupont' and FirstName = 'Jean' and takes the first row from
the resulting recordset. No problem for me: the client's instruction is
to do a straight port, bugs and all and the autonumber PK actually
makes this bizarre function highly predictable in practice. But it
makes me wonder if that duplicate-breaking autonumber is actually a
solution for anything ...

What if I want to order the information in some other way. If there
is
validity to your statements about physical order on the disk it seems
to me
it would only apply to one way of looking at the data (sorted
alphabetically
by name). What happens if I want to sort by age or by gender, or
whatever?

You should always use an ORDER BY. My point is, when the physical order
happens to coincide with the requested ORDER BY you will get a
performance gain. Obviously you can have only one physical order so you
should choose wisely e.g. the order you most frequent request in your
ORDER BY statements. By choosing your autonumber for the physical
order, you never get the performance benefit; looked at another way,
you are always taking a performance hit.

ORDER BY is not the prime example, though. GROUP BY and BETWEEN are
more significant. That's why I used that example of, 'grab me all the
phone numbers of people whose last name begins with A'. If the required
data rows are already next to each other, as it is in a paper copy
telephone directory, the performance advantage is clear (remember the
BETWEEN and GROUP BY are applied to the rowset early on, whereas ORDER
BY is only applied at the end of the process).

I know of only one way of choosing the physical order in my Jet table
and that is to use the PRIMARY KEY functionality and ensure the file is
regularly compacted. If I want a column that cannot be null and must
contain unique values, but would not be the best choice for the
physical order of the table, I can use NOT NULL with either a UNQIUE
CONSTRAINT or a UNIQUE INDEX.

I use a PK because it is the established method. I am interested in
using
Access to meet my needs, and feel no compulsion to first reinvent the
wheel.
PK works, so I use it. I don't care about physical order on the disk
or
clustered indexes or anything of that sort. If I saw some objective
benchmark data showing me that my databases are experiencing a
performance
hit because of my use of autonumber PKs I would rethink what I am
doing

But you are advising others to use autonumber PKs, thus propagating bad
advice.

I guess my methodology is to try to find the best way of doing
something. I can use logic to determine that a table in physical order
of its 'prime usage' will have a performance advantage so I will go
with this approach unless there is another issue e.g. doing so would
make my schema more difficult to maintain.

With the greatest respect, your approach seems to be more like, 'Well,
it works for me and doesn't run like a dog so I won't change my ways
until someone reputable tells me I should.'

But thank you, as ever, for listening.

Jamie.

--

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Are Primary keys needed in child tables?	Nevie and Phil	Database Design	2	January 16th, 2005 07:23 AM
Logical question on primary keys...	Access rookie	Database Design	4	January 8th, 2005 11:26 AM
Using Primary Keys	Jodie	General Discussion	1	July 14th, 2004 08:49 PM
Choosing Primary and Foreign Keys	A.V.H	New Users	8	May 23rd, 2004 09:12 PM