Strategy for combining lists

**BruceM** · March 16th, 2007, 05:13 PM posted to microsoft.public.access.tablesdbdesign

"Jamie Collins" wrote in message
oups.com...
On Mar 16, 12:43 pm, "BruceM" wrote:
Which normalization 'principles' were you concerned about? A person's
name is single a atomic fact which in a DBMS is usually split into
elements (e.g. first_name, middle_names, last_name) for practical
purposes, therefore normalization considerations are moot.

The normalization principle is storing the name redundantly. Joe Jones
appears in the Employee table (stored as LastName and FirstName). Then
Joe
Jones is stored in the Instructor field as the full name each time he
conducts a training session. In truth, that doesn't bother me very much,
but I wonder if there is a better way.

If you've got a compound key of (first name, last name) then using it
as a foreign key is not redundancy. If you mean that two text columns
are 'bigger' than one numeric column then that's not redundancy
either.

OK, I think I see your point, but it seems like semantics to me. In a large
company you would probably need five fields to assure that a person can be
uniquely identified, and one of those fields would probably need to be a
Comment field in case there is no "natural" way to tell them apart. By that
reasoning it seems that redundancy is almost irrelevant, since you are
always storing the natural key. By the way, do you use Cascade Updates or
Update queries or what exactly when somebody's name changes?

I see that if I were storing three fields
(Last, First, Middle) as the FK in a table related to the Employee table
then there is no question of reduncancy, since I would be storing the
full
name in every record related to the employee table.

So you seem to understand the point so why the contradiction above?

I do not choose to store three FK fields where one will do. Furthermore,
two people could have the same name, so either that situation needs to be
anticipated, and a suffix field or something added to the key, or the
database will need to be redesigned in order to accomodate that situation.

As I expect you know
from our previous discussions, I am inclined to use surrogate keys unless
a
simple natural key presents itself.

You need a natural/logical key before you can use a surrogate. See my
reply elsewhere in this thread.

Acknowledged and commented upon in other posting.

All you'd be doing is assigning a sequence number to non-employees;
that's not the same thing as uniquely identifying instructors. Would
you issue instructors with this sequence number to and ask them to
quote it in all related correspondence etc? A key must exist in the
reality being modelled therefore if you've invented it then you have
to expose it.

When somebody comes in to to HazMat or
First Aid training, we contact a qualified company, and they send an
instructor. The instructor is identified by name.

I don't see that there is a reason why the key "must" exist in the
reality
being modelled, or that I have to expose the key.

The most common reason you get for not using (first name, last name)
as key is because they are commonly duplicated. Take you natural key
for external trainer, being 'company identifier' (don't know your
jurisdiction/business so let's say DUNS number) and trainer's name.
You could tell me that training company '12345678900' has two trainers
named 'Joe Jones'. There's a temptation to think an autonumber (ID)
will help you here because it allows you to do this:

INSERT INTO ExternalTrainers (ID, company_duns, trainer_name)
VALUES (1, '12345678900', 'Joe Jones')
;
INSERT INTO ExternalTrainers (ID, company_duns, trainer_name)
VALUES (2, '12345678900', 'Joe Jones')
;

You book both for a course. Joe Jones ID=1 goes down a storm but Joe
Jones ID=2 is most unpopular. Upon re-booking you ask the company,
"Send us Joe Jones but can we have someone else instead of Joe Jones,
please?" I don't think asking for Joe Jones ID=1 is going to clarify
matters either. It's the nature of identifiers: they actually have to
identify people.

This is not the model with which I am dealing. A company that has two
people with the same name will find a way to distinguish them if they need
to be paged or whatever. People have more sense than it seems you are
willing to acknowledge. I understand the hypothetical problem, but in the
real world people will find a way of distinguishing one person from another,
except maybe on the No-Fly list. ;-)

My bank issues me with a number, embosses it on my card, encodes it
into the chip and give me a pin to remember. They invented these
identifiers, exposed them and I have to be in possession of them to
identify myself as the account holder.

Jamie.

--

**Jamie Collins** · March 19th, 2007, 02:19 PM posted to microsoft.public.access.tablesdbdesign

On Mar 16, 5:13 pm, "BruceM" wrote:
A company that has two
people with the same name will find a way to distinguish them if they need
to be paged or whatever. People have more sense than it seems you are
willing to acknowledge. I understand the hypothetical problem, but in the
real world people will find a way of distinguishing one person from another

But computers are pretty dumb. You need to tell them everything. You
can't expect them to verify personal data and identifiers for you. I'm
reminded of this article about what a DBMS does:

An Old Class of Errors
by Fabian Pascal
www.dbazine.com/ofinterest/oi-articles/pascal29

[quote]
No DBMS can guarantee truth. If, for example, there is an EMPLOYEES
table in the database:

EMPLOYEES {EMP#, ENAME, DEPT#, SALARY}

that contains a row:

{E21,Pianka,A00,25000}

representing a proposition about an employee:

"Employee uniquely identified by employee number E21, has name
Pianka, works in department A00, earns salary 25,000"

there is no way a DBMS can tell whether the row represents a
proposition that is true or not in the real world (in fact, it does
not even know what an employee, a name, a department, or a salary is!)
The only thing it can, and should do, is to guarantee consistency with
the integrity constraints (and, therefore, the business rules) in
effect.
[Unquote]

Consider the EmployeeID *you* use. I assume the trusted source for
this identifier is your organization's personnel department, who are
responsible for verifying that the person exists, are who they say
they are, are fit for the job (health, qualifications, spent criminal
convictions, etc) and so on. Then, to make things easy for the
enterprise as a whole, the personnel department associate the person
with an EmployeeID, something I call a 'business key', and exposes it
e.g. prints it on the employee's payslip. The personnel department may
even have used positive monotonic integers and a computer could have
generated this for them (i.e. an autonumber) but the important thing
is that the job of identifying people is the responsibility if the
agency that exposes the identifier.

Yes, people can be distinguished by a human in the reality being
modelled. Yes, people entities within the table can be distinguished
by a human using the primary key, even if it's an autonumber. What the
human in question needs is a way to associate the two e.g. which Joe
Jones is autonumber ID=1. AFAIK there are two ways to do this: either
bring different/additional identifier data into the DBMS or expose the
autonumber to the reality e.g. keep different/additional identifier
plus the autonumber in a document management system (think filing
cabinet, one file per trainer, autonumber written on the front of each
file).

Now do you see why I don't think you should put the identifiers
EmployeeID and external_trainer_ID in the same coumn? They may be the
same data type, both identify professional people, etc but they are
not of the same domain and each have different trusted sources. If you
are tempted to do the latter, consider that autonumber advocates
(including the article tina linked to) say that the autonumber pk must
be meaningless outside the DBMS but exposing it means you are
attaching meaning; many autonumber advocates go as far as saying that
an autonumber should not be exposed. Also consider that a monotonic
integer may be good for the DBMS but does not make a user-friendly
identifier in reality, so why not come up with a well thought out
external_trainer_ID and, if you must, use an autonumber as a
'surrogate'?

FWIW I wonder if external_trainer_ID is actually required: I suspect
the most important fact is the contract with the training company and
companies are much easier to identify, even after companies have
merged etc. Maybe the person who gave the training on the day is
something for the 'notes' rather than being part of the identifier.

By the way, do you use Cascade Updates

Sometimes but on a strictly case-by-case basis.

Jamie.

--

**BruceM** · March 19th, 2007, 03:20 PM posted to microsoft.public.access.tablesdbdesign

"Jamie Collins" wrote in message
ups.com...
On Mar 16, 5:13 pm, "BruceM" wrote:
A company that has two
people with the same name will find a way to distinguish them if they
need
to be paged or whatever. People have more sense than it seems you are
willing to acknowledge. I understand the hypothetical problem, but in
the
real world people will find a way of distinguishing one person from
another

But computers are pretty dumb. You need to tell them everything. You
can't expect them to verify personal data and identifiers for you. I'm
reminded of this article about what a DBMS does:

An Old Class of Errors
by Fabian Pascal
www.dbazine.com/ofinterest/oi-articles/pascal29

[quote]
No DBMS can guarantee truth. If, for example, there is an EMPLOYEES
table in the database:

EMPLOYEES {EMP#, ENAME, DEPT#, SALARY}

that contains a row:

{E21,Pianka,A00,25000}

representing a proposition about an employee:

"Employee uniquely identified by employee number E21, has name
Pianka, works in department A00, earns salary 25,000"

there is no way a DBMS can tell whether the row represents a
proposition that is true or not in the real world (in fact, it does
not even know what an employee, a name, a department, or a salary is!)
The only thing it can, and should do, is to guarantee consistency with
the integrity constraints (and, therefore, the business rules) in
effect.
[Unquote]

Consider the EmployeeID *you* use. I assume the trusted source for
this identifier is your organization's personnel department, who are
responsible for verifying that the person exists, are who they say
they are, are fit for the job (health, qualifications, spent criminal
convictions, etc) and so on. Then, to make things easy for the
enterprise as a whole, the personnel department associate the person
with an EmployeeID, something I call a 'business key', and exposes it
e.g. prints it on the employee's payslip. The personnel department may
even have used positive monotonic integers and a computer could have
generated this for them (i.e. an autonumber) but the important thing
is that the job of identifying people is the responsibility if the
agency that exposes the identifier.

Yes, people can be distinguished by a human in the reality being
modelled. Yes, people entities within the table can be distinguished
by a human using the primary key, even if it's an autonumber. What the
human in question needs is a way to associate the two e.g. which Joe
Jones is autonumber ID=1. AFAIK there are two ways to do this: either
bring different/additional identifier data into the DBMS or expose the
autonumber to the reality e.g. keep different/additional identifier
plus the autonumber in a document management system (think filing
cabinet, one file per trainer, autonumber written on the front of each
file).

I work in a facility that has fewer than 100 employees. Due to the nature
of the business, it isn't going to grow to the point of having thousands of
employees. The real-world environment in which this project occurs guides
my decisions. There is no realistic chance that we won't be able to find a
way to distinguish people from each other. If two people with the exact
same name end up working in the same department and have the same job title,
then we can expose the employee ID number or do something else, but until
then I will concern myself with real issues.
I am going to use surrogate keys. I am not going to read any more articles
on the subject. My database is not going to disintegrate into anarchy as a
result of my using surrogate keys. Go ahead and believe that I am wrong,
misguided, or whatever you wish. I don't care.
I have asked time and time again what you do if a person's name changes and
you are using a "natural" key that includes the name. You need to update
all of the five or six or whatever number of foreign key fields are needed
to resolve the relationship, yet you use cascade updates infrequently, I
gather.

Now do you see why I don't think you should put the identifiers
EmployeeID and external_trainer_ID in the same coumn? They may be the
same data type, both identify professional people, etc but they are
not of the same domain and each have different trusted sources. If you
are tempted to do the latter, consider that autonumber advocates
(including the article tina linked to) say that the autonumber pk must
be meaningless outside the DBMS but exposing it means you are
attaching meaning; many autonumber advocates go as far as saying that
an autonumber should not be exposed. Also consider that a monotonic
integer may be good for the DBMS but does not make a user-friendly
identifier in reality, so why not come up with a well thought out
external_trainer_ID and, if you must, use an autonumber as a
'surrogate'?

FWIW I wonder if external_trainer_ID is actually required: I suspect
the most important fact is the contract with the training company and
companies are much easier to identify, even after companies have
merged etc. Maybe the person who gave the training on the day is
something for the 'notes' rather than being part of the identifier.

Right now the users either select an instructor name from the list, or they
type in a name. Anything else would involve using an input box, at the
least, to enter a new outside instructor. I choose to make it as simple as
I can for the users. I will store the instructor name as I always have. If
the instructor is also an employee, I will store that name. If that
instructor's name changes one day, the old records will have the old name.
It doesn't matter. I have asked if it is possible to use a number as the
identifier. I can force the number in the Outside Instructors table to be
above 1000000 or whatever I want. Our employee ID numbers will not go above
four digits. If need be, I can pad the number with zeros or something, but
I won't worry about it for now.
I have asked if it is possible to create the relationship between one of two
tables (Employees and Instructors) and an Instructor field in the
TrainingSession table (actually, in a related table, since there can be
several instructors for a session). Whether I am using a surrogate key or a
six-field natural key, my question, still unanswered, is about that very
problem.

By the way, do you use Cascade Updates

Sometimes but on a strictly case-by-case basis.

Jamie.

--

**Jamie Collins** · March 19th, 2007, 04:27 PM posted to microsoft.public.access.tablesdbdesign

On Mar 19, 3:20 pm, "BruceM" wrote:
I have asked if it is possible to create the relationship between one of two
tables (Employees and Instructors) and an Instructor field in the
TrainingSession table (actually, in a related table, since there can be
several instructors for a session). Whether I am using a surrogate key or a
six-field natural key, my question, still unanswered, is about that very
problem.

A foreign key to references one table. To be able to reference 'one of
two tables' you would need to use a level of abstraction such as
subclassing e.g. (air code)

CREATE TABLE Trainers (
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
CHECK (trainer_type IN ('Internal', 'External'))
)
;
CREATE TABLE InternalTrainers (
employee_ID INTEGER NOT NULL UNIQUE
REFERENCES Employees (employee_ID),
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
CHECK (trainer_type = 'Internal'),
FOREIGN KEY (trainer_type, trainer_ID)
REFERENCES Trainers (trainer_type, trainer_ID),
columns specific to employee-as-trainer here
)
;
CREATE TABLE ExternalTrainers (
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
CHECK (trainer_type = 'External'),
FOREIGN KEY (trainer_type, trainer_ID)
REFERENCES Trainers (trainer_type, trainer_ID),
columns specific to external trainer, including key, here
)
;
CREATE TABLE TrainingSessions (
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
FOREIGN KEY (trainer_type, trainer_ID)
REFERENCES Trainers (trainer_type, trainer_ID),
columns specific to training session, including key, here
)
;

Of course, trainer_ID is completely artificial here. Perhaps the above
should be named 'superclassing' i.e. taking two real but distinct
domains and artificially combining them via a fabricated 'superclass'.

As I said earlier, for me the obvious answer is to keep them separate
i.e. one table for employee-as-trainer sessions and one for external
trainer-run sessions. If you are having trouble combining them, why
force the issue? What is the motivation for a combined table?

I have asked time and time again what you do if a person's name changes and
you are using a "natural" key that includes the name. You need to update
all of the five or six or whatever number of foreign key fields are needed
to resolve the relationship, yet you use cascade updates infrequently, I
gather.

Me personally? I rarely if ever use a person's name as a DBMS
identifier, for all the obvious reasons. I tend to use industry
standard keys (ISBN) and business keys (employee_ID) that have a
trusted source. I think I asked this upthread e.g. are trainers
(people) regulated by an authority, association, etc that can provide
an identifier? And I've also said I think the identifier should be for
the training company (i.e. the one with which there is a constract)
rather than individual people. Sorry, I'm not familiar with the
business training sector in your region of the world to give a
prescriptive answer but every region tends to have a governmental
bureaucracy (e.g. UK=Companies House Reference Number) or tax agency
(UK=HMRC Unique Tax Reference, ECON/SCON numbers, etc) that can
assist; DUNS is private sector and not universal but has the advantage
of being global.

I am going to use surrogate keys. I am not going to read any more articles
on the subject. My database is not going to disintegrate into anarchy as a
result of my using surrogate keys. Go ahead and believe that I am wrong,
misguided, or whatever you wish. I don't care.

I haven't been arguing against surrogates and I generally tend to
avoid doing so because I see it as one of those 'lifestyle choices'
i.e. it's not for me but I don't criticize you for choosing them
because I can see some benefit but I think the disadvantages outweigh.

Jamie.

--

**BruceM** · March 19th, 2007, 05:30 PM posted to microsoft.public.access.tablesdbdesign

"Jamie Collins" wrote in message
ups.com...
On Mar 19, 3:20 pm, "BruceM" wrote:
I have asked if it is possible to create the relationship between one of
two
tables (Employees and Instructors) and an Instructor field in the
TrainingSession table (actually, in a related table, since there can be
several instructors for a session). Whether I am using a surrogate key
or a
six-field natural key, my question, still unanswered, is about that very
problem.

A foreign key to references one table. To be able to reference 'one of
two tables' you would need to use a level of abstraction such as
subclassing e.g. (air code)

CREATE TABLE Trainers (
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
CHECK (trainer_type IN ('Internal', 'External'))
)
;
CREATE TABLE InternalTrainers (
employee_ID INTEGER NOT NULL UNIQUE
REFERENCES Employees (employee_ID),
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
CHECK (trainer_type = 'Internal'),
FOREIGN KEY (trainer_type, trainer_ID)
REFERENCES Trainers (trainer_type, trainer_ID),
columns specific to employee-as-trainer here
)
;
CREATE TABLE ExternalTrainers (
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
CHECK (trainer_type = 'External'),
FOREIGN KEY (trainer_type, trainer_ID)
REFERENCES Trainers (trainer_type, trainer_ID),
columns specific to external trainer, including key, here
)
;
CREATE TABLE TrainingSessions (
trainer_ID CHAR(10) NOT NULL UNIQUE,
trainer_type CHAR(8) NOT NULL,
FOREIGN KEY (trainer_type, trainer_ID)
REFERENCES Trainers (trainer_type, trainer_ID),
columns specific to training session, including key, here
)
;

I don't know how to read this code well enough either to understand just
what it means (what is CHAR(8), etc.), or to get it to produce a table. I
thought that I need to check the ANSI 92 compatible syntax box in Tools
Options Queries, but as I said I don't know how to make it work when it
doesn't produce a table. Or maybe that's not the box I need to check.
Anyhow, I'll take your word for it that I would need to use subclassing.

Of course, trainer_ID is completely artificial here. Perhaps the above
should be named 'superclassing' i.e. taking two real but distinct
domains and artificially combining them via a fabricated 'superclass'.

As I said earlier, for me the obvious answer is to keep them separate
i.e. one table for employee-as-trainer sessions and one for external
trainer-run sessions. If you are having trouble combining them, why
force the issue? What is the motivation for a combined table?

I got it into my head that I should try to avoid storing data redundantly.
To my way of thinking, storing a name over and over again would qualify as
redundancy. Now I don't know what to think, so I'm just going to go with
something that will work for my purposes.

I have asked time and time again what you do if a person's name changes
and
you are using a "natural" key that includes the name. You need to update
all of the five or six or whatever number of foreign key fields are
needed
to resolve the relationship, yet you use cascade updates infrequently, I
gather.

Me personally? I rarely if ever use a person's name as a DBMS
identifier, for all the obvious reasons. I tend to use industry
standard keys (ISBN) and business keys (employee_ID) that have a
trusted source. I think I asked this upthread e.g. are trainers
(people) regulated by an authority, association, etc that can provide
an identifier? And I've also said I think the identifier should be for
the training company (i.e. the one with which there is a constract)
rather than individual people. Sorry, I'm not familiar with the
business training sector in your region of the world to give a
prescriptive answer but every region tends to have a governmental
bureaucracy (e.g. UK=Companies House Reference Number) or tax agency
(UK=HMRC Unique Tax Reference, ECON/SCON numbers, etc) that can
assist; DUNS is private sector and not universal but has the advantage
of being global.

Some trainers are individuals and some are from companies. Identifying
them by company will only work for a very few.

The only thing I'm trying to do is to assure that each person's record has
an unchanging unique identifier. If I am making a database for a volunteer
organization it is quite possible that the people won't have ID numbers, so
I will have to assign something arbitrary. I have no problem with doing
that in situations where people do have standard-issue ID numbers, because I
have already seen the ID number format change. The trainers are not
necessarily regulated in any particular way. HazMat trainers are very
different from trainers who represent a government agency, who are in turn
very different from those who represent our customers and who are providing
instruction on using a new kind of tooling or whatever. For some of these
people there is no particular numbering system at all, much less a single
system for identifying all of them without risk of duplication.

I am going to use surrogate keys. I am not going to read any more
articles
on the subject. My database is not going to disintegrate into anarchy as
a
result of my using surrogate keys. Go ahead and believe that I am wrong,
misguided, or whatever you wish. I don't care.

I haven't been arguing against surrogates and I generally tend to
avoid doing so because I see it as one of those 'lifestyle choices'
i.e. it's not for me but I don't criticize you for choosing them
because I can see some benefit but I think the disadvantages outweigh.

I had thought your arguments were in favor of exposing the key fields, which
is why I thought you were opposed to the use of surrogate keys. Frankly, I
don't follow all of what you are saying, as I don't have the same
vocabulary. Thanks for weighing in, though.

Jamie.

--

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode