A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Access » Database Design
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

Separate PK in Jxn Tbl?



 
 
Thread Tools Display Modes
  #91  
Old January 27th, 2008, 08:45 PM posted to comp.databases.ms-access, comp.databases.theory,microsoft.public.access, microsoft.public.access.tablesdbdesign,microsoft.public.sqlserver
Rob[_16_]
external usenet poster
 
Posts: 4
Default Separate PK in Jxn Tbl?

On Jan 22, 1:26*pm, "Neil" wrote:
Whenever I've created junction tables in the past, I always made the PK of
the junction table the combined pks from the two other tables. Ex: Table1
PK=A; Table2 PK=B; Junction table consists of two fields, A and B, which
together comprise the PK for the junction table.

However, I just came across some code in which the person created a junction
table with a separate PK consisting of an autonumber field, and then the two
fields.

So I was wondering how others did junction tables -- with a standalone
autonumber PK, or with a PK consisting of the PKs of the tables being
joined? And, if a standalone PK, then why?

Thanks!

Neil


I don't claim to know more about db theory than the cdt regulars, or
to have more experience than the many practioners who have contributed
to this thread.

I do know that the debate over relationship representations has been
going on for years, neither side giving any ground. Some aspects of it
are never addressed. For example, making the two foreign keys in a
junction table a composite PRIMARY key prohibits NULL values from
either key, but declaring a separate surrogate primary key together
with a UNIQUE constraint on the two-foreign-key-composite-key does
allow NULL values for either (or both!). This can be useful for
representing unrelated entity tuples, either childless parents or
orphans.

Practioners in the audience may be interested in this page:

http://www.sfdbs.com/toplevel/fasttrack/fasttrack.shtml

of my website in which I describe an entirely new way to represent
relationships. Or this page:

http://www.sfdbs.com/solopages/relcardtypes.shtml

where the representational capabilities of this new representation
(called Aggregate-Link) are compared to the two more traditional
representations.

Please note, I am not suggesting the use of this representation for
conventional database design. Merely consider it new, and possibly
interesting. (It may not be possible to deploy Aggregate-Link in MS
Access.)

The theorists have already had a field day dumping on this new
representation in this thread:

http://groups.google.com/group/comp....514365a600841#

But most theorists seem to be immune to the practical considerations
of IT. In particular, they don't begin to understand that IT
employers' appreciation for Access is based on cost, not beauty.

Rob
  #92  
Old January 27th, 2008, 09:27 PM posted to comp.databases.ms-access,comp.databases.theory,,microsoft.public.access,microsoft.public.access.tablesdbdesign,,microsoft.public.sqlserver
David W. Fenton
external usenet poster
 
Posts: 3,373
Default Separate PK in Jxn Tbl?

wrote in

m:

If the users only access the tables through forms, conforming to
best practices in Access, how are they going to get garbage into
the tables?


What if there's more than one application built on top of the
database?

I, too, agree that one should put as much of the data logic in the
back end as possible.

However, that doesn't mean I use natural keys very often. I'm
definitely opposed to compound keys for any table whose PK will be a
foreign key in another table. It causes myriad problems of all sorts
(been there, done that), and despite its being theoretically
correct, just doesn't work well in practice.

Just consider one scenario:

You need to build criteria for a query-by-form interface. That means
that to query on the PK of a table with a compound PK, you end up
needing to have multiple fields in your WHERE clause. And if you're
querying multiple records in the table with the compound PK, you'll
need a complex nested OR in your WHERE clause.

I know perfectly well that theoretically speaking you're not
supposed to let your application drive the design of your schema,
but this is a case where common sense tells me that following theory
leads to enormously difficult application logic problems.

Natural keys are great for tables with a single-column natural PK.

Otherwise, surrogate keys make building an application substantially
easier.

And, BTW, I would, of course, advocate that any natural key that is
not used as the PK should naturally have a unique index on it.

And any natural key that can't have a unique index (because some
fields need to be Null) was never a candidate for PK in the first
place, and would have to have had uniqueness enforced in the
application in some fashion anyway.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
  #93  
Old January 27th, 2008, 09:33 PM posted to comp.databases.ms-access,comp.databases.theory,,microsoft.public.access,microsoft.public.access.tablesdbdesign,,microsoft.public.sqlserver
David W. Fenton
external usenet poster
 
Posts: 3,373
Default Separate PK in Jxn Tbl?

JOG wrote in

m:

I certainly don't think developers should excuse sloppy RDBMS
design just because they are using access (and of course I'm sure
many of the professionals here wouldn't dream of doing so, despite
others laxness).


What *are* you talking about?

Any mistakes in schema design that you can make in Access, you can
make in any other RDBMS.

I would agree that there are many places that the wizards in Access
and the sample databases encourage sub-optimal practices. But most
professional developers aren't using either wizards or the sample
databases as their models for developing their own applications.

If your impression of Access comes from futzing with it for 10
minutes and from encountering kludged-together apps created by your
company receptionist, then you just haven't a clue what Access
offers, either as an application development platform or as a data
store (using its native Jet engine).

This is a schema question, and that is orthogonal to Access, because
Access is an application development platform. If you use Jet for
your data store, then Jet is relevant to the discussion. But you can
use any data store that offers an ISAM or ADO or ODBC drivers, and
that means all your schema issues are completely divorced from
Access itself.

That you can't seem to keep this distinction clear in your mentions
of Access demonstrates pretty clearly that you are completely
clueless about Access and really aren't in any position to be making
disparaging comments about it.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
  #94  
Old January 27th, 2008, 09:36 PM posted to comp.databases.ms-access, comp.databases.theory,microsoft.public.access, microsoft.public.access.tablesdbdesign,microsoft.public.sqlserver
-CELKO-
external usenet poster
 
Posts: 26
Default Separate PK in Jxn Tbl?

If I were a library lending 2 undiffentiatable copies of 1 book to the same person, I suppose I would need an Autonumber PK plus A BookID and BorrowerID.

No, you would fire your librarian for failure to keep an accession
number or copy number as part of the book's identifier. And the book
should be identified by an ISBN.


  #95  
Old January 27th, 2008, 09:39 PM posted to comp.databases.ms-access,comp.databases.theory,,microsoft.public.access,microsoft.public.access.tablesdbdesign,,microsoft.public.sqlserver
David W. Fenton
external usenet poster
 
Posts: 3,373
Default Separate PK in Jxn Tbl?

JOG wrote in

m:

No probs, although off the top of my head its gonna be a bit
contrived. With an artificial key:

Marriages {id, husband, wife, date}
Kids_from_Marriage {from_id, name, birth}

A query that asks "fetch me all the children whose mother is x"
obviously requires an equijoin, matching Marriages.id and
Kids.from_id. However with the original natural keys:

Marriages {id, husband, wife, date}
Kids_from_Marriage {mother, father, name, birth}

The same query is a simple select. That certainly seems a lot less
complicated to me


Assuming you've got some form of CASCADE UPDATE on your enforced
relationship, that will work.

But it's repeating a huge amount of data, and adding a bunch more
indexes to keep updated. Perhaps these are insignificant issues to
*you* and *your* apps, but my clients' apps (some using Jet, some
using SQL Server, some using MySQL) don't perform so well when you
add in all the overhead.

And it all leaves aside the question of how you know that
husband/wife/date is always going to be unique. I think that on any
given day in the US, there are plenty of marriages in which those
three values will be identical. You could add place. But then, in
large cities, that might not be enough. So use Postal Code in place
of place, and that might do the trick, although in large cities that
might not do it, either.

Given that I can foresee a reasonable possibility of a collision on
this candidate key as currently defined, I'd think long and hard on
whether to use it or not.

And it's one of the main problems any time you're storing data about
people in a data table -- you often lack pieces of the information,
and you run a high risk of collisions between people with the same
names.

And that fact of the real-world entities being modelled makes
finding a natural key that will work as a PK a very hard task.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
  #96  
Old January 27th, 2008, 09:45 PM posted to comp.databases.ms-access,comp.databases.theory,,microsoft.public.access,microsoft.public.access.tablesdbdesign,,microsoft.public.sqlserver
David W. Fenton
external usenet poster
 
Posts: 3,373
Default Separate PK in Jxn Tbl?

Rob wrote in

m:

I do know that the debate over relationship representations has
been going on for years, neither side giving any ground. Some
aspects of it are never addressed. For example, making the two
foreign keys in a junction table a composite PRIMARY key prohibits
NULL values from either key, but declaring a separate surrogate
primary key together with a UNIQUE constraint on the
two-foreign-key-composite-key does allow NULL values for either
(or both!). This can be useful for representing unrelated entity
tuples, either childless parents or orphans.


Er, why go to all that trouble? You can represent the same childless
parents by simply omitting a record in the junction table.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
  #97  
Old January 27th, 2008, 09:46 PM posted to comp.databases.ms-access, comp.databases.theory,microsoft.public.access, microsoft.public.access.tablesdbdesign,microsoft.public.sqlserver
-CELKO-
external usenet poster
 
Posts: 26
Default Separate PK in Jxn Tbl?

Whenever I've created junction tables [sic] in the past, I always made the PK of the junction table the combined pks from the two other tables..

I wish I knew where people got the term "junction table" when they
mean a table that models a relationship . I guess if you are still
thinking in terms PK-FK being pointer chains and structure and not
relational, then you would borrow old terminology from Network
Databases. And bring that mindset with you, too.

However, I just came across some code in which the person created a junction table with a separate PK consisting of an autonumber field [sic], and then the two fields [sic].


See what I mean about the poisoned mindset? Rows are not records;
fields are not columns; tables are not files; references are
constraints and not pointers.

And you have been doing it right, in spite of having a bad
vocabulary !!

The "id"-iot with the magical auto-numbering is still using a
sequential file model in his head. How can an exposed physical
locator, based on the state of one particular machine at insertion
time, with one copy of one particular release of one particular SQL
product be a relational key? Since a key is subset of the attributes
of an entity, that magical number must exist before each row in
inserted into the table and not be subject to insertion time; this is
by definition. Well, it doesn't, so it is crap.

I would try to clean up this schema since it probably has redundant
duplicates and orphaned rows.
  #98  
Old January 27th, 2008, 09:51 PM posted to comp.databases.ms-access, comp.databases.theory,microsoft.public.access, microsoft.public.access.tablesdbdesign,microsoft.public.sqlserver
-CELKO-
external usenet poster
 
Posts: 26
Default Separate PK in Jxn Tbl?

Does a bottle of Gin have a serial no? I am too poor to buy 2 at a time to see if there is a different number Ah!

In the US, you get a UPC/EAN code and then a batch number with booze.
The closest thing to a serial number is your cash register receipt.
Now if someone will explain to me why we have expiration dates on salt
and bottled water to me because I don't get it.
  #99  
Old January 27th, 2008, 10:01 PM posted to comp.databases.ms-access, comp.databases.theory,microsoft.public.access, microsoft.public.access.tablesdbdesign,microsoft.public.sqlserver
-CELKO-
external usenet poster
 
Posts: 26
Default Separate PK in Jxn Tbl?

It has never caused me the least problem. There has been no revelation in this thread that would cause me to even revisit the decision.

How would you know, since autonumbering have no validation or
verification? Trust me, I earn a lot of my living cleaning up these
"fake pointer chain" SQL schemas.

ATTRIBUTES AND FIRST NORMAL FORM (1NF):

An attribute has to belong to an entity in the real world being
modeled by the RDBMS. In First Normal Form, the attributes is modeled
as a column in a table. It must also be an atomic value. I am not
going to get into the differences between atomic and scalar, but
scalar implied atomic and not the other way around.

IDENTITY does not exist in an entity in the real world being modeled
by the RDBMS. Thus, it is not an attribute and cannot be in a table,
by definition.

IDENTITY is a result of the physical state of particular piece of
hardware at a particular time as read by the current release of a
particular database product. It is not a data type. You cannot have
more than one column of this "type" in a table. It is not NULL-able,
which all data types have to be in RDBMS (Dr. Codd again). It is not
a numeric; you cannot do math with it.

It is what is called a "tag number" -- basically, a nominal scale
written with numbers instead of letters. Only equality tests make
sense (see the part about inserting sets of rows, which means that
IDENTITY has no natural ordering, not even temporal).

DEFINITION OF KEYS:

(Dr. Codd, RM II, page 23):
It is equally valid to interpret the uniqueness property in terms of
object identification: the value of the primary key in each row of the
pertinent R-table identifies the particular object represented by that
row uniquely with the type of object s that are presented by that
relation,

Create a table with an IDENTITY as the key and perform a series of
insertions of the same object, say an automobile:

INSERT INTO Vehicles (vin, vehicle_mileage, iso_tire_size) VALUES
(..);
INSERT INTO Vehicles (vin, vehicle_mileage, iso_tire_size) VALUES
(..);

I now have two cars with the same VIN number. Actually, I have two
copies of the same car (object) with an IDENTITY pseudo-key instead of
the industry standard VIN as the proper relational key. This is a
called an insertion anomaly.

Assume that this pair of insertions led to creating vehicles 41 and 42
in the table, which are the same object. I can update 42's mileage
without touching 41. I now have two versions of the truth in my
table. This is a called an update anomaly.

Likewise, if I wreak vehicle 41, I still have copy 42 in the motor
pool in spite of the fact that the object no longer exists. This is
deletion anomaly.

Before you say that you can make a key from (IDENTITY, vin), read more
Dr. Codd (ibid):

If the primary key is composite and if one of the columns is dropped
from the primary key, the first property [uniqueness] is no longer
guaranteed.

Assume that I have correct VINs and use (IDENTITY, vin) as a key.
Dropping the pair clearly does not work -- a lot of vehicles could
have the same mileage and tire sizes, so I do not have unique rows
guaranteed. Dropping IDENTITY will leave me with a proper key that
can be validated, verified and repeated.

Dropping the VIN does not leave me with a guarantee (i.e.
repeatability and predictability). If I run this code:

BEGIN ATOMIC
DELETE FROM Vehicles
WHERE id = 41;
INSERT INTO Vehicles (vehicle_mileage, iso_tire_size)
VALUES (values of inserted row 41 );
END;

The relational algebra says that I should have in effect done
nothing. I have dropped and re-inserted the same object - an EXCEPT
and UNION operation that cancel. But since IDENTITY is physical and
not logical, this does not work.

If I insert the same vehicle (object) into another table, the system
will not guarantee me that I get the same IDENTITY as the relational
key in the other table. The VIN would be guaranteed.

The guarantee requirement gets worse. SQL is a set-oriented language
and allows me to write things like this.

INSERT INTO Vehicles (vin, vehicle_emileage, iso_tire_size)
SELECT vin, vehicle__mileage, iso_tire_size
FROM NewPurchases;

Since a query result is a table, and a table is a set which has no
ordering, what should the IDENTITY numbers be? The entire, whole,
completed set is presented to Vehicles all at once, not a row at a
time. There are (n!) ways to number (n) rows. Which one did you
pick? Why? The answer in SQL products has been to use whatever the
*physical* order of the physical table happened to be. That non-
relational phrase "physical order" again!

But it is actually worse than that. If the same query is executed
again, but with new statistics or after an index has been dropped or
added, the new execution plan could bring the result set back in a
different physical order.

Can you explain from a logical model why the same rows in the second
query get different IDENTITY numbers? In the relational model, they
should be treated the same if all the values of all the attributes are
identical and each row models the same object as it did before.

Yes, I can write a lot of procedural code and triggers, thus violating
all Codd's rules about high-level data manipulation and integrity
being handled by the RDBMS. But that is a kludge and good SQL
programmers know it.

THE KINDS OF KEYS:

Now for a little more practice than theory. Here is my classification
of types of keys. It is based on common usage

1) A natural key is a subset of attributes which occur in a table and
act as a unique identifier. They are seen by the user. You can go to
the external reality and verify them. You would also like to have
some validation rule. Example: UPC codes on consumer goods (read the
package barcode) and validate them with a check digit or a
manufacturer's website or a tool (geographical co-ordinates validate
with a GPS).

These should be either industry standards or natural phenomena.

2) An artificial key is an extra attribute added to the table which is
seen by the user. It does not exist in the external reality, but can
be verified for syntax or check digits inside itself. It is up to the
DBA to maintain a trusted source for them. Example: the open codes in
the UPC scheme which a user can assign to his own stuff. The check
digits still work, but you have to verify them inside your own
enterprise.

If you have to construct a key yourself, it takes time to design them,
to invent a validation rule, set up audit trails, etc. Yes, doing
things right takes time and work. Not like just popping an IDENTITY
on every table in the schema, is it?

3) An "exposed physical locator" is not based on attributes in the
data model, but in the physical storage and is exposed to user. There
is no reasonable way to predict it or verify it, since it usually
comes from the physical state of the hardware at the time of data
insertion. The system obtains a value thru some physical process in
the hardware totally unrelated to the logical data model.

Just because IDENTITY does not hold a track/sector address (like
Oracle's row_id) does not make a logical key. A hash points to a
table with the address. An index (the mechanism in IDENTITY) resolves
to the target address via pointer chains. If you re-hash, or re-
index, the physical locator has to resolve to the new physical
location.

Technically, these IDENTITY values are not really keys at all, since
they are derived from the PHYSICAL storage and are not even attributes
in the LOGICAL data model.

Physical locators include assorted flavors of indexes (Ingres uses
over a dozen kinds), pointer chains (Sybase SQL Anywhere's key joins),
row numbers (Oracle), hash tables (Teradata), bit vectors (Rushmore
and Foxpro), non-contiguous storage (SAND, the whole database is
reduced to a multi-dimensional vector space), inverted list (Model
204), etc. Any device that you can use to locate a particular row in
storage without a sequential search of the storage qualifies.

In the old days, we had to explicitly create, use and update indexes.
They were very exposed and one of the classic errors was not to re-
index a file. For Oracle, the classic was storing a row_id in code or
a file, and then doing a storage re-organization. All of the exposed
locators were subject to some characteristic screw up where logical
and physical models met. Thus, we had utilities for cleaning up
broken pointer chains, re-indexing files, etc. in those early days.

One of the major advances in SQL was putting this responsibility on
the system and not the programmers. The bad news is that if someone
uses exposed locators in an SQL product, there are not a lot of
utilities for cleaning up the mess.

** Notice that people get "exposed physical locator" and "surrogate"
mixed up; they are totally different concepts. **

WHAT IS A SURROGATE KEY?

A quote from Dr. Codd: "..Database users may cause the system to
generate or delete a surrogate, but they have no control over its
value, nor is its value ever displayed to them ..."(Dr. Codd in ACM
TODS, pp 409-410) and Codd, E. (1979), Extending the database
relational model to capture more meaning. ACM Transactions on
Database Systems, 4(4). pp. 397-434.

This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means
never used in queries, DRI or anything else that a user does.

Codd also wrote the following:

"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two
companies merge, the two employee databases might be combined with the
result that some or all of the serial numbers might be changed.).

(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it
has ceased to have one (e.g. and applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] -
is to introduce entity domains which contain system-assigned
surrogates.

[emphasis begin] Database users may cause the system to generate or
delete a surrogate, but they have no control over its value, nor is
its value ever displayed to them....." (Codd in ACM TODS, pp 409-410)
[emphasis end].

That means if IDENTITY were a surrogate, we would not see its values,
print them out as invoice numbers and so forth. That means if
IDENTITY were a surrogate, we could drop it and the schema would still
work. Like an index. But IDENTITY fails on both those points.

PRACTICAL STUFF

This is a summary of points already made, but put in one place. The
summary is "Good bye, data integrity" in 25 words or less.

IDENTITY does not port. This is such a basic part of Software
Engineering; I am not going to comment on it. If anyone thinks that
they will always deal only with their own code on one machine with one
release of one product is not planning a long career.

IDENTITY leaves gaps. This is because it is not a surrogate and does
not verify its value against a relational key. It is a software
implementation of a counter, like we had on some of the old tape
drives. The practical problems involve having to explain the gaps to
the SOX guy in accounting.

IDENTITY has no validation. One tag number is as good as another.

IDENTITY has no verification rules. Looking at an object, you cannot
tell what IDENTITY value will be assigned to it. Is 41 or 42 the id
of this automobile? Prove it.

IDENTITY has no repeatability. The value depends on the arrival time
in the physical storage.

David Cressey advanced the theory that this all started out with
Access 97. If you build a database with one of the database wizards,
this is usually what you get.

Also, if you design your own database and create your own tables,
there comes a time, for each table, where MS Access intervenes, and
tells you that you have not assigned a primary key, an that one is
recommended, and that Access will do it for you, if that's ok.

If you say "yes" (who would not?) what you get is precisely this: an
extra numeric column, auto assigned, and declared as the key of the
new table.

Actually, I can live with this, for so called "entity tables". It
would be better for the newbie DBA to discover a natural key, and
declare that, but it's not so bad. For "relationship tables", the
primary key ought to be a compound key made up of two or more foreign
keys. But that is not what Access does.

Someone who has learned database design before building their first
Access database would not do this. But Access makes it seductively
simple to design and build your own database, without even a minimal
understanding of the consequences of your decisions.


  #100  
Old January 27th, 2008, 10:14 PM posted to comp.databases.ms-access, comp.databases.theory,microsoft.public.access, microsoft.public.access.tablesdbdesign,microsoft.public.sqlserver
-CELKO-
external usenet poster
 
Posts: 26
Default Separate PK in Jxn Tbl?

I always use an autonumber PK and a unique index set on the two FK fields [sic]. Why? No particular good reason. One of my database rules is that all tables have an autonumber primary key [sic].

You add redundancy to a schema and never thought about it? If you had
gone thru the Normalization process, this would stick out as a
fundamental design error immediately. You have not been writing SQL;
you are faking a sequential file system in SQL and even say "field"
instead of "column" -- huge conceptual and implementation differences!

It's also slightly easier to delete the record [sic: rows are not records!] in code.


How do you know that you got the right entity? Don't you have a text
editor if a little extra typing is that serious a problem that you
need to add errors and overhead to your SQL?


Now if I was to have a child table [sic: that terms comes from Network DBs; did you used to program in IMS?] from the junction table [sic: did you mean a table that models a relationship among many entities? Or a multi-way pointer structure as in an Network DB?] then I would absolutely use a autonumber primary key [sic: it cannot be a key by definition] for ease of use when designing queries, forms and reports.


I hope that you are not designing forms and reports in the database.
That is a job for the front. Now, in the old days, when COBOL, et al
had a file system, we had to do it this -- monolithic architecture vs
tiered architecture.

The theorists will argue. I don't care.


Would you say the same thing to an accountant who wanted the books to
balance ??
 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump


All times are GMT +1. The time now is 05:55 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.