Separate PK in Jxn Tbl?

**James A. Fortune**

Brian Selzer wrote:

A typical schema with artificial keys:

Customer {CustomerKey, CustomerNo, ...}
Key {CustomerKey}, Key {CustomerNo}

Item {ItemKey, ItemNo, ...}
Key {ItemKey}, Key {ItemNo}

CI {CustomerItemKey, CustomerKey, ItemKey, CustomerItemNo}
Key {CustomerItemKey}, Key {CustomerKey, ItemKey}
CI[ItemKey] IN Item[ItemKey]
CI[CustomerKey] IN Customer[CustomerKey]

SOLine {SOLineKey, SOKey, SOLineNo, CustomerItemKey, Quantity, Price}
Key {SOLineKey}, Key {SOKey, SOLineNo}
SOLine[CustomerItemKey] IN CI[CustomerItemKey]

A typical schema with natural keys

Customer {CustomerNo, ...}
Key {CustomerNo}

Item {ItemNo, ...}
Key {ItemNo}

CI {CustomerNo, ItemNo, CustomerItemNo}
KEY {CustomerNo, ItemNo}
CI[CustomerNo] IN Customer[CustomerNo]
CI[ItemNo] IN Item[ItemNo]

SOLine {SO#, SOLineNo, CustomerNo, ItemNo, Quantity, Price}
SOLine[CustomerNo, ItemNo] IN CI[CustomerNo, ItemNo]

Now write a query that returns how many of item '12345' were sold to
customer '4321'

It should be obvious that with the natural keys, no joins are
necessary--it's just a simple select from SOLine since all of the
information is actually /in/ SOLine; whereas, with the artifical keys,
several joins are required because in order to query by item number and
customer number, SOLine must be joined to CI which must then be joined to
Customer and Item.

Brian,

Thanks for attempting a concrete example to show your point. Although
the first schema doesn't look like something I would have created I will
go over your example and post back later, perhaps when the dust has
settled a bit in this thread. In spite of my sometimes unorthodox ways
I am a reasonable person who is always looking for better ways to do things.

James A. Fortune

**Rob[_16_]**

On Jan 28, 7:01 am, "David Cressey" wrote:
"Rob" wrote in message

...
On Jan 22, 1:26 pm, "Neil" wrote:

(quote)
I don't claim to know more about db theory than the cdt regulars, or
to have more experience than the many practioners who have contributed
to this thread.

I do know that the debate over relationship representations has been
going on for years, neither side giving any ground. Some aspects of it
are never addressed. For example, making the two foreign keys in a
junction table a composite PRIMARY key prohibits NULL values from
either key, but declaring a separate surrogate primary key together
with a UNIQUE constraint on the two-foreign-key-composite-key does
allow NULL values for either (or both!). This can be useful for
representing unrelated entity tuples, either childless parents or
orphans.

(unquote)

In the case of a junction table, this point is moot.

I'm not sure what you mean by "moot". According to:

http://www.usingenglish.com/referenc...oot+point.html

quote
If something's a moot point, there's some disagreement about it: a
debatable point. In the U.S., this expression usually means that there
is no point in debating something, because it just doesn't matter. An
example: If you are arguing over whether to go the beach or to the
park, but you find out the car won't start and you can't go anywhere,
then the destination is said to be a moot point.
/quote

I googled "moot" to make sure I knew what you meant, and was surprised
by the definition. I will take your meaning to be '"there is no point
in debating" whether NULL values in junction tables are ever useful
because they are not'.

Suppose I have an application that (among other things) assigns child
tuples in a child relation to parent tuples in a parent relation. I've
studied the relationship and determined it can be many-to-many, so I
implement the relationships using a junction table.

Consider the following 2 cases (MS SQL syntax):
IMPLICIT: Implicit Junction Table Representation of childless parents
and orphans:
create table impJT(
impJT_parentFK int not null,
impJT_childFK int not null,
primary key (impJT_parentFK,impJT_childFK)
)
EXPLICIT: Explicit Junction Table Representation of childless parents
and orphans:
create table expJT(
expJT_PK int identity(1,1) primary key,
expJT_parentFK int,
expJT_childFK int,
UNIQUE(expJT_parentFK,expJT_childFK)
)

In the IMPLICIT case, there will be a primary key index on
(impJT_parentFK,impJT_childFK).
In the EXPLICIT case, there will be a primary key index on expJT_PK
and a unique index on (expJT_parentFK,expJT_childFK). NULL values are
allowed for parentFK, for childFK or both.

Now suppose that there is a parent tuple in the parent relation for
"alphaparent" with primary key "alphaparentPK", and that "alphaparent"
is childless. My application wants to know if "alphaparent" is
childless: If so, it will assign "betachild" with primary key
"betachildPK" to "alphaparent" by inserting
("alphaparentPK","betachildPK") into the junction table.

In the IMPLICIT case, ("alphaparentPK",NULL) won't occur in the
junction table. How does my application determine that "alphaparent"
is childless?

a.) select count(*) from parent as p, impJT as j where p.parentname =
"alphaparent" and p.parentPK = j.impJT_parentFK

This is guaranteed to be slow, and the result is equivocal: I can't be
certain whether alphaparent is in the parent table or not, so a count
of zero is not enough. I could insure that "alphaparent" is in the
parent relation with:

b.) select parentPK,count(*) from parent as p, impJT as j where
p.parentname = "alphaparent" and p.parentPK = j.impJT_parentFK group
by p.parentPK

Probably even slower, but gets the result I need. The point is, you
must access both relations.

In the EXPLICIT case, ("alphaparentPK",NULL) does occur in the
junction table which is indexed. In this case, my application can use:

c.) select parentPK from parent as p, expJT as j where p.parentname =
"alphaparent" and j.expJT_childFK is NULL

A smart query optimizer would avoid access to the junction table
altogether, using only its unique index.

There are of course design tradeoffs between the IMPLICIT and EXPLICIT
cases, and I'm not going to debate database and application designs.
But to say '"there is no point in debating" whether NULL values in
junction tables are ever useful' is just a little hasty.

If both FK values are NULL, the entire row of the junction table can be
omitted with no loss of information.

Disagree. There is a tendency among the cdt purists to assume that
meta models (i.e., database schemas) describe set-oriented storage
structures for facts and propositions. I take a different approach: I
consider these as meta models of dynamic universes. Using the EXPLICIT
junction table above, I can visualize a multi-agent application in
which (NULL,NULL) inserted to the junction table is a way for one
agent to signal another agent that some set of actions is required. In
that case, omitting the (NULL,NULL) tuple means information would
indeed be lost. (Imagine that the agents are transient, they have no
ability to "call" one another and they have no persistent storage
besides what is in the database.) Stop squirming purists!!

Childless parents will exist in the table that contains parents, where the
id of the parent is not an FK, but a PK. Same pattern for orphans. The two
tables can even be the same table.

Not sure what you are driving at here. If this is important, please
clarify and I'll try to respond.

Rob

P.S. I'm no fan of junction tables. I use the Aggregate-Link
representation of relationships described he

http://www.sfdbs.com/toplevel/fasttrack/fasttrack.shtml

The A-L representation is so explicit, it allows for childless parent-
and orphan child tuples in the relationship as well as parent- and
child tuples NOT in the relationship. A junction table admits one or
the other, not both.

Your opinion may differ, but I assure you is most welcome.

**Brian Selzer**

"Roy Hann" wrote in message
...
"Brian Selzer" wrote in message
news

I don't think it's productive to blame it on the users. The problem
you're referring to is due to the nature of keys, and misunderstandings
on the part of the database designer as to what constitutes a key. All
that is required for a key to be a key is that in every possible database
instance, a projection over the attributes in the key for a relation has
the same cardinality as the relation. This does /NOT/ mean that a
particular combination of values /always/ identifies the same individual
in the Universe of Discourse, but only in the picture of the Universe
that is a database instance. In other words, a particular combination of
values may not /necessarily/ identify an individual, but rather may only
/contingently/ identify an individual. This is the nature of keys:
either the values for a key are permanent identifiers, or they're not.
It has nothing to do with how well keys are managed. The values for a
key may be managed perfectly, yet still not be permanent identifiers--the
position of something in a list of things comes to mind.

No, the position number identifies the same position, for all time. The
facts about the current occupant of the position may change willy-nilly.

I don't think so. In the domain of positions, a position number identifies
the same position, for all time: that is the nature of a domain. But
whenever a position number appears as a key in a relation, it identifies an
occupant, even though the occupant identified may not be the same occupant
at different database instances: this is the nature of a key.

Roy

**David W. Fenton**

JOG wrote in

m:

On Jan 27, 8:33 pm, "David W. Fenton"
wrote:
JOG wrote

ps.co m:

I certainly don't think developers should excuse sloppy RDBMS
design just because they are using access (and of course I'm
sure many of the professionals here wouldn't dream of doing so,
despite others laxness).

What *are* you talking about?

Any mistakes in schema design that you can make in Access, you
can make in any other RDBMS.

*Sigh*. Yes, but as bob has pointed out, you've misconstrued my
point. Because it is marketed at different business problems (ones
with few concurrent users, simple domains, comparatively smaller
schema), a lot of Access users can get away with mistakes that
someone using, say, Oracle 11g to keep track of millions of facts
would in the end get called up on. So that's nothing to do with
the technology, just the market, which makes your empassioned
defence of the super-duper jetomatic engine a bit misplaced.

I'm not defending Jet here. I'm pointing out a logical error in your
attributing to "Access" something that has nothing specifically to
do with Access.

I'll also ignore the diatribe that followed in light of your
misunderstanding. (And the fact that you share my mother's maiden
name, so may well be long distant family...).

I do not misunderstand. You clearly are not distinguishing a
development platform (Access) from a database engine (Jet) and from
the fact that the issue at hand is a schem design question, and has
nothing whatsoever to do with Access, or with any particular
database engine.

That you can't seem to keep this distinction straight in your posts
shows one of two things:

1. massive ignorance of the tools you are disparaging

2. extremely bad writing skills.

Of maybe it's some of both.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**Rick Brandt**

David Cressey wrote:
"David W. Fenton" wrote in message
.89...
wrote in

m:

If the users only access the tables through forms, conforming to
best practices in Access, how are they going to get garbage into
the tables?

What if there's more than one application built on top of the
database?

I believe this is a moot point when it comes to MS Access. The app
and the database are all stored together in Access. There is, by
definition, only one Access.

Access regulars, feel free to correct this if it's wrong.

It's wrong :-)

It is the norm for a single-user non-serious app built largely by using the
wizards.

It is the opposite of the norm for a multi-user app built by someone who knows
what they are doing.

--
Rick Brandt, Microsoft Access MVP
Email (as appropriate) to...
RBrandt at Hunter dot com

**Rick Brandt**

Bob Badour wrote:
David Cressey wrote:
"David W. Fenton" wrote in message
.89...

wrote in

m:

If the users only access the tables through forms, conforming to
best practices in Access, how are they going to get garbage into
the tables?

What if there's more than one application built on top of the
database?

I believe this is a moot point when it comes to MS Access. The app
and the database are all stored together in Access. There is, by
definition, only one Access.

Access regulars, feel free to correct this if it's wrong.

While awkward, slow and kludgy, it is possible to attach an Access app
to an external database.

There are a great many of us who almost exclusively build Access apps that use
server back ends. Mine use SQL Server and UDB400 on the IBM ISeries. I only
use MDBs to store data in single-user desktop apps.

There is nothing awkward, slow, or kludgy about it.

--
Rick Brandt, Microsoft Access MVP
Email (as appropriate) to...
RBrandt at Hunter dot com

**Bob Badour**

Rick Brandt wrote:

Bob Badour wrote:

David Cressey wrote:

"David W. Fenton" wrote in message
9.0.89...

wrote in

m:

If the users only access the tables through forms, conforming to
best practices in Access, how are they going to get garbage into
the tables?

What if there's more than one application built on top of the
database?

I believe this is a moot point when it comes to MS Access. The app
and the database are all stored together in Access. There is, by
definition, only one Access.

Access regulars, feel free to correct this if it's wrong.

While awkward, slow and kludgy, it is possible to attach an Access app
to an external database.

There are a great many of us who almost exclusively build Access apps that use
server back ends. Mine use SQL Server and UDB400 on the IBM ISeries. I only
use MDBs to store data in single-user desktop apps.

There is nothing awkward, slow, or kludgy about it.

Apparently you have never had to attach an access app to a new instance
of the database it uses. I found that process incredibly awkward, slow
and kludgy.

**Bob Badour**

David W. Fenton wrote:

JOG wrote in

m:

On Jan 27, 8:33 pm, "David W. Fenton"
wrote:

JOG wrote

ps.co m:

I certainly don't think developers should excuse sloppy RDBMS
design just because they are using access (and of course I'm
sure many of the professionals here wouldn't dream of doing so,
despite others laxness).

What *are* you talking about?

Any mistakes in schema design that you can make in Access, you
can make in any other RDBMS.

*Sigh*. Yes, but as bob has pointed out, you've misconstrued my
point. Because it is marketed at different business problems (ones
with few concurrent users, simple domains, comparatively smaller
schema), a lot of Access users can get away with mistakes that
someone using, say, Oracle 11g to keep track of millions of facts
would in the end get called up on. So that's nothing to do with
the technology, just the market, which makes your empassioned
defence of the super-duper jetomatic engine a bit misplaced.

I'm not defending Jet here. I'm pointing out a logical error in your
attributing to "Access" something that has nothing specifically to
do with Access.

You are an idiot. Jim didn't attribute anything to Access. In fact, he
said it would be wrong to let an ignoramus blame his ignorance on the tool.

I'll also ignore the diatribe that followed in light of your
misunderstanding. (And the fact that you share my mother's maiden
name, so may well be long distant family...).

I do not misunderstand.

Then I can only conclude you lack the capacity to comprehend the
relatively simple written english that appears above.

[irrelevancies snipped]

**Neil**

"James A. Fortune" wrote in message
...
Neil wrote:
Whenever I've created junction tables in the past, I always made the PK
of the junction table the combined pks from the two other tables. Ex:
Table1 PK=A; Table2 PK=B; Junction table consists of two fields, A and B,
which together comprise the PK for the junction table.

However, I just came across some code in which the person created a
junction table with a separate PK consisting of an autonumber field, and
then the two fields.

So I was wondering how others did junction tables -- with a standalone
autonumber PK, or with a PK consisting of the PKs of the tables being
joined? And, if a standalone PK, then why?

Thanks!

Neil

Whenever I have multiple key fields, natural or not, I create an
AutoNumber PK for pragmatic reasons. The main reason is that it makes it
easier to create the joins. The theorists are champions at joining tables
and don't have to be concerned with the complexity of the SQL they write.
If I convert an Access table over to SQLServer I add even another field as
a primary key, usually prefixed with SS (Gasp!).

I keep Jamie's advice in the back of my mind, about how enforcing
constraints at the table level is better than enforcing them through code,
but where I work, no one is going to access the table data using anything
other than Access so I am able to take the high road at my leisure. Maybe
my coding practice just needs to catch up with my philosophy.
Theoretically, the idea of using natural keys is more intellectually
satisfying, but for now the lure of simpler joins is winning out. Lately,
I've increased the amount of normalization in one of my databases and the
joins got even more complicated, adding about a line or so in the SQL view
in Access for every new query using those tables. Queries involving many
to many relationships often add additional tables later and highlight the
need to keep joins as simple as possible.

James A. Fortune

I tend to do the same as you: I tend to put autonumber primary key fields in
place where a multi-field PK will do, for the same reasons: i like to be
able to refer to a single key. However, when it comes to junction tables, it
just seems pointless. Referring to two fields instead of one isn't that big
of a deal; and a table made up of two foreign keys is very clean -- adding
an additional key on top of that seems useless and added baggage.

Larry posted that there are times when an autonumber PK is needed, such as
when using the junction talbe in a combo box, and I would agree with him
there. So in a few cases it might serve a purpose (even though it's not
absolutely necessary there either). But for the most part, it seems
unnecessary.

Neil

**Marshall[_3_]**

On Jan 28, 3:02 pm, Rob wrote:

Disagree. There is a tendency among the cdt purists to assume that
meta models (i.e., database schemas) describe set-oriented storage
structures for facts and propositions. I take a different approach: I
consider these as meta models of dynamic universes. Using the EXPLICIT
junction table above, I can visualize a multi-agent application in
which (NULL,NULL) inserted to the junction table is a way for one
agent to signal another agent that some set of actions is required. In
that case, omitting the (NULL,NULL) tuple means information would
indeed be lost. (Imagine that the agents are transient, they have no
ability to "call" one another and they have no persistent storage
besides what is in the database.) Stop squirming purists!!

Sometimes I issue a "DROP TABLE" to indicate to my wife that
she needs to fill her car up with gas.

Marshall

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode