View Single Post
  #15  
Old April 8th, 2007, 10:49 PM posted to microsoft.public.access
Tom Ellison
external usenet poster
 
Posts: 88
Default Why do all my Relationships show as Indeterminate ?

Dear David:

As long a we can discuss things in a way that is not acrimoniuous, which
seems quite possible given your response.

I'll say there's nothing in your response I haven't heard before.

A natural key that is text may be longer in the dependent table than putting
a surrogate key in the dependent table. On the other hand, you must add the
identity surrogate column in the foreign key table, and you must index this
additional column. There are trade-offs in the size of the database, and it
is not always the case that there is any noticable difference in the size of
the database. Also, I do not consider the size of the database to be a
critical factor. Hard disk space is not all that expensive. And size does
not, in this case, equate to performance. I would say that "long integer
columns can be stored more efficiently" is fairly insignificant if what
you're saying is it saves disk space. If you're talking performance, then
again there are significant trade-offs.

I quite disagree that repeating the data in the dependent table, even with a
multi-column key, is a normalization issue. I do not agree that this
violates any of the rules of normalization. If you point out that it
repeats those columns that were in the foreign table, then I respond that
having an identity column in the foreign table and repeating that in the
dependent table must be a violation of the same rule. It seems obvious then
that there is no such rule.

It is not "vastly more efficient". It is not uncommon to write a query that
requires one of more columns to be displayed that are part of the natural
key from the foreign key table. If such columns are repeated in the
dependent table, then putting them on a report will not require a JOIN to
the foreign table. Since they already exist in the dependent table, you can
retrive them directly. In this not uncommon case, there need be no lookup
to retrieve those values.

I expect you are referring to avoiding cascade updates when you speak of
using natural PKs only "when the values will *never* change". The rules of
good database design actually state that the natural key should rarely
change, not never, but the point is pretty much the same. In some of my
designs, such cascades events are recorded. A system can go for a few
months between such occurrances. So, I largely agree whith you, if you will
permit.

There's quite a bit to this discussion, and I'm preparing a series, if not a
book, about this very thing. Frankly, I find there are very cogent
arguments on both sides of the discussion. But, finally, I find that the
disagreement is entirely artificial. The real problem is that the issue
should have been addressed inside the design of the database engine (Jet,
SQL Server, etc.) and could then be entirely transparent to both users and
designers. The facility provided by surrogate "identity" keys could have
been addressed within the engine, automatically, and this can be done with
an efficiency vastly greater than either method implemented by we database
designers. That is, the problem is artificial, and its best solution lies
outside what you and I can effect.

I propose that, for a 1:M relationship (the most common) the database engine
maintain the link between the foreign and dependent tables as an internal
pointer directly to the foreign table row. Unlike the autonumber, this
requires no index, so is several times faster. The engine must keep track
of whether this link is valid or not, and use the lookup facility using the
key when it is not, all the time fixing up those links that have been
invalidated when there is idle time on the server.

This would implement all the best features of both natural and surrogate
keys, and indeed out-perform either. Once viewed in this way, the whole
issue is not really a debate between database designers, but a failure of
database engine designers. The debate we are having is artificially created
by a failure to build the database engine optimally.

Please do not think I have no understanding or sympathy for the point of
view you have proposed. I quite agree with you on some of the major
aspects. My opinions on this have been revolutionized in the past 2 years
as I've come to realize the underlying factors causing this debate.

I believe that, using the existing tools, there may be very good cause to
use surrogate keys for relationships. I'm not so sure this ever happens in
a database less than 2 GB. In my upcoming paper on the subject, I have
created databases and contrasted there performances using both paradigms,
with both Jet and SQL Server. The results may surprise many.

Thanks for engaging in this discussion civilly. See you around!

Tom Ellison
Microsoft Access MVP


"David W. Fenton" wrote in message
. 1...
"Tom Ellison" wrote in
:

I quite commonly create entire, complex databases without using
any autonumber columns, using the natural keys as the primary
keys. I have not found there to be any disadvantage in doing this
for fairly small databases, by which I mean any less than 2GB, the
Jet limit. Indeed, I can get more data into my databases by
having fewer columns and fewer indexes.


Single-column natural keys *do* add overhead if they are text fields
because long integer columns can be stored more efficiently.

I also find the update issues very problematic.

For multi-column natural keys, the surrogate key is vastly more
efficient, since for the former, all the data has to be repeated,
which rather defeats the purpose of normalization.

I use natural PKs only for lookup tables, and that not very often --
only when I am absolutely certain the values will *never* change,
ever.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/