Why do all my Relationships show as Indeterminate ?

**Tom Wickerath** · April 8th, 2007, 04:02 AM posted to microsoft.public.access

You can join on any column, or set of columns, whether it is a primary key
or not. You can join on any key that is unique or not unique.

True enough. What I said is that one of the keys in the relationship must be
unique to prevent the indeterminate relationship that the OP was experiencing.

In an M:M relationship, neither is unique, and therefore an autonumber
is NEVER used in an M:M relationship.

What??? I think you'd better have a look at the Northwind sample database.
Consider the Products, Order Details and Orders tables. I'm sure you know
that the Order Details table serves as the intersection table, to define a
many-to-many relationship between Products and Orders. What type of primary
key is being used in the Products and Orders tables? I see Autonumber in
both tables.

The key on the 1 side of a relationship is unique, but need not be an
autonumber.

I never said it needed to be an autonumber. I happen to have a strong
preference for using autonumber, and I also know from past threads that you
are in the opposite camp, with a preference for natural keys.

Indeed, I can get more data into my databases by having fewer columns
and fewer indexes.

I think that you will find that you get LESS data into your database. Of
course, it depends to some extent on the number of characters in your
text-based PK field. Try the following experiment using the Northwind sample
database, which includes a Text (5) data type for the PK / FK combination
between the Customers and Orders tables. I used Access 2003, with the default
Access 2000 file format:

1.) Create a brand new .mdb file. Turn off NameAutocorrect (we don't want to
include any possible overhead due to name mappings).

2.) Import the Customers and Orders table from a clean copy of Northwind.
Include the relationship with your import.

3.) Compact the database. I get a file that is 408 KB in size for the
text-based PK/FK.
________________________________________

4.) Copy the above database to a new filename.

5.) Add a new field to the Orders table: Number / Long Integer. Name it
(fkCustomerID). Remove the default value of 0.

6.) Delete the relationship between the Customers and Orders tables.

7.) Remove the primary key from CustomerID. Add a new Autonumber data type
(pkCustomerID) and set it as the primary key.

8.) Run the following update query:

UPDATE Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
SET Orders.fkCustomerID = [Customers].[pkCustomerID];

9.) Delete the new update query (if you saved it). Remove the text-based
CustomerID fields from both tables.

10.) Create a new relationship, with enforced RI, between the pkCustomerID
and fkCustomerID fields.

11.) Compact the database. I get a file that is 372 KB in size for the
Autonumber PK / Long Integer FK combination.

Granted, this is a very small set of data (92 customer records and 830
orders). As the number of customer and order records increases, or as other
related tables are added, the difference in size will only become greater,
with the advantage being for the autonumber / long integer field type in the
relationship.

Tom Wickerath
Microsoft Access MVP
https://mvp.support.microsoft.com/profile/Tom
http://www.access.qbuilt.com/html/ex...tributors.html
__________________________________________

"Tom Ellison" wrote:

Dear Tom:

You can join on any column, or set of columns, whether it is a primary key
or not. You can join on any key that is unique or not unique. If the table
is on a 1 side of a relationship, the key will be unique, and possibly a
primary key. If on the M side, it will not be enforced unique (it certainly
can be unique by choice or coincidence in individual cases - there can even
be 0 rows on the M side of a relationship). In an M:M relationship, neither
is unique, and therefore an autonumber is NEVER used in an M:M relationship.
Such tables can have an autonumber columns, they just won't be involved in
the relationship.

The key on the 1 side of a relationship is unique, but need not be an
autonumber. Any unique key of 1 or more columns will work. And, again, any
unique key will do. It need not be the primary key, although it quite often
is.

I quite commonly create entire, complex databases without using any
autonumber columns, using the natural keys as the primary keys. I have not
found there to be any disadvantage in doing this for fairly small databases,
by which I mean any less than 2GB, the Jet limit. Indeed, I can get more
data into my databases by having fewer columns and fewer indexes.

I am working hard on a paper to present a discussion on the difference
between natural key relationships and surrogate (autonumber) key
relationships. I know this has been a hot button issue for some time, and
my recent findings will be surprising. I favor the advantages of both
camps, and am preparing to present a proposal that will give the advantages
of both systems, indeed, which will give additional advantages available
with neither approach.

See you and all my MVP freinds soon in the MVP private newsgroups.

Tom Ellison
Microsoft Access MVP (6 years)

**'69 Camaro** · April 8th, 2007, 09:19 AM posted to microsoft.public.access

Hi, Tom.

Tom Ellison
Microsoft Access MVP (6 years)

Let's see . . . 21 Dec. 2001 until early or mid-2005 and again 1 Jan. 2007
until today. In Gunny math, that's only three or four fingers, not six. Or
were you an MVP for two or three years before and then reselected again in
December 2001?

And why would you need to state how many years you've been an MVP in a
newsgroup post? You've never done that before -- or since. No other MVP
does that in this newsgroup, either. Did you feel it was necessary in order
to justify giving a lesson in elementary relational database design to a
fellow Access MVP who hasn't been an MVP as long as you have?

Wouldn't you feel insulted if that happened to you?

Gunny

See http://www.QBuilt.com for all your database needs.
See http://www.Access.QBuilt.com for Microsoft Access tips and tutorials.
Blogs: www.DataDevilDog.BlogSpot.com, www.DatabaseTips.BlogSpot.com
http://www.Access.QBuilt.com/html/ex...ributors2.html for contact
info.

"Tom Ellison" wrote in message
...
Dear Tom:

You can join on any column, or set of columns, whether it is a primary key
or not. You can join on any key that is unique or not unique. If the
table is on a 1 side of a relationship, the key will be unique, and
possibly a primary key. If on the M side, it will not be enforced unique
(it certainly can be unique by choice or coincidence in individual cases -
there can even be 0 rows on the M side of a relationship). In an M:M
relationship, neither is unique, and therefore an autonumber is NEVER used
in an M:M relationship. Such tables can have an autonumber columns, they
just won't be involved in the relationship.

The key on the 1 side of a relationship is unique, but need not be an
autonumber. Any unique key of 1 or more columns will work. And, again,
any unique key will do. It need not be the primary key, although it quite
often is.

I quite commonly create entire, complex databases without using any
autonumber columns, using the natural keys as the primary keys. I have
not found there to be any disadvantage in doing this for fairly small
databases, by which I mean any less than 2GB, the Jet limit. Indeed, I
can get more data into my databases by having fewer columns and fewer
indexes.

I am working hard on a paper to present a discussion on the difference
between natural key relationships and surrogate (autonumber) key
relationships. I know this has been a hot button issue for some time, and
my recent findings will be surprising. I favor the advantages of both
camps, and am preparing to present a proposal that will give the
advantages of both systems, indeed, which will give additional advantages
available with neither approach.

See you and all my MVP freinds soon in the MVP private newsgroups.

Tom Ellison
Microsoft Access MVP (6 years)

"Tom Wickerath" AOS168b AT comcast DOT net wrote in message
news
Hi David,

That should be "one or both of the fields is uniquely indexed", as
you need both for a 1:1 relationship.

Did you read the rest of my post, where I wrote, where I wrote:
"For your 1:1 relationship, you can join two primary key fields, ... "

Okay, so perhaps I should have wrote:
"For your 1:1 relationship, you MUST join two primary key fields, ...
"

Since 1:1 relationships are fairly rare, compared to 1:M relationships,
writing it as you suggest might cause more user frustration if they want
to
create a 1:M relationship and they tried using uniquely indexed fields on
both sides of the join.

Tom Wickerath
Microsoft Access MVP
https://mvp.support.microsoft.com/profile/Tom
http://www.access.qbuilt.com/html/ex...tributors.html
__________________________________________

"David W. Fenton" wrote:

Tom Wickerath AOS168b AT comcast DOT net wrote in
:

1.) Fields of the same data type (For example, Autonumber ---
Long Integer) 2.) One of the fields is uniquely indexed (either
set as a Primary key, or a unique index has been created first)

That should be "one or both of the fields is uniquely indexed", as
you need both for a 1:1 relationship.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**David W. Fenton** · April 8th, 2007, 01:59 PM posted to microsoft.public.access

Tom Wickerath AOS168b AT comcast DOT net wrote in
news

That should be "one or both of the fields is uniquely indexed",
as you need both for a 1:1 relationship.

Did you read the rest of my post, where I wrote, where I wrote:
"For your 1:1 relationship, you can join two primary key
fields, ... "

Apparently not!

Okay, so perhaps I should have wrote:
"For your 1:1 relationship, you MUST join two primary key
fields, ... "

I tend to read lists/bullet points as being inclusive and that's why
I skipped the details at the end.

Since 1:1 relationships are fairly rare, compared to 1:M
relationships, writing it as you suggest might cause more user
frustration if they want to create a 1:M relationship and they
tried using uniquely indexed fields on both sides of the join.

Perhaps.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**David W. Fenton** · April 8th, 2007, 02:03 PM posted to microsoft.public.access

"Tom Ellison" wrote in
:

I quite commonly create entire, complex databases without using
any autonumber columns, using the natural keys as the primary
keys. I have not found there to be any disadvantage in doing this
for fairly small databases, by which I mean any less than 2GB, the
Jet limit. Indeed, I can get more data into my databases by
having fewer columns and fewer indexes.

Single-column natural keys *do* add overhead if they are text fields
because long integer columns can be stored more efficiently.

I also find the update issues very problematic.

For multi-column natural keys, the surrogate key is vastly more
efficient, since for the former, all the data has to be repeated,
which rather defeats the purpose of normalization.

I use natural PKs only for lookup tables, and that not very often --
only when I am absolutely certain the values will *never* change,
ever.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**Tom Ellison** · April 8th, 2007, 10:49 PM posted to microsoft.public.access

Dear David:

As long a we can discuss things in a way that is not acrimoniuous, which
seems quite possible given your response.

I'll say there's nothing in your response I haven't heard before.

A natural key that is text may be longer in the dependent table than putting
a surrogate key in the dependent table. On the other hand, you must add the
identity surrogate column in the foreign key table, and you must index this
additional column. There are trade-offs in the size of the database, and it
is not always the case that there is any noticable difference in the size of
the database. Also, I do not consider the size of the database to be a
critical factor. Hard disk space is not all that expensive. And size does
not, in this case, equate to performance. I would say that "long integer
columns can be stored more efficiently" is fairly insignificant if what
you're saying is it saves disk space. If you're talking performance, then
again there are significant trade-offs.

I quite disagree that repeating the data in the dependent table, even with a
multi-column key, is a normalization issue. I do not agree that this
violates any of the rules of normalization. If you point out that it
repeats those columns that were in the foreign table, then I respond that
having an identity column in the foreign table and repeating that in the
dependent table must be a violation of the same rule. It seems obvious then
that there is no such rule.

It is not "vastly more efficient". It is not uncommon to write a query that
requires one of more columns to be displayed that are part of the natural
key from the foreign key table. If such columns are repeated in the
dependent table, then putting them on a report will not require a JOIN to
the foreign table. Since they already exist in the dependent table, you can
retrive them directly. In this not uncommon case, there need be no lookup
to retrieve those values.

I expect you are referring to avoiding cascade updates when you speak of
using natural PKs only "when the values will *never* change". The rules of
good database design actually state that the natural key should rarely
change, not never, but the point is pretty much the same. In some of my
designs, such cascades events are recorded. A system can go for a few
months between such occurrances. So, I largely agree whith you, if you will
permit.

There's quite a bit to this discussion, and I'm preparing a series, if not a
book, about this very thing. Frankly, I find there are very cogent
arguments on both sides of the discussion. But, finally, I find that the
disagreement is entirely artificial. The real problem is that the issue
should have been addressed inside the design of the database engine (Jet,
SQL Server, etc.) and could then be entirely transparent to both users and
designers. The facility provided by surrogate "identity" keys could have
been addressed within the engine, automatically, and this can be done with
an efficiency vastly greater than either method implemented by we database
designers. That is, the problem is artificial, and its best solution lies
outside what you and I can effect.

I propose that, for a 1:M relationship (the most common) the database engine
maintain the link between the foreign and dependent tables as an internal
pointer directly to the foreign table row. Unlike the autonumber, this
requires no index, so is several times faster. The engine must keep track
of whether this link is valid or not, and use the lookup facility using the
key when it is not, all the time fixing up those links that have been
invalidated when there is idle time on the server.

This would implement all the best features of both natural and surrogate
keys, and indeed out-perform either. Once viewed in this way, the whole
issue is not really a debate between database designers, but a failure of
database engine designers. The debate we are having is artificially created
by a failure to build the database engine optimally.

Please do not think I have no understanding or sympathy for the point of
view you have proposed. I quite agree with you on some of the major
aspects. My opinions on this have been revolutionized in the past 2 years
as I've come to realize the underlying factors causing this debate.

I believe that, using the existing tools, there may be very good cause to
use surrogate keys for relationships. I'm not so sure this ever happens in
a database less than 2 GB. In my upcoming paper on the subject, I have
created databases and contrasted there performances using both paradigms,
with both Jet and SQL Server. The results may surprise many.

Thanks for engaging in this discussion civilly. See you around!

Tom Ellison
Microsoft Access MVP

"David W. Fenton" wrote in message
. 1...
"Tom Ellison" wrote in
:

I quite commonly create entire, complex databases without using
any autonumber columns, using the natural keys as the primary
keys. I have not found there to be any disadvantage in doing this
for fairly small databases, by which I mean any less than 2GB, the
Jet limit. Indeed, I can get more data into my databases by
having fewer columns and fewer indexes.

Single-column natural keys *do* add overhead if they are text fields
because long integer columns can be stored more efficiently.

I also find the update issues very problematic.

For multi-column natural keys, the surrogate key is vastly more
efficient, since for the former, all the data has to be repeated,
which rather defeats the purpose of normalization.

I use natural PKs only for lookup tables, and that not very often --
only when I am absolutely certain the values will *never* change,
ever.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**Tom Wickerath** · April 8th, 2007, 11:22 PM posted to microsoft.public.access

Tom Ellison:

I'd like to respond to two points you made in your response to David.

On the other hand, you must add the identity surrogate column in the
foreign key table, and you must index this additional column.

The last part of this sentence, "and you must index this additional column",
is true *only* if you create a relationship WITHOUT enforcing referential
integrity. Creating relationships without enforcing RI is little more than an
exercise in drawing lines. If you will create the relationship with enforced
RI, there is no sense in indexing the foreign key as well, because JET will
automatically index this foreign key field as soon as you enforce RI. Note:
This new index will not show up in the Indexes view, in table design, but you
can reveal that it is present with some simple VBA code.

Also, I do not consider the size of the database to be a
critical factor. Hard disk space is not all that expensive.

Perhaps not, but you did make the following statement earlier, which
appeared to be a statement in support of using natural keys instead of
surrogate (autonumber) keys:

Indeed, I can get more data into my databases by having fewer columns
and fewer indexes.

Tom Wickerath
Microsoft Access MVP
https://mvp.support.microsoft.com/profile/Tom
http://www.access.qbuilt.com/html/ex...tributors.html
__________________________________________

"Tom Ellison" wrote:

Dear David:

As long a we can discuss things in a way that is not acrimoniuous, which
seems quite possible given your response.

I'll say there's nothing in your response I haven't heard before.

A natural key that is text may be longer in the dependent table than putting
a surrogate key in the dependent table. On the other hand, you must add the
identity surrogate column in the foreign key table, and you must index this
additional column. There are trade-offs in the size of the database, and it
is not always the case that there is any noticable difference in the size of
the database. Also, I do not consider the size of the database to be a
critical factor. Hard disk space is not all that expensive. And size does
not, in this case, equate to performance. I would say that "long integer
columns can be stored more efficiently" is fairly insignificant if what
you're saying is it saves disk space. If you're talking performance, then
again there are significant trade-offs.

I quite disagree that repeating the data in the dependent table, even with a
multi-column key, is a normalization issue. I do not agree that this
violates any of the rules of normalization. If you point out that it
repeats those columns that were in the foreign table, then I respond that
having an identity column in the foreign table and repeating that in the
dependent table must be a violation of the same rule. It seems obvious then
that there is no such rule.

It is not "vastly more efficient". It is not uncommon to write a query that
requires one of more columns to be displayed that are part of the natural
key from the foreign key table. If such columns are repeated in the
dependent table, then putting them on a report will not require a JOIN to
the foreign table. Since they already exist in the dependent table, you can
retrive them directly. In this not uncommon case, there need be no lookup
to retrieve those values.

I expect you are referring to avoiding cascade updates when you speak of
using natural PKs only "when the values will *never* change". The rules of
good database design actually state that the natural key should rarely
change, not never, but the point is pretty much the same. In some of my
designs, such cascades events are recorded. A system can go for a few
months between such occurrances. So, I largely agree whith you, if you will
permit.

There's quite a bit to this discussion, and I'm preparing a series, if not a
book, about this very thing. Frankly, I find there are very cogent
arguments on both sides of the discussion. But, finally, I find that the
disagreement is entirely artificial. The real problem is that the issue
should have been addressed inside the design of the database engine (Jet,
SQL Server, etc.) and could then be entirely transparent to both users and
designers. The facility provided by surrogate "identity" keys could have
been addressed within the engine, automatically, and this can be done with
an efficiency vastly greater than either method implemented by we database
designers. That is, the problem is artificial, and its best solution lies
outside what you and I can effect.

I propose that, for a 1:M relationship (the most common) the database engine
maintain the link between the foreign and dependent tables as an internal
pointer directly to the foreign table row. Unlike the autonumber, this
requires no index, so is several times faster. The engine must keep track
of whether this link is valid or not, and use the lookup facility using the
key when it is not, all the time fixing up those links that have been
invalidated when there is idle time on the server.

This would implement all the best features of both natural and surrogate
keys, and indeed out-perform either. Once viewed in this way, the whole
issue is not really a debate between database designers, but a failure of
database engine designers. The debate we are having is artificially created
by a failure to build the database engine optimally.

Please do not think I have no understanding or sympathy for the point of
view you have proposed. I quite agree with you on some of the major
aspects. My opinions on this have been revolutionized in the past 2 years
as I've come to realize the underlying factors causing this debate.

I believe that, using the existing tools, there may be very good cause to
use surrogate keys for relationships. I'm not so sure this ever happens in
a database less than 2 GB. In my upcoming paper on the subject, I have
created databases and contrasted there performances using both paradigms,
with both Jet and SQL Server. The results may surprise many.

Thanks for engaging in this discussion civilly. See you around!

Tom Ellison
Microsoft Access MVP

**Tom Ellison** · April 9th, 2007, 12:46 AM posted to microsoft.public.access

Dear Tom:

Are you talking about a 1:M relationship in which the foreign key table is
not indexed on the related column(s)? How can you enforce uniqueness in the
foreign table without a unique index? But then you speak of HAVING a
relationship without RI. Are you speaking of a relationship in which both
tables may have rows that do not participate in the relationship? Without
RI, you are permitting that to happen!

Have you considered what happens to the performance of a query when a JOIN
has no index it can use, and must perform a table scan for every instance of
a row in the FK table? What kind of database design is this?

My statement about having space in the database for more information was a
statement about just that, space. With a 2 GB limit for Jet and MSDE, this
can be an issue. My other statements were about performance, not about
running out of room. These are sometimes competing requirements, and are
certainly not the same thing. In many cases, it is desirable to trade space
for performance. That is exactly the case when deciding to create indexes.

Tom Ellison
Microsoft Access MVP

"Tom Wickerath" AOS168b AT comcast DOT net wrote in message
news

Tom Ellison:

I'd like to respond to two points you made in your response to David.

On the other hand, you must add the identity surrogate column in the
foreign key table, and you must index this additional column.

The last part of this sentence, "and you must index this additional
column",
is true *only* if you create a relationship WITHOUT enforcing referential
integrity. Creating relationships without enforcing RI is little more than
an
exercise in drawing lines. If you will create the relationship with
enforced
RI, there is no sense in indexing the foreign key as well, because JET
will
automatically index this foreign key field as soon as you enforce RI.
Note:
This new index will not show up in the Indexes view, in table design, but
you
can reveal that it is present with some simple VBA code.

Also, I do not consider the size of the database to be a
critical factor. Hard disk space is not all that expensive.

Perhaps not, but you did make the following statement earlier, which
appeared to be a statement in support of using natural keys instead of
surrogate (autonumber) keys:

Indeed, I can get more data into my databases by having fewer columns
and fewer indexes.

Tom Wickerath
Microsoft Access MVP
https://mvp.support.microsoft.com/profile/Tom
http://www.access.qbuilt.com/html/ex...tributors.html
__________________________________________

"Tom Ellison" wrote:

Dear David:

As long a we can discuss things in a way that is not acrimoniuous, which
seems quite possible given your response.

I'll say there's nothing in your response I haven't heard before.

A natural key that is text may be longer in the dependent table than
putting
a surrogate key in the dependent table. On the other hand, you must add
the
identity surrogate column in the foreign key table, and you must index
this
additional column. There are trade-offs in the size of the database, and
it
is not always the case that there is any noticable difference in the size
of
the database. Also, I do not consider the size of the database to be a
critical factor. Hard disk space is not all that expensive. And size
does
not, in this case, equate to performance. I would say that "long integer
columns can be stored more efficiently" is fairly insignificant if what
you're saying is it saves disk space. If you're talking performance,
then
again there are significant trade-offs.

I quite disagree that repeating the data in the dependent table, even
with a
multi-column key, is a normalization issue. I do not agree that this
violates any of the rules of normalization. If you point out that it
repeats those columns that were in the foreign table, then I respond that
having an identity column in the foreign table and repeating that in the
dependent table must be a violation of the same rule. It seems obvious
then
that there is no such rule.

It is not "vastly more efficient". It is not uncommon to write a query
that
requires one of more columns to be displayed that are part of the natural
key from the foreign key table. If such columns are repeated in the
dependent table, then putting them on a report will not require a JOIN to
the foreign table. Since they already exist in the dependent table, you
can
retrive them directly. In this not uncommon case, there need be no
lookup
to retrieve those values.

I expect you are referring to avoiding cascade updates when you speak of
using natural PKs only "when the values will *never* change". The rules
of
good database design actually state that the natural key should rarely
change, not never, but the point is pretty much the same. In some of my
designs, such cascades events are recorded. A system can go for a few
months between such occurrances. So, I largely agree whith you, if you
will
permit.

There's quite a bit to this discussion, and I'm preparing a series, if
not a
book, about this very thing. Frankly, I find there are very cogent
arguments on both sides of the discussion. But, finally, I find that the
disagreement is entirely artificial. The real problem is that the issue
should have been addressed inside the design of the database engine (Jet,
SQL Server, etc.) and could then be entirely transparent to both users
and
designers. The facility provided by surrogate "identity" keys could have
been addressed within the engine, automatically, and this can be done
with
an efficiency vastly greater than either method implemented by we
database
designers. That is, the problem is artificial, and its best solution
lies
outside what you and I can effect.

I propose that, for a 1:M relationship (the most common) the database
engine
maintain the link between the foreign and dependent tables as an internal
pointer directly to the foreign table row. Unlike the autonumber, this
requires no index, so is several times faster. The engine must keep
track
of whether this link is valid or not, and use the lookup facility using
the
key when it is not, all the time fixing up those links that have been
invalidated when there is idle time on the server.

This would implement all the best features of both natural and surrogate
keys, and indeed out-perform either. Once viewed in this way, the whole
issue is not really a debate between database designers, but a failure of
database engine designers. The debate we are having is artificially
created
by a failure to build the database engine optimally.

Please do not think I have no understanding or sympathy for the point of
view you have proposed. I quite agree with you on some of the major
aspects. My opinions on this have been revolutionized in the past 2
years
as I've come to realize the underlying factors causing this debate.

I believe that, using the existing tools, there may be very good cause to
use surrogate keys for relationships. I'm not so sure this ever happens
in
a database less than 2 GB. In my upcoming paper on the subject, I have
created databases and contrasted there performances using both paradigms,
with both Jet and SQL Server. The results may surprise many.

Thanks for engaging in this discussion civilly. See you around!

Tom Ellison
Microsoft Access MVP

**David W. Fenton** · April 9th, 2007, 01:37 AM posted to microsoft.public.access

"Tom Ellison" wrote in
:

As long a we can discuss things in a way that is not acrimoniuous,
which seems quite possible given your response.

I don't know how to respond to that. Are you spoiling for a fight?

I'll say there's nothing in your response I haven't heard before.

I'm sure of that. But I still am baffled why anyone would still
think natural keys are the way to go except in a small number of
cases.

A natural key that is text may be longer in the dependent table
than putting a surrogate key in the dependent table. On the other
hand, you must add the identity surrogate column in the foreign
key table, and you must index this additional column.

You lost me. Assuming we're talking about a single-column PK, the
difference in storage space between a long integer and a text field
is going to be proportional to the difference between the storage
space required for the one vs. the other. Since both require storage
in two tables in order to map the relationship, and because both
have to be indexed, it seems to me that all you need to do is figure
out the difference between the size of the two data types as stored.
Only very small text fields are going to take up less space than a
long integer (4 bytes? or is it 8?).

So it seems quite obvious to me that natural keys are going to
always be longer, unless they are numeric data (in which case they
aren't really natural, as somebody has to maintain uniqueness of the
numeric values).

And the overhead of updating large indexes increases as the number
of records increases, regardless of which is the larger table in the
relationship (parent or child).

There are trade-offs in the size of the database, and it
is not always the case that there is any noticable difference in
the size of the database.

I would be much more concerned about the performance issues with the
larger index data pages that will be requuired for a text-based
natural key.

Also, I do not consider the size of the database to be a
critical factor. Hard disk space is not all that expensive. And
size does not, in this case, equate to performance.

In the case of indexes it certainly does, both in editing/updating
and in retrieval.

I would say that "long integer
columns can be stored more efficiently" is fairly insignificant if
what you're saying is it saves disk space. If you're talking
performance, then again there are significant trade-offs.

Ah! So you agree with what I've written above. I was only going with
"space" as the starting point because that's the ground you chose to
stake out in the discussion. I'm much more concerned with
performance, too, and that's a major reason why I would consider a
natural key only in a very limited number of instances.

I quite disagree that repeating the data in the dependent table,
even with a multi-column key, is a normalization issue. I do not
agree that this violates any of the rules of normalization. If
you point out that it repeats those columns that were in the
foreign table, then I respond that having an identity column in
the foreign table and repeating that in the dependent table must
be a violation of the same rule. It seems obvious then that there
is no such rule.

I don't know what theory says. I don't really care. What I do know
is that the more fields you have, the more they are likely to need
to CASCADE UPDATES. That means a *huge* performance issue, because
an update to any of the columns of the PK of the parent table will
need to update the corresponding column in all the records of the
child table. But with a surrogate key, which is never updated, you
never ever have that issue.

That you can keep the repeated data in synch at the engine level is
dependent on which engine you use. But you are updating data in
multiple locations. An autonumber surrogate key is never changed,
never updated, so the issue with keeping repeated data synchronized
simply vanishes.

To me, that's the main reason to normalize, so that you don't have
to maintain data in multiple locations. Even though it's usually
done at the engine level, multi-column natural keys still require
tye CPU cycles to keep the repeated data in synch.

It is not "vastly more efficient". It is not uncommon to write a
query that requires one of more columns to be displayed that are
part of the natural key from the foreign key table. If such
columns are repeated in the dependent table, then putting them on
a report will not require a JOIN to the foreign table. Since they
already exist in the dependent table, you can retrive them
directly. In this not uncommon case, there need be no lookup to
retrieve those values.

But SQL-based database engines are optimized to make such joins
every efficient. Secondly, in your version, the FK columns won't be
updatable, but in many views, the data the FK points to in the
parent table *will* be updatable.

I expect you are referring to avoiding cascade updates when you
speak of using natural PKs only "when the values will *never*
change". The rules of good database design actually state that
the natural key should rarely change, not never, but the point is
pretty much the same.

Well, now you get into the other major flaw of natural keys -- that
for many entities, there are no reasonable candidate keys. What
would you use as a natural key for a person? There is simply no
viable natural key to identify a person in any application that is
using real-world data and is not handing off the problem to some
other PK-assigning authority (like the Social Security
Administration, which is itself problematic, since, over time, that
PK gets re-used).

In some of my
designs, such cascades events are recorded. A system can go for a
few months between such occurrances. So, I largely agree whith
you, if you will permit.

There are lots of entities where the only candidate natural keys
will need to be updated often. And, of course, there can't be any
Null fields in the compound PK, which means storing default values
that will then have to be eliminated in reporting, etc.

There's quite a bit to this discussion, and I'm preparing a
series, if not a book, about this very thing. Frankly, I find
there are very cogent arguments on both sides of the discussion.
But, finally, I find that the disagreement is entirely artificial.
The real problem is that the issue should have been addressed
inside the design of the database engine (Jet, SQL Server, etc.)
and could then be entirely transparent to both users and
designers. The facility provided by surrogate "identity" keys
could have been addressed within the engine, automatically, and
this can be done with an efficiency vastly greater than either
method implemented by we database designers. That is, the problem
is artificial, and its best solution lies outside what you and I
can effect.

I'd be interested to see your thoughts on that at more length. It
intrigures me as a concept, and it's certainly a general principle
that I would support, but I'm afraid that in so many cases I'm not
happy when such things are actually done for me, behind the scenes.
Some examples: Name Autocorrect, Multi-value fields.

I propose that, for a 1:M relationship (the most common) the
database engine maintain the link between the foreign and
dependent tables as an internal pointer directly to the foreign
table row. Unlike the autonumber, this requires no index, so is
several times faster.

Do you really believe, for instance, that there is no index to the
data pages where memo fields are stored, so that retrieval of the
memo data with the record it belongs to is not quicker? There would
have to be internal structures doing the mapping, and they'd have to
be optimized for efficiency. Seems to me that it would just be a
bunch of hidden structures, as opposed to ones that you can create
and work with yourself.

The engine must keep track
of whether this link is valid or not, and use the lookup facility
using the key when it is not, all the time fixing up those links
that have been invalidated when there is idle time on the server.

Ack. That sounds terrible. What if the server never has idle time?
What if you're not using a server-based data engine? And what
happens to data requests between the start of the update and the
commit of it? Your proposal is adding a whole lot of new locking
issues, seems to me, and would be doing internally exactly what is
happening already in most database engines.

This would implement all the best features of both natural and
surrogate keys, and indeed out-perform either.

I really cannot see how. Perhaps you don't have this level of hand
waving in your planned long-form version of this, but I just don't
see how you get from where we are to what you propose without using
exactly the same kinds of structures we already use.

Once viewed in this way, the whole
issue is not really a debate between database designers, but a
failure of database engine designers. The debate we are having is
artificially created by a failure to build the database engine
optimally.

But the problem I see is that you'd be depending on the db engine to
determine something that is application-specific. That is,
uniqueness is a characteristic of the data used for a particular
purpose in a particular application, and natural keys in particular
are very dependent on that. I don't see how a database engine could
determine this without having an understanding of the way the entity
is being used in the particular application.

Please do not think I have no understanding or sympathy for the
point of view you have proposed. I quite agree with you on some
of the major aspects. My opinions on this have been
revolutionized in the past 2 years as I've come to realize the
underlying factors causing this debate.

I believe that, using the existing tools, there may be very good
cause to use surrogate keys for relationships. I'm not so sure
this ever happens in a database less than 2 GB. In my upcoming
paper on the subject, I have created databases and contrasted
there performances using both paradigms, with both Jet and SQL
Server. The results may surprise many.

Your proposed system would use surrogate GUIDs to maintain the
links. I can't see how it could work otherwise. And you'd still have
to maintain uniqueness of the actual attributes with indexes you
create yourself. So, I see absolutely no difference between the
current surrogate key type of RI and what you're proposing.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**David W. Fenton** · April 9th, 2007, 01:39 AM posted to microsoft.public.access

"Tom Ellison" wrote in
:

Are you talking about a 1:M relationship in which the foreign key
table is not indexed on the related column(s)? How can you
enforce uniqueness in the foreign table without a unique index?
But then you speak of HAVING a relationship without RI. Are you
speaking of a relationship in which both tables may have rows that
do not participate in the relationship? Without RI, you are
permitting that to happen!

I believe he was saying something very specific about Jet databases
and Access -- that when you create a relatinship, a hidden index is
created on the foreign key field.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

**Tom Wickerath** · April 9th, 2007, 01:52 AM posted to microsoft.public.access

Exactly.

Tom Wickerath
Microsoft Access MVP
https://mvp.support.microsoft.com/profile/Tom
http://www.access.qbuilt.com/html/ex...tributors.html
__________________________________________

"David W. Fenton" wrote:

I believe he was saying something very specific about Jet databases
and Access -- that when you create a relatinship, a hidden index is
created on the foreign key field.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode