If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#11
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
You can join on any column, or set of columns, whether it is a primary key
or not. You can join on any key that is unique or not unique. True enough. What I said is that one of the keys in the relationship must be unique to prevent the indeterminate relationship that the OP was experiencing. In an M:M relationship, neither is unique, and therefore an autonumber is NEVER used in an M:M relationship. What??? I think you'd better have a look at the Northwind sample database. Consider the Products, Order Details and Orders tables. I'm sure you know that the Order Details table serves as the intersection table, to define a many-to-many relationship between Products and Orders. What type of primary key is being used in the Products and Orders tables? I see Autonumber in both tables. The key on the 1 side of a relationship is unique, but need not be an autonumber. I never said it needed to be an autonumber. I happen to have a strong preference for using autonumber, and I also know from past threads that you are in the opposite camp, with a preference for natural keys. Indeed, I can get more data into my databases by having fewer columns and fewer indexes. I think that you will find that you get LESS data into your database. Of course, it depends to some extent on the number of characters in your text-based PK field. Try the following experiment using the Northwind sample database, which includes a Text (5) data type for the PK / FK combination between the Customers and Orders tables. I used Access 2003, with the default Access 2000 file format: 1.) Create a brand new .mdb file. Turn off NameAutocorrect (we don't want to include any possible overhead due to name mappings). 2.) Import the Customers and Orders table from a clean copy of Northwind. Include the relationship with your import. 3.) Compact the database. I get a file that is 408 KB in size for the text-based PK/FK. ________________________________________ 4.) Copy the above database to a new filename. 5.) Add a new field to the Orders table: Number / Long Integer. Name it (fkCustomerID). Remove the default value of 0. 6.) Delete the relationship between the Customers and Orders tables. 7.) Remove the primary key from CustomerID. Add a new Autonumber data type (pkCustomerID) and set it as the primary key. 8.) Run the following update query: UPDATE Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID SET Orders.fkCustomerID = [Customers].[pkCustomerID]; 9.) Delete the new update query (if you saved it). Remove the text-based CustomerID fields from both tables. 10.) Create a new relationship, with enforced RI, between the pkCustomerID and fkCustomerID fields. 11.) Compact the database. I get a file that is 372 KB in size for the Autonumber PK / Long Integer FK combination. Granted, this is a very small set of data (92 customer records and 830 orders). As the number of customer and order records increases, or as other related tables are added, the difference in size will only become greater, with the advantage being for the autonumber / long integer field type in the relationship. Tom Wickerath Microsoft Access MVP https://mvp.support.microsoft.com/profile/Tom http://www.access.qbuilt.com/html/ex...tributors.html __________________________________________ "Tom Ellison" wrote: Dear Tom: You can join on any column, or set of columns, whether it is a primary key or not. You can join on any key that is unique or not unique. If the table is on a 1 side of a relationship, the key will be unique, and possibly a primary key. If on the M side, it will not be enforced unique (it certainly can be unique by choice or coincidence in individual cases - there can even be 0 rows on the M side of a relationship). In an M:M relationship, neither is unique, and therefore an autonumber is NEVER used in an M:M relationship. Such tables can have an autonumber columns, they just won't be involved in the relationship. The key on the 1 side of a relationship is unique, but need not be an autonumber. Any unique key of 1 or more columns will work. And, again, any unique key will do. It need not be the primary key, although it quite often is. I quite commonly create entire, complex databases without using any autonumber columns, using the natural keys as the primary keys. I have not found there to be any disadvantage in doing this for fairly small databases, by which I mean any less than 2GB, the Jet limit. Indeed, I can get more data into my databases by having fewer columns and fewer indexes. I am working hard on a paper to present a discussion on the difference between natural key relationships and surrogate (autonumber) key relationships. I know this has been a hot button issue for some time, and my recent findings will be surprising. I favor the advantages of both camps, and am preparing to present a proposal that will give the advantages of both systems, indeed, which will give additional advantages available with neither approach. See you and all my MVP freinds soon in the MVP private newsgroups. Tom Ellison Microsoft Access MVP (6 years) |
#12
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
Hi, Tom.
Tom Ellison Microsoft Access MVP (6 years) Let's see . . . 21 Dec. 2001 until early or mid-2005 and again 1 Jan. 2007 until today. In Gunny math, that's only three or four fingers, not six. Or were you an MVP for two or three years before and then reselected again in December 2001? And why would you need to state how many years you've been an MVP in a newsgroup post? You've never done that before -- or since. No other MVP does that in this newsgroup, either. Did you feel it was necessary in order to justify giving a lesson in elementary relational database design to a fellow Access MVP who hasn't been an MVP as long as you have? Wouldn't you feel insulted if that happened to you? Gunny See http://www.QBuilt.com for all your database needs. See http://www.Access.QBuilt.com for Microsoft Access tips and tutorials. Blogs: www.DataDevilDog.BlogSpot.com, www.DatabaseTips.BlogSpot.com http://www.Access.QBuilt.com/html/ex...ributors2.html for contact info. "Tom Ellison" wrote in message ... Dear Tom: You can join on any column, or set of columns, whether it is a primary key or not. You can join on any key that is unique or not unique. If the table is on a 1 side of a relationship, the key will be unique, and possibly a primary key. If on the M side, it will not be enforced unique (it certainly can be unique by choice or coincidence in individual cases - there can even be 0 rows on the M side of a relationship). In an M:M relationship, neither is unique, and therefore an autonumber is NEVER used in an M:M relationship. Such tables can have an autonumber columns, they just won't be involved in the relationship. The key on the 1 side of a relationship is unique, but need not be an autonumber. Any unique key of 1 or more columns will work. And, again, any unique key will do. It need not be the primary key, although it quite often is. I quite commonly create entire, complex databases without using any autonumber columns, using the natural keys as the primary keys. I have not found there to be any disadvantage in doing this for fairly small databases, by which I mean any less than 2GB, the Jet limit. Indeed, I can get more data into my databases by having fewer columns and fewer indexes. I am working hard on a paper to present a discussion on the difference between natural key relationships and surrogate (autonumber) key relationships. I know this has been a hot button issue for some time, and my recent findings will be surprising. I favor the advantages of both camps, and am preparing to present a proposal that will give the advantages of both systems, indeed, which will give additional advantages available with neither approach. See you and all my MVP freinds soon in the MVP private newsgroups. Tom Ellison Microsoft Access MVP (6 years) "Tom Wickerath" AOS168b AT comcast DOT net wrote in message news Hi David, That should be "one or both of the fields is uniquely indexed", as you need both for a 1:1 relationship. Did you read the rest of my post, where I wrote, where I wrote: "For your 1:1 relationship, you can join two primary key fields, ... " Okay, so perhaps I should have wrote: "For your 1:1 relationship, you MUST join two primary key fields, ... " Since 1:1 relationships are fairly rare, compared to 1:M relationships, writing it as you suggest might cause more user frustration if they want to create a 1:M relationship and they tried using uniquely indexed fields on both sides of the join. Tom Wickerath Microsoft Access MVP https://mvp.support.microsoft.com/profile/Tom http://www.access.qbuilt.com/html/ex...tributors.html __________________________________________ "David W. Fenton" wrote: Tom Wickerath AOS168b AT comcast DOT net wrote in : 1.) Fields of the same data type (For example, Autonumber --- Long Integer) 2.) One of the fields is uniquely indexed (either set as a Primary key, or a unique index has been created first) That should be "one or both of the fields is uniquely indexed", as you need both for a 1:1 relationship. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#13
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
Tom Wickerath AOS168b AT comcast DOT net wrote in
news That should be "one or both of the fields is uniquely indexed", as you need both for a 1:1 relationship. Did you read the rest of my post, where I wrote, where I wrote: "For your 1:1 relationship, you can join two primary key fields, ... " Apparently not! Okay, so perhaps I should have wrote: "For your 1:1 relationship, you MUST join two primary key fields, ... " I tend to read lists/bullet points as being inclusive and that's why I skipped the details at the end. Since 1:1 relationships are fairly rare, compared to 1:M relationships, writing it as you suggest might cause more user frustration if they want to create a 1:M relationship and they tried using uniquely indexed fields on both sides of the join. Perhaps. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#14
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
"Tom Ellison" wrote in
: I quite commonly create entire, complex databases without using any autonumber columns, using the natural keys as the primary keys. I have not found there to be any disadvantage in doing this for fairly small databases, by which I mean any less than 2GB, the Jet limit. Indeed, I can get more data into my databases by having fewer columns and fewer indexes. Single-column natural keys *do* add overhead if they are text fields because long integer columns can be stored more efficiently. I also find the update issues very problematic. For multi-column natural keys, the surrogate key is vastly more efficient, since for the former, all the data has to be repeated, which rather defeats the purpose of normalization. I use natural PKs only for lookup tables, and that not very often -- only when I am absolutely certain the values will *never* change, ever. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#15
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
Dear David:
As long a we can discuss things in a way that is not acrimoniuous, which seems quite possible given your response. I'll say there's nothing in your response I haven't heard before. A natural key that is text may be longer in the dependent table than putting a surrogate key in the dependent table. On the other hand, you must add the identity surrogate column in the foreign key table, and you must index this additional column. There are trade-offs in the size of the database, and it is not always the case that there is any noticable difference in the size of the database. Also, I do not consider the size of the database to be a critical factor. Hard disk space is not all that expensive. And size does not, in this case, equate to performance. I would say that "long integer columns can be stored more efficiently" is fairly insignificant if what you're saying is it saves disk space. If you're talking performance, then again there are significant trade-offs. I quite disagree that repeating the data in the dependent table, even with a multi-column key, is a normalization issue. I do not agree that this violates any of the rules of normalization. If you point out that it repeats those columns that were in the foreign table, then I respond that having an identity column in the foreign table and repeating that in the dependent table must be a violation of the same rule. It seems obvious then that there is no such rule. It is not "vastly more efficient". It is not uncommon to write a query that requires one of more columns to be displayed that are part of the natural key from the foreign key table. If such columns are repeated in the dependent table, then putting them on a report will not require a JOIN to the foreign table. Since they already exist in the dependent table, you can retrive them directly. In this not uncommon case, there need be no lookup to retrieve those values. I expect you are referring to avoiding cascade updates when you speak of using natural PKs only "when the values will *never* change". The rules of good database design actually state that the natural key should rarely change, not never, but the point is pretty much the same. In some of my designs, such cascades events are recorded. A system can go for a few months between such occurrances. So, I largely agree whith you, if you will permit. There's quite a bit to this discussion, and I'm preparing a series, if not a book, about this very thing. Frankly, I find there are very cogent arguments on both sides of the discussion. But, finally, I find that the disagreement is entirely artificial. The real problem is that the issue should have been addressed inside the design of the database engine (Jet, SQL Server, etc.) and could then be entirely transparent to both users and designers. The facility provided by surrogate "identity" keys could have been addressed within the engine, automatically, and this can be done with an efficiency vastly greater than either method implemented by we database designers. That is, the problem is artificial, and its best solution lies outside what you and I can effect. I propose that, for a 1:M relationship (the most common) the database engine maintain the link between the foreign and dependent tables as an internal pointer directly to the foreign table row. Unlike the autonumber, this requires no index, so is several times faster. The engine must keep track of whether this link is valid or not, and use the lookup facility using the key when it is not, all the time fixing up those links that have been invalidated when there is idle time on the server. This would implement all the best features of both natural and surrogate keys, and indeed out-perform either. Once viewed in this way, the whole issue is not really a debate between database designers, but a failure of database engine designers. The debate we are having is artificially created by a failure to build the database engine optimally. Please do not think I have no understanding or sympathy for the point of view you have proposed. I quite agree with you on some of the major aspects. My opinions on this have been revolutionized in the past 2 years as I've come to realize the underlying factors causing this debate. I believe that, using the existing tools, there may be very good cause to use surrogate keys for relationships. I'm not so sure this ever happens in a database less than 2 GB. In my upcoming paper on the subject, I have created databases and contrasted there performances using both paradigms, with both Jet and SQL Server. The results may surprise many. Thanks for engaging in this discussion civilly. See you around! Tom Ellison Microsoft Access MVP "David W. Fenton" wrote in message . 1... "Tom Ellison" wrote in : I quite commonly create entire, complex databases without using any autonumber columns, using the natural keys as the primary keys. I have not found there to be any disadvantage in doing this for fairly small databases, by which I mean any less than 2GB, the Jet limit. Indeed, I can get more data into my databases by having fewer columns and fewer indexes. Single-column natural keys *do* add overhead if they are text fields because long integer columns can be stored more efficiently. I also find the update issues very problematic. For multi-column natural keys, the surrogate key is vastly more efficient, since for the former, all the data has to be repeated, which rather defeats the purpose of normalization. I use natural PKs only for lookup tables, and that not very often -- only when I am absolutely certain the values will *never* change, ever. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#16
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
Tom Ellison:
I'd like to respond to two points you made in your response to David. On the other hand, you must add the identity surrogate column in the foreign key table, and you must index this additional column. The last part of this sentence, "and you must index this additional column", is true *only* if you create a relationship WITHOUT enforcing referential integrity. Creating relationships without enforcing RI is little more than an exercise in drawing lines. If you will create the relationship with enforced RI, there is no sense in indexing the foreign key as well, because JET will automatically index this foreign key field as soon as you enforce RI. Note: This new index will not show up in the Indexes view, in table design, but you can reveal that it is present with some simple VBA code. Also, I do not consider the size of the database to be a critical factor. Hard disk space is not all that expensive. Perhaps not, but you did make the following statement earlier, which appeared to be a statement in support of using natural keys instead of surrogate (autonumber) keys: Indeed, I can get more data into my databases by having fewer columns and fewer indexes. Tom Wickerath Microsoft Access MVP https://mvp.support.microsoft.com/profile/Tom http://www.access.qbuilt.com/html/ex...tributors.html __________________________________________ "Tom Ellison" wrote: Dear David: As long a we can discuss things in a way that is not acrimoniuous, which seems quite possible given your response. I'll say there's nothing in your response I haven't heard before. A natural key that is text may be longer in the dependent table than putting a surrogate key in the dependent table. On the other hand, you must add the identity surrogate column in the foreign key table, and you must index this additional column. There are trade-offs in the size of the database, and it is not always the case that there is any noticable difference in the size of the database. Also, I do not consider the size of the database to be a critical factor. Hard disk space is not all that expensive. And size does not, in this case, equate to performance. I would say that "long integer columns can be stored more efficiently" is fairly insignificant if what you're saying is it saves disk space. If you're talking performance, then again there are significant trade-offs. I quite disagree that repeating the data in the dependent table, even with a multi-column key, is a normalization issue. I do not agree that this violates any of the rules of normalization. If you point out that it repeats those columns that were in the foreign table, then I respond that having an identity column in the foreign table and repeating that in the dependent table must be a violation of the same rule. It seems obvious then that there is no such rule. It is not "vastly more efficient". It is not uncommon to write a query that requires one of more columns to be displayed that are part of the natural key from the foreign key table. If such columns are repeated in the dependent table, then putting them on a report will not require a JOIN to the foreign table. Since they already exist in the dependent table, you can retrive them directly. In this not uncommon case, there need be no lookup to retrieve those values. I expect you are referring to avoiding cascade updates when you speak of using natural PKs only "when the values will *never* change". The rules of good database design actually state that the natural key should rarely change, not never, but the point is pretty much the same. In some of my designs, such cascades events are recorded. A system can go for a few months between such occurrances. So, I largely agree whith you, if you will permit. There's quite a bit to this discussion, and I'm preparing a series, if not a book, about this very thing. Frankly, I find there are very cogent arguments on both sides of the discussion. But, finally, I find that the disagreement is entirely artificial. The real problem is that the issue should have been addressed inside the design of the database engine (Jet, SQL Server, etc.) and could then be entirely transparent to both users and designers. The facility provided by surrogate "identity" keys could have been addressed within the engine, automatically, and this can be done with an efficiency vastly greater than either method implemented by we database designers. That is, the problem is artificial, and its best solution lies outside what you and I can effect. I propose that, for a 1:M relationship (the most common) the database engine maintain the link between the foreign and dependent tables as an internal pointer directly to the foreign table row. Unlike the autonumber, this requires no index, so is several times faster. The engine must keep track of whether this link is valid or not, and use the lookup facility using the key when it is not, all the time fixing up those links that have been invalidated when there is idle time on the server. This would implement all the best features of both natural and surrogate keys, and indeed out-perform either. Once viewed in this way, the whole issue is not really a debate between database designers, but a failure of database engine designers. The debate we are having is artificially created by a failure to build the database engine optimally. Please do not think I have no understanding or sympathy for the point of view you have proposed. I quite agree with you on some of the major aspects. My opinions on this have been revolutionized in the past 2 years as I've come to realize the underlying factors causing this debate. I believe that, using the existing tools, there may be very good cause to use surrogate keys for relationships. I'm not so sure this ever happens in a database less than 2 GB. In my upcoming paper on the subject, I have created databases and contrasted there performances using both paradigms, with both Jet and SQL Server. The results may surprise many. Thanks for engaging in this discussion civilly. See you around! Tom Ellison Microsoft Access MVP |
#17
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
Dear Tom:
Are you talking about a 1:M relationship in which the foreign key table is not indexed on the related column(s)? How can you enforce uniqueness in the foreign table without a unique index? But then you speak of HAVING a relationship without RI. Are you speaking of a relationship in which both tables may have rows that do not participate in the relationship? Without RI, you are permitting that to happen! Have you considered what happens to the performance of a query when a JOIN has no index it can use, and must perform a table scan for every instance of a row in the FK table? What kind of database design is this? My statement about having space in the database for more information was a statement about just that, space. With a 2 GB limit for Jet and MSDE, this can be an issue. My other statements were about performance, not about running out of room. These are sometimes competing requirements, and are certainly not the same thing. In many cases, it is desirable to trade space for performance. That is exactly the case when deciding to create indexes. Tom Ellison Microsoft Access MVP "Tom Wickerath" AOS168b AT comcast DOT net wrote in message news Tom Ellison: I'd like to respond to two points you made in your response to David. On the other hand, you must add the identity surrogate column in the foreign key table, and you must index this additional column. The last part of this sentence, "and you must index this additional column", is true *only* if you create a relationship WITHOUT enforcing referential integrity. Creating relationships without enforcing RI is little more than an exercise in drawing lines. If you will create the relationship with enforced RI, there is no sense in indexing the foreign key as well, because JET will automatically index this foreign key field as soon as you enforce RI. Note: This new index will not show up in the Indexes view, in table design, but you can reveal that it is present with some simple VBA code. Also, I do not consider the size of the database to be a critical factor. Hard disk space is not all that expensive. Perhaps not, but you did make the following statement earlier, which appeared to be a statement in support of using natural keys instead of surrogate (autonumber) keys: Indeed, I can get more data into my databases by having fewer columns and fewer indexes. Tom Wickerath Microsoft Access MVP https://mvp.support.microsoft.com/profile/Tom http://www.access.qbuilt.com/html/ex...tributors.html __________________________________________ "Tom Ellison" wrote: Dear David: As long a we can discuss things in a way that is not acrimoniuous, which seems quite possible given your response. I'll say there's nothing in your response I haven't heard before. A natural key that is text may be longer in the dependent table than putting a surrogate key in the dependent table. On the other hand, you must add the identity surrogate column in the foreign key table, and you must index this additional column. There are trade-offs in the size of the database, and it is not always the case that there is any noticable difference in the size of the database. Also, I do not consider the size of the database to be a critical factor. Hard disk space is not all that expensive. And size does not, in this case, equate to performance. I would say that "long integer columns can be stored more efficiently" is fairly insignificant if what you're saying is it saves disk space. If you're talking performance, then again there are significant trade-offs. I quite disagree that repeating the data in the dependent table, even with a multi-column key, is a normalization issue. I do not agree that this violates any of the rules of normalization. If you point out that it repeats those columns that were in the foreign table, then I respond that having an identity column in the foreign table and repeating that in the dependent table must be a violation of the same rule. It seems obvious then that there is no such rule. It is not "vastly more efficient". It is not uncommon to write a query that requires one of more columns to be displayed that are part of the natural key from the foreign key table. If such columns are repeated in the dependent table, then putting them on a report will not require a JOIN to the foreign table. Since they already exist in the dependent table, you can retrive them directly. In this not uncommon case, there need be no lookup to retrieve those values. I expect you are referring to avoiding cascade updates when you speak of using natural PKs only "when the values will *never* change". The rules of good database design actually state that the natural key should rarely change, not never, but the point is pretty much the same. In some of my designs, such cascades events are recorded. A system can go for a few months between such occurrances. So, I largely agree whith you, if you will permit. There's quite a bit to this discussion, and I'm preparing a series, if not a book, about this very thing. Frankly, I find there are very cogent arguments on both sides of the discussion. But, finally, I find that the disagreement is entirely artificial. The real problem is that the issue should have been addressed inside the design of the database engine (Jet, SQL Server, etc.) and could then be entirely transparent to both users and designers. The facility provided by surrogate "identity" keys could have been addressed within the engine, automatically, and this can be done with an efficiency vastly greater than either method implemented by we database designers. That is, the problem is artificial, and its best solution lies outside what you and I can effect. I propose that, for a 1:M relationship (the most common) the database engine maintain the link between the foreign and dependent tables as an internal pointer directly to the foreign table row. Unlike the autonumber, this requires no index, so is several times faster. The engine must keep track of whether this link is valid or not, and use the lookup facility using the key when it is not, all the time fixing up those links that have been invalidated when there is idle time on the server. This would implement all the best features of both natural and surrogate keys, and indeed out-perform either. Once viewed in this way, the whole issue is not really a debate between database designers, but a failure of database engine designers. The debate we are having is artificially created by a failure to build the database engine optimally. Please do not think I have no understanding or sympathy for the point of view you have proposed. I quite agree with you on some of the major aspects. My opinions on this have been revolutionized in the past 2 years as I've come to realize the underlying factors causing this debate. I believe that, using the existing tools, there may be very good cause to use surrogate keys for relationships. I'm not so sure this ever happens in a database less than 2 GB. In my upcoming paper on the subject, I have created databases and contrasted there performances using both paradigms, with both Jet and SQL Server. The results may surprise many. Thanks for engaging in this discussion civilly. See you around! Tom Ellison Microsoft Access MVP |
#18
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
"Tom Ellison" wrote in
: As long a we can discuss things in a way that is not acrimoniuous, which seems quite possible given your response. I don't know how to respond to that. Are you spoiling for a fight? I'll say there's nothing in your response I haven't heard before. I'm sure of that. But I still am baffled why anyone would still think natural keys are the way to go except in a small number of cases. A natural key that is text may be longer in the dependent table than putting a surrogate key in the dependent table. On the other hand, you must add the identity surrogate column in the foreign key table, and you must index this additional column. You lost me. Assuming we're talking about a single-column PK, the difference in storage space between a long integer and a text field is going to be proportional to the difference between the storage space required for the one vs. the other. Since both require storage in two tables in order to map the relationship, and because both have to be indexed, it seems to me that all you need to do is figure out the difference between the size of the two data types as stored. Only very small text fields are going to take up less space than a long integer (4 bytes? or is it 8?). So it seems quite obvious to me that natural keys are going to always be longer, unless they are numeric data (in which case they aren't really natural, as somebody has to maintain uniqueness of the numeric values). And the overhead of updating large indexes increases as the number of records increases, regardless of which is the larger table in the relationship (parent or child). There are trade-offs in the size of the database, and it is not always the case that there is any noticable difference in the size of the database. I would be much more concerned about the performance issues with the larger index data pages that will be requuired for a text-based natural key. Also, I do not consider the size of the database to be a critical factor. Hard disk space is not all that expensive. And size does not, in this case, equate to performance. In the case of indexes it certainly does, both in editing/updating and in retrieval. I would say that "long integer columns can be stored more efficiently" is fairly insignificant if what you're saying is it saves disk space. If you're talking performance, then again there are significant trade-offs. Ah! So you agree with what I've written above. I was only going with "space" as the starting point because that's the ground you chose to stake out in the discussion. I'm much more concerned with performance, too, and that's a major reason why I would consider a natural key only in a very limited number of instances. I quite disagree that repeating the data in the dependent table, even with a multi-column key, is a normalization issue. I do not agree that this violates any of the rules of normalization. If you point out that it repeats those columns that were in the foreign table, then I respond that having an identity column in the foreign table and repeating that in the dependent table must be a violation of the same rule. It seems obvious then that there is no such rule. I don't know what theory says. I don't really care. What I do know is that the more fields you have, the more they are likely to need to CASCADE UPDATES. That means a *huge* performance issue, because an update to any of the columns of the PK of the parent table will need to update the corresponding column in all the records of the child table. But with a surrogate key, which is never updated, you never ever have that issue. That you can keep the repeated data in synch at the engine level is dependent on which engine you use. But you are updating data in multiple locations. An autonumber surrogate key is never changed, never updated, so the issue with keeping repeated data synchronized simply vanishes. To me, that's the main reason to normalize, so that you don't have to maintain data in multiple locations. Even though it's usually done at the engine level, multi-column natural keys still require tye CPU cycles to keep the repeated data in synch. It is not "vastly more efficient". It is not uncommon to write a query that requires one of more columns to be displayed that are part of the natural key from the foreign key table. If such columns are repeated in the dependent table, then putting them on a report will not require a JOIN to the foreign table. Since they already exist in the dependent table, you can retrive them directly. In this not uncommon case, there need be no lookup to retrieve those values. But SQL-based database engines are optimized to make such joins every efficient. Secondly, in your version, the FK columns won't be updatable, but in many views, the data the FK points to in the parent table *will* be updatable. I expect you are referring to avoiding cascade updates when you speak of using natural PKs only "when the values will *never* change". The rules of good database design actually state that the natural key should rarely change, not never, but the point is pretty much the same. Well, now you get into the other major flaw of natural keys -- that for many entities, there are no reasonable candidate keys. What would you use as a natural key for a person? There is simply no viable natural key to identify a person in any application that is using real-world data and is not handing off the problem to some other PK-assigning authority (like the Social Security Administration, which is itself problematic, since, over time, that PK gets re-used). In some of my designs, such cascades events are recorded. A system can go for a few months between such occurrances. So, I largely agree whith you, if you will permit. There are lots of entities where the only candidate natural keys will need to be updated often. And, of course, there can't be any Null fields in the compound PK, which means storing default values that will then have to be eliminated in reporting, etc. There's quite a bit to this discussion, and I'm preparing a series, if not a book, about this very thing. Frankly, I find there are very cogent arguments on both sides of the discussion. But, finally, I find that the disagreement is entirely artificial. The real problem is that the issue should have been addressed inside the design of the database engine (Jet, SQL Server, etc.) and could then be entirely transparent to both users and designers. The facility provided by surrogate "identity" keys could have been addressed within the engine, automatically, and this can be done with an efficiency vastly greater than either method implemented by we database designers. That is, the problem is artificial, and its best solution lies outside what you and I can effect. I'd be interested to see your thoughts on that at more length. It intrigures me as a concept, and it's certainly a general principle that I would support, but I'm afraid that in so many cases I'm not happy when such things are actually done for me, behind the scenes. Some examples: Name Autocorrect, Multi-value fields. I propose that, for a 1:M relationship (the most common) the database engine maintain the link between the foreign and dependent tables as an internal pointer directly to the foreign table row. Unlike the autonumber, this requires no index, so is several times faster. Do you really believe, for instance, that there is no index to the data pages where memo fields are stored, so that retrieval of the memo data with the record it belongs to is not quicker? There would have to be internal structures doing the mapping, and they'd have to be optimized for efficiency. Seems to me that it would just be a bunch of hidden structures, as opposed to ones that you can create and work with yourself. The engine must keep track of whether this link is valid or not, and use the lookup facility using the key when it is not, all the time fixing up those links that have been invalidated when there is idle time on the server. Ack. That sounds terrible. What if the server never has idle time? What if you're not using a server-based data engine? And what happens to data requests between the start of the update and the commit of it? Your proposal is adding a whole lot of new locking issues, seems to me, and would be doing internally exactly what is happening already in most database engines. This would implement all the best features of both natural and surrogate keys, and indeed out-perform either. I really cannot see how. Perhaps you don't have this level of hand waving in your planned long-form version of this, but I just don't see how you get from where we are to what you propose without using exactly the same kinds of structures we already use. Once viewed in this way, the whole issue is not really a debate between database designers, but a failure of database engine designers. The debate we are having is artificially created by a failure to build the database engine optimally. But the problem I see is that you'd be depending on the db engine to determine something that is application-specific. That is, uniqueness is a characteristic of the data used for a particular purpose in a particular application, and natural keys in particular are very dependent on that. I don't see how a database engine could determine this without having an understanding of the way the entity is being used in the particular application. Please do not think I have no understanding or sympathy for the point of view you have proposed. I quite agree with you on some of the major aspects. My opinions on this have been revolutionized in the past 2 years as I've come to realize the underlying factors causing this debate. I believe that, using the existing tools, there may be very good cause to use surrogate keys for relationships. I'm not so sure this ever happens in a database less than 2 GB. In my upcoming paper on the subject, I have created databases and contrasted there performances using both paradigms, with both Jet and SQL Server. The results may surprise many. Your proposed system would use surrogate GUIDs to maintain the links. I can't see how it could work otherwise. And you'd still have to maintain uniqueness of the actual attributes with indexes you create yourself. So, I see absolutely no difference between the current surrogate key type of RI and what you're proposing. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#19
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
"Tom Ellison" wrote in
: Are you talking about a 1:M relationship in which the foreign key table is not indexed on the related column(s)? How can you enforce uniqueness in the foreign table without a unique index? But then you speak of HAVING a relationship without RI. Are you speaking of a relationship in which both tables may have rows that do not participate in the relationship? Without RI, you are permitting that to happen! I believe he was saying something very specific about Jet databases and Access -- that when you create a relatinship, a hidden index is created on the foreign key field. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#20
|
|||
|
|||
Why do all my Relationships show as Indeterminate ?
Exactly.
Tom Wickerath Microsoft Access MVP https://mvp.support.microsoft.com/profile/Tom http://www.access.qbuilt.com/html/ex...tributors.html __________________________________________ "David W. Fenton" wrote: I believe he was saying something very specific about Jet databases and Access -- that when you create a relatinship, a hidden index is created on the foreign key field. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
Thread Tools | |
Display Modes | |
|
|