If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#11
|
|||
|
|||
Design question (normalization versus practical)
Thanks for this explanation with examples which helps me understand it
better. Lars "KenSheridan via AccessMonster.com" u51882@uwe schreef in bericht news:9dbfb0fa2f3e1@uwe... Lars: Its true that a purpose of normalization is to eliminate redundancy, but that in itself is not the real objective. Redundancy allows for inconsistent data or update anomalies, so by its elimination the integrity of the data is protected, which is the important thing. An example of lack of proper normalization can be found in the sample Northwind database's Customers table. You'll see that this has City, Region and Country columns so we are told numerous times that São Paulo is in SP region (as is Resende) and that SP region is in Brazil. Not only does this require repetitive data entry, but more importantly it opens up the risk of inconsistent data, e.g. it would be perfectly possible to put São Paulo in California in one row and California in Ireland! Proper normalization would prevent this as the fact that São Paulo is in SP region would be stored only once in the database as would the fact that SP region is in Brazil and that California is in the USA. An example of what at first sight might seem to be redundancy, but in fact is not, can also be found in Northwind. The Products table and the OrderDetails table both have UnitPrice columns. It might be thought that the unit price of a product could always be looked up from the Products table, so its unnecessary in Order Details. However, the unit price of a product will change over time, but each order needs to retain the price in force at the time the order was created. Consequently a UnitPrice column is needed in both tables; that in products holds the current price and is used to get the value for that in Order Details (code in the ProductID control's AfterUpdate event procedure in the Order Details Subform does this), which then remains static when the current price (in products) changes. In each case UnitPrice is functionally dependent solely on the whole of the key of the table, so there is no redundancy. Its often said that having separate columns to represent different values of an attribute, e.g. if we were to have separate columns British and Irish to cater for hybrids like me (an absurd example, I know, but it serves to illustrate the point), is an example of lack of normalization, but its really more a case of a breach of Codd's Rule 1, The Information Rule, which as restated by Date in a slightly expanded form is: 'The entire information content of the database is represented in one and only one way, namely as explicit values in column positions in rows in tables'. C J Date - Introduction to Database Systems; 7th Edition; 2000 Having a separate column for each nationality is what is known as (I think these are Date's word's also) 'encoding data as column headings', i.e. it's the names of the column which are the data values, 'British' and 'Irish' rather than the data being stored as values of type Nationality in separate rows as Codd's rule requires. It could be said that having the multiple columns means that the table is not in First Normal Form: 'A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute.' But this requires that we regard the attribute as being Nationality, but no such column exists of course, so while the row (loosely speaking the equivalent of a tuple) does contain more than one value of attribute Nationality, can we say this is the case when no such column exists? This is why I feel that its not really a normalization issue, but a breach at the more fundamental level of the Information Rule. Its an interesting philosophical point, but however we regard it, I think we can conclude that its not a correct approach in terms of the database relational model. Ken Sheridan Stafford, England Lars Brownies wrote: Thanks Ken. A 'surrogate' numeric key is only required if the values in a column in a table are not distinct, e.g. a Cities table needs a CityID primary key as city names can be legitimately duplicated I thought by using a surrogate numeric key in stead of the text itself, you'd save database space. Also, I though the second rule of normalization was to eliminate redundant data, but I just reread and that's only applicable when the ID-number is part of a multivalued key. Nationality was a bad example, but you got my point :-) Lars I could go on. [quoted text clipped - 5 lines] many relationship (one modelling the relationship by resolving it into two one-to-many relationships). A 'surrogate' numeric key is only required if the values in a column in a table are not distinct, e.g. a Cities table [quoted text clipped - 67 lines] Thanks, Lars -- Message posted via http://www.accessmonster.com |
#12
|
|||
|
|||
Design question (normalization versus practical)
On Sun, 18 Oct 2009 08:14:13 +0200, "Lars Brownies" wrote:
Status, Entered_by, Category_1, Category_2, etc. BEEEEPPPP! The above examples are probably *not* repeating fields. Category_1, Category_2 almost certainly ARE repeating fields. In this particular case the status is the phase persons are currently in (regarding the proces for applying a job). It's a field for selections in reports. There's currently no need to keep track of the status changes, just one value for each person. However, I do feel it's better to make this a 1xM relation and do keep track of the status changes but it will cost me a lot more developing time. For instance people can't use the build-in filter option on the subform so I have to build extra forms/code to make this work. And I have limited amount of time for this non paid job. Entered_by is the user who initially enters the record. There can be only one user per record who does this. I also have a 1xm table in which users can type in their remarks. Status and Entered_By certainly sound like 1:n examples, and if you have a lookup table to control the vocabulary using a combo box, with a relationship with referential integrity enforced, then I agree that you don't need a many to many resolver table. Category_1 (in fact is Experience level) can have values from 1 to 3. A person can only have one value and there's no need to keep track of changes in this field. But what about Category_2 and Category_3? THAT'S my objection. If you have three different categories, isn't that ipso facto a many to many relationship? Each Person can be in zero, one, two or three categories (one of them being Experience Level, another being Skill Level, whatever); each category will have many people. Aren't you in fact storing data ("what kind of category") in a fieldname, rather than properly storing in a field? -- John W. Vinson [MVP] |
#13
|
|||
|
|||
Design question (normalization versus practical)
But what about Category_2 and Category_3? THAT'S my objection. If you have three different categories, isn't that ipso facto a many to many relationship? Each Person can be in zero, one, two or three categories (one of them being Experience Level, another being Skill Level, whatever); each category will have many people. Aren't you in fact storing data ("what kind of category") in a fieldname, rather than properly storing in a field? OK, I got it. Thanks for sticking with me. Lars "John W. Vinson" schreef in bericht ... On Sun, 18 Oct 2009 08:14:13 +0200, "Lars Brownies" wrote: Status, Entered_by, Category_1, Category_2, etc. BEEEEPPPP! The above examples are probably *not* repeating fields. Category_1, Category_2 almost certainly ARE repeating fields. In this particular case the status is the phase persons are currently in (regarding the proces for applying a job). It's a field for selections in reports. There's currently no need to keep track of the status changes, just one value for each person. However, I do feel it's better to make this a 1xM relation and do keep track of the status changes but it will cost me a lot more developing time. For instance people can't use the build-in filter option on the subform so I have to build extra forms/code to make this work. And I have limited amount of time for this non paid job. Entered_by is the user who initially enters the record. There can be only one user per record who does this. I also have a 1xm table in which users can type in their remarks. Status and Entered_By certainly sound like 1:n examples, and if you have a lookup table to control the vocabulary using a combo box, with a relationship with referential integrity enforced, then I agree that you don't need a many to many resolver table. Category_1 (in fact is Experience level) can have values from 1 to 3. A person can only have one value and there's no need to keep track of changes in this field. But what about Category_2 and Category_3? THAT'S my objection. If you have three different categories, isn't that ipso facto a many to many relationship? Each Person can be in zero, one, two or three categories (one of them being Experience Level, another being Skill Level, whatever); each category will have many people. Aren't you in fact storing data ("what kind of category") in a fieldname, rather than properly storing in a field? -- John W. Vinson [MVP] |
#14
|
|||
|
|||
Design question (normalization versus practical)
KenSheridan via AccessMonster.com wrote:
4. Querying the database is very much simpler. I could go on… I can believe that point for someone like yourself who can do complex joins in his sleep. For most, properly normalizing the data will increase the complexity of most of the queries required. In: http://groups.google.com/group/micro...0b387c35944260 I said: "That's definitely great advice and a great article. However, the SQL programmer also needs to be aware that doing things properly with a many-to-many relationship can cause future SQL creation and maintenance to become much more difficult if more than a few other tables start getting joined in along with the many-to-many tables. It's not anything that can't be overcome, but things can suddenly get a bit more complex when you have, say, a few many-to-many joins in the mix. For a small database, I think adding a single many-to-many relationship is a very good way for an Access programmer to get used to the SQL that she needs to create. For a larger database with existing many-to-many relationships, I think that the database planner should sit down with the schemata to convince herself that she can create the queries that she knows she'll need in the near future before adding another many-to-many relationship. That way, the painful trade-offs that might come into play will become clearer. That said, based on past experience, it's good to normalize as much as one's SQL abilities and available time allow." Normalization does a lot of wonderful things, but I don't count simplifying queries as one of them. It's good that you continue to stress the importance of normalization. James A. Fortune |
#15
|
|||
|
|||
Design question (normalization versus practical)
James:
Chris Date himself did say in an interview with Mark Whitehorn some years ago that normalization can be taken too far. In the same interview he also said, and I paraphrase here, "but it’s the only bit of science we've got". By 'science' he means something which is capable of expression formally (and therefore unambiguously). The trick, as I see it, is knowing when to stop, and the principal criterion for that is adequate and realistic protection of data integrity. I've always thought it a great shame that Microsoft left the Customers table in Northwind so badly lacking in normalization. Given that Northwind is probably most new Access users' first encounter with a relational database application, and that adding a few more tables is all that's required, it’s a very poor paradigm. It falls well short of the above criterion. Ken Sheridan Stafford, England James A. Fortune wrote: 4. Querying the database is very much simpler. I could go on… I can believe that point for someone like yourself who can do complex joins in his sleep. For most, properly normalizing the data will increase the complexity of most of the queries required. In: http://groups.google.com/group/micro...0b387c35944260 I said: "That's definitely great advice and a great article. However, the SQL programmer also needs to be aware that doing things properly with a many-to-many relationship can cause future SQL creation and maintenance to become much more difficult if more than a few other tables start getting joined in along with the many-to-many tables. It's not anything that can't be overcome, but things can suddenly get a bit more complex when you have, say, a few many-to-many joins in the mix. For a small database, I think adding a single many-to-many relationship is a very good way for an Access programmer to get used to the SQL that she needs to create. For a larger database with existing many-to-many relationships, I think that the database planner should sit down with the schemata to convince herself that she can create the queries that she knows she'll need in the near future before adding another many-to-many relationship. That way, the painful trade-offs that might come into play will become clearer. That said, based on past experience, it's good to normalize as much as one's SQL abilities and available time allow." Normalization does a lot of wonderful things, but I don't count simplifying queries as one of them. It's good that you continue to stress the importance of normalization. James A. Fortune -- Message posted via AccessMonster.com http://www.accessmonster.com/Uwe/For...ccess/200910/1 |
#16
|
|||
|
|||
Design question (normalization versus practical)
I wouldn't want to argue with you, Jim, but I observe that ignoring
relational database design principles and proper normalization is a prescription for hair removal (by the handful) sometime in the future. It can make it downright ugly trying to retrieve and work with data, and sometimes darn near impossible. That said, as a practical matter, third normal form is about as far as most business databases need to take normalization. Larry Linson Microsoft Office Access MVP "James A. Fortune" wrote in message ... KenSheridan via AccessMonster.com wrote: 4. Querying the database is very much simpler. I could go on. I can believe that point for someone like yourself who can do complex joins in his sleep. For most, properly normalizing the data will increase the complexity of most of the queries required. In: http://groups.google.com/group/micro...0b387c35944260 I said: "That's definitely great advice and a great article. However, the SQL programmer also needs to be aware that doing things properly with a many-to-many relationship can cause future SQL creation and maintenance to become much more difficult if more than a few other tables start getting joined in along with the many-to-many tables. It's not anything that can't be overcome, but things can suddenly get a bit more complex when you have, say, a few many-to-many joins in the mix. For a small database, I think adding a single many-to-many relationship is a very good way for an Access programmer to get used to the SQL that she needs to create. For a larger database with existing many-to-many relationships, I think that the database planner should sit down with the schemata to convince herself that she can create the queries that she knows she'll need in the near future before adding another many-to-many relationship. That way, the painful trade-offs that might come into play will become clearer. That said, based on past experience, it's good to normalize as much as one's SQL abilities and available time allow." Normalization does a lot of wonderful things, but I don't count simplifying queries as one of them. It's good that you continue to stress the importance of normalization. James A. Fortune |
#17
|
|||
|
|||
Design question (normalization versus practical)
Larry:
I've never liked this "3NF is usually good enough" view. Its good enough if that's what the business model requires; if it requires normalization to a higher level then that's what's good enough. I know its often said, but I think it tends to make people think that's all they have to aim at regardless of the underlying reality. No free lunches in this game! Ken Sheridan Stafford, England Larry Linson wrote: I wouldn't want to argue with you, Jim, but I observe that ignoring relational database design principles and proper normalization is a prescription for hair removal (by the handful) sometime in the future. It can make it downright ugly trying to retrieve and work with data, and sometimes darn near impossible. That said, as a practical matter, third normal form is about as far as most business databases need to take normalization. Larry Linson Microsoft Office Access MVP 4. Querying the database is very much simpler. [quoted text clipped - 37 lines] James A. Fortune -- Message posted via AccessMonster.com http://www.accessmonster.com/Uwe/For...ccess/200910/1 |
#18
|
|||
|
|||
Design question (normalization versus practical)
Larry Linson wrote:
I wouldn't want to argue with you, Jim, but I observe that ignoring relational database design principles and proper normalization is a prescription for hair removal (by the handful) sometime in the future. It can make it downright ugly trying to retrieve and work with data, and sometimes darn near impossible. That said, as a practical matter, third normal form is about as far as most business databases need to take normalization. Larry Linson Microsoft Office Access MVP I've been there. I also take Ken's advice to heart. I rarely flagellate myself over having too much normalization, except for when a miracle must be performed. I note that the Entity Data Model (EDM) http://en.wikipedia.org/wiki/Entity_Data_Model has some concepts that at least try to address and solve some of the problems caused by the growth of SQL complexity. I also note that, based on personal experience, both under-normalization and over-normalization have the potential to cause autodepilation. Third normal form is a reasonable rule-of-thumb, but more is better, IMO, if you can handle the SQL. James A. Fortune |
#19
|
|||
|
|||
Design question (normalization versus practical)
I must be doing something right. With my 63rd birthday in a couple of weeks
I still have a full head of hair. And its still mostly brown! Ken Sheridan Stafford, England James A. Fortune wrote: I wouldn't want to argue with you, Jim, but I observe that ignoring relational database design principles and proper normalization is a [quoted text clipped - 7 lines] Larry Linson Microsoft Office Access MVP I've been there. I also take Ken's advice to heart. I rarely flagellate myself over having too much normalization, except for when a miracle must be performed. I note that the Entity Data Model (EDM) http://en.wikipedia.org/wiki/Entity_Data_Model has some concepts that at least try to address and solve some of the problems caused by the growth of SQL complexity. I also note that, based on personal experience, both under-normalization and over-normalization have the potential to cause autodepilation. Third normal form is a reasonable rule-of-thumb, but more is better, IMO, if you can handle the SQL. James A. Fortune -- Message posted via AccessMonster.com http://www.accessmonster.com/Uwe/For...ccess/200910/1 |
#20
|
|||
|
|||
Design question (normalization versus practical)
"KenSheridan via AccessMonster.com" u51882@uwe wrote in
news:9dd54edcbb631@uwe: I've always thought it a great shame that Microsoft left the Customers table in Northwind so badly lacking in normalization. The worst possible examples for developing Access applications are found in the sample databases that come with Access and in the ones created from the database templates. It's always been a scandal, in my opinion -- I've been ignoring them since about A97. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
Thread Tools | |
Display Modes | |
|
|