Design question (normalization versus practical)

**Lars Brownies** · October 18th, 2009, 07:17 AM posted to microsoft.public.access

Thanks for this explanation with examples which helps me understand it
better.

Lars

"KenSheridan via AccessMonster.com" u51882@uwe schreef in bericht
news:9dbfb0fa2f3e1@uwe...
Lars:

Its true that a purpose of normalization is to eliminate redundancy, but
that
in itself is not the real objective. Redundancy allows for inconsistent
data
or update anomalies, so by its elimination the integrity of the data is
protected, which is the important thing.

An example of lack of proper normalization can be found in the sample
Northwind database's Customers table. You'll see that this has City,
Region
and Country columns so we are told numerous times that São Paulo is in SP
region (as is Resende) and that SP region is in Brazil. Not only does
this
require repetitive data entry, but more importantly it opens up the risk
of
inconsistent data, e.g. it would be perfectly possible to put São Paulo in
California in one row and California in Ireland! Proper normalization
would
prevent this as the fact that São Paulo is in SP region would be stored
only
once in the database as would the fact that SP region is in Brazil and
that
California is in the USA.

An example of what at first sight might seem to be redundancy, but in fact
is
not, can also be found in Northwind. The Products table and the
OrderDetails
table both have UnitPrice columns. It might be thought that the unit
price
of a product could always be looked up from the Products table, so its
unnecessary in Order Details. However, the unit price of a product will
change over time, but each order needs to retain the price in force at the
time the order was created. Consequently a UnitPrice column is needed in
both tables; that in products holds the current price and is used to get
the
value for that in Order Details (code in the ProductID control's
AfterUpdate
event procedure in the Order Details Subform does this), which then
remains
static when the current price (in products) changes. In each case
UnitPrice
is functionally dependent solely on the whole of the key of the table, so
there is no redundancy.

Its often said that having separate columns to represent different values
of
an attribute, e.g. if we were to have separate columns British and Irish
to
cater for hybrids like me (an absurd example, I know, but it serves to
illustrate the point), is an example of lack of normalization, but its
really more a case of a breach of Codd's Rule 1, The Information Rule,
which
as restated by Date in a slightly expanded form is:

'The entire information content of the database is represented in one and
only one way, namely as explicit values in column positions in rows in
tables'.
C J Date - Introduction to Database Systems; 7th Edition; 2000

Having a separate column for each nationality is what is known as (I think
these are Date's word's also) 'encoding data as column headings', i.e. it's
the names of the column which are the data values, 'British' and 'Irish'
rather than the data being stored as values of type Nationality in
separate
rows as Codd's rule requires.

It could be said that having the multiple columns means that the table is
not
in First Normal Form:

'A relvar is in 1NF if and only if, in every legal value of that relvar,
every tuple contains exactly one value for each attribute.'

But this requires that we regard the attribute as being Nationality, but
no
such column exists of course, so while the row (loosely speaking the
equivalent of a tuple) does contain more than one value of attribute
Nationality, can we say this is the case when no such column exists? This
is
why I feel that its not really a normalization issue, but a breach at the
more fundamental level of the Information Rule. Its an interesting
philosophical point, but however we regard it, I think we can conclude
that
its not a correct approach in terms of the database relational model.

Ken Sheridan
Stafford, England

Lars Brownies wrote:
Thanks Ken.

A 'surrogate' numeric key is only required if
the values in a column in a table are not distinct, e.g. a Cities table
needs
a CityID primary key as city names can be legitimately duplicated

I thought by using a surrogate numeric key in stead of the text itself,
you'd save database space.

Also, I though the second rule of normalization was to eliminate redundant
data, but I just reread and that's only applicable when the ID-number is
part of a multivalued key.

Nationality was a bad example, but you got my point :-)

Lars

I could go on.

[quoted text clipped - 5 lines]
many relationship (one modelling the relationship by resolving it into
two
one-to-many relationships).

A 'surrogate' numeric key is only required if
the values in a column in a table are not distinct, e.g. a Cities table
[quoted text clipped - 67 lines]

Thanks, Lars

--
Message posted via http://www.accessmonster.com

**John W. Vinson** · October 18th, 2009, 06:53 PM posted to microsoft.public.access

On Sun, 18 Oct 2009 08:14:13 +0200, "Lars Brownies" wrote:

Status, Entered_by, Category_1,
Category_2, etc.

BEEEEPPPP!

The above examples are probably *not* repeating fields.
Category_1, Category_2 almost certainly ARE repeating fields.

In this particular case the status is the phase persons are currently in
(regarding the proces for applying a job). It's a field for selections in
reports. There's currently no need to keep track of the status changes, just
one value for each person. However, I do feel it's better to make this a 1xM
relation and do keep track of the status changes but it will cost me a lot
more developing time. For instance people can't use the build-in filter
option
on the subform so I have to build extra forms/code to make this work. And I
have limited amount of time for this non paid job.

Entered_by is the user who initially enters the record. There can be only
one
user per record who does this. I also have a 1xm table in which users can
type in their remarks.

Status and Entered_By certainly sound like 1:n examples, and if you have a
lookup table to control the vocabulary using a combo box, with a relationship
with referential integrity enforced, then I agree that you don't need a many
to many resolver table.

Category_1 (in fact is Experience level) can have values from 1 to 3. A
person can only have one value and there's no need to keep track of changes
in this field.

But what about Category_2 and Category_3? THAT'S my objection. If you have
three different categories, isn't that ipso facto a many to many relationship?
Each Person can be in zero, one, two or three categories (one of them being
Experience Level, another being Skill Level, whatever); each category will
have many people. Aren't you in fact storing data ("what kind of category") in
a fieldname, rather than properly storing in a field?
--

John W. Vinson [MVP]

**Lars Brownies** · October 18th, 2009, 10:09 PM posted to microsoft.public.access

But what about Category_2 and Category_3? THAT'S my objection. If you have
three different categories, isn't that ipso facto a many to many
relationship?
Each Person can be in zero, one, two or three categories (one of them
being
Experience Level, another being Skill Level, whatever); each category will
have many people. Aren't you in fact storing data ("what kind of
category") in
a fieldname, rather than properly storing in a field?

OK, I got it. Thanks for sticking with me.

Lars

"John W. Vinson" schreef in bericht
...
On Sun, 18 Oct 2009 08:14:13 +0200, "Lars Brownies"
wrote:

Status, Entered_by, Category_1,
Category_2, etc.

BEEEEPPPP!

The above examples are probably *not* repeating fields.
Category_1, Category_2 almost certainly ARE repeating fields.

In this particular case the status is the phase persons are currently in
(regarding the proces for applying a job). It's a field for selections in
reports. There's currently no need to keep track of the status changes,
just
one value for each person. However, I do feel it's better to make this a
1xM
relation and do keep track of the status changes but it will cost me a lot
more developing time. For instance people can't use the build-in filter
option
on the subform so I have to build extra forms/code to make this work. And
I
have limited amount of time for this non paid job.

Entered_by is the user who initially enters the record. There can be only
one
user per record who does this. I also have a 1xm table in which users can
type in their remarks.

Status and Entered_By certainly sound like 1:n examples, and if you have a
lookup table to control the vocabulary using a combo box, with a
relationship
with referential integrity enforced, then I agree that you don't need a
many
to many resolver table.

Category_1 (in fact is Experience level) can have values from 1 to 3. A
person can only have one value and there's no need to keep track of
changes
in this field.

But what about Category_2 and Category_3? THAT'S my objection. If you have
three different categories, isn't that ipso facto a many to many
relationship?
Each Person can be in zero, one, two or three categories (one of them
being
Experience Level, another being Skill Level, whatever); each category will
have many people. Aren't you in fact storing data ("what kind of
category") in
a fieldname, rather than properly storing in a field?
--

John W. Vinson [MVP]

**James A. Fortune** · October 19th, 2009, 02:01 PM posted to microsoft.public.access

KenSheridan via AccessMonster.com wrote:
4. Querying the database is very much simpler.

I could go onâ€¦

I can believe that point for someone like yourself who can do complex
joins in his sleep. For most, properly normalizing the data will
increase the complexity of most of the queries required.

In:

http://groups.google.com/group/micro...0b387c35944260

I said:

"That's definitely great advice and a great article. However, the SQL
programmer also needs to be aware that doing things properly with a
many-to-many relationship can cause future SQL creation and maintenance
to become much more difficult if more than a few other tables start
getting joined in along with the many-to-many tables. It's not anything
that can't be overcome, but things can suddenly get a bit more complex
when you have, say, a few many-to-many joins in the mix.

For a small database, I think adding a single many-to-many relationship
is a very good way for an Access programmer to get used to the SQL that
she needs to create.

For a larger database with existing many-to-many relationships, I think
that the database planner should sit down with the schemata to convince
herself that she can create the queries that she knows she'll need in
the near future before adding another many-to-many relationship. That
way, the painful trade-offs that might come into play will become clearer.

That said, based on past experience, it's good to normalize as much as
one's SQL abilities and available time allow."

Normalization does a lot of wonderful things, but I don't count
simplifying queries as one of them. It's good that you continue to
stress the importance of normalization.

James A. Fortune

**KenSheridan via AccessMonster.com** · October 19th, 2009, 05:39 PM posted to microsoft.public.access

James:

Chris Date himself did say in an interview with Mark Whitehorn some years ago
that normalization can be taken too far. In the same interview he also said,
and I paraphrase here, "but itâ€™s the only bit of science we've got". By
'science' he means something which is capable of expression formally (and
therefore unambiguously). The trick, as I see it, is knowing when to stop,
and the principal criterion for that is adequate and realistic protection of
data integrity.

I've always thought it a great shame that Microsoft left the Customers table
in Northwind so badly lacking in normalization. Given that Northwind is
probably most new Access users' first encounter with a relational database
application, and that adding a few more tables is all that's required, itâ€™s a
very poor paradigm. It falls well short of the above criterion.

Ken Sheridan
Stafford, England

James A. Fortune wrote:
4. Querying the database is very much simpler.

I could go onâ€¦

I can believe that point for someone like yourself who can do complex
joins in his sleep. For most, properly normalizing the data will
increase the complexity of most of the queries required.

In:

http://groups.google.com/group/micro...0b387c35944260

I said:

"That's definitely great advice and a great article. However, the SQL
programmer also needs to be aware that doing things properly with a
many-to-many relationship can cause future SQL creation and maintenance
to become much more difficult if more than a few other tables start
getting joined in along with the many-to-many tables. It's not anything
that can't be overcome, but things can suddenly get a bit more complex
when you have, say, a few many-to-many joins in the mix.

For a small database, I think adding a single many-to-many relationship
is a very good way for an Access programmer to get used to the SQL that
she needs to create.

For a larger database with existing many-to-many relationships, I think
that the database planner should sit down with the schemata to convince
herself that she can create the queries that she knows she'll need in
the near future before adding another many-to-many relationship. That
way, the painful trade-offs that might come into play will become clearer.

That said, based on past experience, it's good to normalize as much as
one's SQL abilities and available time allow."

Normalization does a lot of wonderful things, but I don't count
simplifying queries as one of them. It's good that you continue to
stress the importance of normalization.

James A. Fortune

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200910/1

**Larry Linson** · October 19th, 2009, 06:04 PM posted to microsoft.public.access

I wouldn't want to argue with you, Jim, but I observe that ignoring
relational database design principles and proper normalization is a
prescription for hair removal (by the handful) sometime in the future. It
can make it downright ugly trying to retrieve and work with data, and
sometimes darn near impossible.

That said, as a practical matter, third normal form is about as far as most
business databases need to take normalization.

Larry Linson
Microsoft Office Access MVP

"James A. Fortune" wrote in message
...
KenSheridan via AccessMonster.com wrote:
4. Querying the database is very much simpler.

I could go on.

I can believe that point for someone like yourself who can do complex
joins in his sleep. For most, properly normalizing the data will increase
the complexity of most of the queries required.

In:

http://groups.google.com/group/micro...0b387c35944260

I said:

"That's definitely great advice and a great article. However, the SQL
programmer also needs to be aware that doing things properly with a
many-to-many relationship can cause future SQL creation and maintenance
to become much more difficult if more than a few other tables start
getting joined in along with the many-to-many tables. It's not anything
that can't be overcome, but things can suddenly get a bit more complex
when you have, say, a few many-to-many joins in the mix.

For a small database, I think adding a single many-to-many relationship
is a very good way for an Access programmer to get used to the SQL that
she needs to create.

For a larger database with existing many-to-many relationships, I think
that the database planner should sit down with the schemata to convince
herself that she can create the queries that she knows she'll need in
the near future before adding another many-to-many relationship. That
way, the painful trade-offs that might come into play will become clearer.

That said, based on past experience, it's good to normalize as much as
one's SQL abilities and available time allow."

Normalization does a lot of wonderful things, but I don't count
simplifying queries as one of them. It's good that you continue to stress
the importance of normalization.

James A. Fortune

**KenSheridan via AccessMonster.com** · October 19th, 2009, 07:03 PM posted to microsoft.public.access

Larry:

I've never liked this "3NF is usually good enough" view. Its good enough if
that's what the business model requires; if it requires normalization to a
higher level then that's what's good enough. I know its often said, but I
think it tends to make people think that's all they have to aim at regardless
of the underlying reality. No free lunches in this game!

Ken Sheridan
Stafford, England

Larry Linson wrote:
I wouldn't want to argue with you, Jim, but I observe that ignoring
relational database design principles and proper normalization is a
prescription for hair removal (by the handful) sometime in the future. It
can make it downright ugly trying to retrieve and work with data, and
sometimes darn near impossible.

That said, as a practical matter, third normal form is about as far as most
business databases need to take normalization.

Larry Linson
Microsoft Office Access MVP

4. Querying the database is very much simpler.

[quoted text clipped - 37 lines]
James A. Fortune

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200910/1

**James A. Fortune** · October 19th, 2009, 09:34 PM posted to microsoft.public.access

Larry Linson wrote:
I wouldn't want to argue with you, Jim, but I observe that ignoring
relational database design principles and proper normalization is a
prescription for hair removal (by the handful) sometime in the future. It
can make it downright ugly trying to retrieve and work with data, and
sometimes darn near impossible.

That said, as a practical matter, third normal form is about as far as most
business databases need to take normalization.

Larry Linson
Microsoft Office Access MVP

I've been there. I also take Ken's advice to heart. I rarely
flagellate myself over having too much normalization, except for when a
miracle must be performed. I note that the Entity Data Model (EDM)

http://en.wikipedia.org/wiki/Entity_Data_Model

has some concepts that at least try to address and solve some of the
problems caused by the growth of SQL complexity. I also note that,
based on personal experience, both under-normalization and
over-normalization have the potential to cause autodepilation. Third
normal form is a reasonable rule-of-thumb, but more is better, IMO, if
you can handle the SQL.

James A. Fortune

**KenSheridan via AccessMonster.com** · October 19th, 2009, 11:24 PM posted to microsoft.public.access

I must be doing something right. With my 63rd birthday in a couple of weeks
I still have a full head of hair. And its still mostly brown!

Ken Sheridan
Stafford, England

James A. Fortune wrote:
I wouldn't want to argue with you, Jim, but I observe that ignoring
relational database design principles and proper normalization is a
[quoted text clipped - 7 lines]
Larry Linson
Microsoft Office Access MVP

I've been there. I also take Ken's advice to heart. I rarely
flagellate myself over having too much normalization, except for when a
miracle must be performed. I note that the Entity Data Model (EDM)

http://en.wikipedia.org/wiki/Entity_Data_Model

has some concepts that at least try to address and solve some of the
problems caused by the growth of SQL complexity. I also note that,
based on personal experience, both under-normalization and
over-normalization have the potential to cause autodepilation. Third
normal form is a reasonable rule-of-thumb, but more is better, IMO, if
you can handle the SQL.

James A. Fortune

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200910/1

**David W. Fenton** · October 20th, 2009, 03:15 AM posted to microsoft.public.access

"KenSheridan via AccessMonster.com" u51882@uwe wrote in
news:9dd54edcbb631@uwe:

I've always thought it a great shame that Microsoft left the
Customers table in Northwind so badly lacking in normalization.

The worst possible examples for developing Access applications are
found in the sample databases that come with Access and in the ones
created from the database templates. It's always been a scandal, in
my opinion -- I've been ignoring them since about A97.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode