#11
I was told "Fields are expensive, records are cheap"
Rick Brandt wrote:

> When table modifications call for lots of new fields it often means that a one-to-many relationship that ought to be set up in multiple tables is being shoe-horned into a single table. It also means that you have to alter your front-end forms and reports by adding all the fields, whereas if you're adding records instead, whatever continuous or datasheet form you're using to display those records will just show more records, so there is no need to alter the user interface objects.

To *me*, that's what the phrase means: adding fields is more complicated in terms of UI, which, to be frank, is where 90% of our time is spent in the development process (rather than in schema design/changes).

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
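To make the shoe-horning concrete, here is a minimal sketch (generic SQL; table and column names are hypothetical, and in Access the keys would be AutoNumber fields) of moving repeating phone columns out of the parent table into a child table, so that a new number is a new *record* rather than a new *field*:

```sql
-- Instead of Phone1/Phone2/Phone3... columns on the parent table,
-- give each number its own row in a child table.
CREATE TABLE Contacts (
    ContactID   INTEGER PRIMARY KEY,   -- AutoNumber in Access
    ContactName VARCHAR(100)
);

CREATE TABLE ContactPhones (
    PhoneID     INTEGER PRIMARY KEY,   -- AutoNumber in Access
    ContactID   INTEGER NOT NULL,
    PhoneType   VARCHAR(20),           -- e.g. Office, Home, Mobile
    PhoneNumber VARCHAR(30),
    FOREIGN KEY (ContactID) REFERENCES Contacts (ContactID)
);
```

A continuous subform bound to the child table then displays however many numbers exist, and a fourth number requires no schema or UI change.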
#12
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> However, most personal lines and commercial lines insurance policy and claim tables can get very wide.

That suggests they are probably denormalized to some degree. I know a number of popular applications (ACT springs to mind) that use very wide tables, and this means their application has to be more complicated to work with the data. It is also one of the main causes of the performance problems ACT has in filtering large datasets -- it's harder to use indexes efficiently when the data is denormalized. And some of the data in ACT (last time I checked, which was many, many years ago) seems to be stored in multi-value fields, which is yet another source of performance problems, since even if those fields are indexed, the indexes can't be used efficiently.

I will agree that a wide table *can* be normalized. I will also agree that there can be perfectly good reasons to denormalize. But I have very few tables in any of my apps that have even close to 50 fields.

A common cause of denormalization that looks properly normalized is mis-definition of the entities (i.e., a table holds records, each of which represents an entity, and each field is an attribute of that entity). In a survey, for instance, it's very common to create a single record for each survey, with a field for each question. But this approach is completely inextensible -- any change to the questionnaire requires a change to the table. Or you have to create a table for each questionnaire. Or you have to add fields that are used in some questionnaires and not in others.

I learned this lesson with a medical study database that was tied to the structure of the UNOS national transplant registry. There was only one medical study, and for it to work the variables being examined had to be established at the beginning, so it was never going to change. It seemed logical to use a record per patient. But before the app was even finished, UNOS changed how they recorded data, and we ended up having to add fields to the existing tables and then change all the forms and reports to reflect that.

If we'd used a proper structure, i.e., with a record for each variable and a table that defined what the variables were, we would have needed to make absolutely no changes to the *structure* of the data or the application -- we could have accommodated the additional variables just by adding *data* to the existing tables.

That's the essence of the problem. Fields are expensive, records are cheap -- it's easy to change the data, hard to change the data storage *structure*.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
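A minimal sketch of the record-per-variable structure described above (generic SQL; names are hypothetical, and in Access the keys would be AutoNumber fields). The point is that the questionnaire itself becomes *data*:

```sql
-- The questionnaire is data: each question is a row, not a column.
CREATE TABLE Questions (
    QuestionID   INTEGER PRIMARY KEY,  -- AutoNumber in Access
    QuestionText VARCHAR(255)
);

-- One row per patient per answered question.
CREATE TABLE Responses (
    ResponseID  INTEGER PRIMARY KEY,   -- AutoNumber in Access
    PatientID   INTEGER NOT NULL,
    QuestionID  INTEGER NOT NULL,
    AnswerValue VARCHAR(255),
    FOREIGN KEY (QuestionID) REFERENCES Questions (QuestionID)
);
```

When the registry adds a variable, you add a Questions row and start collecting Responses rows; no table, form, or report changes.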
#13
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> A program reads and writes rows (not fields) to the disk.

I think you're stuck in a very old-fashioned approach to this, one that reflects the very constricted memory and storage available back in the 80s and early 90s. A Jet/ACE data store doesn't store data in records but in data pages. It's not like legacy systems at all. And the fields are variable length, so records are variable length.

The efficiency of interaction between storage and RAM is pretty moot since the days of disk caching, when even a write is done to RAM (and then persisted to disk when convenient). This means there's hardly any difference between RAM manipulation and reading/writing from/to the data store, since both are happening in RAM. Disk caching has been a given on Windows since Win3.11, c. 1992. That predates Access, in fact, so Access/Jet has never been as bound to disk space/performance as the databases that seem to form the model in your head.

And when a server database is involved, the bottleneck is not RAM vs. disk but the network connection, which is vastly slower than not just RAM, but even than disk access.

So your model seems completely outdated to me.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
#14
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> In another posting John clarified what he meant by "Fields are expensive, records are cheap": "What I consider 'expensive' is that an improperly normalized table structure will require far more work in getarounds, contorted queries, form and report maintenance, and code than would a properly normalized design."

There's also the performance issue. If you need to query phone numbers, in a normalized structure you will be querying one field, using one index, retrieving the data pages for a single index across the wire. With a denormalized structure, you may need to query multiple fields, using multiple indexes, and pulling more index pages across the wire.

And if your table is denormalized and you need to add indexes for repeating fields, you're much more likely to run up against the limit on the number of indexes in a single table, which for Jet/ACE is 32. If you've got 4 phone number fields in your main table, that's 4 additional indexes, which means you have only 27 left (since you have to have a PK, of course). If you normalize your phone numbers, the index is in a completely different table, on a single field, and it uses up none of the indexes in the main table, leaving you with 31 available indexes.

Some argue that the UI for normalized data is hard to create, or that if you use the easy methods you're stuck with subforms, and users don't find subforms easy to use. Well, tell me that users will have trouble with this:

http://dfenton.com/DFA/examples/SA/ContactInfo.jpg

That's two instances of a single subform, using an outer join against the list of possible phone number types for each category of phone numbers (office vs. home), so that the user just sees a flat form listing all the phone number fields that can be filled out. All of the phone numbers are stored in a single table, and records are created there only when there is a number to be stored (email is stored in the same table, in fact, though it's not included in the same subform, since it is presented differently because it has different relevant data fields). So, I think there are no valid objections to normalization.

That said, I don't always insist on normalized phone numbers, because in most cases nobody searches on them. The number of necessary numbers is also relatively finite -- while in the last 15 years we've added pagers and mobile phones (and now pagers are out the window and fax is not going to be relevant 20 years from now), there's not much else to add. In many of my apps, there's phone, fax, mobile, email and a memo field for OTHER NUMBERS. This covers everything, really.

Now, all that said, there's an argument to be made that the dropping of pagers and the coming abandonment of faxes means the denormalized structure is flawed in that not only is it hard to *add* new fields, it's also hard to remove obsolete ones. I would say the problem of removing fields vs. adding them is asymmetrical, since to remove them you don't have to touch the underlying data structures -- you only have to alter the user interface (forms/reports). That's substantially easier than adding entirely new fields, so it's not nearly as bad a problem once you've reached a number of repeating fields that covers all the reasonable possibilities.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
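A hedged illustration of the index arithmetic above (hypothetical names, reusing the Contacts/ContactPhones sketch from earlier in the thread): searching a normalized phone table touches one indexed column, while the denormalized table needs an OR across every repeating field, each wanting its own index.

```sql
-- Normalized: one field, one index, one seek.
SELECT ContactID
FROM ContactPhones
WHERE PhoneNumber = '212-555-0100';

-- Denormalized: four fields to test, four of Jet's 32 indexes
-- consumed (plus the PK), or a full scan if they aren't indexed.
SELECT ContactID
FROM Contacts
WHERE Phone1 = '212-555-0100'
   OR Phone2 = '212-555-0100'
   OR Phone3 = '212-555-0100'
   OR Phone4 = '212-555-0100';
```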
#15
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> You are correct in that I did not describe my table structure. However, if I read that someone had 30 years of experience in relational db design, I would have thought that he or she knew about designing to 3rd normal form, and I would not have questioned them.

Whether a design is normalized or not depends entirely on how you define the entity being modelled in the table. If you define the entity as "this table includes records where each record is a completed survey," then you'll have a record with a field for each survey question. It won't be denormalized, because each field is an attribute of the survey. But modelling your entity in that fashion is very rigid and difficult to alter down the road.

If, instead, you consider the questions on the survey to be variables, each one equal, then the single-record survey structure is denormalized, since you've got repeating fields (variable 1, variable 2, variable 3 -- even though they may be named Age, Sex, Eye Color).

For particular purposes, this may work very well. A Person table doesn't need to be broken down into repeating variables, since most people have the same attributes. But there are still plenty of ways to slice and dice the data, such as whether you treat addresses and phone numbers as repeating data or not (the clue: where you have "Work Phone" and "Home Phone" you have repeating data, and with "Work Address" and "Home Address" you have the same thing).

What you do is up to you -- there is no law that your data has to be normalized. You choose the level of normalization that is appropriate to the application you're building. For surveys, there is no question in my mind that the normalized structure, with each variable as a row in a table instead of as a field in a record, is the only valid way to go, simply because doing otherwise makes it very, very difficult to change the survey, or to store more than one survey. In other cases, it's not necessarily the case that normalized storage is going to be vastly superior in the long run.

Your particular application may be one where it is appropriate to add 30 fields to a table (though that seems to indicate to me that there are either repeating fields or the original design was really inadequate and not carefully thought through -- not so much by the database developer as by the client). But as a general rule, fields are expensive and rows are cheap, and any time you're contemplating adding a lot of fields to a table, it should be a red flag that perhaps you should rethink the structure. That doesn't mean you'll conclude that adding 30 fields to a single table is wrong, just that you should evaluate the ultimate solution carefully. Making the wrong decision is a lot harder to rectify *after* you've implemented it, so taking the time to consider the issue carefully is going to save loads of time later.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
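One common objection to the variable-as-row design is that users still want to *see* one column per question. In Access that presentation can be generated with a crosstab query, so the flat view is derived rather than stored. A sketch in Jet SQL, assuming the hypothetical Questions/Responses tables from earlier in the thread:

```sql
TRANSFORM First(Responses.AnswerValue)
SELECT Responses.PatientID
FROM Responses INNER JOIN Questions
    ON Responses.QuestionID = Questions.QuestionID
GROUP BY Responses.PatientID
PIVOT Questions.QuestionText;
```

When a question is added, the crosstab simply grows a column; the stored structure never changes.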
#16
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> Response: You are absolutely correct. Personally, I think the maintenance number is higher. However, I did not interpret his comment this way, for the following reason. If you have to add a data field so you can produce a report, what do you tell the user -- "I can't do it because that field will add 5-10% to the cost and increase maintenance costs"? What I have learned in my 30 years is that the cost issue is NOT for the developer to decide, it is for the user to decide. It is incumbent upon us to inform the user of the cost, to develop a cost-effective design, and maybe do a cost-benefit analysis if the numbers are big enough. However, if the user decides it is worth the cost, then it is not our place to argue.

If you model the new fields as records in a table, in many, many cases there is no change needed to the app. Thus, the cost of adding the new field is ZERO. If you model it as a new field in an existing table, the cost is substantially larger.

What client, understanding the issues, would choose the $100 solution over the $0 solution? But if you as developer only offer the denormalized structure, they never get that choice.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
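The two price tags contrasted above correspond to two very different operations. A hedged sketch (generic SQL; the Surveys table and all names are hypothetical, reusing the survey design from earlier in the thread):

```sql
-- The "$0" path: the new field is a new row of data.
-- No schema change; a continuous form just shows one more record.
INSERT INTO Questions (QuestionText) VALUES ('Preferred contact method');

-- The "$100" path: the new field is a schema change, plus edits to
-- every form, report, and query that must carry it.
ALTER TABLE Surveys ADD COLUMN PreferredContact VARCHAR(50);
```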
#17
I was told "Fields are expensive, records are cheap"
BruceM gets right to the key point: how you decide to model your entities determines what is normalized and what is not.

"BruceM via AccessMonster.com" <u54429@uwe> wrote in news:a42adb3e4fe06@uwe:

> To use your specific example of insurance processing, auto and home insurance are so different that many developers would use a different table for the two rather than broaden an existing auto insurance table to accommodate home insurance. You could argue that it is all insurance and therefore is an entity that belongs in a single table.

I would argue that since both are insurance policies, you can use the survey model: for each attribute of the policy, you create a record in a table connected to the main policy record. For the different types of insurance policy you'll have a different set of variables, because they are different types, but by storing each one as a row in a related table, you've made your structure extensible, so that you can model any type of insurance policy. And if you like, you can use a single set of user interface objects to work with and print all of them.

> You could similarly argue that employees and clients are both people, and therefore should be in one table. Many developers would define "entity" more narrowly, and therefore use separate tables, in a situation where many fields are used for one type of entity and not the other, or where the entities are clearly in different categories.

I long ago switched over to putting all people in the same table, whether employees or clients. It seems obvious to me that if you have two tables with a whole set of fields that are identical, you actually have a single entity stored in two different tables. This is not to say it is not useful in some circumstances to set up two separate tables for two instances of what are largely the same entity. I can see no real benefit from keeping employees and customers in the same table, though certainly employees might some day become customers, and could be both at the same time, so there's a certain justification -- it all depends on the business rules and how much benefit there is to modelling them as the same or different entities.

I go with keeping all real-world entities in the same base table (think how complicated it would get if you had an employees table and a customers table and were normalizing your storage of phone numbers and addresses -- how do you enforce RI if you don't duplicate the exact same structure for both entity types?), and then segregating them accordingly. I found this to be the best approach well over a decade ago, and I have stuck with it and had no problems.

That doesn't mean others would reach the same conclusion for the exact same applications, or that the single-entity model is appropriate for all applications (or even most). It's entirely dependent on the particular application, the particular functionality of the app, and the business rules in place. However, given that those things can change drastically over time, I tend to favor going as normalized as practical on the front end, since that maximizes flexibility over the long run.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
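A minimal sketch of the single-base-table approach described above (generic SQL; all names hypothetical, AutoNumber keys in Access). The payoff of the shared base table is that referential integrity for phones and addresses has one target instead of two:

```sql
-- One base table for every real-world person...
CREATE TABLE People (
    PersonID  INTEGER PRIMARY KEY,   -- AutoNumber in Access
    FirstName VARCHAR(50),
    LastName  VARCHAR(50)
);

-- ...with one-to-one subtype tables segregating the roles.
CREATE TABLE EmployeeDetails (
    PersonID INTEGER PRIMARY KEY,
    HireDate DATE,
    FOREIGN KEY (PersonID) REFERENCES People (PersonID)
);

CREATE TABLE CustomerDetails (
    PersonID      INTEGER PRIMARY KEY,
    AccountOpened DATE,
    FOREIGN KEY (PersonID) REFERENCES People (PersonID)
);

-- RI for phone numbers points at a single table, whether the person
-- is an employee, a customer, or both at once.
CREATE TABLE Phones (
    PhoneID     INTEGER PRIMARY KEY,
    PersonID    INTEGER NOT NULL,
    PhoneType   VARCHAR(20),
    PhoneNumber VARCHAR(30),
    FOREIGN KEY (PersonID) REFERENCES People (PersonID)
);
```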
#18
I was told "Fields are expensive, records are cheap"
Dennis,

Just a comment regarding a comment regarding a comment ;-)

I made the comment:

> I took over a survey application at a hospital from a programmer with many more years of programming than I had at the time.

Your response:

> I don't understand what this comment has to do with the subject.

This was all based on your statement regarding your "over 30 years experience with relational databases". Years of experience are not always the same as understanding and knowledge. In your case, your knowledge may be much greater than 30 years; I'm not sure how we would know that. I was simply providing an example suggesting that I have followed developers who had many years of experience without a basic understanding of properly structured tables. In no way was I suggesting you don't understand normalization.

--
Duane Hookom
Microsoft Access MVP
#19
I was told "Fields are expensive, records are cheap"
Dennis,

To further my point about experienced developers, I would like you to look at this page on normalization:

http://www.fmsinc.com/tpapers/datanorm/index.html

I find it very wrong, yet it is authored by a very respected developer. In my opinion, the author transforms one non-normalized structure into another non-normalized structure. Why not create a more normalized structure that has fields for ID, Year, AccountCode, and BudgetAmt?

--
Duane Hookom
Microsoft Access MVP
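For concreteness, the structure Duane is suggesting would look something like this (a sketch only; Year is a reserved word in Jet SQL, so it is rendered here as BudgetYear, and the types are guesses):

```sql
-- One row per account per year: a new budget year adds rows, not columns.
CREATE TABLE BudgetAmounts (
    ID          INTEGER PRIMARY KEY,   -- AutoNumber in Access
    BudgetYear  INTEGER,
    AccountCode VARCHAR(20),
    BudgetAmt   DECIMAL(12, 2)         -- Currency in Access
);
```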
#20
I was told "Fields are expensive, records are cheap"
Bruce,

I thought I stated my point quite clearly in my original message:

> I've always been taught the exact opposite -- that "Fields are cheap, records are expensive" -- since GOING TO DISK IS SO SLOW VERSUS ACCESSING DATA IN MEMORY. Is there something different about Access where the statement "Fields are expensive, records are cheap" is true?

(Excuse the caps, but I had no other way to highlight the text.) The original question had to do with disk access speed versus the speed of extracting a lot of variables from a long record, and that is all. Everyone else is addressing a non-issue. The data structure had NOTHING to do with the original question; it was simply interesting background information, that is ALL. The question was disk access speed over a network versus extracting data from a long record. That was the whole question, and instead of answering it, people are worried about 30 fields. All the 30-year comment was supposed to mean was that I don't make stupid mistakes like repeating fields, and that is ALL it meant. Nothing else! However, everyone, including you, has read way more into that statement.

Your comment:

> To use your specific example of insurance processing, auto and home insurance are so different that many developers would use a different table for the two rather than broaden an existing auto insurance table to accommodate home insurance.

Response: You are so wrong. I know, because I've been working in the insurance field for those 30 years. In that time I've worked on 12 packages from 12 different vendors, and they all have many things in common. One of them is that they all have a policy master table which contains one record per policy ever issued, regardless of what line it was written under. And the policy master usually has about 100+ fields in it. (A couple had 200+ fields.) They all also had a line-specific policy master where the specific line information (auto, home, work comp, general liability, etc.) for that policy was stored. (There were a lot of other tables too.)

Your comment about the lines being so different is both correct and incorrect. There are about 100 fields that are common to every line of business, covering the policy information, reinsurance, earned and unearned premiums, statistical reporting to the insurance bureaus, and other such things. The information that is line-specific is indeed stored in different tables. Actually, if the developer chose to have multiple master tables, he would be creating himself a nightmare when it came to policy inquiry, claim processing, earned and unearned premium accounting, policy reports, renewal processing, cancellation and reinstatement processing, and a whole long list of other issues. But then again, that is where the 30 years of experience comes in. It is quite a coincidence that all 12 vendors, whose products were very different (from flat files on a mainframe to a Windows-based client server and everything in between) and in some cases were written decades apart, all took the same approach.

Your comment:

> Many developers would define "entity" more narrowly, and therefore use separate tables, in a situation where many fields are used for one type of entity and not the other, or where the entities are clearly in different categories.

Response: You are absolutely correct. The entity, at least for the 12 independent software houses that I know of, is the policy. The policy, regardless of which line of business, has about 100 to 200 common fields. (The reason for the huge range is the different systems' capabilities: the more they could do, the more data they stored.) That is why those fields are in the policy master file/table. The many fields that are specific to each line of business are held in line-of-business policy master files, and they are all children of the main policy master parent.

Dennis
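A hedged sketch of the structure Dennis describes (generic SQL; all names hypothetical, with comments standing in for the 100+ common fields): one policy master row per policy, with line-of-business children holding the line-specific data.

```sql
-- One record per policy ever issued, regardless of line of business.
CREATE TABLE PolicyMaster (
    PolicyID       INTEGER PRIMARY KEY,  -- AutoNumber in Access
    PolicyNumber   VARCHAR(20),
    LineOfBusiness VARCHAR(10),          -- e.g. AUTO, HOME, WC, GL
    EffectiveDate  DATE
    -- ...plus the ~100 fields common to every line (reinsurance,
    -- earned/unearned premiums, bureau reporting, and so on)
);

-- Line-specific children, one-to-one with the master.
CREATE TABLE AutoPolicyDetail (
    PolicyID   INTEGER PRIMARY KEY,
    VehicleVIN VARCHAR(17),
    FOREIGN KEY (PolicyID) REFERENCES PolicyMaster (PolicyID)
);

CREATE TABLE HomePolicyDetail (
    PolicyID      INTEGER PRIMARY KEY,
    DwellingValue DECIMAL(12, 2),
    FOREIGN KEY (PolicyID) REFERENCES PolicyMaster (PolicyID)
);
```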