
Restated: "Fields are expensive, records are cheap"



 
 
  #1  
Old February 25th, 2010, 05:51 AM posted to microsoft.public.access.tablesdbdesign
Dennis

Hi,

First, let me apologize for the empty question below. I hit the Post button
by mistake and it posted a blank question. Sorry.

I am restating my question because, based upon the responses I received, I
obviously stated it incorrectly the first time. So please let me try again.
Hopefully I will be a bit more successful this time around.


In a previous question, I stated that I added 30 fields to the membership
table (one of many in the system). The initial implementation was SO
successful that the user requested quite a few enhancements, resulting in a
HUGE increase in the scope of the original project. These changes resulted
in the addition of 30 new fields to the membership table.

The last time people responded to my question, they were concerned about
whether or not these were normalized fields. Please let me state that the
row has been normalized to 3rd normal form and these fields are NOT
recurring fields.

One MVP who responded to my original question stated "Fields are expensive,
records are cheap".

Maybe I am misinterpreting his comment. But to me this means that it is
better to have smaller rows in multiple linked tables than it is to have one
large row that contains all of the normalized data (i.e., it is better to
split a large normalized row in a single table into multiple rows in
multiple tables). I hope I explained that the right way.
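
To make the comparison concrete, here is a minimal sketch of the two designs
as I understand them. All table and field names are made up, and the DDL is
just one way to create such tables from VBA:

' Hypothetical sketch of the two designs being compared.
' Design A: one wide, fully normalized row per member.
' Design B: the same attributes split across 1:1 linked tables.
Sub SketchBothDesigns()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Design A: everything in a single wide row (imagine ~60 fields here)
    db.Execute "CREATE TABLE MemberWide (" & _
        "MemberID AUTOINCREMENT CONSTRAINT pkMemberWide PRIMARY KEY, " & _
        "LastName TEXT(50), FirstName TEXT(50), JoinDate DATETIME, " & _
        "FuneralHome TEXT(100), BurialSite TEXT(100))"

    ' Design B: a narrow core table plus a 1:1 satellite table
    db.Execute "CREATE TABLE Member (" & _
        "MemberID AUTOINCREMENT CONSTRAINT pkMember PRIMARY KEY, " & _
        "LastName TEXT(50), FirstName TEXT(50), JoinDate DATETIME)"
    db.Execute "CREATE TABLE MemberFuneral (" & _
        "MemberID LONG, FuneralHome TEXT(100), BurialSite TEXT(100), " & _
        "CONSTRAINT pkMemberFuneral PRIMARY KEY (MemberID), " & _
        "CONSTRAINT fkMemberFuneral FOREIGN KEY (MemberID) " & _
        "REFERENCES Member (MemberID))"
End Sub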


My question pertains to the difference between the disk access time it takes
to retrieve one record from the disk over a network versus retrieving
multiple records, weighed against the overhead involved in Access parsing a
lot of fields out of one large row versus parsing the same number of fields
from multiple smaller rows.


I've always been taught the exact opposite - that "Fields are cheap, records
are expensive" since going to disk is so SLOW versus accessing data in
memory.

Is there something different about Access where the statement "Fields are
expensive, records are cheap" is true? I'm using Access both on a local
machine where the front end and back end reside on the same machine, and in
a configuration with multiple front ends on each client's machine tied into
a back end that resides on a file server over a hardwired Cat 5 Ethernet
network.

My question is strictly concerning the data access time of multiple rows over
the network versus Access's overhead of extracting data from multiple small
rows versus one large row. And we are assuming a 3rd normal form database
design.

And it may well be that I am totally misinterpreting the “Fields are
expensive, records are cheap” comment.

Thank you for your comments.



Dennis

  #2  
Old February 25th, 2010, 06:37 AM posted to microsoft.public.access.tablesdbdesign
Allen Browne

The mantra in the subject line is purely a rule-of-thumb for beginning
database designers. It has no bearing on your question about data access.

Your question revolves around how the data is physically stored on the disk.
You're saying that there is a delay to move the disk heads to different
physical locations to retrieve records, and that delay represents degraded
performance.

While there's some truth in that simple view, the reality is much more
complex than that. Access uses pointers to store the data. If you define a
Text field as 255 characters but enter 2 characters in it, it does not use
up 510 bytes (unicode) of disk space to store those 2 characters.
Internally, the database keeps track of where each field starts in each
record. Some fields (memo, OLE, attachments, MVFs) are not actually stored
in-line with the other data in the record either. So it's not a matter of a
single record being faster to retrieve. There are many other factors at work
here, including whether the database has been compacted recently, whether
you are using page- or record-locking, whether the disk is badly fragmented,
how intelligent the various layers of caching are (physical on the drive, or
the controller, or the operating system, or in Access), the Rushmore
algorithms in JET, and so on. Then we may have to adopt another mindset to
handle SSD storage issues as well.

In the end, you don't really need to consider these as factors when
designing JET tables. It makes no sense to break a 60-field table down into
one main table with several one-to-one relationships just for performance
reasons. The management of those relationships alone represents unnecessary
complexity for the database and for the developer. You may be forced to go
that route if you are subclassing, or if your record is too wide to fit into
the 4K page size; otherwise don't even consider it (which is probably what
you were thinking of when posting).

Having said that, having 60 fields in one table is unusual. The database I
happen to be working on right now has 93 tables, and none of them has more
than 32 fields (including 6 for tracking when and by whom the record was
created, most recently edited, and deactivated.)

--
Allen Browne - Microsoft MVP. Perth, Western Australia
Tips for Access users - http://allenbrowne.com/tips.html
Reply to group, rather than allenbrowne at mvps dot org.


"Dennis" wrote in message
...
Hi,

First let me apolozie for the empty question below. I hit the Post button
my mistake and it posted a blank question. Sorry.

Hi,

I am restating my question because based upon the responses I receive I
obviously stated my question incorrectly. So please let me try again.
Hopefully I will be a bit more sucessful this time around.


In a previous question, I stated that I added 30 fields the membership
table
(one of many in the system). The initial implementation was SO
successful,
that the user requested quite a few enhancements resulting in the in a
HUGE
increase in the scope of the original project. These changes resulted in
the addition of 30 new fields to the membership table.

The last time people responded to my question, they were concerned about
the
whether or not these were normalized fields. Please let me state that the
row has been normalized to 3rd normal form and these fields are NOT
re-occurring fields.

One MVP who responded to my original question stated "Fields are
expensive,
records are cheap".

Maybe I am misinterpreting his comment. But to me this means that it is
better to have smaller row in multiple linked tables than it is to have
one
large row that contains all of the normalized data. (IE – it is better to
split a large normalize row in a single table into multiple rows in
multiple
tables). I hope I explained that the right way.


My question pertains to the difference in the disk access time it takes to
retrieve one record from the disk over a network versus retrieving
multiple
records versus the overhead involved in Access parsing a lot of fields out
of
a large row or parsing the same number of fields from multiple smaller
rows.


I've always been taught the exact opposite - that "Fields are cheap,
records
are expensive" since going to disk is so SLOW versus accessing data in
memory.

Is there something different about Access where the statement "Fields are
expense, records are cheap" is true? I'm using Access on local machine
where
the front and backs end reside on the same machine as well as having
multiple
front ends on each client's machine tied into the back end which resides
on a
file server over a cat 5 hardwired Ethernet network.

My question is strictly concerning the data access time of multiple row
over
the network versus Access’ overhead of extracting data from multiple small
rows versus one large row. And we are assuming a 3rd normal form database
design.

And it may well be that I am totally misinterpreting the “Fields are
expensive, records are cheap” comment.

Thank you for your comments.



Dennis

  #3  
Old February 25th, 2010, 07:22 AM posted to microsoft.public.access.tablesdbdesign
John W. Vinson

On Wed, 24 Feb 2010 20:51:01 -0800, Dennis
wrote:

The last time people responded to my question, they were concerned about
whether or not these were normalized fields. Please let me state that the
row has been normalized to 3rd normal form and these fields are NOT
recurring fields.

One MVP who responded to my original question stated "Fields are expensive,
records are cheap".


That was me, and that (non-normal structure) was my concern. It is impossible
to tell from a brief post how knowledgeable the poster might be about
relational design. As you well know, wide-flat, spreadsheetish, non-normalized
designs are a very common trap for beginners.

You're obviously not a beginner (now that I know more about your background!)
so my reply was out of line. My only excuse is that at the time I did NOT know
your level of skill. I apologize for jumping the gun.

What I consider "expensive" is that an improperly normalized table structure
will require far more work in workarounds, contorted queries, form and report
maintenance, and code than would a properly normalized design. Given that you
have (it seems) valid Entities with sixty or more discrete, atomic,
non-interdependent Attributes, I'll just express my mixed admiration and
sympathy and bow out.
--

John W. Vinson [MVP]
  #4  
Old February 25th, 2010, 08:25 AM posted to microsoft.public.access.tablesdbdesign
Dennis

Allen,

Thank you very much for answering my question.

I understand that the issue of disk access is quite complex. SCSI versus
SATA, non-RAID versus the different levels of RAID, the speed of the
controller, network switches, routers, and all of the other issues that you
brought up just cause my head to hurt when I even try to think about them.
So you are correct, I did take a VERY simplistic approach in my question.

I did not realize that the database actually kept track of where each field
starts and stops. The other variable-length-record databases I have worked
with just buried start and stop markers in the record. So while the
"presentation" layer of the database knew where each field started and
ended, the actual engine had no idea.

I will have to go back and check whether it would be possible for all of the
fields to be filled out and possibly exceed the 4K record/row limit. I did
not know about the 4K limit. Thank you for bringing that up!!!!! You may
have saved me again.
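
As a rough first check, I am thinking of something like the following
sketch. It just walks the table definition and totals the declared field
sizes; it ignores per-row overhead and Unicode compression, and the table
name is my own invention:

' Rough, worst-case row-width estimate for a JET table.
' Only a sketch: it ignores row overhead and Unicode compression, and
' assumes memo/OLE data lives off-page (only a pointer in the row).
Sub EstimateRowWidth()
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Dim fld As DAO.Field
    Dim lngBytes As Long

    Set db = CurrentDb
    Set tdf = db.TableDefs("Member")   ' hypothetical table name

    For Each fld In tdf.Fields
        Select Case fld.Type
            Case dbText
                lngBytes = lngBytes + (fld.Size * 2)  ' Unicode: 2 bytes/char
            Case dbMemo, dbLongBinary
                lngBytes = lngBytes + 4               ' off-page; count a pointer
            Case Else
                lngBytes = lngBytes + fld.Size        ' fixed-width types
        End Select
    Next fld

    Debug.Print tdf.Name & ": worst case is roughly " & lngBytes & " bytes"
End Sub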

You are correct when you say I was thinking about breaking my 60-field
record into multiple tables. I interpreted the comment from the MVP as
suggesting I should break the 60-field record up into multiple tables. When
an MVP takes time to comment on something I am doing, I do try to follow
their advice. It has not yet led me astray.

I have since been informed that the comment "Fields are expensive..." refers
to the dollar cost of adding the field to the forms and reports, and the
future maintenance cost associated with that field. But that is another
story.

My background is with large commercial insurance companies. All of the
companies have had relational databases designed to the 3rd normal form.
Between the policy information, reinsurance tracking, and the state,
federal, and rate development information we have to maintain for each
policy, the typical policy master table had well over 200 fields in it. The
policy master table just had fields that were common to all policies! Any
data that applied to a specific line (auto, homeowners, etc.) was stored in
a line-specific policy master table. Our coverage tables have over 100
fields. Our claim master table had over 100 fields in it.

So for me, 60 fields are actually pretty small. However, I will go back and
re-examine my design. There are things that I could put in a separate table,
but they really are properties of the member.

Thank you for responding and supplying all that wonderful information. It
gives me a better understanding of how Access works internally.


Thanks once again,

Dennis



  #5  
Old February 25th, 2010, 09:24 AM posted to microsoft.public.access.tablesdbdesign
Dennis



John,

I know it was you, but I did not feel it was appropriate for me to use your
name in my questions, hence “an MVP commented”.

John, no offence was taken by your comment. I did go back and re-read my
original posting and realized that I did not provide anywhere near enough
information for you to think of anything else. That was entirely my fault
for not being more detailed. I'm still having a problem drawing the line
between not enough detail and too much detail. Sorry about that. I agree
that when you read a brief posting, it is very difficult to tell what a
person's background is. In trying to pay back you MVPs for answering my
questions, I have been trying to answer forum questions like those I have
previously asked and received answers to. I know it is impossible to tell
from a brief posting how knowledgeable a poster might be. Fully agree there.

I also agree with your comment about how common a trap wide-flat,
spreadsheetish, non-normalized designs are. I responded to someone tonight
about a training database; they had quite a few issues with their tables. I
fully understand the issue.

I had NO problem whatsoever with your response, and you have no reason at
all to apologize. Personally, I feel bad that you feel the way you do. I
never took offence at your comments.

While I have a lot of db experience, I am still a newbie when it comes to
Access. I thought that Access was like other database engines. While that
is somewhat true, it is mostly a false statement because the VBA code and
events work so differently from anything I’ve used. I’m still beating my
head against that learning cliff.

I interpreted your comment as implying that it was better to have a couple
of small master tables than one large master table. Since it was coming from
an MVP, I figured that I had better ask some more questions. Since I did not
know how to ask you directly, I just put it on the forum.

I agree with your comment about a field being expensive in an improperly
normalized table structure, resulting in all sorts of workarounds. 10 of the
30 new fields had to do with funeral information for a deceased member.
Since all of the information is specific to the individual member, I
included it on the member table. You might be able to argue that it should
go in its own table, but for data entry and reporting simplicity I included
it in my member table.

Actually, your comment did help me because I was reminded about Access's 4K
record size limitation. I'm not sure if a completely filled out record will
exceed 4K or not; I will have to check. That check alone may prevent a
problem from occurring.

If the potential record size will exceed 4K, I will break the funeral
information out into a separate table, as that information is quite lengthy.

I do have one question for you about your "non-interdependent" phrase. What
do you mean by that? To me, a policy effective date and expiration date are
interdependent, because the effective date has to be before the expiration
date. Also, City, State, and ZIP seem to be interdependent. So, what do you
mean by non-interdependent?

Please, never bow out of one of my questions. I have learned so much from
you and the other MVPs that I would be on the losing end of that deal.

Thank you for all of your assistance. I have no way to express my
appreciation except to say that if you are ever in the Orlando, FL area, I
would be glad to give you an airboat ride in the wilds of the Florida
swamps. I'll show you the real Florida.

Thanks again for all of your help.

Dennis

  #6  
Old February 25th, 2010, 07:40 PM posted to microsoft.public.access.tablesdbdesign
David W. Fenton

"Allen Browne" wrote in
:

While there's some truth in that simple view, the reality is much
more complex than that. Access uses pointers to store the data. If
you define a Text field as 255 characters but enter 2 characters
in it, it does not use up 510 bytes (unicode) of disk space to
store those 2 characters. Internally, the database keeps track of
where each field starts in each record. Some fields (memo, OLE,
attachments, MVFs) are not actually stored in-line with the other
data in the record either. So it's not a matter of a single record
being faster to retrieve. There are many other factors at work
here, including whether the database has been compacted recently,
whether you are using page- or record-locking, whether the disk is
badly fragmented, how intelligent are the various layers of
caching (physical on the drive, or the controller, or the
operating system, or in Access), the Rushmore algorithms in JET,
and so on. Then we may have to start another mindset to handle SSD
storage issues as well.


And none of this considers the issue of disk caching, such that
there's not that much difference between data loaded into RAM and
data that is being read from/written to disk, since the latter
usually takes place through the caching mechanism, and is not going
to be limited by the speed of the actual storage medium, but by RAM.
This has been the case in Windows since at least c. 1991-2, when
Windows 3.1 was released with disk caching turned on by default. It
was essential for decent performance in Windows, but it also meant
that your databases were going to be sped up because of the disk
cache, too (although back then it was largely a read-only
improvement, as lazy writes and such had not been implemented in the
DOS disk cache; any modern version of Windows, though, caches both
reads and writes).

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
  #7  
Old February 25th, 2010, 07:45 PM posted to microsoft.public.access.tablesdbdesign
David W. Fenton

"Dennis" wrote in message:

My background is with large commercial insurance companies. All
of the companies have had relational databases designed to the 3rd
normal form.


The determination of whether or not those tables were actually
normalized depends on the chosen definition of the entity being
modelled in the table. I would consider the 200-field table you
mention later to be unquestionably denormalized, even though I know
nothing at all about the content/function of those fields. That
sounds like a table that has a bunch of fields that are used only
for a single record type, so that an auto insurance policy has one
set of fields, but a renter's insurance policy has a different set
of fields. Any time you're using some fields for some records and
not for others, it's an indication to me that the entity has been
misdefined, and should probably be broken into at least two tables,
with a narrow header table and a long child table, where each row
stores what was formerly a field in the wide table.
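
As a sketch of what I mean (all names are hypothetical, and this is just
one way to express it in DDL from VBA):

' Hypothetical sketch of the narrow-header / long-child reshaping:
' fields that applied only to some policies become rows in a child table.
Sub SketchHeaderChild()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Narrow header: only the attributes common to every policy
    db.Execute "CREATE TABLE Policy (" & _
        "PolicyID AUTOINCREMENT CONSTRAINT pkPolicy PRIMARY KEY, " & _
        "PolicyNumber TEXT(20), EffectiveDate DATETIME, ExpirationDate DATETIME)"

    ' Long child: one row per formerly-wide field that applies to a policy
    db.Execute "CREATE TABLE PolicyAttribute (" & _
        "PolicyID LONG, AttributeName TEXT(50), AttributeValue TEXT(255), " & _
        "CONSTRAINT pkPolicyAttribute PRIMARY KEY (PolicyID, AttributeName), " & _
        "CONSTRAINT fkPolicyAttribute FOREIGN KEY (PolicyID) " & _
        "REFERENCES Policy (PolicyID))"
End Sub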

All that said, my conclusion could be wrong for any particular
application. But "fields are expensive, rows are cheap" is a
generalized rule of thumb, not a hard-and-fast law of nature. It
allows for exceptions for certain purposes, but is a starting point
for evaluating a schema design.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
  #8  
Old February 25th, 2010, 07:49 PM posted to microsoft.public.access.tablesdbdesign
David W. Fenton

"Dennis" wrote in message:

I agree with your comment about a field being expensive in an
improperly normalized table structure, resulting in all sorts of
workarounds. 10 of the 30 new fields had to do with funeral
information for a deceased member. Since all of the information
is specific to the individual member, I included it on the member
table. You might be able to argue that it should go in its own
table, but for data entry and reporting simplicity I included it
in my member table.


That sounds like the type of data I'd put in a separate 1:1 table,
as it only applies once a particular threshold has been crossed. A
record in that table also means the person is deceased (though you
may not have the information and might still need to store a date of
death in the main record).

I wouldn't call that denormalized, but I have always found a certain
utility in using the 1:1 table for things that apply only after the
entity has reached a certain milestone.

However, I would likely avoid having multiple 1:1 records, as it then
becomes complicated to get a single editable row, unless the 1:1 records
are related in a logical chain that is modelled in the database in that
way.
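
For what it's worth, a single 1:1 satellite can still be edited through one
row. A sketch, with hypothetical table and field names, and on the
assumption (which matches my experience) that a 1:1 LEFT JOIN on the
primary keys stays updatable in JET:

Sub EditMemberWithFuneralInfo()
    ' Pull a member and the optional 1:1 funeral record back as one row.
    ' The LEFT JOIN keeps members who have no funeral record yet.
    Dim rs As DAO.Recordset
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT m.*, f.FuneralHome, f.BurialSite " & _
        "FROM Member AS m LEFT JOIN MemberFuneral AS f " & _
        "ON m.MemberID = f.MemberID " & _
        "WHERE m.MemberID = 123", dbOpenDynaset)
    Debug.Print rs!LastName, Nz(rs!FuneralHome, "(none)")
    rs.Close
End Sub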

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
  #9  
Old February 26th, 2010, 05:23 AM posted to microsoft.public.access.tablesdbdesign
Dennis

David,

I can see where disk caching would help in a sequential process, but does
disk caching really help in a randomly accessed database during data entry?

The first data entry might access the first record, the next the 1,000th
record, the next the 5,000th record, and so on. So, unless the entire table
is cached, does it really help?

Dennis
  #10  
Old February 26th, 2010, 05:27 AM posted to microsoft.public.access.tablesdbdesign
Dennis

David,


Hmmm, I see your point and kind of agree with it. My background is on large
and midrange computers, where it is nothing to have a 200-field, 30K record.

However, I realize that Access is a different beast, and I'm having to learn
to adjust for its restrictions. Thanks for the insight. Just more to think
about. But then I learn something new, also.

Thanks,

Dennis


 



