I was told "Fields are expensive, records are cheap"



 
 
#21 - Dennis - February 26th, 2010, 06:37 AM

David,

Your comment: I would argue that since both are insurance policies, you can
use the survey model, and for each attribute of the policy, you create a
record in a table connected to the main policy record. For the different
types of insurance policy, you'll have a different set of variables, because
they are different types, but by storing each one as a row in a related
table, you've made your structure extensible so that you can model any type
of insurance policy. And if you like, you can use a single set of user
interface objects to work with and print all of them.

Response: I've seen this tried before, and it was just too complicated. Not
only do you have the policy table, there are also the coverage master and the
line-specific coverage tables. Then you have the tables for the objects being
insured (houses, warehouses, office buildings, bridges, coin collections,
people in workers' comp, people's honesty in bonds, and so on).

Then you have all of the transactions that apply to a policy, such as
online rating, policy issuance, automatic cancellation for non-payment,
automatic reinstatement for payment received before the cut-off date, offer
to renew, automatic renewal, and automatic non-renewal. Then you have
sequential endorsements (changes to the policy), which can be made at any
level (policy, coverage, limits, object being insured, and the limits on
those specific objects). Then you have out-of-sequence endorsements, where
you have to back out existing endorsements to apply an endorsement from the
past, and then reapply the backed-out endorsements, taking into account that
the out-of-sequence endorsement may have removed the insurable object that a
subsequent endorsement wants to change. When you add all of these
transactions to all of the different data, it becomes very complicated to
have a table-driven database.

Maybe there are more recently developed systems that take this approach. I
actually think it is a pretty good approach, just very machine intensive.
But with the faster CPUs of recent years, it is much more feasible.

Now that I think about it, one of the systems I worked on used a simplified
version of what you are suggesting to enable the users to extend their
policy, coverage, and insurable-object databases.
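
For anyone following along, a minimal sketch of the attribute-per-row
structure David describes, in Jet/ACE DDL with illustrative names (each
CREATE TABLE has to be run as its own DDL query in Access):

    CREATE TABLE tblPolicy (
        PolicyID COUNTER CONSTRAINT pkPolicy PRIMARY KEY,
        PolicyNumber TEXT(20)
    );

    CREATE TABLE tblPolicyAttribute (
        PolicyAttributeID COUNTER CONSTRAINT pkPolicyAttribute PRIMARY KEY,
        PolicyID LONG,
        AttributeName TEXT(50),
        AttributeValue TEXT(255),
        CONSTRAINT fkPolicy FOREIGN KEY (PolicyID)
            REFERENCES tblPolicy (PolicyID)
    );

A new kind of policy attribute then becomes a row in tblPolicyAttribute
rather than a new field in tblPolicy.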


Dennis

#22 - Dennis - February 26th, 2010, 07:00 AM

BruceM:

Your comment: It is unusual to add 30 fields to a normalized table and
still maintain the normalization, yet you seem unwilling to describe the new
need for 30 fields.

Response: That is because these fields had NOTHING to do with the question!
As I repeatedly stated, and as was repeatedly missed, the question concerned
disk access speed over the network and the speed with which Access extracts
values from a big record. That was the question and the only question.
Allen Browne was kind enough to answer it.

Your comment: But all we have to go on is that you have 30 years
experience, from which it seems we are to accept that the design is beyond
question.

Response: I've re-read this entire series of comments and nowhere in there
did I say 'you were to accept that the design is beyond question'. I wish
you would point out where I said this. But again, you are taking a truly
tiny bit of background for a question and making it into a mountain. The
issue you want to discuss had NOTHING to do with my original question, if
you had bothered to read it. If I remember correctly, in my response to
Duane, I said that I've cleaned up other people's messes and I'm sure other
people have come along behind me and cleaned up messes that I left behind.

Your Comment: Had you been willing to describe the database's structure it
could be that somebody here, many of whom are well-trained and very
experienced, could have suggested a different approach.

Response:
1. This question had nothing to do with the db design. As stated above,
I've always been taught the exact opposite - that "Fields are cheap, records
are expensive" since going to disk is so slow versus accessing data in memory.

This is a disk access versus memory access question.


Your comments: Had you been willing to describe the database's structure it
could be that somebody here, many of whom are well-trained and very
experienced, could have suggested a different approach. You could have
accepted or rejected the suggestions, but since that conversation did not
take place it seems you posted to argue with the contention that "fields are
expensive, records are cheap".

My response: Interesting comment. I have posted many database design
questions and have either received no response from the "many of whom are
well-trained and very experienced" or responses so vague that I was left
scratching my head wondering what in the world they were talking about.
Sometimes I was able to repost those questions in other forums and receive
usable replies; other times I just had to come up with another approach.

I will say this, though: I could not have climbed even partway up the Access
learning cliff without the help of many people on this forum.

It has been my experience that posting something in the Database Design
group is a measure of last resort. This question is a prime example. People
want to talk about something that is not part of the question. All this time
on everyone's part, and not one of the people who "are well-trained and very
experienced" answered the question! What a waste!





Dennis
#23 - Dennis - February 26th, 2010, 07:21 AM


David,

I want everyone to know that I stand corrected. In my response to Bruce I
stated that no one had answered the original question in this series of
correspondences. At that point I had not read your response.

Thank you for answering the question. I also reposted the question, and
Allen Browne, you, and a few others responded to the re-post and answered
the disk access speed issue.


Your comment: A Jet/ACE data store doesn't store data in records, but in
data pages. It's not like legacy systems at all. And the fields are variable
length, so records are variable length. The efficiency of interaction with
the storage and RAM is pretty moot since the days of disk caching, when even
a write is done to RAM (and then persisted to disk when convenient). This
means there's hardly any difference between RAM manipulation and
reading/writing from/to the data store, since both are happening in RAM.

Response: I can see what you are saying, but that assumes that you are
accessing data records sequentially. What happens to all that disk caching
when I am randomly reading records, which I would do during data entry
(which is where I am most concerned about disk access and network speed)?
The user might need the 1st record, then the 1,000th record, then the
5,257th record, then the 10th record, and so on. Unless the entire table is
cached, what good does loading a data page do if the record that you need
next is not on that page? And if the data is cached on a server, that does
no good at all because you still have the network bottleneck.

I know about write caching, and it does not affect the speed of retrieving
data (unless you do not have enough memory and your machine is thrashing),
which is why it was not part of the question.

Your comment: And when a server database is involved, the bottleneck is not
RAM vs. disk, but the network connection, which is vastly slower than not
just the RAM, but even slower than disk access.

My response: I know the network speed is MUCH, MUCH slower, which is why I
made it part of the question. Remember, my original interpretation of John's
"Fields are expensive, records are cheap" had to do with disk access over a
network versus internal CPU processing. I thought he knew something about
the internal processing of Access that made Access very slow when processing
large records. I grossly misunderstood his comment. I thought he was saying
it is better to access the disk multiple times with small records than to
have one large Access record. He has since explained his comment, and now I
know it had nothing to do with disk access speed.

Dennis

#24 - Dennis - February 26th, 2010, 07:39 AM

David,

Your comment “When table modifications call for lots of new fields it often
means that a one-to-many relationship that ought to be set up in
multiple tables is being shoe-horned into a single table.”

It also means that you have to alter your front-end forms and
reports by adding all the fields, whereas if you're adding records
instead, whatever continuous or datasheet form you're using to
display those records will just have more records in it, so there is
no need to alter the user interface objects.

Response: I don't understand your comment. It is absolutely true that if I
add more records, I don't have to alter anything. But when the user asks
where they enter these new fields and where the new reports are, what am I
supposed to say? "Well, I did not add those fields, but you can enter more
data"? That makes no sense at all.

I've yet to figure out how to capture additional data and produce reports
with this additional data without modifying the UI or creating new reports.
Sorry, I have no idea what you mean by the above statement.

Your comment: To *me*, that's what the phrase means, that adding fields is
more complicated in terms of UI, which, to be frank, is where 90% of our time
is spent in the development process (rather than in schema design/changes).

My response: You are preaching to the choir! I fully agree.

However, this project started out as a simple membership mailing list. And
that is all it was. Once the users saw it working, their imaginations went
wild and they wanted to start capturing additional information so they could
produce additional reports.

When this project started, I knew about the other reports and strongly
suggested that they just capture the additional fields and produce all of
the reports. However, the users kept saying no, no, no, we don't need all
that stuff; the process is working just fine. And the boss man said: just do
the mailing labels and that's all. So that is what I did. I designed it for
the full requirements, but only implemented the mailing-label piece.

But what are you supposed to do when the user, who is paying you, says no,
we don't need that, when you know they are going to need it? You can only
push the customer so far.

But again, none of this had anything to do with the original disk access
speed vs. Access internal processing speed question.


Dennis
#25 - Dennis - February 26th, 2010, 07:53 AM


Duane Hookom,

Your comment: I made the comment "I took over a survey application at a
hospital from a programmer with many more years of programming than I had at
the time."

My response was: I don't understand what this comment has to do with the
subject.

Your reply: This was all based on your statement regarding your "over 30
years experience with relational databases".

Years of experience are not always the same as understanding and knowledge.
In your case, your knowledge may be much greater than 30 years. I'm not sure
how we would know that.

I was simply providing an example suggesting that I have followed developers
who had many years of experience without a basic understanding of properly
structured tables. In no way was I suggesting you don't understand
normalization.


My response: Ah, you are preaching to the choir. My apologies. I fully
understand what you are saying. It is very true that experience does not
always count for much. I too have come along behind very experienced people
and had to clean up their messes and convoluted logic. And as I said, I know
that I've had off weeks and left messes behind for someone else to clean up,
much to my embarrassment.

I've also seen programmers with only a year or so of experience, but with a
fresh outlook, come up with some very creative and elegant solutions (which
I doubt I would have arrived at). I've done the same thing when I've come
onto a new project with much more senior people. Sometimes they cannot see
the forest for the trees.


Dennis

#26 - Dennis - February 26th, 2010, 07:56 AM

Duane,

I'm going to read this article with great interest. But I see your point.

Dennis
#27 - BruceM via AccessMonster.com - February 26th, 2010, 02:17 PM

Responses inline:

Your comment: It is unusual to add 30 fields to a normalized table and
still maintain the normalization, yet you seem unwilling to describe the new
need for 30 fields.

Response: That is because these fields had NOTHING to do with the question!
As I repeatedly stated, and as was repeatedly missed, the question concerned
disk access speed over the network and the speed with which Access extracts
values from a big record. That was the question and the only question.
Allen Browne was kind enough to answer it.


The original question was about the "fields are expensive, records are cheap"
statement. I have always taken the statement to refer to such factors as
development time to add new fields, and not about hardware and network
considerations. I missed that your original question was targeted to
physical considerations not specifically related to design.


Your comment: But all we have to go on is that you have 30 years
experience, from which it seems we are to accept that the design is beyond
question.

Response: I've re-read this entire series of comments and nowhere in there
did I say 'you were to accept that the design is beyond question'. I wish
you would point out where I said this.


You wrote:
Response: You are correct in that I did not describe my table structure.
However, if I read that someone had 30 years of relational db design
experience, I would have thought that he or she knew about designing to 3rd
normal form and would not have questioned them.

That sounded to me like you were saying the design should not be questioned.

I wrote:
To use your specific example of insurance processing, auto and home
insurance are so different that many developers would use a different table
for the two rather than broaden an existing auto insurance table to
accommodate home insurance.

Your response: You are so wrong.

No, I am correct that many developers would use separate tables. You can
(and do) argue that the approach is misguided, but it's my turn to point out
that this is a different issue from what I raised, which was about the
practice of other developers.

All this time on everyone's part, and not one of the people who "are
well-trained and very experienced" answered the question! What a waste!


I will write this off as frustration, since you seem to have thanked others,
especially Allen, for their responses.

This closes my participation in this thread. Good luck with the project.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...esign/201002/1

#28 - David W. Fenton - February 26th, 2010, 11:49 PM

Dennis wrote:

Your comment: A Jet/ACE data store doesn't store data in records,
but in data pages. It's not like legacy systems at all. And the
fields are variable length, so records are variable length. The
efficiency of interaction with the storage and RAM is pretty moot
since the days of disk caching, when even a write is done to RAM
(and then persisted to disk when convenient). This means there's
hardly any difference between RAM manipulation and reading/writing
from/to the data store, since both are happening in RAM.

Response: I can see what you are saying, but that assumes that
you are accessing data records sequentially. What happens to all
that disk caching when I am randomly reading records, which I
would do during data entry (which is where I am most concerned
about disk access and network speed)? The user might need the 1st
record, then the 1,000th record, then the 5,257th record, then the
10th record, and so on. Unless the entire table is cached, what
good does loading a data page do if the record that you need next
is not on that page? And if the data is cached on a server, that
does no good at all because you still have the network bottleneck.


You seem not to have done much investigation of how Jet works.
Indexes are the key. When you request data from a table, Jet first
requests the relevant indexes (based on your criteria) and then uses
the indexes to figure out which data pages to retrieve. The
requested data is brought into RAM for processing, and since it's
been requested from disk, it will be cached in Jet's internal cache
as well as in the OS's disk cache. When you write to the data, Jet
takes care of writing to disk, which is then handled by the OS. If
the written data pages are in the disk cache (and you are caching
writes), the writes go to the disk cache and are then persisted to
disk according to your disk cache settings. (You can also tell Jet
explicitly to flush its cache, which in turn tells the OS to flush
its write cache, i.e., persist to disk.)

This is all managed quite efficiently - so efficiently, in fact,
that there is very little reason to worry about the things you are
worrying about until you reach extremely large datasets. And that's
usually not what is being used by an Access application directly
(i.e., if you have a large dataset, you're likely going to use a
different back-end than Jet/ACE).

I know about write caching, and it does not affect the speed of
retrieving data (unless you do not have enough memory and your
machine is thrashing), which is why it was not part of the
question.


Writing is much slower than reading because it involves so many
different operations (writing indexes, determining if new pages need
to be allocated), as well as maintaining a set of appropriate locks
(write locks are a much bigger deal than read locks).

But disk caching is a big deal for reading, too. As I explained
above about the way Jet works, the indexes are the key to retrieving
data efficiently. Data pages in Jet 4 and ACE are 4KB in size, so
you're retrieving data from the disk in blocks whose size is very
often well-aligned with the file system's block size (the NTFS file
system uses default block sizes in increments of KBs, with 4KB
being the smallest usual block size, and the next being 8, 16, etc.,
so reading 4KB data pages is likely to be very efficient, since they
will be aligned with the file system's block size). Jet retrieves
what it needs as needed, but only as much as needed.

A simplified explanation of how Jet works: this SQL statement:

SELECT tblPerson.LastName, tblPerson.FirstName, tblPerson.PersonID
FROM tblPerson
WHERE tblPerson.LastName = "Fenton"

...will first retrieve the index on LastName, which will be keyed to
the Primary Key, so Jet will then retrieve the primary key index. In
both cases, only as much of the index as necessary is retrieved
(it's not a sequential scan, but a b-tree traversal), and in the
case of the primary key index, since Jet tables are clustered on the
primary key, I'm pretty sure the primary key index is actually the
real data pages (I could be wrong on that).

I don't know for certain if the non-PK indexes store a link to the
PK or if they directly reference the data page of the PK, but the
point is that it's not a sequential scan of anything, but a b-tree
index traversal, which is substantially more efficient.
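
(An illustrative aside, not from David's original post: the index that
enables that b-tree traversal is created with ordinary DDL. A minimal
sketch, assuming Jet/ACE SQL and the tblPerson table from the example
above:

    CREATE INDEX idxLastName ON tblPerson (LastName);

Without such an index, Jet would have no choice but to scan tblPerson's
data pages to satisfy the WHERE clause.)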

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
#29 - David W. Fenton - February 27th, 2010, 12:07 AM

Dennis wrote:
Your comment “When table modifications call for lots of new
fields it often means that a one-to-many relationship that ought
to be set up in multiple tables is being shoe-horned into a single
table.”

It also means that you have to alter your front-end forms and
reports by adding all the fields, whereas if you're adding records
instead, whatever continuous or datasheet form you're using to
display those records will just have more records in it, so there
is no need to alter the user interface objects.

Response: I don't understand your comment. It is absolutely true
that if I add more records, I don't have to alter anything. But
when the user asks where they enter these new fields and where the
new reports are, what am I supposed to say? "Well, I did not add
those fields, but you can enter more data"? That makes no sense at
all.


I suspect you haven't looked at Duane's survey database. In the
structure there (which is a fairly extreme version of what I'm
talking about), when you add a new question to a survey, you add a
new record to the questions table, and link it to a particular
survey. This will then be displayed by the same user interface that
was used to display the other questions.
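
To make that concrete, here is a minimal sketch of that kind of
questions-as-records structure in Jet/ACE DDL. The table and field names
are mine for illustration, not necessarily Duane's, and each CREATE TABLE
has to be run as a separate DDL query in Access:

    CREATE TABLE tblSurvey (
        SurveyID COUNTER CONSTRAINT pkSurvey PRIMARY KEY,
        SurveyName TEXT(100)
    );

    CREATE TABLE tblQuestion (
        QuestionID COUNTER CONSTRAINT pkQuestion PRIMARY KEY,
        SurveyID LONG,
        QuestionText TEXT(255),
        CONSTRAINT fkSurvey FOREIGN KEY (SurveyID)
            REFERENCES tblSurvey (SurveyID)
    );

Adding a question to survey 1 is then a single INSERT, with no form or
report changes:

    INSERT INTO tblQuestion (SurveyID, QuestionText)
    VALUES (1, 'How satisfied were you with your visit?');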

In the structure you're suggesting, you'd have to add a field for
the question to your survey table and then alter your form for data
entry, your report for printing out, and all of the tools you use
to analyze and summarize the results.

Now, a lot of statistical tools expect a flat-file structure, with
one record for each completed survey, so it's usually the case that
one takes the many-records structure and denormalizes it into a flat
structure that is exported for use by the statistics program. The
key there is that you don't choose your data structure based on how
you're going to output it. Instead, you choose the structure that is
going to be most extensible with the least amount of work, and then
you take that data and convert it to the structures required for
particular purposes, which is very often a denormalized structure.
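
(A sketch of that export step: Access's crosstab syntax can flatten the
answers back out to one row per respondent. This assumes a hypothetical
tblAnswer table with RespondentID, QuestionID, and AnswerText fields:

    TRANSFORM First(tblAnswer.AnswerText)
    SELECT tblAnswer.RespondentID
    FROM tblAnswer INNER JOIN tblQuestion
        ON tblAnswer.QuestionID = tblQuestion.QuestionID
    GROUP BY tblAnswer.RespondentID
    PIVOT tblQuestion.QuestionText;

Each distinct question becomes a column in the result, which can then be
exported for the statistics program.)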

I've yet to figure out how to capture additional data and produce
reports with this additional data without modifying the UI or
creating new reports. Sorry, I have no idea what you mean by the
above statement.


I strongly recommend you examine Duane's survey sample database.

Your comment: To *me*, that's what the phrase means, that adding
fields is more complicated in terms of UI, which, to be frank, is
where 90% of our time is spent in the development process (rather
than in schema design/changes).

My response: You are preaching to the choir! I fully agree.

However, this project started out as a simple membership mailing
list. And that is all it was. Once the users saw it working,
their imaginations went wild and they wanted to start capturing
additional information so they could produce additional reports.


I have plenty of databases that are badly structured because they
started out small, when it seemed it wasn't necessary to choose an
industrial-strength schema, and then when they got extended, I
greatly regretted not doing it the right way on the front end.

So, the existing app is no justification for continuing to use a
less than optimal schema.

When this project started, I knew about the other reports and
strongly suggested that they just capture the additional fields
and produce all of the reports. However, the users kept saying no,
no, no, we don't need all that stuff; the process is working just
fine. And the boss man said: just do the mailing labels and that's
all. So that is what I did. I designed it for the full
requirements, but only implemented the mailing-label piece.

But what are you supposed to do when the user, who is paying you,
says no, we don't need that, when you know they are going to need
it? You can only push the customer so far.


The approach I'm advocating doesn't take more time to implement to
cover the initial requirements and is inherently extensible, without
any additional programming for the UI.

You've front-loaded the programming to handle the future extensions,
which is something I do, too, but the suggestion by those advocating
records over fields is that there is no extra work involved. You
don't have to do anything extra to prepare for the N new fields,
because the structure you've built handles any number of records
from the very beginning.

Have you ever implemented a many-to-many join when the current
requirement was only a many-to-one? Say the client has customers and
there's a customer type. You can store it in a field in the customer
record, but what if the client later decides that a customer can be
more than one customer type simultaneously? At that point you need a
many-to-many join between customers and customer types. If you
structured it that way from the beginning, it would handle 1 type
per customer, and then would handle as many customer types as you
needed when the scope expands.

This is the simplest example of the cheap-records vs. expensive-fields
equation. And the point is that as soon as the client needs
more than one type per customer, you have to decide how many you're
going to accommodate. You could, I guess, add a second customer type
field, but then when they need three, you'd have to add a third new
field. With the join table, you accommodate anything from 0 to
however many you could ever conceive of. It's not more complicated
to implement, and when the customer needs more than one, there is
nothing that needs to be done (other than, perhaps, altering your UI
to remove any limitations you may have previously put on entry).
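
A minimal sketch of that join-table approach in Jet/ACE DDL, with
illustrative names (tblCustomer is assumed to already exist with a
CustomerID primary key):

    CREATE TABLE tblCustomerType (
        CustomerTypeID COUNTER CONSTRAINT pkCustomerType PRIMARY KEY,
        TypeName TEXT(50)
    );

    CREATE TABLE tblCustomerTypeAssign (
        CustomerID LONG,
        CustomerTypeID LONG,
        CONSTRAINT pkAssign PRIMARY KEY (CustomerID, CustomerTypeID),
        CONSTRAINT fkCustomer FOREIGN KEY (CustomerID)
            REFERENCES tblCustomer (CustomerID),
        CONSTRAINT fkCustomerType FOREIGN KEY (CustomerTypeID)
            REFERENCES tblCustomerType (CustomerTypeID)
    );

Giving a customer a second (or tenth) type is then just another row in
tblCustomerTypeAssign; the schema and the UI stay as they are.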

But again, none of this had anything to do with the original disk
access speed vs. Access internal processing speed question.


That was your red herring. You asked for an explanation of the
expression. You misinterpreted it and went down the access speed
garden path. I'm explaining what the phrase refers to, and it is, in
fact, more efficient to add records than fields (consider the number
of indexes that have to be maintained/searched).

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
#30 - David W. Fenton - February 27th, 2010, 12:11 AM

Dennis wrote:

Response: You are absolutely correct. The entity, at least from
12 independent software houses that I know of, is the policy. The
policy, regardless of which line of business, has about 100 to 200
common fields.


I would love to see one of those tables. My bet is that I could make
a much more extensible structure with none of the tables having more
than 30 fields, and maybe only one of them having even that many
(most of the data would end up in tables that had only two, three,
or four fields).

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
 



