Thread Tools | Display Modes |
#21
I was told "Fields are expensive, records are cheap"
David,
Your comment: I would argue that since both are insurance policies, you can use the survey model, and for each attribute of the policy, you create a record in a table connected to the main policy record. For the different types of insurance policy, you'll have a different set of variables, because they are different types, but by storing each one as a row in a related table, you've made your structure extensible so that you can model any type of insurance policy. And if you like, you can use a single set of user interface objects to work with and print all of them.

Response: I've seen this tried before and it was just too complicated. Not only do you have the policy table, there is also the coverage master and the line-specific coverage table. Then you have the tables for the objects being insured (houses, warehouses, office buildings, bridges, coin collections, people in work comp, people's honesty in bonds, and so on). Then you have all of the transactions that apply to a policy, such as on-line rating, policy issuance, automatic cancellation for non-payment, automatic reinstatement for payment received before the cut-off date, offer to renew, automatic renewal, and automatic non-renewal. Then you have the sequential endorsements (changes to the policy), which can be made at any level (policy, coverage, limits, object being insured, and the limits on those specific objects). Then you have out-of-sequence endorsements, where you have to back off existing endorsements to apply an endorsement from the past, and then reapply the backed-off endorsements, taking into account that the out-of-sequence endorsement may have removed the insurable object that a subsequent endorsement wants to change. When you add all of these transactions to all of the different data, it becomes very, very complicated to have a table-driven database. Maybe there are more recently developed systems that take this approach. I actually think it is a pretty good approach, just very machine intensive.

But with the faster CPUs of recent times, it is much more feasible. Now that I think about it, one of the systems that I worked on used a simplified version of what you are suggesting to enable the users to extend their policy, coverage, and insurable-object databases.

Dennis
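The "survey model" David describes (one row per attribute, connected to the main policy record) can be sketched in a few lines. This is only an illustration, using Python's sqlite3 as a stand-in for an Access back end; the table and column names (Policy, PolicyAttribute, etc.) are hypothetical, not from any real insurance system:

```python
import sqlite3

# Hypothetical sketch of the attributes-as-rows model: one generic
# attribute table serves every policy type. Names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Policy (
    PolicyID   INTEGER PRIMARY KEY,
    PolicyType TEXT NOT NULL          -- 'Auto', 'Home', 'WorkComp', ...
);
CREATE TABLE PolicyAttribute (
    PolicyID  INTEGER NOT NULL REFERENCES Policy(PolicyID),
    AttrName  TEXT NOT NULL,          -- e.g. 'VIN', 'RoofType', 'CoverageLimit'
    AttrValue TEXT,
    PRIMARY KEY (PolicyID, AttrName)
);
""")

# Two very different policy types share one schema: each attribute is a row,
# so adding a new attribute means adding a record, not a field.
con.execute("INSERT INTO Policy VALUES (1, 'Auto')")
con.execute("INSERT INTO Policy VALUES (2, 'Home')")
con.executemany("INSERT INTO PolicyAttribute VALUES (?, ?, ?)", [
    (1, "VIN", "1HGCM82633A004352"),
    (1, "CoverageLimit", "100000"),
    (2, "RoofType", "Asphalt"),
    (2, "CoverageLimit", "250000"),
])

# A single query (and hence a single set of UI objects) works for any type.
rows = con.execute(
    "SELECT AttrName, AttrValue FROM PolicyAttribute "
    "WHERE PolicyID = ? ORDER BY AttrName", (2,)).fetchall()
print(rows)  # [('CoverageLimit', '250000'), ('RoofType', 'Asphalt')]
```

As Dennis notes, the hard part is not this core structure but layering transactions (rating, endorsements, reinstatement) on top of it; the sketch shows only why the storage side is extensible.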
#22
I was told "Fields are expensive, records are cheap"
BruceM:
Your comment: It is unusual to add 30 fields to a normalized table and still maintain the normalization, yet you seem unwilling to describe the new need for 30 fields.

Response: That is because these fields had NOTHING to do with the question! As repeatedly stated, and repeatedly missed, the question concerned disk access speed over the network and the speed at which Access retrieves extra values from a big record. That was the question and the only question. Allen Browne was kind enough to answer it.

Your comment: But all we have to go on is that you have 30 years experience, from which it seems we are to accept that the design is beyond question.

Response: I've re-read this entire series of comments and nowhere in there did I say you were to accept that the design was beyond question. I wish you would point out where I said this. But again, you are taking a truly tiny bit of background for a question and making it a mountain. The issue you want to discuss had NOTHING to do with my original question, if you had bothered to read it. If I remember, in my response to Duane, I said that I've cleaned up other people's messes and I'm sure other people have come along behind me and cleaned up messes that I left behind.

Your comment: Had you been willing to describe the database's structure it could be that somebody here, many of whom are well-trained and very experienced, could have suggested a different approach.

Response: This question had nothing to do with the db design. As stated above, I've always been taught the exact opposite: that "Fields are cheap, records are expensive," since going to disk is so slow versus accessing data in memory. This is a disk-access-versus-memory-access question.

Your comment: You could have accepted or rejected the suggestions, but since that conversation did not take place it seems you posted to argue with the contention that "fields are expensive, records are cheap".

My response: Interesting comment. I have posted many database design questions and I've either received no response from the "many of whom are well-trained and very experienced" or responses so vague that I'm left scratching my head wondering what in the world they were talking about. Sometimes I was able to repost those questions in other forums and receive usable replies, and other times I just had to come up with another approach. I will say this, though: I could not have climbed partially up the Access learning cliff without the help of many people on this forum. It has been my experience that posting something in the Database group is a measure of last resort. This question is a prime example. People want to talk about something that is not part of the question. All this time on everyone's part and not one of the people who "are well-trained and very experienced" answered the question! What a waste!

Dennis
#23
I was told "Fields are expensive, records are cheap"
David, I want everyone to know that I stand corrected. In my response to Bruce I stated that no one had answered the original question in this series of correspondence. At that point I had not read your response. Thank you for answering the question. I also reposted the question, and Allen Browne, you, and a few others responded to the re-post and answered the disk access speed issue.

Your comment: A Jet/ACE data store doesn't store data in records, but in data pages. It's not like legacy systems at all. And the fields are variable length, so records are variable length. The efficiency of interaction with the storage and RAM are pretty moot since the days of disk caching, when even a write is done to RAM (and then persisted to disk when convenient). This means there's hardly any difference between RAM manipulation and reading/writing from/to the data store, since both are happening in RAM.

Response: I can see what you are saying, but that assumes that you are accessing data records sequentially. What happens to all that disk caching when I am randomly reading records, which I would do during data entry (which is where I am most concerned about disk access and network speed)? The user might need the 1st record, then the 1,000th record, then the 5,257th record, then the 10th record, and so on. Unless the entire table is cached, what good does loading a data page do if the record that you need next is not on that page? And if the data is cached on a server, that does no good at all because you have the network bottleneck. I know about write caching, and it does not affect the speed of retrieving data (unless you do not have enough memory and your machine is thrashing), which is why it was not part of the question.

Your comment: And when a server database is involved, the bottleneck is not RAM vs. disk, but the network connection, which is vastly slower than not just the RAM, but even slower than disk access.

My response: I know the network speed is MUCH, MUCH slower, which is why I made it part of the question. Remember, my original interpretation of John's "Fields are expensive, records are cheap" had to do with disk access over a network versus internal CPU processing. I thought he knew something about the internal processing of Access where Access was very slow when processing large records. I grossly misunderstood his comment. I thought he was saying it is better to access the disk multiple times with small records than to have one large Access record. He has since explained his comment, and now I know it had nothing to do with disk access speed.

Dennis
#24
I was told "Fields are expensive, records are cheap"
David,
Your comment: "When table modifications call for lots of new fields it often means that a one-to-many relationship that ought to be set up in multiple tables is being shoe-horned into a single table." It also means that you have to alter your front-end forms and reports by adding all the fields, whereas if you're adding records instead, whatever continuous or datasheet form you're using to display those records will just have more records in it, so there is no need to alter the user interface objects.

Response: I don't understand your comment. It is absolutely true that if I add more records, I don't have to alter anything. But when the users ask where they enter these new fields and where the new reports are, what am I supposed to say? Well, I did not add those fields, but you can enter more data? That makes no sense at all. I've yet to figure out how to capture additional data and produce reports with this additional data without modifying the UI or creating the new reports. Sorry, I have no idea what you mean by the above statement.

Your comment: To *me*, that's what the phrase means, that adding fields is more complicated in terms of UI, which, to be frank, is where 90% of our time is spent in the development process (rather than in schema design/changes).

My response: You are preaching to the choir! I fully agree. However, this project started out as a simple membership mailing list. And that is all it was. Once the users saw it working, their imagination went wild and they wanted to start capturing additional information so they could produce additional reports. When this project started, I knew about the other reports and strongly suggested that they just capture the additional fields and produce all of the reports. However, the users kept saying no, no, no, we don't need all that stuff; the process is working just fine. And the boss man said: just do the mailing labels and that's all. So that is what I did.

I designed it for the full requirements, but only implemented the mailing-label piece. But what are you supposed to do when the user, who is paying you, says no, we don't need that, when you know they are going to need it? You can only push the customer so far. But again, none of this had anything to do with the original disk access speed vs. Access internal processing speed question.

Dennis
#25
I was told "Fields are expensive, records are cheap"
Duane Hookom,

Your comment: I made the comment: I took over a survey application at a hospital from a programmer with many more years of programming than I had at the time. Your response: I don't understand what this comment has to do with the subject. This was all based on your statement regarding your "over 30 years experience with relational databases". Years of experience are not always the same as understanding and knowledge. In your case, your knowledge may be much greater than 30 years. I'm not sure how we would know that. I was simply providing an example suggesting that I have followed developers who had many years of experience without a basic understanding of properly structured tables. In no way was I suggesting you don't understand normalization.

My response: Ah, you are preaching to the choir. My apologies. I fully understand what you are saying. It is very true that experience does not always count for much. I too have come in behind very experienced people and had to clean up the messes and convoluted logic. And as I said, I know that I've had off weeks and left messes behind for someone else to clean up, much to my embarrassment. I've also had programmers with only a year or so of experience, but with a fresh outlook, come up with some very creative and elegant solutions (which I doubt I would have arrived at). I've done the same thing when I've gone to a new project with much more senior people. Sometimes they cannot see the forest for the trees.

Dennis
#26
I was told "Fields are expensive, records are cheap"
Duane,
I'm going to read this article with great interest. But I see your point.

Dennis
#27
I was told "Fields are expensive, records are cheap"
Responses inline:
Your comment: It is unusual to add 30 fields to a normalized table and still maintain the normalization, yet you seem unwilling to describe the new need for 30 fields. Response: That is because these fields had NOTHING to do with the question! As repeatedly stated, and repeatedly missed, the question concerned disk access speed over the network and the speed at which Access retrieves extra values from a big record. That was the question and the only question. Allen Browne was kind enough to answer it.

The original question was about the "fields are expensive, records are cheap" statement. I have always taken the statement to refer to such factors as development time to add new fields, and not to hardware and network considerations. I missed that your original question was targeted at physical considerations not specifically related to design.

Your comment: But all we have to go on is that you have 30 years experience, from which it seems we are to accept that the design is beyond question. Response: I've re-read this entire series of comments and nowhere in there did I say "you were to accept that the design is beyond question". I wish you would point out where I said this.

You wrote: Response: You are correct in that I did not describe my table structure. However, if I read that someone had 30 years of relational db design experience, I would have thought that he or she knew about designing to 3rd normal form and would not have questioned them. That sounded to me like you were saying the design should not be questioned.

I wrote: To use your specific example of insurance processing, auto and home insurance are so different that many developers would use a different table for the two rather than broaden an existing auto insurance table to accommodate home insurance. Your response: You are so wrong. No, I am correct that many developers would use separate tables. You can (and do) argue that the approach is misguided, but it's my turn to point out that it is a different issue from what I raised, which was about the practice of other developers.

All this time on everyone's part and NO ONE of the people who "are well-trained and very experienced" answered the question! What a waste! I will write this off as frustration, since you seem to have thanked others, especially Allen, for their responses.

This closes my participation in this thread. Good luck with the project.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...esign/201002/1
#28
I was told "Fields are expensive, records are cheap"
Dennis wrote:

: Your comment: A Jet/ACE data store doesn't store data in records, but in data pages. It's not like legacy systems at all. And the fields are variable length, so records are variable length. The efficiency of interaction with the storage and RAM are pretty moot since the days of disk caching, when even a write is done to RAM (and then persisted to disk when convenient). This means there's hardly any difference between RAM manipulation and reading/writing from/to the data store, since both are happening in RAM.
: Response: I can see what you are saying, but that assumes that you are accessing data records sequentially. What happens to all that disk caching when I am randomly reading records, which I would do during data entry (which is where I am most concerned about disk access and network speed)? The user might need the 1st record, then the 1,000th record, then the 5,257th record, then the 10th record, and so on. Unless the entire table is cached, what good does loading a data page do if the record that you need next is not on that page? And if the data is cached on a server, that does no good at all because you have the network bottleneck.

You seem not to have done much investigation of how Jet works. Indexes are the key. When you request data from a table, Jet first requests the relevant indexes (based on your criteria) and then uses the indexes to figure out which data pages to retrieve. The requested data is going to be brought into RAM for processing, and since it's been requested from disk, it will be cached in Jet's internal cache, as well as in the OS's disk cache.

When you write to the data, Jet will take care of writing to disk, which will then be handled by the OS, and if the written data pages are in the disk cache (and you are caching writes), the writes will go to the disk cache and then be persisted to disk according to your disk cache settings (you can also tell Jet explicitly to flush its cache, which in turn tells the OS to flush its write cache, i.e., persist to disk). This is all managed quite efficiently; so efficiently, in fact, that there is very little reason to worry about the things you are worrying about until you reach extremely large datasets. That's usually not what is being used by an Access application directly (i.e., if you have a large dataset, you're likely going to use a different back end than Jet/ACE).

: I know about write caching, and it does not affect the speed of retrieving data (unless you do not have enough memory and your machine is thrashing), which is why it was not part of the question.

Writing is much slower than reading because it involves so many different operations (writing indexes, determining if new pages need to be allocated), as well as maintaining a set of appropriate locks (write locks are a much bigger deal than read locks). But disk caching is a big deal for reading, too. As I explained above about the way Jet works, the indexes are the key to retrieving data efficiently. Data pages in Jet 4 and ACE are 4KB in size, so you're retrieving data from the disk in blocks whose size is very often well aligned with the file system's block size (the NTFS file system uses default block sizes in increments of KBs, with 4KB being the smallest usual block size, and the next being 8, 16, etc., so reading 4KB data pages is likely to be very efficient, since they will be aligned with the file system's block size). Jet retrieves what it needs as needed, but only as much as needed.

A simplified explanation of how Jet works. This SQL statement:

SELECT tblPerson.LastName, tblPerson.FirstName, tblPerson.PersonID
FROM tblPerson
WHERE tblPerson.LastName = "Fenton"

...will first retrieve the index on LastName, which will be keyed to the primary key, so Jet will then retrieve the primary key index. In both cases, only as much of the index as necessary is retrieved (it's not a sequential scan, but a b-tree traversal), and in the case of the primary key index, since Jet tables are clustered on the primary key, I'm pretty sure the primary key index is actually the real data pages (I could be wrong on that). I don't know for certain if the non-PK indexes store a link to the PK or if they directly reference the data page of the PK, but the point is that it's not a sequential scan of anything, but a b-tree index traversal, which is substantially more efficient.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
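The scan-versus-b-tree distinction David describes can be observed directly in any b-tree-based engine. A minimal sketch using Python's sqlite3 (SQLite, not Jet/ACE, so this only illustrates the analogous behavior; it mirrors the tblPerson example above):

```python
import sqlite3

# SQLite stand-in for index-driven retrieval: both SQLite and Jet/ACE use
# b-tree indexes over fixed-size pages, so the query plan shows the same
# scan-vs-traversal distinction, even though the engines differ internally.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE tblPerson (
    PersonID  INTEGER PRIMARY KEY,   -- rows are clustered on this key
    LastName  TEXT,
    FirstName TEXT
)""")
con.executemany("INSERT INTO tblPerson VALUES (?, ?, ?)",
                [(i, "Name%d" % i, "X") for i in range(1, 1001)])

query = ("SELECT LastName, FirstName, PersonID "
         "FROM tblPerson WHERE tblPerson.LastName = 'Fenton'")

# Without an index on LastName, the WHERE clause forces a full table scan.
plan1 = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan1[0][3])   # plan text mentions a SCAN of tblPerson

# With the index, the engine traverses the b-tree instead of scanning.
con.execute("CREATE INDEX idxLastName ON tblPerson (LastName)")
plan2 = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan2[0][3])   # plan text mentions a SEARCH using idxLastName
```

The exact plan wording varies by SQLite version, but the shift from a scan to an index search is the point: the criteria are resolved through the b-tree first, and only the matching data pages are touched.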
#29
I was told "Fields are expensive, records are cheap"
Dennis wrote:

: Your comment: "When table modifications call for lots of new fields it often means that a one-to-many relationship that ought to be set up in multiple tables is being shoe-horned into a single table." It also means that you have to alter your front-end forms and reports by adding all the fields, whereas if you're adding records instead, whatever continuous or datasheet form you're using to display those records will just have more records in it, so there is no need to alter the user interface objects.
: Response: I don't understand your comment. It is absolutely true that if I add more records, I don't have to alter anything. But when the users ask where they enter these new fields and where the new reports are, what am I supposed to say? Well, I did not add those fields, but you can enter more data? That makes no sense at all.

I suspect you haven't looked at Duane's survey database. In the structure there (which is a fairly extreme version of what I'm talking about), when you add a new question to a survey, you add a new record to the questions table and link it to a particular survey. This will then be displayed by the same user interface that was used to display the other questions. In the structure you're suggesting, you'd have to add a field for the question to your survey table and then alter your form for data entry, your report for printing out, and all of the tools you use to analyze and summarize the results.

Now, a lot of statistical tools expect a flat-file structure, with one record for each completed survey, so it's usually the case that one takes the many-records structure and denormalizes it into a flat structure that is exported for use by the statistics program. The key there is that you don't choose your data structure based on how you're going to output it. Instead, you choose the structure that is going to be most extensible with the least amount of work, and then you take that data and convert it to the structures required for particular purposes, which is very often a denormalized structure.

: I've yet to figure out how to capture additional data and produce reports with this additional data without modifying the UI or creating the new reports. Sorry, I have no idea what you mean by the above statement.

I strongly recommend you examine Duane's survey sample database.

: Your comment: To *me*, that's what the phrase means, that adding fields is more complicated in terms of UI, which, to be frank, is where 90% of our time is spent in the development process (rather than in schema design/changes).
: My response: You are preaching to the choir! I fully agree. However, this project started out as a simple membership mailing list. And that is all it was. Once the users saw it working, their imagination went wild and they wanted to start capturing additional information so they could produce additional reports.

I have plenty of databases that are badly structured because they started out small, when it seemed it wasn't necessary to choose an industrial-strength schema, and then when they got extended, I greatly regretted not doing it the right way on the front end. So the existing app is no justification for continuing to use a less-than-optimal schema.

: When this project started, I knew about the other reports and strongly suggested that they just capture the additional fields and produce all of the reports. However, the users kept saying no, no, no, we don't need all that stuff. The process is working just fine. And the boss man said: just do the mailing labels and that's all. So that is what I did. I designed it for the full requirements, but only implemented the mailing-label piece. But what are you supposed to do when the user, who is paying you, says no, we don't need that, when you know they are going to need it? You can only push the customer so far.

The approach I'm advocating doesn't take more time to implement to cover the initial requirements and is inherently extensible, without any additional programming for the UI. You've front-loaded the programming to handle the future extensions, which is something I do, too, but the suggestion by those advocating records over fields is that there is no extra work involved. You don't have to do anything extra to prepare for the N new fields, because the structure you've built handles any number of records from the very beginning.

Have you ever implemented a many-to-many join when the current requirement was only many-to-one? Say the client has customers and there's a customer type. You can store it in a field in the customer record, but what if the client later decides that a customer can be more than one customer type simultaneously? At that point you need a many-to-many join between customers and customer types. If you structured it that way from the beginning, it would handle one type per customer, and then would handle as many customer types as you needed when the scope expands. This is the simplest example of the cheap-records vs. expensive-fields equation. And the point is that as soon as the client needs more than one type per customer, you have to decide how many you're going to accommodate. You could, I guess, add a second customer type field, but then when they need three, you'd have to add a third new field. With the join table, you accommodate anything from zero to however many you could ever conceive of. It's not more complicated to implement, and when the customer needs more than one, there is nothing that needs to be done (other than, perhaps, altering your UI to remove any limitations you may have previously put on entry).

: But again, none of this had anything to do with the original disk access speed vs. Access internal processing speed question.

That was your red herring. You asked for an explanation of the expression.
You misinterpreted it and went down the access speed garden path. I'm explaining what the phrase refers to, and it is, in fact, more efficient to add records than fields (consider the number of indexes that have to be maintained/searched).

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/
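David's customer-type example is concrete enough to sketch. The following is an illustrative sqlite3 sketch (again a stand-in for an Access back end; the table and column names are hypothetical), showing how a join table handles one type per customer today and any number tomorrow with no schema change:

```python
import sqlite3

# Hypothetical sketch of the many-to-many join-table approach: the schema
# built for "one type per customer" absorbs "many types" with zero changes.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer     (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomerType (TypeID     INTEGER PRIMARY KEY, TypeName TEXT);
CREATE TABLE CustomerTypeJoin (
    CustomerID INTEGER REFERENCES Customer(CustomerID),
    TypeID     INTEGER REFERENCES CustomerType(TypeID),
    PRIMARY KEY (CustomerID, TypeID)
);
""")
con.execute("INSERT INTO Customer VALUES (1, 'Acme')")
con.executemany("INSERT INTO CustomerType VALUES (?, ?)",
                [(1, "Retail"), (2, "Wholesale"), (3, "Government")])

# Day one: one type per customer means exactly one row in the join table.
con.execute("INSERT INTO CustomerTypeJoin VALUES (1, 1)")

# Later the client decides Acme is also a wholesale customer: just add a
# record. No new field, no schema change, no altered forms or reports.
con.execute("INSERT INTO CustomerTypeJoin VALUES (1, 2)")

types = [r[0] for r in con.execute("""
    SELECT ct.TypeName
    FROM CustomerTypeJoin AS j
    JOIN CustomerType AS ct ON ct.TypeID = j.TypeID
    WHERE j.CustomerID = 1
    ORDER BY ct.TypeID""")]
print(types)  # ['Retail', 'Wholesale']
```

The alternative (a CustomerType2 field, then CustomerType3) forces a table redesign and UI change at every step, which is exactly the "fields are expensive" half of the expression.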
#30
I was told "Fields are expensive, records are cheap"
Dennis wrote:

: Response: You are absolutely correct. The entity, at least from 12 independent software houses that I know of, is the policy. The policy, regardless of which line of business, has about 100 to 200 common fields.

I would love to see one of those tables. My bet is that I could make a much more extensible structure with none of the tables having more than 30 fields, and maybe even only one of them having that many (most of the data would end up in tables that have only two, three, or four fields).

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/