#11
I was told "Fields are expensive, records are cheap"
Rick Brandt wrote:

> When table modifications call for lots of new fields it often means that a one-to-many relationship that ought to be set up in multiple tables is being shoe-horned into a single table. It also means that you have to alter your front-end forms and reports by adding all the fields, whereas if you're adding records instead, whatever continuous or datasheet form you're using to display those records will just show more records, so there is no need to alter the user interface objects.

To *me*, that's what the phrase means: adding fields is more complicated in terms of UI, which, to be frank, is where 90% of our time is spent in the development process (rather than in schema design/changes).

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
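To make the shoe-horning concrete, here is a minimal sketch (generic SQL; table and column names are hypothetical, and in Access the keys would be AutoNumber fields) of moving repeating phone columns out of the parent table into a child table, so that a new number is a new *record* rather than a new *field*:

```sql
-- Instead of Phone1/Phone2/Phone3... columns on the parent table,
-- give each number its own row in a child table.
CREATE TABLE Contacts (
    ContactID   INTEGER PRIMARY KEY,   -- AutoNumber in Access
    ContactName VARCHAR(100)
);

CREATE TABLE ContactPhones (
    PhoneID     INTEGER PRIMARY KEY,   -- AutoNumber in Access
    ContactID   INTEGER NOT NULL,
    PhoneType   VARCHAR(20),           -- e.g. Office, Home, Mobile
    PhoneNumber VARCHAR(30),
    FOREIGN KEY (ContactID) REFERENCES Contacts (ContactID)
);
```

A continuous subform bound to the child table then displays however many numbers exist, and a fourth number requires no schema or UI change.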
#12
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> However, most personal lines and commercial lines insurance policy and claim tables can get very wide.

That suggests they are probably denormalized to some degree. I know a number of popular applications (ACT springs to mind) that use very wide tables, and this means their application has to be more complicated to work with the data. It is also one of the main causes of the performance problems ACT has in filtering large datasets -- it's harder to use indexes efficiently when the data is denormalized. And some of the data in ACT (last time I checked, which was many, many years ago) seems to be stored in multi-value fields, which is yet another source of performance problems, since even if those fields are indexed, the indexes can't be used efficiently.

I will agree that a wide table *can* be normalized. I will also agree that there can be perfectly good reasons to denormalize. But I have very few tables in any of my apps that have even close to 50 fields.

A common cause of denormalization that looks properly normalized is mis-definition of the entities (i.e., a table holds records, each of which represents an entity, and each field is an attribute of that entity). In a survey, for instance, it's very common to create a single record for each survey, with a field for each question. But this approach is completely inextensible -- any change to the questionnaire requires a change to the table. Or you have to create a table for each questionnaire. Or you have to add fields that are used in some questionnaires and not in others.

I learned this lesson with a medical study database that was tied to the structure of the UNOS national transplant registry. There was only one medical study, and for it to work the variables being examined had to be established at the beginning, so it was never going to change. It seemed logical to use a record per patient. But before the app was even finished, UNOS changed how they recorded data, and we ended up having to add fields to the existing tables and then change all the forms and reports to reflect that.

If we'd used a proper structure, i.e., with a record for each variable and a table that defined what the variables were, we would have needed to make absolutely no changes to the *structure* of the data or the application -- we could have accommodated the additional variables just by adding *data* to the existing tables.

That's the essence of the problem. Fields are expensive, records are cheap -- it's easy to change the data, hard to change the data storage *structure*.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
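A minimal sketch of the record-per-variable structure described above (generic SQL; names are hypothetical, and in Access the keys would be AutoNumber fields). The point is that the questionnaire itself becomes *data*:

```sql
-- The questionnaire is data: each question is a row, not a column.
CREATE TABLE Questions (
    QuestionID   INTEGER PRIMARY KEY,  -- AutoNumber in Access
    QuestionText VARCHAR(255)
);

-- One row per patient per answered question.
CREATE TABLE Responses (
    ResponseID  INTEGER PRIMARY KEY,   -- AutoNumber in Access
    PatientID   INTEGER NOT NULL,
    QuestionID  INTEGER NOT NULL,
    AnswerValue VARCHAR(255),
    FOREIGN KEY (QuestionID) REFERENCES Questions (QuestionID)
);
```

When the registry adds a variable, you add a Questions row and start collecting Responses rows; no table, form, or report changes.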
#13
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> A program reads and writes rows (not fields) to the disk.

I think you're stuck in a very old-fashioned approach to this, one that reflects the very constricted memory and storage available back in the 80s and early 90s. A Jet/ACE data store doesn't store data in records but in data pages. It's not like legacy systems at all. And the fields are variable length, so records are variable length.

The efficiency of interaction between storage and RAM is pretty moot since the days of disk caching, when even a write is done to RAM (and then persisted to disk when convenient). This means there's hardly any difference between RAM manipulation and reading/writing from/to the data store, since both are happening in RAM. Disk caching has been a given on Windows since Win3.11, c. 1992. That predates Access, in fact, so Access/Jet has never been as bound to disk space/performance as the databases that seem to form the model in your head.

And when a server database is involved, the bottleneck is not RAM vs. disk but the network connection, which is vastly slower than not just RAM, but even than disk access.

So your model seems completely outdated to me.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
#14
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> In another posting John clarified what he meant by "Fields are expensive, records are cheap": "What I consider 'expensive' is that an improperly normalized table structure will require far more work in getarounds, contorted queries, form and report maintenance, and code than would a properly normalized design."

There's also the performance issue. If you need to query phone numbers, in a normalized structure you will be querying one field, using one index, retrieving the data pages for a single index across the wire. With a denormalized structure, you may need to query multiple fields, using multiple indexes, and pulling more index pages across the wire.

And if your table is denormalized and you need to add indexes for repeating fields, you're much more likely to run up against the limit on the number of indexes in a single table, which for Jet/ACE is 32. If you've got 4 phone number fields in your main table, that's 4 additional indexes, which means you have only 27 left (since you have to have a PK, of course). If you normalize your phone numbers, the index is in a completely different table, on a single field, and it uses up none of the indexes in the main table, leaving you with 31 available indexes.

Some argue that the UI for normalized data is hard to create, or that if you use the easy methods you're stuck with subforms, and users don't find subforms easy to use. Well, tell me that users will have trouble with this:

http://dfenton.com/DFA/examples/SA/ContactInfo.jpg

That's two instances of a single subform, using an outer join against the list of possible phone number types for each category of phone numbers (office vs. home), so that the user just sees a flat form listing all the phone number fields that can be filled out. All of the phone numbers are stored in a single table, and records are created there only when there is a number to be stored (email is stored in the same table, in fact, though it's not included in the same subform, since it is presented differently because it has different relevant data fields). So, I think there are no valid objections to normalization.

That said, I don't always insist on normalized phone numbers, because in most cases nobody searches on them. The number of necessary numbers is also relatively finite -- while in the last 15 years we've added pagers and mobile phones (and now pagers are out the window and fax is not going to be relevant 20 years from now), there's not much else to add. In many of my apps, there's phone, fax, mobile, email and a memo field for OTHER NUMBERS. This covers everything, really.

Now, all that said, there's an argument to be made that the dropping of pagers and the coming abandonment of faxes means the denormalized structure is flawed in that not only is it hard to *add* new fields, it's also hard to remove obsolete ones. I would say the problem of removing fields vs. adding them is asymmetrical, since to remove them you don't have to touch the underlying data structures -- you only have to alter the user interface (forms/reports). That's substantially easier than adding entirely new fields, so it's not nearly as bad a problem once you've reached a number of repeating fields that covers all the reasonable possibilities.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
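A hedged illustration of the index arithmetic above (hypothetical names, reusing the Contacts/ContactPhones sketch from earlier in the thread): searching a normalized phone table touches one indexed column, while the denormalized table needs an OR across every repeating field, each wanting its own index.

```sql
-- Normalized: one field, one index, one seek.
SELECT ContactID
FROM ContactPhones
WHERE PhoneNumber = '212-555-0100';

-- Denormalized: four fields to test, four of Jet's 32 indexes
-- consumed (plus the PK), or a full scan if they aren't indexed.
SELECT ContactID
FROM Contacts
WHERE Phone1 = '212-555-0100'
   OR Phone2 = '212-555-0100'
   OR Phone3 = '212-555-0100'
   OR Phone4 = '212-555-0100';
```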
#15
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> You are correct in that I did not describe my table structure. However, if I read that someone had 30 years of experience in relational db design, I would have thought that he or she knew about designing to 3rd normal form, and I would not have questioned them.

Whether a design is normalized or not depends entirely on how you define the entity being modelled in the table. If you define the entity as "this table includes records where each record is a completed survey," then you'll have a record with a field for each survey question. It won't be denormalized, because each field is an attribute of the survey. But modelling your entity in that fashion is very rigid and difficult to alter down the road.

If, instead, you consider the questions on the survey to be variables, each one equal, then the single-record survey structure is denormalized, since you've got repeating fields (variable 1, variable 2, variable 3 -- even though they may be named Age, Sex, Eye Color).

For particular purposes, this may work very well. A Person table doesn't need to be broken down into repeating variables, since most people have the same attributes. But there are still plenty of ways to slice and dice the data, such as whether you treat addresses and phone numbers as repeating data or not (the clue: where you have "Work Phone" and "Home Phone" you have repeating data, and with "Work Address" and "Home Address" you have the same thing).

What you do is up to you -- there is no law that your data has to be normalized. You choose the level of normalization that is appropriate to the application you're building. For surveys, there is no question in my mind that the normalized structure, with each variable as a row in a table instead of as a field in a record, is the only valid way to go, simply because doing otherwise makes it very, very difficult to change the survey, or to store more than one survey. In other cases, it's not necessarily the case that normalized storage is going to be vastly superior in the long run.

Your particular application may be one where it is appropriate to add 30 fields to a table (though that seems to indicate to me that there are either repeating fields or the original design was really inadequate and not carefully thought through -- not so much by the database developer as by the client). But as a general rule, fields are expensive and rows are cheap, and any time you're contemplating adding a lot of fields to a table, it should be a red flag that perhaps you should rethink the structure. That doesn't mean you'll conclude that adding 30 fields to a single table is wrong, just that you should evaluate the ultimate solution carefully. Making the wrong decision is a lot harder to rectify *after* you've implemented it, so taking the time to consider the issue carefully is going to save loads of time later.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
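One common objection to the variable-as-row design is that users still want to *see* one column per question. In Access that presentation can be generated with a crosstab query, so the flat view is derived rather than stored. A sketch in Jet SQL, assuming the hypothetical Questions/Responses tables from earlier in the thread:

```sql
TRANSFORM First(Responses.AnswerValue)
SELECT Responses.PatientID
FROM Responses INNER JOIN Questions
    ON Responses.QuestionID = Questions.QuestionID
GROUP BY Responses.PatientID
PIVOT Questions.QuestionText;
```

When a question is added, the crosstab simply grows a column; the stored structure never changes.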
#16
I was told "Fields are expensive, records are cheap"
Dennis wrote:

> Response: You are absolutely correct. Personally, I think the maintenance number is higher. However, I did not interpret his comment this way, for the following reason. If you have to add a data field so you can produce a report, what do you tell the user -- "I can't do it because that field will add 5-10% to the cost and increase maintenance costs"? What I have learned in my 30 years is that the cost issue is NOT for the developer to decide, it is for the user to decide. It is incumbent upon us to inform the user of the cost, to develop a cost-effective design, and maybe do a cost-benefit analysis if the numbers are big enough. However, if the user decides it is worth the cost, then it is not our place to argue.

If you model the new fields as records in a table, in many, many cases there is no change needed to the app. Thus, the cost of adding the new field is ZERO. If you model it as a new field in an existing table, the cost is substantially larger.

What client, understanding the issues, would choose the $100 solution over the $0 solution? But if you as developer only offer the denormalized structure, they never get that choice.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
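The two price tags contrasted above correspond to two very different operations. A hedged sketch (generic SQL; the Surveys table and all names are hypothetical, reusing the survey design from earlier in the thread):

```sql
-- The "$0" path: the new field is a new row of data.
-- No schema change; a continuous form just shows one more record.
INSERT INTO Questions (QuestionText) VALUES ('Preferred contact method');

-- The "$100" path: the new field is a schema change, plus edits to
-- every form, report, and query that must carry it.
ALTER TABLE Surveys ADD COLUMN PreferredContact VARCHAR(50);
```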
#17
I was told "Fields are expensive, records are cheap"
BruceM gets right to the key point: how you decide to model your entities determines what is normalized and what is not.

"BruceM via AccessMonster.com" <u54429@uwe> wrote in news:a42adb3e4fe06@uwe:

> To use your specific example of insurance processing, auto and home insurance are so different that many developers would use a different table for the two rather than broaden an existing auto insurance table to accommodate home insurance. You could argue that it is all insurance and therefore is an entity that belongs in a single table.

I would argue that since both are insurance policies, you can use the survey model: for each attribute of the policy, you create a record in a table connected to the main policy record. For the different types of insurance policy you'll have a different set of variables, because they are different types, but by storing each one as a row in a related table, you've made your structure extensible, so that you can model any type of insurance policy. And if you like, you can use a single set of user interface objects to work with and print all of them.

> You could similarly argue that employees and clients are both people, and therefore should be in one table. Many developers would define "entity" more narrowly, and therefore use separate tables, in a situation where many fields are used for one type of entity and not the other, or where the entities are clearly in different categories.

I long ago switched over to putting all people in the same table, whether employees or clients. It seems obvious to me that if you have two tables with a whole set of fields that are identical, you actually have a single entity stored in two different tables. This is not to say it is not useful in some circumstances to set up two separate tables for two instances of what are largely the same entity. I can see no real benefit from keeping employees and customers in the same table, though certainly employees might some day become customers, and could be both at the same time, so there's a certain justification -- it all depends on the business rules and how much benefit there is to modelling them as the same or different entities.

I go with keeping all real-world entities in the same base table (think how complicated it would get if you had an employees table and a customers table and were normalizing your storage of phone numbers and addresses -- how do you enforce RI if you don't duplicate the exact same structure for both entity types?), and then segregating them accordingly. I found this to be the best approach well over a decade ago, and I have stuck with it and had no problems.

That doesn't mean others would reach the same conclusion for the exact same applications, or that the single-entity model is appropriate for all applications (or even most). It's entirely dependent on the particular application, the particular functionality of the app, and the business rules in place. However, given that those things can change drastically over time, I tend to favor going as normalized as practical on the front end, since that maximizes flexibility over the long run.

--
David W. Fenton
http://www.dfenton.com/
usenet at dfenton dot com
http://www.dfenton.com/DFA/
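A minimal sketch of the single-base-table approach described above (generic SQL; all names hypothetical, AutoNumber keys in Access). The payoff of the shared base table is that referential integrity for phones and addresses has one target instead of two:

```sql
-- One base table for every real-world person...
CREATE TABLE People (
    PersonID  INTEGER PRIMARY KEY,   -- AutoNumber in Access
    FirstName VARCHAR(50),
    LastName  VARCHAR(50)
);

-- ...with one-to-one subtype tables segregating the roles.
CREATE TABLE EmployeeDetails (
    PersonID INTEGER PRIMARY KEY,
    HireDate DATE,
    FOREIGN KEY (PersonID) REFERENCES People (PersonID)
);

CREATE TABLE CustomerDetails (
    PersonID      INTEGER PRIMARY KEY,
    AccountOpened DATE,
    FOREIGN KEY (PersonID) REFERENCES People (PersonID)
);

-- RI for phone numbers points at a single table, whether the person
-- is an employee, a customer, or both at once.
CREATE TABLE Phones (
    PhoneID     INTEGER PRIMARY KEY,
    PersonID    INTEGER NOT NULL,
    PhoneType   VARCHAR(20),
    PhoneNumber VARCHAR(30),
    FOREIGN KEY (PersonID) REFERENCES People (PersonID)
);
```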
#18
I was told "Fields are expensive, records are cheap"
Dennis,

Just a comment regarding a comment regarding a comment ;-)

I made the comment:

> I took over a survey application at a hospital from a programmer with many more years of programming than I had at the time.

Your response:

> I don't understand what this comment has to do with the subject.

This was all based on your statement regarding your "over 30 years experience with relational databases". Years of experience are not always the same as understanding and knowledge. In your case, your knowledge may be much greater than 30 years; I'm not sure how we would know that. I was simply providing an example suggesting that I have followed developers who had many years of experience without a basic understanding of properly structured tables. In no way was I suggesting you don't understand normalization.

--
Duane Hookom
Microsoft Access MVP
#19
I was told "Fields are expensive, records are cheap"
Dennis,

To further my point about experienced developers, I would like you to look at this page on normalization:

http://www.fmsinc.com/tpapers/datanorm/index.html

I find it very wrong, yet it is authored by a very respected developer. In my opinion, the author transforms one non-normalized structure into another non-normalized structure. Why not create a more normalized structure that has fields for ID, Year, AccountCode, and BudgetAmt?

--
Duane Hookom
Microsoft Access MVP
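For concreteness, the structure Duane is suggesting would look something like this (a sketch only; Year is a reserved word in Jet SQL, so it is rendered here as BudgetYear, and the types are guesses):

```sql
-- One row per account per year: a new budget year adds rows, not columns.
CREATE TABLE BudgetAmounts (
    ID          INTEGER PRIMARY KEY,   -- AutoNumber in Access
    BudgetYear  INTEGER,
    AccountCode VARCHAR(20),
    BudgetAmt   DECIMAL(12, 2)         -- Currency in Access
);
```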
#20
I was told "Fields are expensive, records are cheap"
Bruce,

I thought I stated my point quite clearly in my original message:

> I've always been taught the exact opposite -- that "Fields are cheap, records are expensive" -- since GOING TO DISK IS SO SLOW VERSUS ACCESSING DATA IN MEMORY. Is there something different about Access where the statement "Fields are expensive, records are cheap" is true?

(Excuse the caps, but I had no other way to highlight the text.) The original question had to do with disk access speed versus the speed of extracting a lot of variables from a long record, and that is all. Everyone else is addressing a non-issue. The data structure had NOTHING to do with the original question; it was simply interesting background information, that is ALL. The question was disk access speed over a network versus extracting data from a long record. That was the whole question, and instead of answering it, people are worried about 30 fields. All the 30-year comment was supposed to mean was that I don't make stupid mistakes like repeating fields, and that is ALL it meant. Nothing else! However, everyone, including you, has read way more into that statement.

Your comment:

> To use your specific example of insurance processing, auto and home insurance are so different that many developers would use a different table for the two rather than broaden an existing auto insurance table to accommodate home insurance.

Response: You are so wrong. I know, because I've been working in the insurance field for those 30 years. In that time I've worked on 12 packages from 12 different vendors, and they all have many things in common. One of them is that they all have a policy master table which contains one record per policy ever issued, regardless of what line it was written under. And the policy master usually has about 100+ fields in it. (A couple had 200+ fields.) They all also had a line-specific policy master where the specific line information (auto, home, work comp, general liability, etc.) for that policy was stored. (There were a lot of other tables too.)

Your comment about the lines being so different is both correct and incorrect. There are about 100 fields that are common to every line of business, covering the policy information, reinsurance, earned and unearned premiums, statistical reporting to the insurance bureaus, and other such things. The information that is line-specific is indeed stored in different tables. Actually, if the developer chose to have multiple master tables, he would be creating himself a nightmare when it came to policy inquiry, claim processing, earned and unearned premium accounting, policy reports, renewal processing, cancellation and reinstatement processing, and a whole long list of other issues. But then again, that is where the 30 years of experience comes in. It is quite a coincidence that all 12 vendors, whose products were very different (from flat files on a mainframe to a Windows-based client server and everything in between) and in some cases were written decades apart, all took the same approach.

Your comment:

> Many developers would define "entity" more narrowly, and therefore use separate tables, in a situation where many fields are used for one type of entity and not the other, or where the entities are clearly in different categories.

Response: You are absolutely correct. The entity, at least for the 12 independent software houses that I know of, is the policy. The policy, regardless of which line of business, has about 100 to 200 common fields. (The reason for the huge range is the different systems' capabilities: the more they could do, the more data they stored.) That is why those fields are in the policy master file/table. The many fields that are specific to each line of business are held in line-of-business policy master files, and they are all children of the main policy master parent.

Dennis
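A hedged sketch of the structure Dennis describes (generic SQL; all names hypothetical, with comments standing in for the 100+ common fields): one policy master row per policy, with line-of-business children holding the line-specific data.

```sql
-- One record per policy ever issued, regardless of line of business.
CREATE TABLE PolicyMaster (
    PolicyID       INTEGER PRIMARY KEY,  -- AutoNumber in Access
    PolicyNumber   VARCHAR(20),
    LineOfBusiness VARCHAR(10),          -- e.g. AUTO, HOME, WC, GL
    EffectiveDate  DATE
    -- ...plus the ~100 fields common to every line (reinsurance,
    -- earned/unearned premiums, bureau reporting, and so on)
);

-- Line-specific children, one-to-one with the master.
CREATE TABLE AutoPolicyDetail (
    PolicyID   INTEGER PRIMARY KEY,
    VehicleVIN VARCHAR(17),
    FOREIGN KEY (PolicyID) REFERENCES PolicyMaster (PolicyID)
);

CREATE TABLE HomePolicyDetail (
    PolicyID      INTEGER PRIMARY KEY,
    DwellingValue DECIMAL(12, 2),
    FOREIGN KEY (PolicyID) REFERENCES PolicyMaster (PolicyID)
);
```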