#11
Restated: "Fields are expensive, records are cheap"
David,

Your comment: The determination of whether or not those tables were actually normalized depends on the chosen definition of the entity being modelled in the table. I would consider the 200-field table you mention later to be unquestionably denormalized, even though I know nothing at all about the content/function of those fields.

Response: This is a very common statement from people who do not know the P & C insurance industry. The 100- to 200-field master record (depending upon the vendor) contained ONLY the common, non-recurring policy information shared between the different lines of business (types of policies). The insured's name and address information is not even included in those fields, as that information is in another table. Actually, when you examine the data at the "policy" level for different lines (Personal Auto, Homeowners, Renters, Work Comp, Commercial Auto, General Liability), you would be surprised to find it is actually quite similar. It is not until you get to the "coverage" and insured "object" level that the different lines vary dramatically.

There is a lot of policy information that is never seen by the public. Just off the top of my head, some common policy information is: reinsurance information; cancellation and reinstatement status (the cancellation and reinstatement history is maintained in a separate table); statistical reporting to ISO and to each individual state; premium accounting (not Accts Rec.) with written, earned, unearned, and in-force premiums; renewal offer tracking; voluntary audit information; and physical audit tracking. You could break all this information out into its own tables, but why? So much of the above information is inter-related (like effective and expiration dates) that breaking it into separate tables just slows down data access and increases complexity.

One of the clients I worked for was a state-owned insurer of last resort.
If you could not find insurance anywhere else, you could purchase it from this company. The company was less than a year old when I started working there. They were located in a city with a lot of banking expertise but very little insurance expertise. Their staff had all sorts of banking experience, but no insurance-experienced people. The first thing I did was sit down with the vendor and go over their system. I learned that system in less than a week. It was simple to learn not because I am that brilliant, but because there are only so many ways you can build an insurance system. And if you understand insurance, you can easily understand a new system. (Talk to an auto mechanic – it does not take them long to learn a new "normal passenger" car [I'm excluding the exotic engines] because there are only so many ways to build a "normal passenger" car.) The vendor commented that they were glad I was there, because they had been trying to teach the banking people and AS/400 people for about a year and no one really understood the system. Again, that is not because the company did not have smart people or people that lacked experience. It was because their people lacked insurance experience. I had to give the insurance company's CFO the reinsurance premium accounting journal entries for their financial books. This was not because she was dumb (as a matter of fact she was quite brilliant); it was because she did not have insurance accounting experience, which is quite a bit different from normal accounting entries. But I went through all that just to say that the head of the company's IT department thought the same thing you do (he also came from a banking background). So, he hired some database design consultants / experts to review the database's design, who again did not understand insurance. (Had they understood insurance, they could have taken a quick look and realized the database was in pretty decent shape.)
Instead, they gathered all of the data, all of the relationships, all of the interdependencies, and did their thing. Guess what: they came up with some minor suggestions but no major changes, which is what I told the CIO before he started this effort. But oh well. That is where the experience comes in. Also, as I stated in the other discussion on this subject (which I am surprised you missed, as you are commenting in that discussion also), I've worked on 12 different vendors' insurance systems over the years. Those systems have been written DECADES apart with totally different technology and totally different development groups. At one extreme we have the flat-file system running on a mainframe, and at the other end we have a Windows-based object-oriented client/server system using an SQL database. And they have all had a 100- to 200-field policy master table. (The more capable systems had the larger number of fields.) It is interesting that you would disagree with all those people with all that experience. But whatever.

Your comment: That sounds like a table that has a bunch of fields that are used only for a single record type, so that an auto insurance policy has one set of fields, but a renter's insurance policy has a different set of fields.

Response: Well, it may sound like that, but again this is the common statement for a newbie in the P & C insurance field. The normal way I've seen the policy master designed is to have a common policy master, where all common fields (all 100 to 200, depending upon the system) are stored in a single table. Then for each line of business (such as auto or renters), you have a line-of-business policy master file that contains those fields specific to that line of business. This table is an extension of the common policy master table. In a good design, you simply don't store line-specific fields in the policy master table; you store them in the line-specific policy master files.
One of the reasons the policy master record is so big is that there is a whole lot of "behind the scenes" data being stored that the policy holder never sees. (See above.) At the coverage and insured object level, the story is totally different. While the structure of the coverage tables mirrors the policy master and line-specific policy master, the coverage master table is actually quite narrow. That is because there is not a whole lot of common information (other than effective and expiration dates, policy accounting [not Acct. Rec. info], statistical accounting, coverage limits, and reinsurance) at the coverage level. Most of the coverage information is stored in different line-specific coverage and insured object tables (two or more tables). These tables are extensions of the coverage master table and children of the line-specific policy master tables. The homeowner coverage is actually comprised of multiple coverage tables, because a homeowner policy can cover multiple lines of business. For example, a homeowner policy can cover fire and property damage (1 line of business), general liability (another line of business), theft (another line of business), and work comp for household help (another line of business). These were just the lines of business that I could think of off the top of my head. A full implementation of a homeowner policy is extremely involved and very complicated. But, back to your example. Your statement is incorrect. The personal auto policy master, coverage, and insured object tables contain the auto-specific coverage information, while the renter's policy, coverage, and insured object tables contain the renter's-specific coverage information. The common information for both the auto and the renter's policy is stored in the policy master table.
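A minimal sketch of this common-master-plus-extension layout might look like the following. All table names, column names, and values here are made up for illustration; this is not any vendor's actual schema, just the supertype/subtype pattern being described:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Common policy master: fields shared by every line of business.
cur.execute("""
    CREATE TABLE policy_master (
        policy_id        INTEGER PRIMARY KEY,
        effective_date   TEXT NOT NULL,
        expiration_date  TEXT NOT NULL,
        written_premium  REAL,
        line_of_business TEXT NOT NULL  -- discriminator: 'AUTO', 'RENTERS', ...
    )""")

# Line-specific extensions: one row per policy, only for that line.
cur.execute("""
    CREATE TABLE auto_policy (
        policy_id    INTEGER PRIMARY KEY REFERENCES policy_master(policy_id),
        vehicle_vin  TEXT,
        garaging_zip TEXT
    )""")
cur.execute("""
    CREATE TABLE renters_policy (
        policy_id      INTEGER PRIMARY KEY REFERENCES policy_master(policy_id),
        dwelling_type  TEXT,
        contents_limit REAL
    )""")

# An auto policy stores its common fields once and its auto-only fields once.
cur.execute("INSERT INTO policy_master VALUES (1, '2024-01-01', '2025-01-01', 1200.0, 'AUTO')")
cur.execute("INSERT INTO auto_policy VALUES (1, 'VIN0000TEST', '10001')")

row = cur.execute("""
    SELECT m.effective_date, a.vehicle_vin
    FROM policy_master m JOIN auto_policy a USING (policy_id)
""").fetchone()
print(row)  # ('2024-01-01', 'VIN0000TEST')
```

No line-specific column ever appears in the common master, so the master stays wide only with genuinely shared fields, and adding a new line of business adds a table rather than columns.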
Your comment: Any time you're using some fields for some records and not for others, it's an indication to me that the entity has been misdefined, and should probably be broken into at least two tables, with a narrow header table and a long child table, where each row stores what was formerly a field in the wide table.

Response: You are preaching to the choir here! I totally agree. However, we are going to have to disagree on the "narrow header table" issue. The header table is as long as the data model / structure requires it to be. If it is short, it is short. If it is long, then it is long.

Your comment: All that said, my conclusion could be wrong for any particular application.

Response: I agree with this point.

Your comment: But "fields are expensive, rows are cheap" is a generalized rule of thumb, not a hard-and-fast law of nature. It allows for exceptions for certain purposes, but is a starting point for evaluating a schema design.

Response: I now understand John's logic behind "Fields are expensive, rows are cheap" and, given the context, I fully agree with it.

Dennis
#12
Restated: "Fields are expensive, records are cheap"
Dennis wrote in
: I can see where disk caching would help in a sequential process, but does disk caching really help in a randomly accessed database during data entry?

Yes, because every modern database uses b-tree traversal of indexes to locate records. Next question?

-- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/
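The point about b-tree traversal can be sketched with a toy index. This is purely illustrative (a sorted list standing in for B-tree leaves, with made-up page names), not Jet's actual implementation; the idea it shows is that an indexed lookup takes a handful of comparisons rather than a scan of every record:

```python
from bisect import bisect_left

# A toy "index": sorted (key, page) pairs standing in for B-tree leaf entries.
# A real B-tree spreads these across pages, but the search cost is the same
# O(log n) idea: a few comparisons locate the record's page, no table scan.
index = [(k, f"page-{k // 10}") for k in range(0, 100000, 2)]  # even keys only
keys = [k for k, _ in index]

def locate(key):
    """Find the data page holding `key` without scanning the table."""
    i = bisect_left(keys, key)       # binary search, ~17 comparisons for 50k keys
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None                      # key was never inserted

print(locate(4242))  # page-424
print(locate(4243))  # None (odd keys were never inserted)
```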
#13
Restated: "Fields are expensive, records are cheap"
Dennis wrote in
: Hmmm, I see your point and kind of agree with it. My background is on large and midrange computers, where it is nothing to have a 200-field, 30K record. However, I realize that Access is a different beast and I'm having to learn to adjust for its restrictions. Thanks for the insight. Just more to think about. But then I learn something new also.

Schema design should be as independent of the database engine as possible, so Access is *not* different in any way, shape or form. I would recommend *as a starting point* the same normalized design for any database engine. We are at least 15-20 years past the days when the developer of a desktop database app needed to worry about the physical storage of data. It's only in edge cases where any modern developer is going to start considering it in the design of the database schema.

-- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/
#14
Restated: "Fields are expensive, records are cheap"
David,
Your comment: Schema design should be as independent of the database engine as possible, so Access is *not* different in any way, shape or form. I would recommend *as a starting point* the same normalized design for any database engine.

Response: Your comment is self-contradictory. Instead of saying "Schema design should be independent of the database engine," you stated "Schema design should be as independent of the database engine as possible." The "as possible" qualifier by definition states that things will be different between database engines, which throws your whole argument out the door. Yes, Access is different from DB/400, and Oracle, and D3. It is very similar, but it is not the same. From what I've read, Access has not implemented the entire SQL language. Also, Access does not support blobs very well, where Oracle does. From what I've read, it is highly recommended that we not store blobs in an Access database. Rather, we store the path and file name to the blob and let DOS/Windows store the binary file in the specified directory. From what I've read, Oracle has no problem storing blobs in its records. I don't know if DB/400 stores blobs. I know D3 does not store blobs, but jBase might. I know D3 supports multi-valued lists, and I think Oracle does also, which are very useful in exploding parts on a fully assembled unit (i.e., a car door). Access does not support multi-valued lists.

So much for "Access *not* different in any way, shape, or form." Even as a newbie, I knew that statement was false. Even a prima facie examination of that statement indicates it is false. Are you saying Access is not different in any way, shape, or form from DB/400, Oracle, MySQL, or SQL Server? I'll give you one difference: the maximum record size on the different databases is different. The maximum table and database sizes are different. Those two differences alone negate your statement.
I also know that DB/400 does things differently than Access, because I tried some DB/400 approaches on Access and found that they did not work on Access. Given that DB/400 does not have a 4K record size limit as Access does, that alone could possibly cause a different schema: Access would have multiple tables where DB/400 would have one. I also remember that Access has a relatively small (from a commercial insurance policy standpoint) maximum table / database size. DB/400 and Oracle don't have that same limitation. To me, this would definitely influence the schema design. Assuming I had one table (which you would in commercial policies) that exceeded the size limitation of Access, I would have to design around that limitation. In DB/400 and Oracle, I would not have to. Each database engine has different capabilities, enhancements, levels of SQL implementation, and limitations than the next. What it appears you are saying is that we should design our schema to the lowest common denominator and ignore any additional capability offered by the particular database. As soon as you move away from this position, you then have to design different schemas for different engines. Granted, those changes might be slight. But as soon as you design something different for the different engines, you have violated your statement that Access is not different. I will agree that the general schema should be very similar for the different database engines, but it will not be the same. The design of a Pick – D3, DB/400, Oracle, and Access schema would differ mainly because of the different capabilities of the different database engines.

Your comment: We are at least 15-20 years past the days when the developer of a desktop database app needed to worry about the physical storage of data. It's only in edge cases where any modern developer is going to start considering it in the design of the database schema.

Response: You are absolutely correct.
That is why I was wondering about John's comment. I thought that he was implying something about Access's speed. It turns out I simply misunderstood his statement.

Dennis
#15
Restated: "Fields are expensive, records are cheap"
David,
I stated: "The first data entry might access the first record, the next the 1,000th record, the next the 5,000th record, and so on and so on. So, unless the entire table is cached, does it really help?"

Your comment: Yes, because every modern database uses b-tree traversal of indexes to locate records.

My response: So what if the modern database uses b-tree traversal of indexes to locate the records? What does that have to do with physically reading the record from the disk on a server? I can see where the speed of determining the location of the particular record would be assisted by this, but knowing where the record is and getting the record are two totally different things. Once the disk address / page / whatever is determined, the db engine still has to go to disk to get the record, unless the entire table is in cache or in memory on the local machine. So once again, how does all this caching and b-tree traversal speed up the physical reading of a record that is not in memory? The database engine still has to go to disk, or worse yet, over the network, to get the desired record. If I've got it wrong (which I may well have), please explain where I'm missing the point.

Thanks, Dennis
#16
Restated: "Fields are expensive, records are cheap"
Dennis wrote in
: Also, as I stated in the other discussion on this subject (which I am surprised you missed, as you are commenting in that discussion also), I've worked on 12 different vendors' insurance systems over the years. Those systems have been written DECADES apart with totally different technology and totally different development groups.

I have looked at the data structures that have become common practice for a lot of different application types and have repeatedly seen bad designs become the norm for those creating those types of applications. That many people have used the same data structure doesn't make it a good one. Likewise, lots of applications have to support legacy applications that can't easily handle more normalized relational structures, and thus a lot of structures that were required 15-20 years ago by the available technology are still in use. This is quite understandable, of course, but if you're developing a new application using modern technology, there's no reason to maintain the old data structures unless they are really the best model of the entities involved. I see this kind of thing all the time. A 200-field table is just not a good model, and for Access is way too close to the 255-field limit for me to be comfortable with it. While it's theoretically possible that the structure is not denormalized, I think it's extremely unlikely, and that there are better structures to store and manipulate the very same data.

-- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/
#17
Restated: "Fields are expensive, records are cheap"
Dennis wrote in
: So once again, how does all this caching and b-tree traversal speed up the physical reading of a record that is not in memory? The database engine still has to go to disk, or worse yet, over the network, to get the desired record.

Well, it doesn't, but I thought your question implied that what I was proposing was somehow going to require reading the entire table into memory, which is patently not the case. Only as much as is needed will be retrieved, and once retrieved it will be cached (so that if it's needed again, there won't be another trip to disk/server). This is EXACTLY the way any alternative is going to work, so I just don't see your point in disputing the question. Jet is a random-access engine, in that it doesn't have to read the file sequentially to get to the data, nor does it use record numbers and offsets (as in a fixed-length record). It retrieves the minimum data pages needed to fulfill the request, having looked up in the indexes which data pages are needed. It will make the trip to disk only once and cache the result, and go back to disk for a read only when the record has been updated (according to the locking data, rather than checking the data page itself). Your question seems to me to be based on assumptions that don't take account of how modern random-access database access works, or how modern OSes and database engines cache and lock data. But maybe I'm just misunderstanding your basic question.

-- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/
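That read-once-then-cache behavior can be sketched with a toy page cache. This is illustrative only (a dict standing in for disk pages, with made-up contents), not Jet's actual caching code; it shows why repeated access to the same record does not mean repeated disk trips:

```python
# Toy page cache: the first request for a page pays the "disk" trip;
# repeats are served from memory until the page is invalidated.
class PageCache:
    def __init__(self, disk):
        self.disk = disk        # page_id -> bytes (simulated disk/file server)
        self.cache = {}
        self.disk_reads = 0

    def read(self, page_id):
        if page_id not in self.cache:
            self.disk_reads += 1            # only a cache miss touches "disk"
            self.cache[page_id] = self.disk[page_id]
        return self.cache[page_id]

    def invalidate(self, page_id):
        # e.g. locking info indicates another user updated this page
        self.cache.pop(page_id, None)

disk = {1: b"page one", 2: b"page two"}
pc = PageCache(disk)
pc.read(1); pc.read(1); pc.read(2); pc.read(1)
print(pc.disk_reads)  # 2 -- one trip per distinct page; repeats hit the cache
pc.invalidate(1)
pc.read(1)
print(pc.disk_reads)  # 3 -- re-read only because the page was invalidated
```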
#18
Restated: "Fields are expensive, records are cheap"
Dennis wrote in
: Your comment: Schema design should be as independent of the database engine as possible, so Access is *not* different in any way, shape or form. I would recommend *as a starting point* the same normalized design for any database engine. Response: Your comment is self-contradictory. Instead of saying "Schema design should be independent of the database engine," you stated "Schema design should be as independent of the database engine as possible." The "as possible" qualifier by definition states that things will be different between database engines, which throws your whole argument out the door.

Only in edge cases, which you don't encounter until implementation time.

Yes, Access is different from DB/400, and Oracle, and D3. It is very similar, but it is not the same.

Well, 99% of the difference between Jet/ACE (Access is not a database) and the mentioned databases is that Jet/ACE is a file-based database engine, and the ones I recognize (D3 is unknown to me) are server-based databases.

From what I've read, Access has not implemented the entire SQL language.

There is no such thing as the "entire SQL language." There are only SQL standards, e.g., SQL-89, SQL-92, etc., and each establishes a baseline meaning for particular SQL commands. No database engine is required to implement 100% of a SQL spec, though from reading the fanboy arguments about which db engine is better you'd think that was the only thing that mattered. Jet/ACE, being a relatively old SQL implementation and predating the first major SQL standard (SQL-89), is rather idiosyncratic in its SQL implementation, and has a lot of legacy aspects (some of which would be quite nice if they were implemented in other SQL dialects, e.g., DISTINCTROW).
Jet/ACE SQL is also different in that it tends not to support DDL as well as most other SQL dialects, but I would argue that this is because Jet/ACE very early on got a very fine high-level database interface, i.e., DAO, that was closely tied to Jet/ACE and allowed full control of the database engine without needing to write DDL SQL. There are still many things that you can do in DAO that you can't do in Jet/ACE DDL, and not just things that are Jet/ACE-specific.

Also, Access does not support blobs very well, where Oracle does.

Oh, come on. Who cares? Secondly, Oracle didn't always support BLOB data well -- it's something that has been added into the engine as its users needed it. Jet/ACE's BLOB support has not been as important to its users, so it doesn't perhaps support the same level of features as Oracle's implementation. But really, there are about a million things that Oracle offers that Jet/ACE lacks, but that's because the purposes of the two database engines are completely different.

From what I've read, it is highly recommended that we not store blobs in an Access database.

Yes. And I think it's mostly advisable in other database engines, too.

Rather, we store the path and file name to the blob and let DOS/Windows store the binary file in the specified directory.

I think you're actually confusing the advice regarding a specific type of BLOB field, i.e., OLE fields, with BLOB fields in general. OLE fields have a lot of overhead that makes them problematic, and BLOB fields lack that overhead (because they are just a bucket in which to store binary data). OLE fields are easy to use because of the wrapper, but are inefficient because of it. BLOB fields are more efficient but harder to use. And Oracle doesn't support OLE fields at all. This is because it has a different intended user than Jet/ACE, and that's as it should be.

From what I've read, Oracle has no problem storing blobs in its records.

Nor does Jet/ACE.

I don't know if DB/400 stores blobs.
I know D3 does not store blobs, but jBase might. I know D3 supports multi-valued lists, and I think Oracle does also, which are very useful in exploding parts on a fully assembled unit (i.e., a car door). Access does not support multi-valued lists. So much for "Access *not* different in any way, shape, or form."

You don't seem to be paying close attention, as multi-value fields were added in Access 2007, in the ACE engine's ACCDB format. This was done for Sharepoint compatibility, not because multi-value fields are a good idea -- they most categorically are *not* a good idea, in fact, and support for them is not an indication of a good database engine.

Even as a newbie, I knew that statement was false.

You're arguing against something I never asserted. You seem to have some sort of reading comprehension problem, as it is patently obvious that when I said this:

Schema design should be as independent of the database engine as possible, so Access is *not* different in any way, shape or form.

...the second phrase applies to the first. That is, in regard to the ideal independence of schema design and database engine capabilities, Access is not different from other databases. It is entirely perverse to divorce the second clause from the first and insist that I was saying there were no differences of any kind between Access/Jet/ACE and other databases. No reasonable person would count that as a valid or good-faith interpretation of what was written.

Even a prima facie examination of that statement indicates it is false. Are you saying Access is not different in any way, shape, or form from DB/400, Oracle, MySQL or SQL Server?

No, I didn't say that. And you know perfectly well that I didn't, unless you are blazingly stupid.

I'll give you one difference. ...

Who cares? You're off on a crazy tangent that has nothing to do with what I wrote. I didn't make any assertion that all databases were alike in all respects, and you know I didn't make that assertion.
Yet you took several paragraphs to provide evidence to contradict something that was never asserted. I deleted it. I didn't read it. I didn't need to, as it doesn't have anything to do with what I actually asserted.

Each database engine has different capabilities, enhancements, different levels of SQL implementation, and limitations than the next. What it appears that you are saying is we should design our schema to the lowest common denominator and ignore any additional capability offered by the particular database.

No, I'm not saying that at all. I'm saying that schema design is a logical operation, not a physical one. You model the entities according to their characteristics and logical relationships, and then you design tables that implement those relationships and that can store the characteristics, and do all of that in a way that maximizes data integrity and consistency. Minor details may differ (you may choose different data types in different engines to store a particular attribute), but the logical structure and relationships are determined not by the database engine, but by the characteristics and structure of the real-world entities being modelled for the purpose of the application at hand.

The last statement has a couple of corollaries: 1. the same entity may be modelled differently in different applications, because the purposes of the applications and the role of the entity in each application may be different. 2. two different structures can be logically equivalent, but in terms of database performance one or the other may be more efficient because of the real-world realities of the way database engines store and retrieve data. In some cases, the particular features of a particular database engine may make a difference as to which structure you choose. For instance, if you are putting business rules in the database schema, an engine that supports triggers is likely to be a better choice, and this will have an impact on how you implement the structure.
Too often people think of data normalization as a process of taking a flat table and breaking it into constituent parts. But that gets things backwards, because you're starting from the denormalized structure and working backwards to the normalized structure. In my opinion, you should begin with the normalized structure and only denormalize when the application and/or database engine forces it on you. The design of an application is often going to force you to denormalize some things for practical reasons (usually performance), but that doesn't mean you start from the denormalized structure (that would be a form of premature optimization).

As soon as you move away from this position, you then have to design different schemas for different engines.

Different in details, but usually not in terms of entity definitions and logical structure. That is, you might have different data types for certain fields, but the tables and their relationships will be more or less the same. Now, I already outlined above a case where this might not be the case, and that's the one where business rules are incorporated into the schema design, which very often necessitates the use of triggers. In that case, your design will likely be very different structurally for a database that lacks triggers as opposed to one that supports them. Of course, modern practices deprecate the whole idea of incorporating business rules at the engine level. The usual modern preference is to put those in a layer between the application and the database, so that the business rules can be altered independent of both the database structure and the application implementation. Historically this has mostly not been possible with Access, as it's designed around direct communication between Access/Jet/ACE and the database.
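The normalize-first point can be sketched concretely. Rather than starting from a wide table with repeating columns (phone1, phone2, phone3...), you begin with a narrow header table plus a child table, and only flatten later if something forces it. The table and column names here are hypothetical, chosen only to illustrate the pattern:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Start from the normalized form: a narrow header table plus a child table,
# instead of a wide table with phone1/phone2/phone3 columns.
cur.execute("CREATE TABLE insured (insured_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""
    CREATE TABLE insured_phone (
        insured_id INTEGER REFERENCES insured(insured_id),
        phone_type TEXT,   -- 'home', 'work', 'mobile', ...
        number     TEXT
    )""")

cur.execute("INSERT INTO insured VALUES (1, 'A. Policyholder')")
cur.executemany("INSERT INTO insured_phone VALUES (1, ?, ?)",
                [('home', '555-0100'), ('mobile', '555-0101')])

# A fourth or fifth number is a new row, not a schema change.
n = cur.execute(
    "SELECT COUNT(*) FROM insured_phone WHERE insured_id = 1").fetchone()[0]
print(n)  # 2
```

Denormalizing (say, caching a primary phone number on the header row for performance) is then a deliberate, later decision rather than the starting point.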
This changes in A2010 with Sharepoint integration, because Sharepoint segregates certain things into its own separate layers, so that things you'd normally put in the application or in the database end up in the layer in between.

Granted, those changes might be slight. But as soon as you design something different for the different engines, you have violated your statement that Access is not different.

This is not a statement that I made, so I really see no reason that I should have to defend it.

-- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/