#1
Splitting Large Table into Smaller Tables - How Much is Too Much?
I built a small database to track HOA information that included lot number, address info, plat numbers, maps, books, pages, dates, etc. It had about 25-30 fields to start, so it was pretty manageable. Since its initial conception it has grown 3 or 4 fold, so there are now close to 100-120 fields in this one table. I would like to break it out into smaller linked tables, but I'm not sure which fields need to be split out and which can stay.

There are 2 tables in the database. The first holds the neighborhood information as a whole, and the second includes all the homes within a neighborhood. Included in the detailed HOA table are several drop-down fields (added since the original database was created) where some data is duplicated, like builder, street name, phase number, village, city, state, etc. Out of 200 homes, these fields may overlap on 10-25% of the records, and there are probably 15 fields of this type.

There is also a section for plat and map revisions, where there may be up to 10 revisions, each with map book, page, and date. This I know needs to go into a separate table. There is also an architectural section with paint color, garage, sq footage, building material, etc. I assume this needs to be a separate table, although the overlap between records here is not as obvious. Finally, there are a lot of date fields, including closing dates to the builder and homeowner, plat dates, other approval dates, etc.

I guess I could split this into 50 linked tables, but is that efficient? It seems like if anything changes, you could forget to update a related table and mess something up. There would also be a lot of key fields to link them. My questions are: how do you determine how much to split out and how much can stay in the main table? Is there a rule of thumb or other guideline to determine what should be broken out and what should stay?

I really don't want to have to manage 50 related tables to create queries and reports, but I don't want to overload what is essentially a one-table database. Any suggestions would be greatly appreciated!
#2
Karl Burrows wrote:
> [original post quoted in full; snipped]

Basically, any field or group of fields that is repeated, like your contractor fields (name, address, phone, etc.), needs its own table. You can try the normalize wizard, but it may not do very well in this situation; if you are working on a copy, though, give it a try.

--
Joseph Meehan
Dia duit
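A minimal sketch of what Joseph describes, pulling the repeated builder/contractor fields out into their own table, using SQLite as a stand-in for JET/Access (all table and column names here are illustrative, not taken from the actual database):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Before normalizing: builder name/phone would be retyped on every home
# record. After: builders get their own table, and homes store only a key.
con.executescript("""
CREATE TABLE Builders (
    BuilderID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Phone     TEXT
);
CREATE TABLE Homes (
    HomeID    INTEGER PRIMARY KEY,
    LotNumber TEXT,
    BuilderID INTEGER REFERENCES Builders(BuilderID)
);
""")
con.execute("INSERT INTO Builders VALUES (1, 'Acme Homes', '555-0100')")
con.executemany("INSERT INTO Homes VALUES (?, ?, ?)",
                [(1, 'Lot 12', 1), (2, 'Lot 13', 1)])

# The builder's details live in exactly one row; each home picks them
# up through the join.
rows = con.execute("""
    SELECT h.LotNumber, b.Name, b.Phone
    FROM Homes AS h JOIN Builders AS b ON h.BuilderID = b.BuilderID
    ORDER BY h.HomeID
""").fetchall()
print(rows)  # -> [('Lot 12', 'Acme Homes', '555-0100'), ('Lot 13', 'Acme Homes', '555-0100')]
```

In Access itself the same split is done in table Design view (or via the wizard Joseph mentions), with the relationship drawn in the Relationships window.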
#3
You have a very good question here!
Much of what you ask is part art, and part science. The process of breaking data up into tables is what we call normalization, and this normalizing process is what makes a database good or bad. In other words, if you had a good data design to start with, you would not now be asking how to split things up! Don't take that comment as an insult or some kind of knock against you; the fact that you are now asking more questions, and asking "how" one goes about splitting things up, shows you are well along a good learning curve. If we all had perfect designs in the first place, we would not even need this newsgroup!

Having said the above, I'm going to throw in my 2 cents worth on some of your questions:

> I built a small database to track HOA information that included lot
> number, address info, plat numbers, maps, books, pages, dates etc. [...]
> I would like to break it out into smaller linked tables, but not sure
> which fields need to be split out and which can stay.

Great question. One of the clear things you are learning is that adding new fields is VERY expensive in terms of design. If you add one field to a table, then all of the forms, reports, and possibly SQL queries will have to be changed. Imagine how much work this would be in a complex application like an accounting package. With 100's of reports, one field change can cost you VERY much in terms of re-design time (that means developer time). In fact, with a bad design you will spend so much time trying to maintain the system that NO new features get added. So, the concept of data modeling means that you come up with a table design that allows the application to function and NOT have to add new fields to the design.
For example, that accounting system will likely allow job costing, and that job costing might be by labor, by weight, or by item. Later on, you might decide you need some costing done by length of material used. With a correct design model (one that is normalized), you will in fact be able to add this new type of job cost based on "length" of materials, but NOT have to change one line of code, or even change the reports in the system. (Many companies sell accounting systems with job costing, and yet no table re-design occurs each time you add a new assembly or job-cost type.) Thus, the MOST important thing you can do when designing an application is to get a good data model that is normalized.

> I guess I can split this into 50 linked tables, but is this efficient?
> It seems like if there are changes anywhere, you could forget to update
> a related table and mess something up. There would also be a lot of key
> fields to link them, etc.

Is it efficient? Actually, in many cases a normalized design (with lots of tables) is MORE efficient. For example, you might ask the computer to find all homes that were built by a certain builder. With an un-normalized design, the builder's name, address, and phone number might have been typed in over and over. When you now search for a builder, you are pulling in a LOT of extra data. What's even worse, perhaps during data entry you slightly misspelled the builder's name, and the search will miss some records.

Now take the same scenario with a normalized design. Only ONE field (a long number field) with an id that LINKS to the builder table is needed. Thus, when we search the homes table, we only have to search ONE field:

  select * from tblHomes where BuilderID = 24

(Note that you, or your users, would never have to type in the above SQL; you would build a nice interface here.)
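The one-field search described above can be sketched like this, using SQLite as a stand-in for JET/Access (the `tblHomes` columns are assumed for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblHomes (HomeID INTEGER PRIMARY KEY, "
            "LotNumber TEXT, BuilderID INTEGER)")
con.executemany("INSERT INTO tblHomes VALUES (?, ?, ?)",
                [(1, 'Lot 1', 24), (2, 'Lot 2', 24), (3, 'Lot 3', 7)])

# One indexed numeric field to search, instead of a free-typed text
# field where 'Acme Homes' vs 'Acme  Homes' would silently miss rows.
homes = con.execute(
    "SELECT HomeID, LotNumber FROM tblHomes WHERE BuilderID = ? "
    "ORDER BY HomeID", (24,)
).fetchall()
print(homes)  # -> [(1, 'Lot 1'), (2, 'Lot 2')]
```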
My point here is that each home record does NOT store the builder information, ONLY a builder id field. So, is this more efficient? Yes, sure it is, as now we don't have to search a text field for the builder name. Further, each home record does NOT store the builder information over and over, which again means we pull less data from the disk drive.

Of course, for reports and other things we DO take a performance hit, since we often have to "link" to the builder table to display the full builder name. So normalizing and having to "join" data together does take a bit more processing, but in normal use it is rather insignificant, and in many cases you save tons of processing (and time). For example, now that we have only ONE copy of the builder name in the builders table, we can change the builder name by simply going to the builders table and changing ONE record, and the 100's of homes printed on a report will reflect the change. That is the beauty of a relational database system. And yes, this joining process is VERY fast: even on a slow computer, ms-access and JET can easily join 100,000 records in WELL under one second.

So, what is the drawback of having normalized data? First, it takes more effort to design the application correctly. Further, you have to put more effort into building your forms. However, development costs are often dramatically reduced. For example, let's say we forgot to add a cell phone number field for the builder. If all of the fields were in the homes table, we would have to add a field to the homes table, find ALL occurrences of that builder, and then put in the phone number. With a normalized design, we have only ONE copy of the builder record, and it is a simple process to add that cell phone number field to its form.
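A short sketch of the "change it once" behavior described above, again using SQLite as a stand-in (names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Builders (BuilderID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Homes (HomeID INTEGER PRIMARY KEY, BuilderID INTEGER);
""")
con.execute("INSERT INTO Builders VALUES (24, 'Acme Homes')")
con.executemany("INSERT INTO Homes VALUES (?, 24)",
                [(i,) for i in range(1, 4)])

# Rename the builder in ONE row ...
con.execute("UPDATE Builders SET Name = 'Acme Custom Homes' "
            "WHERE BuilderID = 24")

# ... and every home on the report picks up the change via the join.
names = con.execute("""
    SELECT b.Name FROM Homes AS h
    JOIN Builders AS b ON h.BuilderID = b.BuilderID
    ORDER BY h.HomeID
""").fetchall()
print(names)  # -> the new name, three times, one per home
```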
So, a good design is harder work, but at the end of the day you get a design that is much more flexible and easier to change.

> My questions are how do you determine how much to split out and how much
> can stay in the main table? Is there a rule of thumb or other that is a
> good guideline to determine what should be broken out and what should stay?

Yes, there are some rules of thumb here; I point to a few articles below.

> I really don't want to have to manage 50 related tables to create queries
> and reports, but don't want to overload what is essentially a one table
> database.

Well, I don't think you will wind up with 50 tables, but hey, 15 or 20 is not that bad. And yes, you will certainly have to put more effort into building a query, but those queries will better answer your questions. For example, let's say you need a report of how many houses each builder made. With a normalized design, you would start your report with a simple list of builders and then add a 'count' to that list. With a bad design, where the builder fields are in the homes table, you have to start working on the homes table with its zillion fields. At that point, ask yourself what is easier: two nice, small, simple tables, or one monster table with a gazillion fields? Breaking things up into pieces often means you get to work with nice, SIMPLE little tables in place of monster tables with lots of fields.

Here is some reading on normalization concepts:

ACC2002: Database Normalization Basics
http://support.microsoft.com/?id=283878

Database Normalization Tips by Luke Chung
http://www.fmsinc.com/tpapers/genacc...abasenorm.html

Data Normalization Fundamentals by Luke Chung
http://www.fmsinc.com/tpapers/datanorm/index.html

324613 - Support WebCast: Database Normalization Basics
http://support.microsoft.com/?id=324613

--
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
http://www.members.shaw.ca/AlbertKallal
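The houses-per-builder report Albert describes is a simple GROUP BY over the two small tables. A sketch, with SQLite standing in for Access and all names illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Builders (BuilderID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Homes (HomeID INTEGER PRIMARY KEY, BuilderID INTEGER);
""")
con.execute("INSERT INTO Builders VALUES (1, 'Acme Homes')")
con.execute("INSERT INTO Builders VALUES (2, 'BuildCo')")
con.executemany("INSERT INTO Homes (BuilderID) VALUES (?)",
                [(1,), (1,), (1,), (2,)])

# Start from the simple Builders list and add a count, instead of
# grinding through one monster table with a gazillion fields.
report = con.execute("""
    SELECT b.Name, COUNT(h.HomeID) AS HomesBuilt
    FROM Builders AS b
    LEFT JOIN Homes AS h ON h.BuilderID = b.BuilderID
    GROUP BY b.BuilderID, b.Name
    ORDER BY b.Name
""").fetchall()
print(report)  # -> [('Acme Homes', 3), ('BuildCo', 1)]
```

In Access this is the same query the report wizard would build over the two tables, with a Totals row counting HomeID.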
#4
On Thu, 2 Jun 2005 00:48:52 -0400, "Karl Burrows"
wrote:

> [original post quoted in full; snipped]
> I really don't want to have to manage 50 related tables to create queries
> and reports, but don't want to overload what is essentially a one table
> database.

You're misunderstanding how normalization works. Forgetting to update a related table is almost never an issue, because each table stores data for a real-life entity (person, event, or thing); the information about that entity is stored in that table and ONLY in that table. If it needs updating, you update it there, and you're done!

I would see (at least) tables for Homes; Builders; a table of village/city/state; a Revisions table, as you note; tables for Colors (that's obviously a many-to-many relationship), building materials, etc.; and an EventDates table with fields HomeID, EventType, EventDate.

Normalize. You'll find that it is not a problem but a benefit.

John W. Vinson [MVP]
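John's suggested breakdown might be sketched as a schema like this (SQLite standing in for Access; all names and column choices are illustrative, including the HomeColors junction table for the many-to-many Colors relationship):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Builders   (BuilderID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Locations  (LocationID INTEGER PRIMARY KEY,
                         Village TEXT, City TEXT, State TEXT);
CREATE TABLE Homes      (HomeID INTEGER PRIMARY KEY, LotNumber TEXT,
                         BuilderID INTEGER REFERENCES Builders(BuilderID),
                         LocationID INTEGER REFERENCES Locations(LocationID));
CREATE TABLE Revisions  (RevisionID INTEGER PRIMARY KEY,
                         HomeID INTEGER REFERENCES Homes(HomeID),
                         MapBook TEXT, Page TEXT, RevDate TEXT);
CREATE TABLE Colors     (ColorID INTEGER PRIMARY KEY, Name TEXT);
-- junction table: one home can have many colors, one color many homes
CREATE TABLE HomeColors (HomeID INTEGER REFERENCES Homes(HomeID),
                         ColorID INTEGER REFERENCES Colors(ColorID),
                         PRIMARY KEY (HomeID, ColorID));
-- one row per date event (closing, plat, approval, ...) per home
CREATE TABLE EventDates (HomeID INTEGER REFERENCES Homes(HomeID),
                         EventType TEXT, EventDate TEXT);
""")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Note how the EventDates table absorbs what would otherwise be a new date column for every new kind of date, which is exactly the "no new fields" payoff Albert describes.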
#5
Excellent advice!!! This started off small and manageable, but you know once more people get involved, it grows exponentially. I have already built many queries and reports, but I don't think it will be too terrible to go back and add the new tables to the queries, since they contain just a few of the fields. I'll spend more time reviewing your recommendations and let you know how it turns out. This is excellent and well written!!!!

"Albert D. Kallal" wrote in message ...

> [Albert's reply quoted in full; snipped]
Similar Threads

Thread | Thread Starter | Forum | Replies | Last Post
Splitting Large Table into Smaller Tables - How Much is Too Much? | Karl Burrows | General Discussion | 4 | June 2nd, 2005 06:49 AM
Access combo box-show name, not ID, in table? | write on | New Users | 30 | April 30th, 2005 09:11 PM
Design help, please | SillySally | Using Forms | 27 | March 6th, 2005 04:11 AM
Manual line break spaces on TOC or Table of tables | Eric | Page Layout | 9 | October 29th, 2004 04:42 PM
Mutliple Tables lookup? | Westley | Database Design | 4 | June 15th, 2004 01:07 AM