#1
Splitting Large Table into Smaller Tables - How Much is Too Much?
I built a small database to track HOA information that included lot number, address info, plat numbers, maps, books, pages, dates, etc. It had about 25-30 fields to start, so it was pretty manageable. Since its initial conception it has grown 3 or 4 fold, so there are now close to 100-120 fields in this one table. I would like to break it out into smaller linked tables, but I'm not sure which fields need to be split out and which can stay.

There are 2 tables in the database. The first holds the neighborhood information as a whole, and the second includes all the homes within a neighborhood. Included in the detailed HOA table are several drop-down fields (added since the original database was created) where some data is duplicated, like builder, street name, phase number, village, city, state, etc. Out of 200 homes, these fields may overlap on 10-25% of the records, and there are probably 15 fields of this type.

There is also a section for plat and map revisions, where there may be up to 10 revisions, each with map book, page, and date. This I know needs to go into a separate table. There is also an architectural section with paint color, garage, sq footage, building material, etc. I assume this needs to be a separate table, although the overlap between records here is not as obvious. Finally, there are a lot of date fields, including closing dates to the builder and homeowner, plat dates, other approval dates, etc.

I guess I could split this into 50 linked tables, but is that efficient? It seems like if anything changes, you could forget to update a related table and mess something up. There would also be a lot of key fields to link them. My questions are: how do you determine how much to split out and how much can stay in the main table? Is there a rule of thumb or other guideline to determine what should be broken out and what should stay?

I really don't want to have to manage 50 related tables to create queries and reports, but I don't want to overload what is essentially a one-table database. Any suggestions would be greatly appreciated!
#2
Karl Burrows wrote:
> [original post quoted in full; snipped]

Basically, any field or group of fields that is repeated, like your contractor fields (name, address, phone, etc.), needs its own table. You can try the normalize wizard, but it may not do very well in this situation; if you are working on a copy, though, give it a try.

--
Joseph Meehan
Dia duit
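A minimal sketch of what Joseph describes, pulling the repeated builder/contractor fields out into their own table, using SQLite as a stand-in for JET/Access (all table and column names here are illustrative, not taken from the actual database):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Before normalizing: builder name/phone would be retyped on every home
# record. After: builders get their own table, and homes store only a key.
con.executescript("""
CREATE TABLE Builders (
    BuilderID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Phone     TEXT
);
CREATE TABLE Homes (
    HomeID    INTEGER PRIMARY KEY,
    LotNumber TEXT,
    BuilderID INTEGER REFERENCES Builders(BuilderID)
);
""")
con.execute("INSERT INTO Builders VALUES (1, 'Acme Homes', '555-0100')")
con.executemany("INSERT INTO Homes VALUES (?, ?, ?)",
                [(1, 'Lot 12', 1), (2, 'Lot 13', 1)])

# The builder's details live in exactly one row; each home picks them
# up through the join.
rows = con.execute("""
    SELECT h.LotNumber, b.Name, b.Phone
    FROM Homes AS h JOIN Builders AS b ON h.BuilderID = b.BuilderID
    ORDER BY h.HomeID
""").fetchall()
print(rows)  # -> [('Lot 12', 'Acme Homes', '555-0100'), ('Lot 13', 'Acme Homes', '555-0100')]
```

In Access itself the same split is done in table Design view (or via the wizard Joseph mentions), with the relationship drawn in the Relationships window.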
#3
You have a very good question here!
Much of what you ask is part art, and part science. The process of breaking data up into tables is what we call normalization, and this normalizing process is what makes a database good or bad. In other words, if you had a good data design to start with, you would not now be asking how to split things up! Don't take that comment as an insult or some kind of knock against you; the fact that you are now asking more questions, and asking "how" one goes about splitting things up, shows you are well along a good learning curve. If we all had perfect designs in the first place, we would not even need this newsgroup!

Having said the above, I'm going to throw in my 2 cents worth on some of your questions:

> I built a small database to track HOA information that included lot
> number, address info, plat numbers, maps, books, pages, dates etc. [...]
> I would like to break it out into smaller linked tables, but not sure
> which fields need to be split out and which can stay.

Great question. One of the clear things you are learning is that adding new fields is VERY expensive in terms of design. If you add one field to a table, then all of the forms, reports, and possibly SQL queries will have to be changed. Imagine how much work this would be in a complex application like an accounting package. With 100's of reports, one field change can cost you VERY much in terms of re-design time (that means developer time). In fact, with a bad design you will spend so much time trying to maintain the system that NO new features get added. So, the concept of data modeling means that you come up with a table design that allows the application to function and NOT have to add new fields to the design.
For example, that accounting system will likely allow job costing, and that job costing might be by labor, by weight, or by item. Later on, you might decide you need some costing done by length of material used. With a correct design model (one that is normalized), you will in fact be able to add this new type of job cost based on "length" of materials, but NOT have to change one line of code, or even change the reports in the system. (Many companies sell accounting systems with job costing, and yet no table re-design occurs each time you add a new assembly or job-cost type.) Thus, the MOST important thing you can do when designing an application is to get a good data model that is normalized.

> I guess I can split this into 50 linked tables, but is this efficient?
> It seems like if there are changes anywhere, you could forget to update
> a related table and mess something up. There would also be a lot of key
> fields to link them, etc.

Is it efficient? Actually, in many cases a normalized design (with lots of tables) is MORE efficient. For example, you might ask the computer to find all homes that were built by a certain builder. With an un-normalized design, the builder's name, address, and phone number might have been typed in over and over. When you now search for a builder, you are pulling in a LOT of extra data. What's even worse, perhaps during data entry you slightly misspelled the builder's name, and the search will miss some records.

Now take the same scenario with a normalized design. Only ONE field (a long number field) with an id that LINKS to the builder table is needed. Thus, when we search the homes table, we only have to search ONE field:

  select * from tblHomes where BuilderID = 24

(Note that you, or your users, would never have to type in the above SQL; you would build a nice interface here.)
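The one-field search described above can be sketched like this, using SQLite as a stand-in for JET/Access (the `tblHomes` columns are assumed for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblHomes (HomeID INTEGER PRIMARY KEY, "
            "LotNumber TEXT, BuilderID INTEGER)")
con.executemany("INSERT INTO tblHomes VALUES (?, ?, ?)",
                [(1, 'Lot 1', 24), (2, 'Lot 2', 24), (3, 'Lot 3', 7)])

# One indexed numeric field to search, instead of a free-typed text
# field where 'Acme Homes' vs 'Acme  Homes' would silently miss rows.
homes = con.execute(
    "SELECT HomeID, LotNumber FROM tblHomes WHERE BuilderID = ? "
    "ORDER BY HomeID", (24,)
).fetchall()
print(homes)  # -> [(1, 'Lot 1'), (2, 'Lot 2')]
```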
My point here is that each home record does NOT store the builder information, ONLY a builder id field. So, is this more efficient? Yes, sure it is, as now we don't have to search a text field for the builder name. Further, each home record does NOT store the builder information over and over, which again means we pull less data from the disk drive.

Of course, for reports and other things we DO take a performance hit, since we often have to "link" to the builder table to display the full builder name. So normalizing and having to "join" data together does take a bit more processing, but in normal use it is rather insignificant, and in many cases you save tons of processing (and time). For example, now that we have only ONE copy of the builder name in the builders table, we can change the builder name by simply going to the builders table and changing ONE record, and the 100's of homes printed on a report will reflect the change. That is the beauty of a relational database system. And yes, this joining process is VERY fast: even on a slow computer, ms-access and JET can easily join 100,000 records in WELL under one second.

So, what is the drawback of having normalized data? First, it takes more effort to design the application correctly. Further, you have to put more effort into building your forms. However, development costs are often dramatically reduced. For example, let's say we forgot to add a cell phone number field for the builder. If all of the fields were in the homes table, we would have to add a field to the homes table, find ALL occurrences of that builder, and then put in the phone number. With a normalized design, we have only ONE copy of the builder record, and it is a simple process to add that cell phone number field to its form.
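A short sketch of the "change it once" behavior described above, again using SQLite as a stand-in (names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Builders (BuilderID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Homes (HomeID INTEGER PRIMARY KEY, BuilderID INTEGER);
""")
con.execute("INSERT INTO Builders VALUES (24, 'Acme Homes')")
con.executemany("INSERT INTO Homes VALUES (?, 24)",
                [(i,) for i in range(1, 4)])

# Rename the builder in ONE row ...
con.execute("UPDATE Builders SET Name = 'Acme Custom Homes' "
            "WHERE BuilderID = 24")

# ... and every home on the report picks up the change via the join.
names = con.execute("""
    SELECT b.Name FROM Homes AS h
    JOIN Builders AS b ON h.BuilderID = b.BuilderID
    ORDER BY h.HomeID
""").fetchall()
print(names)  # -> the new name, three times, one per home
```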
So, a good design is harder work, but at the end of the day you get a design that is much more flexible and easier to change.

> My questions are how do you determine how much to split out and how much
> can stay in the main table? Is there a rule of thumb or other that is a
> good guideline to determine what should be broken out and what should stay?

Yes, there are some rules of thumb here; I point to a few articles below.

> I really don't want to have to manage 50 related tables to create queries
> and reports, but don't want to overload what is essentially a one table
> database.

Well, I don't think you will wind up with 50 tables, but hey, 15 or 20 is not that bad. And yes, you will certainly have to put more effort into building a query, but those queries will better answer your questions. For example, let's say you need a report of how many houses each builder made. With a normalized design, you would start your report with a simple list of builders and then add a 'count' to that list. With a bad design, where the builder fields are in the homes table, you have to start working on the homes table with its zillion fields. At that point, ask yourself what is easier: two nice, small, simple tables, or one monster table with a gazillion fields? Breaking things up into pieces often means you get to work with nice, SIMPLE little tables in place of monster tables with lots of fields.

Here is some reading on normalization concepts:

ACC2002: Database Normalization Basics
http://support.microsoft.com/?id=283878

Database Normalization Tips by Luke Chung
http://www.fmsinc.com/tpapers/genacc...abasenorm.html

Data Normalization Fundamentals by Luke Chung
http://www.fmsinc.com/tpapers/datanorm/index.html

324613 - Support WebCast: Database Normalization Basics
http://support.microsoft.com/?id=324613

--
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
http://www.members.shaw.ca/AlbertKallal
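The houses-per-builder report Albert describes is a simple GROUP BY over the two small tables. A sketch, with SQLite standing in for Access and all names illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Builders (BuilderID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Homes (HomeID INTEGER PRIMARY KEY, BuilderID INTEGER);
""")
con.execute("INSERT INTO Builders VALUES (1, 'Acme Homes')")
con.execute("INSERT INTO Builders VALUES (2, 'BuildCo')")
con.executemany("INSERT INTO Homes (BuilderID) VALUES (?)",
                [(1,), (1,), (1,), (2,)])

# Start from the simple Builders list and add a count, instead of
# grinding through one monster table with a gazillion fields.
report = con.execute("""
    SELECT b.Name, COUNT(h.HomeID) AS HomesBuilt
    FROM Builders AS b
    LEFT JOIN Homes AS h ON h.BuilderID = b.BuilderID
    GROUP BY b.BuilderID, b.Name
    ORDER BY b.Name
""").fetchall()
print(report)  # -> [('Acme Homes', 3), ('BuildCo', 1)]
```

In Access this is the same query the report wizard would build over the two tables, with a Totals row counting HomeID.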
#4
On Thu, 2 Jun 2005 00:48:52 -0400, "Karl Burrows"
wrote:

> [original post quoted in full; snipped]
> I really don't want to have to manage 50 related tables to create queries
> and reports, but don't want to overload what is essentially a one table
> database.

You're misunderstanding how normalization works. Forgetting to update a related table is almost never an issue, because each table stores data for a real-life entity (person, event, or thing); the information about that entity is stored in that table and ONLY in that table. If it needs updating, you update it there, and you're done!

I would see (at least) tables for Homes; Builders; a table of village/city/state; a Revisions table, as you note; tables for Colors (that's obviously a many-to-many relationship), building materials, etc.; and an EventDates table with fields HomeID, EventType, EventDate.

Normalize. You'll find that it is not a problem but a benefit.

John W. Vinson [MVP]
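John's suggested breakdown might be sketched as a schema like this (SQLite standing in for Access; all names and column choices are illustrative, including the HomeColors junction table for the many-to-many Colors relationship):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Builders   (BuilderID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Locations  (LocationID INTEGER PRIMARY KEY,
                         Village TEXT, City TEXT, State TEXT);
CREATE TABLE Homes      (HomeID INTEGER PRIMARY KEY, LotNumber TEXT,
                         BuilderID INTEGER REFERENCES Builders(BuilderID),
                         LocationID INTEGER REFERENCES Locations(LocationID));
CREATE TABLE Revisions  (RevisionID INTEGER PRIMARY KEY,
                         HomeID INTEGER REFERENCES Homes(HomeID),
                         MapBook TEXT, Page TEXT, RevDate TEXT);
CREATE TABLE Colors     (ColorID INTEGER PRIMARY KEY, Name TEXT);
-- junction table: one home can have many colors, one color many homes
CREATE TABLE HomeColors (HomeID INTEGER REFERENCES Homes(HomeID),
                         ColorID INTEGER REFERENCES Colors(ColorID),
                         PRIMARY KEY (HomeID, ColorID));
-- one row per date event (closing, plat, approval, ...) per home
CREATE TABLE EventDates (HomeID INTEGER REFERENCES Homes(HomeID),
                         EventType TEXT, EventDate TEXT);
""")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Note how the EventDates table absorbs what would otherwise be a new date column for every new kind of date, which is exactly the "no new fields" payoff Albert describes.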
#5
Excellent advice!!! This started off small and manageable, but you know once more people get involved, it grows exponentially. I have already built many queries and reports, but I don't think it will be too terrible to go back and add the new tables to the queries, since they contain just a few of the fields. I'll spend more time reviewing your recommendations and let you know how it turns out. This is excellent and well written!!!!

"Albert D. Kallal" wrote in message ...

> [Albert's reply quoted in full; snipped]
Similar Threads

Thread | Thread Starter | Forum | Replies | Last Post
Splitting Large Table into Smaller Tables - How Much is Too Much? | Karl Burrows | General Discussion | 4 | June 2nd, 2005 06:49 AM
Access combo box-show name, not ID, in table? | write on | New Users | 30 | April 30th, 2005 09:11 PM
Design help, please | SillySally | Using Forms | 27 | March 6th, 2005 04:11 AM
Manual line break spaces on TOC or Table of tables | Eric | Page Layout | 9 | October 29th, 2004 04:42 PM
Mutliple Tables lookup? | Westley | Database Design | 4 | June 15th, 2004 01:07 AM