Reduce size of data

Craig Alexander Morrison · February 8th, 2006, 02:02 PM posted to microsoft.public.access.tablesdbdesign

Worth every penny. (g)

--
Slainte

Craig Alexander Morrison
Crawbridge Data (Scotland) Limited

"Jeff Boyce" -DISCARD_HYPHEN_TO_END wrote in message
...
Tom

Have you looked into the new Express SQL Server 2005 -- download for free.

Regards

Jeff Boyce
Microsoft Office/Access MVP
Microsoft IT Academy Program Mentor

"Tom Ellison" wrote in message
...
Dear John:

MSDE has same limits, but probably better storage algorithms. I'm not
sure
there would be enough benefit to switching to MSDE. Client will not
upgrade
to full SQL Server. Expense of server OS, SQL Server, plus server
hardware
would be nearly $4K I expect. No where close to what he would spend.

I've used the same procedure in changing to a Single datatype, and the
database dropped by 80% in size. Go figure! It didn't drop even 1%
changing from precision 18 to 7. No explanation from me!

So, as you can see, I had already discussed SQL Server as an alternative.
It's one I certainly like.

My initial results are in. Indexing just one numeric column causes a 83%
increase in performance. Pretty good start. Maybe enough to finish the
project. I just hope Single datatype isn't a problem. It's not dollars
and
cents, but some kind of ratios and such. Probably this will work,
though.
I'll have to explain why 9.17 can result in records that show 9.17,
although it's stored as 9.169993. Life is a series of trade-offs, eh?

I understand about the text pointer thing. This app doesn't search for
that
text column. But surely the area where the text is stored makes the
overall
database sensitive to the length of the strings. But I guess that's the
actual length, not the maximum allocated length. So now that makes sense
also. If the text were all padded to full length with spaces, that would
be
different then.

Thanks for your participation.

Tom Ellison

"John Nurick" wrote in message
...
AFAIK Jet stores Decimals like dBase does: no nibbles, but one byte per
digit, so at least 18 bytes for a decimal of precision 18. However, if
you only need 7 digits precision you could replace your Decimals with
Longs and hard-code the position of the decimal point.

OTOH Jet does not store empty space if a Text field is not full: a
10-character string will occupy the same amount of space whether the
field size is 10 or 255. (This of course means storing a size and a
pointer for each value.)

Bottom line: you're very near the limits of what Jet's designed for,
and
some flavour of SQL client/server looks like the sensible way to go.

On Tue, 7 Feb 2006 22:32:54 -0600, "Tom Ellison"
wrote:

Allowing two digits (nibbles) per byte, a decimal of precision 18
should
require 10 bytes to store (add one nibble for a sign and divide by two,
rounding up). I chose 7, although 6 would have been sufficient for all
existing data, since I expect it's going to add one for a sign anyway,
so
6
+ 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so
there
would be no savings there. Just my theory.

--
John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.

Jerry Whittle · February 8th, 2006, 05:09 PM posted to microsoft.public.access.tablesdbdesign

Hi Tom,

About the text field. Making it's size smaller won't reduce the database
size UNLESS it also truncated data. If you put one character in a text field
set to 255, Access only stores one character. No waste. Now if you had 255
characters in that field and reduce the size to 5, the database file would
get smaller, but then you've lost 250 characters from that record.

Then why not just set all text fields to 255 characters? (1) The wizards use
the field size to attempt to size controls on forms and reports. It could get
ugly fast if your zipcode fields were set to 255! (2) Speaking of zipcode
fields, by setting the field size to 5 it would prevent someone from typing
in something grossly incorrect or prevent the Zip+4 format. Not as good as an
input mask, but it works.
--
Jerry Whittle
Light. Strong. Cheap. Pick two. Keith Bontrager - Bicycle Builder.

"Tom Ellison" wrote:

I have the assignment to imporve a database's performance.

The data is in a separate back end database that has just under 3 million
rows. It has no index. All queries are running as table scans, of course.

I want to add some indexes that should fix this nicely. But there's no
room. It's almost 2 gigabytes now.

I've found that several columns are much larger than necessary. Here are
the stats:

Datatype Length Number Change
text 50 1 15
decimal 18 27 7
decimal 18 2
date/time 1
integer 2

The above Number column is the number of such columns in the table. I am
expecting to impact the size of the table by changing a 50 character text
column to 15 characters, and by reducing the precision of the 29 decimal
columns to 7.

I could not do this by just changing the datatypes on a full table.
Something about not enough memory. (I like Jet less and less. I would
never have this kind of mess with MSDE!)

So, I made a copy of the table, cleared out all the records, compacted this,
and made the changes to the 27 decimal columns. The table was just over
100K bytes in size. I then linked to the database containing the copy of
the data and created an append query to put the data in.

When this finished, I had saved less than 100K bytes, a fraction of 1% of
the data size. Does not the size of a Decimal datatype depend on the
precision? If not, then why is there a precision. If you're going to store
10 bytes all the time, why not allow them all to be used to hold data?
Allowing two digits (nibbles) per byte, a decimal of precision 18 should
require 10 bytes to store (add one nibble for a sign and divide by two,
rounding up). I chose 7, although 6 would have been sufficient for all
existing data, since I expect it's going to add one for a sign anyway, so 6
+ 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there
would be no savings there. Just my theory.

Anyway, after completing the append, the table had reduced in size by less
than 100K bytes (out of just under 2 GB). By my calculation, I was
expecting a reduction of just over 50%. Does Jet not store the decimal
datatype in varying sized pieces, according to the precision? Is my formula
of (Precision + 1) / 2 not what you would expect?

I then repeated all the steps, reducing the text column from 50 to 15. This
I expected would save 105MB (35 bytes X 3 million). Bottom line: the file
is now larger than ever!

Does anyone know how a database can be reduced in size to accomplish this?

Tom Ellison

mnature · February 8th, 2006, 06:09 PM posted to microsoft.public.access.tablesdbdesign

I hate to ask this, but I didn't see it on a quick scan through the thread:
Have you done a Compact and Repair on your database, after making changes?

"Tom Ellison" wrote:

I have the assignment to imporve a database's performance.

The data is in a separate back end database that has just under 3 million
rows. It has no index. All queries are running as table scans, of course.

I want to add some indexes that should fix this nicely. But there's no
room. It's almost 2 gigabytes now.

I've found that several columns are much larger than necessary. Here are
the stats:

Datatype Length Number Change
text 50 1 15
decimal 18 27 7
decimal 18 2
date/time 1
integer 2

The above Number column is the number of such columns in the table. I am
expecting to impact the size of the table by changing a 50 character text
column to 15 characters, and by reducing the precision of the 29 decimal
columns to 7.

I could not do this by just changing the datatypes on a full table.
Something about not enough memory. (I like Jet less and less. I would
never have this kind of mess with MSDE!)

So, I made a copy of the table, cleared out all the records, compacted this,
and made the changes to the 27 decimal columns. The table was just over
100K bytes in size. I then linked to the database containing the copy of
the data and created an append query to put the data in.

When this finished, I had saved less than 100K bytes, a fraction of 1% of
the data size. Does not the size of a Decimal datatype depend on the
precision? If not, then why is there a precision. If you're going to store
10 bytes all the time, why not allow them all to be used to hold data?
Allowing two digits (nibbles) per byte, a decimal of precision 18 should
require 10 bytes to store (add one nibble for a sign and divide by two,
rounding up). I chose 7, although 6 would have been sufficient for all
existing data, since I expect it's going to add one for a sign anyway, so 6
+ 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there
would be no savings there. Just my theory.

Anyway, after completing the append, the table had reduced in size by less
than 100K bytes (out of just under 2 GB). By my calculation, I was
expecting a reduction of just over 50%. Does Jet not store the decimal
datatype in varying sized pieces, according to the precision? Is my formula
of (Precision + 1) / 2 not what you would expect?

I then repeated all the steps, reducing the text column from 50 to 15. This
I expected would save 105MB (35 bytes X 3 million). Bottom line: the file
is now larger than ever!

Does anyone know how a database can be reduced in size to accomplish this?

Tom Ellison

Tom Ellison · February 8th, 2006, 08:37 PM posted to microsoft.public.access.tablesdbdesign

Dear Jeff:

Unless they have changed their minds, this has a 1 GB database size limit,
half that of MSDE. Not helpful for this application. I'll be getting into
this some time soon, for sure! But not with respect to the current project.

Thanks, Jeff!

Tom Ellison

"Jeff Boyce" -DISCARD_HYPHEN_TO_END wrote in message
...
Tom

Have you looked into the new Express SQL Server 2005 -- download for free.

Regards

Jeff Boyce
Microsoft Office/Access MVP
Microsoft IT Academy Program Mentor

"Tom Ellison" wrote in message
...
Dear John:

MSDE has same limits, but probably better storage algorithms. I'm not
sure
there would be enough benefit to switching to MSDE. Client will not
upgrade
to full SQL Server. Expense of server OS, SQL Server, plus server
hardware
would be nearly $4K I expect. No where close to what he would spend.

I've used the same procedure in changing to a Single datatype, and the
database dropped by 80% in size. Go figure! It didn't drop even 1%
changing from precision 18 to 7. No explanation from me!

So, as you can see, I had already discussed SQL Server as an alternative.
It's one I certainly like.

My initial results are in. Indexing just one numeric column causes a 83%
increase in performance. Pretty good start. Maybe enough to finish the
project. I just hope Single datatype isn't a problem. It's not dollars
and
cents, but some kind of ratios and such. Probably this will work,
though.
I'll have to explain why 9.17 can result in records that show 9.17,
although it's stored as 9.169993. Life is a series of trade-offs, eh?

I understand about the text pointer thing. This app doesn't search for
that
text column. But surely the area where the text is stored makes the
overall
database sensitive to the length of the strings. But I guess that's the
actual length, not the maximum allocated length. So now that makes sense
also. If the text were all padded to full length with spaces, that would
be
different then.

Thanks for your participation.

Tom Ellison

"John Nurick" wrote in message
...
AFAIK Jet stores Decimals like dBase does: no nibbles, but one byte per
digit, so at least 18 bytes for a decimal of precision 18. However, if
you only need 7 digits precision you could replace your Decimals with
Longs and hard-code the position of the decimal point.

OTOH Jet does not store empty space if a Text field is not full: a
10-character string will occupy the same amount of space whether the
field size is 10 or 255. (This of course means storing a size and a
pointer for each value.)

Bottom line: you're very near the limits of what Jet's designed for,
and
some flavour of SQL client/server looks like the sensible way to go.

On Tue, 7 Feb 2006 22:32:54 -0600, "Tom Ellison"
wrote:

Allowing two digits (nibbles) per byte, a decimal of precision 18
should
require 10 bytes to store (add one nibble for a sign and divide by two,
rounding up). I chose 7, although 6 would have been sufficient for all
existing data, since I expect it's going to add one for a sign anyway,
so
6
+ 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so
there
would be no savings there. Just my theory.

--
John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.

Tom Ellison · February 8th, 2006, 08:40 PM posted to microsoft.public.access.tablesdbdesign

Dear MNatu

I mentioned in the thread that I, "compacted this". By that I did mean
compact and repair. That's necessary to get the file size down, as you
said. Then it goes back up when appending.

So, yes, I did this.

Thanks for your contribution!

Tom Ellison

"mnature" wrote in message
...
I hate to ask this, but I didn't see it on a quick scan through the thread:
Have you done a Compact and Repair on your database, after making changes?

"Tom Ellison" wrote:

I have the assignment to imporve a database's performance.

The data is in a separate back end database that has just under 3 million
rows. It has no index. All queries are running as table scans, of
course.

I want to add some indexes that should fix this nicely. But there's no
room. It's almost 2 gigabytes now.

I've found that several columns are much larger than necessary. Here are
the stats:

Datatype Length Number Change
text 50 1 15
decimal 18 27 7
decimal 18 2
date/time 1
integer 2

The above Number column is the number of such columns in the table. I am
expecting to impact the size of the table by changing a 50 character text
column to 15 characters, and by reducing the precision of the 29 decimal
columns to 7.

I could not do this by just changing the datatypes on a full table.
Something about not enough memory. (I like Jet less and less. I would
never have this kind of mess with MSDE!)

So, I made a copy of the table, cleared out all the records, compacted
this,
and made the changes to the 27 decimal columns. The table was just over
100K bytes in size. I then linked to the database containing the copy of
the data and created an append query to put the data in.

When this finished, I had saved less than 100K bytes, a fraction of 1% of
the data size. Does not the size of a Decimal datatype depend on the
precision? If not, then why is there a precision. If you're going to
store
10 bytes all the time, why not allow them all to be used to hold data?
Allowing two digits (nibbles) per byte, a decimal of precision 18 should
require 10 bytes to store (add one nibble for a sign and divide by two,
rounding up). I chose 7, although 6 would have been sufficient for all
existing data, since I expect it's going to add one for a sign anyway, so
6
+ 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there
would be no savings there. Just my theory.

Anyway, after completing the append, the table had reduced in size by
less
than 100K bytes (out of just under 2 GB). By my calculation, I was
expecting a reduction of just over 50%. Does Jet not store the decimal
datatype in varying sized pieces, according to the precision? Is my
formula
of (Precision + 1) / 2 not what you would expect?

I then repeated all the steps, reducing the text column from 50 to 15.
This
I expected would save 105MB (35 bytes X 3 million). Bottom line: the
file
is now larger than ever!

Does anyone know how a database can be reduced in size to accomplish
this?

Tom Ellison

John Nurick · February 8th, 2006, 10:30 PM posted to microsoft.public.access.tablesdbdesign

According to http://msdn.microsoft.com/vstudio/express/sql/powerful/
it's 4 GB. Or is there a gotcha I don't know about?

On Wed, 8 Feb 2006 14:37:48 -0600, "Tom Ellison"
wrote:

Dear Jeff:

Unless they have changed their minds, this has a 1 GB database size limit,
half that of MSDE. Not helpful for this application. I'll be getting into
this some time soon, for sure! But not with respect to the current project.

Thanks, Jeff!

Tom Ellison

"Jeff Boyce" -DISCARD_HYPHEN_TO_END wrote in message
...
Tom

Have you looked into the new Express SQL Server 2005 -- download for free.

Regards

Jeff Boyce
Microsoft Office/Access MVP
Microsoft IT Academy Program Ment

Tom Ellison · February 8th, 2006, 11:23 PM posted to microsoft.public.access.tablesdbdesign

Back at the Chicago regional summit in January 2004 (I think that's the one)
this was announced. I objected, as this would break my existing
installations. Hearing that it's now 4 GB is glorious news. I wish I'd
known that a week ago. I decided against recommending MSDE because it has
the 2 GB limit. A 4 GB limit on Express would be about perfect for them!
Well, I'll have to correct myself. First, research, though. It must run on
non-server platform, and I'll have to check other details. Ignorance is
such a downer. You have made my day!

Tom Ellison

"John Nurick" wrote in message
...
According to http://msdn.microsoft.com/vstudio/express/sql/powerful/
it's 4 GB. Or is there a gotcha I don't know about?

On Wed, 8 Feb 2006 14:37:48 -0600, "Tom Ellison"
wrote:

Dear Jeff:

Unless they have changed their minds, this has a 1 GB database size limit,
half that of MSDE. Not helpful for this application. I'll be getting
into
this some time soon, for sure! But not with respect to the current
project.

Thanks, Jeff!

Tom Ellison

"Jeff Boyce" -DISCARD_HYPHEN_TO_END wrote in message
...
Tom

Have you looked into the new Express SQL Server 2005 -- download for
free.

Regards

Jeff Boyce
Microsoft Office/Access MVP
Microsoft IT Academy Program Ment

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Query does not order the same data in the same way	MilkoviÄ‡ Aleksandar	Running & Setting Up Queries	10	December 15th, 2005 12:01 PM
Microsoft Office should use an advance data crosslinking model	m_jurrens	General Discussions	0	December 15th, 2005 07:33 AM
How to Reduce Spreadsheet Size and Speed	ExcelMonkey	Worksheet Functions	4	August 9th, 2005 06:38 PM
multiple docs, one data source	kp	Mailmerge	12	January 31st, 2005 04:41 PM
VBA Code problem error 9	Speedy	General Discussion	19	October 15th, 2004 09:05 PM