If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
Reduce size of data
I have the assignment to imporve a database's performance.
The data is in a separate back end database that has just under 3 million rows. It has no index. All queries are running as table scans, of course. I want to add some indexes that should fix this nicely. But there's no room. It's almost 2 gigabytes now. I've found that several columns are much larger than necessary. Here are the stats: Datatype Length Number Change text 50 1 15 decimal 18 27 7 decimal 18 2 date/time 1 integer 2 The above Number column is the number of such columns in the table. I am expecting to impact the size of the table by changing a 50 character text column to 15 characters, and by reducing the precision of the 29 decimal columns to 7. I could not do this by just changing the datatypes on a full table. Something about not enough memory. (I like Jet less and less. I would never have this kind of mess with MSDE!) So, I made a copy of the table, cleared out all the records, compacted this, and made the changes to the 27 decimal columns. The table was just over 100K bytes in size. I then linked to the database containing the copy of the data and created an append query to put the data in. When this finished, I had saved less than 100K bytes, a fraction of 1% of the data size. Does not the size of a Decimal datatype depend on the precision? If not, then why is there a precision. If you're going to store 10 bytes all the time, why not allow them all to be used to hold data? Allowing two digits (nibbles) per byte, a decimal of precision 18 should require 10 bytes to store (add one nibble for a sign and divide by two, rounding up). I chose 7, although 6 would have been sufficient for all existing data, since I expect it's going to add one for a sign anyway, so 6 + 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there would be no savings there. Just my theory. Anyway, after completing the append, the table had reduced in size by less than 100K bytes (out of just under 2 GB). By my calculation, I was expecting a reduction of just over 50%. Does Jet not store the decimal datatype in varying sized pieces, according to the precision? Is my formula of (Precision + 1) / 2 not what you would expect? I then repeated all the steps, reducing the text column from 50 to 15. This I expected would save 105MB (35 bytes X 3 million). Bottom line: the file is now larger than ever! Does anyone know how a database can be reduced in size to accomplish this? Tom Ellison |
#2
|
|||
|
|||
Reduce size of data
Tried another approach. Created a new database and typed in the table's
entire definition, but with the reduced sizes. Still not smaller. I'm expecting it to be half the size, but almost no change. Not even 1%. Tom Ellison "Tom Ellison" wrote in message ... I have the assignment to imporve a database's performance. The data is in a separate back end database that has just under 3 million rows. It has no index. All queries are running as table scans, of course. I want to add some indexes that should fix this nicely. But there's no room. It's almost 2 gigabytes now. I've found that several columns are much larger than necessary. Here are the stats: Datatype Length Number Change text 50 1 15 decimal 18 27 7 decimal 18 2 date/time 1 integer 2 The above Number column is the number of such columns in the table. I am expecting to impact the size of the table by changing a 50 character text column to 15 characters, and by reducing the precision of the 29 decimal columns to 7. I could not do this by just changing the datatypes on a full table. Something about not enough memory. (I like Jet less and less. I would never have this kind of mess with MSDE!) So, I made a copy of the table, cleared out all the records, compacted this, and made the changes to the 27 decimal columns. The table was just over 100K bytes in size. I then linked to the database containing the copy of the data and created an append query to put the data in. When this finished, I had saved less than 100K bytes, a fraction of 1% of the data size. Does not the size of a Decimal datatype depend on the precision? If not, then why is there a precision. If you're going to store 10 bytes all the time, why not allow them all to be used to hold data? Allowing two digits (nibbles) per byte, a decimal of precision 18 should require 10 bytes to store (add one nibble for a sign and divide by two, rounding up). I chose 7, although 6 would have been sufficient for all existing data, since I expect it's going to add one for a sign anyway, so 6 + 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there would be no savings there. Just my theory. Anyway, after completing the append, the table had reduced in size by less than 100K bytes (out of just under 2 GB). By my calculation, I was expecting a reduction of just over 50%. Does Jet not store the decimal datatype in varying sized pieces, according to the precision? Is my formula of (Precision + 1) / 2 not what you would expect? I then repeated all the steps, reducing the text column from 50 to 15. This I expected would save 105MB (35 bytes X 3 million). Bottom line: the file is now larger than ever! Does anyone know how a database can be reduced in size to accomplish this? Tom Ellison |
#3
|
|||
|
|||
Reduce size of data
I changed them to "single" and it cut the file size by 75%. Go figure!
Tom Ellison "Tom Ellison" wrote in message ... I have the assignment to imporve a database's performance. The data is in a separate back end database that has just under 3 million rows. It has no index. All queries are running as table scans, of course. I want to add some indexes that should fix this nicely. But there's no room. It's almost 2 gigabytes now. I've found that several columns are much larger than necessary. Here are the stats: Datatype Length Number Change text 50 1 15 decimal 18 27 7 decimal 18 2 date/time 1 integer 2 The above Number column is the number of such columns in the table. I am expecting to impact the size of the table by changing a 50 character text column to 15 characters, and by reducing the precision of the 29 decimal columns to 7. I could not do this by just changing the datatypes on a full table. Something about not enough memory. (I like Jet less and less. I would never have this kind of mess with MSDE!) So, I made a copy of the table, cleared out all the records, compacted this, and made the changes to the 27 decimal columns. The table was just over 100K bytes in size. I then linked to the database containing the copy of the data and created an append query to put the data in. When this finished, I had saved less than 100K bytes, a fraction of 1% of the data size. Does not the size of a Decimal datatype depend on the precision? If not, then why is there a precision. If you're going to store 10 bytes all the time, why not allow them all to be used to hold data? Allowing two digits (nibbles) per byte, a decimal of precision 18 should require 10 bytes to store (add one nibble for a sign and divide by two, rounding up). I chose 7, although 6 would have been sufficient for all existing data, since I expect it's going to add one for a sign anyway, so 6 + 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there would be no savings there. Just my theory. Anyway, after completing the append, the table had reduced in size by less than 100K bytes (out of just under 2 GB). By my calculation, I was expecting a reduction of just over 50%. Does Jet not store the decimal datatype in varying sized pieces, according to the precision? Is my formula of (Precision + 1) / 2 not what you would expect? I then repeated all the steps, reducing the text column from 50 to 15. This I expected would save 105MB (35 bytes X 3 million). Bottom line: the file is now larger than ever! Does anyone know how a database can be reduced in size to accomplish this? Tom Ellison |
#4
|
|||
|
|||
Reduce size of data
AFAIK Jet stores Decimals like dBase does: no nibbles, but one byte per
digit, so at least 18 bytes for a decimal of precision 18. However, if you only need 7 digits precision you could replace your Decimals with Longs and hard-code the position of the decimal point. OTOH Jet does not store empty space if a Text field is not full: a 10-character string will occupy the same amount of space whether the field size is 10 or 255. (This of course means storing a size and a pointer for each value.) Bottom line: you're very near the limits of what Jet's designed for, and some flavour of SQL client/server looks like the sensible way to go. On Tue, 7 Feb 2006 22:32:54 -0600, "Tom Ellison" wrote: Allowing two digits (nibbles) per byte, a decimal of precision 18 should require 10 bytes to store (add one nibble for a sign and divide by two, rounding up). I chose 7, although 6 would have been sufficient for all existing data, since I expect it's going to add one for a sign anyway, so 6 + 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there would be no savings there. Just my theory. -- John Nurick [Microsoft Access MVP] Please respond in the newgroup and not by email. |
#5
|
|||
|
|||
Reduce size of data
Another good reason to never use the Decimal data type in JET.
-- Allen Browne - Microsoft MVP. Perth, Western Australia. Tips for Access users - http://allenbrowne.com/tips.html Reply to group, rather than allenbrowne at mvps dot org. "Tom Ellison" wrote in message ... I changed them to "single" and it cut the file size by 75%. Go figure! Tom Ellison |
#6
|
|||
|
|||
Reduce size of data
Dear John:
MSDE has same limits, but probably better storage algorithms. I'm not sure there would be enough benefit to switching to MSDE. Client will not upgrade to full SQL Server. Expense of server OS, SQL Server, plus server hardware would be nearly $4K I expect. No where close to what he would spend. I've used the same procedure in changing to a Single datatype, and the database dropped by 80% in size. Go figure! It didn't drop even 1% changing from precision 18 to 7. No explanation from me! So, as you can see, I had already discussed SQL Server as an alternative. It's one I certainly like. My initial results are in. Indexing just one numeric column causes a 83% increase in performance. Pretty good start. Maybe enough to finish the project. I just hope Single datatype isn't a problem. It's not dollars and cents, but some kind of ratios and such. Probably this will work, though. I'll have to explain why 9.17 can result in records that show 9.17, although it's stored as 9.169993. Life is a series of trade-offs, eh? I understand about the text pointer thing. This app doesn't search for that text column. But surely the area where the text is stored makes the overall database sensitive to the length of the strings. But I guess that's the actual length, not the maximum allocated length. So now that makes sense also. If the text were all padded to full length with spaces, that would be different then. Thanks for your participation. Tom Ellison "John Nurick" wrote in message ... AFAIK Jet stores Decimals like dBase does: no nibbles, but one byte per digit, so at least 18 bytes for a decimal of precision 18. However, if you only need 7 digits precision you could replace your Decimals with Longs and hard-code the position of the decimal point. OTOH Jet does not store empty space if a Text field is not full: a 10-character string will occupy the same amount of space whether the field size is 10 or 255. (This of course means storing a size and a pointer for each value.) Bottom line: you're very near the limits of what Jet's designed for, and some flavour of SQL client/server looks like the sensible way to go. On Tue, 7 Feb 2006 22:32:54 -0600, "Tom Ellison" wrote: Allowing two digits (nibbles) per byte, a decimal of precision 18 should require 10 bytes to store (add one nibble for a sign and divide by two, rounding up). I chose 7, although 6 would have been sufficient for all existing data, since I expect it's going to add one for a sign anyway, so 6 + 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there would be no savings there. Just my theory. -- John Nurick [Microsoft Access MVP] Please respond in the newgroup and not by email. |
#7
|
|||
|
|||
Reduce size of data
Dear Allen:
I had never run into it before. It's "new" isn't it. I mean, I knew it was there since the 2000 version, but I've never had to work with it. Was it in '97? That's the second version of Access I learned, and the first I learned thoroughly (oh for the good ole days). The equivalent in MSDE is very useful. I like it. Data size increases one byte at a time, and it even tells you the size of storage in bytes when you're working on it. That's where I got my idea of how I thought this should work. Apparently, it doesn't work that way at all. Thanks again, Allen. I'm absorbing as fast as I can tonight. Tom Ellison "Allen Browne" wrote in message ... Another good reason to never use the Decimal data type in JET. -- Allen Browne - Microsoft MVP. Perth, Western Australia. Tips for Access users - http://allenbrowne.com/tips.html Reply to group, rather than allenbrowne at mvps dot org. "Tom Ellison" wrote in message ... I changed them to "single" and it cut the file size by 75%. Go figure! Tom Ellison |
#8
|
|||
|
|||
Reduce size of data
Correct: JET 3.x (Access 97) did not have the Decimal data type. VBA did
support a Variant of subtype Decimal, but had serious trouble displaying it on a form back then. I find it extremely embarrassing that JET 4 gets this query wrong: SELECT MyField FROM MyTable ORDER BY MyField DESC; If MyField is an unindexed Decimal field, the negative values appear first, then the positive ones, and the zeros and nulls sort unpredictibly! Can you believe that this remains unfixed after more than 5 years? Info, sample, and k.b. reference: http://allenbrowne.com/bug-08.html -- Allen Browne - Microsoft MVP. Perth, Western Australia. Tips for Access users - http://allenbrowne.com/tips.html Reply to group, rather than allenbrowne at mvps dot org. "Tom Ellison" wrote in message ... Dear Allen: I had never run into it before. It's "new" isn't it. I mean, I knew it was there since the 2000 version, but I've never had to work with it. Was it in '97? That's the second version of Access I learned, and the first I learned thoroughly (oh for the good ole days). The equivalent in MSDE is very useful. I like it. Data size increases one byte at a time, and it even tells you the size of storage in bytes when you're working on it. That's where I got my idea of how I thought this should work. Apparently, it doesn't work that way at all. Thanks again, Allen. I'm absorbing as fast as I can tonight. Tom Ellison "Allen Browne" wrote in message ... Another good reason to never use the Decimal data type in JET. |
#9
|
|||
|
|||
Reduce size of data
Dear Allen:
I have been using Currency/Money for almost all numeric columns for some time (other than good old integer and long, of course). I believe I'll stick with that. What I cannot conceive is that any such error would be tolerated in SQL Server. No such error would ever be released, let alone escape correction for years. That speaks for itself. Tom Ellison "Allen Browne" wrote in message ... Correct: JET 3.x (Access 97) did not have the Decimal data type. VBA did support a Variant of subtype Decimal, but had serious trouble displaying it on a form back then. I find it extremely embarrassing that JET 4 gets this query wrong: SELECT MyField FROM MyTable ORDER BY MyField DESC; If MyField is an unindexed Decimal field, the negative values appear first, then the positive ones, and the zeros and nulls sort unpredictibly! Can you believe that this remains unfixed after more than 5 years? Info, sample, and k.b. reference: http://allenbrowne.com/bug-08.html -- Allen Browne - Microsoft MVP. Perth, Western Australia. Tips for Access users - http://allenbrowne.com/tips.html Reply to group, rather than allenbrowne at mvps dot org. "Tom Ellison" wrote in message ... Dear Allen: I had never run into it before. It's "new" isn't it. I mean, I knew it was there since the 2000 version, but I've never had to work with it. Was it in '97? That's the second version of Access I learned, and the first I learned thoroughly (oh for the good ole days). The equivalent in MSDE is very useful. I like it. Data size increases one byte at a time, and it even tells you the size of storage in bytes when you're working on it. That's where I got my idea of how I thought this should work. Apparently, it doesn't work that way at all. Thanks again, Allen. I'm absorbing as fast as I can tonight. Tom Ellison "Allen Browne" wrote in message ... Another good reason to never use the Decimal data type in JET. |
#10
|
|||
|
|||
Reduce size of data
Tom
Have you looked into the new Express SQL Server 2005 -- download for free. Regards Jeff Boyce Microsoft Office/Access MVP Microsoft IT Academy Program Mentor "Tom Ellison" wrote in message ... Dear John: MSDE has same limits, but probably better storage algorithms. I'm not sure there would be enough benefit to switching to MSDE. Client will not upgrade to full SQL Server. Expense of server OS, SQL Server, plus server hardware would be nearly $4K I expect. No where close to what he would spend. I've used the same procedure in changing to a Single datatype, and the database dropped by 80% in size. Go figure! It didn't drop even 1% changing from precision 18 to 7. No explanation from me! So, as you can see, I had already discussed SQL Server as an alternative. It's one I certainly like. My initial results are in. Indexing just one numeric column causes a 83% increase in performance. Pretty good start. Maybe enough to finish the project. I just hope Single datatype isn't a problem. It's not dollars and cents, but some kind of ratios and such. Probably this will work, though. I'll have to explain why 9.17 can result in records that show 9.17, although it's stored as 9.169993. Life is a series of trade-offs, eh? I understand about the text pointer thing. This app doesn't search for that text column. But surely the area where the text is stored makes the overall database sensitive to the length of the strings. But I guess that's the actual length, not the maximum allocated length. So now that makes sense also. If the text were all padded to full length with spaces, that would be different then. Thanks for your participation. Tom Ellison "John Nurick" wrote in message ... AFAIK Jet stores Decimals like dBase does: no nibbles, but one byte per digit, so at least 18 bytes for a decimal of precision 18. However, if you only need 7 digits precision you could replace your Decimals with Longs and hard-code the position of the decimal point. OTOH Jet does not store empty space if a Text field is not full: a 10-character string will occupy the same amount of space whether the field size is 10 or 255. (This of course means storing a size and a pointer for each value.) Bottom line: you're very near the limits of what Jet's designed for, and some flavour of SQL client/server looks like the sensible way to go. On Tue, 7 Feb 2006 22:32:54 -0600, "Tom Ellison" wrote: Allowing two digits (nibbles) per byte, a decimal of precision 18 should require 10 bytes to store (add one nibble for a sign and divide by two, rounding up). I chose 7, although 6 would have been sufficient for all existing data, since I expect it's going to add one for a sign anyway, so 6 + 1 = 7 would be 4 bytes, and 7 + 1 = 8 would be 4 bytes anyway, so there would be no savings there. Just my theory. -- John Nurick [Microsoft Access MVP] Please respond in the newgroup and not by email. |
|
Thread Tools | |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Query does not order the same data in the same way | Milković Aleksandar | Running & Setting Up Queries | 10 | December 15th, 2005 12:01 PM |
Microsoft Office should use an advance data crosslinking model | m_jurrens | General Discussions | 0 | December 15th, 2005 07:33 AM |
How to Reduce Spreadsheet Size and Speed | ExcelMonkey | Worksheet Functions | 4 | August 9th, 2005 06:38 PM |
multiple docs, one data source | kp | Mailmerge | 12 | January 31st, 2005 04:41 PM |
VBA Code problem error 9 | Speedy | General Discussion | 19 | October 15th, 2004 09:05 PM |