#31
trying to minimize Write Conflicts in a multi-user database
Marshall Barton wrote:
> Side note. There was a CDMA thread on this issue many years ago where
> David Fenton posted the results of extensive performance testing of
> anything that boils down to WHERE False. In my mind, he conclusively
> demonstrated that Jet query optimization is not smart enough to
> recognize that no records will be retrieved, and consequently does a
> full table scan. An alternative that can use indexes to optimize data
> retrieval is to compare a unique indexed field to an impossible value.
> For example, while only 99.99...% guaranteed safe, WHERE {incrementing
> autonumber PK field} = 0 will be orders of magnitude faster than WHERE
> False. On a large table, this can make a HUGE difference.

Thanks for sharing. That certainly piqued my curiosity, and I decided to try it for myself, mainly to see whether it is still true today as it was 'many years ago'.

On 64-bit Access 2010 on Windows Server 2008 with 2 GB RAM and a 320 GB 7200 RPM hard drive, I imported a table containing 2.8 million rows of data, about 110 MB worth, and set up a VBA routine to perform several trials of three criteria:

WHERE PK = 0
WHERE PK = 0.1
WHERE 1 = 0

Looking at the ShowPlan output, it would seem that the first two should perform better than the last, since it states an index is used for them while the last one does a table scan. I threw in 0.1, reasoning that since autonumbers are integers, 0.1 would actually be a better guarantee should someone dare to insert 0 into the primary key. That obviously means an implicit conversion, though, so it was mainly to satisfy my curiosity.

However, I was unable to show that the time required to open an empty recordset was consistently in favor of any one of those expressions. Sometimes 1=0 is the fastest, sometimes it's 0.1, and sometimes it's 0. In all cases, none of the expressions exceeded 100 milliseconds, and the timings generally fell into three ranges (roughly 15 ms, 30-40 ms, and 60-70 ms) spread nearly equally among the three criteria.

The next thing I did was try to cross-reference David's results, but unfortunately I have been unsuccessful in locating the CDMA thread cited. I would love to see what he did and verify that I did not miss anything important.

It's too early to conclude anything, but from the trials I did, I am not quite convinced that 1=0 (which gets transformed into "Not 0=0" in ShowPlan, BTW; other variations get the same transformation) is going to be much slower than PK = 0 or PK = 0.1. ShowPlan may say 1=0/False/Not 0=0 forces a table scan, but I think we have to question whether ShowPlan is being completely honest or detailed, especially considering that we were working with 2.8 million records; an actual table scan at that size would, I should think, be quite noticeable. If I could get a link to that CDMA thread, I would be quite delighted.
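[For readers who want to reproduce this, a minimal sketch of the kind of VBA timing routine described above. The table and field names (tblBig, PK) are assumed, not taken from the original post, and Timer only gives sub-second resolution:]

```vba
' Rough sketch of the timing harness described above. Table name tblBig
' and autonumber PK field name PK are assumptions for illustration.
Sub TimeEmptyRecordsets()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim crit As Variant
    Dim t As Single

    Set db = CurrentDb
    For Each crit In Array("PK = 0", "PK = 0.1", "1 = 0")
        t = Timer
        Set rs = db.OpenRecordset( _
            "SELECT * FROM tblBig WHERE " & crit, dbOpenSnapshot)
        ' Elapsed time in milliseconds to open the (empty) recordset
        Debug.Print crit, Format$((Timer - t) * 1000, "0"); " ms"
        rs.Close
    Next crit
End Sub
```

[Run each criterion several times and discard the first pass if you want to separate cold-cache from warm-cache behavior.]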
#32
I have also run JET ShowPlan tests on the 1=0 criterion, and indeed I noticed the table scans in the showplan.out files. But, like Banana, the performance *seemed* very fast. None of the tables in the databases I work on at work are all that huge, so I wasn't ready to draw any conclusions from those results. At my place of work, though, we are using WANs whether we like it or not, since most of the servers for the shared folders I use are located in Bellevue, WA, and I work in Seattle. Still, the performance is very fast and rock solid.

Tom Wickerath
Microsoft Access MVP
http://www.accessmvp.com/TWickerath/
__________________________________________

"Banana" wrote:
> [full quote of message #31 trimmed]
#33
Banana wrote:
> However, I was unable to show that the time required to open an empty
> recordset was consistently in favor of any one of those expressions.
> Sometimes 1=0 is the fastest, sometimes it's 0.1, and sometimes it's
> 0. In all cases, none of the expressions exceeded 100 milliseconds.

A follow-up. I wanted to cover two more bases:

1) I tested how much time it takes to evaluate "Not 0=0" by running a For...Next loop with as many iterations as there are records in the table. While this is not an apples-to-apples comparison, it should give some idea of how long that many Not 0=0 evaluations take, and thus some idea of whether a table scan is actually being done. In my first tests I got an average of roughly 90-100 milliseconds, which is certainly slower than my reported trials for Not 0=0, but I do not consider the difference statistically significant.

2) To be really safe, I expanded my table from 2.8 million rows to 11.4 million, about 435 MB, and re-ran the trials, reasoning that if a table scan were actually being done, the time needed to execute 1=0 should increase. In those trials it did not; it remained at the same averages I reported last night, as did the other two criteria. The For...Next loop, however, did increase to 400-500 milliseconds, which is now a large enough difference to warrant questioning whether ShowPlan is telling the complete story when it claims a table scan for 1=0/False/Not 0=0.
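[The control experiment in point 1 can be sketched roughly as follows. The row count is an assumption matching the 2.8-million-row test table; the idea is simply to pay the cost of one constant False comparison per row:]

```vba
' Rough sketch of the For...Next control experiment: evaluate the same
' constant comparison ShowPlan reports for 1=0, once per row, to
' estimate the pure evaluation cost of a hypothetical table scan.
Sub TimeFalseComparisons()
    Dim lngRows As Long, i As Long
    Dim b As Boolean
    Dim t As Single

    lngRows = 2800000          ' assumed to match the test table
    t = Timer
    For i = 1 To lngRows
        b = Not (0 = 0)        ' the "Not 0=0" expression from ShowPlan
    Next i
    Debug.Print "Elapsed:"; Format$((Timer - t) * 1000, "0"); " ms"
End Sub
```

[If opening the 1=0 recordset is consistently faster than this loop, and does not grow with the table, that argues against a real per-row scan.]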
#34
Banana wrote:
> [full quote of message #31 trimmed]

I tried to search Google Groups, but it seems to have lost its ability to find stuff in the archives. David, if you are following this thread, we would appreciate it if you could dredge up that test and repost your findings.

Banana, the performance results you and Tom are reporting are very interesting, but I am not totally convinced at this time. Performance testing can be very difficult and tricky, especially with anything that involves I/O. When caching is used, the first run after a boot can be drastically different from subsequent runs. Today's multi-core processors and seriously fast multi-gigabyte memory may very well make a cached table scan quick enough, maybe nearly as fast as an index scan. Then there is the issue of what other active processes are also using the system's resources. A full table scan in cached memory on a lightly loaded system might be pretty fast, but when other processes are making large demands for processor cycles and memory, the result could be a completely different story. Tom's performance over a WAN (how fast?) is interesting and warrants further analysis under varying user scenarios.

More than a little intriguing is the show plan's use of Not 0=0 when the query used some other expression for False. This strongly implies that the query optimizer can recognize expressions that evaluate to False. Why it would then say it will use a full table scan seems contradictory to me.
--
Marsh
MVP [MS Access]
#35
On Mon, 18 Jan 2010 10:00:33 -0600, Marshall Barton wrote:
> More than a little intriguing is the show plan's use of Not 0=0 when
> the query used some other expression for False. This strongly implies
> that the query optimizer can recognize expressions that evaluate to
> False. Why it would then say it will use a full table scan seems
> contradictory to me.

I wonder if it's the same effect that causes references to Rnd() in a query to return the same value for all rows: the query engine determines that no table field is included in the expression and evaluates it only once. Might the same be the case with 0=0 or False as a criterion?
--
John W. Vinson [MVP]
#36
John W. Vinson wrote:
> I wonder if it's the same effect that causes references to Rnd() in a
> query to return the same value for all rows: the query engine
> determines that no table field is included in the expression and
> evaluates it only once. Might the same be the case with 0=0 or False
> as a criterion?

Good thought, John. That seems like very logical behavior, right up until they fail to use that information and do a table scan anyway (if that is indeed still what happens?).
--
Marsh
MVP [MS Access]
#37
Marshall Barton wrote:
> I tried to search Google Groups, but it seems to have lost its ability
> to find stuff in the archives. David, if you are following this
> thread, we would appreciate it if you could dredge up that test and
> repost your findings.

Thanks for the effort. I was worried I might have missed something obvious in my attempt. I really wish there were a better archiving solution. Ironically enough, when I was searching the archive I ran across an old thread where the poster said she couldn't find a cited thread even with deja.com, and that was a decade ago! I guess nothing much has really changed.

> Banana, the performance results you and Tom are reporting are very
> interesting, but I am not totally convinced at this time. Performance
> testing can be very difficult and tricky, especially with anything
> that involves I/O. When caching is used, the first run after a boot
> can be drastically different from subsequent runs.

I don't doubt there are many variables that could skew the tests, which is why I wanted to cross-reference David's test to check what I might have missed. FWIW, using OLE Automation to start up an Access application, run only one query using CurrentDb.Execute (i.e. no saved query), then immediately close, quit, and clean up does not appear to materially affect the timing. The ShowPlan output seems to indicate that a new instance of the engine was started up via OLE Automation, so this should reflect boot-time/cold-start behavior with no caching, I'd think.

> Today's multi-core processors and seriously fast multi-gigabyte memory
> may very well make a cached table scan quick enough, maybe nearly as
> fast as an index scan.

I'm not so sure about multiple cores/CPUs helping. AFAIK, Access continues to be single-threaded, so it will always use one core. The benefit of multiple cores/CPUs would be indirect, in that the OS may be able to allocate competing processes to other cores/CPUs, but beyond that, not much difference.

Note that while JET/ACE has a setting for the number of threads, the KB articles suggest that thread safety is achieved only if you use DAO or the OLEDB provider, and even then thread safety is limited. I also tried tweaking the threads setting from 1 to 4 (the default is 3), and the timing was not affected in the slightest, suggesting to me that threading was not relevant here.

KB articles:
http://support.microsoft.com/kb/169395
http://support.microsoft.com/kb/299973

As for processor and memory making a table scan as fast as an index scan, that was the reason I did the extra check with the For...Next loop: to get a rough idea of how much time it would take to perform that many false comparisons. If a table scan were actually being done, the time should have increased when I expanded the table. It didn't, while the For...Next loop's time increased in proportion to the increase in records. As a further comparison, doing a "Not PK = PK" was ridiculously slow, around 40,000 milliseconds. That comparison is also unfair, because it compares field values rather than two hard-coded constants, but I think the point remains: if 1=0 actually did a table scan, it should have been at least as slow as the For...Next loop, and certainly four times slower after the table expanded fourfold.

> Then there is the issue of what other active processes are also using
> the system's resources. A full table scan in cached memory on a
> lightly loaded system might be pretty fast, but when other processes
> are making large demands for processor cycles and memory, the result
> could be a completely different story.

I suppose I could try the test again with only 512 MB allocated to the VM and see what happens.

> More than a little intriguing is the show plan's use of Not 0=0 when
> the query used some other expression for False. This strongly implies
> that the query optimizer can recognize expressions that evaluate to
> False. Why it would then say it will use a full table scan seems
> contradictory to me.

Well, I've found ShowPlan to be quite lacking in detail. For instance, it still does not support showing plans when subqueries are used, despite having gone through two more JET versions and two ACE versions. Another telltale sign is that when ShowPlan starts up, it records a date of 0, suggesting to me that bugs, issues, and enhancements to ShowPlan were never considered high priority. For those reasons, I think it is prudent to take ShowPlan's output with a grain of salt.
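[For readers unfamiliar with the ShowPlan trace discussed throughout this thread: it is toggled via an undocumented registry value, and the Debug key usually has to be created by hand. The path below is for JET 4.0; Access 2010's ACE engine uses a Debug key under its own Access Connectivity Engine branch instead, and 64-bit Office changes the path again, so treat this as a sketch rather than a definitive reference:]

```reg
Windows Registry Editor Version 5.00

; Enables Jet's query-plan trace; showplan.out is written to the
; current working directory. Set to "OFF" to disable again.
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Debug]
"JETSHOWPLAN"="ON"
```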
#38
Can you send me an email address so I can send you a zipped sample database? If you post it here, make sure you "munge" it so the spammers cannot harvest it. Do something like P aul// AT some-//-Domain.Remove.org and note that the extraneous spaces and characters should be removed. Or tell me what it is by email: send it to jspencer, and the remainder of my address after the at is hilltop.umbc.edu.

John Spencer
Access MVP 2002-2005, 2007-2010
The Hilltop Institute
University of Maryland Baltimore County

Paul wrote:
> > If you need help building that code, post back.
>
> John, YES, PLEASE! I'd never be able to figure it out on my own. I've
> taken several runs at trying to understand DAO and ADO coding, but I
> still struggle with it. The only MS Access book I ever found that
> explained it in a way I could understand was the manual for Access
> 2.0, which I only borrowed temporarily. But I didn't use it right
> away, and didn't manage to retain what I understood from it. If you
> know of any Web sites that explain the basics of DAO and ADO, I'd
> love to check them out. And thanks for your help with this.
>
> Paul
#39
Banana wrote:
> Well, I've found ShowPlan to be quite lacking in detail. For instance,
> it still does not support showing plans when subqueries are used,
> despite having gone through two more JET versions and two ACE
> versions. Another telltale sign is that when ShowPlan starts up, it
> records a date of 0, suggesting to me that bugs, issues, and
> enhancements to ShowPlan were never considered high priority. For
> those reasons, I think it is prudent to take ShowPlan's output with a
> grain of salt.

I would sure like to see improvements made to the ShowPlan functionality. Perhaps several of us can put that forward as a suggestion to the Access team.

Tom Wickerath
Microsoft Access MVP
http://www.accessmvp.com/TWickerath/
__________________________________________

"Banana" wrote:
> [full quote of message #37 trimmed]
#40
Hi Marsh,

> I tried to search Google Groups, but it seems to have lost its ability
> to find stuff in the archives. David, if you are following this
> thread, we would appreciate it if you could dredge up that test and
> repost your findings.

I sent a private e-mail to David late last night, inquiring about any past test results. He replied as follows:

"I don't recall anything of the sort, and don't see it in Google Groups. Nor do I have archives of my posts from way back when. Sorry I can't help. Maybe Marshall has a better reference?"

> Performance testing can be very difficult and tricky, especially with
> anything that involves I/O.

I agree. When I am doing such testing at work, I typically reboot several times, once between each test, and I make sure to shut down as many applications as I can, including Outlook, to establish a reasonably stable baseline. On my work PC I cannot shut off the antivirus, so I just have to live with the possibility that it may be adding some network load.

I think a good way to test the 1=0 question might be to start with a really large table, like Banana did, and monitor the amount of data transferred for:

1.) An intentional table scan. Run some query that forces all records to be brought over the network wire, perhaps an aggregate query summing a numeric field.

2.) WHERE conditions that include 1=0, WHERE False, etc.

One first needs to display the network-monitor icon in the System Tray. In Windows XP, this is done via Control Panel | Network Connections: double-click the connection of interest, left-click Properties, and select the check box "Show icon in notification area when connected". When you double-click the resulting System Tray icon, you will see Packets Sent and Packets Received values. With a baseline that is as repeatable as possible (i.e. Outlook and any other networked applications shut down), read the Packets Received value before and after each individual test. The difference (delta) represents how many packets of data were transferred to solve each query. Of course, some other application that you cannot shut down may have contributed some of the traffic for a given test, so run each test several times to make sure no other application is a contributing influence. You need a split application, with the back end on a network share, in order to use this method. Several years ago, John Viescas recommended it to me as a way of testing how much data actually comes over the network wire.

Tom Wickerath
Microsoft Access MVP
http://www.accessmvp.com/TWickerath/
__________________________________________

"Marshall Barton" wrote:
> [full quote of message #34 trimmed]