#31
trying to minimize Write Conflicts in a multi-user database
Marshall Barton wrote:
> Side note. There was a CDMA thread on this issue many years ago where
> David Fenton posted the results of extensive performance testing of
> anything that boils down to WHERE False. In my mind, he conclusively
> demonstrated that Jet query optimization is not smart enough to
> recognize that no records will be retrieved, and consequently does a
> full table scan. An alternative that can use indexes to optimize data
> retrieval is to compare a unique indexed field to an impossible value.
> For example, while only 99.99...% guaranteed safe, WHERE {incrementing
> autonumber PK field} = 0 will be orders of magnitude faster than WHERE
> False. On a large table, this can make a HUGE difference.

Thanks for sharing. That certainly piqued my curiosity, and I decided to try it for myself, mainly to see whether it is still true today as it was 'many years ago'.

On 64-bit Access 2010 on Windows Server 2008 with 2 GB RAM and a 320 GB 7200 RPM hard drive, I imported a table containing 2.8 million rows of data, about 110 MB worth, and set up a VBA routine to perform several trials of three criteria:

WHERE PK = 0
WHERE PK = 0.1
WHERE 1 = 0

Looking at the ShowPlan output, it would seem that the first two should perform better than the last, since it states an index is used for them while the last one does a table scan. I threw in 0.1, reasoning that since autonumbers are integers, 0.1 would actually be a better guarantee should someone dare to insert 0 into the primary key. That obviously means an implicit conversion, though, so it was mainly to satisfy my curiosity.

However, I was unable to show that the time required to open an empty recordset was consistently in favor of any one of those expressions. Sometimes 1=0 is the fastest, sometimes it's 0.1, and sometimes it's 0. In all cases, none of the expressions exceeded 100 milliseconds, and the timings generally fell into three ranges (roughly 15 ms, 30-40 ms, and 60-70 ms) spread nearly equally among the three criteria.

The next thing I did was try to cross-reference David's results, but unfortunately I have been unsuccessful in locating the CDMA thread cited. I would love to see what he did and verify that I did not miss anything important.

It's too early to conclude anything, but from the trials I did, I am not quite convinced that 1=0 (which gets transformed into "Not 0=0" in ShowPlan, BTW; other variations get the same transformation) is going to be much slower than PK = 0 or PK = 0.1. ShowPlan may say 1=0/False/Not 0=0 forces a table scan, but I think we have to question whether ShowPlan is being completely honest or detailed, especially considering that we were working with 2.8 million records; an actual table scan at that size would, I should think, be quite noticeable. If I could get a link to that CDMA thread, I would be quite delighted.
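[For readers who want to reproduce this, a minimal sketch of the kind of VBA timing routine described above. The table and field names (tblBig, PK) are assumed, not taken from the original post, and Timer only gives sub-second resolution:]

```vba
' Rough sketch of the timing harness described above. Table name tblBig
' and autonumber PK field name PK are assumptions for illustration.
Sub TimeEmptyRecordsets()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim crit As Variant
    Dim t As Single

    Set db = CurrentDb
    For Each crit In Array("PK = 0", "PK = 0.1", "1 = 0")
        t = Timer
        Set rs = db.OpenRecordset( _
            "SELECT * FROM tblBig WHERE " & crit, dbOpenSnapshot)
        ' Elapsed time in milliseconds to open the (empty) recordset
        Debug.Print crit, Format$((Timer - t) * 1000, "0"); " ms"
        rs.Close
    Next crit
End Sub
```

[Run each criterion several times and discard the first pass if you want to separate cold-cache from warm-cache behavior.]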
#32
I have also run JET ShowPlan tests on the 1=0 criterion, and indeed I noticed the table scans in the showplan.out files. But, like Banana, the performance *seemed* very fast. None of the tables in the databases I work on at work are all that huge, so I wasn't ready to draw any conclusions from those results. At my place of work, though, we are using WANs whether we like it or not, since most of the servers for the shared folders I use are located in Bellevue, WA, and I work in Seattle. Still, the performance is very fast and rock solid.

Tom Wickerath
Microsoft Access MVP
http://www.accessmvp.com/TWickerath/
__________________________________________

"Banana" wrote:
> [full quote of message #31 trimmed]
#33
Banana wrote:
> However, I was unable to show that the time required to open an empty
> recordset was consistently in favor of any one of those expressions.
> Sometimes 1=0 is the fastest, sometimes it's 0.1, and sometimes it's
> 0. In all cases, none of the expressions exceeded 100 milliseconds.

A follow-up. I wanted to cover two more bases:

1) I tested how much time it takes to evaluate "Not 0=0" by running a For...Next loop with as many iterations as there are records in the table. While this is not an apples-to-apples comparison, it should give some idea of how long that many Not 0=0 evaluations take, and thus some idea of whether a table scan is actually being done. In my first tests I got an average of roughly 90-100 milliseconds, which is certainly slower than my reported trials for Not 0=0, but I do not consider the difference statistically significant.

2) To be really safe, I expanded my table from 2.8 million rows to 11.4 million, about 435 MB, and re-ran the trials, reasoning that if a table scan were actually being done, the time needed to execute 1=0 should increase. In those trials it did not; it remained at the same averages I reported last night, as did the other two criteria. The For...Next loop, however, did increase to 400-500 milliseconds, which is now a large enough difference to warrant questioning whether ShowPlan is telling the complete story when it claims a table scan for 1=0/False/Not 0=0.
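[The control experiment in point 1 can be sketched roughly as follows. The row count is an assumption matching the 2.8-million-row test table; the idea is simply to pay the cost of one constant False comparison per row:]

```vba
' Rough sketch of the For...Next control experiment: evaluate the same
' constant comparison ShowPlan reports for 1=0, once per row, to
' estimate the pure evaluation cost of a hypothetical table scan.
Sub TimeFalseComparisons()
    Dim lngRows As Long, i As Long
    Dim b As Boolean
    Dim t As Single

    lngRows = 2800000          ' assumed to match the test table
    t = Timer
    For i = 1 To lngRows
        b = Not (0 = 0)        ' the "Not 0=0" expression from ShowPlan
    Next i
    Debug.Print "Elapsed:"; Format$((Timer - t) * 1000, "0"); " ms"
End Sub
```

[If opening the 1=0 recordset is consistently faster than this loop, and does not grow with the table, that argues against a real per-row scan.]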
#34
Banana wrote:
> [full quote of message #31 trimmed]

I tried to search Google Groups, but it seems to have lost its ability to find stuff in the archives. David, if you are following this thread, we would appreciate it if you could dredge up that test and repost your findings.

Banana, the performance results you and Tom are reporting are very interesting, but I am not totally convinced at this time. Performance testing can be very difficult and tricky, especially with anything that involves I/O. When caching is used, the first run after a boot can be drastically different from subsequent runs. Today's multi-core processors and seriously fast multi-gigabyte memory may very well make a cached table scan quick enough, maybe nearly as fast as an index scan. Then there is the issue of what other active processes are also using the system's resources. A full table scan in cached memory on a lightly loaded system might be pretty fast, but when other processes are making large demands for processor cycles and memory, the result could be a completely different story. Tom's performance over a WAN (how fast?) is interesting and warrants further analysis under varying user scenarios.

More than a little intriguing is the show plan's use of Not 0=0 when the query used some other expression for False. This strongly implies that the query optimizer can recognize expressions that evaluate to False. Why it would then say it will use a full table scan seems contradictory to me.
--
Marsh
MVP [MS Access]
#35
On Mon, 18 Jan 2010 10:00:33 -0600, Marshall Barton wrote:
> More than a little intriguing is the show plan's use of Not 0=0 when
> the query used some other expression for False. This strongly implies
> that the query optimizer can recognize expressions that evaluate to
> False. Why it would then say it will use a full table scan seems
> contradictory to me.

I wonder if it's the same effect that causes references to Rnd() in a query to return the same value for all rows: the query engine determines that no table field is included in the expression and evaluates it only once. Might the same be the case with 0=0 or False as a criterion?
--
John W. Vinson [MVP]
#36
John W. Vinson wrote:
> I wonder if it's the same effect that causes references to Rnd() in a
> query to return the same value for all rows: the query engine
> determines that no table field is included in the expression and
> evaluates it only once. Might the same be the case with 0=0 or False
> as a criterion?

Good thought, John. That seems like very logical behavior, right up until they fail to use that information and do a table scan anyway (if that is indeed still what happens?).
--
Marsh
MVP [MS Access]
#37
Marshall Barton wrote:
> I tried to search Google Groups, but it seems to have lost its ability
> to find stuff in the archives. David, if you are following this
> thread, we would appreciate it if you could dredge up that test and
> repost your findings.

Thanks for the effort. I was worried I might have missed something obvious in my attempt. I really wish there were a better archiving solution. Ironically enough, when I was searching the archive I ran across an old thread where the poster said she couldn't find a cited thread even with deja.com, and that was a decade ago! I guess nothing much has really changed.

> Banana, the performance results you and Tom are reporting are very
> interesting, but I am not totally convinced at this time. Performance
> testing can be very difficult and tricky, especially with anything
> that involves I/O. When caching is used, the first run after a boot
> can be drastically different from subsequent runs.

I don't doubt there are many variables that could skew the tests, which is why I wanted to cross-reference David's test to check what I might have missed. FWIW, using OLE Automation to start up an Access application, run only one query using CurrentDb.Execute (i.e. no saved query), then immediately close, quit, and clean up does not appear to materially affect the timing. The ShowPlan output seems to indicate that a new instance of the engine was started up via OLE Automation, so this should reflect boot-time/cold-start behavior with no caching, I'd think.

> Today's multi-core processors and seriously fast multi-gigabyte memory
> may very well make a cached table scan quick enough, maybe nearly as
> fast as an index scan.

I'm not so sure about multiple cores/CPUs helping. AFAIK, Access continues to be single-threaded, so it will always use one core. The benefit of multiple cores/CPUs would be indirect, in that the OS may be able to allocate competing processes to other cores/CPUs, but beyond that, not much difference.

Note that while JET/ACE has a setting for the number of threads, the KB articles suggest that thread safety is achieved only if you use DAO or the OLEDB provider, and even then thread safety is limited. I also tried tweaking the threads setting from 1 to 4 (the default is 3), and the timing was not affected in the slightest, suggesting to me that threading was not relevant here.

KB articles:
http://support.microsoft.com/kb/169395
http://support.microsoft.com/kb/299973

As for processor and memory making a table scan as fast as an index scan, that was the reason I did the extra check with the For...Next loop: to get a rough idea of how much time it would take to perform that many false comparisons. If a table scan were actually being done, the time should have increased when I expanded the table. It didn't, while the For...Next loop's time increased in proportion to the increase in records. As a further comparison, doing a "Not PK = PK" was ridiculously slow, around 40,000 milliseconds. That comparison is also unfair, because it compares field values rather than two hard-coded constants, but I think the point remains: if 1=0 actually did a table scan, it should have been at least as slow as the For...Next loop, and certainly four times slower after the table expanded fourfold.

> Then there is the issue of what other active processes are also using
> the system's resources. A full table scan in cached memory on a
> lightly loaded system might be pretty fast, but when other processes
> are making large demands for processor cycles and memory, the result
> could be a completely different story.

I suppose I could try the test again with only 512 MB allocated to the VM and see what happens.

> More than a little intriguing is the show plan's use of Not 0=0 when
> the query used some other expression for False. This strongly implies
> that the query optimizer can recognize expressions that evaluate to
> False. Why it would then say it will use a full table scan seems
> contradictory to me.

Well, I've found ShowPlan to be quite lacking in detail. For instance, it still does not support showing plans when subqueries are used, despite having gone through two more JET versions and two ACE versions. Another telltale sign is that when ShowPlan starts up, it records a date of 0, suggesting to me that bugs, issues, and enhancements to ShowPlan were never considered high priority. For those reasons, I think it is prudent to take ShowPlan's output with a grain of salt.
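[For readers unfamiliar with the ShowPlan trace discussed throughout this thread: it is toggled via an undocumented registry value, and the Debug key usually has to be created by hand. The path below is for JET 4.0; Access 2010's ACE engine uses a Debug key under its own Access Connectivity Engine branch instead, and 64-bit Office changes the path again, so treat this as a sketch rather than a definitive reference:]

```reg
Windows Registry Editor Version 5.00

; Enables Jet's query-plan trace; showplan.out is written to the
; current working directory. Set to "OFF" to disable again.
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Debug]
"JETSHOWPLAN"="ON"
```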
#38
Can you send me an email address so I can send you a zipped sample database? If you post it here, make sure you "munge" it so the spammers cannot harvest it. Do something like P aul// AT some-//-Domain.Remove.org and note that the extraneous spaces and characters should be removed. Or tell me what it is by email: send it to jspencer, and the remainder of my address after the at is hilltop.umbc.edu.

John Spencer
Access MVP 2002-2005, 2007-2010
The Hilltop Institute
University of Maryland Baltimore County

Paul wrote:
> > If you need help building that code, post back.
>
> John, YES, PLEASE! I'd never be able to figure it out on my own. I've
> taken several runs at trying to understand DAO and ADO coding, but I
> still struggle with it. The only MS Access book I ever found that
> explained it in a way I could understand was the manual for Access
> 2.0, which I only borrowed temporarily. But I didn't use it right
> away, and didn't manage to retain what I understood from it. If you
> know of any Web sites that explain the basics of DAO and ADO, I'd
> love to check them out. And thanks for your help with this.
>
> Paul
#39
Banana wrote:
> Well, I've found ShowPlan to be quite lacking in detail. For instance,
> it still does not support showing plans when subqueries are used,
> despite having gone through two more JET versions and two ACE
> versions. Another telltale sign is that when ShowPlan starts up, it
> records a date of 0, suggesting to me that bugs, issues, and
> enhancements to ShowPlan were never considered high priority. For
> those reasons, I think it is prudent to take ShowPlan's output with a
> grain of salt.

I would sure like to see improvements made to the ShowPlan functionality. Perhaps several of us can put that forward as a suggestion to the Access team.

Tom Wickerath
Microsoft Access MVP
http://www.accessmvp.com/TWickerath/
__________________________________________

"Banana" wrote:
> [full quote of message #37 trimmed]
#40
Hi Marsh,

> I tried to search Google Groups, but it seems to have lost its ability
> to find stuff in the archives. David, if you are following this
> thread, we would appreciate it if you could dredge up that test and
> repost your findings.

I sent a private e-mail to David late last night, inquiring about any past test results. He replied as follows:

"I don't recall anything of the sort, and don't see it in Google Groups. Nor do I have archives of my posts from way back when. Sorry I can't help. Maybe Marshall has a better reference?"

> Performance testing can be very difficult and tricky, especially with
> anything that involves I/O.

I agree. When I am doing such testing at work, I typically reboot several times, once between each test, and I make sure to shut down as many applications as I can, including Outlook, to establish a reasonably stable baseline. On my work PC I cannot shut off the antivirus, so I just have to live with the possibility that it may be adding some network load.

I think a good way to test the 1=0 question might be to start with a really large table, like Banana did, and monitor the amount of data transferred for:

1.) An intentional table scan. Run some query that forces all records to be brought over the network wire, perhaps an aggregate query summing a numeric field.

2.) WHERE conditions that include 1=0, WHERE False, etc.

One first needs to display the network-monitor icon in the System Tray. In Windows XP, this is done via Control Panel | Network Connections: double-click the connection of interest, left-click Properties, and select the check box "Show icon in notification area when connected". When you double-click the resulting System Tray icon, you will see Packets Sent and Packets Received values. With a baseline that is as repeatable as possible (i.e. Outlook and any other networked applications shut down), read the Packets Received value before and after each individual test. The difference (delta) represents how many packets of data were transferred to solve each query. Of course, some other application that you cannot shut down may have contributed some of the traffic for a given test, so run each test several times to make sure no other application is a contributing influence. You need a split application, with the back end on a network share, in order to use this method. Several years ago, John Viescas recommended it to me as a way of testing how much data actually comes over the network wire.

Tom Wickerath
Microsoft Access MVP
http://www.accessmvp.com/TWickerath/
__________________________________________

"Marshall Barton" wrote:
> [full quote of message #34 trimmed]