View Single Post
  #28  
Old February 26th, 2010, 11:49 PM posted to microsoft.public.access.tablesdbdesign
David W. Fenton
external usenet poster
 
Posts: 3,373
Default I was told "Fields are expensive, records are cheap"

=?Utf-8?B?RGVubmlz?= wrote in
:

Your comment :A Jet/ACE data store doesn't store data in records,
but in data pages. It's not like legacy systems at all. And the
fields are variable length, so records are variable length. The
efficiency of interaction with the storage and RAM are pretty moot
since the days of disk caching, when even a write is done to RAM
(and then persisted to disk when convenient). This means there's
hardly any difference between RAM manipulation and reading/writing
from/to the data store, since both are happening in RAM.

Response: I can see what you are saying, but that assumes that
you are accessing data records sequentially. What happens to all
that disk caching when I am randomly reading records which I
would do during data entry (which is where I am most concerned
about disk access and network speed). The user might need the 1st
rcd, then the 1,000 rcd, then the 5,257th rcd, then the 10th rcd,
etc. etc.). Unless the entire table is cached, was good does
loading a data page do if the record that you need next is not on
that page? And if the data is cached in a server, that does not
good all because you have the network bottle neck.


You seem not to have done much investigation of how Jet works.
Indexes are the key. When you request data from a table, Jet first
requests the relevant indexes (based on your criteria) and then uses
the indexes to figure out which data pages to retrieve. The
requested data is going to be brought into RAM for processing, and
since it's been requested from disk, it will be cached in Jet's
internal cache, as well as in the OS's disk cache. When you write to
the data, Jet will take care of writing to disk, which will then be
handled by the OS, and if the written data pages are in the disk
cache (and you are caching writes), the writes will be to the disk
cache and then will be persisted to disk according to your disk
cache settings (and you can also tell Jet explicitly to flush its
cache, which in turn tells the OS to flush its write cache, i.e.,
persist to disk).

This is all managed quite efficienctly, so efficiently, in fact,
that there is very little reason to worry about the things you are
worrying about until you reach extremely large datasets. That's
usually not what is being used by an Access application directly
(i.e., if you have a large dataset, you're likely going to use a
different back-end than Jet/ACE).

I know about write caching and it does not affect the speed of
retrieving data (unless you do not have enough memory and your
machine is thrashing), which is why is was not part of the
question.


Writing is much slower than reading because it involves so many
different operations (writing indexes, determining if new pages need
to be allocated), as well as maintaining as set of appropriate locks
(write locks are a much bigger deal than read locks).

But disk caching is a big deal for reading, too. As I explained
above about the way Jet works, the indexes are the key to retrieving
data efficiently. Data pages in Jet 4 and ACE are 4KBs in size, so
you're retrieving data from the disk in blocks whose size is very
often well-aligned with the file system's block size (the NTFS file
system uses default block sizes in increments of KBs, with 4KBs
being the smallest usual block size, and the next being 8, 16, etc.,
so reading 4KB data pages is likely to be very efficient, since they
will be aligned with the file system's block size). Jet retrieves
what it needs as needed, but only as much as needed.

A simplified explanation of how Jet works: this SQL statement:

SELECT tblPerson.LastName, tblPerson.FirstName, tblPerson.PersonID
FROM tblPerson
WHERE tblPerson.LastName = "Fenton"

....will first retrieve the index on LastName which will be keyed to
the Primary Key, so Jet will then retrieve the primary key index. In
both cases, only as much of the index as necessary is retrieved
(it's not a sequential scan, but a b-tree traversal), and in the
case of the primary key index, since Jet tables are clustered on the
primary key, I'm pretty sure the primary key index is actually the
real data pages (I could be wrong on that).

I don't know for certain if the non-PK indexes store a link to the
PK or if they directly reference the data page of the PK, but the
point is that it's not a sequential scan of anything, but a b-tree
index traversal, which is substantially more efficient.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/