Which is more efficient?

**JString** · July 13th, 2008, 06:46 PM posted to microsoft.public.access.tablesdbdesign

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

**bhicks11 via AccessMonster.com** · July 13th, 2008, 07:35 PM posted to microsoft.public.access.tablesdbdesign

How having the case # field as a combobx that has limit to list set to no.
Then when they key in a case number, if it exists, it will come up, if not
they can enter it. If it exists and they want to edit the record, you can
pull in the record.

Bonnie
http://www.dataplus-svc.com

JString wrote:
The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...esign/200807/1

**JString** · July 13th, 2008, 10:42 PM posted to microsoft.public.access.tablesdbdesign

Unfortunately that won't work because the case# is not truly unique. The
case# can sometimes be the same for unrelated cases

"bhicks11 via AccessMonster.com" wrote:

How having the case # field as a combobx that has limit to list set to no.
Then when they key in a case number, if it exists, it will come up, if not
they can enter it. If it exists and they want to edit the record, you can
pull in the record.

Bonnie
http://www.dataplus-svc.com

JString wrote:
The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...esign/200807/1

**John W. Vinson** · July 14th, 2008, 12:34 AM posted to microsoft.public.access.tablesdbdesign

On Sun, 13 Jul 2008 10:46:05 -0700, JString
wrote:

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

You've got worse problems than you may realize.

What if you have a case for plaintiff Joe Jones, and another for Joseph Jones,
and a third for Joe Jones (no, not the car dealer in the first case but the
other Joe Jones, the state representative) - who is actually the Joseph Jones
in the second case?

Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or
variantly spelled; they're not unique (when I was in college there was a
professor John Vinson at the same school); they simply don't WORK as unique
identifiers for a person.

And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper,
human brain or Access can use it. "Janet! Get me the file on Case# 3102!"
"Which one, boss? there are three..."

You unfortunately may need to do a whole lot of manual work creating a People
table with a meaningless numeric identifier for People, with first name, last
name, *and whatever other identifying information you have available* to
uniquely identify the people in your system (who may be defendents,
plaintiffs, attorneys, judges, etc.)

You'll also need to come up with some way to unambiguously identify each Case
and each Account. What you have in your Lotus file will *feed* such a system,
but it will require a lot of help to do so I fear! Good luck!

--

John W. Vinson [MVP]

**JString** · July 14th, 2008, 04:53 PM posted to microsoft.public.access.tablesdbdesign

I think that I may have seen the light.

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

"John W. Vinson" wrote:

On Sun, 13 Jul 2008 10:46:05 -0700, JString
wrote:

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

You've got worse problems than you may realize.

What if you have a case for plaintiff Joe Jones, and another for Joseph Jones,
and a third for Joe Jones (no, not the car dealer in the first case but the
other Joe Jones, the state representative) - who is actually the Joseph Jones
in the second case?

Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or
variantly spelled; they're not unique (when I was in college there was a
professor John Vinson at the same school); they simply don't WORK as unique
identifiers for a person.

And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper,
human brain or Access can use it. "Janet! Get me the file on Case# 3102!"
"Which one, boss? there are three..."

You unfortunately may need to do a whole lot of manual work creating a People
table with a meaningless numeric identifier for People, with first name, last
name, *and whatever other identifying information you have available* to
uniquely identify the people in your system (who may be defendents,
plaintiffs, attorneys, judges, etc.)

You'll also need to come up with some way to unambiguously identify each Case
and each Account. What you have in your Lotus file will *feed* such a system,
but it will require a lot of help to do so I fear! Good luck!

--

John W. Vinson [MVP]

**John W. Vinson** · July 14th, 2008, 05:46 PM posted to microsoft.public.access.tablesdbdesign

On Mon, 14 Jul 2008 08:53:01 -0700, JString
wrote:

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.
--

John W. Vinson [MVP]

**bhicks11 via AccessMonster.com** · July 14th, 2008, 06:39 PM posted to microsoft.public.access.tablesdbdesign

He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! And you
know how I respect your opinion!

Bonnie
http://www.dataplus-svc.com

John W. Vinson wrote:
If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

--
Message posted via http://www.accessmonster.com

**[email protected]** · July 14th, 2008, 07:19 PM posted to microsoft.public.access.tablesdbdesign

On Jul 14, 1:39*pm, "bhicks11 via AccessMonster.com" u44327@uwe
wrote:
He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. *Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! *And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com

John W. Vinson wrote:
If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates. *

Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

--
Message posted viahttp://www.accessmonster.com- Hide quoted text -

- Show quoted text -

OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.

**JString** · July 14th, 2008, 07:21 PM posted to microsoft.public.access.tablesdbdesign

That is true... I suppose if I added in the DateFiled field then the
occurrence of a duplicate would become even more unlikely, but then as I add
more fields to this search the chance of a misspelled duplicate would
increase. I'm no mathematician but I would guess that after a few hundred
thousand entries both of these possibilities would occur at least a handful
of times. It's beginning to sound more and more like a catch-22.

Even so, what about this...

I could build a procedure that counts the records that match these values,
and in the event that a duplicate is found, a popup form could list these
records for user verification/modification. If the user sees that they are
duplicates, he or she could modify them there on the spot, or if they are not
duplicates a button could be pressed that updates some MatchCount field so
that:

Joe Jones vs. Bob Spencer (case 123) might be the first occurrence and have
a value of 1 updated to MatchCount and,

Mary Jones vs. Mark Spencer (case 123), being the second occurrence would
have value 2 updated to MatchCount.

Of course, the original duplicate count procedure would have to include this
new MatchCount field along with all the others.

"John W. Vinson" wrote:

On Mon, 14 Jul 2008 08:53:01 -0700, JString
wrote:

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.
--

John W. Vinson [MVP]

**JString** · July 14th, 2008, 08:07 PM posted to microsoft.public.access.tablesdbdesign

Case# comes from the different court systems in different counties. I don't
know exactly how their numbering systems work and it even looks like these
systems might have varying rules depending on where the case was filed.

" wrote:

On Jul 14, 1:39 pm, "bhicks11 via AccessMonster.com" u44327@uwe
wrote:
He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com

John W. Vinson wrote:
If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

--
Message posted viahttp://www.accessmonster.com- Hide quoted text -

- Show quoted text -

OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.

Thread Tools
Show Printable Version Email this Page
Display Modes
Linear Mode Switch to Hybrid Mode Switch to Threaded Mode