A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Access » Database Design
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

Which is more efficient?



 
 
Thread Tools Display Modes
  #1  
Old July 13th, 2008, 06:46 PM posted to microsoft.public.access.tablesdbdesign
JString
external usenet poster
 
Posts: 90
Default Which is more efficient?

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.
  #2  
Old July 13th, 2008, 07:35 PM posted to microsoft.public.access.tablesdbdesign
bhicks11 via AccessMonster.com
external usenet poster
 
Posts: 529
Default Which is more efficient?

How having the case # field as a combobx that has limit to list set to no.
Then when they key in a case number, if it exists, it will come up, if not
they can enter it. If it exists and they want to edit the record, you can
pull in the record.

Bonnie
http://www.dataplus-svc.com

JString wrote:
The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.


--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...esign/200807/1

  #3  
Old July 13th, 2008, 10:42 PM posted to microsoft.public.access.tablesdbdesign
JString
external usenet poster
 
Posts: 90
Default Which is more efficient?

Unfortunately that won't work because the case# is not truly unique. The
case# can sometimes be the same for unrelated cases

"bhicks11 via AccessMonster.com" wrote:

How having the case # field as a combobx that has limit to list set to no.
Then when they key in a case number, if it exists, it will come up, if not
they can enter it. If it exists and they want to edit the record, you can
pull in the record.

Bonnie
http://www.dataplus-svc.com

JString wrote:
The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.


--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...esign/200807/1


  #4  
Old July 14th, 2008, 12:34 AM posted to microsoft.public.access.tablesdbdesign
John W. Vinson
external usenet poster
 
Posts: 18,261
Default Which is more efficient?

On Sun, 13 Jul 2008 10:46:05 -0700, JString
wrote:

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.


You've got worse problems than you may realize.

What if you have a case for plaintiff Joe Jones, and another for Joseph Jones,
and a third for Joe Jones (no, not the car dealer in the first case but the
other Joe Jones, the state representative) - who is actually the Joseph Jones
in the second case?

Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or
variantly spelled; they're not unique (when I was in college there was a
professor John Vinson at the same school); they simply don't WORK as unique
identifiers for a person.

And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper,
human brain or Access can use it. "Janet! Get me the file on Case# 3102!"
"Which one, boss? there are three..."

You unfortunately may need to do a whole lot of manual work creating a People
table with a meaningless numeric identifier for People, with first name, last
name, *and whatever other identifying information you have available* to
uniquely identify the people in your system (who may be defendents,
plaintiffs, attorneys, judges, etc.)

You'll also need to come up with some way to unambiguously identify each Case
and each Account. What you have in your Lotus file will *feed* such a system,
but it will require a lot of help to do so I fear! Good luck!

--

John W. Vinson [MVP]
  #5  
Old July 14th, 2008, 04:53 PM posted to microsoft.public.access.tablesdbdesign
JString
external usenet poster
 
Posts: 90
Default Which is more efficient?

I think that I may have seen the light.

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

"John W. Vinson" wrote:

On Sun, 13 Jul 2008 10:46:05 -0700, JString
wrote:

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.


You've got worse problems than you may realize.

What if you have a case for plaintiff Joe Jones, and another for Joseph Jones,
and a third for Joe Jones (no, not the car dealer in the first case but the
other Joe Jones, the state representative) - who is actually the Joseph Jones
in the second case?

Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or
variantly spelled; they're not unique (when I was in college there was a
professor John Vinson at the same school); they simply don't WORK as unique
identifiers for a person.

And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper,
human brain or Access can use it. "Janet! Get me the file on Case# 3102!"
"Which one, boss? there are three..."

You unfortunately may need to do a whole lot of manual work creating a People
table with a meaningless numeric identifier for People, with first name, last
name, *and whatever other identifying information you have available* to
uniquely identify the people in your system (who may be defendents,
plaintiffs, attorneys, judges, etc.)

You'll also need to come up with some way to unambiguously identify each Case
and each Account. What you have in your Lotus file will *feed* such a system,
but it will require a lot of help to do so I fear! Good luck!

--

John W. Vinson [MVP]

  #6  
Old July 14th, 2008, 05:46 PM posted to microsoft.public.access.tablesdbdesign
John W. Vinson
external usenet poster
 
Posts: 18,261
Default Which is more efficient?

On Mon, 14 Jul 2008 08:53:01 -0700, JString
wrote:

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.


Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.
--

John W. Vinson [MVP]
  #7  
Old July 14th, 2008, 06:39 PM posted to microsoft.public.access.tablesdbdesign
bhicks11 via AccessMonster.com
external usenet poster
 
Posts: 529
Default Which is more efficient?

He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! And you
know how I respect your opinion!

Bonnie
http://www.dataplus-svc.com



John W. Vinson wrote:
If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for

[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.


Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.


--
Message posted via http://www.accessmonster.com

  #8  
Old July 14th, 2008, 07:19 PM posted to microsoft.public.access.tablesdbdesign
[email protected]
external usenet poster
 
Posts: 4
Default Which is more efficient?

On Jul 14, 1:39*pm, "bhicks11 via AccessMonster.com" u44327@uwe
wrote:
He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. *Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! *And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com





John W. Vinson wrote:
If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for

[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates. *


Names can be repeated.
You say Case# can be repeated.


*You still have a problem*. With your constraints, it would be legitimate to
have


Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"


referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.


Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.


--
Message posted viahttp://www.accessmonster.com- Hide quoted text -

- Show quoted text -


OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.
  #9  
Old July 14th, 2008, 07:21 PM posted to microsoft.public.access.tablesdbdesign
JString
external usenet poster
 
Posts: 90
Default Which is more efficient?

That is true... I suppose if I added in the DateFiled field then the
occurrence of a duplicate would become even more unlikely, but then as I add
more fields to this search the chance of a misspelled duplicate would
increase. I'm no mathematician but I would guess that after a few hundred
thousand entries both of these possibilities would occur at least a handful
of times. It's beginning to sound more and more like a catch-22.

Even so, what about this...

I could build a procedure that counts the records that match these values,
and in the event that a duplicate is found, a popup form could list these
records for user verification/modification. If the user sees that they are
duplicates, he or she could modify them there on the spot, or if they are not
duplicates a button could be pressed that updates some MatchCount field so
that:

Joe Jones vs. Bob Spencer (case 123) might be the first occurrence and have
a value of 1 updated to MatchCount and,

Mary Jones vs. Mark Spencer (case 123), being the second occurrence would
have value 2 updated to MatchCount.

Of course, the original duplicate count procedure would have to include this
new MatchCount field along with all the others.



"John W. Vinson" wrote:

On Mon, 14 Jul 2008 08:53:01 -0700, JString
wrote:

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.


Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.
--

John W. Vinson [MVP]

  #10  
Old July 14th, 2008, 08:07 PM posted to microsoft.public.access.tablesdbdesign
JString
external usenet poster
 
Posts: 90
Default Which is more efficient?

Case# comes from the different court systems in different counties. I don't
know exactly how their numbering systems work and it even looks like these
systems might have varying rules depending on where the case was filed.

" wrote:

On Jul 14, 1:39 pm, "bhicks11 via AccessMonster.com" u44327@uwe
wrote:
He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com





John W. Vinson wrote:
If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.


Names can be repeated.
You say Case# can be repeated.


*You still have a problem*. With your constraints, it would be legitimate to
have


Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"


referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.


Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.


--
Message posted viahttp://www.accessmonster.com- Hide quoted text -

- Show quoted text -


OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.

 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump


All times are GMT +1. The time now is 09:51 AM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.