If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
Which is more efficient?
The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are several years old, all of which are completely lacking unique identifiers. Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one record per account, but it's all fairly disorganized and it's common to see one account listed as several different records. So, after I clean it all up, I want to create a feature that guards against this sort of situation. Here's what I'm thinking about and I need to know what the best option is: 1) Create a find duplicates query based on the fields that I listed above that is programatically launched by all data entry forms in their before update events. If it finds another duplicate, alert the user that it exists. 2) Create an ID field (Indexed, duplicates yes) based on the same fields as above. Data entry forms populate this field on before update. DCount searches only the ID field for duplicates. So, which sould be the quickest? Because I need this feature to be run every time records are changed or new records are entered, I hope you can see why efficiency is an issue. Thanks in advance. |
#2
|
|||
|
|||
Which is more efficient?
How having the case # field as a combobx that has limit to list set to no.
Then when they key in a case number, if it exists, it will come up, if not they can enter it. If it exists and they want to edit the record, you can pull in the record. Bonnie http://www.dataplus-svc.com JString wrote: The general situation: the law office that I'm working for is converting their Lotus DB to access. It already has approx. 200k records that are several years old, all of which are completely lacking unique identifiers. Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one record per account, but it's all fairly disorganized and it's common to see one account listed as several different records. So, after I clean it all up, I want to create a feature that guards against this sort of situation. Here's what I'm thinking about and I need to know what the best option is: 1) Create a find duplicates query based on the fields that I listed above that is programatically launched by all data entry forms in their before update events. If it finds another duplicate, alert the user that it exists. 2) Create an ID field (Indexed, duplicates yes) based on the same fields as above. Data entry forms populate this field on before update. DCount searches only the ID field for duplicates. So, which sould be the quickest? Because I need this feature to be run every time records are changed or new records are entered, I hope you can see why efficiency is an issue. Thanks in advance. -- Message posted via AccessMonster.com http://www.accessmonster.com/Uwe/For...esign/200807/1 |
#3
|
|||
|
|||
Which is more efficient?
Unfortunately that won't work because the case# is not truly unique. The
case# can sometimes be the same for unrelated cases "bhicks11 via AccessMonster.com" wrote: How having the case # field as a combobx that has limit to list set to no. Then when they key in a case number, if it exists, it will come up, if not they can enter it. If it exists and they want to edit the record, you can pull in the record. Bonnie http://www.dataplus-svc.com JString wrote: The general situation: the law office that I'm working for is converting their Lotus DB to access. It already has approx. 200k records that are several years old, all of which are completely lacking unique identifiers. Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one record per account, but it's all fairly disorganized and it's common to see one account listed as several different records. So, after I clean it all up, I want to create a feature that guards against this sort of situation. Here's what I'm thinking about and I need to know what the best option is: 1) Create a find duplicates query based on the fields that I listed above that is programatically launched by all data entry forms in their before update events. If it finds another duplicate, alert the user that it exists. 2) Create an ID field (Indexed, duplicates yes) based on the same fields as above. Data entry forms populate this field on before update. DCount searches only the ID field for duplicates. So, which sould be the quickest? Because I need this feature to be run every time records are changed or new records are entered, I hope you can see why efficiency is an issue. Thanks in advance. -- Message posted via AccessMonster.com http://www.accessmonster.com/Uwe/For...esign/200807/1 |
#4
|
|||
|
|||
Which is more efficient?
On Sun, 13 Jul 2008 10:46:05 -0700, JString
wrote: The general situation: the law office that I'm working for is converting their Lotus DB to access. It already has approx. 200k records that are several years old, all of which are completely lacking unique identifiers. Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one record per account, but it's all fairly disorganized and it's common to see one account listed as several different records. So, after I clean it all up, I want to create a feature that guards against this sort of situation. Here's what I'm thinking about and I need to know what the best option is: 1) Create a find duplicates query based on the fields that I listed above that is programatically launched by all data entry forms in their before update events. If it finds another duplicate, alert the user that it exists. 2) Create an ID field (Indexed, duplicates yes) based on the same fields as above. Data entry forms populate this field on before update. DCount searches only the ID field for duplicates. So, which sould be the quickest? Because I need this feature to be run every time records are changed or new records are entered, I hope you can see why efficiency is an issue. Thanks in advance. You've got worse problems than you may realize. What if you have a case for plaintiff Joe Jones, and another for Joseph Jones, and a third for Joe Jones (no, not the car dealer in the first case but the other Joe Jones, the state representative) - who is actually the Joseph Jones in the second case? Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or variantly spelled; they're not unique (when I was in college there was a professor John Vinson at the same school); they simply don't WORK as unique identifiers for a person. And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper, human brain or Access can use it. "Janet! Get me the file on Case# 3102!" "Which one, boss? there are three..." You unfortunately may need to do a whole lot of manual work creating a People table with a meaningless numeric identifier for People, with first name, last name, *and whatever other identifying information you have available* to uniquely identify the people in your system (who may be defendents, plaintiffs, attorneys, judges, etc.) You'll also need to come up with some way to unambiguously identify each Case and each Account. What you have in your Lotus file will *feed* such a system, but it will require a lot of help to do so I fear! Good luck! -- John W. Vinson [MVP] |
#5
|
|||
|
|||
Which is more efficient?
I think that I may have seen the light.
If I split the Plaintiff and Defendant fields between first and last names, set the last name fields to required (first names will only exist for additional info), then the two last name fields + case# should be enough to create a unique reference. Of course that doesn't protect against misspelled values but it should be enough to significantly reduce the number of duplicates. "John W. Vinson" wrote: On Sun, 13 Jul 2008 10:46:05 -0700, JString wrote: The general situation: the law office that I'm working for is converting their Lotus DB to access. It already has approx. 200k records that are several years old, all of which are completely lacking unique identifiers. Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one record per account, but it's all fairly disorganized and it's common to see one account listed as several different records. So, after I clean it all up, I want to create a feature that guards against this sort of situation. Here's what I'm thinking about and I need to know what the best option is: 1) Create a find duplicates query based on the fields that I listed above that is programatically launched by all data entry forms in their before update events. If it finds another duplicate, alert the user that it exists. 2) Create an ID field (Indexed, duplicates yes) based on the same fields as above. Data entry forms populate this field on before update. DCount searches only the ID field for duplicates. So, which sould be the quickest? Because I need this feature to be run every time records are changed or new records are entered, I hope you can see why efficiency is an issue. Thanks in advance. You've got worse problems than you may realize. What if you have a case for plaintiff Joe Jones, and another for Joseph Jones, and a third for Joe Jones (no, not the car dealer in the first case but the other Joe Jones, the state representative) - who is actually the Joseph Jones in the second case? Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or variantly spelled; they're not unique (when I was in college there was a professor John Vinson at the same school); they simply don't WORK as unique identifiers for a person. And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper, human brain or Access can use it. "Janet! Get me the file on Case# 3102!" "Which one, boss? there are three..." You unfortunately may need to do a whole lot of manual work creating a People table with a meaningless numeric identifier for People, with first name, last name, *and whatever other identifying information you have available* to uniquely identify the people in your system (who may be defendents, plaintiffs, attorneys, judges, etc.) You'll also need to come up with some way to unambiguously identify each Case and each Account. What you have in your Lotus file will *feed* such a system, but it will require a lot of help to do so I fear! Good luck! -- John W. Vinson [MVP] |
#6
|
|||
|
|||
Which is more efficient?
On Mon, 14 Jul 2008 08:53:01 -0700, JString
wrote: If I split the Plaintiff and Defendant fields between first and last names, set the last name fields to required (first names will only exist for additional info), then the two last name fields + case# should be enough to create a unique reference. Of course that doesn't protect against misspelled values but it should be enough to significantly reduce the number of duplicates. Names can be repeated. You say Case# can be repeated. *You still have a problem*. With your constraints, it would be legitimate to have Case# 123; "Jones"; "Barry" Case# 123; "Jones", "Barry" referring to two different cases, Joe Jones being sued by Robert Barry and Sylvia Jones being sued by Al Barry, and happening to have the same Case#. Unless you can get the Case# to be unique for a Case you will need other information to define uniqueness. -- John W. Vinson [MVP] |
#7
|
|||
|
|||
Which is more efficient?
He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes: You could make a combobox for the user to select the correct person/case # - show as many fields as you think necessary for them to pick from. Now you have put the responsibility on the user to make sure they are checking for the correct name/Case#/address, etc. John - you are absolutely correct - just offering a lazy way out! And you know how I respect your opinion! Bonnie http://www.dataplus-svc.com John W. Vinson wrote: If I split the Plaintiff and Defendant fields between first and last names, set the last name fields to required (first names will only exist for [quoted text clipped - 3 lines] Of course that doesn't protect against misspelled values but it should be enough to significantly reduce the number of duplicates. Names can be repeated. You say Case# can be repeated. *You still have a problem*. With your constraints, it would be legitimate to have Case# 123; "Jones"; "Barry" Case# 123; "Jones", "Barry" referring to two different cases, Joe Jones being sued by Robert Barry and Sylvia Jones being sued by Al Barry, and happening to have the same Case#. Unless you can get the Case# to be unique for a Case you will need other information to define uniqueness. -- Message posted via http://www.accessmonster.com |
#8
|
|||
|
|||
Which is more efficient?
On Jul 14, 1:39*pm, "bhicks11 via AccessMonster.com" u44327@uwe
wrote: He does have a mess and I know this isn't a good solution but as a quick fix for a guy that obviously doesn't want to make the needed changes: You could make a combobox for the user to select the correct person/case # - show as many fields as you think necessary for them to pick from. *Now you have put the responsibility on the user to make sure they are checking for the correct name/Case#/address, etc. John - you are absolutely correct - just offering a lazy way out! *And you know how I respect your opinion! Bonniehttp://www.dataplus-svc.com John W. Vinson wrote: If I split the Plaintiff and Defendant fields between first and last names, set the last name fields to required (first names will only exist for [quoted text clipped - 3 lines] Of course that doesn't protect against misspelled values but it should be enough to significantly reduce the number of duplicates. * Names can be repeated. You say Case# can be repeated. *You still have a problem*. With your constraints, it would be legitimate to have Case# 123; "Jones"; "Barry" Case# 123; "Jones", "Barry" referring to two different cases, Joe Jones being sued by Robert Barry and Sylvia Jones being sued by Al Barry, and happening to have the same Case#. Unless you can get the Case# to be unique for a Case you will need other information to define uniqueness. -- Message posted viahttp://www.accessmonster.com- Hide quoted text - - Show quoted text - OK, so this may not exactly be helpful, but I'm still baffled by the fact that there isn't a unique identifier for the case, like I don't know, Case#. I would think it would be fairly easy to create a unique case# for every case. Either have it be a sequential number or use the date and a 2 or 3 or 4 digit case number (depending on how many are opened every day/month/year). IE...2008-07-001 would be the first case opened in July of 2008. I personally would push for a unique identifier for the case, not just for your database but to cut down on confusion everywhere. |
#9
|
|||
|
|||
Which is more efficient?
That is true... I suppose if I added in the DateFiled field then the
occurrence of a duplicate would become even more unlikely, but then as I add more fields to this search the chance of a misspelled duplicate would increase. I'm no mathematician but I would guess that after a few hundred thousand entries both of these possibilities would occur at least a handful of times. It's beginning to sound more and more like a catch-22. Even so, what about this... I could build a procedure that counts the records that match these values, and in the event that a duplicate is found, a popup form could list these records for user verification/modification. If the user sees that they are duplicates, he or she could modify them there on the spot, or if they are not duplicates a button could be pressed that updates some MatchCount field so that: Joe Jones vs. Bob Spencer (case 123) might be the first occurrence and have a value of 1 updated to MatchCount and, Mary Jones vs. Mark Spencer (case 123), being the second occurrence would have value 2 updated to MatchCount. Of course, the original duplicate count procedure would have to include this new MatchCount field along with all the others. "John W. Vinson" wrote: On Mon, 14 Jul 2008 08:53:01 -0700, JString wrote: If I split the Plaintiff and Defendant fields between first and last names, set the last name fields to required (first names will only exist for additional info), then the two last name fields + case# should be enough to create a unique reference. Of course that doesn't protect against misspelled values but it should be enough to significantly reduce the number of duplicates. Names can be repeated. You say Case# can be repeated. *You still have a problem*. With your constraints, it would be legitimate to have Case# 123; "Jones"; "Barry" Case# 123; "Jones", "Barry" referring to two different cases, Joe Jones being sued by Robert Barry and Sylvia Jones being sued by Al Barry, and happening to have the same Case#. Unless you can get the Case# to be unique for a Case you will need other information to define uniqueness. -- John W. Vinson [MVP] |
#10
|
|||
|
|||
Which is more efficient?
Case# comes from the different court systems in different counties. I don't
know exactly how their numbering systems work and it even looks like these systems might have varying rules depending on where the case was filed. " wrote: On Jul 14, 1:39 pm, "bhicks11 via AccessMonster.com" u44327@uwe wrote: He does have a mess and I know this isn't a good solution but as a quick fix for a guy that obviously doesn't want to make the needed changes: You could make a combobox for the user to select the correct person/case # - show as many fields as you think necessary for them to pick from. Now you have put the responsibility on the user to make sure they are checking for the correct name/Case#/address, etc. John - you are absolutely correct - just offering a lazy way out! And you know how I respect your opinion! Bonniehttp://www.dataplus-svc.com John W. Vinson wrote: If I split the Plaintiff and Defendant fields between first and last names, set the last name fields to required (first names will only exist for [quoted text clipped - 3 lines] Of course that doesn't protect against misspelled values but it should be enough to significantly reduce the number of duplicates. Names can be repeated. You say Case# can be repeated. *You still have a problem*. With your constraints, it would be legitimate to have Case# 123; "Jones"; "Barry" Case# 123; "Jones", "Barry" referring to two different cases, Joe Jones being sued by Robert Barry and Sylvia Jones being sued by Al Barry, and happening to have the same Case#. Unless you can get the Case# to be unique for a Case you will need other information to define uniqueness. -- Message posted viahttp://www.accessmonster.com- Hide quoted text - - Show quoted text - OK, so this may not exactly be helpful, but I'm still baffled by the fact that there isn't a unique identifier for the case, like I don't know, Case#. I would think it would be fairly easy to create a unique case# for every case. Either have it be a sequential number or use the date and a 2 or 3 or 4 digit case number (depending on how many are opened every day/month/year). IE...2008-07-001 would be the first case opened in July of 2008. I personally would push for a unique identifier for the case, not just for your database but to cut down on confusion everywhere. |
|
Thread Tools | |
Display Modes | |
|
|