If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
|
Thread Tools | Display Modes |
#11
|
|||
|
|||
Strategy for combining lists
"Jamie Collins" wrote in message oups.com... On Mar 16, 12:43 pm, "BruceM" wrote: Which normalization 'principles' were you concerned about? A person's name is single a atomic fact which in a DBMS is usually split into elements (e.g. first_name, middle_names, last_name) for practical purposes, therefore normalization considerations are moot. The normalization principle is storing the name redundantly. Joe Jones appears in the Employee table (stored as LastName and FirstName). Then Joe Jones is stored in the Instructor field as the full name each time he conducts a training session. In truth, that doesn't bother me very much, but I wonder if there is a better way. If you've got a compound key of (first name, last name) then using it as a foreign key is not redundancy. If you mean that two text columns are 'bigger' than one numeric column then that's not redundancy either. OK, I think I see your point, but it seems like semantics to me. In a large company you would probably need five fields to assure that a person can be uniquely identified, and one of those fields would probably need to be a Comment field in case there is no "natural" way to tell them apart. By that reasoning it seems that redundancy is almost irrelevant, since you are always storing the natural key. By the way, do you use Cascade Updates or Update queries or what exactly when somebody's name changes? I see that if I were storing three fields (Last, First, Middle) as the FK in a table related to the Employee table then there is no question of reduncancy, since I would be storing the full name in every record related to the employee table. So you seem to understand the point so why the contradiction above? I do not choose to store three FK fields where one will do. Furthermore, two people could have the same name, so either that situation needs to be anticipated, and a suffix field or something added to the key, or the database will need to be redesigned in order to accomodate that situation. As I expect you know from our previous discussions, I am inclined to use surrogate keys unless a simple natural key presents itself. You need a natural/logical key before you can use a surrogate. See my reply elsewhere in this thread. Acknowledged and commented upon in other posting. All you'd be doing is assigning a sequence number to non-employees; that's not the same thing as uniquely identifying instructors. Would you issue instructors with this sequence number to and ask them to quote it in all related correspondence etc? A key must exist in the reality being modelled therefore if you've invented it then you have to expose it. When somebody comes in to to HazMat or First Aid training, we contact a qualified company, and they send an instructor. The instructor is identified by name. I don't see that there is a reason why the key "must" exist in the reality being modelled, or that I have to expose the key. The most common reason you get for not using (first name, last name) as key is because they are commonly duplicated. Take you natural key for external trainer, being 'company identifier' (don't know your jurisdiction/business so let's say DUNS number) and trainer's name. You could tell me that training company '12345678900' has two trainers named 'Joe Jones'. There's a temptation to think an autonumber (ID) will help you here because it allows you to do this: INSERT INTO ExternalTrainers (ID, company_duns, trainer_name) VALUES (1, '12345678900', 'Joe Jones') ; INSERT INTO ExternalTrainers (ID, company_duns, trainer_name) VALUES (2, '12345678900', 'Joe Jones') ; You book both for a course. Joe Jones ID=1 goes down a storm but Joe Jones ID=2 is most unpopular. Upon re-booking you ask the company, "Send us Joe Jones but can we have someone else instead of Joe Jones, please?" I don't think asking for Joe Jones ID=1 is going to clarify matters either. It's the nature of identifiers: they actually have to identify people. This is not the model with which I am dealing. A company that has two people with the same name will find a way to distinguish them if they need to be paged or whatever. People have more sense than it seems you are willing to acknowledge. I understand the hypothetical problem, but in the real world people will find a way of distinguishing one person from another, except maybe on the No-Fly list. ;-) My bank issues me with a number, embosses it on my card, encodes it into the chip and give me a pin to remember. They invented these identifiers, exposed them and I have to be in possession of them to identify myself as the account holder. Jamie. -- |
#12
|
|||
|
|||
Strategy for combining lists
On Mar 16, 5:13 pm, "BruceM" wrote:
A company that has two people with the same name will find a way to distinguish them if they need to be paged or whatever. People have more sense than it seems you are willing to acknowledge. I understand the hypothetical problem, but in the real world people will find a way of distinguishing one person from another But computers are pretty dumb. You need to tell them everything. You can't expect them to verify personal data and identifiers for you. I'm reminded of this article about what a DBMS does: An Old Class of Errors by Fabian Pascal www.dbazine.com/ofinterest/oi-articles/pascal29 [quote] No DBMS can guarantee truth. If, for example, there is an EMPLOYEES table in the database: EMPLOYEES {EMP#, ENAME, DEPT#, SALARY} that contains a row: {E21,Pianka,A00,25000} representing a proposition about an employee: "Employee uniquely identified by employee number E21, has name Pianka, works in department A00, earns salary 25,000" there is no way a DBMS can tell whether the row represents a proposition that is true or not in the real world (in fact, it does not even know what an employee, a name, a department, or a salary is!) The only thing it can, and should do, is to guarantee consistency with the integrity constraints (and, therefore, the business rules) in effect. [Unquote] Consider the EmployeeID *you* use. I assume the trusted source for this identifier is your organization's personnel department, who are responsible for verifying that the person exists, are who they say they are, are fit for the job (health, qualifications, spent criminal convictions, etc) and so on. Then, to make things easy for the enterprise as a whole, the personnel department associate the person with an EmployeeID, something I call a 'business key', and exposes it e.g. prints it on the employee's payslip. The personnel department may even have used positive monotonic integers and a computer could have generated this for them (i.e. an autonumber) but the important thing is that the job of identifying people is the responsibility if the agency that exposes the identifier. Yes, people can be distinguished by a human in the reality being modelled. Yes, people entities within the table can be distinguished by a human using the primary key, even if it's an autonumber. What the human in question needs is a way to associate the two e.g. which Joe Jones is autonumber ID=1. AFAIK there are two ways to do this: either bring different/additional identifier data into the DBMS or expose the autonumber to the reality e.g. keep different/additional identifier plus the autonumber in a document management system (think filing cabinet, one file per trainer, autonumber written on the front of each file). Now do you see why I don't think you should put the identifiers EmployeeID and external_trainer_ID in the same coumn? They may be the same data type, both identify professional people, etc but they are not of the same domain and each have different trusted sources. If you are tempted to do the latter, consider that autonumber advocates (including the article tina linked to) say that the autonumber pk must be meaningless outside the DBMS but exposing it means you are attaching meaning; many autonumber advocates go as far as saying that an autonumber should not be exposed. Also consider that a monotonic integer may be good for the DBMS but does not make a user-friendly identifier in reality, so why not come up with a well thought out external_trainer_ID and, if you must, use an autonumber as a 'surrogate'? FWIW I wonder if external_trainer_ID is actually required: I suspect the most important fact is the contract with the training company and companies are much easier to identify, even after companies have merged etc. Maybe the person who gave the training on the day is something for the 'notes' rather than being part of the identifier. By the way, do you use Cascade Updates Sometimes but on a strictly case-by-case basis. Jamie. -- |
#13
|
|||
|
|||
Strategy for combining lists
"Jamie Collins" wrote in message
ups.com... On Mar 16, 5:13 pm, "BruceM" wrote: A company that has two people with the same name will find a way to distinguish them if they need to be paged or whatever. People have more sense than it seems you are willing to acknowledge. I understand the hypothetical problem, but in the real world people will find a way of distinguishing one person from another But computers are pretty dumb. You need to tell them everything. You can't expect them to verify personal data and identifiers for you. I'm reminded of this article about what a DBMS does: An Old Class of Errors by Fabian Pascal www.dbazine.com/ofinterest/oi-articles/pascal29 [quote] No DBMS can guarantee truth. If, for example, there is an EMPLOYEES table in the database: EMPLOYEES {EMP#, ENAME, DEPT#, SALARY} that contains a row: {E21,Pianka,A00,25000} representing a proposition about an employee: "Employee uniquely identified by employee number E21, has name Pianka, works in department A00, earns salary 25,000" there is no way a DBMS can tell whether the row represents a proposition that is true or not in the real world (in fact, it does not even know what an employee, a name, a department, or a salary is!) The only thing it can, and should do, is to guarantee consistency with the integrity constraints (and, therefore, the business rules) in effect. [Unquote] Consider the EmployeeID *you* use. I assume the trusted source for this identifier is your organization's personnel department, who are responsible for verifying that the person exists, are who they say they are, are fit for the job (health, qualifications, spent criminal convictions, etc) and so on. Then, to make things easy for the enterprise as a whole, the personnel department associate the person with an EmployeeID, something I call a 'business key', and exposes it e.g. prints it on the employee's payslip. The personnel department may even have used positive monotonic integers and a computer could have generated this for them (i.e. an autonumber) but the important thing is that the job of identifying people is the responsibility if the agency that exposes the identifier. Yes, people can be distinguished by a human in the reality being modelled. Yes, people entities within the table can be distinguished by a human using the primary key, even if it's an autonumber. What the human in question needs is a way to associate the two e.g. which Joe Jones is autonumber ID=1. AFAIK there are two ways to do this: either bring different/additional identifier data into the DBMS or expose the autonumber to the reality e.g. keep different/additional identifier plus the autonumber in a document management system (think filing cabinet, one file per trainer, autonumber written on the front of each file). I work in a facility that has fewer than 100 employees. Due to the nature of the business, it isn't going to grow to the point of having thousands of employees. The real-world environment in which this project occurs guides my decisions. There is no realistic chance that we won't be able to find a way to distinguish people from each other. If two people with the exact same name end up working in the same department and have the same job title, then we can expose the employee ID number or do something else, but until then I will concern myself with real issues. I am going to use surrogate keys. I am not going to read any more articles on the subject. My database is not going to disintegrate into anarchy as a result of my using surrogate keys. Go ahead and believe that I am wrong, misguided, or whatever you wish. I don't care. I have asked time and time again what you do if a person's name changes and you are using a "natural" key that includes the name. You need to update all of the five or six or whatever number of foreign key fields are needed to resolve the relationship, yet you use cascade updates infrequently, I gather. Now do you see why I don't think you should put the identifiers EmployeeID and external_trainer_ID in the same coumn? They may be the same data type, both identify professional people, etc but they are not of the same domain and each have different trusted sources. If you are tempted to do the latter, consider that autonumber advocates (including the article tina linked to) say that the autonumber pk must be meaningless outside the DBMS but exposing it means you are attaching meaning; many autonumber advocates go as far as saying that an autonumber should not be exposed. Also consider that a monotonic integer may be good for the DBMS but does not make a user-friendly identifier in reality, so why not come up with a well thought out external_trainer_ID and, if you must, use an autonumber as a 'surrogate'? FWIW I wonder if external_trainer_ID is actually required: I suspect the most important fact is the contract with the training company and companies are much easier to identify, even after companies have merged etc. Maybe the person who gave the training on the day is something for the 'notes' rather than being part of the identifier. Right now the users either select an instructor name from the list, or they type in a name. Anything else would involve using an input box, at the least, to enter a new outside instructor. I choose to make it as simple as I can for the users. I will store the instructor name as I always have. If the instructor is also an employee, I will store that name. If that instructor's name changes one day, the old records will have the old name. It doesn't matter. I have asked if it is possible to use a number as the identifier. I can force the number in the Outside Instructors table to be above 1000000 or whatever I want. Our employee ID numbers will not go above four digits. If need be, I can pad the number with zeros or something, but I won't worry about it for now. I have asked if it is possible to create the relationship between one of two tables (Employees and Instructors) and an Instructor field in the TrainingSession table (actually, in a related table, since there can be several instructors for a session). Whether I am using a surrogate key or a six-field natural key, my question, still unanswered, is about that very problem. By the way, do you use Cascade Updates Sometimes but on a strictly case-by-case basis. Jamie. -- |
#14
|
|||
|
|||
Strategy for combining lists
On Mar 19, 3:20 pm, "BruceM" wrote:
I have asked if it is possible to create the relationship between one of two tables (Employees and Instructors) and an Instructor field in the TrainingSession table (actually, in a related table, since there can be several instructors for a session). Whether I am using a surrogate key or a six-field natural key, my question, still unanswered, is about that very problem. A foreign key to references one table. To be able to reference 'one of two tables' you would need to use a level of abstraction such as subclassing e.g. (air code) CREATE TABLE Trainers ( trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, CHECK (trainer_type IN ('Internal', 'External')) ) ; CREATE TABLE InternalTrainers ( employee_ID INTEGER NOT NULL UNIQUE REFERENCES Employees (employee_ID), trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, CHECK (trainer_type = 'Internal'), FOREIGN KEY (trainer_type, trainer_ID) REFERENCES Trainers (trainer_type, trainer_ID), columns specific to employee-as-trainer here ) ; CREATE TABLE ExternalTrainers ( trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, CHECK (trainer_type = 'External'), FOREIGN KEY (trainer_type, trainer_ID) REFERENCES Trainers (trainer_type, trainer_ID), columns specific to external trainer, including key, here ) ; CREATE TABLE TrainingSessions ( trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, FOREIGN KEY (trainer_type, trainer_ID) REFERENCES Trainers (trainer_type, trainer_ID), columns specific to training session, including key, here ) ; Of course, trainer_ID is completely artificial here. Perhaps the above should be named 'superclassing' i.e. taking two real but distinct domains and artificially combining them via a fabricated 'superclass'. As I said earlier, for me the obvious answer is to keep them separate i.e. one table for employee-as-trainer sessions and one for external trainer-run sessions. If you are having trouble combining them, why force the issue? What is the motivation for a combined table? I have asked time and time again what you do if a person's name changes and you are using a "natural" key that includes the name. You need to update all of the five or six or whatever number of foreign key fields are needed to resolve the relationship, yet you use cascade updates infrequently, I gather. Me personally? I rarely if ever use a person's name as a DBMS identifier, for all the obvious reasons. I tend to use industry standard keys (ISBN) and business keys (employee_ID) that have a trusted source. I think I asked this upthread e.g. are trainers (people) regulated by an authority, association, etc that can provide an identifier? And I've also said I think the identifier should be for the training company (i.e. the one with which there is a constract) rather than individual people. Sorry, I'm not familiar with the business training sector in your region of the world to give a prescriptive answer but every region tends to have a governmental bureaucracy (e.g. UK=Companies House Reference Number) or tax agency (UK=HMRC Unique Tax Reference, ECON/SCON numbers, etc) that can assist; DUNS is private sector and not universal but has the advantage of being global. I am going to use surrogate keys. I am not going to read any more articles on the subject. My database is not going to disintegrate into anarchy as a result of my using surrogate keys. Go ahead and believe that I am wrong, misguided, or whatever you wish. I don't care. I haven't been arguing against surrogates and I generally tend to avoid doing so because I see it as one of those 'lifestyle choices' i.e. it's not for me but I don't criticize you for choosing them because I can see some benefit but I think the disadvantages outweigh. Jamie. -- |
#15
|
|||
|
|||
Strategy for combining lists
"Jamie Collins" wrote in message ups.com... On Mar 19, 3:20 pm, "BruceM" wrote: I have asked if it is possible to create the relationship between one of two tables (Employees and Instructors) and an Instructor field in the TrainingSession table (actually, in a related table, since there can be several instructors for a session). Whether I am using a surrogate key or a six-field natural key, my question, still unanswered, is about that very problem. A foreign key to references one table. To be able to reference 'one of two tables' you would need to use a level of abstraction such as subclassing e.g. (air code) CREATE TABLE Trainers ( trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, CHECK (trainer_type IN ('Internal', 'External')) ) ; CREATE TABLE InternalTrainers ( employee_ID INTEGER NOT NULL UNIQUE REFERENCES Employees (employee_ID), trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, CHECK (trainer_type = 'Internal'), FOREIGN KEY (trainer_type, trainer_ID) REFERENCES Trainers (trainer_type, trainer_ID), columns specific to employee-as-trainer here ) ; CREATE TABLE ExternalTrainers ( trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, CHECK (trainer_type = 'External'), FOREIGN KEY (trainer_type, trainer_ID) REFERENCES Trainers (trainer_type, trainer_ID), columns specific to external trainer, including key, here ) ; CREATE TABLE TrainingSessions ( trainer_ID CHAR(10) NOT NULL UNIQUE, trainer_type CHAR(8) NOT NULL, FOREIGN KEY (trainer_type, trainer_ID) REFERENCES Trainers (trainer_type, trainer_ID), columns specific to training session, including key, here ) ; I don't know how to read this code well enough either to understand just what it means (what is CHAR(8), etc.), or to get it to produce a table. I thought that I need to check the ANSI 92 compatible syntax box in Tools Options Queries, but as I said I don't know how to make it work when it doesn't produce a table. Or maybe that's not the box I need to check. Anyhow, I'll take your word for it that I would need to use subclassing. Of course, trainer_ID is completely artificial here. Perhaps the above should be named 'superclassing' i.e. taking two real but distinct domains and artificially combining them via a fabricated 'superclass'. As I said earlier, for me the obvious answer is to keep them separate i.e. one table for employee-as-trainer sessions and one for external trainer-run sessions. If you are having trouble combining them, why force the issue? What is the motivation for a combined table? I got it into my head that I should try to avoid storing data redundantly. To my way of thinking, storing a name over and over again would qualify as redundancy. Now I don't know what to think, so I'm just going to go with something that will work for my purposes. I have asked time and time again what you do if a person's name changes and you are using a "natural" key that includes the name. You need to update all of the five or six or whatever number of foreign key fields are needed to resolve the relationship, yet you use cascade updates infrequently, I gather. Me personally? I rarely if ever use a person's name as a DBMS identifier, for all the obvious reasons. I tend to use industry standard keys (ISBN) and business keys (employee_ID) that have a trusted source. I think I asked this upthread e.g. are trainers (people) regulated by an authority, association, etc that can provide an identifier? And I've also said I think the identifier should be for the training company (i.e. the one with which there is a constract) rather than individual people. Sorry, I'm not familiar with the business training sector in your region of the world to give a prescriptive answer but every region tends to have a governmental bureaucracy (e.g. UK=Companies House Reference Number) or tax agency (UK=HMRC Unique Tax Reference, ECON/SCON numbers, etc) that can assist; DUNS is private sector and not universal but has the advantage of being global. Some trainers are individuals and some are from companies. Identifying them by company will only work for a very few. The only thing I'm trying to do is to assure that each person's record has an unchanging unique identifier. If I am making a database for a volunteer organization it is quite possible that the people won't have ID numbers, so I will have to assign something arbitrary. I have no problem with doing that in situations where people do have standard-issue ID numbers, because I have already seen the ID number format change. The trainers are not necessarily regulated in any particular way. HazMat trainers are very different from trainers who represent a government agency, who are in turn very different from those who represent our customers and who are providing instruction on using a new kind of tooling or whatever. For some of these people there is no particular numbering system at all, much less a single system for identifying all of them without risk of duplication. I am going to use surrogate keys. I am not going to read any more articles on the subject. My database is not going to disintegrate into anarchy as a result of my using surrogate keys. Go ahead and believe that I am wrong, misguided, or whatever you wish. I don't care. I haven't been arguing against surrogates and I generally tend to avoid doing so because I see it as one of those 'lifestyle choices' i.e. it's not for me but I don't criticize you for choosing them because I can see some benefit but I think the disadvantages outweigh. I had thought your arguments were in favor of exposing the key fields, which is why I thought you were opposed to the use of surrogate keys. Frankly, I don't follow all of what you are saying, as I don't have the same vocabulary. Thanks for weighing in, though. Jamie. -- |
|
Thread Tools | |
Display Modes | |
|
|