Ramakrishnan and Gehrke Solutions-215-229

pdf

School

University of California, Berkeley *

*We aren’t endorsed by this school

Course

C200

Subject

Information Systems

Date

Dec 6, 2023

Type

pdf

Pages

15

Uploaded by UltraSnowBear26

Report
210 Chapter 20 5. What normal form is your good schema in? Give an example of a query that is likely to run slower on this schema than on the relation G . 6. Is there a lossless-join, dependency-preserving decomposition of G into BCNF? 7. Is there ever a good reason to accept something less than 3NF when designing a schema for a relational database? Use this example, if necessary adding further constraints, to illustrate your answer. Answer 20.4 Answer omitted. Exercise 20.5 Consider the following BCNF relation, which lists the ids, types (e.g., nuts or bolts), and costs of various parts, along with the number available or in stock: Parts ( pid , pname, cost, num avail ) You are told that the following two queries are extremely important: Find the total number available by part type, for all types. (That is, the sum of the num avail value of all nuts, the sum of the num avail value of all bolts, and so forth) List the pid s of parts with the highest cost. 1. Describe the physical design that you would choose for this relation. That is, what kind of a file structure would you choose for the set of Parts records, and what indexes would you create? 2. Suppose your customers subsequently complain that performance is still not satis- factory (given the indexes and file organization you chose for the Parts relation in response to the previous question). Since you cannot afford to buy new hardware or software, you have to consider a schema redesign. Explain how you would try to obtain better performance by describing the schema for the relation(s) that you would use and your choice of file organizations and indexes on these relations. 3. How would your answers to the two questions change, if at all, if your system did not support indexes with multiple-attribute search keys? Answer 20.5 The answer to each question is given below. 1. A heap file structure could be used for the relation Parts. A dense unclustered B+Tree index on pname, num avail and a dense unclustered B+ Tree index on cost, pid can be created to efficiently answers the queries.
Physical Database Design and Tuning 211 2. The problem could be that the optimizer may not be considering the index only plans that could be obtained using the previously described schema. So we can instead create clustered indexes on pid, cost and pname, num avail . To do this we have to vertically partition the relation into two relations like Parts1( pid, cost ) and Parts2( pid, pname, num avail). (If the indexes themselves have not been implemented properly, then we can instead use sorted file organizations for these two split relations). 3. If the multi attribute keys are not allowed then we can have a clustered B+ Tree indexes on cost and on pname on the two relations. Exercise 20.6 Consider the following BCNF relations, which describe employees and the departments they work in: Emp ( eid , sal, did ) Dept ( did , location, budget ) You are told that the following queries are extremely important: Find the location where a user-specified employee works. Check whether the budget of a department is greater than the salary of each employee in that department. 1. Describe the physical design you would choose for this relation. That is, what kind of a file structure would you choose for these relations, and what indexes would you create? 2. Suppose that your customers subsequently complain that performance is still not satisfactory (given the indexes and file organization that you chose for the rela- tions in response to the previous question). Since you cannot afford to buy new hardware or software, you have to consider a schema redesign. Explain how you would try to obtain better performance by describing the schema for the rela- tion(s) that you would use and your choice of file organizations and indexes on these relations. 3. Suppose that your database system has very inefficient implementations of index structures. What kind of a design would you try in this case? Answer 20.6 Answer omitted. Exercise 20.7 Consider the following BCNF relations, which describe departments in a company and employees:
212 Chapter 20 Dept( did , dname, location, managerid ) Emp( eid , sal ) You are told that the following queries are extremely important: List the names and ids of managers for each department in a user-specified loca- tion, in alphabetical order by department name. Find the average salary of employees who manage departments in a user-specified location. You can assume that no one manages more than one department. 1. Describe the file structures and indexes that you would choose. 2. You subsequently realize that updates to these relations are frequent. Because indexes incur a high overhead, can you think of a way to improve performance on these queries without using indexes? Answer 20.7 The answer to each question is given below. 1. A heap file organization for the two relations is sufficient if we create the following indexes. For the first, a clustered B+ tree index on location, dname would improve performance (we cannot list the names of the managers because there is no name attribute present). We can also have a hash index on eid on the Emp relation to speed up the second query: we find all of the managerids from the B+ tree index, and then use the hash index to find their salaries. 2. Without indexes, we can use horizontal decompostion of the Dept relation based on the location. We can also try sorted file organizations, with the relation Dept sorted on dname and Emp on eid . Exercise 20.8 For each of the following queries, identify one possible reason why an optimizer might not find a good plan. Rewrite the query so that a good plan is likely to be found. Any available indexes or known constraints are listed before each query; assume that the relation schemas are consistent with the attributes referred to in the query. 1. An index is available on the age attribute: SELECT E.dno FROM Employee E WHERE E.age=20 OR E.age=10 2. A B+ tree index is available on the age attribute:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Physical Database Design and Tuning 213 SELECT E.dno FROM Employee E WHERE E.age < 20 AND E.age > 10 3. An index is available on the age attribute: SELECT E.dno FROM Employee E WHERE 2*E.age < 20 4. No index is available: SELECT DISTINCT * FROM Employee E 5. No index is available: SELECT AVG (E.sal) FROM Employee E GROUP BY E.dno HAVING E.dno=22 6. The sid in Reserves is a foreign key that refers to Sailors: SELECT S.sid FROM Sailors S, Reserves R WHERE S.sid=R.sid Answer 20.8 Answer omitted. Exercise 20.9 Consider two ways to compute the names of employees who earn more than $100,000 and whose age is equal to their manager’s age. First, a nested query: SELECT E1.ename FROM Emp E1 WHERE E1.sal > 100 AND E1.age = ( SELECT E2.age FROM Emp E2, Dept D2 WHERE E1.dname = D2.dname AND D2.mgr = E2.ename ) Second, a query that uses a view definition: SELECT E1.ename FROM Emp E1, MgrAge A WHERE E1.dname = A.dname AND E1.sal > 100 AND E1.age = A.age
214 Chapter 20 CREATE VIEW MgrAge (dname, age) AS SELECT D.dname, E.age FROM Emp E, Dept D WHERE D.mgr = E.ename 1. Describe a situation in which the first query is likely to outperform the second query. 2. Describe a situation in which the second query is likely to outperform the first query. 3. Can you construct an equivalent query that is likely to beat both these queries when every employee who earns more than $100,000 is either 35 or 40 years old? Explain briefly. Answer 20.9 1. Consider the case when there are very few or no employees having salary more than 100K. Then in the first query the nested part would not be computed (due to short circuit evaluation) whereas in the second query the join of Emp and MgrAge would be computed irrespective of the number of Employees with sal > 100K. Also, if there is an index on dname , then the nested portion of of the first query will be efficient. However, the index does not affect the view in the second query since it is used from a view. 2. In the case when there are a large number of employees with sal > 100K and the Dept relation is large, in the first query the join of Dept and Emp would be computed for each tuple in Emp that satisfies the condition E1.sal > 100K, whereas in the latter the join is computed only once. 3. In this case the selectivity of age may be very high. So if we have a B+ Tree index on age, sal , then the following query may perform better. SELECT E1.ename FROM Emp E1 WHERE E1.age=35 AND E1.sal > 100 AND E1.age = ( SELECT E2.age FROM Emp E2, Dept D2 WHERE E1.dname = D2.dname AND D2.mgr = E2.ename) UNION SELECT E1.ename FROM Emp E1 WHERE E1.age = 40 AND E1.sal > 100 AND E1.age = ( SELECT E2.age FROM Emp E2, Dept D2 WHERE E1.dname = D2.dname AND D2.mgr = E2.ename)
21 SECURITY Exercise 21.1 Briefly answer the following questions: 1. Explain the intuition behind the two rules in the Bell-LaPadula model for manda- tory access control. 2. Give an example of how covert channels can be used to defeat the Bell-LaPadula model. 3. Give an example of polyinstantiation. 4. Describe a scenario in which mandatory access controls prevent a breach of security that cannot be prevented through discretionary controls. 5. Describe a scenario in which discretionary access controls are required to enforce a security policy that cannot be enforced using only mandatory controls. 6. If a DBMS already supports discretionary and mandatory access controls, is there a need for encryption? 7. Explain the need for each of the following limits in a statistical database system: (a) A maximum on the number of queries a user can pose. (b) A minimum on the number of tuples involved in answering a query. (c) A maximum on the intersection of two queries (i.e., on the number of tuples that both queries examine). 8. Explain the use of an audit trail, with special reference to a statistical database system. 9. What is the role of the DBA with respect to security? 10. Describe AES and its relationship to DES. 215
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
216 Chapter 21 11. What is public-key encryption? How does it differ from the encryption approach taken in the Data Encryption Standard (DES), and in what ways is it better than DES? 12. Explain how a company offering services on the Internet could use encryption- based techniques to make its order-entry process secure. Discuss the role of DES, AES, SSL, SET, and digital signatures. Search the Web to find out more about related techniques such as electronic cash . Answer 21.1 The answer to each question is given below. 1. The Simple Security Property states that subjects can only interact with objects with a lesser or equal security class. This ensures subjects with low security classes from accessing high security objects. The *-Property states that subjects can only create objects with a greater or equal security class. This prevents a high security subject from mistakenly creating an object with a low security class (which low security subjects could then access!). 2. One example of a covert channel is in statistical databases. If a malicious subject wants to find the salary of a new employee, and can issue queries to find the average salary in a department, and the total number of current employees in the depatment, then the malicious subject can calculate the new employees salary based on the increase in average salary and number of employees. 3. Say relation R contains the following values: cid carname Security Class 1 Honda U 1 Porsche C 2 Toyota C 3 Mazda C 3 Ferrari TS Then subjects with security class U will see R as: cid carname Security Class 1 Honda U Subjects with security class C will see R as: cid carname Security Class 1 Honda U 1 Porsche C 2 Toyota C 3 Mazda C
Security 217 Subjects with security class TS will see R as: cid carname Security Class 1 Honda U 1 Porsche C 2 Toyota C 3 Mazda C 3 Ferrari TS 4. Trojan horse tables are an example where discretionary access controls are not sufficient. If a malicious user creates a table and has access to the source code of some other user with privileges to other tables, then the malicious user can modify the source code to copy tuples from privileged tables to his or her non-privileged table. 5. Manditory access controls do not distinguish between people in the same clearance level so it is not possible to limit permissions to certain users within the same clearance level. Also, it is not possible to give only insert or select privileges to different users in the same level: all users in the same clearance level have select, insert, delete and update privileges. 6. Yes, especially if the data is transmitted over a network in a distributed environ- ment. In these cases it is important to encrypt the data so people ’listening’ on the wire cannot directly access the information. 7. (a) If a user can issue an unlimited number of queries, he or she can repeatedly decompose statistical information by gathering the statistics at each level (for example, at age ¿ 20, age ¿ 21, etc.). (b) If a malicious subject can query a database and retrieve single rows of statis- tical information, he or she may be able to isolate sensitive information such as maximum and minimum values. (c) Often the information from two queries can be combined to deduce or infer specific values. This is often the case with average and total aggregates. This can be prevented by restricting the tuple overlap between queries. 8. The audit trail is a log of updates with the authorization id of the user who issued the update. Since it is possible to infer information from statistical databases using repeated queries, or queries that target a common set of tuples, the DBA can use an audit trail to see which people issued these security-breaking queries. 9. The DBA creates new accounts, ensures that passwords are safe and changed often, assigns mandatory access control levels, and can analyze the audit trail to look for security breaches. They can also assist users with their discretionary permissions.
218 Chapter 21 10. Public-key encryption is an encryption scheme that uses a public encryption key and a private decryption key. These keys are part of one-way functions whose in- verse is very difficult to determine (which is why large prime numbers are involved in encryption algorithms...factoring is difficult!). The public key and private key are inverses which allow a user to encrypt any information, but only the person with the private key can decode the messages. DES has only one key and a specific decrypting algorithm. DES decoding can be more difficult and relies on only one key so both the sender and the receiver must know it. 11. A one-way function is a mathematical function whose inversese is very difficult to determine. These are used to determine the public and private keys, and to do the actual decoding: a message is encoding using the function and is decoded using the inverse of the function. Since the inverse is difficult to find, the code can not be broken easily. 12. An internet server could issue each user a public key with which to encrypt his or her data and send it back to the server (which holds all of the private keys). This way users cannot decode other users’ messages, and even knowledge of the public key is not sufficient to decode the message. With DES, the encryption key is used both in encryption and decryption so sending keys to users is risky (anyone who intercepts the key can potentially decode the message). Exercise 21.2 You are the DBA for the VeryFine Toy Company and create a relation called Employees with fields ename , dept , and salary . For authorization reasons, you also define views EmployeeNames (with ename as the only attribute) and DeptInfo with fields dept and avgsalary . The latter lists the average salary for each department. 1. Show the view definition statements for EmployeeNames and DeptInfo. 2. What privileges should be granted to a user who needs to know only average department salaries for the Toy and CS departments? 3. You want to authorize your secretary to fire people (you will probably tell him whom to fire, but you want to be able to delegate this task), to check on who is an employee, and to check on average department salaries. What privileges should you grant? 4. Continuing with the preceding scenario, you do not want your secretary to be able to look at the salaries of individuals. Does your answer to the previous question ensure this? Be specific: Can your secretary possibly find out salaries of some individuals (depending on the actual set of tuples), or can your secretary always find out the salary of any individual he wants to? 5. You want to give your secretary the authority to allow other people to read the EmployeeNames view. Show the appropriate command.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Security 219 6. Your secretary defines two new views using the EmployeeNames view. The first is called AtoRNames and simply selects names that begin with a letter in the range A to R. The second is called HowManyNames and counts the number of names. You are so pleased with this achievement that you decide to give your secretary the right to insert tuples into the EmployeeNames view. Show the appropriate command and describe what privileges your secretary has after this command is executed. 7. Your secretary allows Todd to read the EmployeeNames relation and later quits. You then revoke the secretary’s privileges. What happens to Todd’s privileges? 8. Give an example of a view update on the preceding schema that cannot be imple- mented through updates to Employees. 9. You decide to go on an extended vacation, and to make sure that emergencies can be handled, you want to authorize your boss Joe to read and modify the Employees relation and the EmployeeNames relation (and Joe must be able to delegate authority, of course, since he is too far up the management hierarchy to actually do any work). Show the appropriate SQL statements. Can Joe read the DeptInfo view? 10. After returning from your (wonderful) vacation, you see a note from Joe, indicating that he authorized his secretary Mike to read the Employees relation. You want to revoke Mike’s SELECT privilege on Employees, but you do not want to revoke the rights you gave to Joe, even temporarily. Can you do this in SQL? 11. Later you realize that Joe has been quite busy. He has defined a view called All- Names using the view EmployeeNames, defined another relation called StaffNames that he has access to (but you cannot access), and given his secretary Mike the right to read from the AllNames view. Mike has passed this right on to his friend Susan. You decide that, even at the cost of annoying Joe by revoking some of his privileges, you simply have to take away Mike and Susan’s rights to see your data. What REVOKE statement would you execute? What rights does Joe have on Employees after this statement is executed? What views are dropped as a consequence? Answer 21.2 Answer omitted. Exercise 21.3 You are a painter and have an Internet store where you sell your paintings directly to the public. You would like customers to pay for their purchases with credit cards, and wish to ensure that these electronic transactions are secure. Assume that Mary wants to purchase your recent painting of the Cornell Uris Library. Answer the following questions. 1. How can you ensure that the user who is purchasing the painting is really Mary?
220 Chapter 21 2. Explain how SSL ensures that the communication of the credit card number is secure. What is the role of a certification authority in this case? 3. Assume that you would like Mary to be able to verify that all your email mes- sages are really sent from you. How can you authenticate your messages without encrypting the actual text? 4. Assume that your customers can also negotiate the price of certain paintings and assume that Mary wants to negotiate the price of your painting of the Madison Terrace. You would like the text of this communication to be private between you and Mary. Explain the advantages and disadvantages of different methods of encrypting your communication with Mary. Answer 21.3 The answer to each question is given below. 1. In order to determine whether the user who is purchasing the painting is really Mary, we need some level of verification when Mary first registers with the system. On the lowest level, we can simply ask the user to confirm things like Mary’s ad- dress or social security number. To increase the level of security, we could also ask the user to verify Mary’s credit card number. Since these numbers are deemed difficult to obtain, most merchant websites consider this sufficient evidence for proof of identity. For an even higher level of security, we can take external steps to verify Mary’s information such as calling her up with the phone number provided, sending a let- ter to Mary’s mailing address, or sending her an e-mail with instructions to reply back. In each instance, we attempt to validate the information the user provided so that the element of uncertainty in the provided information is decreased. 2. SSL Encryption is a form of public-key encryption where a third party certification authority acts to validate public keys between two clients. In a general public-key encryption system, data is sent to a user encrypted with a publicly known key for that user, such that only the user’s private key, known only to that user, can be used to decrypt the data. Attempts to decrypt the information using other keys will produce garbage data, and the ability to decipher the private key is considered computationally expensive even for the most modern computing systems. In SSL Encryption, the client transmitting the data asks the certification au- thority for a certificate containing public key information about the other client. The first client then validates this information by decrypting the certificate to get the second client’s public key. If the decrypted certificate matches up with the certification authority’s information, the first client then uses this public key to encrypt a randomly generated session key and sends it to the second client. The first client and second client now have a randomly generated public-key system
Security 221 that they can use to communicate privately without fear that anyone else can decode their information. Once complete, SSL encryption ensures that data such as credit card informa- tion transmitted between the two clients cannot be easily decrypted by others intercepting packets because the certification authority helped to generate a ran- domly created public-key that only the two clients involved can understand. 3. A message to Mary can be sent unencrypted with a message signature attached to the message. A signature is obtained by applying a one-way function to the message and is considerably smaller than the message itself. Mary can then apply the one-way function and if the results of it match the signature, she’ll know it was authentic. 4. One method of sending Mary a message is to create a digital signature for the message by encrypting it twice. First, we encrypt the message using our private key, then we encrypt the results using Mary’s public key. The first encryption ensures that the message did indeed come from us, since only we know our private key while the second encryption ensures that only Mary can read the message, since only Mary knows her private key. This system is very safe from tampering since it is hard to send a message pre- tending to be someone else as well as difficult to properly decode an intercepted message. The only disadvantages are that it requires that we have copies of each person’s public key as well as spend the time to encrypt/decrypt the messages. For example, if Mary receives the message on her laptop or PDA while traveling, she may not have the resources or public keys to decrypt it and respond, and might need to wait until she returns to the office. Another method of communicating with Mary, discussed in the previous question, is to use message signatures. This allows Mary to be able to read the message from almost any system since it is not encrypted and ensure that the message is authentic. The only disadvantage is that it does not safely prevent someone else from reading the message as well. Exercise 21.4 Consider Exercises 6.6 to 6.9 from Chapter 6. For each exercise, iden- tify what data should be accessible to different groups of users, and write the SQL statements to enforce these access control policies. Answer 21.4 Answer omitted.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
222 Chapter 21 Exercise 21.5 Consider Exercises 7.7 to 7.9 from Chapter 7. For each exercise, dis- cuss where encryption, SSL, and digital signatures are appropriate. Answer 21.5 The answer to each question is given below. Exercise 7.7 For the Notown Records website, encryption plays an important part in ensuring that customers are able to safely interact and order records over the Internet. Before discussing what should be encrypted, it is important to also note what should not be encrypted. Many of operations including searching the database and browsing record catalogs do not require any encryption. These operations are performed often and encrypting every communication from the client to the website would severely drain the server’s resources. As a result, it is better for the server to focus on encrypting only information that is of a more serious nature. There are three places where we can apply an encryption scheme to this system: user registration, user login, and user checkout. The purpose of encrypting data at registration and checkout is obvious, the user is transmitting sensitive personal information like address and credit card numbers, and it is of utmost importance to protect this information. We also encrypt the password transmitted during user login since that ensures that future communications with the user are safe after login. In practice, we would use SSL encryption via a certification authority, e.g., Verisign. Upon an encryption request, the client’s web browser requests the Verisign cer- tificate, validates it by decrypting it, and uses the public key from the decrypted certificate to encrypt a radomly generated session key that it then sends to the server. Using this session key, the certification authority is no longer needed, and the server/client then transmit information just as they would in a normal public key system. The Notown website could also use digital signatures to verify a user’s registration information via e-mail, as well as send an order confirmation e-mail after an order has been placed. By checking the digital signature of e-mails sent, the website and user can help to double check each other’s identities after a new registration has been processed or a transaction has been placed. Exercise 7.8 For this question, security is much more important than in the previous question, since medical records are considered very serious. As such, we would want to encrypt most of the information transmitted during a doctor or patient’s visit to the website. The only information we do not need to transit encrypted would be pages with read-only data, e.g., company information or help pages. Although
Security 223 this puts a higher strain on the system, the demand for security from users in the medical field is much higher than that of users in the record sales industry. As before, we would use an SSL encryption via a certification authorization to handle the actual encryption. When a new user registers, it would especially important to verify their identify as to prevent someone else from ordering pre- scriptions in their name. To this end, digital signatures would be much more important to verify the validity of the e-mail used to sign up with. In addition, external authorization including faxes, phone calls, and written mail should also be used to verify registration information when a user signs up. Exercise 7.9 For the course enrollment system, security is important for many of the processes, but the system does not need to encrypt as much as it did in the online pharmacy system. For faculty members, their passwords should be encrypted when they login as well as their registration information when they sign up for the first time. When they create/delete existing courses, it is probably not necessary to encrypt their data so long as the system is sure the user is properly logged in. To this end, the system could ask the faculty to re-enter their passwords when they are about to submit a change to the system. For students, their login and registration should also be encrypted. Like the faculty system, it is probably not important to encrypt any of the information about what classes a student signs up for as long as the login information is accu- rate. Students would then be able to freely modify their schedule after logging in to the website and only be asked their password again when they were ready to submit their changes to the system. In both cases, SSL encryption would again be the method of choice for the ac- tual encryption process. Digital signatures could be used in e-mails sent to the students confirming their course registration information. Exercise 7.10 For the airline reservation website, encryption would again be important in user login and user registration information for both customers and employees. Other information like searching for flights should be left unencrypted as a customer may search for dozens of flights. When a customer purchases a flight, their order, including their flight information and credit card number, should be encrypted to prevent others from learning which flight they have selected, as well as protecting the customer’s money. For airline employees, it is preferable to encrypt all of the transmitted data so as not to give hackers the opportunity to see any backend part of the system. Since it is likely there will be few employees per number of
224 Chapter 21 clients, the drain on the system from the extra level of encryption for employees should be negligible. Note that before when we considered the prescription website, we recommended encrypting all transmitted data, whereas when we considered the course enroll- ment system we recommended a looser form of encryption on both classes of users. By looser form of encryption, we refer to the fact that some of the transmitted data is encrypted while some data is not. Contrast this with the airline reservation system, where we suggest loose encryption for customers and total encryption for employees. For each, keep in mind that the true level of encryption can change depending on the available resources and sensitivity of the data. As before, SSL encryption is the preferred method of data encryption. For the airline reservation system, digital signatures can be applied to confirm when cus- tomers have places orders via e-mail.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help