Alex Amado, HW7

School: Purdue University
Course: 10100
Subject: Computer Science
Date: Dec 6, 2023

CS 10100 Homework 07
Due: 11:59 pm Thursday, Oct. 19, 2023

• Delete nothing from this file.
• Edit this file to add your typewritten answers to each question.
• When your answer includes a diagram make sure that it is clear and large enough to read.
• Ensure that your answer fits on the same page as its question.
• If you change the pagination of this file or if your complete answer to a question does not fit on the page with that question, then you may receive a lower score.
• Export your completed Word file to PDF.
• Upload your PDF file to Gradescope.com. It is your responsibility to upload this assignment to its correct place in Gradescope. You may upload multiple times. Your final upload will be scored. Use the download capability to check your upload.
• Uploading will be blocked after the due time (plus grace period).
• Max score = 10 points; 1 point per question.
• The above directions apply for all assignments uploaded to Gradescope.
• Why should your answer be on the same page with its question? Answer: Gradescope has been programmed to expect that your PDF file will have exactly one question and your entire answer to it on one page. This allows Gradescope to automatically find and display your answers to the instructional team for scoring.

HW07 Q1. The Dewey Decimal System tries to impose a strict hierarchy on all knowledge. What is the chief problem with a strict hierarchy? Hint: think of classifying a book with a title "Artistic Beauty in Scientific Discoveries".

The chief problem with a strict hierarchy is that an object (in the Dewey Decimal System, a book) can fit into more than one category, yet the hierarchy forces it into exactly one. A book titled "Artistic Beauty in Scientific Discoveries" could fit into many categories. Do you classify it with the arts, do you classify it with beauty, or do you ignore the artistic and beauty parts and classify it with science? These are the kinds of questions that arise with a strict hierarchical classification system.

HW07 Q2. The following is a database named Patient_Info.

Patient_ID  Name                    Last_4_SSN     Zipcode
0000        Pete Persistent Purdue  1869           47907
0001        Mary M. Q. Contrary     9876           47906
0002        Alice B. Park           0000           20755
0003        Bob N. Sansa            9999           20755
0004        Eve S. Dropper          5555           47907
0005        Professor Plum          cannot recall  47907
0006        The Prisoner            0006           currently unknown

a. Write a SQL query to select all records where the Zipcode is 47907.

    SELECT * FROM Patient_Info WHERE Zipcode = '47907';

b. Write a SQL Query to select all records where the Patient_ID is greater than 0003, but only extract Patient_ID, Name, and Last_4_SSN

    SELECT Patient_ID, Name, Last_4_SSN FROM Patient_Info WHERE Patient_ID > '0003';

c. Which field of Patient_Info is the best option to be the key of this database table? Explain.

The Patient_ID is the best option to be the key of this table. It is unique to each patient: even if two patients have the same name, they will never have the same ID. It also stays constant over time, even if a patient's name or zip code changes.
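
As a rough check of parts a and b, the two queries can be run against an in-memory SQLite copy of the table using Python's standard `sqlite3` module. Storing every column as text is an assumption made here; it preserves leading zeros such as "0003" and allows entries like "cannot recall". The helper name `build_patient_db` is hypothetical.

```python
import sqlite3

# Rows copied from the Patient_Info table in the question.
ROWS = [
    ("0000", "Pete Persistent Purdue", "1869", "47907"),
    ("0001", "Mary M. Q. Contrary", "9876", "47906"),
    ("0002", "Alice B. Park", "0000", "20755"),
    ("0003", "Bob N. Sansa", "9999", "20755"),
    ("0004", "Eve S. Dropper", "5555", "47907"),
    ("0005", "Professor Plum", "cannot recall", "47907"),
    ("0006", "The Prisoner", "0006", "currently unknown"),
]


def build_patient_db():
    """Create an in-memory database holding Patient_Info (all columns TEXT, an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Patient_Info ("
        "Patient_ID TEXT PRIMARY KEY, Name TEXT, Last_4_SSN TEXT, Zipcode TEXT)"
    )
    conn.executemany("INSERT INTO Patient_Info VALUES (?, ?, ?, ?)", ROWS)
    return conn


conn = build_patient_db()

# Part a: every column of the records whose Zipcode is 47907.
part_a = conn.execute(
    "SELECT * FROM Patient_Info WHERE Zipcode = '47907'"
).fetchall()

# Part b: three columns of the records whose Patient_ID is greater than 0003.
# Because the IDs are fixed-width text, string comparison orders them correctly.
part_b = conn.execute(
    "SELECT Patient_ID, Name, Last_4_SSN FROM Patient_Info WHERE Patient_ID > '0003'"
).fetchall()

for row in part_a:
    print("a:", row)
for row in part_b:
    print("b:", row)
```

Note that the string literals use straight single quotes; curly quotes pasted from a word processor are not valid SQL string delimiters.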

HW07 Q3. You are asked to set up a database to store information on all the shoes for sale in a department store. You will need to keep each shoe’s color, category, size, brand, price, how many are available. Devise a schema that specifies a possible set of fields for the database.

My schema would have the following fields: shoe color (varchar), shoe category (varchar), shoe size (decimal), shoe price (decimal), shoe brand (varchar), and number of shoes available (int).
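
One way this schema might be written down is as a `CREATE TABLE` statement, sketched here with Python's standard `sqlite3` module. The table name `Shoes`, the column names, the field widths, and the sample row are all assumptions chosen for illustration; the answer only specifies the fields and their general types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and column names; widths such as VARCHAR(30) are guesses.
conn.execute(
    """
    CREATE TABLE Shoes (
        color     VARCHAR(30),
        category  VARCHAR(30),
        size      DECIMAL(4, 1),
        price     DECIMAL(8, 2),
        brand     VARCHAR(50),
        available INTEGER
    )
    """
)

# A made-up sample row, only to show that the schema accepts data.
conn.execute(
    "INSERT INTO Shoes VALUES (?, ?, ?, ?, ?, ?)",
    ("black", "running", 10.5, 89.99, "Nike", 12),
)

# PRAGMA table_info lists one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Shoes)")]
print(columns)
```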

HW07 Q4. Consider the database you created from Question 3 which contains information about thousands of shoes in a department store. Now consider a query to find all the Nike shoes, and list those shoes from lowest price to highest price. Here are two approaches to answer the query:
1. Sort all shoes by price, then select all Nike shoes.
2. Select all Nike shoes, then sort Nike shoes by price.
Which approach would be faster? Explain.

The second approach would be faster because it narrows the data first. Selecting the Nike shoes takes a single pass through the records and leaves a much smaller set, so the sort only has to order the Nike shoes instead of every shoe in the store. Since selection is faster and more efficient than sorting, it makes the most sense to select first and then sort the smaller result.
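
The difference between the two approaches can be illustrated with a small Python sketch. The shoe records are randomly generated stand-ins (the brand list, the price range, and the count of 5000 are all assumptions), and the function names are hypothetical. Both functions return the same list; what differs is how many records the sort has to handle.

```python
import random

random.seed(0)  # fixed seed so repeated runs generate the same made-up data

BRANDS = ["Nike", "Adidas", "Puma", "Reebok", "Asics"]  # illustrative brands
shoes = [
    {"brand": random.choice(BRANDS), "price": round(random.uniform(20, 200), 2)}
    for _ in range(5000)
]


def sort_then_select(items):
    """Approach 1: sort every shoe by price, then keep only the Nike shoes."""
    by_price = sorted(items, key=lambda s: s["price"])  # sorts all records
    return [s for s in by_price if s["brand"] == "Nike"]


def select_then_sort(items):
    """Approach 2: keep only the Nike shoes, then sort that smaller list."""
    nike = [s for s in items if s["brand"] == "Nike"]  # one pass over all records
    return sorted(nike, key=lambda s: s["price"])  # sorts only the Nike records


approach_1 = sort_then_select(shoes)
approach_2 = select_then_sort(shoes)

print("records sorted by approach 1:", len(shoes))
print("records sorted by approach 2:", len(approach_2))
```

Because Python's `sorted` is stable, the two approaches produce identical output here, so the choice between them is purely about how much work the sort does.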

HW07 Q5. A CEO at a company requests that a new database be created for customer accounts. When the database administrator asks what information should be in the database, the CEO says, “Oh, you know, make it include all the usual about customers -- stuff like their addresses.” List 5 fields that the CEO might be referring to by “usual stuff”.

By “usual stuff”, the CEO is probably referring to the information a customer has to enter to create an account. Five likely fields are name, address, phone number, email, and purchase history.

HW07 Q6. There are countless documents accessible on the world wide web. Explain how internet search engines are able to provide relevant results quickly when you enter search terms.

Even though there are billions of documents on the internet, search engines are able to provide relevant results quickly when a user enters search terms because they look for specific characteristics that narrow the results down to what the user is looking for. Search engines match keywords and use algorithms that judge the quality of pages to decide which ones to recommend to users.
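
One common technique behind fast keyword matching, not named in the answer above, is an inverted index: a table built ahead of time that maps each word to the documents containing it, so a query becomes a lookup rather than a scan of every document. The sketch below is a toy version; the three documents and the function names are made up for illustration, and real search engines add crawling, ranking, and much more.

```python
from collections import defaultdict

# Made-up miniature "web" of three documents.
DOCS = {
    1: "purdue computer science homework on databases",
    2: "how search engines rank web pages",
    3: "database search and sorting basics",
}


def build_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, term):
    """Return the sorted IDs of documents containing the term (empty if none)."""
    return sorted(index.get(term.lower(), set()))


index = build_index(DOCS)  # built once, before any query arrives

print(lookup(index, "search"))
print(lookup(index, "homework"))
print(lookup(index, "unicorn"))
```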

HW07 Q7. Given what we know about information retrieval systems, why is it a good idea to use multiple keywords when searching for a document on the Internet?

Using multiple keywords when searching for a document on the internet helps the search engine pin down what the user is looking for, making data gathering and research easier for the user. Each additional keyword gives the search engine more information to work with, so it can return more accurate results. Especially when doing thorough research, using multiple keywords filters out sources that the user does not need and raises the share of results that contain the information the user is actually looking for.
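
The narrowing effect can be seen in a toy search over four made-up pages. This sketch assumes the simplest possible rule, that a page matches only if it contains every keyword; real search engines rank results rather than filtering so strictly. The page contents and the function names are hypothetical.

```python
# Four made-up pages; "jaguar" is ambiguous between the car and the animal.
PAGES = {
    "page_a": "jaguar car review and price",
    "page_b": "jaguar habitat in the rainforest",
    "page_c": "rainforest animals jaguar diet and habitat",
    "page_d": "used car price guide",
}


def matches_all(text, keywords):
    """True if every keyword appears as a whole word in the text."""
    words = set(text.lower().split())
    return all(keyword.lower() in words for keyword in keywords)


def search(pages, keywords):
    """Return the sorted names of pages that contain all of the keywords."""
    return sorted(name for name, text in pages.items() if matches_all(text, keywords))


one_keyword = search(PAGES, ["jaguar"])
two_keywords = search(PAGES, ["jaguar", "habitat"])
three_keywords = search(PAGES, ["jaguar", "habitat", "diet"])

print(one_keyword)
print(two_keywords)
print(three_keywords)
```

Each added keyword removes pages about the wrong sense of "jaguar", which is the sense in which multiple keywords make the results more accurate.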

HW07 Q8. The method of classification can greatly affect the speed of a search. Consider classifying all the students at Purdue. How might the office of the Registrar choose to classify the students? How might a student organization dedicated to the success of students from Indiana classify the students? How and why would the classifications used by the two groups differ?

The Office of the Registrar might choose to classify students in a variety of ways, such as by age, name, or major. A Purdue student organization dedicated to the success of students from Indiana would probably classify students by their hometown in Indiana, or by zip code/postal code. The classifications used by these two groups differ because the data is being stored for different reasons. The Office of the Registrar stores student data to keep track of who is a student, so it can use broad, general-purpose methods of classification. The student organization is storing and classifying data for a much more specific reason than the Office of the Registrar, so it needs a more specialized classification that fits that purpose.

HW07 Q9. Your cousin Charlotte works for the HR department of a large firm. At a family gathering you hear her complaining that after getting married, six people at the firm decided to change their names, and she had to update four separate databases for each name change: payroll information, employee account information, benefit package information, and ID badges. Based on what Charlotte has told you, what mistake did the database administrator make when setting up databases for the company? Hint: think of how the system could be designed so Charlotte wouldn't have to update the names in four separate databases.

The database administrator made the mistake of not applying the principles of data normalization properly. Data normalization is a process in database design where data is organized efficiently to reduce redundancy and avoid update anomalies. In this scenario, each employee's name is stored redundantly in four separate databases, so a single name change has to be entered four times. If the name were stored in one place and the other databases referred to each employee by a key, such as an employee ID, Charlotte would only need to make one update per name change.
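
A normalized design along these lines might look like the following sketch, written with Python's standard `sqlite3` module. The table names, column names, and sample employee are hypothetical, and only two of Charlotte's four databases are modeled to keep the example short. The name lives only in `Employee`; the other tables point to it through `employee_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the name is stored once, and other tables hold only a key.
conn.executescript(
    """
    CREATE TABLE Employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE Payroll (
        employee_id INTEGER REFERENCES Employee(employee_id),
        salary      REAL
    );
    CREATE TABLE Badge (
        employee_id INTEGER REFERENCES Employee(employee_id),
        badge_no    TEXT
    );
    INSERT INTO Employee VALUES (1, 'Pat Smith');
    INSERT INTO Payroll  VALUES (1, 50000);
    INSERT INTO Badge    VALUES (1, 'B-100');
    """
)

# A name change is a single UPDATE, no matter how many tables use the name.
conn.execute("UPDATE Employee SET name = ? WHERE employee_id = ?", ("Pat Jones", 1))

# Payroll and Badge both pick up the new name through a join on the key.
payroll_name = conn.execute(
    "SELECT e.name FROM Payroll p JOIN Employee e ON e.employee_id = p.employee_id"
).fetchone()[0]
badge_name = conn.execute(
    "SELECT e.name FROM Badge b JOIN Employee e ON e.employee_id = b.employee_id"
).fetchone()[0]

print(payroll_name, badge_name)
```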

HW07 Q10. What are the two inherent limits of structured data? How does unstructured data differ?

The two inherent limits of structured data are rigidity and limited representational power. Rigidity means that structured data is best suited to well-defined and predictable cases and scenarios. Limited representational power means that it works well for quantitative data, such as numbers and dates, but not for information that does not fit into predefined fields. Unstructured data differs because it is highly flexible, not only in how much data can be stored but also in the types of data that can be stored. Unstructured data also differs because it is much harder to analyze and inspect than structured data.
