CS210_231_P2

CS210 with Dr. Basit Qureshi (231) Term Project 2 Weight: 5% Due: Before Friday 17 November, 2023. Helping Students Search Jobs! Project Description Electronic resources, such as job search websites, company websites, and professional networking platforms like LinkedIn, provide access to a vast and diverse array of job opportunities. Graduates can explore positions in various industries, locations, and fields, allowing them to find the best fit for their skills and interests. Job searching electronically is much more efficient and convenient than traditional methods like scanning newspapers or walking into offices. Graduates can search and apply for jobs 24/7, from the comfort of their own homes, which saves time and effort. Web scraping is a technique used to extract data from websites. It involves programmatically accessing and collecting information from web pages, typically in a structured format like HTML, and then converting that data into a more usable format, such as a spreadsheet or a database. Web scraping is commonly employed for a variety of purposes, including data analysis, research, and automation. Once the data is collected, it can be transformed into a structured format, such as a CSV file or a database, making it easier to analyze and work with. This project involves the analysis of Binary Search Tree (BST) data structures using a dataset obtained by web scraping from Bayt.com. As a part of this project, you will be responsible for data preprocessing to ensure its compatibility with the program. Additionally, you will develop a custom implementation of a BST in Java. The program will read the dataset and generate outputs based on specific queries. The findings and results will be documented in a concise report. The following lists the steps to be taken to complete this project. 1. The Kaggle Dataset Download Data set from: https://www.kaggle.com/datasets/haninalmarshad/bayt-com- webscraping?select=jobs_bayt_3.csv

Open the CSV file in MS Excel or a similar tool. Clean the data by keeping only the following fields (columns) and removing all other columns from the CSV file. Do NOT remove any rows from the dataset.

0. Job ID
1. Title
2. Company
3. Date
4. CompanyIndustry
5. Job Role
6. Degree
7. Job City
8. Max Age
9. MonthlySalaryMaxRange

Save the file as Dataset.csv.

2. Programming your tool

You will write 3 classes in Java, as follows.

Create a BST node class, NodeBST, with the following members:

NodeBST
    int JobID
    String Title
    String Company
    String DatePosted
    String CompanyIndustry
    String JobRole
    String Degree
    String JobCity
    int MaxAge
    int MonthlySalaryMaxRange
    NodeBST left;
    NodeBST right;
    NodeBST();
    String toString();

Each NodeBST stores 9 values and a KEY, in addition to the left and right NodeBST references. Your tree inserts, removes, and searches the data based on the KEY. Your KEY is determined by this formula:

    KEY_COLUMN = LastDigitOfYourPSUStudentID / 2   (integer division, giving 0 to 4)

The valid KEYS are the columns titled Job ID (0), Title (1), Company (2), Date (3), and CompanyIndustry (4).

The following is the API for the BST class, which implements a Binary Search Tree made of NodeBST nodes.

BST                Description
NodeBST root;      Anchors the BST
int size;          Maintains the count of nodes in the tree
BST()              Default constructor
void insert()      Inserts the key into the tree. If the key already exists, the duplicate
                   is added to the right side. If a key is blank, "0" is inserted instead.
String search()    Searches for the key in the BST and returns the matching value(s)
void remove()      Searches for the key and removes its first occurrence from the BST,
                   using Hibbard deletion
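The insert, search, and remove rules in the BST API can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference solution: to stay short, the hypothetical NodeBST here holds only a String key plus the rest of the row as one string (the handout's NodeBST has ten separate fields); keys are compared as Strings with compareTo, so a Job ID or Date is ordered lexicographically, which is enough for exact-match queries; search returns a List of rows rather than one String; and remove returns the removed row instead of void so it can be printed. The helper names min and deleteMin are hypothetical, following the common textbook form of Hibbard deletion. Because equal keys always go right, every copy of a key lies on the same search path, so search keeps walking right after each match.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed, hypothetical node: a String key plus the rest of the row as one string.
class NodeBST {
    String key;
    String row;
    NodeBST left, right;

    NodeBST(String key, String row) {
        this.key = key;
        this.row = row;
    }

    @Override
    public String toString() {
        return row;
    }
}

class BST {
    NodeBST root;
    int size; // number of nodes currently in the tree

    // Iterative insert: smaller keys go left; equal or larger keys go right.
    void insert(String key, String row) {
        if (key == null || key.trim().isEmpty()) key = "0"; // blank key is stored as "0"
        NodeBST node = new NodeBST(key, row);
        size++;
        if (root == null) { root = node; return; }
        NodeBST x = root;
        while (true) {
            if (key.compareTo(x.key) < 0) {
                if (x.left == null) { x.left = node; return; }
                x = x.left;
            } else {
                if (x.right == null) { x.right = node; return; }
                x = x.right;
            }
        }
    }

    // Returns the rows of every node whose key equals the given key.
    List<String> search(String key) {
        List<String> found = new ArrayList<>();
        NodeBST x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp < 0) {
                x = x.left;
            } else {
                if (cmp == 0) found.add(x.toString());
                x = x.right; // further duplicates can only be in the right subtree
            }
        }
        return found;
    }

    // Removes the first match found on the search path; returns its row, or null if absent.
    String remove(String key) {
        NodeBST hit = root;
        while (hit != null && key.compareTo(hit.key) != 0)
            hit = key.compareTo(hit.key) < 0 ? hit.left : hit.right;
        if (hit == null) return null;
        root = delete(root, key);
        size--;
        return hit.toString();
    }

    // Hibbard deletion: a node with two children is replaced by the minimum of its right subtree.
    private NodeBST delete(NodeBST x, String key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = delete(x.left, key);
        else if (cmp > 0) x.right = delete(x.right, key);
        else {
            if (x.right == null) return x.left;
            if (x.left == null) return x.right;
            NodeBST t = x;
            x = min(t.right);
            x.right = deleteMin(t.right);
            x.left = t.left;
        }
        return x;
    }

    private NodeBST min(NodeBST x) {
        return x.left == null ? x : min(x.left);
    }

    private NodeBST deleteMin(NodeBST x) {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        return x;
    }
}
```

For example, after insert("4/4/2020", "row B") and insert("4/4/2020", "row C"), search("4/4/2020") returns [row B, row C], and remove("4/4/2020") removes and returns "row B", leaving only "row C" for that key.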

In addition to the above, write a public Main class containing a main method that reads data from the CSV file, processes the information, and displays the results. ALL 3 classes must be placed in ONE Java file called Main.java. We suggest that you modify/edit the BST implementation provided at: https://github.com/basit388/cs210/tree/master/Trees/src/BST

3. Running your program

The program runs from the console with the following parameters:

Usage: java -jar Main.jar [KEY] [QUERY] [keyword]

Sample Input 1:
> Main 0 0 4177085

*****
Start time: 290182692
Started reading dataset file
Loaded file in BST
End time: 292785317
Time taken: 3022.609 milliseconds
*****
*****
Start time: 292785319
Searching keyword: 4177085
End time: 292785322
Time taken: 2.103 milliseconds
*****
Found: 4177085 Branch Administration Manager Kinetic Business Solutions 4/15/2020 Medical Clinic Administration Riyadh

Sample Input 2:
> Main 3 1 4/4/2020

*****
Start time: 290182690
Started reading dataset file
Loaded file in BST
End time: 292785315
Time taken: 2131.108 milliseconds
*****
*****
Start time: 292785320
Removing keyword: 4/4/2020
End time: 292785322
Time taken: 2.103 milliseconds
*****
Removed 4175160 Documents Controller & Reports Riyadh 4/4/2020 Administration Support Services Administration Bachelor's degree / higher diploma Riyadh

Note on input formatting: The valid values for KEY are 0, 1, 2, 3, and 4. The valid values for QUERY are 0 and 1, where 0 is search and 1 is remove. The keyword may be a string of any length. The program matches the input keyword string against the KEY in the BST.
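The argument handling, CSV reading, and timing described above can be sketched as below. This is a hedged illustration under several assumptions: the class name MainSketch and the helpers splitCsv, readRows, and millis are hypothetical; it assumes Dataset.csv has a header row, wraps fields containing commas in double quotes (with "" for a literal quote), and has no line breaks inside fields; and it measures elapsed time with System.nanoTime(), which is one reasonable way to produce "Time taken" values in milliseconds. The BST insertion step is left as a comment so the sketch does not depend on any particular BST class, and its output format is not meant to match the sample runs exactly.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; in the submission this logic belongs in the public Main class.
class MainSketch {
    static final String[] KEY_NAMES = {"Job ID", "Title", "Company", "Date", "CompanyIndustry"};

    // Splits one CSV line; commas inside double quotes are kept, and "" inside quotes becomes ".
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // escaped quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString()); // end of field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field
        return fields;
    }

    // Reads every data row of the file, skipping the header line.
    static List<List<String>> readRows(String path) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line = in.readLine(); // header row
            while ((line = in.readLine()) != null) rows.add(splitCsv(line));
        }
        return rows;
    }

    // Converts a System.nanoTime() difference to milliseconds.
    static double millis(long start, long end) {
        return (end - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.out.println("Usage: java -jar Main.jar [KEY] [QUERY] [keyword]");
            return;
        }
        int key, query;
        try {
            key = Integer.parseInt(args[0]);
            query = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            System.out.println("KEY and QUERY must be integers");
            return;
        }
        if (key < 0 || key > 4 || (query != 0 && query != 1)) {
            System.out.println("KEY must be 0-4 and QUERY must be 0 (search) or 1 (remove)");
            return;
        }
        long start = System.nanoTime();
        List<List<String>> rows;
        try {
            rows = readRows("Dataset.csv");
        } catch (IOException e) {
            System.out.println("Cannot read Dataset.csv: " + e.getMessage());
            return;
        }
        // A full program would insert each row into the BST here, keyed on column `key`.
        long end = System.nanoTime();
        System.out.printf("Read %d rows in %.3f milliseconds%n", rows.size(), millis(start, end));
        System.out.println((query == 0 ? "Searching" : "Removing") + " keyword: " + args[2]
                + " (KEY column: " + KEY_NAMES[key] + ")");
    }
}
```

Timing the load and the query separately, each between its own pair of System.nanoTime() calls, keeps the file-reading cost out of the search and remove times recorded in the next step.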

4. Data Collection

There are 3527 data records in the dataset. Run your query with the KEY assigned to you, taking each keyword from the record line numbers listed below. Note the "time taken" for each query and record it in this table:

Keyword from record line number    Search time    Remove time
3
26
101
619
892
1237
2138
2751
3019
3517

5. Conclusions

Write a short conclusion answering the following questions. The conclusions should not exceed 300 words. Do you see a pattern in your recorded data for search queries? If so, what happens to the search time for a keyword as the record line number increases? Why is it like this? Explain! Do the same for remove queries. Why could the search and remove times differ?
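To explore the questions above, it can help to count how many nodes a query visits. In an unbalanced BST like the one in this project, reaching a node takes its depth plus one visits, and that depth depends on the order in which keys were inserted rather than on line numbers as such. The hypothetical experiment below uses synthetic integer keys, not the Bayt.com data: when keys are inserted in sorted order the tree degenerates into a chain and the p-th inserted key needs p visits, while the same keys inserted in shuffled order stay at small depths. Whether Dataset.csv behaves like either case depends on how its rows are ordered with respect to your KEY column, which this sketch does not assume.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical experiment: nodes visited to reach the key inserted at a given position,
// for keys inserted in sorted order versus shuffled order.
class DepthDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Iterative insert (equal keys go right), so a degenerate tree cannot overflow the stack.
    static Node insert(Node root, int key) {
        Node node = new Node(key);
        if (root == null) return node;
        Node x = root;
        while (true) {
            if (key < x.key) {
                if (x.left == null) { x.left = node; return root; }
                x = x.left;
            } else {
                if (x.right == null) { x.right = node; return root; }
                x = x.right;
            }
        }
    }

    static Node build(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    // Number of nodes examined until the key is found, or -1 if it is absent.
    static int comparisons(Node root, int key) {
        int count = 0;
        for (Node x = root; x != null; x = key < x.key ? x.left : x.right) {
            count++;
            if (key == x.key) return count;
        }
        return -1;
    }

    public static void main(String[] args) {
        int n = 3527; // number of records stated in the handout
        List<Integer> sorted = new ArrayList<>();
        for (int i = 1; i <= n; i++) sorted.add(i);
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));
        Node sortedRoot = build(sorted);
        Node shuffledRoot = build(shuffled);

        int[] positions = {3, 26, 101, 619, 892, 1237, 2138, 2751, 3019, 3517};
        for (int p : positions) {
            // Look up the key that was inserted p-th in each order.
            int inSorted = comparisons(sortedRoot, sorted.get(p - 1));
            int inShuffled = comparisons(shuffledRoot, shuffled.get(p - 1));
            System.out.printf("position %4d: sorted order %4d visits, shuffled order %3d visits%n",
                    p, inSorted, inShuffled);
        }
    }
}
```

A remove query walks the same search path and then, for a node with two children, continues down the right subtree to find the Hibbard successor, which is one structural reason its cost can differ from a plain search.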

Evaluation

You are NOT allowed to work in a group of any size. Your work will be evaluated on code inspection [4 pts], quality of the report [1 pt], and an interview (optional, as needed).

Submission

Upload 4 files to the LMS as follows:
1. Main.jar
2. Main.java (one file containing ALL classes)
3. Conclusion sheet (PDF) [template provided]
4. Program output file [template provided]

The submission deadline is final. Late submissions will be awarded ZERO points.

Code Inspection

The code will be inspected by the instructor, who will determine the score for code inspection. In general, readable code (consistent indentation, clear scope definition) is required. For this project, there are no limits on time and memory usage.

Plagiarism

The instructor reserves the right to use appropriate tools to detect plagiarism. If your submission is more than 50% similar to any other submission, you will be awarded a ZERO for the project. This project evaluation will use the Stanford MOSS tool.

Useful resources

Tutorials on using command line arguments