Project Part 2: Advanced Data Structures and I/O (Due 12/9/23)

Description: This phase of the project extends Part 1 with two principal goals: discussing two theoretical applications related to your dataset*, and adapting an “advanced” data structure implementation to your data in order to perform a simple input/output task.

Note: You may work alone or in a group of 2 (of your own choosing) for this project. You may not work in a group of more than 2! Specific details follow.

Problem 1 (50 pts): (I will be very flexible about your choice of application for these.) Write about one paragraph each on two distinct theoretical applications related to your dataset, each of which would be best handled by a different ADT that we looked at beyond the list-based ones (i.e., your options are Stack/Queue, BST, Priority Queue, Hash Table (Map)). Note that you can use one of stack or queue, but not both.

Sample applications for the credit-card fraud dataset are below. Make sure you are careful in your descriptions of ADTs and data structures!

Application 1: Credit card records could be maintained in a self-organizing BST structure sorted on charge amount, with the main goal of fast and efficient access to outliers in the data (i.e., those that are far below or far above the average). Since the root of a self-organizing BST will have a charge amount close to the median, it should be easy to compare the items with the smallest values (via in-order traversal) or the largest values (via reverse in-order traversal) to this quantity. Outlying data could then be investigated further.

Application 2: Modified credit-card records could include a “risk” variable reflecting how likely the transaction is to be fraudulent, based on information like price, location, time, etc. The appropriate ADT for this data and application would then be the priority queue, with higher-risk transactions being placed higher in the queue. Investigators (manual or automated) would then “dequeue” records in order of risk when investigating potential fraud.

Problem 2: Choose a single existing data structure implementation* from class, not including the list data structure implementations (i.e., your options are, most likely, ArrayStack, CLQueue, BST, Heap). You are to write a C++ program that will
i. (30 pts) read your dataset* from Part 1 of the project (or a new one, if preferred) into your chosen structure, and
ii. (20 pts) print a subset of rows/records from your dataset in response to user input.

I will be flexible with the latter, but the user must be able to provide an input value that changes which elements of your data are printed. Some ideas are below (you can use any of these, modify them, or come up with your own):
a. Print out the elements of a BST whose value for a chosen attribute/column exceeds a certain value (e.g., for the CCFraud data, only rows whose amount is greater than an input value).
b. Print out the elements from a PQ heap with the k highest or k lowest values for a chosen attribute/column (where k is a value input by the user).
c. Print out every kth element loaded into a CLQueue (again, where k is a value input by the user).

Note: You will likely want to consult and/or modify the main.cpp file you used in Part 1 (or create one, if you did not finish!) to handle reading data into the structure, as well as writing it. But remember that you will also need to modify the new data structure you have chosen, which includes details such as modifying or removing other operations (e.g., there is likely no need to keep DeleteItem for the BST, and you can potentially even discard the traversals and traversal queue if you like).

Do not generate a new data structure from scratch or from an outside online resource. There will be a substantial penalty if it is clear your structure was not based on one that we used during class.

Your submission should be a .zip file that includes a .csv file for the final, preprocessed dataset, a series of .h/.cpp source files including the modified structure files and main.cpp, and a .doc(x) or .pdf file for Problem 1. Upload the file as “LN_FN_ProjP2.zip”, where LN is your last name and FN is your first name. If you worked in a group of 2, your submission can (and probably should) be identical on Blackboard, but you MUST let me know clearly who you worked with in your report.

* If you are working in a group of 2, you will want to choose a single dataset and a single structure/task to implement. You may split labor in any way you choose.
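To make idea (a) under Problem 2 concrete, here is a minimal sketch of printing only the BST elements above a user-supplied threshold. Everything in it is an assumption for illustration: `Node`, `insert`, and `collectGreater` are hypothetical names, the tree stores a bare `double` key rather than a full record, and it is a stand-in for (not a replacement of) the BST class from the course, which is what an actual submission must adapt.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in node keyed on one numeric column (e.g. charge amount).
struct Node {
    double key;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    explicit Node(double k) : key(k) {}
};

// Plain (unbalanced) BST insert; duplicates go to the right subtree.
void insert(std::unique_ptr<Node>& root, double key) {
    if (!root) {
        root.reset(new Node(key));
        return;
    }
    if (key < root->key) {
        insert(root->left, key);
    } else {
        insert(root->right, key);
    }
}

// Appends, in ascending order, every key strictly greater than threshold.
// The left subtree is skipped whenever node->key <= threshold, because all
// keys there are smaller than node->key and so cannot exceed the threshold.
void collectGreater(const Node* node, double threshold,
                    std::vector<double>& out) {
    if (!node) {
        return;
    }
    if (node->key > threshold) {
        collectGreater(node->left.get(), threshold, out);
        out.push_back(node->key);
    }
    collectGreater(node->right.get(), threshold, out);
}
```

In a main.cpp, the threshold would be read from the user (for example with `std::cin`) and each collected value printed. The pruning step is the reason a BST suits this query: subtrees that cannot contain a match are never visited.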
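Idea (b) under Problem 2, together with the priority-queue application in Problem 1, can be sketched in a similar way. This version uses `std::priority_queue` from the standard library purely to show the flow of "load the CSV, then pop the k largest"; the assignment itself requires the Heap implementation from class instead. The two-column `id,amount` layout, the single header row, and the names `Record`, `ByAmount`, `RecordHeap`, `loadRecords`, and `topK` are all assumptions made for the example.

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record layout; a real dataset will have more columns.
struct Record {
    std::string id;
    double amount;
};

// Comparator that keeps the record with the largest amount on top.
struct ByAmount {
    bool operator()(const Record& a, const Record& b) const {
        return a.amount < b.amount;
    }
};

typedef std::priority_queue<Record, std::vector<Record>, ByAmount> RecordHeap;

// Reads "id,amount" lines into a max-heap, assuming one header row.
// Lines with a missing or non-numeric amount are skipped.
RecordHeap loadRecords(std::istream& in) {
    RecordHeap heap;
    std::string line;
    std::getline(in, line);  // discard the header row
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string id;
        std::string amountText;
        if (!std::getline(fields, id, ',') ||
            !std::getline(fields, amountText)) {
            continue;
        }
        try {
            Record r;
            r.id = id;
            r.amount = std::stod(amountText);
            heap.push(r);
        } catch (const std::exception&) {
            // std::stod could not parse the amount: skip this line
        }
    }
    return heap;
}

// Returns up to k records in descending order of amount.
// The heap is taken by value, so the caller's copy is left unchanged.
std::vector<Record> topK(RecordHeap heap, std::size_t k) {
    std::vector<Record> out;
    while (out.size() < k && !heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    return out;
}
```

A main.cpp would open the .csv with `std::ifstream`, pass it to `loadRecords`, read k from the user, and print each record returned by `topK`. Reversing the comparison in `ByAmount` would give the k lowest values instead.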