Lab1-Data_preparation_with_weka

School: Algonquin College
Course: 8390
Subject: Computer Science
Date: Feb 20, 2024

CST8390 - Lab 1
Part 1 - Data exploration in Weka - Iris dataset

Due Date: Due by the end of the lab in Week 1.
Fill in your answers in the given Lab1_Answers.doc.

Introduction
The goal of this part is to install Weka and become familiar with the tool.

Steps
1. Download and install Weka. You can find it here: http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. Open Weka and have a look at the interface. Weka is an open-source project written in Java, created by an academic team at the University of Waikato.
3. Click on the Explorer button on the right side.
4. Check the different tabs to become familiar with the tool.
5. Weka comes with a number of small datasets. These files are located at C:\Program Files\Weka-3-8 if Weka is installed at that location; otherwise, search for Weka-3-8 to find the installation location. This folder contains a subfolder named ‘data’. Open that subfolder to see all the files that come with Weka.
6. For easy access, copy the ‘data’ folder and paste it into your ‘Documents’ folder.
7. In this part, we will work with the Iris dataset. To open it, click on ‘Open file’ in the ‘Preprocess’ tab. From your ‘data’ folder, select iris.arff and hit Open.
8. To learn more about the Iris dataset, open iris.arff in Notepad++ or a similar tool and read the comments.
9. Click on the Visualize tab to see various 2D visualizations of the dataset.
   a. Click on some of the graphs to see more details about them.
   b. In any of the graphs, click on an ‘x’ to see the details of that data record.
10. Fill in this table in Lab1_Answers.doc:

    Flower Type    Count

11. Fill in this table:

    Attribute    Minimum    Maximum    Mean    StdDev

In order to get the credit for this part:
1. Show the Iris file in Weka during the demo.
2. Show the answer document during the demo.

Part 2 - Data Preparation and Cleaning

Activity 1
1. Download the EmployeesSalary.csv file from Brightspace.
2. Open EmployeesSalary.csv in Excel and explore it.
3. Read https://www.cs.waikato.ac.nz/ml/weka/arff.html to find out what is expected of an ARFF file.
4. Identify the attributes of the data. Record each attribute and its type.
5. Open the file by selecting ‘Open file’ in the ‘Preprocess’ tab of Weka. Check whether all attributes have the required types. If not, apply filters to change them.
6. Which four attributes are the most relevant for analysing this dataset?
7. For the nominal attributes from the previous question, fill in the following table:

    Attribute Name:        Attribute Name:        Attribute Name:
    Label    Count         Label    Count         Label    Count

8. Analyze your data for anomalies. List the anomalies you identify below and explain why you think each record is an anomaly, using the following format (find at least 8 outliers):

    Id    first_name    last_name    email    Address    Country    Branch    Currency    Salary    Reason

Remove Duplicates:
9. How many instances do you have now? ___________________________
10. Check manually whether any duplicates exist in the file.
11. Now run the RemoveDuplicates filter to remove duplicates. To do this, from ‘Filter’, choose weka > filters > unsupervised > instance > RemoveDuplicates.
12. Select Apply to run the filter.
13. a. How many instances do you have now? _________________
    b. How many duplicates were removed? ________________
14. Take a screenshot and paste it in the answer document.
15. Save this new file as EmployeesSalaryNoDuplicates.arff.

Nominal to Binary
16. How many nominal attributes do you have?
17. Distance-based classification methods cannot be applied directly to these nominal values. Convert them into binary attributes using the NominalToBinary filter: from ‘Filter’, select weka > filters > unsupervised > attribute > NominalToBinary, and hit Apply. (Check the other available filters too; you will need to use them in future labs.)
18. Take a screenshot and paste it in the answer document.
19. Save this file as EmployeesSalaryNoDupBinary.arff.
20. Open the file in Notepad++ and look at the data.
21. Take a screenshot of the file while it is open in Notepad++. The header should be visible.
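
To make the two filter steps above more concrete, here is a minimal Python sketch of the ideas behind them: dropping exact duplicate rows, then replacing each nominal column with one 0/1 column per label. It is an illustration only, not Weka's implementation. The column names Branch, Currency and Salary are taken from the anomaly table in this lab, but every value in the rows is made up, and the "name=label" naming of the new columns is an assumption chosen for readability. Weka's own NominalToBinary filter may differ in details, such as how it treats two-valued attributes and the class attribute.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row and preserve the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def nominal_to_binary(rows, header, nominal_columns):
    """Replace each listed nominal column with one 0/1 column per label."""
    # Collect the labels of each nominal column in first-seen order.
    labels = {}
    for col in nominal_columns:
        index = header.index(col)
        labels[col] = list(dict.fromkeys(row[index] for row in rows))

    # Build the new header: "name=label" for nominal columns (an assumed naming).
    new_header = []
    for col in header:
        if col in labels:
            new_header.extend(f"{col}={label}" for label in labels[col])
        else:
            new_header.append(col)

    # Build the new rows: 1 where the row carries the label, 0 otherwise.
    new_rows = []
    for row in rows:
        new_row = []
        for col, value in zip(header, row):
            if col in labels:
                new_row.extend(1 if value == label else 0 for label in labels[col])
            else:
                new_row.append(value)
        new_rows.append(new_row)
    return new_header, new_rows


# Hypothetical sample data; not taken from EmployeesSalary.csv.
header = ["Branch", "Currency", "Salary"]
rows = [
    ["North", "CAD", 50000],
    ["South", "USD", 62000],
    ["North", "CAD", 50000],  # exact duplicate of the first row
    ["East", "CAD", 58000],
]

unique_rows = remove_duplicates(rows)
removed = len(rows) - len(unique_rows)
binary_header, binary_rows = nominal_to_binary(
    unique_rows, header, ["Branch", "Currency"]
)

print("Instances before:", len(rows))
print("Instances after:", len(unique_rows))
print("Duplicates removed:", removed)
print(binary_header)
for row in binary_rows:
    print(row)
```

Once every nominal value is a 0/1 number, a distance between two records can be computed column by column, which is why step 17 performs this conversion before any distance-based method is used.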

In order to get the credit for this lab:
3. Show the EmployeesSalary file in Weka during the demo.
4. Show EmployeesSalaryNoDupBinary.arff in Weka.
5. Show the answer document.
6. Upload Lab1_Answers.doc to Brightspace before the submission due date.

Both the demo during lab hours AND the submission in Brightspace are required to get credit for the lab.
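
As an optional cross-check for the tables in Part 1, the sketch below reads a small ARFF text and computes the label counts and the Minimum, Maximum, Mean and StdDev of each numeric attribute. It is a simplified reader written for illustration, not Weka's parser: it assumes a dense file with unquoted attribute names, comma-separated values and no missing values. It also assumes the last attribute is the class and that StdDev means the sample standard deviation (dividing by n - 1); compare the result with what Weka shows before relying on it. The toy data is made up and is not taken from iris.arff.

```python
import statistics
from collections import Counter

NUMERIC_TYPES = {"numeric", "real", "integer"}

# Toy text in ARFF layout; attribute names and values are hypothetical.
TOY_ARFF = """\
% Lines starting with a percent sign are comments.
@relation toy
@attribute length numeric
@attribute width numeric
@attribute kind {a,b}
@data
1.0,0.5,a
2.0,0.5,a
3.0,2.0,b
"""


def parse_simple_arff(text):
    """Return (attributes, rows) from a simple dense ARFF text."""
    attributes, rows, in_data = [], [], False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif keyword == "@attribute":
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec.strip()))
        elif keyword == "@data":
            in_data = True
    return attributes, rows


def numeric_summary(attributes, rows):
    """Compute min, max, mean and sample standard deviation per numeric attribute."""
    summary = {}
    for index, (name, type_spec) in enumerate(attributes):
        if type_spec.lower() not in NUMERIC_TYPES:
            continue
        values = [float(row[index]) for row in rows]
        summary[name] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "stddev": statistics.stdev(values),  # assumes n - 1 in the denominator
        }
    return summary


def class_counts(rows):
    """Count the labels of the last attribute, assumed here to be the class."""
    return Counter(row[-1] for row in rows)


attributes, rows = parse_simple_arff(TOY_ARFF)
summary = numeric_summary(attributes, rows)
counts = class_counts(rows)

for label, count in counts.items():
    print(label, count)
for name, stats in summary.items():
    print(name, stats)
```

The header section (the @relation and @attribute lines) declares each attribute's name and type, and everything after @data is one record per line, which is the structure to look for when checking the type of each attribute in Part 2.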