accy570lab_week7

html

School

University of Illinois, Urbana Champaign *

*We aren’t endorsed by this school

Course

570

Subject

Information Systems

Date

Dec 6, 2023

Type

html

Pages

12

Uploaded by HighnessLoris3604

Report
Week 7 Lab File path Load data to pandas DataFrame DataFrame attributes File Path Absolute path Windows C:\Users\Linden .... \data.csv Mac: /Users/Linden/.../data.csv Relative path data.csv in currcent foler(same as the notebook file): data.csv data.csv in a subfolder in current folder: subfolder/data.csv Always use / in file path Useful notebook magics %pwd : print current folder as absolute path %ls : list files in current folder In [1]: %pwd Out[1]: 'C:\\Users\\0903i\\Downloads' In [2]: %ls Volume in drive C is OS Volume Serial Number is 7415-2B2A Directory of C:\Users\0903i\Downloads 10/05/2023 08:03 PM <DIR> . 09/28/2023 11:58 AM <DIR> .. 10/05/2023 08:01 PM <DIR> .ipynb_checkpoints 10/05/2023 07:51 PM <DIR> __MACOSX 10/03/2023 05:02 PM 430,568 517 Midterm Project Team 6 - CVS.xlsx 10/05/2023 08:03 PM 48,732 accy570lab_week7.ipynb 09/26/2023 07:49 PM 36,984 After Class Exercises 8.ipynb 09/13/2023 11:00 PM 77,259 AIDLandlordLeasingAgent.pdf 10/05/2023 10:55 AM 23,948,849 appstore_games.csv 09/30/2023 12:31 PM 57,741 Companies Cross Listed in the US.xlsx 10/05/2023 07:51 PM <DIR> data 09/30/2023 12:31 PM 90,775 fortune1000.csv 09/30/2023 12:46 PM 16,437,737 IowaLiquorSales2018.csv 10/05/2023 07:51 PM 17,558 PilgrimBank.xlsx 10/05/2023 12:29 PM 63,958 ProblemSet2_soomink4_thingyuh2-1.ipynb 10/05/2023 08:02 PM 316,891 ProblemSet2_soomink4_thingyuh2-1.twbx 10/05/2023 07:51 PM 214 sales.csv 10/05/2023 07:51 PM 61,194 titanic.csv 13 File(s) 41,588,460 bytes 5 Dir(s) 890,413,084,672 bytes free In [3]: %ls data Volume in drive C is OS
Volume Serial Number is 7415-2B2A Directory of C:\Users\0903i\Downloads\data 10/05/2023 07:51 PM <DIR> . 10/05/2023 08:03 PM <DIR> .. 10/13/2021 02:24 PM <DIR> .ipynb_checkpoints 10/05/2023 07:51 PM 186,693 COST_WMT_stock.csv 10/05/2023 07:51 PM 61,194 titanic.csv 2 File(s) 247,887 bytes 3 Dir(s) 890,404,212,736 bytes free Exercise 1 Find your current working dirctory with notebook magic %pwd. List all files in your current working directory with notebook magic %ls. In [4]: %pwd Out[4]: 'C:\\Users\\0903i\\Downloads' In [6]: %ls Volume in drive C is OS Volume Serial Number is 7415-2B2A Directory of C:\Users\0903i\Downloads 10/05/2023 08:03 PM <DIR> . 09/28/2023 11:58 AM <DIR> .. 10/05/2023 08:01 PM <DIR> .ipynb_checkpoints 10/05/2023 07:51 PM <DIR> __MACOSX 10/03/2023 05:02 PM 430,568 517 Midterm Project Team 6 - CVS.xlsx 10/05/2023 08:03 PM 48,732 accy570lab_week7.ipynb 09/26/2023 07:49 PM 36,984 After Class Exercises 8.ipynb 09/13/2023 11:00 PM 77,259 AIDLandlordLeasingAgent.pdf 10/05/2023 10:55 AM 23,948,849 appstore_games.csv 09/30/2023 12:31 PM 57,741 Companies Cross Listed in the US.xlsx 10/05/2023 07:51 PM <DIR> data 09/30/2023 12:31 PM 90,775 fortune1000.csv 09/30/2023 12:46 PM 16,437,737 IowaLiquorSales2018.csv 10/05/2023 07:51 PM 17,558 PilgrimBank.xlsx 10/05/2023 12:29 PM 63,958 ProblemSet2_soomink4_thingyuh2-1.ipynb 10/05/2023 08:02 PM 316,891 ProblemSet2_soomink4_thingyuh2-1.twbx 10/05/2023 07:51 PM 214 sales.csv 10/05/2023 07:51 PM 61,194 titanic.csv 13 File(s) 41,588,460 bytes 5 Dir(s) 890,388,803,584 bytes free Access Data Files Best practice: Put data file in same folder as notebook. Put data file in subfolder like 'data' Never use absolute path
Always use relative path Use forward slash / in file path string In [7]: import pandas as pd df = pd.read_csv('titanic.csv') df.head() Out[7]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2 . 3101282 7.9250 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 In [8]: df = pd.read_csv('data/titanic.csv') df.head() Out[8]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs female 38.0 1 0 PC 17599 71.2833
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Th... 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2 . 3101282 7.9250 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 Exercise 2 Load COST_WMT_stock.csv in data folder to a DataFrame df_stock . Display first five rows of df_stock with head() method. In [13]: import pandas as pd df_stock = pd.read_csv('COST_WMT_stock.csv') df_stock.head() Out[13]: Date COST_Adj Close COST_Vo lume COST_ch ange COST_UpO rDown WMT_Adj Close WMT_Vol ume WMT_cha nge WMT D 0 10/1/ 2009 43.190258 5097900 NaN NaN 38.208164 22467100 NaN NaN 1 10/2/ 2009 43.795193 5760800 0.604935 Up 38.270535 15093800 0.062371 Up 2 10/5/ 2009 44.113171 4480100 0.317978 Up 38.254951 14217500 -0.015584 Down 3 10/6/ 2009 44.927494 5865700 0.814323 Up 38.582447 17582800 0.327496 Up 4 10/7/ 2009 45.757320 11117300 0.829826 Up 38.590244 11620200 0.007797 Up DataFrame manipulations Load data pd.read_csv()
pd.read_excel() and more (try tab completion) Get intuition of data df.head() df.tail() df.sample() df.info() In [14]: df.head() Out[14]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2 . 3101282 7.9250 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 In [15]: df.tail() Out[15]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare C 886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.00 N 887 888 1 1 Graham, Miss. Margaret female 19.0 0 0 112053 30.00 B
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare C Edith 888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.45 N 889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.00 C 890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.75 N In [16]: df.sample(5) Out[16]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare 185 186 0 1 Rood, Mr. Hugh Roscoe male NaN 0 0 113767 50.0000 76 77 0 3 Staneff, Mr. Ivan male NaN 0 0 349208 7.8958 306 307 1 1 Fleming, Miss. Margaret female NaN 0 0 17421 110.883 646 647 0 3 Cor, Mr. Liudevit male 19.0 0 0 349231 7.8958 141 142 1 3 Nysten, Miss. Anna Sofia female 22.0 0 0 347081 7.7500 In [17]: df.sample(5) Out[17]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare 618 619 1 2 Becker, Miss. female 4.0 2 1 230136 39.00
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Marion Louise 379 380 0 3 Gustafsson, Mr. Karl Gideon male 19.0 0 0 347069 7.775 369 370 1 1 Aubart, Mme. Leontine Pauline female 24.0 0 0 PC 17477 69.30 342 343 0 2 Collander, Mr. Erik Gustaf male 28.0 0 0 248740 13.00 586 587 0 2 Jarvis, Mr. John Denzil male 47.0 0 0 237565 15.00 In [18]: df.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 891 entries, 0 to 890 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PassengerId 891 non-null int64 1 Survived 891 non-null int64 2 Pclass 891 non-null int64 3 Name 891 non-null object 4 Sex 891 non-null object 5 Age 714 non-null float64 6 SibSp 891 non-null int64 7 Parch 891 non-null int64 8 Ticket 891 non-null object 9 Fare 891 non-null float64 10 Cabin 204 non-null object 11 Embarked 889 non-null object dtypes: float64(2), int64(5), object(5) memory usage: 83.7+ KB Exercise 3 Load sales.csv to a DataFrame df_sales with pandas function read_csv() . Take a peek at the DataFrame with DataFrame method head() (or tail, or sample) Get basic DataFrame information with DataFrame method info() . Which column has missing values? ( You don't need to answer this question in the notebook ) In [28]: df_sales = pd.read_csv('sales.csv')
df_sales.head() Out[28]: Item Sales($) Sale Date Notes 0 milk $3.79 1/5/2020 2% 1 fruit $2.99 2/5/2020 NaN 2 meat $15.99 3/5/2020 Beaf 3 chocolate $5.99 4/5/2020 Black 4 Milk ($3.79) 13/5/2020 Return In [25]: df_sales.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 7 entries, 0 to 6 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Item 7 non-null object 1 Sales($) 7 non-null object 2 Sale Date 7 non-null object 3 Notes 4 non-null object dtypes: object(4) memory usage: 356.0+ bytes Exercise 4 In the following code cell: import pandas as pd type pd.read_ then press TAB key. ( Just try it, don't leave un-completed code in the notebook ) print pd.read_excel() help document using help() or ? Load data from PilgrimBank.xlsx to df_pb using pb.read_excel(). The excel file has two sheets, the data is in 'DataSet' sheet. Display the first 5 rows of df_pb . In [33]: import pandas as pd pd.read_excel? Out[33]: <function pandas.io.excel._base.read_excel(io, sheet_name: 'str | int | list[IntStrT] | None' = 0, *, header: 'int | Sequence[int] | None' = 0, names: 'list[str] | None' = None, index_col: 'int | Sequence[int] | None' = None, usecols: 'int | str | Sequence[int] | Sequence[str] | Callable[[str], bool] | None' = None, squeeze: 'bool | None' = None, dtype: 'DtypeArg | None' = None, engine: "Literal['xlrd', 'openpyxl', 'odf', 'pyxlsb'] | None" = None, converters: 'dict[str, Callable] | dict[int, Callable] | None' = None, true_values: 'Iterable[Hashable] | None' = None, false_values: 'Iterable[Hashable] | None' = None, skiprows: 'Sequence[int] | int | Callable[[int], object] | None' = None, nrows: 'int | None' = None, na_values=None, keep_default_na:
'bool' = True, na_filter: 'bool' = True, verbose: 'bool' = False, parse_dates: 'list | dict | bool' = False, date_parser: 'Callable | None' = None, thousands: 'str | None' = None, decimal: 'str' = '.', comment: 'str | None' = None, skipfooter: 'int' = 0, convert_float: 'bool | None' = None, mangle_dupe_cols: 'bool' = True, storage_options: 'StorageOptions' = None) -> 'DataFrame | dict[IntStrT, DataFrame]'> In [31]: df_pb=pd.read_excel('PilgrimBank.xlsx',sheet_name='DataSet') df_pb.head() Out[31]: ID 9age 9Inc 9AllDem 9District 9Tenure 9Profit 0Profit 9Online 0Online 9Billpay 0 1 NaN NaN 0 1200 6.33 21 NaN 0 NaN 0 1 2 6.0 3.0 1 1200 29.50 -6 -32.0 0 0.0 0 2 3 5.0 5.0 1 1100 26.41 -49 -22.0 1 1.0 0 3 4 NaN NaN 0 1200 2.25 -4 NaN 0 NaN 0 4 5 2.0 9.0 1 1200 9.91 -61 -4.0 0 0.0 0 DataFrame Attributes df.index df.columns df.shape In [34]: df.head() Out[34]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2 . 3101282 7.9250 3 4 1 1 Futrelle, Mrs. Jacques Heath female 35.0 1 0 113803 53.1000
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare (Lily May Peel) 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 In [35]: df.index Out[35]: RangeIndex(start=0, stop=891, step=1) In [36]: df.columns Out[36]: Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'], dtype='object') In [37]: df.columns = [c.upper() for c in df.columns] df.head() Out[37]: PASSENGERID SURVIVED PCLASS NAME SEX AGE SIBSP PARCH TICKET F 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7. 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2 . 3101282 7. 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53 4 5 0 3 Allen, Mr. William male 35.0 0 0 373450 8.
PASSENGERID SURVIVED PCLASS NAME SEX AGE SIBSP PARCH TICKET F Henry In [38]: df.shape Out[38]: (891, 12) In [39]: df.shape[0] Out[39]: 891 Exercise 5 Find number of rows and columns of df_sales with DataFrame attribute shape. Display column names of df_sales with DataFrame attribute columns. In [41]: df_sales.shape Out[41]: (7, 4) In [43]: df_sales.columns Out[43]: Index(['Item', 'Sales($)', 'Sale Date', 'Notes'], dtype='object') In [44]: df_sales.info() <class 'pandas.core.frame.DataFrame'> RangeIndex: 7 entries, 0 to 6 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Item 7 non-null object 1 Sales($) 7 non-null object 2 Sale Date 7 non-null object 3 Notes 4 non-null object dtypes: object(4) memory usage: 356.0+ bytes In [46]: df_sales.columns= ['Item', 'Sales', 'Sale Date', 'Notes'] df_sales.head() Out[46]: Item Sales Sale Date Notes 0 milk $3.79 1/5/2020 2% 1 fruit $2.99 2/5/2020 NaN 2 meat $15.99 3/5/2020 Beaf 3 chocolate $5.99 4/5/2020 Black
Item Sales Sale Date Notes 4 Milk ($3.79) 13/5/2020 Return In [ ]:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help