cs cribsheet 1

School: University of Georgia
Course: 2610
Subject: Computer Science
Date: Jul 3, 2024


Beautiful Soup
with open(file) as f: soup = BeautifulSoup(f, "html.parser")
soup.find("tag"): returns a Tag object for the first instance
soup.find_all("tag"): returns a list of Tag objects for all instances
soup.find_all("td", {"class": "J"}): list of Tag objects with a specific tag and attribute value
methods:
tag.text: returns a string of the text displayed by a tag
tag["attrib"]: access the tag's attribute, as in <tag attribute="attribute value">
zip(): pairs items of each iterable, creates tuples
new = list(zip(list1, list2)) -> [(1, 'one'), (2, 'two')]
f-strings: print(f"{2} plus {3} equals {2 + 3}")
response = requests.get("https://someurl.com"): sends a request to the website, returns a response object
.find("the"): string method, returns the starting index of where "the" is found (-1 if not found)

NumPy
array: an n-dimensional, fixed-size object that holds homogeneous data types
dtype="int32" -> makes int array
np.array([1, 2, 3], dtype=None)
np.zeros((rows, columns)): np.zeros((2, 3)) -> array([[0., 0., 0.], [0., 0., 0.]])
np.ones(3) -> array([1., 1., 1.])
np.full(shape, fill_value): np.full((3, 3), 8) -> array([[8, 8, 8], [8, 8, 8], [8, 8, 8]])
np.arange(start, stop, step): stop is exclusive; np.arange(4, 16, 3) -> array([4, 7, 10, 13])
np.linspace(start, stop, num=50, endpoint=True, dtype=None): stop is inclusive when endpoint=True
np.linspace(0, 10, 5) -> array([ 0. ,  2.5,  5. ,  7.5, 10. ])
np.random.random((rows, columns)): random floats in range [0, 1)
np.random.randint(low, high=None, size=None, dtype=int): gives one integer when size is None; high is exclusive

Index/Slice
arr = np.array([1, 2, 3, 4, 5]) -> arr[-3] -> 3
arr = np.array([[1, 2, 3], [4, 5, 6]]) -> arr[1, 2] -> 6
arr[:, 2:]: all the rows, from the third column onward

Vector Operations: do not change the original array
arr = np.array([1, 2, 3]) -> arr * 2 -> array([2, 4, 6])
arr <= 2 -> array([ True,  True, False])

Masking
arr[arr % 2 == 0]: returns an array of only the even numbers
Bitwise &, |, ~ combine masks: np.sum((arr % 2 == 0) & (arr < 13)) counts the elements that are even and less than 13

NaN
np.isnan(): gives True or False for each element
arr = np.array([-2, 1.5, np.nan, 2, -5], dtype=float)
arr[~np.isnan(arr)] -> array([-2. ,  1.5,  2. , -5. ])

Properties
.dtype - returns the data type of the elements within an array
.ndim - returns the number of dimensions of an array
.size - returns the number of elements in an array
.shape - returns the shape of an array, in the order (rows, columns) for a 2D array
.copy() - returns a copy of an array that can be assigned to another variable
.fill(value) - replaces all elements of an array with the specified value in place; returns None
.reshape(rows, columns) - returns an array with the new shape but does not change the original
np.where(condition, x, y) - returns a new array with values from x where the condition is True and from y where it is False
.resize(rows, columns) - changes the shape of an array in place; returns None
.sort(axis=0) - sorts the array in place in ascending order along the given axis
np.concatenate([arr1, arr2], axis=0)
Aggregate methods: .max() .mean() .sum() .min()
np.savetxt() np.loadtxt()

Pandas: 2D, size-mutable

Series
s = pd.Series(data, index=index)
selecting data: by label s.loc[] or s[label], by integer position s.iloc[]
masking: s[s < 3]
changing data: use loc or iloc
append new values: s.loc["x"] = 4
sort: s.sort_values(ascending=True); ascending means small to large, abc order
delete: s.drop("index")

Data Frame
df = pd.DataFrame(data, index=..., columns=...)
selecting one column: df["column"] returns a Series
selecting rows: use .loc or .iloc
df.loc["row"].loc["column"] or df.loc["row", "column"]
df.set_index("Course", inplace=True) -> Course becomes the index and no longer counts as the first column
masking: df[df["avg GPA"] < 3]

Add/Replace Row Values: if the index doesn't exist, it is added
df.loc[row, col] = 850 -> changes the value to 850
df.iloc[-1, :] = [100, 3] -> last row, all columns
add row: df.loc["isye20"] = [150, 3.1]

Add/Replace Column
df.loc[:, "new"] = ... or df["new"] = ...
df["new"] = df["column"] >= 2 -> adds a new column of True/False values

Sorting: df.sort_values(by="col", ascending=False, inplace=False)

Removing
df.drop(["col"], axis=1) -> columns
df.drop(["CS2316", "CS1331"], axis=0) -> rows
df.drop_duplicates(subset=["Course"]) -> drops rows with duplicate values in the Course column
df.nunique(axis=0) -> counts the number of distinct elements in each column

Reading/Writing
x = pd.read_csv("<path>file.csv", index_col=0)
x.to_csv("<path>fileout.csv", index=True)

Missing Data
df.loc["CS2603"] = [50, np.nan, 0] -> row CS2603 holds 50.0 NaN 0.0
df.dropna(): removes all rows that contain NaN
df.fillna(0): fills NaN with the given value
pd.isna(df): checks whether each value is NaN, gives True or False

Aggregates: .mean() .sum() .min() .max() .count()
add a column with the mean of each row: df["new col"] = df.mean(axis=1).round(2)
add a row with the mean of each column: df.loc["new row"] = df.mean(axis=0)

str method: df["col"].str.contains("A", na=False)

Groupby
total = df.groupby("country")["medals"].count() -> counts the number of medals for each country; country is the index, medals is the only column
group by more than one column: df.groupby(["country", "gender"])["medal"].count()
.agg(): use when applying more than one aggregate on more than one column after groupby()

Concatenate: pd.concat(dfs, axis=0) joins them vertically; axis=1 joins them horizontally

Plotting
line: series.plot(x=series.index, y=series.values)
bar: series.plot(x=series.index, y=series.values, kind="bar")
other kinds: barh, hist, box
percent = series.value_counts() / 10 * 100
percent.plot(kind="pie")

Plotly: px.bar px.pie px.histogram px.box
fig = px.scatter(data, x='date', y='new_deaths', color='location'), where x = "column title"
fig = px.line(cases, x='date', y='new_cases', labels={'date': 'Day', 'new_cases': 'Number of New Cases'}, title='North America')
fig.show()
*matplotlib library is easiest to use

OOP
class = blueprint for creating objects
object = data structure created using a class as its blueprint
instance: NO self; attribute: self.attribute
define a class: class Dog:
instance attributes: class Dog: def __init__(self, name, age): self.name = name; self.age = age
class attributes: class Dog: numLegs = 4
def __eq__(self, other): determines what makes two objects equal to each other
    return self.name == other.name and self.age == other.age
def __lt__(self, other): used to define sorting, called when < is used
    return self.age < other.age
def __str__(self): called when the object is printed or cast to a string
    return f"{self.name} is {self.age} years old"
def __repr__(self): called by repr(), in the interactive shell, and when the object sits inside a container; print falls back to it when __str__ is not defined
    return f"{self.name}"

Copy
list1 = [1, 2, 3]
list2 = list1 -> this does not copy the list, it simply copies the memory location
list2.append(999) -> they both get 999 at the end
lista = [3, 4, 5]
listb = copy.copy(lista)  # does not share memory, it's a newly constructed list object
lista.append(98)
print(listb)  # [3, 4, 5]
nested_lista = [[1, 2], [3, 4], [5, 6]]
nested_listb = copy.copy(nested_lista)
nested_lista.append([1, 1, 1])  # only nested_lista changed
nested_lista = [[1, 2], [3, 4], [5, 6]]
nested_listb = copy.copy(nested_lista)
nested_lista[0][0] = 9  # both changed because the sublists are shared
nested_lista = [[1, 2], [3, 4], [5, 6]]
nested_listb = copy.deepcopy(nested_lista)
nested_lista[0][0] = 9  # only nested_lista changes
*if you assign an identifier to an existing object, an alias is created
*copy.copy = copies references to the sublists

Inplace
inplace=True changes the original data frame

Fundamentals
enumerate(): creates (index, value) tuples, returns an enumerate object
for index, val in enumerate(["anna", "emily"]) -> (0, 'anna'), (1, 'emily')
zip(): pairs items of each iterable, creates tuples, returns a zip object
new = list(zip(list1, list2)) -> [(1, 'one'), (2, 'two')]
lambda: add = lambda a: a + 10; the returned value is defined after the colon
conditional: print("even" if num % 2 == 0 else "odd")
list comprehension: new_list = [expression for item in iterable if condition]
new_list = [i**2 for i in range(10)]; the i**2 values end up in the list
dictionary comp: {key: val for item in iterable if condition}
{i: i**2 for i in range(4)} -> {0: 0, 1: 1, 2: 4, 3: 9}
{lis[i]: lis2[i] for i in range(len(lis))}
{key: val for key, val in zip([1, 2, 3], "abc")}
[:-1]: everything but the last item (the last column when slicing columns)
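
The alias, shallow-copy, and deep-copy notes above can be checked side by side. This is a minimal sketch; the variable names (original, alias, shallow, deep) are illustrative and not from the notes.

```python
import copy

original = [[1, 2], [3, 4], [5, 6]]

alias = original                 # same object, just a second name
shallow = copy.copy(original)    # new outer list, shared sublists
deep = copy.deepcopy(original)   # new outer list and new sublists

original.append([7, 8])          # changes the outer list only
original[0][0] = 9               # changes a shared sublist

print(alias is original)   # True: the alias sees every change
print(len(shallow))        # 3: the append did not reach the shallow copy
print(shallow[0][0])       # 9: the sublist is shared with the original
print(deep[0][0])          # 1: the deep copy is fully independent
```
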
Command line
mkdir: creates a directory
cd: changes the current directory (pwd prints the full path of the current folder)
ls: lists the contents of the current directory
cat: displays the contents of a file
if __name__ == "__main__": -> the block under it runs only when the file is executed as a script (for example from the command line), not when it is imported

Lists: mutable, can iterate through
Method      Usage
.append()   Adds an element at the end of the list
.extend()   Adds the elements of a list (or any iterable) to the end of the current list
.index()    Returns the index of the first element with the specified value
.count()    Returns the number of elements with the specified value
.remove()   Removes the first item with the specified value
sorted() -> returns a new list of sorted values
sorted(alist, key=lambda x: x[1]) -> sorts by the item at index 1 of each element
.sort() -> mutates the original list, returns None
alist.sort(reverse=True) -> do not assign the result to anything; it is None
alist.append(4) -> adds 4 to the end of the list

Tuples
Method      Usage
.count()    Returns the number of times a specified value occurs in a tuple
.index()    Searches the tuple for a specified value and returns the position where it was found
immutable and cannot be sorted in place
tup = (1, 2, 3)
can iterate, index, slice

Strings
immutable and iterable
.lower() .upper() .isdigit() .split() .replace()
string.split() -> makes the string into a list by splitting at the spaces, returns a list
string.join() -> joins the items of an iterable using the string as the separator, returns a string
" ".join(alist) -> must have a string before the dot
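
The difference between sorted() and .sort(), plus join(), can be seen in a few lines. This is a small sketch with made-up data; the names pairs, by_second, result, and joined are illustrative.

```python
pairs = [("anna", 3), ("emily", 1), ("zoe", 2)]

# sorted() returns a new list; the key picks the item at index 1 of each tuple
by_second = sorted(pairs, key=lambda x: x[1])
print(by_second)   # [('emily', 1), ('zoe', 2), ('anna', 3)]

# .sort() reorders the list in place and returns None
result = pairs.sort(reverse=True)
print(result)      # None
print(pairs)       # [('zoe', 2), ('emily', 1), ('anna', 3)]

# join() is called on the separator string and needs an iterable of strings
joined = " ".join(name for name, _ in by_second)
print(joined)      # emily zoe anna
```
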
f-strings: print(f"{2} plus {3} equals {2 + 3}")

Dictionary
mydict = {90: "a", 80: "b"} -> key: value
keys: for key in mydict.keys()
values: for val in mydict.values()
both: for key, val in mydict.items()
access value: mydict["key"]
updating: mydict["key"] = value -> if the key already exists it gets updated, otherwise it is added
delete: del mydict["key"]

Sets: no indexing/slicing
s = {1, 2, "3"}
can add/remove, takes out duplicates

Fundamentals
range(start, stop, step): stop is exclusive
slicing: list[start:stop:step]

File I/O -> list of strings
open the file, use readlines to create a list of all lines, strip the newline character with .strip(), split on the delimiter with .split(",")
with open("file.txt", "r") as f: text = f.read()
f.read(): one long string; f.read(4) reads 4 characters
f.readline(): one line at a time, as a string
f.readlines(): list of every line as a string
seek(): moves the cursor through the file; fileObject.seek(offset)
writing: open the file, write the header, loop through the data writing each row as a string with a newline character at the end
with open("text.file", "w") as out: out.write("one\ntwo\n")

CSV file -> list of lists, each list represents a row of data
with open("names.csv", "r") as f: reader = csv.reader(f); data = list(reader) -> list of lists
data[1:] eliminates the header line
with open("files.csv", "w") as fout: writer = csv.writer(fout) -> creates a writer object
writer.writerow(row) -> writes one new row
writer.writerows(data) -> writes all rows to the file
with open("csvFileName.csv", "r") as fin: dictReader = csv.DictReader(fin); listOfDicts = [dict(line) for line in dictReader]
with open("csvFile.csv", "w") as fout: dw = csv.DictWriter(fout, fieldnames=['key1', 'key2', ...]); dw.writeheader()

JSON: web service responses
double quotes for strings, true/false for Boolean, null instead of None, dict keys must be strings
load: JSON to Python
loads(): parses a string of JSON and turns it into a Python dictionary
load(): parses a JSON file into a Python dictionary
with open("file.json", "r") as f: data = json.load(f)
dump: Python to JSON
dumps(): takes a Python dict and returns a JSON string
dump(): takes a Python dict and dumps it into a JSON file
with open("fileout.json", "w") as f: json.dump(output_dict, f) -> f is what you dump to

XML: formatted as element trees

HTML: for data display
starts with <!doctype html>
begins with <html> and ends with </html>
visible part is between <body> and </body>
headings are defined with the <h1> to <h6> tags
links use the <a> tag
<ul> unordered list, <ol> ordered list, <li> list item
table defined with <table>, <tr> row, <th> table header, <td> data cell
<img src="xx.jpg">

API: requests module
import requests: imports the requests module
response = requests.get("https://someurl.com"): sends a request to the website, returns a response object
response.status_code: the status code of the request (an attribute), an integer
response.text: the text that was retrieved by the get request (an attribute)
response.json(): returns the retrieved text converted into Python objects (only works if the text is in JSON format)
print(response) -> status code only, shown as <Response [200]>
print(response.text[:500]) -> first 500 characters
response = requests.post("https://example.com") -> sends info to a website
200: successful request, 404: URL not found, 500: internal error, 401/403: unauthorized/forbidden

Escape Sequences: non-printable characters
\n = newline, \t = tab, \\ = backslash

RegEx
To make a quantifier non-greedy, put ? after the + or *
[A-Z][a-z]*: a capital letter followed by zero or more lowercase letters
print(re.findall(".+C", text)): one or more of any character followed by a capital letter C

Meta character   Meaning
.                Matches any character
\                Escapes special/meta characters
|                Or operator
^                Match at beginning of string/line; represents "not" in a character class
$                Match at end of string/line
*                Match 0 or more of the preceding regex
+                Match 1 or more of the preceding regex
?                Match 0 or 1 of the preceding regex
{}               Bounded repetition
[]               Create a character class
()               Capture group within the matched substring

Character class   What it matches
[a-z]             Any lowercase letter
[A-Z]             Any uppercase letter
[a-zA-Z]          Any letter
[^A-Z]            Anything except uppercase letters

Predefined character class   What it matches
\d                           Any digit, equivalent to [0-9]
\D                           Any non-digit, equivalent to [^0-9]
\s                           Any whitespace char, equal to [ \t\n\r\f\v]
\S                           Any non-whitespace char, equal to [^ \t\n\r\f\v]
\w                           Any alphanumeric char, equal to [a-zA-Z0-9_]
\W                           Any non-alphanumeric char, equal to [^a-zA-Z0-9_]

re.match('regEx', a_string): checks if the beginning of a_string matches the pattern. If it does, returns a Match object. Otherwise, it returns None.
re.search('regEx', a_string): checks if any part of a_string matches the pattern. Returns a Match object corresponding to the first matching part, or None if nothing matches.
re.findall('regEx', a_string): checks a_string for all non-overlapping matches to the regex supplied and returns a list of the strings that match
re.sub(r'regEx', new_string, a_string): checks a_string for all matches to the regex and returns a string with each match replaced by new_string

Method                 What it does
match_object.start()   Returns the index of the start of the string that matched
match_object.end()     Returns the index after the end of the string that matched
match_object.group()   Returns the string that matched
match_object.span()    Returns a tuple of the starting and ending indices of the string matched by the regex
*Ending index is exclusive

SQL: structured query language
Schema - a collection of related tables and constructs
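
The re functions and Match object methods listed above can be tried on a short string. This is a minimal sketch; the sample text and the variable names are made up for illustration.

```python
import re

text = "Call Anna at 555-1234 or Bob at 555-9876."

# match() only looks at the beginning of the string
print(re.match(r"\d", text))   # None, because the text starts with a letter

# search() finds the first match anywhere in the string
first = re.search(r"\d{3}-\d{4}", text)
print(first.group())           # 555-1234
start, end = first.span()
print(text[start:end])         # 555-1234, the end index is exclusive

# findall() returns every non-overlapping match as a list of strings
names = re.findall(r"[A-Z][a-z]*", text)
print(names)                   # ['Call', 'Anna', 'Bob']

# sub() returns a new string with each match replaced
masked = re.sub(r"\d", "#", text)
print(masked)                  # Call Anna at ###-#### or Bob at ###-####.

# greedy vs non-greedy: adding ? after + stops at the first possible end
greedy = re.findall(r"<.+>", "<a><b>")
lazy = re.findall(r"<.+?>", "<a><b>")
print(greedy)                  # ['<a><b>']
print(lazy)                    # ['<a>', '<b>']
```
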
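
The JSON rules noted earlier (double quotes, true/false, null, string keys) show up in a dumps()/loads() round trip. This is a small sketch; the record dictionary is made-up sample data.

```python
import json

# a Python dict with a bool, a None, and a non-string key
record = {"name": "anna", "active": True, "score": None, 1: "one"}

# dumps(): Python dict -> JSON string
as_text = json.dumps(record)
print(as_text)   # {"name": "anna", "active": true, "score": null, "1": "one"}

# loads(): JSON string -> Python dict; the integer key comes back as a string
back = json.loads(as_text)
print(back["active"], back["score"], back["1"])   # True None one
```
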