I am a newbie to Python. I am working on a CSV file with over a million records. In the data, every Location has a unique ID (SiteID). I want to filter out and remove any records where SiteID or Location has no value, or where the SiteID does not match the Location. (Note: the script should print the line number and the mismatched field values for each such record.)
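import csv

lines = []
count = 0
expected = {}  # SiteID -> first Location seen for it (assumed to be the correct pairing)
# read and store all lines in a list
with open(r"air-quality-data-continuous.csv", 'r', newline='') as fp:
    lines = fp.readlines()
print(str(len(lines)) + ' lines in input file')
with open(r"crop.csv", 'w', newline='') as fp:
    fp.write(lines.pop(0))  # copy the header line
    # split each line into fields (pass delimiter=';' if the file is semicolon-separated);
    # numbering starts at 2 because line 1 is the header, and it assumes that
    # no quoted field contains a line break
    for number, row in enumerate(csv.reader(lines), start=2):
        # columns 4 and 17 are assumed to hold SiteID and Location
        site_id = row[4].strip() if len(row) > 4 else ''
        location = row[17].strip() if len(row) > 17 else ''
        if site_id in ('', 'NaN') or location in ('', 'NaN'):
            print('Empty SiteID or Location found in line: ' + str(number))
            continue
        if expected.setdefault(site_id, location) != location:
            print('Mismatch in line ' + str(number) + ': SiteID=' + site_id + ', Location=' + location)
            continue
        fp.write(lines[number - 2])
        count += 1
print(str(count + 1) + ' lines written to crop.csv')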
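
The task described above needs some definition of which Location "belongs" to a SiteID, and the question does not supply a reference table. One possible approach, sketched here on a tiny in-memory sample, is a two-pass filter: the first pass counts which Location appears most often for each SiteID, and the second pass drops rows that are empty or disagree with that majority. The function name `filter_records`, the sample site names, the header names `SiteID` and `Location`, and the majority rule itself are all assumptions made for illustration, not something taken from the real data.

```python
import csv
import io
from collections import Counter, defaultdict


def filter_records(src, dst, id_field="SiteID", loc_field="Location",
                   missing=("", "NaN"), delimiter=","):
    """Copy rows from src to dst, skipping rows with a missing or mismatched pair.

    Returns (rows_written, problems); problems holds
    (line_number, site_id, location, reason) tuples.
    """
    # Pass 1: count how often each Location appears for each SiteID.
    seen = defaultdict(Counter)
    for row in csv.DictReader(src, delimiter=delimiter):
        site_id = (row.get(id_field) or "").strip()
        location = (row.get(loc_field) or "").strip()
        if site_id not in missing and location not in missing:
            seen[site_id][location] += 1
    # Assumption: the most frequent Location for a SiteID is the correct one.
    expected = {sid: counts.most_common(1)[0][0] for sid, counts in seen.items()}

    # Pass 2: rewind and keep only rows that agree with the expected pairing.
    src.seek(0)
    reader = csv.DictReader(src, delimiter=delimiter)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter=delimiter, extrasaction="ignore")
    writer.writeheader()
    written, problems = 0, []
    for row in reader:
        site_id = (row.get(id_field) or "").strip()
        location = (row.get(loc_field) or "").strip()
        if site_id in missing or location in missing:
            problems.append((reader.line_num, site_id, location, "missing value"))
        elif expected[site_id] != location:
            problems.append((reader.line_num, site_id, location, "mismatch"))
        else:
            writer.writerow(row)
            written += 1
    return written, problems


# Hypothetical sample data; the real file has many more columns.
sample = (
    "SiteID,Location\n"
    "1,Alpha Road\n"
    "1,Alpha Road\n"
    "2,Beta Street\n"
    ",Beta Street\n"
    "1,Beta Street\n"
    "2,NaN\n"
)
out = io.StringIO()
written, problems = filter_records(io.StringIO(sample), out)
for line_no, site_id, location, reason in problems:
    print(f"line {line_no}: {reason} (SiteID={site_id!r}, Location={location!r})")
print(f"{written} records kept")
```

To run this on the real file, the same function should work with two file objects opened with `newline=''` (one for reading, one for writing) and, if needed, a different `delimiter`. Reading the file twice keeps memory use low, since only the per-SiteID counts are held in memory rather than a million lines.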
