I have significantly refactored the code and logic I am using to parse the daily files for temperature(s), weather condition and/or humidity. This was a result of refactoring the code to deal with the changes in the daily weather notes starting in 2021.
Refactor
Starting in 2022, I no longer put the time in the note with the temperature. I just assumed it was whenever I got up and started up the PC. I also started recording the weather at YVR as a comparison, or as a replacement if I couldn’t record my own numbers. That also gave me a value for the wind at the time I recorded it.
So I have a separate if block for the high/low case. The next if/elif block uses a new function, time_in_note(), to look for a time in the note and proceeds accordingly. The elif looks for YVR in the note and, if found, proceeds accordingly. Then there is a separate block of code looking for a humidity value. I slightly refactored the get_wthr_cond() function to look for a couple of other conditions (e.g. fog, light or heavy rain, etc.).
I also refactored the code that sorts out the year to use for naming the CSV file, and the start and end dates for the loop. I added a new variable, t_do, to track where I was in the loop if an error occurred. It wasn’t always easy to see what needed fixing in the data files.
Here’s the relevant bits of code. There was a fair bit of refactoring and bug fixing during development. I won’t bother covering all of that, as I didn’t take notes.
... ...
def get_wthr_cond(uc_nt):
    """Check for any weather condition(s) in the passed in daily note"""
    nt = uc_nt.casefold()
    d_cnd = ""
    if "mostly cloudy" in nt or "overcast" in nt:
        d_cnd = "mostly cloudy"
    elif "partly cloudy" in nt:
        d_cnd = "partly cloudy"
    elif "cloudy" in nt or "clouds" in nt:
        d_cnd = "cloudy"
    elif "light rain" in nt:
        d_cnd = "light rain"
    elif "heavy rain" in nt:
        d_cnd = "heavy rain"
    elif "rain " in nt or "raining" in nt:
        d_cnd = "rain"
    elif "light snow" in nt:
        d_cnd = "light snow"
    elif "heavy snow" in nt:
        d_cnd = "heavy snow"
    elif "snow " in nt or "snow," in nt or "snowing" in nt:
        d_cnd = "snow"
    elif "fog" in nt or "foggy" in nt:
        d_cnd = "fog"
    elif "mainly clear" in nt or "mainly sunny" in nt:
        d_cnd = "mainly clear"
    elif "mostly clear" in nt or "mostly sunny" in nt:
        d_cnd = "mostly clear"
    elif "clear " in nt or "sun " in nt or "sunny" in nt:
        d_cnd = "clear"
    return d_cnd

def time_in_note(nt):
    """Return True if the passed in note contains a time of the form hh:mm"""
    t_rgx = r"^.*?(\d{2}:\d{2})"
    rx = re.compile(t_rgx, re.IGNORECASE)
    # search the note passed to the function, not the global d_nt
    t_cur = rx.search(nt)
    return t_cur is not None
... ...
src_ndx = 0
s_pth = tmpr_srcs[src_ndx].__str__()
print(f"{src_ndx} -> {s_pth}")
tst_mon = "01"
tst_yr = s_pth[-8:-4]
s_dt = f"{tst_yr}.01.01"
e_dt = f"{tst_yr}.12.31"
# CSV file
csv_nm = f"tmpr_{tst_yr}.csv"
csv_pth = fl_pth/csv_nm
fh_csv = open(csv_pth, "w", encoding='utf8')
# let's try to sort out the regexs I will need to extract all the
# pertinent temperature data, expect there will be more than one
# and order of execution may matter
if tst_rgx:
    # I will first check if 'low' and 'high' are in the returned note/row
    # if so, I will attempt to extract the low and high temp values
    # then look at getting the value for the specified time in that note
    t_gnr = wd.get_temp_data(tmpr_srcs[src_ndx])
    for rw in t_gnr:
        # print(f"\n{rw}")
        d_dt, d_nt = rw.split(": ", 1)
        t_mn = d_dt[5:7]
        if d_dt > e_dt:
            break
        if d_dt >= s_dt:
            w_cnd = get_wthr_cond(d_nt)
            c_hgh, c_low, c_tmp = "", "", ""
            # may or may not be any time in note
            have_time = time_in_note(d_nt)
            t_do = ""
            try:
                if "low " in d_nt and "high " in d_nt and "high/low" not in d_nt:
                    t_do = "high"
                    h_rgx = r"^.*?high.*?([-\.0-9]+)\&"
                    rx = re.compile(h_rgx, re.IGNORECASE)
                    if rx is not None:
                        c_hgh = rx.search(d_nt).group(1)
                    t_do = "low"
                    l_rgx = r"^.*?low.*?([-\.0-9]+)\&"
                    rx = re.compile(l_rgx, re.IGNORECASE)
                    if rx is not None:
                        c_low = rx.search(d_nt).group(1)
                if have_time:
                    t_do = "have time if"
                    t_rgx = r"^.*?(\d{2}:\d{2}).*?([-\.0-9]+)\&"
                    rx = re.compile(t_rgx, re.IGNORECASE)
                    t_cur = rx.search(d_nt)
                    if t_cur is not None:
                        c_tmp = f"{t_cur.group(1)} + {t_cur.group(2)}"
                    else:
                        t_do = "have time else"
                        t_rgx = r"^.*([-\.0-9]+)\&.*?(\d{2}:\d{2})\&"
                        rx = re.compile(t_rgx, re.IGNORECASE)
                        t_cur = rx.search(d_nt)
                        if t_cur is not None:
                            c_tmp = f"{t_cur.group(2)} + {t_cur.group(1)}"
                elif "YVR" in d_nt or "yvr" in d_nt:
                    ## deal with changes made in note structure starting in 2022
                    t_do = "yvr"
                    t_rgx = r"^.*?([-\.0-9]+)\&"
                    rx = re.compile(t_rgx, re.IGNORECASE)
                    t_cur = rx.search(d_nt)
                    c_tmp = f"06:00 + {t_cur.group(1)}"
                t_do = "humidity"
                u_rgx = r"\(([\d\.]+\%)"
                rx = re.compile(u_rgx, re.IGNORECASE)
                u_cur = rx.search(d_nt)
                c_hum = u_cur.group(1) if u_cur else ""
                if c_tmp != "":
                    fh_csv.write(f"{d_dt},{c_hgh},{c_low},{c_tmp},{w_cnd},{c_hum}\n")
                    ct_hgh = (c_hgh != "" and c_hgh in c_tmp)
                    ct_low = (c_low != "" and c_low in c_tmp)
                    if ct_hgh or ct_low:
                        fh_csv.write(f"# {'curr == high' if ct_hgh else 'curr == low'}\n")
            except Exception as e:
                fh_csv.write(f"# Error: {e}\n# {d_dt}: {t_do}\n# {d_nt}\n")
fh_csv.close()
The above worked rather well. The number of error messages I was getting went down considerably, and I was able to quickly fix the errors that remained; it took me less than a couple of hours to eliminate them. I have yet to deal with the current temperature being equal to the 24-hour low or high. I don’t think I need to worry about the current temperature being equal to the 24-hour low, but the high case is something I will need to sort out.
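As a quick check of the regexes above, here is a minimal sketch that runs them against a single made-up note. The note text is purely an assumption on my part (the real notes may well be laid out differently), and first_group() is a hypothetical helper, not something from the code above. I have also dropped the leading ^.*? from the patterns, since search() scans the whole string anyway.

```python
import re

# Hypothetical note text; the real daily-note layout is an assumption here.
note = "07:30 -3.6&deg;C (85%), clear; 24 hour low -3.6&deg;C, high 2.4&deg;C"


def first_group(pattern, text):
    """Return group 1 of the first match, or "" when nothing matches."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(1) if m else ""


# 24 hour high and low: first number that is followed by "&" after the keyword
c_hgh = first_group(r"high.*?([-.0-9]+)&", note)
c_low = first_group(r"low.*?([-.0-9]+)&", note)
# humidity: a percentage directly after an opening bracket
c_hum = first_group(r"\(([\d.]+%)", note)
# time followed, somewhere later, by a temperature
m = re.search(r"(\d{2}:\d{2}).*?([-.0-9]+)&", note)
c_tm, c_tmp = (m.group(1), m.group(2)) if m else ("", "")

print(c_tm, c_tmp, c_low, c_hgh, c_hum)
# the case still to be sorted out: current temperature equal to the low or high
print("curr == low" if c_tmp == c_low else "", "curr == high" if c_tmp == c_hgh else "")
```

With this made-up note the current temperature matches the 24-hour low, which is the harmless case; a note where it matches the high is the one that needs special handling.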
Refactor Again
I am thinking that I will now refactor the code to write a proper CSV line and put the 24-hour high in the correct date’s CSV line. Which means I will need to account for the case where the current temperature equals the 24-hour high. I definitely can’t use that as the previous day’s high, so I will likely leave the previous day’s high empty.
Will definitely need to re-design the daily temperature table to account for the new and potentially missing values. For the moment I am going to assume the table will be amended to look as follows.
tp_tbl = f"""CREATE TABLE IF NOT EXISTS {c_tnm} (
    row_id INTEGER PRIMARY KEY,
    datetime TEXT NOT NULL,
    temperature REAL NOT NULL,
    dmin REAL,
    dmax REAL,
    condition TEXT,
    humidity REAL
);"""
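To get a feel for how missing values would sit in that table, here is a rough sketch using an in-memory SQLite database. The table name, the to_real() helper, the seven-field line layout (date, time, temperature, low, high, condition, humidity) and the way date and time are joined into one datetime string are all assumptions for illustration, not decisions I have actually made yet.

```python
import sqlite3

c_tnm = "daily_temps"  # hypothetical table name


def to_real(txt):
    """Convert a text field to float, or None (SQL NULL) when it is empty."""
    txt = txt.strip().rstrip("%")
    return float(txt) if txt else None


conn = sqlite3.connect(":memory:")
conn.execute(f"""CREATE TABLE IF NOT EXISTS {c_tnm} (
    row_id INTEGER PRIMARY KEY,
    datetime TEXT NOT NULL,
    temperature REAL NOT NULL,
    dmin REAL,
    dmax REAL,
    condition TEXT,
    humidity REAL
);""")

# assumed layout: date,time,temperature,low,high,condition,humidity
line = "2015.01.02,08:15,0.6,-3.6,,rain,"
dt, tm, tmp, low, hgh, cnd, hum = line.split(",")
conn.execute(
    f"INSERT INTO {c_tnm} (datetime, temperature, dmin, dmax, condition, humidity)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    (f"{dt} {tm}", float(tmp), to_real(low), to_real(hgh), cnd or None, to_real(hum)),
)
row = conn.execute(
    f"SELECT datetime, temperature, dmin, dmax, condition, humidity FROM {c_tnm}"
).fetchone()
print(row)
```

Empty fields end up as NULL rather than as empty strings or zeros, which is why only datetime and temperature carry NOT NULL.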
During the refactoring I found a few instances of typos with a comma rather than a decimal point. Found and fixed them all in all the files.
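For what it is worth, a search along the following lines would turn up that kind of typo. It is only a sketch: it assumes every temperature in a note is immediately followed by an "&" (as the regexes above expect), and fix_decimal_commas() and find_decimal_commas() are hypothetical helpers. I would review the matches before letting anything rewrite a file.

```python
import re

# digits, a comma, digits, then the "&" that starts an entity such as "&deg;"
comma_rgx = re.compile(r"(-?\d+),(\d+)(?=&)")


def find_decimal_commas(note):
    """Return the suspicious numbers found in the note, for manual review."""
    return [m.group(0) for m in comma_rgx.finditer(note)]


def fix_decimal_commas(note):
    """Replace the comma in each suspicious number with a decimal point."""
    return comma_rgx.sub(r"\1.\2", note)


# made-up note containing one typo
note = "08:15 0,6&deg;C, rain"
print(find_decimal_commas(note))
print(fix_decimal_commas(note))
```

The comma after the "C" is left alone because it is not sitting between two runs of digits.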
I decided I would need to maintain two lists of temperature/weather data: one for the current day and one for the previous day. Since the possible high for the previous day is in the current day’s data, I needed to keep the previous day’s data around until I could decide whether or not to update its high temperature. I would only do so if the 24-hour high was present and did not equal the current temperature. If it did equal the current temperature, I would place it, tentatively, as the current day’s high, which might change when the data for the next day gets loaded. As I am only processing one file at a time, there is no easy way for me to update the high for the last day of the year in each file.
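Stripped of the file handling, the rule for placing the 24-hour high can be sketched as below. This is a simplified illustration, not the code I ended up with: it assumes one entry per day, already in date order, and the dict keys, the assign_highs() name and the readings are all made up.

```python
def assign_highs(entries):
    """Place each 24-hour high on the day it most likely belongs to.

    entries: dicts with keys "date", "temp" and "high24" (strings, "" if
    missing), one per day, in date order.
    Returns a list of (date, temp, high) tuples.
    """
    rows = []
    for ent in entries:
        cur = {"date": ent["date"], "temp": ent["temp"], "high": ""}
        h24 = ent["high24"]
        if h24:
            if float(h24) != float(ent["temp"]):
                # high was reached before this reading: credit the previous day,
                # overwriting any tentative value stored there earlier
                if rows:
                    rows[-1]["high"] = h24
            else:
                # reading equals the 24-hour high: tentatively today's high
                cur["high"] = h24
        rows.append(cur)
    return [(r["date"], r["temp"], r["high"]) for r in rows]


# made-up readings
days = [
    {"date": "day 1", "temp": "5.8", "high24": "5.8"},
    {"date": "day 2", "temp": "5.5", "high24": "8.8"},
    {"date": "day 3", "temp": "6.2", "high24": "6.2"},
]
for rw in assign_highs(days):
    print(rw)
```

Day 1 first gets a tentative high of 5.8, which is replaced by 8.8 once day 2 is read. Day 3 keeps its tentative high because there is no following day to correct it, which is the end-of-file problem mentioned above.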
I also had to deal with the situation where there were multiple entries for a given day. Which required me to modify the order of code execution in the daily note processing loop. I also moved the daily note parsing code into a new function. Unfortunately, I have not yet sorted out how to handle the case where there are multiple lows and/or highs for a given day. The way my code runs, I have no way to deal with the case where there are more than two entries for a given date.
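One possible way to handle any number of entries for a date, which is not what my code currently does, would be to collect the lows and highs for the date and take the numeric minimum and maximum. A minimal sketch, with a hypothetical merge_low_high() helper and made-up values:

```python
def merge_low_high(entries):
    """Combine the lows and highs of all entries for ONE date.

    entries: list of (low, high) string pairs, "" where a value is missing.
    Returns (low, high) as strings, "" if no value was available.
    """
    lows = [float(lo) for lo, _ in entries if lo]
    highs = [float(hi) for _, hi in entries if hi]
    low = f"{min(lows)}" if lows else ""
    high = f"{max(highs)}" if highs else ""
    return low, high


# three made-up entries for the same date
same_day = [("9.5", "12.1"), ("", "10.2"), ("10.0", "")]
print(merge_low_high(same_day))

# why the values are converted first: as strings, "9.5" sorts after "10.0"
print("9.5" < "10.0", 9.5 < 10.0)
```

Converting to float before comparing matters here, since comparing the values as strings goes character by character and empty strings sort before everything else.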
Sorry, lots of code duplication from previous and current posts. And, a lot of debugging and troubleshooting to get to the following that I am not going to discuss.
... ...
tst_rgx = False
tst_csv = True
... ...
def parse_note(d_nt):
    """Parse the daily note to obtain temperature/weather related data

    Params:
        d_nt: the daily note, text, to parse

    Returns:
        if no error, returns any temperature/weather data found, as strings,
        any or all could be empty strings
        - (curr time, curr temp, daily low, daily high, weather condition, humidity)
        otherwise, returns error related data
        - ("Error", error msg, where error occurred in parsing loop)
    """
    w_cnd = get_wthr_cond(d_nt)
    c_hgh, c_low, c_tmp, c_tm, c_hum = "", "", "", "", ""
    # may or may not be any time in note
    have_time = time_in_note(d_nt)
    t_do = ""
    try:
        if "low " in d_nt and "high " in d_nt and "high/low" not in d_nt:
            t_do = "high"
            h_rgx = r"^.*?high.*?([-\.0-9]+)\&"
            rx = re.compile(h_rgx, re.IGNORECASE)
            if rx is not None:
                c_hgh = rx.search(d_nt).group(1)
            t_do = "low"
            l_rgx = r"^.*?low.*?([-\.0-9]+)\&"
            rx = re.compile(l_rgx, re.IGNORECASE)
            if rx is not None:
                c_low = rx.search(d_nt).group(1)
        if have_time:
            t_do = "have time if"
            t_rgx = r"^.*?(\d{2}:\d{2}).*?([-\.0-9]+)\&"
            rx = re.compile(t_rgx, re.IGNORECASE)
            t_cur = rx.search(d_nt)
            if t_cur is not None:
                c_tm = t_cur.group(1)
                c_tmp = t_cur.group(2)
            else:
                t_do = "have time else"
                t_rgx = r"^.*([-\.0-9]+)\&.*?(\d{2}:\d{2})\&"
                rx = re.compile(t_rgx, re.IGNORECASE)
                t_cur = rx.search(d_nt)
                if t_cur is not None:
                    c_tm = t_cur.group(2)
                    c_tmp = t_cur.group(1)
        elif "YVR" in d_nt or "yvr" in d_nt:
            t_do = "yvr"
            t_rgx = r"^.*?([-\.0-9]+)\&"
            rx = re.compile(t_rgx, re.IGNORECASE)
            t_cur = rx.search(d_nt)
            c_tm = "06:00"
            c_tmp = t_cur.group(1)
        t_do = "humidity"
        u_rgx = r"\(([\d\.]+\%)"
        rx = re.compile(u_rgx, re.IGNORECASE)
        u_cur = rx.search(d_nt)
        c_hum = u_cur.group(1) if u_cur else ""
        return (c_tm, c_tmp, c_low, c_hgh, w_cnd, c_hum)
    except Exception as e:
        return ("Error", e, t_do)
... ...
if tst_rgx or tst_csv:
    t_gnr = wd.get_temp_data(tmpr_srcs[src_ndx])
# let's try to sort out the regexs I will need to extract all the
# pertinent temperature data, expect there will be more than one
# and order of execution may matter
if tst_rgx:
    ... ...
if tst_csv:
    pt_data, ct_data = [''] * 7, [''] * 7
    p_dt, p_tm, c_dt, c_tm = "", "", "", ""
    same_dt = False
    for rw in t_gnr:
        # print(f"\n{rw}")
        d_dt, d_nt = rw.split(": ", 1)
        t_mn = d_dt[5:7]
        if d_dt >= s_dt:
            p_rslt = parse_note(d_nt)
            # on failure parse_note() returns "Error" as the first item
            if p_rslt[0] != "Error":
                n_tm, c_tmp, c_low, c_hgh, w_cnd, c_hum = p_rslt
                if d_dt != c_dt:
                    p_dt, c_dt = c_dt, d_dt
                else:
                    p_tm, c_tm = c_tm, n_tm
                if pt_data[2]:
                    fh_csv.write(f'{",".join(pt_data)}\n')
                pt_data, ct_data = ct_data, [''] * 7
                if c_tmp != "":
                    ct_data = [c_dt, n_tm, c_tmp, c_low, "", w_cnd, c_hum]
                    if c_hgh != c_tmp:
                        pt_data[4] = c_hgh
                    else:
                        ct_data[4] = c_hgh
                    if pt_data[0] == ct_data[0]:
                        # if two entries for same date, make sure 1st entry has correct low and high
                        # and second has no low or high
                        # note: these compare the values as strings, not as numbers
                        if ct_data[4] > pt_data[4]:
                            pt_data[4] = ct_data[4]
                            ct_data[4] = ""
                        if ct_data[3] < pt_data[3]:
                            pt_data[3] = ct_data[3]
                            ct_data[3] = ""
                        # clear both the low (3) and the high (4) of the second entry
                        ct_data[3], ct_data[4] = "", ""
            else:
                print(p_rslt)
    if c_hgh != c_tmp:
        pt_data[4] = c_hgh
    else:
        ct_data[4] = c_hgh
    if pt_data[2]:
        fh_csv.write(f'{",".join(pt_data)}\n')
    if ct_data[2]:
        fh_csv.write(f'{",".join(ct_data)}\n')
fh_csv.close()
And, after processing the 2015 note file, I have 290 rows of data in the appropriate CSV file. A check of the first and last 10 or so entries seems to say that the code is working as intended. I am for now going to assume that is the case. Here’s a short sample.
2015.01.01,07:30,-3.6,-3.6,2.4,clear,
2015.01.02,08:15,0.6,-3.6,,rain,
2015.01.03,07:00,1.3,,2.7,snow,
2015.01.04,08:40,1.1,,,,
2015.01.04,09:20,0.9,,,snow,
2015.01.05,19:45,5.8,5.8,,rain,
2015.01.06,08:30,5.5,,8.8,,
2015.01.07,14:17,6.2,3.5,6.2,,
... ...
2015.07.01,14:59,30.4,18.0,32.2,,
2015.07.03,09:29,24.0,18.3,30.5,,
2015.07.04,09:38,24.3,17.7,20.2,,
2015.07.05,09:04,26.1,19.1,33.9,,
2015.07.06,10:32,23.1,18.5,28.3,,
2015.07.07,12:07,22.6,16.9,,,
... ...
2015.12.24,09:20,2.4,1.9,2.7,,
2015.12.25,10:14,0.2,-0.4,3.0,,
2015.12.26,10:33,1.0,-0.2,2.7,rain,
2015.12.27,10:30,2.1,0.8,,rain,
2015.12.28,10:21,2.7,1.0,2.7,,
2015.12.31,10:02,-1.2,-3.8,,,
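Rather than only eyeballing the first and last few rows, a small sanity check over the whole file could flag rows worth a second look. The sketch below is an assumption about what "inconsistent" means (a low above, or a high below, the recorded temperature), check_rows() is a hypothetical helper, and it reads three of the sample rows from a string instead of the real CSV file.

```python
import csv
import io

# three rows copied from the sample above
sample = """\
2015.01.01,07:30,-3.6,-3.6,2.4,clear,
2015.07.04,09:38,24.3,17.7,20.2,,
2015.12.28,10:21,2.7,1.0,2.7,,
"""


def check_rows(fh):
    """Yield (date, problem) for rows whose values look inconsistent."""
    for rw in csv.reader(fh):
        if len(rw) != 7:
            yield (rw[0] if rw else "", "expected 7 fields")
            continue
        dt, _tm, tmp, low, hgh = rw[:5]
        t = float(tmp)
        if low and float(low) > t:
            yield (dt, "low above current temperature")
        if hgh and float(hgh) < t:
            yield (dt, "high below current temperature")


problems = list(check_rows(io.StringIO(sample)))
print(problems)
```

A flagged row is not necessarily wrong; it may point to a typo in the notes or to a high that landed on the wrong date, so it is only a prompt to go and look.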
Done
Well, possibly more to be said and/or done; but, I think I will call this post finished.
Next post I will look at modifying the temperature table and start on the code to parse the files and update the table. Will be interesting to see how things work, as there will be two or three thousand rows in the table right off the bat.
Until we meet again, when you have time, do look at refactoring your code. One never knows what you will find.