Okay, going to have a shot at the daily note files. They were originally only meant to record garden- and yard-related notes, but eventually became a sort of diary of anything I felt like writing down.
Refactor Generator
Decided to refactor the file data generator to work with both types of files. Here’s the first attempt.
def get_rg_data(rg_fl, f_typ="rg"):
    '''Generator for raingauge php files; only want certain lines, so decided to use a generator.
    Parameters:
      rg_fl: full path to file
      f_typ: which file version the file is, 'rg' = rain gauge style, 'dn' = daily note style
    '''
    # d_yr = rg_fl.name[10:14]
    # print(d_yr)
    with open(rg_fl, "r") as ds:
        c_dt = ""
        for ln in ds:
            # print(ndx)
            ln = ln.strip()
            do_yld = False
            # slices rather than single indexes, so short lines don't raise IndexError
            if f_typ == "rg" and ln[:4] == "<br>" and ln[8:9] == "." and ln[11:12] == "." and ln[14:15] != ":":
                do_yld = True
            if f_typ == "dn":
                if ln[:4] == "<h3>":
                    c_dt = ln[4:14]
                # get list items that may contain rain (mm) or snowfall (cm) info
                if "no rain" not in ln and "mm " in ln or "cm " in ln:
                    do_yld = True
                    ln = f"{c_dt} {ln}"
            if do_yld:
                yield ln
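One thing worth checking in the daily note branch is how Python groups the filter condition. `and` binds tighter than `or`, so the test reads as `("no rain" not in ln and "mm " in ln) or "cm " in ln`. Here is a small sketch of what that grouping lets through, using made-up sample lines. The helper names `keep_as_written` and `keep_grouped` are hypothetical and only there for comparison. The second grouping is a guess at an alternative, not something the generator above does.

```python
def keep_as_written(ln):
    # Same expression as in the generator; Python evaluates it as (A and B) or C.
    return "no rain" not in ln and "mm " in ln or "cm " in ln


def keep_grouped(ln):
    # Alternative grouping, shown only for comparison (an assumption, not the original code).
    return "no rain" not in ln and ("mm " in ln or "cm " in ln)


# Made-up sample lines in the style of the daily notes.
samples = [
    "<li>08:00 & 4.3mm rain last 24 hours.</li>",
    "<li>08:00 & ?mm/no rain last 24 hours, 19.3mm for the month.</li>",
    "<li>no rain overnight, but 2 cm of snow still on the lawn.</li>",
]

for s in samples:
    print(keep_as_written(s), keep_grouped(s))
```

The two versions only differ on the third sample: a line that mentions both "no rain" and a "cm " amount gets through the expression as written, but not the grouped one.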
Then, in data2db.py, I wrote some code to print out the first few lines returned by the generator, using webs\BaRKqgs\gdn\bark_gdn_2022.php as the source. In a new if block, of course. That bit of code looks like the following. I did do some refactoring, so this is not actually the very first code attempt.
if __name__ == "__main__":
    tst_1 = False
    tst_2 = False
    tst_3 = False
    tst_4 = False
    do_rg_2019 = False
    do_rg_2021 = False
    do_dn = True
    ... ...
    if do_dn:
        chk_i_rf = False
        proc_fl = True
        if proc_fl:
            # process the data in the daily note file(s) month by month
            # add daily rainfall to database, record dates done
            p_mn, c_mn = "", ""   # prev mon, curr mon
            rf_rws, t_mn, m_dts = [], 0, {}   # rainfall rows, tot mon, mon dates
            # set up generator for combined data file
            # print(d_srcs[11])
            # exit(0)
            gen_d = get_rg_data(d_srcs[11], f_typ="dn")
            c_lp = 0
            s_tm = time.perf_counter()
            for d_rw in gen_d:
                if c_lp == 10:
                    break
                c_dt = d_rw[:10]
                c_ln = d_rw[11:]
                print(f"{c_dt}\n\t{c_ln}")
                c_lp += 1
            e_tm = time.perf_counter()
The output is still frightening.
(dbd-3.13) PS R:\learn\dashboard> python data2db.py
2022.01.02
<li>08:00, all gauges still non-functional. Maybe 1 cm of snow overnight.</li>
2022.01.03
<li>08:00 & 1.5mm ?? rain last 24 hours, 1.5mm for the month. Tube: ~?mm.</li>
2022.01.04
<li>With a little more light, I am guessing a little more than 1 cm snow so far.</li>
2022.01.04
<li>08:00 & 0.7mm rain last 24 hours, 2.2mm for the month. Tube: ~?mm.</li>
2022.01.04
<li>Started snowing while we were working on supper. Likely around 15:15 or so. As of 15:45 big flakes coming down in a steady stream. If this keeps up all night, will have another 5-10 cm on the ground by tomorrow morning.</li>
2022.01.04
<li>15:55, snowfall looking a fair bit heavier now than 10 minutes ago. At this rate for the night, I'd bet on at least 10cm locally. If so, would definitely be a snowplow rather than a snow shovel moment.</li>
2022.01.05
<li>Winter storm warning issued for tonight through Thursday. Anyhwere from 10 - 30 cm depending on location.</li>
2022.01.05
<li>08:00 & 4.1mm rain last 24 hours, 6.3mm for the month. That certainly doesn't account for the latest snowfall. Which has once again filled the collectors of both rain gauges. Tube: ~?mm. </li>
2022.01.05
<li>Early to mid afternoon shovelled the patio and used snowblower on the driveway. Left behind lots of ice — the stuff that froze the other day before last night's snowfall. ~3.8cm snow on the driveway. Probably something like 3mm rainfall equivalent. Really quite slippery. Also turned on hummer hearths as it was -3 to -4° when I checked before coming in.</li>
2022.01.06
<li>08:00 & gauges all frozen or covered with snow. 6.3mm for the month. Tube: ~?mm.</li>
proc time: 0.004
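For peeking at the first few rows of a generator, `itertools.islice` from the standard library is an alternative to the `c_lp` counter and `break`. A minimal sketch, with a stand-in generator `fake_rows` (hypothetical) in place of `get_rg_data`, so it runs without the .php file:

```python
from itertools import islice


def fake_rows():
    # Stand-in for get_rg_data(); yields "date <li>...</li>" strings in the same shape.
    for day in range(1, 32):
        yield f"2022.01.{day:02d} <li>sample note {day}</li>"


# Take only the first ten rows; the generator is not consumed beyond that.
first_ten = list(islice(fake_rows(), 10))

for d_rw in first_ten:
    c_dt, c_ln = d_rw[:10], d_rw[11:]
    print(f"{c_dt}\n\t{c_ln}")
```

Whether that reads better than the explicit counter is a matter of taste. The counter has the advantage that it can be moved inside a condition, which `islice` cannot do on its own.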
Okay, let’s refactor the code and look at the first few rows for February, 2022.
if c_dt[:7] == "2022.02":
    print(f"{c_dt}\n\t{c_ln}")
    c_lp += 1
And in the terminal I got the following.
(dbd-3.13) PS R:\learn\dashboard> python data2db.py
2022.02.01
<li>Special weather statement on weather.gc.ca. Snowfall of 2-5 cm headed our way for tomorrow. Glad we can stay home.</li>
2022.02.03
<li>08:00 & e-g 6.8 mm rain last 24 hours, 6.8mm for the month; probably includes some melted snow. Tube: ~?mm.</li>
2022.02.04
<li>08:00 & e-g 12.5mm rain last 24 hours, 19.3mm for the month. Tube: ~?mm.</li>
2022.02.05
<li>08:00 & 4.3mm rain last 24 hours, 23.6mm for the month. Tube: ~?mm.</li>
2022.02.09
<li>08:00 & 4.3mm rain last 24 hours, 28.1mm for the month. Tube: ~?mm.</li>
2022.02.11
<li>08:00 & 0.3mm rain last 24 hours, 28.4mm for the month. Tube: ~?mm.</li>
2022.02.14
<li>08:00 & 6.4mm rain last 24 hours, 34.8mm for the month. Tube: ~?mm.</li>
2022.02.15
<li>08:00 & 1.2mm rain last 24 hours, 36.0mm for the month. Tube: ~?mm.</li>
2022.02.19
<li>Rain started sometime after 07:00. 08:00 & 1.8mm rain last 24 hours, 37.8mm for the month. Tube: ~?mm.</li>
2022.02.20
<li>08:00 & 7.1mm rain last 24 hours, 44.9mm for the month. Tube: ~?mm.</li>
proc time: 0.005
Well, that gives me a little hope. Okay, let’s try parsing February and see what we get.
Parse Test Month
I am going to refactor the parsing function to accept the file type and parse the supplied data accordingly.
def parse_rg_row(d_rw, f_typ="rg"):
    """Parse supplied string to obtain the recorded time and rainfall/snowfall amount at the time recorded.
    params:
      d_rw: the string to be parsed
      f_typ: the type of source file from which the data has been taken, "rg", "dn"
    returns:
      result of appropriate regex match/search
    """
    if f_typ == "rg":
        rgx = r"^<br>(\d{4}\.\d{2}\.\d{2}) ~?(.*?): [<~]?(.*?) ?mm"
    elif f_typ == "dn":
        # at least one note had sentence before the time and rainfall amount
        # ampersand is the HTML entity &amp; in the .php source (displays as & in the output)
        rgx = r".*?(\d{2}:\d{2}) \&amp\;.*?(\d+\.\d+) ?(mm|cm)"
    else:
        raise ValueError(f"unknown file type: {f_typ}")
    rx = re.compile(rgx, re.IGNORECASE)
    return rx.match(d_rw)
And a quick test using those first few days of February, 2022.
if c_dt[:7] == "2022.02":
    print(f"{c_dt}\n\t{c_ln}")
    mtch = parse_rg_row(c_ln, f_typ="dn")
    if not mtch:
        print(f"no match => {mtch}")
    else:
        d_tm, d_rf, d_unit = mtch.group(1), mtch.group(2), mtch.group(3)
        print(f"\tmatch -> time: {d_tm}, r/sf: {d_rf}, units: {d_unit}")
    c_lp += 1
(dbd-3.13) PS R:\learn\dashboard> python data2db.py
2022.02.01
<li>Special weather statement on weather.gc.ca. Snowfall of 2-5 cm headed our way for tomorrow. Glad we can stay home.</li>
no match => None
2022.02.03
<li>08:00 & e-g 6.8 mm rain last 24 hours, 6.8mm for the month; probably includes some melted snow. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 6.8, units: mm
2022.02.04
<li>08:00 & e-g 12.5mm rain last 24 hours, 19.3mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 12.5, units: mm
2022.02.05
<li>08:00 & 4.3mm rain last 24 hours, 23.6mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 4.3, units: mm
2022.02.09
<li>08:00 & 4.3mm rain last 24 hours, 28.1mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 4.3, units: mm
2022.02.11
<li>08:00 & 0.3mm rain last 24 hours, 28.4mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 0.3, units: mm
2022.02.14
<li>08:00 & 6.4mm rain last 24 hours, 34.8mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 6.4, units: mm
2022.02.15
<li>08:00 & 1.2mm rain last 24 hours, 36.0mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 1.2, units: mm
2022.02.19
<li>Rain started sometime after 07:00. 08:00 & 1.8mm rain last 24 hours, 37.8mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 1.8, units: mm
2022.02.20
<li>08:00 & 7.1mm rain last 24 hours, 44.9mm for the month. Tube: ~?mm.</li>
match -> time: 08:00, r/sf: 7.1, units: mm
proc time: 0.007
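The daily note pattern can also be exercised on its own, away from the file. This is a minimal sketch under one assumption: that the .php source stores the ampersand after the time as the HTML entity `&amp;`. The name `DN_RGX` and the sample lines are mine, shortened from the notes.

```python
import re

# Assumption: the raw list items contain "&amp;" rather than a bare "&".
DN_RGX = re.compile(r".*?(\d{2}:\d{2}) &amp;.*?(\d+\.\d+) ?(mm|cm)", re.IGNORECASE)

lines = [
    "<li>08:00 &amp; e-g 6.8 mm rain last 24 hours, 6.8mm for the month.</li>",
    "<li>Rain started sometime after 07:00. 08:00 &amp; 1.8mm rain last 24 hours.</li>",
    "<li>Snowfall of 2-5 cm headed our way for tomorrow.</li>",
]

for ln in lines:
    m = DN_RGX.match(ln)
    # groups() -> (time, amount, unit) when the line has the expected shape
    print(m.groups() if m else None)
```

The second line shows why the pattern starts with a lazy `.*?`: the earlier "07:00" is skipped because it is not followed by the ampersand. Note too that the amount group needs a decimal point, so a value written as a whole number would not match.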
Okay, let’s do the whole of February.
c_mon = ""
t_mon = 0
s_tm = time.perf_counter()
for d_rw in gen_d:
    c_dt = d_rw[:10]
    c_ln = d_rw[11:]
    if c_mon != c_dt[5:7]:
        c_mon = c_dt[5:7]
        t_mon = 0
    if c_mon == "02":
        mtch = parse_rg_row(c_ln, f_typ="dn")
        if not mtch:
            print(f"\nno match -> {c_dt}: {c_ln}\n")
        else:
            d_tm, d_rf, d_unit = mtch.group(1), mtch.group(2), mtch.group(3)
            c_rf = float(d_rf)
            if d_unit == "mm":
                t_mon += c_rf
            else:
                t_mon += c_rf / 10
            print(f"{c_dt} -> time: {d_tm}, r/sf: {c_rf}, units: {d_unit}, mon2dt: {t_mon:.2f}")
    if c_mon == "03":
        break
e_tm = time.perf_counter()
print(f"proc time: {(e_tm - s_tm):.3f}")
And, after fixing one entry for February, I got the following in the terminal.
(dbd-3.13) PS R:\learn\dashboard> python data2db.py
no match -> 2022.02.01: <li>Special weather statement on weather.gc.ca. Snowfall of 2-5 cm headed our way for tomorrow. Glad we can stay home.</li>
2022.02.03 -> time: 08:00, r/sf: 6.8, units: mm, mon2dt: 6.80
2022.02.04 -> time: 08:00, r/sf: 12.5, units: mm, mon2dt: 19.30
2022.02.05 -> time: 08:00, r/sf: 4.3, units: mm, mon2dt: 23.60
2022.02.08 -> time: 08:00, r/sf: 4.5, units: mm, mon2dt: 28.10
2022.02.09 -> time: 08:00, r/sf: 4.3, units: mm, mon2dt: 32.40
2022.02.11 -> time: 08:00, r/sf: 0.3, units: mm, mon2dt: 32.70
2022.02.14 -> time: 08:00, r/sf: 6.4, units: mm, mon2dt: 39.10
2022.02.15 -> time: 08:00, r/sf: 1.2, units: mm, mon2dt: 40.30
2022.02.19 -> time: 08:00, r/sf: 1.8, units: mm, mon2dt: 42.10
2022.02.20 -> time: 08:00, r/sf: 7.1, units: mm, mon2dt: 49.20
2022.02.21 -> time: 08:00, r/sf: 0.5, units: mm, mon2dt: 49.70
no match -> 2022.02.24: <li>Woke to a cm or so of snow covering the landscape — natural and man-made. YVR says mainly clear, -3.5°C, wind E 8 km/h (wind chill -7). Forecast says back to more normal temperatures and rain starting Saturday.</li>
2022.02.25 -> time: 08:00, r/sf: 0.8, units: mm, mon2dt: 50.50
2022.02.27 -> time: 08:00, r/sf: 14.7, units: mm, mon2dt: 65.20
no match -> 2022.02.28: <li>Rain to start the day. 45 mm rain since Saturday morning. YVR: light drizzle, 7.0°C, wind E 22 km/h. E-thermo on garage has us at 6°C, but only 85% humidity versus 99% at YVR. Expect ours is wrong and likely temperature it is reporting is incorrect.</li>
2022.02.28 -> time: 08:00, r/sf: 30.7, units: mm, mon2dt: 95.90
no match -> 2022.02.28: <?php /* 97.0 - 5.4 = 91.6, 91.6 - 60.9 = 30.7 mm */ ?>
proc time: 0.007
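None of the February matches came back in cm, so the `else` branch in the loop (centimetres divided by 10) has not really been tested yet. How snow depth should count toward a rainfall total depends on the snow-to-water ratio one assumes. A commonly quoted rule of thumb is roughly 10:1 by depth, which would make 1 cm of snow about 1 mm of water. Here is a sketch with a hypothetical helper `to_mm_water` and a `snow_ratio` parameter, both my assumptions rather than part of data2db.py:

```python
def to_mm_water(amount, unit, snow_ratio=10.0):
    """Convert a recorded amount to millimetres of water.

    amount: numeric value taken from the note
    unit: "mm" (rain) or "cm" (snow depth)
    snow_ratio: assumed ratio of snow depth to water depth (rule of thumb, not measured)
    """
    unit = unit.lower()
    if unit == "mm":
        return amount
    if unit == "cm":
        # cm of snow -> mm of snow -> mm of water
        return amount * 10.0 / snow_ratio
    raise ValueError(f"unexpected unit: {unit!r}")


print(to_mm_water(6.8, "mm"))
print(to_mm_water(3.8, "cm"))
```

With the default ratio, 3.8 cm of snow comes out as about 3.8 mm of water, in the same ballpark as the "something like 3mm" guess in the 2022.01.05 note. Dividing the centimetre value by 10 would give 0.38 instead, so that branch may deserve a second look before any snowfall months get processed.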
Now, 95.90 doesn’t match the monthly total in the spreadsheet. But if you look at the notes, I made an error calculating the monthly total.
2022.02.04
- Rain to start the day. Much heavier than yesterday morning by the sounds in the downspout off the bedroom. YVR says cloudy, 5.4°C, wind E 18 km/h.
- 08:00 & e-g 12.5mm rain last 24 hours, 19.3mm for the month. Tube: ~?mm.
2022.02.05
- Cloudy start to the day, doesn't appear to be raining. YVR says mostly cloudy, 5.3°C, wind SSW 11 km/h.
- 08:00 & 4.3mm rain last 24 hours, 23.6mm for the month. Tube: ~?mm.
2022.02.06
- Mostly clear this morning. YVR says mostly cloudy, 5.3°C, wind SSW 11 km/h.
- 08:00 & ?mm/no rain last 24 hours, 19.3mm for the month. Tube: ~?mm.
2022.02.07
- Believe it is a cloudy start to this Monday. YVR says mostly cloudy, 6.2°C, wind E 16 km/h. Seems to have been a wee touch of rain overnight. Forecast has possibility of some showers for the early morning and some wind as well.
- Busy elsewhere I guess, missed recording. 08:00 & ?mm/no rain last 24 hours, 19.3mm for the month. Tube: ~?mm.
2022.02.08
- Looks mostly cloudy. YVR says mostly cloudy, 4.6°C, wind E 17 km/h. Hummerhearths not currently on. Forecast says cooler nights later in the week. So won't take them off just yet.
- 08:00 & 4.5mm/no rain last 48 hours, 23.8mm for the month. Tube: ~?mm.
In the notes, the running total slipped back from 23.6 mm to 19.3 mm on February 6th. So the 4.5 mm of rainfall for February 8th was added to the incorrect monthly total (19.3 + 4.5 = 23.8, rather than 23.6 + 4.5 = 28.1). The generator also did not present that list item, because it contained the string “no rain”. I have fixed that entry. I believe my 95.9 mm monthly total is the correct value.
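The arithmetic is easy enough to check directly. A small sketch; the list is just the daily amounts copied from the terminal output above, and the variable names are mine:

```python
# Daily amounts (mm) for February 2022, copied from the parser output.
feb_mm = [6.8, 12.5, 4.3, 4.5, 4.3, 0.3, 6.4, 1.2, 1.8, 7.1, 0.5, 0.8, 14.7, 30.7]

# Round to one decimal to hide floating-point noise in the sum.
total = round(sum(feb_mm), 1)
print(total)  # 95.9

# Running total through the 8th: what the notes recorded vs. what carrying 23.6 forward gives.
recorded = round(19.3 + 4.5, 1)
expected = round(23.6 + 4.5, 1)
print(recorded, expected)  # 23.8 28.1
```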
Done for Now
Well, another rather short post. But I really need to take a break and do some thinking.
I am afraid I am going to have to go through each of the remaining years and months using the above approach. And correct any issues before committing the data to the database tables.
I have even thought I should just bite the bullet and manually process the four remaining files/years into some other format and refactor my processing code accordingly.
A bit upset I wasn’t smarter, or at least more consistent, in my approach to recording the rainfall or snowfall equivalent each day.
Until next time, may you accept the challenges your projects throw at you. Whatever the effort required to overcome them.