Okay, I have processed all the missing weather data into yearly files, wcond_yyyy.csv, where yyyy = 2015-2025.

Time and File Sizes

Thought I’d show the time it took to process each year and the size, in rows, of the resulting CSV files.

Year  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  2025
Time  1:15  1:01  0:59  0:59  1:02  0:57  1:04  0:08  0:01  0:02  0:02
Rows   308   223   250   316   269   258   242    30     1     5     7

It appears my more consistent recording style in recent years has paid off.

Updating the Database

Okay, now to use those CSV files to update the database. I will do so year by year. During development and testing I will use a copy of the current database.

Lots of duplication in the following code, both within the block and repeated from previous blocks. But, once again, this is a one-time effort. So, I decided not to worry about keeping it DRY or following any other good coding practices.

    if wc_to_db:
      updt_db = True
      fl_dn = []
      src_ndx = 0
      f_yr = tmpr_srcs[src_ndx].__str__()[-8:-4]
      n_tbl = rfall.tnms["tp_tnm"]
      e_dt = f"{f_yr}.01.31"

      fl_pth = cwd/"data"
      fl_nm = f"wcond_{f_yr}.csv"
      d_pth = fl_pth/fl_nm
      print(f"\n{src_ndx} -> {d_pth}; end date: {e_dt}")

      d_updt, d_nsrt = [], []
      q_updt, q_nsrt = [], []
      c_mon, p_mon = "", ""
      t_nsrt, t_updt = 0, 0

      # get count of rows in table
      q_rw = f"""SELECT count('datetime') FROM {n_tbl}"""
      rws = rfall.qry_exec(q_rw)
      print(f"\t{rws} rows currently in {n_tbl}")

      with open(d_pth, "r", newline="") as wfl:
        wc_rdr = csv.reader(wfl, delimiter=',')
        for r_wc in wc_rdr:
          if r_wc[0][:10] > e_dt:
            break
          t_mn = r_wc[0][5:7]
          if c_mon != t_mn:
            p_mon, c_mon = c_mon, t_mn
            if d_nsrt:
              if updt_db:  
                # note: is_many=True because we pass a list of one or more tuples, not a single tuple
                q_n = f"""INSERT INTO {n_tbl} VALUES (NULL, ?, ?, ?, ?, ?, ?);"""
                rfall.qry_nsrt(q_n, d_nsrt, is_many=True)
              else:
                print("\nlist of table inserts")
                for wc in d_nsrt:
                  print(wc)
              t_nsrt += len(d_nsrt)
              d_nsrt = []
            if d_updt:
              if updt_db:
                q_u = f"""UPDATE {n_tbl} SET condition=? WHERE datetime=?"""
                rfall.qry_updt(q_u, d_updt, is_many=True)
              else:
                print("\nlist of table updates")
                for wc in d_updt:
                  print(wc)
              t_updt += len(d_updt)    
              d_updt = []
            
          chk_wc = get_wc_4dt(r_wc[0][:10])
          f_wc = ""
          if len(chk_wc) > 1:
            for t_wc in chk_wc:
              if not f_wc:
                f_wc = t_wc[2]
          elif len(chk_wc) == 1:
            f_wc = chk_wc[0][2]
          if chk_wc and f_wc != "":
            continue
          else:
            if chk_wc:
              d_updt.append((f"{r_wc[1]}", f"{chk_wc[0][0]} {chk_wc[0][1]}"))
            else:
              if len(r_wc[0]) == 16:
                d_nsrt.append((r_wc[0], None, None, None, r_wc[1], None))
              else:
                d_nsrt.append((f"{r_wc[0]} 06:00", None, None, None, r_wc[1], None))

        # take care of last month
        if d_nsrt:
          if updt_db:
            q_n = f"""INSERT INTO {n_tbl} VALUES (NULL, ?, ?, ?, ?, ?, ?);"""
            rfall.qry_nsrt(q_n, d_nsrt, is_many=True)
            t_nsrt += len(d_nsrt)
          else:
            print("\nlist of table inserts")
            for wc in d_nsrt:
              print(wc)
          d_nsrt = []
        if d_updt:
          if updt_db:
            q_u = f"""UPDATE {n_tbl} SET condition=? WHERE datetime=?"""
            rfall.qry_updt(q_u, d_updt, is_many=True)
            t_updt += len(d_updt)
          else:
            print("\nlist of table updates")
            for wc in d_updt:
              print(wc)
          d_updt = []

        print(f"total inserts: {t_nsrt}, total updates: {t_updt}")
        # get count of rows in table
        q_rw = f"""SELECT count('datetime') FROM {n_tbl}"""
        rws = rfall.qry_exec(q_rw)
        print(f"\t{rws} rows now in {n_tbl}")
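The heart of that block is a flush-per-month batching pattern: accumulate the pending inserts and updates, flush them whenever the month changes, and flush once more after the loop for the last month. A stripped-down sketch of just that pattern, with toy data and no database:

```python
# Toy sketch of the flush-per-month batching used above: rows are collected
# per month and the batch is flushed whenever the month key changes, plus
# one final flush after the loop for the last month.
rows = ["2015.01.10", "2015.01.15", "2015.02.03", "2015.02.07", "2015.03.01"]

def flush(batch, flushed):
    # stand-in for the qry_nsrt()/qry_updt() calls in the real code
    if batch:
        flushed.append(list(batch))
        batch.clear()

batch, flushed, c_mon = [], [], ""
for r in rows:
    t_mn = r[5:7]          # month portion of the yyyy.mm.dd date string
    if c_mon != t_mn:
        c_mon = t_mn
        flush(batch, flushed)
    batch.append(r)
flush(batch, flushed)      # take care of the last month
print(flushed)
```

Forgetting that final flush would silently drop the last month's rows, which is why the real block repeats the insert/update handling after the loop.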

And, of course, a bug.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

1 -> R:\learn\dashboard\data\wcond_2016.csv; end date: 2016.01.31
Traceback (most recent call last):
  File "R:\learn\dashboard\data2db.py", line 1428, in <module>
    rfall.qry_nsrt(q_n, d_nsrt, is_many=True)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "R:\learn\dashboard\utils\weather_db.py", line 564, in qry_nsrt
    curs.executemany(qry, q_data)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
sqlite3.IntegrityError: NOT NULL constraint failed: temperature.temperature
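The failure itself is easy to reproduce in isolation. A minimal sketch (trimmed, illustrative schema; the NOT NULL on the temperature column matches the constraint named in the traceback):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# trimmed, illustrative version of the temperature table with the offending constraint
con.execute("""CREATE TABLE temperature (
    row_id INTEGER PRIMARY KEY,
    datetime TEXT NOT NULL,
    temperature REAL NOT NULL,
    condition TEXT)""")

# weather-condition-only rows have no temperature to supply
d_nsrt = [("2015.12.02 06:00", None, "rain")]
err = None
try:
    con.executemany("INSERT INTO temperature VALUES (NULL, ?, ?, ?);", d_nsrt)
except sqlite3.IntegrityError as exc:
    err = exc
print(err)  # NOT NULL constraint failed: temperature.temperature
```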

Not sure about the best way to go about fixing this. I see a few choices:

  • change the table definition
  • add an extremely low temperature to the insert tuples, then rework code in the Weather_db class to ignore that temperature when getting or processing data
  • don’t add weather condition data if morning temperature not available for that day

Sorting Out the Bug

Having that constraint made sense when I was planning to only have temperatures in the table. But, once I added weather conditions, that is likely no longer the case.

Ok, we know that there are no rows in the temperature table with a temperature for the dates we are trying to insert. I also don’t currently have any methods in the Weather_db class related to the temperature table. So, I guess using an extremely low temperature to cover for the NOT NULL constraint could be done without any major refactoring. But I see that as an unnecessary complication. And, I don’t want to exclude whatever information I have from the database, even if incomplete. So, I guess removing the constraint is how I will proceed. Assuming I can do so safely.

Well! It seems SQLite’s ALTER TABLE does not support DROP CONSTRAINT. So, I will need to:

  • rename the current table
  • recreate the table with the altered constraint
  • move the data from the renamed table into the new table
  • delete/drop the renamed table
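Those four steps, sketched in plain sqlite3 against a trimmed, illustrative schema (my actual run goes through the Weather_db helper methods):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE temperature (
    row_id INTEGER PRIMARY KEY,
    datetime TEXT NOT NULL,
    temperature REAL NOT NULL,
    condition TEXT)""")
con.execute("INSERT INTO temperature VALUES (NULL, '2015.01.01 07:30', -3.6, 'clear')")

# 1. rename the current table
con.execute("ALTER TABLE temperature RENAME TO tmp_temp;")
# 2. recreate the table without the NOT NULL constraint on temperature
con.execute("""CREATE TABLE temperature (
    row_id INTEGER PRIMARY KEY,
    datetime TEXT NOT NULL,
    temperature REAL,
    condition TEXT)""")
# 3. move the data from the renamed table into the new table
con.execute("INSERT INTO temperature SELECT * FROM tmp_temp;")
# 4. drop the renamed table
con.execute("DROP TABLE tmp_temp;")
con.commit()

# a NULL temperature is now accepted
con.execute("INSERT INTO temperature VALUES (NULL, '2015.12.02 06:00', NULL, 'rain')")
n_rows = con.execute("SELECT count(*) FROM temperature").fetchone()[0]
print(n_rows)  # 2
```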

So, let’s give that a try: a new if block, in my chk_db_rek module, just for this database modification. Lots of print statements to check progress.

    if True:
      t_nm = rfall.tnms['tp_tnm']
      # want to drop NOT NULL constraint on temperature column in temperature table

      # get count of rows in table
      q_rw = f"""SELECT count('datetime') FROM {t_nm}"""
      rws = rfall.qry_exec(q_rw)
      print(f"{rws} rows in {t_nm}")

      # rename temperature table
      n_nm = "tmp_temp"
      q_rn = f"""ALTER TABLE {t_nm} RENAME TO {n_nm};"""
      rfall.qry_dlt(q_rn)
      is_temp = rfall.is_tbl_in_db(t_nm)
      is_tmp = rfall.is_tbl_in_db(n_nm)
      print(f"\nis {t_nm} in db: {is_temp};\nis {n_nm} in db: {is_tmp}")

      # create new temperature table, without NOT NULL constraint on temperature field
      # have modified Weather_db method mk_temp_tbl()
      rfall.mk_tbl('tp_tnm')
      is_temp = rfall.is_tbl_in_db(t_nm)
      is_tmp = rfall.is_tbl_in_db(n_nm)
      print(f"\nis {t_nm} in db: {is_temp};\nis {n_nm} in db: {is_tmp}")

      # copy data from old to new table
      q_cp = f"""INSERT INTO {t_nm} SELECT * FROM {n_nm};"""
      rfall.qry_dlt(q_cp)
      q_rw = f"""SELECT count('datetime') FROM {t_nm}"""
      rws = rfall.qry_exec(q_rw)
      print(f"\n{rws} rows in {t_nm}")

      # delete old table ??
      q_dr = f"""DROP TABLE {n_nm};"""
      rfall.qry_dlt(q_dr)
      is_temp = rfall.is_tbl_in_db(t_nm)
      is_tmp = rfall.is_tbl_in_db(n_nm)
      print(f"\nis {t_nm} in db: {is_temp};\nis {n_nm} in db: {is_tmp}")

      # get schema of new table
      t_nm = rfall.tnms['tp_tnm']
      q_sch = f"""SELECT sql FROM sqlite_schema WHERE name='{t_nm}';"""
      s_sch = rfall.qry_exec(q_sch)
      print("\n", s_sch[0][0])
(dbd-3.13) PS R:\learn\dashboard\utils> python chk_db_rek.py
[(5290,)] rows in temperature

is temperature in db: ([('rainfall',), ('rf_mon_history',), ('tmp_temp',)], False);
is tmp_temp in db: ([('rainfall',), ('rf_mon_history',), ('tmp_temp',)], True)

is temperature in db: ([('rainfall',), ('rf_mon_history',), ('tmp_temp',), ('temperature',)], True);
is tmp_temp in db: ([('rainfall',), ('rf_mon_history',), ('tmp_temp',), ('temperature',)], True)

[(5290,)] rows in temperature

is temperature in db: ([('rainfall',), ('rf_mon_history',), ('temperature',)], True);
is tmp_temp in db: ([('rainfall',), ('rf_mon_history',), ('temperature',)], False)

 CREATE TABLE temperature (
      row_id INTEGER PRIMARY KEY,
      datetime TEXT NOT NULL,
      temperature REAL,
      dmin REAL,
      dmax REAL,
      condition TEXT,
      humidity REAL
      )

And, the temperature column no longer has a NOT NULL constraint. As desired.

Bit of a Surprise

Once I finished the modification of the database, I was surprised to see that the file size had gone from 324 KB to 564 KB. I couldn’t see why that would happen. The temporary table was definitely gone.

From the SQLite documentation:

First, when you drop database objects such as tables, views, indexes, and triggers or delete data from tables, the database file size remains unchanged. Because SQLite just marks the deleted objects as free and reserves it for the future uses. As a result, the size of the database file always grows in size.

SQLite VACUUM

So, when I created the new table after renaming the old one, a new object with its own set of data pages was added to the database file. When I copied the data over to the new table, the database grew in size accordingly. And when I dropped the original, renamed table, SQLite did not remove those data pages from the file; it just marked them as free. As I go along, it will use them up for any new data or objects I add to the database.

I should have realized this on my own. It is common behavior in file- and/or memory-based systems.

I could compress the database file (using VACUUM, as indicated in the documentation quoted above). But it’s not like the database file is that huge, so I don’t expect performance to be significantly affected.
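For future reference, the drop-then-VACUUM behavior is easy to demonstrate with a throwaway on-disk database (a sketch using a temporary file, since only an on-disk database has a file size to watch):

```python
import os
import sqlite3
import tempfile

# throwaway on-disk database in the system temp directory
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)

con = sqlite3.connect(path)
con.execute("CREATE TABLE bulk (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO bulk (payload) VALUES (?)",
                [("x" * 200,) for _ in range(5000)])
con.commit()
size_full = os.path.getsize(path)

con.execute("DROP TABLE bulk")
con.commit()
size_dropped = os.path.getsize(path)   # unchanged: pages just moved to the freelist

con.execute("VACUUM")                  # rebuilds the file, releasing the free pages
size_vacuumed = os.path.getsize(path)

print(size_full, size_dropped, size_vacuumed)
con.close()
os.remove(path)
```

The exact sizes will vary by machine and SQLite version, but the file stays the same size after the DROP and only shrinks after the VACUUM.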

Live and learn, eh?

Back to Updating the Database

Okay, time to see if I can now (finally?) add the data from the weather condition CSV files to the database.

Test Month

I started out by just printing out the data for the selected inserts and updates for January 2015.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

0 -> R:\learn\dashboard\data\wcond_2015.csv; end date: 2015.01.31

list of table inserts
('2015.01.10 06:00', None, None, None, 'cloudy', None)
('2015.01.15 06:00', None, None, None, 'cloudy', None)

list of table updates
('mostly cloudy', '2015.01.06 08:30')
('rain', '2015.01.11 06:00')
('fog', '2015.01.14 09:00')
('cloudy', '2015.01.17 10:21')
('mainly clear', '2015.01.19 08:20')
('clear', '2015.01.20 09:36')
('mostly cloudy', '2015.01.21 08:38')
('fog', '2015.01.25 09:44')
('partly cloudy', '2015.01.26 09:39')
('cloudy', '2015.01.28 07:50')
('partly cloudy', '2015.01.30 08:05')

Then I updated the database for just January 2015.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

0 -> R:\learn\dashboard\data\wcond_2015.csv; end date: 2015.01.31
        [(5295,)] rows currently in temperature
total inserts: 2, total updates: 11
        [(5297,)] rows now in temperature

Then I used a script I have to check the contents of the database for a selected period of time. And, for the temperature table for January 2015, I got the following.

(dbd-3.13) PS R:\learn\dashboard\utils> python chk_db_rek.py
SELECT * FROM temperature WHERE substr(datetime, 1, 7)='2015.01'

                   row_id  temperature  dmin  dmax      condition humidity
datetime
2015.01.01 07:30       1         -3.6  -3.6   2.4          clear     None
2015.01.02 08:15       2          0.6  -3.6   NaN           rain     None
2015.01.03 07:00       3          1.3   NaN   2.7           snow     None
2015.01.04 08:40       4          1.1   NaN   NaN                    None
2015.01.04 09:20       5          0.9   NaN   NaN           snow     None
2015.01.05 19:45       6          5.8   5.8   NaN           rain     None
2015.01.06 08:30       7          5.5   NaN   8.8  mostly cloudy     None
2015.01.07 14:17       8          6.2   3.5   6.2         cloudy     None
2015.01.08 09:02       9          3.2   3.2   5.8  mostly cloudy     None
2015.01.09 08:06      10          2.1   1.7   6.6         cloudy     None
2015.01.10 06:00    5298          NaN   NaN   NaN         cloudy     None
2015.01.11 06:00      11          5.9   2.1   6.8           rain     None
2015.01.12 07:22      12          6.2   2.1   7.8            fog     None
2015.01.13 07:57      13          3.9   2.1   6.1            fog     None
2015.01.14 09:00      14          0.3  -0.5   8.2            fog     None
2015.01.15 06:00    5299          NaN   NaN   NaN         cloudy     None
2015.01.16 08:00      15          6.9  -0.4   9.9           rain     None
2015.01.17 10:21      16          6.0   4.7   9.6         cloudy     None
2015.01.19 08:20      17          5.0   5.0   9.6   mainly clear     None
2015.01.20 09:36      18          5.1   4.3   8.1          clear     None
2015.01.21 08:38      19          0.5   0.5   6.5  mostly cloudy     None
2015.01.22 09:06      20          5.5   0.5   NaN         cloudy     None
2015.01.23 08:36      21          7.7   5.5  11.1           rain     None
2015.01.24 08:32      22         10.0   7.7  11.5     light rain     None
2015.01.25 09:44      23         10.3   9.4  13.0            fog     None
2015.01.26 09:39      24          6.8   5.6  15.1  partly cloudy     None
2015.01.27 07:55      25          8.0   5.6  10.0  mostly cloudy     None
2015.01.28 07:50      26          7.4   7.3   NaN         cloudy     None
2015.01.29 07:45      27          3.8   2.5   9.5          clear     None
2015.01.30 08:05      28          1.3   1.3  10.1  partly cloudy     None
2015.01.31 09:49      29          3.2   1.0   NaN            fog     None

And, it pretty much looks like the inserts and updates listed above have been applied to the table.

Full Year Test

Okay, I am going to restore the database to its previous state and run the update for the whole of 2015. With a slight modification, I am going to print, to the terminal, the selected inserts and updates for December so I have something to compare with the output of my checkup script.

The data currently in the table for December 2015 is as follows.

(dbd-3.13) PS R:\learn\dashboard\utils> python chk_db_rek.py
SELECT * FROM temperature WHERE substr(datetime, 1, 7)='2015.12'

                   row_id  temperature  dmin  dmax condition humidity
datetime
2015.12.01 08:40     267          4.7  -2.6   4.7               None
2015.12.03 09:01     268          9.3   4.7  14.6               None
2015.12.04 09:28     269          7.0   6.1   8.6               None
2015.12.05 09:19     270          7.7   7.0   7.7               None
2015.12.06 08:17     271          6.4   5.7  11.0               None
2015.12.07 08:39     272          8.5   6.4   NaN               None
2015.12.08 08:53     273         11.2   7.8  12.8               None
2015.12.09 10:31     274          9.1   7.4  11.0               None
2015.12.10 10:00     275          6.3   6.1   8.2               None
2015.12.11 10:01     276          6.8   6.3   9.4     clear     None
2015.12.13 08:30     277          6.2   2.3   6.6               None
2015.12.14 10:41     278          5.1   4.2   6.0      rain     None
2015.12.16 10:12     279          2.6   1.0   4.6               None
2015.12.17 08:55     280          1.9   0.8   7.2               None
2015.12.20 14:14     281          6.6   0.9   6.6               None
2015.12.21 10:09     282          4.0   3.7   5.6               None
2015.12.22 09:01     283          2.7   2.7   3.7               None
2015.12.23 08:22     284          2.2   1.9   4.2      rain     None
2015.12.24 09:20     285          2.4   1.9   2.7               None
2015.12.25 10:14     286          0.2  -0.4   3.0      rain     None
2015.12.26 10:33     287          1.0  -0.2   2.7      rain     None
2015.12.27 10:30     288          2.1   0.8   NaN      rain     None
2015.12.28 10:21     289          2.7   1.0   2.7               None
2015.12.31 10:02     290         -1.2  -3.8   NaN               None

And, in the terminal I got the following.

(dbd-3.13) PS R:\learn\dashboard> python data2db.py

0 -> R:\learn\dashboard\data\wcond_2015.csv; end date: 2016.01.01
        [(5295,)] rows currently in temperature

list of table inserts
('2015.12.02 06:00', None, None, None, 'rain', None)
('2015.12.12 08:44', None, None, None, 'cloudy', None)
('2015.12.18 06:00', None, None, None, 'cloudy', None)
('2015.12.19 06:00', None, None, None, 'rain', None)
('2015.12.29 06:00', None, None, None, 'mostly clear', None)

list of table updates
('mostly cloudy', '2015.12.01 08:40')
('mostly cloudy', '2015.12.04 09:28')
('rain', '2015.12.05 09:19')
('heavy rain', '2015.12.08 08:53')
('mostly cloudy', '2015.12.09 10:31')
('light rain', '2015.12.10 10:00')
('light rain', '2015.12.13 08:30')
('clear', '2015.12.16 10:12')
('cloudy', '2015.12.17 08:55')
('rain', '2015.12.20 14:14')
('cloudy', '2015.12.21 10:09')
('fog', '2015.12.22 09:01')
('mostly cloudy', '2015.12.28 10:21')
total inserts: 64, total updates: 206
        [(5359,)] rows now in temperature
processing time: 2.7680490000057034 sec

The new state of the December data is as follows.

(dbd-3.13) PS R:\learn\dashboard\utils> python chk_db_rek.py
SELECT * FROM temperature WHERE substr(datetime, 1, 7)='2015.12'

                   row_id  temperature  dmin  dmax      condition humidity
datetime
2015.12.01 08:40     267          4.7  -2.6   4.7  mostly cloudy     None
2015.12.02 06:00    5357          NaN   NaN   NaN           rain     None
2015.12.03 09:01     268          9.3   4.7  14.6                    None
2015.12.04 09:28     269          7.0   6.1   8.6  mostly cloudy     None
2015.12.05 09:19     270          7.7   7.0   7.7           rain     None
2015.12.06 08:17     271          6.4   5.7  11.0                    None
2015.12.07 08:39     272          8.5   6.4   NaN                    None
2015.12.08 08:53     273         11.2   7.8  12.8     heavy rain     None
2015.12.09 10:31     274          9.1   7.4  11.0  mostly cloudy     None
2015.12.10 10:00     275          6.3   6.1   8.2     light rain     None
2015.12.11 10:01     276          6.8   6.3   9.4          clear     None
2015.12.12 08:44    5358          NaN   NaN   NaN         cloudy     None
2015.12.13 08:30     277          6.2   2.3   6.6     light rain     None
2015.12.14 10:41     278          5.1   4.2   6.0           rain     None
2015.12.16 10:12     279          2.6   1.0   4.6          clear     None
2015.12.17 08:55     280          1.9   0.8   7.2         cloudy     None
2015.12.18 06:00    5359          NaN   NaN   NaN         cloudy     None
2015.12.19 06:00    5360          NaN   NaN   NaN           rain     None
2015.12.20 14:14     281          6.6   0.9   6.6           rain     None
2015.12.21 10:09     282          4.0   3.7   5.6         cloudy     None
2015.12.22 09:01     283          2.7   2.7   3.7            fog     None
2015.12.23 08:22     284          2.2   1.9   4.2           rain     None
2015.12.24 09:20     285          2.4   1.9   2.7                    None
2015.12.25 10:14     286          0.2  -0.4   3.0           rain     None
2015.12.26 10:33     287          1.0  -0.2   2.7           rain     None
2015.12.27 10:30     288          2.1   0.8   NaN           rain     None
2015.12.28 10:21     289          2.7   1.0   2.7  mostly cloudy     None
2015.12.29 06:00    5361          NaN   NaN   NaN   mostly clear     None
2015.12.31 10:02     290         -1.2  -3.8   NaN                    None

A quick check says all the selected December inserts and updates are now in the temperature table. And the current January data matches what we had above. So, I am thinking that, for 2015, this has worked as intended.

Done

Think that’s it for this one. Over the next day or two, I will add the data from the remaining CSV files one year at a time.

Until next time, may you see your bugs as opportunities.