I always think I am going to get more covered in most posts than I actually do. I guess I could just keep adding content until I completely cover the subject of the post. But, personally, I have always preferred things in bite-size pieces. And, as I wish to publish a post weekly and only have so much time to spend writing them, bite-sized pieces it must be. Likely to the aggravation of some (if any) readers.
We sorted out the generator-related helper functions in the last post. So let’s move on to the Generator class itself.
Generator
As a reminder, last post I said that:
This time around the generator is not going to be a mirror image of the discriminator. Instead there will be a few downsampling blocks to encode the input image. Then a number of residual blocks to transform the image. And finally a few upsampling blocks to generate the output image (decode).
As also mentioned, I don’t know why that architecture is being used. I am just going by what I have found on the web. With the exception of the residual layers, nothing I’ve read provides a meaningful and understandable explanation for the above model design.
Another, to me, mentionable difference is that the input and output convolutional blocks use a rather large kernel size, \(7x7\). Why? Here’s one explanation: What is the Difference Between Small and Large Kernel Size? And from that article, I think the pertinent bit is that a larger kernel size “Emphasizes global patterns and spatial context.” Which is in keeping with our goal to only have the animals change their stripes. Everything else in the image should remain unchanged.
These two blocks neither downsample nor upsample. Though I guess you could say they do a down/upsampling by a factor of one, given they still use a convolutional layer (with a stride of 1). The other two blocks in the encoder/decoder do the actual down and upsampling.
Those other convolutions use the more common \(3x3\) kernel size. And, all the tutorials seemed to use \(9\) residual blocks between the downsampling and upsampling networks.
Convolution Arithmetic
Let’s have a quick look at what the encoder does to the size of the input feature at each layer. Refer to the previous post for the formula. You can determine the values I will be using for each convolutional layer’s kernel size, stride and padding from the arithmetic shown below.
And, unlike me, do not confuse the following with the values for the input and output channels of the convolutional layer in each encoder block. As you would likely expect, the reverse will be the case for the decoder layers of the generator network. In our case the residual blocks do not alter the feature tensor’s shape.
$$n_{out1} = \left\lfloor\frac{256 + 2 \cdot 3 - 7}{1}\right\rfloor + 1 = 256$$
$$n_{out2} = \left\lfloor\frac{n_{out1} + 2 \cdot 1 - 3}{2}\right\rfloor + 1 = 128$$
$$n_{out3} = \left\lfloor\frac{n_{out2} + 2 \cdot 1 - 3}{2}\right\rfloor + 1 = 64$$
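Just to sanity check those numbers, here’s the same arithmetic as a throwaway helper (the function name and defaults are mine, not part of the project code):

import math

def conv_out_size(n_in: int, k: int, s: int = 1, p: int = 0) -> int:
    # output size of a Conv2d along one spatial dimension (dilation assumed to be 1)
    return math.floor((n_in + 2 * p - k) / s) + 1

n1 = conv_out_size(256, k=7, s=1, p=3)  # 256
n2 = conv_out_size(n1, k=3, s=2, p=1)   # 128
n3 = conv_out_size(n2, k=3, s=2, p=1)   # 64
print(n1, n2, n3)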
Generator Class
The code below should hopefully follow from the discussions above.
class Generator(nn.Module):
    def __init__(self, init_feats, nbr_rblks=9):
        super().__init__()
        self.features = init_feats
        self.nbr_rblks = nbr_rblks
        # encoding/downsampling, using a number of method defaults
        self.encoder = nn.Sequential(
            GConvBlock(3, init_feats, k_sz=7, s_sz=1, ip_sz=3, normalize=False),
            GConvBlock(init_feats, init_feats*2, k_sz=3, s_sz=2, ip_sz=1),
            GConvBlock(init_feats*2, init_feats*4, k_sz=3, s_sz=2, ip_sz=1)
        )
        # unpack a list comprehension to generate the chain of residual blocks (default 9)
        self.residuals = nn.Sequential(
            *[ResidualBlock(init_feats*4, k_sz=3, s_sz=1, p_sz=1) for _ in range(nbr_rblks)]
        )
        # decoding/upsampling, essentially mirror of the encoder, last block does not upsample
        self.decoder = nn.Sequential(
            GConvBlock(init_feats*4, init_feats*2, k_sz=3, s_sz=2, ip_sz=1, upsample=True),
            GConvBlock(init_feats*2, init_feats, k_sz=3, s_sz=2, ip_sz=1, upsample=True),
            GConvBlock(init_feats, 3, k_sz=7, s_sz=1, ip_sz=3, activation=False, normalize=False)
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.residuals(x)
        x = self.decoder(x)
        return torch.tanh(x)
And another quick test. Output pretty much as expected I think.
if __name__ == "__main__":
torch.manual_seed(cfg.pt_seed)
tst_GN = False
tst_DCB = False
tst_D = False
tst_GCB = False
tst_RB = False
tst_G = True
... ...
if tst_G:
gnr = Generator(64, nbr_rblks=9)
x = torch.rand((1, 3, 256, 256))
out = gnr(x)
print(f"\n{gnr}")
# size of features following each layer of encoder
print(f"\n((256 + 2*3 - 7) / 1) + 1 = {int(((256 + 2*3 - 7) / 1) + 1)}")
print(f"((256 + 2*1 - 3) / 2) + 1 = {int(((256 + 2*1 - 3) / 2) + 1)}")
print(f"((128 + 2*1 - 3) / 2) + 1 = {int(((128 + 2*1 - 3) / 2) + 1)}")
# tensor in and out should be same shape
print(f"\nx: {x.shape}, out: {out.shape}")
(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python models.py
Generator(
  (encoder): Sequential(
    (0): GConvBlock(
      (layers): Sequential(
        (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), padding_mode=reflect)
        (1): Identity()
        (2): ReLU(inplace=True)
      )
    )
    (1): GConvBlock(
      (layers): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), padding_mode=reflect)
        (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (2): ReLU(inplace=True)
      )
    )
    (2): GConvBlock(
      (layers): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), padding_mode=reflect)
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (2): ReLU(inplace=True)
      )
    )
  )
  (residuals): Sequential(
    (0): ResidualBlock(
      (layers): Sequential(
        (0): GConvBlock(
          (layers): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), padding_mode=reflect)
            (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (2): ReLU(inplace=True)
          )
        )
        (1): GConvBlock(
          (layers): Sequential(
            (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), padding_mode=reflect)
            (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
            (2): Identity()
          )
        )
      )
    )
    # other 8 ResidualBlock, (1) thru (8), all the same
    ... ...
  )
  (decoder): Sequential(
    (0): GConvBlock(
      (layers): Sequential(
        (0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
        (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (2): ReLU(inplace=True)
      )
    )
    (1): GConvBlock(
      (layers): Sequential(
        (0): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
        (1): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (2): ReLU(inplace=True)
      )
    )
    (2): GConvBlock(
      (layers): Sequential(
        (0): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), padding_mode=reflect)
        (1): Identity()
        (2): Identity()
      )
    )
  )
)
((256 + 2*3 - 7) / 1) + 1 = 256
((256 + 2*1 - 3) / 2) + 1 = 128
((128 + 2*1 - 3) / 2) + 1 = 64
x: torch.Size([1, 3, 256, 256]), out: torch.Size([1, 3, 256, 256])
Now where? So many choices.
Training Loop
I thought I might need to write functions to calculate the losses for the discriminators and generators. But though there is some arithmetic to do, they are mostly one-liners. So, I think we will just tackle the training loop next. The loop will have two sections: one to train the discriminators and one to train the generators. You will recall that there will be two of each, one for each direction of the cycle.
But it will definitely be a lengthy loop or two. And I am going to try something new: tqdm, to create a progress bar in the terminal window while executing the training loops. I won’t bother with the installation details or the import statement here. The docs cover all of that.
Quick test of using tqdm. Note cfg.epochs is currently \(1\) and the default batch size is \(4\).
start_epoch = 0

# train loop
print("starting training loop")
iteration = 0
for epoch in range(start_epoch, cfg.epochs):
    for idx, (img_a, img_b) in enumerate(tqdm(d_ldr, desc=f"epoch {epoch + 1}")):
        iteration += 1
        if iteration == 20:
            break
(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python cyc_gan.py -rn h2z
image and checkpoint directories created: runs\h2z_img & runs\h2z_sv
starting training loop
epoch 1: 23%|████████████████▉ | 19/84 [00:01<00:03, 16.49it/s]
And, if I let it run through the whole dataloader:
(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python cyc_gan.py -rn h2z
image and checkpoint directories created: runs\h2z_img & runs\h2z_sv
starting training loop
epoch 1: 100%|███████████████████████████████████████████████████████████████████████████| 84/84 [00:04<00:00, 18.28it/s]
Discriminators
For each discriminator, we will generate a fake image from the appropriate real image, then run the real and fake images through that discriminator. The real and fake losses are summed to get each discriminator’s loss, and the two discriminator losses are averaged to get the total discriminator loss. Once we have that loss we will backpropagate to update the discriminators.
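Written out (this is just the code below restated, with notation of my own choosing), where \(D_A, D_B\) are the two discriminators and \(G_A, G_B\) the two generators, \(G_A\) mapping zebras (domain B) to horses (domain A) and \(G_B\) the reverse:

$$\mathcal{L}_{D_A} = \text{MSE}\big(D_A(a), 1\big) + \text{MSE}\big(D_A(G_A(b)), 0\big)$$
$$\mathcal{L}_{D_B} = \text{MSE}\big(D_B(b), 1\big) + \text{MSE}\big(D_B(G_B(a)), 0\big)$$
$$\mathcal{L}_{D} = \frac{\mathcal{L}_{D_A} + \mathcal{L}_{D_B}}{2}$$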
While writing and testing the following I ended up with a memory issue. So, I dropped the batch size to \(8\) from \(16\).
for epoch in range(start_epoch, cfg.epochs):
    for idx, (img_a, img_b) in enumerate(tqdm(d_ldr, desc="epoch")):
        iteration += 1
        img_a = img_a.to(cfg.device)
        img_b = img_b.to(cfg.device)

        # Discriminators
        # train discriminator_a
        # generate fake horse image
        fake_a = genr_a(img_b)
        pred_a_real = disc_a(img_a.detach())
        pred_a_fake = disc_a(fake_a.detach())
        da_real_loss = MSE_Loss(pred_a_real, torch.ones_like(pred_a_real))
        da_fake_loss = MSE_Loss(pred_a_fake, torch.zeros_like(pred_a_fake))
        da_loss = da_real_loss + da_fake_loss

        # train discriminator_b
        # generate fake zebra image
        fake_b = genr_b(img_a)
        pred_b_real = disc_b(img_b.detach())
        pred_b_fake = disc_b(fake_b.detach())
        db_real_loss = MSE_Loss(pred_b_real, torch.ones_like(pred_b_real))
        db_fake_loss = MSE_Loss(pred_b_fake, torch.zeros_like(pred_b_fake))
        db_loss = db_real_loss + db_fake_loss

        disc_loss = (da_loss + db_loss) / 2

        # backpropagate discriminator
        opt_disc.zero_grad()
        disc_loss.backward()
        opt_disc.step()
Running the above for one epoch generated the following output.
(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python cyc_gan.py -rn h2z
image and checkpoint directories created: runs\h2z_img & runs\h2z_sv
starting training loop
epoch 1: 100%|█████████████████████████████████████████████████████████████████████████| 167/167 [00:47<00:00, 3.55it/s]
I am guessing that once we add the generator training and the saving of images and/or checkpoints, it will take a few to several minutes per epoch of training. Okay, on to training the generators.
Generators
We have in fact already run the generators to produce fake images. So, we are basically going to calculate the total generator loss and run the backpropagation. We need the three losses, adversarial, cycle and identity, for each generator. The total generator loss will be a weighted sum of those six losses. So I need a couple more global variables. Specifically the weight (\(\lambda\)) for each type of loss. Though I didn’t feel like typing out lambda for each variable. And I guess I could skip the ones set to 1, but…
wt_adversarial = 1 # weight for adversarial loss when calculating total generator loss
wt_cycle = 10 # weight for cycle loss when calculating total generator loss
wt_identity = 1 # weight for identity loss when calculating total generator loss
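Putting that together, the total generator loss computed in the next code block is (notation mine again; the adversarial terms are MSE losses against a target of ones, the cycle and identity terms are L1 losses):

$$\mathcal{L}_{G} = \lambda_{adv}\big(\mathcal{L}_{adv}^{A} + \mathcal{L}_{adv}^{B}\big) + \lambda_{cyc}\big(\mathcal{L}_{cyc}^{A} + \mathcal{L}_{cyc}^{B}\big) + \lambda_{id}\big(\mathcal{L}_{id}^{A} + \mathcal{L}_{id}^{B}\big)$$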
Okay, the next part of the training loop.
Another memory issue when coding and testing the generator portion of the training loop. Reduced batch size to \(4\). At least for now, lots more testing to do.
        # Generators
        pred_a_fake = disc_a(fake_a)
        pred_b_fake = disc_b(fake_b)

        # adversarial loss
        ga_loss = MSE_Loss(pred_a_fake, torch.ones_like(pred_a_fake))
        gb_loss = MSE_Loss(pred_b_fake, torch.ones_like(pred_b_fake))

        # cycle loss
        ga_cyc_img = genr_a(fake_b)
        gb_cyc_img = genr_b(fake_a)
        ga_cyc_loss = L1_loss(img_a, ga_cyc_img)
        gb_cyc_loss = L1_loss(img_b, gb_cyc_img)

        # identity loss
        ga_id_img = genr_a(img_a)
        gb_id_img = genr_b(img_b)
        ga_id_loss = L1_loss(img_a, ga_id_img)
        gb_id_loss = L1_loss(img_b, gb_id_img)

        # total generator loss, a bit messy this
        genr_loss = (
            ga_loss * cfg.wt_adversarial + gb_loss * cfg.wt_adversarial +
            ga_cyc_loss * cfg.wt_cycle + gb_cyc_loss * cfg.wt_cycle +
            ga_id_loss * cfg.wt_identity + gb_id_loss * cfg.wt_identity
        )

        opt_genr.zero_grad()
        genr_loss.backward()
        opt_genr.step()
And, I think that reduction in batch size is going to have a significant effect on the running time for each epoch of training. I stopped execution as I still want to code saving images and checkpoints during and following training. Didn’t think it was yet worth completing an epoch of training until I had that code working.
(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python cyc_gan.py -rn h2z
image and checkpoint directories created: runs\h2z_img & runs\h2z_sv
starting training loop
epoch 1: 47%|██████████████████████████████████▎ | 157/334 [03:55<04:25, 1.50s/it]
Traceback (most recent call last):
  File "F:\learn\mcl_pytorch\proj6\cyc_gan.py", line 216, in <module>
    genr_loss.backward()
  File "E:\appDev\Miniconda3\envs\mclp-3.12\Lib\site-packages\torch\_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "E:\appDev\Miniconda3\envs\mclp-3.12\Lib\site-packages\torch\autograd\__init__.py", line 266, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
KeyboardInterrupt
Saving Sample Images and Models
PyTorch, via the torchvision.utils package, provides a method, save_image, that I am going to use to periodically save the input and output images from the two generators. It saves the input tensor to an image file. If given a mini-batch tensor, it saves the tensor as a grid of images by calling make_grid. Sounds perfect for this situation.
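One small wrinkle: the generators end in a tanh, so their output (and the normalized input images) sit in the range \([-1, 1]\), while save_image expects values in \([0, 1]\) unless you pass normalize=True. A minimal, standalone sketch of the shift-and-scale done before saving (the tensor and file name here are just placeholders):

import torch
from torchvision.utils import save_image

# stand-in for a batch of generator outputs: tanh range is [-1, 1]
fake = torch.tanh(torch.randn(4, 3, 256, 256))
# shift/scale back to [0, 1], then save; a mini-batch is written as a grid of images
save_image(fake * 0.5 + 0.5, "sample_fake.png", nrow=4)

That is exactly what the * 0.5 + 0.5 in the training loop code further down is doing.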
I wrote a utility function to save the model checkpoints. I save a few variables along with the model and optimizer states. It is similar to one written for a previous project; but, here it is anyway. Because of the way I wrote the forward method in the GaussianNoise class, I am not able to save the TorchScript versions of the discriminators. So for now I am also not saving the scripts for the generators.
def sv_chkpt(run_nm, epoch, nw_model, optimizer, c_loss, g_loss, batch_sz, fl_pth):
    torch.save({
        'batch_sz': batch_sz,
        'epoch': epoch,
        'c_loss': c_loss,
        'g_loss': g_loss,
        'run_nm': run_nm,
        'model_state_dict': nw_model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, fl_pth)
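I haven’t written the loading side yet, but a minimal counterpart might look something like the following (a sketch only; ld_chkpt and its return values are my own invention, not project code):

def ld_chkpt(fl_pth, nw_model, optimizer=None, device="cpu"):
    # hypothetical counterpart to sv_chkpt above: restore the saved states
    chkpt = torch.load(fl_pth, map_location=device)
    nw_model.load_state_dict(chkpt['model_state_dict'])
    if optimizer is not None:
        optimizer.load_state_dict(chkpt['optimizer_state_dict'])
    # hand back the bookkeeping values saved alongside the states
    return chkpt['epoch'], chkpt['c_loss'], chkpt['g_loss'], chkpt['batch_sz']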
And now the code at the end of the training loop and following it looks like this. First few lines shown for reference.
        ... ....
        opt_genr.zero_grad()
        genr_loss.backward()
        opt_genr.step()

        if iteration % cfg.sv_img_cyc == 0:
            save_image(torch.concat((img_b * 0.5 + 0.5, fake_a * 0.5 + 0.5), dim=0), cfg.img_dir/f"{epoch}_{idx}_{iteration}_fake_a.png", nrow=4)
            save_image(torch.concat((img_a * 0.5 + 0.5, fake_b * 0.5 + 0.5), dim=0), cfg.img_dir/f"{epoch}_{idx}_{iteration}_fake_b.png", nrow=4)

# training epochs complete, save last batch of images
save_image(torch.concat((img_b * 0.5 + 0.5, fake_a * 0.5 + 0.5), dim=0), cfg.img_dir/f"{epoch}_{idx}_{iteration}_fake_a.png", nrow=4)
save_image(torch.concat((img_a * 0.5 + 0.5, fake_b * 0.5 + 0.5), dim=0), cfg.img_dir/f"{epoch}_{idx}_{iteration}_fake_b.png", nrow=4)

# save model states
sv_chkpt(cfg.run_nm, epoch, genr_a, opt_genr,
         float(disc_loss.cpu().detach()), float(genr_loss.cpu().detach()),
         cfg.batch_sz, cfg.sv_dir/f"generator_a.pt")
sv_chkpt(cfg.run_nm, epoch, genr_b, opt_genr,
         float(disc_loss.cpu().detach()), float(genr_loss.cpu().detach()),
         cfg.batch_sz, cfg.sv_dir/f"generator_b.pt")
sv_chkpt(cfg.run_nm, epoch, disc_a, opt_disc,
         float(disc_loss.cpu().detach()), float(genr_loss.cpu().detach()),
         cfg.batch_sz, cfg.sv_dir/f"discriminator_a.pt")
sv_chkpt(cfg.run_nm, epoch, disc_b, opt_disc,
         float(disc_loss.cpu().detach()), float(genr_loss.cpu().detach()),
         cfg.batch_sz, cfg.sv_dir/f"discriminator_b.pt")
Test
Ran one full epoch of training. The GPU ran at 40-55% utilization at a temperature of 73-74°C. Here’s the terminal output. CPU and PC memory usage was fairly low. Though it looked like pretty much all of the GPU memory was being used.
(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python cyc_gan.py -rn h2z
image and checkpoint directories created: runs\h2z_img & runs\h2z_sv
starting training loop
epoch 1: 100%|█████████████████████████████████████████████████████████████████████████| 334/334 [06:33<00:00, 1.18s/it]
6½ minutes for a single epoch.
Won’t bother with any of the images. There was little or no progress made in training the CycleGAN. Not a stripe to be seen in any of the fake_b images. And nothing but stripes in the fake_a images. Expect this CycleGAN is going to take some considerable time to train.
Done
Okay, that’s enough babbling and code for one of my posts.
Until next time, do enjoy your time coding and training models.
Resources
- torchvision.utils.save_image
- csv — CSV File Reading and Writing
- What is the Difference Between Small and Large Kernel Size?
- tqdm repository on GitHub