In this notebook I look at factorizing large kernels into successive smaller ones. In particular, I take a specific $7\times 7$ image filter and try to implement it as a convolution of two $3 \times 3$ filters. I talk about nonlinear optimization and show how to solve this kind of problem in PyTorch.
Check it out here.
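
To give a flavor of the approach, here is a minimal sketch of fitting two $3 \times 3$ kernels to a $7 \times 7$ target by gradient descent. It is not the notebook's code: the Gaussian target, the `compose` helper, the Adam optimizer, and the step count are all assumptions made for illustration. Note that two $3 \times 3$ filters applied in succession have a combined $5 \times 5$ support, so the sketch zero-pads the composed kernel to $7 \times 7$ and treats the fit as a least-squares approximation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical 7x7 target: a normalized Gaussian blur (a stand-in, not the notebook's filter).
ax = torch.arange(7, dtype=torch.float32) - 3.0
g = torch.exp(-ax**2 / 2.0)
target = torch.outer(g, g)
target = target / target.sum()

# Two learnable 3x3 kernels, shaped (out_channels, in_channels, H, W).
k1 = torch.randn(1, 1, 3, 3, requires_grad=True)
k2 = torch.randn(1, 1, 3, 3, requires_grad=True)

def compose(k1, k2):
    """Effective kernel of applying k1 then k2, zero-padded to 7x7."""
    # F.conv2d computes cross-correlation, so flipping k2 yields a true
    # (full) convolution of the two kernels: a 5x5 result.
    full = F.conv2d(k1, torch.flip(k2, dims=(-2, -1)), padding=2)
    # Pad one pixel on every side so it can be compared with the 7x7 target.
    return F.pad(full, (1, 1, 1, 1))[0, 0]

opt = torch.optim.Adam([k1, k2], lr=0.05)
initial_loss = None
for step in range(500):
    opt.zero_grad()
    loss = ((compose(k1, k2) - target) ** 2).sum()  # squared error on kernel entries
    if initial_loss is None:
        initial_loss = loss.item()
    loss.backward()
    opt.step()

# Loss of the kernels after the last update.
final_loss = ((compose(k1, k2) - target) ** 2).sum().item()
print(f"initial loss {initial_loss:.4f}, final loss {final_loss:.6f}")
```

The objective is nonlinear (in fact bilinear) in the entries of `k1` and `k2`, which is why an iterative optimizer is used here instead of a closed-form least-squares solve.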