[D] Prior work using pixel shift to improve VAE accuracy?
Currently, I'm attempting to train up a "f8ch32" VAE
( 8x compression factor, 32 channels)
Its current performance could be rated as "better than sdxl f8ch4, but worse than auraflow f8ch16"
My biggest challenge is improving reconstruction fidelity.
Various searches suggest to me that the publicly known methods for this sort of thing mostly rely on LPIPS and GAN losses.
The trouble with these is that LPIPS can over-smooth, while GANs start making things up.
The latter is fine if all you want is a sharp end result, but lousy if you care about actual fidelity to the original image.
I decided to take the old training idea of "use jitter across your training image set" to the extreme, and use pixel shift to attempt to brute-force accuracy.
Specific example usage:
Take a higher resolution image such as 2048x2048.
Define some "pixel shift value". (for this example, ps=2)
Resize the high-res image to an adjacent size of (1024+2)x(1024+2)...
and then deliberately step through all stride-1 crops of 1024x1024 for that
(yielding 9 training images in this specific case)
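The crop enumeration above can be sketched as follows. This is my own minimal illustration of the described scheme, not the poster's actual code; the function name and box format are assumptions.

```python
def pixel_shift_crop_boxes(target=1024, ps=2):
    """Enumerate (left, top, right, bottom) boxes for all stride-1
    crops of size `target` from an image resized to (target+ps)^2.
    For a pixel shift value ps, this yields (ps+1)^2 crops."""
    boxes = []
    for dy in range(ps + 1):
        for dx in range(ps + 1):
            boxes.append((dx, dy, dx + target, dy + target))
    return boxes

# ps=2 gives (2+1)^2 = 9 crops, matching the example above
boxes = pixel_shift_crop_boxes(target=1024, ps=2)
```

Each box can then be fed to your image library's crop call (e.g. `PIL.Image.crop`) after resizing the 2048x2048 source down to 1026x1026.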
I seem to be having some initial success with this method.
However, now I have to play the tuning game to find the most effective weightings for the loss functions I'm using, such as l1 and edge_l1 loss.
Rather than continuing blindly in the dark with very limited GPU resources, I thought I'd ask whether anyone knows of prior work that has already blazed a trail in this area?
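For concreteness, a weighted combination of l1 and edge_l1 might look like the sketch below. The exact edge operator behind "edge_l1" isn't specified in the post, so I've assumed a simple finite-difference definition; the function name and weights are hypothetical.

```python
import numpy as np

def combined_loss(recon, target, w_l1=1.0, w_edge=0.1):
    """Weighted sum of pixel L1 and an assumed edge_l1 term
    (L1 on horizontal/vertical finite differences)."""
    l1 = np.mean(np.abs(recon - target))
    ex = np.mean(np.abs(np.diff(recon, axis=-1) - np.diff(target, axis=-1)))
    ey = np.mean(np.abs(np.diff(recon, axis=-2) - np.diff(target, axis=-2)))
    return w_l1 * l1 + w_edge * (ex + ey)
```

The tuning question is then the relative size of `w_l1` and `w_edge`, which is exactly the weighting game described above.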