Sentinel-2 images are not exactly aligned across different revisits of the same spot: there are minute yet perceptible subpixel offsets. If there is sufficient aliasing in the system, it should be theoretically possible to extract extra information from multiple visits. However, the linked repo doesn't appear to do that.
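For intuition, here is a toy shift-and-add sketch of that multi-visit idea (my own illustration, nothing from the linked repo; it assumes the subpixel offsets between revisits have already been estimated, e.g. by cross-correlation registration):

```python
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def shift_and_add(frames, offsets, scale=2):
    """Toy multi-frame super-resolution: re-register each revisit onto a
    finer common grid using its (assumed already estimated) subpixel offset,
    then average. Aliasing in the individual frames is what allows the
    average to contain detail beyond the native sampling; a real pipeline
    would follow this with deconvolution."""
    h, w = frames[0].shape
    accum = np.zeros((h * scale, w * scale))
    for frame, (dy, dx) in zip(frames, offsets):
        up = np.kron(frame, np.ones((scale, scale)))      # nearest-neighbour upsample
        # Apply the inverse of the frame's offset to align it with frame 0
        accum += subpixel_shift(up, (-dy * scale, -dx * scale), order=1)
    return accum / len(frames)
```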
Pff, making up details 2x in both directions... they could at least have done real synthetic-aperture calculations...
The image sensor samples different light wavelengths with a time offset of about 250 ms, as the satellite moves over the Earth.
I think that means it could be possible to enhance the resolution by using luminance data from one wavelength to make an ‘educated guess’ at the luminance of other wavelengths. It would be a more advanced version of the kind of interpolation that standard camera sensors do with a Bayer color filter.
So it seems possible to get some extra information out of the system, with a good likelihood of success, but some risk of point hallucinations.
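A minimal sketch of that cross-band idea (hypothetical, not what the OP does): treat a 10 m band as the detail source and inject its high frequencies into an upsampled 20 m band, in the spirit of HPF pan-sharpening. The function and the Gaussian high-pass are my assumptions.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def cross_band_sharpen(band_20m, band_10m, scale=2, strength=1.0):
    """Toy HPF-style detail injection: upsample a 20 m band and add
    high-frequency structure borrowed from a co-registered 10 m band.
    The radiometry still comes from the 20 m band; only spatial detail
    is 'guessed' from the other wavelength."""
    up = zoom(band_20m, scale, order=3)                   # smooth upsample
    h = min(up.shape[0], band_10m.shape[0])
    w = min(up.shape[1], band_10m.shape[1])
    up, hi = up[:h, :w], band_10m[:h, :w]

    detail = hi - gaussian_filter(hi, sigma=scale)        # high-pass of the 10 m band
    gain = up.mean() / (hi.mean() + 1e-6)                 # crude brightness matching
    return up + strength * gain * detail
```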
The image sensor and filters are quite complex, much more complicated than a simple Bayer-filter CCD/CMOS sensor. It is not, AFAIK, a moving filter but a fixed one; however, the satellite is obviously moving.
I don’t know if the ‘Super-Resolution’ technique in the OP is taking advantage of that possibility, though. I agree it would be disappointing if it’s just guessing, although perhaps a carefully trained ML system would still figure out how to use the available data as I’ve suggested.
the optical Multi-Spectral Instrument (MSI) samples 13 spectral bands: four bands at 10 m, six bands at 20 m and three bands at 60 m spatial resolution
Due to the particular geometrical layout of the focal plane, each spectral band of the MSI observes the ground surface at different times.
https://sentiwiki.copernicus.eu/web/s2-mission
I’m making some guesses, because I don’t understand most of the optics and camera design that the ESA page describes. For instance, can anyone explain why there’s a big ~250 ms offset between measuring different light wavelengths, despite the optics and filters being fixed immobile relative to each other? Thank you.
The time per orbit is about 100 minutes. Sun-synchronous orbit.
Actually there are three satellites. The constellation is supposed to be two; there’s currently a spare one as well. But the orbits are very widely separated (supposed to be on opposite sides of the planet), so I don’t know how much enhancement there could be from combining the images from all the satellites, and I don’t know if the OP’s method even tries that.
Anyway, the folks at ESA working with Sentinel-2/Copernicus must have already thought very hard about anything they can do to enhance these images, surely?
Edit: the L1BSR project linked from the OP’s git page does include ‘exploiting sensor overlap’! So I assume it really is doing a process similar to what I’ve suggested.
Yeah...
This topic is interesting to us, as we (especially my co-founder) have done a lot of research in this area, both adapting existing super-resolution methods for satellite images and inventing new ones. We actually discussed this in detail yesterday.
Btw, we have a demo of the result of the enhancement we achieved about 2 years ago at https://demo.valinor.earth
Looking at this implementation, the noise artifacts likely stem from a few sources. The RCAN model normalizes input by dividing by 400, which is a fixed assumption about Sentinel-2’s radiometric range that doesn’t account for atmospheric variability or scene-specific characteristics. Plus, working with L1B data means you’re enhancing atmospheric artifacts along with ground features - those hazy patterns aren’t just sensor noise but actual atmospheric scattering that gets amplified during super-resolution.
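To make the normalization point concrete, a rough sketch (not code from the repo; the percentile variant is just one hedged alternative to a fixed divisor):

```python
import numpy as np

def normalize_fixed(band):
    # Fixed divisor of the kind described above: fine for "typical" scenes,
    # but bright deserts, snow, or heavy haze land far outside the expected range.
    return band / 400.0

def normalize_per_scene(band, low_pct=1, high_pct=99):
    """Per-scene robust scaling: map each scene to roughly the same range
    via percentiles, so scene-specific brightness and haze don't push the
    input outside the distribution the SR model was trained on."""
    lo, hi = np.percentile(band, [low_pct, high_pct])
    return np.clip((band - lo) / (hi - lo + 1e-6), 0.0, 1.0)
```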
Over the past 2 years, we’ve hit several walls that might sound familiar:
- Models trained on clean datasets (DIV2K, etc.) completely fall apart on real satellite imagery with clouds, shadows, and atmospheric effects.
- The classic CNN architectures like RCAN struggle with global context - they’ll sharpen a building edge but miss that it’s part of a larger urban pattern.
- Training on one sensor and deploying on another is impossible without significant degradation.
Some fixes we’ve found effective:
- Incorporate atmospheric correction directly into the SR pipeline (check out the MuS2 benchmark paper from 2023).
- Use physics-informed neural networks that understand radiative transfer.
- Multi-temporal stacking before SR dramatically reduces noise while preserving real features (see the sketch below).
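A minimal illustration of that stacking step (a sketch under the assumption that the revisits are already co-registered and radiometrically comparable, not production code):

```python
import numpy as np

def temporal_median_stack(scenes):
    """Per-pixel median over co-registered revisits of the same tile
    (list of 2D arrays). Transient effects such as cloud edges, haze and
    sensor noise are suppressed, while stable ground features survive,
    giving the SR model a cleaner input."""
    return np.median(np.stack(scenes, axis=0), axis=0)    # (T, H, W) -> (H, W)
```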
For anyone diving deep into this space, check out:
- SRRepViT (2024) - achieves similar quality to heavyweight models with only 0.25M parameters.
- DiffusionSat - the new foundation model that conditions on geolocation metadata.
- The L1BSR approach from CVPR 2023 that exploits Sentinel-2’s detector overlap for self-supervised training.
- FocalSR (2025) with Fourier-transform attention - game changer for preserving spectral signatures.
Also worth exploring is the WorldStrat dataset for training, and if you’re feeling adventurous, the new SGDM models claiming a 32x enhancement (though take that with a grain of salt for operational use).
The real breakthrough will likely come from models that jointly optimize for visual quality AND radiometric accuracy. Current models excel at one or the other, but rarely both.
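As a hypothetical illustration of what "jointly optimize" could look like, a PyTorch-style loss sketch; the weighting and the per-band mean term are my assumptions, not a published method:

```python
import torch
import torch.nn.functional as F

def joint_sr_loss(pred, target, alpha=0.1):
    """Combine a pixel-wise term (visual quality) with a per-band mean term
    that penalizes radiometric drift, so the SR output stays usable for
    quantitative indices such as NDVI. Tensors are (B, C, H, W)."""
    visual = F.l1_loss(pred, target)
    radiometric = F.l1_loss(pred.mean(dim=(-2, -1)), target.mean(dim=(-2, -1)))
    return visual + alpha * radiometric
```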
If you're interested in these topics, we would love to connect. We are at brajeshwar@valinor.earth and amir@valinor.earth.