Marigold – repurposing diffusion-based image generators for dense predictions

Marigold repurposes Stable Diffusion for dense prediction tasks such as monocular depth estimation and surface normal prediction, delivering a level of detail often missing even in top discriminative models.

Key aspects that make it great:
– Reuses the original VAE and only lightly fine-tunes the denoising UNet
– Trained on just tens of thousands of synthetic image–modality pairs
– Trains in a few days on a single consumer GPU (e.g., RTX 4090)
– Generalizes zero-shot to real-world, in-the-wild images
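
To try the zero-shot behaviour on your own photo, here is a minimal sketch using the Marigold depth pipeline that ships with the Hugging Face diffusers library. The checkpoint name, the output path, and the helper `choose_device_and_precision` are assumptions made for illustration, not something prescribed by the authors; treat it as a starting point rather than the reference setup.

```python
from typing import Tuple

# Assumption: a Marigold depth checkpoint name on the Hugging Face Hub;
# swap in whichever checkpoint you actually use.
DEFAULT_CHECKPOINT = "prs-eth/marigold-depth-v1-1"


def choose_device_and_precision(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical helper: half precision on a GPU, full precision on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def estimate_depth(
    image_path_or_url: str,
    out_path: str = "depth.png",
    checkpoint: str = DEFAULT_CHECKPOINT,
) -> None:
    """Predict depth for one image and save a colorized visualization."""
    # Imported lazily so this file loads even without the heavy dependencies.
    import torch
    import diffusers

    device, precision = choose_device_and_precision(torch.cuda.is_available())
    dtype = torch.float16 if precision == "float16" else torch.float32

    # Load the fine-tuned pipeline (VAE + denoising UNet) from the checkpoint.
    pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
        checkpoint, torch_dtype=dtype
    ).to(device)

    image = diffusers.utils.load_image(image_path_or_url)
    depth = pipe(image)  # depth.prediction holds the predicted depth map

    # Turn the raw prediction into a list of PIL images and save the first.
    vis = pipe.image_processor.visualize_depth(depth.prediction)
    vis[0].save(out_path)


if __name__ == "__main__":
    # Hypothetical input file; replace with any local path or image URL.
    estimate_depth("photo.jpg")
```

The first call downloads the checkpoint, so it needs network access and a few gigabytes of disk space; on CPU it should still work, only much more slowly.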

https://mlhonk.substack.com/p/31-marigold

https://arxiv.org/pdf/2505.09358

https://marigoldmonodepth.github.io/