Marigold repurposes Stable Diffusion for dense prediction tasks such as monocular depth estimation and surface normal prediction, delivering a level of detail often missing even in top discriminative models.
Key aspects that make it great:
– Reuses the original VAE and only lightly fine-tunes the denoising UNet
– Trained on just tens of thousands of synthetic image–modality pairs
– Runs on a single consumer GPU (e.g., RTX 4090)
– Zero-shot generalization to real-world, in-the-wild images
https://mlhonk.substack.com/p/31-marigold
https://arxiv.org/pdf/2505.09358
https://marigoldmonodepth.github.io/
