
NVIDIA's Describe Anything (DAM-3B) is here! A 3B parameter vision-language model that enables smart image & video captioning using focal prompts and cross-attention backbones. Just mark a region (a point, box, or scribble) and get localized, natural-language descriptions! Now available on Hugging Face.
Posted on: 13/05/2025