How to install Describe Anything AI

Describe Anything AI
April 11, 2025

Describe Anything Model (DAM) is a model from NVIDIA Research that generates detailed descriptions of user-specified regions in images and videos. This guide covers installation and basic usage.

What is Describe Anything AI?

DAM is a multimodal large language model. You mark a region of an image or video with points, boxes, scribbles, or masks, and DAM returns a detailed description of that region that takes the surrounding context into account.

Installation Methods

Method 1: Direct Installation (Recommended)

You can install Describe Anything AI directly from GitHub without cloning the repository:

pip install git+https://github.com/NVlabs/describe-anything

Method 2: Clone and Install

Alternatively, you can clone the repository and install it locally:

git clone https://github.com/NVlabs/describe-anything
cd describe-anything
pip install -v .
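
After either method, a quick check that the package is importable can save debugging time later. The snippet below is a minimal sketch using only the standard library. The import name `dam` is an assumption (it is not stated in this article), so adjust it if the package exposes a different top-level module.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "dam" is an assumed import name for Describe Anything; change it if needed.
    name = "dam"
    status = "found" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```

If the module is reported as not found, confirm that pip installed into the same Python environment you are running the check from.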

Note: If you prefer not to install additional dependencies, the repository also provides a self-contained script for detailed localized image descriptions. Check the examples/dam_with_sam_self_contained.py file or use the Colab notebook.

Running the Demo

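After installation, you can run the interactive demo. The command below changes into the repository's demo directory, so run it from the root of a cloned copy (Method 2):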

cd demo
python app.py

Using the Command Line Tools

Image Description with SAM

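To let the Segment Anything Model (SAM) turn point prompts into a region and get a description of that region, run the following from the repository root (the script and image paths are relative to it):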

python examples/dam_with_sam.py --image_path images/1.jpg --points '[[1172, 812], [1572, 800]]' --output_image_path output_visualization.png

Or use a bounding box:

python examples/dam_with_sam.py --image_path images/1.jpg --box '[800, 500, 1800, 1000]' --use_box --output_image_path output_visualization.png
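
Quoting the JSON-style `--points` and `--box` values by hand is easy to get wrong, especially when the coordinates come from another program. The helper below is a hedged sketch of one way to assemble the same commands from Python. The function name `build_sam_command` is hypothetical, and only the flags shown in the commands above are used.

```python
import json
import sys


def build_sam_command(image_path, output_image_path, points=None, box=None):
    """Build an argument list for examples/dam_with_sam.py.

    Pass either `points` (a list of [x, y] pairs) or `box` (four numbers),
    mirroring the two commands shown in this article.
    """
    if (points is None) == (box is None):
        raise ValueError("pass exactly one of points or box")
    cmd = [sys.executable, "examples/dam_with_sam.py", "--image_path", image_path]
    if points is not None:
        # json.dumps yields the same bracketed text used on the command line.
        cmd += ["--points", json.dumps(points)]
    else:
        cmd += ["--box", json.dumps(box), "--use_box"]
    cmd += ["--output_image_path", output_image_path]
    return cmd


if __name__ == "__main__":
    cmd = build_sam_command(
        "images/1.jpg",
        "output_visualization.png",
        points=[[1172, 812], [1572, 800]],
    )
    print(cmd)
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the repository root avoids shell quoting entirely, because each argument is handed to the script as-is.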

Video Description with SAM

For videos, annotating the region on the first frame is enough:

python examples/dam_video_with_sam2.py --video_file videos/1.mp4 --points '[[1824, 397]]' --output_image_dir videos/1_visualization

Starting the DAM Server

Image-only DAM Server

python dam_server.py --model-path nvidia/DAM-3B --conv-mode v1 --prompt-mode focal_prompt --temperature 0.2 --top_p 0.9 --num_beams 1 --max_new_tokens 512 --workers 1

Image-video joint DAM Server

python dam_server.py --model-path nvidia/DAM-3B-Video --conv-mode v1 --prompt-mode focal_prompt --temperature 0.2 --top_p 0.9 --num_beams 1 --max_new_tokens 512 --workers 1 --image_video_joint_checkpoint
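
The two server commands differ only in the model path and the final `--image_video_joint_checkpoint` flag. As a rough sketch, a small launcher can make that difference explicit. The names `SHARED_FLAGS` and `build_server_command` are hypothetical, and the flag values are copied from the commands above rather than taken from the script's own documentation.

```python
import sys

# Flags common to both server variants, in the order used in this article.
SHARED_FLAGS = {
    "--conv-mode": "v1",
    "--prompt-mode": "focal_prompt",
    "--temperature": "0.2",
    "--top_p": "0.9",
    "--num_beams": "1",
    "--max_new_tokens": "512",
    "--workers": "1",
}


def build_server_command(joint=False):
    """Return the argument list for the image-only or joint image-video server."""
    model = "nvidia/DAM-3B-Video" if joint else "nvidia/DAM-3B"
    cmd = [sys.executable, "dam_server.py", "--model-path", model]
    for flag, value in SHARED_FLAGS.items():
        cmd += [flag, value]
    if joint:
        # Only the joint checkpoint takes this extra switch.
        cmd.append("--image_video_joint_checkpoint")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_server_command(joint=True)))
```

The list can be passed to `subprocess.run` from the directory that contains `dam_server.py`.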

Resources

License Information

Different parts of Describe Anything are released under different licenses:

  • Code: Apache License 2.0
  • Model weights: NVIDIA Research License
  • Training Data: NVIDIA Noncommercial License
  • DLC-Bench: CC BY-NC-SA 4.0
