
How to install Describe Anything AI
Describe Anything Model (DAM), developed by NVIDIA Research, generates detailed descriptions for user-specified regions in images and videos. This guide walks through installation and basic usage.
What is Describe Anything AI?
DAM is a multimodal large language model. You mark a region of an image or video with points, boxes, scribbles, or masks, and DAM returns a detailed, contextual description of that region.
Installation Methods
Method 1: Direct Installation (Recommended)
You can install Describe Anything AI directly from GitHub without cloning the repository:
pip install git+https://github.com/NVlabs/describe-anything
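Once the install finishes, a quick import check can confirm that the package landed in the active Python environment. The sketch below uses only the standard library. The module name `dam` is an assumption about what the package exposes, so substitute the actual import name if it differs.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "dam" is an assumed import name; this guide does not confirm it.
    name = "dam"
    status = "found" if is_importable(name) else "not found"
    print(f"{name}: {status}")
```

If the check reports "not found", make sure `pip` and `python` point at the same environment.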
Method 2: Clone and Install
Alternatively, you can clone the repository and install it locally:
git clone https://github.com/NVlabs/describe-anything
cd describe-anything
pip install -v .
Note: If you prefer not to install additional dependencies, the repository also provides a self-contained script for detailed localized image descriptions. Check the examples/dam_with_sam_self_contained.py file or use the Colab notebook.
Running the Demo
After installation, you can run the interactive demo from the root of the cloned repository:
cd demo
python app.py
Using the Command Line Tools
Image Description with SAM
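The example scripts are in the repository's examples directory, so run them from the repository root. To have Segment Anything Model (SAM) turn point prompts into a region and get a description of it: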
python examples/dam_with_sam.py --image_path images/1.jpg --points '[[1172, 812], [1572, 800]]' --output_image_path output_visualization.png
Or use a bounding box:
python examples/dam_with_sam.py --image_path images/1.jpg --box '[800, 500, 1800, 1000]' --use_box --output_image_path output_visualization.png
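When calling the script from Python, the quoting of the JSON-style `--points` and `--box` values is easy to get wrong. The following is a minimal sketch that builds the argument list for either variant, using only the flags shown in the two commands above. The helper name `build_sam_command` is hypothetical, and the box is assumed to be in `[x1, y1, x2, y2]` corner order.

```python
import json
import shlex


def build_sam_command(image_path, output_path, points=None, box=None):
    """Build the argument list for examples/dam_with_sam.py.

    Pass exactly one of `points` (a list of [x, y] pairs) or `box`
    (assumed to be [x1, y1, x2, y2]).
    """
    if (points is None) == (box is None):
        raise ValueError("pass exactly one of points or box")
    cmd = ["python", "examples/dam_with_sam.py", "--image_path", image_path]
    if points is not None:
        cmd += ["--points", json.dumps(points)]
    else:
        # The box variant needs the extra --use_box switch.
        cmd += ["--box", json.dumps(box), "--use_box"]
    cmd += ["--output_image_path", output_path]
    return cmd


if __name__ == "__main__":
    cmd = build_sam_command(
        "images/1.jpg",
        "output_visualization.png",
        points=[[1172, 812], [1572, 800]],
    )
    # shlex.join quotes the JSON argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root avoids shell quoting altogether, since each argument is handed to the script as-is.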
Video Description with SAM
For videos, DAM can work with just a first-frame annotation:
python examples/dam_video_with_sam2.py --video_file videos/1.mp4 --points '[[1824, 397]]' --output_image_dir videos/1_visualization
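The point in the command above appears to be a pixel coordinate on the first frame; that is an assumption, since the units are not stated here. If you only know the target position as a fraction of the frame size, a small helper can convert it before building the command. `to_pixel_points` and `build_video_command` are hypothetical names used for illustration, and the frame size in the usage example is made up. Only the flags come from the command above.

```python
import json


def to_pixel_points(fractions, width, height):
    """Convert (fx, fy) fractions in [0, 1] to integer pixel coordinates."""
    points = []
    for fx, fy in fractions:
        if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
            raise ValueError(f"fraction out of range: {(fx, fy)}")
        # Clamp so that a fraction of 1.0 maps to the last valid pixel index.
        x = min(int(fx * width), width - 1)
        y = min(int(fy * height), height - 1)
        points.append([x, y])
    return points


def build_video_command(video_file, points, output_dir):
    """Build the argument list for examples/dam_video_with_sam2.py."""
    return [
        "python", "examples/dam_video_with_sam2.py",
        "--video_file", video_file,
        "--points", json.dumps(points),
        "--output_image_dir", output_dir,
    ]


if __name__ == "__main__":
    # Hypothetical 3840x2160 first frame; replace with your video's size.
    pts = to_pixel_points([(0.5, 0.25)], width=3840, height=2160)
    print(build_video_command("videos/1.mp4", pts, "videos/1_visualization"))
```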
Starting the DAM Server
Image-only DAM Server
python dam_server.py --model-path nvidia/DAM-3B --conv-mode v1 --prompt-mode focal_prompt --temperature 0.2 --top_p 0.9 --num_beams 1 --max_new_tokens 512 --workers 1
Image-video joint DAM Server
python dam_server.py --model-path nvidia/DAM-3B-Video --conv-mode v1 --prompt-mode focal_prompt --temperature 0.2 --top_p 0.9 --num_beams 1 --max_new_tokens 512 --workers 1 --image_video_joint_checkpoint
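The two server commands differ only in the model path and the trailing `--image_video_joint_checkpoint` switch. The sketch below makes that explicit by building both from one shared flag list. The helper name `build_server_command` is hypothetical; every flag and value is copied from the commands above.

```python
# Flags shared by both server commands shown above.
COMMON_FLAGS = [
    "--conv-mode", "v1",
    "--prompt-mode", "focal_prompt",
    "--temperature", "0.2",
    "--top_p", "0.9",
    "--num_beams", "1",
    "--max_new_tokens", "512",
    "--workers", "1",
]


def build_server_command(joint=False):
    """Return the dam_server.py argument list.

    joint=False gives the image-only server; joint=True gives the
    image-video joint server.
    """
    model = "nvidia/DAM-3B-Video" if joint else "nvidia/DAM-3B"
    cmd = ["python", "dam_server.py", "--model-path", model, *COMMON_FLAGS]
    if joint:
        cmd.append("--image_video_joint_checkpoint")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_server_command(joint=False)))
    print(" ".join(build_server_command(joint=True)))
```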
Resources
Official Links
Model & Datasets
License Information
Each component of Describe Anything is released under its own license:
- Code: Apache License 2.0
- Model weights: NVIDIA Research License
- Training Data: NVIDIA Noncommercial License
- DLC-Bench: CC BY-NC-SA 4.0