What is CogVLM2?
CogVLM2 is an open-source family of multimodal vision-language models built on Meta-Llama-3-8B-Instruct, achieving GPT-4V-level performance on image and video understanding tasks.
When was CogVLM2 released?
The image models were released on May 20, 2024, with the CogVLM2-Video variants and further updates following over 2024-2025.
Is CogVLM2 free to use?
Yes. The weights and code are fully open-source and available on Hugging Face under a permissive license; there are no fees for local use.
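As a minimal sketch of pulling the weights for local use with the huggingface_hub package (the repo ID below matches the English chat variant listed in the next answer; verify it on the Hub before downloading):

```python
# Minimal sketch: download CogVLM2 weights for free local use.
# Assumes `pip install huggingface_hub`; check the repo ID on the Hub first.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="THUDM/cogvlm2-llama3-chat-19B")
print(f"Weights downloaded to: {local_dir}")
```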
What are the key models in CogVLM2?
Main variants include cogvlm2-llama3-chat-19B (English), cogvlm2-llama3-chinese-chat-19B (bilingual), and CogVLM2-Video for video tasks.
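A hedged sketch of loading one of these variants with the transformers library; the image preprocessing and conversation formatting are supplied by the model's own remote code, so follow the model card for actual inference:

```python
# Sketch: load a CogVLM2 variant with Hugging Face transformers.
# trust_remote_code=True is needed because the repo ships custom modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "THUDM/cogvlm2-llama3-chat-19B"  # or the Chinese bilingual variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # full precision; see the int4 option below
    trust_remote_code=True,
).eval()
```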
What hardware does CogVLM2 require?
Int4-quantized versions run on GPUs with 16GB of VRAM; full-precision (BF16/FP16) inference needs substantially more GPU memory.
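For the 16GB case, one route is on-the-fly 4-bit quantization with bitsandbytes through transformers, sketched below under the assumption that bitsandbytes and accelerate are installed (the project also publishes prequantized int4 checkpoints; check the model cards):

```python
# Sketch: load CogVLM2 in 4-bit so it fits on a ~16 GB GPU.
# Exact memory use depends on context length and batch size.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "THUDM/cogvlm2-llama3-chat-19B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    trust_remote_code=True,
).eval()
```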
Does CogVLM2 support video understanding?
Yes. CogVLM2-Video processes clips of up to about one minute via keyframe extraction and posts leading results on the MVBench and VideoChatGPT benchmarks.
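CogVLM2-Video's exact frame-selection strategy is defined by the model code; purely to illustrate the idea of keyframe extraction, here is a sketch that uniformly samples a fixed number of frames from a clip with OpenCV (the function name and the 24-frame default are assumptions, not the model's actual pipeline):

```python
# Illustrative keyframe sampling, not CogVLM2-Video's internal pipeline:
# uniformly pick N frames so a fixed-size frame set covers the whole clip.
import cv2
import numpy as np

def sample_keyframes(video_path: str, num_frames: int = 24):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames
```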
How does CogVLM2 compare to GPT-4V?
It matches or exceeds GPT-4V on many benchmarks, such as DocVQA (92.3), TextVQA (84-85), and OCR tasks, while remaining fully open-source.
Where can I try CogVLM2 online?
Online demos are available at cogvlm2-online.cogviewai.cn:7861 (image) and cogvlm2-online.cogviewai.cn:7868 (video); the models can also be tried via the ZhipuAI platform.




