What is GLM-OCR?
GLM-OCR is a multimodal OCR model from Z.ai for complex document understanding. It excels at text, formula, and table recognition as well as structured information extraction, with state-of-the-art performance.
Is GLM-OCR free to use?
Yes. It is fully open-source under the MIT license, with weights and code available on Hugging Face; local inference is free, while the optional hosted API may carry usage costs.
What benchmarks does GLM-OCR lead on?
It scores 94.62 on OmniDocBench V1.5 (ranking #1 overall) and achieves state-of-the-art results in formula recognition, table recognition, and information extraction.
How fast is GLM-OCR inference?
It processes 1.86 PDF pages per second and 0.67 images per second on a single replica, making it efficient enough for production use.
How many parameters does GLM-OCR have?
GLM-OCR has approximately 0.9 billion parameters, balancing high accuracy with reasonable compute requirements.
What deployment options does GLM-OCR support?
It runs via Hugging Face Transformers, Ollama for easy local use, vLLM or SGLang for high-throughput serving, and an optional Z.ai hosted API (see the Transformers sketch below).
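For the Transformers path, a minimal local-inference sketch might look like the following. The repo id "zai-org/GLM-OCR", the model class, and the prompt wording are assumptions rather than confirmed details from the model card, so check the official card for the exact loading code and chat template.

# Minimal sketch of local inference with Hugging Face Transformers.
# NOTE: the repo id and prompt below are assumptions; consult the model card
# for the exact identifiers, model class, and chat template.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "zai-org/GLM-OCR"  # hypothetical repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("document_page.png")          # any scanned page or photo
prompt = "Extract all text from this document."  # assumed instruction format

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])

For batch or server workloads, the same weights can instead be served through vLLM or SGLang and queried over their OpenAI-compatible endpoints, which is generally the better fit for the throughput figures quoted above.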
How many languages does GLM-OCR support?
It supports 8 languages for document processing, though the exact list is not detailed in the model card.
Who developed GLM-OCR?
GLM-OCR was developed by Z.ai (Zhipu AI) and released open-source on Hugging Face in early 2026.