Q: What is GLM-4.7?
A: GLM-4.7 is Zhipu AI's latest flagship open-weight LLM, released December 22, 2025, with major upgrades in agentic coding, reasoning, tool use, and UI generation via interleaved and preserved thinking modes.
Q: When was GLM-4.7 released?
A: It was officially released on December 22, 2025, with weights on Hugging Face and API/chat access shortly after.
Q: Is GLM-4.7 open-source?
A: Yes, it is open-weight: the full model is available on Hugging Face (zai-org/GLM-4.7) and ModelScope for local deployment under a permissive license.
Q: What are GLM-4.7's key benchmarks?
A: It achieves 73.8% on SWE-bench Verified, 66.7% on SWE-bench Multilingual, 41% on Terminal Bench 2.0, 87.4% on τ²-Bench, 42.8% on HLE with tools, and strong math scores such as 95.7% on AIME 2025.
Q: How do I access GLM-4.7?
A: Use the free chat at chat.z.ai (select the model), the API via z.ai or OpenRouter (usage-based), or download the weights for local inference with vLLM/SGLang.
Q: What is the context window for GLM-4.7?
A: It supports a 200,000-token context window with up to 128K-131K output tokens, ideal for long codebases and multi-turn tasks.
Q: Does GLM-4.7 require a subscription?
A: Free chat access is available; coding agents/tools need the GLM Coding Plan from $3/month, and the API is pay-per-use starting around $0.60/M input tokens.
Q: How does GLM-4.7 compare to competitors?
A: It leads open models on many coding/agent benchmarks and competes closely with Claude Sonnet 4.5 and Gemini 3.0 Pro in reasoning and tool use, often at lower cost.

GLM-4.7


About This AI
GLM-4.7 is the latest flagship large language model from Zhipu AI (Z.ai), released on December 22, 2025. It represents a major upgrade over GLM-4.6, with a focus on advanced programming, stable multi-step reasoning, and agentic execution.
Built on an approximately 358B-400B-parameter MoE (Mixture-of-Experts) architecture, it features interleaved thinking (reasoning before responses and tool calls), preserved thinking (retaining reasoning across turns), and turn-level thinking control for balancing latency and accuracy.
It excels in multilingual agentic coding, terminal tasks, UI/web development aesthetics, complex math/reasoning, and tool invocation, achieving top open-source results on key benchmarks like SWE-bench Verified (73.8%), SWE-bench Multilingual (66.7%), Terminal Bench 2.0 (41%), τ²-Bench (87.4%), HLE with tools (42.8%), and strong math scores (95.7% AIME 2025).
It supports a 200K-token context window (up to 128K-131K output tokens), high inference speed (55+ tokens/s), and deep integration with coding agents/tools such as Claude Code, Kilo Code, Cline, and Roo Code.
It is available via the Z.ai chat interface (free access with GLM-4.7 selected), the API (usage-based pricing starting around $0.60/M input tokens), OpenRouter, Hugging Face/ModelScope weights for local deployment (vLLM/SGLang support), and coding-specific subscriptions starting at $3/month for enhanced quotas in dev tools.
It is positioned as a top open-weight alternative for developers, researchers, and enterprises needing reliable coding assistance, agentic workflows, and high-quality reasoning without proprietary lock-in.
Key Features
- Interleaved Thinking: Reasons before every response and tool call for better instruction following and quality
- Preserved Thinking: Retains full reasoning chains across multi-turn conversations to reduce loss in long-horizon tasks
- Turn-level Thinking Control: Enable/disable reasoning per turn to optimize latency and cost
- Elite Agentic Coding: Strong multilingual, terminal-based, and multi-file software engineering performance
- Superior UI/Frontend Generation: Produces clean, modern webpages and slides with accurate layouts
- Advanced Tool Use: Top open-source scores on interactive tool invocation and web browsing benchmarks
- Complex Reasoning Boost: Major gains in math, science, and graduate-level questions
- 200K Token Context: Supports long documents, codebases, and extended conversations
- High Inference Efficiency: 55+ tokens per second with MoE design for fast responses
- Integration with Coding Agents: Native support in tools like Claude Code, Kilo Code, Cline, Roo Code
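The preserved-thinking feature above (retaining reasoning chains across turns) can be sketched as a small history helper. This is a minimal illustration only: the `reasoning_content` field name is an assumption modeled on common OpenAI-compatible APIs, not a confirmed part of the Z.ai schema, so check the official docs before relying on it.

```python
def append_turn(history, user_msg, assistant_msg, reasoning=None):
    """Append one completed turn to a chat history, optionally keeping
    the model's reasoning so later turns can reuse it (preserved thinking).
    NOTE: 'reasoning_content' is a hypothetical field name; verify it
    against the Z.ai API documentation."""
    history.append({"role": "user", "content": user_msg})
    assistant = {"role": "assistant", "content": assistant_msg}
    if reasoning is not None:
        assistant["reasoning_content"] = reasoning  # retained across turns
    history.append(assistant)
    return history

history = []
append_turn(history, "Refactor utils.py", "Done, see the diff.",
            reasoning="The helper is duplicated; merge it, then rename.")
print(len(history))  # → 2
```

Dropping the `reasoning` argument on latency-sensitive turns mirrors the turn-level thinking control described above: the history stays valid, only the retained reasoning is omitted.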
Price Plans
- Free Chat ($0): Basic access to GLM-4.7 via Z.ai chat interface with usage limits
- GLM Coding Plan ($3/Month starting): Enhanced quotas and integration in coding agents/tools like Claude Code, Cline
- API, usage-based (approx. $0.60/M input, $2.20/M output tokens): Pay-per-use via the Z.ai API or OpenRouter for developers
- Pro/Enterprise (Custom): Higher limits, priority, dedicated support for heavy or business use
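At the approximate per-million-token rates listed above, the cost of a request is simple to estimate. A minimal sketch, assuming the quoted $0.60/M input and $2.20/M output rates:

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 input_rate: float = 0.60, output_rate: float = 2.20) -> float:
    """Estimate API cost in USD. Rates are USD per million tokens,
    taken from the approximate pricing listed above; actual billing
    may differ (caching, tiers, provider markup)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a 50K-token codebase prompt with a 4K-token reply
print(round(api_cost_usd(50_000, 4_000), 4))  # → 0.0388
```

At these rates, even long-context coding requests stay in the cents range, which is where the cost advantage over closed flagship APIs comes from.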
Pros
- Top-tier open-weight coding: Leads open models on SWE-bench, Terminal Bench, and agent benchmarks
- Stable long-horizon execution: Thinking mechanisms enable reliable multi-step agentic workflows
- Competitive reasoning: Strong math/science scores rivaling or beating many closed models
- Accessible deployment: Open weights on Hugging Face/ModelScope for local use, plus free chat/API options
- Cost-effective coding plans: Low-cost subscriptions unlock high quotas in dev tools
- Multilingual strength: Excellent agentic coding across languages
- Rapid inference: Balanced speed and quality for production use
Cons
- High parameter count: 358B-400B MoE requires substantial hardware for full local inference
- API pricing: Usage-based costs can add up for heavy users
- Coding plan separate: Full agent/tool quotas in third-party tools require additional subscription
- Limited free tier depth: Chat free access may have rate limits; advanced features paid
- Knowledge cutoff: Not explicitly stated, though a mid-2025 cutoff is likely given the release date
- Setup for local: Requires vLLM/SGLang expertise and powerful GPUs
- No native multimodal: Text- and code-focused; tool use is strong, but vision/audio input is not an emphasized capability
Use Cases
- Agentic software engineering: Multi-file code generation, debugging, and terminal automation
- Frontend/UI development: Creating modern webpages, slides, and visual prototypes
- Complex math/science reasoning: Solving advanced problems with step-by-step thinking
- Tool-using agents: Web browsing, interactive task execution, and workflow orchestration
- Code review and refactoring: Analyzing large codebases with long context
- Developer productivity: Integrating into IDEs or agents for real-time assistance
- Research prototyping: Testing multi-step reasoning and agent behaviors
Target Audience
- Software developers and engineers: Needing strong coding and agentic support
- AI researchers: Experimenting with frontier open-weight models
- Dev teams: Using in production for code generation and automation
- Students and educators: Learning advanced programming and reasoning
- Startups and enterprises: Cost-effective alternative to closed APIs for dev workflows
- Coding tool users: Subscribers to Claude Code, Cline, etc. wanting GLM-4.7 power
How To Use
- Chat interface: Visit chat.z.ai, select GLM-4.7 from model picker, start prompting
- API access: Sign up at z.ai, get API key, integrate via docs.z.ai/guides/llm/glm-4.7
- Local deployment: Download weights from Hugging Face (zai-org/GLM-4.7), run with vLLM/SGLang
- Coding agents: Subscribe to GLM Coding Plan, use in supported tools like Kilo Code or Cline
- Enable thinking: Prompt with 'think step-by-step' or use turn-level controls in API
- Long context: Upload large code/files or extend conversations up to 200K tokens
- Best results: Use detailed prompts, enable preserved thinking for multi-turn tasks
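The API and thinking-control steps above can be sketched as a request builder for an OpenAI-compatible chat endpoint. This is an assumption-laden sketch, not confirmed API: the `thinking` parameter shape and the endpoint URL should be verified against docs.z.ai/guides/llm/glm-4.7 before use.

```python
def build_chat_request(messages, thinking=True, model="glm-4.7",
                       max_tokens=4096):
    """Build a request body for an OpenAI-compatible chat endpoint.
    NOTE: the 'thinking' field below is a hypothetical shape for the
    turn-level thinking control described above; check the Z.ai docs
    for the real parameter name and values."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        # hypothetical per-turn reasoning toggle (latency vs. accuracy)
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

body = build_chat_request(
    [{"role": "user", "content": "Write a binary search in Python."}],
    thinking=False,  # fast turn: skip reasoning for this simple task
)
print(body["thinking"]["type"])  # → disabled
```

The same body can then be POSTed with your HTTP client of choice to the chat-completions URL given in the Z.ai API docs (or to OpenRouter), with your API key in the `Authorization` header.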
How we rated GLM-4.7
- Performance: 4.8/5
- Accuracy: 4.7/5
- Features: 4.8/5
- Cost-Efficiency: 4.6/5
- Ease of Use: 4.5/5
- Customization: 4.7/5
- Data Privacy: 4.6/5
- Support: 4.4/5
- Integration: 4.7/5
- Overall Score: 4.7/5
GLM-4.7 integration with other tools
- Z.ai Chat Interface: Direct selection of GLM-4.7 in chat.z.ai for instant use
- Z.ai API: Full programmatic access with thinking mode support for custom apps
- OpenRouter: Available on OpenRouter for easy integration with multiple models
- Coding Agents/Tools: Native support in Claude Code, Kilo Code, Cline, Roo Code, OpenCode
- Local Frameworks: vLLM and SGLang for high-performance self-hosted deployment
Best prompts optimised for GLM-4.7
- Act as a senior full-stack developer. Build a complete modern dark-mode responsive portfolio website in HTML/CSS/JS with animated sections and contact form. Think step-by-step, preserve reasoning across turns.
- Solve this graduate-level math problem from AIME 2025: [insert problem]. Use interleaved thinking, explain each step clearly, and verify answer.
- You are an expert agent. Use tools to research and summarize the latest advancements in quantum computing as of today, then generate a slide deck outline.
- Fix this buggy Python codebase for a web scraper: [paste code/files]. Identify issues, propose fixes, and output corrected version with explanations.
- Generate a professional business presentation slide deck on AI ethics in 2026, including key points, visuals suggestions, and speaker notes.