Open Source - MIT License

AI agents you can
see through

The only autonomous coding agent where every decision is transparent, debated by multiple agents, and backed by persistent trust scores.

// trust is earned, not assumed

View on GitHub · 📊 Live Performance Tracker
$ pip install glassbox-ai

Glass box,
not black box

Every other AI agent is a black box. You give it a task, it gives you a result. You have no idea what happened in between.

GlassBox shows you everything. The reasoning chain. The debate between agents. The trust scores. The pre-declared checklist. The grading. If you can't see how the decision was made, why would you trust it?

agentic-trust-labs/glassbox-ai
- decisions = blackbox(model)
- trust = "assumed"
- oversight = None
+ agents = debate(decisions)
+ trust = earned(verified)
+ oversight = transparent(always)

Label an issue. Get a tested PR.

Four agents, zero guesswork. Every step is visible in the GitHub issue thread, and a code sketch of the hand-off follows Step 04.

Step 01
Manager
🦉 The Strategist
Wise and watchful. Reads the issue, classifies it against known templates, and thinks before anyone codes. Generates edge cases upfront, sets confidence, writes the briefing. Doesn't touch code. Only plans.
Step 02
Junior Dev
🦫 The Builder
Heads-down and relentless. Reads every source file and every test file before writing a single line. Uses line-number editing, not string matching. Changes only what needs changing. Born to build.
Step 03
Tester
🦅 The Skeptic
Sharp-eyed and ruthless. Runs syntax checks, the full test suite, and verifies the diff is minimal. Independently validates every edge case the Manager declared. Nothing escapes the hawk.
Step 04
Pull Request
🦋 The Glasswing
The final form. Nothing hidden. Full reasoning chain, every aspect graded, every edge case checked. Beautiful, transparent, ready to fly. The whole point of GlassBox.
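Here is roughly what that hand-off looks like in code. This is an illustrative sketch only: the class names, the plan/implement/validate methods, and the Briefing/RunLog shapes are placeholders for explanation, not GlassBox's shipped API.

```python
# Illustrative sketch of the four-step hand-off. Names are placeholders,
# not GlassBox's published API.
from dataclasses import dataclass, field


@dataclass
class Briefing:
    classification: str      # which known fix template the issue matches
    edge_cases: list[str]    # declared up front, before any code is written
    confidence: float        # the Manager's confidence in the plan


@dataclass
class RunLog:
    steps: list[str] = field(default_factory=list)  # posted to the issue thread

    def record(self, message: str) -> None:
        self.steps.append(message)


def run_issue(issue_text: str, manager, junior_dev, tester) -> dict:
    log = RunLog()

    # Step 01: the Manager plans and never edits code.
    briefing = manager.plan(issue_text)
    log.record(f"Manager: {briefing.classification}, "
               f"{len(briefing.edge_cases)} edge cases, "
               f"confidence {briefing.confidence:.2f}")

    # Step 02: the Junior Dev reads sources and tests, then edits by line number.
    patch = junior_dev.implement(briefing)
    log.record("Junior Dev: minimal patch prepared")

    # Step 03: the Tester reruns the suite and checks every declared edge case.
    report = tester.validate(patch, briefing.edge_cases)
    log.record(f"Tester: {report}")

    # Step 04: the whole chain becomes the pull request body.
    return {"briefing": briefing, "patch": patch, "report": report, "log": log.steps}
```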
Live counters: PRs Merged · Tests Passing · Agent Roles

What no other agent has

Nine capabilities. Five are unique to GlassBox. All research-backed.

💡
Full Transparency
Every decision, every reasoning step, every debate transcript - visible in GitHub comments. The reasoning IS the product.
Only GlassBox
🤝
Multi-Agent Debate
3 agents with different roles argue about the solution before shipping. Structured adversarial debate, not just retries.
Only GlassBox
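In sketch form, one debate round might look like the code below. The role names (proposer, critic, judge), method names, and stopping rule are assumptions for illustration, not the project's actual implementation.

```python
# Illustrative shape of a structured adversarial debate; role and method
# names are placeholders, not GlassBox's actual implementation.
def debate(task: str, proposer, critic, judge, max_rounds: int = 3):
    solution = proposer.propose(task)
    transcript = [("proposer", solution)]

    for _ in range(max_rounds):
        objections = critic.challenge(task, solution)   # adversarial pass, not a retry
        if not objections:
            break                                       # the critic is satisfied
        transcript.append(("critic", objections))
        solution = proposer.revise(solution, objections)
        transcript.append(("proposer", solution))

    verdict = judge.score(task, solution)               # a third role renders a verdict
    transcript.append(("judge", verdict))
    # The transcript is kept and surfaced, not thrown away.
    return solution, transcript
```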
🛡
Trust Scoring
Persistent trust scores (SQLite) updated via an exponential moving average (EMA). Floor 0.30, ceiling 1.00. Agents earn trust through outcomes.
Only GlassBox
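The update rule itself fits in a few lines. Below is a minimal sketch of an EMA update clamped to the stated floor and ceiling; the table schema, starting score, and smoothing factor ALPHA are assumptions, not GlassBox's actual values.

```python
import sqlite3

TRUST_FLOOR, TRUST_CEILING = 0.30, 1.00
ALPHA = 0.2  # EMA smoothing factor: an assumption, not GlassBox's actual value


def update_trust(db: sqlite3.Connection, agent: str, outcome: float) -> float:
    """Blend the latest outcome (0.0 = failed ... 1.0 = succeeded) into the score."""
    # Assumed schema; GlassBox's real table layout may differ.
    db.execute("CREATE TABLE IF NOT EXISTS trust (agent TEXT PRIMARY KEY, score REAL)")

    row = db.execute("SELECT score FROM trust WHERE agent = ?", (agent,)).fetchone()
    current = row[0] if row else 0.5  # starting score is also an assumption

    new = (1 - ALPHA) * current + ALPHA * outcome     # exponential moving average
    new = min(TRUST_CEILING, max(TRUST_FLOOR, new))   # clamp to [0.30, 1.00]

    if row:
        db.execute("UPDATE trust SET score = ? WHERE agent = ?", (new, agent))
    else:
        db.execute("INSERT INTO trust (agent, score) VALUES (?, ?)", (agent, new))
    db.commit()
    return new
```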
📋
Self-Grading Checklist
Agent declares what it will check BEFORE coding, then grades itself after. Pre-declared accountability.
Only GlassBox
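Conceptually, the checklist is data, not prose: declare first, grade against the declaration later. A sketch of the idea, with illustrative field and method names rather than GlassBox's real format:

```python
# Illustrative only; GlassBox's real checklist format may differ.
from dataclasses import dataclass, field


@dataclass
class Checklist:
    declared: list[str]                           # written down BEFORE any edit
    grades: dict[str, bool] = field(default_factory=dict)

    def grade(self, item: str, passed: bool) -> None:
        if item not in self.declared:
            raise ValueError(f"{item!r} was never declared up front")
        self.grades[item] = passed

    def score(self) -> float:
        # Anything declared but never graded counts as a failure.
        if not self.declared:
            return 0.0
        return sum(self.grades.get(i, False) for i in self.declared) / len(self.declared)
```

The point is the ordering: an item that was not declared before coding cannot be graded afterwards, so the agent cannot quietly move the goalposts.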
🧠
Reflexion Memory
Verbal failure reflections stored and retrieved before similar tasks. Agents learn from mistakes, not just retry.
Shinn et al. NeurIPS 2023
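A minimal sketch of the Reflexion pattern (Shinn et al., 2023): store a verbal lesson after a failure, then pull the most relevant lessons back before the next similar task. The in-memory store and keyword-overlap retrieval below are stand-ins, not GlassBox's implementation.

```python
class ReflexionMemory:
    """Stand-in for GlassBox's reflection store; retrieval is naive keyword overlap."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []   # (task description, reflection)

    def store(self, task: str, reflection: str) -> None:
        # e.g. "I edited the test instead of the source; locate the function first."
        self._entries.append((task, reflection))

    def recall(self, task: str, k: int = 3) -> list[str]:
        words = set(task.lower().split())
        ranked = sorted(
            self._entries,
            key=lambda entry: len(words & set(entry[0].lower().split())),
            reverse=True,
        )
        return [reflection for _, reflection in ranked[:k]]
```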
🐛
Failure Taxonomy
15 categorized failure modes (F1-F15) with research sources. When the agent fails, you know exactly why.
Only GlassBox
🌐
MCP Server
Works natively inside Windsurf, Cursor, and Claude Desktop. No context switching. Your IDE, your tools.
Any IDE
📦
Template-Driven Fixes
4 fix templates: typo_fix, wrong_value, wrong_name, swapped_args. Line-number editing eliminates "string not found" errors.
Reliable
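The mechanism is simple: address the edit by line number, so a near-duplicate string elsewhere in the file can never be matched by mistake. A sketch follows; the function name and signature are illustrative, not GlassBox's internals.

```python
from pathlib import Path


def apply_line_edit(path: str, line_no: int, new_line: str) -> None:
    """Replace exactly one line, addressed by number rather than by string match."""
    lines = Path(path).read_text().splitlines(keepends=True)
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"{path} has no line {line_no}")
    ending = "\n" if lines[line_no - 1].endswith("\n") else ""
    lines[line_no - 1] = new_line.rstrip("\n") + ending
    Path(path).write_text("".join(lines))


# e.g. a wrong_value fix: swap a constant on a known line
# apply_line_edit("config.py", 42, "MAX_RETRIES = 3")
```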
📜
Open Source
MIT License. Full source on GitHub. PyPI package. Inspect the code, fork it, self-host it. Trust through transparency.
MIT

How GlassBox compares

Honest comparison. Green where we lead, red where we're behind. Transparency starts here.

Capability: Devin · SWE-agent · OpenHands · GlassBox AI
Issue to PR
Multi-Agent Debate
Trust Scoring
Self-Grading Checklist
Reflexion Memory
MCP Server (any IDE)
Open Source
Multi-Language: Python only
Sandboxed Execution: Not yet
Speed: ~5 min

Grounded in peer-reviewed research

We didn't invent these ideas. We built on them. Every design decision in GlassBox traces back to published, peer-reviewed work.

Ready to see through
your AI agent?

One command. Full transparency. Zero trust assumptions.