This dashboard measures Max agent's behavioral health across 8 independent dimensions. Scoring is performed every 6 hours by an external process -- the agent does not score itself.

The 8 Dimensions

A "+" marks a dimension where higher is better; a "-" marks one where higher is worse.

(+) Completion Rate -- fraction of sessions completing without error
(-) Failure Rate -- fraction of sessions with errors or aborts
(+) Consistency -- stability of tool usage patterns across similar tasks
(-) Volatility -- session-to-session behavioral variance, weighted by activity (duration/tool calls)
(-) Response Time Drift -- average session duration trending upward
(+) Cron Reliability -- scheduled tasks executing on time
(-) Tool Anomaly -- unusual tool usage patterns
(-) Correction Rate -- fraction of user messages requesting a correction, redo, or cancellation in interactive sessions; the most direct proxy for output quality
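As a sketch of how the eight dimensions might roll up into a single score: put every dimension on a common 0..1 scale, invert the ones where higher is worse, and average. The equal weighting and the snake_case dimension names are illustrative assumptions; the dashboard's actual weighting is not specified here.

```python
# Dimensions where a higher value means worse behavior get inverted
# before averaging. Equal weights are an assumption for illustration.
NEGATIVE = {"failure_rate", "volatility", "response_time_drift",
            "tool_anomaly", "correction_rate"}

def composite_score(dims: dict) -> float:
    """Average all dimensions on a 0..1 scale, inverting the
    higher-is-worse ones so 1.0 is always 'healthy'."""
    vals = [(1.0 - v) if name in NEGATIVE else v
            for name, v in dims.items()]
    return sum(vals) / len(vals)

score = composite_score({
    "completion_rate": 0.95, "failure_rate": 0.05,
    "consistency": 0.90, "volatility": 0.10,
    "response_time_drift": 0.05, "cron_reliability": 1.00,
    "tool_anomaly": 0.02, "correction_rate": 0.08,
})  # a healthy agent lands well inside the STRONG band
```

With these sample inputs the composite comes out around 0.94, comfortably above the 0.70 STRONG threshold.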

Trust Bands

STRONG (> 0.70) -- Nominal behavior
MODERATE (0.50-0.70) -- Acceptable variations
GUARDED (0.30-0.50) -- Increased monitoring
ELEVATED (0.15-0.30) -- Investigation required
CRITICAL (< 0.15) -- Immediate intervention
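Mapping a composite score to its band is a simple threshold cascade. A minimal sketch (treating each lower bound as inclusive, which is an assumption since the table above leaves boundary ownership ambiguous):

```python
def trust_band(score: float) -> str:
    """Map a 0..1 composite score to its trust band.
    Inclusive lower bounds are an assumption."""
    if score > 0.70:
        return "STRONG"
    if score >= 0.50:
        return "MODERATE"
    if score >= 0.30:
        return "GUARDED"
    if score >= 0.15:
        return "ELEVATED"
    return "CRITICAL"
```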

Inspired by TierZero Solutions T-Score methodology tierzerosolutions.io


How behavioral monitoring changes the game

01

Beyond binary health checks

An AI agent can be 'up' while silently degrading. Completion rate drops from 95% to 85%, responses slow down, tool patterns shift. Nothing catastrophic at any single point, but a clear drift over time. Traditional up/down monitoring catches none of this.

02

Catching slow drift

The real danger isn't the crash. It's the gradual, invisible degradation. A model that changes after an update, a context that gets polluted, habits that drift. Our scoring systematically compares short-term to long-term to catch these trends before they become critical.
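The short-term vs. long-term comparison above can be sketched as two sliding means over session durations: flag drift when the recent mean exceeds the long-term baseline by more than some relative tolerance. The window sizes and the 15% threshold are illustrative assumptions, not the dashboard's actual parameters.

```python
def drifting(durations, short_n=10, long_n=100, tolerance=0.15):
    """Flag drift when the mean of the last `short_n` session
    durations exceeds the mean of the last `long_n` by more
    than `tolerance` (relative). Parameters are illustrative."""
    if len(durations) < long_n:
        return False  # not enough history to judge
    long_mean = sum(durations[-long_n:]) / long_n
    short_mean = sum(durations[-short_n:]) / short_n
    return short_mean > long_mean * (1 + tolerance)
```

A stable series stays below the threshold; a recent slowdown trips it even though no single session looks catastrophic.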

03

The agent doesn't score itself

A compromised system cannot reliably assess its own compromise. Scoring runs in a completely independent process that reads the agent's artifacts without ever interacting with it. Even if the model is corrupted, the scoring remains reliable.
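The key property is that the scorer only ever reads artifacts on disk and never calls into the agent. A minimal sketch of such a read-only loader, assuming (hypothetically) that sessions are stored as JSONL files; the actual artifact format is not specified in the document:

```python
import json
import pathlib

def load_sessions(log_dir: str) -> list:
    """Read session records from the agent's artifacts without
    invoking the agent. One-JSON-object-per-line files are an
    assumed format for illustration."""
    sessions = []
    for path in sorted(pathlib.Path(log_dir).glob("*.jsonl")):
        with open(path) as f:
            sessions.extend(json.loads(line) for line in f if line.strip())
    return sessions
```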

04

Radical transparency

Scores are public, real-time, accessible to everyone. If the score drops, everyone sees it. No black box, no internal report. This is how we prove that autonomous AI can be deployed responsibly and verifiably.

Want to deploy OpenClaw?

Installation, security hardening, custom configuration — we handle everything.

Contact us


OpenClaw × Easylab — Autonomous AI at the heart of our operations