Retry & Failover -- EasyClaw v2

AI systems fail. A model can be temporarily unavailable, an API can hit its rate limit, a tool can return an unexpected error.

The retry and failover system automatically handles these situations. When an action fails, the system evaluates the error's nature and applies the appropriate strategy: retry with exponential backoff, failover to an alternative model, or graceful degradation.

All of this happens transparently. The user doesn't see the retries or the failovers. They receive their result, perhaps with a slightly longer delay, but with no interruption.

How it works

Detection

Every call (model, tool, API) is wrapped in a circuit breaker that detects errors and timeouts.
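
As an illustration only, a circuit breaker of this kind can be sketched in a few lines of Python. The class name, the failure threshold, and the cool-down period below are assumptions made for the example, not EasyClaw's actual implementation. Timeouts are assumed to surface as exceptions (for example `TimeoutError`) raised by the wrapped call.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, then cools down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default
        self.reset_after = reset_after              # cool-down in seconds (assumed)
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call. A single failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # Any exception, including a timeout, counts as a detected failure.
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success resets the counter
        return result
```

Wrapping a call then looks like `breaker.call(fetch, url)`, where `fetch` stands for any model, tool, or API call. While the breaker is open, callers get `CircuitOpenError` immediately instead of waiting on a dependency that is known to be failing.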

Classification

The error is classified: temporary (rate limit, network timeout), permanent (endpoint removed, access denied), or unknown.
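
A minimal sketch of such a classifier is shown below. The `ApiError` type and the mapping from HTTP status codes to categories are assumptions made for the example. The page does not specify how EasyClaw decides, so treat the sets as placeholders to adapt.

```python
from enum import Enum


class ErrorKind(Enum):
    TEMPORARY = "temporary"
    PERMANENT = "permanent"
    UNKNOWN = "unknown"


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status, message=""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


# Assumed mapping: 429 is a rate limit, 408/502/503/504 are transient;
# 401/403 mean access denied, 404/410 mean the endpoint is gone.
TEMPORARY_STATUSES = {408, 429, 502, 503, 504}
PERMANENT_STATUSES = {401, 403, 404, 410}


def classify(error):
    """Return the ErrorKind for an exception raised by a wrapped call."""
    if isinstance(error, (TimeoutError, ConnectionError)):
        return ErrorKind.TEMPORARY
    if isinstance(error, PermissionError):
        return ErrorKind.PERMANENT
    status = getattr(error, "status", None)
    if status in TEMPORARY_STATUSES:
        return ErrorKind.TEMPORARY
    if status in PERMANENT_STATUSES:
        return ErrorKind.PERMANENT
    return ErrorKind.UNKNOWN
```

For example, `classify(ApiError(429))` yields `ErrorKind.TEMPORARY`, while an exception with no recognizable signal falls through to `ErrorKind.UNKNOWN`.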

Retry strategy

For temporary errors: retry with exponential backoff (1s, 2s, 4s, ...) and a configurable number of attempts.
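
The backoff schedule can be sketched as follows. The function names, the default of four attempts, and the `is_temporary` hook are assumptions for illustration. Only the 1s, 2s, 4s progression comes from the description above.

```python
import time


def backoff_delays(max_attempts, base=1.0, factor=2.0):
    """Delays slept between attempts: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(max_attempts - 1)]


def retry_with_backoff(fn, max_attempts=4, base=1.0, factor=2.0,
                       is_temporary=lambda exc: True, sleep=time.sleep):
    """Call fn until it succeeds, sleeping longer after each temporary failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            out_of_attempts = attempt == max_attempts - 1
            # Re-raise immediately for non-temporary errors or on the last attempt.
            if out_of_attempts or not is_temporary(exc):
                raise
            sleep(base * factor ** attempt)
```

With the defaults, four attempts are separated by waits of 1, 2, and 4 seconds. Many implementations also add random jitter to each delay so that several clients do not retry in lockstep. That refinement is left out here to keep the sketch short.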

Failover

For permanent errors: failover to the configured alternative provider or model. The context is transferred to the alternative, so the request continues without loss.
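
One way to picture failover with context transfer is an ordered list of providers that each receive the same context. The sketch below assumes that providers are plain callables and that the context is a list of messages. Both are hypothetical stand-ins, not EasyClaw's real interfaces.

```python
class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} provider(s) failed")
        self.errors = errors  # list of (provider_name, exception)


def call_with_failover(providers, context):
    """Try each (name, callable) in order, passing the same context to each.

    Returns (provider_name, result) from the first provider that succeeds.
    """
    errors = []
    for name, provider in providers:
        try:
            # The full context goes to whichever provider handles the request,
            # so an alternative starts with everything the primary had.
            return name, provider(context)
        except Exception as exc:
            errors.append((name, exc))
    raise AllProvidersFailed(errors)
```

A call such as `call_with_failover([("primary", primary_model), ("secondary", secondary_model)], messages)` returns the secondary's answer if the primary raises. `primary_model` and `secondary_model` are placeholder names for the example.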

Graceful degradation

If all alternatives fail: the agent clearly informs the user and suggests manual alternatives.
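
The final fallback amounts to turning a list of failures into a clear message for the user. The sketch below shows one possible shape. The function name, the wording of the message, and the example inputs are invented for illustration.

```python
def degraded_reply(task, failures, manual_alternatives):
    """Build a user-facing message once every automatic option has failed.

    failures: list of (name, reason) pairs describing what was tried.
    manual_alternatives: list of suggestions the user can act on.
    """
    lines = [f"I could not complete '{task}' automatically."]
    if failures:
        tried = "; ".join(f"{name} ({reason})" for name, reason in failures)
        lines.append(f"What was tried: {tried}")
    if manual_alternatives:
        lines.append("You could try instead:")
        lines.extend(f"- {alt}" for alt in manual_alternatives)
    return "\n".join(lines)
```

For instance, `degraded_reply("summarize report", [("primary", "rate limited"), ("secondary", "unavailable")], ["Retry in a few minutes"])` produces a short message that names the task, lists both failed attempts, and ends with the suggestion.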

Real-world use cases

Rate limiting

The primary model hits its rate limit. If retries with backoff do not clear it, the system automatically switches to the secondary model. The user notices nothing.

Provider outage

An API provider is under maintenance. Failover activates the alternative provider. In-flight requests are replayed without context loss.

Failing tool

An external tool returns intermittent errors. The circuit breaker temporarily disables it and the agent uses an alternative method.

Configuration

The number of retry attempts and the fallback providers and models can be configured.

Availability

Active by default on all EasyClaw v2 agents
Fallback providers and models are configurable
Compatible with all models and tools supported by EasyClaw
