# Handle failures and retries

> A recovery playbook for every way a payment can fail.

Payments fail in several distinct ways, and each demands a different response. This guide is the playbook: which errors to catch, which to retry, which to surface, and how to avoid a double spend when retrying writes.

## The failure surface

A `tally.payments.create()` call can fail at four different layers:

| Layer           | Looks like                                    | Retry?                            |
| --------------- | --------------------------------------------- | --------------------------------- |
| Network         | `fetch` throws, no HTTP response              | Yes, with idempotency key         |
| Tally pre-check | 4xx with structured `err.code`                | Depends on `err.code`             |
| Privy enclave   | `status: "failed"` on the returned payment    | No — the policy rejected it       |
| Chain           | Eventually `status: "failed"` after broadcast | Sometimes — depends on the revert |

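You see the first three immediately: network and pre-check failures as a thrown error, enclave rejections as `status: "failed"` on the returned payment. The fourth requires polling or webhooks (covered below).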

## Retry decision tree

```
catch (err)
├── network failure        → retry, same idempotency_key
├── RateLimitError         → wait `Retry-After`, retry, same key
├── TallyError (internal)  → retry once with idempotency key
├── ValidationError        → fix input, don't retry
├── NotFoundError          → fix the id, don't retry
└── AuthenticationError    → branch on err.code:
    ├── amount_too_large       → ask LLM/user for a smaller amount
    ├── daily_cap_exceeded     → wait, or extend the permission
    ├── recipient_not_allowed  → fix the recipient
    ├── contract_not_allowed   → fix the contract
    ├── permission_expired     → re-grant the permission
    └── (anything else)        → real auth failure — rotate the key
```
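
The tree can be folded into a single pure function that a retry loop consults. The sketch below is a minimal illustration, not SDK code: the `FailureInfo` shape and the `decide` function are hypothetical, and in real code you would populate `FailureInfo` from the SDK's error classes.

```typescript
// Hypothetical, simplified view of a caught error. The real SDK exposes
// error classes; this plain object stands in for them so the sketch runs alone.
type FailureKind =
  | "network"
  | "rate_limit"
  | "internal"
  | "validation"
  | "not_found"
  | "authentication";

interface FailureInfo {
  kind: FailureKind;
  code?: string; // err.code, only consulted for "authentication"
}

type Decision =
  | { action: "retry_same_key" }
  | { action: "wait_then_retry_same_key" }
  | { action: "fix_input" }
  | { action: "recover"; how: string }
  | { action: "rotate_key" };

// Policy codes from the decision tree, with the recovery each one calls for.
const POLICY_RECOVERY = new Map<string, string>([
  ["amount_too_large", "ask for a smaller amount"],
  ["daily_cap_exceeded", "wait, or extend the permission"],
  ["recipient_not_allowed", "fix the recipient"],
  ["contract_not_allowed", "fix the contract"],
  ["permission_expired", "re-grant the permission"],
]);

function decide(f: FailureInfo): Decision {
  switch (f.kind) {
    case "network":
    case "internal":
      return { action: "retry_same_key" };
    case "rate_limit":
      return { action: "wait_then_retry_same_key" };
    case "validation":
    case "not_found":
      return { action: "fix_input" };
    case "authentication": {
      const how = f.code !== undefined ? POLICY_RECOVERY.get(f.code) : undefined;
      // An unknown code means a real auth failure, not a policy violation.
      return how !== undefined ? { action: "recover", how } : { action: "rotate_key" };
    }
  }
}
```

Keeping the decision separate from the side effects (sleeping, re-sending, alerting) makes the policy easy to unit test.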

## Idempotency is the retry primitive

Never retry a write without an `idempotency_key`. A retry without one can produce a duplicate on-chain transfer if the original succeeded but the response was lost.

```ts
import { RateLimitError, TallyError } from "@tallyforagents/sdk";

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function payWithRetry(intent: PaymentIntent) {
  // Derive the key once, outside the loop, so every retry reuses it.
  const key = `${intent.invoice_id}-${intent.attempt}`; // deterministic

  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      return await tally.payments.create({
        agent_id: intent.agent_id,
        wallet: intent.wallet,
        to: intent.to,
        amount_usdc: intent.amount,
        idempotency_key: key,
      });
    } catch (err) {
      // Rate limits and 5xx responses are transient: back off 1s, 2s, 4s, ...
      const transient =
        err instanceof RateLimitError ||
        (err instanceof TallyError && err.status >= 500);
      if (!transient) throw err; // anything else: let the caller decide
      await sleep(1000 * 2 ** attempt);
    }
  }
  throw new Error("max retries exceeded");
}
```
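
The decision tree says to wait for `Retry-After` on a rate limit, while the loop above uses plain exponential backoff. How the SDK exposes that header is not shown in this guide. Assuming you can get at the raw header value, the sketch below converts it to a delay in milliseconds and falls back to exponential backoff when the header is missing or unparseable. `retryDelayMs` is a hypothetical helper; the two accepted formats (a delay in seconds, or an HTTP date) come from the HTTP specification.

```typescript
// Hypothetical helper: turn a Retry-After header value into a delay in ms.
// Falls back to exponential backoff (1s, 2s, 4s, ...) capped at maxFallbackMs.
function retryDelayMs(
  retryAfter: string | null | undefined,
  attempt: number,
  nowMs: number = Date.now(),
  maxFallbackMs: number = 60_000,
): number {
  const fallback = Math.min(1000 * 2 ** attempt, maxFallbackMs);
  if (retryAfter == null) return fallback;
  const value = retryAfter.trim();

  // Form 1: a non-negative integer number of seconds, e.g. "120".
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const at = Date.parse(value);
  if (!Number.isNaN(at)) {
    return Math.max(at - nowMs, 0); // a date in the past means "retry now"
  }

  return fallback;
}
```

The server's value is honored as-is; only the fallback is capped, since retrying earlier than the server asked would waste rate-limit budget.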

A retry with the same `idempotency_key` returns the original payment record without creating an on-chain duplicate. See [Idempotency](/api/idempotency) for the details.
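
To see why replaying the same key is safe, here is a toy in-memory model of an idempotent endpoint. It is not how Tally implements idempotency; `FakePaymentsApi`, its fields, and the sample values are hypothetical and exist only to show the replay behavior.

```typescript
interface PaymentRecord {
  id: string;
  to: string;
  amount_usdc: string;
}

interface CreateRequest {
  to: string;
  amount_usdc: string;
  idempotency_key: string;
}

// Toy stand-in for a server that honors idempotency keys.
class FakePaymentsApi {
  private byKey = new Map<string, PaymentRecord>();
  transfers = 0; // counts simulated on-chain transfers

  create(req: CreateRequest): PaymentRecord {
    const existing = this.byKey.get(req.idempotency_key);
    if (existing) return existing; // replay: return the original, move no money

    this.transfers += 1;
    const record: PaymentRecord = {
      id: `pay_${this.transfers}`,
      to: req.to,
      amount_usdc: req.amount_usdc,
    };
    this.byKey.set(req.idempotency_key, record);
    return record;
  }
}

const api = new FakePaymentsApi();
const req: CreateRequest = { to: "0xabc", amount_usdc: "5.00", idempotency_key: "invoice-42-0" };
const first = api.create(req);
const retry = api.create(req); // e.g. the first response was lost in transit
```

Here `retry` is the same record as `first`, and `api.transfers` stays at 1. Sending the same request under a different key would count as a second transfer, which is exactly the duplicate the key exists to prevent.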

## Handling policy violations

The five 403 (forbidden) policy codes (`amount_too_large`, `daily_cap_exceeded`, `recipient_not_allowed`, `contract_not_allowed`, `permission_expired`) all come back from the SDK as `AuthenticationError`. The class name is awkward, but each code has its own recovery path:

```ts
import { AuthenticationError } from "@tallyforagents/sdk";

try {
  await tally.payments.create({ ... });
} catch (err) {
  if (!(err instanceof AuthenticationError)) throw err;

  switch (err.code) {
    case "amount_too_large":
      // The per-tx max is lower than this request. Smaller amounts may work.
      return retryWithSmallerAmount(err);
    case "daily_cap_exceeded":
      // Wait until the rolling 24h window opens up.
      return scheduleForLater(err);
    case "recipient_not_allowed":
    case "contract_not_allowed":
      // Configuration / wiring bug — the destination isn't in the allowlist.
      logAndAlertOps(err);
      throw err;
    case "permission_expired":
      // The permission's expires_at has passed. User needs to re-grant.
      return promptToReGrant(err);
    default:
      // Real auth failure (invalid key, mode mismatch, etc.)
      logAndAlertOps(err);
      throw err;
  }
}
```

For an LLM-driven agent, the right move is usually to surface the error as **content** in the conversation rather than throwing — the LLM can adapt. For a deterministic worker, throw and let your job system handle it.
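
For the LLM case, one way to surface the error as content is to convert it into a tool result the model can read. The sketch below assumes only that the error carries a `code` and a `message`; the `ToolResult` shape, the `errorToToolResult` helper, and the hint strings are hypothetical and should be adapted to your agent framework.

```typescript
interface ToolResult {
  ok: boolean;
  content: string; // text handed back to the model
}

// Hypothetical hints for the codes the model can plausibly adapt to.
const HINTS = new Map<string, string>([
  ["amount_too_large", "Try a smaller amount."],
  ["daily_cap_exceeded", "The daily cap is used up; try again later."],
  ["permission_expired", "Ask the user to re-grant the permission."],
]);

function errorToToolResult(err: { code?: string; message: string }): ToolResult {
  const hint = err.code !== undefined ? HINTS.get(err.code) : undefined;
  if (hint === undefined) {
    // Not something the model can fix by adapting: rethrow so the caller escalates.
    throw err;
  }
  return {
    ok: false,
    content: `Payment failed (${err.code}): ${err.message} ${hint}`,
  };
}
```

Allowlist failures and unknown codes are deliberately left out of `HINTS`, so they still throw and reach your alerting rather than being handed to the model.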

## Waiting for the chain

`payments.create()` returns once Privy accepts the signed RPC. The chain might still reject the transaction (gas estimate failure, recipient is a contract that reverts on receive, etc.). To know the final outcome:

### Poll

```ts
async function waitForFinal(payment_id: string, timeoutMs = 30_000) {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const p = await tally.payments.get(payment_id);
    if (p.status !== "pending") return p;
    await new Promise((r) => setTimeout(r, 2_000));
  }
  throw new Error(`Payment ${payment_id} didn't confirm within ${timeoutMs}ms`);
}
```

`payments.get()` lazily refreshes from the chain when called on a pending payment, so polling it is enough to pick up the final status. Don't poll more often than every 2 seconds, or you'll burn rate-limit budget.

### Webhooks

For production, replace the polling loop with `payment.confirmed` and `payment.failed` webhook subscriptions. See [Local webhook development](/guides/local-webhook-dev) for the wire-up.
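
A webhook consumer does the same job as the polling loop: it maps an event to a final state. The sketch below assumes a payload of the form `{ type, data: { payment_id } }`, which is a guess rather than the documented schema; only the two event names come from this guide. Signature verification is omitted, so check the webhook reference for both before relying on this.

```typescript
// Assumed payload shape, for illustration only.
interface WebhookEvent {
  type: string;
  data: { payment_id: string };
}

type FinalState = "confirmed" | "failed";

// Returns the final state for events we care about, or null to ignore the event.
function finalStateFromEvent(event: WebhookEvent): FinalState | null {
  switch (event.type) {
    case "payment.confirmed":
      return "confirmed";
    case "payment.failed":
      return "failed";
    default:
      return null; // unrelated event type: acknowledge and move on
  }
}

// Outcomes by payment id. Webhook systems commonly redeliver events,
// so the first recorded outcome wins and repeats are ignored.
const outcomes = new Map<string, FinalState>();

function handleEvent(event: WebhookEvent): void {
  const state = finalStateFromEvent(event);
  if (state !== null && !outcomes.has(event.data.payment_id)) {
    outcomes.set(event.data.payment_id, state);
  }
}
```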

## Stuck pending payments

A payment that's been pending for more than a couple of minutes is suspect. Possibilities:

* The chain is congested (rare on Base, but possible).
* Privy queued the transaction and hasn't broadcast yet.
* The dashboard's lazy-refresh hasn't fired (the background poller catches these within \~2 minutes).

Diagnosis: open the **Transactions** tab in the dashboard. The page calls `lazily-refresh-from-chain` on every load, so a stuck payment usually resolves the moment you open it. If it doesn't, check the dashboard's audit log — there'll be a `Privy outage` indicator if the underlying signer is having trouble.

## Detecting drift

Two failure modes are worth automated alerting:

* **A payment that's pending for more than 5 minutes.** The background poller (`/api/internal/poll-pending`) handles the long tail, but anything beyond five minutes deserves a look — usually a chain or Privy issue.
* **A spike in `permission_expired` errors.** Indicates a user-facing flow where a permission ran out and nothing prompted re-granting.

Both are detectable from the audit log and the webhook event stream. The dashboard's Transactions tab already shows the first; the second is a custom query against your event log.
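
Both checks reduce to small queries over data you already have. The sketch below runs them over in-memory arrays; the `PaymentRow` and `ErrorEvent` shapes, the one-hour window, and the threshold of 10 are hypothetical, so adapt the field names and limits to however you store payments and events.

```typescript
interface PaymentRow {
  id: string;
  status: string;
  created_at_ms: number;
}

interface ErrorEvent {
  code: string;
  at_ms: number;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Payments still pending more than five minutes after creation.
function stalePending(rows: PaymentRow[], nowMs: number): PaymentRow[] {
  return rows.filter(
    (r) => r.status === "pending" && nowMs - r.created_at_ms > FIVE_MINUTES_MS,
  );
}

// True when permission_expired errors in the trailing window reach the threshold.
function permissionExpiredSpike(
  events: ErrorEvent[],
  nowMs: number,
  windowMs: number = 60 * 60 * 1000,
  threshold: number = 10,
): boolean {
  const recent = events.filter(
    (e) => e.code === "permission_expired" && nowMs - e.at_ms <= windowMs,
  );
  return recent.length >= threshold;
}
```

Run both on a schedule and alert when `stalePending` returns anything or `permissionExpiredSpike` returns `true`.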

## Where to go from here

* [Build a paying agent](/guides/build-a-paying-agent) — how to fold this into an LLM loop.
* [Errors (concept)](/sdk/errors) — full class hierarchy and recovery patterns.
* [Idempotency](/api/idempotency) — the deeper dive on retry-safe writes.