What I Learned Building OpenClaw Skills

I've built over a dozen OpenClaw skills now — checkmate, alpha-scout, trip-planner, and others. Some are published on Clawhub. Some worked on the first try. Some broke in embarrassing ways. The failures were more interesting.

Here's what I keep coming back to. Most of it applies beyond OpenClaw, but this is where I learned it.

0. Create skills liberally

Any task you worked through with your agent over many iterations, and might want to repeat, should become a skill.

Context clears. Conversations end. The patterns you worked out, the prompts you refined, the edge cases you discovered — all of it disappears unless you capture it somewhere permanent.

A skill is that permanent record. It's the accumulated result of your iterations, not just instructions for the next run. Every time you improve it, you're compounding. Every time you don't write it down, you're starting over.

1. LLMs are workers, not loops

The most common mistake: asking an LLM to "keep trying until it works." It won't.

LLMs drift. They lose count. They convince themselves things are done when they aren't. You can't implement reliable retry logic in a prompt.

The fix is boring but it works: loops and retries are code. Call the LLM when you need judgment — research, writing, evaluation. Everything else is a while loop.
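Here's a minimal sketch of that split, assuming the model call is wrapped in a plain function. The names run_with_retries and fake_llm are made up for illustration, and fake_llm is a stand-in for a real model call, not any OpenClaw API. The counting and the stop condition live in code; the LLM only produces one attempt at a time.

```python
from typing import Callable, Optional


def run_with_retries(
    attempt: Callable[[int], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call `attempt` until `is_valid` accepts the result or attempts run out."""
    tries = 0
    while tries < max_attempts:
        tries += 1
        result = attempt(tries)  # the LLM call: one unit of judgment
        if is_valid(result):     # the check is plain code, not a prompt
            return result
    return None  # the caller decides what a hard failure means


# Hypothetical stand-in for a model call: fails twice, then succeeds.
def fake_llm(try_number: int) -> str:
    return "DRAFT OK" if try_number >= 3 else "incomplete"


print(run_with_retries(fake_llm, lambda text: text.endswith("OK")))  # prints: DRAFT OK
```

The loop can't lose count or talk itself into being done, which is the whole point of keeping it out of the prompt.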

2. The context wall will silently kill your quality

Multi-phase skills have a hidden failure mode. Phase 1 writes 8k tokens of output. Phase 2 adds 6k more. A few phases later, your synthesizer is reading 40k tokens of prior work, and its output is quietly getting worse.

No error. No warning. Just degraded output.

The pattern that helps: have every worker write a KEY FINDINGS block at the top of its output, 3-5 lines summarising what matters. Instruct the orchestrator to read that block first and go deeper only if needed. Also pass file paths to workers, not file contents, so each agent reads exactly what it needs.
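One way the orchestrator side could look, as a rough sketch. It assumes the block starts with a line beginning "KEY FINDINGS" and ends at the first blank line; that layout, the function name read_key_findings, and the sample file are my assumptions for the example, not a fixed format.

```python
import tempfile
from pathlib import Path

MARKER = "KEY FINDINGS"  # assumed header line written by every worker


def read_key_findings(path: Path, max_lines: int = 5) -> list[str]:
    """Return the lines directly under the KEY FINDINGS marker, up to max_lines."""
    findings: list[str] = []
    with path.open(encoding="utf-8") as handle:
        in_block = False
        for raw in handle:
            line = raw.strip()
            if not in_block:
                in_block = line.startswith(MARKER)
                continue
            if not line or len(findings) >= max_lines:
                break  # blank line or cap reached: stop before the full report
            findings.append(line)
    return findings


# Demo with a throwaway file; the orchestrator gets a path, not the contents.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "phase1.md"
    report.write_text(
        "KEY FINDINGS\n- Vendor A is cheapest\n- Vendor B has no API\n\nFull notes follow...\n",
        encoding="utf-8",
    )
    summary = read_key_findings(report)

print(summary)  # prints: ['- Vendor A is cheapest', '- Vendor B has no API']
```

Because the reader stops at the first blank line, the rest of a long report never enters the orchestrator's context unless something asks for it.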

3. The filesystem is the canonical state

Workers write to disk. Orchestrators read from disk. This sounds obvious until you've lost 30 minutes of work because the output only existed in an agent's reply.

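File-based state means tasks are resumable (a re-run skips completed steps), recoverable (partial output beats no output), and auditable (you can inspect what each step produced). Context compaction stops being scary, because nothing important lives only in the context window.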
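A sketch of the resumable part, assuming each step owns exactly one output file. The helper name run_step and the ".partial" suffix are illustrative choices, not an OpenClaw convention.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_step(output: Path, produce: Callable[[], str]) -> bool:
    """Run `produce` only if `output` is missing. Returns True if work was done."""
    if output.exists():
        return False  # finished on an earlier run, so skip it
    output.parent.mkdir(parents=True, exist_ok=True)
    partial = output.with_name(output.name + ".partial")
    partial.write_text(produce(), encoding="utf-8")
    partial.replace(output)  # rename last, so a crash never leaves a half-written final file
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "run" / "research.md"
    first = run_step(target, lambda: "findings")
    second = run_step(target, lambda: "findings")  # same step again, as on a re-run

print(first, second)  # prints: True False
```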

4. Workers go silent and it costs you

A worker finishes and returns "done." Now what? The orchestrator has to re-read the entire SKILL.md to figure out what comes next.

Every completion message should include three things: what happened, where the output is, what to do next. Three lines. Every time. When something fails, include how to recover.
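If it helps to make the three lines hard to forget, they can be a small structure. This is only a sketch: the Completion class, its field names, the DONE/OUTPUT/NEXT/RECOVER labels, and the example paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    what_happened: str
    output_path: str
    next_step: str
    recovery: str = ""  # filled in only when something failed

    def render(self) -> str:
        lines = [
            f"DONE: {self.what_happened}",
            f"OUTPUT: {self.output_path}",
            f"NEXT: {self.next_step}",
        ]
        if self.recovery:
            lines.append(f"RECOVER: {self.recovery}")
        return "\n".join(lines)


message = Completion(
    what_happened="drafted itinerary for 3 cities",
    output_path="out/itinerary.md",
    next_step="run the budget check on out/itinerary.md",
)
print(message.render())
```

In a prompt-only skill the same thing is just a template the worker is told to fill in; the structure matters more than the code.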

5. Check before you start

This applies to skills that follow the orchestrator pattern — where a coordinator spawns multiple workers for a long-running job. Before any workers spin up, make the first phase a quick sanity check. Spend 15 seconds verifying the task is valid and hasn't already been done.

Does the target already exist? Did this run yesterday with the same input? Is a required config missing? In my experience, about half of wasted runs are avoidable with a sanity check you write once.
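A preflight along those lines might look like the sketch below. It assumes a simple key=value config file and covers only two of the questions (existing target, missing config); detecting a repeat run with the same input would need something extra, such as a stored hash of the input. The function name preflight and the file names are invented for the example.

```python
import tempfile
from pathlib import Path


def preflight(target: Path, config: Path, required_keys: list[str]) -> list[str]:
    """Return reasons not to start; an empty list means go."""
    problems: list[str] = []
    if target.exists():
        problems.append(f"target already exists: {target.name}")
    if not config.exists():
        problems.append(f"missing config: {config.name}")
    else:
        text = config.read_text(encoding="utf-8")
        # Assumed format: one key=value pair per line.
        present = {line.split("=", 1)[0].strip() for line in text.splitlines() if "=" in line}
        for key in required_keys:
            if key not in present:
                problems.append(f"config is missing key: {key}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "trip.cfg").write_text("city=Lisbon\n", encoding="utf-8")
    issues = preflight(root / "itinerary.md", root / "trip.cfg", ["city", "budget"])

print(issues)  # prints: ['config is missing key: budget']
```

The orchestrator spawns workers only when the list comes back empty.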

6. A judge with a goal beats a judge with a checklist

Some skills use a judge pattern — a separate agent that evaluates another agent's output and decides whether it passes. Give that judge a 10-point checklist and workers will game it: technically passing every criterion while missing the point.

Give a judge a goal statement — "what does success look like?" — and it reasons holistically. The goal catches things a checklist never would. We shipped this in checkmate. The difference is real.

7. Treat workers like API calls

Defined inputs. One job. Defined outputs. If you can't describe a worker that way, it's doing too much.

This constraint forces the decomposition that makes skills reliable. It also makes the orchestrator simple — call worker, check for output file, move on. When a worker starts making decisions about what to do next, that's a sign to split it.
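Here is roughly what "call worker, check for output file, move on" can look like, as a sketch under my own assumptions. WorkerJob, orchestrate, and upper_worker are hypothetical names, and upper_worker stands in for a real agent call.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class WorkerJob:
    name: str
    input_path: Path   # defined input: a path, not file contents
    output_path: Path  # defined output: the one file this worker must write


def orchestrate(jobs: list[WorkerJob], run_worker: Callable[[WorkerJob], None]) -> list[str]:
    """Call each worker in order and stop at the first one that leaves no output."""
    finished: list[str] = []
    for job in jobs:
        run_worker(job)
        if not job.output_path.exists():
            break  # the worker broke its contract; the orchestrator does not guess
        finished.append(job.name)
    return finished


def upper_worker(job: WorkerJob) -> None:
    # Stand-in for an agent call: one job, reading one file and writing one file.
    text = job.input_path.read_text(encoding="utf-8")
    job.output_path.write_text(text.upper(), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("raw notes", encoding="utf-8")
    jobs = [
        WorkerJob("shout", root / "notes.txt", root / "shout.txt"),
        WorkerJob("echo", root / "shout.txt", root / "echo.txt"),
    ]
    done = orchestrate(jobs, upper_worker)
    final_text = (root / "echo.txt").read_text(encoding="utf-8")

print(done, final_text)  # prints: ['shout', 'echo'] RAW NOTES
```

Note that the worker never decides what runs next. The job list does, and that keeps the orchestrator this short.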

The underlying ideas apply to any agent skill system. OpenClaw is just where I stress-tested them.

— Shiwei Song · @insipidpoint