May 25, 2026

Why I rebuilt my whole assistant stack on AWS Bedrock

Notes from a year of Betty / OpenClaw — multi-modal personal AI, fully self-hosted.

AI Engineering

By CLIMB IT Solutions, Inc., Manuel Bautista

For about a year I bounced between ChatGPT, Claude.ai, and the various AI features inside Cursor. They were all good. None of them were mine.

That's the gap I built Betty — my agent on top of OpenClaw — to fill.

What I actually wanted

Three things that none of the SaaS products gave me at once:

Persistent memory across sessions and modalities. When I tell my assistant something about a client on a Tuesday call, I want her to remember it on a Thursday Slack thread. The shared-state problem is solvable, but no SaaS product solves it the way I want — they default to per-context isolation for understandable privacy reasons.
Real tool use, not "agent" theatre. I wanted an assistant that could actually read my Odoo CRM, schedule into my Outlook, search my Notion notebooks, and pull from my Slack history. Not a vendor's curated approximation. Direct API integrations to my systems, owned by me.
Ingress wherever I am. I'm rarely at a desk. The assistant needs to take input from Telegram, WhatsApp, Slack, and email — and respond in the same channel. The "chat UI" of the major AI products is the wrong shape for how I work.

What I built

The simplest possible architecture that solves the three problems above.

Compute runs on a small DigitalOcean droplet — the orchestrator, ingress handlers, and tool-call routing. I deliberately did not put any model weights here.
Models run through AWS Bedrock — Claude Sonnet 4.6 as the default — with OpenRouter (Sonnet again) as the fallback if Bedrock has an issue. The Bedrock account gives me a single billing/audit surface and lets me swap models without touching the orchestrator. Embeddings are OpenAI's text-embedding-3-small via OpenRouter.
Storage is plain files on the droplet for the workspace, plus the memory store below.
Memory is SQLite with sqlite-vec — hybrid retrieval that blends BM25 text search (30%) with vector similarity (70%) over a markdown corpus (long-term MEMORY.md, daily notes, and a hand-curated persistent-notes.md). The retrieval pipeline is dumb on purpose. Anything fancier I tried hurt more than it helped.
Tools are a set of MCP-style adapters into MS365 (Graph), Odoo (XML-RPC), Notion, GitHub, and Slack. Each tool is a thin wrapper around an existing API with its own auth and its own audit log.
Ingress is Telegram (primary), WhatsApp (via wacli on the droplet), and Slack (workspace bot). Each ingress forwards messages to the same orchestrator endpoint.
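
To make the model routing concrete, here is a minimal sketch of the "Bedrock first, OpenRouter as fallback" idea. It is an illustration under assumptions, not the actual orchestrator code: the two provider functions are hypothetical stubs standing in for real Bedrock and OpenRouter clients, and treating any exception as a reason to fall through is an assumption of this sketch.

```python
from typing import Callable, Sequence

# A provider is a name plus a function that turns a prompt into a reply.
# Both callables used below are hypothetical stand-ins for real Bedrock
# and OpenRouter clients; the actual client code is not shown in this post.
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # assumption: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def bedrock_stub(prompt: str) -> str:
    # Simulates a Bedrock outage so the fallback path is exercised.
    raise TimeoutError("simulated Bedrock timeout")


def openrouter_stub(prompt: str) -> str:
    # Simulates a healthy fallback provider.
    return f"echo: {prompt}"


if __name__ == "__main__":
    used, reply = complete_with_fallback(
        "hello", [("bedrock", bedrock_stub), ("openrouter", openrouter_stub)]
    )
    print(used, reply)
```

Because the orchestrator only sees an ordered list of callables, swapping the default model or adding a third provider would be a change to that list and nothing else.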

What I learned

Self-hosting beats SaaS for any AI you use heavily. Not for the cost — Bedrock isn't cheaper than a ChatGPT subscription at my volume — but for the fit. I can change anything about how my assistant works without waiting for a vendor.

Persistent memory is the killer feature. Once your assistant actually remembers what you told it three weeks ago, it stops being a search engine and starts being staff.
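
The retrieval behind that memory is the 30% BM25 / 70% vector blend described in the architecture. A minimal sketch of the blending step follows. The function names, the min-max normalization, and the "higher score is better" convention for both inputs are assumptions made for illustration; the real pipeline may normalize differently.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so text and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; treat them as equally good.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    vector_sim: dict[str, float],
    text_weight: float = 0.3,
    vector_weight: float = 0.7,
) -> list[tuple[str, float]]:
    """Blend normalized BM25 and vector scores; return (doc_id, score), best first.

    Assumption: both inputs use "higher is better". A document missing
    from one result set contributes 0 for that component.
    """
    text = min_max(bm25)
    vec = min_max(vector_sim)
    doc_ids = set(text) | set(vec)
    blended = {
        doc_id: text_weight * text.get(doc_id, 0.0) + vector_weight * vec.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy scores for three hypothetical note chunks.
    bm25_scores = {"note-a": 2.0, "note-b": 10.0, "note-c": 6.0}
    vector_scores = {"note-a": 0.9, "note-b": 0.1, "note-c": 0.5}
    for doc_id, score in hybrid_rank(bm25_scores, vector_scores):
        print(doc_id, round(score, 3))
```

With these toy numbers the chunk that is semantically closest wins even though it has the weakest keyword match, which is the behavior a 70% vector weight is meant to produce.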

Tool calls are mostly boring API work. The hype around "agents" obscures a simple truth: 80% of the value comes from connecting an LLM to the three or four systems where your data already lives. I'd take that any day over a smarter model with no hands.
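
As an illustration of how boring that work is, here is a minimal sketch of a thin tool wrapper that keeps its own audit log. The `Tool` class, the `lookup_contact` function, and the JSON-lines log format are all hypothetical; `lookup_contact` stands in for a real CRM query and does not reflect the actual Odoo integration.

```python
import json
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """A thin wrapper: one name, one function, one audit log file."""

    name: str
    func: Callable[..., Any]
    audit_path: str

    def call(self, **kwargs: Any) -> Any:
        record: dict[str, Any] = {"tool": self.name, "args": kwargs, "ts": time.time()}
        try:
            result = self.func(**kwargs)
            record["ok"] = True
            return result
        except Exception as exc:
            record["ok"] = False
            record["error"] = str(exc)
            raise
        finally:
            # Append one JSON line per call, whether it succeeded or failed.
            with open(self.audit_path, "a", encoding="utf-8") as log:
                log.write(json.dumps(record, default=str) + "\n")


def lookup_contact(email: str) -> dict[str, str]:
    # Hypothetical stand-in for a real CRM lookup (for example, an XML-RPC call).
    return {"email": email, "name": "Example Contact"}


if __name__ == "__main__":
    crm_tool = Tool("crm.lookup_contact", lookup_contact, "crm_audit.log")
    print(crm_tool.call(email="someone@example.com"))
```

The model only ever sees the tool name and its arguments; auth, retries, and logging stay inside the wrapper, where they can be changed without touching the prompt.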

There's a longer technical writeup of the architecture coming. If you're building something similar, I'd like to hear about it.

— Manuel
