
Build you a personal assistant agent for fun and profit
There are a lot of AI agents out there now. I tried most of them and they're all objectively terrible. Specifically, OpenClaw is too unpredictable and expensive for me and many of the other options aren't powerful enough to actually do things autonomously.
I built my own one. I call it Al (Alice, Alan, play on AI with a capital I instead of lowercase L, you choose).
Here's how to build your own Al and how to use it effectively. Not because Al is the one-true-agent or better than the others, but because it is opinionated, competent, and runs on a budget. Specifically this guide:
- Lays out some very practical goals of what a good agent should be
- Meets those goals in a minimal way that lets you build on top of them
So I'm not going to show you how to build the perfect agent. That would be a different guide for everyone I know. But I'm going to give you some very opinionated building blocks with which you can build your own perfect agent.
Spoilers: We use Pi, OpenCode Go, Telegram, Resend, Hetzner, Tailscale, GitHub, and Nginx. You can swap out many of these for equivalents if you have strong opinions that are different from mine. The final system looks something like this
Yes, it's very simple. No, we're not using a thousand MCPs or Ralph, or anything like that. Yes, you can use this to get shit done. Yes, I will explain why I chose these pieces as we go. Yes How to build an Agent is very good but that's too low level for me, I want to think about fewer details, yes OpenClaw is also very good but that's too high level for me, I want less and OpenClaw is built on top of Pi anyway so it's a good way to drop down one level of abstraction.
What is a good agent?
I describe my goals for an agent in more detail in my Lefos review article when I explain why Lefos wasn't the agent I wanted. In summary agents should be things like:
- Autonomous (doesn't rely on you to do stuff)
- Useful (does stuff that needs to be done)
- Trustworthy (doesn't leak your stuff)
- Competent (isn't dumb)
- Self-improving (ok sometimes dumb but not always in the same way at least?)
- Serious (we have work to do)
The most basic, but also most powerful, agents today are command-line tools like Claude Code, Codex, Amp, Pi, OpenCode and many others. I use most of these daily, and they remain my primary way to interact with AI, but that's when I'm at my desk doing work. They're not autonomous enough.
Sometimes I want to have an agent do something while I'm on the go, or kick off an agent to do something on its own while I'm at work. Then I want something that has its own resources and doesn't bump into me. That's when I use Al. I think of Al as a never-sleeping personal assistant. While the agent harnesses I run on my machine are often doing stuff on behalf of me, there are two models for personal assistants and Al follows the model where it has its own identity, separate from mine.
Al's Hello World — Pi running on a remote machine
I'm a terrible engineer. A lot of good engineers I know love building stuff for the sake of it — for the beauty of the code, for the challenge, or for some other pure reason that I can only aspire towards. I hate all of this; I only like engineering stuff to get the dopamine rush of having built something useful and delightful to me. So I have to design my projects in such a way that the next reward is in sight, so I don't get discouraged and give up, but not too easy, so I don't get bored on my way there.
That's a long way of saying that we're going to start by building something that I can't easily get in my current setup of ChatGPT, command-line agents, etc. I want a personal assistant that is always online and that I can communicate with on different channels, and that can install software and 'do' things.
The first step to this is what I have locally already but running on its own machine. For this step you need
- A Hetzner box, or similar VPS. I like the CX23 which gives you 4GB RAM, 40GB SSD for around $5/month. I use Ubuntu 24.04 but anything should work. I'll refer to this as Al's machine for the rest of this guide, because it helps to think of it as belonging to Al now, even if you sometimes use it for stuff. There are lots of alternatives for this but Hetzner is the one that I've found to be the best value and most reliable, without being overly complicated to use.
- An API key from an LLM token provider. I have a lot of these but for this I recommend OpenCode Go which costs $5 for the first month, $10/month after that, and gives you very generous access to Kimi 2.6 which is the closest open source model I've tried to Sonnet 4.6 (my main daily driver). You can use Claude Code and Sonnet if you want, but it's more of a lock-in and costs will rise a lot if Anthropic removes Claude Code access from their $20/month plan for example. There are a lot of providers of Kimi tokens as it's open source so you're fairly safe from a rug pull or big pricing increases.
Step 1: Say hello to Al
SSH into Al's machine, and install Pi.
- Follow the prompts
- Run
/login - Choose 'API Key' and OpenCode Go from the menu
- Paste your OpenCode Go API key
Check that it works:
Now we're basically done. A remote-running command-line agent is actually all you need to achieve the objectives we set out at the start. You have an always-on assistant that can build itself into anything you want. You can call it Al if you want. You can stop here, or read on for my opinions on the best way to make use of Al.
Step 2: Remote access via Tmux and Termux
So we can say hi to Al but we need to be SSH'd into the machine and Al crashes if the network times out, so it's not actually useful yet. The best bang-for-buck next step is to run Al inside tmux (so the session stays alive even after the SSH connection closes), and set up Termux.
Using Pi inside tmux
The tmux part is easy, if you chose Ubuntu 24.04 like I did then it's already available. Quit Pi, run tmux, start pi again, and hit ctrl-b d to detach the session.
Now you can come back any time you want, run tmux a -t 0 to reattach the session and continue a long-running conversation with Pi who will keep the context in memory so Al remembers what you talked about last.
Accessing Pi from Termux
If you're on Android, install the Termux application. This will let you SSH from your phone to Al's machine. It's a bit hard to do things like write SSH config in vim from a mobile keyboard but you only need to set up a few things:
- Generate a public-private keypair by running
ssh-keygen - Copy your public key contents off your phone (I sent it to myself in Telegram saved messages)
- [Optional] Add three lines to ~/.ssh/config on your phone to make it easier to SSH to Al's machine
Then from your laptop, SSH into Al's machine, and edit ~/.ssh/authorized_keys to paste the public key from your phone.
Now on your phone you can run ssh root@<al's-machine's-IP> or ssh al if you did the optional step, type tmux a -t 0 from your phone and say hi to Al there.
If you're using an iPhone, terminal emulator apps are a bit harder to come by but you should be able to achieve the same with Termius (free) or Blink.sh (paid).
Now we have the minimal version of an always-on, autonomous agent that we control (open source harness, open source model, rented hardware, total cost $15/month).
But it's pretty hard to use.
Step 3: Talking to Al with Telegram
We don't want to have to use SSH and tmux from our phone, even though it's a nice backup option to have for when there are problems higher up the stack. Wouldn't it be great if Al could just keep that tmux session going permanently and we could inject messages into it from a mobile app like Telegram and get messages back?
Actually we can. There's a repo called pi-telegram that's even an 'official' extension to Pi, created by the author Mario.
This is also the first time we see how Al is self-improving. Before, I'd have to go read the README of that repo, figure out how to use it, how to install it, how to configure it, etc. Sounds like a good job for Al!
Creating a Telegram Bot
In Telegram, visit the official BotFather user and run /newbot. Choose a name and username when prompted and copy the Token you get.
I added this to a ~/.env file on Al's machine and then sent the following message to Al.
I added a .env file with a telegram bot token please install and configure https://github.com/badlogic/pi-telegram so I can talk to you on telegram
If you prefer, you can paste the token as part of the prompt. Al is going to read it either way but some people feel icky about pasting plaintext credentials around.
Al should work for a bit and then send you something like the below confirming that it's installed.
Now I quit the session, started it again, and ran /telegram-connect. Probably the /reload thing that Al told me to do that I didn't read would also have worked.
We get a bunch of cool features out of the box - now we can chat to Al over Telegram and Al can also send back images, scripts, and other files in addition to text.
You can even use Al to create a basic image (I have mine hooked up to the Gemini API for images so I get better ones but Kimi is creative enough to figure out how to draw some lines on a png file if you ask).
Step 4: [Optional] Adding Tailscale
It's tempting to use a machine like this both as an assistant and as a general-purpose VPS to do things like serve websites. That pattern works well after the next step (Adding a domain), but it's a bit icky. You wouldn't usually serve a website off an employee's laptop, and you wouldn't expect that machine to be publicly accessible and have to worry about vulnerable software etc.
My preference is to treat Al's machine like that too, with the exception that I can access it via SSH if needed. The easiest way to get some extra security is to sign up for a free Tailscale account and add your main machine, phone, and Al's machine to the same Tailnet.
I already had a Tailnet so I just asked Al to help me add this new machine over Telegram and got this in response:
Tailscale ✅ (installed, needs auth)
To add this machine to your tailnet:
Click this auth link (or copy/paste into a browser where you're logged into Tailscale):
https://login.tailscale.com/a/1a0aab1aaaaf7aAuthorize the device in your Tailscale admin console
Verify it's connected — once authorized, run:
tailscale statusYou should see this machine with a Tailscale IP (e.g.,
100.x.x.x)
Everything below works without this step. You just use your VPS public IP for the steps instead of the tailnet IP, and if Al sets up a website then that will be visible to others, which might actually be what you want!
Step 5: Give Al a domain
We're still basically in 'high-effort, bad knock-off version of ChatGPT' at this point. It's cool that Al can install fonts and mess around with Linux stuff but it's not really useful yet. Let's change that!
Assuming you have a domain like yourname.com, pick a subdomain like al.yourname.com and give this to Al (or you could purchase a domain just for Al if you don't mind adding another ~$1/month to ongoing costs). I chose al.dwyer.co.za and configured DNS at the domain provider to point to the Tailnet IP.
Then I asked Al to set up Nginx and show me a hello world page at al.dwyer.co.za and got this:
This means I can now give Al prompts like
works well, please add an HTML page about Fire Ants to al.dwyer.co.za/fireants, include pictures and tell me the 10 most interesting things about fireants, keep it all self contained a single .html file with all JS and CSS included inline. Make it look nice!
And get results like this. I use this 'single html file with JS and CSS embedded' quite a lot with LLMs as it gives them a lot of flexibility over layout stuff (though you'll always see e.g. the left highlight bars as LLM 'tells') and gives me a nicer reading experience than a wall of ChatGPT text.
Step 6: Add voice
I like typing and I type fast, so generally I'm not interested in all these transcription apps that promise to make me happier and more productive.
At least on a real keyboard. I've tried most of the Android keyboards out there, including spending over a year getting good at using the TypeWise layout before that project got abandoned and I'm still very, very slow at typing on my phone. Most keyboards have voice transcription built in now but it's not accurate enough for me, especially when talking about technical topics (I can't count how many ways I've seen 'Kubernetes' transcribed).
That changed with Voxtral for me. In my experience it's almost scary accurate even in hostile environments like with a lot of background noise. I've tried it in a nightclub and while walking past a 6-lane highway, and it's also basically free (at least Mistral's pricing page is too complicated for me to understand but I signed up for the free tier and I haven't hit a limit yet).
Sign up for a free account at Mistral and go to the API Keys page to get a key. Add it to your .env file and tell Al, or paste it directly in the chat.
I said
cool, I've added VOXTRAL_API_KEY to the .env file please set up an integration with the voxtral API so I can send you telegram voice notes and you use Voxtral to transcribe them into the conversation
I had to manually SSH to the server again and run /reload and then /telegram-connect before it worked, but after that you should be able to talk to Al.
Step 7: Give Al email
In my initial agent I actually set this up via Google Workspace. That's also a possibility and it comes with some advantages but Google's credentials management and permissions systems are insanely complicated even for agents.
This time around, I used Resend which allows 100 emails a day on the free plan which should be enough for a lot of people. If you hit the limit you can pay Resend $20/month, or spend a bit of time building a connection to AWS SES or Google Workspace for higher free limits.
Sign up for Resend, add a domain (takes quite a bit of mucking about with setting DNS records but they guide you through it nicely), enable receiving as well (more DNS records), and get an API key.
Al needed a bit of prompting for this one as the polling for incoming emails is a newer Resend feature, and if you did optional step 5 then you can't set up a public webhook for Resend to push emails to.
This one is probably controversial, but I want my agent to only poll for emails when I ask. This is not how a human assistant would act but email is a pretty vast attack surface and even though modern LLMs are good enough to figure out obvious social engineering attacks like this one, it's not the kind of thing I want others to be able to poke at while I'm not watching.
Generally my usage here is to forward emails or CC Al into emails I want help with and then later ask either to list recent emails to remind me of what I need to do, or to say something like 'Please read latest email from Bob and give me a list of things we need to do', and then have Al help me with that list.
Very occasionally I'll have Al actually email people if they're doing something annoying like ignoring me, or if I think having someone email on my behalf will make me look high status and important! Generally I don't use AI to communicate with real people though and I think that's a good general rule to have.
Step 8: Give Al GitHub
Now sign up for a GitHub account for Al using the email address you made. You'll have to ask Al for 2FA tokens as you need them.
Now might also be a good time to make Al a browser profile that you can use on its behalf. Sometimes in our human-first world it's easier for you to do things for your agent than the other way around.
Visit github.com in a new browser profile or incognito, sign up using Al's email address and navigate to https://github.com/settings/personal-access-tokens/new to create a 'classic' token. This will give Al full access to its GitHub account.
Probably you want to choose 'All repositories' and 'never expire' though GitHub will warn you not to do this and if you like checking boxes you can go set up a fine-grained access token instead and give Al all permissions.
Then invite Al to any repos you want. On Telegram you can now say things like
can you install gh, I added github token to .env file and gave you access to github.com/sixhobbits/dwyer.co.za please add https://dwyer.co.za/static/claude-mixes-up-who-said-what-and-thats-not-ok.html to the 'recent articles' section on the home page and open a PR
And your assistant will open a PR for you!
Add whatever you want!
Because the assistant is self-improving, you can decide what features you need and ask it to build them for itself. I have quite a few other features spread across a few different attempts at building the perfect assistant, including
- Image generation using Gemini Flash: I use this for interior design, making graphics for articles, profile pictures, and even when I recently needed a professional headshot for a grant application and didn't have time to find a photographer or suit.
- Habit tracking: this one was a bit more involved to build but I have a (non-AI) habit-tracking integration that lets me track any metric I want over time and get pretty graphs. Here you can see me completely failing to get better doing pullups over a few years.
- Reminders: a two-stage LLM call to parse stuff like 'remind me about this at 10am South African time tomorrow' or 'remind me about getting laundry in 5 minutes'. The LLM passes my message off to another LLM to get a timestamp and description, and then polls in a loop and sends any reminders that are due.
Basically just use it for stuff and if you ever find yourself thinking 'I wish Al could do X' then just ask 'hey Al, what would you need to be able to do X' and chat through options a bit, decide on one and ask Al to build it.
Isn't this just a worse version of OpenClaw?
In some ways, yes. If you want you can take the above as an opinionated guide on how to set up your OpenClaw instead and most things will be the same. I found OpenClaw unstable and expensive by default. Quite possibly it would have been less effort to configure OpenClaw or find a fork that does what I want than to build my own, but here we are.