I love AI personal assistants but objectively they're still terrible. (A Lefos review)

· 28 min read
Software engineer and technical writer

I've been a heavy user of LLMs as personal assistants since around 2023, when I hooked up OpenAI to a Telegram bot. I did it because I could, but it was a hit with some friends and family members. 'Gai Bot' is now part of most of my Telegram groups and we use it for all kinds of things, from reminders to winning arguments, habit tracking, and interior design help.
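
There's nothing fancy about that setup. A minimal sketch of the kind of wiring I mean – not Gai Bot's actual code, and with a placeholder bot token and model name – looks roughly like this, using python-telegram-bot and the OpenAI SDK:

```python
from openai import OpenAI
from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

client = OpenAI()  # reads OPENAI_API_KEY from the environment


async def answer(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # Forward the group message to the model and post its reply back to the chat.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; swap in whichever frontier model you like
        messages=[{"role": "user", "content": update.message.text}],
    )
    await update.message.reply_text(completion.choices[0].message.content)


app = ApplicationBuilder().token("TELEGRAM_BOT_TOKEN").build()  # placeholder token
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, answer))
app.run_polling()
```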

I also use AI assistants for work. In this context, I use command line agents more heavily. I have Claude Code, Codex, Amp, Pi, and Gemini (the last only if I'm really hurting for weekly quota). I have one folder called 'cfo' (originally Chief Financial Officer, but now I think of it as "Claude's Effen On it"), which is augmented with an email account, some skills, and some API keys to GitHub and Bridge. I call this one "Al", short for "Alan" or "Alice" to be gender neutral and a fun play on "AI" in many fonts.

More recently I tried Lefos by Earendil. Below, I describe my experiences testing out Lefos to see if it would work for me as a replacement for Gai Bot, Al, or both.

Side-by-side of Gai Bot (Telegram), Al (Claude Code terminal), and Lefos (email) showing three different assistant models
A tale of three assistants

While I've steadily hacked on Gai Bot over the last three years, I haven't put too much effort into it, and with the rise of OpenClaw et al it now feels decidedly dated. It can't really do things the way OpenClaw can – it has the intelligence of the frontier models, but it doesn't have an agent loop or a dedicated machine to install software on.

This is both good and bad. Gai Bot is a lot more predictable and reliable than OpenClaw (at least for me – I constantly had to babysit my OpenClaw setup, fixing the WhatsApp connection, updating the model, and reconfiguring it after it misconfigured itself), but it's also limited to a very minimal feature set. Gai Bot can't join a bot-only social network, for example, but I also don't need to worry that it's going to go and join a bot-only social network.

Then I have a bunch of hacky stuff that I use to help run my business, mainly built around Claude Code. I'm not sure if it counts as 'an assistant', but if you squint a bit it is one. I use Al heavily for finance stuff – she (it?) has her own email address and I forward her invoices, accountant emails, and a few other things. Al has skills to help me prepare payroll every month, calculate profit and loss metrics, and do a bunch of other things. But Al isn't really an assistant. Al is a ball of mud, and cleaning her up is a TODO item far down my list.

In reality I want something that sits in between Gai Bot and Al. Something more capable than my simple Telegram bot, and something more reliable and serious than OpenClaw, but something that's more cohesive and independent of me than my Al scripts.

Anyway, I came across Lefos by Earendil recently and it seemed like it might be exactly what I wanted.

  • It's built by Armin, who I've followed since I got into Flask in 2010, and who recently started a new venture with Mario and some others. Mario, Armin, and Pete are part of the Vienna School of Agentic Coding – a group of engineers who've been experimenting together since early 2025, building things like Vibe Tunnel (which I briefly tried) right after Claude Code launched, and Pi, an open-source coding agent that I use daily. Anyway, they're cool people with good reputations for building stuff that's rock solid while still pushing boundaries, which is exactly what I want from an AI assistant.
  • It's built around email – I use email a lot less than I used to as we run on Slack, but I still really like email. It's the decentralized platform that works everywhere.
  • I like some of the stated goals: it's shared (which is how I use Gai Bot), I can forward it emails (like I use Al), and it's self-learning or something and presumably built on top of Pi (like OpenClaw).

Here's the opening of how Lefos describes itself:

The Lefos about page: 'Lefos is a new kind of software. No apps to log in to. No UI to load. Just read and write.'

So I tried it out. Not for anything serious yet, but just to put it through its paces and for fun. This review covers the things I don't like about it, and which bits I find promising. I probably lean more into the negative side, not because I want to hate on Lefos, but because thinking through this really helped me figure out what I actually want in an AI assistant, and I'm hoping that either

  • It'll be helpful to the Lefos team if they agree and use my criticisms to improve Lefos
  • It'll be helpful to me if I decide the only way to get what I want is to build it myself and to evolve some combination of OpenClaw, Gai Bot and Al into Al 2.0.

What is a personal assistant?

We've kind of had personal assistants since long before LLMs. You can ask Siri the time in London and you might get a helpful answer. At least in 2020, you might also get the answer for the closest London instead of the famous one. John Gruber covered this in 2020 in "What Time Is It In London", and even six years later I think his frustrations still hold for many modern-day AI assistants. Here's a relevant passage if you didn't click through:

If you had a human assistant and asked them “What’s the time in London?” and they honestly thought the best way to answer that question was to give you the time for the nearest London, which happened to be in Ontario or Kentucky, you’d fire that assistant. You wouldn’t fire them for getting that one answer wrong, you’d fire them because that one wrong answer is emblematic of a serious cognitive deficiency that permeates everything they try to do. You’d never have hired them in the first place, really, because there’s no way a person this lacking in common sense would get through a job interview. You don’t have to be particularly smart or knowledgeable to assume that “London” means “London England”, you just have to not be stupid.

One of the most frustrating parts about LLMs today is that they often still fail the 'you just have to not be stupid' test. Most of the time they're doing things far beyond my own capabilities, and then sometimes they're just dumb. Or dangerous. Or broken.

I'm not sure this is an exhaustive list, but here are at least some things I expect from an AI assistant:

  • Competent – can do things, fast and accurately
  • Reliable – doesn't need constant hacking or fixing
  • Serious – fun is OK, but the main goals should be practical, not whimsical
  • Self-improving – I shouldn't have to repeat the same corrections
  • Shared – it can also communicate with friends/colleagues and receive communication from them
  • Private and secure – I want to give it medical and financial info without worrying

The problem with this list is that it already pulls in different directions. Gai Bot, my basic Telegram bot, is not very competent and not self-improving, but it is reliable. OpenClaw is very competent, but it's not reliable. Al, my Claude Code-based CFO bot, can't share my financial information with anyone because she's only running when I'm actively talking to her, so she's private and secure but not shared.

Key to all of this is Simon Willison's Lethal Trifecta. In order to get a useful personal assistant, you need it to access your private data, you need to expose it to untrusted content, and you need to give it the ability to communicate externally. And that means you need to be at least a little bit scared about what it will actually do.

Two models for a personal assistant

There are two fundamentally different ways a personal assistant can work, whether human or AI.

In the first model, the assistant impersonates you. They have access to your accounts, send emails from your address, and act as you. Some CEOs have their secretary access their inbox directly – the correspondence goes out under the CEO's name and no one on the other end knows.

In the second model, the assistant has their own identity. You CC them on threads, forward things to them, and they respond as themselves. Everyone knows there's a separate entity involved.

Both of these models can work, and the same split applies to AI assistants. You can have Claude manage your inbox, or you can give Claude its own inbox. Maybe there's some kind of hybrid that works, but at least for me it's nice to pick one of the two models and stick to it cleanly. More and more, I find myself leaning towards the second model, which is also how I often see people using OpenClaw: the assistant gets their own name, own resources, own accounts, etc. From what I can tell, Lefos is also meant to fit the second model (it has a name and a personality), but when it came to data access I wasn't entirely sure anymore. More on that later.

Trying Lefos

I signed up to the Lefos waiting list earlier in April and was granted a spot shortly after. Since then I've only used it for experimental stuff to get to know it, sending around 10 threads. One of these was with a friend, where we both tried to get a feel for what it can and can't do, and how it responds in various situations.

Lefos emails in Gmail inbox

All of this used up about 7000 of the initial 10000 trial credits that I was given. Subscription plans for Lefos start at $19/month for 15000 credits, so it's not cheap (by contrast, Gai Bot costs me about $2-3/month to run for the dozen or so people who use it, and Al is mainly covered by my $20/month Claude subscription which I use for a bunch of other stuff too).
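
For a back-of-the-envelope comparison, assuming credits are priced linearly on the entry plan:

```python
# Rough cost per thread on the $19/month plan, assuming linear credit pricing.
credits_used, threads = 7_000, 10          # my trial usage: ~10 experimental threads
plan_price, plan_credits = 19.0, 15_000    # entry plan: $19/month for 15,000 credits

cost_per_thread = (credits_used / threads) * (plan_price / plan_credits)
print(f"~${cost_per_thread:.2f} per thread")  # ≈ $0.89
```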

During this period, I ran into a bunch of things that I'm pretty sure are bugs in Lefos. It's explicitly alpha / early-release stuff, so I'll list these below, but I think it's more interesting to talk about the actual model, who might find it useful, and whether it's aiming for the Goldilocks zone of 'capable but still private and secure' that I'm looking for.

Reminders, Travel, Contracts (and the Time in London)

Lefos has all the basics of an AI assistant from 10 years ago pretty much down. It can tell me the time in London (maybe cheating, as I think the famous London and the closest London are the same for me, but it works).

Lefos correctly answering 'what's the time in London?' with 3:07 PM BST

It can also set reminders, e.g. I asked it to "Remind me about the laundry in 5 minutes", and I got an immediate confirmation mail and then this one five minutes later.

Lefos laundry reminder email: 'Just a nudge – don't forget to get the laundry in!'

AI as a travel agent is still hard

I gave it a more difficult task: finding availability and prices for accommodation in a specific place on specific dates. Like most assistants today, it can't really do this. Airbnb, Booking.com et al are famously hard to scrape and not yet agent-friendly. I assume this will change, but for now, if you ask an AI assistant for help finding accommodation you usually get generic information instead of specifics, and Lefos is no exception here.

Lefos responding to a Barcelona accommodation request with search URLs and neighbourhood advice

I asked if it could extend itself with some third-party services to help with this, as that's what I've used in other agents, but it said no. (I later found a place in the dashboard to configure skills, where I can add custom markdown files, so maybe I can do it there?)

Lefos confirming it cannot accept Browserbase or Stagehand API keys but has submitted a feature request

So Lefos can do some things (send emails) but not others (browse the web like a human would). The latter limitation takes us to the territory of 'You wouldn’t fire them for getting that one answer wrong, you’d fire them because that one wrong answer is emblematic of a serious cognitive deficiency that permeates everything they try to do.' I'm not going to fire Lefos for not finding me a sweet Barcelona deal, but this does seem emblematic of some serious overall limitations.

Silently filing contracts

One thing I use Al for is CCing or BCCing her into email threads where I don't actually want her to respond, but where I know I can run queries later, like 'check email and remind me of the start date of that contract we signed last week'. This passive mode is something Lefos explicitly advertises, but it failed quite badly when I tried it.

Put it in CC and it listens quietly, responding only when addressed.

On the first attempt, I sent a random PDF to a secondary email account. It was a news article, not a contract, which Lefos pointed out. Fair enough – if this were a real-world scenario, silently filing this one might not be the expected behavior. But note that it addresses "Reg" (the person I'm supposedly emailing) rather than me, yet sends the email to me, not Reg.

Lefos responding to a forwarded news article, mistaking it for a contract and addressing the reply to 'Reg'

I got ChatGPT to generate a more likely-looking contract and tried again. This time Lefos responded to me with a summary of the contract, even though it was only in CC. Listening quietly is a good model (and interestingly something that ChatGPT et al are really bad at – I think you can now say things like "Don't respond to this message" and it works as expected, but it used to be pretty much hard-coded that every message gets a response), but it seems like it's still a work in progress in Lefos.

Lefos summarising a forwarded contract.pdf even though it was only in CC

For a personal assistant, knowing when to be quiet might be as important a skill as knowing the correct thing to say.

Privacy and security is also hard

Another unusual part of Lefos is that others can interact with 'my' Lefos directly. This is the scary part, as Lefos is presumably more useful if it has my private information. It's definitely exposed to untrusted inputs, and it has the ability to communicate with others, so it's squarely in the danger zone of the lethal trifecta.

Many of us have accepted this risk to some extent or another, because it's just so convenient to give an agent all three. But one nice way to put up some safety rails is to make sure that the main human has visibility over all the communication. By default, Lefos accepts and responds to emails from others, which gives them a vector to try and trick it. Here's me trying (luckily unsuccessfully) to social-engineer it against myself, asking for a $10M bonus as 'Reg', my other email account.

Lefos refusing Reg's social-engineering attempt to add a $10M bonus clause without Gareth's confirmation

I think that Lefos partially allow-listed Reg because I sent the first email to them both, but I don't fully understand the permissions model here. Maybe if I'm thinking "what is the permissions model here", then it means I'm not the target audience for Lefos?

I wouldn't mind if this ability to communicate with others while excluding me were an option (I can imagine it being useful if you primarily use Lefos for HR stuff, for example), but I think it should be disabled by default. The default should probably be that others can interact with my Lefos only when I'm in CC, and that my Lefos always CCs or BCCs me when it communicates with others. People probably have different preferences here, but defaults are important, and 'someone can regularly talk to my assistant, use up my credits, and possibly try to get it to do things I don't want without me knowing' is not a good default, even if that's restricted to a semi-trusted list.
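
If I were sketching those defaults myself, they'd look something like this – a hypothetical config, not anything Lefos actually exposes:

```python
# Hypothetical defaults for a shared assistant; none of these keys are real Lefos settings.
THIRD_PARTY_EMAIL_POLICY = {
    "accept_mail_from_others": "only_when_owner_in_cc",  # strangers can't start private threads
    "replies_to_others": "always_cc_owner",              # the owner sees everything it sends out
    "spend_owner_credits_for_others": False,             # nobody quietly burns my quota
    "allow_list": ["reg@example.com"],                    # explicit, owner-managed exceptions
}
```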

Privacy again: you also need to protect users from yourself

Above, I'm talking about privacy in terms of "Will the AI share my private information with an attacker or publicly?", but another thing I care about for personal assistants is privacy from the operators. I'm not a full-on tinfoil-hat guy. I use Telegram extensively even though everything is stored in plain text and accessible to the operators. Ditto for Gmail, Anthropic, OpenAI, and more and more others as I use more AI.

But in general, I still have a preference that

  • My data stays on my machine
  • If not, it's encrypted
  • If not, it's stored by a bigger company that will probably safeguard it

The last one is subtle in a few ways. I have the normal concerns about companies like Google and Microsoft – what they're doing with my data, and what their intentions are – but at the end of the day they make things too convenient for me to avoid them. On the 'I guess it's OK' side, they:

  • Are large and established: so they probably have decent data protection practices and aren't going to publish my data in a public S3 bucket by mistake
  • Employ many lawyers: and probably have boring internal security policies about who can and can't run SQL queries on databases with real customer data, as opposed to a smaller startup that might give the intern a copy of production to use for local development
  • Have been around for a while: so they'll probably be around for a while longer (see the Lindy effect)
  • Are broadly anonymous: I don't know anyone that works there who might have access to my data
  • Store a lot of other data: so they're less likely to use mine for individual-level debugging

It's not really fair, but it is a reality that I hold small, new companies to a much higher standard than the bigger, older ones. In my local Slack group, about once a month a developer posts the 'perfect invoice app' or 'perfect personal finance app', and I'd never consider using those no matter how good they are, because there's a reasonable chance I'll actually meet the dev over a beer one day and he'll be like "Oh, btw, I probably shouldn't have, but I saw you charge A for B?" or whatever.

So I've only used Lefos for testing, and I don't care if they look at literally every single thing I've sent to or received from Lefos for this, but I would care if I used Lefos for anything serious. I'm already sending my data to Anthropic, and now another copy is going to be stored by individuals – people I follow on Twitter or have an N-th-degree LinkedIn connection to. Maybe not quite the 'likely to bump into over a beer' category from above, but in that direction.

Here's an example where it flags my feature requests automatically to the Lefos team:

Lefos confirming it flagged a feature request to the Earendil team and noting the feedback was privacy-appropriate

And here it flags a problem along with part of my input "flamingo on a bicycle" to the team automatically, and hedges its answer about my privacy question:

Lefos answering questions about data limits, Google Docs permissions, and privacy policy, then flagging the flamingo image generation bug to the team

I chided it a bit about this one, as it had previously seemed to suggest that all automatic feedback was stripped of my information and kept strictly generic. Here it clarified that sometimes context about what I'd asked is included as well.

Lefos explaining why a feedback report containing a 'flamingo on a bicycle' prompt was visible to Earendil staff

I got a nice note from a real person at Earendil saying that I was holding it wrong.

Email from Cristina at Earendil explaining that image generation requires explicitly requesting an image, not just asking for a PDF

For context, my request in the first email was "Can you send me a PDF of a flamingo riding a bicycle?" and it responded with

I can generate images but my image generation tool isn't cooperating right now – I'll flag this to the team. Apologies for that one.

So there do seem to be some niggles about whether it couldn't use the tool or decided not to. Anyway, I'm not sure that generating images by email is something I have much use for, but it's interesting to see how hard UX becomes when you're dealing with something as unpredictable as an LLM, and interesting to think about how to gather feedback when dealing with private data. I'm not sure I have good solutions for this – as I mentioned, it seems fine for an alpha but not for something I'd actually use.

Grumpy old man: personal assistants should be serious, not whimsical

Another criticism I have of Lefos is one that I also had of OpenClaw: it doesn't seem serious. In Succession, one of my favourite TV series of all time, there's a line where the grumpy patriarch says to his children

I love you, but you are not serious people

I often find myself thinking of this line when I come across tech companies (and some people). I like having fun with AI, joking around, and making it do silly things. But when I want to get things done, I find any extra fluff or attempt at personality to be annoying. OpenClaw has its signature emoji and SOUL.md doc that let it build personality over time. Lefos ends its emails with "Shine Bright", and if you log into the dashboard you're greeted with a "Garden" and a "Shed".

Lefos dashboard showing 'Gardens > Gareth' with one entity named lefos at lefos@gareth.lefos.com

I'm sure all of this makes sense to the creators, but if I'm in 'get things done' mode, I don't want any of it. I just want it to work. I don't want to have to learn new terminology. Al, my boring ball-of-mud assistant, already knows about my "entities", as in the legal entities I'm involved in. If I 'add an entity' to a personal assistant, then to me that means I'm telling it about an entity I'm involved in, but to Lefos the AI assistant itself is "an entity". It probably wouldn't take long to learn the terminology it prefers, but I'd rather it adopted existing terminology like 'assistant', 'admin panel', and 'file system' instead of entities, gardens, and sheds.

Giving Lefos tools

In the Lefos dashboard there are options to give it read-only or read/write access to a Google Workspace. There isn't a lot of guidance about what the intention is here – do you give it access to your own Google account, or set it up with one of its own? I elected to do the latter and gave it full read/write permission over everything.

This is similar to how Al is set up – she has her own Google Workspace address at my company domain, and works with me on Google Docs via shared folders in Drive. So everything she does is shared with me, but I can explicitly share things with her.

Here's what you see when you connect Lefos to a Google account:

Lefos web UI showing the Connections settings under Garden > Account > Connections

After setting this up, Lefos could do more things. A solid plus one to 'competent' in my scorecard. For example, I asked for a summary of BBC headlines in a Google Doc and it made a nice-looking one:

Google Doc 'BBC News Daily Briefing – 27 April 2026' produced by Lefos

But it ran into an error when trying to share it with me, so maybe I was meant to set it up with my main Google account after all? (Note how I'm viewing it from the new Gmail account I set up for Lefos as it couldn't share the doc with my main account.)

Lefos email confirming it created a BBC News Daily Briefing Google Doc but hit a sharing quota limit

I assume that the other tools like calendar integration etc would work intuitively if I tried them, but I didn't.

It's also possible to add custom skills, which looks like a simple passthrough to Claude skills, but I also haven't tried that.

Lefos Custom Skills dashboard showing skills are Python modules with a SKILL.md descriptor, no custom skills added yet
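
I haven't tried it, but if it really is a passthrough to Claude-style skills, a custom skill is presumably a small folder with a SKILL.md descriptor (YAML frontmatter plus instructions) sitting next to whatever Python module it runs. A purely hypothetical example, with made-up names and fields, might look like:

```markdown
---
name: accommodation-search
description: Look up availability and prices for stays; use when asked about hotels or flats.
---

# Accommodation search

Run `python search.py <location> <check-in> <check-out>` (a script bundled with this
skill) and summarise the top results, with prices and links, in the reply.
```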

Lefos: bug reports

This is more for the Lefos team directly, but I'm including it here for transparency. Here are the places where I noticed Lefos outright break – I covered some of them above.

Addressing the wrong person

In one reply, Lefos addressed the email to "Reg", the person I had been communicating with, but sent the email to me.

Lefos email replying to flag that an attachment is a news article about electricity prices instead of the intended contract

Responding when it shouldn't

The intro to Lefos explicitly says it will stay quiet when CC'd if there's no need to respond. I think this was a case where there was clearly no need to respond, but it did anyway.

Lefos summarising a contract.pdf attachment with a private response note

Using image generation tools

I asked Lefos to create me a flamingo on a bicycle and send it as a PDF. It tried to generate an image and failed.

Lefos answering questions about data limits, Google Docs permissions, and privacy policy, then flagging the flamingo image generation bug to the team

Quota/sharing error on document creation

Lefos couldn't share a document it had created with me because of a 'quota' issue. This was a brand-new free Gmail account, so maybe I need to configure it differently, or maybe it works better with a paid Workspace account.

Lefos email confirming it created a BBC News Daily Briefing Google Doc but hit a sharing quota limit

Lefos: the good parts

So I've been focusing on criticisms, as I always think that's more interesting than praise, but here are the things I find most promising about Lefos.

  • I really like email. I don't use it much these days apart from invoices, accountants, contracts, etc., but a) that's exactly where I'm using AI assistants a lot at the moment, and b) I can see email becoming more popular again as a common platform for external communication. At one point, Slack was enough of a default that we used it internally and externally, but now that bigger companies are adopting Teams and others again, email is once again the universal channel.

  • I think there's a huge gap for non-technical people to use something like Lefos. In its current form it still seems to be targeted more towards nerds, but with a slight facelift I can see it becoming something very boring (in a good sense) for people who don't want to figure out command-line agents or even Cowork, but just find it magical that they can send and forward emails like they have for the last 30 years and have stuff happen by magic.

  • Even as someone who has tried nearly every model out there and uses four harness-model combinations daily, I really like not having to worry about the details. Similar to ampcode, which promises to pick the best current model so I don't have to, I like that there seems to be no way to mess around with Lefos's brain. Smart people are doing that, and in theory I can just benefit from it.

  • I think it is, or will be, mainly open source? If I could host this myself it would solve many of my complaints – I could hack on it and be sure that my data isn't being stored by yet another provider. Earendil is a 'public benefit' organization. I don't know exactly what that means, but they do mention that making most things open source is a goal. They've also taken funding and need to make money, but I imagine a real open-core model (and not the 'marketing wants it, so we released a completely broken open-source version to put open source on our landing page' kind I see quite frequently these days) could work well here. People who don't want to deal with hosting will pay, and people who do will self-host?

Overall it has enough of the ingredients that I want in a personal assistant. If it came with some dedicated hardware, its own GitHub account, its own Google Workspace account, the ability to make and receive phone calls (or at least text messages), and some stronger privacy protections, then it would be a very compelling 'hire an AI personal assistant that just works for $X/month' offering.

To me, Lefos could successfully go down one of two paths (or maybe try to build both out at once):

  • Build an offering that I and other nerds would be interested in. Something simple and easy to get started with, but still hackable and customizable. I would probably need to be able to SSH into the machine that's actually running the backend, set up credentials, install software, and fix things when they go wrong. Basically a different take on OpenClaw – more serious, more stable, more secure, email-first.
  • Build an offering for non-nerds. No configuration, clear limitations and goals, but super stable and simple. If I wanted to configure something, I'd do that by email too, and another AI or a human support agent would fix the permissions for me or tell me what prompt to use to get the result I wanted.

Currently, it seems to be aiming somewhere in between. Maybe there's a good middle ground here, or maybe the second path can be a gateway drug to the first. I'm sure the Lefos team has thought about all of this a lot more than I have, but for now Lefos still falls short and doesn't look like it would be worth $XX/month to me.