Philip Abao
published · 2026·05·30

Kestrel

Custom text classifiers on our persistent memory architecture for 24/7 workloads.

We're happy to release Kestrel, a custom text classifier built on our persistent memory architecture for 24/7 workloads.

Kestrel takes a different approach from a typical LLM. It still is one, though: a baseline model that has seen a large amount of data.

It’s named after the bird, a small falcon known for its ability to hover in place while hunting.

Image: American Kestrel (source: Wikipedia)

Fun fact: Male and female kestrels look quite different. The males have blue-gray wings while the females are mostly reddish-brown.

We built it this way because the architecture keeps memory and compute flat no matter how long the input is. That makes it realistic to run classifiers continuously at high volume.

All results so far come from fine-tunes on simulated labeled pairs. Even so, we reach 99.2% accuracy on the tasks we’ve tested once a full training set is available.

The flat resource usage is the main reason this approach looks promising for 24/7 production use.

This is especially useful for companies that need reliable chat moderation running around the clock.

Live chat stream (simulated) · 4 messages processed

"You are literally the worst player I have ever seen. Uninstall the game." · Toxicity: High
"Thanks for the quick help, really appreciate it!" · Toxicity: Low
"This is the third time my order is late, I'm done with this garbage service." · Toxicity: Medium
"Can you help me reset my password please?" · Toxicity: Low

Memory while the stream runs · KV cache limit 8 GB

Kestrel · Flat
KV cache 2.0 · Bounded
Standard KV cache · Grows

Standard KV cache never forgets — it grows until it runs out of memory. KV cache 2.0 stays bounded with a hard window, dropping context past the cutoff. Kestrel stays bounded too, but older detail decays gradually instead of vanishing at a cutoff.
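
To make the three behaviours concrete, here is a toy sketch in Python. It is not Kestrel's actual mechanism, which this post does not describe. It assumes, purely for illustration, that "decays gradually" can be modelled as an exponential moving average over a fixed-size state. The names `run_stream` and `token_weight`, the window of 4, and the decay factor of 0.5 are all hypothetical.

```python
from collections import deque


def run_stream(tokens, window=4, decay=0.5):
    """Feed tokens through three toy memory strategies and report what each holds."""
    growing = []                      # standard KV cache: one entry per token, never dropped
    windowed = deque(maxlen=window)   # hard window: the oldest entry vanishes at the cutoff
    state = 0.0                       # fixed-size state (a single float in this toy)

    for x in tokens:
        growing.append(x)
        windowed.append(x)
        # Assumed decay rule: each step scales the old state by `decay`,
        # so older tokens fade out instead of being cut off.
        state = decay * state + (1.0 - decay) * x

    return len(growing), len(windowed), state


def token_weight(age, decay=0.5):
    """Weight that a token of the given age (0 = newest) still has in the decayed state."""
    return (1.0 - decay) * decay ** age


if __name__ == "__main__":
    n_growing, n_windowed, state = run_stream([1.0] * 1000)
    print("growing cache entries:", n_growing)
    print("windowed cache entries:", n_windowed)
    print("decayed state size: 1 value, current value:", state)
    print("weights by age:", [token_weight(a) for a in (0, 1, 3, 10)])
```

In this sketch the growing list scales with the stream length, the windowed cache stops at `window` entries and loses anything older outright, and the decayed state stays one value no matter how many tokens arrive, with older tokens contributing progressively less rather than nothing.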