public inbox for linux-kernel@vger.kernel.org
* Introduce Sashiko (agentic review of Linux kernel changes)
@ 2026-03-17 15:31 Roman Gushchin
  2026-03-18 12:03 ` Lorenzo Stoakes (Oracle)
  2026-03-18 15:00 ` SeongJae Park
  0 siblings, 2 replies; 8+ messages in thread
From: Roman Gushchin @ 2026-03-17 15:31 UTC
  To: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck,
	Konstantin Ryabitsev, Chris Mason, SeongJae Park, elkin,
	Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt,
	Lorenzo Stoakes, Sean Christopherson, Ian Rogers

Hello,

I'm happy to share something my colleagues and I have been working on
for the last several months:
Sashiko - an agentic system for reviewing Linux kernel changes.

First, Sashiko is available as a service at:
  * https://sashiko.dev

It reviews all patches sent to LKML and several other Linux kernel
mailing lists using the Gemini 3.1 Pro model.

I want to thank my employer, Google, for providing the ML compute
resources and infrastructure that made this project possible.

Sashiko is written in Rust from scratch, mostly using Gemini CLI. It's
fully self-contained and does not rely on any CLI coding tools. It
supports various LLMs (at the moment it has mostly been tested with
Gemini Pro/Flash and lightly with Claude).

And finally it's fully open-source:
  * https://github.com/sashiko-dev/sashiko

It's licensed under the Apache-2.0 License, and ownership of the
project has been transferred to the Linux Foundation. Contributions
are very welcome; they are accepted under the DCO.

Sashiko is based on a set of open-source prompts initially developed by
Chris Mason:
  * https://github.com/masoncl/review-prompts/

But Sashiko leverages a different multi-stage review protocol, which
somewhat mimics the human review process and forces the LLM to look at
the proposed change from different angles.

In my measurements, Sashiko found 53% of the bugs in a completely
unfiltered set of 1,000 recent upstream issues identified by "Fixes:"
tags (using Gemini 3.1 Pro). Some might say that 53% is not that
impressive, but 100% of these issues were missed by human reviewers.
Also, many of these issues (tricky build failures, performance
problems, etc.) are very hard or impossible to spot by reviewing the
code, so arguably 100% is not reachable. We started in the low 30s a
couple of months ago; better models and improvements in the review
protocol and subsystem prompts pushed it to the low 50s. With better
LLMs and a collective effort on prompts, we can push it even further.
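
For readers who want to try a similar measurement, here is a minimal
sketch in Python. It is not Sashiko's benchmark code. It assumes the
kernel's usual tag format, Fixes: <at least 12 hex characters of the
SHA-1> ("<subject>"), and the helper names fixes_tags() and
detection_rate() are hypothetical. How a review gets labeled as having
found a bug is also an assumption: here it is just a boolean per
benchmark entry.

```python
import re

# Kernel convention: Fixes: <12+ hex chars of the SHA-1> ("<subject>")
FIXES_RE = re.compile(
    r'^Fixes:[ \t]+([0-9a-f]{12,40})[ \t]+\("(.*)"\)[ \t]*$',
    re.MULTILINE,
)


def fixes_tags(commit_message):
    """Return (sha, subject) pairs, one per Fixes: tag in the message."""
    return FIXES_RE.findall(commit_message)


def detection_rate(found_flags):
    """Fraction of benchmark entries where the review flagged the bug.

    found_flags holds one boolean per known-buggy commit (hypothetical
    labeling: True means the review pointed at the bug that was later
    fixed upstream).
    """
    flags = list(found_flags)
    if not flags:
        raise ValueError("empty benchmark set")
    return sum(1 for flag in flags if flag) / len(flags)


# A made-up fix commit message, used only to exercise the parser.
EXAMPLE_MESSAGE = (
    "mm: fix off-by-one in foo\n"
    "\n"
    "The loop walked one entry past the end of the array.\n"
    "\n"
    'Fixes: 0123456789ab ("mm: add foo")\n'
    "Signed-off-by: A Developer <dev@example.org>\n"
)

if __name__ == "__main__":
    print(fixes_tags(EXAMPLE_MESSAGE))
    print(detection_rate([True] * 530 + [False] * 470))
```

Each commit named by a Fixes: tag is one that introduced a bug, so a
set of such commits gives known-buggy patches to review; the rate is
then the share of them where the review caught the problem.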

Measuring false positives is much harder, but based on manual reviews of
reviews, the results look pretty good: Sashiko is rarely dead wrong, but
it sometimes nitpicks or reports too many low-value issues. In many
cases, this can be improved with prompt engineering.

* What's next?

This is our first version and it's obviously not perfect. There is a
long list of fixes and improvements to make. Please don't expect it to
be 100% reliable, even though we'll try hard to keep it up and running.
Please file bug reports and feature requests as GitHub issues or email
them to me, or send PRs.

As of now, Sashiko only provides a web interface;
however, Konstantin Ryabitsev is already adding sashiko.dev support to b4,
and SeongJae Park is adding support to hkml.
That was really fast, thank you!

We're working on adding an email interface to Sashiko, and soon Sashiko
will be able to send out reviews over email - similar to what the bpf
subsystem already has. It will be opt-in by subsystem and will have options
to CC only the author of the patch, maintainers, volunteers, or send a
fully public reply. If you're a maintainer and have a strong preference
to get reviews over email, please let me know.
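
To make the delivery options above concrete, here is a minimal sketch
in Python of per-subsystem opt-in. It is an assumption-laden
illustration, not Sashiko's design: the Audience enum, the Patch
fields, review_recipients() and all addresses are hypothetical, and the
four options are modeled as mutually exclusive choices.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    """Hypothetical names for the four delivery options."""
    AUTHOR = "author"
    MAINTAINERS = "maintainers"
    VOLUNTEERS = "volunteers"
    PUBLIC = "public"


@dataclass
class Patch:
    subsystem: str
    author: str
    maintainers: list
    list_address: str


def review_recipients(patch, policy, volunteers):
    """Return the addresses a review would be mailed to.

    policy maps subsystem name -> Audience. A subsystem missing from
    policy has not opted in, so nothing is sent.
    """
    audience = policy.get(patch.subsystem)
    if audience is None:
        return []
    if audience is Audience.AUTHOR:
        return [patch.author]
    if audience is Audience.MAINTAINERS:
        return list(patch.maintainers)
    if audience is Audience.VOLUNTEERS:
        return list(volunteers.get(patch.subsystem, []))
    # PUBLIC: modeled here as a reply to the author and the list.
    return [patch.author, patch.list_address]


# Made-up configuration and patch, used only for illustration.
POLICY = {"mm": Audience.AUTHOR, "bpf": Audience.PUBLIC}
VOLUNTEERS = {"mm": ["volunteer@example.org"]}
MM_PATCH = Patch("mm", "author@example.org",
                 ["maintainer@example.org"], "linux-mm@example.org")
NET_PATCH = Patch("net", "author@example.org",
                  ["maintainer@example.org"], "netdev@example.org")

if __name__ == "__main__":
    print(review_recipients(MM_PATCH, POLICY, VOLUNTEERS))
    print(review_recipients(NET_PATCH, POLICY, VOLUNTEERS))
```

Defaulting to an empty recipient list for unlisted subsystems is what
makes the scheme opt-in: no mail goes out unless a subsystem has been
configured.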

We also desperately need better benchmarks, especially when it comes to
false positives. Having a decent vetted set of officially perfect
commits can help with this.

Finally, some subsystems have good prompt coverage and some don't. A
subsystem prompt doesn't have to be lengthy documentation (that might
actually be counter-productive), but having a small list of things to
look at - some high-level concepts which are hard to grasp from the
code, etc. - can help a lot with both bug discovery and false positives.

Thanks,
Roman


Thread overview: 8+ messages
2026-03-17 15:31 Introduce Sashiko (agentic review of Linux kernel changes) Roman Gushchin
2026-03-18 12:03 ` Lorenzo Stoakes (Oracle)
2026-03-18 18:33   ` Roman Gushchin
2026-03-18 18:50     ` Lorenzo Stoakes (Oracle)
2026-03-19 22:33       ` Roman Gushchin
2026-03-18 18:50     ` Chris Mason
2026-03-18 15:00 ` SeongJae Park
2026-03-18 18:43   ` Roman Gushchin
