From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from out-183.mta0.migadu.com (out-183.mta0.migadu.com [91.218.175.183])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9671343DA21
	for <linux-kernel@vger.kernel.org>; Thu, 19 Mar 2026 22:33:51 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.183
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773959634; cv=none; b=LQ8oNgX3isAI7icpf0A4/4pI2IT7VZA/WogJInZ2wZL7+PjCpta0qgHDqbIJFD9kokV8R4eVe/CA1L1M/x6Geqoyg2NnSyDe54szMSlC7uOr2hp5e+7UaAGAZ4upWdZLlIyv6CXQlNY1HQInLqr9LxwPt9G8AWK9BAH79xS4gPc=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773959634; c=relaxed/simple;
	bh=24Hq0PefarVqSMlOqhN4FJtHOSJwMmt4JvqdnYeerj4=;
	h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID:
	 MIME-Version:Content-Type; b=QzdKArr2t2yLO6cKufeMsrDRy6nvdgU6wBKzedvwWo2TTdpBDxU/dJiHWPNKL03MIUO1Pmy9NDdzbTW5ELyGEAimIjC8znc+hAGmLmqVAWES+OPQ9X1J+xwC1P2Q/47pnsVo9utIgpWqs05v1KwrPp4pfgA7BxFiyLESqbIOZ/w=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=GjiHrVju; arc=none smtp.client-ip=91.218.175.183
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="GjiHrVju"
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1773959629;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=dg8iSNQRNBPo0TgxQTceQgVtBmkxEPVnSd2EPQ6bDMw=;
	b=GjiHrVjuLNV99LdzStFB2AyDnwH04IjF7wIfPr4mn8WHG9qXFhngs8hQnlORun+Ci+wJlo
	uK1DMSRjgtx4y4lrIdBZa2NGxuF/B3U4prGKnZ0DBetN/QixEIpMT0VYjHb3d1DPBK6PEx
	IvTs1f8nH8lKpKOQFoWgEktV6RNPe18=
From: Roman Gushchin <roman.gushchin@linux.dev>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,  Andrew Morton
 <akpm@linux-foundation.org>,  Theodore Ts'o <tytso@mit.edu>,  Guenter
 Roeck <linux@roeck-us.net>,  Konstantin Ryabitsev
 <konstantin@linuxfoundation.org>,  Chris Mason <clm@meta.com>,  SeongJae
 Park <sj@kernel.org>,  elkin@google.com,  Christian Brauner
 <brauner@kernel.org>,  Dmitry Vyukov <dvyukov@google.com>,  Sasha Levin
 <sashal@kernel.org>,  Shakeel Butt <shakeel.butt@linux.dev>,  Lorenzo
 Stoakes <lorenzo.stoakes@oracle.com>,  Sean Christopherson
 <seanjc@google.com>,  Ian Rogers <irogers@google.com>
Subject: Re: Introduce Sashiko (agentic review of Linux kernel changes)
In-Reply-To: <34630bb5-840b-4a99-8e19-51fd4fc8ba96@lucifer.local> (Lorenzo
	Stoakes's message of "Wed, 18 Mar 2026 18:50:27 +0000")
References: <7ia4o6kmpj5s.fsf@castle.c.googlers.com>
	<39e6b4d2-8a30-4eaa-908d-5d11b746f8d5@lucifer.local>
	<87v7etugwd.fsf@linux.dev>
	<34630bb5-840b-4a99-8e19-51fd4fc8ba96@lucifer.local>
Date: Thu, 19 Mar 2026 15:33:38 -0700
Message-ID: <87jyv7a1q5.fsf@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain
X-Migadu-Flow: FLOW_OUT

"Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:

> On Wed, Mar 18, 2026 at 11:33:22AM -0700, Roman Gushchin wrote:
>> "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:
>>
>> > On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote:
>> >> Hello,
>> >>
>> >> I'm happy to share something my colleagues and I have been working on
>> >> for the last several months:
>> >> Sashiko - an agentic system for Linux kernel changes.
>> >>
>> >> First, Sashiko is available as a service at:
>> >>   * https://sashiko.dev
>> >>
>> >
>> > ...
>> >
>> > (For one I'm going to go fix some bugs on my series I saw reported there).
>> >
>> > I think over time as the approach/model is refined this will get a LOT
>> > better, it seems these things can accelerate quickly.
>>
>> Hi Lorenzo,
>>
>> Thank you for the kind words!
>
> No problem, thanks for your hard work! :)
>
>>
>> RE false positives: I think Chris's prompts were initially heavily
>> biased towards avoiding false positives, but it comes at the cost of
>> missing real issues (in general, I don't have hard data on % of findings).
>> Now he also is looking to relax it a bit, to my knowledge.
>> But then there are different models in use, different protocols, etc.
>>
>> I also have a notion of issue severity and I was thinking about
>> e.g. sending out only reviews revealing critical & high severity bugs
>> (e.g. memory corruptions & panics). Or maybe send the feedback to the
>> author in any case (e.g. for fixing typos), but cc maintainers only if
>> there are serious concerns.
>>
>> And obviously no pressure, I won't enable any public email sending
>> unless there is a consensus across maintainers of the corresponding
>> subsystem.
>
> I think maybe an opt-in thing might work for some of us?

Absolutely, I think with mm we can start with replying to the author and
a dedicated list of volunteers.
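
A rough sketch of how the opt-in and severity ideas above could combine
when picking recipients. This is purely illustrative: the class, field
and function names are hypothetical and the thresholds are assumptions,
not Sashiko's actual configuration or code.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1        # e.g. typos
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4   # e.g. memory corruptions, panics


@dataclass
class SubsystemPolicy:
    # Hypothetical per-subsystem settings; email is off unless opted in.
    email_enabled: bool = False
    volunteers: list[str] = field(default_factory=list)
    maintainers: list[str] = field(default_factory=list)
    # Assumed threshold: maintainers are cc'ed only from this level up.
    cc_maintainers_from: Severity = Severity.HIGH


def recipients(author: str, max_severity: Severity, policy: SubsystemPolicy):
    """Return (to, cc) for a review whose worst finding is max_severity."""
    if not policy.email_enabled:
        return [], []                  # no consensus yet: send nothing
    to = [author]                      # the author gets feedback in any case
    cc = list(policy.volunteers)       # the dedicated volunteers are always cc'ed
    if max_severity >= policy.cc_maintainers_from:
        cc += [m for m in policy.maintainers if m not in cc]
    return to, cc
```

With a policy like SubsystemPolicy(email_enabled=True, volunteers=[...]),
a typo-only review would go to the author and the volunteers, while
maintainers would be added only for high or critical findings.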

> But yeah we can take our time with this, Andrew is looking, I am for
> sure.

Thank you!

>
> Oh and one data point -
> https://lore.kernel.org/linux-mm/cover.1773846935.git.ljs@kernel.org/
>
> Read the v3 change log for a list of the issues it correctly raised for that
> series, so it's definitely useful.
>
> It was about maybe 50/50 noise/signal I think?
>
> But as you can see that's already very useful thank you and has fixed a
> bunch of bugs in that code!
>
> I'm not sure what Chris is planning, and I keep not going to the AI
> meetings for various reasons (other stuff clashing/away/tired sometimes :)
> but I wonder how we will sync up with Chris's review bot experiments?

So as Chris said, we're syncing regularly and actively thinking how to
organize it. I think we both want to share as much stuff as possible.

The hard part is that we can't easily test each other's setup and it's
all very brittle. Initially I tried to use Chris's prompts directly with
only minimal changes, but it was hard to keep Sashiko stable. Plus the
new multi-stage protocol improved the discovery rate by almost 10%,
which was hard to ignore.

My current thinking (and things are evolving quickly, so I might have a
different opinion in a couple of weeks) is that we need to separate out
per-subsystem knowledge, make sure it doesn't contain any imperative
instructions or LLM/tool specifics, and share it completely. We can move
it to a separate repo or even put it into the kernel tree; it's all
debatable. In a way, these prompts should be owned by subsystem
maintainers more than anyone else.

Then there are things which can be shared, but are not subsystem-specific.
E.g. an instruction on how to assess issue severity.

And then there is a specific review protocol, which significantly
depends on the tooling and LLM being used. This part is hard to share,
but also it's the place where a lot of experimentation is happening,
so maybe it's fine to have multiple tools. And they might be optimized
for different use cases: e.g. for personal development it might be
beneficial to have a live interaction with the LLM on the review material
(someone already asked me about this); but for sashiko.dev's mass review
case I do care a lot about the stability and token efficiency.
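
To make the three-way split above a bit more concrete, here is a minimal
sketch of assembling a prompt from separately owned layers. All names,
section titles and placeholder strings are hypothetical; it is not how
Sashiko or Chris's prompts are actually organized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptLayers:
    subsystem_facts: str   # declarative, per-subsystem, fully shareable
    shared_guidance: str   # shareable, not subsystem-specific (e.g. severity)
    review_protocol: str   # tool/LLM-specific, the part that is hard to share


def build_prompt(layers: PromptLayers) -> str:
    """Join the non-empty layers into one prompt, protocol first."""
    sections = [
        ("Review protocol", layers.review_protocol),
        ("Shared guidance", layers.shared_guidance),
        ("Subsystem knowledge", layers.subsystem_facts),
    ]
    return "\n\n".join(
        f"## {title}\n{body.strip()}" for title, body in sections if body.strip()
    )


# Two tools reuse the same shared layers and differ only in the protocol.
facts = "(placeholder: declarative notes about a subsystem)"
guidance = "(placeholder: how to assess issue severity)"
batch = build_prompt(
    PromptLayers(facts, guidance, "(placeholder: multi-stage batch protocol)"))
interactive = build_prompt(
    PromptLayers(facts, guidance, "(placeholder: interactive protocol)"))
```

The point of keeping subsystem_facts free of instructions is that the
same text can be dropped into either tool without modification.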

>> >>
>> >> * What's next?
>> >>
>> >> This is our first version and it's obviously not perfect. There is a
>> >> long list of fixes and improvements to make. Please, don't expect it to
>> >> be 100% reliable, even though we'll try hard to keep it up and running.
>> >> Please use github issues or email me any bug reports and feature
>> >> requests, or send PR's.
>> >
>> > Of course, it's all much appreciated!
>> >
>> >>
>> >> As of now, Sashiko only provides a web interface;
>> >> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4,
>> >> and SeongJae Park is adding support to hkml.
>> >> That was really fast, thank you!
>> >
>> > Thanks to Konstantin and SJ too but the web interface is pretty nice I
>> > must say so thanks for that! :)
>> >
>> >>
>> >> We're working on adding an email interface to Sashiko, and soon Sashiko
>> >> will be able to send out reviews over email - similar to what the bpf
>> >> subsystem already has. It will be opt-in by subsystem and will have options
>> >
>> > Like I said, I think it's a bit premature for mm at least _at this point_
>> > but I'm sure it'll get there.
>>
>> I'd really appreciate (and actually need) your and other maintainers' and
>> developers' feedback here. Even though I can't fix every single false
>> positive as a code issue, I can hopefully tackle some common themes.
>
> Is there a way for us to point out which parts of a review are signal and
> which are noise?

Not yet. I think answering emails is the easiest part and I plan to
teach Sashiko to recognize these answers and analyze them. Maybe Sashiko
can even adjust its own prompts in a (semi-)automatic way, Idk.

>
> If you could update the web interface for feedback that'd be really handy,
> though I guess there's the painful stuff of having to have users
> etc. for that :)

Yeah, I'm afraid we might end up trying to build a new JIRA this way...

>
>>
>> Chris did fantastic work on the bpf subsystem (and several others) by
>> manually analyzing replies to the AI feedback and adjusting prompts. Now
>> we need to repeat this for all other subsystems.
>
> Yeah, I'm happy to give feedback if there's a fairly low friction way of doing
> it, but constant workload makes it hard if it requires much more
> effort :)

Can't agree more :)

>
>>
>> >
>> > For now I think we need to get the false positive rate down a fair bit
>> > otherwise it might be a little distracting.
>> >
>> > But people are _already_ integrating the web interface into workflows, I
>> > check it now, and Andrew is already very keen :) see:
>> >
>> > https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org/
>> > https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@linux-foundation.org/
>> >
>> >> to CC only the author of the patch, maintainers, volunteers, or send a
>> >> fully public reply. If you're a maintainer and have a strong preference
>> >> to get reviews over email, please let me know.
>> >
>> > Well as maintainer I think 'not quite yet' but probably soon is the answer
>> > on that one!
>> >
>> >>
>> >> We also desperately need better benchmarks, especially when it comes to
>> >> false positives. Having a decent vetted set of officially perfect
>> >> commits can help with this.
>> >
>> > Not sure perfect commits exist in the kernel, certainly not mine :P
>>
>> Same here :) This is why it's so hard.
>
> Yes, but worthwhile! LLMs are surprisingly good at figuring out issues in
> things, it's a real strength.
>
> And it's already improving the code.
>
>>
>> >
>> >>
>> >> Finally, some subsystems have good prompt coverage and some don't. It
>> >> doesn't have to be lengthy documentation (and it might actually be
>> >> counter-productive), but having a small list of things to look at - some
>> >> high-level concepts which are hard to grasp from the code, etc. - can
>> >> help a lot with both bug discovery and false positives.
>> >
>> > I guess best contributed to Chris's review-prompts repo right?
>>
>> Both work for me now, we'll figure out with Chris how to sync our
>> prompts. The small problem is that we're using various models, tools and
>> review protocols and can barely test each other's setup. And it's all
>> very fragile, so it's not exactly trivial.
>> But we'll figure out something soon.
>
> Yeah, part of the fun I guess :)
>
>>
>> In general we need to carefully separate instructions (like which tools
>> to use, which prompts to load etc) from factual data. Then we can easily
>> use the factual data with various tooling around.
>
> Hopefully I find some time to contribute some mm-specific stuff too :)

Awesome, waiting for it!

Thanks!