From: Roman Gushchin <roman.gushchin@linux.dev>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,  Andrew Morton
 <akpm@linux-foundation.org>,  Theodore Ts'o <tytso@mit.edu>,  Guenter
 Roeck <linux@roeck-us.net>,  Konstantin Ryabitsev
 <konstantin@linuxfoundation.org>,  Chris Mason <clm@meta.com>,  SeongJae
 Park <sj@kernel.org>,  elkin@google.com,  Christian Brauner
 <brauner@kernel.org>,  Dmitry Vyukov <dvyukov@google.com>,  Sasha Levin
 <sashal@kernel.org>,  Shakeel Butt <shakeel.butt@linux.dev>,  Lorenzo
 Stoakes <lorenzo.stoakes@oracle.com>,  Sean Christopherson
 <seanjc@google.com>,  Ian Rogers <irogers@google.com>
Subject: Re: Introduce Sashiko (agentic review of Linux kernel changes)
In-Reply-To: <39e6b4d2-8a30-4eaa-908d-5d11b746f8d5@lucifer.local> (Lorenzo
	Stoakes's message of "Wed, 18 Mar 2026 12:03:23 +0000")
References: <7ia4o6kmpj5s.fsf@castle.c.googlers.com>
	<39e6b4d2-8a30-4eaa-908d-5d11b746f8d5@lucifer.local>
Date: Wed, 18 Mar 2026 11:33:22 -0700
Message-ID: <87v7etugwd.fsf@linux.dev>

"Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:

> On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote:
>> Hello,
>>
>> I'm happy to share something my colleagues and I have been working on
>> for the last several months:
>> Sashiko - an agentic system for Linux kernel changes.
>>
>> First, Sashiko is available as a service at:
>>   * https://sashiko.dev
>>
>
> ...
>
> (For one, I'm going to go fix some bugs in my series that I saw reported there.)
>
> I think over time, as the approach/model is refined, this will get a LOT
> better; it seems these things can accelerate quickly.

Hi Lorenzo,

Thank you for the kind words!

RE false positives: I think Chris's prompts were initially heavily
biased towards avoiding false positives, but that comes at the cost of
missing real issues (speaking generally; I don't have hard data on the
percentage of findings). To my knowledge, he is now also looking to relax
this a bit. But then, there are different models in use, different
protocols, etc.

I also have a notion of issue severity, and I was thinking about, e.g.,
sending out only reviews that reveal critical- and high-severity bugs
(e.g. memory corruption and panics). Or maybe always sending the feedback
to the author (e.g. for fixing typos), but cc'ing maintainers only if
there are serious concerns.

And obviously no pressure: I won't enable any public email sending
unless there is consensus among the maintainers of the corresponding
subsystem.

>>
>> * What's next?
>>
>> This is our first version and it's obviously not perfect. There is a
>> long list of fixes and improvements to make. Please don't expect it to
>> be 100% reliable, even though we'll try hard to keep it up and running.
>> Please use GitHub issues or email me any bug reports and feature
>> requests, or send PRs.
>
> Of course, it's all much appreciated!
>
>>
>> As of now, Sashiko only provides a web interface;
>> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4,
>> and SeongJae Park is adding support to hkml.
>> That was really fast, thank you!
>
> Thanks to Konstantin and SJ too, but the web interface is pretty nice, I
> must say, so thanks for that! :)
>
>>
>> We're working on adding an email interface to Sashiko, and soon Sashiko
>> will be able to send out reviews over email - similar to what the bpf
>> subsystem already has. It will be opt-in by subsystem and will have options
>
> Like I said, I think it's a bit premature for mm, at least _at this point_,
> but I'm sure it'll get there.

I'd really appreciate (and actually need) your feedback here, as well as
that of other maintainers and developers. Even though I can't fix every
single false positive as a code issue, I can hopefully tackle some common
themes.

Chris did fantastic work on the bpf subsystem (and several others) by
manually analyzing replies to the AI feedback and adjusting the prompts.
Now we need to repeat this for all the other subsystems.

>
> For now, I think we need to get the false positive rate down a fair bit;
> otherwise it might be a little distracting.
>
> But people are _already_ integrating the web interface into their workflows
> (I check it now), and Andrew is already very keen :) See:
>
> https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org/
> https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@linux-foundation.org/
>
>> to CC only the author of the patch, maintainers, volunteers, or send a
>> fully public reply. If you're a maintainer and have a strong preference
>> to get reviews over email, please let me know.
>
> Well, as a maintainer, I think 'not quite yet, but probably soon' is the
> answer on that one!
>
>>
>> We also desperately need better benchmarks, especially when it comes to
>> false positives. Having a decent vetted set of officially perfect
>> commits can help with this.
>
> Not sure perfect commits exist in the kernel, certainly not mine :P

Same here :) This is why it's so hard.

>
>>
>> Finally, some subsystems have good prompt coverage and some don't. It
>> doesn't have to be lengthy documentation (and it might actually be
>> counter-productive), but having a small list of things to look at - some
>> high-level concepts which are hard to grasp from the code, etc. - can
>> help a lot with both bug discovery and false positives.
>
> I guess these are best contributed to Chris's review-prompts repo, right?

Both work for me for now; Chris and I will figure out how to sync our
prompts. The small problem is that we're using different models, tools
and review protocols, and can barely test each other's setups. It's all
very fragile, so it's not exactly trivial.
But we'll figure something out soon.

In general, we need to carefully separate instructions (like which tools
to use, which prompts to load, etc.) from factual data. Then the factual
data can easily be reused with different tooling.

Thanks!