From: Roman Gushchin
To: "Lorenzo Stoakes (Oracle)"
Cc: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck,
 Konstantin Ryabitsev, Chris Mason, SeongJae Park, elkin@google.com,
 Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt,
 Lorenzo Stoakes, Sean Christopherson, Ian Rogers
Subject: Re: Introduce Sashiko (agentic review of Linux kernel changes)
In-Reply-To: <34630bb5-840b-4a99-8e19-51fd4fc8ba96@lucifer.local>
 (Lorenzo Stoakes's message of "Wed, 18 Mar 2026 18:50:27 +0000")
References: <7ia4o6kmpj5s.fsf@castle.c.googlers.com>
 <39e6b4d2-8a30-4eaa-908d-5d11b746f8d5@lucifer.local>
 <87v7etugwd.fsf@linux.dev>
 <34630bb5-840b-4a99-8e19-51fd4fc8ba96@lucifer.local>
Date: Thu, 19 Mar 2026 15:33:38 -0700
Message-ID: <87jyv7a1q5.fsf@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

"Lorenzo Stoakes (Oracle)" writes:

> On Wed, Mar 18, 2026 at 11:33:22AM -0700, Roman Gushchin wrote:
>> "Lorenzo Stoakes (Oracle)" writes:
>>
>> > On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote:
>> >> Hello,
>> >>
>> >> I'm happy to share something my colleagues and I have been working on
>> >> for the last several months:
>> >> Sashiko - an agentic system for Linux kernel changes.
>> >>
>> >> First, Sashiko is available as a service at:
>> >> * https://sashiko.dev
>> >>
>> >
>> > ...
>> >
>> > (For one I'm going to go fix some bugs on my series I saw reported there).
>> >
>> > I think over time as the approach/model is refined this will get a LOT
>> > better, it seems these things can accelerate quickly.
>>
>> Hi Lorenzo,
>>
>> Thank you for the kind words!
>
> No problem, thanks for your hard work! :)
>
>>
>> RE false positives: I think Chris's prompts were initially heavily
>> biased towards avoiding false positives, but it comes at the cost of
>> missing real issues (in general, I don't have hard data on % of findings).
>> Now he is also looking to relax it a bit, to my knowledge.
>> But then there are different models in use, different protocols, etc.
>>
>> I also have a notion of issue severity and I was thinking about
>> e.g. sending out only reviews revealing critical & high severity bugs
>> (e.g. memory corruptions & panics). Or maybe send the feedback to the
>> author in any case (e.g. for fixing typos), but cc maintainers only if
>> there are serious concerns.
>>
>> And obviously no pressure, I won't enable any public email sending
>> unless there is a consensus across maintainers of the corresponding
>> subsystem.
>
> I think maybe an opt-in thing might work for some of us?

Absolutely, I think with mm we can start with replying to the author and
a dedicated list of volunteers.

> But yeah we can take our time with this, Andrew is looking, I am for
> sure.

Thank you!

>
> Oh and one data point -
> https://lore.kernel.org/linux-mm/cover.1773846935.git.ljs@kernel.org/
>
> Read the v3 change log for a list of the issues it correctly raised for that
> series, so it's definitely useful.
>
> It was about maybe 50/50 noise/signal I think?
>
> But as you can see that's already very useful, thank you, and has fixed a
> bunch of bugs in that code!
>
> I'm not sure what Chris is planning, and I keep not going to the AI
> meetings for various reasons (other stuff clashing/away/tired sometimes :)
> but I wonder how we will sync up with Chris's review bot experiments?

So as Chris said, we're syncing regularly and actively thinking about how
to organize it. I think we both want to share as much stuff as possible.
The hard part is that we can't easily test each other's setup and it's
all very brittle.

Initially I tried to use Chris's prompts directly with only minimal
changes, but it was hard to keep Sashiko stable. Plus the new
multi-stage protocol improved the discovery rate by almost 10%, which
was hard to ignore.

My current thinking (and things are evolving quickly, so I might have a
different opinion in a couple of weeks) is that we need to separate
per-subsystem knowledge, make sure it doesn't contain any imperative
instructions or LLM/tool specifics and share it completely. We can move
it to a separate repo or even put it into the kernel tree, it's all
debatable. In a way, these prompts should be owned by subsystem
maintainers more than anyone else.

Then there are things which can be shared, but are not
subsystem-specific. E.g. an instruction on how to assess issue severity.

And then there is a specific review protocol, which significantly
depends on the tooling and LLM being used. This part is hard to share,
but also it's the place where a lot of experimentation is happening, so
maybe it's fine to have multiple tools. And they might be optimized for
different use cases: e.g. for personal development it might be
beneficial to have a live interaction with an LLM on the review material
(someone already asked me about this); but for sashiko.dev's mass review
case I do care a lot about the stability and token efficiency.

>> >>
>> >> * What's next?
>> >>
>> >> This is our first version and it's obviously not perfect. There is a
>> >> long list of fixes and improvements to make.
>> >> Please, don't expect it to
>> >> be 100% reliable, even though we'll try hard to keep it up and running.
>> >> Please use GitHub issues or email me any bug reports and feature
>> >> requests, or send PRs.
>> >
>> > Of course, it's all much appreciated!
>> >
>> >>
>> >> As of now, Sashiko only provides a web interface;
>> >> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4,
>> >> and SeongJae Park is adding support to hkml.
>> >> That was really fast, thank you!
>> >
>> > Thanks to Konstantin and SJ too but the web interface is pretty nice I
>> > must say so thanks for that! :)
>> >
>> >>
>> >> We're working on adding an email interface to Sashiko, and soon Sashiko
>> >> will be able to send out reviews over email - similar to what the bpf
>> >> subsystem already has. It will be opt-in by subsystem and will have options
>> >
>> > Like I said, I think it's a bit premature for mm at least _at this point_
>> > but I'm sure it'll get there.
>>
>> I'd really appreciate (and actually need) your and other maintainers' and
>> developers' feedback here. Even though I can't fix every single false
>> positive as a code issue, I can hopefully tackle some common themes.
>
> Is there a way for us to point out which parts of a review are signal and
> which are noise?

Not yet. I think answering emails is the easiest part and I plan to
teach Sashiko to recognize these answers and analyze them. Maybe Sashiko
can even adjust its own prompts in a (semi)-automatic way, Idk.

>
> If you could update the web interface for feedback that'd be really handy,
> though I guess there's the painful stuff of having to have users and
> etc. for that :)

Yeah, I'm afraid we might end up trying to build a new JIRA this way...

>
>>
>> Chris did fantastic work on the bpf subsystem (and several others) by
>> manually analyzing replies to the AI feedback and adjusting prompts. Now
>> we need to repeat this for all other subsystems.
>
> Yeah, I'm happy to give feedback if there's a fairly low friction way of doing
> it, but constant workload makes it hard if it requires much more
> effort :)

Can't agree more :)

>
>>
>> >
>> > For now I think we need to get the false positive rate down a fair bit
>> > otherwise it might be a little distracting.
>> >
>> > But people are _already_ integrating the web interface into workflows, I
>> > check it now, and Andrew is already very keen :) see:
>> >
>> > https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org/
>> > https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@linux-foundation.org/
>> >
>> >> to CC only the author of the patch, maintainers, volunteers, or send a
>> >> fully public reply. If you're a maintainer and have a strong preference
>> >> to get reviews over email, please let me know.
>> >
>> > Well as maintainer I think 'not quite yet' but probably soon is the answer
>> > on that one!
>> >
>> >>
>> >> We also desperately need better benchmarks, especially when it comes to
>> >> false positives. Having a decent vetted set of officially perfect
>> >> commits can help with this.
>> >
>> > Not sure perfect commits exist in the kernel certainly not mine :P
>>
>> Same here :) This is why it's so hard.
>
> Yes, but worthwhile! LLMs are surprisingly good at figuring out issues in
> things, it's a real strength.
>
> And it's already improving the code.
>
>>
>> >
>> >>
>> >> Finally, some subsystems have good prompt coverage and some don't. It
>> >> doesn't have to be lengthy documentation (and it might actually be
>> >> counter-productive), but having a small list of things to look at - some
>> >> high-level concepts which are hard to grasp from the code, etc. - can
>> >> help a lot with both bug discovery and false positives.
>> >
>> > I guess best contributed to Chris's review-prompts repo right?
>>
>> Both work for me now, we'll figure out with Chris how to sync our
>> prompts.
>> The small problem is that we're using various models, tools and
>> review protocols and can barely test each other's setup. And it's all
>> very fragile, so it's not exactly trivial.
>> But we'll figure out something soon.
>
> Yeah, part of the fun I guess :)
>
>>
>> In general we need to carefully separate instructions (like which tools
>> to use, which prompts to load etc) from factual data. Then we can easily
>> use the factual data with various tooling around.
>
> Hopefully I find some time to contribute some mm-specific stuff too :)

Awesome, waiting for it!

Thanks!
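
For illustration only, the severity-based routing idea quoted earlier in this message (always reply to the author, add volunteers, cc maintainers only for high or critical findings, and send nothing unless the subsystem has opted in) could be sketched roughly as below. This is not Sashiko's code: `Severity`, `SubsystemPolicy`, `recipients` and the example addresses are all hypothetical names made up for this sketch, and the HIGH threshold is an assumption.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; higher means more serious."""
    LOW = 1        # e.g. typos
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4   # e.g. memory corruptions, panics


@dataclass
class SubsystemPolicy:
    """Hypothetical per-subsystem settings; nothing is sent unless opted in."""
    opted_in: bool = False
    maintainers: list[str] = field(default_factory=list)
    volunteers: list[str] = field(default_factory=list)


def recipients(author: str, max_severity: Severity,
               policy: SubsystemPolicy) -> list[str]:
    """Return who should receive a review, given its most severe finding."""
    if not policy.opted_in:
        return []
    # The author and the volunteers get the feedback in any case.
    to = [author, *policy.volunteers]
    # Maintainers are cc'ed only when there are serious concerns
    # (assumed here to mean HIGH or CRITICAL).
    if max_severity >= Severity.HIGH:
        to.extend(policy.maintainers)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(to))


mm = SubsystemPolicy(
    opted_in=True,
    maintainers=["maintainer@example.org"],
    volunteers=["volunteer@example.org"],
)
print(recipients("author@example.org", Severity.LOW, mm))
print(recipients("author@example.org", Severity.CRITICAL, mm))
```

Keeping the policy as plain data per subsystem, separate from the routing function, mirrors the split between factual data and instructions discussed above.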