* Introduce Sashiko (agentic review of Linux kernel changes) @ 2026-03-17 15:31 Roman Gushchin 2026-03-18 12:03 ` Lorenzo Stoakes (Oracle) 2026-03-18 15:00 ` SeongJae Park 0 siblings, 2 replies; 8+ messages in thread From: Roman Gushchin @ 2026-03-17 15:31 UTC (permalink / raw) To: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck, Konstantin Ryabitsev, Chris Mason, SeongJae Park, elkin, Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt, Lorenzo Stoakes, Sean Christopherson, Ian Rogers Hello, I'm happy to share something my colleagues and I have been working on for the last several months: Sashiko - an agentic system for Linux kernel changes. First, Sashiko is available as a service at: * https://sashiko.dev It reviews all patches sent to LKML and several other Linux kernel mailing lists using the Gemini 3.1 Pro model. I want to thank my employer, Google, for providing the ML compute resources and infrastructure for making this project real. Sashiko is written in Rust from scratch, mostly using Gemini CLI. It's fully self-contained and does not rely on any CLI coding tools. It supports various LLMs (at this moment mostly tested with Gemini Pro/Flash and slightly with Claude). And finally it's fully open-source: * https://github.com/sashiko-dev/sashiko It's licensed under the Apache-2.0 License, and the ownership of the project was transferred to the Linux Foundation. Contributions are really welcome using DCO. Sashiko is based on a set of open-source prompts initially developed by Chris Mason: * https://github.com/masoncl/review-prompts/ But Sashiko leverages a different multi-stage review protocol, which somewhat mimics the human review process and forces the LLM to look at the proposed change from different angles. In my measurement, Sashiko was able to find 53% of bugs based on a completely unfiltered set of 1,000 recent upstream issues using "Fixes:" tags (using Gemini 3.1 Pro). 
Some might say that 53% is not that impressive, but 100% of these issues were missed by human reviewers. Also, many of these issues (like tricky build failures, performance problems, etc) are very hard/impossible to spot from reviewing the code, so arguably 100% is not reachable. We started with low 30's a couple of months ago; better models and improvements in the review protocol and subsystem prompts pushed it to low 50's. With better LLMs and collective effort on prompts we can push even further. Measuring false positives is much harder, but based on manual reviews of reviews, it's pretty good: it's rarely dead wrong, but sometimes it can nitpick or find too many low-value issues. In many cases, it can be improved with prompt engineering. * What's next? This is our first version and it's obviously not perfect. There is a long list of fixes and improvements to make. Please, don't expect it to be 100% reliable, even though we'll try hard to keep it up and running. Please use github issues or email me any bug reports and feature requests, or send PR's. As of now, Sashiko only provides a web interface; however, Konstantin Ryabitsev is already adding sashiko.dev support to b4, and SeongJae Park is adding support to hkml. That was really fast, thank you! We're working on adding an email interface to Sashiko, and soon Sashiko will be able to send out reviews over email - similar to what the bpf subsystem already has. It will be opt-in by subsystem and will have options to CC only the author of the patch, maintainers, volunteers, or send a fully public reply. If you're a maintainer and have a strong preference to get reviews over email, please let me know. We also desperately need better benchmarks, especially when it comes to false positives. Having a decent vetted set of officially perfect commits can help with this. Finally, some subsystems have a good prompts coverage and some don't. 
It doesn't have to be lengthy documentation (and it might actually be counter-productive), but having a small list of things to look at - some high-level concepts which are hard to grasp from the code, etc. - can help a lot with both bug discovery and false positives. Thanks, Roman ^ permalink raw reply [flat|nested] 8+ messages in thread
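The measurement described in the announcement above (known-buggy commits identified through "Fixes:" tags, then checked against what the review flagged) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Sashiko's actual benchmark code: the function names `fixes_targets` and `detection_rate` are hypothetical, and the regular expression assumes the usual kernel convention of at least 12 hex characters followed by the quoted subject.

```python
import re

# Kernel convention: Fixes: <at least 12 hex chars of the SHA-1> ("<subject>")
# The pattern below is an assumption about well-formed tags only.
FIXES_RE = re.compile(
    r'^Fixes:[ \t]+([0-9a-f]{12,40})[ \t]+\("(.+)"\)[ \t]*$',
    re.MULTILINE,
)


def fixes_targets(commit_message: str) -> list[str]:
    """Return the (abbreviated) hashes of the commits this message says it fixes."""
    return [m.group(1) for m in FIXES_RE.finditer(commit_message)]


def detection_rate(flagged: list[bool]) -> float:
    """Share of known-buggy commits for which the review raised the real bug.

    `flagged[i]` is True when the review of buggy commit i pointed at the
    problem that a later "Fixes:" commit repaired (hypothetical input; in
    practice that judgment is the hard part).
    """
    return sum(flagged) / len(flagged) if flagged else 0.0
```

On a set of 1,000 cases, a result of 0.53 from `detection_rate` would correspond to the 53% figure quoted above. Deciding whether a review really identified the bug still needs careful matching, which this sketch leaves out.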
* Re: Introduce Sashiko (agentic review of Linux kernel changes) 2026-03-17 15:31 Introduce Sashiko (agentic review of Linux kernel changes) Roman Gushchin @ 2026-03-18 12:03 ` Lorenzo Stoakes (Oracle) 2026-03-18 18:33 ` Roman Gushchin 2026-03-18 15:00 ` SeongJae Park 1 sibling, 1 reply; 8+ messages in thread From: Lorenzo Stoakes (Oracle) @ 2026-03-18 12:03 UTC (permalink / raw) To: Roman Gushchin Cc: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck, Konstantin Ryabitsev, Chris Mason, SeongJae Park, elkin, Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt, Lorenzo Stoakes, Sean Christopherson, Ian Rogers On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote: > Hello, > > I'm happy to share something my colleagues and I have been working on > for the last several months: > Sashiko - an agentic system for Linux kernel changes. > > First, Sashiko is available as a service at: > * https://sashiko.dev > > It reviews all patches sent to LKML and several other Linux kernel > mailing lists using the Gemini 3.1 Pro model. > > I want to thank my employer, Google, for providing the ML compute > resources and infrastructure for making this project real. > > Sashiko is written in Rust from scratch, mostly using Gemini CLI. It's > fully self-contained and does not rely on any CLI coding tools. It > supports various LLMs (at this moment mostly tested with Gemini > Pro/Flash and slightly with Claude). > > And finally it's fully open-source: > * https://github.com/sashiko-dev/sashiko Thanks for this! All much appreciated. > > It's licensed under the Apache-2.0 License, and the ownership of the > project was transferred to the Linux Foundation. Contributions are > really welcome using DCO. 
> > Sashiko is based on a set of open-source prompts initially developed by > Chris Mason: > * https://github.com/masoncl/review-prompts/ > > But Sashiko leverages a different multi-stage review protocol, which > somewhat mimics the human review process and forces the LLM to look at > the proposed change from different angles. > > In my measurement, Sashiko was able to find 53% of bugs based > on a completely unfiltered set of 1,000 recent upstream issues using > "Fixes:" tags (using Gemini 3.1 Pro). Some might say that 53% is not > that impressive, but 100% of these issues were missed by human reviewers. > Also, many of these issues (like tricky build failures, performance > problems, etc) are very hard/impossible to spot from reviewing the code, > so arguably 100% is not reachable. We started with low 30's a couple of > months ago; better models and improvements in the review protocol and > subsystem prompts pushed it to low 50's. With better LLMs and collective > effort on prompts we can push even further. > > Measuring false positives is much harder, but based on manual reviews of > reviews, it's pretty good: it's rarely dead wrong, but sometimes it can > nitpick or find too many low-value issues. In many cases, it can be > improved with prompt engineering. So far I've noticed it has got quite a bit wrong, not quite 'dead wrong' but just very confused :) So for me, compared to Chris's prompts running through Claude it's producing a lot more noise, but it's also producing some useful results. So I think it's not quite good enough for integrating into anything email-wise yet, but it's definitely very useful as an additional tool. (For one I'm going to go fix some bugs on my series I saw reported there). I think over time as the approach/model is refined this will get a LOT better, it seems these things can accelerate quickly. > > * What's next? > > This is our first version and it's obviously not perfect. There is a > long list of fixes and improvements to make. 
Please, don't expect it to > be 100% reliable, even though we'll try hard to keep it up and running. > Please use github issues or email me any bug reports and feature > requests, or send PR's. Of course, it's all much appreciated! > > As of now, Sashiko only provides a web interface; > however, Konstantin Ryabitsev is already adding sashiko.dev support to b4, > and SeongJae Park is adding support to hkml. > That was really fast, thank you! Thanks to Konstantin and SJ too but the web interface is pretty nice I must say so thanks for that! :) > > We're working on adding an email interface to Sashiko, and soon Sashiko > will be able to send out reviews over email - similar to what the bpf > subsystem already has. It will be opt-in by subsystem and will have options Like I said, I think it's a bit premature for mm at least _at this point_ but I'm sure it'll get there. For now I think we need to get the false positive rate down a fair bit otherwise it might be a little distracting. But people are _already_ integrating the web interface into workflows, I check it now, and Andrew is already very keen :) see: https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org/ https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@linux-foundation.org/ > to CC only the author of the patch, maintainers, volunteers, or send a > fully public reply. If you're a maintainer and have a strong preference > to get reviews over email, please let me know. Well as maintainer I think 'not quite yet' but probably soon is the answer on that one! > > We also desperately need better benchmarks, especially when it comes to > false positives. Having a decent vetted set of officially perfect > commits can help with this. Not sure perfect commits exist in the kernel certainly not mine :P > > Finally, some subsystems have a good prompts coverage and some don't. 
It > doesn't have to be lengthy documentation (and it might actually be > counter-productive), but having a small list of things to look at - some > high-level concepts which are hard to grasp from the code, etc. - can > help a lot with both bug discovery and false positives. I guess best contributed to Chris's review-prompts repo right? > > Thanks, > Roman Cheers, Lorenzo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Introduce Sashiko (agentic review of Linux kernel changes) 2026-03-18 12:03 ` Lorenzo Stoakes (Oracle) @ 2026-03-18 18:33 ` Roman Gushchin 2026-03-18 18:50 ` Lorenzo Stoakes (Oracle) 2026-03-18 18:50 ` Chris Mason 0 siblings, 2 replies; 8+ messages in thread From: Roman Gushchin @ 2026-03-18 18:33 UTC (permalink / raw) To: Lorenzo Stoakes (Oracle) Cc: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck, Konstantin Ryabitsev, Chris Mason, SeongJae Park, elkin, Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt, Lorenzo Stoakes, Sean Christopherson, Ian Rogers "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes: > On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote: >> Hello, >> >> I'm happy to share something my colleagues and I have been working on >> for the last several months: >> Sashiko - an agentic system for Linux kernel changes. >> >> First, Sashiko is available as a service at: >> * https://sashiko.dev >> > > ... > > (For one I'm going to go fix some bugs on my series I saw reported there). > > I think over time as the approach/model is refined this will get a LOT > better, it seems these things can accelerate quickly. Hi Lorenzo, Thank you for the kind words! RE false positives: I think Chris's prompts were initially heavily biased towards avoiding false positives, but it comes at the cost of missing real issues (in general, I don't have hard data on % of findings). Now he is also looking to relax it a bit, to my knowledge. But then there are different models in use, different protocols, etc. I also have a notion of issue severity and I was thinking about e.g. sending out only reviews revealing critical & high severity bugs (e.g. memory corruptions & panics). Or maybe send the feedback to the author in any case (e.g. 
And obviously no pressure, I won't enable any public email sending unless there is a consensus across maintainers of the corresponding subsystem. >> >> * What's next? >> >> This is our first version and it's obviously not perfect. There is a >> long list of fixes and improvements to make. Please, don't expect it to >> be 100% reliable, even though we'll try hard to keep it up and running. >> Please use github issues or email me any bug reports and feature >> requests, or send PR's. > > Of course, it's all much appreciated! > >> >> As of now, Sashiko only provides a web interface; >> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4, >> and SeongJae Park is adding support to hkml. >> That was really fast, thank you! > > Thanks to Konstantin and SJ too but the web interface is pretty nice I > must say so thanks for that! :) > >> >> We're working on adding an email interface to Sashiko, and soon Sashiko >> will be able to send out reviews over email - similar to what the bpf >> subsystem already has. It will be opt-in by subsystem and will have options > > Like I said, I think it's a bit premature for mm at least _at this point_ > but I'm sure it'll get there. I'd really appreciate (and actually need) feedback from you and other maintainers and developers here. Even though I can't fix every single false positive as a code issue, I can hopefully tackle some common themes. Chris did fantastic work on the bpf subsystem (and several others) by manually analyzing replies to the AI feedback and adjusting prompts. Now we need to repeat this for all other subsystems. > > For now I think we need to get the false positive rate down a fair bit > otherwise it might be a little distracting. 
> > But people are _already_ integrating the web interface into workflows, I > check it now, and Andrew is already very keen :) see: > > https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org/ > https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@linux-foundation.org/ > >> to CC only the author of the patch, maintainers, volunteers, or send a >> fully public reply. If you're a maintainer and have a strong preference >> to get reviews over email, please let me know. > > Well as maintainer I think 'not quite yet' but probably soon is the answer > on that one! > >> >> We also desperately need better benchmarks, especially when it comes to >> false positives. Having a decent vetted set of officially perfect >> commits can help with this. > > Not sure perfect commits exist in the kernel certainly not mine :P Same here :) This is why it's so hard. > >> >> Finally, some subsystems have a good prompts coverage and some don't. It >> doesn't have to be lengthy documentation (and it might actually be >> counter-productive), but having a small list of things to look at - some >> high-level concepts which are hard to grasp from the code, etc. - can >> help a lot with both bug discovery and false positives. > > I guess best contributed to Chris's review-prompts repo right? Both work for me now, we'll figure out with Chris how to sync our prompts. The small problem is that we're using various models, tools and review protocols and can barely test each other's setup. And it's all very fragile, so it's not exactly trivial. But we'll figure out something soon. In general we need to carefully separate instructions (like which tools to use, which prompts to load etc) from factual data. Then we can easily use the factual data with various tooling around. Thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Introduce Sashiko (agentic review of Linux kernel changes) 2026-03-18 18:33 ` Roman Gushchin @ 2026-03-18 18:50 ` Lorenzo Stoakes (Oracle) 2026-03-19 22:33 ` Roman Gushchin 2026-03-18 18:50 ` Chris Mason 1 sibling, 1 reply; 8+ messages in thread From: Lorenzo Stoakes (Oracle) @ 2026-03-18 18:50 UTC (permalink / raw) To: Roman Gushchin Cc: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck, Konstantin Ryabitsev, Chris Mason, SeongJae Park, elkin, Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt, Lorenzo Stoakes, Sean Christopherson, Ian Rogers On Wed, Mar 18, 2026 at 11:33:22AM -0700, Roman Gushchin wrote: > "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes: > > > On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote: > >> Hello, > >> > >> I'm happy to share something my colleagues and I have been working on > >> for the last several months: > >> Sashiko - an agentic system for Linux kernel changes. > >> > >> First, Sashiko is available as a service at: > >> * https://sashiko.dev > >> > > > > ... > > > > (For one I'm going to go fix some bugs on my series I saw reported there). > > > > I think over time as the approach/model is refined this will get a LOT > > better, it seems these things can acelerate quickly. > > Hi Lorenzo, > > Thank you for kind words! No problem, thanks for your hard work! :) > > RE false positives: I think Chris's prompts were initially heavily > biased towards avoiding false positives, but it comes at the cost of > missing real issues (in general, I don't have hard data on % of findings). > Now he also is looking to relax it a bit, to my knowledge. > But then there are different models in use, different protocols, etc. > > I also have a notion of issue severity and I was thinking about > e.g. sending out only reviews revealing critical & high severity bugs > (e.g. memory corruptions & panics). Or maybe send the feedback to the > author in any case (e.g. 
for fixing typos), but cc maintainers only if > there are serious concerns. > > And obviously no pressure, I won't enable any public email sending > unless there is a consensus across maintainers of the corresponding > subsystem. I think maybe an opt-in thing might work for some of us? But yeah we can take our time with this, Andrew is looking, I am for sure. Oh and one data point - https://lore.kernel.org/linux-mm/cover.1773846935.git.ljs@kernel.org/ Read the v3 change log for a list of the issues it correctly raised for that series, so it's definitely useful. It was about maybe 50/50 noise/signal I think? But as you can see that's already very useful thank you and has fixed a bunch of bugs in that code! I'm not sure what Chris is planning, and I keep not going to the AI meetings for various reasons (other stuff clashing/away/tired sometimes :) but I wonder how we will sync up with Chris's review bot experiments? > > >> > >> * What's next? > >> > >> This is our first version and it's obviously not perfect. There is a > >> long list of fixes and improvements to make. Please, don't expect it to > >> be 100% reliable, even though we'll try hard to keep it up and running. > >> Please use github issues or email me any bug reports and feature > >> requests, or send PR's. > > > > Of course, it's all much appreciated! > > > >> > >> As of now, Sashiko only provides a web interface; > >> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4, > >> and SeongJae Park is adding support to hkml. > >> That was really fast, thank you! > > > > Thanks to Konstantin and SJ too but the web interface is pretty nice I > > must say so thanks for that! :) > > > >> > >> We're working on adding an email interface to Sashiko, and soon Sashiko > >> will be able to send out reviews over email - similar to what the bpf > >> subsystem already has. 
It will be opt-in by subsystem and will have options > > > > Like I said, I think it's a bit premature for mm at least _at this point_ > > but I'm sure it'll get there. > > I'd really appreciate (and actually need) yours and other maintainers and > developers feedback here. Even though I can't fix every single false > positive as a code issue, I can hopefully tackle some common themes. Is there a way for us to point out which parts of a review are signal and which are noise? If you could update the web interface for feedback that'd be really handy, though I guess there's the painful stuff of having to have users and etc. for that :) > > Chris did a fantastic work on the bpf subsystem (and several others) by > manually analyzing replies to the AI feedback and adjusting prompts. Now > we need to repeat this for all other subsystems. Yeah, I'm happy to feedback if there's a fairly low friction way of doing it, but constant workload makes it hard if it requires much more effort :) > > > > > For now I think we need to get the false positive rate down a fair bit > > otherwise it might be a little distracitng. > > > > But people are _already_ integrating the web interface into workflows, I > > check it now, and Andrew is already very keen :) see: > > > > https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org/ > > https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@linux-foundation.org/ > > > >> to CC only the author of the patch, maintainers, volunteers, or send a > >> fully public reply. If you're a maintainer and have a strong preference > >> to get reviews over email, please let me know. > > > > Well as maintainer I think 'not quite yet' but probably soon is the answer > > on that one! > > > >> > >> We also desperately need better benchmarks, especially when it comes to > >> false positives. Having a decent vetted set of officially perfect > >> commits can help with this. 
> > > > Not sure perfect commits exist in the kernel certainly not mine :P > > Same here :) This is why it's so hard. Yes, but worthwhile! LLMs are surprisingly good at figuring out issues in things, it's a real strength. And it's already improving the code. > > > > >> > >> Finally, some subsystems have a good prompts coverage and some don't. It > >> doesn't have to be lengthy documentation (and it might actually be > >> counter-productive), but having a small list of things to look at - some > >> high-level concepts which are hard to grasp from the code, etc. - can > >> help a lot with both bug discovery and false positives. > > > > I guess best contributed to Chris's review-prompts repo right? > > Both works for me now, we'll figure out with Chris how to sync our > prompts. The small problem is that we're using various models, tools and > review protocols and barely can test each other's setup. And it's all > very fragile, so it's not exactly trivial. > But we'll figure out something soon. Yeah, part of the fun I guess :) > > In general we need to carefully separate instructions (like which tools > to use, which prompts to load etc) from factual data. Then we can easily > use the factual data with various tooling around. Hopefully I find some time to contribute some mm-specific stuff too :) So far claude + Chris's prompts are working pretty great for me, I do see it hallucinate or get things wrong sometimes but it's generally good. Overall I continue to find the more 'creative' the task the worse it does, the more you can constrain it to a problem domain the better it does. > > Thanks! Cheers, Lorenzo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Introduce Sashiko (agentic review of Linux kernel changes) 2026-03-18 18:50 ` Lorenzo Stoakes (Oracle) @ 2026-03-19 22:33 ` Roman Gushchin 0 siblings, 0 replies; 8+ messages in thread From: Roman Gushchin @ 2026-03-19 22:33 UTC (permalink / raw) To: Lorenzo Stoakes (Oracle) Cc: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck, Konstantin Ryabitsev, Chris Mason, SeongJae Park, elkin, Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt, Lorenzo Stoakes, Sean Christopherson, Ian Rogers "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes: > On Wed, Mar 18, 2026 at 11:33:22AM -0700, Roman Gushchin wrote: >> "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes: >> >> > On Tue, Mar 17, 2026 at 03:31:11PM +0000, Roman Gushchin wrote: >> >> Hello, >> >> >> >> I'm happy to share something my colleagues and I have been working on >> >> for the last several months: >> >> Sashiko - an agentic system for Linux kernel changes. >> >> >> >> First, Sashiko is available as a service at: >> >> * https://sashiko.dev >> >> >> > >> > ... >> > >> > (For one I'm going to go fix some bugs on my series I saw reported there). >> > >> > I think over time as the approach/model is refined this will get a LOT >> > better, it seems these things can acelerate quickly. >> >> Hi Lorenzo, >> >> Thank you for kind words! > > No problem, thanks for your hard work! :) > >> >> RE false positives: I think Chris's prompts were initially heavily >> biased towards avoiding false positives, but it comes at the cost of >> missing real issues (in general, I don't have hard data on % of findings). >> Now he also is looking to relax it a bit, to my knowledge. >> But then there are different models in use, different protocols, etc. >> >> I also have a notion of issue severity and I was thinking about >> e.g. sending out only reviews revealing critical & high severity bugs >> (e.g. memory corruptions & panics). Or maybe send the feedback to the >> author in any case (e.g. 
for fixing typos), but cc maintainers only if >> there are serious concerns. >> >> And obviously no pressure, I won't enable any public email sending >> unless there is a consensus across maintainers of the corresponding >> subsystem. > > I think maybe an opt-in thing might work for some of us? Absolutely, I think with mm we can start with replying to the author and a dedicated list of volunteers. > But yeah we can take our time with this, Andrew is looking, I am for > sure. Thank you! > > Oh and one data point - > https://lore.kernel.org/linux-mm/cover.1773846935.git.ljs@kernel.org/ > > Read the v3 change log for a list of the issues it correctly raised for that > series, so it's definitely useful. > > It was about maybe 50/50 noise/signal I think? > > But as you can see that's already very useful thank you and has fixed a > bunch of bugs in that code! > > I'm not sure what Chris is planning, and I keep not going to the AI > meetings for various reasons (other stuff clashing/away/tired sometimes :) > but I wonder how we will sync up with Chris's review bot experiments? So as Chris said, we're syncing regularly and actively thinking about how to organize it. I think we both want to share as much stuff as possible. The hard part is that we can't easily test each other's setup and it's all very brittle. Initially I tried to use Chris's prompts directly with only minimal changes, but it was hard to keep Sashiko stable. Plus the new multi-stage protocol improved the discovery rate by almost 10%, which was hard to ignore. My current thinking (and things are evolving quickly, so I might have a different opinion in a couple of weeks) is that we need to separate per-subsystem knowledge, make sure it doesn't contain any imperative instructions or LLM/tool specifics, and share it completely. We can move it to a separate repo or even put it into the kernel tree, it's all debatable. In a way, these prompts should be owned by subsystem maintainers more than anyone else. 
Then there are things which can be shared, but are not subsystem-specific. E.g. an instruction on how to assess issue severity. And then there is a specific review protocol, which significantly depends on the tooling and LLM being used. This part is hard to share, but also it's the place where a lot of experimentation is happening, so maybe it's fine to have multiple tools. And they might be optimized for different use cases: e.g. for personal development it might be beneficial to have a live interaction with llm on the review material (someone already asked me about this); but for sashiko.dev's mass review case I do care a lot about the stability and token efficiency. >> >> >> >> * What's next? >> >> >> >> This is our first version and it's obviously not perfect. There is a >> >> long list of fixes and improvements to make. Please, don't expect it to >> >> be 100% reliable, even though we'll try hard to keep it up and running. >> >> Please use github issues or email me any bug reports and feature >> >> requests, or send PR's. >> > >> > Of course, it's all much appreicated! >> > >> >> >> >> As of now, Sashiko only provides a web interface; >> >> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4, >> >> and SeongJae Park is adding support to hkml. >> >> That was really fast, thank you! >> > >> > Thanks to Konstantantin and SJ too but the web interface is pretty nice I >> > must say so thanks for that! :) >> > >> >> >> >> We're working on adding an email interface to Sashiko, and soon Sashiko >> >> will be able to send out reviews over email - similar to what the bpf >> >> subsystem already has. It will be opt-in by subsystem and will have options >> > >> > Like I said, I think it's a bit premature for mm at least _at this point_ >> > but I'm sure it'll get there. >> >> I'd really appreciate (and actually need) yours and other maintainers and >> developers feedback here. 
Even though I can't fix every single false >> positive as a code issue, I can hopefully tackle some common themes. > > Is there a way for us to point out which parts of a review are signal and > which are noise? Not yet. I think answering emails is the easiest part and I plan to teach Sashiko to recognize these answers and analyze them. Maybe Sashiko can even adjust its own prompts in a (semi)-automatic way, I don't know. > > If you could update the web interface for feedback that'd be really handy, > though I guess there's the painful stuff of having to have users and > etc. for that :) Yeah, I'm afraid we might end up trying to build a new JIRA this way... > >> >> Chris did fantastic work on the bpf subsystem (and several others) by >> manually analyzing replies to the AI feedback and adjusting prompts. Now >> we need to repeat this for all other subsystems. > > Yeah, I'm happy to feedback if there's a fairly low friction way of doing > it, but constant workload makes it hard if it requires much more > effort :) Can't agree more :) > >> >> > >> > For now I think we need to get the false positive rate down a fair bit >> > otherwise it might be a little distracting. >> > >> > But people are _already_ integrating the web interface into workflows, I >> > check it now, and Andrew is already very keen :) see: >> > >> > https://lore.kernel.org/all/20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org/ >> > https://lore.kernel.org/all/20260317113730.45d5cef4ba84be4df631677f@linux-foundation.org/ >> > >> >> to CC only the author of the patch, maintainers, volunteers, or send a >> >> fully public reply. If you're a maintainer and have a strong preference >> >> to get reviews over email, please let me know. >> > >> > Well as maintainer I think 'not quite yet' but probably soon is the answer >> > on that one! >> > >> >> >> >> We also desperately need better benchmarks, especially when it comes to >> >> false positives. 
Having a decent vetted set of officially perfect >> >> commits can help with this. >> > >> > Not sure perfect commits exist in the kernel certainly not mine :P >> >> Same here :) This is why it's so hard. > > Yes, but worthwhile! LLMs are surprisingly good at figuring out issues in > things, it's a real strength. > > And it's already improving the code. > >> >> > >> >> >> >> Finally, some subsystems have a good prompts coverage and some don't. It >> >> doesn't have to be lengthy documentation (and it might actually be >> >> counter-productive), but having a small list of things to look at - some >> >> high-level concepts which are hard to grasp from the code, etc. - can >> >> help a lot with both bug discovery and false positives. >> > >> > I guess best contributed to Chris's review-prompts repo right? >> >> Both works for me now, we'll figure out with Chris how to sync our >> prompts. The small problem is that we're using various models, tools and >> review protocols and barely can test each other's setup. And it's all >> very fragile, so it's not exactly trivial. >> But we'll figure out something soon. > > Yeah, part of the fun I guess :) > >> >> In general we need to carefully separate instructions (like which tools >> to use, which prompts to load etc) from factual data. Then we can easily >> use the factual data with various tooling around. > > Hopefully I find some time to contribute some mm-specific stuff too :) Awesome, waiting for it! Thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
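The routing ideas floated in this exchange (nothing is sent unless the subsystem has opted in, the author gets feedback in any case, and maintainers are cc'd only for serious findings) might look something like the sketch below. Everything here is an assumption made for illustration: the `Severity` levels, the `Finding` type, and the `recipients` function are hypothetical names and do not describe how Sashiko is or will be implemented.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale; the thread only names "critical" and "high"
    # (e.g. memory corruptions and panics) as serious.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Finding:
    summary: str
    severity: Severity


def recipients(findings, author, maintainers, subsystem_opted_in):
    """Pick who receives a review email under the policy sketched above."""
    # No mail at all without subsystem consensus, or when there is nothing to say.
    if not subsystem_opted_in or not findings:
        return []
    # The author always gets the feedback, even for minor issues such as typos.
    to = [author]
    # Maintainers are added only when at least one finding is serious.
    if any(f.severity >= Severity.HIGH for f in findings):
        to += [m for m in maintainers if m != author]
    return to
```

A volunteer list, as suggested for mm, could be handled the same way by passing the volunteers in place of (or alongside) the maintainers.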
* Re: Introduce Sashiko (agentic review of Linux kernel changes)
  2026-03-18 18:33   ` Roman Gushchin
  2026-03-18 18:50     ` Lorenzo Stoakes (Oracle)
@ 2026-03-18 18:50     ` Chris Mason
  1 sibling, 0 replies; 8+ messages in thread
From: Chris Mason @ 2026-03-18 18:50 UTC (permalink / raw)
  To: Roman Gushchin, Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck,
	Konstantin Ryabitsev, SeongJae Park, elkin, Christian Brauner,
	Dmitry Vyukov, Sasha Levin, Shakeel Butt, Lorenzo Stoakes,
	Sean Christopherson, Ian Rogers

On 3/18/26 2:33 PM, Roman Gushchin wrote:
> "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> writes:
>>>
>>> Finally, some subsystems have a good prompts coverage and some don't. It
>>> doesn't have to be lengthy documentation (and it might actually be
>>> counter-productive), but having a small list of things to look at - some
>>> high-level concepts which are hard to grasp from the code, etc. - can
>>> help a lot with both bug discovery and false positives.
>>
>> I guess best contributed to Chris's review-prompts repo right?
>
> Both works for me now, we'll figure out with Chris how to sync our
> prompts. The small problem is that we're using various models, tools and
> review protocols and barely can test each other's setup. And it's all
> very fragile, so it's not exactly trivial.
> But we'll figure out something soon.
>
> In general we need to carefully separate instructions (like which tools
> to use, which prompts to load etc) from factual data. Then we can easily
> use the factual data with various tooling around.

I'm really excited to see Roman's work go live, and we've been talking
about different ways to collaborate for a while. I don't really have
answers today other than just trying to iterate and do what works, but I
wanted to reply that I'm fully supportive.

-chris

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Introduce Sashiko (agentic review of Linux kernel changes)
  2026-03-17 15:31 Introduce Sashiko (agentic review of Linux kernel changes) Roman Gushchin
  2026-03-18 12:03 ` Lorenzo Stoakes (Oracle)
@ 2026-03-18 15:00 ` SeongJae Park
  2026-03-18 18:43   ` Roman Gushchin
  1 sibling, 1 reply; 8+ messages in thread
From: SeongJae Park @ 2026-03-18 15:00 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: SeongJae Park, linux-kernel, Andrew Morton, Theodore Ts'o,
	Guenter Roeck, Konstantin Ryabitsev, Chris Mason, elkin,
	Christian Brauner, Dmitry Vyukov, Sasha Levin, Shakeel Butt,
	Lorenzo Stoakes, Sean Christopherson, Ian Rogers, damon

Hello Roman,

On Tue, 17 Mar 2026 15:31:11 +0000 Roman Gushchin <roman.gushchin@linux.dev> wrote:

> Hello,
>
> I'm happy to share something my colleagues and I have been working on
> for the last several months:
> Sashiko - an agentic system for Linux kernel changes.
>
> First, Sashiko is available as a service at:
> * https://sashiko.dev

Great work. Thank you!

There are many similar tools but this is the first free web service I know of.
I'm still feeling uncomfortable or not prepared for running some AI tools on my
own. Therefore I was only waiting for some nice people sharing their AI review
results (some people including Chris Mason did, and it was really helpful,
thanks again), or the arrival of this kind of public and just working service.
This feels like the ChatGPT moment to me.

>
> It reviews all patches sent to LKML and several other Linux kernel
> mailing lists using the Gemini 3.1 Pro model.
>
> I want to thank my employer, Google, for providing the ML compute
> resources and infrastructure for making this project real.
>
> Sashiko is written in Rust from scratch, mostly using Gemini CLI. It's
> fully self-contained and does not rely on any CLI coding tools. It
> supports various LLMs (at this moment mostly tested with Gemini
> Pro/Flash and slightly with Claude).
>
> And finally it's fully open-source:
> * https://github.com/sashiko-dev/sashiko

Awesome. I'm still feeling uncomfortable or not prepared to run some AI
tools on my own. But I will try to find ways to contribute.

>
> It's licensed under the Apache-2.0 License, and the ownership of the
> project was transferred to the Linux Foundation. Contributions are
> really welcome using DCO.
>
> Sashiko is based on a set of open-source prompts initially developed by
> Chris Mason:
> * https://github.com/masoncl/review-prompts/

Kudos to Chris!

>
> But Sashiko leverages a different multi-stage review protocol, which
> somewhat mimics the human review process and forces the LLM to look at
> the proposed change from different angles.
>
> In my measurement, Sashiko was able to find 53% of bugs based
> on a completely unfiltered set of 1,000 recent upstream issues using
> "Fixes:" tags (using Gemini 3.1 Pro). Some might say that 53% is not
> that impressive, but 100% of these issues were missed by human reviewers.
> Also, many of these issues (like tricky build failures, performance
> problems, etc) are very hard/impossible to spot from reviewing the code,
> so arguably 100% is not reachable. We started with low 30's a couple of
> months ago; better models and improvements in the review protocol and
> subsystem prompts pushed it to low 50's. With better LLMs and collective
> effort on prompts we can push even further.
>
> Measuring false positives is much harder, but based on manual reviews of
> reviews, it's pretty good: it's rarely dead wrong, but sometimes it can
> nitpick or find too many low-value issues. In many cases, it can be
> improved with prompt engineering.
>
> * What's next?
>
> This is our first version and it's obviously not perfect. There is a
> long list of fixes and improvements to make. Please, don't expect it to
> be 100% reliable, even though we'll try hard to keep it up and running.
> Please use github issues or email me any bug reports and feature
> requests, or send PR's.
>
> As of now, Sashiko only provides a web interface;
> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4,
> and SeongJae Park is adding support to hkml.
> That was really fast, thank you!

hkml support was available owing to Sashiko providing the decent API, and b4's
use of it is open source. Kudos to Sashiko team and Konstantin. I'm planning
to make more integration into hkml, for my workflow and based on other hkml
user feedback.

>
> We're working on adding an email interface to Sashiko, and soon Sashiko
> will be able to send out reviews over email - similar to what the bpf
> subsystem already has. It will be opt-in by subsystem and will have options
> to CC only the author of the patch, maintainers, volunteers, or send a
> fully public reply. If you're a maintainer and have a strong preference
> to get reviews over email, please let me know.

I, as the maintainer of DAMON subsystem (damon@lists.linux.dev), do have a
strong preference to get reviews over email for all patches that are sent to
the mailing list. I'm already manually doing that. I'm planning to extend hkml
to make this easier. It would be nice and efficient if Sashiko can do this
on its own.

>
> We also desperately need better benchmarks, especially when it comes to
> false positives. Having a decent vetted set of officially perfect
> commits can help with this.

I'm also curious if there is a public channel for giving feedback about the
reviews. As you mentioned above, Sashiko sometimes says something that is not
technically correct. I'm wondering if there is a way to let Sashiko know such
things for improvement.

>
> Finally, some subsystems have a good prompts coverage and some don't.
It
> doesn't have to be lengthy documentation (and it might actually be
> counter-productive), but having a small list of things to look at - some
> high-level concepts which are hard to grasp from the code, etc. - can
> help a lot with both bug discovery and false positives.

I found there is no prompt for DAMON. I'm still convinced by Sashiko's
current review, and have no idea for DAMON-custom prompts. So that's fine for
now. I will consider adding something if I get some idea, though.

Again, thanks for making this. Please keep improving this and keeping it
available. I will also try to find ways to help.


Thanks,
SJ

[...]

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Introduce Sashiko (agentic review of Linux kernel changes)
  2026-03-18 15:00 ` SeongJae Park
@ 2026-03-18 18:43   ` Roman Gushchin
  0 siblings, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2026-03-18 18:43 UTC (permalink / raw)
  To: SeongJae Park
  Cc: linux-kernel, Andrew Morton, Theodore Ts'o, Guenter Roeck,
	Konstantin Ryabitsev, Chris Mason, elkin, Christian Brauner,
	Dmitry Vyukov, Sasha Levin, Shakeel Butt, Lorenzo Stoakes,
	Sean Christopherson, Ian Rogers, damon

SeongJae Park <sj@kernel.org> writes:

> Hello Roman,
>
> On Tue, 17 Mar 2026 15:31:11 +0000 Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
>> Hello,
>>
>> I'm happy to share something my colleagues and I have been working on
>> for the last several months:
>> Sashiko - an agentic system for Linux kernel changes.
>>
>> First, Sashiko is available as a service at:
>> * https://sashiko.dev
>
> Great work. Thank you!
>
> There are many similar tools but this is the first free web service I know of.
> I'm still feeling uncomfortable or not prepared for running some AI tools on my
> own. Therefore I was only waiting for some nice people sharing their AI review
> results (some people including Chris Mason did, and it was really helpful,
> thanks again), or the arrival of this kind of public and just working service.
> This feels like the ChatGPT moment to me.

Thank you!

>>
>> It reviews all patches sent to LKML and several other Linux kernel
>> mailing lists using the Gemini 3.1 Pro model.
>>
>> I want to thank my employer, Google, for providing the ML compute
>> resources and infrastructure for making this project real.
>>
>> Sashiko is written in Rust from scratch, mostly using Gemini CLI. It's
>> fully self-contained and does not rely on any CLI coding tools. It
>> supports various LLMs (at this moment mostly tested with Gemini
>> Pro/Flash and slightly with Claude).
>>
>> And finally it's fully open-source:
>> * https://github.com/sashiko-dev/sashiko
>
> Awesome.
I'm still feeling uncomfortable or not prepared to run some AI
> tools on my own. But I will try to find ways to contribute.
>
>>
>> It's licensed under the Apache-2.0 License, and the ownership of the
>> project was transferred to the Linux Foundation. Contributions are
>> really welcome using DCO.
>>
>> Sashiko is based on a set of open-source prompts initially developed by
>> Chris Mason:
>> * https://github.com/masoncl/review-prompts/
>
> Kudos to Chris!
>
>>
>> But Sashiko leverages a different multi-stage review protocol, which
>> somewhat mimics the human review process and forces the LLM to look at
>> the proposed change from different angles.
>>
>> In my measurement, Sashiko was able to find 53% of bugs based
>> on a completely unfiltered set of 1,000 recent upstream issues using
>> "Fixes:" tags (using Gemini 3.1 Pro). Some might say that 53% is not
>> that impressive, but 100% of these issues were missed by human reviewers.
>> Also, many of these issues (like tricky build failures, performance
>> problems, etc) are very hard/impossible to spot from reviewing the code,
>> so arguably 100% is not reachable. We started with low 30's a couple of
>> months ago; better models and improvements in the review protocol and
>> subsystem prompts pushed it to low 50's. With better LLMs and collective
>> effort on prompts we can push even further.
>>
>> Measuring false positives is much harder, but based on manual reviews of
>> reviews, it's pretty good: it's rarely dead wrong, but sometimes it can
>> nitpick or find too many low-value issues. In many cases, it can be
>> improved with prompt engineering.
>>
>> * What's next?
>>
>> This is our first version and it's obviously not perfect. There is a
>> long list of fixes and improvements to make. Please, don't expect it to
>> be 100% reliable, even though we'll try hard to keep it up and running.
>> Please use github issues or email me any bug reports and feature
>> requests, or send PR's.
>>
>> As of now, Sashiko only provides a web interface;
>> however, Konstantin Ryabitsev is already adding sashiko.dev support to b4,
>> and SeongJae Park is adding support to hkml.
>> That was really fast, thank you!
>
> hkml support was available owing to Sashiko providing the decent API, and b4's
> use of it is open source. Kudos to Sashiko team and Konstantin. I'm planning
> to make more integration into hkml, for my workflow and based on other hkml
> user feedback.

Thank you for doing this!

>>
>> We're working on adding an email interface to Sashiko, and soon Sashiko
>> will be able to send out reviews over email - similar to what the bpf
>> subsystem already has. It will be opt-in by subsystem and will have options
>> to CC only the author of the patch, maintainers, volunteers, or send a
>> fully public reply. If you're a maintainer and have a strong preference
>> to get reviews over email, please let me know.
>
> I, as the maintainer of DAMON subsystem (damon@lists.linux.dev), do have a
> strong preference to get reviews over email for all patches that are sent to
> the mailing list. I'm already manually doing that. I'm planning to extend hkml
> to make this easier. It would be nice and efficient if Sashiko can do this
> on its own.

Noted. I'll enable it as soon as we have it.

>
>>
>> We also desperately need better benchmarks, especially when it comes to
>> false positives. Having a decent vetted set of officially perfect
>> commits can help with this.
>
> I'm also curious if there is a public channel for giving feedback about the
> reviews. As you mentioned above, Sashiko sometimes says something that is not
> technically correct. I'm wondering if there is a way to let Sashiko know such
> things for improvement.

As of now, I suggest using GitHub issues. Later on, you could simply reply
to Sashiko's emails. But also realistically I likely won't be able to look
into every single false positive, so I'd really appreciate some initial
analysis: e.g.
if there is a common pattern or a number of similar reviews with the same
problem.

>>
>> Finally, some subsystems have a good prompts coverage and some don't. It
>> doesn't have to be lengthy documentation (and it might actually be
>> counter-productive), but having a small list of things to look at - some
>> high-level concepts which are hard to grasp from the code, etc. - can
>> help a lot with both bug discovery and false positives.
>
> I found there is no prompt for DAMON. I'm still convinced by Sashiko's
> current review, and have no idea for DAMON-custom prompts. So that's fine for
> now. I will consider adding something if I get some idea, though.

My suggestion is to read through a number of DAMON-specific reviews and
see if there are any common patterns of false positives or missed errors.
Once you have a feeling like "Damn, Sashiko doesn't really understand X
about DAMON!" then you put it into the prompt.

Thanks!

^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-03-19 22:33 UTC | newest]

Thread overview: 8+ messages
2026-03-17 15:31 Introduce Sashiko (agentic review of Linux kernel changes) Roman Gushchin
2026-03-18 12:03 ` Lorenzo Stoakes (Oracle)
2026-03-18 18:33   ` Roman Gushchin
2026-03-18 18:50     ` Lorenzo Stoakes (Oracle)
2026-03-19 22:33       ` Roman Gushchin
2026-03-18 18:50     ` Chris Mason
2026-03-18 15:00 ` SeongJae Park
2026-03-18 18:43   ` Roman Gushchin