Sashiko review emails to the list


* Sashiko review emails to the list
@ 2026-06-19 14:19 Fuad Tabba
  2026-06-19 16:45 ` Oliver Upton
  0 siblings, 1 reply; 3+ messages in thread
From: Fuad Tabba @ 2026-06-19 14:19 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: Roman Gushchin, Will Deacon, Vincent Donnefort, KVMARM

Hi folks,

I really like Sashiko and find it very useful. It's flagged real bugs
in series I and others have posted to the list (e.g. [1][2][3]), and I
run it locally before sending, which has saved me a few respins.

That said, it's been posting a lot lately, so it seemed worth asking
how the review emails to the list are working out, and whether we
should change anything.

A couple of things stand out:

  1. The noise. Some is genuine false positives, where the model
confabulates ARM details (bit positions, ISS field layouts, and the
like) and flags correct code [4]. The rest is repetition: the same
finding posted against several patches in one series, often a real
issue not introduced by the series.

  2. The emails themselves. Mailing review to the list automatically
can push contributors to respin many times chasing bot comments before
a human's looked, and pull reviewer attention onto findings that may
be misguided to begin with.

On the noise (1):

  - From the logs it looks like Roman's been working on the
false-positive detection on GitHub [5], and the rate's dropped
noticeably in my local testing. I'm not sure the list version has
those changes yet (I get different results locally, fewer FPs), and he
told me yesterday (offlist) he's working on the repetition too.

  - I've been working on the arm64 prompts to stop the confabulation:
the model doesn't have the ARM ARM, so it hallucinates encodings and
asserts them as bugs. I submitted a PR to review-prompt [6], which
should cover most of these but not all. Longer term I have a local
prototype that gives it the actual spec text, but there are copyright
questions around shipping it. I've offered to share it with Roman,
minus the copyrighted material, to see whether it's something we could
use and extend, including to other architectures.

These only address (1). The second concern stands regardless of
accuracy, since it's about respinning before a human's looked, not
whether the comments are right.

These fixes will take a while to land, so the question is what to do
meanwhile. Some options, and surely others:

  - Leave it as-is while the fixes propagate.
  - Stop the emails but keep the reviews on sashiko.dev, so people can
look rather than have them pushed.
  - Disable the emails until the noise is down to a reasonable level,
then re-enable.
  - Raise the confidence bar so it only emails high-confidence findings.
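
A minimal sketch of that last option, assuming each finding carries a
numeric confidence score. The `Finding` fields, the `findings_to_email`
helper and the 0.9 default are hypothetical illustrations, not Sashiko's
actual data model or configuration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one review finding (not Sashiko's real schema)."""
    patch: int         # index of the patch within the series
    summary: str       # one-line description of the issue
    confidence: float  # assumed score in [0.0, 1.0]


def findings_to_email(findings, threshold=0.9):
    """Keep only findings at or above the threshold.

    Anything below the bar is simply not mailed; it could still be
    shown on the web UI for people who go and look.
    """
    return [f for f in findings if f.confidence >= threshold]
```

Where to set the threshold is the real question: too high and genuine
bugs are never mailed, too low and nothing changes.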

This seemed worth discussing on the usual lists rather than on GitHub,
which has less visibility. What do you think?

Cheers,
/fuad

[1] https://lore.kernel.org/all/CA+EHjTxLVo=GwderoFxqsOEFXV+DrD17nQCkPbnKZPA6mRNxhg@mail.gmail.com/
[2] https://lore.kernel.org/all/20260612113414.1022901-1-tabba@google.com/
[3] https://lore.kernel.org/all/20260615131116.390977-1-tabba@google.com/
[4] https://sashiko.dev/#/patchset/20260619070719.812227-1-tabba@google.com?part=7
[5] https://github.com/sashiko-dev/
[6] https://github.com/masoncl/review-prompts/pull/81


* Re: Sashiko review emails to the list
  2026-06-19 14:19 Sashiko review emails to the list Fuad Tabba
@ 2026-06-19 16:45 ` Oliver Upton
  2026-06-19 18:05   ` Roman Gushchin
  0 siblings, 1 reply; 3+ messages in thread
From: Oliver Upton @ 2026-06-19 16:45 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Marc Zyngier, Roman Gushchin, Will Deacon, Vincent Donnefort,
	KVMARM

Hey,

On Fri, Jun 19, 2026 at 03:19:05PM +0100, Fuad Tabba wrote:
> Hi folks,
> 
> I really like Sashiko and find it very useful. It's flagged real bugs
> in series I and others have posted to the list (e.g. [1][2][3]), and I
> run it locally before sending, which has saved me a few respins.
> 
> That said, it's been posting a lot lately, so it seemed worth asking
> how the review emails to the list are working out, and whether we
> should change anything.

So this is entirely my fault since I added the email configuration for
the kvmarm list. Sashiko has been finding some truly nasty bugs, and
posting on-list is the easiest way to get attention from the right folks
to get things fixed.

With that being said, the signal to noise ratio hasn't been ideal.

> These fixes will take a while to land, so the question is what to do
> meanwhile. Some options, and surely others:
> 
>   - Leave it as-is while the fixes propagate.
>   - Stop the emails but keep the reviews on sashiko.dev, so people can
> look rather than have them pushed.
>   - Disable the emails until the noise is down to a reasonable level,
> then re-enable.

This sounds like the right approach. I'd like to re-enable emails once
we're happy with the quality of reviews.

On another note: Roman, is it possible to separately report pre-existing
issues from findings in a patch? Maintainers have a higher likelihood of
caring about these than individual contributors anyway.

Thanks,
Oliver


* Re: Sashiko review emails to the list
  2026-06-19 16:45 ` Oliver Upton
@ 2026-06-19 18:05   ` Roman Gushchin
  0 siblings, 0 replies; 3+ messages in thread
From: Roman Gushchin @ 2026-06-19 18:05 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Fuad Tabba, Marc Zyngier, Will Deacon, Vincent Donnefort, KVMARM

> On Jun 19, 2026, at 9:45 AM, Oliver Upton <oupton@kernel.org> wrote:
> 
> Hey,
> 
>> On Fri, Jun 19, 2026 at 03:19:05PM +0100, Fuad Tabba wrote:
>> Hi folks,
>> 
>> I really like Sashiko and find it very useful. It's flagged real bugs
>> in series I and others have posted to the list (e.g. [1][2][3]), and I
>> run it locally before sending, which has saved me a few respins.
>> 
>> That said, it's been posting a lot lately, so it seemed worth asking
>> how the review emails to the list are working out, and whether we
>> should change anything.
> 
> So this is entirely my fault since I added the email configuration for
> the kvmarm list. Sashiko has been finding some truly nasty bugs, posting
> on-list is the easiest way to get attention from the right folks to get
> things fixed.
> 
> With that being said, the signal to noise ratio hasn't been ideal.
> 
>> These fixes will take a while to land, so the question is what to do
>> meanwhile. Some options, and surely others:
>> 
>>  - Leave it as-is while the fixes propagate.
>>  - Stop the emails but keep the reviews on sashiko.dev, so people can
>> look rather than have them pushed.
>>  - Disable the emails until the noise is down to a reasonable level,
>> then re-enable.
> 
> This sounds like the right approach. I'd like to re-enable emails once
> we're happy with the quality of reviews.

From my perspective there are 3 main factors affecting the quality of reviews:
1) LLM capabilities. We have little control here, but it’s reasonable to expect that things will get better.
2) Sashiko’s common code/harness. We’re improving it, but I’m not sure we have a lot of room left here, probably some. Also, there are many tradeoffs to make: e.g. if we start verifying each issue separately, it will almost certainly improve signal/noise, but it will require way more tokens and be slower.
3) Per-subsystem prompts. There is a lot of potential here, but this is where we mostly rely on maintainers and developers.

Also, I’m not trying to push back on the decision, but I think it’s worth asking: what is a reasonable level?
I tried to measure the true positive rate several times based on human feedback, and it was always at least ~80% (and usually more for critical/high severity bugs).
And I’m afraid that it won’t be ~100% without compromising on the ability to find bugs. The reality is that a significant percentage of issues are not exactly black and white, and even people don’t necessarily agree on whether something is an issue. There is also a non-trivial number of cases where AI findings are incorrectly dismissed by humans.

I’m not trying to pretend it’s perfect (it’s not), and I certainly expect it to get better going forward (and I’m working on it),
but the point is that we might not see _dramatic_ improvements simply because there is no room left for them;
it’s more like a long tail of grey-zone issues.

> On another note: Roman, is it possible to separately report pre-existing
> issues from findings in a patch? Maintainers have a higher likelihood of
> caring about these than individual contributors anyway.

It’s certainly possible, but how exactly do you see it?

It’s already somewhat separated (separate counters, separate list at the top of each email).
I don’t want to stop reporting them completely (but we can discuss it), because it produces
a constant stream of fixes in the upstream kernel (hundreds already).

I plan to improve it so that, e.g., the same issue isn’t reported multiple times within the same patchset.
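
A rough sketch of what that could look like, combined with the
pre-existing/introduced split discussed above. Everything here is a
hypothetical illustration rather than Sashiko's implementation: the
`Finding` fields, the `split_and_dedupe` helper, and in particular the
use of the summary string as the duplicate key are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one review finding (not Sashiko's real schema)."""
    patch: int         # index of the patch within the series
    summary: str       # one-line description, used here as the duplicate key
    preexisting: bool  # True if the issue was not introduced by the series


def split_and_dedupe(findings):
    """Report each issue once per series, in two separate lists.

    Returns (introduced, preexisting). When the same summary shows up
    against several patches, only the earliest patch's copy is kept.
    """
    seen = set()
    introduced, preexisting = [], []
    for f in sorted(findings, key=lambda f: f.patch):
        if f.summary in seen:
            continue  # already reported against an earlier patch
        seen.add(f.summary)
        (preexisting if f.preexisting else introduced).append(f)
    return introduced, preexisting
```

With the two lists kept apart, the pre-existing one could go to
maintainers while contributors only see what their series introduced.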

Thanks

