Devicetree
* Re: Stop false review statements
@ 2026-05-17 19:53 Roman Gushchin
  0 siblings, 0 replies; 46+ messages in thread
From: Roman Gushchin @ 2026-05-17 19:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Greg KH, Krzysztof Kozlowski, debarbos, Arnaldo Carvalho de Melo,
	Konstantin Ryabitsev, Guenter Roeck, sashiko-bot, sashiko-reviews,
	sashiko, Linux Kernel Workflows, Linux Kernel Mailing List,
	devicetree, kfree


> On May 17, 2026, at 11:56 AM, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> On Sun, 17 May 2026 11:17:06 -0700
> Roman Gushchin <roman.gushchin@linux.dev> wrote:
> 
>>> On May 17, 2026, at 9:40 AM, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>>> On Sun, 17 May 2026 12:12:00 +0200
>>> Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>> On Sun, May 17, 2026 at 12:05:56PM +0200, Mauro Carvalho Chehab wrote:
>>>>> On Sat, 16 May 2026 14:59:44 -0700
>>>>> Roman Gushchin <roman.gushchin@linux.dev> wrote:
>>>>>>> On May 16, 2026, at 2:33 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>>>>>>> I find it opposite: clogging commits with useless information, because
>>>>>>> some arbitrary and completely closed-source tool did analysis means
>>>>>>> nothing to me one year later when I look at the commit in the Git history.      
>>>>>> This is simply not true: Sashiko is fully open-source, under the Apache 2.0 license,
>>>>>> and the code belongs to the LF.
>>>>>> Yes, the instance behind sashiko.dev is using the
>>>>>> Gemini 3.1 Pro LLM, which is not open-source, but that’s not a fundamental limitation:
>>>>>> Sashiko supports various LLMs, including open models. It’s just a practical
>>>>>> choice: to my knowledge the quality of open models is not on par with frontier closed
>>>>>> models
>>>>> I would very much prefer using an open source LLM, even if not on par
>>>>> with the latest paid models.
>>>>>> and it would require a non-trivial amount of hardware and infrastructure to run
>>>>>> an open model at the required scale.    
>>>>> IMHO the best would be to have them running on some infra that would accept
>>>>> open source models (*). If there aren't enough resources to have our own
>>>>> infra, there are offers out there which allow running open source models,
>>>>> like https://ollama.com/pricing (I have never used it myself).
>>>>> (*) For instance, Qwen3.6 is brand new and licensed under apache-2.0.
>>>>>   Not bad on my tests running it locally.    
>>>> You can run the tool locally, with whatever model you want, if you want
>>>> to.
>>>> But for now, let's just take the free credits that Google is willing to
>>>> throw at this thing and let it give us reviews IF the maintainer of the
>>>> subsystem feels it is something they want to do.  No one is forcing
>>>> maintainers to do this.  
>>> If Google and/or others are willing to give free credits on their cloud,
>>> they could instead or in addition give free credits to run ollama
>>> there, allowing us to use different models.
>>> From my side, while I won't personally object to getting reviews from
>>> Sashiko/Gemini, this is something I can't reproduce locally. I would
>>> very much want something where I can select my preferred LLM
>>> and run it in my ollama docker container on my own GPU, in a way that
>>> I could run it locally before even sending a patch series.
>> 
>> 2 thoughts here:
>> 1) I actually tried to run it with ollama on my personal Framework 13. Adding nominal support is trivial,
>> but the whole thing is not really useful: I can get maybe a few hundred tokens per second using
>> a quantized model with reduced quality; an average sashiko review consumes 3.5 million tokens
>> (with Gemini 3.1 Pro; it’s also model-dependent).
> 
> Do you mean 3.5 million tokens per patch series? If so, that
> sounds like a lot! Why does it require so many tokens?

It’s an average per patch, not per series. Some are much cheaper, some are much more expensive.
Sashiko posts the token cost next to each review.

Why does it use so many tokens? Because in many cases it has to dig deep into the code.
Long sessions with multiple tool calls are expensive. Sashiko also has a multi-stage
architecture: effectively, it reviews every patch multiple times from different angles.
This has a measurable influence on the quality of reviews. The current generation of LLMs
is not good at spotting various types of issues at once: once a model sees a memory leak,
it can’t think anymore about e.g. locking issues. Also, just by running the same thing multiple times
and combining the results you can meaningfully improve the quality.
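
As a rough illustration of the idea (several narrowly focused passes, each
repeated, with the findings combined), here is a minimal Python sketch. It is
not Sashiko's actual code: the focus areas, the `multi_pass_review` helper and
the `fake_reviewer` stub standing in for an LLM call are all hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical stand-in for an LLM call: (patch, focus) -> findings.
Reviewer = Callable[[str, str], Iterable[str]]

# Illustrative focus areas only; a real harness would define its own stages.
FOCUS_AREAS = ("memory leaks", "locking", "error handling")


def multi_pass_review(patch: str, reviewer: Reviewer, runs: int = 3) -> Counter:
    """Run one focused pass per area, `runs` times each, and count how
    many runs reported each (focus, finding) pair."""
    findings: Counter = Counter()
    for focus in FOCUS_AREAS:
        for _ in range(runs):
            # set() so that a finding counts at most once per run
            for finding in set(reviewer(patch, focus)):
                findings[(focus, finding)] += 1
    return findings


def fake_reviewer(patch: str, focus: str) -> list[str]:
    # Deterministic stub so the sketch runs without any model.
    return [f"possible {focus} issue"] if focus.split()[0] in patch else []


result = multi_pass_review("fix locking in foo_probe()", fake_reviewer)
for (focus, finding), count in result.items():
    print(f"{focus}: {finding} (reported in {count} of 3 runs)")
```

With a real, non-deterministic reviewer the counts would differ between
findings, and the count could serve as a crude confidence signal. The cost is
also visible here: every extra focus area and every extra run multiplies the
number of model calls.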

>> I’m personally all in on having the entire thing as open as possible and I believe Sashiko is what
>> is realistically the best at this moment - a fully open-source harness and set of prompts which
>> can work with a variety of models.
>> I’m happy to merge support for any LLM which can produce decent review results.
>> 
>> 2) Due to the probabilistic nature of LLMs, nothing is reproducible in the strict sense of the word.
>> Even with exactly the same model/harness/prompts you’ll get different results every time you run it.
>> It’s unfortunate, but it is what it is at the moment.
> 
> By "reproduce locally", I didn't mean in strict sense. Sure, LLM answers
> won't be identical, but I suspect that at least most of the major issues
> on a patch series would be reported by any decent model.

I believe we’re not quite there yet. Models do differ in their ability to spot
various types of bugs and in how many false positives they produce. Some types of issues
(e.g. complex locking issues) are really hard even for the best of the current models.

> So, if we have something that one can locally run using its GPU, being
> able to get an answer in the range of a couple of minutes per patch
> should be enough to catch most of the issues.

I’m happy to be wrong here, but my understanding is that it’s not realistic now.
Sashiko reviews take longer than that even on production-grade hardware.

* Re: Stop false review statements
@ 2026-05-17 19:42 Roman Gushchin
  2026-05-17 22:05 ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 46+ messages in thread
From: Roman Gushchin @ 2026-05-17 19:42 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Mauro Carvalho Chehab, Greg KH, Krzysztof Kozlowski, debarbos,
	Arnaldo Carvalho de Melo, Konstantin Ryabitsev, Guenter Roeck,
	sashiko-bot, sashiko-reviews, sashiko, Linux Kernel Workflows,
	Linux Kernel Mailing List, devicetree, kfree


> On May 17, 2026, at 11:57 AM, Theodore Tso <tytso@mit.edu> wrote:
> On Sun, May 17, 2026 at 11:17:06AM -0700, Roman Gushchin wrote:
>> 
>> I actually tried to run it with ollama on my
>> personal Framework 13. Adding nominal support is trivial, but the
>> whole thing is not really useful: I can get maybe a few hundred
>> tokens per second using a quantized model with reduced quality; an
>> average sashiko review consumes 3.5 million tokens (with Gemini
>> 3.1 Pro; it’s also model-dependent).
> 
> I'm curious.  What hardware and LLM model were you using?  A few
> hundred tokens per second seems surprisingly high.  My initial
> research[1] shows that an M5 Max Macbook Pro costing 5 or 6 kilobucks
> can do 31.6 tokens/second on a 27B 4-bit quantized model (Qwen 3.5).

I have a Framework 13 with an AMD 7840U. I’ve tried several models both on CPU and GPU.
Sorry, it was a couple of months ago and I don’t remember all the details, so I won’t
claim any specific numbers, but as I remember the best numbers were around
a hundred tokens per second. In any case it’s a few orders of magnitude slower than
what is realistically required.
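
For a sense of scale, a back-of-the-envelope sketch in Python using only the
figures quoted in this thread. It assumes, as a simplification, that all 3.5
million tokens are processed at one flat rate; in practice prompt processing
and generation run at different speeds, so treat the output as an order of
magnitude estimate. The helper names are made up for this example.

```python
# Average tokens per reviewed patch, as quoted in this thread.
TOKENS_PER_REVIEW = 3_500_000


def review_hours(tokens_per_second: float, tokens: int = TOKENS_PER_REVIEW) -> float:
    """Hours needed for one review at a flat token rate (simplification)."""
    return tokens / tokens_per_second / 3600


def required_rate(minutes: float, tokens: int = TOKENS_PER_REVIEW) -> float:
    """Flat token rate needed to finish one review in `minutes`."""
    return tokens / (minutes * 60)


rates = [
    ("~100 tok/s (laptop, recalled from memory)", 100.0),
    ("31.6 tok/s (M5 Max figure quoted in this thread)", 31.6),
]
for label, rate in rates:
    print(f"{label}: {review_hours(rate):.1f} hours per patch")

print(f"rate needed for a 2-minute review: {required_rate(2):.0f} tok/s")
```

Under that assumption, a couple of minutes per patch would need a sustained
rate in the tens of thousands of tokens per second, which is where the "orders
of magnitude" gap comes from.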

If someone has powerful hardware and is willing to benchmark sashiko with open-source
models, I’m very interested in the results.

> [1] https://www.reddit.com/r/LocalLLaMA/comments/1rzkw4x/m5_max_128g_performance_tests_i_just_got_my_new/
> 
> The model matters of course.  With Gemma 3 27B and a 6-bit
> quantization, it's 21 tokens/s, and with Deepseek R1 8B Q6_K, it's
> 72.8 tokens/second.  But unless you're using a really low-end model,
> or a really expensive, splufty hardware platform, I haven't seen
> reports of hundreds of tokens per second on hardware costing a
> reasonable amount of money.  (I'll set aside the question of whether
> spending $6k for a fully spec'ed out M5 Max Macbook Pro, or $15k for a
> fully spec'ed out M3 Ultra Mac Studio is "reasonable".)
> 
> As a result I'm not entirely sure how realistic it is to do reviews
> using "free" (you still have to pay $$$ for the hardware) local,
> open-weight LLM's if an average review requires around 3.5 million
> tokens.

Fully agree. But it might change in a few years; things are moving quickly.

* Stop false review statements
@ 2026-05-16  8:05 Krzysztof Kozlowski
  2026-05-16 12:11 ` Guenter Roeck
  0 siblings, 1 reply; 46+ messages in thread
From: Krzysztof Kozlowski @ 2026-05-16  8:05 UTC (permalink / raw)
  To: sashiko-bot, sashiko-reviews, sashiko, Linux Kernel Workflows,
	Linux Kernel Mailing List, devicetree@vger.kernel.org

What the hell is that:

https://lore.kernel.org/all/20260515190707.033BDC2BCB0@smtp.kernel.org/

As a bot you CANNOT MAKE a Reviewer's statement of oversight. You are
not a damn human to be able to make such a statement. You are a bot, a tool.

Stop faking tags.

And really, considering how many false positives Sashiko produces, how
poor review comments it gives, how many misleading comments, it's
unacceptable to me to consider that a review.

The amount of useless noise Sashiko produces has already changed my mind about how
useful that tool is.

I will be NAKing every damn tag produced by such tools.


Best regards,
Krzysztof




Thread overview: 46+ messages
2026-05-17 19:53 Stop false review statements Roman Gushchin
  -- strict thread matches above, loose matches on Subject: below --
2026-05-17 19:42 Roman Gushchin
2026-05-17 22:05 ` Mauro Carvalho Chehab
2026-05-16  8:05 Krzysztof Kozlowski
2026-05-16 12:11 ` Guenter Roeck
2026-05-16 12:16   ` Krzysztof Kozlowski
2026-05-16 12:23     ` Guenter Roeck
2026-05-16 12:29       ` Krzysztof Kozlowski
2026-05-16 13:24         ` Laurent Pinchart
2026-05-16 13:45           ` Krzysztof Kozlowski
2026-05-16 21:10           ` Mauro Carvalho Chehab
2026-05-17 15:21       ` Jonathan Corbet
2026-05-16 15:20   ` Konstantin Ryabitsev
2026-05-16 15:36     ` Greg KH
2026-05-16 15:41     ` Roman Gushchin
2026-05-16 15:45       ` Greg KH
2026-05-16 15:49         ` Roman Gushchin
2026-05-16 18:28           ` Arnaldo Carvalho de Melo
2026-05-16 21:29             ` Derek Barbosa
2026-05-16 21:33               ` Krzysztof Kozlowski
2026-05-16 21:59                 ` Roman Gushchin
2026-05-17  8:25                   ` Krzysztof Kozlowski
2026-05-17 10:05                   ` Mauro Carvalho Chehab
2026-05-17 10:10                     ` Willy Tarreau
2026-05-17 10:12                     ` Greg KH
2026-05-17 16:29                       ` Theodore Tso
2026-05-17 22:22                         ` Laurent Pinchart
2026-05-17 16:39                       ` Mauro Carvalho Chehab
2026-05-17 17:03                         ` Guenter Roeck
2026-05-17 18:17                         ` Roman Gushchin
2026-05-17 18:56                           ` Mauro Carvalho Chehab
2026-05-18  5:31                             ` Greg KH
2026-05-17 18:57                           ` Theodore Tso
2026-05-17 19:36                             ` Mauro Carvalho Chehab
2026-05-16 18:28           ` Krzysztof Kozlowski
2026-05-16 18:56             ` Roman Gushchin
2026-05-16 19:00               ` Krzysztof Kozlowski
2026-05-16 19:13                 ` Guenter Roeck
2026-05-16 19:25                   ` Guenter Roeck
2026-05-16 19:31                     ` Roman Gushchin
2026-05-16 19:15                 ` Roman Gushchin
2026-05-16 20:41                   ` Theodore Tso
2026-05-17 15:56                   ` Danilo Krummrich
2026-05-17 21:25                     ` Danilo Krummrich
2026-05-18  2:12           ` SeongJae Park
2026-05-16 22:32         ` Mauro Carvalho Chehab
