From: Roman Gushchin
Subject: Re: Stop false review statements
Message-Id: <2CB46AD9-BD19-4FAC-B992-C1CDBB9C36C0@linux.dev>
Date: Sun, 17 May 2026 12:53:19 -0700
To: Mauro Carvalho Chehab
Cc: Greg KH, Krzysztof Kozlowski, debarbos@redhat.com, Arnaldo Carvalho de Melo, Konstantin Ryabitsev, Guenter Roeck, sashiko-bot@kernel.org, sashiko-reviews@lists.linux.dev, sashiko@lists.linux.dev, Linux Kernel Workflows, Linux Kernel Mailing List, devicetree@vger.kernel.org, kfree@google.com
X-Mailing-List: devicetree@vger.kernel.org

> On May 17, 2026, at 11:56 AM, Mauro Carvalho Chehab wrote:
> On Sun, 17 May 2026 11:17:06 -0700
> Roman Gushchin wrote:
>
>>> On May 17, 2026, at 9:40 AM, Mauro Carvalho Chehab wrote:
>>> On Sun, 17 May 2026 12:12:00 +0200
>>> Greg KH wrote:
>>>>> On Sun, May 17, 2026 at 12:05:56PM +0200, Mauro Carvalho Chehab wrote:
>>>>> On Sat, 16 May 2026 14:59:44 -0700
>>>>> Roman Gushchin wrote:
>>>>>>> On May 16, 2026, at 2:33 PM, Krzysztof Kozlowski wrote:
>>>>>>> I find it the opposite: clogging commits with useless information, because
>>>>>>> some arbitrary and completely closed-source tool did analysis means
>>>>>>> nothing to me one year later when I look at the commit in the Git history.
>>>>>> This is simply not true: Sashiko is fully open-source, under the Apache 2.0 license,
>>>>>> and the code belongs to LF.
>>>>>> Yes, the instance behind sashiko.dev is using
>>>>>> Gemini 3.1 Pro LLM, which is not open-source, but it’s not a fundamental limitation -
>>>>>> Sashiko is supporting various LLMs, including open models - it’s just a practical
>>>>>> choice: to my knowledge the quality of open models is not on par with frontier closed
>>>>>> models
>>>>> I would very much prefer using an open source LLM, even if not on par
>>>>> with the latest paid models.
>>>>>> and it would require a non-trivial amount of hardware and infrastructure to run
>>>>>> an open model at the required scale.
>>>>> IMHO the best would be to have them running on some infra that would accept
>>>>> open source models (*). If there aren't enough resources to have our own
>>>>> infra, there are offers out there which allow running open source models
>>>>> like https://ollama.com/pricing (I have never used it myself).
>>>>>
>>>>> (*) For instance, Qwen3.6 is brand new and licensed under apache-2.0.
>>>>> Not bad on my tests running it locally.
>>>> You can run the tool locally, with whatever model you want, if you want
>>>> to.
>>>>
>>>> But for now, let's just take the free credits that Google is willing to
>>>> throw at this thing and let it give us reviews IF the maintainer of the
>>>> subsystem feels it is something they want to do. No one is forcing
>>>> maintainers to do this.
>>> If Google and/or others are willing to give free credits on their cloud,
>>> they could instead or in addition give free credits to run ollama
>>> there, allowing us to use different models.
>>>
>>> From my side, while I won't personally object to getting reviews from
>>> Sashiko/Gemini, this is something I can't reproduce locally. I would
>>> very much want something where I can select my preferred LLM
>>> and run on my ollama docker container on my own GPU, in a way that
>>> I could run it locally before even sending a patch series.
>>
>> 2 thoughts here:
>> 1) I actually tried to run it with ollama on my personal Framework 13. Adding nominal support is trivial,
>> but the whole thing is not really useful: I can get maybe a few hundred tokens per second using
>> a quantized model with reduced quality; an average Sashiko review is consuming 3.5 million tokens
>> (with Gemini 3.1 Pro, it’s also model-dependent).
>
> Do you mean 3.5 million tokens per patch series? If so, that
> sounds like a lot! Why does it require so many tokens?

It’s an average per patch, not per series. Some are much cheaper, some are much more expensive.
Sashiko posts the token cost next to each review.

Why does it use so many tokens? Because in many cases it has to dig deep into the code.
Long sessions with multiple tool calls are expensive. Also, Sashiko has a multi-stage
architecture: effectively it reviews every patch multiple times from different angles.
This has a measurable influence on the quality of reviews. The current generation of LLMs is
not good at spotting various types of issues at once: once a model sees a memory leak,
it can no longer think about, e.g., locking issues. Also, just by running the same thing
multiple times and combining the results you can meaningfully improve the quality.

>> I’m personally all in on having the entire thing as open as possible and I believe Sashiko is what
>> is realistically the best at this moment - a fully open-source harness and set of prompts which
>> can work with a variety of models.
>> I’m happy to merge support for any LLM which can produce decent review results.
>>
>> 2) Due to the probabilistic nature of LLMs, nothing is reproducible in a strict sense of the word.
>> Even with exactly the same model/harness/prompts you’ll get different results every time you run it.
>> It’s unfortunate, but it is what it is at the moment.
>
> By "reproduce locally", I didn't mean in a strict sense.
> Sure, LLM answers
> won't be identical, but I suspect that at least most of the major issues
> on a patch series would be reported by any decent model.

I believe we’re not quite there yet. Models do differ in their ability to spot
various types of bugs and in how many false positives they produce. Some types of issues
(e.g. complex locking issues) are really hard even for the best of the current models.

> So, if we have something that one can run locally using one's own GPU, being
> able to get an answer in the range of a couple of minutes per patch
> should be enough to catch most of the issues.

I’m happy to be wrong here, but my understanding is that it’s not realistic now.
Sashiko reviews take longer than that even on production-grade hardware.
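
As a rough sanity check on the figures quoted earlier in this message, the sketch below estimates how long one review would take on local hardware. It is only a back-of-the-envelope calculation. The 300 tokens/s rate is an assumed reading of "a few hundred tokens per second", and the 120-second target is an assumed reading of "a couple of minutes per patch". It also treats all 3.5 million tokens as if they were processed at one flat rate, which real inference does not do (prompt tokens are usually processed faster than generated ones), and the 3.5 million figure itself is model-dependent.

```python
# Back-of-the-envelope estimate only; see the assumptions in the text above.

TOKENS_PER_REVIEW = 3_500_000    # average per patch, as stated in this thread
LOCAL_TOKENS_PER_SECOND = 300    # ASSUMPTION: stands in for "a few hundred"
TARGET_SECONDS = 120             # ASSUMPTION: stands in for "a couple of minutes"


def review_hours(tokens: int, tokens_per_second: float) -> float:
    """Hours needed to process `tokens` at a flat `tokens_per_second` rate."""
    return tokens / tokens_per_second / 3600


def required_tokens_per_second(tokens: int, target_seconds: float) -> float:
    """Flat throughput needed to process `tokens` within `target_seconds`."""
    return tokens / target_seconds


if __name__ == "__main__":
    hours = review_hours(TOKENS_PER_REVIEW, LOCAL_TOKENS_PER_SECOND)
    needed = required_tokens_per_second(TOKENS_PER_REVIEW, TARGET_SECONDS)
    print(f"Estimated time per review: {hours:.2f} hours")
    print(f"Throughput needed for the target: {needed:.0f} tokens/s")
```

Under these assumptions a single average review comes out at a little over three hours locally, and a two-minute turnaround would need roughly 29,000 tokens per second, about a hundred times the assumed local rate.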