How should we handle using AI for reviewing commits?

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* How should we handle using AI for reviewing commits?
@ 2026-03-12 16:45 Theodore Tso
  2026-03-12 16:57 ` Sasha Levin
  0 siblings, 1 reply; 2+ messages in thread
From: Theodore Tso @ 2026-03-12 16:45 UTC (permalink / raw)
  To: Linux Kernel Developers List; +Cc: Sasha Levin, Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 2105 bytes --]

We recently used an AI review bot while applying an LTS backport to an
internal kernel tree at $WORK.  While doing the review, it flagged a
set of concerns which resulted in my creating a patch[1] to address
the issues that it found in the kernel commit.

[1] https://lore.kernel.org/all/20260310122806.1277631-1-tytso@mit.edu/

In this commit there is no LLM generated output in the code, but there
*is* LLM generated output in the commit description, since I quoted
the concerns raised by the LLM.  Per the our new coding-assistants
process document[2], "When AI tools contribute to kernel development,
proper attribution helps track the evolving role of AI in the
development process. Contributions should include an Assisted-by tag..."

[2] https://www.kernel.org/doc/html/latest/process/coding-assistants.html

When I was considering whether I should add something like:

Assisted-by: Gemini:Gemini-3.1-Pro [TOOL]

There was a couple of things that came to mind.  First, should we make
some kind of distintion between exactly how the AI tool assisted in
the development of the commit?  There's a big difference between using
an AI assistant to find a potential issue, to an AI assistant which
created new code out of whole cloth, with a spectrum of changes in
between.  Given that the stated code was to "track the evolving role
of AI", it occured to me that perhaps we should add some indication
about exactly what was the nature of assistance that was provided.

The second observation that I had was that example set of tools for
[TOOL] was "specialized analysis tools": coccinelle, sparse, smatch,
clang-tidy.  I assume the intent was if an AI bot started using tools
like sparse, coccinelle, as an agent?w

If there is a set of LLM prompts which has a name, would that also be
appropriate for TOOL?  Chris Mason's repo has a fairly non-descriptive
name, "review-prompts", but in the future when companies start making
their internal review prompts public, some of them may have more
evocative names that might be more unique and more marketing friendly.  :-)

	  	     	      	   	      - Ted

[-- Attachment #2: Type: message/rfc822, Size: 13813 bytes --]

From: "Theodore Ts'o" <tytso@mit.edu>
To: Ext4 Developers List <linux-ext4@vger.kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>
Subject: [RFC PATCH] ext4: handle wraparound when searching for blocks for indirect mapped blocks
Date: Tue, 10 Mar 2026 08:28:06 -0400
Message-ID: <20260310122806.1277631-1-tytso@mit.edu>

Commit 4865c768b563 ("ext4: always allocate blocks only from groups
inode can use") restricts what blocks will be allocated for indirect
block based files to block numbers that fit within 32-bit block
numbers.

However, when using a review bot running on the latest Gemini LLM to
check this commit when backporting into an LTS based kernel, it raised
the following concerns:

   If ac->ac_g_ex.fe_group is >= ngroups (for instance, if the goal
   group was populated via stream allocation from s_mb_last_groups),
   then start will be >= ngroups.

   Does this allow allocating blocks beyond the 32-bit limit for
   indirect block mapped files? The commit message mentions that
   ext4_mb_scan_groups_linear() takes care to not select unsupported
   groups. However, its loop uses group = *start, and the very first
   iteration will call ext4_mb_scan_group() with this unsupported
   group because next_linear_group() is only called at the end of the
   iteration.

   Also, in the optimized scanning paths like
   ext4_mb_scan_groups_p2_aligned(), start is passed into
   ext4_mb_scan_groups_xa_range(). If start >= ngroups, will this
   trigger the WARN_ON_ONCE(end > ngroups || start >= end) because end
   is set to ngroups? Furthermore, after wrapping around, end will be
   set to the original start which is > ngroups, triggering the
   warning a second time. Should this code clamp start to < ngroups
   before scanning?

After reviewing the code paths involved and considering the LLM
review, I believe that while it is unlikely, it is possible that how
commit 4865c768b563 added ext4_get_allocation_groups_count() doesn't
completely address all of the possible situtions that might show up
when using indirect-mapped files and allocating blocks on a file
system with more than 2**32 blocks.

So this commit adds some safety checks and wrap around logic to some
functions if the multiblock allocator: ext4_mb_scan_groups_xa_range(),
ext4_mb_scan_groups_best_avail(), ext4_mb_scan_groups_p2_aligned(),
and ext4_mb_scan_groups().

Fixes: 4865c768b563 ("ext4: always allocate blocks only from groups inode can use")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 20e9fdaf4301..454d5a1b4803 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -915,13 +915,17 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	enum criteria cr = ac->ac_criteria;
 	ext4_group_t ngroups = ext4_get_allocation_groups_count(ac);
-	unsigned long group = start;
+	unsigned long group, end_range;
 	struct ext4_group_info *grp;

-	if (WARN_ON_ONCE(end > ngroups || start >= end))
-		return 0;
-
-	xa_for_each_range(xa, group, grp, start, end - 1) {
+	if (start >= ngroups)
+		start = 0;
+	group = start;
+wrap_around:
+	end_range = end;
+	if (end_range > ngroups)
+		end_range = ngroups;
+	xa_for_each_range(xa, group, grp, start, end_range - 1) {
 		int err;

 		if (sbi->s_mb_stats)
@@ -933,7 +937,11 @@ static int ext4_mb_scan_groups_xa_range(struct ext4_allocation_context *ac,

 		cond_resched();
 	}
-
+	if (start) {
+		end = start;
+		start = 0;
+		goto wrap_around;
+	}
 	return 0;
 }

@@ -967,6 +975,8 @@ static int ext4_mb_scan_groups_p2_aligned(struct ext4_allocation_context *ac,

 	start = group;
 	end = ext4_get_allocation_groups_count(ac);
+	if (start >= end)
+		return 0;
 wrap_around:
 	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
 		ret = ext4_mb_scan_groups_largest_free_order_range(ac, i,
@@ -1099,6 +1109,8 @@ static int ext4_mb_scan_groups_best_avail(struct ext4_allocation_context *ac,

 	start = group;
 	end = ext4_get_allocation_groups_count(ac);
+	if (start >= end)
+		start = 0;
 wrap_around:
 	for (i = order; i >= min_order; i--) {
 		int frag_order;
@@ -1199,6 +1211,8 @@ static int ext4_mb_scan_groups(struct ext4_allocation_context *ac)

 	/* searching for the right group start from the goal value specified */
 	start = ac->ac_g_ex.fe_group;
+	if (start >= ngroups)
+		start = 0;
 	ac->ac_prefetch_grp = start;
 	ac->ac_prefetch_nr = 0;

-- 
2.51.0

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: How should we handle using AI for reviewing commits?
  2026-03-12 16:45 How should we handle using AI for reviewing commits? Theodore Tso
@ 2026-03-12 16:57 ` Sasha Levin
  0 siblings, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2026-03-12 16:57 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Linux Kernel Developers List, Jonathan Corbet

On Thu, Mar 12, 2026 at 12:45:09PM -0400, Theodore Tso wrote:
>We recently used an AI review bot while applying an LTS backport to an
>internal kernel tree at $WORK.  While doing the review, it flagged a
>set of concerns which resulted in my creating a patch[1] to address
>the issues that it found in the kernel commit.
>
>[1] https://lore.kernel.org/all/20260310122806.1277631-1-tytso@mit.edu/
>
>In this commit there is no LLM generated output in the code, but there
>*is* LLM generated output in the commit description, since I quoted
>the concerns raised by the LLM.  Per the our new coding-assistants
>process document[2], "When AI tools contribute to kernel development,
>proper attribution helps track the evolving role of AI in the
>development process. Contributions should include an Assisted-by tag..."
>
>[2] https://www.kernel.org/doc/html/latest/process/coding-assistants.html
>
>When I was considering whether I should add something like:
>
>Assisted-by: Gemini:Gemini-3.1-Pro [TOOL]
>
>There was a couple of things that came to mind.  First, should we make
>some kind of distintion between exactly how the AI tool assisted in
>the development of the commit?  There's a big difference between using
>an AI assistant to find a potential issue, to an AI assistant which
>created new code out of whole cloth, with a spectrum of changes in
>between.  Given that the stated code was to "track the evolving role
>of AI", it occured to me that perhaps we should add some indication
>about exactly what was the nature of assistance that was provided.

Why not use Reported-by for these? We already use it for tools (like syzbot),
so this would just be a natural extension.

	Reported-by: Gemini:Gemini-3.1-Pro

>The second observation that I had was that example set of tools for
>[TOOL] was "specialized analysis tools": coccinelle, sparse, smatch,
>clang-tidy.  I assume the intent was if an AI bot started using tools
>like sparse, coccinelle, as an agent?w

Yup, both that as well as just space to list additional tools that the human
used too without adding multiple Assisted-by: lines.

>If there is a set of LLM prompts which has a name, would that also be
>appropriate for TOOL?  Chris Mason's repo has a fairly non-descriptive
>name, "review-prompts", but in the future when companies start making
>their internal review prompts public, some of them may have more
>evocative names that might be more unique and more marketing friendly.  :-)

Something like:

	Reported-by: Gemini:Gemini-3.1-Pro # https://example.com/my-magical-prompt

?

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-03-12 16:57 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-12 16:45 How should we handle using AI for reviewing commits? Theodore Tso
2026-03-12 16:57 ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox