[PATCH] ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec
@ 2026-06-01 15:30 Ian Bridges
  2026-06-02  1:46 ` Joseph Qi
  0 siblings, 1 reply; 3+ messages in thread
From: Ian Bridges @ 2026-06-01 15:30 UTC (permalink / raw)
  To: Mark Fasheh, Joel Becker, Joseph Qi, ocfs2-devel, linux-kernel

[BUG]
On-disk corruption setting l_next_free_rec to 0 in an inode's inline
extent list triggers a UBSAN panic on the next write to that file.

[CAUSE]
ocfs2_sum_rightmost_rec() computes
i = le16_to_cpu(el->l_next_free_rec) - 1
and accesses el->l_recs[i] without validating i. When l_next_free_rec
is 0, i becomes -1; when l_next_free_rec exceeds l_count, i falls
past the end of the array. Either case violates the
__counted_by_le(l_count) annotation on l_recs[] and triggers UBSAN.

[FIX]
Read l_count once into a local variable to eliminate a TOCTOU between
the bounds check and the __counted_by_le-generated check at the array
access. Validate i against both bounds before dereferencing l_recs[].
On an out-of-bounds index call ocfs2_error() since both cases indicate
filesystem corruption.

Update the signature to accept a struct ocfs2_caching_info *ci and
return int, with the cluster sum returned through a u32 out-parameter.
Update both callers to pass et->et_ci and propagate the error.

Fixes: 2f26f58df041 ("ocfs2: annotate flexible array members with __counted_by_le()")
Reported-by: syzbot+be16e33db01e6644db7a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=be16e33db01e6644db7a
Signed-off-by: Ian Bridges <icb@fastmail.org>
---
This patch contains a proposed fix for a crash reported by syzbot
in ocfs2_grow_tree().

The file names and offsets in this description are from commit
7cb1c5b32a2bfde961fff8d5204526b609bcb30a from this repo:
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git

I also have a small test harness that reproduces the original panic,
which I can make available as well.

The Bug

The ocfs2_extent_list structure (fs/ocfs2/ocfs2_fs.h:458) contains
two fields relevant to this bug: l_count, the total number of extent
record slots in the list, and l_next_free_rec, the index of the next
unused slot. The extent record array l_recs is annotated
__counted_by_le(l_count) (fs/ocfs2/ocfs2_fs.h:472), which instructs
UBSAN to bounds-check every access to l_recs against l_count.

ocfs2_sum_rightmost_rec() (fs/ocfs2/alloc.c:1106) is a helper used
by both ocfs2_add_branch() and ocfs2_shift_tree_depth() to compute
the logical cluster position just past the rightmost occupied extent
record. It derives the index of that record as:

i = le16_to_cpu(el->l_next_free_rec) - 1;  /* alloc.c:1110 */

and then accesses el->l_recs[i] (fs/ocfs2/alloc.c:1112) without
first checking that i is a valid index. This is the root cause of
the bug.

The syzbot report shows index -1 at the crash site, which means
l_next_free_rec was 0 at the point of the crash. The crash occurs
inside ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375), which is
reached when ocfs2_find_branch_target() returns 1. That return value
is only produced when l_next_free_rec equals l_count
(fs/ocfs2/alloc.c:1530). Since l_next_free_rec is 0, l_count must
also be 0 for this condition to hold.

The normal inode read path, ocfs2_validate_inode_block()
(fs/ocfs2/inode.c:1423), does not validate the inline extent list
fields l_count or l_next_free_rec. A filesystem image with these
fields set to 0 in an inode's inline extent list therefore passes
read-time validation without error.

The syzbot console log shows a separate validation error at mount
time — "Invalid dinode #17058: Corrupt state (nlink = 0 or mode =
0)" — indicating that the mounted filesystem contained at least one
other corrupted inode. No reproducer was available in the report, so
the exact mechanism by which the corruption was introduced is not
known.

Here is a breakdown of how the crash is triggered:

1. A write syscall eventually calls into ocfs2_insert_extent() to
record the newly allocated cluster in the inode's extent tree.

2. ocfs2_insert_extent() determines that the extent tree has no room
for a new record and calls ocfs2_grow_tree() (fs/ocfs2/alloc.c:1550).

3. ocfs2_grow_tree() calls ocfs2_find_branch_target()
(fs/ocfs2/alloc.c:1561) to walk the tree and find a node with a
free slot. Because l_tree_depth is 0, the traversal loop
(fs/ocfs2/alloc.c:1491) does not execute. At
fs/ocfs2/alloc.c:1530, the condition

el->l_next_free_rec == el->l_count   /* 0 == 0 */

evaluates to true, so the function returns 1, indicating the tree
must grow by shifting its depth.

4. Back in ocfs2_grow_tree(), shift is 1, so ocfs2_shift_tree_depth()
is called (fs/ocfs2/alloc.c:1581).

5. ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375) allocates a new
extent block and copies the root extent list into it. At
fs/ocfs2/alloc.c:1420, it sets:

eb_el->l_next_free_rec = root_el->l_next_free_rec;  /* = 0 */

The copy loop at fs/ocfs2/alloc.c:1421 runs zero iterations since
l_next_free_rec is 0, so eb_el is left with an empty extent list.

6. ocfs2_shift_tree_depth() then calls
ocfs2_sum_rightmost_rec(eb_el) (fs/ocfs2/alloc.c:1433) to determine
the cluster count for the new root record.

7. Inside ocfs2_sum_rightmost_rec(), i is computed as:

i = le16_to_cpu(eb_el->l_next_free_rec) - 1 = 0 - 1 = -1

The subsequent access to el->l_recs[-1] (fs/ocfs2/alloc.c:1112)
violates the __counted_by_le(l_count) annotation on l_recs[]
(fs/ocfs2/ocfs2_fs.h:472). __counted_by_le is a macro defined in
include/linux/compiler_types.h that expands to __counted_by, which
is the GCC/Clang attribute UBSAN uses for array bounds checking. The
UBSAN error message reports __counted_by rather than __counted_by_le
because the check is performed against the expanded attribute. This
triggers a UBSAN array-index-out-of-bounds panic.

The Proposed Fix

The fix modifies ocfs2_sum_rightmost_rec() with three changes.

First, l_count is read once into a local variable before the bounds
check. The __counted_by_le(l_count) annotation causes the compiler
to emit a separate read of el->l_count at the array access site for
the UBSAN check. Without the local variable, there are two
independent reads of l_count — one for the guard and one at the
array access site, where __counted_by_le causes the compiler to
re-read it for the UBSAN check — creating a TOCTOU between them.
Reading l_count once before the guard reduces this window to the
minimum.

Second, the index is checked against both bounds before dereferencing
l_recs[]. Both checks are sound: i < 0 compares against the constant
0, requiring no trust in any on-disk value; i >= count checks the
invariant l_next_free_rec <= l_count, which holds on any valid
filesystem independent of the actual field values. Neither check can
fire on a valid extent list. Corruption that keeps l_next_free_rec
within [1, l_count] will pass the check and produce an incorrect
result, but this is pre-existing behaviour — not a regression — and
cannot be avoided without an external ground truth for l_count, which
I do not believe exists for the inode's inline extent list.

Third, on an out-of-bounds index, ocfs2_error() is called rather than
silently returning a usable value.

To accommodate these changes the function signature is updated: a
struct ocfs2_caching_info *ci parameter is added for superblock
access, the return type changes from u32 to int, the cluster sum is
returned via a new u32 out-parameter, and the inline qualifier is
removed since the function is no longer a trivial helper. Both
callers, ocfs2_add_branch() and ocfs2_shift_tree_depth(), already
have status variables and bail labels, and are updated to pass
et->et_ci, check the return value, and propagate any error up their
respective call chains.

 fs/ocfs2/alloc.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 6e5fd3f12a84..329bda065a7a 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -1103,14 +1103,22 @@ static int ocfs2_create_new_meta_bhs(handle_t *handle,
  * ocfs2_shift_tree_depth() uses this to determine the # clusters
  * value for the new topmost tree record.
  */
-static inline u32 ocfs2_sum_rightmost_rec(struct ocfs2_extent_list  *el)
+static int ocfs2_sum_rightmost_rec(struct ocfs2_caching_info *ci,
+				   struct ocfs2_extent_list *el,
+				   u32 *result)
 {
-	int i;
+	u16 count = le16_to_cpu(el->l_count);
+	int i = le16_to_cpu(el->l_next_free_rec) - 1;

-	i = le16_to_cpu(el->l_next_free_rec) - 1;
+	if (i < 0 || i >= count)
+		return ocfs2_error(ocfs2_metadata_cache_get_super(ci),
+				   "Owner %llu has invalid l_next_free_rec %u (l_count %u)\n",
+				   (unsigned long long)ocfs2_metadata_cache_owner(ci),
+				   le16_to_cpu(el->l_next_free_rec), count);

-	return le32_to_cpu(el->l_recs[i].e_cpos) +
-		ocfs2_rec_clusters(el, &el->l_recs[i]);
+	*result = le32_to_cpu(el->l_recs[i].e_cpos) +
+		  ocfs2_rec_clusters(el, &el->l_recs[i]);
+	return 0;
 }

 /*
@@ -1199,8 +1207,16 @@ static int ocfs2_add_branch(handle_t *handle,
 	new_blocks = le16_to_cpu(el->l_tree_depth);

 	eb = (struct ocfs2_extent_block *)(*last_eb_bh)->b_data;
-	new_cpos = ocfs2_sum_rightmost_rec(&eb->h_list);
-	root_end = ocfs2_sum_rightmost_rec(et->et_root_el);
+	status = ocfs2_sum_rightmost_rec(et->et_ci, &eb->h_list, &new_cpos);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}
+	status = ocfs2_sum_rightmost_rec(et->et_ci, et->et_root_el, &root_end);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}

 	/*
 	 * If there is a gap before the root end and the real end
@@ -1430,7 +1446,11 @@ static int ocfs2_shift_tree_depth(handle_t *handle,
 		goto bail;
 	}

-	new_clusters = ocfs2_sum_rightmost_rec(eb_el);
+	status = ocfs2_sum_rightmost_rec(et->et_ci, eb_el, &new_clusters);
+	if (status < 0) {
+		mlog_errno(status);
+		goto bail;
+	}

 	/* update root_bh now */
 	le16_add_cpu(&root_el->l_tree_depth, 1);
-- 
2.47.3

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec
  2026-06-01 15:30 [PATCH] ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec Ian Bridges
@ 2026-06-02  1:46 ` Joseph Qi
  2026-06-03 12:38   ` Ian Bridges
  0 siblings, 1 reply; 3+ messages in thread
From: Joseph Qi @ 2026-06-02  1:46 UTC (permalink / raw)
  To: Ian Bridges
  Cc: Mark Fasheh, Joel Becker, ocfs2-devel@lists.linux.dev,
	linux-kernel@vger.kernel.org



On 6/1/26 11:30 PM, Ian Bridges wrote:
> [BUG]
> On-disk corruption setting l_next_free_rec to 0 in an inode's inline
> extent list triggers a UBSAN panic on the next write to that file.
> 
> [CAUSE]
> ocfs2_sum_rightmost_rec() computes
> i = le16_to_cpu(el->l_next_free_rec) - 1
> and accesses el->l_recs[i] without validating i. When l_next_free_rec
> is 0, i becomes -1; when l_next_free_rec exceeds l_count, i falls
> past the end of the array. Either case violates the
> __counted_by_le(l_count) annotation on l_recs[] and triggers UBSAN.
> 
> [FIX]
> Read l_count once into a local variable to eliminate a TOCTOU between
> the bounds check and the __counted_by_le-generated check at the array
> access. Validate i against both bounds before dereferencing l_recs[].
> On an out-of-bounds index call ocfs2_error() since both cases indicate
> filesystem corruption.
> 
> Update the signature to accept a struct ocfs2_caching_info *ci and
> return int, with the cluster sum returned through a u32 out-parameter.
> Update both callers to pass et->et_ci and propagate the error.
> 
> Fixes: 2f26f58df041 ("ocfs2: annotate flexible array members with __counted_by_le()")
> Reported-by: syzbot+be16e33db01e6644db7a@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=be16e33db01e6644db7a
> Signed-off-by: Ian Bridges <icb@fastmail.org>
> ---
> This patch contains a proposed fix for a crash reported by syzbot
> in ocfs2_grow_tree().
> 
> The file names and offsets in this description are from commit
> 7cb1c5b32a2bfde961fff8d5204526b609bcb30a from this repo:
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
> 
> I also have a small test harness that reproduces the original panic,
> which I can make available as well.
> 
> The Bug
> 
> The ocfs2_extent_list structure (fs/ocfs2/ocfs2_fs.h:458) contains
> two fields relevant to this bug: l_count, the total number of extent
> record slots in the list, and l_next_free_rec, the index of the next
> unused slot. The extent record array l_recs is annotated
> __counted_by_le(l_count) (fs/ocfs2/ocfs2_fs.h:472), which instructs
> UBSAN to bounds-check every access to l_recs against l_count.
> 
> ocfs2_sum_rightmost_rec() (fs/ocfs2/alloc.c:1106) is a helper used
> by both ocfs2_add_branch() and ocfs2_shift_tree_depth() to compute
> the logical cluster position just past the rightmost occupied extent
> record. It derives the index of that record as:
> 
> i = le16_to_cpu(el->l_next_free_rec) - 1;  /* alloc.c:1110 */
> 
> and then accesses el->l_recs[i] (fs/ocfs2/alloc.c:1112) without
> first checking that i is a valid index. This is the root cause of
> the bug.
> 
> The syzbot report shows index -1 at the crash site, which means
> l_next_free_rec was 0 at the point of the crash. The crash occurs
> inside ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375), which is
> reached when ocfs2_find_branch_target() returns 1. That return value
> is only produced when l_next_free_rec equals l_count
> (fs/ocfs2/alloc.c:1530). Since l_next_free_rec is 0, l_count must
> also be 0 for this condition to hold.
> 
> The normal inode read path, ocfs2_validate_inode_block()
> (fs/ocfs2/inode.c:1423), does not validate the inline extent list
> fields l_count or l_next_free_rec. A filesystem image with these
> fields set to 0 in an inode's inline extent list therefore passes
> read-time validation without error.
> 

So why not move the check into ocfs2_validate_inode_block()?
Seems that is the right place to do this kind of work.

Thanks,
Joseph

> The syzbot console log shows a separate validation error at mount
> time — "Invalid dinode #17058: Corrupt state (nlink = 0 or mode =
> 0)" — indicating that the mounted filesystem contained at least one
> other corrupted inode. No reproducer was available in the report, so
> the exact mechanism by which the corruption was introduced is not
> known.
> 
> Here is a breakdown of how the crash is triggered:
> 
> 1. A write syscall eventually calls into ocfs2_insert_extent() to
> record the newly allocated cluster in the inode's extent tree.
> 
> 2. ocfs2_insert_extent() determines that the extent tree has no room
> for a new record and calls ocfs2_grow_tree() (fs/ocfs2/alloc.c:1550).
> 
> 3. ocfs2_grow_tree() calls ocfs2_find_branch_target()
> (fs/ocfs2/alloc.c:1561) to walk the tree and find a node with a
> free slot. Because l_tree_depth is 0, the traversal loop
> (fs/ocfs2/alloc.c:1491) does not execute. At
> fs/ocfs2/alloc.c:1530, the condition
> 
> el->l_next_free_rec == el->l_count   /* 0 == 0 */
> 
> evaluates to true, so the function returns 1, indicating the tree
> must grow by shifting its depth.
> 
> 4. Back in ocfs2_grow_tree(), shift is 1, so ocfs2_shift_tree_depth()
> is called (fs/ocfs2/alloc.c:1581).
> 
> 5. ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375) allocates a new
> extent block and copies the root extent list into it. At
> fs/ocfs2/alloc.c:1420, it sets:
> 
> eb_el->l_next_free_rec = root_el->l_next_free_rec;  /* = 0 */
> 
> The copy loop at fs/ocfs2/alloc.c:1421 runs zero iterations since
> l_next_free_rec is 0, so eb_el is left with an empty extent list.
> 
> 6. ocfs2_shift_tree_depth() then calls
> ocfs2_sum_rightmost_rec(eb_el) (fs/ocfs2/alloc.c:1433) to determine
> the cluster count for the new root record.
> 
> 7. Inside ocfs2_sum_rightmost_rec(), i is computed as:
> 
> i = le16_to_cpu(eb_el->l_next_free_rec) - 1 = 0 - 1 = -1
> 
> The subsequent access to el->l_recs[-1] (fs/ocfs2/alloc.c:1112)
> violates the __counted_by_le(l_count) annotation on l_recs[]
> (fs/ocfs2/ocfs2_fs.h:472). __counted_by_le is a macro defined in
> include/linux/compiler_types.h that expands to __counted_by, which
> is the GCC/Clang attribute UBSAN uses for array bounds checking. The
> UBSAN error message reports __counted_by rather than __counted_by_le
> because the check is performed against the expanded attribute. This
> triggers a UBSAN array-index-out-of-bounds panic.
> 
> The Proposed Fix
> 
> The fix modifies ocfs2_sum_rightmost_rec() with three changes.
> 
> First, l_count is read once into a local variable before the bounds
> check. The __counted_by_le(l_count) annotation causes the compiler
> to emit a separate read of el->l_count at the array access site for
> the UBSAN check. Without the local variable, there are two
> independent reads of l_count — one for the guard and one at the
> array access site, where __counted_by_le causes the compiler to
> re-read it for the UBSAN check — creating a TOCTOU between them.
> Reading l_count once before the guard reduces this window to the
> minimum.
> 
> Second, the index is checked against both bounds before dereferencing
> l_recs[]. Both checks are sound: i < 0 compares against the constant
> 0, requiring no trust in any on-disk value; i >= count checks the
> invariant l_next_free_rec <= l_count, which holds on any valid
> filesystem independent of the actual field values. Neither check can
> fire on a valid extent list. Corruption that keeps l_next_free_rec
> within [1, l_count] will pass the check and produce an incorrect
> result, but this is pre-existing behaviour — not a regression — and
> cannot be avoided without an external ground truth for l_count, which
> I do not believe exists for the inode's inline extent list.
> 
> Third, on an out-of-bounds index, ocfs2_error() is called rather than
> silently returning a usable value.
> 
> To accommodate these changes the function signature is updated: a
> struct ocfs2_caching_info *ci parameter is added for superblock
> access, the return type changes from u32 to int, the cluster sum is
> returned via a new u32 out-parameter, and the inline qualifier is
> removed since the function is no longer a trivial helper. Both
> callers, ocfs2_add_branch() and ocfs2_shift_tree_depth(), already
> have status variables and bail labels, and are updated to pass
> et->et_ci, check the return value, and propagate any error up their
> respective call chains.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec
  2026-06-02  1:46 ` Joseph Qi
@ 2026-06-03 12:38   ` Ian Bridges
  0 siblings, 0 replies; 3+ messages in thread
From: Ian Bridges @ 2026-06-03 12:38 UTC (permalink / raw)
  To: Joseph Qi; +Cc: Mark Fasheh, Joel Becker, Joseph Qi, ocfs2-devel, linux-kernel

On Tue, Jun 02, 2026 at 09:46:56AM +0800, Joseph Qi wrote:
> 
> 
> On 6/1/26 11:30 PM, Ian Bridges wrote:
> > [BUG]
> > On-disk corruption setting l_next_free_rec to 0 in an inode's inline
> > extent list triggers a UBSAN panic on the next write to that file.
> > 
> > [CAUSE]
> > ocfs2_sum_rightmost_rec() computes
> > i = le16_to_cpu(el->l_next_free_rec) - 1
> > and accesses el->l_recs[i] without validating i. When l_next_free_rec
> > is 0, i becomes -1; when l_next_free_rec exceeds l_count, i falls
> > past the end of the array. Either case violates the
> > __counted_by_le(l_count) annotation on l_recs[] and triggers UBSAN.
> > 
> > [FIX]
> > Read l_count once into a local variable to eliminate a TOCTOU between
> > the bounds check and the __counted_by_le-generated check at the array
> > access. Validate i against both bounds before dereferencing l_recs[].
> > On an out-of-bounds index call ocfs2_error() since both cases indicate
> > filesystem corruption.
> > 
> > Update the signature to accept a struct ocfs2_caching_info *ci and
> > return int, with the cluster sum returned through a u32 out-parameter.
> > Update both callers to pass et->et_ci and propagate the error.
> > 
> > Fixes: 2f26f58df041 ("ocfs2: annotate flexible array members with __counted_by_le()")
> > Reported-by: syzbot+be16e33db01e6644db7a@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=be16e33db01e6644db7a
> > Signed-off-by: Ian Bridges <icb@fastmail.org>
> > ---
> > This patch contains a proposed fix for a crash reported by syzbot
> > in ocfs2_grow_tree().
> > 
> > The file names and offsets in this description are from commit
> > 7cb1c5b32a2bfde961fff8d5204526b609bcb30a from this repo:
> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
> > 
> > I also have a small test harness that reproduces the original panic,
> > which I can make available as well.
> > 
> > The Bug
> > 
> > The ocfs2_extent_list structure (fs/ocfs2/ocfs2_fs.h:458) contains
> > two fields relevant to this bug: l_count, the total number of extent
> > record slots in the list, and l_next_free_rec, the index of the next
> > unused slot. The extent record array l_recs is annotated
> > __counted_by_le(l_count) (fs/ocfs2/ocfs2_fs.h:472), which instructs
> > UBSAN to bounds-check every access to l_recs against l_count.
> > 
> > ocfs2_sum_rightmost_rec() (fs/ocfs2/alloc.c:1106) is a helper used
> > by both ocfs2_add_branch() and ocfs2_shift_tree_depth() to compute
> > the logical cluster position just past the rightmost occupied extent
> > record. It derives the index of that record as:
> > 
> > i = le16_to_cpu(el->l_next_free_rec) - 1;  /* alloc.c:1110 */
> > 
> > and then accesses el->l_recs[i] (fs/ocfs2/alloc.c:1112) without
> > first checking that i is a valid index. This is the root cause of
> > the bug.
> > 
> > The syzbot report shows index -1 at the crash site, which means
> > l_next_free_rec was 0 at the point of the crash. The crash occurs
> > inside ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375), which is
> > reached when ocfs2_find_branch_target() returns 1. That return value
> > is only produced when l_next_free_rec equals l_count
> > (fs/ocfs2/alloc.c:1530). Since l_next_free_rec is 0, l_count must
> > also be 0 for this condition to hold.
> > 
> > The normal inode read path, ocfs2_validate_inode_block()
> > (fs/ocfs2/inode.c:1423), does not validate the inline extent list
> > fields l_count or l_next_free_rec. A filesystem image with these
> > fields set to 0 in an inode's inline extent list therefore passes
> > read-time validation without error.
> > 
> 
> So why not move the check into ocfs2_validate_inode_block()?
> Seems that is the right place to do this kind of work.
> 
> Thanks,
> Joseph

When I wrote the patch, I tried to place the check as close to the use
to prevent potential TOCTOU issues. However, after reviewing the code
again, I do thing ofcs2_validate_inode_block() is the correct location
for check. I will work on resubmitting an updated patch.

Ian

> 
> > The syzbot console log shows a separate validation error at mount
> > time — "Invalid dinode #17058: Corrupt state (nlink = 0 or mode =
> > 0)" — indicating that the mounted filesystem contained at least one
> > other corrupted inode. No reproducer was available in the report, so
> > the exact mechanism by which the corruption was introduced is not
> > known.
> > 
> > Here is a breakdown of how the crash is triggered:
> > 
> > 1. A write syscall eventually calls into ocfs2_insert_extent() to
> > record the newly allocated cluster in the inode's extent tree.
> > 
> > 2. ocfs2_insert_extent() determines that the extent tree has no room
> > for a new record and calls ocfs2_grow_tree() (fs/ocfs2/alloc.c:1550).
> > 
> > 3. ocfs2_grow_tree() calls ocfs2_find_branch_target()
> > (fs/ocfs2/alloc.c:1561) to walk the tree and find a node with a
> > free slot. Because l_tree_depth is 0, the traversal loop
> > (fs/ocfs2/alloc.c:1491) does not execute. At
> > fs/ocfs2/alloc.c:1530, the condition
> > 
> > el->l_next_free_rec == el->l_count   /* 0 == 0 */
> > 
> > evaluates to true, so the function returns 1, indicating the tree
> > must grow by shifting its depth.
> > 
> > 4. Back in ocfs2_grow_tree(), shift is 1, so ocfs2_shift_tree_depth()
> > is called (fs/ocfs2/alloc.c:1581).
> > 
> > 5. ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375) allocates a new
> > extent block and copies the root extent list into it. At
> > fs/ocfs2/alloc.c:1420, it sets:
> > 
> > eb_el->l_next_free_rec = root_el->l_next_free_rec;  /* = 0 */
> > 
> > The copy loop at fs/ocfs2/alloc.c:1421 runs zero iterations since
> > l_next_free_rec is 0, so eb_el is left with an empty extent list.
> > 
> > 6. ocfs2_shift_tree_depth() then calls
> > ocfs2_sum_rightmost_rec(eb_el) (fs/ocfs2/alloc.c:1433) to determine
> > the cluster count for the new root record.
> > 
> > 7. Inside ocfs2_sum_rightmost_rec(), i is computed as:
> > 
> > i = le16_to_cpu(eb_el->l_next_free_rec) - 1 = 0 - 1 = -1
> > 
> > The subsequent access to el->l_recs[-1] (fs/ocfs2/alloc.c:1112)
> > violates the __counted_by_le(l_count) annotation on l_recs[]
> > (fs/ocfs2/ocfs2_fs.h:472). __counted_by_le is a macro defined in
> > include/linux/compiler_types.h that expands to __counted_by, which
> > is the GCC/Clang attribute UBSAN uses for array bounds checking. The
> > UBSAN error message reports __counted_by rather than __counted_by_le
> > because the check is performed against the expanded attribute. This
> > triggers a UBSAN array-index-out-of-bounds panic.
> > 
> > The Proposed Fix
> > 
> > The fix modifies ocfs2_sum_rightmost_rec() with three changes.
> > 
> > First, l_count is read once into a local variable before the bounds
> > check. The __counted_by_le(l_count) annotation causes the compiler
> > to emit a separate read of el->l_count at the array access site for
> > the UBSAN check. Without the local variable, there are two
> > independent reads of l_count — one for the guard and one at the
> > array access site, where __counted_by_le causes the compiler to
> > re-read it for the UBSAN check — creating a TOCTOU between them.
> > Reading l_count once before the guard reduces this window to the
> > minimum.
> > 
> > Second, the index is checked against both bounds before dereferencing
> > l_recs[]. Both checks are sound: i < 0 compares against the constant
> > 0, requiring no trust in any on-disk value; i >= count checks the
> > invariant l_next_free_rec <= l_count, which holds on any valid
> > filesystem independent of the actual field values. Neither check can
> > fire on a valid extent list. Corruption that keeps l_next_free_rec
> > within [1, l_count] will pass the check and produce an incorrect
> > result, but this is pre-existing behaviour — not a regression — and
> > cannot be avoided without an external ground truth for l_count, which
> > I do not believe exists for the inode's inline extent list.
> > 
> > Third, on an out-of-bounds index, ocfs2_error() is called rather than
> > silently returning a usable value.
> > 
> > To accommodate these changes the function signature is updated: a
> > struct ocfs2_caching_info *ci parameter is added for superblock
> > access, the return type changes from u32 to int, the cluster sum is
> > returned via a new u32 out-parameter, and the inline qualifier is
> > removed since the function is no longer a trivial helper. Both
> > callers, ocfs2_add_branch() and ocfs2_shift_tree_depth(), already
> > have status variables and bail labels, and are updated to pass
> > et->et_ci, check the return value, and propagate any error up their
> > respective call chains.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-03 12:38 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-01 15:30 [PATCH] ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec Ian Bridges
2026-06-02  1:46 ` Joseph Qi
2026-06-03 12:38   ` Ian Bridges

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.