linux-xfs.vger.kernel.org archive mirror
* [BUG] Five data races in the XFS filesystem, one potentially harmful
@ 2025-05-13 12:25 cen zhang
From: cen zhang @ 2025-05-13 12:25 UTC
  To: cem; +Cc: linux-xfs, linux-kernel, baijiaju1990, zhenghaoran154

Hello maintainers,

I would like to report five data race bugs we discovered in the XFS
filesystem on Linux kernel v6.14-rc4. These issues were identified
using our in-kernel data race detector.

Among the five races, we believe that four may be benign and could be
annotated using `data_race()` to suppress false positives from
analysis tools. However, one race involves shared global state or
critical memory, and its effects are unclear. We would appreciate
your evaluation of whether these races should be fixed or annotated.

Below is a summary of the findings:

---

Benign Races
============

1. Race in `xfs_bmapi_reserve_delalloc()` and `xfs_vn_getattr()`
----------------------------------------------------------------

A data race on `ip->i_delayed_blks`.

============ DATARACE ============
Function: xfs_bmapi_reserve_delalloc+0x292c/0x2fd0 fs/xfs/xfs_bmap.c:4138
Function: xfs_buffered_write_iomap_begin+0x27bb/0x3bc0 fs/xfs/xfs_iomap.c:1197
Function: iomap_iter+0x572/0xad0
Function: iomap_file_buffered_write+0x23a/0xd10
Function: xfs_file_buffered_write+0x66b/0x2000 fs/xfs/xfs_file.c:792
Function: xfs_file_write_iter+0x129e/0x19f0 fs/xfs/xfs_file.c:881
Function: do_iter_readv_writev+0x4d6/0x720
Function: vfs_writev+0x348/0xc20
Function: do_writev+0x129/0x280
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
Function: 0x0
========================
Function: xfs_vn_getattr+0x13c4/0x4c40 fs/xfs/xfs_iops.c:645
Function: vfs_fstat+0x239/0x2d0
Function: __se_sys_newfstat+0x47/0x6b0
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
=================END==============

2. Race on `xfs_trans_ail_update_bulk` in `xfs_inode_item_format`
-----------------------------------------------------------------

We observed unsynchronized access to `lip->li_lsn`, which may exhibit
store/load tearing. However, we did not observe any symptoms
indicating harmful behavior.

============ DATARACE ============
Function: xfs_trans_ail_update_bulk+0xac0/0x25d0 fs/xfs/xfs_trans_ail.c:832
Function: xlog_cil_ail_insert_batch fs/xfs/xfs_log_cil.c:703 [inline]
Function: xlog_cil_ail_insert fs/xfs/xfs_log_cil.c:857 [inline]
Function: xlog_cil_committed+0x1e23/0x3220 fs/xfs/xfs_log_cil.c:904
Function: xlog_cil_process_committed+0x4d8/0x6a0 fs/xfs/xfs_log_cil.c:934
Function: xlog_state_do_callback+0xe52/0x1d70 fs/xfs/xfs_log.c:2525
Function: xlog_state_done_syncing+0x264/0x540 fs/xfs/xfs_log.c:2603
Function: xlog_ioend_work+0x24e/0x320 fs/xfs/xfs_log.c:1247
Function: process_scheduled_works+0x6c7/0xea0
Function: worker_thread+0xaac/0x1010
Function: kthread+0x341/0x760
Function: ret_from_fork+0x4d/0x80
Function: ret_from_fork_asm+0x1a/0x30
============OTHER_INFO============
Function: xfs_inode_item_format+0xe6e/0x6c00 fs/xfs/xfs_inode_item.c:637
Function: xlog_cil_commit+0x39ce/0xa1e0 fs/xfs/xfs_log_cil.c:513
Function: __xfs_trans_commit+0xa3b/0x23f0 fs/xfs/xfs_trans.c:896
Function: xfs_trans_commit+0x494/0x690 fs/xfs/xfs_trans.c:954
Function: xfs_setattr_nonsize+0x1c24/0x2e60 fs/xfs/xfs_iops.c:802
Function: xfs_setattr_size+0x628/0x2610 fs/xfs/xfs_iops.c:877
Function: xfs_vn_setattr_size+0x3ac/0x6a0 fs/xfs/xfs_iops.c:1054
Function: xfs_vn_setattr+0x43b/0xaf0 fs/xfs/xfs_iops.c:1079
Function: notify_change+0x9f9/0xca0
Function: do_truncate+0x18d/0x220
Function: path_openat+0x2741/0x2db0
Function: do_filp_open+0x230/0x440
Function: do_sys_openat2+0xab/0x110
Function: __x64_sys_creat+0xd7/0x100
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
=================END==============

3. Race on `pag->pagf_freeblks`
-------------------------------

Although the accesses are concurrent, this race is unlikely to affect
correctness.

============ DATARACE ============
Function: xfs_alloc_longest_free_extent+0x164/0x580 fs/xfs/libxfs/xfs_alloc.c:2427
Function: xfs_bmapi_allocate+0x4349/0xb410 fs/xfs/libxfs/xfs_bmap.c:3314
Function: xfs_bmapi_write+0x2594/0x54b0 fs/xfs/libxfs/xfs_bmap.c:4551
Function: xfs_attr_rmtval_set_blk+0x496/0x9c0 fs/xfs/libxfs/xfs_attr_remote.c:633
Function: xfs_attr_set_iter+0x60e/0xf730 fs/xfs/libxfs/xfs_attr.c:637
Function: xfs_attr_finish_item+0x329/0xa00 fs/xfs/xfs_attr_item.c:505
Function: xfs_defer_finish_one+0x109d/0x28b0 fs/xfs/libxfs/xfs_defer.c:595
Function: xfs_defer_finish_noroll+0x1d91/0x4130 fs/xfs/libxfs/xfs_defer.c:707
Function: xfs_trans_commit+0x392/0x690 fs/xfs/xfs_trans.c:947
Function: xfs_attr_set+0x2a70/0x3e80 fs/xfs/libxfs/xfs_attr.c:1150
Function: xfs_attr_change+0xc03/0x10a0 fs/xfs/xfs_xattr.c:128
Function: xfs_xattr_set+0x535/0x870 fs/xfs/xfs_xattr.c:186
Function: __vfs_setxattr+0x3b6/0x3f0
Function: __vfs_setxattr_noperm+0x115/0x5c0
Function: vfs_setxattr+0x165/0x300
Function: file_setxattr+0x1a9/0x280
Function: path_setxattrat+0x2f4/0x370
Function: __x64_sys_fsetxattr+0xbc/0xe0
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
Function: 0x0
============OTHER_INFO============
Function: xfs_alloc_update_counters+0x238/0x720 fs/xfs/libxfs/xfs_alloc.c:908
Function: xfs_free_ag_extent+0x22e7/0x4f10 fs/xfs/libxfs/xfs_alloc.c:2363
Function: __xfs_free_extent+0x747/0xba0 fs/xfs/libxfs/xfs_alloc.c:4025
Function: xfs_extent_free_finish_item+0x8be/0x18f0 fs/xfs/xfs_extfree_item.c:541
Function: xfs_defer_finish_one+0x109d/0x28b0 fs/xfs/libxfs/xfs_defer.c:595
Function: xfs_defer_finish_noroll+0x1d91/0x4130 fs/xfs/libxfs/xfs_defer.c:707
Function: xfs_defer_finish+0x3e/0x590 fs/xfs/libxfs/xfs_defer.c:741
Function: xfs_bunmapi_range+0x1fe/0x380 fs/xfs/libxfs/xfs_bmap.c:6443
Function: xfs_itruncate_extents_flags+0x660/0x1420 fs/xfs/xfs_inode.c:1066
Function: xfs_itruncate_extents fs/xfs/xfs_inode.h:603 [inline]
Function: xfs_setattr_size+0x12f1/0x2610 fs/xfs/xfs_iops.c:1003
Function: xfs_vn_setattr_size+0x3ac/0x6a0 fs/xfs/xfs_iops.c:1054
Function: xfs_vn_setattr+0x43b/0xaf0 fs/xfs/xfs_iops.c:1079
Function: notify_change+0x9f9/0xca0
Function: do_truncate+0x18d/0x220
Function: path_openat+0x2741/0x2db0
Function: do_filp_open+0x230/0x440
Function: do_sys_openat2+0xab/0x110
Function: __x64_sys_open+0x18d/0x1c0
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
=================END==============

4. Race on `pag->pagf_longest`
------------------------------

Similar to the previous race, this field appears to be safely used
under current access patterns.

============ DATARACE ============
Function: xfs_alloc_fixup_longest+0x3d0/0x760 fs/xfs/libxfs/xfs_alloc.c:555
Function: xfs_alloc_fixup_trees+0x1331/0x2190 fs/xfs/libxfs/xfs_alloc.c:757
Function: xfs_alloc_cur_finish+0x3d1/0xd40 fs/xfs/libxfs/xfs_alloc.c:1119
Function: xfs_alloc_ag_vextent_near+0x38b2/0x46a0 fs/xfs/libxfs/xfs_alloc.c:1778
Function: xfs_alloc_vextent_iterate_ags+0xcef/0x1400 fs/xfs/libxfs/xfs_alloc.c:3741
Function: xfs_alloc_vextent_start_ag+0x830/0x14d0 fs/xfs/libxfs/xfs_alloc.c:3816
Function: xfs_bmapi_allocate+0x5016/0xb410 fs/xfs/libxfs/xfs_bmap.c:3764
Function: xfs_bmapi_write+0x2594/0x54b0 fs/xfs/libxfs/xfs_bmap.c:4551
Function: xfs_iomap_write_direct+0x7fc/0x1310 fs/xfs/xfs_iomap.c:322
Function: xfs_direct_write_iomap_begin+0x3278/0x42a0 fs/xfs/xfs_iomap.c:933
Function: iomap_iter+0x572/0xad0
Function: __iomap_dio_rw+0xbc1/0x1e50
Function: iomap_dio_rw+0x46/0xa0
Function: xfs_file_dio_write_unaligned+0x6cc/0x1030 fs/xfs/xfs_file.c:692
Function: xfs_file_write_iter+0x1403/0x19f0 fs/xfs/xfs_file.c:725
Function: do_iter_readv_writev+0x4d6/0x720
Function: vfs_writev+0x348/0xc20
Function: do_writev+0x129/0x280
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
Function: 0x0
============OTHER_INFO============
Function: xfs_alloc_longest_free_extent+0x1f9/0x580 fs/xfs/libxfs/xfs_alloc.c:2427
Function: xfs_bmapi_allocate+0x4349/0xb410 fs/xfs/libxfs/xfs_bmap.c:3314
Function: xfs_bmapi_write+0x2594/0x54b0 fs/xfs/libxfs/xfs_bmap.c:4551
Function: xfs_attr_rmtval_set_blk+0x496/0x9c0 fs/xfs/libxfs/xfs_attr_remote.c:633
Function: xfs_attr_set_iter+0x60e/0xf730 fs/xfs/libxfs/xfs_attr.c:637
Function: xfs_attr_finish_item+0x329/0xa00 fs/xfs/xfs_attr_item.c:505
Function: xfs_defer_finish_one+0x109d/0x28b0 fs/xfs/libxfs/xfs_defer.c:595
Function: xfs_defer_finish_noroll+0x1d91/0x4130 fs/xfs/libxfs/xfs_defer.c:707
Function: xfs_trans_commit+0x392/0x690 fs/xfs/xfs_trans.c:947
Function: xfs_attr_set+0x2a70/0x3e80 fs/xfs/libxfs/xfs_attr.c:1150
Function: xfs_attr_change+0xc03/0x10a0 fs/xfs/xfs_xattr.c:128
Function: xfs_xattr_set+0x535/0x870 fs/xfs/xfs_xattr.c:186
Function: __vfs_setxattr+0x3b6/0x3f0
Function: __vfs_setxattr_noperm+0x115/0x5c0
Function: vfs_setxattr+0x165/0x300
Function: file_setxattr+0x1a9/0x280
Function: path_setxattrat+0x2f4/0x370
Function: __x64_sys_fsetxattr+0xbc/0xe0
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
=================END==============
---

Possibly Harmful Race
=====================

1. Race on `bp->b_addr` between `xfs_buf_map_pages()` and `xfs_buf_offset()`
----------------------------------------------------------------------------

Concurrent access to `bp->b_addr` occurs during buffer preparation and
use. Since this involves pointer manipulation of page mappings,
store/load tearing or unexpected reuse might lead to memory corruption
or invalid log item formats. We are not confident in classifying this
race as benign or harmful and would appreciate your guidance on
whether it should be fixed or annotated.

============ DATARACE ============
Function: _xfs_buf_map_pages+0x881/0xd20 fs/xfs/xfs_buf.c:450
Function: xfs_buf_get_map+0x1cf3/0x38d0 fs/xfs/xfs_buf.c:767
Function: xfs_buf_read_map+0x1f2/0x1d80 fs/xfs/xfs_buf.c:863
Function: xfs_trans_read_buf_map+0x3c4/0x1dd0 fs/xfs/xfs_trans_buf.c:304
Function: xfs_imap_to_bp+0x415/0x8c0 fs/xfs/xfs_trans.h:212
Function: xfs_inode_item_precommit+0x1555/0x2780 fs/xfs/xfs_inode_item.c:188
Function: __xfs_trans_commit+0x7d7/0x23f0 fs/xfs/xfs_trans.c:826
Function: xfs_trans_commit+0x494/0x690 fs/xfs/xfs_trans.c:954
Function: xfs_create+0x21d8/0x2fe0 fs/xfs/xfs_inode.c:753
Function: xfs_generic_create+0x188b/0x2d90 fs/xfs/xfs_iops.c:215
Function: xfs_vn_create+0x50/0x70 fs/xfs/xfs_iops.c:298
Function: path_openat+0x1190/0x2db0
Function: do_filp_open+0x230/0x440
Function: do_sys_openat2+0xab/0x110
Function: __x64_sys_open+0x18d/0x1c0
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
Function: 0x0
============OTHER_INFO============
Function: xfs_buf_offset+0xbd/0x450 fs/xfs/xfs_buf.c:1676
Function: xfs_inode_item_format+0x2854/0x6c00 fs/xfs/xfs_inode_item.c:533
Function: xlog_cil_commit+0x39ce/0xa1e0 fs/xfs/xfs_log_cil.c:513
Function: __xfs_trans_commit+0xa3b/0x23f0 fs/xfs/xfs_trans.c:896
Function: xfs_trans_commit+0x494/0x690 fs/xfs/xfs_trans.c:954
Function: xfs_setattr_nonsize+0x1c24/0x2e60 fs/xfs/xfs_iops.c:802
Function: xfs_vn_setattr+0x678/0xaf0 fs/xfs/xfs_iops.c:1086
Function: notify_change+0x9f9/0xca0
Function: chmod_common+0x1fe/0x410
Function: __x64_sys_fchmod+0xd4/0x130
Function: do_syscall_64+0xc9/0x1a0
Function: entry_SYSCALL_64_after_hwframe+0x77/0x7f
=================END==============

---

Thank you for your attention to these matters.

Best regards,
Cen Zhang


* Re: [BUG] Five data races in the XFS filesystem, one potentially harmful
From: Dave Chinner @ 2025-05-13 21:56 UTC
  To: cen zhang; +Cc: cem, linux-xfs, linux-kernel, baijiaju1990, zhenghaoran154

On Tue, May 13, 2025 at 08:25:49PM +0800, cen zhang wrote:
> Hello maintainers,
> 
> I would like to report five data race bugs we discovered in the XFS
> filesystem on Linux kernel v6.14-rc4. These issues were identified
> using our in-kernel data race detector.
> 
> Among the five races, we believe that four may be benign and could be
> annotated using `data_race()` to suppress false positives from
> analysis tools. However, one race involves shared global state or
> critical memory, and its effects are unclear. We would appreciate
> your evaluation of whether these races should be fixed or annotated.
> 
> Below is a summary of the findings:
> 
> ---
> 
> Benign Races
> ============
> 
> 1. Race in `xfs_bmapi_reserve_delalloc()` and  `xfs_vn_getattr()`
> ----------------------------------------------------------------
> 
> A data race on `ip->i_delayed_blks`.

Not a bug. xfs_vn_getattr() runs unlocked as per the Linux VFS
design. -Everything- that is accessed in xfs_vn_getattr() is a data
race.

> 2. Race on `xfs_trans_ail_update_bulk` in `xfs_inode_item_format`
> -----------------------------------------------------------------
> 
> We observed unsynchronized access to `lip->li_lsn`, which may exhibit
> store/load tearing. However, we did not observe any symptoms
> indicating harmful behavior.

Not a bug. The lsn in the log_dinode is garbage and not used
during recovery - it's mainly there as potential debug information.

> 3. Race on `pag->pagf_freeblks`
> -------------------------------
> 
> Although the accesses are concurrent, this race is unlikely to
> affect correctness.

It's an optimistic check done knowing that we don't hold locks and
it can race. The code is explicitly designed this way. Every other
pagf variable used in these algorithms is also racy.

> 4. Race on `pag->pagf_longest`
> ------------------------------
> 
> Similar to the previous race, this field appears to be safely used
> under current access patterns.

Like this one.

> Possibly Harmful Race
> ======================
> 
> 1. Race on `bp->b_addr` between `xfs_buf_map_pages()` and `xfs_buf_offset()`
> ----------------------------------------------------------------------------
> 
> Concurrent access to bp->b_addr happens during buffer preparation and
> usage. Since this is pointer manipulation of page mappings, store/load
> tearing or unexpected reuse might lead to memory corruption or invalid
> log item formats. We are not confident in classifying this race as
> benign or harmful and would appreciate your guidance on whether it
> should be fixed or annotated.

Impossible. It should be obvious that an object undergoing
instantiation can't have a locked, referenced access from some other
object in some other code....

As I said in my reply to Christoph's patch - there are going to be
-hundreds- of these sorts of false positives in XFS. Unless there is a
commitment to annotate every single false positive and address the
"data race by design" methods within the VFS architecture, there's
little point in playing an ongoing game of whack-a-mole with these.

Have the policy discussion and obtain a commitment to fixing the many,
many false positives that this verification bot will report before
dumping many, many false positive reports on the list.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [BUG] Five data races in the XFS filesystem, one potentially harmful
From: Christoph Hellwig @ 2025-05-14 13:17 UTC
  To: cen zhang; +Cc: cem, linux-xfs, linux-kernel, baijiaju1990, zhenghaoran154

> 1. Race in `xfs_bmapi_reserve_delalloc()` and  `xfs_vn_getattr()`
> ----------------------------------------------------------------
> 
> A data race on `ip->i_delayed_blks`.

This is indeed a case for data_race as getattr is just reporting without any
locks.  Can you send a patch?
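
Something like the following, assuming the stat path still derives
the block count from ip->i_delayed_blks (illustrative only, the exact
expression in xfs_vn_getattr() may differ):

	/* unlocked stat path: annotate the intentionally racy read */
	stat->blocks = XFS_FSB_TO_BB(mp,
			ip->i_nblocks + data_race(ip->i_delayed_blks));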

> 2. Race on `xfs_trans_ail_update_bulk` in `xfs_inode_item_format`
> -----------------------------------------------------------------
> 
> We observed unsynchronized access to `lip->li_lsn`, which may exhibit
> store/load tearing. However, we did not observe any symptoms
> indicating harmful behavior.

I think we'll need READ_ONCE/WRITE_ONCE here to be safe on 64-bit
systems to avoid tearing/reordering.  But that still won't help
with 32-bit systems.  Other lsn fields use an atomic64_t, which
is pretty heavy-handed.
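
A minimal sketch of that annotation (illustrative only, not a tested
patch):

	/* writer, already serialised by ailp->ail_lock */
	WRITE_ONCE(lip->li_lsn, lsn);

	/* unlocked reader; a 64-bit read can still tear on 32-bit */
	xfs_lsn_t lsn = READ_ONCE(lip->li_lsn);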

> Function: xfs_alloc_longest_free_extent+0x164/0x580

> Function: xfs_alloc_update_counters+0x238/0x720 fs/xfs/libxfs/xfs_alloc.c:908

Both of these should be called with b_sema held.  Does your tool
treat a semaphore with an initial count of 1 as a lock?  That is still
a pattern in Linux as mutexes don't allow non-owner unlocks.
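
For reference, the pattern in question is roughly:

	struct semaphore sem;

	sema_init(&sem, 1);	/* binary semaphore used as a sleeping lock */
	down(&sem);		/* "lock" */
	/* ... critical section ... */
	up(&sem);		/* "unlock"; legal from a non-owner, unlike a mutex */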

> 4. Race on `pag->pagf_longest`
> ------------------------------
> 
> Similar to the previous race, this field appears to be safely used
> under current access patterns.

Same here I think.
> 
> Possibly Harmful Race
> ======================
> 
> 1. Race on `bp->b_addr` between `xfs_buf_map_pages()` and `xfs_buf_offset()`
> ----------------------------------------------------------------------------

And I think this is another case of b_sema synchronization unless I'm
missing something.



* Re: [BUG] Five data races in the XFS filesystem, one potentially harmful
From: Dave Chinner @ 2025-05-14 22:16 UTC
  To: Christoph Hellwig
  Cc: cen zhang, cem, linux-xfs, linux-kernel, baijiaju1990,
	zhenghaoran154

On Wed, May 14, 2025 at 06:17:15AM -0700, Christoph Hellwig wrote:
> > 1. Race in `xfs_bmapi_reserve_delalloc()` and  `xfs_vn_getattr()`
> > ----------------------------------------------------------------
> > 
> > A data race on `ip->i_delayed_blks`.
> 
> This is indeed a case for data_race as getattr is just reporting without any
> locks.  Can you send a patch?

No, please don't play data_race() whack-a-mole with
xfs_vn_getattr().

Please introduce infrastructure that allows us to mark entire
functions with something like __attribute__(data_race) so the
sanitiser infrastructure knows that all accesses within that
function are known to be potentially racy and should not be warned
about.

We can then mark every ->getattr method in every filesystem the same
way in a single patch, and knock out that entire class of false
positive in one hit. That's a much more efficient way of dealing
with this problem than one false positive at a time.
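
(KCSAN already has a function-wide opt-out, the __no_kcsan attribute,
which could be a starting point. An illustrative, untested sketch:)

	/* exempt the whole intentionally-unlocked stat path from KCSAN */
	STATIC int __no_kcsan
	xfs_vn_getattr(
		struct mnt_idmap	*idmap,
		const struct path	*path,
		struct kstat		*stat,
		u32			request_mask,
		unsigned int		query_flags)
	{
		/* ... body unchanged ... */
	}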

> > 2. Race on `xfs_trans_ail_update_bulk` in `xfs_inode_item_format`
> > -------------------------------------.
> > 
> > We observed unsynchronized access to `lip->li_lsn`, which may exhibit
> > store/load tearing. However, we did not observe any symptoms
> > indicating harmful behavior.
> 
> I think we'll need READ_ONCE/WRITE_ONCE here to be safe on 64-bit
> systems to avoid tearing/reordering.  But that still won't help
> with 32-bit systems.

We had problems with LSN tearing on 32 bit systems 20-odd years ago
back at SGI on MIPS Irix systems. This is why
xfs_trans_ail_copy_lsn() exists - LSNs are only ever updated under
the AIL lock, so any read that might result in a critical tear (e.g.
flush lsn tracking in inodes and dquots) was done under the AIL
lock on 32 bit systems.
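
(From memory, the helper is roughly this shape:)

	static inline void
	xfs_trans_ail_copy_lsn(struct xfs_ail *ailp, xfs_lsn_t *dst,
			xfs_lsn_t *src)
	{
	#if BITS_PER_LONG != 64
		/* 32 bit: a 64-bit copy can tear, so serialise it */
		spin_lock(&ailp->ail_lock);
		*dst = *src;
		spin_unlock(&ailp->ail_lock);
	#else
		/* 64 bit: an aligned 64-bit load/store cannot tear */
		*dst = *src;
	#endif
	}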

> Other lsn fields use an atomic64_t, which
> is pretty heavy-handed.

They use atomic64_t because they aren't protected by a specific lock
anymore. This was not done for torn read/write avoidance, but for
scalability optimisation. There is no reason for lip->li_lsn to be
an atomic, as all updates are done under the same serialising lock
(the ail->ail_lock).

As for reordering, nothing that is reading the lip->li_lsn should be
doing so in a place where compiler reordering should make any
difference. It's only updated in two places (set on AIL insert,
cleared on AIL remove) and neither of these two things will race
with readers using the lsn for fsync/formatting/verifier purposes.

I think that even the old use of xfs_trans_ail_copy_lsn() is likely no
longer necessary because flushing of dquots/inodes and reading the
LSN are now fully gated on the objects being locked and unpinned. The
LSN updates occur whilst the object is pinned and pinning can
only occur whilst the object is locked. Hence we -cannot- be doing
simultaneous lip->li_lsn updates and reading lip->li_lsn for
formatting purposes....

We extensively copy LSNs into the V5 on disk format with "racy"
reads and these on disk LSNs are critical to correct recovery
processing. If torn lip->li_lsn reads are actually happening then we
should be seeing this in random whacky recovery failures on
platforms where this happens. The V5 format has been around for well
over a decade now, so we should have seen some evidence of this if
torn LSN reads were actually a real world problem.

> > Function: xfs_alloc_longest_free_extent+0x164/0x580
> 
> > Function: xfs_alloc_update_counters+0x238/0x720 fs/xfs/libxfs/xfs_alloc.c:908
> 
> Both of these should be called with b_sema held.

Definitely not. Yes xfs_alloc_update_counters() must be called with
the AGF locked, but that's because it's -modifying the AGF-. The
update of the perag piggybacks on this so we don't lose writes. i.e.
we get write vs write serialisation here, we are explicitly not
trying to provide write vs read serialisation.

That's because the AG selection algorithm in
xfs_bmap_btalloc_select_lengths() is an optimistic, unlocked
algorithm. It always has been. It uses the in-memory
pag variables first to select an AG, and we don't care if we race
with an ongoing allocation, because if we select that AG we will
recheck the selection (i.e. the free space info in the pag) once
we've locked the AGF in xfs_alloc_fix_freelist().

IOWs, this is simply another unlocked check, lock, check again
pattern. It's a lot further apart than your typical single logic
statement like:

	if (foo) {
		lock(&foo_lock);
		if (foo) {
			/* do something */
		}
		unlock(&foo_lock);
	}

But it is exactly the same logic pattern where no binding decision
is made until all the correct locks are held.

As I've already said - the patterns are endemic in XFS. They may not
be as obvious as the common if - lock - if structure above, but
that's just the simplest form of this common lock avoidance pattern.

IOWs, these data races need a lot more careful analysis and
knowledge of what problem the unlocked reads are solving to
determine what the correct fix might be.

To me, having to add READ_ONCE() or data_race() - and maybe comments
- to hundreds of variable accesses across the code base adds noise
without really adding anything of value. This isn't finding new
bugs - it's largely crying wolf about structures and algorithms we
intentionally designed to work this way long ago...

> Does your tool
> treat a semaphore with an initial count of 1 as a lock?  That is still
> a pattern in Linux as mutexes don't allow non-owner unlocks.

If it did, then the bp->b_addr init race wouldn't have been flagged
as an issue.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

