* iwalk fixes @ 2024-08-12 5:22 Christoph Hellwig 2024-08-12 5:23 ` [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag Christoph Hellwig 2024-08-12 5:23 ` [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore Christoph Hellwig 0 siblings, 2 replies; 7+ messages in thread From: Christoph Hellwig @ 2024-08-12 5:22 UTC (permalink / raw) To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs Hi all, I noticed minor issues with skipping RCU freed inodes that end up being reused in other AGs. I've not actuall reproduced these issues, this is purely based on looking over the code for another little project. Diffstat: scrub/agheader_repair.c | 28 +++++++--------------------- xfs_icache.c | 11 +++++++---- 2 files changed, 14 insertions(+), 25 deletions(-) ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag 2024-08-12 5:22 iwalk fixes Christoph Hellwig @ 2024-08-12 5:23 ` Christoph Hellwig 2024-08-12 17:39 ` Darrick J. Wong 2024-08-14 21:50 ` Dave Chinner 2024-08-12 5:23 ` [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore Christoph Hellwig 1 sibling, 2 replies; 7+ messages in thread From: Christoph Hellwig @ 2024-08-12 5:23 UTC (permalink / raw) To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs When xfs_icwalk_ag skips an inode because it was RCU freed from another AG, the slot for the inode in the batch array needs to be zeroed. We also really shouldn't try to grab the inode in that case (or at very least undo the grab), so move the call to xfs_icwalk_ag after this sanity check. Fixes: 1a3e8f3da09c ("xfs: convert inode cache lookups to use RCU locking") Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_icache.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index ae3c049fd3a216..3ee92d3d1770db 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1701,9 +1701,6 @@ xfs_icwalk_ag( for (i = 0; i < nr_found; i++) { struct xfs_inode *ip = batch[i]; - if (done || !xfs_icwalk_igrab(goal, ip, icw)) - batch[i] = NULL; - /* * Update the index for the next lookup. Catch * overflows into the next AG range which can occur if @@ -1716,8 +1713,14 @@ xfs_icwalk_ag( * us to see this inode, so another lookup from the * same index will not find it again. */ - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { + batch[i] = NULL; continue; + } + + if (done || !xfs_icwalk_igrab(goal, ip, icw)) + batch[i] = NULL; + first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1); if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) done = true; -- 2.43.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag 2024-08-12 5:23 ` [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag Christoph Hellwig @ 2024-08-12 17:39 ` Darrick J. Wong 2024-08-14 21:50 ` Dave Chinner 1 sibling, 0 replies; 7+ messages in thread From: Darrick J. Wong @ 2024-08-12 17:39 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs On Mon, Aug 12, 2024 at 07:23:00AM +0200, Christoph Hellwig wrote: > When xfs_icwalk_ag skips an inode because it was RCU freed from another > AG, the slot for the inode in the batch array needs to be zeroed. We > also really shouldn't try to grab the inode in that case (or at very > least undo the grab), so move the call to xfs_icwalk_ag after this sanity > check. > > Fixes: 1a3e8f3da09c ("xfs: convert inode cache lookups to use RCU locking") > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_icache.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > index ae3c049fd3a216..3ee92d3d1770db 100644 > --- a/fs/xfs/xfs_icache.c > +++ b/fs/xfs/xfs_icache.c > @@ -1701,9 +1701,6 @@ xfs_icwalk_ag( > for (i = 0; i < nr_found; i++) { > struct xfs_inode *ip = batch[i]; > > - if (done || !xfs_icwalk_igrab(goal, ip, icw)) > - batch[i] = NULL; > - > /* > * Update the index for the next lookup. Catch > * overflows into the next AG range which can occur if > @@ -1716,8 +1713,14 @@ xfs_icwalk_ag( > * us to see this inode, so another lookup from the > * same index will not find it again. > */ > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { > + batch[i] = NULL; > continue; > + } > + > + if (done || !xfs_icwalk_igrab(goal, ip, icw)) > + batch[i] = NULL; IOWs, if @ip has been freed and reallocated to a different AG, then we don't want to touch it at all, not even to check IRECLAIMABLE in igrab. I think that sounds correct so Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + > first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1); > if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) > done = true; > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag 2024-08-12 5:23 ` [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag Christoph Hellwig 2024-08-12 17:39 ` Darrick J. Wong @ 2024-08-14 21:50 ` Dave Chinner 1 sibling, 0 replies; 7+ messages in thread From: Dave Chinner @ 2024-08-14 21:50 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs On Mon, Aug 12, 2024 at 07:23:00AM +0200, Christoph Hellwig wrote: > When xfs_icwalk_ag skips an inode because it was RCU freed from another > AG, the slot for the inode in the batch array needs to be zeroed. We > also really shouldn't try to grab the inode in that case (or at very > least undo the grab), so move the call to xfs_icwalk_ag after this sanity > check. I think this change is invalid and needs to be reworked. It moves the actual "has the inode been RCU freed in this grace period" checks to after some other check on the inode is performed. From that perspective, this is a bad change to make. From another perspective, this is a poor change to make because.... > Fixes: 1a3e8f3da09c ("xfs: convert inode cache lookups to use RCU locking") > Signed-off-by: Christoph Hellwig <hch@lst.de> ... that commit and the current code isn't actually broken. In commit 1a3e8f3da09c, xfs_inode_ag_walk_grab() added explicit checks against using RCU freed inodes, and that nulls out the inode in the batch array: /* * check for stale RCU freed inode * * If the inode has been reallocated, it doesn't matter if it's not in * the AG we are walking - we are walking for writeback, so if it * passes all the "valid inode" checks and is dirty, then we'll write * it back anyway. If it has been reallocated and still being * initialised, the XFS_INEW check below will catch it. */ spin_lock(&ip->i_flags_lock); if (!ip->i_ino) goto out_unlock_noent; Indeed, xfs_icwalk_igrab() -> xfs_blockgc_igrab() still has this check: /* Check for stale RCU freed inode */ spin_lock(&ip->i_flags_lock); if (!ip->i_ino) goto out_unlock_noent; However, that commit used a different check for reclaimable inodes; it uses XFS_IRECLAIM to indicate that the inode is either being reclaimed or has been RCU freed and so reclaim should not touch it. xfs_reclaim_igrab() still retains that check: spin_lock(&ip->i_flags_lock); if (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) || __xfs_iflags_test(ip, XFS_IRECLAIM)) { /* not a reclaim candidate. */ spin_unlock(&ip->i_flags_lock); return false; } XFS_IRECLAIM is always set on RCU freed inodes (it's set at the same time ip->i_ino is set to zero in xfs_reclaim_inode()) before the inode is rcu freed, and so neither of these grab functions will allow RCU freed inodes to be grabbed. Hence: > --- > fs/xfs/xfs_icache.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c > index ae3c049fd3a216..3ee92d3d1770db 100644 > --- a/fs/xfs/xfs_icache.c > +++ b/fs/xfs/xfs_icache.c > @@ -1701,9 +1701,6 @@ xfs_icwalk_ag( > for (i = 0; i < nr_found; i++) { > struct xfs_inode *ip = batch[i]; > > - if (done || !xfs_icwalk_igrab(goal, ip, icw)) > - batch[i] = NULL; > - The existing code here -always- catches RCU freed inodes that have not yet been freed due to the grace period still running. This code replaces RCU freed inodes with NULL in the batch array, and so after this point the inode is guaranteed to be valid. > /* > * Update the index for the next lookup. Catch > * overflows into the next AG range which can occur if > @@ -1716,8 +1713,14 @@ xfs_icwalk_ag( > * us to see this inode, so another lookup from the > * same index will not find it again. > */ > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { > + batch[i] = NULL; > continue; This check is not checking for RCU freed inodes - it was checking for indoes that are both freed and reallocated within a single RCU grace period. I added this check in 1a3e8f3da09c (backin 2010) because I wasn't 100% certain of the RCU existence guarantees and those interacted with read side critical sections. It was defensive, "just in case" code. Looking at this code now - with another 14 years of RCU algorithms experience under my belt - it is obvious that this code is unnecessary and always has been. From RCU first principles, we cannot see a new inode with a different inode number magically appear at this memory address during a rcu_read_lock() side critical period. Such behaviour would require the inode to be removed from the radix tree, freed and then reallocated and reinitialised to a different identity within a single RCU grace period. That is impossible because the object we found during the lookup won't get freed until the RCU grace period ends sometime after we drop the rcu_read_lock().... Hence the right thing to do here is to remove this check completely and leave the rest of the code alone.... -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore 2024-08-12 5:22 iwalk fixes Christoph Hellwig 2024-08-12 5:23 ` [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag Christoph Hellwig @ 2024-08-12 5:23 ` Christoph Hellwig 2024-08-12 17:42 ` Darrick J. Wong 2024-08-14 22:04 ` Dave Chinner 1 sibling, 2 replies; 7+ messages in thread From: Christoph Hellwig @ 2024-08-12 5:23 UTC (permalink / raw) To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs When xrep_iunlink_mark_incore skips an inode because it was RCU freed from another AG, the slot for the inode in the batch array needs to be zeroed. Also un-duplicate the check and remove the need for the xrep_iunlink_igrab helper. Fixes: ab97f4b1c030 ("xfs: repair AGI unlinked inode bucket lists") Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/scrub/agheader_repair.c | 28 +++++++--------------------- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c index 2f98d90d7fd66d..558bc86b1b83c3 100644 --- a/fs/xfs/scrub/agheader_repair.c +++ b/fs/xfs/scrub/agheader_repair.c @@ -1108,23 +1108,6 @@ xrep_iunlink_walk_ondisk_bucket( return 0; } -/* Decide if this is an unlinked inode in this AG. */ -STATIC bool -xrep_iunlink_igrab( - struct xfs_perag *pag, - struct xfs_inode *ip) -{ - struct xfs_mount *mp = pag->pag_mount; - - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) - return false; - - if (!xfs_inode_on_unlinked_list(ip)) - return false; - - return true; -} - /* * Mark the given inode in the lookup batch in our unlinked inode bitmap, and * remember if this inode is the start of the unlinked chain. @@ -1196,9 +1179,6 @@ xrep_iunlink_mark_incore( for (i = 0; i < nr_found; i++) { struct xfs_inode *ip = ragi->lookup_batch[i]; - if (done || !xrep_iunlink_igrab(pag, ip)) - ragi->lookup_batch[i] = NULL; - /* * Update the index for the next lookup. Catch * overflows into the next AG range which can occur if @@ -1211,8 +1191,14 @@ xrep_iunlink_mark_incore( * us to see this inode, so another lookup from the * same index will not find it again. */ - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { + ragi->lookup_batch[i] = NULL; continue; + } + + if (done || !xfs_inode_on_unlinked_list(ip)) + ragi->lookup_batch[i] = NULL; + first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1); if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) done = true; -- 2.43.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore 2024-08-12 5:23 ` [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore Christoph Hellwig @ 2024-08-12 17:42 ` Darrick J. Wong 2024-08-14 22:04 ` Dave Chinner 1 sibling, 0 replies; 7+ messages in thread From: Darrick J. Wong @ 2024-08-12 17:42 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs On Mon, Aug 12, 2024 at 07:23:01AM +0200, Christoph Hellwig wrote: > When xrep_iunlink_mark_incore skips an inode because it was RCU freed > from another AG, the slot for the inode in the batch array needs to be > zeroed. Also un-duplicate the check and remove the need for the > xrep_iunlink_igrab helper. > > Fixes: ab97f4b1c030 ("xfs: repair AGI unlinked inode bucket lists") > Signed-off-by: Christoph Hellwig <hch@lst.de> Looks good to me, Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > --- > fs/xfs/scrub/agheader_repair.c | 28 +++++++--------------------- > 1 file changed, 7 insertions(+), 21 deletions(-) > > diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c > index 2f98d90d7fd66d..558bc86b1b83c3 100644 > --- a/fs/xfs/scrub/agheader_repair.c > +++ b/fs/xfs/scrub/agheader_repair.c > @@ -1108,23 +1108,6 @@ xrep_iunlink_walk_ondisk_bucket( > return 0; > } > > -/* Decide if this is an unlinked inode in this AG. */ > -STATIC bool > -xrep_iunlink_igrab( > - struct xfs_perag *pag, > - struct xfs_inode *ip) > -{ > - struct xfs_mount *mp = pag->pag_mount; > - > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > - return false; > - > - if (!xfs_inode_on_unlinked_list(ip)) > - return false; > - > - return true; > -} > - > /* > * Mark the given inode in the lookup batch in our unlinked inode bitmap, and > * remember if this inode is the start of the unlinked chain. > @@ -1196,9 +1179,6 @@ xrep_iunlink_mark_incore( > for (i = 0; i < nr_found; i++) { > struct xfs_inode *ip = ragi->lookup_batch[i]; > > - if (done || !xrep_iunlink_igrab(pag, ip)) > - ragi->lookup_batch[i] = NULL; > - > /* > * Update the index for the next lookup. Catch > * overflows into the next AG range which can occur if > @@ -1211,8 +1191,14 @@ xrep_iunlink_mark_incore( > * us to see this inode, so another lookup from the > * same index will not find it again. > */ > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { > + ragi->lookup_batch[i] = NULL; > continue; > + } > + > + if (done || !xfs_inode_on_unlinked_list(ip)) > + ragi->lookup_batch[i] = NULL; > + > first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1); > if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino)) > done = true; > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore 2024-08-12 5:23 ` [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore Christoph Hellwig 2024-08-12 17:42 ` Darrick J. Wong @ 2024-08-14 22:04 ` Dave Chinner 1 sibling, 0 replies; 7+ messages in thread From: Dave Chinner @ 2024-08-14 22:04 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs On Mon, Aug 12, 2024 at 07:23:01AM +0200, Christoph Hellwig wrote: > When xrep_iunlink_mark_incore skips an inode because it was RCU freed > from another AG, the slot for the inode in the batch array needs to be > zeroed. Also un-duplicate the check and remove the need for the > xrep_iunlink_igrab helper. > > Fixes: ab97f4b1c030 ("xfs: repair AGI unlinked inode bucket lists") > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/scrub/agheader_repair.c | 28 +++++++--------------------- > 1 file changed, 7 insertions(+), 21 deletions(-) > > diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c > index 2f98d90d7fd66d..558bc86b1b83c3 100644 > --- a/fs/xfs/scrub/agheader_repair.c > +++ b/fs/xfs/scrub/agheader_repair.c > @@ -1108,23 +1108,6 @@ xrep_iunlink_walk_ondisk_bucket( > return 0; > } > > -/* Decide if this is an unlinked inode in this AG. */ > -STATIC bool > -xrep_iunlink_igrab( > - struct xfs_perag *pag, > - struct xfs_inode *ip) > -{ > - struct xfs_mount *mp = pag->pag_mount; > - > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > - return false; > - > - if (!xfs_inode_on_unlinked_list(ip)) > - return false; > - > - return true; > -} This code is wrong. It does not explicitly check for RCU freed inodes (i.e. ip->i_ino = 0 or XFS_IRECLAIM being set) and so will never detect stale RCU freed inodes in AG 0. It is probably working by chance to avoid stale freed inodes because ip->i_prev_unlinked will be 0 for such inodes. *However*, this code does not have the necessary memory barriers to guarantee it catches the ip->i_ino or ip->i_prev_unlinked writes prior to freeing. The ip->i_ino check needs to be done under the ip->i_flags_lock as it is the unlock->lock memory barrier that the inode cache RCU lookup algorithms rely on for correct detection for RCU freed inodes. > - > /* > * Mark the given inode in the lookup batch in our unlinked inode bitmap, and > * remember if this inode is the start of the unlinked chain. > @@ -1196,9 +1179,6 @@ xrep_iunlink_mark_incore( > for (i = 0; i < nr_found; i++) { > struct xfs_inode *ip = ragi->lookup_batch[i]; > > - if (done || !xrep_iunlink_igrab(pag, ip)) > - ragi->lookup_batch[i] = NULL; > - > /* > * Update the index for the next lookup. Catch > * overflows into the next AG range which can occur if > @@ -1211,8 +1191,14 @@ xrep_iunlink_mark_incore( > * us to see this inode, so another lookup from the > * same index will not find it again. > */ > - if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) > + if (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) { > + ragi->lookup_batch[i] = NULL; > continue; > + } > + > + if (done || !xfs_inode_on_unlinked_list(ip)) > + ragi->lookup_batch[i] = NULL; Same with this new code - it's not explicitly checking for RCU freed inodes and doesn't have the correct memory barriers. Hence I think the fixes for this code are: 1. change xrep_iunlink_igrab() to use the same RCU freed inode checks as xfs_blockgc_igrab(); and 2. remove the (XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno) check altogether. -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-08-14 22:04 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-12 5:22 iwalk fixes Christoph Hellwig 2024-08-12 5:23 ` [PATCH 1/2] xfs: fix handling of RCU freed inodes from other AGs in xfs_icwalk_ag Christoph Hellwig 2024-08-12 17:39 ` Darrick J. Wong 2024-08-14 21:50 ` Dave Chinner 2024-08-12 5:23 ` [PATCH 2/2] xfs: fix handling of RCU freed inodes from other AGs in xrep_iunlink_mark_incore Christoph Hellwig 2024-08-12 17:42 ` Darrick J. Wong 2024-08-14 22:04 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox