public inbox for linux-xfs@vger.kernel.org
* [PATCH 1/4] xfs: make inode inactivation state changes atomic
  2024-02-01  0:30 [RFC] [PATCH 0/4] xfs: reactivate inodes immediately in xfs_iget Dave Chinner
@ 2024-02-01  0:30 ` Dave Chinner
  2024-02-01 19:07   ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2024-02-01  0:30 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We need the XFS_NEED_INACTIVE flag to correspond to whether the
inode is on the inodegc queues so that we can then use this state
for lazy removal.

To do this, move the addition of the inode to the inodegc queue
under the ip->i_flags_lock so that it is atomic w.r.t. setting
the XFS_NEED_INACTIVE flag.

Then, when we remove the inode from the inodegc list to actually run
inactivation, clear the XFS_NEED_INACTIVE at the same time we are
setting XFS_INACTIVATING to indicate that inactivation is in
progress.

These changes result in all the state changes and inodegc queuing
being atomic w.r.t. each other and inode lookups via the use of the
ip->i_flags lock.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 06046827b5fe..425b55526386 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1875,7 +1875,12 @@ xfs_inodegc_worker(
 	llist_for_each_entry_safe(ip, n, node, i_gclist) {
 		int	error;
 
-		xfs_iflags_set(ip, XFS_INACTIVATING);
+		/* Switch state to inactivating. */
+		spin_lock(&ip->i_flags_lock);
+		ip->i_flags |= XFS_INACTIVATING;
+		ip->i_flags &= ~XFS_NEED_INACTIVE;
+		spin_unlock(&ip->i_flags_lock);
+
 		error = xfs_inodegc_inactivate(ip);
 		if (error && !gc->error)
 			gc->error = error;
@@ -2068,9 +2073,13 @@ xfs_inodegc_queue(
 	unsigned long		queue_delay = 1;
 
 	trace_xfs_inode_set_need_inactive(ip);
+
+	/*
+	 * Put the addition of the inode to the gc list under the
+	 * ip->i_flags_lock so that the state change and list addition are
+	 * atomic w.r.t. lookup operations under the ip->i_flags_lock.
+	 */
 	spin_lock(&ip->i_flags_lock);
-	ip->i_flags |= XFS_NEED_INACTIVE;
-	spin_unlock(&ip->i_flags_lock);
 
 	cpu_nr = get_cpu();
 	gc = this_cpu_ptr(mp->m_inodegc);
@@ -2079,6 +2088,9 @@ xfs_inodegc_queue(
 	WRITE_ONCE(gc->items, items + 1);
 	shrinker_hits = READ_ONCE(gc->shrinker_hits);
 
+	ip->i_flags |= XFS_NEED_INACTIVE;
+	spin_unlock(&ip->i_flags_lock);
+
 	/*
 	 * Ensure the list add is always seen by anyone who finds the cpumask
 	 * bit set. This effectively gives the cpumask bit set operation
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] xfs: make inode inactivation state changes atomic
  2024-02-01  0:30 ` [PATCH 1/4] xfs: make inode inactivation state changes atomic Dave Chinner
@ 2024-02-01 19:07   ` Darrick J. Wong
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2024-02-01 19:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2024 at 11:30:13AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We need the XFS_NEED_INACTIVE flag to correspond to whether the
> inode is on the inodegc queues so that we can then use this state
> for lazy removal.
> 
> To do this, move the addition of the inode to the inodegc queue
> under the ip->i_flags_lock so that it is atomic w.r.t. setting
> the XFS_NEED_INACTIVE flag.
> 
> Then, when we remove the inode from the inodegc list to actually run
> inactivation, clear the XFS_NEED_INACTIVE at the same time we are
> setting XFS_INACTIVATING to indicate that inactivation is in
> progress.
> 
> These changes result in all the state changes and inodegc queuing
> being atomic w.r.t. each other and inode lookups via the use of the
> ip->i_flags lock.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 06046827b5fe..425b55526386 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1875,7 +1875,12 @@ xfs_inodegc_worker(
>  	llist_for_each_entry_safe(ip, n, node, i_gclist) {
>  		int	error;
>  
> -		xfs_iflags_set(ip, XFS_INACTIVATING);
> +		/* Switch state to inactivating. */
> +		spin_lock(&ip->i_flags_lock);
> +		ip->i_flags |= XFS_INACTIVATING;
> +		ip->i_flags &= ~XFS_NEED_INACTIVE;

The comment for XFS_INACTIVATING ought to be updated to state that
NEED_INACTIVE is cleared at the same time that INACTIVATING is set.

> +		spin_unlock(&ip->i_flags_lock);
> +
>  		error = xfs_inodegc_inactivate(ip);
>  		if (error && !gc->error)
>  			gc->error = error;
> @@ -2068,9 +2073,13 @@ xfs_inodegc_queue(
>  	unsigned long		queue_delay = 1;
>  
>  	trace_xfs_inode_set_need_inactive(ip);
> +
> +	/*
> +	 * Put the addition of the inode to the gc list under the
> +	 * ip->i_flags_lock so that the state change and list addition are
> +	 * atomic w.r.t. lookup operations under the ip->i_flags_lock.
> +	 */
>  	spin_lock(&ip->i_flags_lock);
> -	ip->i_flags |= XFS_NEED_INACTIVE;
> -	spin_unlock(&ip->i_flags_lock);
>  
>  	cpu_nr = get_cpu();
>  	gc = this_cpu_ptr(mp->m_inodegc);
> @@ -2079,6 +2088,9 @@ xfs_inodegc_queue(
>  	WRITE_ONCE(gc->items, items + 1);
>  	shrinker_hits = READ_ONCE(gc->shrinker_hits);
>  
> +	ip->i_flags |= XFS_NEED_INACTIVE;
> +	spin_unlock(&ip->i_flags_lock);

This change mostly makes sense to me, but is it necessary to move the
line that sets XFS_NEED_INACTIVE?  This change extends the critical
section so that the llist_add and the flags update are atomic, so
couldn't this change reduce down to moving the spin_unlock call?

(IOWs I'm not sure if there's a subtlety here or if this is merely rough
draft syndrome.)

--D

> +
>  	/*
>  	 * Ensure the list add is always seen by anyone who finds the cpumask
>  	 * bit set. This effectively gives the cpumask bit set operation
> -- 
> 2.43.0
> 
> 


* [PATCH v2 0/4] xfs: recycle inactive inodes immediately
@ 2024-03-19  0:15 Dave Chinner
  2024-03-19  0:15 ` [PATCH 1/4] xfs: make inode inactivation state changes atomic Dave Chinner
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Dave Chinner @ 2024-03-19  0:15 UTC (permalink / raw)
  To: linux-xfs

Currently xfs_iget() will flush inodes queued for inactivation
rather than recycling them. Flushing the inodegc queues causes
inactivation to run and the inodes transition to reclaimable where
they can then be recycled. The xfs_iget() code spins for a short
while before trying the lookup again, and will continue to do
so until the inode has moved to the reclaimable state, at which
point it will recycle the inode.

However, if the filesystem is frozen, we cannot flush the inode gc
queues because we can't make modifications during a freeze and so
inodegc is not running. Hence for inodes that need inactivation and
that the VFS then tries to reference again (e.g. the shrinker
reclaimed them just before they were accessed), xfs_iget() will just
spin on the inode waiting for the freeze to go away so the inode can
be flushed.

This can be triggered by creating a bunch of files with post-eof
blocks and stalling them on the inodegc queues like so:

# cp a-set-1MB-files* /mnt/xfs
# xfs_io -xr -c "freeze" /mnt/xfs
# echo 3 > /proc/sys/vm/drop_caches
# ls -l /mnt/xfs

If the timing is just right, then the 'ls -l' will hang spinning
on inodes as they are now sitting in XFS_NEED_INACTIVE state on
the inodegc queues and won't be processed until the filesystem is
thawed.

Instead of flushing the inode, we could just recycle the inode
immediately. That, however, is complicated by the use of lockless
singly linked lists for the inodegc queues. We can't just remove
inodes from them at will, so we need to enable lazy removal from the
inodegc queue.

To do this, we make the lockless list addition and removal atomic
w.r.t. the inode state changes via the ip->i_flags_lock. This lock
is also held during xfs_iget() lookups, so it serialises the inodegc
list processing against inode lookup as well.

This then enables us to use the XFS_NEED_INACTIVE state flag to
determine if the inode should be inactivated when removing it from
the inodegc list during inodegc work. i.e. the inodegc worker
decides if inactivation should take place, not the context that is
queuing the inode to the inodegc list.

Hence by clearing the XFS_NEED_INACTIVE flag, we can leave inodes on
the inodegc lists and know that they will not be inactivated when
the worker next runs and sees that inode. It will just remove it
from the list and skip over it.
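
To illustrate, here is a single-threaded userspace model of that
protocol. This is a sketch only: a plain pointer chain stands in for
the kernel's lockless llist, the ip->i_flags_lock serialisation is
elided, and the identifiers are illustrative rather than taken from
the kernel source:

```c
/*
 * Userspace model of lazy inodegc list removal. The state flag and
 * the list membership are updated together; reactivation only clears
 * the flag and leaves list removal to the worker.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NEED_INACTIVE = 1, INACTIVATING = 2 };

struct inode {
	int		flags;
	struct inode	*gcnext;	/* list link; points to self when off-list */
	bool		inactivated;
};

static struct inode *gc_queue;

/* Mark a node as off-list, like init_llist_node(). */
static void init_node(struct inode *ip)	{ ip->gcnext = ip; }
static bool on_list(struct inode *ip)	{ return ip->gcnext != ip; }

/* Model of queuing: state change and list addition are one unit. */
static void inodegc_queue(struct inode *ip)
{
	ip->flags |= NEED_INACTIVE;
	if (on_list(ip))
		return;			/* already queued; nothing to do */
	ip->gcnext = gc_queue;
	gc_queue = ip;
}

/* Reactivation only clears the flag; removal is left to the worker. */
static void reactivate(struct inode *ip)
{
	ip->flags &= ~NEED_INACTIVE;
}

/* Model of the worker: remove every node, process only flagged ones. */
static void inodegc_worker(void)
{
	struct inode *ip = gc_queue, *next;

	gc_queue = NULL;		/* llist_del_all() equivalent */
	for (; ip; ip = next) {
		next = ip->gcnext;
		init_node(ip);
		if (!(ip->flags & NEED_INACTIVE))
			continue;	/* cancelled: lazy removal is done */
		ip->flags = (ip->flags | INACTIVATING) & ~NEED_INACTIVE;
		ip->inactivated = true;	/* stands in for inactivation work */
	}
}
```

The key property modelled here is that cancelling inactivation never
touches the list; a reactivated inode simply stays queued with the
flag clear until the worker removes and re-initialises its node.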

This gives us lazy list removal, and now we can immediately
reactivate the inode during lookup. This is similar to the recycling
of reclaimable inodes, but just a little bit different. I haven't
tried to combine the implementations - it could be done, but I think
that gets in the way of seeing how reactivation is different from
recycling.

By doing this, it means that the above series of operations will no
longer hang waiting for a thaw to occur. Indeed, we can see the
inode recycle stat getting bumped when the above reproducer is run -
it reactivates the inodes instead of hanging:

# xfs_stats.pl | grep recycle
    xs_ig_frecycle.......            75    vn_reclaim............           304
# cp a-set-1MB-files* /mnt/xfs
# xfs_io -xr -c "freeze" /mnt/xfs
# echo 3 > /proc/sys/vm/drop_caches
# ls -l /mnt/xfs > /dev/null
# xfs_stats.pl | grep recycle
    xs_ig_frecycle.......           100    vn_reclaim............           330
# xfs_io -xr -c "thaw" /mnt/xfs
# rm -rf /mnt/xfs/a-set*
# umount /mnt/xfs
#

Version 2:
- updated XFS_INACTIVATING comment to describe the new behaviours
  of XFS_NEED_INACTIVE and XFS_INACTIVATING bit.
- don't randomly move code about in patches.
- add trace_xfs_iget_recycle_fail() trace point to record
  reactivation failures.
- fix a bug where the unlinked inode check on NEED_INACTIVE inodes was
  not done before reactivating. This caused failures in xfs/183.

Version 1:
https://lore.kernel.org/linux-xfs/20240201005217.1011010-1-david@fromorbit.com/



* [PATCH 1/4] xfs: make inode inactivation state changes atomic
  2024-03-19  0:15 [PATCH v2 0/4] xfs: recycle inactive inodes immediately Dave Chinner
@ 2024-03-19  0:15 ` Dave Chinner
  2024-03-19 18:01   ` Darrick J. Wong
  2024-03-19  0:15 ` [PATCH 2/4] xfs: prepare inode for i_gclist detection Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2024-03-19  0:15 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We need the XFS_NEED_INACTIVE flag to correspond to whether the
inode is on the inodegc queues so that we can then use this state
for lazy removal.

To do this, move the addition of the inode to the inodegc queue
under the ip->i_flags_lock so that it is atomic w.r.t. setting
the XFS_NEED_INACTIVE flag.

Then, when we remove the inode from the inodegc list to actually run
inactivation, clear the XFS_NEED_INACTIVE at the same time we are
setting XFS_INACTIVATING to indicate that inactivation is in
progress.

These changes result in all the state changes and inodegc queuing
being atomic w.r.t. each other and inode lookups via the use of the
ip->i_flags lock.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 16 ++++++++++++++--
 fs/xfs/xfs_inode.h  | 11 +++++++----
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 6c87b90754c4..9a362964f656 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1880,7 +1880,12 @@ xfs_inodegc_worker(
 	llist_for_each_entry_safe(ip, n, node, i_gclist) {
 		int	error;
 
-		xfs_iflags_set(ip, XFS_INACTIVATING);
+		/* Switch state to inactivating. */
+		spin_lock(&ip->i_flags_lock);
+		ip->i_flags |= XFS_INACTIVATING;
+		ip->i_flags &= ~XFS_NEED_INACTIVE;
+		spin_unlock(&ip->i_flags_lock);
+
 		error = xfs_inodegc_inactivate(ip);
 		if (error && !gc->error)
 			gc->error = error;
@@ -2075,9 +2080,14 @@ xfs_inodegc_queue(
 	unsigned long		queue_delay = 1;
 
 	trace_xfs_inode_set_need_inactive(ip);
+
+	/*
+	 * Put the addition of the inode to the gc list under the
+	 * ip->i_flags_lock so that the state change and list addition are
+	 * atomic w.r.t. lookup operations under the ip->i_flags_lock.
+	 */
 	spin_lock(&ip->i_flags_lock);
 	ip->i_flags |= XFS_NEED_INACTIVE;
-	spin_unlock(&ip->i_flags_lock);
 
 	cpu_nr = get_cpu();
 	gc = this_cpu_ptr(mp->m_inodegc);
@@ -2086,6 +2096,8 @@ xfs_inodegc_queue(
 	WRITE_ONCE(gc->items, items + 1);
 	shrinker_hits = READ_ONCE(gc->shrinker_hits);
 
+	spin_unlock(&ip->i_flags_lock);
+
 	/*
 	 * Ensure the list add is always seen by anyone who finds the cpumask
 	 * bit set. This effectively gives the cpumask bit set operation
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 94fa79ae1591..b0943d888f5c 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -349,10 +349,13 @@ static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
 
 /*
  * If we need to update on-disk metadata before this IRECLAIMABLE inode can be
- * freed, then NEED_INACTIVE will be set.  Once we start the updates, the
- * INACTIVATING bit will be set to keep iget away from this inode.  After the
- * inactivation completes, both flags will be cleared and the inode is a
- * plain old IRECLAIMABLE inode.
+ * freed, then NEED_INACTIVE will be set. If the inode is accessed via iget
+ * whilst NEED_INACTIVE is set, the inode will be reactivated and become a
+ * normal inode again. Once we start the inactivation, the INACTIVATING bit will
+ * be set and the NEED_INACTIVE bit will be cleared. The INACTIVATING bit will
+ * keep iget away from this inode whilst inactivation is in progress.  After the
+ * inactivation completes, INACTIVATING will be cleared and the inode
+ * transitions to a plain old IRECLAIMABLE inode.
  */
 #define XFS_INACTIVATING	(1 << 13)
 
-- 
2.43.0



* [PATCH 2/4] xfs: prepare inode for i_gclist detection
  2024-03-19  0:15 [PATCH v2 0/4] xfs: recycle inactive inodes immediately Dave Chinner
  2024-03-19  0:15 ` [PATCH 1/4] xfs: make inode inactivation state changes atomic Dave Chinner
@ 2024-03-19  0:15 ` Dave Chinner
  2024-03-19  0:15 ` [PATCH 3/4] xfs: allow lazy removal of inodes from the inodegc queues Dave Chinner
  2024-03-19  0:16 ` [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget Dave Chinner
  3 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2024-03-19  0:15 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently don't initialise the inode->i_gclist member because it
is not necessary for a pure llist_add/llist_del_all producer-
consumer usage pattern.  However, for lazy removal from the inodegc
list, we need to be able to determine if the inode is already on an
inodegc list before we queue it.

We can do this detection by using llist_on_list(), but this requires
that we initialise the llist_node before we use it, and we
re-initialise it when we remove it from the llist.

Because we already serialise the inodegc list add with inode state
changes under the ip->i_flags_lock, we can do the initialisation on
list removal atomically with the state change. We can also do the
check of whether the inode is already on an inodegc list inside the
state change region on insert.

This gives us the ability to use llist_on_list(ip->i_gclist) to
determine if the inode needs to be queued for inactivation without
having to depend on inode state flags.
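
For reference, the llist_on_list()/init_llist_node() convention this
relies on can be modelled in userspace like so. This is a sketch of
the convention only, not the kernel implementation; the real helpers
in include/linux/llist.h use atomic operations for the list updates:

```c
/*
 * An off-list node is marked by pointing its next field at itself;
 * llist_on_list() is then just a pointer comparison.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct llist_node { struct llist_node *next; };
struct llist_head { struct llist_node *first; };

static void init_llist_node(struct llist_node *n) { n->next = n; }
static bool llist_on_list(const struct llist_node *n) { return n->next != n; }

/* Single-producer model; the kernel uses a cmpxchg loop here. */
static void llist_add(struct llist_node *n, struct llist_head *h)
{
	n->next = h->first;
	h->first = n;
}

/* Detach the whole list; the kernel uses xchg here. */
static struct llist_node *llist_del_all(struct llist_head *h)
{
	struct llist_node *first = h->first;

	h->first = NULL;
	return first;
}
```

Note that after llist_del_all() detaches the list, the nodes still
chain through their next pointers and so still appear to be on a
list; that is why the worker must run init_llist_node() on each inode
as it removes it from the detached list.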

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9a362964f656..559b8f71dc91 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -113,6 +113,7 @@ xfs_inode_alloc(
 	spin_lock_init(&ip->i_ioend_lock);
 	ip->i_next_unlinked = NULLAGINO;
 	ip->i_prev_unlinked = 0;
+	init_llist_node(&ip->i_gclist);
 
 	return ip;
 }
@@ -1880,8 +1881,14 @@ xfs_inodegc_worker(
 	llist_for_each_entry_safe(ip, n, node, i_gclist) {
 		int	error;
 
-		/* Switch state to inactivating. */
+		/*
+		 * Switch state to inactivating and remove the inode from the
+		 * gclist. This allows the use of llist_on_list() in the queuing
+		 * code to determine if the inode is already on an inodegc
+		 * queue.
+		 */
 		spin_lock(&ip->i_flags_lock);
+		init_llist_node(&ip->i_gclist);
 		ip->i_flags |= XFS_INACTIVATING;
 		ip->i_flags &= ~XFS_NEED_INACTIVE;
 		spin_unlock(&ip->i_flags_lock);
@@ -2082,13 +2089,21 @@ xfs_inodegc_queue(
 	trace_xfs_inode_set_need_inactive(ip);
 
 	/*
-	 * Put the addition of the inode to the gc list under the
+	 * The addition of the inode to the gc list is done under the
 	 * ip->i_flags_lock so that the state change and list addition are
 	 * atomic w.r.t. lookup operations under the ip->i_flags_lock.
+	 * The removal is also done under the ip->i_flags_lock and so this
+	 * allows us to safely use llist_on_list() here to determine if the
+	 * inode is already queued on an inactivation queue.
 	 */
 	spin_lock(&ip->i_flags_lock);
 	ip->i_flags |= XFS_NEED_INACTIVE;
 
+	if (llist_on_list(&ip->i_gclist)) {
+		spin_unlock(&ip->i_flags_lock);
+		return;
+	}
+
 	cpu_nr = get_cpu();
 	gc = this_cpu_ptr(mp->m_inodegc);
 	llist_add(&ip->i_gclist, &gc->list);
-- 
2.43.0



* [PATCH 3/4] xfs: allow lazy removal of inodes from the inodegc queues
  2024-03-19  0:15 [PATCH v2 0/4] xfs: recycle inactive inodes immediately Dave Chinner
  2024-03-19  0:15 ` [PATCH 1/4] xfs: make inode inactivation state changes atomic Dave Chinner
  2024-03-19  0:15 ` [PATCH 2/4] xfs: prepare inode for i_gclist detection Dave Chinner
@ 2024-03-19  0:15 ` Dave Chinner
  2024-03-19  0:16 ` [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget Dave Chinner
  3 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2024-03-19  0:15 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

To allow us to recycle inodes that are awaiting inactivation, we
need to enable lazy removal of inodes from the list. The list is a
lockless singly linked variant, so we can't just remove inodes from
the list at will.

Instead, we can remove them lazily whenever inodegc runs by enabling
the inodegc processing to determine whether inactivation needs to be
done at processing time rather than queuing time.

We've already modified the queuing code to only queue the inode if
it isn't already queued, so here all we need to do is modify the
queue processing to determine if inactivation needs to be done.

Hence we introduce the behaviour that we can cancel inactivation
processing simply by clearing the XFS_NEED_INACTIVE flag on the
inode. Processing will check this flag and skip inactivation
processing if it is not set. The flag is always set at queuing time,
regardless of whether the inode is already on the queues or not.
Hence if it is not set at processing time, it means that something
has cancelled the inactivation and we should just remove it from the
list and then leave it alone.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 559b8f71dc91..7359753b892b 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1882,13 +1882,21 @@ xfs_inodegc_worker(
 		int	error;
 
 		/*
-		 * Switch state to inactivating and remove the inode from the
-		 * gclist. This allows the use of llist_on_list() in the queuing
-		 * code to determine if the inode is already on an inodegc
-		 * queue.
+		 * Remove the inode from the gclist and determine if it needs to
+		 * be processed. The XFS_NEED_INACTIVE flag gets cleared if the
+		 * inode is reactivated after queuing, but the list removal is
+		 * lazy and left up to us.
+		 *
+		 * We always remove the inode from the list to allow the use of
+		 * llist_on_list() in the queuing code to determine if the inode
+		 * is already on an inodegc queue.
 		 */
 		spin_lock(&ip->i_flags_lock);
 		init_llist_node(&ip->i_gclist);
+		if (!(ip->i_flags & XFS_NEED_INACTIVE)) {
+			spin_unlock(&ip->i_flags_lock);
+			continue;
+		}
 		ip->i_flags |= XFS_INACTIVATING;
 		ip->i_flags &= ~XFS_NEED_INACTIVE;
 		spin_unlock(&ip->i_flags_lock);
@@ -2160,7 +2168,6 @@ xfs_inode_mark_reclaimable(
 	struct xfs_inode	*ip)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	bool			need_inactive;
 
 	XFS_STATS_INC(mp, vn_reclaim);
 
@@ -2169,8 +2176,23 @@ xfs_inode_mark_reclaimable(
 	 */
 	ASSERT_ALWAYS(!xfs_iflags_test(ip, XFS_ALL_IRECLAIM_FLAGS));
 
-	need_inactive = xfs_inode_needs_inactive(ip);
-	if (need_inactive) {
+	/*
+	 * If the inode is already queued for inactivation because it was
+	 * re-activated and is now being reclaimed again (e.g. fs has been
+	 * frozen for a while) we must ensure that the inode waits for inodegc
+	 * to be run and removes it from the inodegc queue before it moves to
+	 * the reclaimable state and gets freed.
+	 *
+	 * We don't care about races here. We can't race with a list addition
+	 * because only one thread can be evicting the inode from the VFS cache,
+	 * hence false negatives can't occur and we only need to worry about
+	 * list removal races.  If we get a false positive from a list removal
+	 * race, then the inode goes through the inactive list whether it needs
+	 * to or not. This will slow down reclaim of this inode slightly but
+	 * should have no other side effects.
+	 */
+	if (llist_on_list(&ip->i_gclist) ||
+	    xfs_inode_needs_inactive(ip)) {
 		xfs_inodegc_queue(ip);
 		return;
 	}
-- 
2.43.0



* [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-19  0:15 [PATCH v2 0/4] xfs: recycle inactive inodes immediately Dave Chinner
                   ` (2 preceding siblings ...)
  2024-03-19  0:15 ` [PATCH 3/4] xfs: allow lazy removal of inodes from the inodegc queues Dave Chinner
@ 2024-03-19  0:16 ` Dave Chinner
  2024-03-19 18:11   ` Darrick J. Wong
  2024-03-20  8:39   ` Andre Noll
  3 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2024-03-19  0:16 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When xfs_iget() finds an inode that is queued for inactivation, it
issues an inodegc flush to trigger the inactivation work and then
retries the lookup.

However, when the filesystem is frozen, inodegc is turned off and
the flush does nothing and does not block. This results in lookup
spinning on NEED_INACTIVE inodes and being unable to make progress
until the filesystem is thawed. This is less than ideal.

The only reason we can't immediately recycle the inode is that it
queued on a lockless list we can't remove it from. However, those
lists now support lazy removal, and so we can now modify the lookup
code to reactivate inode queued for inactivation. The process is
identical to how we recycle reclaimable inodes from xfs_iget(), so
this ends up being a relatively simple change to make.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 110 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 87 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 7359753b892b..56de3e843df2 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -63,6 +63,8 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
 					 XFS_ICWALK_FLAG_RECLAIM_SICK | \
 					 XFS_ICWALK_FLAG_UNION)
 
+static void xfs_inodegc_queue(struct xfs_inode *ip);
+
 /*
  * Allocate and initialise an xfs_inode.
  */
@@ -325,6 +327,7 @@ xfs_reinit_inode(
 	return error;
 }
 
+
 /*
  * Carefully nudge an inode whose VFS state has been torn down back into a
  * usable state.  Drops the i_flags_lock and the rcu read lock.
@@ -388,7 +391,82 @@ xfs_iget_recycle(
 	inode->i_state = I_NEW;
 	spin_unlock(&ip->i_flags_lock);
 	spin_unlock(&pag->pag_ici_lock);
+	XFS_STATS_INC(mp, xs_ig_frecycle);
+	return 0;
+}
 
+static int
+xfs_iget_reactivate(
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip) __releases(&ip->i_flags_lock)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct inode		*inode = VFS_I(ip);
+	int			error;
+
+	trace_xfs_iget_recycle(ip);
+
+	/*
+	 * If the inode has been unlinked, then the lookup must not find it
+	 * until inactivation has actually freed the inode.
+	 */
+	if (VFS_I(ip)->i_nlink == 0) {
+		spin_unlock(&ip->i_flags_lock);
+		rcu_read_unlock();
+		return -ENOENT;
+	}
+
+	/*
+	 * Take the ILOCK here to serialise against lookup races with putting
+	 * the inode back on the inodegc queue during error handling.
+	 */
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		return -EAGAIN;
+
+	/*
+	 * Move the state to inactivating so both inactivation and racing
+	 * lookups will skip over this inode until we've finished reactivating
+	 * it and can return it to the XFS_INEW state.
+	 */
+	ip->i_flags &= ~XFS_NEED_INACTIVE;
+	ip->i_flags |= XFS_INACTIVATING;
+	spin_unlock(&ip->i_flags_lock);
+	rcu_read_unlock();
+
+	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
+	error = xfs_reinit_inode(mp, inode);
+	if (error) {
+		/*
+		 * Well, that sucks. Put the inode back on the inactive queue.
+		 * Do this while still under the ILOCK so that we can set the
+		 * NEED_INACTIVE flag and clear the INACTIVATING flag and not
+		 * have another lookup race with us before we've finished
+		 * putting the inode back on the inodegc queue.
+		 */
+		spin_lock(&ip->i_flags_lock);
+		ip->i_flags |= XFS_NEED_INACTIVE;
+		ip->i_flags &= ~XFS_INACTIVATING;
+		spin_unlock(&ip->i_flags_lock);
+
+		xfs_inodegc_queue(ip);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+		trace_xfs_iget_recycle_fail(ip);
+		return error;
+	}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/*
+	 * Reset the inode state to new so that xfs_iget() will complete
+	 * the required remaining inode initialisation before it returns the
+	 * inode to the caller.
+	 */
+	spin_lock(&ip->i_flags_lock);
+	ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS;
+	ip->i_flags |= XFS_INEW;
+	inode->i_state = I_NEW;
+	spin_unlock(&ip->i_flags_lock);
+	XFS_STATS_INC(mp, xs_ig_frecycle);
 	return 0;
 }
 
@@ -526,15 +604,6 @@ xfs_iget_cache_hit(
 	if (ip->i_flags & (XFS_INEW | XFS_IRECLAIM | XFS_INACTIVATING))
 		goto out_skip;
 
-	if (ip->i_flags & XFS_NEED_INACTIVE) {
-		/* Unlinked inodes cannot be re-grabbed. */
-		if (VFS_I(ip)->i_nlink == 0) {
-			error = -ENOENT;
-			goto out_error;
-		}
-		goto out_inodegc_flush;
-	}
-
 	/*
 	 * Check the inode free state is valid. This also detects lookup
 	 * racing with unlinks.
@@ -545,11 +614,18 @@ xfs_iget_cache_hit(
 
 	/* Skip inodes that have no vfs state. */
 	if ((flags & XFS_IGET_INCORE) &&
-	    (ip->i_flags & XFS_IRECLAIMABLE))
+	    (ip->i_flags & (XFS_IRECLAIMABLE | XFS_NEED_INACTIVE)))
 		goto out_skip;
 
 	/* The inode fits the selection criteria; process it. */
-	if (ip->i_flags & XFS_IRECLAIMABLE) {
+	if (ip->i_flags & XFS_NEED_INACTIVE) {
+		/* Drops i_flags_lock and RCU read lock. */
+		error = xfs_iget_reactivate(pag, ip);
+		if (error == -EAGAIN)
+			goto out_skip;
+		if (error)
+			return error;
+	} else if (ip->i_flags & XFS_IRECLAIMABLE) {
 		/* Drops i_flags_lock and RCU read lock. */
 		error = xfs_iget_recycle(pag, ip);
 		if (error == -EAGAIN)
@@ -578,23 +654,11 @@ xfs_iget_cache_hit(
 
 out_skip:
 	trace_xfs_iget_skip(ip);
-	XFS_STATS_INC(mp, xs_ig_frecycle);
 	error = -EAGAIN;
 out_error:
 	spin_unlock(&ip->i_flags_lock);
 	rcu_read_unlock();
 	return error;
-
-out_inodegc_flush:
-	spin_unlock(&ip->i_flags_lock);
-	rcu_read_unlock();
-	/*
-	 * Do not wait for the workers, because the caller could hold an AGI
-	 * buffer lock.  We're just going to sleep in a loop anyway.
-	 */
-	if (xfs_is_inodegc_enabled(mp))
-		xfs_inodegc_queue_all(mp);
-	return -EAGAIN;
 }
 
 static int
-- 
2.43.0



* Re: [PATCH 1/4] xfs: make inode inactivation state changes atomic
  2024-03-19  0:15 ` [PATCH 1/4] xfs: make inode inactivation state changes atomic Dave Chinner
@ 2024-03-19 18:01   ` Darrick J. Wong
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2024-03-19 18:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 11:15:57AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We need the XFS_NEED_INACTIVE flag to correspond to whether the
> inode is on the inodegc queues so that we can then use this state
> for lazy removal.
> 
> To do this, move the addition of the inode to the inodegc queue
> under the ip->i_flags_lock so that it is atomic w.r.t. setting
> the XFS_NEED_INACTIVE flag.
> 
> Then, when we remove the inode from the inodegc list to actually run
> inactivation, clear the XFS_NEED_INACTIVE at the same time we are
> setting XFS_INACTIVATING to indicate that inactivation is in
> progress.
> 
> These changes result in all the state changes and inodegc queuing
> being atomic w.r.t. each other and inode lookups via the use of the
> ip->i_flags lock.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Pretty straightforward lock coverage extension,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_icache.c | 16 ++++++++++++++--
>  fs/xfs/xfs_inode.h  | 11 +++++++----
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 6c87b90754c4..9a362964f656 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1880,7 +1880,12 @@ xfs_inodegc_worker(
>  	llist_for_each_entry_safe(ip, n, node, i_gclist) {
>  		int	error;
>  
> -		xfs_iflags_set(ip, XFS_INACTIVATING);
> +		/* Switch state to inactivating. */
> +		spin_lock(&ip->i_flags_lock);
> +		ip->i_flags |= XFS_INACTIVATING;
> +		ip->i_flags &= ~XFS_NEED_INACTIVE;
> +		spin_unlock(&ip->i_flags_lock);
> +
>  		error = xfs_inodegc_inactivate(ip);
>  		if (error && !gc->error)
>  			gc->error = error;
> @@ -2075,9 +2080,14 @@ xfs_inodegc_queue(
>  	unsigned long		queue_delay = 1;
>  
>  	trace_xfs_inode_set_need_inactive(ip);
> +
> +	/*
> +	 * Put the addition of the inode to the gc list under the
> +	 * ip->i_flags_lock so that the state change and list addition are
> +	 * atomic w.r.t. lookup operations under the ip->i_flags_lock.
> +	 */
>  	spin_lock(&ip->i_flags_lock);
>  	ip->i_flags |= XFS_NEED_INACTIVE;
> -	spin_unlock(&ip->i_flags_lock);
>  
>  	cpu_nr = get_cpu();
>  	gc = this_cpu_ptr(mp->m_inodegc);
> @@ -2086,6 +2096,8 @@ xfs_inodegc_queue(
>  	WRITE_ONCE(gc->items, items + 1);
>  	shrinker_hits = READ_ONCE(gc->shrinker_hits);
>  
> +	spin_unlock(&ip->i_flags_lock);
> +
>  	/*
>  	 * Ensure the list add is always seen by anyone who finds the cpumask
>  	 * bit set. This effectively gives the cpumask bit set operation
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 94fa79ae1591..b0943d888f5c 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -349,10 +349,13 @@ static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
>  
>  /*
>   * If we need to update on-disk metadata before this IRECLAIMABLE inode can be
> - * freed, then NEED_INACTIVE will be set.  Once we start the updates, the
> - * INACTIVATING bit will be set to keep iget away from this inode.  After the
> - * inactivation completes, both flags will be cleared and the inode is a
> - * plain old IRECLAIMABLE inode.
> + * freed, then NEED_INACTIVE will be set. If the inode is accessed via iget
> + * whilst NEED_INACTIVE is set, the inode will be reactivated and become a
> + * normal inode again. Once we start the inactivation, the INACTIVATING bit will
> + * be set and the NEED_INACTIVE bit will be cleared. The INACTIVATING bit will
> + * keep iget away from this inode whilst inactivation is in progress.  After the
> + * inactivation completes, INACTIVATING will be cleared and the inode
> + * transitions to a plain old IRECLAIMABLE inode.
>   */
>  #define XFS_INACTIVATING	(1 << 13)
>  
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-19  0:16 ` [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget Dave Chinner
@ 2024-03-19 18:11   ` Darrick J. Wong
  2024-03-20  8:39   ` Andre Noll
  1 sibling, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2024-03-19 18:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 11:16:00AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When xfs_iget() finds an inode that is queued for inactivation, it
> issues an inodegc flush to trigger the inactivation work and then
> retries the lookup.
> 
> However, when the filesystem is frozen, inodegc is turned off and
> the flush does nothing and does not block. This results in lookup
> spinning on NEED_INACTIVE inodes and being unable to make progress
> until the filesystem is thawed. This is less than ideal.
> 
> The only reason we can't immediately recycle the inode is that it
> is queued on a lockless list we can't remove it from. However, those
> lists now support lazy removal, and so we can now modify the lookup
> code to reactivate inodes queued for inactivation. The process is
> identical to how we recycle reclaimable inodes from xfs_iget(), so
> this ends up being a relatively simple change to make.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 110 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 87 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 7359753b892b..56de3e843df2 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -63,6 +63,8 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
>  					 XFS_ICWALK_FLAG_RECLAIM_SICK | \
>  					 XFS_ICWALK_FLAG_UNION)
>  
> +static void xfs_inodegc_queue(struct xfs_inode *ip);
> +
>  /*
>   * Allocate and initialise an xfs_inode.
>   */
> @@ -325,6 +327,7 @@ xfs_reinit_inode(
>  	return error;
>  }
>  
> +
>  /*
>   * Carefully nudge an inode whose VFS state has been torn down back into a
>   * usable state.  Drops the i_flags_lock and the rcu read lock.
> @@ -388,7 +391,82 @@ xfs_iget_recycle(
>  	inode->i_state = I_NEW;
>  	spin_unlock(&ip->i_flags_lock);
>  	spin_unlock(&pag->pag_ici_lock);
> +	XFS_STATS_INC(mp, xs_ig_frecycle);
> +	return 0;
> +}
>  
> +static int
> +xfs_iget_reactivate(
> +	struct xfs_perag	*pag,
> +	struct xfs_inode	*ip) __releases(&ip->i_flags_lock)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct inode		*inode = VFS_I(ip);
> +	int			error;
> +
> +	trace_xfs_iget_recycle(ip);
> +
> +	/*
> +	 * If the inode has been unlinked, then the lookup must not find it
> +	 * until inactivation has actually freed the inode.
> +	 */
> +	if (VFS_I(ip)->i_nlink == 0) {
> +		spin_unlock(&ip->i_flags_lock);
> +		rcu_read_unlock();
> +		return -ENOENT;
> +	}
> +
> +	/*
> +	 * Take the ILOCK here to serialise against lookup races with putting
> +	 * the inode back on the inodegc queue during error handling.
> +	 */
> +	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
> +		return -EAGAIN;
> +
> +	/*
> +	 * Move the state to inactivating so both inactivation and racing
> +	 * lookups will skip over this inode until we've finished reactivating
> +	 * it and can return it to the XFS_INEW state.
> +	 */
> +	ip->i_flags &= ~XFS_NEED_INACTIVE;
> +	ip->i_flags |= XFS_INACTIVATING;
> +	spin_unlock(&ip->i_flags_lock);
> +	rcu_read_unlock();
> +
> +	ASSERT(!rwsem_is_locked(&inode->i_rwsem));
> +	error = xfs_reinit_inode(mp, inode);
> +	if (error) {
> +		/*
> +		 * Well, that sucks. Put the inode back on the inactive queue.
> +		 * Do this while still under the ILOCK so that we can set the
> +		 * NEED_INACTIVE flag and clear the INACTIVATING flag an not

The sentence structure here is a little funky to me.  How about:

"...and clear the INACTIVATING flag without another lookup racing with us..."

?

With that changed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> +		 * have another lookup race with us before we've finished
> +		 * putting the inode back on the inodegc queue.
> +		 */
> +		spin_unlock(&ip->i_flags_lock);
> +		ip->i_flags |= XFS_NEED_INACTIVE;
> +		ip->i_flags &= ~XFS_INACTIVATING;
> +		spin_unlock(&ip->i_flags_lock);
> +
> +		xfs_inodegc_queue(ip);
> +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +		trace_xfs_iget_recycle_fail(ip);
> +		return error;
> +	}
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +	/*
> +	 * Reset the inode state to new so that xfs_iget() will complete
> +	 * the required remaining inode initialisation before it returns the
> +	 * inode to the caller.
> +	 */
> +	spin_lock(&ip->i_flags_lock);
> +	ip->i_flags &= ~XFS_IRECLAIM_RESET_FLAGS;
> +	ip->i_flags |= XFS_INEW;
> +	inode->i_state = I_NEW;
> +	spin_unlock(&ip->i_flags_lock);
> +	XFS_STATS_INC(mp, xs_ig_frecycle);
>  	return 0;
>  }
>  
> @@ -526,15 +604,6 @@ xfs_iget_cache_hit(
>  	if (ip->i_flags & (XFS_INEW | XFS_IRECLAIM | XFS_INACTIVATING))
>  		goto out_skip;
>  
> -	if (ip->i_flags & XFS_NEED_INACTIVE) {
> -		/* Unlinked inodes cannot be re-grabbed. */
> -		if (VFS_I(ip)->i_nlink == 0) {
> -			error = -ENOENT;
> -			goto out_error;
> -		}
> -		goto out_inodegc_flush;
> -	}
> -
>  	/*
>  	 * Check the inode free state is valid. This also detects lookup
>  	 * racing with unlinks.
> @@ -545,11 +614,18 @@ xfs_iget_cache_hit(
>  
>  	/* Skip inodes that have no vfs state. */
>  	if ((flags & XFS_IGET_INCORE) &&
> -	    (ip->i_flags & XFS_IRECLAIMABLE))
> +	    (ip->i_flags & (XFS_IRECLAIMABLE | XFS_NEED_INACTIVE)))
>  		goto out_skip;
>  
>  	/* The inode fits the selection criteria; process it. */
> -	if (ip->i_flags & XFS_IRECLAIMABLE) {
> +	if (ip->i_flags & XFS_NEED_INACTIVE) {
> +		/* Drops i_flags_lock and RCU read lock. */
> +		error = xfs_iget_reactivate(pag, ip);
> +		if (error == -EAGAIN)
> +			goto out_skip;
> +		if (error)
> +			return error;
> +	} else if (ip->i_flags & XFS_IRECLAIMABLE) {
>  		/* Drops i_flags_lock and RCU read lock. */
>  		error = xfs_iget_recycle(pag, ip);
>  		if (error == -EAGAIN)
> @@ -578,23 +654,11 @@ xfs_iget_cache_hit(
>  
>  out_skip:
>  	trace_xfs_iget_skip(ip);
> -	XFS_STATS_INC(mp, xs_ig_frecycle);
>  	error = -EAGAIN;
>  out_error:
>  	spin_unlock(&ip->i_flags_lock);
>  	rcu_read_unlock();
>  	return error;
> -
> -out_inodegc_flush:
> -	spin_unlock(&ip->i_flags_lock);
> -	rcu_read_unlock();
> -	/*
> -	 * Do not wait for the workers, because the caller could hold an AGI
> -	 * buffer lock.  We're just going to sleep in a loop anyway.
> -	 */
> -	if (xfs_is_inodegc_enabled(mp))
> -		xfs_inodegc_queue_all(mp);
> -	return -EAGAIN;
>  }
>  
>  static int
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-19  0:16 ` [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget Dave Chinner
  2024-03-19 18:11   ` Darrick J. Wong
@ 2024-03-20  8:39   ` Andre Noll
  2024-03-20 14:53     ` Darrick J. Wong
  2024-03-20 21:58     ` Dave Chinner
  1 sibling, 2 replies; 16+ messages in thread
From: Andre Noll @ 2024-03-20  8:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 11:16, Dave Chinner wrote
> +		/*
> +		 * Well, that sucks. Put the inode back on the inactive queue.
> +		 * Do this while still under the ILOCK so that we can set the
> +		 * NEED_INACTIVE flag and clear the INACTIVATING flag an not
> +		 * have another lookup race with us before we've finished
> +		 * putting the inode back on the inodegc queue.
> +		 */
> +		spin_unlock(&ip->i_flags_lock);
> +		ip->i_flags |= XFS_NEED_INACTIVE;
> +		ip->i_flags &= ~XFS_INACTIVATING;
> +		spin_unlock(&ip->i_flags_lock);

This doesn't look right. Shouldn't the first spin_unlock() be spin_lock()?

Also, there's a typo in the comment (s/an/and).

Best
Andre
-- 
Max Planck Institute for Biology
Tel: (+49) 7071 601 829
Max-Planck-Ring 5, 72076 Tübingen, Germany
http://people.tuebingen.mpg.de/maan/



* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-20  8:39   ` Andre Noll
@ 2024-03-20 14:53     ` Darrick J. Wong
  2024-03-20 16:58       ` Andre Noll
  2024-03-20 21:58     ` Dave Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2024-03-20 14:53 UTC (permalink / raw)
  To: Andre Noll; +Cc: Dave Chinner, linux-xfs

On Wed, Mar 20, 2024 at 09:39:57AM +0100, Andre Noll wrote:
> On Tue, Mar 19, 11:16, Dave Chinner wrote
> > +		/*
> > +		 * Well, that sucks. Put the inode back on the inactive queue.
> > +		 * Do this while still under the ILOCK so that we can set the
> > +		 * NEED_INACTIVE flag and clear the INACTIVATING flag an not
> > +		 * have another lookup race with us before we've finished
> > +		 * putting the inode back on the inodegc queue.
> > +		 */
> > +		spin_unlock(&ip->i_flags_lock);
> > +		ip->i_flags |= XFS_NEED_INACTIVE;
> > +		ip->i_flags &= ~XFS_INACTIVATING;
> > +		spin_unlock(&ip->i_flags_lock);
> 
> This doesn't look right. Shouldn't the first spin_unlock() be spin_lock()?

Yes.  So much for my hand inspection of code. :(

(Doesn't simple lock debugging catch these sorts of things?)

((It sure would be nice if locking returned a droppable "object" to do
the unlock ala Rust and then spin_lock could be __must_check.))

--D

> Also, there's a typo in the comment (s/an/and).
> Best
> Andre
> -- 
> Max Planck Institute for Biology
> Tel: (+49) 7071 601 829
> Max-Planck-Ring 5, 72076 Tübingen, Germany
> http://people.tuebingen.mpg.de/maan/




* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-20 14:53     ` Darrick J. Wong
@ 2024-03-20 16:58       ` Andre Noll
  2024-03-20 22:51         ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Noll @ 2024-03-20 16:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Wed, Mar 20, 07:53, Darrick J. Wong wrote
> On Wed, Mar 20, 2024 at 09:39:57AM +0100, Andre Noll wrote:
> > On Tue, Mar 19, 11:16, Dave Chinner wrote
> > > +		/*
> > > +		 * Well, that sucks. Put the inode back on the inactive queue.
> > > +		 * Do this while still under the ILOCK so that we can set the
> > > +		 * NEED_INACTIVE flag and clear the INACTIVATING flag an not
> > > +		 * have another lookup race with us before we've finished
> > > +		 * putting the inode back on the inodegc queue.
> > > +		 */
> > > +		spin_unlock(&ip->i_flags_lock);
> > > +		ip->i_flags |= XFS_NEED_INACTIVE;
> > > +		ip->i_flags &= ~XFS_INACTIVATING;
> > > +		spin_unlock(&ip->i_flags_lock);
> > 
> > This doesn't look right. Shouldn't the first spin_unlock() be spin_lock()?
> 
> Yes.  So much for my hand inspection of code. :(

Given enough hand inspections, all bugs are shallow :)

> (Doesn't simple lock debugging catch these sorts of things?)

Maybe this error path doesn't get exercised because xfs_reinit_inode()
never fails. AFAICT, it can only fail if security_inode_alloc()
can't allocate the composite inode blob.

> ((It sure would be nice if locking returned a droppable "object" to do
> the unlock ala Rust and then spin_lock could be __must_check.))

There's the *LOCK_GUARD* macros which employ gcc's cleanup attribute
to automatically call e.g. spin_unlock() when a variable goes out of
scope (see 54da6a0924311).

Best
Andre
-- 
Max Planck Institute for Biology
Tel: (+49) 7071 601 829
Max-Planck-Ring 5, 72076 Tübingen, Germany
http://people.tuebingen.mpg.de/maan/



* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-20  8:39   ` Andre Noll
  2024-03-20 14:53     ` Darrick J. Wong
@ 2024-03-20 21:58     ` Dave Chinner
  1 sibling, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2024-03-20 21:58 UTC (permalink / raw)
  To: Andre Noll; +Cc: linux-xfs

On Wed, Mar 20, 2024 at 09:39:57AM +0100, Andre Noll wrote:
> On Tue, Mar 19, 11:16, Dave Chinner wrote
> > +		/*
> > +		 * Well, that sucks. Put the inode back on the inactive queue.
> > +		 * Do this while still under the ILOCK so that we can set the
> > +		 * NEED_INACTIVE flag and clear the INACTIVATING flag an not
> > +		 * have another lookup race with us before we've finished
> > +		 * putting the inode back on the inodegc queue.
> > +		 */
> > +		spin_unlock(&ip->i_flags_lock);
> > +		ip->i_flags |= XFS_NEED_INACTIVE;
> > +		ip->i_flags &= ~XFS_INACTIVATING;
> > +		spin_unlock(&ip->i_flags_lock);
> 
> This doesn't look right. Shouldn't the first spin_unlock() be spin_lock()?

Good catch. Fixed.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-20 16:58       ` Andre Noll
@ 2024-03-20 22:51         ` Dave Chinner
  2024-03-21  9:59           ` Andre Noll
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2024-03-20 22:51 UTC (permalink / raw)
  To: Andre Noll; +Cc: Darrick J. Wong, linux-xfs

On Wed, Mar 20, 2024 at 05:58:53PM +0100, Andre Noll wrote:
> On Wed, Mar 20, 07:53, Darrick J. Wong wrote
> > On Wed, Mar 20, 2024 at 09:39:57AM +0100, Andre Noll wrote:
> > > On Tue, Mar 19, 11:16, Dave Chinner wrote
> > > > +		/*
> > > > +		 * Well, that sucks. Put the inode back on the inactive queue.
> > > > +		 * Do this while still under the ILOCK so that we can set the
> > > > +		 * NEED_INACTIVE flag and clear the INACTIVATING flag an not
> > > > +		 * have another lookup race with us before we've finished
> > > > +		 * putting the inode back on the inodegc queue.
> > > > +		 */
> > > > +		spin_unlock(&ip->i_flags_lock);
> > > > +		ip->i_flags |= XFS_NEED_INACTIVE;
> > > > +		ip->i_flags &= ~XFS_INACTIVATING;
> > > > +		spin_unlock(&ip->i_flags_lock);
> > > 
> > > This doesn't look right. Shouldn't the first spin_unlock() be spin_lock()?
> > 
> > Yes.  So much for my hand inspection of code. :(
> 
> Given enough hand inspections, all bugs are shallow :)

Sparse should have found that, if I ran it. :/

Ah, but sparse gets confused by the fact that the return from the
function may or may not have unlocked stuff:

fs/xfs/xfs_icache.c:355:9: warning: context imbalance in 'xfs_iget_recycle' - unexpected unlock
fs/xfs/xfs_icache.c:414:28: warning: context imbalance in 'xfs_iget_reactivate' - unexpected unlock
fs/xfs/xfs_icache.c:656:28: warning: context imbalance in 'xfs_iget_cache_hit' - different lock contexts for basic block

So if I fix that (that'll be patch 5 for this series), I get:

  CC      fs/xfs/xfs_icache.o
  CHECK   fs/xfs/xfs_icache.c
fs/xfs/xfs_icache.c:459:28: warning: context imbalance in 'xfs_iget_reactivate' - unexpected unlock

Yup, sparse now catches the unbalanced locking.

I just haven't thought to run sparse on XFS recently - running
sparse on a full kernel build is just .... awful. I think I'll
change my build script so that when I do an '--xfs-only' build it
also enables sparse as it's only rebuilding fs/xfs at that point....

> > (Doesn't simple lock debugging catch these sorts of things?)
> 
> Maybe this error path doesn't get exercised because xfs_reinit_inode()
> never fails. AFAICT, it can only fail if security_inode_alloc()
> can't allocate the composite inode blob.

Which syzkaller triggers every so often. I also do all my testing
with selinux enabled, so security_inode_alloc() is actually being
exercised and definitely has the potential to fail on my small
memory configs...

> > ((It sure would be nice if locking returned a droppable "object" to do
> > the unlock ala Rust and then spin_lock could be __must_check.))
> 
> There's the *LOCK_GUARD* macros which employ gcc's cleanup attribute
> to automatically call e.g. spin_unlock() when a variable goes out of
> scope (see 54da6a0924311).

IMO, the LOCK_GUARD stuff is an awful anti-pattern. It means some
error paths -look broken- because they lack unlocks, and we have to
explicitly change code to return from functions with the guarded
locks held. This is a diametrically opposed locking pattern to the
existing non-guarded locking patterns - correct behaviour in one
pattern is broken behaviour in the other, and vice versa.

That's just -insane- from a code maintenance point of view.

And they are completely useless for anything complex like these
XFS icache functions because the lock scope is not balanced across
functions.

The lock can also be taken by functions called within the guard
scope, and so using guarded lock scoping would result in deadlocks.
i.e. xfs_inodegc_queue() needs to take the i_flags_lock, so it must
be dropped before we call that.

So, yeah, lock guards seem to me to be largely just a "look ma, no
need for rust because we can mightily abuse the C preprocessor!"
anti-pattern looking for a problem to solve.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-20 22:51         ` Dave Chinner
@ 2024-03-21  9:59           ` Andre Noll
  2024-03-22  1:09             ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Noll @ 2024-03-21  9:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Thu, Mar 21, 09:51, Dave Chinner wrote
> I just haven't thought to run sparse on XFS recently - running
> sparse on a full kernel build is just .... awful. I think I'll
> change my build script so that when I do an '--xfs-only' build it
> also enables sparse as it's only rebuilding fs/xfs at that point....

Would it be less awful to run coccinelle with a selected set of
semantic patches that catch defective patterns such as double
unlock/free?

> > > (Doesn't simple lock debugging catch these sorts of things?)
> > 
> > Maybe this error path doesn't get exercised because xfs_reinit_inode()
> > never fails. AFAICT, it can only fail if security_inode_alloc()
> > can't allocate the composite inode blob.
> 
> Which syzkaller triggers every so often. I also do all my testing
> with selinux enabled, so security_inode_alloc() is actually being
> exercised and definitely has the potential to fail on my small
> memory configs...

One could try to trigger ENOMEM more easily in functions like this
by allocating bigger slab caches for debug builds.

> > > ((It sure would be nice if locking returned a droppable "object" to do
> > > the unlock ala Rust and then spin_lock could be __must_check.))
> > 
> > There's the *LOCK_GUARD* macros which employ gcc's cleanup attribute
> > to automatically call e.g. spin_unlock() when a variable goes out of
> > scope (see 54da6a0924311).
> 
> IMO, the LOCK_GUARD stuff is an awful anti-pattern. It means some
> error paths -look broken- because they lack unlocks, and we have to
> explicitly change code to return from functions with the guarded
> locks held. This is a diametrically opposed locking pattern to the
> > existing non-guarded locking patterns - correct behaviour in one
> pattern is broken behaviour in the other, and vice versa.
> 
> That's just -insane- from a code maintenance point of view.

Converting all locks in fs/xfs in one go is not an option either, as
this would be too big to review, and non-trivial to begin with. There
are 180+ calls to spin_lock(), and that's just the spinlocks. Also
these patches would interfere badly with ongoing work.

> And they are completely useless for anything complex like these
> XFS icache functions because the lock scope is not balanced across
> functions.
>
> The lock can also be taken by functions called within the guard
> scope, and so using guarded lock scoping would result in deadlocks.
> i.e. xfs_inodegc_queue() needs to take the i_flags_lock, so it must
> be dropped before we call that.

Yup, these can't use the LOCK_GUARD macros, which leads to an unholy
mix of guarded and unguarded locks.

> So, yeah, lock guards seem to me to be largely just a "look ma, no
> need for rust because we can mightily abuse the C preprocessor!"
> anti-pattern looking for a problem to solve.

Do you think there is a valid use case for the cleanup attribute,
or do you believe that the whole concept is mis-designed?

Thanks for sharing your opinions.
Andre
-- 
Max Planck Institute for Biology
Tel: (+49) 7071 601 829
Max-Planck-Ring 5, 72076 Tübingen, Germany
http://people.tuebingen.mpg.de/maan/



* Re: [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget
  2024-03-21  9:59           ` Andre Noll
@ 2024-03-22  1:09             ` Dave Chinner
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2024-03-22  1:09 UTC (permalink / raw)
  To: Andre Noll; +Cc: Darrick J. Wong, linux-xfs

On Thu, Mar 21, 2024 at 10:59:22AM +0100, Andre Noll wrote:
> On Thu, Mar 21, 09:51, Dave Chinner wrote
> > I just haven't thought to run sparse on XFS recently - running
> > sparse on a full kernel build is just .... awful. I think I'll
> > change my build script so that when I do an '--xfs-only' build it
> > also enables sparse as it's only rebuilding fs/xfs at that point....
> 
> Would it be less awful to run coccinelle with a selected set of
> semantic patches that catch defective patterns such as double
> unlock/free?

Much more awful - because then I have to write scripts to do this
checking rather than just add a command line parameter to the build.

> > > > (Doesn't simple lock debugging catch these sorts of things?)
> > > 
> > > Maybe this error path doesn't get exercised because xfs_reinit_inode()
> > > never fails. AFAICT, it can only fail if security_inode_alloc()
> > > can't allocate the composite inode blob.
> > 
> > Which syzkaller triggers every so often. I also do all my testing
> > with selinux enabled, so security_inode_alloc() is actually being
> > exercised and definitely has the potential to fail on my small
> > memory configs...
> 
> One could try to trigger ENOMEM more easily in functions like this
> by allocating bigger slab caches for debug builds.

That doesn't solve the problem - people keep trying to tell us that
all we need is "better testing" when the right solution to the
problem is for memory allocation to *never fail* unless the caller
says it is OK to fail. Better error injection and/or forced failures
don't actually help us all that much because of the massive scope of
the error checking that has to be done. Getting rid of the need for
error checking altogether is a much better long term solution to
this problem...

> > > > ((It sure would be nice if locking returned a droppable "object" to do
> > > > the unlock ala Rust and then spin_lock could be __must_check.))
> > > 
> > > There's the *LOCK_GUARD* macros which employ gcc's cleanup attribute
> > > to automatically call e.g. spin_unlock() when a variable goes out of
> > > scope (see 54da6a0924311).
> > 
> > IMO, the LOCK_GUARD stuff is an awful anti-pattern. It means some
> > error paths -look broken- because they lack unlocks, and we have to
> > explicitly change code to return from functions with the guarded
> > locks held. This is a diametrically opposed locking pattern to the
> > existing non-guarded locking patterns - correct behaviour in one
> > pattern is broken behaviour in the other, and vice versa.
> > 
> > That's just -insane- from a code maintenance point of view.
> 
> Converting all locks in fs/xfs in one go is not an option either, as
> this would be too big to review, and non-trivial to begin with.

It's simply not possible because of the issues I mentioned, plus
others.

> There
> are 180+ calls to spin_lock(), and that's just the spinlocks. Also
> these patches would interfere badly with ongoing work.

Anywhere you have unbalanced lock contexts, non-trivial nested
locking, reverse order locking (via trylocks), children doing unlock
and lock to change lock contexts, etc then this "guarded lock scope"
does not work. XFS is -full- of these non-trivial locking
algorithms, so it's just not a good idea to even start trying to do
a conversion...

> > And they are completely useless for anything complex like these
> > XFS icache functions because the lock scope is not balanced across
> > functions.
> >
> > The lock can also be taken by functions called within the guard
> > scope, and so using guarded lock scoping would result in deadlocks.
> > i.e. xfs_inodegc_queue() needs to take the i_flags_lock, so it must
> > be dropped before we call that.
> 
> Yup, these can't use the LOCK_GUARD macros, which leads to an unholy
> mix of guarded and unguarded locks.

Exactly my point.

> > So, yeah, lock guards seem to me to be largely just a "look ma, no
> > need for rust because we can mightily abuse the C preprocessor!"
> > anti-pattern looking for a problem to solve.
> 
> Do you think there is a valid use case for the cleanup attribute,
> or do you believe that the whole concept is mis-designed?

Sure, there's plenty of cases where scoped cleanup attributes really
do make the code better.  e.g. we had XFS changes that used this
attribute in a complex loop iterator rejected back before it became
accepted so that this lock guard template thingy could be
implemented with it.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2024-03-22  1:15 UTC | newest]

Thread overview: 16+ messages
2024-03-19  0:15 [PATCH v2 0/4] xfs: recycle inactive inodes immediately Dave Chinner
2024-03-19  0:15 ` [PATCH 1/4] xfs: make inode inactivation state changes atomic Dave Chinner
2024-03-19 18:01   ` Darrick J. Wong
2024-03-19  0:15 ` [PATCH 2/4] xfs: prepare inode for i_gclist detection Dave Chinner
2024-03-19  0:15 ` [PATCH 3/4] xfs: allow lazy removal of inodes from the inodegc queues Dave Chinner
2024-03-19  0:16 ` [PATCH 4/4] xfs: reactivate XFS_NEED_INACTIVE inodes from xfs_iget Dave Chinner
2024-03-19 18:11   ` Darrick J. Wong
2024-03-20  8:39   ` Andre Noll
2024-03-20 14:53     ` Darrick J. Wong
2024-03-20 16:58       ` Andre Noll
2024-03-20 22:51         ` Dave Chinner
2024-03-21  9:59           ` Andre Noll
2024-03-22  1:09             ` Dave Chinner
2024-03-20 21:58     ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2024-02-01  0:30 [RFC] [PATCH 0/4] xfs: reactivate inodes immediately in xfs_iget Dave Chinner
2024-02-01  0:30 ` [PATCH 1/4] xfs: make inode inactivation state changes atomic Dave Chinner
2024-02-01 19:07   ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox