public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state
@ 2023-11-28 20:26 Darrick J. Wong
  2023-11-28 20:26 ` [PATCH 1/7] xfs: don't leak recovered attri intent items Darrick J. Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:26 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

Hi all,

Long Li reported a KASAN report from a UAF when intent recovery fails:

 ==================================================================
 BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0
 Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103
 CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166
 Workqueue: xfs-cil/sda xlog_cil_push_work
 Call Trace:
  <TASK>
  dump_stack_lvl+0x50/0x70
  print_report+0xc2/0x600
  kasan_report+0xb6/0xe0
  xfs_cui_release+0xb7/0xc0
  xfs_cud_item_release+0x3c/0x90
  xfs_trans_committed_bulk+0x2d5/0x7f0
  xlog_cil_committed+0xaba/0xf20
  xlog_cil_push_work+0x1a60/0x2360
  process_one_work+0x78e/0x1140
  worker_thread+0x58b/0xf60
  kthread+0x2cd/0x3c0
  ret_from_fork+0x1f/0x30
  </TASK>

 Allocated by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  __kasan_slab_alloc+0x55/0x60
  kmem_cache_alloc+0x195/0x5f0
  xfs_cui_init+0x198/0x1d0
  xlog_recover_cui_commit_pass2+0x133/0x5f0
  xlog_recover_items_pass2+0x107/0x230
  xlog_recover_commit_trans+0x3e7/0x9c0
  xlog_recovery_process_trans+0x140/0x1d0
  xlog_recover_process_ophdr+0x1a0/0x3d0
  xlog_recover_process_data+0x108/0x2d0
  xlog_recover_process+0x1f6/0x280
  xlog_do_recovery_pass+0x609/0xdb0
  xlog_do_log_recovery+0x84/0xe0
  xlog_do_recover+0x7d/0x470
  xlog_recover+0x25f/0x490
  xfs_log_mount+0x2dd/0x6f0
  xfs_mountfs+0x11ce/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 Freed by task 531:
  kasan_save_stack+0x22/0x40
  kasan_set_track+0x25/0x30
  kasan_save_free_info+0x2b/0x40
  __kasan_slab_free+0x114/0x1b0
  kmem_cache_free+0xf8/0x510
  xfs_cui_item_free+0x95/0xb0
  xfs_cui_release+0x86/0xc0
  xlog_recover_cancel_intents.isra.0+0xf8/0x210
  xlog_recover_finish+0x7e7/0x980
  xfs_log_mount_finish+0x2bb/0x4a0
  xfs_mountfs+0x14bf/0x1e70
  xfs_fs_fill_super+0x10ec/0x1b20
  get_tree_bdev+0x3c8/0x730
  vfs_get_tree+0x89/0x2c0
  path_mount+0xecf/0x1800
  do_mount+0xf3/0x110
  __x64_sys_mount+0x154/0x1f0
  do_syscall_64+0x39/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

 The buggy address belongs to the object at ffff888012575dc8
  which belongs to the cache xfs_cui_item of size 432
 The buggy address is located 152 bytes inside of
  freed 432-byte region [ffff888012575dc8, ffff888012575f78)

 The buggy address belongs to the physical page:
 page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574
 head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
 page_type: 0xffffffff()
 raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150
 raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
  ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
 >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
 ==================================================================

"If process intents fails, intent items left in AIL will be delete
from AIL and freed in error handling, even intent items that have been
recovered and created done items. After this, uaf will be triggered when
done item committed, because at this point the released intent item will
be accessed.

xlog_recover_finish                     xlog_cil_push_work
----------------------------            ---------------------------
xlog_recover_process_intents
  xfs_cui_item_recover//cui_refcount == 1
    xfs_trans_get_cud
    xfs_trans_commit
      <add cud item to cil>
  xfs_cui_item_recover
    <error occurred and return>
xlog_recover_cancel_intents
  xfs_cui_release     //cui_refcount == 0
    xfs_cui_item_free //free cui
  <release other intent items>
xlog_force_shutdown   //shutdown
                               <...>
                                        <push items in cil>
                                        xlog_cil_committed
                                          xfs_cud_item_release
                                            xfs_cui_release // UAF

"Intent log items are created with a reference count of 2, one for the
creator, and one for the intent done object. Log recovery explicitly
drops the creator reference after it is inserted into the AIL, but it
then processes the log item as if it also owns the intent-done reference.

"The code in ->iop_recovery should assume that it passes the reference
to the done intent, we can remove the intent item from the AIL after
creating the done-intent, but if that code fails before creating the
done-intent then it needs to release the intent reference by log recovery
itself.

"That way when we go to cancel the intent, the only intents we find in
the AIL are the ones we know have not been processed yet and hence we
can safely drop both the creator and the intent done reference from
xlog_recover_cancel_intents().

"Hence if we remove the intent from the list of intents that need to
be recovered after we have done the initial recovery, we acheive two
things:

"1. the tail of the log can be moved forward with the commit of the
done intent or new intent to continue the operation, and

"2. We avoid the problem of trying to determine how many reference
counts we need to drop from intent recovery cancelling because we
never come across intents we've actually attempted recovery on."

Restated: The cause of the UAF is that xlog_recover_cancel_intents
thinks that it owns the refcount on any intent item in the AIL, and that
it's always safe to release these intent items.  This is not true after
the recovery function creates an log intent done item and points it at
the log intent item because releasing the done item always releases the
intent item.

The runtime defer ops code avoids all this by tracking both the log
intent and the intent done items, and releasing only the intent done
item if both have been created.  Long Li proposed fixing this by adding
state flags, but I have a more comprehensive fix.

First, observe that the latter half of the intent _recover functions are
nearly open-coded versions of the corresponding _finish_one function
that uses an onstack deferred work item to single-step through the item.

Second, notice that the recover function is not an exact match because
of the odd behavior that unfinished recovered work items are relogged
with separate log intent items instead of a single new log intent item,
which is what the defer ops machinery does.

Dave and I have long suspected that recovery should be reconstructing
the defer work state from what's in the recovered intent item.  Now we
finally have an excuse to refactor the code to do that.

This series starts by fixing a resource leak in LARP recovery.  We fix
the bug that Long Li reported by switching the intent recovery code to
construct chains of xfs_defer_pending objects and then using the defer
pending objects to track the intent/done item ownership.  Finally, we
clean up the code to reconstruct the exact incore state, which means we
can remove all the opencoded _recover code, which makes maintaining log
items much easier.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been lightly tested with fstests.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=reconstruct-defer-work-6.7
---
 fs/xfs/libxfs/xfs_defer.c       |  121 ++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_defer.h       |   19 +++++
 fs/xfs/libxfs/xfs_log_recover.h |    7 ++
 fs/xfs/xfs_attr_item.c          |  135 ++++++++++++++++-----------------
 fs/xfs/xfs_bmap_item.c          |   90 ++++++++++++----------
 fs/xfs/xfs_extfree_item.c       |  128 +++++++++++++------------------
 fs/xfs/xfs_log.c                |    1 
 fs/xfs/xfs_log_priv.h           |    1 
 fs/xfs/xfs_log_recover.c        |  134 ++++++++++++++++++--------------
 fs/xfs/xfs_refcount_item.c      |  139 +++++++++++----------------------
 fs/xfs/xfs_rmap_item.c          |  162 +++++++++++++++++++--------------------
 fs/xfs/xfs_trans.h              |    2 
 12 files changed, 490 insertions(+), 449 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] xfs: don't leak recovered attri intent items
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
@ 2023-11-28 20:26 ` Darrick J. Wong
  2023-11-30  7:34   ` Christoph Hellwig
  2023-11-28 20:26 ` [PATCH 2/7] xfs: use xfs_defer_pending objects to recover " Darrick J. Wong
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:26 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If recovery finds an xattr log intent item calling for the removal of an
attribute and the file doesn't even have an attr fork, we know that the
removal is trivially complete.  However, we can't just exit the recovery
function without doing something about the recovered log intent item --
it's still on the AIL, and not logging an attrd item means it stays
there forever.

This has likely not been seen in practice because few people use LARP
and the runtime code won't log the attri for a no-attrfork removexattr
operation.  But let's fix this anyway.

Also we shouldn't really be testing the attr fork presence until we've
taken the ILOCK, though this doesn't matter much in recovery, which is
single threaded.

Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 36fe2abb16e6..11e88a76a33c 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -329,6 +329,13 @@ xfs_xattri_finish_update(
 		goto out;
 	}
 
+	/* If an attr removal is trivially complete, we're done. */
+	if (attr->xattri_op_flags == XFS_ATTRI_OP_FLAGS_REMOVE &&
+	    !xfs_inode_hasattr(args->dp)) {
+		error = 0;
+		goto out;
+	}
+
 	error = xfs_attr_set_iter(attr);
 	if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
 		error = -EAGAIN;
@@ -608,8 +615,6 @@ xfs_attri_item_recover(
 			attr->xattri_dela_state = xfs_attr_init_add_state(args);
 		break;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
-		if (!xfs_inode_hasattr(args->dp))
-			goto out;
 		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
 	default:


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/7] xfs: use xfs_defer_pending objects to recover intent items
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
  2023-11-28 20:26 ` [PATCH 1/7] xfs: don't leak recovered attri intent items Darrick J. Wong
@ 2023-11-28 20:26 ` Darrick J. Wong
  2023-11-30  7:53   ` Christoph Hellwig
  2023-11-28 20:26 ` [PATCH 3/7] xfs: pass the xfs_defer_pending object to iop_recover Darrick J. Wong
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:26 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

One thing I never quite got around to doing is porting the log intent
item recovery code to reconstruct the deferred pending work state.  As a
result, each intent item open codes xfs_defer_finish_one in its recovery
method, because that's what the EFI code did before xfs_defer.c even
existed.

This is a gross thing to have left unfixed -- if an EFI cannot proceed
due to busy extents, we end up creating separate new EFIs for each
unfinished work item, which is a change in behavior from what runtime
would have done.

Worse yet, Long Li pointed out that there's a UAF in the recovery code.
The ->commit_pass2 function adds the intent item to the AIL and drops
the refcount.  The one remaining refcount is now owned by the recovery
mechanism (aka the log intent items in the AIL) with the intent of
giving the refcount to the intent done item in the ->iop_recover
function.

However, if something fails later in recovery, xlog_recover_finish will
walk the recovered intent items in the AIL and release them.  If the CIL
hasn't been pushed before that point (which is possible since we don't
force the log until later) then the intent done release will try to free
its associated intent, which has already been freed.

This patch starts to address this mess by having the ->commit_pass2
functions recreate the xfs_defer_pending state.  The next few patches
will fix the recovery functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_defer.c       |  102 +++++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_defer.h       |    5 ++
 fs/xfs/libxfs/xfs_log_recover.h |    3 +
 fs/xfs/xfs_attr_item.c          |   10 +--
 fs/xfs/xfs_bmap_item.c          |    9 +--
 fs/xfs/xfs_extfree_item.c       |    9 +--
 fs/xfs/xfs_log.c                |    1 
 fs/xfs/xfs_log_priv.h           |    1 
 fs/xfs/xfs_log_recover.c        |  115 ++++++++++++++++++++-------------------
 fs/xfs/xfs_refcount_item.c      |    9 +--
 fs/xfs/xfs_rmap_item.c          |    9 +--
 11 files changed, 158 insertions(+), 115 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index f71679ce23b9..0fc714474f87 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -245,23 +245,53 @@ xfs_defer_create_intents(
 	return ret;
 }
 
-STATIC void
+static inline void
 xfs_defer_pending_abort(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
+
+	trace_xfs_defer_pending_abort(mp, dfp);
+
+	if (dfp->dfp_intent && !dfp->dfp_done) {
+		ops->abort_intent(dfp->dfp_intent);
+		dfp->dfp_intent = NULL;
+	}
+}
+
+static inline void
+xfs_defer_pending_cancel_work(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
+	struct list_head		*pwi;
+	struct list_head		*n;
+
+	trace_xfs_defer_cancel_list(mp, dfp);
+
+	list_del(&dfp->dfp_list);
+	list_for_each_safe(pwi, n, &dfp->dfp_work) {
+		list_del(pwi);
+		dfp->dfp_count--;
+		trace_xfs_defer_cancel_item(mp, dfp, pwi);
+		ops->cancel_item(pwi);
+	}
+	ASSERT(dfp->dfp_count == 0);
+	kmem_cache_free(xfs_defer_pending_cache, dfp);
+}
+
+STATIC void
+xfs_defer_pending_abort_list(
 	struct xfs_mount		*mp,
 	struct list_head		*dop_list)
 {
 	struct xfs_defer_pending	*dfp;
-	const struct xfs_defer_op_type	*ops;
 
 	/* Abort intent items that don't have a done item. */
-	list_for_each_entry(dfp, dop_list, dfp_list) {
-		ops = defer_op_types[dfp->dfp_type];
-		trace_xfs_defer_pending_abort(mp, dfp);
-		if (dfp->dfp_intent && !dfp->dfp_done) {
-			ops->abort_intent(dfp->dfp_intent);
-			dfp->dfp_intent = NULL;
-		}
-	}
+	list_for_each_entry(dfp, dop_list, dfp_list)
+		xfs_defer_pending_abort(mp, dfp);
 }
 
 /* Abort all the intents that were committed. */
@@ -271,7 +301,7 @@ xfs_defer_trans_abort(
 	struct list_head		*dop_pending)
 {
 	trace_xfs_defer_trans_abort(tp, _RET_IP_);
-	xfs_defer_pending_abort(tp->t_mountp, dop_pending);
+	xfs_defer_pending_abort_list(tp->t_mountp, dop_pending);
 }
 
 /*
@@ -389,27 +419,13 @@ xfs_defer_cancel_list(
 {
 	struct xfs_defer_pending	*dfp;
 	struct xfs_defer_pending	*pli;
-	struct list_head		*pwi;
-	struct list_head		*n;
-	const struct xfs_defer_op_type	*ops;
 
 	/*
 	 * Free the pending items.  Caller should already have arranged
 	 * for the intent items to be released.
 	 */
-	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list) {
-		ops = defer_op_types[dfp->dfp_type];
-		trace_xfs_defer_cancel_list(mp, dfp);
-		list_del(&dfp->dfp_list);
-		list_for_each_safe(pwi, n, &dfp->dfp_work) {
-			list_del(pwi);
-			dfp->dfp_count--;
-			trace_xfs_defer_cancel_item(mp, dfp, pwi);
-			ops->cancel_item(pwi);
-		}
-		ASSERT(dfp->dfp_count == 0);
-		kmem_cache_free(xfs_defer_pending_cache, dfp);
-	}
+	list_for_each_entry_safe(dfp, pli, dop_list, dfp_list)
+		xfs_defer_pending_cancel_work(mp, dfp);
 }
 
 /*
@@ -665,6 +681,36 @@ xfs_defer_add(
 	dfp->dfp_count++;
 }
 
+/* Create a pending deferred work item for use with log recovery. */
+struct xfs_defer_pending *
+xfs_defer_start_recovery(
+	struct xfs_log_item		*lip,
+	enum xfs_defer_ops_type		dfp_type)
+{
+	struct xfs_defer_pending	*dfp;
+
+	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
+			GFP_NOFS | __GFP_NOFAIL);
+	dfp->dfp_type = dfp_type;
+	dfp->dfp_intent = lip;
+	INIT_LIST_HEAD(&dfp->dfp_work);
+	INIT_LIST_HEAD(&dfp->dfp_list);
+	return dfp;
+}
+
+/*
+ * Cancel a deferred work item created to recover a log intent item.  @dfp
+ * will be freed after this function returns.
+ */
+void
+xfs_defer_cancel_recovery(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp)
+{
+	xfs_defer_pending_abort(mp, dfp);
+	xfs_defer_pending_cancel_work(mp, dfp);
+}
+
 /*
  * Move deferred ops from one transaction to another and reset the source to
  * initial state. This is primarily used to carry state forward across
@@ -769,7 +815,7 @@ xfs_defer_ops_capture_abort(
 {
 	unsigned short			i;
 
-	xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
+	xfs_defer_pending_abort_list(mp, &dfc->dfc_dfops);
 	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
 
 	for (i = 0; i < dfc->dfc_held.dr_bufs; i++)
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 8788ad5f6a73..3c923a728323 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -125,6 +125,11 @@ void xfs_defer_ops_capture_abort(struct xfs_mount *mp,
 		struct xfs_defer_capture *d);
 void xfs_defer_resources_rele(struct xfs_defer_resources *dres);
 
+struct xfs_defer_pending *xfs_defer_start_recovery(struct xfs_log_item *lip,
+		enum xfs_defer_ops_type dfp_type);
+void xfs_defer_cancel_recovery(struct xfs_mount *mp,
+		struct xfs_defer_pending *dfp);
+
 int __init xfs_defer_init_item_caches(void);
 void xfs_defer_destroy_item_caches(void);
 
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index a5100a11faf9..271a4ce7375c 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -153,4 +153,7 @@ xlog_recover_resv(const struct xfs_trans_res *r)
 	return ret;
 }
 
+void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
+		xfs_lsn_t lsn, unsigned int dfp_type);
+
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 11e88a76a33c..a32716b8cbbd 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -772,14 +772,8 @@ xlog_recover_attri_commit_pass2(
 	attrip = xfs_attri_init(mp, nv);
 	memcpy(&attrip->attri_format, attri_formatp, len);
 
-	/*
-	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
-	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
-	 * directly and drop the ATTRI reference. Note that
-	 * xfs_trans_ail_update() drops the AIL lock.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
-	xfs_attri_release(attrip);
+	xlog_recover_intent_item(log, &attrip->attri_item, lsn,
+			XFS_DEFER_OPS_TYPE_ATTR);
 	xfs_attri_log_nameval_put(nv);
 	return 0;
 }
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index e736a0844c89..6cbae4fdf43f 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -681,12 +681,9 @@ xlog_recover_bui_commit_pass2(
 	buip = xfs_bui_init(mp);
 	xfs_bui_copy_format(&buip->bui_format, bui_formatp);
 	atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &buip->bui_item, lsn);
-	xfs_bui_release(buip);
+
+	xlog_recover_intent_item(log, &buip->bui_item, lsn,
+			XFS_DEFER_OPS_TYPE_BMAP);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 3fa8789820ad..cf0ddeb70580 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -820,12 +820,9 @@ xlog_recover_efi_commit_pass2(
 		return error;
 	}
 	atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &efip->efi_item, lsn);
-	xfs_efi_release(efip);
+
+	xlog_recover_intent_item(log, &efip->efi_item, lsn,
+			XFS_DEFER_OPS_TYPE_FREE);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index ee206facf0dc..ee087f2388fb 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1542,6 +1542,7 @@ xlog_alloc_log(
 	log->l_covered_state = XLOG_STATE_COVER_IDLE;
 	set_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate);
 	INIT_DELAYED_WORK(&log->l_work, xfs_log_worker);
+	INIT_LIST_HEAD(&log->l_dfops);
 
 	log->l_prev_block  = -1;
 	/* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index fa3ad1d7b31c..c7083f3297cc 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -407,6 +407,7 @@ struct xlog {
 	long			l_opstate;	/* operational state */
 	uint			l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */
 	struct list_head	*l_buf_cancel_table;
+	struct list_head	l_dfops;	/* recovered log intent items */
 	int			l_iclog_hsize;  /* size of iclog header */
 	int			l_iclog_heads;  /* # of iclog header sectors */
 	uint			l_sectBBsize;   /* sector size in BBs (2^n) */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index a1e18b24971a..49481d9c1ced 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1723,30 +1723,24 @@ xlog_clear_stale_blocks(
  */
 void
 xlog_recover_release_intent(
-	struct xlog		*log,
-	unsigned short		intent_type,
-	uint64_t		intent_id)
+	struct xlog			*log,
+	unsigned short			intent_type,
+	uint64_t			intent_id)
 {
-	struct xfs_ail_cursor	cur;
-	struct xfs_log_item	*lip;
-	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_defer_pending	*dfp, *n;
+
+	list_for_each_entry_safe(dfp, n, &log->l_dfops, dfp_list) {
+		struct xfs_log_item	*lip = dfp->dfp_intent;
 
-	spin_lock(&ailp->ail_lock);
-	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip != NULL;
-	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
 		if (lip->li_type != intent_type)
 			continue;
 		if (!lip->li_ops->iop_match(lip, intent_id))
 			continue;
 
-		spin_unlock(&ailp->ail_lock);
-		lip->li_ops->iop_release(lip);
-		spin_lock(&ailp->ail_lock);
-		break;
-	}
+		ASSERT(xlog_item_is_intent(lip));
 
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
+		xfs_defer_cancel_recovery(log->l_mp, dfp);
+	}
 }
 
 int
@@ -1939,6 +1933,32 @@ xlog_buf_readahead(
 		xfs_buf_readahead(log->l_mp->m_ddev_targp, blkno, len, ops);
 }
 
+/*
+ * Create a deferred work structure for resuming and tracking the progress of a
+ * log intent item that was found during recovery.
+ */
+void
+xlog_recover_intent_item(
+	struct xlog			*log,
+	struct xfs_log_item		*lip,
+	xfs_lsn_t			lsn,
+	unsigned int			dfp_type)
+{
+	struct xfs_defer_pending	*dfp;
+
+	ASSERT(xlog_item_is_intent(lip));
+
+	dfp = xfs_defer_start_recovery(lip, dfp_type);
+	list_add_tail(&dfp->dfp_list, &log->l_dfops);
+
+	/*
+	 * Insert the intent into the AIL directly and drop one reference so
+	 * that finishing or canceling the work will drop the other.
+	 */
+	xfs_trans_ail_insert(log->l_ailp, lip, lsn);
+	lip->li_ops->iop_unpin(lip, 0);
+}
+
 STATIC int
 xlog_recover_items_pass2(
 	struct xlog                     *log,
@@ -2533,29 +2553,24 @@ xlog_abort_defer_ops(
  */
 STATIC int
 xlog_recover_process_intents(
-	struct xlog		*log)
+	struct xlog			*log)
 {
 	LIST_HEAD(capture_list);
-	struct xfs_ail_cursor	cur;
-	struct xfs_log_item	*lip;
-	struct xfs_ail		*ailp;
-	int			error = 0;
+	struct xfs_defer_pending	*dfp, *n;
+	int				error = 0;
 #if defined(DEBUG) || defined(XFS_WARN)
-	xfs_lsn_t		last_lsn;
-#endif
+	xfs_lsn_t			last_lsn;
 
-	ailp = log->l_ailp;
-	spin_lock(&ailp->ail_lock);
-#if defined(DEBUG) || defined(XFS_WARN)
+	spin_lock(&log->l_ailp->ail_lock);
 	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
+	spin_unlock(&log->l_ailp->ail_lock);
 #endif
-	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	     lip != NULL;
-	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
-		const struct xfs_item_ops	*ops;
 
-		if (!xlog_item_is_intent(lip))
-			break;
+	list_for_each_entry_safe(dfp, n, &log->l_dfops, dfp_list) {
+		struct xfs_log_item	*lip = dfp->dfp_intent;
+		const struct xfs_item_ops *ops = lip->li_ops;
+
+		ASSERT(xlog_item_is_intent(lip));
 
 		/*
 		 * We should never see a redo item with a LSN higher than
@@ -2573,19 +2588,21 @@ xlog_recover_process_intents(
 		 * The recovery function can free the log item, so we must not
 		 * access lip after it returns.
 		 */
-		spin_unlock(&ailp->ail_lock);
-		ops = lip->li_ops;
 		error = ops->iop_recover(lip, &capture_list);
-		spin_lock(&ailp->ail_lock);
 		if (error) {
 			trace_xlog_intent_recovery_failed(log->l_mp, error,
 					ops->iop_recover);
 			break;
 		}
-	}
 
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
+		/*
+		 * XXX: @lip could have been freed, so detach the log item from
+		 * the pending item.  This does not fix the existing UAF bug
+		 * that occurs if ->iop_recover fails after creating the intent
+		 * done item.
+		 */
+		dfp->dfp_intent = NULL;
+	}
 	if (error)
 		goto err;
 
@@ -2606,27 +2623,15 @@ xlog_recover_process_intents(
  */
 STATIC void
 xlog_recover_cancel_intents(
-	struct xlog		*log)
+	struct xlog			*log)
 {
-	struct xfs_log_item	*lip;
-	struct xfs_ail_cursor	cur;
-	struct xfs_ail		*ailp;
+	struct xfs_defer_pending	*dfp, *n;
 
-	ailp = log->l_ailp;
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
-	while (lip != NULL) {
-		if (!xlog_item_is_intent(lip))
-			break;
+	list_for_each_entry_safe(dfp, n, &log->l_dfops, dfp_list) {
+		ASSERT(xlog_item_is_intent(dfp->dfp_intent));
 
-		spin_unlock(&ailp->ail_lock);
-		lip->li_ops->iop_release(lip);
-		spin_lock(&ailp->ail_lock);
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+		xfs_defer_cancel_recovery(log->l_mp, dfp);
 	}
-
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
 }
 
 /*
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 2d4444d61e98..b88cb2e98227 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -696,12 +696,9 @@ xlog_recover_cui_commit_pass2(
 	cuip = xfs_cui_init(mp, cui_formatp->cui_nextents);
 	xfs_cui_copy_format(&cuip->cui_format, cui_formatp);
 	atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
-	xfs_cui_release(cuip);
+
+	xlog_recover_intent_item(log, &cuip->cui_item, lsn,
+			XFS_DEFER_OPS_TYPE_REFCOUNT);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 0e0e747028da..c30d4a4a14b2 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -702,12 +702,9 @@ xlog_recover_rui_commit_pass2(
 	ruip = xfs_rui_init(mp, rui_formatp->rui_nextents);
 	xfs_rui_copy_format(&ruip->rui_format, rui_formatp);
 	atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
-	/*
-	 * Insert the intent into the AIL directly and drop one reference so
-	 * that finishing or canceling the work will drop the other.
-	 */
-	xfs_trans_ail_insert(log->l_ailp, &ruip->rui_item, lsn);
-	xfs_rui_release(ruip);
+
+	xlog_recover_intent_item(log, &ruip->rui_item, lsn,
+			XFS_DEFER_OPS_TYPE_RMAP);
 	return 0;
 }
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/7] xfs: pass the xfs_defer_pending object to iop_recover
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
  2023-11-28 20:26 ` [PATCH 1/7] xfs: don't leak recovered attri intent items Darrick J. Wong
  2023-11-28 20:26 ` [PATCH 2/7] xfs: use xfs_defer_pending objects to recover " Darrick J. Wong
@ 2023-11-28 20:26 ` Darrick J. Wong
  2023-11-30  7:53   ` Christoph Hellwig
  2023-11-28 20:26 ` [PATCH 4/7] xfs: transfer recovered intent item ownership in ->iop_recover Darrick J. Wong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:26 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that log intent item recovery recreates the xfs_defer_pending state,
we should pass that into the ->iop_recover routines so that the intent
item can finish the recreation work.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c     |    3 ++-
 fs/xfs/xfs_bmap_item.c     |    3 ++-
 fs/xfs/xfs_extfree_item.c  |    3 ++-
 fs/xfs/xfs_log_recover.c   |    2 +-
 fs/xfs/xfs_refcount_item.c |    3 ++-
 fs/xfs/xfs_rmap_item.c     |    3 ++-
 fs/xfs/xfs_trans.h         |    4 +++-
 7 files changed, 14 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index a32716b8cbbd..6119a7a480a0 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -545,9 +545,10 @@ xfs_attri_validate(
  */
 STATIC int
 xfs_attri_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
 	struct xfs_attr_intent		*attr;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 6cbae4fdf43f..3ef55de370b5 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -486,11 +486,12 @@ xfs_bui_validate(
  */
 STATIC int
 xfs_bui_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_bmap_intent		fake = { };
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip = NULL;
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index cf0ddeb70580..a8245c5ffe49 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -657,10 +657,11 @@ xfs_efi_validate_ext(
  */
 STATIC int
 xfs_efi_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_efd_log_item		*efdp;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 49481d9c1ced..c0a75ec1173a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2588,7 +2588,7 @@ xlog_recover_process_intents(
 		 * The recovery function can free the log item, so we must not
 		 * access lip after it returns.
 		 */
-		error = ops->iop_recover(lip, &capture_list);
+		error = ops->iop_recover(dfp, &capture_list);
 		if (error) {
 			trace_xlog_intent_recovery_failed(log->l_mp, error,
 					ops->iop_recover);
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index b88cb2e98227..3456201aa3e6 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -474,10 +474,11 @@ xfs_cui_validate_phys(
  */
 STATIC int
 xfs_cui_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
 	struct xfs_cud_log_item		*cudp;
 	struct xfs_trans		*tp;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index c30d4a4a14b2..dfd5a3e4b1fb 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -504,10 +504,11 @@ xfs_rui_validate_map(
  */
 STATIC int
 xfs_rui_item_recover(
-	struct xfs_log_item		*lip,
+	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
 	struct xfs_trans_res		resv;
+	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
 	struct xfs_rud_log_item		*rudp;
 	struct xfs_trans		*tp;
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 6e3646d524ce..4e38357237c3 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -66,6 +66,8 @@ struct xfs_log_item {
 	{ (1u << XFS_LI_DIRTY),		"DIRTY" }, \
 	{ (1u << XFS_LI_WHITEOUT),	"WHITEOUT" }
 
+struct xfs_defer_pending;
+
 struct xfs_item_ops {
 	unsigned flags;
 	void (*iop_size)(struct xfs_log_item *, int *, int *);
@@ -78,7 +80,7 @@ struct xfs_item_ops {
 	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
 	uint (*iop_push)(struct xfs_log_item *, struct list_head *);
 	void (*iop_release)(struct xfs_log_item *);
-	int (*iop_recover)(struct xfs_log_item *lip,
+	int (*iop_recover)(struct xfs_defer_pending *dfp,
 			   struct list_head *capture_list);
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 	struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent,


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/7] xfs: transfer recovered intent item ownership in ->iop_recover
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
                   ` (2 preceding siblings ...)
  2023-11-28 20:26 ` [PATCH 3/7] xfs: pass the xfs_defer_pending object to iop_recover Darrick J. Wong
@ 2023-11-28 20:26 ` Darrick J. Wong
  2023-11-30  7:55   ` Christoph Hellwig
  2023-11-28 20:26 ` [PATCH 5/7] xfs: recreate work items when recovering intent items Darrick J. Wong
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:26 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we pass the xfs_defer_pending object into the intent item
recovery functions, we know exactly when ownership of the sole refcount
passes from the recovery context to the intent done item.  At that
point, we need to null out dfp_intent so that the recovery mechanism
won't release it.  This should fix the UAF problem reported by Long Li.

Note that we still want to recreate the full deferred work state.  That
will be addressed in the next patches.

Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_recover.h |    2 ++
 fs/xfs/xfs_attr_item.c          |    1 +
 fs/xfs/xfs_bmap_item.c          |    2 ++
 fs/xfs/xfs_extfree_item.c       |    2 ++
 fs/xfs/xfs_log_recover.c        |   23 +++++++++++++----------
 fs/xfs/xfs_refcount_item.c      |    1 +
 fs/xfs/xfs_rmap_item.c          |    2 ++
 7 files changed, 23 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 271a4ce7375c..13583df9f239 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -155,5 +155,7 @@ xlog_recover_resv(const struct xfs_trans_res *r)
 
 void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
 		xfs_lsn_t lsn, unsigned int dfp_type);
+void xlog_recover_transfer_intent(struct xfs_trans *tp,
+		struct xfs_defer_pending *dfp);
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 6119a7a480a0..82775e9537df 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -632,6 +632,7 @@ xfs_attri_item_recover(
 
 	args->trans = tp;
 	done_item = xfs_trans_get_attrd(tp, attrip);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 3ef55de370b5..b6d63b8bdad5 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -524,6 +524,8 @@ xfs_bui_item_recover(
 		goto err_rele;
 
 	budp = xfs_trans_get_bud(tp, buip);
+	xlog_recover_transfer_intent(tp, dfp);
+
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index a8245c5ffe49..c9908fb33765 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -689,7 +689,9 @@ xfs_efi_item_recover(
 	error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp);
 	if (error)
 		return error;
+
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		struct xfs_extent_free_item	fake = {
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c0a75ec1173a..038ef6858298 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2585,8 +2585,7 @@ xlog_recover_process_intents(
 		 * the recover routine or else those subsequent intents will be
 		 * replayed in the wrong order!
 		 *
-		 * The recovery function can free the log item, so we must not
-		 * access lip after it returns.
+		 * The recovery function can free @dfp.
 		 */
 		error = ops->iop_recover(dfp, &capture_list);
 		if (error) {
@@ -2594,14 +2593,6 @@ xlog_recover_process_intents(
 					ops->iop_recover);
 			break;
 		}
-
-		/*
-		 * XXX: @lip could have been freed, so detach the log item from
-		 * the pending item.  This does not fix the existing UAF bug
-		 * that occurs if ->iop_recover fails after creating the intent
-		 * done item.
-		 */
-		dfp->dfp_intent = NULL;
 	}
 	if (error)
 		goto err;
@@ -2634,6 +2625,18 @@ xlog_recover_cancel_intents(
 	}
 }
 
+/*
+ * Transfer ownership of the recovered log intent item to the recovery
+ * transaction.
+ */
+void
+xlog_recover_transfer_intent(
+	struct xfs_trans		*tp,
+	struct xfs_defer_pending	*dfp)
+{
+	dfp->dfp_intent = NULL;
+}
+
 /*
  * This routine performs a transaction to null out a bad inode pointer
  * in an agi unlinked inode hash bucket.
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 3456201aa3e6..f1b259223802 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -523,6 +523,7 @@ xfs_cui_item_recover(
 		return error;
 
 	cudp = xfs_trans_get_cud(tp, cuip);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
 		struct xfs_refcount_intent	fake = { };
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index dfd5a3e4b1fb..5e8a02d2b045 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -537,7 +537,9 @@ xfs_rui_item_recover(
 			XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
+
 	rudp = xfs_trans_get_rud(tp, ruip);
+	xlog_recover_transfer_intent(tp, dfp);
 
 	for (i = 0; i < ruip->rui_format.rui_nextents; i++) {
 		struct xfs_rmap_intent	fake = { };


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/7] xfs: recreate work items when recovering intent items
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
                   ` (3 preceding siblings ...)
  2023-11-28 20:26 ` [PATCH 4/7] xfs: transfer recovered intent item ownership in ->iop_recover Darrick J. Wong
@ 2023-11-28 20:26 ` Darrick J. Wong
  2023-11-30  8:06   ` Christoph Hellwig
  2023-11-30 13:13   ` Long Li
  2023-11-28 20:27 ` [PATCH 6/7] xfs: use xfs_defer_finish_one to finish recovered work items Darrick J. Wong
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:26 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Recreate work items for each xfs_defer_pending object when we are
recovering intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_defer.h  |    9 ++++
 fs/xfs/xfs_attr_item.c     |   94 +++++++++++++++++++++----------------
 fs/xfs/xfs_bmap_item.c     |   56 ++++++++++++++--------
 fs/xfs/xfs_extfree_item.c  |   50 ++++++++++++-------
 fs/xfs/xfs_refcount_item.c |   61 +++++++++++-------------
 fs/xfs/xfs_rmap_item.c     |  113 ++++++++++++++++++++++++--------------------
 6 files changed, 221 insertions(+), 162 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 3c923a728323..ee1e76d3f7e8 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -130,6 +130,15 @@ struct xfs_defer_pending *xfs_defer_start_recovery(struct xfs_log_item *lip,
 void xfs_defer_cancel_recovery(struct xfs_mount *mp,
 		struct xfs_defer_pending *dfp);
 
+static inline void
+xfs_defer_recover_work_item(
+	struct xfs_defer_pending	*dfp,
+	struct list_head		*work)
+{
+	list_add_tail(work, &dfp->dfp_work);
+	dfp->dfp_count++;
+}
+
 int __init xfs_defer_init_item_caches(void);
 void xfs_defer_destroy_item_caches(void);
 
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 82775e9537df..fbc88325848a 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -539,47 +539,22 @@ xfs_attri_validate(
 	return xfs_verify_ino(mp, attrp->alfi_ino);
 }
 
-/*
- * Process an attr intent item that was recovered from the log.  We need to
- * delete the attr that it describes.
- */
-STATIC int
-xfs_attri_item_recover(
+static inline struct xfs_attr_intent *
+xfs_attri_recover_work(
+	struct xfs_mount		*mp,
 	struct xfs_defer_pending	*dfp,
-	struct list_head		*capture_list)
+	struct xfs_attri_log_format	*attrp,
+	struct xfs_inode		*ip,
+	struct xfs_attri_log_nameval	*nv)
 {
-	struct xfs_log_item		*lip = dfp->dfp_intent;
-	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
 	struct xfs_attr_intent		*attr;
-	struct xfs_mount		*mp = lip->li_log->l_mp;
-	struct xfs_inode		*ip;
 	struct xfs_da_args		*args;
-	struct xfs_trans		*tp;
-	struct xfs_trans_res		resv;
-	struct xfs_attri_log_format	*attrp;
-	struct xfs_attri_log_nameval	*nv = attrip->attri_nameval;
-	int				error;
-	int				total;
-	int				local;
-	struct xfs_attrd_log_item	*done_item = NULL;
-
-	/*
-	 * First check the validity of the attr described by the ATTRI.  If any
-	 * are bad, then assume that all are bad and just toss the ATTRI.
-	 */
-	attrp = &attrip->attri_format;
-	if (!xfs_attri_validate(mp, attrp) ||
-	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
-		return -EFSCORRUPTED;
-
-	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
-	if (error)
-		return error;
 
 	attr = kmem_zalloc(sizeof(struct xfs_attr_intent) +
 			   sizeof(struct xfs_da_args), KM_NOFS);
 	args = (struct xfs_da_args *)(attr + 1);
 
+	INIT_LIST_HEAD(&attr->xattri_list);
 	attr->xattri_da_args = args;
 	attr->xattri_op_flags = attrp->alfi_op_flags &
 						XFS_ATTRI_OP_FLAGS_TYPE_MASK;
@@ -607,6 +582,8 @@ xfs_attri_item_recover(
 	switch (attr->xattri_op_flags) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		int			local;
+
 		args->value = nv->value.i_addr;
 		args->valuelen = nv->value.i_len;
 		args->total = xfs_attr_calc_size(args, &local);
@@ -618,19 +595,58 @@ xfs_attri_item_recover(
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
-	default:
-		ASSERT(0);
-		error = -EFSCORRUPTED;
-		goto out;
 	}
 
+	xfs_defer_recover_work_item(dfp, &attr->xattri_list);
+	return attr;
+}
+
+/*
+ * Process an attr intent item that was recovered from the log.  We need to
+ * delete the attr that it describes.
+ */
+STATIC int
+xfs_attri_item_recover(
+	struct xfs_defer_pending	*dfp,
+	struct list_head		*capture_list)
+{
+	struct xfs_log_item		*lip = dfp->dfp_intent;
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+	struct xfs_attr_intent		*attr;
+	struct xfs_mount		*mp = lip->li_log->l_mp;
+	struct xfs_inode		*ip;
+	struct xfs_da_args		*args;
+	struct xfs_trans		*tp;
+	struct xfs_trans_res		resv;
+	struct xfs_attri_log_format	*attrp;
+	struct xfs_attri_log_nameval	*nv = attrip->attri_nameval;
+	int				error;
+	int				total;
+	struct xfs_attrd_log_item	*done_item = NULL;
+
+	/*
+	 * First check the validity of the attr described by the ATTRI.  If any
+	 * are bad, then assume that all are bad and just toss the ATTRI.
+	 */
+	attrp = &attrip->attri_format;
+	if (!xfs_attri_validate(mp, attrp) ||
+	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
+		return -EFSCORRUPTED;
+
+	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
+	if (error)
+		return error;
+
+	attr = xfs_attri_recover_work(mp, dfp, attrp, ip, nv);
+	args = attr->xattri_da_args;
+
 	xfs_init_attr_trans(args, &resv, &total);
 	resv = xlog_recover_resv(&resv);
 	error = xfs_trans_alloc(mp, &resv, total, 0, XFS_TRANS_RESERVE, &tp);
 	if (error)
-		goto out;
-
+		return error;
 	args->trans = tp;
+
 	done_item = xfs_trans_get_attrd(tp, attrip);
 	xlog_recover_transfer_intent(tp, dfp);
 
@@ -661,8 +677,6 @@ xfs_attri_item_recover(
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_irele(ip);
-out:
-	xfs_attr_free_item(attr);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index b6d63b8bdad5..2046547299d6 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -480,6 +480,29 @@ xfs_bui_validate(
 	return xfs_verify_fsbext(mp, map->me_startblock, map->me_len);
 }
 
+static inline struct xfs_bmap_intent *
+xfs_bui_recover_work(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp,
+	struct xfs_map_extent		*map)
+{
+	struct xfs_bmap_intent		*bi;
+
+	bi = kmem_cache_zalloc(xfs_bmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	INIT_LIST_HEAD(&bi->bi_list);
+	bi->bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
+			XFS_ATTR_FORK : XFS_DATA_FORK;
+	bi->bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
+	bi->bi_bmap.br_startblock = map->me_startblock;
+	bi->bi_bmap.br_startoff = map->me_startoff;
+	bi->bi_bmap.br_blockcount = map->me_len;
+	bi->bi_bmap.br_state = (map->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ?
+			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
+
+	xfs_defer_recover_work_item(dfp, &bi->bi_list);
+	return bi;
+}
+
 /*
  * Process a bmap update intent item that was recovered from the log.
  * We need to update some inode's bmbt.
@@ -489,7 +512,6 @@ xfs_bui_item_recover(
 	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
-	struct xfs_bmap_intent		fake = { };
 	struct xfs_trans_res		resv;
 	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
@@ -498,6 +520,7 @@ xfs_bui_item_recover(
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_map_extent		*map;
 	struct xfs_bud_log_item		*budp;
+	struct xfs_bmap_intent		*fake;
 	int				iext_delta;
 	int				error = 0;
 
@@ -508,9 +531,7 @@ xfs_bui_item_recover(
 	}
 
 	map = &buip->bui_format.bui_extents[0];
-	fake.bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
-			XFS_ATTR_FORK : XFS_DATA_FORK;
-	fake.bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
+	fake = xfs_bui_recover_work(mp, dfp, map);
 
 	error = xlog_recover_iget(mp, map->me_owner, &ip);
 	if (error)
@@ -529,36 +550,31 @@ xfs_bui_item_recover(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	if (fake.bi_type == XFS_BMAP_MAP)
+	if (fake->bi_type == XFS_BMAP_MAP)
 		iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT;
 	else
 		iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
 
-	error = xfs_iext_count_may_overflow(ip, fake.bi_whichfork, iext_delta);
+	error = xfs_iext_count_may_overflow(ip, fake->bi_whichfork, iext_delta);
 	if (error == -EFBIG)
 		error = xfs_iext_count_upgrade(tp, ip, iext_delta);
 	if (error)
 		goto err_cancel;
 
-	fake.bi_owner = ip;
-	fake.bi_bmap.br_startblock = map->me_startblock;
-	fake.bi_bmap.br_startoff = map->me_startoff;
-	fake.bi_bmap.br_blockcount = map->me_len;
-	fake.bi_bmap.br_state = (map->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ?
-			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
+	fake->bi_owner = ip;
 
-	xfs_bmap_update_get_group(mp, &fake);
-	error = xfs_trans_log_finish_bmap_update(tp, budp, &fake);
+	xfs_bmap_update_get_group(mp, fake);
+	error = xfs_trans_log_finish_bmap_update(tp, budp, fake);
 	if (error == -EFSCORRUPTED)
-		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, map,
-				sizeof(*map));
-	xfs_bmap_update_put_group(&fake);
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				&buip->bui_format, sizeof(buip->bui_format));
+	xfs_bmap_update_put_group(fake);
 	if (error)
 		goto err_cancel;
 
-	if (fake.bi_bmap.br_blockcount > 0) {
-		ASSERT(fake.bi_type == XFS_BMAP_UNMAP);
-		xfs_bmap_unmap_extent(tp, ip, &fake.bi_bmap);
+	if (fake->bi_bmap.br_blockcount > 0) {
+		ASSERT(fake->bi_type == XFS_BMAP_UNMAP);
+		xfs_bmap_unmap_extent(tp, ip, &fake->bi_bmap);
 	}
 
 	/*
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index c9908fb33765..d76241c62e81 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -651,6 +651,25 @@ xfs_efi_validate_ext(
 	return xfs_verify_fsbext(mp, extp->ext_start, extp->ext_len);
 }
 
+static inline void
+xfs_efi_recover_work(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp,
+	struct xfs_extent		*extp)
+{
+	struct xfs_extent_free_item	*xefi;
+
+	xefi = kmem_cache_zalloc(xfs_extfree_item_cache,
+			       GFP_KERNEL | __GFP_NOFAIL);
+	INIT_LIST_HEAD(&xefi->xefi_list);
+	xefi->xefi_startblock = extp->ext_start;
+	xefi->xefi_blockcount = extp->ext_len;
+	xefi->xefi_agresv = XFS_AG_RESV_NONE;
+	xefi->xefi_owner = XFS_RMAP_OWN_UNKNOWN;
+
+	xfs_defer_recover_work_item(dfp, &xefi->xefi_list);
+}
+
 /*
  * Process an extent free intent item that was recovered from
  * the log.  We need to free the extents that it describes.
@@ -666,6 +685,7 @@ xfs_efi_item_recover(
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_efd_log_item		*efdp;
 	struct xfs_trans		*tp;
+	struct xfs_extent_free_item	*fake;
 	int				i;
 	int				error = 0;
 	bool				requeue_only = false;
@@ -683,6 +703,8 @@ xfs_efi_item_recover(
 					sizeof(efip->efi_format));
 			return -EFSCORRUPTED;
 		}
+
+		xfs_efi_recover_work(mp, dfp, &efip->efi_format.efi_extents[i]);
 	}
 
 	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
@@ -693,22 +715,11 @@ xfs_efi_item_recover(
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
 	xlog_recover_transfer_intent(tp, dfp);
 
-	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
-		struct xfs_extent_free_item	fake = {
-			.xefi_owner		= XFS_RMAP_OWN_UNKNOWN,
-			.xefi_agresv		= XFS_AG_RESV_NONE,
-		};
-		struct xfs_extent		*extp;
-
-		extp = &efip->efi_format.efi_extents[i];
-
-		fake.xefi_startblock = extp->ext_start;
-		fake.xefi_blockcount = extp->ext_len;
-
+	list_for_each_entry(fake, &dfp->dfp_work, xefi_list) {
 		if (!requeue_only) {
-			xfs_extent_free_get_group(mp, &fake);
-			error = xfs_trans_free_extent(tp, efdp, &fake);
-			xfs_extent_free_put_group(&fake);
+			xfs_extent_free_get_group(mp, fake);
+			error = xfs_trans_free_extent(tp, efdp, fake);
+			xfs_extent_free_put_group(fake);
 		}
 
 		/*
@@ -717,10 +728,10 @@ xfs_efi_item_recover(
 		 * run again later with a new transaction context.
 		 */
 		if (error == -EAGAIN || requeue_only) {
-			error = xfs_free_extent_later(tp, fake.xefi_startblock,
-					fake.xefi_blockcount,
+			error = xfs_free_extent_later(tp, fake->xefi_startblock,
+					fake->xefi_blockcount,
 					&XFS_RMAP_OINFO_ANY_OWNER,
-					fake.xefi_agresv);
+					fake->xefi_agresv);
 			if (!error) {
 				requeue_only = true;
 				continue;
@@ -729,7 +740,8 @@ xfs_efi_item_recover(
 
 		if (error == -EFSCORRUPTED)
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					extp, sizeof(*extp));
+					&efip->efi_format,
+					sizeof(efip->efi_format));
 		if (error)
 			goto abort_error;
 
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index f1b259223802..fbf1e7a0a784 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -468,6 +468,24 @@ xfs_cui_validate_phys(
 	return xfs_verify_fsbext(mp, pmap->pe_startblock, pmap->pe_len);
 }
 
+static inline void
+xfs_cui_recover_work(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp,
+	struct xfs_phys_extent		*pmap)
+{
+	struct xfs_refcount_intent	*ri;
+
+	ri = kmem_cache_alloc(xfs_refcount_intent_cache,
+			GFP_NOFS | __GFP_NOFAIL);
+	INIT_LIST_HEAD(&ri->ri_list);
+	ri->ri_type = pmap->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
+	ri->ri_startblock = pmap->pe_startblock;
+	ri->ri_blockcount = pmap->pe_len;
+
+	xfs_defer_recover_work_item(dfp, &ri->ri_list);
+}
+
 /*
  * Process a refcount update intent item that was recovered from the log.
  * We need to update the refcountbt.
@@ -484,7 +502,7 @@ xfs_cui_item_recover(
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
-	unsigned int			refc_type;
+	struct xfs_refcount_intent	*fake;
 	bool				requeue_only = false;
 	int				i;
 	int				error = 0;
@@ -502,6 +520,8 @@ xfs_cui_item_recover(
 					sizeof(cuip->cui_format));
 			return -EFSCORRUPTED;
 		}
+
+		xfs_cui_recover_work(mp, dfp, &cuip->cui_format.cui_extents[i]);
 	}
 
 	/*
@@ -525,35 +545,12 @@ xfs_cui_item_recover(
 	cudp = xfs_trans_get_cud(tp, cuip);
 	xlog_recover_transfer_intent(tp, dfp);
 
-	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
-		struct xfs_refcount_intent	fake = { };
-		struct xfs_phys_extent		*pmap;
-
-		pmap = &cuip->cui_format.cui_extents[i];
-		refc_type = pmap->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
-		switch (refc_type) {
-		case XFS_REFCOUNT_INCREASE:
-		case XFS_REFCOUNT_DECREASE:
-		case XFS_REFCOUNT_ALLOC_COW:
-		case XFS_REFCOUNT_FREE_COW:
-			fake.ri_type = refc_type;
-			break;
-		default:
-			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					&cuip->cui_format,
-					sizeof(cuip->cui_format));
-			error = -EFSCORRUPTED;
-			goto abort_error;
-		}
-
-		fake.ri_startblock = pmap->pe_startblock;
-		fake.ri_blockcount = pmap->pe_len;
-
+	list_for_each_entry(fake, &dfp->dfp_work, ri_list) {
 		if (!requeue_only) {
-			xfs_refcount_update_get_group(mp, &fake);
+			xfs_refcount_update_get_group(mp, fake);
 			error = xfs_trans_log_finish_refcount_update(tp, cudp,
-					&fake, &rcur);
-			xfs_refcount_update_put_group(&fake);
+					fake, &rcur);
+			xfs_refcount_update_put_group(fake);
 		}
 		if (error == -EFSCORRUPTED)
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
@@ -563,13 +560,13 @@ xfs_cui_item_recover(
 			goto abort_error;
 
 		/* Requeue what we didn't finish. */
-		if (fake.ri_blockcount > 0) {
+		if (fake->ri_blockcount > 0) {
 			struct xfs_bmbt_irec	irec = {
-				.br_startblock	= fake.ri_startblock,
-				.br_blockcount	= fake.ri_blockcount,
+				.br_startblock	= fake->ri_startblock,
+				.br_blockcount	= fake->ri_blockcount,
 			};
 
-			switch (fake.ri_type) {
+			switch (fake->ri_type) {
 			case XFS_REFCOUNT_INCREASE:
 				xfs_refcount_increase_extent(tp, &irec);
 				break;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 5e8a02d2b045..62400c83120d 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -498,6 +498,59 @@ xfs_rui_validate_map(
 	return xfs_verify_fsbext(mp, map->me_startblock, map->me_len);
 }
 
+static inline void
+xfs_rui_recover_work(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp,
+	const struct xfs_map_extent	*map)
+{
+	struct xfs_rmap_intent		*ri;
+
+	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	INIT_LIST_HEAD(&ri->ri_list);
+
+	switch (map->me_flags & XFS_RMAP_EXTENT_TYPE_MASK) {
+	case XFS_RMAP_EXTENT_MAP:
+		ri->ri_type = XFS_RMAP_MAP;
+		break;
+	case XFS_RMAP_EXTENT_MAP_SHARED:
+		ri->ri_type = XFS_RMAP_MAP_SHARED;
+		break;
+	case XFS_RMAP_EXTENT_UNMAP:
+		ri->ri_type = XFS_RMAP_UNMAP;
+		break;
+	case XFS_RMAP_EXTENT_UNMAP_SHARED:
+		ri->ri_type = XFS_RMAP_UNMAP_SHARED;
+		break;
+	case XFS_RMAP_EXTENT_CONVERT:
+		ri->ri_type = XFS_RMAP_CONVERT;
+		break;
+	case XFS_RMAP_EXTENT_CONVERT_SHARED:
+		ri->ri_type = XFS_RMAP_CONVERT_SHARED;
+		break;
+	case XFS_RMAP_EXTENT_ALLOC:
+		ri->ri_type = XFS_RMAP_ALLOC;
+		break;
+	case XFS_RMAP_EXTENT_FREE:
+		ri->ri_type = XFS_RMAP_FREE;
+		break;
+	default:
+		ASSERT(0);
+		return;
+	}
+
+	ri->ri_owner = map->me_owner;
+	ri->ri_whichfork = (map->me_flags & XFS_RMAP_EXTENT_ATTR_FORK) ?
+			XFS_ATTR_FORK : XFS_DATA_FORK;
+	ri->ri_bmap.br_startblock = map->me_startblock;
+	ri->ri_bmap.br_startoff = map->me_startoff;
+	ri->ri_bmap.br_blockcount = map->me_len;
+	ri->ri_bmap.br_state = (map->me_flags & XFS_RMAP_EXTENT_UNWRITTEN) ?
+			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
+
+	xfs_defer_recover_work_item(dfp, &ri->ri_list);
+}
+
 /*
  * Process an rmap update intent item that was recovered from the log.
  * We need to update the rmapbt.
@@ -514,6 +567,7 @@ xfs_rui_item_recover(
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
+	struct xfs_rmap_intent		*fake;
 	int				i;
 	int				error = 0;
 
@@ -530,6 +584,8 @@ xfs_rui_item_recover(
 					sizeof(ruip->rui_format));
 			return -EFSCORRUPTED;
 		}
+
+		xfs_rui_recover_work(mp, dfp, &ruip->rui_format.rui_extents[i]);
 	}
 
 	resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate);
@@ -541,60 +597,15 @@ xfs_rui_item_recover(
 	rudp = xfs_trans_get_rud(tp, ruip);
 	xlog_recover_transfer_intent(tp, dfp);
 
-	for (i = 0; i < ruip->rui_format.rui_nextents; i++) {
-		struct xfs_rmap_intent	fake = { };
-		struct xfs_map_extent	*map;
-
-		map = &ruip->rui_format.rui_extents[i];
-		switch (map->me_flags & XFS_RMAP_EXTENT_TYPE_MASK) {
-		case XFS_RMAP_EXTENT_MAP:
-			fake.ri_type = XFS_RMAP_MAP;
-			break;
-		case XFS_RMAP_EXTENT_MAP_SHARED:
-			fake.ri_type = XFS_RMAP_MAP_SHARED;
-			break;
-		case XFS_RMAP_EXTENT_UNMAP:
-			fake.ri_type = XFS_RMAP_UNMAP;
-			break;
-		case XFS_RMAP_EXTENT_UNMAP_SHARED:
-			fake.ri_type = XFS_RMAP_UNMAP_SHARED;
-			break;
-		case XFS_RMAP_EXTENT_CONVERT:
-			fake.ri_type = XFS_RMAP_CONVERT;
-			break;
-		case XFS_RMAP_EXTENT_CONVERT_SHARED:
-			fake.ri_type = XFS_RMAP_CONVERT_SHARED;
-			break;
-		case XFS_RMAP_EXTENT_ALLOC:
-			fake.ri_type = XFS_RMAP_ALLOC;
-			break;
-		case XFS_RMAP_EXTENT_FREE:
-			fake.ri_type = XFS_RMAP_FREE;
-			break;
-		default:
-			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					&ruip->rui_format,
-					sizeof(ruip->rui_format));
-			error = -EFSCORRUPTED;
-			goto abort_error;
-		}
-
-		fake.ri_owner = map->me_owner;
-		fake.ri_whichfork = (map->me_flags & XFS_RMAP_EXTENT_ATTR_FORK) ?
-				XFS_ATTR_FORK : XFS_DATA_FORK;
-		fake.ri_bmap.br_startblock = map->me_startblock;
-		fake.ri_bmap.br_startoff = map->me_startoff;
-		fake.ri_bmap.br_blockcount = map->me_len;
-		fake.ri_bmap.br_state = (map->me_flags & XFS_RMAP_EXTENT_UNWRITTEN) ?
-				XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
-
-		xfs_rmap_update_get_group(mp, &fake);
-		error = xfs_trans_log_finish_rmap_update(tp, rudp, &fake,
+	list_for_each_entry(fake, &dfp->dfp_work, ri_list) {
+		xfs_rmap_update_get_group(mp, fake);
+		error = xfs_trans_log_finish_rmap_update(tp, rudp, fake,
 				&rcur);
 		if (error == -EFSCORRUPTED)
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					map, sizeof(*map));
-		xfs_rmap_update_put_group(&fake);
+					&ruip->rui_format,
+					sizeof(ruip->rui_format));
+		xfs_rmap_update_put_group(fake);
 		if (error)
 			goto abort_error;
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/7] xfs: use xfs_defer_finish_one to finish recovered work items
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
                   ` (4 preceding siblings ...)
  2023-11-28 20:26 ` [PATCH 5/7] xfs: recreate work items when recovering intent items Darrick J. Wong
@ 2023-11-28 20:27 ` Darrick J. Wong
  2023-11-30  8:10   ` Christoph Hellwig
  2023-11-28 20:27 ` [PATCH 7/7] xfs: move ->iop_recover to xfs_defer_op_type Darrick J. Wong
  2023-11-30 12:06 ` [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Long Li
  7 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:27 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Get rid of the open-coded calls to xfs_defer_finish_one.  This also
means that the recovery transaction takes care of cleaning up the dfp,
and we have solved (I hope) all the ownership issues in recovery.

While we're at it, dump xattr log items if the recovery fails with
corruption errors like we do for the other intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_defer.c       |    2 +
 fs/xfs/libxfs/xfs_defer.h       |    1 +
 fs/xfs/libxfs/xfs_log_recover.h |    2 +
 fs/xfs/xfs_attr_item.c          |   24 +++------------
 fs/xfs/xfs_bmap_item.c          |   24 ++++-----------
 fs/xfs/xfs_extfree_item.c       |   45 +++++------------------------
 fs/xfs/xfs_log_recover.c        |   15 +++++++---
 fs/xfs/xfs_refcount_item.c      |   61 +++++----------------------------------
 fs/xfs/xfs_rmap_item.c          |   29 +++++--------------
 9 files changed, 50 insertions(+), 153 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 0fc714474f87..7ad554dda0aa 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -484,7 +484,7 @@ xfs_defer_relog(
  * Log an intent-done item for the first pending intent, and finish the work
  * items.
  */
-static int
+int
 xfs_defer_finish_one(
 	struct xfs_trans		*tp,
 	struct xfs_defer_pending	*dfp)
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index ee1e76d3f7e8..49aaba364abc 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -41,6 +41,7 @@ void xfs_defer_add(struct xfs_trans *tp, enum xfs_defer_ops_type type,
 		struct list_head *h);
 int xfs_defer_finish_noroll(struct xfs_trans **tp);
 int xfs_defer_finish(struct xfs_trans **tp);
+int xfs_defer_finish_one(struct xfs_trans *tp, struct xfs_defer_pending *dfp);
 void xfs_defer_cancel(struct xfs_trans *);
 void xfs_defer_move(struct xfs_trans *dtp, struct xfs_trans *stp);
 
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 13583df9f239..52162a17fc5e 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -155,7 +155,7 @@ xlog_recover_resv(const struct xfs_trans_res *r)
 
 void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
 		xfs_lsn_t lsn, unsigned int dfp_type);
-void xlog_recover_transfer_intent(struct xfs_trans *tp,
+int xlog_recover_finish_intent(struct xfs_trans *tp,
 		struct xfs_defer_pending *dfp);
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index fbc88325848a..20a14ac13a51 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -622,7 +622,6 @@ xfs_attri_item_recover(
 	struct xfs_attri_log_nameval	*nv = attrip->attri_nameval;
 	int				error;
 	int				total;
-	struct xfs_attrd_log_item	*done_item = NULL;
 
 	/*
 	 * First check the validity of the attr described by the ATTRI.  If any
@@ -647,27 +646,14 @@ xfs_attri_item_recover(
 		return error;
 	args->trans = tp;
 
-	done_item = xfs_trans_get_attrd(tp, attrip);
-	xlog_recover_transfer_intent(tp, dfp);
-
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_xattri_finish_update(attr, done_item);
-	if (error == -EAGAIN) {
-		/*
-		 * There's more work to do, so add the intent item to this
-		 * transaction so that we can continue it later.
-		 */
-		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
-		error = xfs_defer_ops_capture_and_commit(tp, capture_list);
-		if (error)
-			goto out_unlock;
-
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		xfs_irele(ip);
-		return 0;
-	}
+	error = xlog_recover_finish_intent(tp, dfp);
+	if (error == -EFSCORRUPTED)
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				&attrip->attri_format,
+				sizeof(attrip->attri_format));
 	if (error) {
 		xfs_trans_cancel(tp);
 		goto out_unlock;
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 2046547299d6..84381611be73 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -498,6 +498,7 @@ xfs_bui_recover_work(
 	bi->bi_bmap.br_blockcount = map->me_len;
 	bi->bi_bmap.br_state = (map->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ?
 			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
+	xfs_bmap_update_get_group(mp, bi);
 
 	xfs_defer_recover_work_item(dfp, &bi->bi_list);
 	return bi;
@@ -519,8 +520,7 @@ xfs_bui_item_recover(
 	struct xfs_inode		*ip = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
 	struct xfs_map_extent		*map;
-	struct xfs_bud_log_item		*budp;
-	struct xfs_bmap_intent		*fake;
+	struct xfs_bmap_intent		*work;
 	int				iext_delta;
 	int				error = 0;
 
@@ -531,7 +531,7 @@ xfs_bui_item_recover(
 	}
 
 	map = &buip->bui_format.bui_extents[0];
-	fake = xfs_bui_recover_work(mp, dfp, map);
+	work = xfs_bui_recover_work(mp, dfp, map);
 
 	error = xlog_recover_iget(mp, map->me_owner, &ip);
 	if (error)
@@ -544,39 +544,29 @@ xfs_bui_item_recover(
 	if (error)
 		goto err_rele;
 
-	budp = xfs_trans_get_bud(tp, buip);
-	xlog_recover_transfer_intent(tp, dfp);
-
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	if (fake->bi_type == XFS_BMAP_MAP)
+	if (work->bi_type == XFS_BMAP_MAP)
 		iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT;
 	else
 		iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
 
-	error = xfs_iext_count_may_overflow(ip, fake->bi_whichfork, iext_delta);
+	error = xfs_iext_count_may_overflow(ip, work->bi_whichfork, iext_delta);
 	if (error == -EFBIG)
 		error = xfs_iext_count_upgrade(tp, ip, iext_delta);
 	if (error)
 		goto err_cancel;
 
-	fake->bi_owner = ip;
+	work->bi_owner = ip;
 
-	xfs_bmap_update_get_group(mp, fake);
-	error = xfs_trans_log_finish_bmap_update(tp, budp, fake);
+	error = xlog_recover_finish_intent(tp, dfp);
 	if (error == -EFSCORRUPTED)
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				&buip->bui_format, sizeof(buip->bui_format));
-	xfs_bmap_update_put_group(fake);
 	if (error)
 		goto err_cancel;
 
-	if (fake->bi_bmap.br_blockcount > 0) {
-		ASSERT(fake->bi_type == XFS_BMAP_UNMAP);
-		xfs_bmap_unmap_extent(tp, ip, &fake->bi_bmap);
-	}
-
 	/*
 	 * Commit transaction, which frees the transaction and saves the inode
 	 * for later replay activities.
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index d76241c62e81..1ee24506415e 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -666,6 +666,7 @@ xfs_efi_recover_work(
 	xefi->xefi_blockcount = extp->ext_len;
 	xefi->xefi_agresv = XFS_AG_RESV_NONE;
 	xefi->xefi_owner = XFS_RMAP_OWN_UNKNOWN;
+	xfs_extent_free_get_group(mp, xefi);
 
 	xfs_defer_recover_work_item(dfp, &xefi->xefi_list);
 }
@@ -683,12 +684,9 @@ xfs_efi_item_recover(
 	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
 	struct xfs_mount		*mp = lip->li_log->l_mp;
-	struct xfs_efd_log_item		*efdp;
 	struct xfs_trans		*tp;
-	struct xfs_extent_free_item	*fake;
 	int				i;
 	int				error = 0;
-	bool				requeue_only = false;
 
 	/*
 	 * First check the validity of the extents described by the
@@ -712,40 +710,13 @@ xfs_efi_item_recover(
 	if (error)
 		return error;
 
-	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
-	xlog_recover_transfer_intent(tp, dfp);
-
-	list_for_each_entry(fake, &dfp->dfp_work, xefi_list) {
-		if (!requeue_only) {
-			xfs_extent_free_get_group(mp, fake);
-			error = xfs_trans_free_extent(tp, efdp, fake);
-			xfs_extent_free_put_group(fake);
-		}
-
-		/*
-		 * If we can't free the extent without potentially deadlocking,
-		 * requeue the rest of the extents to a new so that they get
-		 * run again later with a new transaction context.
-		 */
-		if (error == -EAGAIN || requeue_only) {
-			error = xfs_free_extent_later(tp, fake->xefi_startblock,
-					fake->xefi_blockcount,
-					&XFS_RMAP_OINFO_ANY_OWNER,
-					fake->xefi_agresv);
-			if (!error) {
-				requeue_only = true;
-				continue;
-			}
-		}
-
-		if (error == -EFSCORRUPTED)
-			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					&efip->efi_format,
-					sizeof(efip->efi_format));
-		if (error)
-			goto abort_error;
-
-	}
+	error = xlog_recover_finish_intent(tp, dfp);
+	if (error == -EFSCORRUPTED)
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				&efip->efi_format,
+				sizeof(efip->efi_format));
+	if (error)
+		goto abort_error;
 
 	return xfs_defer_ops_capture_and_commit(tp, capture_list);
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 038ef6858298..9bd2bdf42a84 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2627,14 +2627,21 @@ xlog_recover_cancel_intents(
 
 /*
  * Transfer ownership of the recovered log intent item to the recovery
- * transaction.
+ * transaction and try to finish the work.  If there is more work to be done,
+ * the dfp will remain attached to the transaction.
  */
-void
-xlog_recover_transfer_intent(
+int
+xlog_recover_finish_intent(
 	struct xfs_trans		*tp,
 	struct xfs_defer_pending	*dfp)
 {
-	dfp->dfp_intent = NULL;
+	int				error;
+
+	list_move(&dfp->dfp_list, &tp->t_dfops);
+	error = xfs_defer_finish_one(tp, dfp);
+	if (error == -EAGAIN)
+		return 0;
+	return error;
 }
 
 /*
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index fbf1e7a0a784..a0e3a42555ec 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -482,6 +482,7 @@ xfs_cui_recover_work(
 	ri->ri_type = pmap->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
 	ri->ri_startblock = pmap->pe_startblock;
 	ri->ri_blockcount = pmap->pe_len;
+	xfs_refcount_update_get_group(mp, ri);
 
 	xfs_defer_recover_work_item(dfp, &ri->ri_list);
 }
@@ -498,12 +499,8 @@ xfs_cui_item_recover(
 	struct xfs_trans_res		resv;
 	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
-	struct xfs_cud_log_item		*cudp;
 	struct xfs_trans		*tp;
-	struct xfs_btree_cur		*rcur = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
-	struct xfs_refcount_intent	*fake;
-	bool				requeue_only = false;
 	int				i;
 	int				error = 0;
 
@@ -542,59 +539,17 @@ xfs_cui_item_recover(
 	if (error)
 		return error;
 
-	cudp = xfs_trans_get_cud(tp, cuip);
-	xlog_recover_transfer_intent(tp, dfp);
+	error = xlog_recover_finish_intent(tp, dfp);
+	if (error == -EFSCORRUPTED)
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				&cuip->cui_format,
+				sizeof(cuip->cui_format));
+	if (error)
+		goto abort_error;
 
-	list_for_each_entry(fake, &dfp->dfp_work, ri_list) {
-		if (!requeue_only) {
-			xfs_refcount_update_get_group(mp, fake);
-			error = xfs_trans_log_finish_refcount_update(tp, cudp,
-					fake, &rcur);
-			xfs_refcount_update_put_group(fake);
-		}
-		if (error == -EFSCORRUPTED)
-			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					&cuip->cui_format,
-					sizeof(cuip->cui_format));
-		if (error)
-			goto abort_error;
-
-		/* Requeue what we didn't finish. */
-		if (fake->ri_blockcount > 0) {
-			struct xfs_bmbt_irec	irec = {
-				.br_startblock	= fake->ri_startblock,
-				.br_blockcount	= fake->ri_blockcount,
-			};
-
-			switch (fake->ri_type) {
-			case XFS_REFCOUNT_INCREASE:
-				xfs_refcount_increase_extent(tp, &irec);
-				break;
-			case XFS_REFCOUNT_DECREASE:
-				xfs_refcount_decrease_extent(tp, &irec);
-				break;
-			case XFS_REFCOUNT_ALLOC_COW:
-				xfs_refcount_alloc_cow_extent(tp,
-						irec.br_startblock,
-						irec.br_blockcount);
-				break;
-			case XFS_REFCOUNT_FREE_COW:
-				xfs_refcount_free_cow_extent(tp,
-						irec.br_startblock,
-						irec.br_blockcount);
-				break;
-			default:
-				ASSERT(0);
-			}
-			requeue_only = true;
-		}
-	}
-
-	xfs_refcount_finish_one_cleanup(tp, rcur, error);
 	return xfs_defer_ops_capture_and_commit(tp, capture_list);
 
 abort_error:
-	xfs_refcount_finish_one_cleanup(tp, rcur, error);
 	xfs_trans_cancel(tp);
 	return error;
 }
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 62400c83120d..2c883ca8a545 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -547,6 +547,7 @@ xfs_rui_recover_work(
 	ri->ri_bmap.br_blockcount = map->me_len;
 	ri->ri_bmap.br_state = (map->me_flags & XFS_RMAP_EXTENT_UNWRITTEN) ?
 			XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
+	xfs_rmap_update_get_group(mp, ri);
 
 	xfs_defer_recover_work_item(dfp, &ri->ri_list);
 }
@@ -563,11 +564,8 @@ xfs_rui_item_recover(
 	struct xfs_trans_res		resv;
 	struct xfs_log_item		*lip = dfp->dfp_intent;
 	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
-	struct xfs_rud_log_item		*rudp;
 	struct xfs_trans		*tp;
-	struct xfs_btree_cur		*rcur = NULL;
 	struct xfs_mount		*mp = lip->li_log->l_mp;
-	struct xfs_rmap_intent		*fake;
 	int				i;
 	int				error = 0;
 
@@ -594,28 +592,17 @@ xfs_rui_item_recover(
 	if (error)
 		return error;
 
-	rudp = xfs_trans_get_rud(tp, ruip);
-	xlog_recover_transfer_intent(tp, dfp);
+	error = xlog_recover_finish_intent(tp, dfp);
+	if (error == -EFSCORRUPTED)
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				&ruip->rui_format,
+				sizeof(ruip->rui_format));
+	if (error)
+		goto abort_error;
 
-	list_for_each_entry(fake, &dfp->dfp_work, ri_list) {
-		xfs_rmap_update_get_group(mp, fake);
-		error = xfs_trans_log_finish_rmap_update(tp, rudp, fake,
-				&rcur);
-		if (error == -EFSCORRUPTED)
-			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					&ruip->rui_format,
-					sizeof(ruip->rui_format));
-		xfs_rmap_update_put_group(fake);
-		if (error)
-			goto abort_error;
-
-	}
-
-	xfs_rmap_finish_one_cleanup(tp, rcur, error);
 	return xfs_defer_ops_capture_and_commit(tp, capture_list);
 
 abort_error:
-	xfs_rmap_finish_one_cleanup(tp, rcur, error);
 	xfs_trans_cancel(tp);
 	return error;
 }


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/7] xfs: move ->iop_recover to xfs_defer_op_type
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
                   ` (5 preceding siblings ...)
  2023-11-28 20:27 ` [PATCH 6/7] xfs: use xfs_defer_finish_one to finish recovered work items Darrick J. Wong
@ 2023-11-28 20:27 ` Darrick J. Wong
  2023-11-30  8:11   ` Christoph Hellwig
  2023-11-30 12:06 ` [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Long Li
  7 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-28 20:27 UTC (permalink / raw)
  To: djwong, leo.lilong, chandanbabu, hch; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Finish off the series by moving the intent item recovery function
pointer to the xfs_defer_op_type struct, since this is really a deferred
work function now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_defer.c       |   17 +++++++++++++++
 fs/xfs/libxfs/xfs_defer.h       |    4 ++++
 fs/xfs/libxfs/xfs_log_recover.h |    2 ++
 fs/xfs/xfs_attr_item.c          |    4 ++--
 fs/xfs/xfs_bmap_item.c          |   22 ++++++++++----------
 fs/xfs/xfs_extfree_item.c       |   43 ++++++++++++++++++++-------------------
 fs/xfs/xfs_log_recover.c        |    9 +++-----
 fs/xfs/xfs_refcount_item.c      |   24 +++++++++++-----------
 fs/xfs/xfs_rmap_item.c          |   24 +++++++++++-----------
 fs/xfs/xfs_trans.h              |    4 ----
 10 files changed, 85 insertions(+), 68 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 7ad554dda0aa..2b805e6fd320 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -711,6 +711,23 @@ xfs_defer_cancel_recovery(
 	xfs_defer_pending_cancel_work(mp, dfp);
 }
 
+/* Replay the deferred work item created from a recovered log intent item. */
+int
+xfs_defer_finish_recovery(
+	struct xfs_mount		*mp,
+	struct xfs_defer_pending	*dfp,
+	struct list_head		*capture_list)
+{
+	const struct xfs_defer_op_type	*ops = defer_op_types[dfp->dfp_type];
+	int				error;
+
+	error = ops->recover_work(dfp, capture_list);
+	if (error)
+		trace_xlog_intent_recovery_failed(mp, error,
+				ops->recover_work);
+	return error;
+}
+
 /*
  * Move deferred ops from one transaction to another and reset the source to
  * initial state. This is primarily used to carry state forward across
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 49aaba364abc..d4800491f749 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -57,6 +57,8 @@ struct xfs_defer_op_type {
 	void (*finish_cleanup)(struct xfs_trans *tp,
 			struct xfs_btree_cur *state, int error);
 	void (*cancel_item)(struct list_head *item);
+	int (*recover_work)(struct xfs_defer_pending *dfp,
+			    struct list_head *capture_list);
 	unsigned int		max_items;
 };
 
@@ -130,6 +132,8 @@ struct xfs_defer_pending *xfs_defer_start_recovery(struct xfs_log_item *lip,
 		enum xfs_defer_ops_type dfp_type);
 void xfs_defer_cancel_recovery(struct xfs_mount *mp,
 		struct xfs_defer_pending *dfp);
+int xfs_defer_finish_recovery(struct xfs_mount *mp,
+		struct xfs_defer_pending *dfp, struct list_head *capture_list);
 
 static inline void
 xfs_defer_recover_work_item(
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 52162a17fc5e..c8e5d912895b 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -153,6 +153,8 @@ xlog_recover_resv(const struct xfs_trans_res *r)
 	return ret;
 }
 
+struct xfs_defer_pending;
+
 void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
 		xfs_lsn_t lsn, unsigned int dfp_type);
 int xlog_recover_finish_intent(struct xfs_trans *tp,
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 20a14ac13a51..b4b45d7333e3 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -606,7 +606,7 @@ xfs_attri_recover_work(
  * delete the attr that it describes.
  */
 STATIC int
-xfs_attri_item_recover(
+xfs_attr_recover_work(
 	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
@@ -822,6 +822,7 @@ const struct xfs_defer_op_type xfs_attr_defer_type = {
 	.create_done	= xfs_attr_create_done,
 	.finish_item	= xfs_attr_finish_item,
 	.cancel_item	= xfs_attr_cancel_item,
+	.recover_work	= xfs_attr_recover_work,
 };
 
 /*
@@ -858,7 +859,6 @@ static const struct xfs_item_ops xfs_attri_item_ops = {
 	.iop_format	= xfs_attri_item_format,
 	.iop_unpin	= xfs_attri_item_unpin,
 	.iop_release    = xfs_attri_item_release,
-	.iop_recover	= xfs_attri_item_recover,
 	.iop_match	= xfs_attri_item_match,
 	.iop_relog	= xfs_attri_item_relog,
 };
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 84381611be73..193166a81aa7 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -437,15 +437,6 @@ xfs_bmap_update_cancel_item(
 	kmem_cache_free(xfs_bmap_intent_cache, bi);
 }
 
-const struct xfs_defer_op_type xfs_bmap_update_defer_type = {
-	.max_items	= XFS_BUI_MAX_FAST_EXTENTS,
-	.create_intent	= xfs_bmap_update_create_intent,
-	.abort_intent	= xfs_bmap_update_abort_intent,
-	.create_done	= xfs_bmap_update_create_done,
-	.finish_item	= xfs_bmap_update_finish_item,
-	.cancel_item	= xfs_bmap_update_cancel_item,
-};
-
 /* Is this recovered BUI ok? */
 static inline bool
 xfs_bui_validate(
@@ -509,7 +500,7 @@ xfs_bui_recover_work(
  * We need to update some inode's bmbt.
  */
 STATIC int
-xfs_bui_item_recover(
+xfs_bmap_recover_work(
 	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
@@ -588,6 +579,16 @@ xfs_bui_item_recover(
 	return error;
 }
 
+const struct xfs_defer_op_type xfs_bmap_update_defer_type = {
+	.max_items	= XFS_BUI_MAX_FAST_EXTENTS,
+	.create_intent	= xfs_bmap_update_create_intent,
+	.abort_intent	= xfs_bmap_update_abort_intent,
+	.create_done	= xfs_bmap_update_create_done,
+	.finish_item	= xfs_bmap_update_finish_item,
+	.cancel_item	= xfs_bmap_update_cancel_item,
+	.recover_work	= xfs_bmap_recover_work,
+};
+
 STATIC bool
 xfs_bui_item_match(
 	struct xfs_log_item	*lip,
@@ -628,7 +629,6 @@ static const struct xfs_item_ops xfs_bui_item_ops = {
 	.iop_format	= xfs_bui_item_format,
 	.iop_unpin	= xfs_bui_item_unpin,
 	.iop_release	= xfs_bui_item_release,
-	.iop_recover	= xfs_bui_item_recover,
 	.iop_match	= xfs_bui_item_match,
 	.iop_relog	= xfs_bui_item_relog,
 };
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 1ee24506415e..94eb2f726829 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -567,15 +567,6 @@ xfs_extent_free_cancel_item(
 	kmem_cache_free(xfs_extfree_item_cache, xefi);
 }
 
-const struct xfs_defer_op_type xfs_extent_free_defer_type = {
-	.max_items	= XFS_EFI_MAX_FAST_EXTENTS,
-	.create_intent	= xfs_extent_free_create_intent,
-	.abort_intent	= xfs_extent_free_abort_intent,
-	.create_done	= xfs_extent_free_create_done,
-	.finish_item	= xfs_extent_free_finish_item,
-	.cancel_item	= xfs_extent_free_cancel_item,
-};
-
 /*
  * AGFL blocks are accounted differently in the reserve pools and are not
  * inserted into the busy extent list.
@@ -632,16 +623,6 @@ xfs_agfl_free_finish_item(
 	return error;
 }
 
-/* sub-type with special handling for AGFL deferred frees */
-const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
-	.max_items	= XFS_EFI_MAX_FAST_EXTENTS,
-	.create_intent	= xfs_extent_free_create_intent,
-	.abort_intent	= xfs_extent_free_abort_intent,
-	.create_done	= xfs_extent_free_create_done,
-	.finish_item	= xfs_agfl_free_finish_item,
-	.cancel_item	= xfs_extent_free_cancel_item,
-};
-
 /* Is this recovered EFI ok? */
 static inline bool
 xfs_efi_validate_ext(
@@ -676,7 +657,7 @@ xfs_efi_recover_work(
  * the log.  We need to free the extents that it describes.
  */
 STATIC int
-xfs_efi_item_recover(
+xfs_extent_free_recover_work(
 	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
@@ -725,6 +706,27 @@ xfs_efi_item_recover(
 	return error;
 }
 
+const struct xfs_defer_op_type xfs_extent_free_defer_type = {
+	.max_items	= XFS_EFI_MAX_FAST_EXTENTS,
+	.create_intent	= xfs_extent_free_create_intent,
+	.abort_intent	= xfs_extent_free_abort_intent,
+	.create_done	= xfs_extent_free_create_done,
+	.finish_item	= xfs_extent_free_finish_item,
+	.cancel_item	= xfs_extent_free_cancel_item,
+	.recover_work	= xfs_extent_free_recover_work,
+};
+
+/* sub-type with special handling for AGFL deferred frees */
+const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
+	.max_items	= XFS_EFI_MAX_FAST_EXTENTS,
+	.create_intent	= xfs_extent_free_create_intent,
+	.abort_intent	= xfs_extent_free_abort_intent,
+	.create_done	= xfs_extent_free_create_done,
+	.finish_item	= xfs_agfl_free_finish_item,
+	.cancel_item	= xfs_extent_free_cancel_item,
+	.recover_work	= xfs_extent_free_recover_work,
+};
+
 STATIC bool
 xfs_efi_item_match(
 	struct xfs_log_item	*lip,
@@ -767,7 +769,6 @@ static const struct xfs_item_ops xfs_efi_item_ops = {
 	.iop_format	= xfs_efi_item_format,
 	.iop_unpin	= xfs_efi_item_unpin,
 	.iop_release	= xfs_efi_item_release,
-	.iop_recover	= xfs_efi_item_recover,
 	.iop_match	= xfs_efi_item_match,
 	.iop_relog	= xfs_efi_item_relog,
 };
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9bd2bdf42a84..7bd67e0400ee 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2568,7 +2568,6 @@ xlog_recover_process_intents(
 
 	list_for_each_entry_safe(dfp, n, &log->l_dfops, dfp_list) {
 		struct xfs_log_item	*lip = dfp->dfp_intent;
-		const struct xfs_item_ops *ops = lip->li_ops;
 
 		ASSERT(xlog_item_is_intent(lip));
 
@@ -2587,12 +2586,10 @@ xlog_recover_process_intents(
 		 *
 		 * The recovery function can free @dfp.
 		 */
-		error = ops->iop_recover(dfp, &capture_list);
-		if (error) {
-			trace_xlog_intent_recovery_failed(log->l_mp, error,
-					ops->iop_recover);
+		error = xfs_defer_finish_recovery(log->l_mp, dfp,
+				&capture_list);
+		if (error)
 			break;
-		}
 	}
 	if (error)
 		goto err;
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index a0e3a42555ec..494e2561420f 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -433,16 +433,6 @@ xfs_refcount_update_cancel_item(
 	kmem_cache_free(xfs_refcount_intent_cache, ri);
 }
 
-const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
-	.max_items	= XFS_CUI_MAX_FAST_EXTENTS,
-	.create_intent	= xfs_refcount_update_create_intent,
-	.abort_intent	= xfs_refcount_update_abort_intent,
-	.create_done	= xfs_refcount_update_create_done,
-	.finish_item	= xfs_refcount_update_finish_item,
-	.finish_cleanup = xfs_refcount_finish_one_cleanup,
-	.cancel_item	= xfs_refcount_update_cancel_item,
-};
-
 /* Is this recovered CUI ok? */
 static inline bool
 xfs_cui_validate_phys(
@@ -492,7 +482,7 @@ xfs_cui_recover_work(
  * We need to update the refcountbt.
  */
 STATIC int
-xfs_cui_item_recover(
+xfs_refcount_recover_work(
 	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
@@ -554,6 +544,17 @@ xfs_cui_item_recover(
 	return error;
 }
 
+const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
+	.max_items	= XFS_CUI_MAX_FAST_EXTENTS,
+	.create_intent	= xfs_refcount_update_create_intent,
+	.abort_intent	= xfs_refcount_update_abort_intent,
+	.create_done	= xfs_refcount_update_create_done,
+	.finish_item	= xfs_refcount_update_finish_item,
+	.finish_cleanup = xfs_refcount_finish_one_cleanup,
+	.cancel_item	= xfs_refcount_update_cancel_item,
+	.recover_work	= xfs_refcount_recover_work,
+};
+
 STATIC bool
 xfs_cui_item_match(
 	struct xfs_log_item	*lip,
@@ -594,7 +595,6 @@ static const struct xfs_item_ops xfs_cui_item_ops = {
 	.iop_format	= xfs_cui_item_format,
 	.iop_unpin	= xfs_cui_item_unpin,
 	.iop_release	= xfs_cui_item_release,
-	.iop_recover	= xfs_cui_item_recover,
 	.iop_match	= xfs_cui_item_match,
 	.iop_relog	= xfs_cui_item_relog,
 };
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 2c883ca8a545..75b71be845b7 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -452,16 +452,6 @@ xfs_rmap_update_cancel_item(
 	kmem_cache_free(xfs_rmap_intent_cache, ri);
 }
 
-const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
-	.max_items	= XFS_RUI_MAX_FAST_EXTENTS,
-	.create_intent	= xfs_rmap_update_create_intent,
-	.abort_intent	= xfs_rmap_update_abort_intent,
-	.create_done	= xfs_rmap_update_create_done,
-	.finish_item	= xfs_rmap_update_finish_item,
-	.finish_cleanup = xfs_rmap_finish_one_cleanup,
-	.cancel_item	= xfs_rmap_update_cancel_item,
-};
-
 /* Is this recovered RUI ok? */
 static inline bool
 xfs_rui_validate_map(
@@ -557,7 +547,7 @@ xfs_rui_recover_work(
  * We need to update the rmapbt.
  */
 STATIC int
-xfs_rui_item_recover(
+xfs_rmap_recover_work(
 	struct xfs_defer_pending	*dfp,
 	struct list_head		*capture_list)
 {
@@ -607,6 +597,17 @@ xfs_rui_item_recover(
 	return error;
 }
 
+const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
+	.max_items	= XFS_RUI_MAX_FAST_EXTENTS,
+	.create_intent	= xfs_rmap_update_create_intent,
+	.abort_intent	= xfs_rmap_update_abort_intent,
+	.create_done	= xfs_rmap_update_create_done,
+	.finish_item	= xfs_rmap_update_finish_item,
+	.finish_cleanup = xfs_rmap_finish_one_cleanup,
+	.cancel_item	= xfs_rmap_update_cancel_item,
+	.recover_work	= xfs_rmap_recover_work,
+};
+
 STATIC bool
 xfs_rui_item_match(
 	struct xfs_log_item	*lip,
@@ -647,7 +648,6 @@ static const struct xfs_item_ops xfs_rui_item_ops = {
 	.iop_format	= xfs_rui_item_format,
 	.iop_unpin	= xfs_rui_item_unpin,
 	.iop_release	= xfs_rui_item_release,
-	.iop_recover	= xfs_rui_item_recover,
 	.iop_match	= xfs_rui_item_match,
 	.iop_relog	= xfs_rui_item_relog,
 };
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 4e38357237c3..5fb018ad9fc0 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -66,8 +66,6 @@ struct xfs_log_item {
 	{ (1u << XFS_LI_DIRTY),		"DIRTY" }, \
 	{ (1u << XFS_LI_WHITEOUT),	"WHITEOUT" }
 
-struct xfs_defer_pending;
-
 struct xfs_item_ops {
 	unsigned flags;
 	void (*iop_size)(struct xfs_log_item *, int *, int *);
@@ -80,8 +78,6 @@ struct xfs_item_ops {
 	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
 	uint (*iop_push)(struct xfs_log_item *, struct list_head *);
 	void (*iop_release)(struct xfs_log_item *);
-	int (*iop_recover)(struct xfs_defer_pending *dfp,
-			   struct list_head *capture_list);
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 	struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent,
 			struct xfs_trans *tp);


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/7] xfs: don't leak recovered attri intent items
  2023-11-28 20:26 ` [PATCH 1/7] xfs: don't leak recovered attri intent items Darrick J. Wong
@ 2023-11-30  7:34   ` Christoph Hellwig
  2023-11-30 17:02     ` Darrick J. Wong
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-11-30  7:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

On Tue, Nov 28, 2023 at 12:26:34PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> If recovery finds an xattr log intent item calling for the removal of an
> attribute and the file doesn't even have an attr fork, we know that the
> removal is trivially complete.  However, we can't just exit the recovery
> function without doing something about the recovered log intent item --
> it's still on the AIL, and not logging an attrd item means it stays
> there forever.
> 
> This has likely not been seen in practice because few people use LARP
> and the runtime code won't log the attri for a no-attrfork removexattr
> operation.  But let's fix this anyway.
> 
> Also we shouldn't really be testing the attr fork presence until we've
> taken the ILOCK, though this doesn't matter much in recovery, which is
> single threaded.
> 
> Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework")

No useful comment here as the attr logging code is new to me, but what
is the LARP mode?  I see plenty of references to it in commit logs,
a small amount in the code mostly related to error injection, but it
would be really good to expand the acronym somehwere as I can't find
any explanation in the code or commit logs..

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] xfs: use xfs_defer_pending objects to recover intent items
  2023-11-28 20:26 ` [PATCH 2/7] xfs: use xfs_defer_pending objects to recover " Darrick J. Wong
@ 2023-11-30  7:53   ` Christoph Hellwig
  2023-11-30 17:14     ` Darrick J. Wong
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-11-30  7:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

> +/* Create a pending deferred work item for use with log recovery. */
> +struct xfs_defer_pending *
> +xfs_defer_start_recovery(
> +	struct xfs_log_item		*lip,
> +	enum xfs_defer_ops_type		dfp_type)
> +{
> +	struct xfs_defer_pending	*dfp;
> +
> +	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
> +			GFP_NOFS | __GFP_NOFAIL);
> +	dfp->dfp_type = dfp_type;
> +	dfp->dfp_intent = lip;
> +	INIT_LIST_HEAD(&dfp->dfp_work);
> +	INIT_LIST_HEAD(&dfp->dfp_list);
> +	return dfp;
> +}

Initializing dfp_list here is a bit pointless as the caller instantly
adds it to log->l_dfops.

> -#if defined(DEBUG) || defined(XFS_WARN)
> +	spin_lock(&log->l_ailp->ail_lock);
>  	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
> +	spin_unlock(&log->l_ailp->ail_lock);

ail_lock shouldn't be needed here.

Otherwise this looks reasonable to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/7] xfs: pass the xfs_defer_pending object to iop_recover
  2023-11-28 20:26 ` [PATCH 3/7] xfs: pass the xfs_defer_pending object to iop_recover Darrick J. Wong
@ 2023-11-30  7:53   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-11-30  7:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] xfs: transfer recovered intent item ownership in ->iop_recover
  2023-11-28 20:26 ` [PATCH 4/7] xfs: transfer recovered intent item ownership in ->iop_recover Darrick J. Wong
@ 2023-11-30  7:55   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-11-30  7:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] xfs: recreate work items when recovering intent items
  2023-11-28 20:26 ` [PATCH 5/7] xfs: recreate work items when recovering intent items Darrick J. Wong
@ 2023-11-30  8:06   ` Christoph Hellwig
  2023-11-30 17:36     ` Darrick J. Wong
  2023-11-30 13:13   ` Long Li
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-11-30  8:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

> +static inline void
> +xfs_defer_recover_work_item(
> +	struct xfs_defer_pending	*dfp,
> +	struct list_head		*work)
> +{
> +	list_add_tail(work, &dfp->dfp_work);
> +	dfp->dfp_count++;
> +}

The same (arguably trivial) logic is also duplicated in xfs_defer_add.
Maybe give the helper a more generic name and use it there as well?

> +	INIT_LIST_HEAD(&attr->xattri_list);

No need to initialize this (and all the other list_heads in the items)
as we're only adding them to the work list, but never actualy using them
as the head of a list or do list_empty checks on them.

> +	xfs_defer_recover_work_item(dfp, &bi->bi_list);

Does this commit on it's own actually work?  We add the items to
the work list, but I don't think we ever take it off without the next
patch?

> -	struct xfs_bmap_intent		fake = { };
>  	struct xfs_trans_res		resv;
>  	struct xfs_log_item		*lip = dfp->dfp_intent;
>  	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
> @@ -498,6 +520,7 @@ xfs_bui_item_recover(
>  	struct xfs_mount		*mp = lip->li_log->l_mp;
>  	struct xfs_map_extent		*map;
>  	struct xfs_bud_log_item		*budp;
> +	struct xfs_bmap_intent		*fake;

So this patch moves the intent structures off the stack, and the next
one then renames it to work.  Maybe do the renames here so that we don't
have to touch every single use of them twice?

Otherwise this looks nice, and I really like the better structure of
the recovery code that this is leading towards.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] xfs: use xfs_defer_finish_one to finish recovered work items
  2023-11-28 20:27 ` [PATCH 6/7] xfs: use xfs_defer_finish_one to finish recovered work items Darrick J. Wong
@ 2023-11-30  8:10   ` Christoph Hellwig
  2023-11-30 17:59     ` Darrick J. Wong
  0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-11-30  8:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

On Tue, Nov 28, 2023 at 12:27:03PM -0800, Darrick J. Wong wrote:
> While we're at it, dump xattr log items if the recovery fails with
> corruption errors like we do for the other intent items.

Normally I'd expect this as a split out prep patch.

> -	error = xfs_xattri_finish_update(attr, done_item);

As a follow on cleanup, xfs_xattri_finish_update could now be folded
into xfs_attr_finish_item, making the logic a littl simpler to follow.
Same for the other items.

Otherwise this looks good to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/7] xfs: move ->iop_recover to xfs_defer_op_type
  2023-11-28 20:27 ` [PATCH 7/7] xfs: move ->iop_recover to xfs_defer_op_type Darrick J. Wong
@ 2023-11-30  8:11   ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-11-30  8:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state
  2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
                   ` (6 preceding siblings ...)
  2023-11-28 20:27 ` [PATCH 7/7] xfs: move ->iop_recover to xfs_defer_op_type Darrick J. Wong
@ 2023-11-30 12:06 ` Long Li
  7 siblings, 0 replies; 24+ messages in thread
From: Long Li @ 2023-11-30 12:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, chandanbabu, hch

On Tue, Nov 28, 2023 at 12:26:28PM -0800, Darrick J. Wong wrote:
> Hi all,
> 
> Long Li reported a KASAN report from a UAF when intent recovery fails:
> 
>  ==================================================================
>  BUG: KASAN: slab-use-after-free in xfs_cui_release+0xb7/0xc0
>  Read of size 4 at addr ffff888012575e60 by task kworker/u8:3/103
>  CPU: 3 PID: 103 Comm: kworker/u8:3 Not tainted 6.4.0-rc7-next-20230619-00003-g94543a53f9a4-dirty #166
>  Workqueue: xfs-cil/sda xlog_cil_push_work
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x50/0x70
>   print_report+0xc2/0x600
>   kasan_report+0xb6/0xe0
>   xfs_cui_release+0xb7/0xc0
>   xfs_cud_item_release+0x3c/0x90
>   xfs_trans_committed_bulk+0x2d5/0x7f0
>   xlog_cil_committed+0xaba/0xf20
>   xlog_cil_push_work+0x1a60/0x2360
>   process_one_work+0x78e/0x1140
>   worker_thread+0x58b/0xf60
>   kthread+0x2cd/0x3c0
>   ret_from_fork+0x1f/0x30
>   </TASK>
> 
>  Allocated by task 531:
>   kasan_save_stack+0x22/0x40
>   kasan_set_track+0x25/0x30
>   __kasan_slab_alloc+0x55/0x60
>   kmem_cache_alloc+0x195/0x5f0
>   xfs_cui_init+0x198/0x1d0
>   xlog_recover_cui_commit_pass2+0x133/0x5f0
>   xlog_recover_items_pass2+0x107/0x230
>   xlog_recover_commit_trans+0x3e7/0x9c0
>   xlog_recovery_process_trans+0x140/0x1d0
>   xlog_recover_process_ophdr+0x1a0/0x3d0
>   xlog_recover_process_data+0x108/0x2d0
>   xlog_recover_process+0x1f6/0x280
>   xlog_do_recovery_pass+0x609/0xdb0
>   xlog_do_log_recovery+0x84/0xe0
>   xlog_do_recover+0x7d/0x470
>   xlog_recover+0x25f/0x490
>   xfs_log_mount+0x2dd/0x6f0
>   xfs_mountfs+0x11ce/0x1e70
>   xfs_fs_fill_super+0x10ec/0x1b20
>   get_tree_bdev+0x3c8/0x730
>   vfs_get_tree+0x89/0x2c0
>   path_mount+0xecf/0x1800
>   do_mount+0xf3/0x110
>   __x64_sys_mount+0x154/0x1f0
>   do_syscall_64+0x39/0x80
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
>  Freed by task 531:
>   kasan_save_stack+0x22/0x40
>   kasan_set_track+0x25/0x30
>   kasan_save_free_info+0x2b/0x40
>   __kasan_slab_free+0x114/0x1b0
>   kmem_cache_free+0xf8/0x510
>   xfs_cui_item_free+0x95/0xb0
>   xfs_cui_release+0x86/0xc0
>   xlog_recover_cancel_intents.isra.0+0xf8/0x210
>   xlog_recover_finish+0x7e7/0x980
>   xfs_log_mount_finish+0x2bb/0x4a0
>   xfs_mountfs+0x14bf/0x1e70
>   xfs_fs_fill_super+0x10ec/0x1b20
>   get_tree_bdev+0x3c8/0x730
>   vfs_get_tree+0x89/0x2c0
>   path_mount+0xecf/0x1800
>   do_mount+0xf3/0x110
>   __x64_sys_mount+0x154/0x1f0
>   do_syscall_64+0x39/0x80
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
>  The buggy address belongs to the object at ffff888012575dc8
>   which belongs to the cache xfs_cui_item of size 432
>  The buggy address is located 152 bytes inside of
>   freed 432-byte region [ffff888012575dc8, ffff888012575f78)
> 
>  The buggy address belongs to the physical page:
>  page:ffffea0000495d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888012576208 pfn:0x12574
>  head:ffffea0000495d00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
>  page_type: 0xffffffff()
>  raw: 001fffff80010200 ffff888012092f40 ffff888014570150 ffff888014570150
>  raw: ffff888012576208 00000000001e0010 00000001ffffffff 0000000000000000
>  page dumped because: kasan: bad access detected
> 
>  Memory state around the buggy address:
>   ffff888012575d00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
>   ffff888012575d80: fc fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>  >ffff888012575e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                         ^
>   ffff888012575e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff888012575f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
>  ==================================================================
> 
> "If process intents fails, intent items left in AIL will be delete
> from AIL and freed in error handling, even intent items that have been
> recovered and created done items. After this, uaf will be triggered when
> done item committed, because at this point the released intent item will
> be accessed.
> 
> xlog_recover_finish                     xlog_cil_push_work
> ----------------------------            ---------------------------
> xlog_recover_process_intents
>   xfs_cui_item_recover//cui_refcount == 1
>     xfs_trans_get_cud
>     xfs_trans_commit
>       <add cud item to cil>
>   xfs_cui_item_recover
>     <error occurred and return>
> xlog_recover_cancel_intents
>   xfs_cui_release     //cui_refcount == 0
>     xfs_cui_item_free //free cui
>   <release other intent items>
> xlog_force_shutdown   //shutdown
>                                <...>
>                                         <push items in cil>
>                                         xlog_cil_committed
>                                           xfs_cud_item_release
>                                             xfs_cui_release // UAF
> 
> "Intent log items are created with a reference count of 2, one for the
> creator, and one for the intent done object. Log recovery explicitly
> drops the creator reference after it is inserted into the AIL, but it
> then processes the log item as if it also owns the intent-done reference.
> 
> "The code in ->iop_recovery should assume that it passes the reference
> to the done intent, we can remove the intent item from the AIL after
> creating the done-intent, but if that code fails before creating the
> done-intent then it needs to release the intent reference by log recovery
> itself.
> 
> "That way when we go to cancel the intent, the only intents we find in
> the AIL are the ones we know have not been processed yet and hence we
> can safely drop both the creator and the intent done reference from
> xlog_recover_cancel_intents().
> 
> "Hence if we remove the intent from the list of intents that need to
> be recovered after we have done the initial recovery, we acheive two
> things:
> 
> "1. the tail of the log can be moved forward with the commit of the
> done intent or new intent to continue the operation, and
> 
> "2. We avoid the problem of trying to determine how many reference
> counts we need to drop from intent recovery cancelling because we
> never come across intents we've actually attempted recovery on."
> 
> Restated: The cause of the UAF is that xlog_recover_cancel_intents
> thinks that it owns the refcount on any intent item in the AIL, and that
> it's always safe to release these intent items.  This is not true after
> the recovery function creates an log intent done item and points it at
> the log intent item because releasing the done item always releases the
> intent item.
> 
> The runtime defer ops code avoids all this by tracking both the log
> intent and the intent done items, and releasing only the intent done
> item if both have been created.  Long Li proposed fixing this by adding
> state flags, but I have a more comprehensive fix.
> 
> First, observe that the latter half of the intent _recover functions are
> nearly open-coded versions of the corresponding _finish_one function
> that uses an onstack deferred work item to single-step through the item.
> 
> Second, notice that the recover function is not an exact match because
> of the odd behavior that unfinished recovered work items are relogged
> with separate log intent items instead of a single new log intent item,
> which is what the defer ops machinery does.
> 
> Dave and I have long suspected that recovery should be reconstructing
> the defer work state from what's in the recovered intent item.  Now we
> finally have an excuse to refactor the code to do that.
> 
> This series starts by fixing a resource leak in LARP recovery.  We fix
> the bug that Long Li reported by switching the intent recovery code to
> construct chains of xfs_defer_pending objects and then using the defer
> pending objects to track the intent/done item ownership.  Finally, we
> clean up the code to reconstruct the exact incore state, which means we
> can remove all the opencoded _recover code, which makes maintaining log
> items much easier.
> 

Thanks for fixing this UAF issue, it really is a much more comprehensive fix,
and makes the intent item recovery code much easier to maintain.

Best Regards
Long Li
 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] xfs: recreate work items when recovering intent items
  2023-11-28 20:26 ` [PATCH 5/7] xfs: recreate work items when recovering intent items Darrick J. Wong
  2023-11-30  8:06   ` Christoph Hellwig
@ 2023-11-30 13:13   ` Long Li
  2023-11-30 17:19     ` Darrick J. Wong
  1 sibling, 1 reply; 24+ messages in thread
From: Long Li @ 2023-11-30 13:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, chandanbabu, hch

On Tue, Nov 28, 2023 at 12:26:57PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Recreate work items for each xfs_defer_pending object when we are
> recovering intent items.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_defer.h  |    9 ++++
>  fs/xfs/xfs_attr_item.c     |   94 +++++++++++++++++++++----------------
>  fs/xfs/xfs_bmap_item.c     |   56 ++++++++++++++--------
>  fs/xfs/xfs_extfree_item.c  |   50 ++++++++++++-------
>  fs/xfs/xfs_refcount_item.c |   61 +++++++++++-------------
>  fs/xfs/xfs_rmap_item.c     |  113 ++++++++++++++++++++++++--------------------
>  6 files changed, 221 insertions(+), 162 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 3c923a728323..ee1e76d3f7e8 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -130,6 +130,15 @@ struct xfs_defer_pending *xfs_defer_start_recovery(struct xfs_log_item *lip,
>  void xfs_defer_cancel_recovery(struct xfs_mount *mp,
>  		struct xfs_defer_pending *dfp);
>  
> +static inline void
> +xfs_defer_recover_work_item(
> +	struct xfs_defer_pending	*dfp,
> +	struct list_head		*work)
> +{
> +	list_add_tail(work, &dfp->dfp_work);
> +	dfp->dfp_count++;
> +}
> +
>  int __init xfs_defer_init_item_caches(void);
>  void xfs_defer_destroy_item_caches(void);
>  
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 82775e9537df..fbc88325848a 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -539,47 +539,22 @@ xfs_attri_validate(
>  	return xfs_verify_ino(mp, attrp->alfi_ino);
>  }
>  
> -/*
> - * Process an attr intent item that was recovered from the log.  We need to
> - * delete the attr that it describes.
> - */
> -STATIC int
> -xfs_attri_item_recover(
> +static inline struct xfs_attr_intent *
> +xfs_attri_recover_work(
> +	struct xfs_mount		*mp,
>  	struct xfs_defer_pending	*dfp,
> -	struct list_head		*capture_list)
> +	struct xfs_attri_log_format	*attrp,
> +	struct xfs_inode		*ip,
> +	struct xfs_attri_log_nameval	*nv)
>  {
> -	struct xfs_log_item		*lip = dfp->dfp_intent;
> -	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>  	struct xfs_attr_intent		*attr;
> -	struct xfs_mount		*mp = lip->li_log->l_mp;
> -	struct xfs_inode		*ip;
>  	struct xfs_da_args		*args;
> -	struct xfs_trans		*tp;
> -	struct xfs_trans_res		resv;
> -	struct xfs_attri_log_format	*attrp;
> -	struct xfs_attri_log_nameval	*nv = attrip->attri_nameval;
> -	int				error;
> -	int				total;
> -	int				local;
> -	struct xfs_attrd_log_item	*done_item = NULL;
> -
> -	/*
> -	 * First check the validity of the attr described by the ATTRI.  If any
> -	 * are bad, then assume that all are bad and just toss the ATTRI.
> -	 */
> -	attrp = &attrip->attri_format;
> -	if (!xfs_attri_validate(mp, attrp) ||
> -	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
> -		return -EFSCORRUPTED;
> -
> -	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
> -	if (error)
> -		return error;
>  
>  	attr = kmem_zalloc(sizeof(struct xfs_attr_intent) +
>  			   sizeof(struct xfs_da_args), KM_NOFS);
>  	args = (struct xfs_da_args *)(attr + 1);
>  
> +	INIT_LIST_HEAD(&attr->xattri_list);
>  	attr->xattri_da_args = args;
>  	attr->xattri_op_flags = attrp->alfi_op_flags &
>  						XFS_ATTRI_OP_FLAGS_TYPE_MASK;
> @@ -607,6 +582,8 @@ xfs_attri_item_recover(
>  	switch (attr->xattri_op_flags) {
>  	case XFS_ATTRI_OP_FLAGS_SET:
>  	case XFS_ATTRI_OP_FLAGS_REPLACE:
> +		int			local;
> +
>  		args->value = nv->value.i_addr;
>  		args->valuelen = nv->value.i_len;
>  		args->total = xfs_attr_calc_size(args, &local);
 
When I compile the kernel with this set of patches, I get the following error:

fs/xfs/xfs_attr_item.c: In function ‘xfs_attri_recover_work’:                                            
fs/xfs/xfs_attr_item.c:585:3: error: a label can only be part of a statement and a declaration is not a statement
  585 |   int   local;                                                                                   
      |   ^~~ 

Thanks,
Long Li


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/7] xfs: don't leak recovered attri intent items
  2023-11-30  7:34   ` Christoph Hellwig
@ 2023-11-30 17:02     ` Darrick J. Wong
  2023-12-04  4:50       ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-30 17:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

On Wed, Nov 29, 2023 at 11:34:41PM -0800, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 12:26:34PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > If recovery finds an xattr log intent item calling for the removal of an
> > attribute and the file doesn't even have an attr fork, we know that the
> > removal is trivially complete.  However, we can't just exit the recovery
> > function without doing something about the recovered log intent item --
> > it's still on the AIL, and not logging an attrd item means it stays
> > there forever.
> > 
> > This has likely not been seen in practice because few people use LARP
> > and the runtime code won't log the attri for a no-attrfork removexattr
> > operation.  But let's fix this anyway.
> > 
> > Also we shouldn't really be testing the attr fork presence until we've
> > taken the ILOCK, though this doesn't matter much in recovery, which is
> > single threaded.
> > 
> > Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework")
> 
> No useful comment here as the attr logging code is new to me, but what
> is the LARP mode?  I see plenty of references to it in commit logs,
> a small amount in the code mostly related to error injection, but it
> would be really good to expand the acronym somehwere as I can't find
> any explanation in the code or commit logs..

LARP == Logged extended Attributes via Replay Persistence

(IOWs, a silly developer acronym for writing attr log intent items.)

--D

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] xfs: use xfs_defer_pending objects to recover intent items
  2023-11-30  7:53   ` Christoph Hellwig
@ 2023-11-30 17:14     ` Darrick J. Wong
  2023-12-04  4:50       ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-30 17:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

On Wed, Nov 29, 2023 at 11:53:31PM -0800, Christoph Hellwig wrote:
> > +/* Create a pending deferred work item for use with log recovery. */
> > +struct xfs_defer_pending *
> > +xfs_defer_start_recovery(
> > +	struct xfs_log_item		*lip,
> > +	enum xfs_defer_ops_type		dfp_type)
> > +{
> > +	struct xfs_defer_pending	*dfp;
> > +
> > +	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
> > +			GFP_NOFS | __GFP_NOFAIL);
> > +	dfp->dfp_type = dfp_type;
> > +	dfp->dfp_intent = lip;
> > +	INIT_LIST_HEAD(&dfp->dfp_work);
> > +	INIT_LIST_HEAD(&dfp->dfp_list);
> > +	return dfp;
> > +}
> 
> Initializing dfp_list here is a bit pointless as the caller instantly
> adds it to log->l_dfops.

I'd rather pass the list_head into xfs_defer_start_recovery so that it
can create a fully initialized dfp and add it to the tracker.

> > -#if defined(DEBUG) || defined(XFS_WARN)
> > +	spin_lock(&log->l_ailp->ail_lock);
> >  	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
> > +	spin_unlock(&log->l_ailp->ail_lock);
> 
> ail_lock shouldn't be needed here.

Oh, right, the AIL lock was to protect the cursor, not the debug
statements.

> Otherwise this looks reasonable to me.

<nod>

--D

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] xfs: recreate work items when recovering intent items
  2023-11-30 13:13   ` Long Li
@ 2023-11-30 17:19     ` Darrick J. Wong
  0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-30 17:19 UTC (permalink / raw)
  To: Long Li; +Cc: linux-xfs, chandanbabu, hch

On Thu, Nov 30, 2023 at 09:13:15PM +0800, Long Li wrote:
> On Tue, Nov 28, 2023 at 12:26:57PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Recreate work items for each xfs_defer_pending object when we are
> > recovering intent items.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_defer.h  |    9 ++++
> >  fs/xfs/xfs_attr_item.c     |   94 +++++++++++++++++++++----------------
> >  fs/xfs/xfs_bmap_item.c     |   56 ++++++++++++++--------
> >  fs/xfs/xfs_extfree_item.c  |   50 ++++++++++++-------
> >  fs/xfs/xfs_refcount_item.c |   61 +++++++++++-------------
> >  fs/xfs/xfs_rmap_item.c     |  113 ++++++++++++++++++++++++--------------------
> >  6 files changed, 221 insertions(+), 162 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > index 3c923a728323..ee1e76d3f7e8 100644
> > --- a/fs/xfs/libxfs/xfs_defer.h
> > +++ b/fs/xfs/libxfs/xfs_defer.h
> > @@ -130,6 +130,15 @@ struct xfs_defer_pending *xfs_defer_start_recovery(struct xfs_log_item *lip,
> >  void xfs_defer_cancel_recovery(struct xfs_mount *mp,
> >  		struct xfs_defer_pending *dfp);
> >  
> > +static inline void
> > +xfs_defer_recover_work_item(
> > +	struct xfs_defer_pending	*dfp,
> > +	struct list_head		*work)
> > +{
> > +	list_add_tail(work, &dfp->dfp_work);
> > +	dfp->dfp_count++;
> > +}
> > +
> >  int __init xfs_defer_init_item_caches(void);
> >  void xfs_defer_destroy_item_caches(void);
> >  
> > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > index 82775e9537df..fbc88325848a 100644
> > --- a/fs/xfs/xfs_attr_item.c
> > +++ b/fs/xfs/xfs_attr_item.c
> > @@ -539,47 +539,22 @@ xfs_attri_validate(
> >  	return xfs_verify_ino(mp, attrp->alfi_ino);
> >  }
> >  
> > -/*
> > - * Process an attr intent item that was recovered from the log.  We need to
> > - * delete the attr that it describes.
> > - */
> > -STATIC int
> > -xfs_attri_item_recover(
> > +static inline struct xfs_attr_intent *
> > +xfs_attri_recover_work(
> > +	struct xfs_mount		*mp,
> >  	struct xfs_defer_pending	*dfp,
> > -	struct list_head		*capture_list)
> > +	struct xfs_attri_log_format	*attrp,
> > +	struct xfs_inode		*ip,
> > +	struct xfs_attri_log_nameval	*nv)
> >  {
> > -	struct xfs_log_item		*lip = dfp->dfp_intent;
> > -	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> >  	struct xfs_attr_intent		*attr;
> > -	struct xfs_mount		*mp = lip->li_log->l_mp;
> > -	struct xfs_inode		*ip;
> >  	struct xfs_da_args		*args;
> > -	struct xfs_trans		*tp;
> > -	struct xfs_trans_res		resv;
> > -	struct xfs_attri_log_format	*attrp;
> > -	struct xfs_attri_log_nameval	*nv = attrip->attri_nameval;
> > -	int				error;
> > -	int				total;
> > -	int				local;
> > -	struct xfs_attrd_log_item	*done_item = NULL;
> > -
> > -	/*
> > -	 * First check the validity of the attr described by the ATTRI.  If any
> > -	 * are bad, then assume that all are bad and just toss the ATTRI.
> > -	 */
> > -	attrp = &attrip->attri_format;
> > -	if (!xfs_attri_validate(mp, attrp) ||
> > -	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
> > -		return -EFSCORRUPTED;
> > -
> > -	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
> > -	if (error)
> > -		return error;
> >  
> >  	attr = kmem_zalloc(sizeof(struct xfs_attr_intent) +
> >  			   sizeof(struct xfs_da_args), KM_NOFS);
> >  	args = (struct xfs_da_args *)(attr + 1);
> >  
> > +	INIT_LIST_HEAD(&attr->xattri_list);
> >  	attr->xattri_da_args = args;
> >  	attr->xattri_op_flags = attrp->alfi_op_flags &
> >  						XFS_ATTRI_OP_FLAGS_TYPE_MASK;
> > @@ -607,6 +582,8 @@ xfs_attri_item_recover(
> >  	switch (attr->xattri_op_flags) {
> >  	case XFS_ATTRI_OP_FLAGS_SET:
> >  	case XFS_ATTRI_OP_FLAGS_REPLACE:
> > +		int			local;
> > +
> >  		args->value = nv->value.i_addr;
> >  		args->valuelen = nv->value.i_len;
> >  		args->total = xfs_attr_calc_size(args, &local);
>  
> When I compile the kernel with this set of patches, I get the following error:
> 
> fs/xfs/xfs_attr_item.c: In function ‘xfs_attri_recover_work’:                                            
> fs/xfs/xfs_attr_item.c:585:3: error: a label can only be part of a statement and a declaration is not a statement
>   585 |   int   local;                                                                                   
>       |   ^~~ 

Yeah, the build bots complained about my ham-handed attempt to reduce
variable scope as well.  Oddly, gcc 12.2 doesn't seem to mind it.  That
error message is puzzling.

--D

> Thanks,
> Long Li
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] xfs: recreate work items when recovering intent items
  2023-11-30  8:06   ` Christoph Hellwig
@ 2023-11-30 17:36     ` Darrick J. Wong
  0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-30 17:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

On Thu, Nov 30, 2023 at 12:06:17AM -0800, Christoph Hellwig wrote:
> > +static inline void
> > +xfs_defer_recover_work_item(
> > +	struct xfs_defer_pending	*dfp,
> > +	struct list_head		*work)
> > +{
> > +	list_add_tail(work, &dfp->dfp_work);
> > +	dfp->dfp_count++;
> > +}
> 
> The same (arguably trivial) logic is also duplicated in xfs_defer_add.
> Maybe give the helper a more generic name and use it there as well?

Done.

> > +	INIT_LIST_HEAD(&attr->xattri_list);
> 
> No need to initialize this (and all the other list_heads in the items)
> as we're only adding them to the work list, but never actualy using them
> as the head of a list or do list_empty checks on them.

Ah, right, list_add_tail doesn't care about the current contents of
new->{next,prev}.  Fixed.

> > +	xfs_defer_recover_work_item(dfp, &bi->bi_list);
> 
> Does this commit on it's own actually work?  We add the items to
> the work list, but I don't think we ever take it off without the next
> patch?

Oops, I think "xfs: use xfs_defer_pending objects to recover intent
items" leaks the dfp in the success case.  After the loop in
xlog_recover_process_intents sets dfp->dfp_intent = NULL, it ought to be
calling xfs_defer_cancel_recovery to free the dfp_work items and the dfp
itself.

> > -	struct xfs_bmap_intent		fake = { };
> >  	struct xfs_trans_res		resv;
> >  	struct xfs_log_item		*lip = dfp->dfp_intent;
> >  	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
> > @@ -498,6 +520,7 @@ xfs_bui_item_recover(
> >  	struct xfs_mount		*mp = lip->li_log->l_mp;
> >  	struct xfs_map_extent		*map;
> >  	struct xfs_bud_log_item		*budp;
> > +	struct xfs_bmap_intent		*fake;
> 
> So this patch moves the intent structures off the stack, and the next
> one then renames it to work.  Maybe do the renames here so that we don't
> have to touch every single use of them twice?

The next patch removes those variables altogether, so I'll remove the
fake -> work rename entirely.

> Otherwise this looks nice, and I really like the better structure of
> the recovery code that this is leading towards.

Oh good!

--D

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] xfs: use xfs_defer_finish_one to finish recovered work items
  2023-11-30  8:10   ` Christoph Hellwig
@ 2023-11-30 17:59     ` Darrick J. Wong
  0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2023-11-30 17:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: leo.lilong, chandanbabu, hch, linux-xfs

On Thu, Nov 30, 2023 at 12:10:43AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 12:27:03PM -0800, Darrick J. Wong wrote:
> > While we're at it, dump xattr log items if the recovery fails with
> > corruption errors like we do for the other intent items.
> 
> Normally I'd expect this as a split out prep patch.

Done.

> > -	error = xfs_xattri_finish_update(attr, done_item);
> 
> As a follow on cleanup, xfs_xattri_finish_update could now be folded
> into xfs_attr_finish_item, making the logic a littl simpler to follow.
> Same for the other items.

That'll be a separate patch at the end.

> Otherwise this looks good to me.

Yay!

--D

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/7] xfs: don't leak recovered attri intent items
  2023-11-30 17:02     ` Darrick J. Wong
@ 2023-12-04  4:50       ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-12-04  4:50 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, leo.lilong, chandanbabu, hch, linux-xfs

On Thu, Nov 30, 2023 at 09:02:36AM -0800, Darrick J. Wong wrote:
> > No useful comment here as the attr logging code is new to me, but what
> > is the LARP mode?  I see plenty of references to it in commit logs,
> > a small amount in the code mostly related to error injection, but it
> > would be really good to expand the acronym somehwere as I can't find
> > any explanation in the code or commit logs..
> 
> LARP == Logged extended Attributes via Replay Persistence
> 
> (IOWs, a silly developer acronym for writing attr log intent items.)

We should probably spell it out somewhere.  I also found comments it's
a debug only feature, which also confuses me.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] xfs: use xfs_defer_pending objects to recover intent items
  2023-11-30 17:14     ` Darrick J. Wong
@ 2023-12-04  4:50       ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-12-04  4:50 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, leo.lilong, chandanbabu, hch, linux-xfs

On Thu, Nov 30, 2023 at 09:14:55AM -0800, Darrick J. Wong wrote:
> > > +	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
> > > +			GFP_NOFS | __GFP_NOFAIL);
> > > +	dfp->dfp_type = dfp_type;
> > > +	dfp->dfp_intent = lip;
> > > +	INIT_LIST_HEAD(&dfp->dfp_work);
> > > +	INIT_LIST_HEAD(&dfp->dfp_list);
> > > +	return dfp;
> > > +}
> > 
> > Initializing dfp_list here is a bit pointless as the caller instantly
> > adds it to log->l_dfops.
> 
> I'd rather pass the list_head into xfs_defer_start_recovery so that it
> can create a fully initialized dfp and add it to the tracker.

Sounds good to me.


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-12-04  4:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-28 20:26 [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Darrick J. Wong
2023-11-28 20:26 ` [PATCH 1/7] xfs: don't leak recovered attri intent items Darrick J. Wong
2023-11-30  7:34   ` Christoph Hellwig
2023-11-30 17:02     ` Darrick J. Wong
2023-12-04  4:50       ` Christoph Hellwig
2023-11-28 20:26 ` [PATCH 2/7] xfs: use xfs_defer_pending objects to recover " Darrick J. Wong
2023-11-30  7:53   ` Christoph Hellwig
2023-11-30 17:14     ` Darrick J. Wong
2023-12-04  4:50       ` Christoph Hellwig
2023-11-28 20:26 ` [PATCH 3/7] xfs: pass the xfs_defer_pending object to iop_recover Darrick J. Wong
2023-11-30  7:53   ` Christoph Hellwig
2023-11-28 20:26 ` [PATCH 4/7] xfs: transfer recovered intent item ownership in ->iop_recover Darrick J. Wong
2023-11-30  7:55   ` Christoph Hellwig
2023-11-28 20:26 ` [PATCH 5/7] xfs: recreate work items when recovering intent items Darrick J. Wong
2023-11-30  8:06   ` Christoph Hellwig
2023-11-30 17:36     ` Darrick J. Wong
2023-11-30 13:13   ` Long Li
2023-11-30 17:19     ` Darrick J. Wong
2023-11-28 20:27 ` [PATCH 6/7] xfs: use xfs_defer_finish_one to finish recovered work items Darrick J. Wong
2023-11-30  8:10   ` Christoph Hellwig
2023-11-30 17:59     ` Darrick J. Wong
2023-11-28 20:27 ` [PATCH 7/7] xfs: move ->iop_recover to xfs_defer_op_type Darrick J. Wong
2023-11-30  8:11   ` Christoph Hellwig
2023-11-30 12:06 ` [PATCHSET RFC 0/7] xfs: log intent item recovery should reconstruct defer work state Long Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox