linux-mm.kvack.org archive mirror
* [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage
@ 2024-01-15 22:59 Dave Chinner
  2024-01-15 22:59 ` [PATCH 01/12] xfs: convert kmem_zalloc() to kzalloc() Dave Chinner
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

This series does two things. First, it removes the remaining
XFS-specific kernel memory allocation wrappers, converting everything
to use GFP flags directly. Second, it converts all the GFP_NOFS flag
usage to the scoped memalloc_nofs_save() API instead of passing
GFP_NOFS directly to the allocators.
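
For anyone unfamiliar with the scoped API, here's a minimal sketch of
the pattern the series moves towards (illustration only - the function
below is made up, and XFS mostly inherits the scope from its
transaction code rather than open-coding it like this):

	#include <linux/sched/mm.h>
	#include <linux/slab.h>

	static void *alloc_under_nofs_scope(size_t size)
	{
		unsigned int	nofs_flag;
		void		*p;

		/* reclaim from here on must not recurse into the fs */
		nofs_flag = memalloc_nofs_save();

		/* GFP_KERNEL now implicitly behaves like GFP_NOFS */
		p = kmalloc(size, GFP_KERNEL);

		memalloc_nofs_restore(nofs_flag);
		return p;
	}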

The first part of the series (fs/xfs/kmem.[ch] removal) is
straightforward. We've done lots of this conversion work in the past
leading up to this point; this is just converting the final remaining
usage to the native kernel interfaces. The only downside is that we
end up propagating __GFP_NOFAIL explicitly throughout the code. This
is no big deal for XFS - it just formalises the fact that all our
allocations are __GFP_NOFAIL by default, except for the ones we
explicitly mark as able to fail. This may be a surprise to people
outside XFS, but we've been doing this for a couple of decades now
and the sky hasn't fallen yet.
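
Concretely, the conversions in patch 1 look like this at the call
sites (condensed from the hunks below):

	/* before: the wrapper default hides the "never fails" semantics */
	new = kmem_zalloc(sizeof(struct xfs_extent_busy), 0);

	/* after: the no-fail behaviour is spelled out explicitly */
	new = kzalloc(sizeof(struct xfs_extent_busy),
			GFP_KERNEL | __GFP_NOFAIL);

	/* and allocations that are allowed to fail now say so, too */
	pag = kzalloc(sizeof(*pag), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
	if (!pag)
		return -ENOMEM;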

The second part of the series is more involved - in most cases
GFP_NOFS is redundant because we are already in a scoped NOFS context
(e.g. transactions), so the conversion to GFP_KERNEL isn't a huge
issue.

However, there are some code paths where we have used GFP_NOFS to
prevent lockdep warnings because the code is called from both
GFP_KERNEL and GFP_NOFS contexts. Lockdep gets confused when it has
tracked code as GFP_NOFS and then sees it enter direct reclaim,
recurse into the filesystem and take fs locks from the GFP_KERNEL
caller. There are a couple of other lockdep false positive paths that
we've shut up with GFP_NOFS, too. More recently, we've been using the
__GFP_NOLOCKDEP flag to signal this "lockdep gives false positives
here" condition, so one of the things this patchset does is convert
the GFP_NOFS calls in code that can run from both GFP_KERNEL and
GFP_NOFS contexts, and/or run both above and below reclaim, to
GFP_KERNEL | __GFP_NOLOCKDEP.

This means that some allocations have gone from having KM_NOFS tags
to having GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL. There is an
increase in verbosity here, but the first step in cleaning all this
mess up is consistently annotating all the allocation sites with the
correct tags.
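
The shape of those conversions in the later patches (not quoted in
this excerpt; the call site here is hypothetical) is:

	/* before */
	ptr = kmem_zalloc(size, KM_NOFS);

	/* after */
	ptr = kzalloc(size,
			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);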

Later in the patchset, we start adding new scoped NOFS contexts to
cover cases where we really need NOFS but rely on the called code
knowing that it is actually in a NOFS context. An example of this is
intent recovery - allocating the intent structure occurs outside
transaction scope, but still needs NOFS scope because of all the
pending work already queued. The rest of the work is done under
transaction context, giving it NOFS context, but these initial
allocations aren't inside that scope. IOWs, the entire intent
recovery scope should really be covered by a single NOFS context.
The patch set ends up putting the entire second phase of recovery
(intents, unlinked list processing, reflink cleanup) under a single
NOFS context because we really don't want reclaim to operate on the
filesystem whilst we are performing these operations. Hence a single
high level NOFS scope is appropriate here - a rough sketch follows.
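
Roughly, the single scope over the second phase of recovery looks
like this (sketch only - the steps are paraphrased as comments rather
than the real function calls, which live in the later patches):

	unsigned int	nofs_flag;

	nofs_flag = memalloc_nofs_save();

	/* replay and finish the recovered log intent items */
	/* process the unlinked inode lists */
	/* clean up leftover reflink/CoW state */

	memalloc_nofs_restore(nofs_flag);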

The end result is that GFP_NOFS is completely gone from XFS, replaced
by correct annotations and more widely deployed scoped allocation
contexts. This passes fstests with lockdep, KASAN and other debugging
options enabled without any regressions or new lockdep false
positives.

Comments, thoughts and ideas?

----

Version 1:
- based on v6.7 + linux-xfs/for-next



* [PATCH 01/12] xfs: convert kmem_zalloc() to kzalloc()
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 22:48   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 03/12] xfs: move kmem_to_page() Dave Chinner
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

There's no reason to keep kmem_zalloc() around anymore - it's just a
thin zeroing wrapper around kmem_alloc() - so convert the remaining
callers to kzalloc() and get rid of it.
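
For reference, the flag conversions in this patch map as follows
(derived from the hunks below):

	kmem_zalloc(size, 0)
		-> kzalloc(size, GFP_KERNEL | __GFP_NOFAIL)  (most sites)
	kmem_zalloc(size, KM_NOFS)
		-> kzalloc(size, GFP_NOFS | __GFP_NOFAIL)
	kmem_zalloc(size, KM_MAYFAIL)
		-> kzalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL)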

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/kmem.h                     |  7 -------
 fs/xfs/libxfs/xfs_ag.c            |  2 +-
 fs/xfs/libxfs/xfs_attr_leaf.c     |  3 ++-
 fs/xfs/libxfs/xfs_btree_staging.c |  2 +-
 fs/xfs/libxfs/xfs_da_btree.c      |  5 +++--
 fs/xfs/libxfs/xfs_defer.c         |  2 +-
 fs/xfs/libxfs/xfs_dir2.c          | 18 +++++++++---------
 fs/xfs/libxfs/xfs_iext_tree.c     | 12 ++++++++----
 fs/xfs/xfs_attr_item.c            |  4 ++--
 fs/xfs/xfs_buf.c                  |  6 +++---
 fs/xfs/xfs_buf_item.c             |  4 ++--
 fs/xfs/xfs_error.c                |  4 ++--
 fs/xfs/xfs_extent_busy.c          |  3 ++-
 fs/xfs/xfs_itable.c               |  8 ++++----
 fs/xfs/xfs_iwalk.c                |  3 ++-
 fs/xfs/xfs_log.c                  |  5 +++--
 fs/xfs/xfs_log_cil.c              |  4 ++--
 fs/xfs/xfs_log_recover.c          | 10 +++++-----
 fs/xfs/xfs_mru_cache.c            |  7 ++++---
 fs/xfs/xfs_qm.c                   |  3 ++-
 fs/xfs/xfs_refcount_item.c        |  4 ++--
 fs/xfs/xfs_rmap_item.c            |  3 ++-
 fs/xfs/xfs_trans_ail.c            |  3 ++-
 23 files changed, 64 insertions(+), 58 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index b987dc2c6851..bce31182c9e8 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -62,13 +62,6 @@ static inline void  kmem_free(const void *ptr)
 	kvfree(ptr);
 }
 
-
-static inline void *
-kmem_zalloc(size_t size, xfs_km_flags_t flags)
-{
-	return kmem_alloc(size, flags | KM_ZERO);
-}
-
 /*
  * Zone interfaces
  */
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 39d9525270b7..96a6bfd58931 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -381,7 +381,7 @@ xfs_initialize_perag(
 			continue;
 		}
 
-		pag = kmem_zalloc(sizeof(*pag), KM_MAYFAIL);
+		pag = kzalloc(sizeof(*pag), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 		if (!pag) {
 			error = -ENOMEM;
 			goto out_unwind_new_pags;
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 6374bf107242..ab4223bf51ee 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -2250,7 +2250,8 @@ xfs_attr3_leaf_unbalance(
 		struct xfs_attr_leafblock *tmp_leaf;
 		struct xfs_attr3_icleaf_hdr tmphdr;
 
-		tmp_leaf = kmem_zalloc(state->args->geo->blksize, 0);
+		tmp_leaf = kzalloc(state->args->geo->blksize,
+				GFP_KERNEL | __GFP_NOFAIL);
 
 		/*
 		 * Copy the header into the temp leaf so that all the stuff
diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
index e276eba87cb1..eff29425fd76 100644
--- a/fs/xfs/libxfs/xfs_btree_staging.c
+++ b/fs/xfs/libxfs/xfs_btree_staging.c
@@ -406,7 +406,7 @@ xfs_btree_bload_prep_block(
 
 		/* Allocate a new incore btree root block. */
 		new_size = bbl->iroot_size(cur, level, nr_this_block, priv);
-		ifp->if_broot = kmem_zalloc(new_size, 0);
+		ifp->if_broot = kzalloc(new_size, GFP_KERNEL);
 		ifp->if_broot_bytes = (int)new_size;
 
 		/* Initialize it and send it out. */
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index 5457188bb4de..73aae6543906 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2518,7 +2518,7 @@ xfs_dabuf_map(
 	int			error = 0, nirecs, i;
 
 	if (nfsb > 1)
-		irecs = kmem_zalloc(sizeof(irec) * nfsb, KM_NOFS);
+		irecs = kzalloc(sizeof(irec) * nfsb, GFP_NOFS | __GFP_NOFAIL);
 
 	nirecs = nfsb;
 	error = xfs_bmapi_read(dp, bno, nfsb, irecs, &nirecs,
@@ -2531,7 +2531,8 @@ xfs_dabuf_map(
 	 * larger one that needs to be free by the caller.
 	 */
 	if (nirecs > 1) {
-		map = kmem_zalloc(nirecs * sizeof(struct xfs_buf_map), KM_NOFS);
+		map = kzalloc(nirecs * sizeof(struct xfs_buf_map),
+				GFP_NOFS | __GFP_NOFAIL);
 		if (!map) {
 			error = -ENOMEM;
 			goto out_free_irecs;
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 66a17910d021..07d318b1f807 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -979,7 +979,7 @@ xfs_defer_ops_capture(
 		return ERR_PTR(error);
 
 	/* Create an object to capture the defer ops. */
-	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
+	dfc = kzalloc(sizeof(*dfc), GFP_NOFS | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&dfc->dfc_list);
 	INIT_LIST_HEAD(&dfc->dfc_dfops);
 
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index a76673281514..54915a302e96 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -104,10 +104,10 @@ xfs_da_mount(
 	ASSERT(mp->m_sb.sb_versionnum & XFS_SB_VERSION_DIRV2BIT);
 	ASSERT(xfs_dir2_dirblock_bytes(&mp->m_sb) <= XFS_MAX_BLOCKSIZE);
 
-	mp->m_dir_geo = kmem_zalloc(sizeof(struct xfs_da_geometry),
-				    KM_MAYFAIL);
-	mp->m_attr_geo = kmem_zalloc(sizeof(struct xfs_da_geometry),
-				     KM_MAYFAIL);
+	mp->m_dir_geo = kzalloc(sizeof(struct xfs_da_geometry),
+				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+	mp->m_attr_geo = kzalloc(sizeof(struct xfs_da_geometry),
+				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!mp->m_dir_geo || !mp->m_attr_geo) {
 		kmem_free(mp->m_dir_geo);
 		kmem_free(mp->m_attr_geo);
@@ -236,7 +236,7 @@ xfs_dir_init(
 	if (error)
 		return error;
 
-	args = kmem_zalloc(sizeof(*args), KM_NOFS);
+	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
@@ -273,7 +273,7 @@ xfs_dir_createname(
 		XFS_STATS_INC(dp->i_mount, xs_dir_create);
 	}
 
-	args = kmem_zalloc(sizeof(*args), KM_NOFS);
+	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
@@ -372,7 +372,7 @@ xfs_dir_lookup(
 	 * lockdep Doing this avoids having to add a bunch of lockdep class
 	 * annotations into the reclaim path for the ilock.
 	 */
-	args = kmem_zalloc(sizeof(*args), KM_NOFS);
+	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
 	args->geo = dp->i_mount->m_dir_geo;
 	args->name = name->name;
 	args->namelen = name->len;
@@ -441,7 +441,7 @@ xfs_dir_removename(
 	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
 	XFS_STATS_INC(dp->i_mount, xs_dir_remove);
 
-	args = kmem_zalloc(sizeof(*args), KM_NOFS);
+	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
@@ -502,7 +502,7 @@ xfs_dir_replace(
 	if (rval)
 		return rval;
 
-	args = kmem_zalloc(sizeof(*args), KM_NOFS);
+	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
index f4e6b200cdf8..4522f3c7a23f 100644
--- a/fs/xfs/libxfs/xfs_iext_tree.c
+++ b/fs/xfs/libxfs/xfs_iext_tree.c
@@ -398,7 +398,8 @@ static void
 xfs_iext_grow(
 	struct xfs_ifork	*ifp)
 {
-	struct xfs_iext_node	*node = kmem_zalloc(NODE_SIZE, KM_NOFS);
+	struct xfs_iext_node	*node = kzalloc(NODE_SIZE,
+						GFP_NOFS | __GFP_NOFAIL);
 	int			i;
 
 	if (ifp->if_height == 1) {
@@ -454,7 +455,8 @@ xfs_iext_split_node(
 	int			*nr_entries)
 {
 	struct xfs_iext_node	*node = *nodep;
-	struct xfs_iext_node	*new = kmem_zalloc(NODE_SIZE, KM_NOFS);
+	struct xfs_iext_node	*new = kzalloc(NODE_SIZE,
+						GFP_NOFS | __GFP_NOFAIL);
 	const int		nr_move = KEYS_PER_NODE / 2;
 	int			nr_keep = nr_move + (KEYS_PER_NODE & 1);
 	int			i = 0;
@@ -542,7 +544,8 @@ xfs_iext_split_leaf(
 	int			*nr_entries)
 {
 	struct xfs_iext_leaf	*leaf = cur->leaf;
-	struct xfs_iext_leaf	*new = kmem_zalloc(NODE_SIZE, KM_NOFS);
+	struct xfs_iext_leaf	*new = kzalloc(NODE_SIZE,
+						GFP_NOFS | __GFP_NOFAIL);
 	const int		nr_move = RECS_PER_LEAF / 2;
 	int			nr_keep = nr_move + (RECS_PER_LEAF & 1);
 	int			i;
@@ -583,7 +586,8 @@ xfs_iext_alloc_root(
 {
 	ASSERT(ifp->if_bytes == 0);
 
-	ifp->if_data = kmem_zalloc(sizeof(struct xfs_iext_rec), KM_NOFS);
+	ifp->if_data = kzalloc(sizeof(struct xfs_iext_rec),
+					GFP_NOFS | __GFP_NOFAIL);
 	ifp->if_height = 1;
 
 	/* now that we have a node step into it */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 9e02111bd890..2e454a0d6f19 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -512,8 +512,8 @@ xfs_attri_recover_work(
 	if (error)
 		return ERR_PTR(error);
 
-	attr = kmem_zalloc(sizeof(struct xfs_attr_intent) +
-			   sizeof(struct xfs_da_args), KM_NOFS);
+	attr = kzalloc(sizeof(struct xfs_attr_intent) +
+			sizeof(struct xfs_da_args), GFP_NOFS | __GFP_NOFAIL);
 	args = (struct xfs_da_args *)(attr + 1);
 
 	attr->xattri_da_args = args;
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ec4bd7a24d88..710ea4c97122 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -189,8 +189,8 @@ xfs_buf_get_maps(
 		return 0;
 	}
 
-	bp->b_maps = kmem_zalloc(map_count * sizeof(struct xfs_buf_map),
-				KM_NOFS);
+	bp->b_maps = kzalloc(map_count * sizeof(struct xfs_buf_map),
+				GFP_NOFS | __GFP_NOFAIL);
 	if (!bp->b_maps)
 		return -ENOMEM;
 	return 0;
@@ -2002,7 +2002,7 @@ xfs_alloc_buftarg(
 #if defined(CONFIG_FS_DAX) && defined(CONFIG_MEMORY_FAILURE)
 	ops = &xfs_dax_holder_operations;
 #endif
-	btp = kmem_zalloc(sizeof(*btp), KM_NOFS);
+	btp = kzalloc(sizeof(*btp), GFP_NOFS | __GFP_NOFAIL);
 
 	btp->bt_mount = mp;
 	btp->bt_bdev_handle = bdev_handle;
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 023d4e0385dd..ec93d34188c8 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -805,8 +805,8 @@ xfs_buf_item_get_format(
 		return;
 	}
 
-	bip->bli_formats = kmem_zalloc(count * sizeof(struct xfs_buf_log_format),
-				0);
+	bip->bli_formats = kzalloc(count * sizeof(struct xfs_buf_log_format),
+				GFP_KERNEL | __GFP_NOFAIL);
 }
 
 STATIC void
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index b2cbbba3e15a..456520d60cd0 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -240,8 +240,8 @@ xfs_errortag_init(
 {
 	int ret;
 
-	mp->m_errortag = kmem_zalloc(sizeof(unsigned int) * XFS_ERRTAG_MAX,
-			KM_MAYFAIL);
+	mp->m_errortag = kzalloc(sizeof(unsigned int) * XFS_ERRTAG_MAX,
+				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!mp->m_errortag)
 		return -ENOMEM;
 
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index 2ccde32c9a9e..b90c3dd43e03 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -32,7 +32,8 @@ xfs_extent_busy_insert_list(
 	struct rb_node		**rbp;
 	struct rb_node		*parent = NULL;
 
-	new = kmem_zalloc(sizeof(struct xfs_extent_busy), 0);
+	new = kzalloc(sizeof(struct xfs_extent_busy),
+			GFP_KERNEL | __GFP_NOFAIL);
 	new->agno = pag->pag_agno;
 	new->bno = bno;
 	new->length = len;
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 14462614fcc8..14211174267a 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -197,8 +197,8 @@ xfs_bulkstat_one(
 
 	ASSERT(breq->icount == 1);
 
-	bc.buf = kmem_zalloc(sizeof(struct xfs_bulkstat),
-			KM_MAYFAIL);
+	bc.buf = kzalloc(sizeof(struct xfs_bulkstat),
+			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!bc.buf)
 		return -ENOMEM;
 
@@ -289,8 +289,8 @@ xfs_bulkstat(
 	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
 		return 0;
 
-	bc.buf = kmem_zalloc(sizeof(struct xfs_bulkstat),
-			KM_MAYFAIL);
+	bc.buf = kzalloc(sizeof(struct xfs_bulkstat),
+			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!bc.buf)
 		return -ENOMEM;
 
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index b3275e8d47b6..8dbb7c054b28 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -663,7 +663,8 @@ xfs_iwalk_threaded(
 		if (xfs_pwork_ctl_want_abort(&pctl))
 			break;
 
-		iwag = kmem_zalloc(sizeof(struct xfs_iwalk_ag), 0);
+		iwag = kzalloc(sizeof(struct xfs_iwalk_ag),
+				GFP_KERNEL | __GFP_NOFAIL);
 		iwag->mp = mp;
 
 		/*
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index a1650fc81382..d38cfaadc726 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1528,7 +1528,7 @@ xlog_alloc_log(
 	int			error = -ENOMEM;
 	uint			log2_size = 0;
 
-	log = kmem_zalloc(sizeof(struct xlog), KM_MAYFAIL);
+	log = kzalloc(sizeof(struct xlog), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!log) {
 		xfs_warn(mp, "Log allocation failed: No memory!");
 		goto out;
@@ -1605,7 +1605,8 @@ xlog_alloc_log(
 		size_t bvec_size = howmany(log->l_iclog_size, PAGE_SIZE) *
 				sizeof(struct bio_vec);
 
-		iclog = kmem_zalloc(sizeof(*iclog) + bvec_size, KM_MAYFAIL);
+		iclog = kzalloc(sizeof(*iclog) + bvec_size,
+				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 		if (!iclog)
 			goto out_free_iclog;
 
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 67a99d94701e..3c705f22b0ab 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -100,7 +100,7 @@ xlog_cil_ctx_alloc(void)
 {
 	struct xfs_cil_ctx	*ctx;
 
-	ctx = kmem_zalloc(sizeof(*ctx), KM_NOFS);
+	ctx = kzalloc(sizeof(*ctx), GFP_NOFS | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&ctx->committing);
 	INIT_LIST_HEAD(&ctx->busy_extents.extent_list);
 	INIT_LIST_HEAD(&ctx->log_items);
@@ -1747,7 +1747,7 @@ xlog_cil_init(
 	struct xlog_cil_pcp	*cilpcp;
 	int			cpu;
 
-	cil = kmem_zalloc(sizeof(*cil), KM_MAYFAIL);
+	cil = kzalloc(sizeof(*cil), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!cil)
 		return -ENOMEM;
 	/*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1251c81e55f9..4a27ecdbb546 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2057,7 +2057,8 @@ xlog_recover_add_item(
 {
 	struct xlog_recover_item *item;
 
-	item = kmem_zalloc(sizeof(struct xlog_recover_item), 0);
+	item = kzalloc(sizeof(struct xlog_recover_item),
+			GFP_KERNEL | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&item->ri_list);
 	list_add_tail(&item->ri_list, head);
 }
@@ -2187,9 +2188,8 @@ xlog_recover_add_to_trans(
 		}
 
 		item->ri_total = in_f->ilf_size;
-		item->ri_buf =
-			kmem_zalloc(item->ri_total * sizeof(xfs_log_iovec_t),
-				    0);
+		item->ri_buf = kzalloc(item->ri_total * sizeof(xfs_log_iovec_t),
+				GFP_KERNEL | __GFP_NOFAIL);
 	}
 
 	if (item->ri_total <= item->ri_cnt) {
@@ -2332,7 +2332,7 @@ xlog_recover_ophdr_to_trans(
 	 * This is a new transaction so allocate a new recovery container to
 	 * hold the recovery ops that will follow.
 	 */
-	trans = kmem_zalloc(sizeof(struct xlog_recover), 0);
+	trans = kzalloc(sizeof(struct xlog_recover), GFP_KERNEL | __GFP_NOFAIL);
 	trans->r_log_tid = tid;
 	trans->r_lsn = be64_to_cpu(rhead->h_lsn);
 	INIT_LIST_HEAD(&trans->r_itemq);
diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index f85e3b07ab44..feae3115617b 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -333,13 +333,14 @@ xfs_mru_cache_create(
 	if (!(grp_time = msecs_to_jiffies(lifetime_ms) / grp_count))
 		return -EINVAL;
 
-	if (!(mru = kmem_zalloc(sizeof(*mru), 0)))
+	mru = kzalloc(sizeof(*mru), GFP_KERNEL | __GFP_NOFAIL);
+	if (!mru)
 		return -ENOMEM;
 
 	/* An extra list is needed to avoid reaping up to a grp_time early. */
 	mru->grp_count = grp_count + 1;
-	mru->lists = kmem_zalloc(mru->grp_count * sizeof(*mru->lists), 0);
-
+	mru->lists = kzalloc(mru->grp_count * sizeof(*mru->lists),
+				GFP_KERNEL | __GFP_NOFAIL);
 	if (!mru->lists) {
 		err = -ENOMEM;
 		goto exit;
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 94a7932ac570..b9d11376c88a 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -628,7 +628,8 @@ xfs_qm_init_quotainfo(
 
 	ASSERT(XFS_IS_QUOTA_ON(mp));
 
-	qinf = mp->m_quotainfo = kmem_zalloc(sizeof(struct xfs_quotainfo), 0);
+	qinf = mp->m_quotainfo = kzalloc(sizeof(struct xfs_quotainfo),
+					GFP_KERNEL | __GFP_NOFAIL);
 
 	error = list_lru_init(&qinf->qi_lru);
 	if (error)
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 20ad8086da60..78d0cda60abf 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -143,8 +143,8 @@ xfs_cui_init(
 
 	ASSERT(nextents > 0);
 	if (nextents > XFS_CUI_MAX_FAST_EXTENTS)
-		cuip = kmem_zalloc(xfs_cui_log_item_sizeof(nextents),
-				0);
+		cuip = kzalloc(xfs_cui_log_item_sizeof(nextents),
+				GFP_KERNEL | __GFP_NOFAIL);
 	else
 		cuip = kmem_cache_zalloc(xfs_cui_cache,
 					 GFP_KERNEL | __GFP_NOFAIL);
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 79ad0087aeca..31a921fc34b2 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -142,7 +142,8 @@ xfs_rui_init(
 
 	ASSERT(nextents > 0);
 	if (nextents > XFS_RUI_MAX_FAST_EXTENTS)
-		ruip = kmem_zalloc(xfs_rui_log_item_sizeof(nextents), 0);
+		ruip = kzalloc(xfs_rui_log_item_sizeof(nextents),
+				GFP_KERNEL | __GFP_NOFAIL);
 	else
 		ruip = kmem_cache_zalloc(xfs_rui_cache,
 					 GFP_KERNEL | __GFP_NOFAIL);
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 1098452e7f95..5f206cdb40ff 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -901,7 +901,8 @@ xfs_trans_ail_init(
 {
 	struct xfs_ail	*ailp;
 
-	ailp = kmem_zalloc(sizeof(struct xfs_ail), KM_MAYFAIL);
+	ailp = kzalloc(sizeof(struct xfs_ail),
+			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!ailp)
 		return -ENOMEM;
 
-- 
2.43.0




* [PATCH 03/12] xfs: move kmem_to_page()
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
  2024-01-15 22:59 ` [PATCH 01/12] xfs: convert kmem_zalloc() to kzalloc() Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 22:50   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 04/12] xfs: convert kmem_free() for kvmalloc users to kvfree() Dave Chinner
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

Move kmem_to_page() to the general xfs linux wrapper header file
(xfs_linux.h) so we can prepare to remove kmem.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/kmem.h      | 11 -----------
 fs/xfs/xfs_linux.h | 11 +++++++++++
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 1343f1a6f99b..48e43f29f2a0 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -20,15 +20,4 @@ static inline void  kmem_free(const void *ptr)
 	kvfree(ptr);
 }
 
-/*
- * Zone interfaces
- */
-static inline struct page *
-kmem_to_page(void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		return vmalloc_to_page(addr);
-	return virt_to_page(addr);
-}
-
 #endif /* __XFS_SUPPORT_KMEM_H__ */
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index d7873e0360f0..666618b463c9 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -269,4 +269,15 @@ int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count,
 # define PTR_FMT "%p"
 #endif
 
+/*
+ * Helper for IO routines to grab backing pages from allocated kernel memory.
+ */
+static inline struct page *
+kmem_to_page(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		return vmalloc_to_page(addr);
+	return virt_to_page(addr);
+}
+
 #endif /* __XFS_LINUX__ */
-- 
2.43.0




* [PATCH 04/12] xfs: convert kmem_free() for kvmalloc users to kvfree()
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
  2024-01-15 22:59 ` [PATCH 01/12] xfs: convert kmem_zalloc() to kzalloc() Dave Chinner
  2024-01-15 22:59 ` [PATCH 03/12] xfs: move kmem_to_page() Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 22:53   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 05/12] xfs: convert remaining kmem_free() to kfree() Dave Chinner
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

Start getting rid of kmem_free() by converting all the cases where
memory can come from vmalloc interfaces to calling kvfree()
directly.
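
As a reminder (not part of the diff): the log vector shadow buffers
freed below are allocated with xlog_kvmalloc(), which can hand back
either slab or vmalloc memory, and kvfree() copes with both. A sketch
of what kvfree() effectively does:

	if (is_vmalloc_addr(ptr))
		vfree(ptr);
	else
		kfree(ptr);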

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_acl.c           |  4 ++--
 fs/xfs/xfs_attr_item.c     |  4 ++--
 fs/xfs/xfs_bmap_item.c     |  4 ++--
 fs/xfs/xfs_buf_item.c      |  2 +-
 fs/xfs/xfs_dquot.c         |  2 +-
 fs/xfs/xfs_extfree_item.c  |  4 ++--
 fs/xfs/xfs_icreate_item.c  |  2 +-
 fs/xfs/xfs_inode_item.c    |  2 +-
 fs/xfs/xfs_ioctl.c         |  2 +-
 fs/xfs/xfs_log.c           |  4 ++--
 fs/xfs/xfs_log_cil.c       |  2 +-
 fs/xfs/xfs_log_recover.c   | 42 +++++++++++++++++++-------------------
 fs/xfs/xfs_refcount_item.c |  4 ++--
 fs/xfs/xfs_rmap_item.c     |  4 ++--
 fs/xfs/xfs_rtalloc.c       |  6 +++---
 15 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 6b840301817a..4bf69c9c088e 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -167,7 +167,7 @@ xfs_get_acl(struct inode *inode, int type, bool rcu)
 		acl = ERR_PTR(error);
 	}
 
-	kmem_free(args.value);
+	kvfree(args.value);
 	return acl;
 }
 
@@ -204,7 +204,7 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 	}
 
 	error = xfs_attr_change(&args);
-	kmem_free(args.value);
+	kvfree(args.value);
 
 	/*
 	 * If the attribute didn't exist to start with that's fine.
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 2e454a0d6f19..f7ba80d575d4 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -108,7 +108,7 @@ STATIC void
 xfs_attri_item_free(
 	struct xfs_attri_log_item	*attrip)
 {
-	kmem_free(attrip->attri_item.li_lv_shadow);
+	kvfree(attrip->attri_item.li_lv_shadow);
 	xfs_attri_log_nameval_put(attrip->attri_nameval);
 	kmem_cache_free(xfs_attri_cache, attrip);
 }
@@ -251,7 +251,7 @@ static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
 STATIC void
 xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
 {
-	kmem_free(attrdp->attrd_item.li_lv_shadow);
+	kvfree(attrdp->attrd_item.li_lv_shadow);
 	kmem_cache_free(xfs_attrd_cache, attrdp);
 }
 
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 52fb8a148b7d..029a6a8d0efd 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -40,7 +40,7 @@ STATIC void
 xfs_bui_item_free(
 	struct xfs_bui_log_item	*buip)
 {
-	kmem_free(buip->bui_item.li_lv_shadow);
+	kvfree(buip->bui_item.li_lv_shadow);
 	kmem_cache_free(xfs_bui_cache, buip);
 }
 
@@ -201,7 +201,7 @@ xfs_bud_item_release(
 	struct xfs_bud_log_item	*budp = BUD_ITEM(lip);
 
 	xfs_bui_release(budp->bud_buip);
-	kmem_free(budp->bud_item.li_lv_shadow);
+	kvfree(budp->bud_item.li_lv_shadow);
 	kmem_cache_free(xfs_bud_cache, budp);
 }
 
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index ec93d34188c8..545040c6ae87 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1044,7 +1044,7 @@ xfs_buf_item_free(
 	struct xfs_buf_log_item	*bip)
 {
 	xfs_buf_item_free_format(bip);
-	kmem_free(bip->bli_item.li_lv_shadow);
+	kvfree(bip->bli_item.li_lv_shadow);
 	kmem_cache_free(xfs_buf_item_cache, bip);
 }
 
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index a93ad76f23c5..17c82f5e783c 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -53,7 +53,7 @@ xfs_qm_dqdestroy(
 {
 	ASSERT(list_empty(&dqp->q_lru));
 
-	kmem_free(dqp->q_logitem.qli_item.li_lv_shadow);
+	kvfree(dqp->q_logitem.qli_item.li_lv_shadow);
 	mutex_destroy(&dqp->q_qlock);
 
 	XFS_STATS_DEC(dqp->q_mount, xs_qm_dquot);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 1d1185fca6a5..6062703a2723 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -40,7 +40,7 @@ STATIC void
 xfs_efi_item_free(
 	struct xfs_efi_log_item	*efip)
 {
-	kmem_free(efip->efi_item.li_lv_shadow);
+	kvfree(efip->efi_item.li_lv_shadow);
 	if (efip->efi_format.efi_nextents > XFS_EFI_MAX_FAST_EXTENTS)
 		kmem_free(efip);
 	else
@@ -229,7 +229,7 @@ static inline struct xfs_efd_log_item *EFD_ITEM(struct xfs_log_item *lip)
 STATIC void
 xfs_efd_item_free(struct xfs_efd_log_item *efdp)
 {
-	kmem_free(efdp->efd_item.li_lv_shadow);
+	kvfree(efdp->efd_item.li_lv_shadow);
 	if (efdp->efd_format.efd_nextents > XFS_EFD_MAX_FAST_EXTENTS)
 		kmem_free(efdp);
 	else
diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c
index b05314d48176..4345db501714 100644
--- a/fs/xfs/xfs_icreate_item.c
+++ b/fs/xfs/xfs_icreate_item.c
@@ -63,7 +63,7 @@ STATIC void
 xfs_icreate_item_release(
 	struct xfs_log_item	*lip)
 {
-	kmem_free(ICR_ITEM(lip)->ic_item.li_lv_shadow);
+	kvfree(ICR_ITEM(lip)->ic_item.li_lv_shadow);
 	kmem_cache_free(xfs_icreate_cache, ICR_ITEM(lip));
 }
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 0aee97ba0be8..bfbeafc8e120 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -856,7 +856,7 @@ xfs_inode_item_destroy(
 	ASSERT(iip->ili_item.li_buf == NULL);
 
 	ip->i_itemp = NULL;
-	kmem_free(iip->ili_item.li_lv_shadow);
+	kvfree(iip->ili_item.li_lv_shadow);
 	kmem_cache_free(xfs_ili_cache, iip);
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index f02b6e558af5..45fb169bd819 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -493,7 +493,7 @@ xfs_attrmulti_attr_get(
 		error = -EFAULT;
 
 out_kfree:
-	kmem_free(args.value);
+	kvfree(args.value);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d38cfaadc726..0009ffbec932 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1662,7 +1662,7 @@ xlog_alloc_log(
 out_free_iclog:
 	for (iclog = log->l_iclog; iclog; iclog = prev_iclog) {
 		prev_iclog = iclog->ic_next;
-		kmem_free(iclog->ic_data);
+		kvfree(iclog->ic_data);
 		kmem_free(iclog);
 		if (prev_iclog == log->l_iclog)
 			break;
@@ -2119,7 +2119,7 @@ xlog_dealloc_log(
 	iclog = log->l_iclog;
 	for (i = 0; i < log->l_iclog_bufs; i++) {
 		next_iclog = iclog->ic_next;
-		kmem_free(iclog->ic_data);
+		kvfree(iclog->ic_data);
 		kmem_free(iclog);
 		iclog = next_iclog;
 	}
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 3c705f22b0ab..2c0512916cc9 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -339,7 +339,7 @@ xlog_cil_alloc_shadow_bufs(
 			 * the buffer, only the log vector header and the iovec
 			 * storage.
 			 */
-			kmem_free(lip->li_lv_shadow);
+			kvfree(lip->li_lv_shadow);
 			lv = xlog_kvmalloc(buf_size);
 
 			memset(lv, 0, xlog_cil_iovec_space(niovecs));
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e3bd503edcab..295306ef6959 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -361,7 +361,7 @@ xlog_find_verify_cycle(
 	*new_blk = -1;
 
 out:
-	kmem_free(buffer);
+	kvfree(buffer);
 	return error;
 }
 
@@ -477,7 +477,7 @@ xlog_find_verify_log_record(
 		*last_blk = i;
 
 out:
-	kmem_free(buffer);
+	kvfree(buffer);
 	return error;
 }
 
@@ -731,7 +731,7 @@ xlog_find_head(
 			goto out_free_buffer;
 	}
 
-	kmem_free(buffer);
+	kvfree(buffer);
 	if (head_blk == log_bbnum)
 		*return_head_blk = 0;
 	else
@@ -745,7 +745,7 @@ xlog_find_head(
 	return 0;
 
 out_free_buffer:
-	kmem_free(buffer);
+	kvfree(buffer);
 	if (error)
 		xfs_warn(log->l_mp, "failed to find log head");
 	return error;
@@ -999,7 +999,7 @@ xlog_verify_tail(
 		"Tail block (0x%llx) overwrite detected. Updated to 0x%llx",
 			 orig_tail, *tail_blk);
 out:
-	kmem_free(buffer);
+	kvfree(buffer);
 	return error;
 }
 
@@ -1046,7 +1046,7 @@ xlog_verify_head(
 	error = xlog_rseek_logrec_hdr(log, *head_blk, *tail_blk,
 				      XLOG_MAX_ICLOGS, tmp_buffer,
 				      &tmp_rhead_blk, &tmp_rhead, &tmp_wrapped);
-	kmem_free(tmp_buffer);
+	kvfree(tmp_buffer);
 	if (error < 0)
 		return error;
 
@@ -1365,7 +1365,7 @@ xlog_find_tail(
 		error = xlog_clear_stale_blocks(log, tail_lsn);
 
 done:
-	kmem_free(buffer);
+	kvfree(buffer);
 
 	if (error)
 		xfs_warn(log->l_mp, "failed to locate log tail");
@@ -1399,6 +1399,7 @@ xlog_find_zeroed(
 	xfs_daddr_t	new_blk, last_blk, start_blk;
 	xfs_daddr_t     num_scan_bblks;
 	int	        error, log_bbnum = log->l_logBBsize;
+	int		ret = 1;
 
 	*blk_no = 0;
 
@@ -1413,8 +1414,7 @@ xlog_find_zeroed(
 	first_cycle = xlog_get_cycle(offset);
 	if (first_cycle == 0) {		/* completely zeroed log */
 		*blk_no = 0;
-		kmem_free(buffer);
-		return 1;
+		goto out_free_buffer;
 	}
 
 	/* check partially zeroed log */
@@ -1424,8 +1424,8 @@ xlog_find_zeroed(
 
 	last_cycle = xlog_get_cycle(offset);
 	if (last_cycle != 0) {		/* log completely written to */
-		kmem_free(buffer);
-		return 0;
+		ret = 0;
+		goto out_free_buffer;
 	}
 
 	/* we have a partially zeroed log */
@@ -1471,10 +1471,10 @@ xlog_find_zeroed(
 
 	*blk_no = last_blk;
 out_free_buffer:
-	kmem_free(buffer);
+	kvfree(buffer);
 	if (error)
 		return error;
-	return 1;
+	return ret;
 }
 
 /*
@@ -1583,7 +1583,7 @@ xlog_write_log_records(
 	}
 
 out_free_buffer:
-	kmem_free(buffer);
+	kvfree(buffer);
 	return error;
 }
 
@@ -2183,7 +2183,7 @@ xlog_recover_add_to_trans(
 		"bad number of regions (%d) in inode log format",
 				  in_f->ilf_size);
 			ASSERT(0);
-			kmem_free(ptr);
+			kvfree(ptr);
 			return -EFSCORRUPTED;
 		}
 
@@ -2197,7 +2197,7 @@ xlog_recover_add_to_trans(
 	"log item region count (%d) overflowed size (%d)",
 				item->ri_cnt, item->ri_total);
 		ASSERT(0);
-		kmem_free(ptr);
+		kvfree(ptr);
 		return -EFSCORRUPTED;
 	}
 
@@ -2227,7 +2227,7 @@ xlog_recover_free_trans(
 		/* Free the regions in the item. */
 		list_del(&item->ri_list);
 		for (i = 0; i < item->ri_cnt; i++)
-			kmem_free(item->ri_buf[i].i_addr);
+			kvfree(item->ri_buf[i].i_addr);
 		/* Free the item itself */
 		kmem_free(item->ri_buf);
 		kmem_free(item);
@@ -3024,7 +3024,7 @@ xlog_do_recovery_pass(
 
 		hblks = xlog_logrec_hblks(log, rhead);
 		if (hblks != 1) {
-			kmem_free(hbp);
+			kvfree(hbp);
 			hbp = xlog_alloc_buffer(log, hblks);
 		}
 	} else {
@@ -3038,7 +3038,7 @@ xlog_do_recovery_pass(
 		return -ENOMEM;
 	dbp = xlog_alloc_buffer(log, BTOBB(h_size));
 	if (!dbp) {
-		kmem_free(hbp);
+		kvfree(hbp);
 		return -ENOMEM;
 	}
 
@@ -3199,9 +3199,9 @@ xlog_do_recovery_pass(
 	}
 
  bread_err2:
-	kmem_free(dbp);
+	kvfree(dbp);
  bread_err1:
-	kmem_free(hbp);
+	kvfree(hbp);
 
 	/*
 	 * Submit buffers that have been added from the last record processed,
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 78d0cda60abf..a9b322e23cfb 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -36,7 +36,7 @@ STATIC void
 xfs_cui_item_free(
 	struct xfs_cui_log_item	*cuip)
 {
-	kmem_free(cuip->cui_item.li_lv_shadow);
+	kvfree(cuip->cui_item.li_lv_shadow);
 	if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS)
 		kmem_free(cuip);
 	else
@@ -207,7 +207,7 @@ xfs_cud_item_release(
 	struct xfs_cud_log_item	*cudp = CUD_ITEM(lip);
 
 	xfs_cui_release(cudp->cud_cuip);
-	kmem_free(cudp->cud_item.li_lv_shadow);
+	kvfree(cudp->cud_item.li_lv_shadow);
 	kmem_cache_free(xfs_cud_cache, cudp);
 }
 
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 31a921fc34b2..489ca8c0e1dc 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -36,7 +36,7 @@ STATIC void
 xfs_rui_item_free(
 	struct xfs_rui_log_item	*ruip)
 {
-	kmem_free(ruip->rui_item.li_lv_shadow);
+	kvfree(ruip->rui_item.li_lv_shadow);
 	if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS)
 		kmem_free(ruip);
 	else
@@ -206,7 +206,7 @@ xfs_rud_item_release(
 	struct xfs_rud_log_item	*rudp = RUD_ITEM(lip);
 
 	xfs_rui_release(rudp->rud_ruip);
-	kmem_free(rudp->rud_item.li_lv_shadow);
+	kvfree(rudp->rud_item.li_lv_shadow);
 	kmem_cache_free(xfs_rud_cache, rudp);
 }
 
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 8a8d6197203e..57ed9baaf156 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1059,10 +1059,10 @@ xfs_growfs_rt(
 	 */
 	if (rsum_cache != mp->m_rsum_cache) {
 		if (error) {
-			kmem_free(mp->m_rsum_cache);
+			kvfree(mp->m_rsum_cache);
 			mp->m_rsum_cache = rsum_cache;
 		} else {
-			kmem_free(rsum_cache);
+			kvfree(rsum_cache);
 		}
 	}
 
@@ -1233,7 +1233,7 @@ void
 xfs_rtunmount_inodes(
 	struct xfs_mount	*mp)
 {
-	kmem_free(mp->m_rsum_cache);
+	kvfree(mp->m_rsum_cache);
 	if (mp->m_rbmip)
 		xfs_irele(mp->m_rbmip);
 	if (mp->m_rsumip)
-- 
2.43.0




* [PATCH 05/12] xfs: convert remaining kmem_free() to kfree()
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (2 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 04/12] xfs: convert kmem_free() for kvmalloc users to kvfree() Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 22:54   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 06/12] xfs: use an empty transaction for fstrim Dave Chinner
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

The remaining callers of kmem_free() are freeing heap memory, so
we can convert them directly to kfree() and get rid of kmem_free()
altogether.

This conversion was done with:

$ for f in `git grep -l kmem_free fs/xfs`; do
> sed -i s/kmem_free/kfree/ $f
> done
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/kmem.h                     | 23 -----------------------
 fs/xfs/libxfs/xfs_ag.c            |  6 +++---
 fs/xfs/libxfs/xfs_attr_leaf.c     |  8 ++++----
 fs/xfs/libxfs/xfs_btree.c         |  2 +-
 fs/xfs/libxfs/xfs_btree_staging.c |  4 ++--
 fs/xfs/libxfs/xfs_da_btree.c      | 10 +++++-----
 fs/xfs/libxfs/xfs_defer.c         |  4 ++--
 fs/xfs/libxfs/xfs_dir2.c          | 18 +++++++++---------
 fs/xfs/libxfs/xfs_dir2_block.c    |  4 ++--
 fs/xfs/libxfs/xfs_dir2_sf.c       |  8 ++++----
 fs/xfs/libxfs/xfs_iext_tree.c     |  8 ++++----
 fs/xfs/libxfs/xfs_inode_fork.c    |  6 +++---
 fs/xfs/scrub/cow_repair.c         |  2 +-
 fs/xfs/xfs_attr_item.c            |  2 +-
 fs/xfs/xfs_attr_list.c            |  4 ++--
 fs/xfs/xfs_buf.c                  | 12 ++++++------
 fs/xfs/xfs_buf_item.c             |  2 +-
 fs/xfs/xfs_buf_item_recover.c     |  6 +++---
 fs/xfs/xfs_discard.c              |  2 +-
 fs/xfs/xfs_error.c                |  4 ++--
 fs/xfs/xfs_extent_busy.c          |  2 +-
 fs/xfs/xfs_extfree_item.c         |  4 ++--
 fs/xfs/xfs_filestream.c           |  4 ++--
 fs/xfs/xfs_inode.c                |  4 ++--
 fs/xfs/xfs_inode_item_recover.c   |  2 +-
 fs/xfs/xfs_ioctl.c                |  6 +++---
 fs/xfs/xfs_iops.c                 |  2 +-
 fs/xfs/xfs_itable.c               |  4 ++--
 fs/xfs/xfs_iwalk.c                |  4 ++--
 fs/xfs/xfs_linux.h                |  3 +--
 fs/xfs/xfs_log.c                  |  8 ++++----
 fs/xfs/xfs_log_cil.c              | 14 +++++++-------
 fs/xfs/xfs_log_recover.c          |  6 +++---
 fs/xfs/xfs_mount.c                |  2 +-
 fs/xfs/xfs_mru_cache.c            |  8 ++++----
 fs/xfs/xfs_qm.c                   |  6 +++---
 fs/xfs/xfs_refcount_item.c        |  2 +-
 fs/xfs/xfs_rmap_item.c            |  2 +-
 fs/xfs/xfs_rtalloc.c              |  2 +-
 fs/xfs/xfs_super.c                |  2 +-
 fs/xfs/xfs_trans_ail.c            |  4 ++--
 41 files changed, 101 insertions(+), 125 deletions(-)
 delete mode 100644 fs/xfs/kmem.h

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
deleted file mode 100644
index 48e43f29f2a0..000000000000
--- a/fs/xfs/kmem.h
+++ /dev/null
@@ -1,23 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright (c) 2000-2005 Silicon Graphics, Inc.
- * All Rights Reserved.
- */
-#ifndef __XFS_SUPPORT_KMEM_H__
-#define __XFS_SUPPORT_KMEM_H__
-
-#include <linux/slab.h>
-#include <linux/sched.h>
-#include <linux/mm.h>
-#include <linux/vmalloc.h>
-
-/*
- * General memory allocation interfaces
- */
-
-static inline void  kmem_free(const void *ptr)
-{
-	kvfree(ptr);
-}
-
-#endif /* __XFS_SUPPORT_KMEM_H__ */
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 96a6bfd58931..937ea48d5cc0 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -241,7 +241,7 @@ __xfs_free_perag(
 	struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
 
 	ASSERT(!delayed_work_pending(&pag->pag_blockgc_work));
-	kmem_free(pag);
+	kfree(pag);
 }
 
 /*
@@ -353,7 +353,7 @@ xfs_free_unused_perag_range(
 			break;
 		xfs_buf_hash_destroy(pag);
 		xfs_defer_drain_free(&pag->pag_intents_drain);
-		kmem_free(pag);
+		kfree(pag);
 	}
 }
 
@@ -453,7 +453,7 @@ xfs_initialize_perag(
 	radix_tree_delete(&mp->m_perag_tree, index);
 	spin_unlock(&mp->m_perag_lock);
 out_free_pag:
-	kmem_free(pag);
+	kfree(pag);
 out_unwind_new_pags:
 	/* unwind any prior newly initialized pags */
 	xfs_free_unused_perag_range(mp, first_initialised, agcount);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 033382cf514d..192d9938a231 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -923,7 +923,7 @@ xfs_attr_shortform_to_leaf(
 	}
 	error = 0;
 out:
-	kmem_free(tmpbuffer);
+	kfree(tmpbuffer);
 	return error;
 }
 
@@ -1124,7 +1124,7 @@ xfs_attr3_leaf_to_shortform(
 	error = 0;
 
 out:
-	kmem_free(tmpbuffer);
+	kfree(tmpbuffer);
 	return error;
 }
 
@@ -1570,7 +1570,7 @@ xfs_attr3_leaf_compact(
 	 */
 	xfs_trans_log_buf(trans, bp, 0, args->geo->blksize - 1);
 
-	kmem_free(tmpbuffer);
+	kfree(tmpbuffer);
 }
 
 /*
@@ -2290,7 +2290,7 @@ xfs_attr3_leaf_unbalance(
 		}
 		memcpy(save_leaf, tmp_leaf, state->args->geo->blksize);
 		savehdr = tmphdr; /* struct copy */
-		kmem_free(tmp_leaf);
+		kfree(tmp_leaf);
 	}
 
 	xfs_attr3_leaf_hdr_to_disk(state->args->geo, save_leaf, &savehdr);
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index ea8d3659df20..1adfc35c99c9 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -451,7 +451,7 @@ xfs_btree_del_cursor(
 	ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 ||
 	       xfs_is_shutdown(cur->bc_mp) || error != 0);
 	if (unlikely(cur->bc_flags & XFS_BTREE_STAGING))
-		kmem_free(cur->bc_ops);
+		kfree(cur->bc_ops);
 	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
 		xfs_perag_put(cur->bc_ag.pag);
 	kmem_cache_free(cur->bc_cache, cur);
diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
index 065e4a00a2f4..961f6b898f4b 100644
--- a/fs/xfs/libxfs/xfs_btree_staging.c
+++ b/fs/xfs/libxfs/xfs_btree_staging.c
@@ -171,7 +171,7 @@ xfs_btree_commit_afakeroot(
 
 	trace_xfs_btree_commit_afakeroot(cur);
 
-	kmem_free((void *)cur->bc_ops);
+	kfree((void *)cur->bc_ops);
 	cur->bc_ag.agbp = agbp;
 	cur->bc_ops = ops;
 	cur->bc_flags &= ~XFS_BTREE_STAGING;
@@ -254,7 +254,7 @@ xfs_btree_commit_ifakeroot(
 
 	trace_xfs_btree_commit_ifakeroot(cur);
 
-	kmem_free((void *)cur->bc_ops);
+	kfree((void *)cur->bc_ops);
 	cur->bc_ino.ifake = NULL;
 	cur->bc_ino.whichfork = whichfork;
 	cur->bc_ops = ops;
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index 331b9251b185..3383b4525381 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2220,7 +2220,7 @@ xfs_da_grow_inode_int(
 
 out_free_map:
 	if (mapp != &map)
-		kmem_free(mapp);
+		kfree(mapp);
 	return error;
 }
 
@@ -2559,7 +2559,7 @@ xfs_dabuf_map(
 	*nmaps = nirecs;
 out_free_irecs:
 	if (irecs != &irec)
-		kmem_free(irecs);
+		kfree(irecs);
 	return error;
 
 invalid_mapping:
@@ -2615,7 +2615,7 @@ xfs_da_get_buf(
 
 out_free:
 	if (mapp != &map)
-		kmem_free(mapp);
+		kfree(mapp);
 
 	return error;
 }
@@ -2656,7 +2656,7 @@ xfs_da_read_buf(
 	*bpp = bp;
 out_free:
 	if (mapp != &map)
-		kmem_free(mapp);
+		kfree(mapp);
 
 	return error;
 }
@@ -2687,7 +2687,7 @@ xfs_da_reada_buf(
 
 out_free:
 	if (mapp != &map)
-		kmem_free(mapp);
+		kfree(mapp);
 
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 07d318b1f807..75689c151a54 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -1038,7 +1038,7 @@ xfs_defer_ops_capture_abort(
 	for (i = 0; i < dfc->dfc_held.dr_inos; i++)
 		xfs_irele(dfc->dfc_held.dr_ip[i]);
 
-	kmem_free(dfc);
+	kfree(dfc);
 }
 
 /*
@@ -1114,7 +1114,7 @@ xfs_defer_ops_continue(
 	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
 	tp->t_flags |= dfc->dfc_tpflags;
 
-	kmem_free(dfc);
+	kfree(dfc);
 }
 
 /* Release the resources captured and continued during recovery. */
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 370d67300455..e60aa8f8d0a7 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -109,8 +109,8 @@ xfs_da_mount(
 	mp->m_attr_geo = kzalloc(sizeof(struct xfs_da_geometry),
 				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!mp->m_dir_geo || !mp->m_attr_geo) {
-		kmem_free(mp->m_dir_geo);
-		kmem_free(mp->m_attr_geo);
+		kfree(mp->m_dir_geo);
+		kfree(mp->m_attr_geo);
 		return -ENOMEM;
 	}
 
@@ -178,8 +178,8 @@ void
 xfs_da_unmount(
 	struct xfs_mount	*mp)
 {
-	kmem_free(mp->m_dir_geo);
-	kmem_free(mp->m_attr_geo);
+	kfree(mp->m_dir_geo);
+	kfree(mp->m_attr_geo);
 }
 
 /*
@@ -244,7 +244,7 @@ xfs_dir_init(
 	args->dp = dp;
 	args->trans = tp;
 	error = xfs_dir2_sf_create(args, pdp->i_ino);
-	kmem_free(args);
+	kfree(args);
 	return error;
 }
 
@@ -313,7 +313,7 @@ xfs_dir_createname(
 		rval = xfs_dir2_node_addname(args);
 
 out_free:
-	kmem_free(args);
+	kfree(args);
 	return rval;
 }
 
@@ -419,7 +419,7 @@ xfs_dir_lookup(
 	}
 out_free:
 	xfs_iunlock(dp, lock_mode);
-	kmem_free(args);
+	kfree(args);
 	return rval;
 }
 
@@ -477,7 +477,7 @@ xfs_dir_removename(
 	else
 		rval = xfs_dir2_node_removename(args);
 out_free:
-	kmem_free(args);
+	kfree(args);
 	return rval;
 }
 
@@ -538,7 +538,7 @@ xfs_dir_replace(
 	else
 		rval = xfs_dir2_node_replace(args);
 out_free:
-	kmem_free(args);
+	kfree(args);
 	return rval;
 }
 
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 506c65caaec5..fde46081a824 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -1253,7 +1253,7 @@ xfs_dir2_sf_to_block(
 			sfep = xfs_dir2_sf_nextentry(mp, sfp, sfep);
 	}
 	/* Done with the temporary buffer */
-	kmem_free(sfp);
+	kfree(sfp);
 	/*
 	 * Sort the leaf entries by hash value.
 	 */
@@ -1268,6 +1268,6 @@ xfs_dir2_sf_to_block(
 	xfs_dir3_data_check(dp, bp);
 	return 0;
 out_free:
-	kmem_free(sfp);
+	kfree(sfp);
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 7b1f41cff9e0..17a20384c8b7 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -350,7 +350,7 @@ xfs_dir2_block_to_sf(
 	xfs_dir2_sf_check(args);
 out:
 	xfs_trans_log_inode(args->trans, dp, logflags);
-	kmem_free(sfp);
+	kfree(sfp);
 	return error;
 }
 
@@ -576,7 +576,7 @@ xfs_dir2_sf_addname_hard(
 		sfep = xfs_dir2_sf_nextentry(mp, sfp, sfep);
 		memcpy(sfep, oldsfep, old_isize - nbytes);
 	}
-	kmem_free(buf);
+	kfree(buf);
 	dp->i_disk_size = new_isize;
 	xfs_dir2_sf_check(args);
 }
@@ -1190,7 +1190,7 @@ xfs_dir2_sf_toino4(
 	/*
 	 * Clean up the inode.
 	 */
-	kmem_free(buf);
+	kfree(buf);
 	dp->i_disk_size = newsize;
 	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_DDATA);
 }
@@ -1262,7 +1262,7 @@ xfs_dir2_sf_toino8(
 	/*
 	 * Clean up the inode.
 	 */
-	kmem_free(buf);
+	kfree(buf);
 	dp->i_disk_size = newsize;
 	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_DDATA);
 }
diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
index 4522f3c7a23f..16f18b08fe4c 100644
--- a/fs/xfs/libxfs/xfs_iext_tree.c
+++ b/fs/xfs/libxfs/xfs_iext_tree.c
@@ -747,7 +747,7 @@ xfs_iext_remove_node(
 again:
 	ASSERT(node->ptrs[pos]);
 	ASSERT(node->ptrs[pos] == victim);
-	kmem_free(victim);
+	kfree(victim);
 
 	nr_entries = xfs_iext_node_nr_entries(node, pos) - 1;
 	offset = node->keys[0];
@@ -793,7 +793,7 @@ xfs_iext_remove_node(
 		ASSERT(node == ifp->if_data);
 		ifp->if_data = node->ptrs[0];
 		ifp->if_height--;
-		kmem_free(node);
+		kfree(node);
 	}
 }
 
@@ -867,7 +867,7 @@ xfs_iext_free_last_leaf(
 	struct xfs_ifork	*ifp)
 {
 	ifp->if_height--;
-	kmem_free(ifp->if_data);
+	kfree(ifp->if_data);
 	ifp->if_data = NULL;
 }
 
@@ -1048,7 +1048,7 @@ xfs_iext_destroy_node(
 		}
 	}
 
-	kmem_free(node);
+	kfree(node);
 }
 
 void
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index f3cf7f933e15..f6d5b86b608d 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -471,7 +471,7 @@ xfs_iroot_realloc(
 						     (int)new_size);
 		memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
 	}
-	kmem_free(ifp->if_broot);
+	kfree(ifp->if_broot);
 	ifp->if_broot = new_broot;
 	ifp->if_broot_bytes = (int)new_size;
 	if (ifp->if_broot)
@@ -525,13 +525,13 @@ xfs_idestroy_fork(
 	struct xfs_ifork	*ifp)
 {
 	if (ifp->if_broot != NULL) {
-		kmem_free(ifp->if_broot);
+		kfree(ifp->if_broot);
 		ifp->if_broot = NULL;
 	}
 
 	switch (ifp->if_format) {
 	case XFS_DINODE_FMT_LOCAL:
-		kmem_free(ifp->if_data);
+		kfree(ifp->if_data);
 		ifp->if_data = NULL;
 		break;
 	case XFS_DINODE_FMT_EXTENTS:
diff --git a/fs/xfs/scrub/cow_repair.c b/fs/xfs/scrub/cow_repair.c
index 1e82c727af8e..4de3f0f40f48 100644
--- a/fs/xfs/scrub/cow_repair.c
+++ b/fs/xfs/scrub/cow_repair.c
@@ -609,6 +609,6 @@ xrep_bmap_cow(
 out_bitmap:
 	xfsb_bitmap_destroy(&xc->old_cowfork_fsblocks);
 	xoff_bitmap_destroy(&xc->bad_fileoffs);
-	kmem_free(xc);
+	kfree(xc);
 	return error;
 }
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index f7ba80d575d4..2a142cefdc3d 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -386,7 +386,7 @@ xfs_attr_free_item(
 		xfs_da_state_free(attr->xattri_da_state);
 	xfs_attri_log_nameval_put(attr->xattri_nameval);
 	if (attr->xattri_da_args->op_flags & XFS_DA_OP_RECOVERY)
-		kmem_free(attr);
+		kfree(attr);
 	else
 		kmem_cache_free(xfs_attr_intent_cache, attr);
 }
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 5f7a44d21cc9..0318d768520a 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -124,7 +124,7 @@ xfs_attr_shortform_list(
 					     XFS_ERRLEVEL_LOW,
 					     context->dp->i_mount, sfe,
 					     sizeof(*sfe));
-			kmem_free(sbuf);
+			kfree(sbuf);
 			return -EFSCORRUPTED;
 		}
 
@@ -188,7 +188,7 @@ xfs_attr_shortform_list(
 		cursor->offset++;
 	}
 out:
-	kmem_free(sbuf);
+	kfree(sbuf);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index c348af806616..a09ffbbb0dda 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -204,7 +204,7 @@ xfs_buf_free_maps(
 	struct xfs_buf	*bp)
 {
 	if (bp->b_maps != &bp->__b_map) {
-		kmem_free(bp->b_maps);
+		kfree(bp->b_maps);
 		bp->b_maps = NULL;
 	}
 }
@@ -289,7 +289,7 @@ xfs_buf_free_pages(
 	mm_account_reclaimed_pages(bp->b_page_count);
 
 	if (bp->b_pages != bp->b_page_array)
-		kmem_free(bp->b_pages);
+		kfree(bp->b_pages);
 	bp->b_pages = NULL;
 	bp->b_flags &= ~_XBF_PAGES;
 }
@@ -315,7 +315,7 @@ xfs_buf_free(
 	if (bp->b_flags & _XBF_PAGES)
 		xfs_buf_free_pages(bp);
 	else if (bp->b_flags & _XBF_KMEM)
-		kmem_free(bp->b_addr);
+		kfree(bp->b_addr);
 
 	call_rcu(&bp->b_rcu, xfs_buf_free_callback);
 }
@@ -339,7 +339,7 @@ xfs_buf_alloc_kmem(
 	if (((unsigned long)(bp->b_addr + size - 1) & PAGE_MASK) !=
 	    ((unsigned long)bp->b_addr & PAGE_MASK)) {
 		/* b_addr spans two pages - use alloc_page instead */
-		kmem_free(bp->b_addr);
+		kfree(bp->b_addr);
 		bp->b_addr = NULL;
 		return -ENOMEM;
 	}
@@ -1953,7 +1953,7 @@ xfs_free_buftarg(
 	if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev)
 		bdev_release(btp->bt_bdev_handle);
 
-	kmem_free(btp);
+	kfree(btp);
 }
 
 int
@@ -2045,7 +2045,7 @@ xfs_alloc_buftarg(
 error_lru:
 	list_lru_destroy(&btp->bt_lru);
 error_free:
-	kmem_free(btp);
+	kfree(btp);
 	return NULL;
 }
 
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 545040c6ae87..43031842341a 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -814,7 +814,7 @@ xfs_buf_item_free_format(
 	struct xfs_buf_log_item	*bip)
 {
 	if (bip->bli_formats != &bip->__bli_format) {
-		kmem_free(bip->bli_formats);
+		kfree(bip->bli_formats);
 		bip->bli_formats = NULL;
 	}
 }
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 34776f4c05ac..09e893cf563c 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -129,7 +129,7 @@ xlog_put_buffer_cancelled(
 
 	if (--bcp->bc_refcount == 0) {
 		list_del(&bcp->bc_list);
-		kmem_free(bcp);
+		kfree(bcp);
 	}
 	return true;
 }
@@ -1062,10 +1062,10 @@ xlog_free_buf_cancel_table(
 				&log->l_buf_cancel_table[i],
 				struct xfs_buf_cancel, bc_list))) {
 			list_del(&bc->bc_list);
-			kmem_free(bc);
+			kfree(bc);
 		}
 	}
 
-	kmem_free(log->l_buf_cancel_table);
+	kfree(log->l_buf_cancel_table);
 	log->l_buf_cancel_table = NULL;
 }
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index d5787991bb5b..8539f5c9a774 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -79,7 +79,7 @@ xfs_discard_endio_work(
 		container_of(work, struct xfs_busy_extents, endio_work);
 
 	xfs_extent_busy_clear(extents->mount, &extents->extent_list, false);
-	kmem_free(extents->owner);
+	kfree(extents->owner);
 }
 
 /*
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 456520d60cd0..7ad0e92c6b5b 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -248,7 +248,7 @@ xfs_errortag_init(
 	ret = xfs_sysfs_init(&mp->m_errortag_kobj, &xfs_errortag_ktype,
 				&mp->m_kobj, "errortag");
 	if (ret)
-		kmem_free(mp->m_errortag);
+		kfree(mp->m_errortag);
 	return ret;
 }
 
@@ -257,7 +257,7 @@ xfs_errortag_del(
 	struct xfs_mount	*mp)
 {
 	xfs_sysfs_del(&mp->m_errortag_kobj);
-	kmem_free(mp->m_errortag);
+	kfree(mp->m_errortag);
 }
 
 static bool
diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index b90c3dd43e03..56cfa1498571 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -531,7 +531,7 @@ xfs_extent_busy_clear_one(
 	}
 
 	list_del_init(&busyp->list);
-	kmem_free(busyp);
+	kfree(busyp);
 }
 
 static void
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 6062703a2723..8c382f092332 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -42,7 +42,7 @@ xfs_efi_item_free(
 {
 	kvfree(efip->efi_item.li_lv_shadow);
 	if (efip->efi_format.efi_nextents > XFS_EFI_MAX_FAST_EXTENTS)
-		kmem_free(efip);
+		kfree(efip);
 	else
 		kmem_cache_free(xfs_efi_cache, efip);
 }
@@ -231,7 +231,7 @@ xfs_efd_item_free(struct xfs_efd_log_item *efdp)
 {
 	kvfree(efdp->efd_item.li_lv_shadow);
 	if (efdp->efd_format.efd_nextents > XFS_EFD_MAX_FAST_EXTENTS)
-		kmem_free(efdp);
+		kfree(efdp);
 	else
 		kmem_cache_free(xfs_efd_cache, efdp);
 }
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index e2a3c8d3fe4f..e3aaa0555597 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -44,7 +44,7 @@ xfs_fstrm_free_func(
 	atomic_dec(&pag->pagf_fstrms);
 	xfs_perag_rele(pag);
 
-	kmem_free(item);
+	kfree(item);
 }
 
 /*
@@ -326,7 +326,7 @@ xfs_filestream_create_association(
 
 out_free_item:
 	xfs_perag_rele(item->pag);
-	kmem_free(item);
+	kfree(item);
 out_put_fstrms:
 	atomic_dec(&args->pag->pagf_fstrms);
 	return 0;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 1fd94958aa97..37ec247edc13 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -671,7 +671,7 @@ xfs_lookup(
 
 out_free_name:
 	if (ci_name)
-		kmem_free(ci_name->name);
+		kfree(ci_name->name);
 out_unlock:
 	*ipp = NULL;
 	return error;
@@ -2378,7 +2378,7 @@ xfs_ifree(
 	 * already been freed by xfs_attr_inactive.
 	 */
 	if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
-		kmem_free(ip->i_df.if_data);
+		kfree(ip->i_df.if_data);
 		ip->i_df.if_data = NULL;
 		ip->i_df.if_bytes = 0;
 	}
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 5d7b937179a0..dbdab4ce7c44 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -554,7 +554,7 @@ xlog_recover_inode_commit_pass2(
 	xfs_buf_relse(bp);
 error:
 	if (need_free)
-		kmem_free(in_f);
+		kfree(in_f);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 45fb169bd819..7eeebcb6b925 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -435,7 +435,7 @@ xfs_ioc_attr_list(
 	    copy_to_user(ucursor, &context.cursor, sizeof(context.cursor)))
 		error = -EFAULT;
 out_free:
-	kmem_free(buffer);
+	kfree(buffer);
 	return error;
 }
 
@@ -1506,7 +1506,7 @@ xfs_ioc_getbmap(
 
 	error = 0;
 out_free_buf:
-	kmem_free(buf);
+	kfree(buf);
 	return error;
 }
 
@@ -1636,7 +1636,7 @@ xfs_ioc_getfsmap(
 	}
 
 out_free:
-	kmem_free(recs);
+	kfree(recs);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a0d77f5f512e..be102fd49560 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -346,7 +346,7 @@ xfs_vn_ci_lookup(
 	dname.name = ci_name.name;
 	dname.len = ci_name.len;
 	dentry = d_add_ci(dentry, VFS_I(ip), &dname);
-	kmem_free(ci_name.name);
+	kfree(ci_name.name);
 	return dentry;
 }
 
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 14211174267a..95fc31b9f87d 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -214,7 +214,7 @@ xfs_bulkstat_one(
 			breq->startino, &bc);
 	xfs_trans_cancel(tp);
 out:
-	kmem_free(bc.buf);
+	kfree(bc.buf);
 
 	/*
 	 * If we reported one inode to userspace then we abort because we hit
@@ -309,7 +309,7 @@ xfs_bulkstat(
 			xfs_bulkstat_iwalk, breq->icount, &bc);
 	xfs_trans_cancel(tp);
 out:
-	kmem_free(bc.buf);
+	kfree(bc.buf);
 
 	/*
 	 * We found some inodes, so clear the error status and return them.
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index 5dd622aa54c5..6d2eb6364867 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -172,7 +172,7 @@ STATIC void
 xfs_iwalk_free(
 	struct xfs_iwalk_ag	*iwag)
 {
-	kmem_free(iwag->recs);
+	kfree(iwag->recs);
 	iwag->recs = NULL;
 }
 
@@ -627,7 +627,7 @@ xfs_iwalk_ag_work(
 	xfs_iwalk_free(iwag);
 out:
 	xfs_perag_put(iwag->pag);
-	kmem_free(iwag);
+	kfree(iwag);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 666618b463c9..caccb7f76690 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -20,8 +20,6 @@ typedef __u32			xfs_dev_t;
 typedef __u32			xfs_nlink_t;
 
 #include "xfs_types.h"
-
-#include "kmem.h"
 #include "mrlock.h"
 
 #include <linux/semaphore.h>
@@ -30,6 +28,7 @@ typedef __u32			xfs_nlink_t;
 #include <linux/kernel.h>
 #include <linux/blkdev.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 #include <linux/crc32c.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 0009ffbec932..ee39639bb92b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1663,12 +1663,12 @@ xlog_alloc_log(
 	for (iclog = log->l_iclog; iclog; iclog = prev_iclog) {
 		prev_iclog = iclog->ic_next;
 		kvfree(iclog->ic_data);
-		kmem_free(iclog);
+		kfree(iclog);
 		if (prev_iclog == log->l_iclog)
 			break;
 	}
 out_free_log:
-	kmem_free(log);
+	kfree(log);
 out:
 	return ERR_PTR(error);
 }	/* xlog_alloc_log */
@@ -2120,13 +2120,13 @@ xlog_dealloc_log(
 	for (i = 0; i < log->l_iclog_bufs; i++) {
 		next_iclog = iclog->ic_next;
 		kvfree(iclog->ic_data);
-		kmem_free(iclog);
+		kfree(iclog);
 		iclog = next_iclog;
 	}
 
 	log->l_mp->m_log = NULL;
 	destroy_workqueue(log->l_ioend_workqueue);
-	kmem_free(log);
+	kfree(log);
 }
 
 /*
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 2c0512916cc9..815a2181004c 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -703,7 +703,7 @@ xlog_cil_free_logvec(
 	while (!list_empty(lv_chain)) {
 		lv = list_first_entry(lv_chain, struct xfs_log_vec, lv_list);
 		list_del_init(&lv->lv_list);
-		kmem_free(lv);
+		kfree(lv);
 	}
 }
 
@@ -753,7 +753,7 @@ xlog_cil_committed(
 		return;
 	}
 
-	kmem_free(ctx);
+	kfree(ctx);
 }
 
 void
@@ -1339,7 +1339,7 @@ xlog_cil_push_work(
 out_skip:
 	up_write(&cil->xc_ctx_lock);
 	xfs_log_ticket_put(new_ctx->ticket);
-	kmem_free(new_ctx);
+	kfree(new_ctx);
 	return;
 
 out_abort_free_ticket:
@@ -1533,7 +1533,7 @@ xlog_cil_process_intents(
 		set_bit(XFS_LI_WHITEOUT, &ilip->li_flags);
 		trace_xfs_cil_whiteout_mark(ilip);
 		len += ilip->li_lv->lv_bytes;
-		kmem_free(ilip->li_lv);
+		kfree(ilip->li_lv);
 		ilip->li_lv = NULL;
 
 		xfs_trans_del_item(lip);
@@ -1786,7 +1786,7 @@ xlog_cil_init(
 out_destroy_wq:
 	destroy_workqueue(cil->xc_push_wq);
 out_destroy_cil:
-	kmem_free(cil);
+	kfree(cil);
 	return -ENOMEM;
 }
 
@@ -1799,12 +1799,12 @@ xlog_cil_destroy(
 	if (cil->xc_ctx) {
 		if (cil->xc_ctx->ticket)
 			xfs_log_ticket_put(cil->xc_ctx->ticket);
-		kmem_free(cil->xc_ctx);
+		kfree(cil->xc_ctx);
 	}
 
 	ASSERT(test_bit(XLOG_CIL_EMPTY, &cil->xc_flags));
 	free_percpu(cil->xc_pcp);
 	destroy_workqueue(cil->xc_push_wq);
-	kmem_free(cil);
+	kfree(cil);
 }
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 295306ef6959..e9ed43a833af 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2229,11 +2229,11 @@ xlog_recover_free_trans(
 		for (i = 0; i < item->ri_cnt; i++)
 			kvfree(item->ri_buf[i].i_addr);
 		/* Free the item itself */
-		kmem_free(item->ri_buf);
-		kmem_free(item);
+		kfree(item->ri_buf);
+		kfree(item);
 	}
 	/* Free the transaction recover structure */
-	kmem_free(trans);
+	kfree(trans);
 }
 
 /*
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index aabb25dc3efa..7328034d42ed 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -45,7 +45,7 @@ xfs_uuid_table_free(void)
 {
 	if (xfs_uuid_table_size == 0)
 		return;
-	kmem_free(xfs_uuid_table);
+	kfree(xfs_uuid_table);
 	xfs_uuid_table = NULL;
 	xfs_uuid_table_size = 0;
 }
diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index feae3115617b..ce496704748d 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -365,9 +365,9 @@ xfs_mru_cache_create(
 
 exit:
 	if (err && mru && mru->lists)
-		kmem_free(mru->lists);
+		kfree(mru->lists);
 	if (err && mru)
-		kmem_free(mru);
+		kfree(mru);
 
 	return err;
 }
@@ -407,8 +407,8 @@ xfs_mru_cache_destroy(
 
 	xfs_mru_cache_flush(mru);
 
-	kmem_free(mru->lists);
-	kmem_free(mru);
+	kfree(mru->lists);
+	kfree(mru);
 }
 
 /*
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index b130bf49013b..46a7fe70e57e 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -701,7 +701,7 @@ xfs_qm_init_quotainfo(
 out_free_lru:
 	list_lru_destroy(&qinf->qi_lru);
 out_free_qinf:
-	kmem_free(qinf);
+	kfree(qinf);
 	mp->m_quotainfo = NULL;
 	return error;
 }
@@ -725,7 +725,7 @@ xfs_qm_destroy_quotainfo(
 	xfs_qm_destroy_quotainos(qi);
 	mutex_destroy(&qi->qi_tree_lock);
 	mutex_destroy(&qi->qi_quotaofflock);
-	kmem_free(qi);
+	kfree(qi);
 	mp->m_quotainfo = NULL;
 }
 
@@ -1060,7 +1060,7 @@ xfs_qm_reset_dqcounts_buf(
 	} while (nmaps > 0);
 
 out:
-	kmem_free(map);
+	kfree(map);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index a9b322e23cfb..d850b9685f7f 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -38,7 +38,7 @@ xfs_cui_item_free(
 {
 	kvfree(cuip->cui_item.li_lv_shadow);
 	if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS)
-		kmem_free(cuip);
+		kfree(cuip);
 	else
 		kmem_cache_free(xfs_cui_cache, cuip);
 }
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 489ca8c0e1dc..a40b92ac81e8 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -38,7 +38,7 @@ xfs_rui_item_free(
 {
 	kvfree(ruip->rui_item.li_lv_shadow);
 	if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS)
-		kmem_free(ruip);
+		kfree(ruip);
 	else
 		kmem_cache_free(xfs_rui_cache, ruip);
 }
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 57ed9baaf156..2f85567f3d75 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1050,7 +1050,7 @@ xfs_growfs_rt(
 	/*
 	 * Free the fake mp structure.
 	 */
-	kmem_free(nmp);
+	kfree(nmp);
 
 	/*
 	 * If we had to allocate a new rsum_cache, we either need to free the
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7b1b29814be2..96cb00e94551 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -773,7 +773,7 @@ xfs_mount_free(
 	debugfs_remove(mp->m_debugfs);
 	kfree(mp->m_rtname);
 	kfree(mp->m_logname);
-	kmem_free(mp);
+	kfree(mp);
 }
 
 STATIC int
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 5f206cdb40ff..e4c343096f95 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -922,7 +922,7 @@ xfs_trans_ail_init(
 	return 0;
 
 out_free_ailp:
-	kmem_free(ailp);
+	kfree(ailp);
 	return -ENOMEM;
 }
 
@@ -933,5 +933,5 @@ xfs_trans_ail_destroy(
 	struct xfs_ail	*ailp = mp->m_ail;
 
 	kthread_stop(ailp->ail_task);
-	kmem_free(ailp);
+	kfree(ailp);
 }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/12] xfs: use an empty transaction for fstrim
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (3 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 05/12] xfs: convert remaining kmem_free() to kfree() Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 22:55   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS Dave Chinner
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

We currently use a btree walk in the fstrim code. This requires a
btree cursor and btree cursors are only used inside transactions
except for the fstrim code. This means that all the btree operations
that allocate memory operate in both GFP_KERNEL and GFP_NOFS
contexts.

This causes problems with lockdep being unable to tell the difference
between objects that are safe to lock both above and below memory
reclaim and those that are not. Free space btree buffers are
definitely locked
both above and below reclaim and that means we have to mark all
btree infrastructure allocations with GFP_NOFS to avoid potential
lockdep false positives.

If we wrap this btree walk in an empty transaction, all btree walks are
now done under transaction context and so all allocations inherit
GFP_NOFS context from the transaction. This enables us to move all
the btree allocations to GFP_KERNEL context and hence help remove
the explicit use of GFP_NOFS in XFS.
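
As background on why an empty transaction is sufficient, here is a
condensed sketch of how entering transaction context establishes the
scoped NOFS state. It paraphrases xfs_trans_alloc_empty() and
xfs_trans_context_set() from fs/xfs/xfs_trans.[ch]; assertions and the
log reservation paths are omitted, so treat it as an illustration
rather than the exact implementation:

	/* empty transaction: no log reservation, just task context */
	int
	xfs_trans_alloc_empty(
		struct xfs_mount	*mp,
		struct xfs_trans	**tpp)
	{
		struct xfs_trans_res	resv = {0};

		return xfs_trans_alloc(mp, &resv, 0, 0,
				XFS_TRANS_NO_WRITECOUNT, tpp);
	}

	/* called on transaction allocation: enter scoped NOFS context */
	static inline void
	xfs_trans_context_set(
		struct xfs_trans	*tp)
	{
		current->journal_info = tp;
		tp->t_pflags = memalloc_nofs_save();
	}

The matching memalloc_nofs_restore() runs when the transaction commits
or cancels, so everything between xfs_trans_alloc_empty() and
xfs_trans_cancel() - including the btree cursor and buffer allocations
in the walk below - has __GFP_FS implicitly stripped from its gfp mask.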

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_discard.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index 8539f5c9a774..299b8f907292 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -8,6 +8,7 @@
 #include "xfs_format.h"
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
+#include "xfs_trans.h"
 #include "xfs_mount.h"
 #include "xfs_btree.h"
 #include "xfs_alloc_btree.h"
@@ -120,7 +121,7 @@ xfs_discard_extents(
 		error = __blkdev_issue_discard(mp->m_ddev_targp->bt_bdev,
 				XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno),
 				XFS_FSB_TO_BB(mp, busyp->length),
-				GFP_NOFS, &bio);
+				GFP_KERNEL, &bio);
 		if (error && error != -EOPNOTSUPP) {
 			xfs_info(mp,
 	 "discard failed for extent [0x%llx,%u], error %d",
@@ -155,6 +156,7 @@ xfs_trim_gather_extents(
 	uint64_t		*blocks_trimmed)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_trans	*tp;
 	struct xfs_btree_cur	*cur;
 	struct xfs_buf		*agbp;
 	int			error;
@@ -168,11 +170,15 @@ xfs_trim_gather_extents(
 	 */
 	xfs_log_force(mp, XFS_LOG_SYNC);
 
-	error = xfs_alloc_read_agf(pag, NULL, 0, &agbp);
+	error = xfs_trans_alloc_empty(mp, &tp);
 	if (error)
 		return error;
 
-	cur = xfs_allocbt_init_cursor(mp, NULL, agbp, pag, XFS_BTNUM_CNT);
+	error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
+	if (error)
+		goto out_trans_cancel;
+
+	cur = xfs_allocbt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_CNT);
 
 	/*
 	 * Look up the extent length requested in the AGF and start with it.
@@ -279,7 +285,8 @@ xfs_trim_gather_extents(
 		xfs_extent_busy_clear(mp, &extents->extent_list, false);
 out_del_cursor:
 	xfs_btree_del_cursor(cur, error);
-	xfs_buf_relse(agbp);
+out_trans_cancel:
+	xfs_trans_cancel(tp);
 	return error;
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (4 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 06/12] xfs: use an empty transaction for fstrim Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 23:32   ` Darrick J. Wong
  2024-06-22  9:44   ` Long Li
  2024-01-15 22:59 ` [PATCH 08/12] xfs: use GFP_KERNEL in pure transaction contexts Dave Chinner
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

In the past we've had problems with lockdep false positives stemming
from inode locking occurring in memory reclaim contexts (e.g. from
superblock shrinkers). Lockdep doesn't know that inodes accessed from
above memory reclaim cannot be accessed from below memory reclaim
(and vice versa), but there has never been a good way to solve this
problem with lockdep annotations.

This situation isn't unique to inode locks - buffers are also locked
above and below memory reclaim, and we have to maintain lock
ordering for them - and against inodes - appropriately. IOWs, the
same code paths and locks are taken both above and below memory
reclaim and so we always need to make sure the lock orders are
consistent. We are spared the lockdep problems this might cause
by the fact that semaphores and bit locks aren't covered by lockdep.

In general, this sort of lockdep false positive is caused by code
that runs GFP_KERNEL memory allocation with an actively
referenced inode locked. When it is run from a transaction, memory
allocation is automatically GFP_NOFS, so we don't have reclaim
recursion issues. So in the places where we do memory allocation
with inodes locked outside of a transaction, we have explicitly set
them to use GFP_NOFS allocations to prevent lockdep false positives
from being reported if the allocation dips into direct memory
reclaim.

More recently, __GFP_NOLOCKDEP was added to the memory allocation
flags to tell lockdep not to track that particular allocation for
the purposes of reclaim recursion detection. This is a much better
way of preventing false positives - it allows us to use GFP_KERNEL
context outside of transactions, and allows direct memory reclaim to
proceed normally without throwing out false positive deadlock
warnings.

The obvious places that lock inodes and do memory allocation are the
lookup paths and inode extent list initialisation. These occur in
non-transactional GFP_KERNEL contexts, and so can run direct reclaim
and lock inodes.

This patch makes a first pass through all the explicit GFP_NOFS
allocations in XFS and converts the obvious ones to GFP_KERNEL |
__GFP_NOLOCKDEP as a first step towards removing explicit GFP_NOFS
allocations from the XFS code.
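
To make the conversion pattern concrete, here is a hypothetical
allocation site of the kind this patch touches - the function name and
buffer size are invented for illustration and do not appear in the
patch. An inode is locked on a non-transactional lookup-style path and
memory is allocated while the lock is held:

	static int
	xfs_example_lookup_alloc(
		struct xfs_inode	*ip)
	{
		void			*buf;

		xfs_ilock(ip, XFS_ILOCK_EXCL);

		/*
		 * Old: GFP_NOFS purely to stop lockdep reporting a false
		 * reclaim-recursion deadlock against the ilock.
		 * New: allow full GFP_KERNEL reclaim, but opt this site
		 * out of lockdep's reclaim recursion tracking.
		 */
		buf = kzalloc(256,
				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);

		/* ... do the lookup work with buf ... */

		kfree(buf);
		xfs_iunlock(ip, XFS_ILOCK_EXCL);
		return 0;
	}

Unlike GFP_NOFS, __GFP_NOLOCKDEP only changes what lockdep tracks; it
places no restriction on what direct reclaim may actually do.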

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c         |  2 +-
 fs/xfs/libxfs/xfs_btree.h      |  4 +++-
 fs/xfs/libxfs/xfs_da_btree.c   |  8 +++++---
 fs/xfs/libxfs/xfs_dir2.c       | 14 ++++----------
 fs/xfs/libxfs/xfs_iext_tree.c  | 22 +++++++++++++---------
 fs/xfs/libxfs/xfs_inode_fork.c |  8 +++++---
 fs/xfs/xfs_icache.c            |  5 ++---
 fs/xfs/xfs_qm.c                |  6 +++---
 8 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 937ea48d5cc0..036f4ee43fd3 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -389,7 +389,7 @@ xfs_initialize_perag(
 		pag->pag_agno = index;
 		pag->pag_mount = mp;
 
-		error = radix_tree_preload(GFP_NOFS);
+		error = radix_tree_preload(GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 		if (error)
 			goto out_free_pag;
 
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index d906324e25c8..75a0e2c8e115 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -725,7 +725,9 @@ xfs_btree_alloc_cursor(
 {
 	struct xfs_btree_cur	*cur;
 
-	cur = kmem_cache_zalloc(cache, GFP_NOFS | __GFP_NOFAIL);
+	/* BMBT allocations can come through from non-transactional context. */
+	cur = kmem_cache_zalloc(cache,
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 	cur->bc_tp = tp;
 	cur->bc_mp = mp;
 	cur->bc_btnum = btnum;
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index 3383b4525381..444ec1560f43 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -85,7 +85,8 @@ xfs_da_state_alloc(
 {
 	struct xfs_da_state	*state;
 
-	state = kmem_cache_zalloc(xfs_da_state_cache, GFP_NOFS | __GFP_NOFAIL);
+	state = kmem_cache_zalloc(xfs_da_state_cache,
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 	state->args = args;
 	state->mp = args->dp->i_mount;
 	return state;
@@ -2519,7 +2520,8 @@ xfs_dabuf_map(
 	int			error = 0, nirecs, i;
 
 	if (nfsb > 1)
-		irecs = kzalloc(sizeof(irec) * nfsb, GFP_NOFS | __GFP_NOFAIL);
+		irecs = kzalloc(sizeof(irec) * nfsb,
+				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 
 	nirecs = nfsb;
 	error = xfs_bmapi_read(dp, bno, nfsb, irecs, &nirecs,
@@ -2533,7 +2535,7 @@ xfs_dabuf_map(
 	 */
 	if (nirecs > 1) {
 		map = kzalloc(nirecs * sizeof(struct xfs_buf_map),
-				GFP_NOFS | __GFP_NOFAIL);
+				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 		if (!map) {
 			error = -ENOMEM;
 			goto out_free_irecs;
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index e60aa8f8d0a7..728f72f0d078 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -333,7 +333,8 @@ xfs_dir_cilookup_result(
 					!(args->op_flags & XFS_DA_OP_CILOOKUP))
 		return -EEXIST;
 
-	args->value = kmalloc(len, GFP_NOFS | __GFP_RETRY_MAYFAIL);
+	args->value = kmalloc(len,
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_RETRY_MAYFAIL);
 	if (!args->value)
 		return -ENOMEM;
 
@@ -364,15 +365,8 @@ xfs_dir_lookup(
 	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
 	XFS_STATS_INC(dp->i_mount, xs_dir_lookup);
 
-	/*
-	 * We need to use KM_NOFS here so that lockdep will not throw false
-	 * positive deadlock warnings on a non-transactional lookup path. It is
-	 * safe to recurse into inode recalim in that case, but lockdep can't
-	 * easily be taught about it. Hence KM_NOFS avoids having to add more
-	 * lockdep Doing this avoids having to add a bunch of lockdep class
-	 * annotations into the reclaim path for the ilock.
-	 */
-	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
+	args = kzalloc(sizeof(*args),
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 	args->geo = dp->i_mount->m_dir_geo;
 	args->name = name->name;
 	args->namelen = name->len;
diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
index 16f18b08fe4c..8796f2b3e534 100644
--- a/fs/xfs/libxfs/xfs_iext_tree.c
+++ b/fs/xfs/libxfs/xfs_iext_tree.c
@@ -394,12 +394,18 @@ xfs_iext_leaf_key(
 	return leaf->recs[n].lo & XFS_IEXT_STARTOFF_MASK;
 }
 
+static inline void *
+xfs_iext_alloc_node(
+	int	size)
+{
+	return kzalloc(size, GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
+}
+
 static void
 xfs_iext_grow(
 	struct xfs_ifork	*ifp)
 {
-	struct xfs_iext_node	*node = kzalloc(NODE_SIZE,
-						GFP_NOFS | __GFP_NOFAIL);
+	struct xfs_iext_node	*node = xfs_iext_alloc_node(NODE_SIZE);
 	int			i;
 
 	if (ifp->if_height == 1) {
@@ -455,8 +461,7 @@ xfs_iext_split_node(
 	int			*nr_entries)
 {
 	struct xfs_iext_node	*node = *nodep;
-	struct xfs_iext_node	*new = kzalloc(NODE_SIZE,
-						GFP_NOFS | __GFP_NOFAIL);
+	struct xfs_iext_node	*new = xfs_iext_alloc_node(NODE_SIZE);
 	const int		nr_move = KEYS_PER_NODE / 2;
 	int			nr_keep = nr_move + (KEYS_PER_NODE & 1);
 	int			i = 0;
@@ -544,8 +549,7 @@ xfs_iext_split_leaf(
 	int			*nr_entries)
 {
 	struct xfs_iext_leaf	*leaf = cur->leaf;
-	struct xfs_iext_leaf	*new = kzalloc(NODE_SIZE,
-						GFP_NOFS | __GFP_NOFAIL);
+	struct xfs_iext_leaf	*new = xfs_iext_alloc_node(NODE_SIZE);
 	const int		nr_move = RECS_PER_LEAF / 2;
 	int			nr_keep = nr_move + (RECS_PER_LEAF & 1);
 	int			i;
@@ -586,8 +590,7 @@ xfs_iext_alloc_root(
 {
 	ASSERT(ifp->if_bytes == 0);
 
-	ifp->if_data = kzalloc(sizeof(struct xfs_iext_rec),
-					GFP_NOFS | __GFP_NOFAIL);
+	ifp->if_data = xfs_iext_alloc_node(sizeof(struct xfs_iext_rec));
 	ifp->if_height = 1;
 
 	/* now that we have a node step into it */
@@ -607,7 +610,8 @@ xfs_iext_realloc_root(
 	if (new_size / sizeof(struct xfs_iext_rec) == RECS_PER_LEAF)
 		new_size = NODE_SIZE;
 
-	new = krealloc(ifp->if_data, new_size, GFP_NOFS | __GFP_NOFAIL);
+	new = krealloc(ifp->if_data, new_size,
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 	memset(new + ifp->if_bytes, 0, new_size - ifp->if_bytes);
 	ifp->if_data = new;
 	cur->leaf = new;
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index f6d5b86b608d..709fda3d742f 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -50,7 +50,8 @@ xfs_init_local_fork(
 		mem_size++;
 
 	if (size) {
-		char *new_data = kmalloc(mem_size, GFP_NOFS | __GFP_NOFAIL);
+		char *new_data = kmalloc(mem_size,
+				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 
 		memcpy(new_data, data, size);
 		if (zero_terminate)
@@ -205,7 +206,8 @@ xfs_iformat_btree(
 	}
 
 	ifp->if_broot_bytes = size;
-	ifp->if_broot = kmalloc(size, GFP_NOFS | __GFP_NOFAIL);
+	ifp->if_broot = kmalloc(size,
+				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 	ASSERT(ifp->if_broot != NULL);
 	/*
 	 * Copy and convert from the on-disk structure
@@ -690,7 +692,7 @@ xfs_ifork_init_cow(
 		return;
 
 	ip->i_cowfp = kmem_cache_zalloc(xfs_ifork_cache,
-				       GFP_NOFS | __GFP_NOFAIL);
+				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 	ip->i_cowfp->if_format = XFS_DINODE_FMT_EXTENTS;
 }
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index dba514a2c84d..06046827b5fe 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -659,10 +659,9 @@ xfs_iget_cache_miss(
 	/*
 	 * Preload the radix tree so we can insert safely under the
 	 * write spinlock. Note that we cannot sleep inside the preload
-	 * region. Since we can be called from transaction context, don't
-	 * recurse into the file system.
+	 * region.
 	 */
-	if (radix_tree_preload(GFP_NOFS)) {
+	if (radix_tree_preload(GFP_KERNEL | __GFP_NOLOCKDEP)) {
 		error = -EAGAIN;
 		goto out_destroy;
 	}
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 46a7fe70e57e..384a5349e696 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -643,9 +643,9 @@ xfs_qm_init_quotainfo(
 	if (error)
 		goto out_free_lru;
 
-	INIT_RADIX_TREE(&qinf->qi_uquota_tree, GFP_NOFS);
-	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_NOFS);
-	INIT_RADIX_TREE(&qinf->qi_pquota_tree, GFP_NOFS);
+	INIT_RADIX_TREE(&qinf->qi_uquota_tree, GFP_KERNEL);
+	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_KERNEL);
+	INIT_RADIX_TREE(&qinf->qi_pquota_tree, GFP_KERNEL);
 	mutex_init(&qinf->qi_tree_lock);
 
 	/* mutex used to serialize quotaoffs */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/12] xfs: use GFP_KERNEL in pure transaction contexts
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (5 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 23:38   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 09/12] xfs: place intent recovery under NOFS allocation context Dave Chinner
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

When running in a transaction context, memory allocations are scoped
to GFP_NOFS. Hence we don't need to use GFP_NOFS contexts in pure
transaction context allocations - GFP_KERNEL will automatically get
converted to GFP_NOFS as appropriate.

Go through the code and convert all the obvious GFP_NOFS allocations
in transaction context to use GFP_KERNEL. This further reduces the
explicit use of GFP_NOFS in XFS.
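
The "automatic" conversion is done by the scoped allocation API, not by
anything at the XFS call sites: transaction allocation enters
PF_MEMALLOC_NOFS scope via memalloc_nofs_save(), and the page allocator
masks the caller's gfp flags against the task scope. A condensed sketch
of current_gfp_context() from include/linux/sched/mm.h (simplified -
the real helper also handles PF_MEMALLOC_PIN and wraps the checks in an
unlikely() fast path):

	static inline gfp_t current_gfp_context(gfp_t flags)
	{
		unsigned int pflags = READ_ONCE(current->flags);

		if (pflags & PF_MEMALLOC_NOIO)
			flags &= ~(__GFP_IO | __GFP_FS);
		else if (pflags & PF_MEMALLOC_NOFS)
			flags &= ~__GFP_FS;	/* GFP_KERNEL -> GFP_NOFS */

		return flags;
	}

Hence a kmem_cache_zalloc(..., GFP_KERNEL | __GFP_NOFAIL) issued inside
transaction context behaves exactly like the GFP_NOFS version it
replaces, while the same call run outside a transaction can use full
direct reclaim.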

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_attr.c       |  3 ++-
 fs/xfs/libxfs/xfs_bmap.c       |  2 +-
 fs/xfs/libxfs/xfs_defer.c      |  6 +++---
 fs/xfs/libxfs/xfs_dir2.c       |  8 ++++----
 fs/xfs/libxfs/xfs_inode_fork.c |  8 ++++----
 fs/xfs/libxfs/xfs_refcount.c   |  2 +-
 fs/xfs/libxfs/xfs_rmap.c       |  2 +-
 fs/xfs/xfs_attr_item.c         |  4 ++--
 fs/xfs/xfs_bmap_util.c         |  2 +-
 fs/xfs/xfs_buf.c               | 28 +++++++++++++++++-----------
 fs/xfs/xfs_log.c               |  3 ++-
 fs/xfs/xfs_mru_cache.c         |  2 +-
 12 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 9976a00a73f9..269a57420859 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -891,7 +891,8 @@ xfs_attr_defer_add(
 
 	struct xfs_attr_intent	*new;
 
-	new = kmem_cache_zalloc(xfs_attr_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	new = kmem_cache_zalloc(xfs_attr_intent_cache,
+			GFP_KERNEL | __GFP_NOFAIL);
 	new->xattri_op_flags = op_flags;
 	new->xattri_da_args = args;
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 98aaca933bdd..fbdaa53deecd 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6098,7 +6098,7 @@ __xfs_bmap_add(
 			bmap->br_blockcount,
 			bmap->br_state);
 
-	bi = kmem_cache_alloc(xfs_bmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	bi = kmem_cache_alloc(xfs_bmap_intent_cache, GFP_KERNEL | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&bi->bi_list);
 	bi->bi_type = type;
 	bi->bi_owner = ip;
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 75689c151a54..8ae4401f6810 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -825,7 +825,7 @@ xfs_defer_alloc(
 	struct xfs_defer_pending	*dfp;
 
 	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
-			GFP_NOFS | __GFP_NOFAIL);
+			GFP_KERNEL | __GFP_NOFAIL);
 	dfp->dfp_ops = ops;
 	INIT_LIST_HEAD(&dfp->dfp_work);
 	list_add_tail(&dfp->dfp_list, &tp->t_dfops);
@@ -888,7 +888,7 @@ xfs_defer_start_recovery(
 	struct xfs_defer_pending	*dfp;
 
 	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
-			GFP_NOFS | __GFP_NOFAIL);
+			GFP_KERNEL | __GFP_NOFAIL);
 	dfp->dfp_ops = ops;
 	dfp->dfp_intent = lip;
 	INIT_LIST_HEAD(&dfp->dfp_work);
@@ -979,7 +979,7 @@ xfs_defer_ops_capture(
 		return ERR_PTR(error);
 
 	/* Create an object to capture the defer ops. */
-	dfc = kzalloc(sizeof(*dfc), GFP_NOFS | __GFP_NOFAIL);
+	dfc = kzalloc(sizeof(*dfc), GFP_KERNEL | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&dfc->dfc_list);
 	INIT_LIST_HEAD(&dfc->dfc_dfops);
 
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 728f72f0d078..8c9403b33191 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -236,7 +236,7 @@ xfs_dir_init(
 	if (error)
 		return error;
 
-	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
+	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
@@ -273,7 +273,7 @@ xfs_dir_createname(
 		XFS_STATS_INC(dp->i_mount, xs_dir_create);
 	}
 
-	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
+	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
@@ -435,7 +435,7 @@ xfs_dir_removename(
 	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
 	XFS_STATS_INC(dp->i_mount, xs_dir_remove);
 
-	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
+	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
@@ -496,7 +496,7 @@ xfs_dir_replace(
 	if (rval)
 		return rval;
 
-	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
+	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
 	if (!args)
 		return -ENOMEM;
 
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 709fda3d742f..136d5d7b9de9 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -402,7 +402,7 @@ xfs_iroot_realloc(
 		if (ifp->if_broot_bytes == 0) {
 			new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
 			ifp->if_broot = kmalloc(new_size,
-						GFP_NOFS | __GFP_NOFAIL);
+						GFP_KERNEL | __GFP_NOFAIL);
 			ifp->if_broot_bytes = (int)new_size;
 			return;
 		}
@@ -417,7 +417,7 @@ xfs_iroot_realloc(
 		new_max = cur_max + rec_diff;
 		new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
 		ifp->if_broot = krealloc(ifp->if_broot, new_size,
-					 GFP_NOFS | __GFP_NOFAIL);
+					 GFP_KERNEL | __GFP_NOFAIL);
 		op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
 						     ifp->if_broot_bytes);
 		np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
@@ -443,7 +443,7 @@ xfs_iroot_realloc(
 	else
 		new_size = 0;
 	if (new_size > 0) {
-		new_broot = kmalloc(new_size, GFP_NOFS | __GFP_NOFAIL);
+		new_broot = kmalloc(new_size, GFP_KERNEL | __GFP_NOFAIL);
 		/*
 		 * First copy over the btree block header.
 		 */
@@ -512,7 +512,7 @@ xfs_idata_realloc(
 
 	if (byte_diff) {
 		ifp->if_data = krealloc(ifp->if_data, new_size,
-					GFP_NOFS | __GFP_NOFAIL);
+					GFP_KERNEL | __GFP_NOFAIL);
 		if (new_size == 0)
 			ifp->if_data = NULL;
 		ifp->if_bytes = new_size;
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 6709a7f8bad5..7df52daa22cf 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1449,7 +1449,7 @@ __xfs_refcount_add(
 			blockcount);
 
 	ri = kmem_cache_alloc(xfs_refcount_intent_cache,
-			GFP_NOFS | __GFP_NOFAIL);
+			GFP_KERNEL | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&ri->ri_list);
 	ri->ri_type = type;
 	ri->ri_startblock = startblock;
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 76bf7f48cb5a..0bd1f47b2c2b 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -2559,7 +2559,7 @@ __xfs_rmap_add(
 			bmap->br_blockcount,
 			bmap->br_state);
 
-	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_KERNEL | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&ri->ri_list);
 	ri->ri_type = type;
 	ri->ri_owner = owner;
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 2a142cefdc3d..0bf25a2ba3b6 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -226,7 +226,7 @@ xfs_attri_init(
 {
 	struct xfs_attri_log_item	*attrip;
 
-	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_NOFS | __GFP_NOFAIL);
+	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_KERNEL | __GFP_NOFAIL);
 
 	/*
 	 * Grab an extra reference to the name/value buffer for this log item.
@@ -666,7 +666,7 @@ xfs_attr_create_done(
 
 	attrip = ATTRI_ITEM(intent);
 
-	attrdp = kmem_cache_zalloc(xfs_attrd_cache, GFP_NOFS | __GFP_NOFAIL);
+	attrdp = kmem_cache_zalloc(xfs_attrd_cache, GFP_KERNEL | __GFP_NOFAIL);
 
 	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item, XFS_LI_ATTRD,
 			  &xfs_attrd_item_ops);
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index c2531c28905c..cb2a4b940292 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -66,7 +66,7 @@ xfs_zero_extent(
 	return blkdev_issue_zeroout(target->bt_bdev,
 		block << (mp->m_super->s_blocksize_bits - 9),
 		count_fsb << (mp->m_super->s_blocksize_bits - 9),
-		GFP_NOFS, 0);
+		GFP_KERNEL, 0);
 }
 
 /*
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index a09ffbbb0dda..de99368000b4 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -190,7 +190,7 @@ xfs_buf_get_maps(
 	}
 
 	bp->b_maps = kzalloc(map_count * sizeof(struct xfs_buf_map),
-				GFP_NOFS | __GFP_NOFAIL);
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 	if (!bp->b_maps)
 		return -ENOMEM;
 	return 0;
@@ -222,7 +222,8 @@ _xfs_buf_alloc(
 	int			i;
 
 	*bpp = NULL;
-	bp = kmem_cache_zalloc(xfs_buf_cache, GFP_NOFS | __GFP_NOFAIL);
+	bp = kmem_cache_zalloc(xfs_buf_cache,
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 
 	/*
 	 * We don't want certain flags to appear in b_flags unless they are
@@ -325,7 +326,7 @@ xfs_buf_alloc_kmem(
 	struct xfs_buf	*bp,
 	xfs_buf_flags_t	flags)
 {
-	gfp_t		gfp_mask = GFP_NOFS | __GFP_NOFAIL;
+	gfp_t		gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL;
 	size_t		size = BBTOB(bp->b_length);
 
 	/* Assure zeroed buffer for non-read cases. */
@@ -356,13 +357,11 @@ xfs_buf_alloc_pages(
 	struct xfs_buf	*bp,
 	xfs_buf_flags_t	flags)
 {
-	gfp_t		gfp_mask = __GFP_NOWARN;
+	gfp_t		gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN;
 	long		filled = 0;
 
 	if (flags & XBF_READ_AHEAD)
 		gfp_mask |= __GFP_NORETRY;
-	else
-		gfp_mask |= GFP_NOFS;
 
 	/* Make sure that we have a page list */
 	bp->b_page_count = DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE);
@@ -429,11 +428,18 @@ _xfs_buf_map_pages(
 
 		/*
 		 * vm_map_ram() will allocate auxiliary structures (e.g.
-		 * pagetables) with GFP_KERNEL, yet we are likely to be under
-		 * GFP_NOFS context here. Hence we need to tell memory reclaim
-		 * that we are in such a context via PF_MEMALLOC_NOFS to prevent
-		 * memory reclaim re-entering the filesystem here and
-		 * potentially deadlocking.
+		 * pagetables) with GFP_KERNEL, yet we are often under a scoped nofs
+		 * context here. Mixing GFP_KERNEL with GFP_NOFS allocations
+		 * from the same call site that can be run from both above and
+		 * below memory reclaim causes lockdep false positives. Hence we
+		 * always need to force this allocation to nofs context because
+		 * we can't pass __GFP_NOLOCKDEP down to auxiliary structures to
+		 * prevent false positive lockdep reports.
+		 *
+		 * XXX(dgc): I think dquot reclaim is the only place we can get
+		 * to this function from memory reclaim context now. If we fix
+		 * that like we've fixed inode reclaim to avoid writeback from
+		 * reclaim, this nofs wrapping can go away.
 		 */
 		nofs_flag = memalloc_nofs_save();
 		do {
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index ee39639bb92b..1f68569e62ca 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3518,7 +3518,8 @@ xlog_ticket_alloc(
 	struct xlog_ticket	*tic;
 	int			unit_res;
 
-	tic = kmem_cache_zalloc(xfs_log_ticket_cache, GFP_NOFS | __GFP_NOFAIL);
+	tic = kmem_cache_zalloc(xfs_log_ticket_cache,
+			GFP_KERNEL | __GFP_NOFAIL);
 
 	unit_res = xlog_calc_unit_res(log, unit_bytes, &tic->t_iclog_hdrs);
 
diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
index ce496704748d..7443debaffd6 100644
--- a/fs/xfs/xfs_mru_cache.c
+++ b/fs/xfs/xfs_mru_cache.c
@@ -428,7 +428,7 @@ xfs_mru_cache_insert(
 	if (!mru || !mru->lists)
 		return -EINVAL;
 
-	if (radix_tree_preload(GFP_NOFS))
+	if (radix_tree_preload(GFP_KERNEL))
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&elem->list_node);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/12] xfs: place intent recovery under NOFS allocation context
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (6 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 08/12] xfs: use GFP_KERNEL in pure transaction contexts Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 23:39   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 10/12] xfs: place the CIL under nofs " Dave Chinner
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

When recovery starts processing intents, all of the initial intent
allocations are done outside of transaction contexts. That means
they need to specifically use GFP_NOFS as we do not want memory
reclaim to attempt to run direct reclaim of filesystem objects while
we have lots of objects added into deferred operations.

Rather than use GFP_NOFS for these specific allocations, just place
the entire intent recovery process under NOFS context and we can
then just use GFP_KERNEL for these allocations.
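
Schematically, the change is from annotating every allocation site that
runs during intent recovery (illustrative sketch only - the helper
names below are made up; the real conversion is in the hunks that
follow):

	/* before: each recovery helper tags its own allocation */
	ri = kmem_cache_alloc(cache, GFP_NOFS | __GFP_NOFAIL);

to establishing a single NOFS scope around the whole recovery phase and
letting the individual call sites use GFP_KERNEL:

	unsigned int	nofs_flags = memalloc_nofs_save();

	error = recover_intents();	/* helpers now use GFP_KERNEL */
	if (!error)
		error = finish_remaining_recovery();

	memalloc_nofs_restore(nofs_flags);
	return error;

The scoped form also covers allocations buried in code that recovery
calls indirectly, which per-call-site GFP_NOFS tags never could.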

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_item.c     |  2 +-
 fs/xfs/xfs_bmap_item.c     |  3 ++-
 fs/xfs/xfs_log_recover.c   | 18 ++++++++++++++----
 fs/xfs/xfs_refcount_item.c |  2 +-
 fs/xfs/xfs_rmap_item.c     |  2 +-
 5 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 0bf25a2ba3b6..e14e229fc712 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -513,7 +513,7 @@ xfs_attri_recover_work(
 		return ERR_PTR(error);
 
 	attr = kzalloc(sizeof(struct xfs_attr_intent) +
-			sizeof(struct xfs_da_args), GFP_NOFS | __GFP_NOFAIL);
+			sizeof(struct xfs_da_args), GFP_KERNEL | __GFP_NOFAIL);
 	args = (struct xfs_da_args *)(attr + 1);
 
 	attr->xattri_da_args = args;
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 029a6a8d0efd..e3c58090e976 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -445,7 +445,8 @@ xfs_bui_recover_work(
 	if (error)
 		return ERR_PTR(error);
 
-	bi = kmem_cache_zalloc(xfs_bmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	bi = kmem_cache_zalloc(xfs_bmap_intent_cache,
+			GFP_KERNEL | __GFP_NOFAIL);
 	bi->bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
 			XFS_ATTR_FORK : XFS_DATA_FORK;
 	bi->bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e9ed43a833af..8c1d260bb9e1 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3443,12 +3443,19 @@ xlog_recover(
  * part of recovery so that the root and real-time bitmap inodes can be read in
  * from disk in between the two stages.  This is necessary so that we can free
  * space in the real-time portion of the file system.
+ *
+ * We run this whole process under GFP_NOFS allocation context. We do a
+ * combination of non-transactional and transactional work, yet we really don't
+ * want to recurse into the filesystem from direct reclaim during any of this
+ * processing. This allows all the recovery code run here not to care about the
+ * memory allocation context it is running in.
  */
 int
 xlog_recover_finish(
 	struct xlog	*log)
 {
-	int	error;
+	unsigned int	nofs_flags = memalloc_nofs_save();
+	int		error;
 
 	error = xlog_recover_process_intents(log);
 	if (error) {
@@ -3462,7 +3469,7 @@ xlog_recover_finish(
 		xlog_recover_cancel_intents(log);
 		xfs_alert(log->l_mp, "Failed to recover intents");
 		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
-		return error;
+		goto out_error;
 	}
 
 	/*
@@ -3483,7 +3490,7 @@ xlog_recover_finish(
 		if (error < 0) {
 			xfs_alert(log->l_mp,
 	"Failed to clear log incompat features on recovery");
-			return error;
+			goto out_error;
 		}
 	}
 
@@ -3508,9 +3515,12 @@ xlog_recover_finish(
 		 * and AIL.
 		 */
 		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
+		goto out_error;
 	}
 
-	return 0;
+out_error:
+	memalloc_nofs_restore(nofs_flags);
+	return error;
 }
 
 void
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index d850b9685f7f..14919b33e4fe 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -425,7 +425,7 @@ xfs_cui_recover_work(
 	struct xfs_refcount_intent	*ri;
 
 	ri = kmem_cache_alloc(xfs_refcount_intent_cache,
-			GFP_NOFS | __GFP_NOFAIL);
+			GFP_KERNEL | __GFP_NOFAIL);
 	ri->ri_type = pmap->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
 	ri->ri_startblock = pmap->pe_startblock;
 	ri->ri_blockcount = pmap->pe_len;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index a40b92ac81e8..e473124e29cc 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -455,7 +455,7 @@ xfs_rui_recover_work(
 {
 	struct xfs_rmap_intent		*ri;
 
-	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_KERNEL | __GFP_NOFAIL);
 
 	switch (map->me_flags & XFS_RMAP_EXTENT_TYPE_MASK) {
 	case XFS_RMAP_EXTENT_MAP:
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/12] xfs: place the CIL under nofs allocation context
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (7 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 09/12] xfs: place intent recovery under NOFS allocation context Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 23:41   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 11/12] xfs: clean up remaining GFP_NOFS users Dave Chinner
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

This is core code that needs to run in low memory conditions and
can be triggered from memory reclaim. While it runs in a workqueue,
it really shouldn't be recursing back into the filesystem during
any memory allocation it needs to function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 815a2181004c..8c3b09777006 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -100,7 +100,7 @@ xlog_cil_ctx_alloc(void)
 {
 	struct xfs_cil_ctx	*ctx;
 
-	ctx = kzalloc(sizeof(*ctx), GFP_NOFS | __GFP_NOFAIL);
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&ctx->committing);
 	INIT_LIST_HEAD(&ctx->busy_extents.extent_list);
 	INIT_LIST_HEAD(&ctx->log_items);
@@ -1116,11 +1116,18 @@ xlog_cil_cleanup_whiteouts(
  * same sequence twice.  If we get a race between multiple pushes for the same
  * sequence they will block on the first one and then abort, hence avoiding
  * needless pushes.
+ *
+ * This runs from a workqueue so it does not inherit any specific memory
+ * allocation context. However, we do not want to block on memory reclaim
+ * recursing back into the filesystem because this push may have been triggered
+ * by memory reclaim itself. Hence we really need to run under full GFP_NOFS
+ * constraints here.
  */
 static void
 xlog_cil_push_work(
 	struct work_struct	*work)
 {
+	unsigned int		nofs_flags = memalloc_nofs_save();
 	struct xfs_cil_ctx	*ctx =
 		container_of(work, struct xfs_cil_ctx, push_work);
 	struct xfs_cil		*cil = ctx->cil;
@@ -1334,12 +1341,14 @@ xlog_cil_push_work(
 	spin_unlock(&log->l_icloglock);
 	xlog_cil_cleanup_whiteouts(&whiteouts);
 	xfs_log_ticket_ungrant(log, ticket);
+	memalloc_nofs_restore(nofs_flags);
 	return;
 
 out_skip:
 	up_write(&cil->xc_ctx_lock);
 	xfs_log_ticket_put(new_ctx->ticket);
 	kfree(new_ctx);
+	memalloc_nofs_restore(nofs_flags);
 	return;
 
 out_abort_free_ticket:
@@ -1348,6 +1357,7 @@ xlog_cil_push_work(
 	if (!ctx->commit_iclog) {
 		xfs_log_ticket_ungrant(log, ctx->ticket);
 		xlog_cil_committed(ctx);
+		memalloc_nofs_restore(nofs_flags);
 		return;
 	}
 	spin_lock(&log->l_icloglock);
@@ -1356,6 +1366,7 @@ xlog_cil_push_work(
 	/* Not safe to reference ctx now! */
 	spin_unlock(&log->l_icloglock);
 	xfs_log_ticket_ungrant(log, ticket);
+	memalloc_nofs_restore(nofs_flags);
 }
 
 /*
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/12] xfs: clean up remaining GFP_NOFS users
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (8 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 10/12] xfs: place the CIL under nofs " Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-19  0:52   ` Darrick J. Wong
  2024-01-15 22:59 ` [PATCH 12/12] xfs: use xfs_defer_alloc a bit more Dave Chinner
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

These few remaining GFP_NOFS callers do not need to use GFP_NOFS at
all. They are only called from a non-transactional context or cannot
be accessed from memory reclaim due to other constraints. Hence they
can just use GFP_KERNEL.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_btree_staging.c | 4 ++--
 fs/xfs/xfs_attr_list.c            | 2 +-
 fs/xfs/xfs_buf.c                  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
index 961f6b898f4b..f0c69f9bb169 100644
--- a/fs/xfs/libxfs/xfs_btree_staging.c
+++ b/fs/xfs/libxfs/xfs_btree_staging.c
@@ -139,7 +139,7 @@ xfs_btree_stage_afakeroot(
 	ASSERT(!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE));
 	ASSERT(cur->bc_tp == NULL);
 
-	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_NOFS | __GFP_NOFAIL);
+	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
 	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
 	nops->free_block = xfs_btree_fakeroot_free_block;
@@ -220,7 +220,7 @@ xfs_btree_stage_ifakeroot(
 	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
 	ASSERT(cur->bc_tp == NULL);
 
-	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_NOFS | __GFP_NOFAIL);
+	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
 	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
 	nops->free_block = xfs_btree_fakeroot_free_block;
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 0318d768520a..47453510c0ab 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -109,7 +109,7 @@ xfs_attr_shortform_list(
 	 * It didn't all fit, so we have to sort everything on hashval.
 	 */
 	sbsize = sf->count * sizeof(*sbuf);
-	sbp = sbuf = kmalloc(sbsize, GFP_NOFS | __GFP_NOFAIL);
+	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
 
 	/*
 	 * Scan the attribute list for the rest of the entries, storing
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index de99368000b4..08f2fbc04db5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2008,7 +2008,7 @@ xfs_alloc_buftarg(
 #if defined(CONFIG_FS_DAX) && defined(CONFIG_MEMORY_FAILURE)
 	ops = &xfs_dax_holder_operations;
 #endif
-	btp = kzalloc(sizeof(*btp), GFP_NOFS | __GFP_NOFAIL);
+	btp = kzalloc(sizeof(*btp), GFP_KERNEL | __GFP_NOFAIL);
 
 	btp->bt_mount = mp;
 	btp->bt_bdev_handle = bdev_handle;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/12] xfs: use xfs_defer_alloc a bit more
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (9 preceding siblings ...)
  2024-01-15 22:59 ` [PATCH 11/12] xfs: clean up remaining GFP_NOFS users Dave Chinner
@ 2024-01-15 22:59 ` Dave Chinner
  2024-01-18 23:41   ` Darrick J. Wong
       [not found] ` <20240115230113.4080105-3-david@fromorbit.com>
  2024-03-25 17:46 ` [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Pankaj Raghav (Samsung)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-01-15 22:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: willy, linux-mm

From: Dave Chinner <dchinner@redhat.com>

Noticed by inspection, simple factoring allows the same allocation
routine to be used for both transaction and recovery contexts.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_defer.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 8ae4401f6810..6ed3a5fda081 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -819,7 +819,7 @@ xfs_defer_can_append(
 /* Create a new pending item at the end of the transaction list. */
 static inline struct xfs_defer_pending *
 xfs_defer_alloc(
-	struct xfs_trans		*tp,
+	struct list_head		*dfops,
 	const struct xfs_defer_op_type	*ops)
 {
 	struct xfs_defer_pending	*dfp;
@@ -828,7 +828,7 @@ xfs_defer_alloc(
 			GFP_KERNEL | __GFP_NOFAIL);
 	dfp->dfp_ops = ops;
 	INIT_LIST_HEAD(&dfp->dfp_work);
-	list_add_tail(&dfp->dfp_list, &tp->t_dfops);
+	list_add_tail(&dfp->dfp_list, dfops);
 
 	return dfp;
 }
@@ -846,7 +846,7 @@ xfs_defer_add(
 
 	dfp = xfs_defer_find_last(tp, ops);
 	if (!dfp || !xfs_defer_can_append(dfp, ops))
-		dfp = xfs_defer_alloc(tp, ops);
+		dfp = xfs_defer_alloc(&tp->t_dfops, ops);
 
 	xfs_defer_add_item(dfp, li);
 	trace_xfs_defer_add_item(tp->t_mountp, dfp, li);
@@ -870,7 +870,7 @@ xfs_defer_add_barrier(
 	if (dfp)
 		return;
 
-	xfs_defer_alloc(tp, &xfs_barrier_defer_type);
+	xfs_defer_alloc(&tp->t_dfops, &xfs_barrier_defer_type);
 
 	trace_xfs_defer_add_item(tp->t_mountp, dfp, NULL);
 }
@@ -885,14 +885,9 @@ xfs_defer_start_recovery(
 	struct list_head		*r_dfops,
 	const struct xfs_defer_op_type	*ops)
 {
-	struct xfs_defer_pending	*dfp;
+	struct xfs_defer_pending	*dfp = xfs_defer_alloc(r_dfops, ops);
 
-	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
-			GFP_KERNEL | __GFP_NOFAIL);
-	dfp->dfp_ops = ops;
 	dfp->dfp_intent = lip;
-	INIT_LIST_HEAD(&dfp->dfp_work);
-	list_add_tail(&dfp->dfp_list, r_dfops);
 }
 
 /*
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 01/12] xfs: convert kmem_zalloc() to kzalloc()
  2024-01-15 22:59 ` [PATCH 01/12] xfs: convert kmem_zalloc() to kzalloc() Dave Chinner
@ 2024-01-18 22:48   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 22:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:39AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There's no reason to keep the kmem_zalloc() around anymore, it's
> just a thin wrapper around kmalloc(), so lets get rid of it.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good to me
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/kmem.h                     |  7 -------
>  fs/xfs/libxfs/xfs_ag.c            |  2 +-
>  fs/xfs/libxfs/xfs_attr_leaf.c     |  3 ++-
>  fs/xfs/libxfs/xfs_btree_staging.c |  2 +-
>  fs/xfs/libxfs/xfs_da_btree.c      |  5 +++--
>  fs/xfs/libxfs/xfs_defer.c         |  2 +-
>  fs/xfs/libxfs/xfs_dir2.c          | 18 +++++++++---------
>  fs/xfs/libxfs/xfs_iext_tree.c     | 12 ++++++++----
>  fs/xfs/xfs_attr_item.c            |  4 ++--
>  fs/xfs/xfs_buf.c                  |  6 +++---
>  fs/xfs/xfs_buf_item.c             |  4 ++--
>  fs/xfs/xfs_error.c                |  4 ++--
>  fs/xfs/xfs_extent_busy.c          |  3 ++-
>  fs/xfs/xfs_itable.c               |  8 ++++----
>  fs/xfs/xfs_iwalk.c                |  3 ++-
>  fs/xfs/xfs_log.c                  |  5 +++--
>  fs/xfs/xfs_log_cil.c              |  4 ++--
>  fs/xfs/xfs_log_recover.c          | 10 +++++-----
>  fs/xfs/xfs_mru_cache.c            |  7 ++++---
>  fs/xfs/xfs_qm.c                   |  3 ++-
>  fs/xfs/xfs_refcount_item.c        |  4 ++--
>  fs/xfs/xfs_rmap_item.c            |  3 ++-
>  fs/xfs/xfs_trans_ail.c            |  3 ++-
>  23 files changed, 64 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index b987dc2c6851..bce31182c9e8 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -62,13 +62,6 @@ static inline void  kmem_free(const void *ptr)
>  	kvfree(ptr);
>  }
>  
> -
> -static inline void *
> -kmem_zalloc(size_t size, xfs_km_flags_t flags)
> -{
> -	return kmem_alloc(size, flags | KM_ZERO);
> -}
> -
>  /*
>   * Zone interfaces
>   */
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 39d9525270b7..96a6bfd58931 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -381,7 +381,7 @@ xfs_initialize_perag(
>  			continue;
>  		}
>  
> -		pag = kmem_zalloc(sizeof(*pag), KM_MAYFAIL);
> +		pag = kzalloc(sizeof(*pag), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  		if (!pag) {
>  			error = -ENOMEM;
>  			goto out_unwind_new_pags;
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index 6374bf107242..ab4223bf51ee 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -2250,7 +2250,8 @@ xfs_attr3_leaf_unbalance(
>  		struct xfs_attr_leafblock *tmp_leaf;
>  		struct xfs_attr3_icleaf_hdr tmphdr;
>  
> -		tmp_leaf = kmem_zalloc(state->args->geo->blksize, 0);
> +		tmp_leaf = kzalloc(state->args->geo->blksize,
> +				GFP_KERNEL | __GFP_NOFAIL);
>  
>  		/*
>  		 * Copy the header into the temp leaf so that all the stuff
> diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
> index e276eba87cb1..eff29425fd76 100644
> --- a/fs/xfs/libxfs/xfs_btree_staging.c
> +++ b/fs/xfs/libxfs/xfs_btree_staging.c
> @@ -406,7 +406,7 @@ xfs_btree_bload_prep_block(
>  
>  		/* Allocate a new incore btree root block. */
>  		new_size = bbl->iroot_size(cur, level, nr_this_block, priv);
> -		ifp->if_broot = kmem_zalloc(new_size, 0);
> +		ifp->if_broot = kzalloc(new_size, GFP_KERNEL);
>  		ifp->if_broot_bytes = (int)new_size;
>  
>  		/* Initialize it and send it out. */
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 5457188bb4de..73aae6543906 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -2518,7 +2518,7 @@ xfs_dabuf_map(
>  	int			error = 0, nirecs, i;
>  
>  	if (nfsb > 1)
> -		irecs = kmem_zalloc(sizeof(irec) * nfsb, KM_NOFS);
> +		irecs = kzalloc(sizeof(irec) * nfsb, GFP_NOFS | __GFP_NOFAIL);
>  
>  	nirecs = nfsb;
>  	error = xfs_bmapi_read(dp, bno, nfsb, irecs, &nirecs,
> @@ -2531,7 +2531,8 @@ xfs_dabuf_map(
>  	 * larger one that needs to be free by the caller.
>  	 */
>  	if (nirecs > 1) {
> -		map = kmem_zalloc(nirecs * sizeof(struct xfs_buf_map), KM_NOFS);
> +		map = kzalloc(nirecs * sizeof(struct xfs_buf_map),
> +				GFP_NOFS | __GFP_NOFAIL);
>  		if (!map) {
>  			error = -ENOMEM;
>  			goto out_free_irecs;
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 66a17910d021..07d318b1f807 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -979,7 +979,7 @@ xfs_defer_ops_capture(
>  		return ERR_PTR(error);
>  
>  	/* Create an object to capture the defer ops. */
> -	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
> +	dfc = kzalloc(sizeof(*dfc), GFP_NOFS | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&dfc->dfc_list);
>  	INIT_LIST_HEAD(&dfc->dfc_dfops);
>  
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index a76673281514..54915a302e96 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -104,10 +104,10 @@ xfs_da_mount(
>  	ASSERT(mp->m_sb.sb_versionnum & XFS_SB_VERSION_DIRV2BIT);
>  	ASSERT(xfs_dir2_dirblock_bytes(&mp->m_sb) <= XFS_MAX_BLOCKSIZE);
>  
> -	mp->m_dir_geo = kmem_zalloc(sizeof(struct xfs_da_geometry),
> -				    KM_MAYFAIL);
> -	mp->m_attr_geo = kmem_zalloc(sizeof(struct xfs_da_geometry),
> -				     KM_MAYFAIL);
> +	mp->m_dir_geo = kzalloc(sizeof(struct xfs_da_geometry),
> +				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
> +	mp->m_attr_geo = kzalloc(sizeof(struct xfs_da_geometry),
> +				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!mp->m_dir_geo || !mp->m_attr_geo) {
>  		kmem_free(mp->m_dir_geo);
>  		kmem_free(mp->m_attr_geo);
> @@ -236,7 +236,7 @@ xfs_dir_init(
>  	if (error)
>  		return error;
>  
> -	args = kmem_zalloc(sizeof(*args), KM_NOFS);
> +	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> @@ -273,7 +273,7 @@ xfs_dir_createname(
>  		XFS_STATS_INC(dp->i_mount, xs_dir_create);
>  	}
>  
> -	args = kmem_zalloc(sizeof(*args), KM_NOFS);
> +	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> @@ -372,7 +372,7 @@ xfs_dir_lookup(
>  	 * lockdep Doing this avoids having to add a bunch of lockdep class
>  	 * annotations into the reclaim path for the ilock.
>  	 */
> -	args = kmem_zalloc(sizeof(*args), KM_NOFS);
> +	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
>  	args->geo = dp->i_mount->m_dir_geo;
>  	args->name = name->name;
>  	args->namelen = name->len;
> @@ -441,7 +441,7 @@ xfs_dir_removename(
>  	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
>  	XFS_STATS_INC(dp->i_mount, xs_dir_remove);
>  
> -	args = kmem_zalloc(sizeof(*args), KM_NOFS);
> +	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> @@ -502,7 +502,7 @@ xfs_dir_replace(
>  	if (rval)
>  		return rval;
>  
> -	args = kmem_zalloc(sizeof(*args), KM_NOFS);
> +	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
> index f4e6b200cdf8..4522f3c7a23f 100644
> --- a/fs/xfs/libxfs/xfs_iext_tree.c
> +++ b/fs/xfs/libxfs/xfs_iext_tree.c
> @@ -398,7 +398,8 @@ static void
>  xfs_iext_grow(
>  	struct xfs_ifork	*ifp)
>  {
> -	struct xfs_iext_node	*node = kmem_zalloc(NODE_SIZE, KM_NOFS);
> +	struct xfs_iext_node	*node = kzalloc(NODE_SIZE,
> +						GFP_NOFS | __GFP_NOFAIL);
>  	int			i;
>  
>  	if (ifp->if_height == 1) {
> @@ -454,7 +455,8 @@ xfs_iext_split_node(
>  	int			*nr_entries)
>  {
>  	struct xfs_iext_node	*node = *nodep;
> -	struct xfs_iext_node	*new = kmem_zalloc(NODE_SIZE, KM_NOFS);
> +	struct xfs_iext_node	*new = kzalloc(NODE_SIZE,
> +						GFP_NOFS | __GFP_NOFAIL);
>  	const int		nr_move = KEYS_PER_NODE / 2;
>  	int			nr_keep = nr_move + (KEYS_PER_NODE & 1);
>  	int			i = 0;
> @@ -542,7 +544,8 @@ xfs_iext_split_leaf(
>  	int			*nr_entries)
>  {
>  	struct xfs_iext_leaf	*leaf = cur->leaf;
> -	struct xfs_iext_leaf	*new = kmem_zalloc(NODE_SIZE, KM_NOFS);
> +	struct xfs_iext_leaf	*new = kzalloc(NODE_SIZE,
> +						GFP_NOFS | __GFP_NOFAIL);
>  	const int		nr_move = RECS_PER_LEAF / 2;
>  	int			nr_keep = nr_move + (RECS_PER_LEAF & 1);
>  	int			i;
> @@ -583,7 +586,8 @@ xfs_iext_alloc_root(
>  {
>  	ASSERT(ifp->if_bytes == 0);
>  
> -	ifp->if_data = kmem_zalloc(sizeof(struct xfs_iext_rec), KM_NOFS);
> +	ifp->if_data = kzalloc(sizeof(struct xfs_iext_rec),
> +					GFP_NOFS | __GFP_NOFAIL);
>  	ifp->if_height = 1;
>  
>  	/* now that we have a node step into it */
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 9e02111bd890..2e454a0d6f19 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -512,8 +512,8 @@ xfs_attri_recover_work(
>  	if (error)
>  		return ERR_PTR(error);
>  
> -	attr = kmem_zalloc(sizeof(struct xfs_attr_intent) +
> -			   sizeof(struct xfs_da_args), KM_NOFS);
> +	attr = kzalloc(sizeof(struct xfs_attr_intent) +
> +			sizeof(struct xfs_da_args), GFP_NOFS | __GFP_NOFAIL);
>  	args = (struct xfs_da_args *)(attr + 1);
>  
>  	attr->xattri_da_args = args;
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index ec4bd7a24d88..710ea4c97122 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -189,8 +189,8 @@ xfs_buf_get_maps(
>  		return 0;
>  	}
>  
> -	bp->b_maps = kmem_zalloc(map_count * sizeof(struct xfs_buf_map),
> -				KM_NOFS);
> +	bp->b_maps = kzalloc(map_count * sizeof(struct xfs_buf_map),
> +				GFP_NOFS | __GFP_NOFAIL);
>  	if (!bp->b_maps)
>  		return -ENOMEM;
>  	return 0;
> @@ -2002,7 +2002,7 @@ xfs_alloc_buftarg(
>  #if defined(CONFIG_FS_DAX) && defined(CONFIG_MEMORY_FAILURE)
>  	ops = &xfs_dax_holder_operations;
>  #endif
> -	btp = kmem_zalloc(sizeof(*btp), KM_NOFS);
> +	btp = kzalloc(sizeof(*btp), GFP_NOFS | __GFP_NOFAIL);
>  
>  	btp->bt_mount = mp;
>  	btp->bt_bdev_handle = bdev_handle;
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 023d4e0385dd..ec93d34188c8 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -805,8 +805,8 @@ xfs_buf_item_get_format(
>  		return;
>  	}
>  
> -	bip->bli_formats = kmem_zalloc(count * sizeof(struct xfs_buf_log_format),
> -				0);
> +	bip->bli_formats = kzalloc(count * sizeof(struct xfs_buf_log_format),
> +				GFP_KERNEL | __GFP_NOFAIL);
>  }
>  
>  STATIC void
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index b2cbbba3e15a..456520d60cd0 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -240,8 +240,8 @@ xfs_errortag_init(
>  {
>  	int ret;
>  
> -	mp->m_errortag = kmem_zalloc(sizeof(unsigned int) * XFS_ERRTAG_MAX,
> -			KM_MAYFAIL);
> +	mp->m_errortag = kzalloc(sizeof(unsigned int) * XFS_ERRTAG_MAX,
> +				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!mp->m_errortag)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
> index 2ccde32c9a9e..b90c3dd43e03 100644
> --- a/fs/xfs/xfs_extent_busy.c
> +++ b/fs/xfs/xfs_extent_busy.c
> @@ -32,7 +32,8 @@ xfs_extent_busy_insert_list(
>  	struct rb_node		**rbp;
>  	struct rb_node		*parent = NULL;
>  
> -	new = kmem_zalloc(sizeof(struct xfs_extent_busy), 0);
> +	new = kzalloc(sizeof(struct xfs_extent_busy),
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	new->agno = pag->pag_agno;
>  	new->bno = bno;
>  	new->length = len;
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 14462614fcc8..14211174267a 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -197,8 +197,8 @@ xfs_bulkstat_one(
>  
>  	ASSERT(breq->icount == 1);
>  
> -	bc.buf = kmem_zalloc(sizeof(struct xfs_bulkstat),
> -			KM_MAYFAIL);
> +	bc.buf = kzalloc(sizeof(struct xfs_bulkstat),
> +			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!bc.buf)
>  		return -ENOMEM;
>  
> @@ -289,8 +289,8 @@ xfs_bulkstat(
>  	if (xfs_bulkstat_already_done(breq->mp, breq->startino))
>  		return 0;
>  
> -	bc.buf = kmem_zalloc(sizeof(struct xfs_bulkstat),
> -			KM_MAYFAIL);
> +	bc.buf = kzalloc(sizeof(struct xfs_bulkstat),
> +			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!bc.buf)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index b3275e8d47b6..8dbb7c054b28 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -663,7 +663,8 @@ xfs_iwalk_threaded(
>  		if (xfs_pwork_ctl_want_abort(&pctl))
>  			break;
>  
> -		iwag = kmem_zalloc(sizeof(struct xfs_iwalk_ag), 0);
> +		iwag = kzalloc(sizeof(struct xfs_iwalk_ag),
> +				GFP_KERNEL | __GFP_NOFAIL);
>  		iwag->mp = mp;
>  
>  		/*
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index a1650fc81382..d38cfaadc726 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1528,7 +1528,7 @@ xlog_alloc_log(
>  	int			error = -ENOMEM;
>  	uint			log2_size = 0;
>  
> -	log = kmem_zalloc(sizeof(struct xlog), KM_MAYFAIL);
> +	log = kzalloc(sizeof(struct xlog), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!log) {
>  		xfs_warn(mp, "Log allocation failed: No memory!");
>  		goto out;
> @@ -1605,7 +1605,8 @@ xlog_alloc_log(
>  		size_t bvec_size = howmany(log->l_iclog_size, PAGE_SIZE) *
>  				sizeof(struct bio_vec);
>  
> -		iclog = kmem_zalloc(sizeof(*iclog) + bvec_size, KM_MAYFAIL);
> +		iclog = kzalloc(sizeof(*iclog) + bvec_size,
> +				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  		if (!iclog)
>  			goto out_free_iclog;
>  
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 67a99d94701e..3c705f22b0ab 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -100,7 +100,7 @@ xlog_cil_ctx_alloc(void)
>  {
>  	struct xfs_cil_ctx	*ctx;
>  
> -	ctx = kmem_zalloc(sizeof(*ctx), KM_NOFS);
> +	ctx = kzalloc(sizeof(*ctx), GFP_NOFS | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&ctx->committing);
>  	INIT_LIST_HEAD(&ctx->busy_extents.extent_list);
>  	INIT_LIST_HEAD(&ctx->log_items);
> @@ -1747,7 +1747,7 @@ xlog_cil_init(
>  	struct xlog_cil_pcp	*cilpcp;
>  	int			cpu;
>  
> -	cil = kmem_zalloc(sizeof(*cil), KM_MAYFAIL);
> +	cil = kzalloc(sizeof(*cil), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!cil)
>  		return -ENOMEM;
>  	/*
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 1251c81e55f9..4a27ecdbb546 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2057,7 +2057,8 @@ xlog_recover_add_item(
>  {
>  	struct xlog_recover_item *item;
>  
> -	item = kmem_zalloc(sizeof(struct xlog_recover_item), 0);
> +	item = kzalloc(sizeof(struct xlog_recover_item),
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&item->ri_list);
>  	list_add_tail(&item->ri_list, head);
>  }
> @@ -2187,9 +2188,8 @@ xlog_recover_add_to_trans(
>  		}
>  
>  		item->ri_total = in_f->ilf_size;
> -		item->ri_buf =
> -			kmem_zalloc(item->ri_total * sizeof(xfs_log_iovec_t),
> -				    0);
> +		item->ri_buf = kzalloc(item->ri_total * sizeof(xfs_log_iovec_t),
> +				GFP_KERNEL | __GFP_NOFAIL);
>  	}
>  
>  	if (item->ri_total <= item->ri_cnt) {
> @@ -2332,7 +2332,7 @@ xlog_recover_ophdr_to_trans(
>  	 * This is a new transaction so allocate a new recovery container to
>  	 * hold the recovery ops that will follow.
>  	 */
> -	trans = kmem_zalloc(sizeof(struct xlog_recover), 0);
> +	trans = kzalloc(sizeof(struct xlog_recover), GFP_KERNEL | __GFP_NOFAIL);
>  	trans->r_log_tid = tid;
>  	trans->r_lsn = be64_to_cpu(rhead->h_lsn);
>  	INIT_LIST_HEAD(&trans->r_itemq);
> diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
> index f85e3b07ab44..feae3115617b 100644
> --- a/fs/xfs/xfs_mru_cache.c
> +++ b/fs/xfs/xfs_mru_cache.c
> @@ -333,13 +333,14 @@ xfs_mru_cache_create(
>  	if (!(grp_time = msecs_to_jiffies(lifetime_ms) / grp_count))
>  		return -EINVAL;
>  
> -	if (!(mru = kmem_zalloc(sizeof(*mru), 0)))
> +	mru = kzalloc(sizeof(*mru), GFP_KERNEL | __GFP_NOFAIL);
> +	if (!mru)
>  		return -ENOMEM;
>  
>  	/* An extra list is needed to avoid reaping up to a grp_time early. */
>  	mru->grp_count = grp_count + 1;
> -	mru->lists = kmem_zalloc(mru->grp_count * sizeof(*mru->lists), 0);
> -
> +	mru->lists = kzalloc(mru->grp_count * sizeof(*mru->lists),
> +				GFP_KERNEL | __GFP_NOFAIL);
>  	if (!mru->lists) {
>  		err = -ENOMEM;
>  		goto exit;
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index 94a7932ac570..b9d11376c88a 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -628,7 +628,8 @@ xfs_qm_init_quotainfo(
>  
>  	ASSERT(XFS_IS_QUOTA_ON(mp));
>  
> -	qinf = mp->m_quotainfo = kmem_zalloc(sizeof(struct xfs_quotainfo), 0);
> +	qinf = mp->m_quotainfo = kzalloc(sizeof(struct xfs_quotainfo),
> +					GFP_KERNEL | __GFP_NOFAIL);
>  
>  	error = list_lru_init(&qinf->qi_lru);
>  	if (error)
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 20ad8086da60..78d0cda60abf 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -143,8 +143,8 @@ xfs_cui_init(
>  
>  	ASSERT(nextents > 0);
>  	if (nextents > XFS_CUI_MAX_FAST_EXTENTS)
> -		cuip = kmem_zalloc(xfs_cui_log_item_sizeof(nextents),
> -				0);
> +		cuip = kzalloc(xfs_cui_log_item_sizeof(nextents),
> +				GFP_KERNEL | __GFP_NOFAIL);
>  	else
>  		cuip = kmem_cache_zalloc(xfs_cui_cache,
>  					 GFP_KERNEL | __GFP_NOFAIL);
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 79ad0087aeca..31a921fc34b2 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -142,7 +142,8 @@ xfs_rui_init(
>  
>  	ASSERT(nextents > 0);
>  	if (nextents > XFS_RUI_MAX_FAST_EXTENTS)
> -		ruip = kmem_zalloc(xfs_rui_log_item_sizeof(nextents), 0);
> +		ruip = kzalloc(xfs_rui_log_item_sizeof(nextents),
> +				GFP_KERNEL | __GFP_NOFAIL);
>  	else
>  		ruip = kmem_cache_zalloc(xfs_rui_cache,
>  					 GFP_KERNEL | __GFP_NOFAIL);
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 1098452e7f95..5f206cdb40ff 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -901,7 +901,8 @@ xfs_trans_ail_init(
>  {
>  	struct xfs_ail	*ailp;
>  
> -	ailp = kmem_zalloc(sizeof(struct xfs_ail), KM_MAYFAIL);
> +	ailp = kzalloc(sizeof(struct xfs_ail),
> +			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!ailp)
>  		return -ENOMEM;
>  
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 02/12] xfs: convert kmem_alloc() to kmalloc()
       [not found] ` <20240115230113.4080105-3-david@fromorbit.com>
@ 2024-01-18 22:50   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 22:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:40AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> kmem_alloc() is just a thin wrapper around kmalloc() these days.
> Convert everything to use kmalloc() so we can get rid of the
> wrapper.
> 
> Note: the transaction region allocation in xlog_recover_add_to_trans()
> can be a high order allocation. Converting it to use
> kmalloc(__GFP_NOFAIL) results in warnings in the page allocation
> code being triggered because the mm subsystem does not want us to
> use __GFP_NOFAIL with high order allocations like we've been doing
> with the kmem_alloc() wrapper for a couple of decades. Hence this
> specific case gets converted to xlog_kvmalloc() rather than
> kmalloc() to avoid this issue.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
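
A kvmalloc()-style fallback like the xlog_kvmalloc() mentioned above
sidesteps that warning by attempting a kmalloc() that is allowed to fail
and then falling back to vmalloc(), which has no high-order constraint.
A minimal sketch of the pattern (the helper name and exact GFP flags
here are illustrative, not the verbatim XFS implementation):

#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Never-fail allocation for a potentially high-order buffer: let
 * kmalloc() fail quietly instead of forcing __GFP_NOFAIL, then fall
 * back to vmalloc().  The outer loop preserves the "cannot fail"
 * semantics without tripping the page allocator's high-order
 * __GFP_NOFAIL warning.
 */
static void *
log_kvmalloc_sketch(size_t size)
{
	gfp_t	gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL;
	void	*p;

	do {
		p = kmalloc(size, gfp);
		if (!p)
			p = vmalloc(size);
	} while (!p);

	return p;
}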

Pretty straightforward changeup,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/Makefile                   |  3 +--
>  fs/xfs/kmem.c                     | 30 ----------------------
>  fs/xfs/kmem.h                     | 42 -------------------------------
>  fs/xfs/libxfs/xfs_attr_leaf.c     |  7 +++---
>  fs/xfs/libxfs/xfs_btree_staging.c |  4 +--
>  fs/xfs/libxfs/xfs_da_btree.c      |  3 ++-
>  fs/xfs/libxfs/xfs_dir2.c          |  2 +-
>  fs/xfs/libxfs/xfs_dir2_block.c    |  2 +-
>  fs/xfs/libxfs/xfs_dir2_sf.c       |  8 +++---
>  fs/xfs/libxfs/xfs_inode_fork.c    | 15 +++++------
>  fs/xfs/xfs_attr_list.c            |  2 +-
>  fs/xfs/xfs_buf.c                  |  6 ++---
>  fs/xfs/xfs_buf_item_recover.c     |  2 +-
>  fs/xfs/xfs_filestream.c           |  2 +-
>  fs/xfs/xfs_inode_item_recover.c   |  3 ++-
>  fs/xfs/xfs_iwalk.c                |  2 +-
>  fs/xfs/xfs_log_recover.c          |  2 +-
>  fs/xfs/xfs_qm.c                   |  3 ++-
>  fs/xfs/xfs_rtalloc.c              |  2 +-
>  fs/xfs/xfs_super.c                |  2 +-
>  fs/xfs/xfs_trace.h                | 25 ------------------
>  21 files changed, 36 insertions(+), 131 deletions(-)
>  delete mode 100644 fs/xfs/kmem.c
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index fbe3cdc79036..35a23427055b 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -92,8 +92,7 @@ xfs-y				+= xfs_aops.o \
>  				   xfs_symlink.o \
>  				   xfs_sysfs.o \
>  				   xfs_trans.o \
> -				   xfs_xattr.o \
> -				   kmem.o
> +				   xfs_xattr.o
>  
>  # low-level transaction/log code
>  xfs-y				+= xfs_log.o \
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> deleted file mode 100644
> index c557a030acfe..000000000000
> --- a/fs/xfs/kmem.c
> +++ /dev/null
> @@ -1,30 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * Copyright (c) 2000-2005 Silicon Graphics, Inc.
> - * All Rights Reserved.
> - */
> -#include "xfs.h"
> -#include "xfs_message.h"
> -#include "xfs_trace.h"
> -
> -void *
> -kmem_alloc(size_t size, xfs_km_flags_t flags)
> -{
> -	int	retries = 0;
> -	gfp_t	lflags = kmem_flags_convert(flags);
> -	void	*ptr;
> -
> -	trace_kmem_alloc(size, flags, _RET_IP_);
> -
> -	do {
> -		ptr = kmalloc(size, lflags);
> -		if (ptr || (flags & KM_MAYFAIL))
> -			return ptr;
> -		if (!(++retries % 100))
> -			xfs_err(NULL,
> -	"%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)",
> -				current->comm, current->pid,
> -				(unsigned int)size, __func__, lflags);
> -		memalloc_retry_wait(lflags);
> -	} while (1);
> -}
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index bce31182c9e8..1343f1a6f99b 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -15,48 +15,6 @@
>   * General memory allocation interfaces
>   */
>  
> -typedef unsigned __bitwise xfs_km_flags_t;
> -#define KM_NOFS		((__force xfs_km_flags_t)0x0004u)
> -#define KM_MAYFAIL	((__force xfs_km_flags_t)0x0008u)
> -#define KM_ZERO		((__force xfs_km_flags_t)0x0010u)
> -#define KM_NOLOCKDEP	((__force xfs_km_flags_t)0x0020u)
> -
> -/*
> - * We use a special process flag to avoid recursive callbacks into
> - * the filesystem during transactions.  We will also issue our own
> - * warnings, so we explicitly skip any generic ones (silly of us).
> - */
> -static inline gfp_t
> -kmem_flags_convert(xfs_km_flags_t flags)
> -{
> -	gfp_t	lflags;
> -
> -	BUG_ON(flags & ~(KM_NOFS | KM_MAYFAIL | KM_ZERO | KM_NOLOCKDEP));
> -
> -	lflags = GFP_KERNEL | __GFP_NOWARN;
> -	if (flags & KM_NOFS)
> -		lflags &= ~__GFP_FS;
> -
> -	/*
> -	 * Default page/slab allocator behavior is to retry for ever
> -	 * for small allocations. We can override this behavior by using
> -	 * __GFP_RETRY_MAYFAIL which will tell the allocator to retry as long
> -	 * as it is feasible but rather fail than retry forever for all
> -	 * request sizes.
> -	 */
> -	if (flags & KM_MAYFAIL)
> -		lflags |= __GFP_RETRY_MAYFAIL;
> -
> -	if (flags & KM_ZERO)
> -		lflags |= __GFP_ZERO;
> -
> -	if (flags & KM_NOLOCKDEP)
> -		lflags |= __GFP_NOLOCKDEP;
> -
> -	return lflags;
> -}
> -
> -extern void *kmem_alloc(size_t, xfs_km_flags_t);
>  static inline void  kmem_free(const void *ptr)
>  {
>  	kvfree(ptr);
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index ab4223bf51ee..033382cf514d 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -879,8 +879,7 @@ xfs_attr_shortform_to_leaf(
>  
>  	trace_xfs_attr_sf_to_leaf(args);
>  
> -	tmpbuffer = kmem_alloc(size, 0);
> -	ASSERT(tmpbuffer != NULL);
> +	tmpbuffer = kmalloc(size, GFP_KERNEL | __GFP_NOFAIL);
>  	memcpy(tmpbuffer, ifp->if_data, size);
>  	sf = (struct xfs_attr_sf_hdr *)tmpbuffer;
>  
> @@ -1059,7 +1058,7 @@ xfs_attr3_leaf_to_shortform(
>  
>  	trace_xfs_attr_leaf_to_sf(args);
>  
> -	tmpbuffer = kmem_alloc(args->geo->blksize, 0);
> +	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
>  	if (!tmpbuffer)
>  		return -ENOMEM;
>  
> @@ -1533,7 +1532,7 @@ xfs_attr3_leaf_compact(
>  
>  	trace_xfs_attr_leaf_compact(args);
>  
> -	tmpbuffer = kmem_alloc(args->geo->blksize, 0);
> +	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
>  	memcpy(tmpbuffer, bp->b_addr, args->geo->blksize);
>  	memset(bp->b_addr, 0, args->geo->blksize);
>  	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
> diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
> index eff29425fd76..065e4a00a2f4 100644
> --- a/fs/xfs/libxfs/xfs_btree_staging.c
> +++ b/fs/xfs/libxfs/xfs_btree_staging.c
> @@ -139,7 +139,7 @@ xfs_btree_stage_afakeroot(
>  	ASSERT(!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE));
>  	ASSERT(cur->bc_tp == NULL);
>  
> -	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
> +	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_NOFS | __GFP_NOFAIL);
>  	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
>  	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
>  	nops->free_block = xfs_btree_fakeroot_free_block;
> @@ -220,7 +220,7 @@ xfs_btree_stage_ifakeroot(
>  	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
>  	ASSERT(cur->bc_tp == NULL);
>  
> -	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
> +	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_NOFS | __GFP_NOFAIL);
>  	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
>  	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
>  	nops->free_block = xfs_btree_fakeroot_free_block;
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 73aae6543906..331b9251b185 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -2182,7 +2182,8 @@ xfs_da_grow_inode_int(
>  		 * If we didn't get it and the block might work if fragmented,
>  		 * try without the CONTIG flag.  Loop until we get it all.
>  		 */
> -		mapp = kmem_alloc(sizeof(*mapp) * count, 0);
> +		mapp = kmalloc(sizeof(*mapp) * count,
> +				GFP_KERNEL | __GFP_NOFAIL);
>  		for (b = *bno, mapi = 0; b < *bno + count; ) {
>  			c = (int)(*bno + count - b);
>  			nmap = min(XFS_BMAP_MAX_NMAP, c);
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index 54915a302e96..370d67300455 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -333,7 +333,7 @@ xfs_dir_cilookup_result(
>  					!(args->op_flags & XFS_DA_OP_CILOOKUP))
>  		return -EEXIST;
>  
> -	args->value = kmem_alloc(len, KM_NOFS | KM_MAYFAIL);
> +	args->value = kmalloc(len, GFP_NOFS | __GFP_RETRY_MAYFAIL);
>  	if (!args->value)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
> index 3c256d4cc40b..506c65caaec5 100644
> --- a/fs/xfs/libxfs/xfs_dir2_block.c
> +++ b/fs/xfs/libxfs/xfs_dir2_block.c
> @@ -1108,7 +1108,7 @@ xfs_dir2_sf_to_block(
>  	 * Copy the directory into a temporary buffer.
>  	 * Then pitch the incore inode data so we can make extents.
>  	 */
> -	sfp = kmem_alloc(ifp->if_bytes, 0);
> +	sfp = kmalloc(ifp->if_bytes, GFP_KERNEL | __GFP_NOFAIL);
>  	memcpy(sfp, oldsfp, ifp->if_bytes);
>  
>  	xfs_idata_realloc(dp, -ifp->if_bytes, XFS_DATA_FORK);
> diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
> index e1f83fc7b6ad..7b1f41cff9e0 100644
> --- a/fs/xfs/libxfs/xfs_dir2_sf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_sf.c
> @@ -276,7 +276,7 @@ xfs_dir2_block_to_sf(
>  	 * format the data into.  Once we have formatted the data, we can free
>  	 * the block and copy the formatted data into the inode literal area.
>  	 */
> -	sfp = kmem_alloc(mp->m_sb.sb_inodesize, 0);
> +	sfp = kmalloc(mp->m_sb.sb_inodesize, GFP_KERNEL | __GFP_NOFAIL);
>  	memcpy(sfp, sfhp, xfs_dir2_sf_hdr_size(sfhp->i8count));
>  
>  	/*
> @@ -524,7 +524,7 @@ xfs_dir2_sf_addname_hard(
>  	 * Copy the old directory to the stack buffer.
>  	 */
>  	old_isize = (int)dp->i_disk_size;
> -	buf = kmem_alloc(old_isize, 0);
> +	buf = kmalloc(old_isize, GFP_KERNEL | __GFP_NOFAIL);
>  	oldsfp = (xfs_dir2_sf_hdr_t *)buf;
>  	memcpy(oldsfp, dp->i_df.if_data, old_isize);
>  	/*
> @@ -1151,7 +1151,7 @@ xfs_dir2_sf_toino4(
>  	 * Don't want xfs_idata_realloc copying the data here.
>  	 */
>  	oldsize = dp->i_df.if_bytes;
> -	buf = kmem_alloc(oldsize, 0);
> +	buf = kmalloc(oldsize, GFP_KERNEL | __GFP_NOFAIL);
>  	ASSERT(oldsfp->i8count == 1);
>  	memcpy(buf, oldsfp, oldsize);
>  	/*
> @@ -1223,7 +1223,7 @@ xfs_dir2_sf_toino8(
>  	 * Don't want xfs_idata_realloc copying the data here.
>  	 */
>  	oldsize = dp->i_df.if_bytes;
> -	buf = kmem_alloc(oldsize, 0);
> +	buf = kmalloc(oldsize, GFP_KERNEL | __GFP_NOFAIL);
>  	ASSERT(oldsfp->i8count == 0);
>  	memcpy(buf, oldsfp, oldsize);
>  	/*
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index f4569e18a8d0..f3cf7f933e15 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -50,7 +50,7 @@ xfs_init_local_fork(
>  		mem_size++;
>  
>  	if (size) {
> -		char *new_data = kmem_alloc(mem_size, KM_NOFS);
> +		char *new_data = kmalloc(mem_size, GFP_NOFS | __GFP_NOFAIL);
>  
>  		memcpy(new_data, data, size);
>  		if (zero_terminate)
> @@ -77,7 +77,7 @@ xfs_iformat_local(
>  	/*
>  	 * If the size is unreasonable, then something
>  	 * is wrong and we just bail out rather than crash in
> -	 * kmem_alloc() or memcpy() below.
> +	 * kmalloc() or memcpy() below.
>  	 */
>  	if (unlikely(size > XFS_DFORK_SIZE(dip, ip->i_mount, whichfork))) {
>  		xfs_warn(ip->i_mount,
> @@ -116,7 +116,7 @@ xfs_iformat_extents(
>  
>  	/*
>  	 * If the number of extents is unreasonable, then something is wrong and
> -	 * we just bail out rather than crash in kmem_alloc() or memcpy() below.
> +	 * we just bail out rather than crash in kmalloc() or memcpy() below.
>  	 */
>  	if (unlikely(size < 0 || size > XFS_DFORK_SIZE(dip, mp, whichfork))) {
>  		xfs_warn(ip->i_mount, "corrupt inode %llu ((a)extents = %llu).",
> @@ -205,7 +205,7 @@ xfs_iformat_btree(
>  	}
>  
>  	ifp->if_broot_bytes = size;
> -	ifp->if_broot = kmem_alloc(size, KM_NOFS);
> +	ifp->if_broot = kmalloc(size, GFP_NOFS | __GFP_NOFAIL);
>  	ASSERT(ifp->if_broot != NULL);
>  	/*
>  	 * Copy and convert from the on-disk structure
> @@ -399,7 +399,8 @@ xfs_iroot_realloc(
>  		 */
>  		if (ifp->if_broot_bytes == 0) {
>  			new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
> -			ifp->if_broot = kmem_alloc(new_size, KM_NOFS);
> +			ifp->if_broot = kmalloc(new_size,
> +						GFP_NOFS | __GFP_NOFAIL);
>  			ifp->if_broot_bytes = (int)new_size;
>  			return;
>  		}
> @@ -440,7 +441,7 @@ xfs_iroot_realloc(
>  	else
>  		new_size = 0;
>  	if (new_size > 0) {
> -		new_broot = kmem_alloc(new_size, KM_NOFS);
> +		new_broot = kmalloc(new_size, GFP_NOFS | __GFP_NOFAIL);
>  		/*
>  		 * First copy over the btree block header.
>  		 */
> @@ -488,7 +489,7 @@ xfs_iroot_realloc(
>   *
>   * If the amount of space needed has decreased below the size of the
>   * inline buffer, then switch to using the inline buffer.  Otherwise,
> - * use kmem_realloc() or kmem_alloc() to adjust the size of the buffer
> + * use krealloc() or kmalloc() to adjust the size of the buffer
>   * to what is needed.
>   *
>   * ip -- the inode whose if_data area is changing
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index e368ad671e26..5f7a44d21cc9 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -109,7 +109,7 @@ xfs_attr_shortform_list(
>  	 * It didn't all fit, so we have to sort everything on hashval.
>  	 */
>  	sbsize = sf->count * sizeof(*sbuf);
> -	sbp = sbuf = kmem_alloc(sbsize, KM_NOFS);
> +	sbp = sbuf = kmalloc(sbsize, GFP_NOFS | __GFP_NOFAIL);
>  
>  	/*
>  	 * Scan the attribute list for the rest of the entries, storing
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 710ea4c97122..c348af806616 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -325,14 +325,14 @@ xfs_buf_alloc_kmem(
>  	struct xfs_buf	*bp,
>  	xfs_buf_flags_t	flags)
>  {
> -	xfs_km_flags_t	kmflag_mask = KM_NOFS;
> +	gfp_t		gfp_mask = GFP_NOFS | __GFP_NOFAIL;
>  	size_t		size = BBTOB(bp->b_length);
>  
>  	/* Assure zeroed buffer for non-read cases. */
>  	if (!(flags & XBF_READ))
> -		kmflag_mask |= KM_ZERO;
> +		gfp_mask |= __GFP_ZERO;
>  
> -	bp->b_addr = kmem_alloc(size, kmflag_mask);
> +	bp->b_addr = kmalloc(size, gfp_mask);
>  	if (!bp->b_addr)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 43167f543afc..34776f4c05ac 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -85,7 +85,7 @@ xlog_add_buffer_cancelled(
>  		return false;
>  	}
>  
> -	bcp = kmem_alloc(sizeof(struct xfs_buf_cancel), 0);
> +	bcp = kmalloc(sizeof(struct xfs_buf_cancel), GFP_KERNEL | __GFP_NOFAIL);
>  	bcp->bc_blkno = blkno;
>  	bcp->bc_len = len;
>  	bcp->bc_refcount = 1;
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index 2fc98d313708..e2a3c8d3fe4f 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -313,7 +313,7 @@ xfs_filestream_create_association(
>  	 * we return a referenced AG, the allocation can still go ahead just
>  	 * fine.
>  	 */
> -	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
> +	item = kmalloc(sizeof(*item), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!item)
>  		goto out_put_fstrms;
>  
> diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> index 144198a6b270..5d7b937179a0 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -291,7 +291,8 @@ xlog_recover_inode_commit_pass2(
>  	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
>  		in_f = item->ri_buf[0].i_addr;
>  	} else {
> -		in_f = kmem_alloc(sizeof(struct xfs_inode_log_format), 0);
> +		in_f = kmalloc(sizeof(struct xfs_inode_log_format),
> +				GFP_KERNEL | __GFP_NOFAIL);
>  		need_free = 1;
>  		error = xfs_inode_item_format_convert(&item->ri_buf[0], in_f);
>  		if (error)
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index 8dbb7c054b28..5dd622aa54c5 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -160,7 +160,7 @@ xfs_iwalk_alloc(
>  
>  	/* Allocate a prefetch buffer for inobt records. */
>  	size = iwag->sz_recs * sizeof(struct xfs_inobt_rec_incore);
> -	iwag->recs = kmem_alloc(size, KM_MAYFAIL);
> +	iwag->recs = kmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (iwag->recs == NULL)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 4a27ecdbb546..e3bd503edcab 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2161,7 +2161,7 @@ xlog_recover_add_to_trans(
>  		return 0;
>  	}
>  
> -	ptr = kmem_alloc(len, 0);
> +	ptr = xlog_kvmalloc(len);
>  	memcpy(ptr, dp, len);
>  	in_f = (struct xfs_inode_log_format *)ptr;
>  
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index b9d11376c88a..b130bf49013b 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -997,7 +997,8 @@ xfs_qm_reset_dqcounts_buf(
>  	if (qip->i_nblocks == 0)
>  		return 0;
>  
> -	map = kmem_alloc(XFS_DQITER_MAP_SIZE * sizeof(*map), 0);
> +	map = kmalloc(XFS_DQITER_MAP_SIZE * sizeof(*map),
> +			GFP_KERNEL | __GFP_NOFAIL);
>  
>  	lblkno = 0;
>  	maxlblkcnt = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index 8649d981a097..8a8d6197203e 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -903,7 +903,7 @@ xfs_growfs_rt(
>  	/*
>  	 * Allocate a new (fake) mount/sb.
>  	 */
> -	nmp = kmem_alloc(sizeof(*nmp), 0);
> +	nmp = kmalloc(sizeof(*nmp), GFP_KERNEL | __GFP_NOFAIL);
>  	/*
>  	 * Loop over the bitmap blocks.
>  	 * We will do everything one bitmap block at a time.
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d0009430a627..7b1b29814be2 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1982,7 +1982,7 @@ static int xfs_init_fs_context(
>  {
>  	struct xfs_mount	*mp;
>  
> -	mp = kmem_alloc(sizeof(struct xfs_mount), KM_ZERO);
> +	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
>  	if (!mp)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 0984a1c884c7..c7e57efe0356 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4040,31 +4040,6 @@ TRACE_EVENT(xfs_pwork_init,
>  		  __entry->nr_threads, __entry->pid)
>  )
>  
> -DECLARE_EVENT_CLASS(xfs_kmem_class,
> -	TP_PROTO(ssize_t size, int flags, unsigned long caller_ip),
> -	TP_ARGS(size, flags, caller_ip),
> -	TP_STRUCT__entry(
> -		__field(ssize_t, size)
> -		__field(int, flags)
> -		__field(unsigned long, caller_ip)
> -	),
> -	TP_fast_assign(
> -		__entry->size = size;
> -		__entry->flags = flags;
> -		__entry->caller_ip = caller_ip;
> -	),
> -	TP_printk("size %zd flags 0x%x caller %pS",
> -		  __entry->size,
> -		  __entry->flags,
> -		  (char *)__entry->caller_ip)
> -)
> -
> -#define DEFINE_KMEM_EVENT(name) \
> -DEFINE_EVENT(xfs_kmem_class, name, \
> -	TP_PROTO(ssize_t size, int flags, unsigned long caller_ip), \
> -	TP_ARGS(size, flags, caller_ip))
> -DEFINE_KMEM_EVENT(kmem_alloc);
> -
>  TRACE_EVENT(xfs_check_new_dalign,
>  	TP_PROTO(struct xfs_mount *mp, int new_dalign, xfs_ino_t calc_rootino),
>  	TP_ARGS(mp, new_dalign, calc_rootino),
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 03/12] xfs: move kmem_to_page()
  2024-01-15 22:59 ` [PATCH 03/12] xfs: move kmem_to_page() Dave Chinner
@ 2024-01-18 22:50   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 22:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:41AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Move it to the general xfs linux wrapper header file so we can
> prepare to remove kmem.h
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
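
For context, XFS IO paths can be handed buffers that are either slab
(kmalloc) or vmalloc backed, and virt_to_page() is only valid for the
former; that is why kmem_to_page() picks between virt_to_page() and
vmalloc_to_page(). A sketch of the kind of caller that relies on it
(hypothetical helper name, error handling omitted):

#include <linux/bio.h>
#include <linux/mm.h>
/* kmem_to_page() comes from xfs_linux.h after this patch */

/*
 * Add a small kernel buffer to a bio regardless of how it was
 * allocated.  Assumes the range does not cross a page boundary;
 * real callers loop over the buffer a page at a time.
 */
static void
bio_add_kmem_buffer(struct bio *bio, void *data, unsigned int len)
{
	struct page	*page = kmem_to_page(data);
	unsigned int	off = offset_in_page(data);

	bio_add_page(bio, page, len, off);
}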

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/kmem.h      | 11 -----------
>  fs/xfs/xfs_linux.h | 11 +++++++++++
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index 1343f1a6f99b..48e43f29f2a0 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -20,15 +20,4 @@ static inline void  kmem_free(const void *ptr)
>  	kvfree(ptr);
>  }
>  
> -/*
> - * Zone interfaces
> - */
> -static inline struct page *
> -kmem_to_page(void *addr)
> -{
> -	if (is_vmalloc_addr(addr))
> -		return vmalloc_to_page(addr);
> -	return virt_to_page(addr);
> -}
> -
>  #endif /* __XFS_SUPPORT_KMEM_H__ */
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index d7873e0360f0..666618b463c9 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -269,4 +269,15 @@ int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count,
>  # define PTR_FMT "%p"
>  #endif
>  
> +/*
> + * Helper for IO routines to grab backing pages from allocated kernel memory.
> + */
> +static inline struct page *
> +kmem_to_page(void *addr)
> +{
> +	if (is_vmalloc_addr(addr))
> +		return vmalloc_to_page(addr);
> +	return virt_to_page(addr);
> +}
> +
>  #endif /* __XFS_LINUX__ */
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/12] xfs: convert kmem_free() for kvmalloc users to kvfree()
  2024-01-15 22:59 ` [PATCH 04/12] xfs: convert kmem_free() for kvmalloc users to kvfree() Dave Chinner
@ 2024-01-18 22:53   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 22:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:42AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Start getting rid of kmem_free() by converting all the cases where
> memory can come from vmalloc interfaces to calling kvfree()
> directly.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
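
Many of the objects touched below (for example the log item
li_lv_shadow buffers, allocated via xlog_kvmalloc() in
xlog_cil_alloc_shadow_bufs()) may be vmalloc rather than slab backed,
so kvfree() is the right call: it checks the address and dispatches to
vfree() or kfree() as appropriate. A minimal, illustrative sketch of
the distinction:

#include <linux/mm.h>
#include <linux/slab.h>

/*
 * A buffer that may come from either allocator must be freed with
 * kvfree(); kfree() is only valid for slab memory and vfree() only
 * for vmalloc memory.
 */
static void *
alloc_shadow_buffer(size_t size)
{
	/* kmalloc first, vmalloc fallback for large sizes */
	return kvmalloc(size, GFP_KERNEL);
}

static void
free_shadow_buffer(void *buf)
{
	kvfree(buf);	/* dispatches on is_vmalloc_addr(buf) */
}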

Looks fine to me!

Just as a warning, there's some dumb bot out there that will flag
"unnecessary" use of kvfree where kfree could be used.  I choose to
ignore that bot because who gives a f***, but that's the state of things
now. :(

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_acl.c           |  4 ++--
>  fs/xfs/xfs_attr_item.c     |  4 ++--
>  fs/xfs/xfs_bmap_item.c     |  4 ++--
>  fs/xfs/xfs_buf_item.c      |  2 +-
>  fs/xfs/xfs_dquot.c         |  2 +-
>  fs/xfs/xfs_extfree_item.c  |  4 ++--
>  fs/xfs/xfs_icreate_item.c  |  2 +-
>  fs/xfs/xfs_inode_item.c    |  2 +-
>  fs/xfs/xfs_ioctl.c         |  2 +-
>  fs/xfs/xfs_log.c           |  4 ++--
>  fs/xfs/xfs_log_cil.c       |  2 +-
>  fs/xfs/xfs_log_recover.c   | 42 +++++++++++++++++++-------------------
>  fs/xfs/xfs_refcount_item.c |  4 ++--
>  fs/xfs/xfs_rmap_item.c     |  4 ++--
>  fs/xfs/xfs_rtalloc.c       |  6 +++---
>  15 files changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index 6b840301817a..4bf69c9c088e 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -167,7 +167,7 @@ xfs_get_acl(struct inode *inode, int type, bool rcu)
>  		acl = ERR_PTR(error);
>  	}
>  
> -	kmem_free(args.value);
> +	kvfree(args.value);
>  	return acl;
>  }
>  
> @@ -204,7 +204,7 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  	}
>  
>  	error = xfs_attr_change(&args);
> -	kmem_free(args.value);
> +	kvfree(args.value);
>  
>  	/*
>  	 * If the attribute didn't exist to start with that's fine.
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 2e454a0d6f19..f7ba80d575d4 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -108,7 +108,7 @@ STATIC void
>  xfs_attri_item_free(
>  	struct xfs_attri_log_item	*attrip)
>  {
> -	kmem_free(attrip->attri_item.li_lv_shadow);
> +	kvfree(attrip->attri_item.li_lv_shadow);
>  	xfs_attri_log_nameval_put(attrip->attri_nameval);
>  	kmem_cache_free(xfs_attri_cache, attrip);
>  }
> @@ -251,7 +251,7 @@ static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
>  STATIC void
>  xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>  {
> -	kmem_free(attrdp->attrd_item.li_lv_shadow);
> +	kvfree(attrdp->attrd_item.li_lv_shadow);
>  	kmem_cache_free(xfs_attrd_cache, attrdp);
>  }
>  
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 52fb8a148b7d..029a6a8d0efd 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -40,7 +40,7 @@ STATIC void
>  xfs_bui_item_free(
>  	struct xfs_bui_log_item	*buip)
>  {
> -	kmem_free(buip->bui_item.li_lv_shadow);
> +	kvfree(buip->bui_item.li_lv_shadow);
>  	kmem_cache_free(xfs_bui_cache, buip);
>  }
>  
> @@ -201,7 +201,7 @@ xfs_bud_item_release(
>  	struct xfs_bud_log_item	*budp = BUD_ITEM(lip);
>  
>  	xfs_bui_release(budp->bud_buip);
> -	kmem_free(budp->bud_item.li_lv_shadow);
> +	kvfree(budp->bud_item.li_lv_shadow);
>  	kmem_cache_free(xfs_bud_cache, budp);
>  }
>  
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index ec93d34188c8..545040c6ae87 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1044,7 +1044,7 @@ xfs_buf_item_free(
>  	struct xfs_buf_log_item	*bip)
>  {
>  	xfs_buf_item_free_format(bip);
> -	kmem_free(bip->bli_item.li_lv_shadow);
> +	kvfree(bip->bli_item.li_lv_shadow);
>  	kmem_cache_free(xfs_buf_item_cache, bip);
>  }
>  
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index a93ad76f23c5..17c82f5e783c 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -53,7 +53,7 @@ xfs_qm_dqdestroy(
>  {
>  	ASSERT(list_empty(&dqp->q_lru));
>  
> -	kmem_free(dqp->q_logitem.qli_item.li_lv_shadow);
> +	kvfree(dqp->q_logitem.qli_item.li_lv_shadow);
>  	mutex_destroy(&dqp->q_qlock);
>  
>  	XFS_STATS_DEC(dqp->q_mount, xs_qm_dquot);
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 1d1185fca6a5..6062703a2723 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -40,7 +40,7 @@ STATIC void
>  xfs_efi_item_free(
>  	struct xfs_efi_log_item	*efip)
>  {
> -	kmem_free(efip->efi_item.li_lv_shadow);
> +	kvfree(efip->efi_item.li_lv_shadow);
>  	if (efip->efi_format.efi_nextents > XFS_EFI_MAX_FAST_EXTENTS)
>  		kmem_free(efip);
>  	else
> @@ -229,7 +229,7 @@ static inline struct xfs_efd_log_item *EFD_ITEM(struct xfs_log_item *lip)
>  STATIC void
>  xfs_efd_item_free(struct xfs_efd_log_item *efdp)
>  {
> -	kmem_free(efdp->efd_item.li_lv_shadow);
> +	kvfree(efdp->efd_item.li_lv_shadow);
>  	if (efdp->efd_format.efd_nextents > XFS_EFD_MAX_FAST_EXTENTS)
>  		kmem_free(efdp);
>  	else
> diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c
> index b05314d48176..4345db501714 100644
> --- a/fs/xfs/xfs_icreate_item.c
> +++ b/fs/xfs/xfs_icreate_item.c
> @@ -63,7 +63,7 @@ STATIC void
>  xfs_icreate_item_release(
>  	struct xfs_log_item	*lip)
>  {
> -	kmem_free(ICR_ITEM(lip)->ic_item.li_lv_shadow);
> +	kvfree(ICR_ITEM(lip)->ic_item.li_lv_shadow);
>  	kmem_cache_free(xfs_icreate_cache, ICR_ITEM(lip));
>  }
>  
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 0aee97ba0be8..bfbeafc8e120 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -856,7 +856,7 @@ xfs_inode_item_destroy(
>  	ASSERT(iip->ili_item.li_buf == NULL);
>  
>  	ip->i_itemp = NULL;
> -	kmem_free(iip->ili_item.li_lv_shadow);
> +	kvfree(iip->ili_item.li_lv_shadow);
>  	kmem_cache_free(xfs_ili_cache, iip);
>  }
>  
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index f02b6e558af5..45fb169bd819 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -493,7 +493,7 @@ xfs_attrmulti_attr_get(
>  		error = -EFAULT;
>  
>  out_kfree:
> -	kmem_free(args.value);
> +	kvfree(args.value);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index d38cfaadc726..0009ffbec932 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1662,7 +1662,7 @@ xlog_alloc_log(
>  out_free_iclog:
>  	for (iclog = log->l_iclog; iclog; iclog = prev_iclog) {
>  		prev_iclog = iclog->ic_next;
> -		kmem_free(iclog->ic_data);
> +		kvfree(iclog->ic_data);
>  		kmem_free(iclog);
>  		if (prev_iclog == log->l_iclog)
>  			break;
> @@ -2119,7 +2119,7 @@ xlog_dealloc_log(
>  	iclog = log->l_iclog;
>  	for (i = 0; i < log->l_iclog_bufs; i++) {
>  		next_iclog = iclog->ic_next;
> -		kmem_free(iclog->ic_data);
> +		kvfree(iclog->ic_data);
>  		kmem_free(iclog);
>  		iclog = next_iclog;
>  	}
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 3c705f22b0ab..2c0512916cc9 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -339,7 +339,7 @@ xlog_cil_alloc_shadow_bufs(
>  			 * the buffer, only the log vector header and the iovec
>  			 * storage.
>  			 */
> -			kmem_free(lip->li_lv_shadow);
> +			kvfree(lip->li_lv_shadow);
>  			lv = xlog_kvmalloc(buf_size);
>  
>  			memset(lv, 0, xlog_cil_iovec_space(niovecs));
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index e3bd503edcab..295306ef6959 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -361,7 +361,7 @@ xlog_find_verify_cycle(
>  	*new_blk = -1;
>  
>  out:
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  	return error;
>  }
>  
> @@ -477,7 +477,7 @@ xlog_find_verify_log_record(
>  		*last_blk = i;
>  
>  out:
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  	return error;
>  }
>  
> @@ -731,7 +731,7 @@ xlog_find_head(
>  			goto out_free_buffer;
>  	}
>  
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  	if (head_blk == log_bbnum)
>  		*return_head_blk = 0;
>  	else
> @@ -745,7 +745,7 @@ xlog_find_head(
>  	return 0;
>  
>  out_free_buffer:
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  	if (error)
>  		xfs_warn(log->l_mp, "failed to find log head");
>  	return error;
> @@ -999,7 +999,7 @@ xlog_verify_tail(
>  		"Tail block (0x%llx) overwrite detected. Updated to 0x%llx",
>  			 orig_tail, *tail_blk);
>  out:
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  	return error;
>  }
>  
> @@ -1046,7 +1046,7 @@ xlog_verify_head(
>  	error = xlog_rseek_logrec_hdr(log, *head_blk, *tail_blk,
>  				      XLOG_MAX_ICLOGS, tmp_buffer,
>  				      &tmp_rhead_blk, &tmp_rhead, &tmp_wrapped);
> -	kmem_free(tmp_buffer);
> +	kvfree(tmp_buffer);
>  	if (error < 0)
>  		return error;
>  
> @@ -1365,7 +1365,7 @@ xlog_find_tail(
>  		error = xlog_clear_stale_blocks(log, tail_lsn);
>  
>  done:
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  
>  	if (error)
>  		xfs_warn(log->l_mp, "failed to locate log tail");
> @@ -1399,6 +1399,7 @@ xlog_find_zeroed(
>  	xfs_daddr_t	new_blk, last_blk, start_blk;
>  	xfs_daddr_t     num_scan_bblks;
>  	int	        error, log_bbnum = log->l_logBBsize;
> +	int		ret = 1;
>  
>  	*blk_no = 0;
>  
> @@ -1413,8 +1414,7 @@ xlog_find_zeroed(
>  	first_cycle = xlog_get_cycle(offset);
>  	if (first_cycle == 0) {		/* completely zeroed log */
>  		*blk_no = 0;
> -		kmem_free(buffer);
> -		return 1;
> +		goto out_free_buffer;
>  	}
>  
>  	/* check partially zeroed log */
> @@ -1424,8 +1424,8 @@ xlog_find_zeroed(
>  
>  	last_cycle = xlog_get_cycle(offset);
>  	if (last_cycle != 0) {		/* log completely written to */
> -		kmem_free(buffer);
> -		return 0;
> +		ret = 0;
> +		goto out_free_buffer;
>  	}
>  
>  	/* we have a partially zeroed log */
> @@ -1471,10 +1471,10 @@ xlog_find_zeroed(
>  
>  	*blk_no = last_blk;
>  out_free_buffer:
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  	if (error)
>  		return error;
> -	return 1;
> +	return ret;
>  }
>  
>  /*
> @@ -1583,7 +1583,7 @@ xlog_write_log_records(
>  	}
>  
>  out_free_buffer:
> -	kmem_free(buffer);
> +	kvfree(buffer);
>  	return error;
>  }
>  
> @@ -2183,7 +2183,7 @@ xlog_recover_add_to_trans(
>  		"bad number of regions (%d) in inode log format",
>  				  in_f->ilf_size);
>  			ASSERT(0);
> -			kmem_free(ptr);
> +			kvfree(ptr);
>  			return -EFSCORRUPTED;
>  		}
>  
> @@ -2197,7 +2197,7 @@ xlog_recover_add_to_trans(
>  	"log item region count (%d) overflowed size (%d)",
>  				item->ri_cnt, item->ri_total);
>  		ASSERT(0);
> -		kmem_free(ptr);
> +		kvfree(ptr);
>  		return -EFSCORRUPTED;
>  	}
>  
> @@ -2227,7 +2227,7 @@ xlog_recover_free_trans(
>  		/* Free the regions in the item. */
>  		list_del(&item->ri_list);
>  		for (i = 0; i < item->ri_cnt; i++)
> -			kmem_free(item->ri_buf[i].i_addr);
> +			kvfree(item->ri_buf[i].i_addr);
>  		/* Free the item itself */
>  		kmem_free(item->ri_buf);
>  		kmem_free(item);
> @@ -3024,7 +3024,7 @@ xlog_do_recovery_pass(
>  
>  		hblks = xlog_logrec_hblks(log, rhead);
>  		if (hblks != 1) {
> -			kmem_free(hbp);
> +			kvfree(hbp);
>  			hbp = xlog_alloc_buffer(log, hblks);
>  		}
>  	} else {
> @@ -3038,7 +3038,7 @@ xlog_do_recovery_pass(
>  		return -ENOMEM;
>  	dbp = xlog_alloc_buffer(log, BTOBB(h_size));
>  	if (!dbp) {
> -		kmem_free(hbp);
> +		kvfree(hbp);
>  		return -ENOMEM;
>  	}
>  
> @@ -3199,9 +3199,9 @@ xlog_do_recovery_pass(
>  	}
>  
>   bread_err2:
> -	kmem_free(dbp);
> +	kvfree(dbp);
>   bread_err1:
> -	kmem_free(hbp);
> +	kvfree(hbp);
>  
>  	/*
>  	 * Submit buffers that have been added from the last record processed,
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 78d0cda60abf..a9b322e23cfb 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -36,7 +36,7 @@ STATIC void
>  xfs_cui_item_free(
>  	struct xfs_cui_log_item	*cuip)
>  {
> -	kmem_free(cuip->cui_item.li_lv_shadow);
> +	kvfree(cuip->cui_item.li_lv_shadow);
>  	if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS)
>  		kmem_free(cuip);
>  	else
> @@ -207,7 +207,7 @@ xfs_cud_item_release(
>  	struct xfs_cud_log_item	*cudp = CUD_ITEM(lip);
>  
>  	xfs_cui_release(cudp->cud_cuip);
> -	kmem_free(cudp->cud_item.li_lv_shadow);
> +	kvfree(cudp->cud_item.li_lv_shadow);
>  	kmem_cache_free(xfs_cud_cache, cudp);
>  }
>  
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 31a921fc34b2..489ca8c0e1dc 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -36,7 +36,7 @@ STATIC void
>  xfs_rui_item_free(
>  	struct xfs_rui_log_item	*ruip)
>  {
> -	kmem_free(ruip->rui_item.li_lv_shadow);
> +	kvfree(ruip->rui_item.li_lv_shadow);
>  	if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS)
>  		kmem_free(ruip);
>  	else
> @@ -206,7 +206,7 @@ xfs_rud_item_release(
>  	struct xfs_rud_log_item	*rudp = RUD_ITEM(lip);
>  
>  	xfs_rui_release(rudp->rud_ruip);
> -	kmem_free(rudp->rud_item.li_lv_shadow);
> +	kvfree(rudp->rud_item.li_lv_shadow);
>  	kmem_cache_free(xfs_rud_cache, rudp);
>  }
>  
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index 8a8d6197203e..57ed9baaf156 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -1059,10 +1059,10 @@ xfs_growfs_rt(
>  	 */
>  	if (rsum_cache != mp->m_rsum_cache) {
>  		if (error) {
> -			kmem_free(mp->m_rsum_cache);
> +			kvfree(mp->m_rsum_cache);
>  			mp->m_rsum_cache = rsum_cache;
>  		} else {
> -			kmem_free(rsum_cache);
> +			kvfree(rsum_cache);
>  		}
>  	}
>  
> @@ -1233,7 +1233,7 @@ void
>  xfs_rtunmount_inodes(
>  	struct xfs_mount	*mp)
>  {
> -	kmem_free(mp->m_rsum_cache);
> +	kvfree(mp->m_rsum_cache);
>  	if (mp->m_rbmip)
>  		xfs_irele(mp->m_rbmip);
>  	if (mp->m_rsumip)
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 05/12] xfs: convert remaining kmem_free() to kfree()
  2024-01-15 22:59 ` [PATCH 05/12] xfs: convert remaining kmem_free() to kfree() Dave Chinner
@ 2024-01-18 22:54   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 22:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:43AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The remaining callers of kmem_free() are freeing heap memory, so
> we can convert them directly to kfree() and get rid of kmem_free()
> altogether.
> 
> This conversion was done with:
> 
> $ for f in `git grep -l kmem_free fs/xfs`; do
> > sed -i s/kmem_free/kfree/ $f
> > done
> $
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

/me bets this will cause interesting merge conflicts with the online
repair patchset, but that's no reason to slow this down.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/kmem.h                     | 23 -----------------------
>  fs/xfs/libxfs/xfs_ag.c            |  6 +++---
>  fs/xfs/libxfs/xfs_attr_leaf.c     |  8 ++++----
>  fs/xfs/libxfs/xfs_btree.c         |  2 +-
>  fs/xfs/libxfs/xfs_btree_staging.c |  4 ++--
>  fs/xfs/libxfs/xfs_da_btree.c      | 10 +++++-----
>  fs/xfs/libxfs/xfs_defer.c         |  4 ++--
>  fs/xfs/libxfs/xfs_dir2.c          | 18 +++++++++---------
>  fs/xfs/libxfs/xfs_dir2_block.c    |  4 ++--
>  fs/xfs/libxfs/xfs_dir2_sf.c       |  8 ++++----
>  fs/xfs/libxfs/xfs_iext_tree.c     |  8 ++++----
>  fs/xfs/libxfs/xfs_inode_fork.c    |  6 +++---
>  fs/xfs/scrub/cow_repair.c         |  2 +-
>  fs/xfs/xfs_attr_item.c            |  2 +-
>  fs/xfs/xfs_attr_list.c            |  4 ++--
>  fs/xfs/xfs_buf.c                  | 12 ++++++------
>  fs/xfs/xfs_buf_item.c             |  2 +-
>  fs/xfs/xfs_buf_item_recover.c     |  6 +++---
>  fs/xfs/xfs_discard.c              |  2 +-
>  fs/xfs/xfs_error.c                |  4 ++--
>  fs/xfs/xfs_extent_busy.c          |  2 +-
>  fs/xfs/xfs_extfree_item.c         |  4 ++--
>  fs/xfs/xfs_filestream.c           |  4 ++--
>  fs/xfs/xfs_inode.c                |  4 ++--
>  fs/xfs/xfs_inode_item_recover.c   |  2 +-
>  fs/xfs/xfs_ioctl.c                |  6 +++---
>  fs/xfs/xfs_iops.c                 |  2 +-
>  fs/xfs/xfs_itable.c               |  4 ++--
>  fs/xfs/xfs_iwalk.c                |  4 ++--
>  fs/xfs/xfs_linux.h                |  3 +--
>  fs/xfs/xfs_log.c                  |  8 ++++----
>  fs/xfs/xfs_log_cil.c              | 14 +++++++-------
>  fs/xfs/xfs_log_recover.c          |  6 +++---
>  fs/xfs/xfs_mount.c                |  2 +-
>  fs/xfs/xfs_mru_cache.c            |  8 ++++----
>  fs/xfs/xfs_qm.c                   |  6 +++---
>  fs/xfs/xfs_refcount_item.c        |  2 +-
>  fs/xfs/xfs_rmap_item.c            |  2 +-
>  fs/xfs/xfs_rtalloc.c              |  2 +-
>  fs/xfs/xfs_super.c                |  2 +-
>  fs/xfs/xfs_trans_ail.c            |  4 ++--
>  41 files changed, 101 insertions(+), 125 deletions(-)
>  delete mode 100644 fs/xfs/kmem.h
> 
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> deleted file mode 100644
> index 48e43f29f2a0..000000000000
> --- a/fs/xfs/kmem.h
> +++ /dev/null
> @@ -1,23 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -/*
> - * Copyright (c) 2000-2005 Silicon Graphics, Inc.
> - * All Rights Reserved.
> - */
> -#ifndef __XFS_SUPPORT_KMEM_H__
> -#define __XFS_SUPPORT_KMEM_H__
> -
> -#include <linux/slab.h>
> -#include <linux/sched.h>
> -#include <linux/mm.h>
> -#include <linux/vmalloc.h>
> -
> -/*
> - * General memory allocation interfaces
> - */
> -
> -static inline void  kmem_free(const void *ptr)
> -{
> -	kvfree(ptr);
> -}
> -
> -#endif /* __XFS_SUPPORT_KMEM_H__ */
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 96a6bfd58931..937ea48d5cc0 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -241,7 +241,7 @@ __xfs_free_perag(
>  	struct xfs_perag *pag = container_of(head, struct xfs_perag, rcu_head);
>  
>  	ASSERT(!delayed_work_pending(&pag->pag_blockgc_work));
> -	kmem_free(pag);
> +	kfree(pag);
>  }
>  
>  /*
> @@ -353,7 +353,7 @@ xfs_free_unused_perag_range(
>  			break;
>  		xfs_buf_hash_destroy(pag);
>  		xfs_defer_drain_free(&pag->pag_intents_drain);
> -		kmem_free(pag);
> +		kfree(pag);
>  	}
>  }
>  
> @@ -453,7 +453,7 @@ xfs_initialize_perag(
>  	radix_tree_delete(&mp->m_perag_tree, index);
>  	spin_unlock(&mp->m_perag_lock);
>  out_free_pag:
> -	kmem_free(pag);
> +	kfree(pag);
>  out_unwind_new_pags:
>  	/* unwind any prior newly initialized pags */
>  	xfs_free_unused_perag_range(mp, first_initialised, agcount);
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index 033382cf514d..192d9938a231 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -923,7 +923,7 @@ xfs_attr_shortform_to_leaf(
>  	}
>  	error = 0;
>  out:
> -	kmem_free(tmpbuffer);
> +	kfree(tmpbuffer);
>  	return error;
>  }
>  
> @@ -1124,7 +1124,7 @@ xfs_attr3_leaf_to_shortform(
>  	error = 0;
>  
>  out:
> -	kmem_free(tmpbuffer);
> +	kfree(tmpbuffer);
>  	return error;
>  }
>  
> @@ -1570,7 +1570,7 @@ xfs_attr3_leaf_compact(
>  	 */
>  	xfs_trans_log_buf(trans, bp, 0, args->geo->blksize - 1);
>  
> -	kmem_free(tmpbuffer);
> +	kfree(tmpbuffer);
>  }
>  
>  /*
> @@ -2290,7 +2290,7 @@ xfs_attr3_leaf_unbalance(
>  		}
>  		memcpy(save_leaf, tmp_leaf, state->args->geo->blksize);
>  		savehdr = tmphdr; /* struct copy */
> -		kmem_free(tmp_leaf);
> +		kfree(tmp_leaf);
>  	}
>  
>  	xfs_attr3_leaf_hdr_to_disk(state->args->geo, save_leaf, &savehdr);
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index ea8d3659df20..1adfc35c99c9 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -451,7 +451,7 @@ xfs_btree_del_cursor(
>  	ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 ||
>  	       xfs_is_shutdown(cur->bc_mp) || error != 0);
>  	if (unlikely(cur->bc_flags & XFS_BTREE_STAGING))
> -		kmem_free(cur->bc_ops);
> +		kfree(cur->bc_ops);
>  	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
>  		xfs_perag_put(cur->bc_ag.pag);
>  	kmem_cache_free(cur->bc_cache, cur);
> diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
> index 065e4a00a2f4..961f6b898f4b 100644
> --- a/fs/xfs/libxfs/xfs_btree_staging.c
> +++ b/fs/xfs/libxfs/xfs_btree_staging.c
> @@ -171,7 +171,7 @@ xfs_btree_commit_afakeroot(
>  
>  	trace_xfs_btree_commit_afakeroot(cur);
>  
> -	kmem_free((void *)cur->bc_ops);
> +	kfree((void *)cur->bc_ops);
>  	cur->bc_ag.agbp = agbp;
>  	cur->bc_ops = ops;
>  	cur->bc_flags &= ~XFS_BTREE_STAGING;
> @@ -254,7 +254,7 @@ xfs_btree_commit_ifakeroot(
>  
>  	trace_xfs_btree_commit_ifakeroot(cur);
>  
> -	kmem_free((void *)cur->bc_ops);
> +	kfree((void *)cur->bc_ops);
>  	cur->bc_ino.ifake = NULL;
>  	cur->bc_ino.whichfork = whichfork;
>  	cur->bc_ops = ops;
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 331b9251b185..3383b4525381 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -2220,7 +2220,7 @@ xfs_da_grow_inode_int(
>  
>  out_free_map:
>  	if (mapp != &map)
> -		kmem_free(mapp);
> +		kfree(mapp);
>  	return error;
>  }
>  
> @@ -2559,7 +2559,7 @@ xfs_dabuf_map(
>  	*nmaps = nirecs;
>  out_free_irecs:
>  	if (irecs != &irec)
> -		kmem_free(irecs);
> +		kfree(irecs);
>  	return error;
>  
>  invalid_mapping:
> @@ -2615,7 +2615,7 @@ xfs_da_get_buf(
>  
>  out_free:
>  	if (mapp != &map)
> -		kmem_free(mapp);
> +		kfree(mapp);
>  
>  	return error;
>  }
> @@ -2656,7 +2656,7 @@ xfs_da_read_buf(
>  	*bpp = bp;
>  out_free:
>  	if (mapp != &map)
> -		kmem_free(mapp);
> +		kfree(mapp);
>  
>  	return error;
>  }
> @@ -2687,7 +2687,7 @@ xfs_da_reada_buf(
>  
>  out_free:
>  	if (mapp != &map)
> -		kmem_free(mapp);
> +		kfree(mapp);
>  
>  	return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 07d318b1f807..75689c151a54 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -1038,7 +1038,7 @@ xfs_defer_ops_capture_abort(
>  	for (i = 0; i < dfc->dfc_held.dr_inos; i++)
>  		xfs_irele(dfc->dfc_held.dr_ip[i]);
>  
> -	kmem_free(dfc);
> +	kfree(dfc);
>  }
>  
>  /*
> @@ -1114,7 +1114,7 @@ xfs_defer_ops_continue(
>  	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
>  	tp->t_flags |= dfc->dfc_tpflags;
>  
> -	kmem_free(dfc);
> +	kfree(dfc);
>  }
>  
>  /* Release the resources captured and continued during recovery. */
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index 370d67300455..e60aa8f8d0a7 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -109,8 +109,8 @@ xfs_da_mount(
>  	mp->m_attr_geo = kzalloc(sizeof(struct xfs_da_geometry),
>  				GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  	if (!mp->m_dir_geo || !mp->m_attr_geo) {
> -		kmem_free(mp->m_dir_geo);
> -		kmem_free(mp->m_attr_geo);
> +		kfree(mp->m_dir_geo);
> +		kfree(mp->m_attr_geo);
>  		return -ENOMEM;
>  	}
>  
> @@ -178,8 +178,8 @@ void
>  xfs_da_unmount(
>  	struct xfs_mount	*mp)
>  {
> -	kmem_free(mp->m_dir_geo);
> -	kmem_free(mp->m_attr_geo);
> +	kfree(mp->m_dir_geo);
> +	kfree(mp->m_attr_geo);
>  }
>  
>  /*
> @@ -244,7 +244,7 @@ xfs_dir_init(
>  	args->dp = dp;
>  	args->trans = tp;
>  	error = xfs_dir2_sf_create(args, pdp->i_ino);
> -	kmem_free(args);
> +	kfree(args);
>  	return error;
>  }
>  
> @@ -313,7 +313,7 @@ xfs_dir_createname(
>  		rval = xfs_dir2_node_addname(args);
>  
>  out_free:
> -	kmem_free(args);
> +	kfree(args);
>  	return rval;
>  }
>  
> @@ -419,7 +419,7 @@ xfs_dir_lookup(
>  	}
>  out_free:
>  	xfs_iunlock(dp, lock_mode);
> -	kmem_free(args);
> +	kfree(args);
>  	return rval;
>  }
>  
> @@ -477,7 +477,7 @@ xfs_dir_removename(
>  	else
>  		rval = xfs_dir2_node_removename(args);
>  out_free:
> -	kmem_free(args);
> +	kfree(args);
>  	return rval;
>  }
>  
> @@ -538,7 +538,7 @@ xfs_dir_replace(
>  	else
>  		rval = xfs_dir2_node_replace(args);
>  out_free:
> -	kmem_free(args);
> +	kfree(args);
>  	return rval;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
> index 506c65caaec5..fde46081a824 100644
> --- a/fs/xfs/libxfs/xfs_dir2_block.c
> +++ b/fs/xfs/libxfs/xfs_dir2_block.c
> @@ -1253,7 +1253,7 @@ xfs_dir2_sf_to_block(
>  			sfep = xfs_dir2_sf_nextentry(mp, sfp, sfep);
>  	}
>  	/* Done with the temporary buffer */
> -	kmem_free(sfp);
> +	kfree(sfp);
>  	/*
>  	 * Sort the leaf entries by hash value.
>  	 */
> @@ -1268,6 +1268,6 @@ xfs_dir2_sf_to_block(
>  	xfs_dir3_data_check(dp, bp);
>  	return 0;
>  out_free:
> -	kmem_free(sfp);
> +	kfree(sfp);
>  	return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
> index 7b1f41cff9e0..17a20384c8b7 100644
> --- a/fs/xfs/libxfs/xfs_dir2_sf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_sf.c
> @@ -350,7 +350,7 @@ xfs_dir2_block_to_sf(
>  	xfs_dir2_sf_check(args);
>  out:
>  	xfs_trans_log_inode(args->trans, dp, logflags);
> -	kmem_free(sfp);
> +	kfree(sfp);
>  	return error;
>  }
>  
> @@ -576,7 +576,7 @@ xfs_dir2_sf_addname_hard(
>  		sfep = xfs_dir2_sf_nextentry(mp, sfp, sfep);
>  		memcpy(sfep, oldsfep, old_isize - nbytes);
>  	}
> -	kmem_free(buf);
> +	kfree(buf);
>  	dp->i_disk_size = new_isize;
>  	xfs_dir2_sf_check(args);
>  }
> @@ -1190,7 +1190,7 @@ xfs_dir2_sf_toino4(
>  	/*
>  	 * Clean up the inode.
>  	 */
> -	kmem_free(buf);
> +	kfree(buf);
>  	dp->i_disk_size = newsize;
>  	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_DDATA);
>  }
> @@ -1262,7 +1262,7 @@ xfs_dir2_sf_toino8(
>  	/*
>  	 * Clean up the inode.
>  	 */
> -	kmem_free(buf);
> +	kfree(buf);
>  	dp->i_disk_size = newsize;
>  	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_DDATA);
>  }
> diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
> index 4522f3c7a23f..16f18b08fe4c 100644
> --- a/fs/xfs/libxfs/xfs_iext_tree.c
> +++ b/fs/xfs/libxfs/xfs_iext_tree.c
> @@ -747,7 +747,7 @@ xfs_iext_remove_node(
>  again:
>  	ASSERT(node->ptrs[pos]);
>  	ASSERT(node->ptrs[pos] == victim);
> -	kmem_free(victim);
> +	kfree(victim);
>  
>  	nr_entries = xfs_iext_node_nr_entries(node, pos) - 1;
>  	offset = node->keys[0];
> @@ -793,7 +793,7 @@ xfs_iext_remove_node(
>  		ASSERT(node == ifp->if_data);
>  		ifp->if_data = node->ptrs[0];
>  		ifp->if_height--;
> -		kmem_free(node);
> +		kfree(node);
>  	}
>  }
>  
> @@ -867,7 +867,7 @@ xfs_iext_free_last_leaf(
>  	struct xfs_ifork	*ifp)
>  {
>  	ifp->if_height--;
> -	kmem_free(ifp->if_data);
> +	kfree(ifp->if_data);
>  	ifp->if_data = NULL;
>  }
>  
> @@ -1048,7 +1048,7 @@ xfs_iext_destroy_node(
>  		}
>  	}
>  
> -	kmem_free(node);
> +	kfree(node);
>  }
>  
>  void
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index f3cf7f933e15..f6d5b86b608d 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -471,7 +471,7 @@ xfs_iroot_realloc(
>  						     (int)new_size);
>  		memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t));
>  	}
> -	kmem_free(ifp->if_broot);
> +	kfree(ifp->if_broot);
>  	ifp->if_broot = new_broot;
>  	ifp->if_broot_bytes = (int)new_size;
>  	if (ifp->if_broot)
> @@ -525,13 +525,13 @@ xfs_idestroy_fork(
>  	struct xfs_ifork	*ifp)
>  {
>  	if (ifp->if_broot != NULL) {
> -		kmem_free(ifp->if_broot);
> +		kfree(ifp->if_broot);
>  		ifp->if_broot = NULL;
>  	}
>  
>  	switch (ifp->if_format) {
>  	case XFS_DINODE_FMT_LOCAL:
> -		kmem_free(ifp->if_data);
> +		kfree(ifp->if_data);
>  		ifp->if_data = NULL;
>  		break;
>  	case XFS_DINODE_FMT_EXTENTS:
> diff --git a/fs/xfs/scrub/cow_repair.c b/fs/xfs/scrub/cow_repair.c
> index 1e82c727af8e..4de3f0f40f48 100644
> --- a/fs/xfs/scrub/cow_repair.c
> +++ b/fs/xfs/scrub/cow_repair.c
> @@ -609,6 +609,6 @@ xrep_bmap_cow(
>  out_bitmap:
>  	xfsb_bitmap_destroy(&xc->old_cowfork_fsblocks);
>  	xoff_bitmap_destroy(&xc->bad_fileoffs);
> -	kmem_free(xc);
> +	kfree(xc);
>  	return error;
>  }
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index f7ba80d575d4..2a142cefdc3d 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -386,7 +386,7 @@ xfs_attr_free_item(
>  		xfs_da_state_free(attr->xattri_da_state);
>  	xfs_attri_log_nameval_put(attr->xattri_nameval);
>  	if (attr->xattri_da_args->op_flags & XFS_DA_OP_RECOVERY)
> -		kmem_free(attr);
> +		kfree(attr);
>  	else
>  		kmem_cache_free(xfs_attr_intent_cache, attr);
>  }
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 5f7a44d21cc9..0318d768520a 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -124,7 +124,7 @@ xfs_attr_shortform_list(
>  					     XFS_ERRLEVEL_LOW,
>  					     context->dp->i_mount, sfe,
>  					     sizeof(*sfe));
> -			kmem_free(sbuf);
> +			kfree(sbuf);
>  			return -EFSCORRUPTED;
>  		}
>  
> @@ -188,7 +188,7 @@ xfs_attr_shortform_list(
>  		cursor->offset++;
>  	}
>  out:
> -	kmem_free(sbuf);
> +	kfree(sbuf);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index c348af806616..a09ffbbb0dda 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -204,7 +204,7 @@ xfs_buf_free_maps(
>  	struct xfs_buf	*bp)
>  {
>  	if (bp->b_maps != &bp->__b_map) {
> -		kmem_free(bp->b_maps);
> +		kfree(bp->b_maps);
>  		bp->b_maps = NULL;
>  	}
>  }
> @@ -289,7 +289,7 @@ xfs_buf_free_pages(
>  	mm_account_reclaimed_pages(bp->b_page_count);
>  
>  	if (bp->b_pages != bp->b_page_array)
> -		kmem_free(bp->b_pages);
> +		kfree(bp->b_pages);
>  	bp->b_pages = NULL;
>  	bp->b_flags &= ~_XBF_PAGES;
>  }
> @@ -315,7 +315,7 @@ xfs_buf_free(
>  	if (bp->b_flags & _XBF_PAGES)
>  		xfs_buf_free_pages(bp);
>  	else if (bp->b_flags & _XBF_KMEM)
> -		kmem_free(bp->b_addr);
> +		kfree(bp->b_addr);
>  
>  	call_rcu(&bp->b_rcu, xfs_buf_free_callback);
>  }
> @@ -339,7 +339,7 @@ xfs_buf_alloc_kmem(
>  	if (((unsigned long)(bp->b_addr + size - 1) & PAGE_MASK) !=
>  	    ((unsigned long)bp->b_addr & PAGE_MASK)) {
>  		/* b_addr spans two pages - use alloc_page instead */
> -		kmem_free(bp->b_addr);
> +		kfree(bp->b_addr);
>  		bp->b_addr = NULL;
>  		return -ENOMEM;
>  	}
> @@ -1953,7 +1953,7 @@ xfs_free_buftarg(
>  	if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev)
>  		bdev_release(btp->bt_bdev_handle);
>  
> -	kmem_free(btp);
> +	kfree(btp);
>  }
>  
>  int
> @@ -2045,7 +2045,7 @@ xfs_alloc_buftarg(
>  error_lru:
>  	list_lru_destroy(&btp->bt_lru);
>  error_free:
> -	kmem_free(btp);
> +	kfree(btp);
>  	return NULL;
>  }
>  
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 545040c6ae87..43031842341a 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -814,7 +814,7 @@ xfs_buf_item_free_format(
>  	struct xfs_buf_log_item	*bip)
>  {
>  	if (bip->bli_formats != &bip->__bli_format) {
> -		kmem_free(bip->bli_formats);
> +		kfree(bip->bli_formats);
>  		bip->bli_formats = NULL;
>  	}
>  }
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 34776f4c05ac..09e893cf563c 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -129,7 +129,7 @@ xlog_put_buffer_cancelled(
>  
>  	if (--bcp->bc_refcount == 0) {
>  		list_del(&bcp->bc_list);
> -		kmem_free(bcp);
> +		kfree(bcp);
>  	}
>  	return true;
>  }
> @@ -1062,10 +1062,10 @@ xlog_free_buf_cancel_table(
>  				&log->l_buf_cancel_table[i],
>  				struct xfs_buf_cancel, bc_list))) {
>  			list_del(&bc->bc_list);
> -			kmem_free(bc);
> +			kfree(bc);
>  		}
>  	}
>  
> -	kmem_free(log->l_buf_cancel_table);
> +	kfree(log->l_buf_cancel_table);
>  	log->l_buf_cancel_table = NULL;
>  }
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index d5787991bb5b..8539f5c9a774 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -79,7 +79,7 @@ xfs_discard_endio_work(
>  		container_of(work, struct xfs_busy_extents, endio_work);
>  
>  	xfs_extent_busy_clear(extents->mount, &extents->extent_list, false);
> -	kmem_free(extents->owner);
> +	kfree(extents->owner);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index 456520d60cd0..7ad0e92c6b5b 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -248,7 +248,7 @@ xfs_errortag_init(
>  	ret = xfs_sysfs_init(&mp->m_errortag_kobj, &xfs_errortag_ktype,
>  				&mp->m_kobj, "errortag");
>  	if (ret)
> -		kmem_free(mp->m_errortag);
> +		kfree(mp->m_errortag);
>  	return ret;
>  }
>  
> @@ -257,7 +257,7 @@ xfs_errortag_del(
>  	struct xfs_mount	*mp)
>  {
>  	xfs_sysfs_del(&mp->m_errortag_kobj);
> -	kmem_free(mp->m_errortag);
> +	kfree(mp->m_errortag);
>  }
>  
>  static bool
> diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
> index b90c3dd43e03..56cfa1498571 100644
> --- a/fs/xfs/xfs_extent_busy.c
> +++ b/fs/xfs/xfs_extent_busy.c
> @@ -531,7 +531,7 @@ xfs_extent_busy_clear_one(
>  	}
>  
>  	list_del_init(&busyp->list);
> -	kmem_free(busyp);
> +	kfree(busyp);
>  }
>  
>  static void
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 6062703a2723..8c382f092332 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -42,7 +42,7 @@ xfs_efi_item_free(
>  {
>  	kvfree(efip->efi_item.li_lv_shadow);
>  	if (efip->efi_format.efi_nextents > XFS_EFI_MAX_FAST_EXTENTS)
> -		kmem_free(efip);
> +		kfree(efip);
>  	else
>  		kmem_cache_free(xfs_efi_cache, efip);
>  }
> @@ -231,7 +231,7 @@ xfs_efd_item_free(struct xfs_efd_log_item *efdp)
>  {
>  	kvfree(efdp->efd_item.li_lv_shadow);
>  	if (efdp->efd_format.efd_nextents > XFS_EFD_MAX_FAST_EXTENTS)
> -		kmem_free(efdp);
> +		kfree(efdp);
>  	else
>  		kmem_cache_free(xfs_efd_cache, efdp);
>  }
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index e2a3c8d3fe4f..e3aaa0555597 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -44,7 +44,7 @@ xfs_fstrm_free_func(
>  	atomic_dec(&pag->pagf_fstrms);
>  	xfs_perag_rele(pag);
>  
> -	kmem_free(item);
> +	kfree(item);
>  }
>  
>  /*
> @@ -326,7 +326,7 @@ xfs_filestream_create_association(
>  
>  out_free_item:
>  	xfs_perag_rele(item->pag);
> -	kmem_free(item);
> +	kfree(item);
>  out_put_fstrms:
>  	atomic_dec(&args->pag->pagf_fstrms);
>  	return 0;
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 1fd94958aa97..37ec247edc13 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -671,7 +671,7 @@ xfs_lookup(
>  
>  out_free_name:
>  	if (ci_name)
> -		kmem_free(ci_name->name);
> +		kfree(ci_name->name);
>  out_unlock:
>  	*ipp = NULL;
>  	return error;
> @@ -2378,7 +2378,7 @@ xfs_ifree(
>  	 * already been freed by xfs_attr_inactive.
>  	 */
>  	if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
> -		kmem_free(ip->i_df.if_data);
> +		kfree(ip->i_df.if_data);
>  		ip->i_df.if_data = NULL;
>  		ip->i_df.if_bytes = 0;
>  	}
> diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> index 5d7b937179a0..dbdab4ce7c44 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -554,7 +554,7 @@ xlog_recover_inode_commit_pass2(
>  	xfs_buf_relse(bp);
>  error:
>  	if (need_free)
> -		kmem_free(in_f);
> +		kfree(in_f);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 45fb169bd819..7eeebcb6b925 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -435,7 +435,7 @@ xfs_ioc_attr_list(
>  	    copy_to_user(ucursor, &context.cursor, sizeof(context.cursor)))
>  		error = -EFAULT;
>  out_free:
> -	kmem_free(buffer);
> +	kfree(buffer);
>  	return error;
>  }
>  
> @@ -1506,7 +1506,7 @@ xfs_ioc_getbmap(
>  
>  	error = 0;
>  out_free_buf:
> -	kmem_free(buf);
> +	kfree(buf);
>  	return error;
>  }
>  
> @@ -1636,7 +1636,7 @@ xfs_ioc_getfsmap(
>  	}
>  
>  out_free:
> -	kmem_free(recs);
> +	kfree(recs);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index a0d77f5f512e..be102fd49560 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -346,7 +346,7 @@ xfs_vn_ci_lookup(
>  	dname.name = ci_name.name;
>  	dname.len = ci_name.len;
>  	dentry = d_add_ci(dentry, VFS_I(ip), &dname);
> -	kmem_free(ci_name.name);
> +	kfree(ci_name.name);
>  	return dentry;
>  }
>  
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 14211174267a..95fc31b9f87d 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -214,7 +214,7 @@ xfs_bulkstat_one(
>  			breq->startino, &bc);
>  	xfs_trans_cancel(tp);
>  out:
> -	kmem_free(bc.buf);
> +	kfree(bc.buf);
>  
>  	/*
>  	 * If we reported one inode to userspace then we abort because we hit
> @@ -309,7 +309,7 @@ xfs_bulkstat(
>  			xfs_bulkstat_iwalk, breq->icount, &bc);
>  	xfs_trans_cancel(tp);
>  out:
> -	kmem_free(bc.buf);
> +	kfree(bc.buf);
>  
>  	/*
>  	 * We found some inodes, so clear the error status and return them.
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index 5dd622aa54c5..6d2eb6364867 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -172,7 +172,7 @@ STATIC void
>  xfs_iwalk_free(
>  	struct xfs_iwalk_ag	*iwag)
>  {
> -	kmem_free(iwag->recs);
> +	kfree(iwag->recs);
>  	iwag->recs = NULL;
>  }
>  
> @@ -627,7 +627,7 @@ xfs_iwalk_ag_work(
>  	xfs_iwalk_free(iwag);
>  out:
>  	xfs_perag_put(iwag->pag);
> -	kmem_free(iwag);
> +	kfree(iwag);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index 666618b463c9..caccb7f76690 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -20,8 +20,6 @@ typedef __u32			xfs_dev_t;
>  typedef __u32			xfs_nlink_t;
>  
>  #include "xfs_types.h"
> -
> -#include "kmem.h"
>  #include "mrlock.h"
>  
>  #include <linux/semaphore.h>
> @@ -30,6 +28,7 @@ typedef __u32			xfs_nlink_t;
>  #include <linux/kernel.h>
>  #include <linux/blkdev.h>
>  #include <linux/slab.h>
> +#include <linux/vmalloc.h>
>  #include <linux/crc32c.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 0009ffbec932..ee39639bb92b 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1663,12 +1663,12 @@ xlog_alloc_log(
>  	for (iclog = log->l_iclog; iclog; iclog = prev_iclog) {
>  		prev_iclog = iclog->ic_next;
>  		kvfree(iclog->ic_data);
> -		kmem_free(iclog);
> +		kfree(iclog);
>  		if (prev_iclog == log->l_iclog)
>  			break;
>  	}
>  out_free_log:
> -	kmem_free(log);
> +	kfree(log);
>  out:
>  	return ERR_PTR(error);
>  }	/* xlog_alloc_log */
> @@ -2120,13 +2120,13 @@ xlog_dealloc_log(
>  	for (i = 0; i < log->l_iclog_bufs; i++) {
>  		next_iclog = iclog->ic_next;
>  		kvfree(iclog->ic_data);
> -		kmem_free(iclog);
> +		kfree(iclog);
>  		iclog = next_iclog;
>  	}
>  
>  	log->l_mp->m_log = NULL;
>  	destroy_workqueue(log->l_ioend_workqueue);
> -	kmem_free(log);
> +	kfree(log);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 2c0512916cc9..815a2181004c 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -703,7 +703,7 @@ xlog_cil_free_logvec(
>  	while (!list_empty(lv_chain)) {
>  		lv = list_first_entry(lv_chain, struct xfs_log_vec, lv_list);
>  		list_del_init(&lv->lv_list);
> -		kmem_free(lv);
> +		kfree(lv);
>  	}
>  }
>  
> @@ -753,7 +753,7 @@ xlog_cil_committed(
>  		return;
>  	}
>  
> -	kmem_free(ctx);
> +	kfree(ctx);
>  }
>  
>  void
> @@ -1339,7 +1339,7 @@ xlog_cil_push_work(
>  out_skip:
>  	up_write(&cil->xc_ctx_lock);
>  	xfs_log_ticket_put(new_ctx->ticket);
> -	kmem_free(new_ctx);
> +	kfree(new_ctx);
>  	return;
>  
>  out_abort_free_ticket:
> @@ -1533,7 +1533,7 @@ xlog_cil_process_intents(
>  		set_bit(XFS_LI_WHITEOUT, &ilip->li_flags);
>  		trace_xfs_cil_whiteout_mark(ilip);
>  		len += ilip->li_lv->lv_bytes;
> -		kmem_free(ilip->li_lv);
> +		kfree(ilip->li_lv);
>  		ilip->li_lv = NULL;
>  
>  		xfs_trans_del_item(lip);
> @@ -1786,7 +1786,7 @@ xlog_cil_init(
>  out_destroy_wq:
>  	destroy_workqueue(cil->xc_push_wq);
>  out_destroy_cil:
> -	kmem_free(cil);
> +	kfree(cil);
>  	return -ENOMEM;
>  }
>  
> @@ -1799,12 +1799,12 @@ xlog_cil_destroy(
>  	if (cil->xc_ctx) {
>  		if (cil->xc_ctx->ticket)
>  			xfs_log_ticket_put(cil->xc_ctx->ticket);
> -		kmem_free(cil->xc_ctx);
> +		kfree(cil->xc_ctx);
>  	}
>  
>  	ASSERT(test_bit(XLOG_CIL_EMPTY, &cil->xc_flags));
>  	free_percpu(cil->xc_pcp);
>  	destroy_workqueue(cil->xc_push_wq);
> -	kmem_free(cil);
> +	kfree(cil);
>  }
>  
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 295306ef6959..e9ed43a833af 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2229,11 +2229,11 @@ xlog_recover_free_trans(
>  		for (i = 0; i < item->ri_cnt; i++)
>  			kvfree(item->ri_buf[i].i_addr);
>  		/* Free the item itself */
> -		kmem_free(item->ri_buf);
> -		kmem_free(item);
> +		kfree(item->ri_buf);
> +		kfree(item);
>  	}
>  	/* Free the transaction recover structure */
> -	kmem_free(trans);
> +	kfree(trans);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index aabb25dc3efa..7328034d42ed 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -45,7 +45,7 @@ xfs_uuid_table_free(void)
>  {
>  	if (xfs_uuid_table_size == 0)
>  		return;
> -	kmem_free(xfs_uuid_table);
> +	kfree(xfs_uuid_table);
>  	xfs_uuid_table = NULL;
>  	xfs_uuid_table_size = 0;
>  }
> diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
> index feae3115617b..ce496704748d 100644
> --- a/fs/xfs/xfs_mru_cache.c
> +++ b/fs/xfs/xfs_mru_cache.c
> @@ -365,9 +365,9 @@ xfs_mru_cache_create(
>  
>  exit:
>  	if (err && mru && mru->lists)
> -		kmem_free(mru->lists);
> +		kfree(mru->lists);
>  	if (err && mru)
> -		kmem_free(mru);
> +		kfree(mru);
>  
>  	return err;
>  }
> @@ -407,8 +407,8 @@ xfs_mru_cache_destroy(
>  
>  	xfs_mru_cache_flush(mru);
>  
> -	kmem_free(mru->lists);
> -	kmem_free(mru);
> +	kfree(mru->lists);
> +	kfree(mru);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index b130bf49013b..46a7fe70e57e 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -701,7 +701,7 @@ xfs_qm_init_quotainfo(
>  out_free_lru:
>  	list_lru_destroy(&qinf->qi_lru);
>  out_free_qinf:
> -	kmem_free(qinf);
> +	kfree(qinf);
>  	mp->m_quotainfo = NULL;
>  	return error;
>  }
> @@ -725,7 +725,7 @@ xfs_qm_destroy_quotainfo(
>  	xfs_qm_destroy_quotainos(qi);
>  	mutex_destroy(&qi->qi_tree_lock);
>  	mutex_destroy(&qi->qi_quotaofflock);
> -	kmem_free(qi);
> +	kfree(qi);
>  	mp->m_quotainfo = NULL;
>  }
>  
> @@ -1060,7 +1060,7 @@ xfs_qm_reset_dqcounts_buf(
>  	} while (nmaps > 0);
>  
>  out:
> -	kmem_free(map);
> +	kfree(map);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index a9b322e23cfb..d850b9685f7f 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -38,7 +38,7 @@ xfs_cui_item_free(
>  {
>  	kvfree(cuip->cui_item.li_lv_shadow);
>  	if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS)
> -		kmem_free(cuip);
> +		kfree(cuip);
>  	else
>  		kmem_cache_free(xfs_cui_cache, cuip);
>  }
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 489ca8c0e1dc..a40b92ac81e8 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -38,7 +38,7 @@ xfs_rui_item_free(
>  {
>  	kvfree(ruip->rui_item.li_lv_shadow);
>  	if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS)
> -		kmem_free(ruip);
> +		kfree(ruip);
>  	else
>  		kmem_cache_free(xfs_rui_cache, ruip);
>  }
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index 57ed9baaf156..2f85567f3d75 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -1050,7 +1050,7 @@ xfs_growfs_rt(
>  	/*
>  	 * Free the fake mp structure.
>  	 */
> -	kmem_free(nmp);
> +	kfree(nmp);
>  
>  	/*
>  	 * If we had to allocate a new rsum_cache, we either need to free the
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 7b1b29814be2..96cb00e94551 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -773,7 +773,7 @@ xfs_mount_free(
>  	debugfs_remove(mp->m_debugfs);
>  	kfree(mp->m_rtname);
>  	kfree(mp->m_logname);
> -	kmem_free(mp);
> +	kfree(mp);
>  }
>  
>  STATIC int
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 5f206cdb40ff..e4c343096f95 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -922,7 +922,7 @@ xfs_trans_ail_init(
>  	return 0;
>  
>  out_free_ailp:
> -	kmem_free(ailp);
> +	kfree(ailp);
>  	return -ENOMEM;
>  }
>  
> @@ -933,5 +933,5 @@ xfs_trans_ail_destroy(
>  	struct xfs_ail	*ailp = mp->m_ail;
>  
>  	kthread_stop(ailp->ail_task);
> -	kmem_free(ailp);
> +	kfree(ailp);
>  }
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 06/12] xfs: use an empty transaction for fstrim
  2024-01-15 22:59 ` [PATCH 06/12] xfs: use an empty transaction for fstrim Dave Chinner
@ 2024-01-18 22:55   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 22:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:44AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently use a btree walk in the fstrim code. This requires a
> btree cursor and btree cursors are only used inside transactions
> except for the fstrim code. This means that all the btree operations
> that allocate memory operate in both GFP_KERNEL and GFP_NOFS
> contexts.
> 
> This causes problems with lockdep being unable to determine the
> difference between objects that are safe to lock both above and
> below memory reclaim. Free space btree buffers are definitely locked
> both above and below reclaim and that means we have to mark all
> btree infrastructure allocations with GFP_NOFS to avoid potential
> lockdep false positives.
> 
> If we wrap this btree walk in an empty transaction, all btree walks are
> now done under transaction context and so all allocations inherit
> GFP_NOFS context from the transaction. This enables us to move all
> the btree allocations to GFP_KERNEL context and hence help remove
> the explicit use of GFP_NOFS in XFS.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

LOL I just wrote this exact patch to shut up lockdep.
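
For anyone else chasing the same class of lockdep splat, the whole trick
is just bracketing the btree walk with an empty transaction so that every
allocation underneath it inherits the scoped NOFS context. Condensed from
the diff below (an illustrative sketch, not the exact code):

	error = xfs_trans_alloc_empty(mp, &tp);
	if (error)
		return error;

	error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
	if (error)
		goto out_trans_cancel;

	cur = xfs_allocbt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_CNT);
	/* ... walk the by-count btree gathering extents to discard ... */
	xfs_btree_del_cursor(cur, error);
out_trans_cancel:
	xfs_trans_cancel(tp);	/* empty transaction, nothing to commit */
	return error;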

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_discard.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index 8539f5c9a774..299b8f907292 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -8,6 +8,7 @@
>  #include "xfs_format.h"
>  #include "xfs_log_format.h"
>  #include "xfs_trans_resv.h"
> +#include "xfs_trans.h"
>  #include "xfs_mount.h"
>  #include "xfs_btree.h"
>  #include "xfs_alloc_btree.h"
> @@ -120,7 +121,7 @@ xfs_discard_extents(
>  		error = __blkdev_issue_discard(mp->m_ddev_targp->bt_bdev,
>  				XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno),
>  				XFS_FSB_TO_BB(mp, busyp->length),
> -				GFP_NOFS, &bio);
> +				GFP_KERNEL, &bio);
>  		if (error && error != -EOPNOTSUPP) {
>  			xfs_info(mp,
>  	 "discard failed for extent [0x%llx,%u], error %d",
> @@ -155,6 +156,7 @@ xfs_trim_gather_extents(
>  	uint64_t		*blocks_trimmed)
>  {
>  	struct xfs_mount	*mp = pag->pag_mount;
> +	struct xfs_trans	*tp;
>  	struct xfs_btree_cur	*cur;
>  	struct xfs_buf		*agbp;
>  	int			error;
> @@ -168,11 +170,15 @@ xfs_trim_gather_extents(
>  	 */
>  	xfs_log_force(mp, XFS_LOG_SYNC);
>  
> -	error = xfs_alloc_read_agf(pag, NULL, 0, &agbp);
> +	error = xfs_trans_alloc_empty(mp, &tp);
>  	if (error)
>  		return error;
>  
> -	cur = xfs_allocbt_init_cursor(mp, NULL, agbp, pag, XFS_BTNUM_CNT);
> +	error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
> +	if (error)
> +		goto out_trans_cancel;
> +
> +	cur = xfs_allocbt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_CNT);
>  
>  	/*
>  	 * Look up the extent length requested in the AGF and start with it.
> @@ -279,7 +285,8 @@ xfs_trim_gather_extents(
>  		xfs_extent_busy_clear(mp, &extents->extent_list, false);
>  out_del_cursor:
>  	xfs_btree_del_cursor(cur, error);
> -	xfs_buf_relse(agbp);
> +out_trans_cancel:
> +	xfs_trans_cancel(tp);
>  	return error;
>  }
>  
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS
  2024-01-15 22:59 ` [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS Dave Chinner
@ 2024-01-18 23:32   ` Darrick J. Wong
  2024-06-22  9:44   ` Long Li
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 23:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:45AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In the past we've had problems with lockdep false positives stemming
> from inode locking occurring in memory reclaim contexts (e.g. from
> superblock shrinkers). Lockdep doesn't know that inodes accessed from
> above memory reclaim cannot be accessed from below memory reclaim
> (and vice versa) but there has never been a good way to solve
> this problem with lockdep annotations.
> 
> This situation isn't unique to inode locks - buffers are also locked
> above and below memory reclaim, and we have to maintain lock
> ordering for them - and against inodes - appropriately. IOWs, the
> same code paths and locks are taken both above and below memory
> reclaim and so we always need to make sure the lock orders are
> consistent. We are spared the lockdep problems this might cause
> by the fact that semaphores and bit locks aren't covered by lockdep.
> 
> In general, this sort of lockdep false positive detection is caused
> by code that runs GFP_KERNEL memory allocation with an actively
> referenced inode locked. When it is run from a transaction, memory
> allocation is automatically GFP_NOFS, so we don't have reclaim
> recursion issues. So in the places where we do memory allocation
> with inodes locked outside of a transaction, we have explicitly set
> them to use GFP_NOFS allocations to prevent lockdep false positives
> from being reported if the allocation dips into direct memory
> reclaim.
> 
> More recently, __GFP_NOLOCKDEP was added to the memory allocation
> flags to tell lockdep not to track that particular allocation for
> the purposes of reclaim recursion detection. This is a much better
> way of preventing false positives - it allows us to use GFP_KERNEL
> context outside of transactions, and allows direct memory reclaim to
> proceed normally without throwing out false positive deadlock
> warnings.
> 
> The obvious places that lock inodes and do memory allocation are the
> lookup paths and inode extent list initialisation. These occur in
> non-transactional GFP_KERNEL contexts, and so can run direct reclaim
> and lock inodes.
> 
> This patch makes a first path through all the explicit GFP_NOFS
> allocations in XFS and converts the obvious ones to GFP_KERNEL |
> __GFP_NOLOCKDEP as a first step towards removing explicit GFP_NOFS
> allocations from the XFS code.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
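
The mechanical pattern at each call site is just trading the
lockdep-silencing GFP_NOFS for GFP_KERNEL plus the no-lockdep annotation,
e.g. (condensed from the hunks below, illustrative only):

	/* before: NOFS purely to keep lockdep quiet on a non-transactional path */
	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);

	/*
	 * after: allow fs reclaim, but opt this site out of lockdep's
	 * reclaim recursion tracking
	 */
	args = kzalloc(sizeof(*args),
			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);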

Looks pretty straightforward to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_ag.c         |  2 +-
>  fs/xfs/libxfs/xfs_btree.h      |  4 +++-
>  fs/xfs/libxfs/xfs_da_btree.c   |  8 +++++---
>  fs/xfs/libxfs/xfs_dir2.c       | 14 ++++----------
>  fs/xfs/libxfs/xfs_iext_tree.c  | 22 +++++++++++++---------
>  fs/xfs/libxfs/xfs_inode_fork.c |  8 +++++---
>  fs/xfs/xfs_icache.c            |  5 ++---
>  fs/xfs/xfs_qm.c                |  6 +++---
>  8 files changed, 36 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 937ea48d5cc0..036f4ee43fd3 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -389,7 +389,7 @@ xfs_initialize_perag(
>  		pag->pag_agno = index;
>  		pag->pag_mount = mp;
>  
> -		error = radix_tree_preload(GFP_NOFS);
> +		error = radix_tree_preload(GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  		if (error)
>  			goto out_free_pag;
>  
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index d906324e25c8..75a0e2c8e115 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -725,7 +725,9 @@ xfs_btree_alloc_cursor(
>  {
>  	struct xfs_btree_cur	*cur;
>  
> -	cur = kmem_cache_zalloc(cache, GFP_NOFS | __GFP_NOFAIL);
> +	/* BMBT allocations can come through from non-transactional context. */
> +	cur = kmem_cache_zalloc(cache,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	cur->bc_tp = tp;
>  	cur->bc_mp = mp;
>  	cur->bc_btnum = btnum;
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 3383b4525381..444ec1560f43 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -85,7 +85,8 @@ xfs_da_state_alloc(
>  {
>  	struct xfs_da_state	*state;
>  
> -	state = kmem_cache_zalloc(xfs_da_state_cache, GFP_NOFS | __GFP_NOFAIL);
> +	state = kmem_cache_zalloc(xfs_da_state_cache,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	state->args = args;
>  	state->mp = args->dp->i_mount;
>  	return state;
> @@ -2519,7 +2520,8 @@ xfs_dabuf_map(
>  	int			error = 0, nirecs, i;
>  
>  	if (nfsb > 1)
> -		irecs = kzalloc(sizeof(irec) * nfsb, GFP_NOFS | __GFP_NOFAIL);
> +		irecs = kzalloc(sizeof(irec) * nfsb,
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  
>  	nirecs = nfsb;
>  	error = xfs_bmapi_read(dp, bno, nfsb, irecs, &nirecs,
> @@ -2533,7 +2535,7 @@ xfs_dabuf_map(
>  	 */
>  	if (nirecs > 1) {
>  		map = kzalloc(nirecs * sizeof(struct xfs_buf_map),
> -				GFP_NOFS | __GFP_NOFAIL);
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  		if (!map) {
>  			error = -ENOMEM;
>  			goto out_free_irecs;
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index e60aa8f8d0a7..728f72f0d078 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -333,7 +333,8 @@ xfs_dir_cilookup_result(
>  					!(args->op_flags & XFS_DA_OP_CILOOKUP))
>  		return -EEXIST;
>  
> -	args->value = kmalloc(len, GFP_NOFS | __GFP_RETRY_MAYFAIL);
> +	args->value = kmalloc(len,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_RETRY_MAYFAIL);
>  	if (!args->value)
>  		return -ENOMEM;
>  
> @@ -364,15 +365,8 @@ xfs_dir_lookup(
>  	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
>  	XFS_STATS_INC(dp->i_mount, xs_dir_lookup);
>  
> -	/*
> -	 * We need to use KM_NOFS here so that lockdep will not throw false
> -	 * positive deadlock warnings on a non-transactional lookup path. It is
> -	 * safe to recurse into inode recalim in that case, but lockdep can't
> -	 * easily be taught about it. Hence KM_NOFS avoids having to add more
> -	 * lockdep Doing this avoids having to add a bunch of lockdep class
> -	 * annotations into the reclaim path for the ilock.
> -	 */
> -	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
> +	args = kzalloc(sizeof(*args),
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	args->geo = dp->i_mount->m_dir_geo;
>  	args->name = name->name;
>  	args->namelen = name->len;
> diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
> index 16f18b08fe4c..8796f2b3e534 100644
> --- a/fs/xfs/libxfs/xfs_iext_tree.c
> +++ b/fs/xfs/libxfs/xfs_iext_tree.c
> @@ -394,12 +394,18 @@ xfs_iext_leaf_key(
>  	return leaf->recs[n].lo & XFS_IEXT_STARTOFF_MASK;
>  }
>  
> +static inline void *
> +xfs_iext_alloc_node(
> +	int	size)
> +{
> +	return kzalloc(size, GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
> +}
> +
>  static void
>  xfs_iext_grow(
>  	struct xfs_ifork	*ifp)
>  {
> -	struct xfs_iext_node	*node = kzalloc(NODE_SIZE,
> -						GFP_NOFS | __GFP_NOFAIL);
> +	struct xfs_iext_node	*node = xfs_iext_alloc_node(NODE_SIZE);
>  	int			i;
>  
>  	if (ifp->if_height == 1) {
> @@ -455,8 +461,7 @@ xfs_iext_split_node(
>  	int			*nr_entries)
>  {
>  	struct xfs_iext_node	*node = *nodep;
> -	struct xfs_iext_node	*new = kzalloc(NODE_SIZE,
> -						GFP_NOFS | __GFP_NOFAIL);
> +	struct xfs_iext_node	*new = xfs_iext_alloc_node(NODE_SIZE);
>  	const int		nr_move = KEYS_PER_NODE / 2;
>  	int			nr_keep = nr_move + (KEYS_PER_NODE & 1);
>  	int			i = 0;
> @@ -544,8 +549,7 @@ xfs_iext_split_leaf(
>  	int			*nr_entries)
>  {
>  	struct xfs_iext_leaf	*leaf = cur->leaf;
> -	struct xfs_iext_leaf	*new = kzalloc(NODE_SIZE,
> -						GFP_NOFS | __GFP_NOFAIL);
> +	struct xfs_iext_leaf	*new = xfs_iext_alloc_node(NODE_SIZE);
>  	const int		nr_move = RECS_PER_LEAF / 2;
>  	int			nr_keep = nr_move + (RECS_PER_LEAF & 1);
>  	int			i;
> @@ -586,8 +590,7 @@ xfs_iext_alloc_root(
>  {
>  	ASSERT(ifp->if_bytes == 0);
>  
> -	ifp->if_data = kzalloc(sizeof(struct xfs_iext_rec),
> -					GFP_NOFS | __GFP_NOFAIL);
> +	ifp->if_data = xfs_iext_alloc_node(sizeof(struct xfs_iext_rec));
>  	ifp->if_height = 1;
>  
>  	/* now that we have a node step into it */
> @@ -607,7 +610,8 @@ xfs_iext_realloc_root(
>  	if (new_size / sizeof(struct xfs_iext_rec) == RECS_PER_LEAF)
>  		new_size = NODE_SIZE;
>  
> -	new = krealloc(ifp->if_data, new_size, GFP_NOFS | __GFP_NOFAIL);
> +	new = krealloc(ifp->if_data, new_size,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	memset(new + ifp->if_bytes, 0, new_size - ifp->if_bytes);
>  	ifp->if_data = new;
>  	cur->leaf = new;
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index f6d5b86b608d..709fda3d742f 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -50,7 +50,8 @@ xfs_init_local_fork(
>  		mem_size++;
>  
>  	if (size) {
> -		char *new_data = kmalloc(mem_size, GFP_NOFS | __GFP_NOFAIL);
> +		char *new_data = kmalloc(mem_size,
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  
>  		memcpy(new_data, data, size);
>  		if (zero_terminate)
> @@ -205,7 +206,8 @@ xfs_iformat_btree(
>  	}
>  
>  	ifp->if_broot_bytes = size;
> -	ifp->if_broot = kmalloc(size, GFP_NOFS | __GFP_NOFAIL);
> +	ifp->if_broot = kmalloc(size,
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	ASSERT(ifp->if_broot != NULL);
>  	/*
>  	 * Copy and convert from the on-disk structure
> @@ -690,7 +692,7 @@ xfs_ifork_init_cow(
>  		return;
>  
>  	ip->i_cowfp = kmem_cache_zalloc(xfs_ifork_cache,
> -				       GFP_NOFS | __GFP_NOFAIL);
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	ip->i_cowfp->if_format = XFS_DINODE_FMT_EXTENTS;
>  }
>  
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index dba514a2c84d..06046827b5fe 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -659,10 +659,9 @@ xfs_iget_cache_miss(
>  	/*
>  	 * Preload the radix tree so we can insert safely under the
>  	 * write spinlock. Note that we cannot sleep inside the preload
> -	 * region. Since we can be called from transaction context, don't
> -	 * recurse into the file system.
> +	 * region.
>  	 */
> -	if (radix_tree_preload(GFP_NOFS)) {
> +	if (radix_tree_preload(GFP_KERNEL | __GFP_NOLOCKDEP)) {
>  		error = -EAGAIN;
>  		goto out_destroy;
>  	}
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index 46a7fe70e57e..384a5349e696 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -643,9 +643,9 @@ xfs_qm_init_quotainfo(
>  	if (error)
>  		goto out_free_lru;
>  
> -	INIT_RADIX_TREE(&qinf->qi_uquota_tree, GFP_NOFS);
> -	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_NOFS);
> -	INIT_RADIX_TREE(&qinf->qi_pquota_tree, GFP_NOFS);
> +	INIT_RADIX_TREE(&qinf->qi_uquota_tree, GFP_KERNEL);
> +	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_KERNEL);
> +	INIT_RADIX_TREE(&qinf->qi_pquota_tree, GFP_KERNEL);
>  	mutex_init(&qinf->qi_tree_lock);
>  
>  	/* mutex used to serialize quotaoffs */
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/12] xfs: use GFP_KERNEL in pure transaction contexts
  2024-01-15 22:59 ` [PATCH 08/12] xfs: use GFP_KERNEL in pure transaction contexts Dave Chinner
@ 2024-01-18 23:38   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 23:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:46AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When running in a transaction context, memory allocations are scoped
> to GFP_NOFS. Hence we don't need to use GFP_NOFS contexts in pure
> transaction context allocations - GFP_KERNEL will automatically get
> converted to GFP_NOFS as appropriate.
> 
> Go through the code and convert all the obvious GFP_NOFS allocations
> in transaction context to use GFP_KERNEL. This further reduces the
> explicit use of GFP_NOFS in XFS.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
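
For reference, the scoping this relies on comes from the task-level NOFS
API: while a memalloc_nofs_save() section is active, the allocator strips
__GFP_FS from GFP_KERNEL requests, so allocations made inside transaction
context behave as GFP_NOFS without being annotated as such. A minimal
sketch of that pattern (generic kernel API, not the XFS code itself):

	unsigned int	nofs_flags;

	nofs_flags = memalloc_nofs_save();
	/* GFP_KERNEL here is implicitly treated as GFP_NOFS */
	ptr = kzalloc(size, GFP_KERNEL | __GFP_NOFAIL);
	...
	memalloc_nofs_restore(nofs_flags);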

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c       |  3 ++-
>  fs/xfs/libxfs/xfs_bmap.c       |  2 +-
>  fs/xfs/libxfs/xfs_defer.c      |  6 +++---
>  fs/xfs/libxfs/xfs_dir2.c       |  8 ++++----
>  fs/xfs/libxfs/xfs_inode_fork.c |  8 ++++----
>  fs/xfs/libxfs/xfs_refcount.c   |  2 +-
>  fs/xfs/libxfs/xfs_rmap.c       |  2 +-
>  fs/xfs/xfs_attr_item.c         |  4 ++--
>  fs/xfs/xfs_bmap_util.c         |  2 +-
>  fs/xfs/xfs_buf.c               | 28 +++++++++++++++++-----------
>  fs/xfs/xfs_log.c               |  3 ++-
>  fs/xfs/xfs_mru_cache.c         |  2 +-
>  12 files changed, 39 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 9976a00a73f9..269a57420859 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -891,7 +891,8 @@ xfs_attr_defer_add(
>  
>  	struct xfs_attr_intent	*new;
>  
> -	new = kmem_cache_zalloc(xfs_attr_intent_cache, GFP_NOFS | __GFP_NOFAIL);
> +	new = kmem_cache_zalloc(xfs_attr_intent_cache,
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	new->xattri_op_flags = op_flags;
>  	new->xattri_da_args = args;
>  
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 98aaca933bdd..fbdaa53deecd 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -6098,7 +6098,7 @@ __xfs_bmap_add(
>  			bmap->br_blockcount,
>  			bmap->br_state);
>  
> -	bi = kmem_cache_alloc(xfs_bmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
> +	bi = kmem_cache_alloc(xfs_bmap_intent_cache, GFP_KERNEL | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&bi->bi_list);
>  	bi->bi_type = type;
>  	bi->bi_owner = ip;
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 75689c151a54..8ae4401f6810 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -825,7 +825,7 @@ xfs_defer_alloc(
>  	struct xfs_defer_pending	*dfp;
>  
>  	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
> -			GFP_NOFS | __GFP_NOFAIL);
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	dfp->dfp_ops = ops;
>  	INIT_LIST_HEAD(&dfp->dfp_work);
>  	list_add_tail(&dfp->dfp_list, &tp->t_dfops);
> @@ -888,7 +888,7 @@ xfs_defer_start_recovery(
>  	struct xfs_defer_pending	*dfp;
>  
>  	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
> -			GFP_NOFS | __GFP_NOFAIL);
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	dfp->dfp_ops = ops;
>  	dfp->dfp_intent = lip;
>  	INIT_LIST_HEAD(&dfp->dfp_work);
> @@ -979,7 +979,7 @@ xfs_defer_ops_capture(
>  		return ERR_PTR(error);
>  
>  	/* Create an object to capture the defer ops. */
> -	dfc = kzalloc(sizeof(*dfc), GFP_NOFS | __GFP_NOFAIL);
> +	dfc = kzalloc(sizeof(*dfc), GFP_KERNEL | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&dfc->dfc_list);
>  	INIT_LIST_HEAD(&dfc->dfc_dfops);
>  
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index 728f72f0d078..8c9403b33191 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -236,7 +236,7 @@ xfs_dir_init(
>  	if (error)
>  		return error;
>  
> -	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
> +	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> @@ -273,7 +273,7 @@ xfs_dir_createname(
>  		XFS_STATS_INC(dp->i_mount, xs_dir_create);
>  	}
>  
> -	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
> +	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> @@ -435,7 +435,7 @@ xfs_dir_removename(
>  	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
>  	XFS_STATS_INC(dp->i_mount, xs_dir_remove);
>  
> -	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
> +	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> @@ -496,7 +496,7 @@ xfs_dir_replace(
>  	if (rval)
>  		return rval;
>  
> -	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
> +	args = kzalloc(sizeof(*args), GFP_KERNEL | __GFP_NOFAIL);
>  	if (!args)
>  		return -ENOMEM;
>  
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index 709fda3d742f..136d5d7b9de9 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -402,7 +402,7 @@ xfs_iroot_realloc(
>  		if (ifp->if_broot_bytes == 0) {
>  			new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff);
>  			ifp->if_broot = kmalloc(new_size,
> -						GFP_NOFS | __GFP_NOFAIL);
> +						GFP_KERNEL | __GFP_NOFAIL);
>  			ifp->if_broot_bytes = (int)new_size;
>  			return;
>  		}
> @@ -417,7 +417,7 @@ xfs_iroot_realloc(
>  		new_max = cur_max + rec_diff;
>  		new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max);
>  		ifp->if_broot = krealloc(ifp->if_broot, new_size,
> -					 GFP_NOFS | __GFP_NOFAIL);
> +					 GFP_KERNEL | __GFP_NOFAIL);
>  		op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
>  						     ifp->if_broot_bytes);
>  		np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1,
> @@ -443,7 +443,7 @@ xfs_iroot_realloc(
>  	else
>  		new_size = 0;
>  	if (new_size > 0) {
> -		new_broot = kmalloc(new_size, GFP_NOFS | __GFP_NOFAIL);
> +		new_broot = kmalloc(new_size, GFP_KERNEL | __GFP_NOFAIL);
>  		/*
>  		 * First copy over the btree block header.
>  		 */
> @@ -512,7 +512,7 @@ xfs_idata_realloc(
>  
>  	if (byte_diff) {
>  		ifp->if_data = krealloc(ifp->if_data, new_size,
> -					GFP_NOFS | __GFP_NOFAIL);
> +					GFP_KERNEL | __GFP_NOFAIL);
>  		if (new_size == 0)
>  			ifp->if_data = NULL;
>  		ifp->if_bytes = new_size;
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index 6709a7f8bad5..7df52daa22cf 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -1449,7 +1449,7 @@ __xfs_refcount_add(
>  			blockcount);
>  
>  	ri = kmem_cache_alloc(xfs_refcount_intent_cache,
> -			GFP_NOFS | __GFP_NOFAIL);
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&ri->ri_list);
>  	ri->ri_type = type;
>  	ri->ri_startblock = startblock;
> diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> index 76bf7f48cb5a..0bd1f47b2c2b 100644
> --- a/fs/xfs/libxfs/xfs_rmap.c
> +++ b/fs/xfs/libxfs/xfs_rmap.c
> @@ -2559,7 +2559,7 @@ __xfs_rmap_add(
>  			bmap->br_blockcount,
>  			bmap->br_state);
>  
> -	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
> +	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_KERNEL | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&ri->ri_list);
>  	ri->ri_type = type;
>  	ri->ri_owner = owner;
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 2a142cefdc3d..0bf25a2ba3b6 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -226,7 +226,7 @@ xfs_attri_init(
>  {
>  	struct xfs_attri_log_item	*attrip;
>  
> -	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_NOFS | __GFP_NOFAIL);
> +	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_KERNEL | __GFP_NOFAIL);
>  
>  	/*
>  	 * Grab an extra reference to the name/value buffer for this log item.
> @@ -666,7 +666,7 @@ xfs_attr_create_done(
>  
>  	attrip = ATTRI_ITEM(intent);
>  
> -	attrdp = kmem_cache_zalloc(xfs_attrd_cache, GFP_NOFS | __GFP_NOFAIL);
> +	attrdp = kmem_cache_zalloc(xfs_attrd_cache, GFP_KERNEL | __GFP_NOFAIL);
>  
>  	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item, XFS_LI_ATTRD,
>  			  &xfs_attrd_item_ops);
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index c2531c28905c..cb2a4b940292 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -66,7 +66,7 @@ xfs_zero_extent(
>  	return blkdev_issue_zeroout(target->bt_bdev,
>  		block << (mp->m_super->s_blocksize_bits - 9),
>  		count_fsb << (mp->m_super->s_blocksize_bits - 9),
> -		GFP_NOFS, 0);
> +		GFP_KERNEL, 0);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index a09ffbbb0dda..de99368000b4 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -190,7 +190,7 @@ xfs_buf_get_maps(
>  	}
>  
>  	bp->b_maps = kzalloc(map_count * sizeof(struct xfs_buf_map),
> -				GFP_NOFS | __GFP_NOFAIL);
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	if (!bp->b_maps)
>  		return -ENOMEM;
>  	return 0;
> @@ -222,7 +222,8 @@ _xfs_buf_alloc(
>  	int			i;
>  
>  	*bpp = NULL;
> -	bp = kmem_cache_zalloc(xfs_buf_cache, GFP_NOFS | __GFP_NOFAIL);
> +	bp = kmem_cache_zalloc(xfs_buf_cache,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  
>  	/*
>  	 * We don't want certain flags to appear in b_flags unless they are
> @@ -325,7 +326,7 @@ xfs_buf_alloc_kmem(
>  	struct xfs_buf	*bp,
>  	xfs_buf_flags_t	flags)
>  {
> -	gfp_t		gfp_mask = GFP_NOFS | __GFP_NOFAIL;
> +	gfp_t		gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL;
>  	size_t		size = BBTOB(bp->b_length);
>  
>  	/* Assure zeroed buffer for non-read cases. */
> @@ -356,13 +357,11 @@ xfs_buf_alloc_pages(
>  	struct xfs_buf	*bp,
>  	xfs_buf_flags_t	flags)
>  {
> -	gfp_t		gfp_mask = __GFP_NOWARN;
> +	gfp_t		gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN;
>  	long		filled = 0;
>  
>  	if (flags & XBF_READ_AHEAD)
>  		gfp_mask |= __GFP_NORETRY;
> -	else
> -		gfp_mask |= GFP_NOFS;
>  
>  	/* Make sure that we have a page list */
>  	bp->b_page_count = DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE);
> @@ -429,11 +428,18 @@ _xfs_buf_map_pages(
>  
>  		/*
>  		 * vm_map_ram() will allocate auxiliary structures (e.g.
> -		 * pagetables) with GFP_KERNEL, yet we are likely to be under
> -		 * GFP_NOFS context here. Hence we need to tell memory reclaim
> -		 * that we are in such a context via PF_MEMALLOC_NOFS to prevent
> -		 * memory reclaim re-entering the filesystem here and
> -		 * potentially deadlocking.
> +		 * pagetables) with GFP_KERNEL, yet we often under a scoped nofs
> +		 * context here. Mixing GFP_KERNEL with GFP_NOFS allocations
> +		 * from the same call site that can be run from both above and
> +		 * below memory reclaim causes lockdep false positives. Hence we
> +		 * always need to force this allocation to nofs context because
> +		 * we can't pass __GFP_NOLOCKDEP down to auxillary structures to
> +		 * prevent false positive lockdep reports.
> +		 *
> +		 * XXX(dgc): I think dquot reclaim is the only place we can get
> +		 * to this function from memory reclaim context now. If we fix
> +		 * that like we've fixed inode reclaim to avoid writeback from
> +		 * reclaim, this nofs wrapping can go away.
>  		 */
>  		nofs_flag = memalloc_nofs_save();
>  		do {
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index ee39639bb92b..1f68569e62ca 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3518,7 +3518,8 @@ xlog_ticket_alloc(
>  	struct xlog_ticket	*tic;
>  	int			unit_res;
>  
> -	tic = kmem_cache_zalloc(xfs_log_ticket_cache, GFP_NOFS | __GFP_NOFAIL);
> +	tic = kmem_cache_zalloc(xfs_log_ticket_cache,
> +			GFP_KERNEL | __GFP_NOFAIL);
>  
>  	unit_res = xlog_calc_unit_res(log, unit_bytes, &tic->t_iclog_hdrs);
>  
> diff --git a/fs/xfs/xfs_mru_cache.c b/fs/xfs/xfs_mru_cache.c
> index ce496704748d..7443debaffd6 100644
> --- a/fs/xfs/xfs_mru_cache.c
> +++ b/fs/xfs/xfs_mru_cache.c
> @@ -428,7 +428,7 @@ xfs_mru_cache_insert(
>  	if (!mru || !mru->lists)
>  		return -EINVAL;
>  
> -	if (radix_tree_preload(GFP_NOFS))
> +	if (radix_tree_preload(GFP_KERNEL))
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&elem->list_node);
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/12] xfs: place intent recovery under NOFS allocation context
  2024-01-15 22:59 ` [PATCH 09/12] xfs: place intent recovery under NOFS allocation context Dave Chinner
@ 2024-01-18 23:39   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 23:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:47AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When recovery starts processing intents, all of the initial intent
> allocations are done outside of transaction contexts. That means
> they need to specifically use GFP_NOFS as we do not want memory
> reclaim to attempt to run direct reclaim of filesystem objects while
> we have lots of objects added into deferred operations.
> 
> Rather than use GFP_NOFS for these specific allocations, just place
> the entire intent recovery process under NOFS context and we can
> then just use GFP_KERNEL for these allocations.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Hooray!  This finally goes away...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_attr_item.c     |  2 +-
>  fs/xfs/xfs_bmap_item.c     |  3 ++-
>  fs/xfs/xfs_log_recover.c   | 18 ++++++++++++++----
>  fs/xfs/xfs_refcount_item.c |  2 +-
>  fs/xfs/xfs_rmap_item.c     |  2 +-
>  5 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 0bf25a2ba3b6..e14e229fc712 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -513,7 +513,7 @@ xfs_attri_recover_work(
>  		return ERR_PTR(error);
>  
>  	attr = kzalloc(sizeof(struct xfs_attr_intent) +
> -			sizeof(struct xfs_da_args), GFP_NOFS | __GFP_NOFAIL);
> +			sizeof(struct xfs_da_args), GFP_KERNEL | __GFP_NOFAIL);
>  	args = (struct xfs_da_args *)(attr + 1);
>  
>  	attr->xattri_da_args = args;
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 029a6a8d0efd..e3c58090e976 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -445,7 +445,8 @@ xfs_bui_recover_work(
>  	if (error)
>  		return ERR_PTR(error);
>  
> -	bi = kmem_cache_zalloc(xfs_bmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
> +	bi = kmem_cache_zalloc(xfs_bmap_intent_cache,
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	bi->bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ?
>  			XFS_ATTR_FORK : XFS_DATA_FORK;
>  	bi->bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index e9ed43a833af..8c1d260bb9e1 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3443,12 +3443,19 @@ xlog_recover(
>   * part of recovery so that the root and real-time bitmap inodes can be read in
>   * from disk in between the two stages.  This is necessary so that we can free
>   * space in the real-time portion of the file system.
> + *
> + * We run this whole process under GFP_NOFS allocation context. We do a
> + * combination of non-transactional and transactional work, yet we really don't
> + * want to recurse into the filesystem from direct reclaim during any of this
> + * processing. This allows all the recovery code run here not to care about the
> + * memory allocation context it is running in.
>   */
>  int
>  xlog_recover_finish(
>  	struct xlog	*log)
>  {
> -	int	error;
> +	unsigned int	nofs_flags = memalloc_nofs_save();
> +	int		error;
>  
>  	error = xlog_recover_process_intents(log);
>  	if (error) {
> @@ -3462,7 +3469,7 @@ xlog_recover_finish(
>  		xlog_recover_cancel_intents(log);
>  		xfs_alert(log->l_mp, "Failed to recover intents");
>  		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> -		return error;
> +		goto out_error;
>  	}
>  
>  	/*
> @@ -3483,7 +3490,7 @@ xlog_recover_finish(
>  		if (error < 0) {
>  			xfs_alert(log->l_mp,
>  	"Failed to clear log incompat features on recovery");
> -			return error;
> +			goto out_error;
>  		}
>  	}
>  
> @@ -3508,9 +3515,12 @@ xlog_recover_finish(
>  		 * and AIL.
>  		 */
>  		xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> +		goto out_error;
>  	}
>  
> -	return 0;
> +out_error:
> +	memalloc_nofs_restore(nofs_flags);
> +	return error;
>  }
>  
>  void
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index d850b9685f7f..14919b33e4fe 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -425,7 +425,7 @@ xfs_cui_recover_work(
>  	struct xfs_refcount_intent	*ri;
>  
>  	ri = kmem_cache_alloc(xfs_refcount_intent_cache,
> -			GFP_NOFS | __GFP_NOFAIL);
> +			GFP_KERNEL | __GFP_NOFAIL);
>  	ri->ri_type = pmap->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
>  	ri->ri_startblock = pmap->pe_startblock;
>  	ri->ri_blockcount = pmap->pe_len;
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index a40b92ac81e8..e473124e29cc 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -455,7 +455,7 @@ xfs_rui_recover_work(
>  {
>  	struct xfs_rmap_intent		*ri;
>  
> -	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL);
> +	ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_KERNEL | __GFP_NOFAIL);
>  
>  	switch (map->me_flags & XFS_RMAP_EXTENT_TYPE_MASK) {
>  	case XFS_RMAP_EXTENT_MAP:
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/12] xfs: place the CIL under nofs allocation context
  2024-01-15 22:59 ` [PATCH 10/12] xfs: place the CIL under nofs " Dave Chinner
@ 2024-01-18 23:41   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 23:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:48AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This is core code that needs to run in low memory conditions and
> can be triggered from memory reclaim. While it runs in a workqueue,
> it really shouldn't be recursing back into the filesystem during
> any memory allocation it needs to function.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 815a2181004c..8c3b09777006 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -100,7 +100,7 @@ xlog_cil_ctx_alloc(void)
>  {
>  	struct xfs_cil_ctx	*ctx;
>  
> -	ctx = kzalloc(sizeof(*ctx), GFP_NOFS | __GFP_NOFAIL);
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_NOFAIL);
>  	INIT_LIST_HEAD(&ctx->committing);
>  	INIT_LIST_HEAD(&ctx->busy_extents.extent_list);
>  	INIT_LIST_HEAD(&ctx->log_items);
> @@ -1116,11 +1116,18 @@ xlog_cil_cleanup_whiteouts(
>   * same sequence twice.  If we get a race between multiple pushes for the same
>   * sequence they will block on the first one and then abort, hence avoiding
>   * needless pushes.
> + *
> + * This runs from a workqueue so it does not inherent any specific memory

                                       inherit? ^^^^^^^^

If that change is correct,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> + * allocation context. However, we do not want to block on memory reclaim
> + * recursing back into the filesystem because this push may have been triggered
> + * by memory reclaim itself. Hence we really need to run under full GFP_NOFS
> + * contraints here.
>   */
>  static void
>  xlog_cil_push_work(
>  	struct work_struct	*work)
>  {
> +	unsigned int		nofs_flags = memalloc_nofs_save();
>  	struct xfs_cil_ctx	*ctx =
>  		container_of(work, struct xfs_cil_ctx, push_work);
>  	struct xfs_cil		*cil = ctx->cil;
> @@ -1334,12 +1341,14 @@ xlog_cil_push_work(
>  	spin_unlock(&log->l_icloglock);
>  	xlog_cil_cleanup_whiteouts(&whiteouts);
>  	xfs_log_ticket_ungrant(log, ticket);
> +	memalloc_nofs_restore(nofs_flags);
>  	return;
>  
>  out_skip:
>  	up_write(&cil->xc_ctx_lock);
>  	xfs_log_ticket_put(new_ctx->ticket);
>  	kfree(new_ctx);
> +	memalloc_nofs_restore(nofs_flags);
>  	return;
>  
>  out_abort_free_ticket:
> @@ -1348,6 +1357,7 @@ xlog_cil_push_work(
>  	if (!ctx->commit_iclog) {
>  		xfs_log_ticket_ungrant(log, ctx->ticket);
>  		xlog_cil_committed(ctx);
> +		memalloc_nofs_restore(nofs_flags);
>  		return;
>  	}
>  	spin_lock(&log->l_icloglock);
> @@ -1356,6 +1366,7 @@ xlog_cil_push_work(
>  	/* Not safe to reference ctx now! */
>  	spin_unlock(&log->l_icloglock);
>  	xfs_log_ticket_ungrant(log, ticket);
> +	memalloc_nofs_restore(nofs_flags);
>  }
>  
>  /*
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 12/12] xfs: use xfs_defer_alloc a bit more
  2024-01-15 22:59 ` [PATCH 12/12] xfs: use xfs_defer_alloc a bit more Dave Chinner
@ 2024-01-18 23:41   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-18 23:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:50AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Noticed by inspection, simple factoring allows the same allocation
> routine to be used for both transaction and recovery contexts.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_defer.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 8ae4401f6810..6ed3a5fda081 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -819,7 +819,7 @@ xfs_defer_can_append(
>  /* Create a new pending item at the end of the transaction list. */
>  static inline struct xfs_defer_pending *
>  xfs_defer_alloc(
> -	struct xfs_trans		*tp,
> +	struct list_head		*dfops,
>  	const struct xfs_defer_op_type	*ops)
>  {
>  	struct xfs_defer_pending	*dfp;
> @@ -828,7 +828,7 @@ xfs_defer_alloc(
>  			GFP_KERNEL | __GFP_NOFAIL);
>  	dfp->dfp_ops = ops;
>  	INIT_LIST_HEAD(&dfp->dfp_work);
> -	list_add_tail(&dfp->dfp_list, &tp->t_dfops);
> +	list_add_tail(&dfp->dfp_list, dfops);
>  
>  	return dfp;
>  }
> @@ -846,7 +846,7 @@ xfs_defer_add(
>  
>  	dfp = xfs_defer_find_last(tp, ops);
>  	if (!dfp || !xfs_defer_can_append(dfp, ops))
> -		dfp = xfs_defer_alloc(tp, ops);
> +		dfp = xfs_defer_alloc(&tp->t_dfops, ops);
>  
>  	xfs_defer_add_item(dfp, li);
>  	trace_xfs_defer_add_item(tp->t_mountp, dfp, li);
> @@ -870,7 +870,7 @@ xfs_defer_add_barrier(
>  	if (dfp)
>  		return;
>  
> -	xfs_defer_alloc(tp, &xfs_barrier_defer_type);
> +	xfs_defer_alloc(&tp->t_dfops, &xfs_barrier_defer_type);
>  
>  	trace_xfs_defer_add_item(tp->t_mountp, dfp, NULL);
>  }
> @@ -885,14 +885,9 @@ xfs_defer_start_recovery(
>  	struct list_head		*r_dfops,
>  	const struct xfs_defer_op_type	*ops)
>  {
> -	struct xfs_defer_pending	*dfp;
> +	struct xfs_defer_pending	*dfp = xfs_defer_alloc(r_dfops, ops);
>  
> -	dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
> -			GFP_KERNEL | __GFP_NOFAIL);
> -	dfp->dfp_ops = ops;
>  	dfp->dfp_intent = lip;
> -	INIT_LIST_HEAD(&dfp->dfp_work);
> -	list_add_tail(&dfp->dfp_list, r_dfops);
>  }
>  
>  /*
> -- 
> 2.43.0
> 
> 



* Re: [PATCH 11/12] xfs: clean up remaining GFP_NOFS users
  2024-01-15 22:59 ` [PATCH 11/12] xfs: clean up remaining GFP_NOFS users Dave Chinner
@ 2024-01-19  0:52   ` Darrick J. Wong
  0 siblings, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2024-01-19  0:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm

On Tue, Jan 16, 2024 at 09:59:49AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> These few remaining GFP_NOFS callers do not need to use GFP_NOFS at
> all. They are only called from a non-transactional context or cannot
> be accessed from memory reclaim due to other constraints. Hence they
> can just use GFP_KERNEL.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_btree_staging.c | 4 ++--
>  fs/xfs/xfs_attr_list.c            | 2 +-
>  fs/xfs/xfs_buf.c                  | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c
> index 961f6b898f4b..f0c69f9bb169 100644
> --- a/fs/xfs/libxfs/xfs_btree_staging.c
> +++ b/fs/xfs/libxfs/xfs_btree_staging.c
> @@ -139,7 +139,7 @@ xfs_btree_stage_afakeroot(
>  	ASSERT(!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE));
>  	ASSERT(cur->bc_tp == NULL);
>  
> -	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_NOFS | __GFP_NOFAIL);
> +	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_KERNEL | __GFP_NOFAIL);
>  	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
>  	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
>  	nops->free_block = xfs_btree_fakeroot_free_block;
> @@ -220,7 +220,7 @@ xfs_btree_stage_ifakeroot(
>  	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
>  	ASSERT(cur->bc_tp == NULL);
>  
> -	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_NOFS | __GFP_NOFAIL);
> +	nops = kmalloc(sizeof(struct xfs_btree_ops), GFP_KERNEL | __GFP_NOFAIL);
>  	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
>  	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
>  	nops->free_block = xfs_btree_fakeroot_free_block;
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 0318d768520a..47453510c0ab 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -109,7 +109,7 @@ xfs_attr_shortform_list(
>  	 * It didn't all fit, so we have to sort everything on hashval.
>  	 */
>  	sbsize = sf->count * sizeof(*sbuf);
> -	sbp = sbuf = kmalloc(sbsize, GFP_NOFS | __GFP_NOFAIL);
> +	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
>  
>  	/*
>  	 * Scan the attribute list for the rest of the entries, storing
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index de99368000b4..08f2fbc04db5 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -2008,7 +2008,7 @@ xfs_alloc_buftarg(
>  #if defined(CONFIG_FS_DAX) && defined(CONFIG_MEMORY_FAILURE)
>  	ops = &xfs_dax_holder_operations;
>  #endif
> -	btp = kzalloc(sizeof(*btp), GFP_NOFS | __GFP_NOFAIL);
> +	btp = kzalloc(sizeof(*btp), GFP_KERNEL | __GFP_NOFAIL);
>  
>  	btp->bt_mount = mp;
>  	btp->bt_bdev_handle = bdev_handle;
> -- 
> 2.43.0
> 
> 



* Re: [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage
  2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
                   ` (11 preceding siblings ...)
       [not found] ` <20240115230113.4080105-3-david@fromorbit.com>
@ 2024-03-25 17:46 ` Pankaj Raghav (Samsung)
  2024-04-01 21:30   ` Dave Chinner
  12 siblings, 1 reply; 29+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-03-25 17:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, willy, linux-mm, kernel, mcgrof, gost.dev

> 
> The first part of the series (fs/xfs/kmem.[ch] removal) is straight
> forward.  We've done lots of this stuff in the past leading up to
> the point; this is just converting the final remaining usage to the
> native kernel interface. The only down-side to this is that we end
> up propagating __GFP_NOFAIL everywhere into the code. This is no big
> deal for XFS - it's just formalising the fact that all our
> allocations are __GFP_NOFAIL by default, except for the ones we
> explicity mark as able to fail. This may be a surprise of people
> outside XFS, but we've been doing this for a couple of decades now
> and the sky hasn't fallen yet.

Definitely a surprise to me. :)

I rebased my LBS patches with these changes and generic/476 started to
break in page alloc[1]:

static inline
struct page *rmqueue(struct zone *preferred_zone,
			struct zone *zone, unsigned int order,
			gfp_t gfp_flags, unsigned int alloc_flags,
			int migratetype)
{
	struct page *page;

	/*
	 * We most definitely don't want callers attempting to
	 * allocate greater than order-1 page units with __GFP_NOFAIL.
	 */
	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
...

The reason for this is the call from xfs_attr_leaf.c to allocate memory
with attr->geo->blksize, which is set to 1 FSB. As 1 FSB can correspond
to order > 1 in LBS, this WARN_ON_ONCE is triggered.

This was not an issue before as xfs/kmem.c retried manually in a loop
without passing the __GFP_NOFAIL flag.

As not all of the kmalloc() calls in xfs_attr_leaf.c handle ENOMEM
errors, what would be the correct approach for LBS configurations?

One possible idea is to use __GFP_RETRY_MAYFAIL for LBS configurations,
as that resembles the way things worked before.

Let me know your thoughts.
--
Pankaj
[1] https://elixir.bootlin.com/linux/v6.9-rc1/source/mm/page_alloc.c#L2902
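For context on the manual retry behaviour described above, the removed
XFS wrapper worked roughly as in the sketch below. This is a simplified
reconstruction, not the exact fs/xfs/kmem.c code, and the function name
is made up:

#include <linux/sched/mm.h>
#include <linux/slab.h>

/*
 * Simplified reconstruction of the removed kmem_alloc()-style loop:
 * the allocation is retried indefinitely, but __GFP_NOFAIL is never
 * passed down to the page allocator, so the order > 1 WARN_ON_ONCE in
 * rmqueue() quoted above is never reached.
 */
static void *
legacy_nofail_alloc(size_t size, gfp_t gfp)
{
	void	*ptr;

	do {
		ptr = kmalloc(size, gfp);
		if (ptr)
			return ptr;
		memalloc_retry_wait(gfp);
	} while (1);
}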



* Re: [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage
  2024-03-25 17:46 ` [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Pankaj Raghav (Samsung)
@ 2024-04-01 21:30   ` Dave Chinner
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Chinner @ 2024-04-01 21:30 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung); +Cc: linux-xfs, willy, linux-mm, mcgrof, gost.dev

On Mon, Mar 25, 2024 at 06:46:29PM +0100, Pankaj Raghav (Samsung) wrote:
> > 
> > The first part of the series (fs/xfs/kmem.[ch] removal) is straight
> > forward.  We've done lots of this stuff in the past leading up to
> > the point; this is just converting the final remaining usage to the
> > native kernel interface. The only down-side to this is that we end
> > up propagating __GFP_NOFAIL everywhere into the code. This is no big
> > deal for XFS - it's just formalising the fact that all our
> > allocations are __GFP_NOFAIL by default, except for the ones we
> > explicity mark as able to fail. This may be a surprise of people
> > outside XFS, but we've been doing this for a couple of decades now
> > and the sky hasn't fallen yet.
> 
> Definitely a surprise to me. :)
> 
> I rebased my LBS patches with these changes and generic/476 started to
> break in page alloc[1]:
> 
> static inline
> struct page *rmqueue(struct zone *preferred_zone,
> 			struct zone *zone, unsigned int order,
> 			gfp_t gfp_flags, unsigned int alloc_flags,
> 			int migratetype)
> {
> 	struct page *page;
> 
> 	/*
> 	 * We most definitely don't want callers attempting to
> 	 * allocate greater than order-1 page units with __GFP_NOFAIL.
> 	 */
> 	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> ...

Yeah, that warning needs to go. It's just unnecessary noise at this
point in time - at minimum should be gated on __GFP_NOWARN.

> The reason for this is the call from xfs_attr_leaf.c to allocate memory
> with attr->geo->blksize, which is set to 1 FSB. As 1 FSB can correspond
> to order > 1 in LBS, this WARN_ON_ONCE is triggered.
> 
> This was not an issue before as xfs/kmem.c retried manually in a loop
> without passing the __GFP_NOFAIL flag.

Right, we've been doing this sort of "no fail" high order kmalloc
thing for a couple of decades in XFS, explicitly to avoid arbitrary
noise like this warning.....

> As not all of the kmalloc() calls in xfs_attr_leaf.c handle ENOMEM
> errors, what would be the correct approach for LBS configurations?

Use kvmalloc().

-Dave.
-- 
Dave Chinner
david@fromorbit.com
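To make the kvmalloc() suggestion concrete, here is a hedged sketch of
what such an allocation could look like. The helper name is
hypothetical; an actual conversion would touch the individual
xfs_attr_leaf.c call sites and their corresponding frees:

#include <linux/slab.h>

/*
 * Hypothetical helper illustrating the kvmalloc() suggestion: small
 * block sizes are still served from kmalloc(), while sizes that would
 * need an order > 1 page fall back to vmalloc(), so the high-order
 * __GFP_NOFAIL warning in rmqueue() is never hit. Buffers allocated
 * this way must be freed with kvfree(), not kfree().
 */
static void *
attr_block_alloc(unsigned int blksize)
{
	return kvmalloc(blksize, GFP_KERNEL | __GFP_NOFAIL);
}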



* Re: [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS
  2024-01-15 22:59 ` [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS Dave Chinner
  2024-01-18 23:32   ` Darrick J. Wong
@ 2024-06-22  9:44   ` Long Li
  2024-07-02  5:55     ` Dave Chinner
  1 sibling, 1 reply; 29+ messages in thread
From: Long Li @ 2024-06-22  9:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: willy, linux-mm, linux-xfs

On Tue, Jan 16, 2024 at 09:59:45AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In the past we've had problems with lockdep false positives stemming
> from inode locking occurring in memory reclaim contexts (e.g. from
> superblock shrinkers). Lockdep doesn't know that inodes access from
> above memory reclaim cannot be accessed from below memory reclaim
> (and vice versa) but there has never been a good solution to solving
> this problem with lockdep annotations.
> 
> This situation isn't unique to inode locks - buffers are also locked
> above and below memory reclaim, and we have to maintain lock
> ordering for them - and against inodes - appropriately. IOWs, the
> same code paths and locks are taken both above and below memory
> reclaim and so we always need to make sure the lock orders are
> consistent. We are spared the lockdep problems this might cause
> by the fact that semaphores and bit locks aren't covered by lockdep.
> 
> In general, this sort of lockdep false positive detection is cause
> by code that runs GFP_KERNEL memory allocation with an actively
> referenced inode locked. When it is run from a transaction, memory
> allocation is automatically GFP_NOFS, so we don't have reclaim
> recursion issues. So in the places where we do memory allocation
> with inodes locked outside of a transaction, we have explicitly set
> them to use GFP_NOFS allocations to prevent lockdep false positives
> from being reported if the allocation dips into direct memory
> reclaim.
> 
> More recently, __GFP_NOLOCKDEP was added to the memory allocation
> flags to tell lockdep not to track that particular allocation for
> the purposes of reclaim recursion detection. This is a much better
> way of preventing false positives - it allows us to use GFP_KERNEL
> context outside of transactions, and allows direct memory reclaim to
> proceed normally without throwing out false positive deadlock
> warnings.

Hi Dave,

I recently encountered the following AA deadlock lockdep warning
in Linux-6.9.0, which already includes your patch set. I believe
this is a lockdep false positive warning.

The xfs_dir_lookup_args() function is in a non-transactional context
and allocates memory with the __GFP_NOLOCKDEP flag in xfs_buf_alloc_pages().
Even though __GFP_NOLOCKDEP can tell lockdep not to track that particular
allocation for the purposes of reclaim recursion detection, it cannot
completely replace __GFP_NOFS. Getting trapped in direct memory reclaim
may trigger the AA deadlock warning shown below.

Or am I mistaken somewhere? I look forward to your reply.

Thanks,
Long Li

[12051.255974][ T6480] ============================================
[12051.256590][ T6480] WARNING: possible recursive locking detected
[12051.257207][ T6480] 6.9.0-xfstests-12131-gb902367d6fde-dirty #747 Not tainted
[12051.257919][ T6480] --------------------------------------------
[12051.258513][ T6480] cc1/6480 is trying to acquire lock:
[12051.259017][ T6480] ffff88804f40a018 (&xfs_dir_ilock_class){++++}-{3:3}, at: xfs_icwalk_ag+0x7c0/0x1690
[12051.259926][ T6480]
[12051.259926][ T6480] but task is already holding lock:
[12051.260599][ T6480] ffff8881004b5658 (&xfs_dir_ilock_class){++++}-{3:3}, at: xfs_ilock_data_map_shared+0x52/0x70
[12051.261546][ T6480]
[12051.261546][ T6480] other info that might help us debug this:
[12051.262288][ T6480]  Possible unsafe locking scenario:
[12051.262288][ T6480]
[12051.262972][ T6480]        CPU0
[12051.263283][ T6480]        ----
[12051.263587][ T6480]   lock(&xfs_dir_ilock_class);
[12051.264048][ T6480]   lock(&xfs_dir_ilock_class);
[12051.264502][ T6480]
[12051.264502][ T6480]  *** DEADLOCK ***
[12051.264502][ T6480]
[12051.265267][ T6480]  May be due to missing lock nesting notation
[12051.265267][ T6480]
[12051.266052][ T6480] 3 locks held by cc1/6480:
[12051.266477][ T6480]  #0: ffff8881004b5878 (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0xaa4/0x1090
[12051.267526][ T6480]  #1: ffff8881004b5658 (&xfs_dir_ilock_class){++++}-{3:3}, at: xfs_ilock_data_map_shared+0x52/0x70
[12051.268528][ T6480]  #2: ffff888107fda0e0 (&type->s_umount_key#42){.+.+}-{3:3}, at: super_trylock_shared+0x1c/0xb0
[12051.269511][ T6480]
[12051.269511][ T6480] stack backtrace:
[12051.270092][ T6480] CPU: 2 PID: 6480 Comm: cc1 Not tainted 6.9.0-xfstests-12131-gb902367d6fde-dirty #747
[12051.271012][ T6480] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[12051.272321][ T6480] Call Trace:
[12051.272640][ T6480]  <TASK>
[12051.272913][ T6480]  dump_stack_lvl+0x82/0xd0
[12051.273347][ T6480]  validate_chain+0xe70/0x1d30
[12051.274765][ T6480]  __lock_acquire+0xd9a/0x1e90
[12051.275208][ T6480]  lock_acquire+0x1a9/0x4f0
[12051.277032][ T6480]  down_write_nested+0x9b/0x200
[12051.279413][ T6480]  xfs_icwalk_ag+0x7c0/0x1690
[12051.284326][ T6480]  xfs_icwalk+0x4f/0xe0
[12051.284735][ T6480]  xfs_reclaim_inodes_nr+0x148/0x1f0
[12051.285792][ T6480]  super_cache_scan+0x30c/0x440
[12051.286247][ T6480]  do_shrink_slab+0x340/0xce0
[12051.286701][ T6480]  shrink_slab_memcg+0x231/0x8f0
[12051.289127][ T6480]  shrink_slab+0x4ad/0x4f0
[12051.290620][ T6480]  shrink_node+0x86b/0x1de0
[12051.291055][ T6480]  do_try_to_free_pages+0x2c4/0x1490
[12051.293643][ T6480]  try_to_free_pages+0x20d/0x540
[12051.294641][ T6480]  __alloc_pages_slowpath.constprop.0+0x754/0x2050
[12051.299337][ T6480]  __alloc_pages_noprof+0x54f/0x660
[12051.301344][ T6480]  alloc_pages_bulk_noprof+0x6fb/0xe00
[12051.302404][ T6480]  xfs_buf_alloc_pages+0x1b9/0x850
[12051.302889][ T6480]  xfs_buf_get_map+0xe86/0x1590
[12051.303847][ T6480]  xfs_buf_read_map+0xb6/0x7f0
[12051.306234][ T6480]  xfs_trans_read_buf_map+0x474/0xd30
[12051.307753][ T6480]  xfs_da_read_buf+0x1c8/0x2c0
[12051.310298][ T6480]  xfs_dir3_data_read+0x36/0x2e0
[12051.310783][ T6480]  xfs_dir2_leafn_lookup_for_entry+0x3d6/0x14b0
[12051.313039][ T6480]  xfs_da3_node_lookup_int+0xef1/0x1810
[12051.315658][ T6480]  xfs_dir2_node_lookup+0xc5/0x580
[12051.317156][ T6480]  xfs_dir_lookup_args+0xbf/0xe0
[12075.149236][ T5555]  new_slab+0x2c4/0x320
[12075.149602][ T5555]  ___slab_alloc+0xcdd/0x1640
[12075.152775][ T5555]  __slab_alloc.isra.0+0x1f/0x40
[12075.153238][ T5555]  kmem_cache_alloc_noprof+0x34f/0x3a0
[12075.154130][ T5555]  vm_area_dup+0x51/0x160
[12075.154772][ T5555]  __split_vma+0x135/0x1930
[12075.158003][ T5555]  vma_modify+0x228/0x300
[12075.158380][ T5555]  mprotect_fixup+0x1a0/0x950
[12075.159252][ T5555]  do_mprotect_pkey+0x79c/0xa40
[12075.161063][ T5555]  __x64_sys_mprotect+0x78/0xc0
[12075.161492][ T5555]  do_syscall_64+0x66/0x140
[12075.161891][ T5555]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[12075.162409][ T5555] RIP: 0033:0x7ff736f2bc5b
[12075.162811][ T5555] Code: 73 01 c3 48 8d 0d a5 15 01 00 f7 d8 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa b8 0a 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 83
[12075.164490][ T5555] RSP: 002b:00007ffc9c420998 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
[12075.165229][ T5555] RAX: ffffffffffffffda RBX: 00007ff736f3ca30 RCX: 00007ff736f2bc5b
[12075.165937][ T5555] RDX: 0000000000000001 RSI: 0000000000002000 RDI: 00007ff736f3a000
[12075.166644][ T5555] RBP: 00007ffc9c420ab0 R08: 0000000000000000 R09: 0000000000000000
[12075.167349][ T5555] R10: 00007ff736f09000 R11: 0000000000000206 R12: 0000000000000000
[12075.168061][ T5555] R13: 00007ff736f3b9e0 R14: 00007ff736f3ca30 R15: 00007ff736f09000
[12075.168773][ T5555]  </TASK>
[12075.169090][ T5555] Mem-Info:
[12075.169378][ T5555] active_anon:6735 inactive_anon:1469067 isolated_anon:0
[12075.169378][ T5555]  active_file:24 inactive_file:508 isolated_file:424
[12075.169378][ T5555]  unevictable:0 dirty:1 writeback:0
[12075.169378][ T5555]  slab_reclaimable:56327 slab_unreclaimable:112381
[12075.169378][ T5555]  mapped:718 shmem:275 pagetables:53700
[12075.169378][ T5555]  sec_pagetables:0 bounce:0
[12075.169378][ T5555]  kernel_misc_reclaimable:0
[12075.169378][ T5555]  free:11595 free_pcp:554 free_cma:0
[12075.173320][ T5555] Node 0 active_anon:26940kB inactive_anon:5876268kB active_file:96kB inactive_file:2032kB unevictable:0kB isolated(anon):0kB isolated(file):1696kB mapped:2872kB dirtyo
[12075.175767][ T5555] Node 0 DMA free:20kB boost:0kB min:20kB low:32kB high:44kB reserved_highatomic:0KB active_anon:68kB inactive_anon:15272kB active_file:0kB inactive_file:0kB unevictabB
[12075.178094][ T5555] lowmem_reserve[]: 0 2895 6821 0 0
[12075.178559][ T5555] Node 0 DMA32 free:19760kB boost:15612kB min:20092kB low:23056kB high:26020kB reserved_highatomic:0KB active_anon:12592kB inactive_anon:2416704kB active_file:0kB inacB
[12075.181075][ T5555] lowmem_reserve[]: 0 0 3925 0 0
[12075.181517][ T5555] Node 0 Normal free:26600kB boost:21156kB min:27228kB low:31244kB high:35260kB reserved_highatomic:0KB active_anon:14280kB inactive_anon:3444292kB active_file:120kB iB
[12075.184140][ T5555] lowmem_reserve[]: 0 0 0 0 0
[12075.184554][ T5555] Node 0 DMA: 1*4kB (U) 2*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
[12075.185606][ T5555] Node 0 DMA32: 85*4kB (UME) 45*8kB (UME) 12*16kB (UME) 6*32kB (UE) 300*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20284kB
[12075.186882][ T5555] Node 0 Normal: 0*4kB 1*8kB (U) 0*16kB 299*32kB (U) 247*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 25384kB
[12075.188047][ T5555] 1152 total pagecache pages
[12075.188450][ T5555] 0 pages in swap cache
[12075.188821][ T5555] Free swap  = 0kB
[12075.189201][ T5555] Total swap = 0kB
[12075.189768][ T5555] 2097018 pages RAM
[12075.190103][ T5555] 0 pages HighMem/MovableOnly
[12075.190509][ T5555] 345037 pages reserved


> 
> The obvious places that lock inodes and do memory allocation are the
> lookup paths and inode extent list initialisation. These occur in
> non-transactional GFP_KERNEL contexts, and so can run direct reclaim
> and lock inodes.
> 
> This patch makes a first path through all the explicit GFP_NOFS
> allocations in XFS and converts the obvious ones to GFP_KERNEL |
> __GFP_NOLOCKDEP as a first step towards removing explicit GFP_NOFS
> allocations from the XFS code.
> 
 



* Re: [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS
  2024-06-22  9:44   ` Long Li
@ 2024-07-02  5:55     ` Dave Chinner
  2024-07-02  8:00       ` Long Li
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2024-07-02  5:55 UTC (permalink / raw)
  To: Long Li; +Cc: willy, linux-mm, linux-xfs

On Sat, Jun 22, 2024 at 05:44:11PM +0800, Long Li wrote:
> On Tue, Jan 16, 2024 at 09:59:45AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > In the past we've had problems with lockdep false positives stemming
> > from inode locking occurring in memory reclaim contexts (e.g. from
> > superblock shrinkers). Lockdep doesn't know that inodes access from
> > above memory reclaim cannot be accessed from below memory reclaim
> > (and vice versa) but there has never been a good solution to solving
> > this problem with lockdep annotations.
> > 
> > This situation isn't unique to inode locks - buffers are also locked
> > above and below memory reclaim, and we have to maintain lock
> > ordering for them - and against inodes - appropriately. IOWs, the
> > same code paths and locks are taken both above and below memory
> > reclaim and so we always need to make sure the lock orders are
> > consistent. We are spared the lockdep problems this might cause
> > by the fact that semaphores and bit locks aren't covered by lockdep.
> > 
> > In general, this sort of lockdep false positive detection is cause
> > by code that runs GFP_KERNEL memory allocation with an actively
> > referenced inode locked. When it is run from a transaction, memory
> > allocation is automatically GFP_NOFS, so we don't have reclaim
> > recursion issues. So in the places where we do memory allocation
> > with inodes locked outside of a transaction, we have explicitly set
> > them to use GFP_NOFS allocations to prevent lockdep false positives
> > from being reported if the allocation dips into direct memory
> > reclaim.
> > 
> > More recently, __GFP_NOLOCKDEP was added to the memory allocation
> > flags to tell lockdep not to track that particular allocation for
> > the purposes of reclaim recursion detection. This is a much better
> > way of preventing false positives - it allows us to use GFP_KERNEL
> > context outside of transactions, and allows direct memory reclaim to
> > proceed normally without throwing out false positive deadlock
> > warnings.
> 
> Hi Dave,
> 
> I recently encountered the following AA deadlock lockdep warning
> in Linux-6.9.0, which already includes your patch set. I believe
> this is a lockdep false positive warning.

Yes, it is.

> The xfs_dir_lookup_args() function is in a non-transactional context
> and allocates memory with the __GFP_NOLOCKDEP flag in xfs_buf_alloc_pages().
> Even though __GFP_NOLOCKDEP can tell lockdep not to track that particular
> allocation for the purposes of reclaim recursion detection, it cannot
> completely replace __GFP_NOFS.

We are not trying to replace GFP_NOFS with __GFP_NOLOCKDEP. What we
are trying to do is annotate the allocation sites where lockdep
false positives will occur. That way if we get a lockdep report from
a location that uses __GFP_NOLOCKDEP, we know that it is either a
false positive or there is some nested allocation that did not honor
__GFP_NOLOCKDEP.

We've already fixed a bunch of nested allocations (e.g. kasan,
kmemleak, etc) to propagate the __GFP_NOLOCKDEP flag so they don't
generate false positives, either. So the amount of noise has already
been reduced.

> Getting trapped in direct memory reclaim
> may trigger the AA deadlock warning shown below.

No, it can't. xfs_dir_lookup() can only lock referenced inodes.
xfs_reclaim_inodes_nr() can only lock unreferenced inodes. It is not
possible for the same inode to be both referenced and unreferenced
at the same time, therefore memory reclaim cannot self deadlock
through this path.

I expected to see some situations like this when getting rid of
GFP_NOFS (because now memory reclaim runs in places it never used
to). Once I have an idea of the sorts of false positives that are
still being tripped over, I can formulate a plan to eradicate them,
too.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
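As an illustration of the annotation strategy described above, a hedged
sketch with a hypothetical call site (not a specific XFS function)
could look like this:

#include <linux/slab.h>

/*
 * Sketch of the __GFP_NOLOCKDEP annotation pattern: the allocation
 * runs with an actively referenced inode locked, outside transaction
 * context. Direct reclaim recursing from here cannot deadlock because
 * reclaim only locks unreferenced inodes, so GFP_KERNEL is kept and
 * lockdep is simply told not to track this allocation for reclaim
 * recursion detection.
 */
static void *
alloc_with_ilock_held(size_t size)
{
	return kmalloc(size, GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
}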



* Re: [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS
  2024-07-02  5:55     ` Dave Chinner
@ 2024-07-02  8:00       ` Long Li
  0 siblings, 0 replies; 29+ messages in thread
From: Long Li @ 2024-07-02  8:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: willy, linux-mm, linux-xfs

On Tue, Jul 02, 2024 at 03:55:10PM +1000, Dave Chinner wrote:
> On Sat, Jun 22, 2024 at 05:44:11PM +0800, Long Li wrote:
> > On Tue, Jan 16, 2024 at 09:59:45AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > In the past we've had problems with lockdep false positives stemming
> > > from inode locking occurring in memory reclaim contexts (e.g. from
> > > superblock shrinkers). Lockdep doesn't know that inodes access from
> > > above memory reclaim cannot be accessed from below memory reclaim
> > > (and vice versa) but there has never been a good solution to solving
> > > this problem with lockdep annotations.
> > > 
> > > This situation isn't unique to inode locks - buffers are also locked
> > > above and below memory reclaim, and we have to maintain lock
> > > ordering for them - and against inodes - appropriately. IOWs, the
> > > same code paths and locks are taken both above and below memory
> > > reclaim and so we always need to make sure the lock orders are
> > > consistent. We are spared the lockdep problems this might cause
> > > by the fact that semaphores and bit locks aren't covered by lockdep.
> > > 
> > > In general, this sort of lockdep false positive detection is cause
> > > by code that runs GFP_KERNEL memory allocation with an actively
> > > referenced inode locked. When it is run from a transaction, memory
> > > allocation is automatically GFP_NOFS, so we don't have reclaim
> > > recursion issues. So in the places where we do memory allocation
> > > with inodes locked outside of a transaction, we have explicitly set
> > > them to use GFP_NOFS allocations to prevent lockdep false positives
> > > from being reported if the allocation dips into direct memory
> > > reclaim.
> > > 
> > > More recently, __GFP_NOLOCKDEP was added to the memory allocation
> > > flags to tell lockdep not to track that particular allocation for
> > > the purposes of reclaim recursion detection. This is a much better
> > > way of preventing false positives - it allows us to use GFP_KERNEL
> > > context outside of transactions, and allows direct memory reclaim to
> > > proceed normally without throwing out false positive deadlock
> > > warnings.
> > 
> > Hi Dave,
> > 
> > I recently encountered the following AA deadlock lockdep warning
> > in Linux-6.9.0, which already includes your patch set. I believe
> > this is a lockdep false positive warning.
> 
> Yes, it is.
> 
> > The xfs_dir_lookup_args() function is in a non-transactional context
> > and allocates memory with the __GFP_NOLOCKDEP flag in xfs_buf_alloc_pages().
> > Even though __GFP_NOLOCKDEP can tell lockdep not to track that particular
> > allocation for the purposes of reclaim recursion detection, it cannot
> > completely replace __GFP_NOFS.
> 
> We are not trying to replace GFP_NOFS with __GFP_NOLOCKDEP. What we
> are trying to do is annotate the allocation sites where lockdep
> false positives will occur. That way if we get a lockdep report from
> a location that uses __GFP_NOLOCKDEP, we know that it is either a
> false positive or there is some nested allocation that did not honor
> __GFP_NOLOCKDEP.
> 
> We've already fixed a bunch of nested allocations (e.g. kasan,
> kmemleak, etc) to propagate the __GFP_NOLOCKDEP flag so they don't
> generate false positives, either. So the amount of noise has already
> been reduced.
> 
> > Getting trapped in direct memory reclaim
> > may trigger the AA deadlock warning shown below.
> 
> No, it can't. xfs_dir_lookup() can only lock referenced inodes.
> xfs_reclaim_inodes_nr() can only lock unreferenced inodes. It is not
> possible for the same inode to be both referenced and unreferenced
> at the same time, therefore memory reclaim cannot self deadlock
> through this path.

Yes, I know. An AA deadlock couldn't happen in this situation because
it's not the same inode, so it's just a lockdep false positive warning.

> 
> I expected to see some situations like this when getting rid of
> GFP_NOFS (because now memory reclaim runs in places it never used
> to). Once I have an idea of the sorts of false positives that are
> still being tripped over, I can formulate a plan to eradicate them,
> too.

Ok, memory reclaim may run in those places where GFP_NOFS is removed.
Some new lockdep false positive warnings may appear. I hope this report
can help you eradicate them in the future.

Thanks for your reply. :)

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 



