Re: [PATCH v2 3/4] mm/fs: split the file's i_mmap tree

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Huang Shijie" <huangsj@hygon.cn>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v2 3/4] mm/fs: split the file's i_mmap tree
Date: Thu, 11 Jun 2026 06:37:11 +0000	[thread overview]
Message-ID: <20260611063712.51C791F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260611061915.2354307-4-huangsj@hygon.cn>

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [Critical] Immediate self-deadlock in `vma_prepare()` when adjacent VMAs share the same `tree_idx`.
- [Critical] `vma_complete()` blindly unlocks both `vp->vma` and `vp->adj_next` trees, risking a double unlock.
- [High] Lock inversion deadlock between `vma_prepare()` and global `i_mmap` tree locking functions.
- [High] Massive per-inode memory bloat due to unconditional split tree allocation.
- [High] `address_space_init_once()` no longer initializes `i_mmap` fields.
- [High] Lock ordering inversion between `anon_vma->rwsem` and `mapping->i_mmap` sibling tree locks.
- [Medium] Iterating over all `i_mmap` sibling trees triggers lockdep recursive locking splats.
- [Low] Syntax error in `hugetlbfs_lockdep_set_class()` blocks compilation.
--

commit f5f0f658ce23acc54c74ca009d1b5fa489c2d03a
Author: Huang Shijie <huangsj@hygon.cn>
Subject: mm/fs: split the file's i_mmap tree

This commit splits the single i_mmap tree per file into multiple sibling
trees to reduce lock contention during VMA operations. It introduces new
fields and updates lock handling and interval tree iterators across the
memory management subsystem.

> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
[ ... ]
> @@ -891,6 +891,23 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
>   */
>  static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
>  
> +#ifdef CONFIG_SPLIT_I_MMAP
> +static void hugetlbfs_lockdep_set_class(struct address_space *mapping)
> +{
> +	int i;
> +
> +	for (i = 0; i < split_tree_num; i++) {
> +		lockdep_set_class(&mapping->i_mmap[i].rwsem,

[Severity: Low]
Does this cause a compilation error? Since mapping->i_mmap is an array of
pointers to struct i_mmap_tree, using the dot operator to access rwsem
will fail. Should this be mapping->i_mmap[i]->rwsem instead?

> +				&hugetlbfs_i_mmap_rwsem_key);
> +	}
> +}

[ ... ]
> diff --git a/fs/inode.c b/fs/inode.c
> --- a/fs/inode.c
> +++ b/fs/inode.c
[ ... ]
> @@ -214,6 +214,70 @@ static int no_open(struct inode *inode, struct file *file)
>  	return -ENXIO;
>  }
>  
> +#ifdef CONFIG_SPLIT_I_MMAP
[ ... ]
> +static int init_mapping_i_mmap(struct address_space *mapping, gfp_t gfp)
> +{
> +	struct i_mmap_tree *tree;
> +	int i;
> +
> +	/* The extra one is used as terminator in vma_interval_tree_foreach() */
> +	mapping->i_mmap = kzalloc(sizeof(tree) * (split_tree_num + 1), gfp);
> +	if (!mapping->i_mmap)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < split_tree_num; i++) {
> +		tree = kzalloc_node(sizeof(*tree), gfp, i);

[Severity: High]
Could this lead to significant memory bloat? By allocating the array and
multiple trees unconditionally for every instantiated inode, high-core-count
systems might consume massive amounts of memory for millions of cached
inodes, many of which may never use the i_mmap tree.

> +		if (!tree)
> +			goto nomem;
> +
> +		tree->root = RB_ROOT_CACHED;
> +		init_rwsem(&tree->rwsem);
> +
> +		mapping->i_mmap[i] = tree;
> +	}
> +	return 0;
> +nomem:
[ ... ]
> @@ -480,9 +550,7 @@ EXPORT_SYMBOL(inc_nlink);
>  static void __address_space_init_once(struct address_space *mapping)
>  {
>  	xa_init_flags(&mapping->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
> -	init_rwsem(&mapping->i_mmap_rwsem);
>  	spin_lock_init(&mapping->i_private_lock);
> -	mapping->i_mmap = RB_ROOT_CACHED;
>  }

[Severity: High]
Will this cause issues for non-inode users of address_space? Since
address_space_init_once() no longer initializes the i_mmap fields,
standalone mappings could trigger null pointer dereferences when their
i_mmap locks are acquired.

> diff --git a/include/linux/fs.h b/include/linux/fs.h
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
[ ... ]
> +static inline void i_mmap_lock_write(struct address_space *mapping)
> +{
> +	int i;
> +
> +	for (i = 0; i < split_tree_num; i++)
> +		down_write(&mapping->i_mmap[i]->rwsem);
> +}

[Severity: Medium]
Will this trigger lockdep splats for recursive locking? Because all sibling
tree semaphores share the same static lock class, acquiring them
sequentially inside this loop might cause lockdep to detect a false
recursion and exceed its maximum depth limits.

> diff --git a/mm/vma.c b/mm/vma.c
> --- a/mm/vma.c
> +++ b/mm/vma.c
[ ... ]
> @@ -318,12 +321,17 @@ static void vma_prepare(struct vma_prepare *vp)
>  	}
>  
>  	if (vp->file) {
> +		i_mmap_tree_lock_write(vp->mapping, vp->vma);

[Severity: High]
Does this sequence invert the standard locking hierarchy? This path inside
vma_prepare() now acquires the i_mmap tree lock after acquiring the anon_vma
lock (acquired earlier in the function), whereas previously the full i_mmap
lock was acquired before locking the anon_vma.

>  		flush_dcache_mmap_lock(vp->mapping);
>  		vma_interval_tree_remove(vp->vma,
> -					get_i_mmap_root(vp->mapping));
> -		if (vp->adj_next)
> +					get_rb_root(vp->vma, vp->mapping));
> +		dec_mapping_vma(vp->mapping, vp->vma);
> +		if (vp->adj_next) {
> +			i_mmap_tree_lock_write(vp->mapping, vp->adj_next);

[Severity: Critical]
Is there a risk of an immediate self-deadlock here? If vp->vma and
vp->adj_next share the same tree index, this thread will attempt to acquire
the exact same semaphore twice in a row.

[Severity: High]
Could this also cause a lock inversion against the global tree traversal
functions? The global functions lock all trees in ascending index order,
but this path locks based on adjacent vma tree indexes which could be in
descending order.

>  			vma_interval_tree_remove(vp->adj_next,
> -					get_i_mmap_root(vp->mapping));
> +					get_rb_root(vp->adj_next, vp->mapping));
> +			dec_mapping_vma(vp->mapping, vp->adj_next);
> +		}
>  	}
[ ... ]
> @@ -340,12 +348,17 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>  			 struct mm_struct *mm)
>  {
>  	if (vp->file) {
> -		if (vp->adj_next)
> +		if (vp->adj_next) {
>  			vma_interval_tree_insert(vp->adj_next,
> -					get_i_mmap_root(vp->mapping));
> +					get_rb_root(vp->adj_next, vp->mapping));
> +			inc_mapping_vma(vp->mapping, vp->adj_next);
> +			i_mmap_tree_unlock_write(vp->mapping, vp->adj_next);
> +		}
>  		vma_interval_tree_insert(vp->vma,
> -					get_i_mmap_root(vp->mapping));
> +					get_rb_root(vp->vma, vp->mapping));
> +		inc_mapping_vma(vp->mapping, vp->vma);
>  		flush_dcache_mmap_unlock(vp->mapping);
> +		i_mmap_tree_unlock_write(vp->mapping, vp->vma);
>  	}

[Severity: Critical]
If vp->vma and vp->adj_next both map to the same sibling tree index, could
this cause a double unlock? This sequence releases the tree for vp->adj_next
and then for vp->vma, which would unlock the same semaphore twice and
corrupt its state.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611061915.2354307-1-huangsj@hygon.cn?part=3

next prev parent reply	other threads:[~2026-06-11  6:37 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11  6:18 [PATCH v2 0/4] mm: split the file's i_mmap tree for NUMA Huang Shijie
2026-06-11  6:18 ` [PATCH v2 1/4] mm: use mapping_mapped to simplify the code Huang Shijie
2026-06-11 11:13   ` Pedro Falcato
2026-06-11 15:52   ` Lorenzo Stoakes
2026-06-11  6:18 ` [PATCH v2 2/4] mm: use get_i_mmap_root to access the file's i_mmap Huang Shijie
2026-06-11  6:31   ` sashiko-bot
2026-06-11  6:18 ` [PATCH v2 3/4] mm/fs: split the file's i_mmap tree Huang Shijie
2026-06-11  6:37   ` sashiko-bot [this message]
2026-06-11 11:11   ` Pedro Falcato
2026-06-11 15:48     ` Lorenzo Stoakes
2026-06-11  6:19 ` [PATCH v2 4/4] docs/mm: update document for split " Huang Shijie
2026-06-11  6:26   ` sashiko-bot
2026-06-11 16:00 ` [PATCH v2 0/4] mm: split the file's i_mmap tree for NUMA Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260611063712.51C791F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=huangsj@hygon.cn \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.