From: Lorenzo Stoakes <ljs@kernel.org>
To: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>,
Joseph Salisbury <joseph.salisbury@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Peter Xu <peterx@redhat.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] mm: stress-ng --mremap triggers severe lruvec lock contention in populate/unmap paths
Date: Thu, 9 Apr 2026 19:03:41 +0100 [thread overview]
Message-ID: <adfhGOHcg4AF3IFn@lucifer> (raw)
In-Reply-To: <982e5964-5ea6-eaf7-a11a-0692f14a6943@google.com>
On Tue, Apr 07, 2026 at 05:35:18PM -0700, Hugh Dickins wrote:
> On Tue, 7 Apr 2026, John Hubbard wrote:
> > On 4/7/26 1:09 PM, Joseph Salisbury wrote:
> > > Hello,
> > >
> > > I would like to ask for feedback on an MM performance issue triggered by
> > > stress-ng's mremap stressor:
> > >
> > > stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --metrics-brief
> > >
> > > This was first investigated as a possible regression from 0ca0c24e3211
> > > ("mm: store zero pages to be swapped out in a bitmap"), but the current
> > > evidence suggests that commit is mostly exposing an older problem for
> > > this workload rather than directly causing it.
> > >
> >
> > Can you try this out? (Adding Hugh to Cc.)
> >
> > From: John Hubbard <jhubbard@nvidia.com>
> > Date: Tue, 7 Apr 2026 15:33:47 -0700
> > Subject: [PATCH] mm/gup: skip lru_add_drain() for non-locked populate
> > X-NVConfidentiality: public
> > Cc: John Hubbard <jhubbard@nvidia.com>
> >
> > populate_vma_page_range() calls lru_add_drain() unconditionally after
> > __get_user_pages(). With high-frequency single-page MAP_POPULATE/munmap
> > cycles at high thread counts, this forces a lruvec->lru_lock acquire
> > per page, defeating per-CPU folio_batch batching.
> >
> > The drain was added by commit ece369c7e104 ("mm/munlock: add
> > lru_add_drain() to fix memcg_stat_test") for VM_LOCKED populate, where
> > unevictable page stats must be accurate after faulting. Non-locked VMAs
> > have no such requirement. Skip the drain for them.
> >
> > Cc: Hugh Dickins <hughd@google.com>
> > Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>
> Thanks for the Cc. I'm not convinced that we should be making such a
> change, just to avoid the stress that an avowed stresstest is showing;
> but can let others debate that - and, need it be said, I have no
> problem with Joseph trying your patch.
Yeah, the test case (as others have said too) is rather synthetic: it's a test
designed to saturate, so if we're not I/O-throttled by swap then we hammer the
populate path. It feels like a micro-optimisation for something that is not (at
least not yet demonstrated to be) an actual problem.
stress-ng is not a benchmarking tool per se, it's designed to eke out bugs.
So really we need to see a real-world case, I think.
>
> I tend to stand by my comment in that commit, that it's not just for
> VM_LOCKED: I believe it's in everyone's interest that a bulk faulting
> interface like populate_vma_page_range() or faultin_vma_page_range()
> should drain its local pagevecs at the end, to save others sometimes
> needing the much more expensive lru_add_drain_all().
I mean yeah, but I guess anywhere that _really_ needs to be sure of the drain
has to do an lru_add_drain_all() anyway, because it'd be fragile to rely on
lru_add_drain() having been done at the right time?
>
> But lru_add_drain() and lru_add_drain_all(): there's so much to be
> said and agonized over there. They've distressed me for years, and
> are a hot topic for us at present. But I won't be able to contribute
> more on that subject, not this week.
Yeah, they do feel rather delicate... :) Sometimes you _really do_ need to know
everything's drained, but other times it feels a bit whack-a-mole.
I also do agree it makes sense to drain locally after a batch operation.
It all comes down to whether this manifests in a real-world case, at which point
maybe this is a more useful change?
>
> Hugh
>
> > ---
> > mm/gup.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 8e7dc2c6ee73..2dd5de1cb5b9 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1816,6 +1816,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
> > struct mm_struct *mm = vma->vm_mm;
> > unsigned long nr_pages = (end - start) / PAGE_SIZE;
> > int local_locked = 1;
> > + bool need_drain;
> > int gup_flags;
> > long ret;
> >
> > @@ -1857,9 +1858,19 @@ long populate_vma_page_range(struct vm_area_struct *vma,
> > * We made sure addr is within a VMA, so the following will
> > * not result in a stack expansion that recurses back here.
> > */
> > + /*
> > + * Read VM_LOCKED before __get_user_pages(), which may drop
> > + * mmap_lock when FOLL_UNLOCKABLE is set, after which the vma
> > + * must not be accessed. The read is stable: mmap_lock is held
> > + * for read here, so mlock() (which needs the write lock)
> > + * cannot change VM_LOCKED concurrently.
> > + */
BTW, not to nitpick (OK, maybe to nitpick :) this comment feels a bit
redundant. It might be useful to note that the lock can be dropped (though you
don't explain why it's then still OK to assume state about the VMA), and
needing the write lock to alter VMA flags is a well-known invariant; if we had
to comment on that each time, mm would be mostly comments :)
So if you want a comment here I'd say something like 'the lock might be dropped
due to FOLL_UNLOCKABLE, but that's ok, we would simply end up doing a redundant
drain in this case'.
But I'm not sure it's needed?
> > + need_drain = vma->vm_flags & VM_LOCKED;
Please use the new VMA flag interface :)
	need_drain = vma_test(VMA_LOCKED_BIT);
> > +
> > ret = __get_user_pages(mm, start, nr_pages, gup_flags,
> > NULL, locked ? locked : &local_locked);
> > - lru_add_drain();
> > + if (need_drain)
> > + lru_add_drain();
> > return ret;
> > }
> >
> >
> > base-commit: 3036cd0d3328220a1858b1ab390be8b562774e8a
> > --
> > 2.53.0
> >
> >
> > thanks,
> > --
> > John Hubbard
>
Cheers, Lorenzo