From: Joseph Salisbury <joseph.salisbury@oracle.com>
To: Haakon Bugge <haakon.bugge@oracle.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	Pedro Falcato <pfalcato@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	John Hubbard <jhubbard@nvidia.com>, Peter Xu <peterx@redhat.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re: [RFC] mm: stress-ng --mremap triggers severe lruvec lock contention in populate/unmap paths
Date: Thu, 9 Apr 2026 13:26:05 -0400
Message-ID: <30bc7a88-a8d5-4eee-95cd-c0ecf70ff66a@oracle.com>
In-Reply-To: <F72E050D-BAA4-4D5B-AFDC-5F8A20D8A70E@oracle.com>



On 4/9/26 12:37 PM, Haakon Bugge wrote:
>> On 8 Apr 2026, at 16:27, Joseph Salisbury <joseph.salisbury@oracle.com> wrote:
>>
>>
>>
>> On 4/8/26 4:09 AM, David Hildenbrand (Arm) wrote:
>>>>> It was also found that adding '--mremap-numa' changes the behavior
>>>>> substantially:
>>>> "assign memory mapped pages to randomly selected NUMA nodes. This is
>>>> disabled for systems that do not support NUMA."
>>>>
>>>> so this is just sharding your lock contention across your NUMA nodes (you
>>>> have an lruvec per node).
>>>>
>>>>> stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --mremap-numa
>>>>> --metrics-brief
>>>>>
>>>>> mremap 2570798 29.39 8.06 106.23 87466.50 22494.74
>>>>>
>>>>> So it's possible that either actual swapping, or the mbind(...,
>>>>> MPOL_MF_MOVE) path used by '--mremap-numa', removes most of the excessive
>>>>> system time.
>>>>>
>>>>> Does this look like a known MM scalability issue around short-lived
>>>>> MAP_POPULATE / munmap churn?
>>>> Yes. Is this an actual issue on some workload?
>>> Same thought, it's unclear to me why we should care here. In particular,
>>> when talking about excessive use of zero-filled pages.
>>>
>> Currently this is only showing up with that particular stress test. We will try John's patch and provide feedback.
>>
>> Thanks for all the feedback, everyone!
> I reported this internally and have worked with Joseph on it. I tested v7.0-rc7-68-g7f87a5ea75f01 ("-"), "Base", vs. ditto plus John Hubbard's patch ("+"), "Test".
>
> Stress-ng command: stress-ng --mremap 8192 --mremap-bytes 4K --timeout 30 --metrics-brief
>
> System is an AMD EPYC 9J45:
>    NUMA node(s):              2
>    NUMA node0 CPU(s):         0-127,256-383
>    NUMA node1 CPU(s):         128-255,384-511
>
> The stress-ng command was run ten times and here are the averages and pstdev:
>
>     bogo ops/s   pstdev  system time   pstdev
>     (realtime)
> --------------------------------------------
> -     3192638      35%        24041      32%
> +     3657904       5%        15278       0%
>
> This is a 15% improvement in bogo ops/s (realtime) and a decent 36% reduction in system time.
>
> I shamelessly copied and modified the fio command from [1]. I ran:
>
> # fio -filename=/dev/nvme0n1 -direct=0 -thread -size=1024G -rwmixwrite=30 \
> --norandommap --randrepeat=0 -ioengine=mmap -bs=4k -numjobs=1024 -runtime=3600 \
> --time_based -group_reporting -name=mytest
>
> (that is, one hour runtime)
>
> - read: IOPS=14.0M, BW=53.4GiB/s (57.3GB/s)(188TiB/3608413msec)
> + read: IOPS=16.0M, BW=61.2GiB/s (65.7GB/s)(215TiB/3600051msec)
> - READ: bw=53.4GiB/s (57.3GB/s), 53.4GiB/s-53.4GiB/s (57.3GB/s-57.3GB/s), io=188TiB (207TB), run=3608413-3608413msec
> + READ: bw=61.2GiB/s (65.7GB/s), 61.2GiB/s-61.2GiB/s (65.7GB/s-65.7GB/s), io=215TiB (237TB), run=3600051-3600051msec
>
> Also, running Base, I see tons of:
>
> Jobs: 726 (f=726): [_(2),R(1),_(1),R(3),_(4),R(6),_(1),R(2),_(2),R(2),_(3),R(1),_(5),R(2),_(1),R(2),_(1),R(1),_(2),R(2),_(1),R(1),_(1),R(2),_(1),R(3),_(1),R(3),_(1),R(1),_(1),R(1),_(1),R(1),_(1),R(3),_(1),R(3),_(1),R(1),_(3),R(1),_(1),R(5),_(1),R(5),_(1),R(1),_(2),R(1),_(4),R(2),_(1),R(3),_(1),R(3),_(1),R(1),_(2),R(1),_(1),R(8),_(1),R(4),_(1),R(3),_(1),R(1),_(1),R(2),_(1),R(7),_(2),R(2)
>
> when the fio test terminates, which I do not see using Test. I take that to mean the threads do not terminate in a timely manner on the Base kernel.
>
>
> Thxs, Håkon
>
>
> [1] https://lkml.org/lkml/2024/7/3/1049
>
>
Adding Lorenzo Stoakes to Cc.
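
For anyone checking the percentages quoted above, here is a minimal sketch of the arithmetic. It uses only the averages reported in Haakon's message; the helper name pct_change is made up for illustration and is not part of stress-ng or fio.

```python
def pct_change(base: float, test: float) -> float:
    """Relative change from base to test, in percent (hypothetical helper)."""
    return (test - base) / base * 100.0

# Averages quoted in the message above: Base ("-") vs. Test ("+").
bogo_base, bogo_test = 3192638, 3657904   # stress-ng bogo ops/s (realtime)
sys_base, sys_test = 24041, 15278         # stress-ng system time
bw_base, bw_test = 53.4, 61.2             # fio read bandwidth, GiB/s

bogo_gain = pct_change(bogo_base, bogo_test)
sys_change = pct_change(sys_base, sys_test)
fio_gain = pct_change(bw_base, bw_test)

print(f"bogo ops/s : {bogo_gain:+.1f}%")
print(f"system time: {sys_change:+.1f}%")
print(f"fio read BW: {fio_gain:+.1f}%")
```

The stress-ng figures come out at roughly the 15% gain and 36% reduction stated above, and the fio read bandwidth improves by a similar margin to the bogo ops rate.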



