From: Uladzislau Rezki <urezki@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>, Baoquan He <bhe@redhat.com>,
Lorenzo Stoakes <lstoakes@gmail.com>,
Christoph Hellwig <hch@infradead.org>,
Matthew Wilcox <willy@infradead.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Subject: Re: [PATCH v3 10/11] mm: vmalloc: Set nr_nodes based on CPUs in a system
Date: Thu, 18 Jan 2024 19:23:47 +0100
Message-ID: <ZalssyzC8_HsFZON@pc636>
In-Reply-To: <Zab9yuUiz8OCMOHw@dread.disaster.area>

On Wed, Jan 17, 2024 at 09:06:02AM +1100, Dave Chinner wrote:
> On Mon, Jan 15, 2024 at 08:09:29PM +0100, Uladzislau Rezki wrote:
> > > On Tue, Jan 02, 2024 at 07:46:32PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > A number of nodes which are used in the alloc/free paths is
> > > > set based on num_possible_cpus() in a system. Please note a
> > > > high limit threshold though is fixed and corresponds to 128
> > > > nodes.
> > >
> > > Large CPU count machines are NUMA machines. All of the allocation
> > > and reclaim is NUMA node based i.e. a pgdat per NUMA node.
> > >
> > > Shrinkers are also able to be run in a NUMA aware mode so that
> > > per-node structures can be reclaimed similar to how per-node LRU
> > > lists are scanned for reclaim.
> > >
> > > Hence I'm left to wonder if it would be better to have a vmalloc
> > > area per pgdat (or sub-node cluster) rather than just base the
> > > number on CPU count and then have an arbitrary maximum number when
> > > we get to 128 CPU cores. We can have 128 CPU cores in a
> > > single socket these days, so not being able to scale the vmalloc
> > > areas beyond a single socket seems like a bit of a limitation.
> > >
> > > Scaling out the vmalloc areas in a NUMA aware fashion allows the
> > > shrinker to be run in numa aware mode, which gets rid of the need
> > > for the global shrinker to loop over every single vmap area in every
> > > shrinker invocation. Only the vm areas on the node that has a memory
> > > shortage need to be scanned and reclaimed; it doesn't need to reclaim
> > > everything globally when a single node runs out of memory.
> > >
> > > Yes, this may not give quite as good microbenchmark scalability
> > > results, but being able to locate each vm area in node local memory
> > > and have operations on them largely isolated to node-local tasks and
> > > vmalloc area reclaim will work much better on large multi-socket
> > > NUMA machines.
> > >
> > Currently I fix the max number of nodes at 128. This is because I do not
> > have access to such big NUMA systems, whereas I do have access to
> > ~128 ones. That is why I have decided to stop at that number for
> > now.
>
> I suspect you are confusing number of CPUs with number of NUMA nodes.
>
I do not think so :)
>
> A NUMA system with 128 nodes is a large NUMA system that will have
> thousands of CPU cores, whilst above you talk about basing the
> count on CPU cores and that a single socket can have 128 cores?
>
> > We can easily set nr_nodes to num_possible_cpus() and let it scale for
> > anyone. But before doing this, i would like to give it a try as a first
> > step because i have not tested it well on really big NUMA systems.
>
> I don't think you need to have large NUMA systems to test it. We
> have the "fakenuma" feature for a reason. Essentially, once you
> have enough CPU cores that catastrophic lock contention can be
> generated in a fast path (can take as few as 4-5 CPU cores), then
> you can effectively test NUMA scalability with fakenuma by creating
> nodes with >=8 CPUs each.
>
> This is how I've done testing of numa aware algorithms (like
> shrinkers!) for the past decade - I haven't had direct access to a
> big NUMA machine since 2008, yet it's relatively trivial to test
> NUMA based scalability algorithms without them these days.
>
I see your point. NUMA-aware scalability requires rework: adding an extra
layer that allows such scaling.
If a socket has 256 CPUs, how do we scale VAs inside that node among
those CPUs?
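
To make the question concrete, here is a minimal user-space sketch, not
code from the patch. It assumes the flat scheme is a clamp of the
possible-CPU count to the range [1, 128], and it shows one hypothetical
way to put a sub-node layer under each NUMA node. The names
MAX_VA_NODES, flat_nr_nodes() and subnode_index() are made up for
illustration only.

```c
#include <assert.h>

/* Upper limit from the patch description; the macro name is hypothetical. */
#define MAX_VA_NODES 128u

/* Flat scheme: one VA node per possible CPU, clamped to [1, MAX_VA_NODES]. */
unsigned int flat_nr_nodes(unsigned int nr_cpus)
{
	if (nr_cpus < 1)
		return 1;
	if (nr_cpus > MAX_VA_NODES)
		return MAX_VA_NODES;
	return nr_cpus;
}

/*
 * Hypothetical two-level scheme: each NUMA node owns "subnodes" VA nodes,
 * and CPUs inside the NUMA node are spread over them round-robin.
 * Returns the index into a flat array of nr_numa_nodes * subnodes entries.
 */
unsigned int subnode_index(unsigned int numa_node,
			   unsigned int cpu_in_node,
			   unsigned int subnodes)
{
	assert(subnodes >= 1);
	return numa_node * subnodes + cpu_in_node % subnodes;
}
```

With a 256-CPU socket and, say, 128 sub-nodes per NUMA node, two CPUs
would share each sub-node, so contention stays bounded while all VA
nodes of that socket can still live in node-local memory.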
--
Uladzislau Rezki