From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, Christopher Lameter <cl@linux.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Date: Thu, 30 Apr 2020 12:48:20 +0530
Message-ID: <20200430071820.GF19958@linux.vnet.ibm.com>
In-Reply-To: <20200429122211.GD28637@dhcp22.suse.cz>

* Michal Hocko <mhocko@kernel.org> [2020-04-29 14:22:11]:

> On Wed 29-04-20 07:11:45, Srikar Dronamraju wrote:
> > > > 
> > > > By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is
> > > > always online.
> > > > 
> > > > ...
> > > >
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy);
> > > >   */
> > > >  nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
> > > >  	[N_POSSIBLE] = NODE_MASK_ALL,
> > > > +#ifdef CONFIG_NUMA
> > > > +	[N_ONLINE] = NODE_MASK_NONE,
> > > > +#else
> > > >  	[N_ONLINE] = { { [0] = 1UL } },
> > > > -#ifndef CONFIG_NUMA
> > > >  	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
> > > >  #ifdef CONFIG_HIGHMEM
> > > >  	[N_HIGH_MEMORY] = { { [0] = 1UL } },
> > > 
> > > So on all other NUMA machines, when does node 0 get marked online?
> > > 
> > > This change means that for some time during boot, such machines will
> > > now be running with node 0 marked as offline.  What are the
> > > implications of this?  Will something break?
> > 
> > Till the nodes are detected, marking Node 0 as online tends to be redundant.
> > Because the system doesn't know if its a NUMA or a non-NUMA system.
> > Once we detect the nodes, we online them immediately. Hence I don't see any
> > side-effects or negative implications of this change.
> > 
> > However if I am missing anything, please do let me know.
> > 
> > From my part, I have tested this on
> > 1. Non-NUMA Single node but CPUs and memory coming from zero node.
> > 2. Non-NUMA Single node but CPUs and memory coming from non-zero node.
> > 3. NUMA Multi node but with CPUs and memory from node 0.
> > 4. NUMA Multi node but with no CPUs and memory from node 0.
> 
> Have you tested on something else than ppc? Each arch does the NUMA
> setup separately and this is a big mess. E.g. x86 marks even memory less
> nodes (see init_memory_less_node) as online.
> 

While I have predominantly tested on ppc, I did test on x86 with CONFIG_NUMA
enabled/disabled on both single-node and multi-node machines.
However, I don't have a cpuless/memoryless x86 system.

> Honestly I have hard time to evaluate the effect of this patch. It makes
> some sense to assume all nodes offline before they get online but this
> is a land mine territory.
> 
> I am also not sure what kind of problem this is going to address. You
> have mentioned numa balancing without many details.

1. On a machine with just one node whose node number is not 0,
the current setup ends up showing 2 online nodes. And when there is
more than one online node, numa_balancing gets enabled.
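
To illustrate the point above, here is a minimal user-space sketch in Python
(not kernel code). It models N_ONLINE as a plain integer bitmask. The names
online_nodes_after_boot and balancing_enabled are hypothetical, made up for
this illustration, and the "> 1" check only mirrors the condition described
above; the machine with a single real node 1 is an assumed example.

```python
def online_nodes_after_boot(initial_online_mask, detected_nodes):
    """Return the sorted list of online node ids once detection is done.

    initial_online_mask: bitmask that N_ONLINE starts out with
    detected_nodes: node ids that the arch code actually detects
    """
    mask = initial_online_mask
    for nid in detected_nodes:
        mask |= 1 << nid  # onlining a node only ever sets its bit
    return [nid for nid in range(mask.bit_length()) if mask & (1 << nid)]


def balancing_enabled(online_nodes):
    # Mirrors the condition described above: more than one online node.
    return len(online_nodes) > 1


# Assumed example: the only real node on the machine is node 1.
without_patch = online_nodes_after_boot(0b1, [1])  # node 0 preset as online
with_patch = online_nodes_after_boot(0b0, [1])     # NODE_MASK_NONE

print(without_patch, balancing_enabled(without_patch))
print(with_patch, balancing_enabled(with_patch))
```

In this model the preset bit for node 0 is never cleared, so the dummy node
stays in the online set and pushes the count past one.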

Without patch
$ grep numa /proc/vmstat
numa_hit 95179
numa_miss 0
numa_foreign 0
numa_interleave 3764
numa_local 95179
numa_other 0
numa_pte_updates 1206973                 <----------
numa_huge_pte_updates 4654                 <----------
numa_hint_faults 19560                 <----------
numa_hint_faults_local 19560                 <----------
numa_pages_migrated 0


With patch
$ grep numa /proc/vmstat 
numa_hit 322338756
numa_miss 0
numa_foreign 0
numa_interleave 3790
numa_local 322338756
numa_other 0
numa_pte_updates 0                 <----------
numa_huge_pte_updates 0                 <----------
numa_hint_faults 0                 <----------
numa_hint_faults_local 0                 <----------
numa_pages_migrated 0

So we have redundant NUMA hinting page faults, which we can avoid.
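
The comparison above can also be done mechanically. The following is a rough
Python sketch, assuming the "name value" line format shown in the two
/proc/vmstat dumps; parse_vmstat and balancing_active are hypothetical helper
names, and the values are copied from the marked lines of the dumps above.

```python
def parse_vmstat(text):
    """Parse 'name value' lines, ignoring any trailing annotation."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


# Counters that only move when NUMA balancing is scanning and faulting.
BALANCING_COUNTERS = (
    "numa_pte_updates",
    "numa_huge_pte_updates",
    "numa_hint_faults",
    "numa_hint_faults_local",
)


def balancing_active(counters):
    return any(counters.get(name, 0) > 0 for name in BALANCING_COUNTERS)


without_patch = parse_vmstat("""\
numa_pte_updates 1206973                 <----------
numa_huge_pte_updates 4654                 <----------
numa_hint_faults 19560                 <----------
numa_hint_faults_local 19560                 <----------
numa_pages_migrated 0
""")

with_patch = parse_vmstat("""\
numa_pte_updates 0                 <----------
numa_huge_pte_updates 0                 <----------
numa_hint_faults 0                 <----------
numa_hint_faults_local 0                 <----------
numa_pages_migrated 0
""")

print(balancing_active(without_patch))
print(balancing_active(with_patch))
```

On a live system the same text could be obtained by reading /proc/vmstat
instead of using the embedded strings.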

2. A few people have complained about the existence of this dummy node when
parsing lscpu and numactl output. They start to think that the tools
are reporting incorrectly or that the kernel is not able to recognize resources
connected to the node.
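
One way to spot such a dummy node without going through lscpu or numactl is
sketched below in Python. It is only an illustration: it assumes the node
lists exposed under /sys/devices/system/node/ (online, has_cpu, has_memory)
in the usual "0-1,4" list format, the helper names parse_node_list and
dummy_nodes are hypothetical, and the input strings are made-up examples
rather than output from a real machine.

```python
def parse_node_list(text):
    """Parse a kernel list string such as '0-1,4' into a set of ints."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        nodes.update(range(int(first), int(last or first) + 1))
    return nodes


def dummy_nodes(online, has_cpu, has_memory):
    """Return online nodes that have neither CPUs nor memory."""
    return sorted(
        parse_node_list(online)
        - parse_node_list(has_cpu)
        - parse_node_list(has_memory)
    )


# Made-up example: node 0 is online but only node 1 has CPUs and memory.
print(dummy_nodes("0-1", "1", "1"))
# Made-up example: only node 1 is online.
print(dummy_nodes("1", "1", "1"))
```

Any node reported by the first call is online yet owns no resources, which is
the situation that confuses people reading the tool output.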

-- 
Thanks and Regards
Srikar Dronamraju
