linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC] mm/memory_hotplug: wrong node identified if memory was never on-lined.
@ 2018-09-12 14:02 Jonathan Cameron
  2018-09-18 12:13 ` Oscar Salvador
  2019-03-24  6:12 ` Anshuman Khandual
  0 siblings, 2 replies; 4+ messages in thread
From: Jonathan Cameron @ 2018-09-12 14:02 UTC (permalink / raw)
  To: linux-mm; +Cc: linuxarm, Oscar Salvador

Hi All,

I've been accidentally (i.e. due to a script bug) testing some odd corner
cases of memory hotplug and run into this issue.

If we hot add some memory we have carefully avoided the need to use
get_nid_for_pfn as it isn't set until we online the memory.

Unfortunately if we never online the memory but instead just remove it again
we don't have any such protection so in unregister_mem_sect_under_nodes
we end up trying to call sysfs_remove_link for memory on (typically) node0
instead of the intended node.

So the path to this problem is

add_memory(Node, addr, size);
-> add_memory_resource(Node ...)
---> link_mem_sections(Node ...)
------> register_mem_sect_under_node(
----------> sysfs_create_link_nowarn(&node_devices[Node]->dev.kobj,...
(which creates the link to say
/sys/bus/nodes/devices/node5/memory84

Note that in code we avoid checking the nid set for the pfn in hotplug
paths.

remove_memory(Node, addr, size);
-> arch_remove_memory(start, size, NULL);
---> __remove_pages
-----> __remove_section
-------> unregister_memory_section
----------> remove_memory_section(Node,... -- Node set to 0 but not used at all.
-------------> unregister_mem_sect_under_node() - node not passed in anyway
---------------->get_nid_for_pfn(pfn).  (try to get it back again)
-------------------->sysfs_remove_link (wrong node number)
tries to remove
/sys/bus/nodes/devices/node0/memory84 which doesn't exist.

So not tidy, but not critical - but you get BUG_ON when you try
to add the memory again as there is a left over link in the way.


Now I'm not sure what the preferred fix for this would be.
1) Actually set the nid for each pfn during hot add rather than waiting for
   online.
2) Modify the whole call chain to pass the nid through as we know it at the
   remove_memory call for hotplug cases...

I personally favour option 2 but don't really have a deep enough understanding
to know if this is going to cause trouble anywhere else.

I mocked up option 2 using some updated arm64 hotplug patches and it seems
superficially fine if fairly invasive.

The whole structure is a little odd in that in the probe path the sysfs links
are not called via architecture specific code whilst in the remove they are.

Jonathan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] mm/memory_hotplug: wrong node identified if memory was never on-lined.
  2018-09-12 14:02 [RFC] mm/memory_hotplug: wrong node identified if memory was never on-lined Jonathan Cameron
@ 2018-09-18 12:13 ` Oscar Salvador
  2018-09-18 12:24   ` Jonathan Cameron
  2019-03-24  6:12 ` Anshuman Khandual
  1 sibling, 1 reply; 4+ messages in thread
From: Oscar Salvador @ 2018-09-18 12:13 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-mm, linuxarm

On Wed, Sep 12, 2018 at 03:02:18PM +0100, Jonathan Cameron wrote:
> Now I'm not sure what the preferred fix for this would be.
> 1) Actually set the nid for each pfn during hot add rather than waiting for
>    online.
> 2) Modify the whole call chain to pass the nid through as we know it at the
>    remove_memory call for hotplug cases...

Hi Jonathan,

I am back from vacation after four weeks, so I might still be in a bubble.

I was cleaning up unregister_mem_sect_under_nodes in [1], but I failed
to see this.
I think that we can pass the node down the chain.

Looking closer, we might be able to get rid of the nodemask var there,
but I need to take a closer look.

I had a RFCv2 sent a month ago [2] to fix another problem.
That patchset, among other things, replaces the zone paramater with the nid.

I was about to send a new version of that patchset, without RFC this time, so
if you do not mind, I could add this change in there and you can comment it.

What do you think?

[1] https://patchwork.kernel.org/patch/10568547/
[2] https://patchwork.kernel.org/patch/10569085/

Thanks
-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] mm/memory_hotplug: wrong node identified if memory was never on-lined.
  2018-09-18 12:13 ` Oscar Salvador
@ 2018-09-18 12:24   ` Jonathan Cameron
  0 siblings, 0 replies; 4+ messages in thread
From: Jonathan Cameron @ 2018-09-18 12:24 UTC (permalink / raw)
  To: Oscar Salvador, linuxarm; +Cc: linux-mm

On Tue, 18 Sep 2018 14:13:42 +0200
Oscar Salvador <osalvador@techadventures.net> wrote:

> On Wed, Sep 12, 2018 at 03:02:18PM +0100, Jonathan Cameron wrote:
> > Now I'm not sure what the preferred fix for this would be.
> > 1) Actually set the nid for each pfn during hot add rather than waiting for
> >    online.
> > 2) Modify the whole call chain to pass the nid through as we know it at the
> >    remove_memory call for hotplug cases...  
> 
> Hi Jonathan,
Hi Oscar,

> 
> I am back from vacation after four weeks, so I might still be in a bubble.
> 
> I was cleaning up unregister_mem_sect_under_nodes in [1], but I failed
> to see this.
> I think that we can pass the node down the chain.
> 
> Looking closer, we might be able to get rid of the nodemask var there,
> but I need to take a closer look.
> 
> I had a RFCv2 sent a month ago [2] to fix another problem.
Ah. Yes I hadn't made the connection that it would be doing most of what is
needed here as well. Thanks.

> That patchset, among other things, replaces the zone paramater with the nid.
> 
> I was about to send a new version of that patchset, without RFC this time, so
> if you do not mind, I could add this change in there and you can comment it.
That would be great.

Thanks,

Jonathan
> 
> What do you think?
> 
> [1] https://patchwork.kernel.org/patch/10568547/
> [2] https://patchwork.kernel.org/patch/10569085/
> 
> Thanks

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC] mm/memory_hotplug: wrong node identified if memory was never on-lined.
  2018-09-12 14:02 [RFC] mm/memory_hotplug: wrong node identified if memory was never on-lined Jonathan Cameron
  2018-09-18 12:13 ` Oscar Salvador
@ 2019-03-24  6:12 ` Anshuman Khandual
  1 sibling, 0 replies; 4+ messages in thread
From: Anshuman Khandual @ 2019-03-24  6:12 UTC (permalink / raw)
  To: Jonathan Cameron, linux-mm; +Cc: linuxarm, Oscar Salvador

Hello Jonathan,

On 09/12/2018 07:32 PM, Jonathan Cameron wrote:
> Hi All,
> 
> I've been accidentally (i.e. due to a script bug) testing some odd corner
> cases of memory hotplug and run into this issue.
> 
> If we hot add some memory we have carefully avoided the need to use
> get_nid_for_pfn as it isn't set until we online the memory.

get_nid_for_pfn() gets avoided during memory hot add into an existing and
initialized node.

add_memory(nid, start, size)
	add_memory_resource(nid, res, online)
		link_mem_sections(nid, start_pfn, nr_pages, false)
			register_mem_sect_under_node(mem_blk, nid, false)
				sysfs_create_link_nowarn()
> 
> Unfortunately if we never online the memory but instead just remove it again
> we don't have any such protection so in unregister_mem_sect_under_nodes
> we end up trying to call sysfs_remove_link for memory on (typically) node0
> instead of the intended node.

Are you some how calling arch_remove_memory() directly instead of going through
__remove_memory() first. Because __remove_memory() will remove the range from
memblock first making pfn_valid() fail in get_nid_for_pfn() on arm64 platform
because of the memblock search for mapped memory. Just wondering how you did
not hit this problem first. But yes if you have some how crossed this point you
will probably see page_to_nid(pfn_to_page(pfn)) return as 0 for an uninitialized
page because the page never went online first.

> 
> So the path to this problem is
> 
> add_memory(Node, addr, size);
> -> add_memory_resource(Node ...)
> ---> link_mem_sections(Node ...)
> ------> register_mem_sect_under_node(
> ----------> sysfs_create_link_nowarn(&node_devices[Node]->dev.kobj,...
> (which creates the link to say
> /sys/bus/nodes/devices/node5/memory84
> 
> Note that in code we avoid checking the nid set for the pfn in hotplug
> paths.

Right.

> 
> remove_memory(Node, addr, size);
> -> arch_remove_memory(start, size, NULL);
> ---> __remove_pages
> -----> __remove_section
> -------> unregister_memory_section
> ----------> remove_memory_section(Node,... -- Node set to 0 but not used at all.
> -------------> unregister_mem_sect_under_node() - node not passed in anyway
> ---------------->get_nid_for_pfn(pfn).  (try to get it back again)
> -------------------->sysfs_remove_link (wrong node number)
> tries to remove
> /sys/bus/nodes/devices/node0/memory84 which doesn't exist.
> 
> So not tidy, but not critical - but you get BUG_ON when you try
> to add the memory again as there is a left over link in the way.

Afterwards the system is anyway broken from memory hotplug point of view atleast.

> 
> 
> Now I'm not sure what the preferred fix for this would be.
> 1) Actually set the nid for each pfn during hot add rather than waiting for
>    online.

IIUC that will try to partially init a page only with it's nid and nothing else.
Not sure if that will be okay.

> 2) Modify the whole call chain to pass the nid through as we know it at the
>    remove_memory call for hotplug cases...


Both the top level functions has got nid. Wondering why page_to_nid() still needs
to be fetched while creating or removing sysfs links. Is there some corner cases
where nid might change while memory hot add/remove is already in progress with
hotplug lock.

__add_memory(nid, start, size)
__remove_memory(nid, start, size)

> 
> I personally favour option 2 but don't really have a deep enough understanding
> to know if this is going to cause trouble anywhere else.
> 
> I mocked up option 2 using some updated arm64 hotplug patches and it seems
> superficially fine if fairly invasive.
> 
> The whole structure is a little odd in that in the probe path the sysfs links
> are not called via architecture specific code whilst in the remove they are.

Right. I guess thats because __add_pages(called from arch_add_memory) does not
create the link where as __remove_pages(called from arch_remove_memory) removes
the sysfs link.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-03-24  6:12 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-12 14:02 [RFC] mm/memory_hotplug: wrong node identified if memory was never on-lined Jonathan Cameron
2018-09-18 12:13 ` Oscar Salvador
2018-09-18 12:24   ` Jonathan Cameron
2019-03-24  6:12 ` Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).