linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Michael Bringmann <mwb@linux.vnet.ibm.com>
To: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>,
	Michal Hocko <mhocko@kernel.org>
Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>,
	Kees Cook <keescook@chromium.org>,
	Mathieu Malaterre <malat@debian.org>,
	linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
	Pavel Tatashin <pasha.tatashin@oracle.com>,
	linux-mm@kvack.org,
	Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>,
	Juliet Kim <minkim@us.ibm.com>,
	Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>,
	Nathan Fontenot <nfont@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>,
	linuxppc-dev@lists.ozlabs.org,
	Dan Williams <dan.j.williams@intel.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node
Date: Tue, 2 Oct 2018 09:51:40 -0500	[thread overview]
Message-ID: <df95f828-1963-d8b9-ab58-6d29d2d152d2@linux.vnet.ibm.com> (raw)
In-Reply-To: <bdbca329-7d35-0535-1737-94a06a19ae28@linux.vnet.ibm.com>

See below.

On 10/01/2018 06:20 PM, Tyrel Datwyler wrote:
> On 10/01/2018 01:27 PM, Michal Hocko wrote:
>> On Mon 01-10-18 13:56:25, Michael Bringmann wrote:
>>> In some LPAR migration scenarios, device-tree modifications are
>>> made to the affinity of the memory in the system.  For instance,
>>> it may occur that memory is installed to nodes 0,3 on a source
>>> system, and to nodes 0,2 on a target system.  Node 2 may not
>>> have been initialized/allocated on the target system.
>>>
>>> After migration, if a RTAS PRRN memory remove is made to a
>>> memory block that was in node 3 on the source system, then
>>> try_offline_node tries to remove it from node 2 on the target.
>>> The NODE_DATA(2) block would not be initialized on the target,
>>> and there is no validation check in the current code to prevent
>>> the use of a NULL pointer.
>>
>> I am not familiar with ppc and the above doesn't really help me
>> much. Sorry about that. But from the above it is not clear to me whether
>> it is the caller which does something unexpected or the hotplug code
>> being not robust enough. From your changelog I would suggest the later
>> but why don't we see the same problem for other archs? Is this a problem
>> of unrolling a partial failure?
>>
>> dlpar_remove_lmb does the following
>>
>> 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
>>
>> 	remove_memory(nid, lmb->base_addr, block_sz);
>>
>> 	/* Update memory regions for memory remove */
>> 	memblock_remove(lmb->base_addr, block_sz);
>>
>> 	dlpar_remove_device_tree_lmb(lmb);
>>
>> Is the whole operation correct when remove_memory simply backs off
>> silently. Why don't we have to care about memblock resp
>> dlpar_remove_device_tree_lmb parts? In other words how come the physical
>> memory range is valid while the node association is not?
>>
> 
> I think the issue here is a race between the LPM code updating affinity and PRRN events being processed. Does your other patch[1] not fix the issue? Or is it that the LPM affinity updates don't do any of the initialization/allocation you mentioned?

This patch addresses the specific case where PRRN changes to CPU or memory are occurring on a system that also observes affinity changes during migration.  Yes,  there is a race condition -- if the PRRN events reliably occurred before the device-tree was updated, this error would not occur.  However, they overlap or occur after the changes during most LPMs observed.

When the device-tree affinity attributes have changed for memory, the 'nid' affinity calculated points to a different node for the memory block than the one used to install it, previously on the source system.  The newly calculated 'nid' affinity may not yet be initialized on the target system.  The current memory tracking mechanisms do not record the node to which a memory block was associated when it was added.  Nathan is looking at adding this feature to the new implementation of LMBs, but it is not there yet, and won't be present in earlier kernels without backporting a significant number of changes.

My other patch[1] is more intended to address locking and CPU update dependencies between PRRN changes and RTAS requests which did not necessarly involve memory updates.  The node to memory association problem after migration would be present even with this issue resolved.

> 
> -Tyrel
> 
> [1] https://lore.kernel.org/linuxppc-dev/20181001185603.11373.61650.stgit@ltcalpine2-lp9.aus.stglabs.ibm.com/T/#u
> 

Michael

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com


  reply	other threads:[~2018-10-02 15:14 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-01 18:56 [PATCH] migration/mm: Add WARN_ON to try_offline_node Michael Bringmann
2018-10-01 20:02 ` Kees Cook
2018-10-01 20:27 ` Michal Hocko
2018-10-01 23:20   ` Tyrel Datwyler
2018-10-02 14:51     ` Michael Bringmann [this message]
2018-10-02 14:59       ` Michal Hocko
2018-10-02 15:14         ` Michael Bringmann
2018-10-02 16:04           ` Michal Hocko
2018-10-02 18:13             ` Michael Bringmann
2018-10-02 19:45               ` Tyrel Datwyler
2018-10-03  7:03                 ` Michal Hocko
2018-10-03 13:27                 ` Michael Bringmann
2018-10-03 23:05                   ` Tyrel Datwyler
2018-10-04  1:02                     ` Michael Bringmann
2018-10-01 23:23   ` Tyrel Datwyler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df95f828-1963-d8b9-ab58-6d29d2d152d2@linux.vnet.ibm.com \
    --to=mwb@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bauerman@linux.vnet.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=malat@debian.org \
    --cc=mauricfo@linux.vnet.ibm.com \
    --cc=mhocko@kernel.org \
    --cc=minkim@us.ibm.com \
    --cc=nfont@linux.vnet.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=osalvador@suse.de \
    --cc=pasha.tatashin@oracle.com \
    --cc=tlfalcon@linux.vnet.ibm.com \
    --cc=tyreld@linux.vnet.ibm.com \
    --cc=yasu.isimatu@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).