From: Michal Hocko <mhocko@kernel.org>
To: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>,
Kees Cook <keescook@chromium.org>,
Mathieu Malaterre <malat@debian.org>,
linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
Pavel Tatashin <pasha.tatashin@oracle.com>,
linux-mm@kvack.org, Michael Bringmann <mwb@linux.vnet.ibm.com>,
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>,
Juliet Kim <minkim@us.ibm.com>,
Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>,
Nathan Fontenot <nfont@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>,
linuxppc-dev@lists.ozlabs.org,
Dan Williams <dan.j.williams@intel.com>,
Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH] migration/mm: Add WARN_ON to try_offline_node
Date: Wed, 3 Oct 2018 09:03:20 +0200 [thread overview]
Message-ID: <20181003070320.GE18290@dhcp22.suse.cz> (raw)
In-Reply-To: <bbc5f219-614f-b024-0888-8ad216c5eaf8@linux.vnet.ibm.com>
On Tue 02-10-18 12:45:50, Tyrel Datwyler wrote:
> On 10/02/2018 11:13 AM, Michael Bringmann wrote:
> >
> >
> > On 10/02/2018 11:04 AM, Michal Hocko wrote:
> >> On Tue 02-10-18 10:14:49, Michael Bringmann wrote:
> >>> On 10/02/2018 09:59 AM, Michal Hocko wrote:
> >>>> On Tue 02-10-18 09:51:40, Michael Bringmann wrote:
> >>>> [...]
> >>>>> When the device-tree affinity attributes have changed for memory,
> >>>>> the 'nid' affinity calculated points to a different node for the
> >>>>> memory block than the one used to install it, previously on the
> >>>>> source system. The newly calculated 'nid' affinity may not yet
> >>>>> be initialized on the target system. The current memory tracking
> >>>>> mechanisms do not record the node to which a memory block was
> >>>>> associated when it was added. Nathan is looking at adding this
> >>>>> feature to the new implementation of LMBs, but it is not there
> >>>>> yet, and won't be present in earlier kernels without backporting a
> >>>>> significant number of changes.
> >>>>
> >>>> Then the patch you have proposed here just papers over a real issue, no?
> >>>> IIUC then you simply do not remove the memory if you lose the race.
> >>>
> >>> The problem occurs when removing memory after an affinity change
> >>> references a node that was previously unreferenced. Other code
> >>> in 'kernel/mm/memory_hotplug.c' deals with initializing an empty
> >>> node when adding memory to a system. The 'removing memory' case is
> >>> specific to systems that perform LPM and allow device-tree changes.
> >>> The powerpc kernel does not have the option of accepting some PRRN
> >>> requests and accepting others. It must perform them all.
> >>
> >> I am sorry, but you are still too cryptic for me. Either there is a
> >> correctness issue and the the patch doesn't really fix anything or the
> >> final race doesn't make any difference and then the ppc code should be
> >> explicit about that. Checking the node inside the hotplug core code just
> >> looks as a wrong layer to mitigate an arch specific problem. I am not
> >> saying the patch is a no-go but if anything we want a big fat comment
> >> explaining how this is possible because right now it just points to an
> >> incorrect API usage.
> >>
> >> That being said, this sounds pretty much ppc specific problem and I
> >> would _prefer_ it to be handled there (along with a big fat comment of
> >> course).
> >
> > Let me try again. Regardless of the path to which we get to this condition,
> > we currently crash the kernel. This patch changes that to a WARN_ON notice
> > and continues executing the kernel without shutting down the system. I saw
> > the problem during powerpc testing, because that is the focus of my work.
> > There are other paths to this function besides powerpc. I feel that the
> > kernel should keep running instead of halting.
>
> This is still basically a hack to get around a known race. In itself
> this patch is still worth while in that we shouldn't crash the kernel
> on a null pointer dereference. However, I think the actual problem
> still needs to be addressed. We shouldn't run any PRRN events for the
> source system on the target after a migration. The device tree update
> should have taken care of telling us about new affinities and what
> not. Can we just throw out any queued PRRN events when we wake up on
> the target?
And until a proper fix is developed can we have NODE_DATA test in the
affected code rather than pollute the generic code with something that
is essentially a wrong usage of the API? With a big fat warning
explaining what is going on here?
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2018-10-03 7:25 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-01 18:56 [PATCH] migration/mm: Add WARN_ON to try_offline_node Michael Bringmann
2018-10-01 20:02 ` Kees Cook
2018-10-01 20:27 ` Michal Hocko
2018-10-01 23:20 ` Tyrel Datwyler
2018-10-02 14:51 ` Michael Bringmann
2018-10-02 14:59 ` Michal Hocko
2018-10-02 15:14 ` Michael Bringmann
2018-10-02 16:04 ` Michal Hocko
2018-10-02 18:13 ` Michael Bringmann
2018-10-02 19:45 ` Tyrel Datwyler
2018-10-03 7:03 ` Michal Hocko [this message]
2018-10-03 13:27 ` Michael Bringmann
2018-10-03 23:05 ` Tyrel Datwyler
2018-10-04 1:02 ` Michael Bringmann
2018-10-01 23:23 ` Tyrel Datwyler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181003070320.GE18290@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=bauerman@linux.vnet.ibm.com \
--cc=dan.j.williams@intel.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=malat@debian.org \
--cc=mauricfo@linux.vnet.ibm.com \
--cc=minkim@us.ibm.com \
--cc=mwb@linux.vnet.ibm.com \
--cc=nfont@linux.vnet.ibm.com \
--cc=npiggin@gmail.com \
--cc=osalvador@suse.de \
--cc=pasha.tatashin@oracle.com \
--cc=tlfalcon@linux.vnet.ibm.com \
--cc=tyreld@linux.vnet.ibm.com \
--cc=yasu.isimatu@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).