From: Michael Ellerman <patch-notifications@ellerman.id.au>
To: Nathan Fontenot <nfont@linux.vnet.ibm.com>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] powerpc/pseries: Track LMB nid instead of using device tree
Date: Fri, 3 May 2019 16:59:32 +1000 (AEST)
Message-ID: <44wNKc5PzHz9sPf@ozlabs.org>
In-Reply-To: <153849454718.31242.7801365038229982137.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com>

On Tue, 2018-10-02 at 15:35:59 UTC, Nathan Fontenot wrote:
> When removing memory we need to remove the memory from the node
> it was added to, instead of looking up in the device tree which
> node it should be in.
>
> During testing we have seen scenarios where the affinity for an
> LMB changes due to a partition migration or PRRN event. In these
> cases the node the LMB exists in may not match the node the device
> tree indicates it belongs in. This can lead to a system crash
> when trying to DLPAR remove the LMB after a migration or PRRN
> event. The current code looks up in the device tree the node to
> remove the LMB from. The crash occurs when we try to offline this
> node and it does not have any data, i.e. node_data[nid] == NULL.
>
> 36:mon> e
> cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
> pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
> lr: c0000000003a14ec: remove_memory+0xbc/0x110
> sp: c0000001828b7a90
> msr: 800000000280b033
> dar: 9a28
> dsisr: 40000000
> current = 0xc0000006329c4c80
> paca = 0xc000000007a55200 softe: 0 irq_happened: 0x01
> pid = 76926, comm = kworker/u320:3
>
> 36:mon> t
> [link register ] c0000000003a14ec remove_memory+0xbc/0x110
> [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
> [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
> [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
> [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
> [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
> [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
> [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
> [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
> [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
> [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
>
> To resolve this we need to track the node an LMB belongs to when
> it is added to the system, so we can remove it from that node instead
> of the node that the device tree indicates it should belong to.
>
> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/b2d3b5ee66f2a04a918cc043cec0c9ed
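
For readers skimming the thread, the idea in the commit message can be
sketched roughly as below. This is a minimal userspace illustration, not
the applied patch: struct lmb, node_table, lmb_add() and lmb_remove() are
hypothetical names invented for this sketch, and node_table only stands
in for the kernel's node_data[] array. The point it shows is that the node
id is recorded once at add time and reused at remove time, so a later
affinity change in the device tree cannot redirect the removal to a node
that has no data.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4
#define NO_NODE   (-1)

/* Hypothetical stand-in for per-node data; a NULL table entry means
 * the node is offline and has no data. */
struct node_data {
	int nr_lmbs;
};

static struct node_data node_storage[MAX_NODES];
static struct node_data *node_table[MAX_NODES];

/* Hypothetical LMB record that remembers the node it was added to. */
struct lmb {
	unsigned long base_addr;
	int nid;		/* NO_NODE while the LMB is not present */
};

/* Add the LMB to node 'nid' and record that node in the LMB itself. */
static int lmb_add(struct lmb *lmb, int nid)
{
	assert(lmb != NULL);
	if (nid < 0 || nid >= MAX_NODES)
		return -1;

	if (node_table[nid] == NULL) {
		/* Bring the node online on first use. */
		node_storage[nid].nr_lmbs = 0;
		node_table[nid] = &node_storage[nid];
	}
	node_table[nid]->nr_lmbs++;
	lmb->nid = nid;
	return 0;
}

/* Remove the LMB from the node recorded at add time. No fresh
 * affinity lookup is done here, which is the whole point. */
static int lmb_remove(struct lmb *lmb)
{
	struct node_data *nd;

	assert(lmb != NULL);
	if (lmb->nid == NO_NODE)
		return -1;

	nd = node_table[lmb->nid];
	if (nd == NULL)
		return -1;	/* refuse rather than dereference NULL */

	if (--nd->nr_lmbs == 0)
		node_table[lmb->nid] = NULL;	/* last LMB: node goes offline */
	lmb->nid = NO_NODE;
	return 0;
}
```

In this sketch, an LMB added to node 1 is still removed from node 1 even
if a later lookup would report node 2, whose table entry is NULL.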
cheers