All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nathan Lynch <nathanl@linux.ibm.com>
To: Scott Cheloha <cheloha@linux.ibm.com>, linuxppc-dev@ozlabs.org
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>,
	Michal Suchanek <msuchanek@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Rick Lindsley <ricklind@linux.vnet.ibm.com>
Subject: Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct
Date: Fri, 14 Aug 2020 12:27:59 -0500	[thread overview]
Message-ID: <87y2mhxhls.fsf@linux.ibm.com> (raw)
In-Reply-To: <20200811015115.63677-1-cheloha@linux.ibm.com>

Scott Cheloha <cheloha@linux.ibm.com> writes:
> At memory hot-remove time we can retrieve an LMB's nid from its
> corresponding memory_block.  There is no need to store the nid
> in multiple locations.
>
> Note that lmb_to_memblock() uses find_memory_block() to get the
> corresponding memory_block.  As find_memory_block() runs in sub-linear
> time this approach is negligibly slower than what we do at present.
>
> In exchange for this lookup at hot-remove time we no longer need to
> call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
> On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
> spares us an O(n^2) initialization during boot.
>
> On systems with many LMBs that initialization overhead is palpable and
> disruptive.  For example, on a box with 249854 LMBs we're seeing
> drmem_init() take upwards of 30 seconds to complete:
>
> [   53.721639] drmem: initializing drmem v2
> [   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
> [   80.604377] Modules linked in:
> [   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
> [   80.604397] NIP:  c0000000000a4980 LR: c0000000000a4940 CTR: 0000000000000000
> [   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
> [   80.604412] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000248  XER: 0000000d
> [   80.604431] CFAR: c0000000000a4a38 IRQMASK: 0
> [   80.604431] GPR00: c0000000000a4940 c0002dbff8493ac0 c000000001904400 c0003cfffffede30
> [   80.604431] GPR04: 0000000000000000 c000000000f4095a 000000000000002f 0000000010000000
> [   80.604431] GPR08: c0000bf7ecdb7fb8 c0000bf7ecc2d3c8 0000000000000008 c00c0002fdfb2001
> [   80.604431] GPR12: 0000000000000000 c00000001e8ec200
> [   80.604477] NIP [c0000000000a4980] hot_add_scn_to_nid+0xa0/0x3e0
> [   80.604486] LR [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0
> [   80.604492] Call Trace:
> [   80.604498] [c0002dbff8493ac0] [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
> [   80.604509] [c0002dbff8493b20] [c000000000087c10] memory_add_physaddr_to_nid+0x20/0x60
> [   80.604521] [c0002dbff8493b40] [c0000000010d4880] drmem_init+0x25c/0x2f0
> [   80.604530] [c0002dbff8493c10] [c000000000010154] do_one_initcall+0x64/0x2c0
> [   80.604540] [c0002dbff8493ce0] [c0000000010c4aa0] kernel_init_freeable+0x2d8/0x3a0
> [   80.604550] [c0002dbff8493db0] [c000000000010824] kernel_init+0x2c/0x148
> [   80.604560] [c0002dbff8493e20] [c00000000000b648] ret_from_kernel_thread+0x5c/0x74
> [   80.604567] Instruction dump:
> [   80.604574] 392918e8 e9490000 e90a000a e92a0000 80ea000c 1d080018 3908ffe8 7d094214
> [   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e9490000 7fbe5040
> [   89.047390] drmem: 249854 LMB(s)
>
> With a patched kernel on the same machine we're no longer seeing the
> soft lockup.  drmem_init() now completes in negligible time, even when
> the LMB count is large.

This has been a real annoyance, and it's great to see it fixed in a way
that both simplifies the code and reduces data structure size.

>
> Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>

I think this also should have:

Fixes: b2d3b5ee66f2 ("powerpc/pseries: Track LMB nid instead of using device tree")

since you're essentially reverting it.

The problem that commit was trying to address arose from the fact that
the removal path was determining the Linux node for a block from the
firmware description of the block:

memory_add_physaddr_to_nid
  hot_add_scn_to_nid
    hot_add_drconf_scn_to_nid
      of_drconf_to_nid_single
        [parses the DT for the nid]

However, this can become out of sync with Linux's NUMA assignment for
the memory due to VM migration or other events.

I'm satisfied that this change does not reintroduce that problem. The
removal path still derives the node without going to the device tree,
but now performs a more efficient lookup.

Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>

  reply	other threads:[~2020-08-14 17:32 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-11  1:51 [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct Scott Cheloha
2020-08-14 17:27 ` Nathan Lynch [this message]
2020-08-14 19:31   ` Nathan Lynch
2020-08-21  8:33 ` Laurent Dufour
2020-09-10 18:28   ` Scott Cheloha
2020-09-09 13:27 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y2mhxhls.fsf@linux.ibm.com \
    --to=nathanl@linux.ibm.com \
    --cc=cheloha@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=ldufour@linux.vnet.ibm.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=msuchanek@suse.de \
    --cc=ricklind@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.