From: Nathan Lynch <nathanl@linux.ibm.com>
To: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Nathan Fontenont <ndfont@gmail.com>,
Michal Hocko <mhocko@suse.com>,
Michal Suchanek <msuchanek@suse.com>,
David Hildenbrand <david@redhat.com>,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Rick Lindsley <ricklind@linux.vnet.ibm.com>
Subject: Re: [PATCH v2 1/2] powerpc/drmem: accelerate memory_add_physaddr_to_nid() with LMB xarray
Date: Fri, 21 Feb 2020 14:02:18 -0600
Message-ID: <87r1ynso8l.fsf@linux.ibm.com>
In-Reply-To: <20200221172901.1596249-2-cheloha@linux.ibm.com>
Hi Scott,
Scott Cheloha <cheloha@linux.ibm.com> writes:
> On PowerPC, memory_add_physaddr_to_nid() uses a linear search to find
> an LMB matching the given address. This scales very poorly when there
> are many LMBs. The poor scaling cripples drmem_init() during boot:
> lmb_set_nid(), which calls memory_add_physaddr_to_nid(), is called for
> each LMB.
>
> If we index each LMB in an xarray by its base address we can achieve
> O(log n) search during memory_add_physaddr_to_nid(), which scales much
> better.
Is O(log n) correct? I had thought lookup complexity depends on the
characteristics of the key.
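(For reference, I assume the lookup side, which isn't quoted here, is a
thin wrapper along these lines -- a sketch, not the patch's actual code:

  /* Assumed lookup helper: a plain xa_load() walk of the xarray. */
  struct drmem_lmb *drmem_find_lmb_by_base_addr(u64 base_addr)
  {
          return xa_load(&drmem_lmb_xa_base_addr, base_addr);
  }

The xarray is a radix tree, so the walk depth tracks the width of the
largest index stored, not the number of entries.)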
Repeating some points from my comments on v1 below.
> +static int drmem_cache_lmb_for_lookup(struct drmem_lmb *lmb)
This is called only from __init functions, so it should be __init as well.
> +{
> + void *ret;
> +
> + ret = xa_store(&drmem_lmb_xa_base_addr, lmb->base_addr, lmb,
> + GFP_KERNEL);
> + if (xa_is_err(ret))
> + return xa_err(ret);
> +
> + return 0;
> +}
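To the __init point above, something like this (sketch):

  /* Tag the helper __init so it is discarded with its __init callers. */
  static int __init drmem_cache_lmb_for_lookup(struct drmem_lmb *lmb)

would let the helper be freed along with the rest of the init text.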
[...]
> @@ -364,6 +383,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
>
> for_each_drmem_lmb(lmb) {
> read_drconf_v1_cell(lmb, &prop);
> + if (drmem_cache_lmb_for_lookup(lmb) != 0)
> + return;
> lmb_set_nid(lmb);
> }
> }
> @@ -411,6 +432,9 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
> lmb->aa_index = dr_cell.aa_index;
> lmb->flags = dr_cell.flags;
>
> + if (drmem_cache_lmb_for_lookup(lmb) != 0)
> + return;
> +
Failing to record an lmb in the cache shouldn't be cause for silently
aborting this initialization. If an xa_store() failure is simply
ignored, future lookups against the cache (should the system even boot)
may miss, but the drmem_lmbs themselves will still be initialized
correctly.
I'd say just ignore (or perhaps log once) xa_store() failures as long as
this code only runs at boot.
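As a sketch (the pr_warn_once() message text is mine, not from the
patch), the v1 call site could become:

  for_each_drmem_lmb(lmb) {
          read_drconf_v1_cell(lmb, &prop);
          /* Hypothetical: warn once on xa_store() failure, keep going. */
          if (drmem_cache_lmb_for_lookup(lmb) != 0)
                  pr_warn_once("drmem: failed to cache LMB for lookup\n");
          lmb_set_nid(lmb);
  }

and likewise for the v2 path.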
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 3c7dec70cda0..0fd7963a991e 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -958,27 +958,18 @@ early_param("topology_updates", early_topology_updates);
> static int hot_add_drconf_scn_to_nid(unsigned long scn_addr)
> {
> struct drmem_lmb *lmb;
> - unsigned long lmb_size;
> - int nid = NUMA_NO_NODE;
> -
> - lmb_size = drmem_lmb_size();
> -
> - for_each_drmem_lmb(lmb) {
> - /* skip this block if it is reserved or not assigned to
> - * this partition */
> - if ((lmb->flags & DRCONF_MEM_RESERVED)
> - || !(lmb->flags & DRCONF_MEM_ASSIGNED))
> - continue;
> -
> - if ((scn_addr < lmb->base_addr)
> - || (scn_addr >= (lmb->base_addr + lmb_size)))
> - continue;
>
> - nid = of_drconf_to_nid_single(lmb);
> - break;
> - }
> + lmb = drmem_find_lmb_by_base_addr(scn_addr);
It occurs to me that the old behavior here would find a drmem_lmb if
scn_addr fell anywhere within it, while the new behavior requires the
given address to match the drmem_lmb's starting address exactly. I
think that's probably OK, though?
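If the exact-match semantics ever prove too strict, one alternative
(hedged sketch, assuming LMBs are uniformly drmem_lmb_size() bytes with
size-aligned base addresses, as the old loop also assumed) would be to
normalize the address first:

  /* Hypothetical: round a mid-LMB address down to its base before lookup. */
  lmb = drmem_find_lmb_by_base_addr(ALIGN_DOWN(scn_addr, drmem_lmb_size()));

But as noted, the exact-match behavior is probably fine for this caller.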
> + if (lmb == NULL)
> + return NUMA_NO_NODE;
> +
> + /* can't use this block if it is reserved or not assigned to
> + * this partition */
> + if ((lmb->flags & DRCONF_MEM_RESERVED)
> + || !(lmb->flags & DRCONF_MEM_ASSIGNED))
> + return NUMA_NO_NODE;
>
> - return nid;
> + return of_drconf_to_nid_single(lmb);
> }
Otherwise I have no concerns about the changes to
hot_add_drconf_scn_to_nid.
Other than the minor points I've made here, this looks like a tidy
self-contained change to fix drmem initialization latency.