devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steve Capper <steve.capper-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
To: Ganapatrao Kulkarni <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org>,
	Ganapatrao Kulkarni
	<ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>,
	Catalin Marinas <Catalin.Marinas-5wv7dgnIgG8@public.gmane.org>,
	Will Deacon <Will.Deacon-5wv7dgnIgG8@public.gmane.org>,
	"grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org"
	<grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	"robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org"
	<robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	"devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
	<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
	Leif Lindholm
	<leif.lindholm-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	Roy Franz <roy.franz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	Ard Biesheuvel
	<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	msalter-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Subject: Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
Date: Mon, 20 Oct 2014 15:25:56 +0100	[thread overview]
Message-ID: <20141020142555.GA9968@linaro.org> (raw)
In-Reply-To: <CAFpQJXXkgtr6E0owHxAu0MG8+7s5LBt_mVB9gQM1VfkX2rY5FQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Oct 17, 2014 at 10:49:56PM +0530, Ganapatrao Kulkarni wrote:
> Hi All,
> 
> On Mon, Oct 6, 2014 at 11:22 PM, Ganapatrao Kulkarni
> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Mon, Oct 6, 2014 at 4:56 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> >> On Mon, Oct 06, 2014 at 06:14:36AM +0100, Ganapatrao Kulkarni wrote:
> >>> Hi Mark,
> >>>
> >>> On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> >>> > On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
> >>> >> Adding numa support for arm64 based platforms.
> >>> >> This version creates numa mapping by parsing the dt table.
> >>> >> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
> >>> >> memory to node id mapping is derived from nid property of memory node.
> >>> >
> >>> > [...]
> >>> >
> >>> >> +/*
> >>> >> + * Too small node sizes may confuse the VM badly. Usually they
> >>> >> + * result from BIOS bugs. So dont recognize nodes as standalone
> >>> >> + * NUMA entities that have less than this amount of RAM listed:
> >>> >> + */
> >>> >> +#define NODE_MIN_SIZE (4*1024*1024)
> >>> >
> >>> > Why do these confuse the VM? what does BIOS have to do with arm64?
> >>> sneaked in from x86, will remove this.
> >>> >
> >>> >> +
> >>> >> +#define parent_node(node)      (node)
> >>> >
> >>> > Huh?
> >>> for thunder, no hierarchy at numa nodes. shall i put under ifdef or
> >>> separate header file?
> >>
> >> I was confused by a node being its own parent, but that seems to be the
> >> case elsewhere for parent_node() implementations. Please at least have a
> >> comment that we're assuming a flat hierarchy (for now).
> > sure, will add the required comments.
> >>
> >>> >
> >>> > [...]
> >>> >
> >>> >> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
> >>> >>         min = PFN_UP(memblock_start_of_DRAM());
> >>> >>         max = PFN_DOWN(memblock_end_of_DRAM());
> >>> >>
> >>> >> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> >>> >> +       max_pfn = max_low_pfn = max;
> >>> >> +
> >>> >> +       if (IS_ENABLED(CONFIG_NUMA))
> >>> >> +               arm64_numa_init();
> >>> >
> >>> > Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
> >>> > that case anyway?
> >>> yes, if is not required, will remove it.
> >>> >
> >>> > [...]
> >>> >
> >>> >> +/*
> >>> >> + *  Set the cpu to node and mem mapping
> >>> >> + */
> >>> >> +void numa_store_cpu_info(cpu)
> >>> >> +{
> >>> >> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
> >>> >> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
> >>> >> +       set_numa_node(cpu_to_node_map[cpu]);
> >>> >> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
> >>> >> +}
> >>> >
> >>> > I don't like this. I think we need to be more explicit in the DT w.r.t.
> >>> > the relationship between memory and the CPU hierarchy.
> >>> >
> >>> > I can imagine that we might end up with systems with multiple levels of
> >>> > NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
> >>> > explcit as possible from the start w.r.t. the relationship between
> >>> > memory and groups of CPUs such that we don't end up with multiple ways
> >>> > of specifying said relationship, and horrible edge cases around implicit
> >>> > definitions (e.g. the nid to cluster mapping).
> >>> are you recomending to have explicit nid attribute to each cpu device node?
> >>
> >> I am recommending that we make the relationship explicit. If anything,
> >> using the cpu-map (with phandles) seems like a better approach.
> > yes, will add the mapping.
> >>
> >>> >> +/**
> >>> >> + * dummy_numa_init - Fallback dummy NUMA init
> >>> >> + *
> >>> >> + * Used if there's no underlying NUMA architecture, NUMA initialization
> >>> >> + * fails, or NUMA is disabled on the command line.
> >>> >> + *
> >>> >> + * Must online at least one node and add memory blocks that cover all
> >>> >> + * allowed memory.  This function must not fail.
> >>> >> + */
> >>> >> +static int __init dummy_numa_init(void)
> >>> >> +{
> >>> >> +       pr_info("%s\n",
> >>> >> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
> >>> >
> >>> > Why not print "NUMA turned off" in numa_setup?
> >>> enters this function only when, numa is turned off or the DT/ACPI
> >>> based numa init fails.
> >>
> >> Sure. Moving the "NUMA turned off" print into numa_setup would mean you
> >> could just print "Using dummy NUMA layout" or something to that effect
> >> here -- the function has no need to care about the value of numa_off.
> > agreed.
> >>
> >>> >
> >>> >> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> >>> >> +              0LLU, PFN_PHYS(max_pfn) - 1);
> >>> >> +
> >>> >> +       node_set(0, numa_nodes_parsed);
> >>> >> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> >>> >> +
> >>> >> +       return 0;
> >>> >> +}
> >>> >> +
> >>> >> +/**
> >>> >> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
> >>> >> + */
> >>> >> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
> >>> >> +                                    int depth, void *data)
> >>> >> +{
> >>> >> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> >>> >> +       const __be32 *reg, *endp, *nid_prop;
> >>> >> +       int l, nid;
> >>> >> +
> >>> >> +       /* We are scanning "memory" nodes only */
> >>> >> +       if (type == NULL) {
> >>> >> +               /*
> >>> >> +                * The longtrail doesn't have a device_type on the
> >>> >> +                * /memory node, so look for the node called /memory@0.
> >>> >> +                */
> >>> >> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
> >>> >> +                       return 0;
> >>> >
> >>> > This has no place on arm64.
> >>> i am not sure that we can move to driver/of, at this moment this is
> >>> arm64 specific binding.
> >>
> >> I meant the longtrail workaround, hence my comments on that below.
> > oh! thanks, will remove this.
> >>
> >>> > We limited to longtrail workaround in the core memory parsing to PPC32
> >>> > only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
> >>> > only). This code doesn't need it enabled ever.
> >>> >
> >>> > Are you booting using UEFI? This isn't going to work when the memory map
> >>> tried with bootwrapper, working on to boot from UEFI.
> >>> > comes from UEFI and we have no memory nodes in the DTB.
> >>> yes, there is issue with UEFI boot, since memory node is removed.
> >>> i request UEFI stub dev-team to suggest the possible ways to address this.
> >>
> >> I've Cc'd a few people who have worked on the stub and/or EFI memory map
> >> stuff. It would be worth keeping them on Cc so as to keep them informed.
> > thanks.
> >>
> >> I believe that the EFI stub is doing the right thing by ensuring that
> >> the EFI memory map is used, so this is just another configuration that
> >> your binding has to consider.
> > going through EFI stub, next is to boot numa kernel using UEFI, will
> > include UEFI boot support in v2 patch.
> >>
> >> Mark.
> Below is the example for the proposal of numa bindings in DT.
> This covers cpu to node mapping, memory ranges to node mapping.
> Also defines proximity distance matrix of nodes to each other.
> please let me know your comments to go ahead with the implementation.
> 
> numa-map{
>          /*  Address cells used for memory range base address in mem-map.
>              For all others, size-cells is used.
>              Node-count tells the number of numa nodes in the system.
>          */
>          #address-cells = <2>;
>          #size-cells = <1>;
>          #node-count = <4>;
> 
>          /* Memmap for memory ranges on each node>
> 
>          mem-map = <0x0   0x00c00000 0>,
>            <0x1   0x00000000 1>,
>            <0x100 0x00000000 2>,
>            <0x200 0x00000000 3>;
> 
>          /* CPU to node map for 4 NODE and 16 CPUs system
>         < first-cpu last-cpu  node belongs>
>          */
>          cpu-map = <0 3 0>,
>                          <4 7 1>,
>                         <8 11 2>,
>                         <12 16 3>;
> 
>          /*Proximity Distance matrix for 4Node system
>                        <from-node to-node distance>
>         */
>          node-matrix=    <0 0 10>,
>                                <0 1 20>,
>                                <0 2 30>,
>                                <0 3 10>,
>                                <1 0 20>,
>                                 <1 1 10>,
>                                <1 2 30>,
>                               <1 3 10>,
>                               <2 0 30>,
>                               <2 1 20>,
>                               <2 2 10>,
>                              <2 3 10>,
>                             <3 0 10>,
>                             <3 1 20>,
>                             <3 2 30>,
>                             <3 3 10>,
>  }

Hi Ganapat,
The above caught my attention.

For a 4-node system do we not need 16 distances; the implication of that
would be that the distance between node A-B could be different from the
distance between B-A? Also the distance from a node to itself could be
safely assumed to be zero?

I think we should have a symmetric matrix with zero-diagonals so strictly
only seven values would need specifying for a 4-node system.

Cheers,
-- 
Steve
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2014-10-20 14:25 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-25  9:03 [RFC PATCH 0/4] arm64:numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
     [not found] ` <1411635840-24038-1-git-send-email-ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2014-09-25  9:03   ` [RFC PATCH 1/4] arm64: defconfig: increase NR_CPUS range to 2-128 Ganapatrao Kulkarni
     [not found]     ` <1411635840-24038-2-git-send-email-ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2014-10-03 10:58       ` Mark Rutland
2014-10-06  4:29         ` Ganapatrao Kulkarni
2014-09-25  9:03   ` [RFC PATCH 2/4] arm/arm64:dt:numa: adding numa node mapping for memory nodes Ganapatrao Kulkarni
     [not found]     ` <1411635840-24038-3-git-send-email-ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2014-10-03 11:05       ` Mark Rutland
2014-10-06  4:20         ` Ganapatrao Kulkarni
     [not found]           ` <CAFpQJXUaTGS+D8q-GDLRhWFJg88rJToMpVnSHggS5e0HLDvXPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-06 11:08             ` Mark Rutland
2014-10-06 17:26               ` Ganapatrao Kulkarni
2014-09-25  9:03   ` [RFC PATCH 3/4] arm64:thunder: Add initial dts for Cavium Thunder SoC in 2 Node topology Ganapatrao Kulkarni
     [not found]     ` <1411635840-24038-4-git-send-email-ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2014-10-03 11:19       ` Mark Rutland
2014-09-25  9:03   ` [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms Ganapatrao Kulkarni
     [not found]     ` <1411635840-24038-5-git-send-email-ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2014-10-03 12:13       ` Mark Rutland
2014-10-06  5:14         ` Ganapatrao Kulkarni
     [not found]           ` <CAFpQJXVJtn=PTYTd5icHhfYxsQnEzUP+w9kXXDJSs-M=eGNWVw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-06 11:26             ` Mark Rutland
2014-10-06 17:52               ` Ganapatrao Kulkarni
     [not found]                 ` <CAFpQJXU4AW_WycWUWWaONvCiFgJU5KC_coCH8EU9kO=a0Rf9hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-17 17:19                   ` Ganapatrao Kulkarni
     [not found]                     ` <CAFpQJXXkgtr6E0owHxAu0MG8+7s5LBt_mVB9gQM1VfkX2rY5FQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-20 14:25                       ` Steve Capper [this message]
     [not found]                         ` <20141020142555.GA9968-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2014-10-20 14:30                           ` Steve Capper
     [not found]                             ` <20141020143045.GA10233-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2014-10-22 11:27                               ` Ganapatrao Kulkarni
2014-09-25  9:04   ` Ganapatrao Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141020142555.GA9968@linaro.org \
    --to=steve.capper-qsej5fyqhm4dnm+yrofe0a@public.gmane.org \
    --cc=Catalin.Marinas-5wv7dgnIgG8@public.gmane.org \
    --cc=Will.Deacon-5wv7dgnIgG8@public.gmane.org \
    --cc=ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org \
    --cc=devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org \
    --cc=gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=grant.likely-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org \
    --cc=leif.lindholm-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=mark.rutland-5wv7dgnIgG8@public.gmane.org \
    --cc=msalter-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=roy.franz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).