From: AKASHI Takahiro <takahiro.akashi@linaro.org>
To: François Ozog <francois.ozog@linaro.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>,
	U-Boot Mailing List <u-boot@lists.denx.de>
Subject: Re: QEMU NUMA and U-Boot
Date: Wed, 7 Jul 2021 19:16:15 +0900
Message-ID: <20210707101615.GA49079@laputa>
In-Reply-To: <CAHFG_=Vw-+MgcyHrTqJ6__xo5jSAU=+9D6TU6cC22Dh_D1T3Ew@mail.gmail.com>

On Wed, Jul 07, 2021 at 11:37:19AM +0200, François Ozog wrote:
> On Wed, 7 Jul 2021 at 09:40, François Ozog <francois.ozog@linaro.org> wrote:
> 
> > On Wed, 7 Jul 2021 at 05:59, Heinrich Schuchardt <xypron.glpk@gmx.de>
> > wrote:
> > >
> > > On 7 July 2021 05:18:20 CEST, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
> > > >On 7 July 2021 03:44:35 CEST, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
> > > >>François,
> > > >>
> > > >>On Tue, Jul 06, 2021 at 08:10:08PM +0200, Heinrich Schuchardt wrote:
> > > >>> On 7/6/21 6:13 PM, François Ozog wrote:
> > > >>> > Hi Heinrich, U-Boot 2021.07-rc5 does not take into account the memory
> > > >>> > description when using a QEMU 5.2 NUMA configuration to adapt the
> > > >>> > memory map (kernel_addr_r...):
> > > >>> >
> > > >>> >         -smp 4 \
> > > >>> >           -m 8G,slots=2,maxmem=16G \
> > > >>> >          -object memory-backend-ram,size=4G,id=m0 \
> > > >>> >          -object memory-backend-ram,size=4G,id=m1 \
> > > >>> >          -numa node,cpus=0-1,nodeid=0,memdev=m0 \
> > > >>> >          -numa node,cpus=2-3,nodeid=1,memdev=m1
> > > >>> >
> > > >>> > kernel_addr_r is still 0x40400000 and thus you can't use it to bootefi.
> > > >>> >
> > > >>> > fdt addr 0x13ede6de0; fdt print
> > > >>> >
> > > >>> > Displays fdt while I think it should not.
> > > >>> >
> > > >>> > If I load the kernel at dram.start, the load works but not boot
> > > >>> >
> > > >>> > U-Boot 2021.07 (Jul 06 2021 - 13:26:43 +0000)
> > > >>> >
> > > >>> > DRAM:  4 GiB
> > > >>> > Flash: 64 MiB
> > > >>> > Loading Environment from Flash... OK
> > > >>> > In:    pl011@9000000
> > > >>> > Out:   pl011@9000000
> > > >>> > Err:   pl011@9000000
> > > >>> > Net:   eth0: virtio-net#32
> > > >>> > Hit any key to stop autoboot:  0
> > > >>> > =>
> > > >>> > => bdinfo
> > > >>> > boot_params = 0x0000000000000000
> > > >>> > DRAM bank   = 0x0000000000000000
> > > >>> > -> start    = 0x0000000140000000
> > > >>> > -> size     = 0x0000000100000000
> > > >>> > flashstart  = 0x0000000000000000
> > > >>> > flashsize   = 0x0000000004000000
> > > >>> > flashoffset = 0x00000000000bc990
> > > >>> > baudrate    = 115200 bps
> > > >>> > relocaddr   = 0x000000013ff27000
> > > >>> > reloc off   = 0x000000013ff27000
> > > >>> > Build       = 64-bit
> > > >>> > current eth = virtio-net#32
> > > >>> > ethaddr     = 52:52:52:52:52:52
> > > >>> > IP addr     = <NULL>
> > > >>> > fdt_blob    = 0x000000013ede6de0
> > > >>> > new_fdt     = 0x000000013ede6de0
> > > >>> > fdt_size    = 0x0000000000100000
> > > >>> > lmb_dump_all:
> > > >>> >     memory.cnt   = 0x1
> > > >>> >     memory.reg[0x0].base   = 0x140000000
> > > >>> >     .size   = 0x100000000
> > > >>> >     reserved.cnt   = 0x0
> > > >>> > arch_number = 0x0000000000000000
> > > >>> > TLB addr    = 0x000000013fff0000
> > > >>> > irq_sp      = 0x000000013ede6dd0
> > > >>> > sp start    = 0x000000013ede6dd0
> > > >>> > Early malloc usage: 3a8 / 2000
> > > >>> > => load virtio 0:1 0x140000000 /oskit.efi
> > > >>> > 853424 bytes read in 1 ms (813.9 MiB/s)
> > > >>> > => bootefi 0x140000000 0x13ede6dd0
> > > >>> > ERROR: Failed to register WaitForKey event
> > > >>> > Setting OsIndications failed
> > > >>> > Error: Cannot initialize UEFI sub-system, r = 9
> > > >>> >
> > > >>> >
> > > >>> > I think there is a need to calculate the memory map based on information
> > > >>> > from the previous firmware (TFA; QEMU can be considered as previous
> > > >>> > firmware), i.e. DT or blob_list.
> > > >>> >
> > > >>> > What do you think ?
> > > >>> >
> > > >>> > Cheers
> > > >>> >
> > > >>> > FF
> > > >>> >
> > > >>> > --
> > > >>> >
> > > >>> > François-Frédéric Ozog | /Director Business Development/
> > > >>> > T: +33.67221.6485
> > > >>> > francois.ozog@linaro.org <mailto:francois.ozog@linaro.org>
> > > >>| Skype: ffozog
> > > >>> >
> > > >>> >
> > > >>>
> > > >>> The kernel load address is hard coded here:
> > > >>> include/configs/qemu-arm.h:41:  "kernel_addr_r=0x40400000\0" \
> > > >>>
> > > >>> bdinfo shows:
> > > >>> DRAM start = 0x140000000
> > > >>> DRAM size  = 0x100000000
> > > >>>
> > > >>> fdt addr $fdt_addr
> > > >>> fdt print
> > > >>>
> > > >>> shows two memory areas. One at 40000000, one at 140000000.
> > > >>
> > > >>(This shows that U-Boot receives a correct memory map via dtb.)
> > > >>
> > > >>This is a NUMA machine, isn't it? Why should we care which
> > > >>memory region is used here? Please note that this is a virtual
> > > >>machine, so there is no practical difference between the two regions.
> > > >>
> > > >>The root problem is that U-Boot did not recognize that there were two
> > > >>memory regions. We can fix this issue in either of two ways:
> > > >>
> > > >>1)
> > > >>diff --git a/configs/qemu_arm64_defconfig b/configs/qemu_arm64_defconfig
> > > >>index f6e586627a8e..b70ffae8bf6e 100644
> > > >>--- a/configs/qemu_arm64_defconfig
> > > >>+++ b/configs/qemu_arm64_defconfig
> > > >>@@ -1,7 +1,7 @@
> > > >> CONFIG_ARM=y
> > > >> CONFIG_POSITION_INDEPENDENT=y
> > > >> CONFIG_ARCH_QEMU=y
> > > >>-CONFIG_NR_DRAM_BANKS=1
> > > >>+CONFIG_NR_DRAM_BANKS=2
> > > >> CONFIG_ENV_SIZE=0x40000
> > > >> CONFIG_ENV_SECT_SIZE=0x40000
> > > >> CONFIG_AHCI=y
> > > >>
> > > >>2)
> > > >>diff --git a/lib/fdtdec.c b/lib/fdtdec.c
> > > >>index 4b097fb588ed..4067ea2dead6 100644
> > > >>--- a/lib/fdtdec.c
> > > >>+++ b/lib/fdtdec.c
> > > >>@@ -1111,7 +1111,7 @@ int fdtdec_setup_memory_banksize(void)
> > > >>                return -EINVAL;
> > > >>        }
> > > >>
> > > >>-       for (bank = 0; bank < CONFIG_NR_DRAM_BANKS; bank++) {
> > > >>+       for (bank = 0; ; bank++) {
> > > >>                ret = ofnode_read_resource(mem, reg++, &res);
> > > >>                if (ret < 0) {
> > > >>                        reg = 0;
> > > >>
> > > >>   (fdtdec_setup_memory_banksize() is called in dram_init_banksize().)
> > > >>
> > > >>
> > > >>(2) seems much better, but I don't know why we had to use
> > > >>CONFIG_NR_DRAM_BANKS here.
> > > >>
> >
> > 2) alone does not work as other places in the code refer to
> > CONFIG_NR_DRAM_BANKS. Setting ...BANKS to 32 makes my code work and
> > bdinfo seems now correct:
> >
> > => bdinfo
> > boot_params = 0x0000000000000000
> > DRAM bank   = 0x0000000000000000
> > -> start    = 0x0000000140000000
> > -> size     = 0x0000000100000000
> > DRAM bank   = 0x0000000000000001
> > -> start    = 0x0000000040000000
> > -> size     = 0x0000000100000000
> > flashstart  = 0x0000000000000000
> > flashsize   = 0x0000000004000000
> > flashoffset = 0x00000000000bcb88
> > baudrate    = 115200 bps
> > relocaddr   = 0x000000013ff27000
> > reloc off   = 0x000000013ff27000
> > Build       = 64-bit
> > current eth = virtio-net#32
> > ethaddr     = 52:52:52:52:52:52
> > IP addr     = <NULL>
> > fdt_blob    = 0x000000013ede6cf0
> > new_fdt     = 0x000000013ede6cf0
> > fdt_size    = 0x0000000000100000
> > lmb_dump_all:
> >     memory.cnt   = 0x1
> >     memory.reg[0x0].base   = 0x40000000
> >   .size   = 0x200000000
> >     reserved.cnt   = 0x1
> >     reserved.reg[0x0].base = 0x13ede58f0
> >     .size = 0x121a710
> > arch_number = 0x0000000000000000
> > TLB addr    = 0x000000013fff0000
> > irq_sp      = 0x000000013ede6ce0
> > sp start    = 0x000000013ede6ce0
> > Early malloc usage: 3a8 / 2000
> >
> > May I suggest you propose a combined patch Akashi-san? If we assume
> > NUMA systems to be tested up to 8 nodes to mimic real existing
> > enterprise hardware and up to 4 memory slots (say for memory hot
> > plugging tests) what about a default value of 32? Alternatively, we
> > could set this value to a much higher one if the costs are negligible.
> >
> >
> Well, let's not rush as there are other twists:
> 
> the 4G bank in node 1 is marked BootServicesData in the UEFI GetMemoryMap
> which I assume is not the case. EDK2 reports it as ConventionalMemory.
> 
> The root cause seems to be gd->ramtop not being set up properly.
> 
> Further analysis shows that the DT passed to the booted EFI payload does
> not seem to be correct:
> 
> DT fragment passed to U-Boot
> 
> memory@140000000 {
>     numa-node-id = <0x00000001>;
>     reg = <0x00000001 0x40000000 0x00000001 0x00000000>;
>     device_type = "memory";
> };
> memory@40000000 {
>     numa-node-id = <0x00000000>;
>     reg = <0x00000000 0x40000000 0x00000001 0x00000000>;
>     device_type = "memory";
> };
> 
> DT passed to payload (as per my debug code):
> 
> memory@140000000: memory
>     numa-node-id 1
>     reg (len= 32)
>          140000000 100000000
>          40000000 100000000
> memory@40000000: memory
>     numa-node-id 0
>     reg (len= 16)
>          40000000 100000000
> 
> I am investigating this further...

You should check the logic of fdt_fixup_memory_banks(),
which is called this way:
  efi_dt_fixup()
    image_setup_libfdt()
      arch_fixup_fdt()
        fdt_fixup_memory_banks()

What it does is put *all* the memory regions unconditionally, as
a single "reg" array, into the *first-detected* "memory" node, which is
"memory@140000000" in this case.
This means that the function does not respect the NUMA configuration.
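
To illustrate the point, here is a minimal standalone sketch in plain C.
It is not U-Boot code: struct bank, struct mem_node, fixup_first_node()
and fixup_per_node() are hypothetical names made up for this example,
and matching a bank to a node by its unit address is only an assumption
about how a NUMA-preserving fixup might work. The first function models
the behaviour described above (memory@140000000 ends up with both
regions while memory@40000000 keeps its original one); the second keeps
each region in its own node.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGS 4

/* One memory bank: start address and size. */
struct bank {
	uint64_t start;
	uint64_t size;
};

/* Hypothetical model of a DT "memory@<unit_addr>" node and its "reg" entries. */
struct mem_node {
	uint64_t unit_addr;
	int nregs;
	struct bank reg[MAX_REGS];
};

/*
 * Behaviour described above: every bank is written into the first node
 * found; the remaining nodes are left untouched.
 */
void fixup_first_node(struct mem_node *nodes, int nnodes,
		      const struct bank *banks, int nbanks)
{
	assert(nnodes > 0 && nbanks <= MAX_REGS);
	nodes[0].nregs = nbanks;
	for (int i = 0; i < nbanks; i++)
		nodes[0].reg[i] = banks[i];
}

/*
 * Assumed NUMA-preserving variant: a bank is written only into the node
 * whose unit address equals the bank's start address.
 * Returns the number of banks that matched no node.
 */
int fixup_per_node(struct mem_node *nodes, int nnodes,
		   const struct bank *banks, int nbanks)
{
	int unmatched = 0;

	/* Start from empty "reg" arrays in every node. */
	for (int n = 0; n < nnodes; n++)
		nodes[n].nregs = 0;

	for (int i = 0; i < nbanks; i++) {
		int n;

		for (n = 0; n < nnodes; n++) {
			if (nodes[n].unit_addr == banks[i].start &&
			    nodes[n].nregs < MAX_REGS) {
				nodes[n].reg[nodes[n].nregs++] = banks[i];
				break;
			}
		}
		if (n == nnodes)
			unmatched++;
	}
	return unmatched;
}
```

With the two banks from your bdinfo output, the first variant leaves
two entries in node 0, whereas the second leaves one entry per node, so
each "numa-node-id" still describes only its own memory.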

-Takahiro Akashi


> > >>In this case, other occurrences of CONFIG_NR_DRAM_BANKS in this file
> > > >>should be replaced with a variable for it.
> > > >>
> > > >>> Your use case is well beyond the typical U-Boot usage. So I guess it
> > > >>> will be up to Linaro to provide the necessary patches:
> > > >>>
> > > >>> * determine the active CPU
> > > >>> * determine the RAM assigned to the active CPU according
> > > >>>   to the numa-node-id in the device-tree
> > > >>> * make sure that U-Boot only uses the memory of the active CPU
> > > >>>   internally
> > > >>> * make sure that the UEFI memory map contains a compliant
> > > >>>   description
> > > >>> * possibly, dynamically set up the environment variables
> > > >>>
> > > >>> +CC Tuomas Tynkkynen (maintainer for qemu_arm64_defconfig)
> > > >>
> > > >>For (1), we'd better have a different config, or increase
> > > >>the value of CONFIG_NR_DRAM_BANKS to a bigger number?
> > > >
> > > >Is the system configured such that each CPU can access the other CPUs'
> > > >RAM when entering U-Boot?
> > > >
> > > >Best regards
> > > >
> > > >Heinrich
> > > >
> > >
> > > At least the comments for this patch sound as if, on a physical system,
> > > cross-NUMA-node memory access is only available after full SMP
> > > initialization:
> > >
> > > https://patchwork.kernel.org/project/linux-acpi/patch/20180625130552.5636-1-lorenzo.pieralisi@arm.com/
> > >
> > > QEMU may be less restrictive.
> > >
> > > QEMU allows the node distance to be 255, indicating that cross-node
> > > access is infeasible.
> > >
> > > Best regards
> > >
> > > Heinrich
> > >
> > > >>
> > > >>-Takahiro Akashi
> > > >>
> > > >>
> > > >>> Best regards
> > > >>>
> > > >>> Heinrich
> > >
> >
> >
