All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Cédric Le Goater" <clg@kaod.org>
Cc: "Peter Delevoryas" <pdel@fb.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Andrew Jeffery" <andrew@aj.id.au>,
	"Joel Stanley" <joel@jms.id.au>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"berrange@redhat.com" <berrange@redhat.com>,
	"eduardo@habkost.net" <eduardo@habkost.net>,
	"marcel.apfelbaum@gmail.com" <marcel.apfelbaum@gmail.com>,
	"richard.henderson@linaro.org" <richard.henderson@linaro.org>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	"ani@anisinha.ca" <ani@anisinha.ca>,
	"Cameron Esfahani via" <qemu-devel@nongnu.org>,
	qemu-arm <qemu-arm@nongnu.org>
Subject: Re: [PATCH 12/14] aspeed: Make aspeed_board_init_flashes public
Date: Wed, 29 Jun 2022 19:24:02 +0100	[thread overview]
Message-ID: <87edz7gwff.fsf@linaro.org> (raw)
In-Reply-To: <8072cd96-a032-55f2-0417-1989a7bbeca0@kaod.org>


Cédric Le Goater <clg@kaod.org> writes:

> On 6/29/22 16:14, Alex Bennée wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>> 
>>> On 6/24/22 18:50, Cédric Le Goater wrote:
>>>> On 6/23/22 20:43, Peter Delevoryas wrote:
>>>>>
>>>>>
>>>>>> On Jun 23, 2022, at 8:09 AM, Cédric Le Goater <clg@kaod.org> wrote:
>>>>>>
>>>>>> On 6/23/22 12:26, Peter Delevoryas wrote:
>>>>>>> Signed-off-by: Peter Delevoryas <pdel@fb.com>
>>>>>>
>>>>>> Let's start simple without flash support. We should be able to
>>>>>> load FW blobs in each CPU address space using loader devices.
>>>>>
>>>>> Actually, I was unable to do this, perhaps because the fb OpenBMC
>>>>> boot sequence is a little weird. I specifically _needed_ to have
>>>>> a flash device which maps the firmware in at 0x2000_0000, because
>>>>> the fb OpenBMC U-Boot SPL jumps to that address to start executing
>>>>> from flash? I think this is also why fb OpenBMC machines can be so slow.
>>>>>
>>>>> $ ./build/qemu-system-arm -machine fby35 \
>>>>>       -device loader,file=fby35.mtd,addr=0,cpu-num=0 -nographic \
>>>>>       -d int -drive file=fby35.mtd,format=raw,if=mtd
>>>> Ideally we should be booting from the flash device directly using
>>>> the machine option '-M ast2600-evb,execute-in-place=true' like HW
>>>> does. Instructions are fetched using SPI transfers. But the amount
>>>> of code generated is tremendous.
>> Yeah because there is a potential race when reading from HW so we
>> throw
>> away TB's after executing them because we have no way of knowing if it
>> has changed under our feet. See 873d64ac30 (accel/tcg: re-factor non-RAM
>> execution code) which cleaned up this handling.
>> 
>>>> See some profiling below for a
>>>> run which barely reaches DRAM training in U-Boot.
>>>
>>> Some more profiling on both ast2500 and ast2600 machines shows :
>>>
>>>
>>> * ast2600-evb,execute-in-place=true :
>>>
>>> Type               Object  Call site                Wait Time (s)         Count  Average (us)
>>> ---------------------------------------------------------------------------------------------
>>> BQL mutex  0x564dc03922e0  accel/tcg/cputlb.c:1365       14.21443
>>> 32909927          0.43
>> This is unavoidable as a HW access needs the BQL held so we will go
>> through this cycle every executed instruction.
>> Did I miss why the flash contents are not mapped into the physical
>> address space? Isn't that how it appear to the processor?
>
>
> There are two modes :
>           if (ASPEED_MACHINE(machine)->mmio_exec) {
>             memory_region_init_alias(boot_rom, NULL, "aspeed.boot_rom",
>                                      &fl->mmio, 0, size);
>             memory_region_add_subregion(get_system_memory(), FIRMWARE_ADDR,
>                                         boot_rom);
>         } else {
>             memory_region_init_rom(boot_rom, NULL, "aspeed.boot_rom",
>                                    size, &error_abort);
>             memory_region_add_subregion(get_system_memory(), FIRMWARE_ADDR,
>                                         boot_rom);
>             write_boot_rom(drive0, FIRMWARE_ADDR, size, &error_abort);
>         }
>
> The default boot mode uses the ROM. No issue.
>
> The "execute-in-place=true" option creates an alias on the region of
> the flash contents and each instruction is then fetched from the flash
> drive with SPI transactions.
>
> With old FW images, using an older U-boot, the machine boots in a couple
> of seconds. See the profiling below for a witherspoon-bmc machine using
> U-Boot 2016.07.
>
>   qemu-system-arm -M witherspoon-bmc,execute-in-place=true  -drive file=./flash-witherspoon-bmc,format=raw,if=mtd -drive file=./flash-witherspoon-bmc2,format=raw,if=mtd -nographic -nodefaults -snapshot -serial mon:stdio -enable-sync-profile
>   ...
>   U-Boot 2016.07-00040-g8425e96e2e27-dirty (Jun 24 2022 - 23:21:57 +0200)
>             Watchdog enabled
>   DRAM:  496 MiB
>   Flash: 32 MiB
>   In:    serial
>   Out:   serial
>   Err:   serial
>   Net:
>   (qemu) info sync-profile
>   Type               Object  Call site                Wait Time (s)         Count  Average (us)
>   ---------------------------------------------------------------------------------------------
>   BQL mutex  0x56189610b2e0  accel/tcg/cputlb.c:1365        0.25311      12346237          0.02
>   condvar    0x5618970cf220  softmmu/cpus.c:423             0.05506             2      27530.78
>   BQL mutex  0x56189610b2e0  util/rcu.c:269                 0.04709             2      23544.26
>   condvar    0x561896d0fc78  util/thread-pool.c:90          0.01340            83        161.47
>   condvar    0x56189610b240  softmmu/cpus.c:571             0.00005             1         54.93
>   condvar    0x56189610b280  softmmu/cpus.c:642             0.00003             1         32.88
>   BQL mutex  0x56189610b2e0  util/main-loop.c:318           0.00003            34          0.76
>   mutex      0x561896eade00  tcg/region.c:204               0.00002           995          0.02
>   rec_mutex  [           2]  util/async.c:682               0.00002           493          0.03
>   mutex      [           2]  chardev/char.c:118             0.00001           404          0.03
>   ---------------------------------------------------------------------------------------------
>
>
> However, with recent U-boots, it takes quite a while to reach DRAM training.
> Close to a minute. See the profiling below for an ast2500-evb machine using
> U-Boot 2019.04.
>
>  qemu-system-arm -M ast2500-evb,execute-in-place=true  -net nic,macaddr=C0:FF:EE:00:00:03,netdev=net0  -drive file=./flash-ast2500-evb,format=raw,if=mtd  -nographic -nodefaults -snapshot -serial mon:stdio  -enable-sync-profile
>   qemu-system-arm: warning: Aspeed iBT has no chardev backend
>   qemu-system-arm: warning: nic ftgmac100.1 has no peer
>   QEMU 7.0.50 monitor - type 'help' for more information
>      U-Boot 2019.04-00080-g6ca27db3f97b-dirty (Jun 24 2022 - 23:22:03
>     +0200)
>      SOC : AST2500-A1
>   RST : Power On
>   LPC Mode : SIO:Enable : SuperIO-2e
>   Eth : MAC0: RGMII, , MAC1: RGMII,
>   Model: AST2500 EVB
>   DRAM:  448 MiB (capacity:512 MiB, VGA:64 MiB, ECC:off)
>   MMC:   sdhci_slot0@100: 0, sdhci_slot1@200: 1
>   Loading Environment from SPI Flash... SF: Detected mx25l25635e with page size 256 Bytes, erase size 64 KiB, total 32 MiB
>   *** Warning - bad CRC, using default environment
>      In:    serial@1e784000
>   Out:   serial@1e784000
>   Err:   serial@1e784000
>   Net:   eth0: ethernet@1e660000
>   Warning: ethernet@1e680000 (eth1) using random MAC address - 4a:e5:9a:4a:c7:c5
>   , eth1: ethernet@1e680000
>   Hit any key to stop autoboot:  2
>   (qemu) info sync-profile
>   Type               Object  Call site                Wait Time (s)         Count  Average (us)
>   ---------------------------------------------------------------------------------------------
>   condvar    0x561f10c9ef88  util/thread-pool.c:90         10.01196            28     357570.00
>   BQL mutex  0x561f102362e0  accel/tcg/cputlb.c:1365        0.29496      14248621          0.02
>   condvar    0x561f110325a0  softmmu/cpus.c:423             0.02231             2      11152.57
>   BQL mutex  0x561f102362e0  util/rcu.c:269                 0.01447             4       3618.60
>   condvar    0x561f10236240  softmmu/cpus.c:571             0.00010             1        102.19
>   mutex      0x561f10e9f1c0  tcg/region.c:204               0.00007          3052          0.02
>   mutex      [           2]  chardev/char.c:118             0.00003          1486          0.02
>   condvar    0x561f10236280  softmmu/cpus.c:642             0.00003             1         29.38
>   BQL mutex  0x561f102362e0  accel/tcg/cputlb.c:1426        0.00002           973          0.02
>   BQL mutex  0x561f102362e0  util/main-loop.c:318           0.00001            34          0.41
>   ---------------------------------------------------------------------------------------------
>   Something in the layout of the FW is making a big difference. One
> that could be relevant is that the recent versions are using a device
> tree.
>
> There might be no good solution to this issue but I fail to analyze
> it correctly. Is there a way to collect information on the usage of
> Translation Blocks ?

You could expand the data we collect in tb_tree_stats and expose it via
info jit.

>
> Thanks,
>
> C.


-- 
Alex Bennée

  reply	other threads:[~2022-06-29 18:24 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220623102617.2164175-1-pdel@fb.com>
     [not found] ` <20220623102617.2164175-3-pdel@fb.com>
2022-06-23 12:11   ` [PATCH 02/14] sysbus: Remove sysbus_address_space Peter Maydell
     [not found] ` <20220623102617.2164175-5-pdel@fb.com>
2022-06-23 12:15   ` [PATCH 04/14] sysbus: Add sysbus_mmio_map_in Peter Maydell
2022-06-23 18:29     ` Peter Delevoryas
2022-06-23 18:29       ` Peter Delevoryas
     [not found] ` <20220623102617.2164175-9-pdel@fb.com>
2022-06-23 12:57   ` [PATCH 08/14] aspeed: Replace direct get_system_memory() calls Peter Maydell
2022-06-23 15:39     ` Cédric Le Goater
2022-06-23 18:45       ` Peter Delevoryas
2022-06-23 18:45         ` Peter Delevoryas
     [not found] ` <20220623102617.2164175-13-pdel@fb.com>
2022-06-23 15:09   ` [PATCH 12/14] aspeed: Make aspeed_board_init_flashes public Cédric Le Goater
2022-06-23 18:43     ` Peter Delevoryas
2022-06-23 18:43       ` Peter Delevoryas
2022-06-24 16:50       ` Cédric Le Goater
2022-06-29  9:11         ` Cédric Le Goater
2022-06-29 14:14           ` Alex Bennée
2022-06-29 15:54             ` Cédric Le Goater
2022-06-29 18:24               ` Alex Bennée [this message]
2022-06-30  8:49                 ` Cédric Le Goater
2022-06-30  9:43                   ` Alex Bennée
2022-07-05 12:35                     ` Cédric Le Goater

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87edz7gwff.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=andrew@aj.id.au \
    --cc=ani@anisinha.ca \
    --cc=berrange@redhat.com \
    --cc=clg@kaod.org \
    --cc=eduardo@habkost.net \
    --cc=f4bug@amsat.org \
    --cc=joel@jms.id.au \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=pdel@fb.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.