qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Cédric Le Goater" <clg@kaod.org>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: "Peter Delevoryas" <pdel@fb.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Andrew Jeffery" <andrew@aj.id.au>,
	"Joel Stanley" <joel@jms.id.au>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"berrange@redhat.com" <berrange@redhat.com>,
	"eduardo@habkost.net" <eduardo@habkost.net>,
	"marcel.apfelbaum@gmail.com" <marcel.apfelbaum@gmail.com>,
	"richard.henderson@linaro.org" <richard.henderson@linaro.org>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	"ani@anisinha.ca" <ani@anisinha.ca>,
	"Cameron Esfahani via" <qemu-devel@nongnu.org>,
	qemu-arm <qemu-arm@nongnu.org>
Subject: Re: [PATCH 12/14] aspeed: Make aspeed_board_init_flashes public
Date: Thu, 30 Jun 2022 10:49:08 +0200	[thread overview]
Message-ID: <a35ac443-c93e-fb2b-6da9-2a3dc16f7d8c@kaod.org> (raw)
In-Reply-To: <87edz7gwff.fsf@linaro.org>

On 6/29/22 20:24, Alex Bennée wrote:
> 
> Cédric Le Goater <clg@kaod.org> writes:
> 
>> On 6/29/22 16:14, Alex Bennée wrote:
>>> Cédric Le Goater <clg@kaod.org> writes:
>>>
>>>> On 6/24/22 18:50, Cédric Le Goater wrote:
>>>>> On 6/23/22 20:43, Peter Delevoryas wrote:
>>>>>>
>>>>>>
>>>>>>> On Jun 23, 2022, at 8:09 AM, Cédric Le Goater <clg@kaod.org> wrote:
>>>>>>>
>>>>>>> On 6/23/22 12:26, Peter Delevoryas wrote:
>>>>>>>> Signed-off-by: Peter Delevoryas <pdel@fb.com>
>>>>>>>
>>>>>>> Let's start simple without flash support. We should be able to
>>>>>>> load FW blobs in each CPU address space using loader devices.
>>>>>>
>>>>>> Actually, I was unable to do this, perhaps because the fb OpenBMC
>>>>>> boot sequence is a little weird. I specifically _needed_ to have
>>>>>> a flash device which maps the firmware in at 0x2000_0000, because
>>>>>> the fb OpenBMC U-Boot SPL jumps to that address to start executing
>>>>>> from flash? I think this is also why fb OpenBMC machines can be so slow.
>>>>>>
>>>>>> $ ./build/qemu-system-arm -machine fby35 \
>>>>>>        -device loader,file=fby35.mtd,addr=0,cpu-num=0 -nographic \
>>>>>>        -d int -drive file=fby35.mtd,format=raw,if=mtd
>>>>> Ideally we should be booting from the flash device directly using
>>>>> the machine option '-M ast2600-evb,execute-in-place=true' like HW
>>>>> does. Instructions are fetched using SPI transfers. But the amount
>>>>> of code generated is tremendous.
>>> Yeah because there is a potential race when reading from HW so we
>>> throw
>>> away TB's after executing them because we have no way of knowing if it
>>> has changed under our feet. See 873d64ac30 (accel/tcg: re-factor non-RAM
>>> execution code) which cleaned up this handling.
>>>
>>>>> See some profiling below for a
>>>>> run which barely reaches DRAM training in U-Boot.
>>>>
>>>> Some more profiling on both ast2500 and ast2600 machines shows :
>>>>
>>>>
>>>> * ast2600-evb,execute-in-place=true :
>>>>
>>>> Type               Object  Call site                Wait Time (s)         Count  Average (us)
>>>> ---------------------------------------------------------------------------------------------
>>>> BQL mutex  0x564dc03922e0  accel/tcg/cputlb.c:1365       14.21443
>>>> 32909927          0.43
>>> This is unavoidable as a HW access needs the BQL held so we will go
>>> through this cycle every executed instruction.
>>> Did I miss why the flash contents are not mapped into the physical
>>> address space? Isn't that how it appear to the processor?
>>
>>
>> There are two modes :
>>            if (ASPEED_MACHINE(machine)->mmio_exec) {
>>              memory_region_init_alias(boot_rom, NULL, "aspeed.boot_rom",
>>                                       &fl->mmio, 0, size);
>>              memory_region_add_subregion(get_system_memory(), FIRMWARE_ADDR,
>>                                          boot_rom);
>>          } else {
>>              memory_region_init_rom(boot_rom, NULL, "aspeed.boot_rom",
>>                                     size, &error_abort);
>>              memory_region_add_subregion(get_system_memory(), FIRMWARE_ADDR,
>>                                          boot_rom);
>>              write_boot_rom(drive0, FIRMWARE_ADDR, size, &error_abort);
>>          }
>>
>> The default boot mode uses the ROM. No issue.
>>
>> The "execute-in-place=true" option creates an alias on the region of
>> the flash contents and each instruction is then fetched from the flash
>> drive with SPI transactions.
>>
>> With old FW images, using an older U-boot, the machine boots in a couple
>> of seconds. See the profiling below for a witherspoon-bmc machine using
>> U-Boot 2016.07.
>>
>>    qemu-system-arm -M witherspoon-bmc,execute-in-place=true  -drive file=./flash-witherspoon-bmc,format=raw,if=mtd -drive file=./flash-witherspoon-bmc2,format=raw,if=mtd -nographic -nodefaults -snapshot -serial mon:stdio -enable-sync-profile
>>    ...
>>    U-Boot 2016.07-00040-g8425e96e2e27-dirty (Jun 24 2022 - 23:21:57 +0200)
>>              Watchdog enabled
>>    DRAM:  496 MiB
>>    Flash: 32 MiB
>>    In:    serial
>>    Out:   serial
>>    Err:   serial
>>    Net:
>>    (qemu) info sync-profile
>>    Type               Object  Call site                Wait Time (s)         Count  Average (us)
>>    ---------------------------------------------------------------------------------------------
>>    BQL mutex  0x56189610b2e0  accel/tcg/cputlb.c:1365        0.25311      12346237          0.02
>>    condvar    0x5618970cf220  softmmu/cpus.c:423             0.05506             2      27530.78
>>    BQL mutex  0x56189610b2e0  util/rcu.c:269                 0.04709             2      23544.26
>>    condvar    0x561896d0fc78  util/thread-pool.c:90          0.01340            83        161.47
>>    condvar    0x56189610b240  softmmu/cpus.c:571             0.00005             1         54.93
>>    condvar    0x56189610b280  softmmu/cpus.c:642             0.00003             1         32.88
>>    BQL mutex  0x56189610b2e0  util/main-loop.c:318           0.00003            34          0.76
>>    mutex      0x561896eade00  tcg/region.c:204               0.00002           995          0.02
>>    rec_mutex  [           2]  util/async.c:682               0.00002           493          0.03
>>    mutex      [           2]  chardev/char.c:118             0.00001           404          0.03
>>    ---------------------------------------------------------------------------------------------
>>
>>
>> However, with recent U-boots, it takes quite a while to reach DRAM training.
>> Close to a minute. See the profiling below for an ast2500-evb machine using
>> U-Boot 2019.04.
>>
>>   qemu-system-arm -M ast2500-evb,execute-in-place=true  -net nic,macaddr=C0:FF:EE:00:00:03,netdev=net0  -drive file=./flash-ast2500-evb,format=raw,if=mtd  -nographic -nodefaults -snapshot -serial mon:stdio  -enable-sync-profile
>>    qemu-system-arm: warning: Aspeed iBT has no chardev backend
>>    qemu-system-arm: warning: nic ftgmac100.1 has no peer
>>    QEMU 7.0.50 monitor - type 'help' for more information
>>       U-Boot 2019.04-00080-g6ca27db3f97b-dirty (Jun 24 2022 - 23:22:03
>>      +0200)
>>       SOC : AST2500-A1
>>    RST : Power On
>>    LPC Mode : SIO:Enable : SuperIO-2e
>>    Eth : MAC0: RGMII, , MAC1: RGMII,
>>    Model: AST2500 EVB
>>    DRAM:  448 MiB (capacity:512 MiB, VGA:64 MiB, ECC:off)
>>    MMC:   sdhci_slot0@100: 0, sdhci_slot1@200: 1
>>    Loading Environment from SPI Flash... SF: Detected mx25l25635e with page size 256 Bytes, erase size 64 KiB, total 32 MiB
>>    *** Warning - bad CRC, using default environment
>>       In:    serial@1e784000
>>    Out:   serial@1e784000
>>    Err:   serial@1e784000
>>    Net:   eth0: ethernet@1e660000
>>    Warning: ethernet@1e680000 (eth1) using random MAC address - 4a:e5:9a:4a:c7:c5
>>    , eth1: ethernet@1e680000
>>    Hit any key to stop autoboot:  2
>>    (qemu) info sync-profile
>>    Type               Object  Call site                Wait Time (s)         Count  Average (us)
>>    ---------------------------------------------------------------------------------------------
>>    condvar    0x561f10c9ef88  util/thread-pool.c:90         10.01196            28     357570.00
>>    BQL mutex  0x561f102362e0  accel/tcg/cputlb.c:1365        0.29496      14248621          0.02
>>    condvar    0x561f110325a0  softmmu/cpus.c:423             0.02231             2      11152.57
>>    BQL mutex  0x561f102362e0  util/rcu.c:269                 0.01447             4       3618.60
>>    condvar    0x561f10236240  softmmu/cpus.c:571             0.00010             1        102.19
>>    mutex      0x561f10e9f1c0  tcg/region.c:204               0.00007          3052          0.02
>>    mutex      [           2]  chardev/char.c:118             0.00003          1486          0.02
>>    condvar    0x561f10236280  softmmu/cpus.c:642             0.00003             1         29.38
>>    BQL mutex  0x561f102362e0  accel/tcg/cputlb.c:1426        0.00002           973          0.02
>>    BQL mutex  0x561f102362e0  util/main-loop.c:318           0.00001            34          0.41
>>    ---------------------------------------------------------------------------------------------
>>    Something in the layout of the FW is making a big difference. One
>> that could be relevant is that the recent versions are using a device
>> tree.
>>
>> There might be no good solution to this issue but I fail to analyze
>> it correctly. Is there a way to collect information on the usage of
>> Translation Blocks ?
> 
> You could expand the data we collect in tb_tree_stats and expose it via
> info jit.

The "fast" run, U-Boot 2016.07, gives :


   Translation buffer state:
   gen code size       254880371/1073736704
   TB count            1089
   TB avg target size  16 max=356 bytes
   TB avg host size    278 bytes (expansion ratio: 17.2)
   cross page TB count 0 (0%)
   direct jump count   501 (46%) (2 jumps=372 34%)
   TB hash buckets     1025/8192 (12.51% head buckets used)
   TB hash occupancy   3.32% avg chain occ. Histogram: [0.0,7.5)%|█  ▁  ▁  ▁|[67.5,75.0]%
   TB hash avg chain   1.000 buckets. Histogram: 1|█|1
   
   Statistics:
   TB flush count      0
   TB invalidate count 0
   TLB full flushes    0
   TLB partial flushes 2
   TLB elided flushes  2338
   JIT cycles          2221788409 (0.926 s at 2.4 GHz)
   translated TBs      738520 (aborted=0 0.0%)
   avg ops/TB          15.7 max=459
   deleted ops/TB      2.72
   avg temps/TB        32.89 max=88
   avg host code/TB    113.7
   avg search data/TB  5.2
   cycles/op           192.0
   cycles/in byte      748.7
   cycles/out byte     26.4
   cycles/search byte     582.8
     gen_interm time   57.6%
     gen_code time     42.4%
   optim./code time    19.4%
   liveness/code time  26.1%
   cpu_restore count   0
     avg cycles        0.0
   

and the "slow", U-Boot 2019.04 :

Translation buffer state:
gen code size       368603795/1073736704
TB count            3052
TB avg target size  16 max=360 bytes
TB avg host size    293 bytes (expansion ratio: 17.6)
cross page TB count 0 (0%)
direct jump count   1431 (46%) (2 jumps=1104 36%)
TB hash buckets     2559/8192 (31.24% head buckets used)
TB hash occupancy   9.31% avg chain occ. Histogram: [0,10)%|█ ▃  ▁ ▁ ▁|[90,100]%
TB hash avg chain   1.000 buckets. Histogram: 1|█|1

Statistics:
TB flush count      3
TB invalidate count 0
TLB full flushes    0
TLB partial flushes 3
TLB elided flushes  2367
JIT cycles          26479044772 (11.033 s at 2.4 GHz)
translated TBs      10552169 (aborted=0 0.0%)
avg ops/TB          15.0 max=464
deleted ops/TB      2.44
avg temps/TB        32.43 max=89
avg host code/TB    99.0
avg search data/TB  5.0
cycles/op           167.7
cycles/in byte      626.8
cycles/out byte     25.4
cycles/search byte     499.4
   gen_interm time   50.4%
   gen_code time     49.6%
optim./code time    19.5%
liveness/code time  27.7%
cpu_restore count   0
   avg cycles        0.0


A lot more TBs.
   
C.


  reply	other threads:[~2022-06-30  8:53 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220623102617.2164175-1-pdel@fb.com>
     [not found] ` <20220623102617.2164175-3-pdel@fb.com>
2022-06-23 12:11   ` [PATCH 02/14] sysbus: Remove sysbus_address_space Peter Maydell
     [not found] ` <20220623102617.2164175-5-pdel@fb.com>
2022-06-23 12:15   ` [PATCH 04/14] sysbus: Add sysbus_mmio_map_in Peter Maydell
2022-06-23 18:29     ` Peter Delevoryas
     [not found] ` <20220623102617.2164175-9-pdel@fb.com>
2022-06-23 12:57   ` [PATCH 08/14] aspeed: Replace direct get_system_memory() calls Peter Maydell
2022-06-23 15:39     ` Cédric Le Goater
2022-06-23 18:45       ` Peter Delevoryas
     [not found] ` <20220623102617.2164175-13-pdel@fb.com>
2022-06-23 15:09   ` [PATCH 12/14] aspeed: Make aspeed_board_init_flashes public Cédric Le Goater
2022-06-23 18:43     ` Peter Delevoryas
2022-06-24 16:50       ` Cédric Le Goater
2022-06-29  9:11         ` Cédric Le Goater
2022-06-29 14:14           ` Alex Bennée
2022-06-29 15:54             ` Cédric Le Goater
2022-06-29 18:24               ` Alex Bennée
2022-06-30  8:49                 ` Cédric Le Goater [this message]
2022-06-30  9:43                   ` Alex Bennée
2022-07-05 12:35                     ` Cédric Le Goater

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a35ac443-c93e-fb2b-6da9-2a3dc16f7d8c@kaod.org \
    --to=clg@kaod.org \
    --cc=alex.bennee@linaro.org \
    --cc=andrew@aj.id.au \
    --cc=ani@anisinha.ca \
    --cc=berrange@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=f4bug@amsat.org \
    --cc=joel@jms.id.au \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=pdel@fb.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).