From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Cédric Le Goater" <clg@kaod.org>
Cc: "Peter Delevoryas" <pdel@fb.com>,
"Peter Maydell" <peter.maydell@linaro.org>,
"Andrew Jeffery" <andrew@aj.id.au>,
"Joel Stanley" <joel@jms.id.au>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"berrange@redhat.com" <berrange@redhat.com>,
"eduardo@habkost.net" <eduardo@habkost.net>,
"marcel.apfelbaum@gmail.com" <marcel.apfelbaum@gmail.com>,
"richard.henderson@linaro.org" <richard.henderson@linaro.org>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"ani@anisinha.ca" <ani@anisinha.ca>,
"Cameron Esfahani via" <qemu-devel@nongnu.org>,
qemu-arm <qemu-arm@nongnu.org>
Subject: Re: [PATCH 12/14] aspeed: Make aspeed_board_init_flashes public
Date: Wed, 29 Jun 2022 15:14:10 +0100
Message-ID: <87zghvh7rt.fsf@linaro.org>
In-Reply-To: <07128acf-329a-f372-c48c-0c3cb498d3d0@kaod.org>
Cédric Le Goater <clg@kaod.org> writes:
> On 6/24/22 18:50, Cédric Le Goater wrote:
>> On 6/23/22 20:43, Peter Delevoryas wrote:
>>>
>>>
>>>> On Jun 23, 2022, at 8:09 AM, Cédric Le Goater <clg@kaod.org> wrote:
>>>>
>>>> On 6/23/22 12:26, Peter Delevoryas wrote:
>>>>> Signed-off-by: Peter Delevoryas <pdel@fb.com>
>>>>
>>>> Let's start simple without flash support. We should be able to
>>>> load FW blobs in each CPU address space using loader devices.
>>>
>>> Actually, I was unable to do this, perhaps because the fb OpenBMC
>>> boot sequence is a little weird. I specifically _needed_ to have
>>> a flash device which maps the firmware in at 0x2000_0000, because
>>> the fb OpenBMC U-Boot SPL jumps to that address to start executing
>>> from flash? I think this is also why fb OpenBMC machines can be so slow.
>>>
>>> $ ./build/qemu-system-arm -machine fby35 \
>>> -device loader,file=fby35.mtd,addr=0,cpu-num=0 -nographic \
>>> -d int -drive file=fby35.mtd,format=raw,if=mtd
>> Ideally we should be booting from the flash device directly using
>> the machine option '-M ast2600-evb,execute-in-place=true' like HW
>> does. Instructions are fetched using SPI transfers. But the amount
>> of code generated is tremendous.
Yeah, because there is a potential race when reading from HW, we throw
away TBs after executing them: we have no way of knowing if the
underlying contents have changed under our feet. See 873d64ac30
(accel/tcg: re-factor non-RAM execution code), which cleaned up this
handling.
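
To illustrate why that gets expensive, here is a toy model in Python. It
is not QEMU code: the class, its methods and the RAM range are all
hypothetical, and only the 0x2000_0000 flash address comes from this
thread. It only counts how often a block is translated when the same
address is executed repeatedly from RAM versus from a device-backed
region.

```python
# Toy model only, not QEMU code. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyTranslator:
    ram_ranges: list  # (start, end) half-open ranges treated as RAM
    cache: dict = field(default_factory=dict)
    translations: int = 0

    def is_ram(self, pc):
        return any(start <= pc < end for start, end in self.ram_ranges)

    def translate(self, pc):
        # Stand-in for generating a translation block.
        self.translations += 1
        return f"tb@{pc:#x}"

    def execute(self, pc):
        if self.is_ram(pc):
            # RAM-backed code: writes can be tracked, so keep the block.
            tb = self.cache.get(pc)
            if tb is None:
                tb = self.cache[pc] = self.translate(pc)
            return tb
        # Device-backed code: contents may change unseen, so translate,
        # run once and throw the block away.
        return self.translate(pc)


# Assumed RAM window, chosen for illustration only.
t = ToyTranslator(ram_ranges=[(0x8000_0000, 0x9000_0000)])

for _ in range(1000):
    t.execute(0x8000_0000)  # hot loop running from RAM
ram_translations = t.translations

for _ in range(1000):
    t.execute(0x2000_0000)  # same loop executed in place from flash
xip_translations = t.translations - ram_translations

print(ram_translations, xip_translations)
```

In this model the RAM case translates once and the execute-in-place case
translates on every pass, which is the shape of the overhead described
above.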
>> See some profiling below for a
>> run which barely reaches DRAM training in U-Boot.
>
> Some more profiling on both ast2500 and ast2600 machines shows :
>
>
> * ast2600-evb,execute-in-place=true :
>
> Type Object Call site Wait Time (s) Count Average (us)
> ---------------------------------------------------------------------------------------------
> BQL mutex 0x564dc03922e0 accel/tcg/cputlb.c:1365 14.21443 32909927 0.43
This is unavoidable as a HW access needs the BQL held, so we will go
through this cycle for every executed instruction.

Did I miss why the flash contents are not mapped into the physical
address space? Isn't that how it appears to the processor?
> condvar 0x564dc0f02988 util/thread-pool.c:90 10.02312 56 178984.32
> condvar [ 2] softmmu/cpus.c:423 0.10051 6 16752.04
> BQL mutex 0x564dc03922e0 util/rcu.c:269 0.04372 4 10930.60
> BQL mutex 0x564dc03922e0 cpus-common.c:341 0.00151 8 189.16
> condvar 0x564dc0390360 cpus-common.c:176 0.00092 8 115.04
> condvar 0x564dc0392280 softmmu/cpus.c:642 0.00013 2 65.04
> condvar 0x564dc0392240 softmmu/cpus.c:571 0.00010 2 49.54
> BQL mutex 0x564dc03922e0 accel/tcg/cputlb.c:1426 0.00006 467 0.14
> condvar 0x564dc03903a0 cpus-common.c:206 0.00004 8 5.28
> ---------------------------------------------------------------------------------------------
>
>
> * ast2500-evb,execute-in-place=true :
>
> Type Object Call site Wait Time (s) Count Average (us)
> ---------------------------------------------------------------------------------------------
> condvar 0x55a581137f88 util/thread-pool.c:90 10.01158 28 357556.50
> BQL mutex 0x55a57f0e02e0 accel/tcg/cputlb.c:1365 0.29886 14394475 0.02
> condvar 0x55a5814cb5a0 softmmu/cpus.c:423 0.02182 2 10912.44
> BQL mutex 0x55a57f0e02e0 util/rcu.c:269 0.01420 4 3549.56
> mutex 0x55a5813381c0 tcg/region.c:204 0.00007 3052 0.02
> condvar 0x55a57f0e0280 softmmu/cpus.c:642 0.00006 1 59.79
> mutex [ 2] chardev/char.c:118 0.00003 1492 0.02
> BQL mutex 0x55a57f0e02e0 util/main-loop.c:318 0.00002 34 0.72
> BQL mutex 0x55a57f0e02e0 accel/tcg/cputlb.c:1426 0.00002 973 0.02
> condvar 0x55a57f0e0240 softmmu/cpus.c:571 0.00002 1 15.16
> ---------------------------------------------------------------------------------------------
>
> C.
>
>
>
>> * execute-in-place=true
>> Each sample counts as 0.01 seconds.
>> % cumulative self self total
>> time seconds seconds calls ns/call ns/call name
>> 100.00 0.02 0.02 164276 121.75 121.75 memory_region_init_rom_device
>> 0.00 0.02 0.00 1610346008 0.00 0.00 tcg_code_capacity
>> 0.00 0.02 0.00 567612621 0.00 0.00 type_register_static_array
>> 0.00 0.02 0.00 328886191 0.00 0.00 do_common_semihosting
>> 0.00 0.02 0.00 297215811 0.00 0.00 container_get
>> 0.00 0.02 0.00 292670030 0.00 0.00 arm_cpu_tlb_fill
>> 0.00 0.02 0.00 195416119 0.00 0.00 arm_cpu_register_gdb_regs_for_features
>> 0.00 0.02 0.00 193326677 0.00 0.00 object_type_get_instance_size
>> 0.00 0.02 0.00 182365829 0.00 0.00 tcg_op_insert_after
>> 0.00 0.02 0.00 150668458 0.00 0.00 plugin_gen_tb_end
>> 0.00 0.02 0.00 142171940 0.00 0.00 gen_new_label
>> 0.00 0.02 0.00 133200628 0.00 0.00 smbios_build_type_38_table
>> 0.00 0.02 0.00 130540338 0.00 0.00 object_dynamic_cast_assert
>> 0.00 0.02 0.00 129223195 0.00 0.00 cpu_loop_exit_atomic
>> 0.00 0.02 0.00 121759298 0.00 0.00 tcg_remove_ops_after
>> 0.00 0.02 0.00 116887887 0.00 0.00 in_code_gen_buffer
>> 0.00 0.02 0.00 111803833 0.00 0.00 tcg_emit_op
>> 0.00 0.02 0.00 106052221 0.00 0.00 object_class_dynamic_cast_assert
>> 0.00 0.02 0.00 99704054 0.00 0.00 __jit_debug_register_code
>> 0.00 0.02 0.00 97812458 0.00 0.00 object_get_class
>> 0.00 0.02 0.00 88952594 0.00 0.00 tcg_splitwx_to_rx
>> 0.00 0.02 0.00 85790920 0.00 0.00 object_class_dynamic_cast
>> 0.00 0.02 0.00 73780673 0.00 0.00 helper_exit_atomic
>> 0.00 0.02 0.00 65337482 0.00 0.00 tcg_op_supported
>> 0.00 0.02 0.00 61213619 0.00 0.00 tcg_func_start
>> 0.00 0.02 0.00 54477684 0.00 0.00 tcg_flush_softmmu_tlb
>> 0.00 0.02 0.00 53968980 0.00 0.00 tcg_temp_new_internal
>> 0.00 0.02 0.00 51526008 0.00 0.00 qemu_in_vcpu_thread
>> 0.00 0.02 0.00 40750952 0.00 0.00 pflash_cfi02_register
>> 0.00 0.02 0.00 38039442 0.00 0.00 tcg_gen_op2
>> 0.00 0.02 0.00 37068039 0.00 0.00 tcg_gen_op1
>> 0.00 0.02 0.00 36473276 0.00 0.00 tcg_gen_op3
>> 0.00 0.02 0.00 36310225 0.00 0.00 gen_gvec_uaba
>> 0.00 0.02 0.00 30985436 0.00 0.00 tb_set_jmp_target
>> 0.00 0.02 0.00 30291796 0.00 0.00 tcg_constant_internal
>> 0.00 0.02 0.00 29857950 0.00 0.00 ssi_transfer
>> * execute-in-place=false
>> Each sample counts as 0.01 seconds.
>> % cumulative self self total
>> time seconds seconds calls ns/call ns/call name
>> 40.00 0.02 0.02 551149 36.29 36.29 aspeed_board_init_flashes
>> 20.00 0.03 0.01 3937238 2.54 2.54 register_cp_regs_for_features
>> 20.00 0.04 0.01 674096 14.83 14.83 gen_gvec_uaba
>> 20.00 0.05 0.01 457461 21.86 21.86 finalize_target_page_bits
>> 0.00 0.05 0.00 5364258 0.00 0.00 arm_gt_hvtimer_cb
>> 0.00 0.05 0.00 2467532 0.00 0.00 helper_neon_narrow_sat_s8
>> 0.00 0.05 0.00 2431860 0.00 0.00 opb_opb2fsi_address
>> 0.00 0.05 0.00 1828453 0.00 0.00 cpsr_read
>> 0.00 0.05 0.00 1820659 0.00 0.00 cpu_get_tb_cpu_state
>> 0.00 0.05 0.00 1441344 0.00 0.00 arm_cpu_tlb_fill
>> 0.00 0.05 0.00 1427177 0.00 0.00 cxl_usp_to_cstate
>> 0.00 0.05 0.00 1161059 0.00 5.85 aarch64_sync_64_to_32
>> 0.00 0.05 0.00 886523 0.00 0.00 helper_iwmmxt_maxsb
>> 0.00 0.05 0.00 831393 0.00 0.00 arm_log_exception
>> 0.00 0.05 0.00 746940 0.00 0.00 helper_v7m_preserve_fp_state
>> 0.00 0.05 0.00 728354 0.00 0.00 hmp_calc_dirty_rate
>> 0.00 0.05 0.00 681634 0.00 0.00 helper_sadd8
>> 0.00 0.05 0.00 487743 0.00 7.14 qmp_query_cpu_definitions
>> 0.00 0.05 0.00 420528 0.00 0.00 arm_v7m_cpu_do_interrupt
>> 0.00 0.05 0.00 382245 0.00 0.00 helper_ssub8
>> 0.00 0.05 0.00 374192 0.00 0.00 helper_usub8
>> 0.00 0.05 0.00 347199 0.00 0.00 usb_msd_load_request
>> 0.00 0.05 0.00 325862 0.00 0.00 target_disas
>> 0.00 0.05 0.00 322375 0.00 0.00 arm_hcrx_el2_eff
>> 0.00 0.05 0.00 317835 0.00 0.00 virtio_bus_device_iommu_enabled
>> 0.00 0.05 0.00 309559 0.00 0.00 mig_throttle_counter_reset
>> 0.00 0.05 0.00 301557 0.00 0.00 ram_bytes_remaining
>> 0.00 0.05 0.00 292888 0.00 0.00 helper_v7m_blxns
>> 0.00 0.05 0.00 289093 0.00 0.00 tpm_util_show_buffer
>> 0.00 0.05 0.00 274156 0.00 0.00 helper_sxtb16
>> 0.00 0.05 0.00 273588 0.00 0.00 write_v7m_exception
>> 0.00 0.05 0.00 271619 0.00 0.00 page_size_init
>> 0.00 0.05 0.00 270247 0.00 0.00 qemu_fdt_setprop_sized_cells_from_array
>> 0.00 0.05 0.00 229643 0.00 14.69 helper_neon_addl_u32
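
As a sanity check on the lock profiles quoted above, the "Average (us)"
column for the accel/tcg/cputlb.c:1365 rows can be recomputed from the
wait time and count. This is only a sketch of the arithmetic; the
numbers are copied from the quoted tables and the helper name is made
up.

```python
# (wait time in seconds, acquisition count) for the BQL mutex at
# accel/tcg/cputlb.c:1365, copied from the quoted profiles.
rows = {
    "ast2600-evb": (14.21443, 32909927),
    "ast2500-evb": (0.29886, 14394475),
}


def average_us(wait_s, count):
    """Average wait per acquisition, in microseconds."""
    return wait_s / count * 1e6


averages = {machine: average_us(*row) for machine, row in rows.items()}
for machine, avg in averages.items():
    print(f"{machine}: {avg:.2f} us per acquisition")

ratio = averages["ast2600-evb"] / averages["ast2500-evb"]
print(f"ratio: {ratio:.1f}x")
```

The results agree with the 0.43 and 0.02 figures in the tables, so the
per-acquisition wait on the ast2600 run is roughly 20 times that of the
ast2500 run, on top of the higher acquisition count.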
--
Alex Bennée