* Re: Why QEMU translates one instruction to a TB?
[not found] <tencent_EAC696641F035EB7E9885302EAAE37455907@qq.com>
@ 2020-09-17 7:38 ` Philippe Mathieu-Daudé
2020-09-17 7:45 ` Philippe Mathieu-Daudé
2020-09-17 8:41 ` Alex Bennée
2 siblings, 0 replies; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-09-17 7:38 UTC (permalink / raw)
To: casmac, qemu-devel; +Cc: Peter Maydell
On 9/17/20 8:25 AM, casmac wrote:
> Hi all,
>    We try to add DSP architecture to QEMU 4.2. To load the COFF format
> object file, we have added loader code to load content from
> the object file. The rom_add_blob() function is used. We firstly
> analyze the COFF file to figure out which sections are chained
> together (so each chain forms a "memory blob"), and then allocate the
> memory blobs.
>
> The pseudo code looks like:
>
>     for(i=0; i<BADTYPE; i++){
>         if(ary_sect_chain[i].exist)   //there is a chain of sections to allocate
>         {
>             ary_sect_chain[i].mem_region = g_new(MemoryRegion, 1);
>             memory_region_init_ram(...);
>             memory_region_add_subregion(sysmem, ....);
>             rom_add_blob(....);
>         }
>     }
Why do this silly mapping when you know your DSP memory map?
> ------------------------------------------------------
> ok.lds file:
>
> MEMORY   /* MEMORY directive */
> {
>     ROM:       origin = 000000h   length = 001000h   /* 4K 32-bit words on-chip ROM (C31/VC33) */
Per the TI spru031f datasheet, this is external (there is no
on-chip ROM).
I have my doubts there is actually a ROM mapped here...
Is this linker script used to *test* a BIOS written into SRAM by
some JTAG probe?
>     /* 256K 32-bit word off-chip SRAM (D.Module.VC33-150-S2) */
>     BIOS:      origin = 001000h   length = 000300h
>     CONF_UTL:  origin = 001300h   length = 000800h
>     FREE:      origin = 001B00h   length = 03F500h   /* 259328 32-bit words */
>     RAM_0_1:   origin = 809800h   length = 000800h   /* 2 x 1K 32-bit word on-chip SRAM (C31/VC33) */
>     RAM_2_3:   origin = 800000h   length = 008000h   /* 2 x 16K 32-bit word on-chip SRAM (VC33 only) */
> }
You probably want to use:
  memory_region_init_ram(&s->extsram, OBJECT(dev), "eSRAM",
                         256 * KiB, &error_fatal);
  memory_region_add_subregion(get_system_memory(),
                              0x000000, &s->extsram);
  memory_region_init_ram(&s->ocsram, OBJECT(dev), "iSRAM",
                         2 * KiB, &error_fatal);
  memory_region_add_subregion(get_system_memory(),
                              0x809800, &s->ocsram);
Then different areas of the object file will be loaded into
either the iSRAM or the eSRAM.
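A fixed two-block map like this can be modelled outside QEMU to check which
linker-script ranges would land inside which block. The following is a minimal
standalone sketch, not QEMU code: RamBlock, ram_blocks and find_block are
hypothetical names, and the bases and sizes are taken unchanged from the two
memory_region_*() calls, without settling whether the target counts in bytes
or in 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a fixed board memory map (illustration only). */
typedef struct {
    const char *name;
    uint64_t base;   /* start address, same unit as the linker script */
    uint64_t size;   /* length, same unit as base */
} RamBlock;

/* Values copied from the suggested calls: 256 KiB at 0x000000 and
 * 2 KiB at 0x809800. The unit (bytes vs. words) is an open assumption. */
static const RamBlock ram_blocks[] = {
    { "eSRAM", 0x000000, 256 * 1024 },
    { "iSRAM", 0x809800, 2 * 1024 },
};

/* Return the block that fully contains [addr, addr + len), or NULL. */
static const RamBlock *find_block(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < sizeof(ram_blocks) / sizeof(ram_blocks[0]); i++) {
        const RamBlock *b = &ram_blocks[i];
        /* Written to avoid overflow in addr + len. */
        if (addr >= b->base && len <= b->size &&
            addr - b->base <= b->size - len) {
            return b;
        }
    }
    return NULL;
}
```

With these numbers, find_block(0x001000, 0x300) (the BIOS range) resolves to
the eSRAM block, while find_block(0x800000, 0x8000) (the RAM_2_3 range)
returns NULL, since neither block in this model covers it.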
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Why QEMU translates one instruction to a TB?
From: Philippe Mathieu-Daudé @ 2020-09-17 7:45 UTC (permalink / raw)
To: casmac, qemu-devel; +Cc: Alex Bennée, Peter Maydell, Richard Henderson
On 9/17/20 8:25 AM, casmac wrote:
> Hi all,
>    We try to add DSP architecture to QEMU 4.2. To load the COFF format
> object file, we have added loader code to load content from
> the object file.
[...]
>    The COFF loader works functionally, but we then found that sometimes
> QEMU is downgraded: it treats each instruction as one TB. In version
> 4.2, debugging shows
> that get_page_addr_code_hostp() from accel/tcg/cputlb.c returns -1, as
> shown below.
>
> accel/tcg/cputlb.c:
> tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
>                                         void **hostp)
> {
>     uintptr_t mmu_idx = cpu_mmu_index(env, true);
>     uintptr_t index = tlb_index(env, mmu_idx, addr);
>     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
>     void *p;
>
>     //.....
>     if (unlikely(entry->addr_code & TLB_MMIO)) {
>         /* The region is not backed by RAM.  */
>         if (hostp) {
>             *hostp = NULL;
>         }
>         return -1;    /* debugging falls to this branch; after this
>                          point QEMU translates one instruction to a TB */
>     }
>     //.......
> }
>
>    One interesting fact is that this somehow depends on the linker
> command file. The object file generated by the following linker command
> file (per_instr.lds)
> will "trigger" the problem. But QEMU works well with the object file
> linked by the other linker command file (ok.lds).
>    What causes the get_page_addr_code_hostp() function to return -1? I have
> no clue at all. Any advice is appreciated!!
Maybe the "execute from small-MMU-region RAM" problem?
See:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg549660.html
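If the cause is related to that thread, one thing worth checking is whether
each RAM region created by the loader covers whole target pages. The sketch
below is a standalone illustration under that assumption, not QEMU code:
region_is_page_aligned and the 4 KiB EXAMPLE_PAGE_SIZE are hypothetical, and
the real page size comes from the target's TARGET_PAGE_BITS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration only; the real value is set by the
 * target's TARGET_PAGE_BITS. */
#define EXAMPLE_PAGE_BITS 12
#define EXAMPLE_PAGE_SIZE (UINT64_C(1) << EXAMPLE_PAGE_BITS)

/* True if [base, base + size) starts and ends on a page boundary. */
static bool region_is_page_aligned(uint64_t base, uint64_t size)
{
    assert(size > 0);
    return (base % EXAMPLE_PAGE_SIZE) == 0 &&
           (size % EXAMPLE_PAGE_SIZE) == 0;
}
```

Under the assumed 4 KiB page, a region at 0x001000 with length 0x300 fails
this check because its length is not a multiple of the page size. Running
such a check over the regions built from per_instr.lds and from ok.lds may
show where the two layouts differ.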
* Re: Why QEMU translates one instruction to a TB?
From: Alex Bennée @ 2020-09-17 8:41 UTC (permalink / raw)
To: casmac; +Cc: Peter Maydell, qemu-devel
casmac <climber.cui@qq.com> writes:
> Hi all,
> We try to add DSP architecture to QEMU 4.2. To load the COFF format object file, we have added loader code to load content from
> the object file. The rom_add_blob() function is used. We firstly analyze the COFF file to figure out which sections are chained
> together (so each chain forms a "memory blob"), and then allocate the memory blobs.
>
> The pseudo code looks like:
>
>     for(i=0; i<BADTYPE; i++){
>         if(ary_sect_chain[i].exist)   //there is a chain of sections to allocate
>         {
>             ary_sect_chain[i].mem_region = g_new(MemoryRegion, 1);
>             memory_region_init_ram(...);
>             memory_region_add_subregion(sysmem, ....);
>             rom_add_blob(....);
>         }
>     }
>
<snip>
>     if (unlikely(entry->addr_code & TLB_MMIO)) {
>         /* The region is not backed by RAM.  */
This is the crux of it. If the address looked up isn't in a RAM region,
the TLB code can't assume a contiguous page of instructions, or that
the instruction read at an address will be the same on the next read.
So it executes only a single instruction at a time and does not cache
the resulting TB, forcing a fresh re-translation each time.
All TLB_MMIO accesses basically force the slow path.
I suspect there is something wrong in your memory region mappings.
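The behaviour described above can be summarised as a toy decision function.
This is a standalone sketch of the idea only, not the code in accel/tcg:
TbPlan, plan_tb and the normal_max_insns parameter are hypothetical names
introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not QEMU code: all names here are hypothetical. */
typedef struct {
    int max_insns;   /* upper bound on guest instructions in the TB */
    bool cacheable;  /* whether the TB may be reused on the next visit */
} TbPlan;

/* Decide how to translate code at an address, given whether the page
 * lookup found RAM (true) or fell into the MMIO branch (false). */
static TbPlan plan_tb(bool page_is_ram, int normal_max_insns)
{
    assert(normal_max_insns >= 1);
    TbPlan plan;
    if (page_is_ram) {
        /* Contents are stable: translate a full block and cache it. */
        plan.max_insns = normal_max_insns;
        plan.cacheable = true;
    } else {
        /* Contents may change between reads: one instruction, no caching. */
        plan.max_insns = 1;
        plan.cacheable = false;
    }
    return plan;
}
```

In this model, every visit to a non-RAM address pays for a new translation
of a single instruction, which matches the one-instruction-per-TB symptom
reported in the original question.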
--
Alex Bennée