* should ioapic_service really be modelling cpu writes?
@ 2022-11-10 17:01 Alex Bennée
2022-11-10 17:55 ` Alex Bennée
0 siblings, 1 reply; 8+ messages in thread
From: Alex Bennée @ 2022-11-10 17:01 UTC (permalink / raw)
To: QEMU Developers
Cc: Michael S. Tsirkin, Paolo Bonzini, Peter Xu,
Philippe Mathieu-Daudé, Peter Maydell
Hi,
I've been trying to remove current_cpu hacks from our hw/ emulation and
replace them with an explicit cpu_index derived from MemTxAttrs. So far
this has been going mostly ok but I've run into problems on x86 due to
the way the apic/ioapic are modelled. It comes down to the function
ioapic_service() which eventually does:
/* No matter whether IR is enabled, we translate
* the IOAPIC message into a MSI one, and its
* address space will decide whether we need a
* translation. */
stl_le_phys(ioapic_as, info.addr, info.data);
Which essentially calls the memory API to simulate a memory write.
However to generate a properly formed MemTxAttrs we need to know what
CPU we are running on. In the case of ioapic_service() we may well be in
the main thread either for an expiring timer:
hpet_timer->qemu_set_irq->ioapic_set_irq
or for reset handling:
pc_machine_reset->hpet_reset->qemu_set_irq->ioapic_set_irq
neither of which can get a associated CPU. I assume if the actual writes
are triggered we never actually actioned stuff because we had:
DeviceState *cpu_get_current_apic(void)
{
if (current_cpu) {
X86CPU *cpu = X86_CPU(current_cpu);
return cpu->apic_state;
} else {
return NULL;
}
}
which basically causes the updates to be dropped on the floor. However
during the conversion I went with something like:
if (attrs.requester_type != MTRT_CPU) {
return MEMTX_ACCESS_ERROR;
}
dev = cpu_get_current_apic(attrs.requester_id);
but that breaks current behaviour. It makes me wonder if the modelling
of the APIC is really representative of what the real HW does or if the
memory writes are simply a shortcut that happens to work?
--
Alex Bennée
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: should ioapic_service really be modelling cpu writes?
2022-11-10 17:01 should ioapic_service really be modelling cpu writes? Alex Bennée
@ 2022-11-10 17:55 ` Alex Bennée
2022-11-10 22:42 ` Peter Xu
0 siblings, 1 reply; 8+ messages in thread
From: Alex Bennée @ 2022-11-10 17:55 UTC (permalink / raw)
To: QEMU Developers
Cc: Michael S. Tsirkin, Paolo Bonzini, Peter Xu,
Philippe Mathieu-Daudé, Peter Maydell
Alex Bennée <alex.bennee@linaro.org> writes:
> Hi,
>
> I've been trying to remove current_cpu hacks from our hw/ emulation and
> replace them with an explicit cpu_index derived from MemTxAttrs. So far
> this has been going mostly ok but I've run into problems on x86 due to
> the way the apic/ioapic are modelled. It comes down to the function
> ioapic_service() which eventually does:
>
> /* No matter whether IR is enabled, we translate
> * the IOAPIC message into a MSI one, and its
> * address space will decide whether we need a
> * translation. */
> stl_le_phys(ioapic_as, info.addr, info.data);
>
> Which essentially calls the memory API to simulate a memory write.
> However to generate a properly formed MemTxAttrs we need to know what
> CPU we are running on. In the case of ioapic_service() we may well be in
> the main thread either for an expiring timer:
>
> hpet_timer->qemu_set_irq->ioapic_set_irq
>
> or for reset handling:
>
> pc_machine_reset->hpet_reset->qemu_set_irq->ioapic_set_irq
>
> neither of which can get a associated CPU. I assume if the actual writes
> are triggered we never actually actioned stuff because we had:
>
> DeviceState *cpu_get_current_apic(void)
> {
> if (current_cpu) {
> X86CPU *cpu = X86_CPU(current_cpu);
> return cpu->apic_state;
> } else {
> return NULL;
> }
> }
>
> which basically causes the updates to be dropped on the floor. However
> during the conversion I went with something like:
>
> if (attrs.requester_type != MTRT_CPU) {
> return MEMTX_ACCESS_ERROR;
> }
> dev = cpu_get_current_apic(attrs.requester_id);
>
> but that breaks current behaviour. It makes me wonder if the modelling
> of the APIC is really representative of what the real HW does or if the
> memory writes are simply a shortcut that happens to work?
And indeed we see call chains like this:
#0 apic_mem_write (opaque=0x555556fb1d40, addr=4100, val=16417, size=4, attrs=...) at ../../hw/intc/apic.c:754
#1 0x0000555555da4f89 in memory_region_write_with_attrs_accessor (mr=0x555556fb1de0, addr=4100, value=0x7ffff094f2c8, size=4, shift=0, mask=4294967295, attrs=...) at ../../
softmmu/memory.c:513
#2 0x0000555555da50ce in access_with_adjusted_size (addr=4100, value=0x7ffff094f2c8, size=4, access_size_min=1, access_size_max=4, access_fn=0x555555da4e8b <memory_region_w
rite_with_attrs_accessor>, mr=0x555556fb1de0, attrs=...) at ../../softmmu/memory.c:554
#3 0x0000555555da82ef in memory_region_dispatch_write (mr=0x555556fb1de0, addr=4100, data=16417, op=MO_32, attrs=...) at ../../softmmu/memory.c:1521
#4 0x0000555555db7ecd in address_space_stl_internal (as=0x555557b786c8, addr=4276097028, val=16417, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /home/alex.bennee
/lsrc/qemu.git/memory_ldst.c.inc:319
#5 0x0000555555db800c in address_space_stl_le (as=0x555557b786c8, addr=4276097028, val=16417, attrs=..., result=0x0) at /home/alex.bennee/lsrc/qemu.git/memory_ldst.c.inc:35
7
#6 0x0000555555a0ac53 in pci_msi_trigger (dev=0x555557b78490, msg=...) at ../../hw/pci/pci.c:326
#7 0x0000555555a07374 in msi_send_message (dev=0x555557b78490, msg=...) at ../../hw/pci/msi.c:378
#8 0x0000555555a07327 in msi_notify (dev=0x555557b78490, vector=0) at ../../hw/pci/msi.c:373
#9 0x00005555559572a6 in ahci_irq_raise (s=0x555557b78eb0) at ../../hw/ide/ahci.c:187
#10 0x0000555555957401 in ahci_check_irq (s=0x555557b78eb0) at ../../hw/ide/ahci.c:221
#11 0x00005555559574ea in ahci_trigger_irq (s=0x555557b78eb0, d=0x555557b7adb0, irqbit=AHCI_PORT_IRQ_BIT_PSS) at ../../hw/ide/ahci.c:240
#12 0x000055555595a6b9 in ahci_pio_transfer (dma=0x555557b7adb0) at ../../hw/ide/ahci.c:1402
#13 0x000055555595e948 in ide_transfer_start_norecurse (s=0x555557b7ae70, buf=0x555557b84800 "@", size=512, end_transfer_func=0x55555595ea51 <ide_transfer_stop>) at ../../hw
/ide/core.c:566
#14 0x000055555595e981 in ide_transfer_start (s=0x555557b7ae70, buf=0x555557b84800 "@", size=512, end_transfer_func=0x55555595ea51 <ide_transfer_stop>) at ../../hw/ide/core.
c:573
#15 0x0000555555960e0d in cmd_identify (s=0x555557b7ae70, cmd=236 '\354') at ../../hw/ide/core.c:1441
#16 0x0000555555962759 in ide_exec_cmd (bus=0x555557b7ade8, val=236) at ../../hw/ide/core.c:2149
#17 0x000055555595a12d in handle_reg_h2d_fis (s=0x555557b78eb0, port=0, slot=0 '\000', cmd_fis=0x7fff5dca0500 "'\200", <incomplete sequence \354>) at ../../hw/ide/ahci.c:127
1
#18 0x000055555595a39a in handle_cmd (s=0x555557b78eb0, port=0, slot=0 '\000') at ../../hw/ide/ahci.c:1322
#19 0x000055555595817f in check_cmd (s=0x555557b78eb0, port=0) at ../../hw/ide/ahci.c:594
#20 0x00005555559579ae in ahci_port_write (s=0x555557b78eb0, port=0, offset=56, val=1) at ../../hw/ide/ahci.c:374
#21 0x0000555555957f16 in ahci_mem_write (opaque=0x555557b78eb0, addr=312, val=1, size=4) at ../../hw/ide/ahci.c:515
#22 0x0000555555da4e80 in memory_region_write_accessor (mr=0x555557b78ee0, addr=312, value=0x7ffff094f868, size=4, shift=0, mask=4294967295, attrs=...) at ../../softmmu/memo
ry.c:492
#23 0x0000555555da50ce in access_with_adjusted_size (addr=312, value=0x7ffff094f868, size=4, access_size_min=1, access_size_max=4, access_fn=0x555555da4d86 <memory_region_wr
ite_accessor>, mr=0x555557b78ee0, attrs=...) at ../../softmmu/memory.c:554
#24 0x0000555555da82b1 in memory_region_dispatch_write (mr=0x555557b78ee0, addr=312, data=1, op=MO_32, attrs=...) at ../../softmmu/memory.c:1514
#25 0x0000555555e391ac in io_writex (env=0x555557171eb0, full=0x7fffe89ebba8, mmu_idx=2, val=1, addr=18446632136124748088, retaddr=140734909768034, op=MO_32) at ../../accel/
tcg/cputlb.c:1434
#26 0x0000555555e3b861 in store_helper (env=0x555557171eb0, addr=18446632136124748088, val=1, oi=34, retaddr=140734909768034, op=MO_32) at ../../accel/tcg/cputlb.c:2375
#27 0x0000555555e3bbd7 in full_le_stl_mmu (env=0x555557171eb0, addr=18446632136124748088, val=1, oi=34, retaddr=140734909768034) at ../../accel/tcg/cputlb.c:2463
#28 0x0000555555e3bc15 in helper_le_stl_mmu (env=0x555557171eb0, addr=18446632136124748088, val=1, oi=34, retaddr=140734909768034) at ../../accel/tcg/cputlb.c:2469
#29 0x00007fff664de162 in code_gen_buffer ()
#30 0x0000555555e268f9 in cpu_tb_exec (cpu=0x555557171110, itb=0x7fffa64ddfc0, tb_exit=0x7ffff094ffd0) at ../../accel/tcg/cpu-exec.c:438
#31 0x0000555555e275c3 in cpu_loop_exec_tb (cpu=0x555557171110, tb=0x7fffa64ddfc0, pc=18446744071770558920, last_tb=0x7ffff094ffe0, tb_exit=0x7ffff094ffd0) at ../../accel/tc
g/cpu-exec.c:868
#32 0x0000555555e279bb in cpu_exec (cpu=0x555557171110) at ../../accel/tcg/cpu-exec.c:1032
#33 0x0000555555e54d99 in tcg_cpus_exec (cpu=0x555557171110) at ../../accel/tcg/tcg-accel-ops.c:69
#34 0x0000555555e55481 in mttcg_cpu_thread_fn (arg=0x555557171110) at ../../accel/tcg/tcg-accel-ops-mttcg.c:95
#35 0x00005555560553fe in qemu_thread_start (args=0x555556aa22e0) at ../../util/qemu-thread-posix.c:505
#36 0x00007ffff5201b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#37 0x00007ffff5293a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
So the CPU has triggered an access to the AHCI controller which then
triggers pci_msi_trigger() and in the old world assumed the MSI IRQ
would arrive at the same CPU. Is this how PCI IRQs work? Can you not
control which lane of the APIC receives a given IRQ line from the PCI
bus?
--
Alex Bennée
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: should ioapic_service really be modelling cpu writes?
2022-11-10 17:55 ` Alex Bennée
@ 2022-11-10 22:42 ` Peter Xu
2022-11-11 11:08 ` Paolo Bonzini
2022-11-11 12:26 ` Alex Bennée
0 siblings, 2 replies; 8+ messages in thread
From: Peter Xu @ 2022-11-10 22:42 UTC (permalink / raw)
To: Alex Bennée
Cc: QEMU Developers, Michael S. Tsirkin, Paolo Bonzini,
Philippe Mathieu-Daudé, Peter Maydell
Hi, Alex,
On Thu, Nov 10, 2022 at 05:55:51PM +0000, Alex Bennée wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
> > Hi,
> >
> > I've been trying to remove current_cpu hacks from our hw/ emulation and
> > replace them with an explicit cpu_index derived from MemTxAttrs. So far
> > this has been going mostly ok but I've run into problems on x86 due to
> > the way the apic/ioapic are modelled. It comes down to the function
> > ioapic_service() which eventually does:
> >
> > /* No matter whether IR is enabled, we translate
> > * the IOAPIC message into a MSI one, and its
> > * address space will decide whether we need a
> > * translation. */
> > stl_le_phys(ioapic_as, info.addr, info.data);
> >
> > Which essentially calls the memory API to simulate a memory write.
> > However to generate a properly formed MemTxAttrs we need to know what
> > CPU we are running on. In the case of ioapic_service() we may well be in
> > the main thread either for an expiring timer:
> >
> > hpet_timer->qemu_set_irq->ioapic_set_irq
> >
> > or for reset handling:
> >
> > pc_machine_reset->hpet_reset->qemu_set_irq->ioapic_set_irq
> >
> > neither of which can get a associated CPU. I assume if the actual writes
> > are triggered we never actually actioned stuff because we had:
> >
> > DeviceState *cpu_get_current_apic(void)
> > {
> > if (current_cpu) {
> > X86CPU *cpu = X86_CPU(current_cpu);
> > return cpu->apic_state;
> > } else {
> > return NULL;
> > }
> > }
> >
> > which basically causes the updates to be dropped on the floor.
I think it shouldn't? Normally the irq will be in MSI format (IOAPIC will
translate to an MSI in QEMU, per ioapic_entry_parse()).
I had a feeling that it'll just go the shortcut here (MSI always starts
with 0xfeeXXXXX so definitely bigger than 0xfff):
if (addr > 0xfff || !index) {
/* MSI and MMIO APIC are at the same memory location,
* but actually not on the global bus: MSI is on PCI bus
* APIC is connected directly to the CPU.
* Mapping them on the global bus happens to work because
* MSI registers are reserved in APIC MMIO and vice versa. */
MSIMessage msi = { .address = addr, .data = val };
apic_send_msi(&msi);
return;
}
apic_send_msi() doesn't need a cpu context.
> > However
> > during the conversion I went with something like:
> >
> > if (attrs.requester_type != MTRT_CPU) {
> > return MEMTX_ACCESS_ERROR;
> > }
> > dev = cpu_get_current_apic(attrs.requester_id);
> >
> > but that breaks current behaviour. It makes me wonder if the modelling
> > of the APIC is really representative of what the real HW does or if the
> > memory writes are simply a shortcut that happens to work?
No expert on that, but per my understanding ioapic isn't really bound to
any apic, so it doesn't need any cpu context. As a quick reference of
that, one can look at Intel SDM Vol 3 Chap 10, figure 10.3 will be a
generic modern x86_64 system APIC structure.
In hardware there should have a 3-wire apic bus that take care of all the
messaging, including at least not only ioapic irqs to any cores, or IPIs
between the cores. The messages coming from the ioapic should not require
any apic too as it can come from devices, just like what we do with qemu
when the device does things like pci_set_irq(), iiuc.
>
> And indeed we see call chains like this:
>
> #0 apic_mem_write (opaque=0x555556fb1d40, addr=4100, val=16417, size=4, attrs=...) at ../../hw/intc/apic.c:754
> #1 0x0000555555da4f89 in memory_region_write_with_attrs_accessor (mr=0x555556fb1de0, addr=4100, value=0x7ffff094f2c8, size=4, shift=0, mask=4294967295, attrs=...) at ../../
> softmmu/memory.c:513
> #2 0x0000555555da50ce in access_with_adjusted_size (addr=4100, value=0x7ffff094f2c8, size=4, access_size_min=1, access_size_max=4, access_fn=0x555555da4e8b <memory_region_w
> rite_with_attrs_accessor>, mr=0x555556fb1de0, attrs=...) at ../../softmmu/memory.c:554
> #3 0x0000555555da82ef in memory_region_dispatch_write (mr=0x555556fb1de0, addr=4100, data=16417, op=MO_32, attrs=...) at ../../softmmu/memory.c:1521
> #4 0x0000555555db7ecd in address_space_stl_internal (as=0x555557b786c8, addr=4276097028, val=16417, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /home/alex.bennee
> /lsrc/qemu.git/memory_ldst.c.inc:319
> #5 0x0000555555db800c in address_space_stl_le (as=0x555557b786c8, addr=4276097028, val=16417, attrs=..., result=0x0) at /home/alex.bennee/lsrc/qemu.git/memory_ldst.c.inc:35
> 7
> #6 0x0000555555a0ac53 in pci_msi_trigger (dev=0x555557b78490, msg=...) at ../../hw/pci/pci.c:326
> #7 0x0000555555a07374 in msi_send_message (dev=0x555557b78490, msg=...) at ../../hw/pci/msi.c:378
> #8 0x0000555555a07327 in msi_notify (dev=0x555557b78490, vector=0) at ../../hw/pci/msi.c:373
> #9 0x00005555559572a6 in ahci_irq_raise (s=0x555557b78eb0) at ../../hw/ide/ahci.c:187
> #10 0x0000555555957401 in ahci_check_irq (s=0x555557b78eb0) at ../../hw/ide/ahci.c:221
> #11 0x00005555559574ea in ahci_trigger_irq (s=0x555557b78eb0, d=0x555557b7adb0, irqbit=AHCI_PORT_IRQ_BIT_PSS) at ../../hw/ide/ahci.c:240
> #12 0x000055555595a6b9 in ahci_pio_transfer (dma=0x555557b7adb0) at ../../hw/ide/ahci.c:1402
> #13 0x000055555595e948 in ide_transfer_start_norecurse (s=0x555557b7ae70, buf=0x555557b84800 "@", size=512, end_transfer_func=0x55555595ea51 <ide_transfer_stop>) at ../../hw
> /ide/core.c:566
> #14 0x000055555595e981 in ide_transfer_start (s=0x555557b7ae70, buf=0x555557b84800 "@", size=512, end_transfer_func=0x55555595ea51 <ide_transfer_stop>) at ../../hw/ide/core.
> c:573
> #15 0x0000555555960e0d in cmd_identify (s=0x555557b7ae70, cmd=236 '\354') at ../../hw/ide/core.c:1441
> #16 0x0000555555962759 in ide_exec_cmd (bus=0x555557b7ade8, val=236) at ../../hw/ide/core.c:2149
> #17 0x000055555595a12d in handle_reg_h2d_fis (s=0x555557b78eb0, port=0, slot=0 '\000', cmd_fis=0x7fff5dca0500 "'\200", <incomplete sequence \354>) at ../../hw/ide/ahci.c:127
> 1
> #18 0x000055555595a39a in handle_cmd (s=0x555557b78eb0, port=0, slot=0 '\000') at ../../hw/ide/ahci.c:1322
> #19 0x000055555595817f in check_cmd (s=0x555557b78eb0, port=0) at ../../hw/ide/ahci.c:594
> #20 0x00005555559579ae in ahci_port_write (s=0x555557b78eb0, port=0, offset=56, val=1) at ../../hw/ide/ahci.c:374
> #21 0x0000555555957f16 in ahci_mem_write (opaque=0x555557b78eb0, addr=312, val=1, size=4) at ../../hw/ide/ahci.c:515
> #22 0x0000555555da4e80 in memory_region_write_accessor (mr=0x555557b78ee0, addr=312, value=0x7ffff094f868, size=4, shift=0, mask=4294967295, attrs=...) at ../../softmmu/memo
> ry.c:492
> #23 0x0000555555da50ce in access_with_adjusted_size (addr=312, value=0x7ffff094f868, size=4, access_size_min=1, access_size_max=4, access_fn=0x555555da4d86 <memory_region_wr
> ite_accessor>, mr=0x555557b78ee0, attrs=...) at ../../softmmu/memory.c:554
> #24 0x0000555555da82b1 in memory_region_dispatch_write (mr=0x555557b78ee0, addr=312, data=1, op=MO_32, attrs=...) at ../../softmmu/memory.c:1514
> #25 0x0000555555e391ac in io_writex (env=0x555557171eb0, full=0x7fffe89ebba8, mmu_idx=2, val=1, addr=18446632136124748088, retaddr=140734909768034, op=MO_32) at ../../accel/
> tcg/cputlb.c:1434
> #26 0x0000555555e3b861 in store_helper (env=0x555557171eb0, addr=18446632136124748088, val=1, oi=34, retaddr=140734909768034, op=MO_32) at ../../accel/tcg/cputlb.c:2375
> #27 0x0000555555e3bbd7 in full_le_stl_mmu (env=0x555557171eb0, addr=18446632136124748088, val=1, oi=34, retaddr=140734909768034) at ../../accel/tcg/cputlb.c:2463
> #28 0x0000555555e3bc15 in helper_le_stl_mmu (env=0x555557171eb0, addr=18446632136124748088, val=1, oi=34, retaddr=140734909768034) at ../../accel/tcg/cputlb.c:2469
> #29 0x00007fff664de162 in code_gen_buffer ()
> #30 0x0000555555e268f9 in cpu_tb_exec (cpu=0x555557171110, itb=0x7fffa64ddfc0, tb_exit=0x7ffff094ffd0) at ../../accel/tcg/cpu-exec.c:438
> #31 0x0000555555e275c3 in cpu_loop_exec_tb (cpu=0x555557171110, tb=0x7fffa64ddfc0, pc=18446744071770558920, last_tb=0x7ffff094ffe0, tb_exit=0x7ffff094ffd0) at ../../accel/tc
> g/cpu-exec.c:868
> #32 0x0000555555e279bb in cpu_exec (cpu=0x555557171110) at ../../accel/tcg/cpu-exec.c:1032
> #33 0x0000555555e54d99 in tcg_cpus_exec (cpu=0x555557171110) at ../../accel/tcg/tcg-accel-ops.c:69
> #34 0x0000555555e55481 in mttcg_cpu_thread_fn (arg=0x555557171110) at ../../accel/tcg/tcg-accel-ops-mttcg.c:95
> #35 0x00005555560553fe in qemu_thread_start (args=0x555556aa22e0) at ../../util/qemu-thread-posix.c:505
> #36 0x00007ffff5201b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
> #37 0x00007ffff5293a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
> So the CPU has triggered an access to the AHCI controller which then
> triggers pci_msi_trigger() and in the old world assumed the MSI IRQ
> would arrive at the same CPU. Is this how PCI IRQs work? Can you not
> control which lane of the APIC receives a given IRQ line from the PCI
> bus?
AFAICT apic_mem_write() doesn't mean that this cpu will take this IRQ. The
target core to respond to the IRQ will be defined in the dest ID field of
either an MSI message or embeded in the IOAPIC entry being setup by the
guest driver. E.g. MSI message format can also be found in SDM Vol 3 chap
10.11.1, in QEMU we can refer to "dest" field of apic_send_msi().
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: should ioapic_service really be modelling cpu writes?
2022-11-10 22:42 ` Peter Xu
@ 2022-11-11 11:08 ` Paolo Bonzini
2022-11-11 12:26 ` Alex Bennée
1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2022-11-11 11:08 UTC (permalink / raw)
To: Peter Xu, Alex Bennée
Cc: QEMU Developers, Michael S. Tsirkin, Philippe Mathieu-Daudé,
Peter Maydell
On 11/10/22 23:42, Peter Xu wrote:
> I think it shouldn't? Normally the irq will be in MSI format (IOAPIC will
> translate to an MSI in QEMU, per ioapic_entry_parse()).
>
> I had a feeling that it'll just go the shortcut here (MSI always starts
> with 0xfeeXXXXX so definitely bigger than 0xfff):
Note that QEMU subtracts 0xfee00000 by the time you get to
apic_mem_write, but still yes, that's what happens for IOAPIC. The
write is on the PCI bus.
> if (addr > 0xfff || !index) {
> /* MSI and MMIO APIC are at the same memory location,
> * but actually not on the global bus: MSI is on PCI bus
> * APIC is connected directly to the CPU.
> * Mapping them on the global bus happens to work because
> * MSI registers are reserved in APIC MMIO and vice versa. */
> MSIMessage msi = { .address = addr, .data = val };
> apic_send_msi(&msi);
> return;
> }
>
> apic_send_msi() doesn't need a cpu context.
Alex, perhaps you can change the shortcut to
if (size < 4) {
return;
}
dev = cpu_get_current_apic(memtxattrs);
if (!dev) {
/* comment here... */
MSIMessage msi = { .address = addr, .data = val };
apic_send_msi(&msi);
return;
}
s = APIC(dev);
...
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: should ioapic_service really be modelling cpu writes?
2022-11-10 22:42 ` Peter Xu
2022-11-11 11:08 ` Paolo Bonzini
@ 2022-11-11 12:26 ` Alex Bennée
2022-11-11 13:14 ` Paolo Bonzini
1 sibling, 1 reply; 8+ messages in thread
From: Alex Bennée @ 2022-11-11 12:26 UTC (permalink / raw)
To: Peter Xu
Cc: QEMU Developers, Michael S. Tsirkin, Paolo Bonzini,
Philippe Mathieu-Daudé, Peter Maydell
Peter Xu <peterx@redhat.com> writes:
> Hi, Alex,
>
> On Thu, Nov 10, 2022 at 05:55:51PM +0000, Alex Bennée wrote:
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>> > Hi,
>> >
>> > I've been trying to remove current_cpu hacks from our hw/ emulation and
>> > replace them with an explicit cpu_index derived from MemTxAttrs. So far
>> > this has been going mostly ok but I've run into problems on x86 due to
>> > the way the apic/ioapic are modelled. It comes down to the function
>> > ioapic_service() which eventually does:
>> >
>> > /* No matter whether IR is enabled, we translate
>> > * the IOAPIC message into a MSI one, and its
>> > * address space will decide whether we need a
>> > * translation. */
>> > stl_le_phys(ioapic_as, info.addr, info.data);
>> >
>> > Which essentially calls the memory API to simulate a memory write.
>> > However to generate a properly formed MemTxAttrs we need to know what
>> > CPU we are running on. In the case of ioapic_service() we may well be in
>> > the main thread either for an expiring timer:
>> >
>> > hpet_timer->qemu_set_irq->ioapic_set_irq
>> >
>> > or for reset handling:
>> >
>> > pc_machine_reset->hpet_reset->qemu_set_irq->ioapic_set_irq
>> >
>> > neither of which can get a associated CPU. I assume if the actual writes
>> > are triggered we never actually actioned stuff because we had:
>> >
>> > DeviceState *cpu_get_current_apic(void)
>> > {
>> > if (current_cpu) {
>> > X86CPU *cpu = X86_CPU(current_cpu);
>> > return cpu->apic_state;
>> > } else {
>> > return NULL;
>> > }
>> > }
>> >
>> > which basically causes the updates to be dropped on the floor.
>
> I think it shouldn't? Normally the irq will be in MSI format (IOAPIC will
> translate to an MSI in QEMU, per ioapic_entry_parse()).
>
> I had a feeling that it'll just go the shortcut here (MSI always starts
> with 0xfeeXXXXX so definitely bigger than 0xfff):
>
> if (addr > 0xfff || !index) {
> /* MSI and MMIO APIC are at the same memory location,
> * but actually not on the global bus: MSI is on PCI bus
> * APIC is connected directly to the CPU.
> * Mapping them on the global bus happens to work because
> * MSI registers are reserved in APIC MMIO and vice versa. */
> MSIMessage msi = { .address = addr, .data = val };
> apic_send_msi(&msi);
> return;
> }
>
> apic_send_msi() doesn't need a cpu context.
Ahh so yes maybe my changes where too quick to reject the MMIO. So I've
made the following tweak to ioapic_service():
/*
* No matter whether IR is enabled, we translate
* the IOAPIC message into a MSI one, and its
* address space will decide whether we need a
* translation.
*
* As it is an access from something other than the
* CPU or a PCI device we set its source as machine
* specific.
*/
{
MemTxAttrs attrs = MEMTXATTRS_MACHINE(MEMTX_IOAPIC);
MemTxResult res;
address_space_stl_le(ioapic_as, info.addr, info.data,
attrs, &res);
if (res != MEMTX_ERROR) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: couldn't write to %"PRIx32"\n", __func__, info.addr);
}
}
and I had already tweaked the pci_msi_trigger so:
static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
{
MemTxAttrs attrs = {
.requester_type = MTRT_PCI,
.requester_id = pci_requester_id(dev)
};
address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
attrs, NULL);
}
> No expert on that, but per my understanding ioapic isn't really bound to
> any apic, so it doesn't need any cpu context. As a quick reference of
> that, one can look at Intel SDM Vol 3 Chap 10, figure 10.3 will be a
> generic modern x86_64 system APIC structure.
>
> In hardware there should have a 3-wire apic bus that take care of all the
> messaging, including at least not only ioapic irqs to any cores, or IPIs
> between the cores. The messages coming from the ioapic should not require
> any apic too as it can come from devices, just like what we do with qemu
> when the device does things like pci_set_irq(), iiuc.
<snip>
>
> AFAICT apic_mem_write() doesn't mean that this cpu will take this IRQ. The
> target core to respond to the IRQ will be defined in the dest ID field of
> either an MSI message or embeded in the IOAPIC entry being setup by the
> guest driver. E.g. MSI message format can also be found in SDM Vol 3 chap
> 10.11.1, in QEMU we can refer to "dest" field of apic_send_msi().
So now the start of apic_mem_write checks and validates MSIs first:
static MemTxResult apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
unsigned int size, MemTxAttrs attrs)
{
DeviceState *dev;
APICCommonState *s;
int index = (addr >> 4) & 0xff;
if (size < 4) {
return MEMTX_ERROR;
}
/*
* MSI and MMIO APIC are at the same memory location, but actually
* not on the global bus: MSI is on PCI bus APIC is connected
* directly to the CPU.
*
* We can check the MemTxAttrs to check they are coming from where
* we expect. Even though the MSI registers are reserved in APIC
* MMIO and vice versa they shouldn't respond to CPU writes.
*/
if (addr > 0xfff || !index) {
switch (attrs.requester_type) {
case MTRT_MACHINE:
/* should be from the directly wired IOPIC */
if (attrs.requester_id != MEMTX_IOAPIC) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: rejecting machine write from something other that IOPIC (%x)",
__func__, attrs.requester_id);
return MEMTX_ACCESS_ERROR;
}
break;
case MTRT_PCI:
/* PCI signalled MSI */
break;
case MTRT_UNSPECIFIED:
qemu_log_mask(LOG_GUEST_ERROR,
"%s: rejecting unspecified write", __func__);
return MEMTX_ACCESS_ERROR;
case MTRT_CPU:
/* can CPU directly trigger MSIs? */
break;
}
MSIMessage msi = { .address = addr, .data = val };
apic_send_msi(&msi);
return MEMTX_OK;
}
if (attrs.requester_type != MTRT_CPU) {
return MEMTX_ACCESS_ERROR;
}
dev = cpu_get_current_apic(attrs.requester_id);
s = APIC(dev);
trace_apic_mem_writel(addr, val);
which at least gets things booting properly. Does this seem like a
better modelling of the APIC behaviour?
--
Alex Bennée
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: should ioapic_service really be modelling cpu writes?
2022-11-11 12:26 ` Alex Bennée
@ 2022-11-11 13:14 ` Paolo Bonzini
2022-11-11 14:00 ` Alex Bennée
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2022-11-11 13:14 UTC (permalink / raw)
To: Alex Bennée, Peter Xu
Cc: QEMU Developers, Michael S. Tsirkin, Philippe Mathieu-Daudé,
Peter Maydell
On 11/11/22 13:26, Alex Bennée wrote:
> if (addr > 0xfff || !index) {
> switch (attrs.requester_type) {
> }
> MSIMessage msi = { .address = addr, .data = val };
> apic_send_msi(&msi);
> return MEMTX_OK;
> }
> which at least gets things booting properly. Does this seem like a
> better modelling of the APIC behaviour?
Yes and you don't even need the "if", just do MTRT_CPU vs everything else.
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: should ioapic_service really be modelling cpu writes?
2022-11-11 13:14 ` Paolo Bonzini
@ 2022-11-11 14:00 ` Alex Bennée
2022-11-11 15:57 ` Paolo Bonzini
0 siblings, 1 reply; 8+ messages in thread
From: Alex Bennée @ 2022-11-11 14:00 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Peter Xu, QEMU Developers, Michael S. Tsirkin,
Philippe Mathieu-Daudé, Peter Maydell
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 11/11/22 13:26, Alex Bennée wrote:
>> if (addr > 0xfff || !index) {
>> switch (attrs.requester_type) {
>> }
>> MSIMessage msi = { .address = addr, .data = val };
>> apic_send_msi(&msi);
>> return MEMTX_OK;
>> }
>
>
>> which at least gets things booting properly. Does this seem like a
>> better modelling of the APIC behaviour?
>
> Yes and you don't even need the "if", just do MTRT_CPU vs everything
> else.
Can the CPU trigger MSIs by writing to this area of memory? I went for
the explicit switch for clarity but are you saying:
if (attrs.requester_type != MTRT_CPU) {
MSIMessage msi = { .address = addr, .data = val };
apic_send_msi(&msi);
return MEMTX_OK;
} else {
return MEMTX_ACESSS_ERROR;
}
for the MSI range?
>
> Paolo
--
Alex Bennée
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: should ioapic_service really be modelling cpu writes?
2022-11-11 14:00 ` Alex Bennée
@ 2022-11-11 15:57 ` Paolo Bonzini
0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2022-11-11 15:57 UTC (permalink / raw)
To: Alex Bennée
Cc: Peter Xu, QEMU Developers, Michael S. Tsirkin,
Philippe Mathieu-Daudé, Peter Maydell
[-- Attachment #1: Type: text/plain, Size: 1527 bytes --]
Il ven 11 nov 2022, 15:03 Alex Bennée <alex.bennee@linaro.org> ha scritto:
>
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
> > On 11/11/22 13:26, Alex Bennée wrote:
> >> if (addr > 0xfff || !index) {
> >> switch (attrs.requester_type) {
> >> }
> >> MSIMessage msi = { .address = addr, .data = val };
> >> apic_send_msi(&msi);
> >> return MEMTX_OK;
> >> }
> >
> >
> >> which at least gets things booting properly. Does this seem like a
> >> better modelling of the APIC behaviour?
> >
> > Yes and you don't even need the "if", just do MTRT_CPU vs everything
> > else.
>
> Can the CPU trigger MSIs by writing to this area of memory?
No, it's a different bus. If it can in QEMU that's a bug.
I went for
> the explicit switch for clarity but are you saying:
>
> if (attrs.requester_type != MTRT_CPU) {
> MSIMessage msi = { .address = addr, .data = val };
> apic_send_msi(&msi);
> return MEMTX_OK;
> } else {
> return MEMTX_ACESSS_ERROR;
> }
>
> for the MSI range?
>
Yes that would work. It can be tightened even further by removing the "if
(addr ...)" completely and only checking the requester type (which in turn
I would do with a function like "return APIC based on txattrs requester
type and id, or return NULL if requester not MTRT_CPU"), but no need to
hurry.
Thanks,
Paolo
>
> >
> > Paolo
>
>
> --
> Alex Bennée
>
>
[-- Attachment #2: Type: text/html, Size: 2730 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-11-11 15:58 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-10 17:01 should ioapic_service really be modelling cpu writes? Alex Bennée
2022-11-10 17:55 ` Alex Bennée
2022-11-10 22:42 ` Peter Xu
2022-11-11 11:08 ` Paolo Bonzini
2022-11-11 12:26 ` Alex Bennée
2022-11-11 13:14 ` Paolo Bonzini
2022-11-11 14:00 ` Alex Bennée
2022-11-11 15:57 ` Paolo Bonzini
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).