* Re: apparent race condition in mttcg memory handling
2025-05-30 19:20 apparent race condition in mttcg memory handling Michael Tokarev
@ 2025-06-04 10:47 ` Michael Tokarev
2025-07-21 11:47 ` Philippe Mathieu-Daudé
2025-07-22 20:11 ` Gustavo Romero
2 siblings, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2025-06-04 10:47 UTC (permalink / raw)
To: QEMU Development
Here's a typical output with ASan enabled, fwiw:
$ ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none -kernel
/boot/vmlinuz-6.12.29-amd64 -append "console=ttyS0" -serial
file:/dev/tty -monitor stdio -initrd ~/debvm/initrd
==368707==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
QEMU 10.0.50 monitor - type 'help' for more information
(qemu) [ 0.000000] Linux version 6.12.29-amd64
(debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian
14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP
PREEMPT_DYNAMIC Debian 6.12.29-1 (2025-05-18)
[ 0.000000] Command line: console=ttyS0
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000ffe0000-0x000000000fffffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff]
reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] APIC: Static calls initialized
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.000000] DMI: Memory slots populated: 1/1
[ 0.000000] tsc: Fast TSC calibration failed
[ 0.000000] AGP: No AGP bridge found
[ 0.000000] last_pfn = 0xffe0 max_arch_pfn = 0x400000000
[ 0.000000] MTRR map: 4 entries (3 fixed + 1 variable; max 19), built
from 8 variable MTRRs
[ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC-
WT
[ 0.000000] found SMP MP-table at [mem 0x000f5480-0x000f548f]
[ 0.000000] RAMDISK: [mem 0x0ffdb000-0x0ffdffff]
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000000F52A0 000014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 0x000000000FFE28F3 000034 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 0x000000000FFE272F 000074 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 0x000000000FFE0040 0026EF (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: FACS 0x000000000FFE0000 000040
[ 0.000000] ACPI: APIC 0x000000000FFE27A3 0000F0 (v03 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 0x000000000FFE2893 000038 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: WAET 0x000000000FFE28CB 000028 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: Reserving FACP table memory at [mem
0xffe272f-0xffe27a2]
[ 0.000000] ACPI: Reserving DSDT table memory at [mem
0xffe0040-0xffe272e]
[ 0.000000] ACPI: Reserving FACS table memory at [mem
0xffe0000-0xffe003f]
[ 0.000000] ACPI: Reserving APIC table memory at [mem
0xffe27a3-0xffe2892]
[ 0.000000] ACPI: Reserving HPET table memory at [mem
0xffe2893-0xffe28ca]
[ 0.000000] ACPI: Reserving WAET table memory at [mem
0xffe28cb-0xffe28f2]
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000000ffdffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x0ffb0680-0x0ffdafff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x000000000ffdffff]
[ 0.000000] Normal empty
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x000000000ffdffff]
[ 0.000000] Initmem setup node 0 [mem
0x0000000000001000-0x000000000ffdffff]
[ 0.000000] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.000000] On node 0, zone DMA: 97 pages in unavailable ranges
[ 0.000000] On node 0, zone DMA32: 32 pages in unavailable ranges
[ 0.000000] ACPI: PM-Timer IO Port: 0x608
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI
0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] CPU topo: Max. logical packages: 1
[ 0.000000] CPU topo: Max. logical dies: 1
[ 0.000000] CPU topo: Max. dies per package: 1
[ 0.000000] CPU topo: Max. threads per core: 1
[ 0.000000] CPU topo: Num. cores per package: 16
[ 0.000000] CPU topo: Num. threads per package: 16
[ 0.000000] CPU topo: Allowing 16 present CPUs plus 0 hotplug CPUs
[ 0.000000] PM: hibernation: Registered nosave memory: [mem
0x00000000-0x00000fff]
[ 0.000000] PM: hibernation: Registered nosave memory: [mem
0x0009f000-0x000fffff]
[ 0.000000] [mem 0x10000000-0xfffbffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff
max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:16
nr_cpu_ids:16 nr_node_ids:1
[ 0.000000] percpu: Embedded 66 pages/cpu s233472 r8192 d28672 u524288
[ 0.000000] Kernel command line: console=ttyS0
[ 0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144
bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072
bytes, linear)
[ 0.000000] Fallback order for Node 0: 0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 65406
[ 0.000000] Policy zone: DMA32
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:on, heap free:off
[ 0.000000] AGP: Checking aperture...
[ 0.000000] AGP: No AGP bridge found
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[ 0.000000] ftrace: allocating 45689 entries in 179 pages
[ 0.000000] ftrace: allocated 179 pages with 5 groups
[ 0.000000] Dynamic Preempt: voluntary
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=8192 to
nr_cpu_ids=16.
[ 0.000000] Trampoline variant of Tasks RCU enabled.
[ 0.000000] Rude variant of Tasks RCU enabled.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
[ 0.000000] RCU Tasks: Setting shift to 4 and lim to 1
rcu_task_cb_adjust=1 rcu_task_cpu_ids=16.
[ 0.000000] RCU Tasks Rude: Setting shift to 4 and lim to 1
rcu_task_cb_adjust=1 rcu_task_cpu_ids=16.
[ 0.000000] RCU Tasks Trace: Setting shift to 4 and lim to 1
rcu_task_cb_adjust=1 rcu_task_cpu_ids=16.
[ 0.000000] NR_IRQS: 524544, nr_irqs: 552, preallocated irqs: 16
[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on
contention.
[ 0.000000] Console: colour *CGA 80x25
[ 0.000000] printk: legacy console [ttyS0] enabled
[ 0.000000] ACPI: Core revision 20240827
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 19112604467 ns
[ 0.060000] APIC: Switch to symmetric I/O mode setup
[ 0.136000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.172000] tsc: Unable to calibrate against PIT
[ 0.172000] tsc: using HPET reference calibration
[ 0.176000] tsc: Detected 2096.090 MHz processor
[ 0.007755] clocksource: tsc-early: mask: 0xffffffffffffffff
max_cycles: 0x1e36c30ca71, max_idle_ns: 440795294664 ns
[ 0.019694] Calibrating delay loop (skipped), value calculated using
timer frequency.. 4192.18 BogoMIPS (lpj=8384360)
[ 0.081754] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
[ 0.083138] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
[ 0.093255] Spectre V1 : Mitigation: usercopy/swapgs barriers and
__user pointer sanitization
[ 0.102414] Spectre V2 : Mitigation: Retpolines
[ 0.105952] Spectre V2 : Spectre v2 / SpectreRSB: Filling RSB on
context switch and VMEXIT
[ 0.160434] x86/fpu: x87 FPU will use FXSAVE
[ 3.002703] Freeing SMP alternatives memory: 40K
[ 3.023274] pid_max: default: 32768 minimum: 301
[ 3.122961] LSM: initializing
lsm=lockdown,capability,landlock,yama,apparmor,tomoyo,bpf,ipe,ima,evm
[ 3.172533] landlock: Up and running.
[ 3.173855] Yama: disabled by default; enable with sysctl kernel.yama.*
[ 3.269917] AppArmor: AppArmor initialized
[ 3.275313] TOMOYO Linux initialized
[ 3.305559] LSM support for eBPF active
[ 3.381819] Mount-cache hash table entries: 512 (order: 0, 4096
bytes, linear)
[ 3.386196] Mountpoint-cache hash table entries: 512 (order: 0, 4096
bytes, linear)
[ 4.149559] smpboot: CPU0: AMD QEMU Virtual CPU version 2.5+ (family:
0xf, model: 0x6b, stepping: 0x1)
[ 4.326143] Performance Events: PMU not available due to
virtualization, using software events only.
[ 4.358224] signal: max sigframe size: 1440
[ 4.378978] rcu: Hierarchical SRCU implementation.
[ 4.382048] rcu: Max phase no-delay instances is 1000.
[ 4.418254] Timer migration: 2 hierarchy levels; 8 children per
group; 2 crossnode level
[ 4.558206] NMI watchdog: Perf NMI watchdog permanently disabled
[ 4.603431] smp: Bringing up secondary CPUs ...
[ 4.702376] smpboot: x86: Booting SMP configuration:
[ 4.703724] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
#8 #9 #10 #11 #12 #13 #14 #15
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 4
= 9105957
[ 0.000000] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 0.000000] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 1
= 28440919
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 3
= 20962063
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 4
= 11352022
[ 0.000000] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 0.000000] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969337] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969337] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969343] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969343] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969348] calibrate_delay_direct() dropping max bogoMips estimate 2
= 27830974
[ 5.969358] calibrate_delay_direct() dropping max bogoMips estimate 3
= 30234130
[ 5.969358] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969358] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969364] calibrate_delay_direct() dropping max bogoMips estimate 1
= 21780255
[ 5.969364] calibrate_delay_direct() dropping min bogoMips estimate 3
= 7553311
[ 5.969364] calibrate_delay_direct() dropping min bogoMips estimate 4
= 8179132
[ 5.969369] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969369] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969374] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969374] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969389] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969389] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969400] calibrate_delay_direct() dropping min bogoMips estimate 1
= 1631122
[ 5.969405] calibrate_delay_direct() dropping min bogoMips estimate 0
= 8501104
[ 5.969410] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969410] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969415] calibrate_delay_direct() dropping max bogoMips estimate 1
= 9766470
[ 5.969415] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969415] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 7.946795] smp: Brought up 1 node, 16 CPUs
[ 7.949559] smpboot: Total of 16 processors activated (36914.04 BogoMIPS)
[ 8.167796] Memory: 197656K/261624K available (16384K kernel code,
2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 54800K reserved, 0K
cma-reserved)
[ 8.433923] devtmpfs: initialized
[ 8.547308] x86/mm: Memory block size: 128MB
[ 8.751207] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 8.775080] futex hash table entries: 4096 (order: 6, 262144 bytes,
linear)
[ 8.868262] pinctrl core: initialized pinctrl subsystem
[ 9.322265] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 9.434496] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
allocations
[ 9.446267] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
atomic allocations
[ 9.450210] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
atomic allocations
[ 9.455908] audit: initializing netlink subsys (disabled)
[ 9.494877] audit: type=2000 audit(1749033951.660:1):
state=initialized audit_enabled=0 res=1
[ 9.622753] thermal_sys: Registered thermal governor 'fair_share'
[ 9.623234] thermal_sys: Registered thermal governor 'bang_bang'
[ 9.625842] thermal_sys: Registered thermal governor 'step_wise'
[ 9.629649] thermal_sys: Registered thermal governor 'user_space'
[ 9.633699] thermal_sys: Registered thermal governor 'power_allocator'
[ 9.653949] cpuidle: using governor ladder
[ 9.661815] cpuidle: using governor menu
[ 9.696090] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 9.781559] PCI: Using configuration type 1 for base access
[ 9.797961] mtrr: your CPUs had inconsistent fixed MTRR settings
[ 9.801893] mtrr: your CPUs had inconsistent variable MTRR settings
[ 9.806120] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 9.807416] mtrr: probably your BIOS does not setup all CPUs.
[ 9.808407] mtrr: corrected configuration.
[ 9.858380] kprobes: kprobe jump-optimization is enabled. All kprobes
are optimized if possible.
[ 10.012084] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[ 10.013878] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[ 10.348768] ACPI: Added _OSI(Module Device)
[ 10.349947] ACPI: Added _OSI(Processor Device)
[ 10.353682] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 10.357664] ACPI: Added _OSI(Processor Aggregator Device)
[ 10.678343] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 11.221996] ACPI: Interpreter enabled
[ 11.262899] ACPI: PM: (supports S0 S3 S4 S5)
[ 11.270094] ACPI: Using IOAPIC for interrupt routing
[ 11.290614] PCI: Using host bridge windows from ACPI; if necessary,
use "pci=nocrs" and report a bug
[ 11.302139] PCI: Using E820 reservations for host bridge windows
[ 11.353959] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 12.252675] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 12.287708] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
MSI HPX-Type3]
[ 12.290520] acpi PNP0A03:00: _OSC: not requesting OS control; OS
requires [ExtendedConfig ASPM ClockPM MSI]
[ 12.326309] acpi PNP0A03:00: fail to add MMCONFIG information, can't
access extended configuration space under this bridge
[ 12.537420] acpiphp: Slot [2] registered
[ 12.542702] acpiphp: Slot [3] registered
[ 12.546456] acpiphp: Slot [4] registered
[ 12.550273] acpiphp: Slot [5] registered
[ 12.554307] acpiphp: Slot [6] registered
[ 12.558329] acpiphp: Slot [7] registered
[ 12.558515] acpiphp: Slot [8] registered
[ 12.560865] acpiphp: Slot [9] registered
[ 12.561559] acpiphp: Slot [10] registered
[ 12.561559] acpiphp: Slot [11] registered
[ 12.566400] acpiphp: Slot [12] registered
[ 12.574391] acpiphp: Slot [13] registered
[ 12.578194] acpiphp: Slot [14] registered
[ 12.580588] acpiphp: Slot [15] registered
[ 12.586418] acpiphp: Slot [16] registered
[ 12.587678] acpiphp: Slot [17] registered
[ 12.588808] acpiphp: Slot [18] registered
[ 12.594504] acpiphp: Slot [19] registered
[ 12.602435] acpiphp: Slot [20] registered
[ 12.603927] acpiphp: Slot [21] registered
[ 12.606341] acpiphp: Slot [22] registered
[ 12.607797] acpiphp: Slot [23] registered
[ 12.608969] acpiphp: Slot [24] registered
[ 12.609559] acpiphp: Slot [25] registered
[ 12.609559] acpiphp: Slot [26] registered
[ 12.609559] acpiphp: Slot [27] registered
[ 12.610162] acpiphp: Slot [28] registered
[ 12.611594] acpiphp: Slot [29] registered
[ 12.612960] acpiphp: Slot [30] registered
[ 12.614401] acpiphp: Slot [31] registered
[ 12.620799] PCI host bridge to bus 0000:00
[ 12.630278] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 12.639483] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 12.641728] pci_bus 0000:00: root bus resource [mem
0x000a0000-0x000bffff window]
[ 12.644426] pci_bus 0000:00: root bus resource [mem
0x10000000-0xfebfffff window]
[ 12.645559] pci_bus 0000:00: root bus resource [mem
0x100000000-0x17fffffff window]
[ 12.659495] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 12.713130] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
conventional PCI endpoint
[ 12.896856] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
conventional PCI endpoint
[ 12.920028] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
conventional PCI endpoint
[ 12.991922] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
[ 13.005559] pci 0000:00:01.1: BAR 0 [io 0x01f0-0x01f7]: legacy IDE quirk
[ 13.005559] pci 0000:00:01.1: BAR 1 [io 0x03f6]: legacy IDE quirk
[ 13.005559] pci 0000:00:01.1: BAR 2 [io 0x0170-0x0177]: legacy IDE quirk
[ 13.013769] pci 0000:00:01.1: BAR 3 [io 0x0376]: legacy IDE quirk
[ 13.026884] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
conventional PCI endpoint
[ 13.045860] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by
PIIX4 ACPI
[ 13.055916] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by
PIIX4 SMB
[ 13.059147] pci 0000:00:01.3: quirk_piix4_acpi+0x0/0x180 took 19531 usecs
[ 13.079357] pci 0000:00:02.0: [8086:100e] type 00 class 0x020000
conventional PCI endpoint
=================================================================
==368707==ERROR: AddressSanitizer: heap-use-after-free on address
0x6060003d5f80 at pc 0x55ae8aeb437f bp 0x7f96d99f5500 sp 0x7f96d99f54f8
READ of size 8 at 0x6060003d5f80 thread T10
#0 0x55ae8aeb437e in address_space_lookup_region
../../home/mjt/qemu/master/system/physmem.c:350
#1 0x55ae8aeb4648 in address_space_translate_internal
../../home/mjt/qemu/master/system/physmem.c:374
#2 0x55ae8aeb65b6 in address_space_translate_for_iotlb
../../home/mjt/qemu/master/system/physmem.c:698
#3 0x55ae8b0c938f in tlb_set_page_full
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1052
#4 0x55ae8b0ca499 in tlb_set_page_with_attrs
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1199
#5 0x55ae8b2370c0 in x86_cpu_tlb_fill
../../home/mjt/qemu/master/target/i386/tcg/system/excp_helper.c:628
#6 0x55ae8b0caa74 in tlb_fill_align
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1257
#7 0x55ae8b0cfc75 in mmu_lookup1
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1658
#8 0x55ae8b0d0534 in mmu_lookup
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1761
#9 0x55ae8b0d3a3b in do_ld4_mmu
../../home/mjt/qemu/master/accel/tcg/cputlb.c:2374
#10 0x55ae8b0d8ad0 in cpu_ldl_mmu
../../home/mjt/qemu/master/accel/tcg/ldst_common.c.inc:165
#11 0x55ae8b3b11d9 in cpu_ldl_le_mmuidx_ra
/home/mjt/qemu/master/include/accel/tcg/cpu-ldst.h:142
#12 0x55ae8b3b8373 in do_interrupt64
../../home/mjt/qemu/master/target/i386/tcg/seg_helper.c:979
#13 0x55ae8b3ba0bd in do_interrupt_all
../../home/mjt/qemu/master/target/i386/tcg/seg_helper.c:1238
#14 0x55ae8b3ba2bf in do_interrupt_x86_hardirq
../../home/mjt/qemu/master/target/i386/tcg/seg_helper.c:1270
#15 0x55ae8b245071 in x86_cpu_exec_interrupt
../../home/mjt/qemu/master/target/i386/tcg/system/seg_helper.c:209
#16 0x55ae8b0a067c in cpu_handle_interrupt
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:821
#17 0x55ae8b0a15e4 in cpu_exec_loop
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:925
#18 0x55ae8b0a173b in cpu_exec_setjmp
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:999
#19 0x55ae8b0a1905 in cpu_exec
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:1025
#20 0x55ae8b0f0e48 in tcg_cpu_exec
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops.c:81
#21 0x55ae8b0f2b12 in mttcg_cpu_thread_fn
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops-mttcg.c:94
#22 0x55ae8ba9c4d5 in qemu_thread_start
../../home/mjt/qemu/master/util/qemu-thread-posix.c:541
#23 0x7f97736c11f4 in start_thread nptl/pthread_create.c:442
#24 0x7f977374189b in clone3
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
0x6060003d5f80 is located 0 bytes inside of 64-byte region
[0x6060003d5f80,0x6060003d5fc0)
freed by thread T1 here:
#0 0x7f9774eb76a8 in __interceptor_free
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x55ae8aec5bc1 in address_space_dispatch_free
../../home/mjt/qemu/master/system/physmem.c:2716
#2 0x55ae8ae92afc in flatview_destroy
../../home/mjt/qemu/master/system/memory.c:295
#3 0x55ae8bab996c in call_rcu_thread
../../home/mjt/qemu/master/util/rcu.c:301
#4 0x55ae8ba9c4d5 in qemu_thread_start
../../home/mjt/qemu/master/util/qemu-thread-posix.c:541
#5 0x7f97736c11f4 in start_thread nptl/pthread_create.c:442
previously allocated by thread T4 here:
#0 0x7f9774eb83b7 in __interceptor_calloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x7f97746e3670 in g_malloc0
(/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5a670)
#2 0x55ae8ae96bd4 in generate_memory_topology
../../home/mjt/qemu/master/system/memory.c:758
#3 0x55ae8ae9ad9b in flatviews_reset
../../home/mjt/qemu/master/system/memory.c:1074
#4 0x55ae8ae9b2cf in memory_region_transaction_commit
../../home/mjt/qemu/master/system/memory.c:1150
#5 0x55ae8aea612b in memory_region_del_subregion
../../home/mjt/qemu/master/system/memory.c:2700
#6 0x55ae8ab3fae5 in pci_update_mappings
../../home/mjt/qemu/master/hw/pci/pci.c:1717
#7 0x55ae8ab4044d in pci_default_write_config
../../home/mjt/qemu/master/hw/pci/pci.c:1790
#8 0x55ae8a9e5e60 in e1000_write_config
../../home/mjt/qemu/master/hw/net/e1000.c:1618
#9 0x55ae8ab4ca87 in pci_host_config_write_common
../../home/mjt/qemu/master/hw/pci/pci_host.c:96
#10 0x55ae8ab4cf39 in pci_data_write
../../home/mjt/qemu/master/hw/pci/pci_host.c:138
#11 0x55ae8ab4d1cf in pci_host_data_write
../../home/mjt/qemu/master/hw/pci/pci_host.c:188
#12 0x55ae8ae94551 in memory_region_write_accessor
../../home/mjt/qemu/master/system/memory.c:488
#13 0x55ae8ae94beb in access_with_adjusted_size
../../home/mjt/qemu/master/system/memory.c:564
#14 0x55ae8ae9d6aa in memory_region_dispatch_write
../../home/mjt/qemu/master/system/memory.c:1544
#15 0x55ae8aecc896 in address_space_stw_internal
../../home/mjt/qemu/master/system/memory_ldst.c.inc:415
#16 0x55ae8aeccad7 in address_space_stw
../../home/mjt/qemu/master/system/memory_ldst.c.inc:446
#17 0x55ae8b2391a1 in helper_outw
../../home/mjt/qemu/master/target/i386/tcg/system/misc_helper.c:45
#18 0x7f96eef65a4d (/memfd:tcg-jit (deleted)+0x1166a4d)
Thread T10 created by T0 here:
#0 0x7f9774e49726 in __interceptor_pthread_create
../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
#1 0x55ae8ba9c9a7 in qemu_thread_create
../../home/mjt/qemu/master/util/qemu-thread-posix.c:581
#2 0x55ae8b0f2f94 in mttcg_start_vcpu_thread
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops-mttcg.c:143
#3 0x55ae8ae7ba65 in qemu_init_vcpu
../../home/mjt/qemu/master/system/cpus.c:709
#4 0x55ae8b329362 in x86_cpu_realizefn
../../home/mjt/qemu/master/target/i386/cpu.c:8865
#5 0x55ae8b5a621f in device_set_realized
../../home/mjt/qemu/master/hw/core/qdev.c:494
#6 0x55ae8b5bd362 in property_set_bool
../../home/mjt/qemu/master/qom/object.c:2375
#7 0x55ae8b5b86af in object_property_set
../../home/mjt/qemu/master/qom/object.c:1450
#8 0x55ae8b5c22fd in object_property_set_qobject
../../home/mjt/qemu/master/qom/qom-qobject.c:28
#9 0x55ae8b5b8c29 in object_property_set_bool
../../home/mjt/qemu/master/qom/object.c:1520
#10 0x55ae8b5a50d4 in qdev_realize
../../home/mjt/qemu/master/hw/core/qdev.c:276
#11 0x55ae8b26fe3f in x86_cpu_new
../../home/mjt/qemu/master/hw/i386/x86-common.c:64
#12 0x55ae8b2701ff in x86_cpus_init
../../home/mjt/qemu/master/hw/i386/x86-common.c:115
#13 0x55ae8b267d90 in pc_init1
../../home/mjt/qemu/master/hw/i386/pc_piix.c:185
#14 0x55ae8b2695f7 in pc_i440fx_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:451
#15 0x55ae8b2699b3 in pc_i440fx_machine_10_1_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:492
#16 0x55ae8a7fa936 in machine_run_board_init
../../home/mjt/qemu/master/hw/core/machine.c:1669
#17 0x55ae8ae6fc53 in qemu_init_board
../../home/mjt/qemu/master/system/vl.c:2710
#18 0x55ae8ae7043b in qmp_x_exit_preconfig
../../home/mjt/qemu/master/system/vl.c:2804
#19 0x55ae8ae751b3 in qemu_init
../../home/mjt/qemu/master/system/vl.c:3840
#20 0x55ae8b8ba5d7 in main ../../home/mjt/qemu/master/system/main.c:71
#21 0x7f977365f249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
Thread T1 created by T0 here:
#0 0x7f9774e49726 in __interceptor_pthread_create
../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
#1 0x55ae8ba9c9a7 in qemu_thread_create
../../home/mjt/qemu/master/util/qemu-thread-posix.c:581
#2 0x55ae8baba213 in rcu_init_complete
../../home/mjt/qemu/master/util/rcu.c:415
#3 0x55ae8baba42c in rcu_init ../../home/mjt/qemu/master/util/rcu.c:471
#4 0x7f977365f375 in call_init ../csu/libc-start.c:145
#5 0x7f977365f375 in __libc_start_main_impl ../csu/libc-start.c:347
Thread T4 created by T0 here:
#0 0x7f9774e49726 in __interceptor_pthread_create
../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
#1 0x55ae8ba9c9a7 in qemu_thread_create
../../home/mjt/qemu/master/util/qemu-thread-posix.c:581
#2 0x55ae8b0f2f94 in mttcg_start_vcpu_thread
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops-mttcg.c:143
#3 0x55ae8ae7ba65 in qemu_init_vcpu
../../home/mjt/qemu/master/system/cpus.c:709
#4 0x55ae8b329362 in x86_cpu_realizefn
../../home/mjt/qemu/master/target/i386/cpu.c:8865
#5 0x55ae8b5a621f in device_set_realized
../../home/mjt/qemu/master/hw/core/qdev.c:494
#6 0x55ae8b5bd362 in property_set_bool
../../home/mjt/qemu/master/qom/object.c:2375
#7 0x55ae8b5b86af in object_property_set
../../home/mjt/qemu/master/qom/object.c:1450
#8 0x55ae8b5c22fd in object_property_set_qobject
../../home/mjt/qemu/master/qom/qom-qobject.c:28
#9 0x55ae8b5b8c29 in object_property_set_bool
../../home/mjt/qemu/master/qom/object.c:1520
#10 0x55ae8b5a50d4 in qdev_realize
../../home/mjt/qemu/master/hw/core/qdev.c:276
#11 0x55ae8b26fe3f in x86_cpu_new
../../home/mjt/qemu/master/hw/i386/x86-common.c:64
#12 0x55ae8b2701ff in x86_cpus_init
../../home/mjt/qemu/master/hw/i386/x86-common.c:115
#13 0x55ae8b267d90 in pc_init1
../../home/mjt/qemu/master/hw/i386/pc_piix.c:185
#14 0x55ae8b2695f7 in pc_i440fx_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:451
#15 0x55ae8b2699b3 in pc_i440fx_machine_10_1_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:492
#16 0x55ae8a7fa936 in machine_run_board_init
../../home/mjt/qemu/master/hw/core/machine.c:1669
#17 0x55ae8ae6fc53 in qemu_init_board
../../home/mjt/qemu/master/system/vl.c:2710
#18 0x55ae8ae7043b in qmp_x_exit_preconfig
../../home/mjt/qemu/master/system/vl.c:2804
#19 0x55ae8ae751b3 in qemu_init
../../home/mjt/qemu/master/system/vl.c:3840
#20 0x55ae8b8ba5d7 in main ../../home/mjt/qemu/master/system/main.c:71
#21 0x7f977365f249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: heap-use-after-free
../../home/mjt/qemu/master/system/physmem.c:350 in
address_space_lookup_region
Shadow bytes around the buggy address:
0x0c0c80072ba0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fa
0x0c0c80072bb0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c80072bc0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c80072bd0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c80072be0: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
=>0x0c0c80072bf0:[fd]fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c80072c00: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c80072c10: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c80072c20: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c80072c30: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c80072c40: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==368707==ABORTING
* Re: apparent race condition in mttcg memory handling
2025-05-30 19:20 apparent race condition in mttcg memory handling Michael Tokarev
2025-06-04 10:47 ` Michael Tokarev
@ 2025-07-21 11:47 ` Philippe Mathieu-Daudé
2025-07-21 16:23 ` Pierrick Bouvier
2025-07-22 20:11 ` Gustavo Romero
2 siblings, 1 reply; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-21 11:47 UTC (permalink / raw)
To: Michael Tokarev, QEMU Development
Cc: Jonathan Cameron, Pierrick Bouvier, Alex Bennée,
Richard Henderson, Paolo Bonzini, Stefan Hajnoczi,
Mark Cave-Ayland
(Cc'ing a few more developers)
On 30/5/25 21:20, Michael Tokarev wrote:
> Hi!
>
> For quite some time (almost the whole day yesterday) I've been trying to
> find out what's going on with mttcg in qemu. There's apparently a race
> condition somewhere, like a use-after-free or something.
>
> It started as an incarnation of
> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
> failure, but on an x86_64 host this time (it's also mentioned in that
> issue).
>
> However, that particular assertion failure is not the only possible
> outcome. We're hitting multiple assertion failures or SIGSEGVs in
> physmem.c and related files, - 4 or 5 different places so far.
>
> The problem here is that the bug is rather difficult to reproduce.
> What I've been using so far was to make most host cores busy, and
> specify a number of virtual CPUs close to the actual number of host
> cores (threads).
>
> For example, on my 4-core, 8-thread notebook, I used `stress -c 8`
> and ran qemu with -smp 10 to trigger this issue. However, on this
> very notebook it is really difficult to trigger - it happens
> every 30..50 runs or so.
>
> The reproducer I was using was just booting the kernel; no
> user-space is needed. Qemu crashes during kernel init, or it runs fine.
>
> I used a regular kernel from debian sid:
> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-image-amd64_6.12.29-1_amd64.deb
> Extract vmlinuz-6.12.29-amd64 from there.
>
> In order to simplify the reproducing, I created a tiny initrd with
> just one executable in there, which does a poweroff:
>
> cat >poweroff.c <<'EOF'
> #include <sys/reboot.h>
> #include <unistd.h>
>
> int main(void) {
> reboot(RB_POWER_OFF);
> sleep(5);
> return 0;
> }
> EOF
> diet gcc -static -o init poweroff.c
> echo init | cpio -o -H newc > initrd
>
> (it uses dietlibc, optional, just to make the initrd smaller).
>
> Now, the qemu invocation I used:
>
> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
> -append "console=ttyS0" \
> -vga none -display none \
> -serial file:/dev/tty \
> -monitor stdio \
> -m 256 \
> -smp 16
>
> This way, it either succeeds, terminating normally when the initrd
> halts the system, or it segfaults or asserts as per the issue.
>
> For a 64-core machine, I used -smp 64 and had 16..40 cores busy
> with other stuff. Also, adding `nice' in front of the command
> apparently helps.
>
> Now, to the various issues/places I've hit. Here's a typical
> output:
>
> ...
> [ 3.129806] smpboot: x86: Booting SMP configuration:
> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
> #8 #9
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494389] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494389] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494396] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494396] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494401] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494401] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494408] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494408] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494415] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494415] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 5.864038] smp: Brought up 1 node, 10 CPUs
> [ 5.865772] smpboot: Total of 10 processors activated (25983.25
> BogoMIPS)
> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code,
> 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K
> cma-reserved)
> [ 6.591933] devtmpfs: initialized
> [ 6.635844] x86/mm: Memory block size: 128MB
> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 7645041785100000 ns
> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes,
> linear)
> [ 6.840775] pinctrl core: initialized pinctrl subsystem
> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
> allocations
> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
> atomic allocations
> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
> atomic allocations
> [ 7.195902] audit: initializing netlink subsys (disabled)
> [ 7.223865] audit: type=2000 audit(1748628013.324:1):
> state=initialized audit_enabled=0 res=1
> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
> [ 7.316281] cpuidle: using governor ladder
> [ 7.331907] cpuidle: using governor menu
> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [ 7.407802] PCI: Using configuration type 1 for base access
> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
> [ 7.420691] mtrr: corrected configuration.
> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes
> are optimized if possible.
> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0
> pages
> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
> [ 7.816900] ACPI: Added _OSI(Module Device)
> [ 7.819950] ACPI: Added _OSI(Processor Device)
> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
> [ 8.355952] ACPI: Interpreter enabled
> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary,
> use "pci=nocrs" and report a bug
> [ 8.468051] PCI: Using E820 reservations for host bridge windows
> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
> MSI HPX-Type3]
> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS
> requires [ExtendedConfig ASPM ClockPM MSI]
> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't
> access extended configuration space under this bridge
> [ 9.297562] acpiphp: Slot [2] registered
> ...
> [ 9.369007] PCI host bridge to bus 0000:00
> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7
> window]
> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff
> window]
> [ 9.383826] pci_bus 0000:00: root bus resource [mem
> 0x000a0000-0x000bffff window]
> [ 9.387818] pci_bus 0000:00: root bus resource [mem
> 0x10000000-0xfebfffff window]
> [ 9.393681] pci_bus 0000:00: root bus resource [mem
> 0x100000000-0x17fffffff window]
> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
> conventional PCI endpoint
> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
> conventional PCI endpoint
> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
> conventional PCI endpoint
> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
> Segmentation fault
>
>
> So it breaks somewhere in PCI init, after SMP/CPUs have been
> initialized by the guest kernel.
>
> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307,
> addr=182591488) at ../system/physmem.c:309
> 309 return int128_gethi(section->size) ||
> (gdb) p *section
> Cannot access memory at address 0x7fff58307
>
> This one has been seen multiple times.
>
> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
> 0x0000555555e30382 in memory_region_section_get_iotlb
> (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
> ../system/physmem.c:1002
> 1002 return section - d->map.sections;
> d is NULL here
>
>
> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at
> include/exec/memory.h:1756
> 1756 if (mr->alias) {
> (gdb) p *mr
> Cannot access memory at address 0xffffffc1ffffffc1
> (gdb) frame 1
> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at
> include/exec/memory.h:1757
> 1757 return memory_region_get_iommu(mr->alias);
> (gdb) p mr
> $1 = (MemoryRegion *) 0x7fff54239a10
>
>
> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
> [
> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>
> (gdb) l
> 1004 /* Called from RCU critical section */
> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> 1006 MemoryRegionSection
> *section)
> 1007 {
> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
> 1009 return section - d->map.sections;
> 1010 }
> 1011
> 1012 static int subpage_register(subpage_t *mmio, uint32_t start,
> uint32_t end,
> 1013 uint16_t section);
>
> (gdb) p *section
> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv =
> 0x7fff998f6fd0,
> offset_within_region = 0, offset_within_address_space = 0, readonly =
> false,
> nonvolatile = false, unmergeable = 12}
>
> (gdb) p *section->fv
> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges =
> 0x7fff99486a60, nr = 0,
> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>
> (gdb) bt
> #0 0x0000555555e5118c in memory_region_section_get_iotlb
> (cpu=cpu@entry=0x55555894fdf0,
> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208,
> full=full@entry=0x7ffebeffbd60)
> at accel/tcg/cputlb.c:1088
> #2 0x0000555555e70a92 in tlb_set_page_with_attrs
> (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=...,
> prot=<optimized out>,
> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0,
> addr=151138272,
> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6,
> probe=<optimized out>,
> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
> addr=151138272,
> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6,
> memop=memop@entry=MO_8,
> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/
> tcg/cputlb.c:1251
> #5 0x0000555555e6eb0d in probe_access_internal
> (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8,
> pfull=0x7ffebeffbfa0, retaddr=0,
> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0,
> addr=addr@entry=151138272,
> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE,
> mmu_idx=<optimized out>,
> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/
> tcg/cputlb.c:1439
> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090,
> addr=151138272)
> at target/i386/tcg/system/excp_helper.c:68
> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110,
> err=err@entry=0x7ffebeffc120,
> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0,
> addr=18446741874686299840,
> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110,
> err=0x7ffebeffc120, ra=0)
> at target/i386/tcg/system/excp_helper.c:597
> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840,
> size=<optimized out>,
> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>,
> retaddr=0)
> at target/i386/tcg/system/excp_helper.c:617
> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
> addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
> memop@entry=MO_32, size=-1739692016,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32,
> mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
> accel/tcg/cputlb.c:1652
> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/
> tcg/cputlb.c:1755
> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
> cputlb.c:2364
> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0,
> addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0,
> addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
> ldst_common.c.inc:308
> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236,
> is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236,
> is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
> is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #19 0x0000555555db884a in do_interrupt_x86_hardirq
> (env=env@entry=0x5555589529b0,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
> seg_helper.c:1245
> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
> interrupt_request=<optimized out>) at target/i386/tcg/system/
> seg_helper.c:209
> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0,
> last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0,
> sc=sc@entry=0x7ffebeffc580)
> at accel/tcg/cpu-exec.c:955
> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at
> accel/tcg/cpu-exec.c:1059
> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops.c:80
> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at
> util/qemu-thread-posix.c:541
> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
>
> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr:
> Assertion `offset_in_ramblock(block, offset)' failed.
>
> (gdb) bt
> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
> block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>,
> offset=<optimized out>)
> at system/physmem.c:2238
> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized
> out>, addr=<optimized out>,
> size=0x0) at system/physmem.c:2430
> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>)
> at system/physmem.c:2443
> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>)
> at system/memory.c:2452
> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
> #8 0x0000555555e70a92 in tlb_set_page_with_attrs
> (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696,
> attrs=...,
> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/
> tcg/cputlb.c:1193
> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50,
> addr=18446741874686299840,
> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4,
> probe=<optimized out>, retaddr=0)
> at target/i386/tcg/system/excp_helper.c:624
> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50,
> addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
> memop@entry=MO_32, size=-1115714056,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32,
> mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
> accel/tcg/cputlb.c:1652
> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/
> tcg/cputlb.c:1755
> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
> cputlb.c:2364
> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10,
> addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10,
> addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
> ldst_common.c.inc:308
> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236,
> is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236,
> is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
> is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #18 0x0000555555db884a in do_interrupt_x86_hardirq
> (env=env@entry=0x5555589fcb10,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
> seg_helper.c:1245
> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
> interrupt_request=<optimized out>) at target/i386/tcg/system/
> seg_helper.c:209
> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50,
> last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50,
> sc=sc@entry=0x7ffebd7f9580)
> at accel/tcg/cpu-exec.c:955
> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at
> accel/tcg/cpu-exec.c:1059
> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops.c:80
> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at
> util/qemu-thread-posix.c:541
> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
> (gdb) frame 2
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
> block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> 91 assert(offset_in_ramblock(block, offset));
>
> (gdb) l
> 86 return (b && b->host && offset < b->used_length) ? true :
> false;
> 87 }
> 88
> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t
> offset)
> 90 {
> 91 assert(offset_in_ramblock(block, offset));
> 92 return (char *)block->host + offset;
> 93 }
> 94
> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void
> *host_addr,
>
>
> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>
> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at
> system/physmem.c:337
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> (gdb) l
> 332 }
> 333 p = nodes[lp.ptr];
> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
> 335 }
> 336
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> 338 return &sections[lp.ptr];
> 339 } else {
> 340 return &sections[PHYS_SECTION_UNASSIGNED];
> 341 }
> (gdb)
>
>
> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
> this issue happening with 10.0 but not with 9.2. So some of the
> above failures might come from commits between 9.2 and 10.0.
> However, I was able to trigger some of the failures with 9.2.0 too,
> though with much lower probability. And some can be triggered on
> current master as well, with much higher probability.
>
> On my 4-core notebook, the above command line fails every 20..50 runs.
>
> I was never able to reproduce the assertion failure as shown in #1921.
>
> As of now, this issue is hitting Debian trixie in debci: when a
> package which creates a guest image tries to run qemu, there is no
> KVM available in the debci environment, so qemu falls back to TCG.
>
> On IRC, Manos Pitsidianakis noted that he was recently debugging a
> use-after-free involving MemoryRegion, and posted a patch which can
> help a bit:
> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>
> I'm not sure where to go from here.
>
> Just collecting everything we have now.
>
> Thanks,
>
> /mjt
>
* Re: apparent race condition in mttcg memory handling
2025-07-21 11:47 ` Philippe Mathieu-Daudé
@ 2025-07-21 16:23 ` Pierrick Bouvier
2025-07-21 16:29 ` Pierrick Bouvier
0 siblings, 1 reply; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 16:23 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael Tokarev, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
Hi Michael,
On 7/21/25 4:47 AM, Philippe Mathieu-Daudé wrote:
> (Cc'ing few more developers)
>
> On 30/5/25 21:20, Michael Tokarev wrote:
>> Hi!
>>
>> For quite some time (almost whole day yesterday) I'm trying to find out
>> what's going on with mmtcg in qemu. There's apparently a race condition
>> somewhere, like a use-after-free or something.
>>
>> It started as an incarnation of
>> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
>> failure, but on an x86_64 host this time (it's also mentioned in that
>> issue).
>>
>> However, that particular assertion failure is not the only possible
>> outcome. We're hitting multiple assertion failures or SIGSEGVs in
>> physmem.c and related files, - 4 or 5 different places so far.
>>
>> The problem here is that the bug is rather difficult to reproduce.
>> What I've been using so far was to make most host cores busy, and
>> specify amount of virtual CPUs close to actual host cores (threads).
>>
>> For example, on my 4-core, 8-threads notebook, I used `stress -c 8`
>> and ran qemu with -smp 10, to trigger this issue. However, on this
>> very notebook it is really difficult to trigger it, - it happens
>> every 30..50 runs or so.
>>
>> The reproducer I was using - it was just booting kernel, no user-
>> space is needed. Qemu crashes during kernel init, or it runs fine.
>>
>> I used regular kernel from debian sid:
>> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-
>> image-amd64_6.12.29-1_amd64.deb
>> Extract vmlinuz-6.12.29-amd64 from there.
>>
>> In order to simplify the reproducing, I created a tiny initrd with
>> just one executable in there, which does a poweroff:
>>
>> cat >poweroff.c <<'EOF'
>> #include <sys/reboot.h>
>> #include <unistd.h>
>>
>> int main(void) {
>> reboot(RB_POWER_OFF);
>> sleep(5);
>> return 0;
>> }
>> EOF
>> diet gcc -static -o init poweroff.c
>> echo init | cpio -o -H newc > initrd
>>
>> (it uses dietlibc, optional, just to make the initrd smaller).
>>
>> Now, the qemu invocation I used:
>>
>> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
>> -append "console=ttyS0" \
>> -vga none -display none \
>> -serial file:/dev/tty \
>> -monitor stdio \
>> -m 256 \
>> -smp 16
>>
>> This way, it either succeeds, terminating normally due to
>> the initrd hating the system, or it will segfault or assert
>> as per the issue.
>>
>> For a 64-core machine, I used -smp 64, and had 16..40 cores
>> being busy with other stuff. Also, adding `nice' in front
>> of that command apparently helps.
>>
>> Now, to the various issues/places I've hit. Here's a typical
>> output:
>>
>> ...
>> [ 3.129806] smpboot: x86: Booting SMP configuration:
>> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
>> #8 #9
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494389] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494389] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494396] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494396] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494401] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494401] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494408] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494408] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494415] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494415] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 5.864038] smp: Brought up 1 node, 10 CPUs
>> [ 5.865772] smpboot: Total of 10 processors activated (25983.25
>> BogoMIPS)
>> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code,
>> 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K
>> cma-reserved)
>> [ 6.591933] devtmpfs: initialized
>> [ 6.635844] x86/mm: Memory block size: 128MB
>> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles:
>> 0xffffffff, max_idle_ns: 7645041785100000 ns
>> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes,
>> linear)
>> [ 6.840775] pinctrl core: initialized pinctrl subsystem
>> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
>> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
>> allocations
>> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
>> atomic allocations
>> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
>> atomic allocations
>> [ 7.195902] audit: initializing netlink subsys (disabled)
>> [ 7.223865] audit: type=2000 audit(1748628013.324:1):
>> state=initialized audit_enabled=0 res=1
>> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
>> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
>> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
>> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
>> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
>> [ 7.316281] cpuidle: using governor ladder
>> [ 7.331907] cpuidle: using governor menu
>> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>> [ 7.407802] PCI: Using configuration type 1 for base access
>> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
>> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
>> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
>> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
>> [ 7.420691] mtrr: corrected configuration.
>> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes
>> are optimized if possible.
>> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0
>> pages
>> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
>> [ 7.816900] ACPI: Added _OSI(Module Device)
>> [ 7.819950] ACPI: Added _OSI(Processor Device)
>> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
>> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
>> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
>> [ 8.355952] ACPI: Interpreter enabled
>> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
>> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
>> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary,
>> use "pci=nocrs" and report a bug
>> [ 8.468051] PCI: Using E820 reservations for host bridge windows
>> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
>> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
>> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
>> MSI HPX-Type3]
>> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS
>> requires [ExtendedConfig ASPM ClockPM MSI]
>> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't
>> access extended configuration space under this bridge
>> [ 9.297562] acpiphp: Slot [2] registered
>> ...
>> [ 9.369007] PCI host bridge to bus 0000:00
>> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7
>> window]
>> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff
>> window]
>> [ 9.383826] pci_bus 0000:00: root bus resource [mem
>> 0x000a0000-0x000bffff window]
>> [ 9.387818] pci_bus 0000:00: root bus resource [mem
>> 0x10000000-0xfebfffff window]
>> [ 9.393681] pci_bus 0000:00: root bus resource [mem
>> 0x100000000-0x17fffffff window]
>> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
>> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
>> conventional PCI endpoint
>> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
>> conventional PCI endpoint
>> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
>> conventional PCI endpoint
>> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
>> Segmentation fault
>>
>>
>> So it breaks somewhere in PCI init, after SMP/CPUs have been
>> initialized by the guest kernel.
>>
>> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307,
>> addr=182591488) at ../system/physmem.c:309
>> 309 return int128_gethi(section->size) ||
>> (gdb) p *section
>> Cannot access memory at address 0x7fff58307
>>
>> This one has been seen multiple times.
>>
>> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
>> 0x0000555555e30382 in memory_region_section_get_iotlb
>> (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
>> ../system/physmem.c:1002
>> 1002 return section - d->map.sections;
>> d is NULL here
>>
>>
>> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
>> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at
>> include/exec/memory.h:1756
>> 1756 if (mr->alias) {
>> (gdb) p *mr
>> Cannot access memory at address 0xffffffc1ffffffc1
>> (gdb) frame 1
>> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at
>> include/exec/memory.h:1757
>> 1757 return memory_region_get_iommu(mr->alias);
>> (gdb) p mr
>> $1 = (MemoryRegion *) 0x7fff54239a10
>>
>>
>> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
>> [
>> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>>
>> (gdb) l
>> 1004 /* Called from RCU critical section */
>> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>> 1006 MemoryRegionSection
>> *section)
>> 1007 {
>> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
>> 1009 return section - d->map.sections;
>> 1010 }
>> 1011
>> 1012 static int subpage_register(subpage_t *mmio, uint32_t start,
>> uint32_t end,
>> 1013 uint16_t section);
>>
>> (gdb) p *section
>> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv =
>> 0x7fff998f6fd0,
>> offset_within_region = 0, offset_within_address_space = 0, readonly =
>> false,
>> nonvolatile = false, unmergeable = 12}
>>
>> (gdb) p *section->fv
>> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges =
>> 0x7fff99486a60, nr = 0,
>> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>>
>> (gdb) bt
>> #0 0x0000555555e5118c in memory_region_section_get_iotlb
>> (cpu=cpu@entry=0x55555894fdf0,
>> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
>> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
>> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208,
>> full=full@entry=0x7ffebeffbd60)
>> at accel/tcg/cputlb.c:1088
>> #2 0x0000555555e70a92 in tlb_set_page_with_attrs
>> (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=...,
>> prot=<optimized out>,
>> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
>> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0,
>> addr=151138272,
>> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6,
>> probe=<optimized out>,
>> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
>> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>> addr=151138272,
>> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6,
>> memop=memop@entry=MO_8,
>> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/
>> tcg/cputlb.c:1251
>> #5 0x0000555555e6eb0d in probe_access_internal
>> (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
>> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
>> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8,
>> pfull=0x7ffebeffbfa0, retaddr=0,
>> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
>> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0,
>> addr=addr@entry=151138272,
>> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE,
>> mmu_idx=<optimized out>,
>> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/
>> tcg/cputlb.c:1439
>> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090,
>> addr=151138272)
>> at target/i386/tcg/system/excp_helper.c:68
>> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
>> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110,
>> err=err@entry=0x7ffebeffc120,
>> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
>> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0,
>> addr=18446741874686299840,
>> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110,
>> err=0x7ffebeffc120, ra=0)
>> at target/i386/tcg/system/excp_helper.c:597
>> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840,
>> size=<optimized out>,
>> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>,
>> retaddr=0)
>> at target/i386/tcg/system/excp_helper.c:617
>> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>> addr=18446741874686299840,
>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>> memop@entry=MO_32, size=-1739692016,
>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
>> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32,
>> mmu_idx=mmu_idx@entry=4,
>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>> accel/tcg/cputlb.c:1652
>> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/
>> tcg/cputlb.c:1755
>> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>> cputlb.c:2364
>> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0,
>> addr=18446741874686299840, oi=36,
>> ra=0) at accel/tcg/ldst_common.c.inc:165
>> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0,
>> addr=addr@entry=18446741874686299840,
>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>> ldst_common.c.inc:308
>> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236,
>> is_int=0, error_code=0,
>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236,
>> is_int=is_int@entry=0,
>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>> is_hw=is_hw@entry=1)
>> at target/i386/tcg/seg_helper.c:1213
>> #19 0x0000555555db884a in do_interrupt_x86_hardirq
>> (env=env@entry=0x5555589529b0,
>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>> seg_helper.c:1245
>> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>> seg_helper.c:209
>> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0,
>> last_tb=<synthetic pointer>)
>> at accel/tcg/cpu-exec.c:851
>> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0,
>> sc=sc@entry=0x7ffebeffc580)
>> at accel/tcg/cpu-exec.c:955
>> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
>> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
>> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at
>> accel/tcg/cpu-exec.c:1059
>> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
>> at accel/tcg/tcg-accel-ops.c:80
>> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>> (arg=arg@entry=0x55555894fdf0)
>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at
>> util/qemu-thread-posix.c:541
>> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>
>>
>> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr:
>> Assertion `offset_in_ramblock(block, offset)' failed.
>>
>> (gdb) bt
>> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>> block=<optimized out>)
>> at ./include/exec/ram_addr.h:91
>> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>,
>> offset=<optimized out>)
>> at system/physmem.c:2238
>> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized
>> out>, addr=<optimized out>,
>> size=0x0) at system/physmem.c:2430
>> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>)
>> at system/physmem.c:2443
>> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>)
>> at system/memory.c:2452
>> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
>> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
>> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
>> #8 0x0000555555e70a92 in tlb_set_page_with_attrs
>> (cpu=cpu@entry=0x5555589f9f50,
>> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696,
>> attrs=...,
>> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/
>> tcg/cputlb.c:1193
>> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50,
>> addr=18446741874686299840,
>> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4,
>> probe=<optimized out>, retaddr=0)
>> at target/i386/tcg/system/excp_helper.c:624
>> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50,
>> addr=18446741874686299840,
>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>> memop@entry=MO_32, size=-1115714056,
>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
>> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32,
>> mmu_idx=mmu_idx@entry=4,
>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>> accel/tcg/cputlb.c:1652
>> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/
>> tcg/cputlb.c:1755
>> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>> cputlb.c:2364
>> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10,
>> addr=18446741874686299840, oi=36,
>> ra=0) at accel/tcg/ldst_common.c.inc:165
>> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10,
>> addr=addr@entry=18446741874686299840,
>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>> ldst_common.c.inc:308
>> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236,
>> is_int=0, error_code=0,
>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236,
>> is_int=is_int@entry=0,
>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>> is_hw=is_hw@entry=1)
>> at target/i386/tcg/seg_helper.c:1213
>> #18 0x0000555555db884a in do_interrupt_x86_hardirq
>> (env=env@entry=0x5555589fcb10,
>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>> seg_helper.c:1245
>> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>> seg_helper.c:209
>> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50,
>> last_tb=<synthetic pointer>)
>> at accel/tcg/cpu-exec.c:851
>> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50,
>> sc=sc@entry=0x7ffebd7f9580)
>> at accel/tcg/cpu-exec.c:955
>> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
>> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
>> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at
>> accel/tcg/cpu-exec.c:1059
>> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
>> at accel/tcg/tcg-accel-ops.c:80
>> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>> (arg=arg@entry=0x5555589f9f50)
>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at
>> util/qemu-thread-posix.c:541
>> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>
>> (gdb) frame 2
>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>> block=<optimized out>)
>> at ./include/exec/ram_addr.h:91
>> 91 assert(offset_in_ramblock(block, offset));
>>
>> (gdb) l
>> 86 return (b && b->host && offset < b->used_length) ? true :
>> false;
>> 87 }
>> 88
>> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t
>> offset)
>> 90 {
>> 91 assert(offset_in_ramblock(block, offset));
>> 92 return (char *)block->host + offset;
>> 93 }
>> 94
>> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void
>> *host_addr,
>>
>>
>> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>>
>> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
>> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at
>> system/physmem.c:337
>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>> (gdb) l
>> 332 }
>> 333 p = nodes[lp.ptr];
>> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
>> 335 }
>> 336
>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>> 338 return &sections[lp.ptr];
>> 339 } else {
>> 340 return &sections[PHYS_SECTION_UNASSIGNED];
>> 341 }
>> (gdb)
>>
>>
>> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
>> this issue happening with 10.0 but not with 9.2. So some of the
>> above failures might come from somewhere in the middle between 9.2
>> and 10.0. However, I was able to trigger some of the failures with
>> 9.2.0 too, though with much lower probability. And some can be
>> triggered on current master too, with much higher probability.
>>
>> On my 4-core notebook, the above command line fails every 20..50 runs.
>>
>> I was never able to reproduce the assertion failure as shown in !1921.
>>
>> As of now, this issue is hitting Debian trixie in debci, when a
>> package which creates a guest image tries to run qemu; in the debci
>> environment there's no KVM available, so it falls back to TCG.
>>
>> On IRC, Manos Pitsidianakis noted that he was debugging use-after-free
>> with MemoryRegion recently, and posted a patch which can help a bit:
>> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>>
>> I'm not sure where to go from here.
>>
>> Just collecting everything we have now.
>>
>> Thanks,
>>
>> /mjt
>>
>
Looks like a good target for TSAN, which might expose the race without
really having to trigger it:
https://www.qemu.org/docs/master/devel/testing/main.html#building-and-testing-with-tsan
Otherwise, you can replay your run using rr record -h (chaos mode) [1],
which randomly schedules threads, until it catches the segfault; then
you'll have a reproducible case to debug.
[1] https://github.com/rr-debugger/rr
Regards,
Pierrick
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: apparent race condition in mttcg memory handling
2025-07-21 16:23 ` Pierrick Bouvier
@ 2025-07-21 16:29 ` Pierrick Bouvier
2025-07-21 17:14 ` Michael Tokarev
0 siblings, 1 reply; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 16:29 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael Tokarev, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
> Hi Michael,
>
> On 7/21/25 4:47 AM, Philippe Mathieu-Daudé wrote:
>> (Cc'ing few more developers)
>>
>> On 30/5/25 21:20, Michael Tokarev wrote:
>>> Hi!
>>>
>>> For quite some time (almost the whole day yesterday) I've been trying
>>> to find out what's going on with mttcg in qemu. There's apparently a
>>> race condition somewhere, like a use-after-free or something.
>>>
>>> It started as an incarnation of
>>> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
>>> failure, but on an x86_64 host this time (it's also mentioned in that
>>> issue).
>>>
>>> However, that particular assertion failure is not the only possible
>>> outcome. We're hitting multiple assertion failures or SIGSEGVs in
>>> physmem.c and related files, - 4 or 5 different places so far.
>>>
>>> The problem here is that the bug is rather difficult to reproduce.
>>> What I've been using so far is to make most host cores busy, and
>>> specify a number of virtual CPUs close to the actual host core
>>> (thread) count.
>>>
>>> For example, on my 4-core, 8-thread notebook, I used `stress -c 8`
>>> and ran qemu with -smp 10, to trigger this issue. However, on this
>>> very notebook it is really difficult to trigger: it happens only
>>> every 30..50 runs or so.
>>>
>>> The reproducer I was using - it was just booting kernel, no user-
>>> space is needed. Qemu crashes during kernel init, or it runs fine.
>>>
>>> I used regular kernel from debian sid:
>>> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-
>>> image-amd64_6.12.29-1_amd64.deb
>>> Extract vmlinuz-6.12.29-amd64 from there.
>>>
>>> In order to simplify the reproducing, I created a tiny initrd with
>>> just one executable in there, which does a poweroff:
>>>
>>> cat >poweroff.c <<'EOF'
>>> #include <sys/reboot.h>
>>> #include <unistd.h>
>>>
>>> int main(void) {
>>> reboot(RB_POWER_OFF);
>>> sleep(5);
>>> return 0;
>>> }
>>> EOF
>>> diet gcc -static -o init poweroff.c
>>> echo init | cpio -o -H newc > initrd
>>>
>>> (it uses dietlibc, optional, just to make the initrd smaller).
>>>
>>> Now, the qemu invocation I used:
>>>
>>> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
>>> -append "console=ttyS0" \
>>> -vga none -display none \
>>> -serial file:/dev/tty \
>>> -monitor stdio \
>>> -m 256 \
>>> -smp 16
>>>
>>> This way, it either succeeds, terminating normally due to
>>> the initrd halting the system, or it will segfault or assert
>>> as per the issue.
>>>
>>> For a 64-core machine, I used -smp 64, and had 16..40 cores
>>> being busy with other stuff. Also, adding `nice' in front
>>> of that command apparently helps.
>>>
>>> Now, to the various issues/places I've hit. Here's a typical
>>> output:
>>>
>>> ...
>>> [ 3.129806] smpboot: x86: Booting SMP configuration:
>>> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
>>> #8 #9
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494389] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494389] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494396] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494396] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494401] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494401] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494408] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494408] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494415] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494415] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 5.864038] smp: Brought up 1 node, 10 CPUs
>>> [ 5.865772] smpboot: Total of 10 processors activated (25983.25
>>> BogoMIPS)
>>> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code,
>>> 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K
>>> cma-reserved)
>>> [ 6.591933] devtmpfs: initialized
>>> [ 6.635844] x86/mm: Memory block size: 128MB
>>> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles:
>>> 0xffffffff, max_idle_ns: 7645041785100000 ns
>>> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes,
>>> linear)
>>> [ 6.840775] pinctrl core: initialized pinctrl subsystem
>>> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
>>> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
>>> allocations
>>> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
>>> atomic allocations
>>> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
>>> atomic allocations
>>> [ 7.195902] audit: initializing netlink subsys (disabled)
>>> [ 7.223865] audit: type=2000 audit(1748628013.324:1):
>>> state=initialized audit_enabled=0 res=1
>>> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
>>> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
>>> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
>>> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
>>> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
>>> [ 7.316281] cpuidle: using governor ladder
>>> [ 7.331907] cpuidle: using governor menu
>>> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>> [ 7.407802] PCI: Using configuration type 1 for base access
>>> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
>>> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
>>> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
>>> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
>>> [ 7.420691] mtrr: corrected configuration.
>>> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes
>>> are optimized if possible.
>>> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0
>>> pages
>>> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
>>> [ 7.816900] ACPI: Added _OSI(Module Device)
>>> [ 7.819950] ACPI: Added _OSI(Processor Device)
>>> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
>>> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
>>> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
>>> [ 8.355952] ACPI: Interpreter enabled
>>> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
>>> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
>>> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary,
>>> use "pci=nocrs" and report a bug
>>> [ 8.468051] PCI: Using E820 reservations for host bridge windows
>>> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
>>> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
>>> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
>>> MSI HPX-Type3]
>>> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS
>>> requires [ExtendedConfig ASPM ClockPM MSI]
>>> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't
>>> access extended configuration space under this bridge
>>> [ 9.297562] acpiphp: Slot [2] registered
>>> ...
>>> [ 9.369007] PCI host bridge to bus 0000:00
>>> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7
>>> window]
>>> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff
>>> window]
>>> [ 9.383826] pci_bus 0000:00: root bus resource [mem
>>> 0x000a0000-0x000bffff window]
>>> [ 9.387818] pci_bus 0000:00: root bus resource [mem
>>> 0x10000000-0xfebfffff window]
>>> [ 9.393681] pci_bus 0000:00: root bus resource [mem
>>> 0x100000000-0x17fffffff window]
>>> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
>>> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
>>> conventional PCI endpoint
>>> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
>>> conventional PCI endpoint
>>> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
>>> conventional PCI endpoint
>>> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
>>> Segmentation fault
>>>
>>>
>>> So it breaks somewhere in PCI init, after SMP/CPUs have been
>>> initialized by the guest kernel.
>>>
>>> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307,
>>> addr=182591488) at ../system/physmem.c:309
>>> 309 return int128_gethi(section->size) ||
>>> (gdb) p *section
>>> Cannot access memory at address 0x7fff58307
>>>
>>> This one has been seen multiple times.
>>>
>>> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
>>> 0x0000555555e30382 in memory_region_section_get_iotlb
>>> (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
>>> ../system/physmem.c:1002
>>> 1002 return section - d->map.sections;
>>> d is NULL here
>>>
>>>
>>> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
>>> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at
>>> include/exec/memory.h:1756
>>> 1756 if (mr->alias) {
>>> (gdb) p *mr
>>> Cannot access memory at address 0xffffffc1ffffffc1
>>> (gdb) frame 1
>>> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at
>>> include/exec/memory.h:1757
>>> 1757 return memory_region_get_iommu(mr->alias);
>>> (gdb) p mr
>>> $1 = (MemoryRegion *) 0x7fff54239a10
>>>
>>>
>>> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
>>> [
>>> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>>>
>>> (gdb) l
>>> 1004 /* Called from RCU critical section */
>>> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>>> 1006 MemoryRegionSection
>>> *section)
>>> 1007 {
>>> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
>>> 1009 return section - d->map.sections;
>>> 1010 }
>>> 1011
>>> 1012 static int subpage_register(subpage_t *mmio, uint32_t start,
>>> uint32_t end,
>>> 1013 uint16_t section);
>>>
>>> (gdb) p *section
>>> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv =
>>> 0x7fff998f6fd0,
>>> offset_within_region = 0, offset_within_address_space = 0, readonly =
>>> false,
>>> nonvolatile = false, unmergeable = 12}
>>>
>>> (gdb) p *section->fv
>>> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges =
>>> 0x7fff99486a60, nr = 0,
>>> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>>>
>>> (gdb) bt
>>> #0 0x0000555555e5118c in memory_region_section_get_iotlb
>>> (cpu=cpu@entry=0x55555894fdf0,
>>> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
>>> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
>>> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208,
>>> full=full@entry=0x7ffebeffbd60)
>>> at accel/tcg/cputlb.c:1088
>>> #2 0x0000555555e70a92 in tlb_set_page_with_attrs
>>> (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=...,
>>> prot=<optimized out>,
>>> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
>>> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0,
>>> addr=151138272,
>>> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6,
>>> probe=<optimized out>,
>>> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
>>> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>>> addr=151138272,
>>> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6,
>>> memop=memop@entry=MO_8,
>>> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/
>>> tcg/cputlb.c:1251
>>> #5 0x0000555555e6eb0d in probe_access_internal
>>> (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
>>> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
>>> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8,
>>> pfull=0x7ffebeffbfa0, retaddr=0,
>>> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
>>> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0,
>>> addr=addr@entry=151138272,
>>> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE,
>>> mmu_idx=<optimized out>,
>>> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/
>>> tcg/cputlb.c:1439
>>> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090,
>>> addr=151138272)
>>> at target/i386/tcg/system/excp_helper.c:68
>>> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
>>> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110,
>>> err=err@entry=0x7ffebeffc120,
>>> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
>>> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0,
>>> addr=18446741874686299840,
>>> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110,
>>> err=0x7ffebeffc120, ra=0)
>>> at target/i386/tcg/system/excp_helper.c:597
>>> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840,
>>> size=<optimized out>,
>>> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>,
>>> retaddr=0)
>>> at target/i386/tcg/system/excp_helper.c:617
>>> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>>> addr=18446741874686299840,
>>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>>> memop@entry=MO_32, size=-1739692016,
>>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>>> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
>>> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32,
>>> mmu_idx=mmu_idx@entry=4,
>>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>>> accel/tcg/cputlb.c:1652
>>> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/
>>> tcg/cputlb.c:1755
>>> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>>> cputlb.c:2364
>>> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0,
>>> addr=18446741874686299840, oi=36,
>>> ra=0) at accel/tcg/ldst_common.c.inc:165
>>> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0,
>>> addr=addr@entry=18446741874686299840,
>>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>>> ldst_common.c.inc:308
>>> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236,
>>> is_int=0, error_code=0,
>>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>>> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236,
>>> is_int=is_int@entry=0,
>>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>>> is_hw=is_hw@entry=1)
>>> at target/i386/tcg/seg_helper.c:1213
>>> #19 0x0000555555db884a in do_interrupt_x86_hardirq
>>> (env=env@entry=0x5555589529b0,
>>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>>> seg_helper.c:1245
>>> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
>>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>>> seg_helper.c:209
>>> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0,
>>> last_tb=<synthetic pointer>)
>>> at accel/tcg/cpu-exec.c:851
>>> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0,
>>> sc=sc@entry=0x7ffebeffc580)
>>> at accel/tcg/cpu-exec.c:955
>>> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
>>> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
>>> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at
>>> accel/tcg/cpu-exec.c:1059
>>> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
>>> at accel/tcg/tcg-accel-ops.c:80
>>> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>>> (arg=arg@entry=0x55555894fdf0)
>>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>>> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at
>>> util/qemu-thread-posix.c:541
>>> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>
>>>
>>> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr:
>>> Assertion `offset_in_ramblock(block, offset)' failed.
>>>
>>> (gdb) bt
>>> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>>> block=<optimized out>)
>>> at ./include/exec/ram_addr.h:91
>>> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>,
>>> offset=<optimized out>)
>>> at system/physmem.c:2238
>>> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized
>>> out>, addr=<optimized out>,
>>> size=0x0) at system/physmem.c:2430
>>> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>)
>>> at system/physmem.c:2443
>>> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>)
>>> at system/memory.c:2452
>>> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
>>> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
>>> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
>>> #8 0x0000555555e70a92 in tlb_set_page_with_attrs
>>> (cpu=cpu@entry=0x5555589f9f50,
>>> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696,
>>> attrs=...,
>>> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/
>>> tcg/cputlb.c:1193
>>> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50,
>>> addr=18446741874686299840,
>>> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4,
>>> probe=<optimized out>, retaddr=0)
>>> at target/i386/tcg/system/excp_helper.c:624
>>> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50,
>>> addr=18446741874686299840,
>>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>>> memop@entry=MO_32, size=-1115714056,
>>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>>> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
>>> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32,
>>> mmu_idx=mmu_idx@entry=4,
>>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>>> accel/tcg/cputlb.c:1652
>>> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/
>>> tcg/cputlb.c:1755
>>> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>>> cputlb.c:2364
>>> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10,
>>> addr=18446741874686299840, oi=36,
>>> ra=0) at accel/tcg/ldst_common.c.inc:165
>>> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10,
>>> addr=addr@entry=18446741874686299840,
>>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>>> ldst_common.c.inc:308
>>> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236,
>>> is_int=0, error_code=0,
>>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>>> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236,
>>> is_int=is_int@entry=0,
>>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>>> is_hw=is_hw@entry=1)
>>> at target/i386/tcg/seg_helper.c:1213
>>> #18 0x0000555555db884a in do_interrupt_x86_hardirq
>>> (env=env@entry=0x5555589fcb10,
>>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>>> seg_helper.c:1245
>>> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
>>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>>> seg_helper.c:209
>>> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50,
>>> last_tb=<synthetic pointer>)
>>> at accel/tcg/cpu-exec.c:851
>>> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50,
>>> sc=sc@entry=0x7ffebd7f9580)
>>> at accel/tcg/cpu-exec.c:955
>>> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
>>> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
>>> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at
>>> accel/tcg/cpu-exec.c:1059
>>> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
>>> at accel/tcg/tcg-accel-ops.c:80
>>> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>>> (arg=arg@entry=0x5555589f9f50)
>>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>>> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at
>>> util/qemu-thread-posix.c:541
>>> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>
>>> (gdb) frame 2
>>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>>> block=<optimized out>)
>>> at ./include/exec/ram_addr.h:91
>>> 91 assert(offset_in_ramblock(block, offset));
>>>
>>> (gdb) l
>>> 86 return (b && b->host && offset < b->used_length) ? true :
>>> false;
>>> 87 }
>>> 88
>>> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t
>>> offset)
>>> 90 {
>>> 91 assert(offset_in_ramblock(block, offset));
>>> 92 return (char *)block->host + offset;
>>> 93 }
>>> 94
>>> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void
>>> *host_addr,
>>>
>>>
>>> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>>>
>>> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
>>> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at
>>> system/physmem.c:337
>>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>>> (gdb) l
>>> 332 }
>>> 333 p = nodes[lp.ptr];
>>> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
>>> 335 }
>>> 336
>>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>>> 338 return &sections[lp.ptr];
>>> 339 } else {
>>> 340 return &sections[PHYS_SECTION_UNASSIGNED];
>>> 341 }
>>> (gdb)
>>>
>>>
>>> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
>>> this issue happening with 10.0 but not with 9.2. So some of the
>>> above failures might be from somewhere in the middle between 9.2 and
>>> 10.0. However, I was able to trigger some of the failures with
>>> 9.2.0 too, though with much lower probability. And some can be
>>> triggered on current master too, with much higher probability.
>>>
>>> On my 4-core notebook, the above command line fails once every 20..50 runs.
>>>
>>> I was never able to reproduce the assertion failure as shown in !1921.
>>>
>>> As of now, this issue is hitting debian trixie in debci: when a
>>> package which creates a guest image tries to run qemu, there's no
>>> kvm available in the debci environment, so it falls back to tcg.
>>>
>>> On IRC, Manos Pitsidianakis noted that he was debugging use-after-free
>>> with MemoryRegion recently, and posted a patch which can help a bit:
>>> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>>>
>>> I'm not sure where to go from here.
>>>
>>> Just collecting everything we have now.
>>>
>>> Thanks,
>>>
>>> /mjt
>>>
>>
>
> looks like a good target for TSAN, which might expose the race without
> really having to trigger it.
> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-testing-with-tsan
>
> Else, you can reproduce your run using rr record -h (chaos mode) [1],
> which randomly schedules threads, until it catches the segfault, and
> then you'll have a reproducible case to debug.
>
In case you never had opportunity to use rr, it is quite convenient,
because you can set a hardware watchpoint on your faulty pointer (watch
-l), do a reverse-continue, and in most cases, you'll directly reach
where the bug happened. Feels like cheating.
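For the archives, that workflow looks roughly like this (a sketch: the
watched location is a placeholder for whatever corrupted pointer the
crash exposes, and the qemu flags just mirror the reproducer from
earlier in the thread):

```
# record under chaos mode until a crashing run is captured
rr record -h ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0"

# replay deterministically; gdb stops at the same SIGSEGV every time
rr replay
(rr) watch -l section->fv        # hypothetical: watch the corrupted field
(rr) reverse-continue            # run backwards to the write that clobbered it
```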
> [1] https://github.com/rr-debugger/rr
>
> Regards,
> Pierrick
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: apparent race condition in mttcg memory handling
2025-07-21 16:29 ` Pierrick Bouvier
@ 2025-07-21 17:14 ` Michael Tokarev
2025-07-21 17:25 ` Pierrick Bouvier
0 siblings, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2025-07-21 17:14 UTC (permalink / raw)
To: Pierrick Bouvier, Philippe Mathieu-Daudé, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 21.07.2025 19:29, Pierrick Bouvier wrote:
> On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
..
>> looks like a good target for TSAN, which might expose the race without
>> really having to trigger it.
>> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-
>> testing-with-tsan
I think I tried with TSAN and it even gave something useful.
The problem now is for someone more familiar with this stuff
than me to reproduce it :)
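For reference, the TSAN build from the linked docs goes roughly like
this (a sketch: --enable-tsan, clang, and the in-tree suppressions file
are per the QEMU docs, but exact paths and flags may differ per version):

```shell
mkdir build-tsan && cd build-tsan
../configure --cc=clang --cxx=clang++ --enable-tsan
make -j"$(nproc)"
# a longer history helps TSAN reconstruct the second stack of a race report
TSAN_OPTIONS="suppressions=../tests/tsan/suppressions.tsan history_size=7" \
    ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0"
```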
>> Else, you can reproduce your run using rr record -h (chaos mode) [1],
>> which randomly schedules threads, until it catches the segfault, and
>> then you'll have a reproducible case to debug.
>
> In case you never had opportunity to use rr, it is quite convenient,
> because you can set a hardware watchpoint on your faulty pointer (watch
> -l), do a reverse-continue, and in most cases, you'll directly reach
> where the bug happened. Feels like cheating.
rr is the first thing I tried. Nope, it's absolutely hopeless. It
tried to boot just the kernel for over 30 minutes, after which I just
gave up.
Thanks,
/mjt
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:14 ` Michael Tokarev
@ 2025-07-21 17:25 ` Pierrick Bouvier
2025-07-21 17:28 ` Pierrick Bouvier
2025-07-21 17:31 ` Peter Maydell
0 siblings, 2 replies; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 17:25 UTC (permalink / raw)
To: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 10:14 AM, Michael Tokarev wrote:
> On 21.07.2025 19:29, Pierrick Bouvier wrote:
>> On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
> ..
>>> looks like a good target for TSAN, which might expose the race without
>>> really having to trigger it.
>>> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-
>>> testing-with-tsan
>
> I think I tried with TSAN and it even gave something useful.
> The problem now is for someone more familiar with this stuff
> than me to reproduce it :)
>
>>> Else, you can reproduce your run using rr record -h (chaos mode) [1],
>>> which randomly schedules threads, until it catches the segfault, and
>>> then you'll have a reproducible case to debug.
>>
>> In case you never had opportunity to use rr, it is quite convenient,
>> because you can set a hardware watchpoint on your faulty pointer (watch
>> -l), do a reverse-continue, and in most cases, you'll directly reach
>> where the bug happened. Feels like cheating.
>
> rr is the first thing I tried. Nope, it's absolutely hopeless. It
> tried to boot just the kernel for over 30 minutes, after which I just
> gave up.
>
I had a similar thing to debug recently, and with a simple loop, I
couldn't expose it easily. The bug I had was triggered with 3%
probability, which seems close to yours.
As rr record -h is single-threaded, I found it useful to write a wrapper
script [1] that runs one instance, and then run it in parallel using:
./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
With that, I could expose the bug in 2 minutes reliably (vs trying for
more than one hour before). With your 64 cores, I'm sure it will quickly
expose it.
Might be worth a try, as you need to only catch the bug once to be able
to reproduce it.
[1] https://github.com/pbo-linaro/qemu/blob/master/try_rme.sh
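A minimal version of such a wrapper (hypothetical; the real script is
linked as [1]) can just emit one command line per run and let parallel
fan them out, with rr's chaos mode (-h) randomizing thread scheduling
differently in each run:

```shell
#!/bin/sh
# emit_runs prints one rr-record command line per requested run; the qemu
# flags mirror the reproducer from earlier in the thread.
emit_runs() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'rr record -h ./qemu-system-x86_64 -smp 16 -m 256 -display none -kernel vmlinuz -initrd initrd -append console=ttyS0\n'
        i=$((i + 1))
    done
}

# e.g.: emit_runs 10000 | parallel --bar -j"$(nproc)"
```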
> Thanks,
>
> /mjt
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:25 ` Pierrick Bouvier
@ 2025-07-21 17:28 ` Pierrick Bouvier
2025-07-21 17:31 ` Peter Maydell
1 sibling, 0 replies; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 17:28 UTC (permalink / raw)
To: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 10:25 AM, Pierrick Bouvier wrote:
> On 7/21/25 10:14 AM, Michael Tokarev wrote:
>> On 21.07.2025 19:29, Pierrick Bouvier wrote:
>>> On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
>> ..
>>>> looks like a good target for TSAN, which might expose the race without
>>>> really having to trigger it.
>>>> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-
>>>> testing-with-tsan
>>
>> I think I tried with TSAN and it even gave something useful.
>> The problem now is for someone more familiar with this stuff
>> than me to reproduce it :)
>>
>>>> Else, you can reproduce your run using rr record -h (chaos mode) [1],
>>>> which randomly schedules threads, until it catches the segfault, and
>>>> then you'll have a reproducible case to debug.
>>>
>>> In case you never had opportunity to use rr, it is quite convenient,
>>> because you can set a hardware watchpoint on your faulty pointer (watch
>>> -l), do a reverse-continue, and in most cases, you'll directly reach
>>> where the bug happened. Feels like cheating.
>>
>> rr is the first thing I tried. Nope, it's absolutely hopeless. It
>> tried to boot just the kernel for over 30 minutes, after which I just
>> gave up.
>>
>
> I had a similar thing to debug recently, and with a simple loop, I
> couldn't expose it easily. The bug I had was triggered with 3%
> probability, which seems close to yours.
> As rr record -h is single-threaded, I found it useful to write a wrapper
> script [1] that runs one instance, and then run it in parallel using:
> ./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
>
> With that, I could expose the bug in 2 minutes reliably (vs trying for
> more than one hour before). With your 64 cores, I'm sure it will quickly
> expose it.
>
> Might be worth a try, as you need to only catch the bug once to be able
> to reproduce it.
>
> [1] https://github.com/pbo-linaro/qemu/blob/master/try_rme.sh
>
In this script, I finally used qemu's own rr (record/replay) feature, as
QEMU itself was working fine but there was a bug in the software stack
running inside it, which I wanted to investigate under gdbstub. But above
I was suggesting the same approach using rr (the tool).
>> Thanks,
>>
>> /mjt
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:25 ` Pierrick Bouvier
2025-07-21 17:28 ` Pierrick Bouvier
@ 2025-07-21 17:31 ` Peter Maydell
2025-07-21 17:52 ` Pierrick Bouvier
1 sibling, 1 reply; 12+ messages in thread
From: Peter Maydell @ 2025-07-21 17:31 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development,
Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On Mon, 21 Jul 2025 at 18:26, Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 7/21/25 10:14 AM, Michael Tokarev wrote:
> > rr is the first thing I tried. Nope, it's absolutely hopeless. It
> > tried to boot just the kernel for over 30 minutes, after which I just
> > gave up.
> >
>
> I had a similar thing to debug recently, and with a simple loop, I
> couldn't expose it easily. The bug I had was triggered with 3%
> probability, which seems close to yours.
> As rr record -h is single-threaded, I found it useful to write a wrapper
> script [1] that runs one instance, and then run it in parallel using:
> ./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
>
> With that, I could expose the bug in 2 minutes reliably (vs trying for
> more than one hour before). With your 64 cores, I'm sure it will quickly
> expose it.
I think the problem here is that the whole runtime to get to
point-of-potential failure is too long, not that it takes too
many runs to get a failure.
For that kind of thing I have had success in the past with
making a QEMU snapshot close to the point of failure so that
the actual runtime that it's necessary to record under rr is
reduced.
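Concretely, that could look like the following (a sketch: savevm needs a
writable qcow2 somewhere to store the vmstate, which the kernel-only
reproducer doesn't have, so a scratch drive is added; the snapshot tag
is arbitrary):

```shell
qemu-img create -f qcow2 state.qcow2 1G

# boot once, and in the monitor take a snapshot shortly before PCI init:
#   (qemu) savevm pre-pci
./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none -monitor stdio \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0" \
    -drive file=state.qcow2,if=virtio

# then only the short window from the snapshot onwards runs under rr:
rr record -h ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0" \
    -drive file=state.qcow2,if=virtio -loadvm pre-pci
```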
-- PMM
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:31 ` Peter Maydell
@ 2025-07-21 17:52 ` Pierrick Bouvier
0 siblings, 0 replies; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 17:52 UTC (permalink / raw)
To: Peter Maydell
Cc: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development,
Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 10:31 AM, Peter Maydell wrote:
> On Mon, 21 Jul 2025 at 18:26, Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> On 7/21/25 10:14 AM, Michael Tokarev wrote:
>>> rr is the first thing I tried. Nope, it's absolutely hopeless. It
>>> tried to boot just the kernel for over 30 minutes, after which I just
>>> gave up.
>>>
>>
>> I had a similar thing to debug recently, and with a simple loop, I
>> couldn't expose it easily. The bug I had was triggered with 3%
>> probability, which seems close to yours.
>> As rr record -h is single-threaded, I found it useful to write a wrapper
>> script [1] that runs one instance, and then run it in parallel using:
>> ./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
>>
>> With that, I could expose the bug in 2 minutes reliably (vs trying for
>> more than one hour before). With your 64 cores, I'm sure it will quickly
>> expose it.
>
> I think the problem here is that the whole runtime to get to
> point-of-potential failure is too long, not that it takes too
> many runs to get a failure.
>
> For that kind of thing I have had success in the past with
> making a QEMU snapshot close to the point of failure so that
> the actual runtime that it's necessary to record under rr is
> reduced.
>
That's a good idea indeed. In the bug I had, it was due to the KASLR
address chosen, so by using a snapshot I would not have been able to
expose the random aspect.
In the case of the current bug, it seems to be a proper race condition,
so trying more combinations with a preloaded snapshot, saving a few
seconds per run, is a good idea.
> -- PMM
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-05-30 19:20 apparent race condition in mttcg memory handling Michael Tokarev
2025-06-04 10:47 ` Michael Tokarev
2025-07-21 11:47 ` Philippe Mathieu-Daudé
@ 2025-07-22 20:11 ` Gustavo Romero
2025-07-23 6:31 ` Michael Tokarev
2 siblings, 1 reply; 12+ messages in thread
From: Gustavo Romero @ 2025-07-22 20:11 UTC (permalink / raw)
To: Michael Tokarev, QEMU Development
Hi Michael,
On 5/30/25 16:20, Michael Tokarev wrote:
> Hi!
>
> For quite some time (almost the whole day yesterday) I've been trying
> to find out what's going on with mttcg in qemu. There's apparently a
> race condition somewhere, like a use-after-free or something.
>
> It started as an incarnation of
> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
> failure, but on an x86_64 host this time (it's also mentioned in that
> issue).
>
> However, that particular assertion failure is not the only possible
> outcome. We're hitting multiple assertion failures or SIGSEGVs in
> physmem.c and related files, - 4 or 5 different places so far.
>
> The problem here is that the bug is rather difficult to reproduce.
> What I've been using so far was to make most host cores busy, and
> specify amount of virtual CPUs close to actual host cores (threads).
>
> For example, on my 4-core, 8-thread notebook, I used `stress -c 8`
> and ran qemu with -smp 10 to trigger this issue. However, on this
> very notebook it is really difficult to trigger, - it happens once
> every 30..50 runs or so.
>
> The reproducer I was using just boots the kernel - no userspace
> is needed. Qemu either crashes during kernel init, or it runs fine.
>
> I used regular kernel from debian sid:
> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-image-amd64_6.12.29-1_amd64.deb
> Extract vmlinuz-6.12.29-amd64 from there.
The link above is broken. Googling for "linux-image-amd64_6.12.29-1_amd64.deb" wasn't successful either.
Could you point out another image to reproduce it?
Cheers,
Gustavo
> In order to simplify the reproducing, I created a tiny initrd with
> just one executable in there, which does a poweroff:
>
> cat >poweroff.c <<'EOF'
> #include <sys/reboot.h>
> #include <unistd.h>
>
> int main(void) {
> reboot(RB_POWER_OFF);
> sleep(5);
> return 0;
> }
> EOF
> diet gcc -static -o init poweroff.c
> echo init | cpio -o -H newc > initrd
>
> (it uses dietlibc, optional, just to make the initrd smaller).
>
> Now, the qemu invocation I used:
>
> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
> -append "console=ttyS0" \
> -vga none -display none \
> -serial file:/dev/tty \
> -monitor stdio \
> -m 256 \
> -smp 16
>
> This way, it either succeeds, terminating normally due to
> the initrd halting the system, or it will segfault or assert
> as per the issue.
>
> For a 64-core machine, I used -smp 64, and had 16..40 cores
> being busy with other stuff. Also, adding `nice' in front
> of that command apparently helps.
>
> Now, to the various issues/places I've hit. Here's a typical
> output:
>
> ...
> [ 3.129806] smpboot: x86: Booting SMP configuration:
> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494389] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494389] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494396] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494396] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494401] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494401] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494408] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494408] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494415] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494415] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 5.864038] smp: Brought up 1 node, 10 CPUs
> [ 5.865772] smpboot: Total of 10 processors activated (25983.25 BogoMIPS)
> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code, 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K cma-reserved)
> [ 6.591933] devtmpfs: initialized
> [ 6.635844] x86/mm: Memory block size: 128MB
> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes, linear)
> [ 6.840775] pinctrl core: initialized pinctrl subsystem
> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
> [ 7.195902] audit: initializing netlink subsys (disabled)
> [ 7.223865] audit: type=2000 audit(1748628013.324:1): state=initialized audit_enabled=0 res=1
> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
> [ 7.316281] cpuidle: using governor ladder
> [ 7.331907] cpuidle: using governor menu
> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [ 7.407802] PCI: Using configuration type 1 for base access
> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
> [ 7.420691] mtrr: corrected configuration.
> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
> [ 7.816900] ACPI: Added _OSI(Module Device)
> [ 7.819950] ACPI: Added _OSI(Processor Device)
> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
> [ 8.355952] ACPI: Interpreter enabled
> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
> [ 8.468051] PCI: Using E820 reservations for host bridge windows
> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> [ 9.297562] acpiphp: Slot [2] registered
> ...
> [ 9.369007] PCI host bridge to bus 0000:00
> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
> [ 9.383826] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> [ 9.387818] pci_bus 0000:00: root bus resource [mem 0x10000000-0xfebfffff window]
> [ 9.393681] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window]
> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 conventional PCI endpoint
> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 conventional PCI endpoint
> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 conventional PCI endpoint
> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
> Segmentation fault
>
>
> So it breaks somewhere in PCI init, after SMP/CPUs has been inited
> by the guest kernel.
>
> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307, addr=182591488) at ../system/physmem.c:309
> 309 return int128_gethi(section->size) ||
> (gdb) p *section
> Cannot access memory at address 0x7fff58307
>
> This one has been seen multiple times.
>
> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
> 0x0000555555e30382 in memory_region_section_get_iotlb (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
> ../system/physmem.c:1002
> 1002 return section - d->map.sections;
> d is NULL here
>
>
> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at include/exec/memory.h:1756
> 1756 if (mr->alias) {
> (gdb) p *mr
> Cannot access memory at address 0xffffffc1ffffffc1
> (gdb) frame 1
> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at include/exec/memory.h:1757
> 1757 return memory_region_get_iommu(mr->alias);
> (gdb) p mr
> $1 = (MemoryRegion *) 0x7fff54239a10
>
>
> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
> [
> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>
> (gdb) l
> 1004 /* Called from RCU critical section */
> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> 1006 MemoryRegionSection *section)
> 1007 {
> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
> 1009 return section - d->map.sections;
> 1010 }
> 1011
> 1012 static int subpage_register(subpage_t *mmio, uint32_t start, uint32_t end,
> 1013 uint16_t section);
>
> (gdb) p *section
> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv = 0x7fff998f6fd0,
> offset_within_region = 0, offset_within_address_space = 0, readonly = false,
> nonvolatile = false, unmergeable = 12}
>
> (gdb) p *section->fv
> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges = 0x7fff99486a60, nr = 0,
> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>
> (gdb) bt
> #0 0x0000555555e5118c in memory_region_section_get_iotlb (cpu=cpu@entry=0x55555894fdf0,
> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208, full=full@entry=0x7ffebeffbd60)
> at accel/tcg/cputlb.c:1088
> #2 0x0000555555e70a92 in tlb_set_page_with_attrs (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=..., prot=<optimized out>,
> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=151138272,
> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6, probe=<optimized out>,
> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0, addr=151138272,
> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6, memop=memop@entry=MO_8,
> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/tcg/cputlb.c:1251
> #5 0x0000555555e6eb0d in probe_access_internal (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0, retaddr=0,
> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0, addr=addr@entry=151138272,
> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/tcg/cputlb.c:1439
> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090, addr=151138272)
> at target/i386/tcg/system/excp_helper.c:68
> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110, err=err@entry=0x7ffebeffc120,
> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0, addr=18446741874686299840,
> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110, err=0x7ffebeffc120, ra=0)
> at target/i386/tcg/system/excp_helper.c:597
> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840, size=<optimized out>,
> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>, retaddr=0)
> at target/i386/tcg/system/excp_helper.c:617
> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0, addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8, memop@entry=MO_32, size=-1739692016,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32, mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at accel/tcg/cputlb.c:1652
> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/tcg/cputlb.c:1755
> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/cputlb.c:2364
> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0, addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0, addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/ldst_common.c.inc:308
> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236, is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236, is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0, is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #19 0x0000555555db884a in do_interrupt_x86_hardirq (env=env@entry=0x5555589529b0,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/seg_helper.c:1245
> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
> interrupt_request=<optimized out>) at target/i386/tcg/system/seg_helper.c:209
> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0, last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0, sc=sc@entry=0x7ffebeffc580)
> at accel/tcg/cpu-exec.c:955
> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at accel/tcg/cpu-exec.c:1059
> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops.c:80
> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn (arg=arg@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at util/qemu-thread-posix.c:541
> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
>
> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr: Assertion `offset_in_ramblock(block, offset)' failed.
>
> (gdb) bt
> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833, block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>, offset=<optimized out>)
> at system/physmem.c:2238
> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized out>, addr=<optimized out>,
> size=0x0) at system/physmem.c:2430
> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>) at system/physmem.c:2443
> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>) at system/memory.c:2452
> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
> #8 0x0000555555e70a92 in tlb_set_page_with_attrs (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696, attrs=...,
> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/tcg/cputlb.c:1193
> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50, addr=18446741874686299840,
> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>, retaddr=0)
> at target/i386/tcg/system/excp_helper.c:624
> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50, addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8, memop@entry=MO_32, size=-1115714056,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32, mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at accel/tcg/cputlb.c:1652
> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/tcg/cputlb.c:1755
> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/cputlb.c:2364
> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10, addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10, addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/ldst_common.c.inc:308
> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236, is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236, is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0, is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #18 0x0000555555db884a in do_interrupt_x86_hardirq (env=env@entry=0x5555589fcb10,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/seg_helper.c:1245
> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
> interrupt_request=<optimized out>) at target/i386/tcg/system/seg_helper.c:209
> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50, last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50, sc=sc@entry=0x7ffebd7f9580)
> at accel/tcg/cpu-exec.c:955
> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at accel/tcg/cpu-exec.c:1059
> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops.c:80
> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn (arg=arg@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at util/qemu-thread-posix.c:541
> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
> (gdb) frame 2
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833, block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> 91 assert(offset_in_ramblock(block, offset));
>
> (gdb) l
> 86 return (b && b->host && offset < b->used_length) ? true : false;
> 87 }
> 88
> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
> 90 {
> 91 assert(offset_in_ramblock(block, offset));
> 92 return (char *)block->host + offset;
> 93 }
> 94
> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
>
>
> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>
> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at system/physmem.c:337
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> (gdb) l
> 332 }
> 333 p = nodes[lp.ptr];
> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
> 335 }
> 336
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> 338 return &sections[lp.ptr];
> 339 } else {
> 340 return &sections[PHYS_SECTION_UNASSIGNED];
> 341 }
> (gdb)
>
>
> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
> this issue happening with 10.0 but not with 9.2. So some of the
> above failures might be from somewhere in the middle between 9.2
> and 10.0. However, I was able to trigger some of the failures with
> 9.2.0, though with much lower probability. And some can be triggered
> on current master too, with much higher probability.
>
> On my 4-core notebook, the above command line fails once every 20..50 runs.
>
> I was never able to reproduce the assertion failure as shown in !1921.
>
> As of now, this issue is hitting debian trixie in debci: a package
> which creates a guest image tries to run qemu, but there's no kvm
> available in the debci environment, so it falls back to tcg.
>
> On IRC, Manos Pitsidianakis noted that he was debugging use-after-free
> with MemoryRegion recently, and posted a patch which can help a bit:
> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>
> I'm not sure where to go from here.
>
> Just collecting everything we have now.
>
> Thanks,
>
> /mjt
>
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: apparent race condition in mttcg memory handling
2025-07-22 20:11 ` Gustavo Romero
@ 2025-07-23 6:31 ` Michael Tokarev
0 siblings, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2025-07-23 6:31 UTC (permalink / raw)
To: Gustavo Romero, QEMU Development
On 22.07.2025 23:11, Gustavo Romero wrote:
...
>> The reproducer I was using - it was just booting kernel, no user-
>> space is needed. Qemu crashes during kernel init, or it runs fine.
>>
>> I used regular kernel from debian sid:
>> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-
>> image-amd64_6.12.29-1_amd64.deb
>> Extract vmlinuz-6.12.29-amd64 from there.
>
> The link above is broken. Googling for "linux-image-
> amd64_6.12.29-1_amd64.deb" wasn't successful either.
>
> Could you point out another image to reproduce it?
Please see https://gitlab.com/qemu-project/qemu/-/issues/3040 --
actual kernel version isn't important, I guess any kernel will
do. Yesterday I used the current debian kernel, from
https://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-image-6.12.37+deb13-amd64_6.12.37-1_amd64.deb
with current qemu master, and I was able to reproduce the issue
within 4 invocations on my laptop (4 cores, 8 threads), running
qemu with -smp 8 while running `stress -c10` at the same time.
Thanks,
/mjt
^ permalink raw reply [flat|nested] 12+ messages in thread