* Re: apparent race condition in mttcg memory handling
2025-05-30 19:20 apparent race condition in mttcg memory handling Michael Tokarev
@ 2025-06-04 10:47 ` Michael Tokarev
2025-07-21 11:47 ` Philippe Mathieu-Daudé
2025-07-22 20:11 ` Gustavo Romero
2 siblings, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2025-06-04 10:47 UTC (permalink / raw)
To: QEMU Development
Here's a typical output with ASan enabled, fwiw:
$ ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none -kernel
/boot/vmlinuz-6.12.29-amd64 -append "console=ttyS0" -serial
file:/dev/tty -monitor stdio -initrd ~/debvm/initrd
==368707==WARNING: ASan doesn't fully support makecontext/swapcontext
functions and may produce false positives in some cases!
QEMU 10.0.50 monitor - type 'help' for more information
(qemu) [ 0.000000] Linux version 6.12.29-amd64
(debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian
14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP
PREEMPT_DYNAMIC Debian 6.12.29-1 (2025-05-18)
[ 0.000000] Command line: console=ttyS0
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000ffe0000-0x000000000fffffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff]
reserved
[ 0.000000] BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff]
reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] APIC: Static calls initialized
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.000000] DMI: Memory slots populated: 1/1
[ 0.000000] tsc: Fast TSC calibration failed
[ 0.000000] AGP: No AGP bridge found
[ 0.000000] last_pfn = 0xffe0 max_arch_pfn = 0x400000000
[ 0.000000] MTRR map: 4 entries (3 fixed + 1 variable; max 19), built
from 8 variable MTRRs
[ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC-
WT
[ 0.000000] found SMP MP-table at [mem 0x000f5480-0x000f548f]
[ 0.000000] RAMDISK: [mem 0x0ffdb000-0x0ffdffff]
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x00000000000F52A0 000014 (v00 BOCHS )
[ 0.000000] ACPI: RSDT 0x000000000FFE28F3 000034 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: FACP 0x000000000FFE272F 000074 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: DSDT 0x000000000FFE0040 0026EF (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: FACS 0x000000000FFE0000 000040
[ 0.000000] ACPI: APIC 0x000000000FFE27A3 0000F0 (v03 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: HPET 0x000000000FFE2893 000038 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: WAET 0x000000000FFE28CB 000028 (v01 BOCHS BXPC
00000001 BXPC 00000001)
[ 0.000000] ACPI: Reserving FACP table memory at [mem
0xffe272f-0xffe27a2]
[ 0.000000] ACPI: Reserving DSDT table memory at [mem
0xffe0040-0xffe272e]
[ 0.000000] ACPI: Reserving FACS table memory at [mem
0xffe0000-0xffe003f]
[ 0.000000] ACPI: Reserving APIC table memory at [mem
0xffe27a3-0xffe2892]
[ 0.000000] ACPI: Reserving HPET table memory at [mem
0xffe2893-0xffe28ca]
[ 0.000000] ACPI: Reserving WAET table memory at [mem
0xffe28cb-0xffe28f2]
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000000ffdffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x0ffb0680-0x0ffdafff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x000000000ffdffff]
[ 0.000000] Normal empty
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x000000000ffdffff]
[ 0.000000] Initmem setup node 0 [mem
0x0000000000001000-0x000000000ffdffff]
[ 0.000000] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.000000] On node 0, zone DMA: 97 pages in unavailable ranges
[ 0.000000] On node 0, zone DMA32: 32 pages in unavailable ranges
[ 0.000000] ACPI: PM-Timer IO Port: 0x608
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI
0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[ 0.000000] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] CPU topo: Max. logical packages: 1
[ 0.000000] CPU topo: Max. logical dies: 1
[ 0.000000] CPU topo: Max. dies per package: 1
[ 0.000000] CPU topo: Max. threads per core: 1
[ 0.000000] CPU topo: Num. cores per package: 16
[ 0.000000] CPU topo: Num. threads per package: 16
[ 0.000000] CPU topo: Allowing 16 present CPUs plus 0 hotplug CPUs
[ 0.000000] PM: hibernation: Registered nosave memory: [mem
0x00000000-0x00000fff]
[ 0.000000] PM: hibernation: Registered nosave memory: [mem
0x0009f000-0x000fffff]
[ 0.000000] [mem 0x10000000-0xfffbffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff
max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:16
nr_cpu_ids:16 nr_node_ids:1
[ 0.000000] percpu: Embedded 66 pages/cpu s233472 r8192 d28672 u524288
[ 0.000000] Kernel command line: console=ttyS0
[ 0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144
bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072
bytes, linear)
[ 0.000000] Fallback order for Node 0: 0
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 65406
[ 0.000000] Policy zone: DMA32
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:on, heap free:off
[ 0.000000] AGP: Checking aperture...
[ 0.000000] AGP: No AGP bridge found
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[ 0.000000] ftrace: allocating 45689 entries in 179 pages
[ 0.000000] ftrace: allocated 179 pages with 5 groups
[ 0.000000] Dynamic Preempt: voluntary
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=8192 to
nr_cpu_ids=16.
[ 0.000000] Trampoline variant of Tasks RCU enabled.
[ 0.000000] Rude variant of Tasks RCU enabled.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
[ 0.000000] RCU Tasks: Setting shift to 4 and lim to 1
rcu_task_cb_adjust=1 rcu_task_cpu_ids=16.
[ 0.000000] RCU Tasks Rude: Setting shift to 4 and lim to 1
rcu_task_cb_adjust=1 rcu_task_cpu_ids=16.
[ 0.000000] RCU Tasks Trace: Setting shift to 4 and lim to 1
rcu_task_cb_adjust=1 rcu_task_cpu_ids=16.
[ 0.000000] NR_IRQS: 524544, nr_irqs: 552, preallocated irqs: 16
[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on
contention.
[ 0.000000] Console: colour *CGA 80x25
[ 0.000000] printk: legacy console [ttyS0] enabled
[ 0.000000] ACPI: Core revision 20240827
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 19112604467 ns
[ 0.060000] APIC: Switch to symmetric I/O mode setup
[ 0.136000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.172000] tsc: Unable to calibrate against PIT
[ 0.172000] tsc: using HPET reference calibration
[ 0.176000] tsc: Detected 2096.090 MHz processor
[ 0.007755] clocksource: tsc-early: mask: 0xffffffffffffffff
max_cycles: 0x1e36c30ca71, max_idle_ns: 440795294664 ns
[ 0.019694] Calibrating delay loop (skipped), value calculated using
timer frequency.. 4192.18 BogoMIPS (lpj=8384360)
[ 0.081754] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
[ 0.083138] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
[ 0.093255] Spectre V1 : Mitigation: usercopy/swapgs barriers and
__user pointer sanitization
[ 0.102414] Spectre V2 : Mitigation: Retpolines
[ 0.105952] Spectre V2 : Spectre v2 / SpectreRSB: Filling RSB on
context switch and VMEXIT
[ 0.160434] x86/fpu: x87 FPU will use FXSAVE
[ 3.002703] Freeing SMP alternatives memory: 40K
[ 3.023274] pid_max: default: 32768 minimum: 301
[ 3.122961] LSM: initializing
lsm=lockdown,capability,landlock,yama,apparmor,tomoyo,bpf,ipe,ima,evm
[ 3.172533] landlock: Up and running.
[ 3.173855] Yama: disabled by default; enable with sysctl kernel.yama.*
[ 3.269917] AppArmor: AppArmor initialized
[ 3.275313] TOMOYO Linux initialized
[ 3.305559] LSM support for eBPF active
[ 3.381819] Mount-cache hash table entries: 512 (order: 0, 4096
bytes, linear)
[ 3.386196] Mountpoint-cache hash table entries: 512 (order: 0, 4096
bytes, linear)
[ 4.149559] smpboot: CPU0: AMD QEMU Virtual CPU version 2.5+ (family:
0xf, model: 0x6b, stepping: 0x1)
[ 4.326143] Performance Events: PMU not available due to
virtualization, using software events only.
[ 4.358224] signal: max sigframe size: 1440
[ 4.378978] rcu: Hierarchical SRCU implementation.
[ 4.382048] rcu: Max phase no-delay instances is 1000.
[ 4.418254] Timer migration: 2 hierarchy levels; 8 children per
group; 2 crossnode level
[ 4.558206] NMI watchdog: Perf NMI watchdog permanently disabled
[ 4.603431] smp: Bringing up secondary CPUs ...
[ 4.702376] smpboot: x86: Booting SMP configuration:
[ 4.703724] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
#8 #9 #10 #11 #12 #13 #14 #15
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 4
= 9105957
[ 0.000000] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 0.000000] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 1
= 28440919
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 3
= 20962063
[ 0.000000] calibrate_delay_direct() dropping max bogoMips estimate 4
= 11352022
[ 0.000000] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 0.000000] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969337] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969337] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969343] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969343] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969348] calibrate_delay_direct() dropping max bogoMips estimate 2
= 27830974
[ 5.969358] calibrate_delay_direct() dropping max bogoMips estimate 3
= 30234130
[ 5.969358] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969358] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969364] calibrate_delay_direct() dropping max bogoMips estimate 1
= 21780255
[ 5.969364] calibrate_delay_direct() dropping min bogoMips estimate 3
= 7553311
[ 5.969364] calibrate_delay_direct() dropping min bogoMips estimate 4
= 8179132
[ 5.969369] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969369] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969374] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969374] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969389] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969389] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969400] calibrate_delay_direct() dropping min bogoMips estimate 1
= 1631122
[ 5.969405] calibrate_delay_direct() dropping min bogoMips estimate 0
= 8501104
[ 5.969410] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969410] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 5.969415] calibrate_delay_direct() dropping max bogoMips estimate 1
= 9766470
[ 5.969415] calibrate_delay_direct() failed to get a good estimate
for loops_per_jiffy.
[ 5.969415] Probably due to long platform interrupts. Consider using
"lpj=" boot option.
[ 7.946795] smp: Brought up 1 node, 16 CPUs
[ 7.949559] smpboot: Total of 16 processors activated (36914.04 BogoMIPS)
[ 8.167796] Memory: 197656K/261624K available (16384K kernel code,
2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 54800K reserved, 0K
cma-reserved)
[ 8.433923] devtmpfs: initialized
[ 8.547308] x86/mm: Memory block size: 128MB
[ 8.751207] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 8.775080] futex hash table entries: 4096 (order: 6, 262144 bytes,
linear)
[ 8.868262] pinctrl core: initialized pinctrl subsystem
[ 9.322265] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 9.434496] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
allocations
[ 9.446267] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
atomic allocations
[ 9.450210] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
atomic allocations
[ 9.455908] audit: initializing netlink subsys (disabled)
[ 9.494877] audit: type=2000 audit(1749033951.660:1):
state=initialized audit_enabled=0 res=1
[ 9.622753] thermal_sys: Registered thermal governor 'fair_share'
[ 9.623234] thermal_sys: Registered thermal governor 'bang_bang'
[ 9.625842] thermal_sys: Registered thermal governor 'step_wise'
[ 9.629649] thermal_sys: Registered thermal governor 'user_space'
[ 9.633699] thermal_sys: Registered thermal governor 'power_allocator'
[ 9.653949] cpuidle: using governor ladder
[ 9.661815] cpuidle: using governor menu
[ 9.696090] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 9.781559] PCI: Using configuration type 1 for base access
[ 9.797961] mtrr: your CPUs had inconsistent fixed MTRR settings
[ 9.801893] mtrr: your CPUs had inconsistent variable MTRR settings
[ 9.806120] mtrr: your CPUs had inconsistent MTRRdefType settings
[ 9.807416] mtrr: probably your BIOS does not setup all CPUs.
[ 9.808407] mtrr: corrected configuration.
[ 9.858380] kprobes: kprobe jump-optimization is enabled. All kprobes
are optimized if possible.
[ 10.012084] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[ 10.013878] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[ 10.348768] ACPI: Added _OSI(Module Device)
[ 10.349947] ACPI: Added _OSI(Processor Device)
[ 10.353682] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 10.357664] ACPI: Added _OSI(Processor Aggregator Device)
[ 10.678343] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 11.221996] ACPI: Interpreter enabled
[ 11.262899] ACPI: PM: (supports S0 S3 S4 S5)
[ 11.270094] ACPI: Using IOAPIC for interrupt routing
[ 11.290614] PCI: Using host bridge windows from ACPI; if necessary,
use "pci=nocrs" and report a bug
[ 11.302139] PCI: Using E820 reservations for host bridge windows
[ 11.353959] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 12.252675] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 12.287708] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
MSI HPX-Type3]
[ 12.290520] acpi PNP0A03:00: _OSC: not requesting OS control; OS
requires [ExtendedConfig ASPM ClockPM MSI]
[ 12.326309] acpi PNP0A03:00: fail to add MMCONFIG information, can't
access extended configuration space under this bridge
[ 12.537420] acpiphp: Slot [2] registered
[ 12.542702] acpiphp: Slot [3] registered
[ 12.546456] acpiphp: Slot [4] registered
[ 12.550273] acpiphp: Slot [5] registered
[ 12.554307] acpiphp: Slot [6] registered
[ 12.558329] acpiphp: Slot [7] registered
[ 12.558515] acpiphp: Slot [8] registered
[ 12.560865] acpiphp: Slot [9] registered
[ 12.561559] acpiphp: Slot [10] registered
[ 12.561559] acpiphp: Slot [11] registered
[ 12.566400] acpiphp: Slot [12] registered
[ 12.574391] acpiphp: Slot [13] registered
[ 12.578194] acpiphp: Slot [14] registered
[ 12.580588] acpiphp: Slot [15] registered
[ 12.586418] acpiphp: Slot [16] registered
[ 12.587678] acpiphp: Slot [17] registered
[ 12.588808] acpiphp: Slot [18] registered
[ 12.594504] acpiphp: Slot [19] registered
[ 12.602435] acpiphp: Slot [20] registered
[ 12.603927] acpiphp: Slot [21] registered
[ 12.606341] acpiphp: Slot [22] registered
[ 12.607797] acpiphp: Slot [23] registered
[ 12.608969] acpiphp: Slot [24] registered
[ 12.609559] acpiphp: Slot [25] registered
[ 12.609559] acpiphp: Slot [26] registered
[ 12.609559] acpiphp: Slot [27] registered
[ 12.610162] acpiphp: Slot [28] registered
[ 12.611594] acpiphp: Slot [29] registered
[ 12.612960] acpiphp: Slot [30] registered
[ 12.614401] acpiphp: Slot [31] registered
[ 12.620799] PCI host bridge to bus 0000:00
[ 12.630278] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 12.639483] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 12.641728] pci_bus 0000:00: root bus resource [mem
0x000a0000-0x000bffff window]
[ 12.644426] pci_bus 0000:00: root bus resource [mem
0x10000000-0xfebfffff window]
[ 12.645559] pci_bus 0000:00: root bus resource [mem
0x100000000-0x17fffffff window]
[ 12.659495] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 12.713130] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
conventional PCI endpoint
[ 12.896856] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
conventional PCI endpoint
[ 12.920028] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
conventional PCI endpoint
[ 12.991922] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
[ 13.005559] pci 0000:00:01.1: BAR 0 [io 0x01f0-0x01f7]: legacy IDE quirk
[ 13.005559] pci 0000:00:01.1: BAR 1 [io 0x03f6]: legacy IDE quirk
[ 13.005559] pci 0000:00:01.1: BAR 2 [io 0x0170-0x0177]: legacy IDE quirk
[ 13.013769] pci 0000:00:01.1: BAR 3 [io 0x0376]: legacy IDE quirk
[ 13.026884] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
conventional PCI endpoint
[ 13.045860] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by
PIIX4 ACPI
[ 13.055916] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by
PIIX4 SMB
[ 13.059147] pci 0000:00:01.3: quirk_piix4_acpi+0x0/0x180 took 19531 usecs
[ 13.079357] pci 0000:00:02.0: [8086:100e] type 00 class 0x020000
conventional PCI endpoint
=================================================================
==368707==ERROR: AddressSanitizer: heap-use-after-free on address
0x6060003d5f80 at pc 0x55ae8aeb437f bp 0x7f96d99f5500 sp 0x7f96d99f54f8
READ of size 8 at 0x6060003d5f80 thread T10
#0 0x55ae8aeb437e in address_space_lookup_region
../../home/mjt/qemu/master/system/physmem.c:350
#1 0x55ae8aeb4648 in address_space_translate_internal
../../home/mjt/qemu/master/system/physmem.c:374
#2 0x55ae8aeb65b6 in address_space_translate_for_iotlb
../../home/mjt/qemu/master/system/physmem.c:698
#3 0x55ae8b0c938f in tlb_set_page_full
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1052
#4 0x55ae8b0ca499 in tlb_set_page_with_attrs
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1199
#5 0x55ae8b2370c0 in x86_cpu_tlb_fill
../../home/mjt/qemu/master/target/i386/tcg/system/excp_helper.c:628
#6 0x55ae8b0caa74 in tlb_fill_align
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1257
#7 0x55ae8b0cfc75 in mmu_lookup1
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1658
#8 0x55ae8b0d0534 in mmu_lookup
../../home/mjt/qemu/master/accel/tcg/cputlb.c:1761
#9 0x55ae8b0d3a3b in do_ld4_mmu
../../home/mjt/qemu/master/accel/tcg/cputlb.c:2374
#10 0x55ae8b0d8ad0 in cpu_ldl_mmu
../../home/mjt/qemu/master/accel/tcg/ldst_common.c.inc:165
#11 0x55ae8b3b11d9 in cpu_ldl_le_mmuidx_ra
/home/mjt/qemu/master/include/accel/tcg/cpu-ldst.h:142
#12 0x55ae8b3b8373 in do_interrupt64
../../home/mjt/qemu/master/target/i386/tcg/seg_helper.c:979
#13 0x55ae8b3ba0bd in do_interrupt_all
../../home/mjt/qemu/master/target/i386/tcg/seg_helper.c:1238
#14 0x55ae8b3ba2bf in do_interrupt_x86_hardirq
../../home/mjt/qemu/master/target/i386/tcg/seg_helper.c:1270
#15 0x55ae8b245071 in x86_cpu_exec_interrupt
../../home/mjt/qemu/master/target/i386/tcg/system/seg_helper.c:209
#16 0x55ae8b0a067c in cpu_handle_interrupt
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:821
#17 0x55ae8b0a15e4 in cpu_exec_loop
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:925
#18 0x55ae8b0a173b in cpu_exec_setjmp
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:999
#19 0x55ae8b0a1905 in cpu_exec
../../home/mjt/qemu/master/accel/tcg/cpu-exec.c:1025
#20 0x55ae8b0f0e48 in tcg_cpu_exec
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops.c:81
#21 0x55ae8b0f2b12 in mttcg_cpu_thread_fn
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops-mttcg.c:94
#22 0x55ae8ba9c4d5 in qemu_thread_start
../../home/mjt/qemu/master/util/qemu-thread-posix.c:541
#23 0x7f97736c11f4 in start_thread nptl/pthread_create.c:442
#24 0x7f977374189b in clone3
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
0x6060003d5f80 is located 0 bytes inside of 64-byte region
[0x6060003d5f80,0x6060003d5fc0)
freed by thread T1 here:
#0 0x7f9774eb76a8 in __interceptor_free
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:52
#1 0x55ae8aec5bc1 in address_space_dispatch_free
../../home/mjt/qemu/master/system/physmem.c:2716
#2 0x55ae8ae92afc in flatview_destroy
../../home/mjt/qemu/master/system/memory.c:295
#3 0x55ae8bab996c in call_rcu_thread
../../home/mjt/qemu/master/util/rcu.c:301
#4 0x55ae8ba9c4d5 in qemu_thread_start
../../home/mjt/qemu/master/util/qemu-thread-posix.c:541
#5 0x7f97736c11f4 in start_thread nptl/pthread_create.c:442
previously allocated by thread T4 here:
#0 0x7f9774eb83b7 in __interceptor_calloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x7f97746e3670 in g_malloc0
(/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5a670)
#2 0x55ae8ae96bd4 in generate_memory_topology
../../home/mjt/qemu/master/system/memory.c:758
#3 0x55ae8ae9ad9b in flatviews_reset
../../home/mjt/qemu/master/system/memory.c:1074
#4 0x55ae8ae9b2cf in memory_region_transaction_commit
../../home/mjt/qemu/master/system/memory.c:1150
#5 0x55ae8aea612b in memory_region_del_subregion
../../home/mjt/qemu/master/system/memory.c:2700
#6 0x55ae8ab3fae5 in pci_update_mappings
../../home/mjt/qemu/master/hw/pci/pci.c:1717
#7 0x55ae8ab4044d in pci_default_write_config
../../home/mjt/qemu/master/hw/pci/pci.c:1790
#8 0x55ae8a9e5e60 in e1000_write_config
../../home/mjt/qemu/master/hw/net/e1000.c:1618
#9 0x55ae8ab4ca87 in pci_host_config_write_common
../../home/mjt/qemu/master/hw/pci/pci_host.c:96
#10 0x55ae8ab4cf39 in pci_data_write
../../home/mjt/qemu/master/hw/pci/pci_host.c:138
#11 0x55ae8ab4d1cf in pci_host_data_write
../../home/mjt/qemu/master/hw/pci/pci_host.c:188
#12 0x55ae8ae94551 in memory_region_write_accessor
../../home/mjt/qemu/master/system/memory.c:488
#13 0x55ae8ae94beb in access_with_adjusted_size
../../home/mjt/qemu/master/system/memory.c:564
#14 0x55ae8ae9d6aa in memory_region_dispatch_write
../../home/mjt/qemu/master/system/memory.c:1544
#15 0x55ae8aecc896 in address_space_stw_internal
../../home/mjt/qemu/master/system/memory_ldst.c.inc:415
#16 0x55ae8aeccad7 in address_space_stw
../../home/mjt/qemu/master/system/memory_ldst.c.inc:446
#17 0x55ae8b2391a1 in helper_outw
../../home/mjt/qemu/master/target/i386/tcg/system/misc_helper.c:45
#18 0x7f96eef65a4d (/memfd:tcg-jit (deleted)+0x1166a4d)
Thread T10 created by T0 here:
#0 0x7f9774e49726 in __interceptor_pthread_create
../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
#1 0x55ae8ba9c9a7 in qemu_thread_create
../../home/mjt/qemu/master/util/qemu-thread-posix.c:581
#2 0x55ae8b0f2f94 in mttcg_start_vcpu_thread
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops-mttcg.c:143
#3 0x55ae8ae7ba65 in qemu_init_vcpu
../../home/mjt/qemu/master/system/cpus.c:709
#4 0x55ae8b329362 in x86_cpu_realizefn
../../home/mjt/qemu/master/target/i386/cpu.c:8865
#5 0x55ae8b5a621f in device_set_realized
../../home/mjt/qemu/master/hw/core/qdev.c:494
#6 0x55ae8b5bd362 in property_set_bool
../../home/mjt/qemu/master/qom/object.c:2375
#7 0x55ae8b5b86af in object_property_set
../../home/mjt/qemu/master/qom/object.c:1450
#8 0x55ae8b5c22fd in object_property_set_qobject
../../home/mjt/qemu/master/qom/qom-qobject.c:28
#9 0x55ae8b5b8c29 in object_property_set_bool
../../home/mjt/qemu/master/qom/object.c:1520
#10 0x55ae8b5a50d4 in qdev_realize
../../home/mjt/qemu/master/hw/core/qdev.c:276
#11 0x55ae8b26fe3f in x86_cpu_new
../../home/mjt/qemu/master/hw/i386/x86-common.c:64
#12 0x55ae8b2701ff in x86_cpus_init
../../home/mjt/qemu/master/hw/i386/x86-common.c:115
#13 0x55ae8b267d90 in pc_init1
../../home/mjt/qemu/master/hw/i386/pc_piix.c:185
#14 0x55ae8b2695f7 in pc_i440fx_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:451
#15 0x55ae8b2699b3 in pc_i440fx_machine_10_1_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:492
#16 0x55ae8a7fa936 in machine_run_board_init
../../home/mjt/qemu/master/hw/core/machine.c:1669
#17 0x55ae8ae6fc53 in qemu_init_board
../../home/mjt/qemu/master/system/vl.c:2710
#18 0x55ae8ae7043b in qmp_x_exit_preconfig
../../home/mjt/qemu/master/system/vl.c:2804
#19 0x55ae8ae751b3 in qemu_init
../../home/mjt/qemu/master/system/vl.c:3840
#20 0x55ae8b8ba5d7 in main ../../home/mjt/qemu/master/system/main.c:71
#21 0x7f977365f249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
Thread T1 created by T0 here:
#0 0x7f9774e49726 in __interceptor_pthread_create
../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
#1 0x55ae8ba9c9a7 in qemu_thread_create
../../home/mjt/qemu/master/util/qemu-thread-posix.c:581
#2 0x55ae8baba213 in rcu_init_complete
../../home/mjt/qemu/master/util/rcu.c:415
#3 0x55ae8baba42c in rcu_init ../../home/mjt/qemu/master/util/rcu.c:471
#4 0x7f977365f375 in call_init ../csu/libc-start.c:145
#5 0x7f977365f375 in __libc_start_main_impl ../csu/libc-start.c:347
Thread T4 created by T0 here:
#0 0x7f9774e49726 in __interceptor_pthread_create
../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
#1 0x55ae8ba9c9a7 in qemu_thread_create
../../home/mjt/qemu/master/util/qemu-thread-posix.c:581
#2 0x55ae8b0f2f94 in mttcg_start_vcpu_thread
../../home/mjt/qemu/master/accel/tcg/tcg-accel-ops-mttcg.c:143
#3 0x55ae8ae7ba65 in qemu_init_vcpu
../../home/mjt/qemu/master/system/cpus.c:709
#4 0x55ae8b329362 in x86_cpu_realizefn
../../home/mjt/qemu/master/target/i386/cpu.c:8865
#5 0x55ae8b5a621f in device_set_realized
../../home/mjt/qemu/master/hw/core/qdev.c:494
#6 0x55ae8b5bd362 in property_set_bool
../../home/mjt/qemu/master/qom/object.c:2375
#7 0x55ae8b5b86af in object_property_set
../../home/mjt/qemu/master/qom/object.c:1450
#8 0x55ae8b5c22fd in object_property_set_qobject
../../home/mjt/qemu/master/qom/qom-qobject.c:28
#9 0x55ae8b5b8c29 in object_property_set_bool
../../home/mjt/qemu/master/qom/object.c:1520
#10 0x55ae8b5a50d4 in qdev_realize
../../home/mjt/qemu/master/hw/core/qdev.c:276
#11 0x55ae8b26fe3f in x86_cpu_new
../../home/mjt/qemu/master/hw/i386/x86-common.c:64
#12 0x55ae8b2701ff in x86_cpus_init
../../home/mjt/qemu/master/hw/i386/x86-common.c:115
#13 0x55ae8b267d90 in pc_init1
../../home/mjt/qemu/master/hw/i386/pc_piix.c:185
#14 0x55ae8b2695f7 in pc_i440fx_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:451
#15 0x55ae8b2699b3 in pc_i440fx_machine_10_1_init
../../home/mjt/qemu/master/hw/i386/pc_piix.c:492
#16 0x55ae8a7fa936 in machine_run_board_init
../../home/mjt/qemu/master/hw/core/machine.c:1669
#17 0x55ae8ae6fc53 in qemu_init_board
../../home/mjt/qemu/master/system/vl.c:2710
#18 0x55ae8ae7043b in qmp_x_exit_preconfig
../../home/mjt/qemu/master/system/vl.c:2804
#19 0x55ae8ae751b3 in qemu_init
../../home/mjt/qemu/master/system/vl.c:3840
#20 0x55ae8b8ba5d7 in main ../../home/mjt/qemu/master/system/main.c:71
#21 0x7f977365f249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: heap-use-after-free
../../home/mjt/qemu/master/system/physmem.c:350 in
address_space_lookup_region
Shadow bytes around the buggy address:
0x0c0c80072ba0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fa
0x0c0c80072bb0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c80072bc0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c80072bd0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c80072be0: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
=>0x0c0c80072bf0:[fd]fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c80072c00: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c80072c10: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c80072c20: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c80072c30: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c80072c40: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==368707==ABORTING
* Re: apparent race condition in mttcg memory handling
2025-05-30 19:20 apparent race condition in mttcg memory handling Michael Tokarev
2025-06-04 10:47 ` Michael Tokarev
@ 2025-07-21 11:47 ` Philippe Mathieu-Daudé
2025-07-21 16:23 ` Pierrick Bouvier
2025-07-22 20:11 ` Gustavo Romero
2 siblings, 1 reply; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-21 11:47 UTC (permalink / raw)
To: Michael Tokarev, QEMU Development
Cc: Jonathan Cameron, Pierrick Bouvier, Alex Bennée,
Richard Henderson, Paolo Bonzini, Stefan Hajnoczi,
Mark Cave-Ayland
(Cc'ing a few more developers)
On 30/5/25 21:20, Michael Tokarev wrote:
> Hi!
>
> For quite some time (almost the whole day yesterday) I've been trying to
> find out what's going on with mttcg in qemu. There's apparently a race
> condition somewhere, like a use-after-free or something.
>
> It started as an incarnation of
> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
> failure, but on an x86_64 host this time (it's also mentioned in that
> issue).
>
> However, that particular assertion failure is not the only possible
> outcome. We're hitting multiple assertion failures or SIGSEGVs in
> physmem.c and related files, - 4 or 5 different places so far.
>
> The problem here is that the bug is rather difficult to reproduce.
> What I've been using so far was to make most host cores busy, and
> specify a number of virtual CPUs close to the actual number of host
> cores (threads).
>
> For example, on my 4-core, 8-thread notebook, I used `stress -c 8`
> and ran qemu with -smp 10 to trigger this issue. However, on this
> very notebook it is really difficult to trigger - it happens
> every 30..50 runs or so.
>
> The reproducer I was using was just booting the kernel; no
> user-space is needed. Qemu crashes during kernel init, or it runs fine.
>
> I used a regular kernel from debian sid:
> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-image-amd64_6.12.29-1_amd64.deb
> Extract vmlinuz-6.12.29-amd64 from there.
>
> In order to simplify the reproducing, I created a tiny initrd with
> just one executable in there, which does a poweroff:
>
> cat >poweroff.c <<'EOF'
> #include <sys/reboot.h>
> #include <unistd.h>
>
> int main(void) {
> reboot(RB_POWER_OFF);
> sleep(5);
> return 0;
> }
> EOF
> diet gcc -static -o init poweroff.c
> echo init | cpio -o -H newc > initrd
>
> (it uses dietlibc, optional, just to make the initrd smaller).
>
> Now, the qemu invocation I used:
>
> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
> -append "console=ttyS0" \
> -vga none -display none \
> -serial file:/dev/tty \
> -monitor stdio \
> -m 256 \
> -smp 16
>
> This way, it either succeeds, terminating normally when the initrd
> halts the system, or it segfaults or asserts as per the issue.
>
> For a 64-core machine, I used -smp 64 and had 16..40 cores busy
> with other stuff. Also, adding `nice' in front of the command
> apparently helps.
>
> Now, to the various issues/places I've hit. Here's a typical
> output:
>
> ...
> [ 3.129806] smpboot: x86: Booting SMP configuration:
> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
> #8 #9
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494389] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494389] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494396] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494396] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494401] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494401] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494408] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494408] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 4.494415] calibrate_delay_direct() failed to get a good estimate
> for loops_per_jiffy.
> [ 4.494415] Probably due to long platform interrupts. Consider using
> "lpj=" boot option.
> [ 5.864038] smp: Brought up 1 node, 10 CPUs
> [ 5.865772] smpboot: Total of 10 processors activated (25983.25
> BogoMIPS)
> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code,
> 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K
> cma-reserved)
> [ 6.591933] devtmpfs: initialized
> [ 6.635844] x86/mm: Memory block size: 128MB
> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 7645041785100000 ns
> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes,
> linear)
> [ 6.840775] pinctrl core: initialized pinctrl subsystem
> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
> allocations
> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
> atomic allocations
> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
> atomic allocations
> [ 7.195902] audit: initializing netlink subsys (disabled)
> [ 7.223865] audit: type=2000 audit(1748628013.324:1):
> state=initialized audit_enabled=0 res=1
> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
> [ 7.316281] cpuidle: using governor ladder
> [ 7.331907] cpuidle: using governor menu
> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [ 7.407802] PCI: Using configuration type 1 for base access
> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
> [ 7.420691] mtrr: corrected configuration.
> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes
> are optimized if possible.
> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0
> pages
> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
> [ 7.816900] ACPI: Added _OSI(Module Device)
> [ 7.819950] ACPI: Added _OSI(Processor Device)
> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
> [ 8.355952] ACPI: Interpreter enabled
> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary,
> use "pci=nocrs" and report a bug
> [ 8.468051] PCI: Using E820 reservations for host bridge windows
> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
> MSI HPX-Type3]
> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS
> requires [ExtendedConfig ASPM ClockPM MSI]
> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't
> access extended configuration space under this bridge
> [ 9.297562] acpiphp: Slot [2] registered
> ...
> [ 9.369007] PCI host bridge to bus 0000:00
> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7
> window]
> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff
> window]
> [ 9.383826] pci_bus 0000:00: root bus resource [mem
> 0x000a0000-0x000bffff window]
> [ 9.387818] pci_bus 0000:00: root bus resource [mem
> 0x10000000-0xfebfffff window]
> [ 9.393681] pci_bus 0000:00: root bus resource [mem
> 0x100000000-0x17fffffff window]
> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
> conventional PCI endpoint
> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
> conventional PCI endpoint
> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
> conventional PCI endpoint
> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
> Segmentation fault
>
>
> So it breaks somewhere in PCI init, after SMP/CPUs have been
> initialized by the guest kernel.
>
> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307,
> addr=182591488) at ../system/physmem.c:309
> 309 return int128_gethi(section->size) ||
> (gdb) p *section
> Cannot access memory at address 0x7fff58307
>
> This one has been seen multiple times.
>
> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
> 0x0000555555e30382 in memory_region_section_get_iotlb
> (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
> ../system/physmem.c:1002
> 1002 return section - d->map.sections;
> d is NULL here
>
>
> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at
> include/exec/memory.h:1756
> 1756 if (mr->alias) {
> (gdb) p *mr
> Cannot access memory at address 0xffffffc1ffffffc1
> (gdb) frame 1
> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at
> include/exec/memory.h:1757
> 1757 return memory_region_get_iommu(mr->alias);
> (gdb) p mr
> $1 = (MemoryRegion *) 0x7fff54239a10
>
>
> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
> [
> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>
> (gdb) l
> 1004 /* Called from RCU critical section */
> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> 1006 MemoryRegionSection
> *section)
> 1007 {
> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
> 1009 return section - d->map.sections;
> 1010 }
> 1011
> 1012 static int subpage_register(subpage_t *mmio, uint32_t start,
> uint32_t end,
> 1013 uint16_t section);
>
> (gdb) p *section
> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv =
> 0x7fff998f6fd0,
> offset_within_region = 0, offset_within_address_space = 0, readonly =
> false,
> nonvolatile = false, unmergeable = 12}
>
> (gdb) p *section->fv
> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges =
> 0x7fff99486a60, nr = 0,
> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>
> (gdb) bt
> #0 0x0000555555e5118c in memory_region_section_get_iotlb
> (cpu=cpu@entry=0x55555894fdf0,
> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208,
> full=full@entry=0x7ffebeffbd60)
> at accel/tcg/cputlb.c:1088
> #2 0x0000555555e70a92 in tlb_set_page_with_attrs
> (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=...,
> prot=<optimized out>,
> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0,
> addr=151138272,
> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6,
> probe=<optimized out>,
> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
> addr=151138272,
> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6,
> memop=memop@entry=MO_8,
> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/
> tcg/cputlb.c:1251
> #5 0x0000555555e6eb0d in probe_access_internal
> (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8,
> pfull=0x7ffebeffbfa0, retaddr=0,
> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0,
> addr=addr@entry=151138272,
> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE,
> mmu_idx=<optimized out>,
> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/
> tcg/cputlb.c:1439
> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090,
> addr=151138272)
> at target/i386/tcg/system/excp_helper.c:68
> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110,
> err=err@entry=0x7ffebeffc120,
> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0,
> addr=18446741874686299840,
> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110,
> err=0x7ffebeffc120, ra=0)
> at target/i386/tcg/system/excp_helper.c:597
> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840,
> size=<optimized out>,
> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>,
> retaddr=0)
> at target/i386/tcg/system/excp_helper.c:617
> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
> addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
> memop@entry=MO_32, size=-1739692016,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32,
> mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
> accel/tcg/cputlb.c:1652
> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/
> tcg/cputlb.c:1755
> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
> cputlb.c:2364
> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0,
> addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0,
> addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
> ldst_common.c.inc:308
> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236,
> is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236,
> is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
> is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #19 0x0000555555db884a in do_interrupt_x86_hardirq
> (env=env@entry=0x5555589529b0,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
> seg_helper.c:1245
> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
> interrupt_request=<optimized out>) at target/i386/tcg/system/
> seg_helper.c:209
> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0,
> last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0,
> sc=sc@entry=0x7ffebeffc580)
> at accel/tcg/cpu-exec.c:955
> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at
> accel/tcg/cpu-exec.c:1059
> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops.c:80
> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at
> util/qemu-thread-posix.c:541
> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
>
> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr:
> Assertion `offset_in_ramblock(block, offset)' failed.
>
> (gdb) bt
> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
> block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>,
> offset=<optimized out>)
> at system/physmem.c:2238
> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized
> out>, addr=<optimized out>,
> size=0x0) at system/physmem.c:2430
> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>)
> at system/physmem.c:2443
> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>)
> at system/memory.c:2452
> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
> #8 0x0000555555e70a92 in tlb_set_page_with_attrs
> (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696,
> attrs=...,
> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/
> tcg/cputlb.c:1193
> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50,
> addr=18446741874686299840,
> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4,
> probe=<optimized out>, retaddr=0)
> at target/i386/tcg/system/excp_helper.c:624
> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50,
> addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
> memop@entry=MO_32, size=-1115714056,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32,
> mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
> accel/tcg/cputlb.c:1652
> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/
> tcg/cputlb.c:1755
> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
> cputlb.c:2364
> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10,
> addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10,
> addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
> ldst_common.c.inc:308
> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236,
> is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236,
> is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
> is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #18 0x0000555555db884a in do_interrupt_x86_hardirq
> (env=env@entry=0x5555589fcb10,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
> seg_helper.c:1245
> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
> interrupt_request=<optimized out>) at target/i386/tcg/system/
> seg_helper.c:209
> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50,
> last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50,
> sc=sc@entry=0x7ffebd7f9580)
> at accel/tcg/cpu-exec.c:955
> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at
> accel/tcg/cpu-exec.c:1059
> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops.c:80
> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
> (arg=arg@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at
> util/qemu-thread-posix.c:541
> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
> (gdb) frame 2
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
> block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> 91 assert(offset_in_ramblock(block, offset));
>
> (gdb) l
> 86 return (b && b->host && offset < b->used_length) ? true :
> false;
> 87 }
> 88
> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t
> offset)
> 90 {
> 91 assert(offset_in_ramblock(block, offset));
> 92 return (char *)block->host + offset;
> 93 }
> 94
> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void
> *host_addr,
>
>
> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>
> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at
> system/physmem.c:337
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> (gdb) l
> 332 }
> 333 p = nodes[lp.ptr];
> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
> 335 }
> 336
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> 338 return &sections[lp.ptr];
> 339 } else {
> 340 return &sections[PHYS_SECTION_UNASSIGNED];
> 341 }
> (gdb)
>
>
> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
> this issue happening with 10.0 but not with 9.2. So some of the
> above failures might come from commits between 9.2 and 10.0.
> However, I was able to trigger some of the failures with 9.2.0 too,
> though with much lower probability. And some can be triggered on
> current master as well, with much higher probability.
>
> On my 4-core notebook, the above command line fails every 20..50 runs.
>
> I was never able to reproduce the assertion failure as shown in #1921.
>
> As of now, this issue is hitting Debian trixie in debci: when a
> package which creates a guest image tries to run qemu, there is no
> KVM available in the debci environment, so qemu falls back to TCG.
>
> On IRC, Manos Pitsidianakis noted that he was recently debugging a
> use-after-free involving MemoryRegion, and posted a patch which can
> help a bit:
> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>
> I'm not sure where to go from here.
>
> Just collecting everything we have now.
>
> Thanks,
>
> /mjt
>
* Re: apparent race condition in mttcg memory handling
2025-07-21 11:47 ` Philippe Mathieu-Daudé
@ 2025-07-21 16:23 ` Pierrick Bouvier
2025-07-21 16:29 ` Pierrick Bouvier
0 siblings, 1 reply; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 16:23 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael Tokarev, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
Hi Michael,
On 7/21/25 4:47 AM, Philippe Mathieu-Daudé wrote:
> (Cc'ing few more developers)
>
> On 30/5/25 21:20, Michael Tokarev wrote:
>> Hi!
>>
>> For quite some time (almost whole day yesterday) I'm trying to find out
>> what's going on with mmtcg in qemu. There's apparently a race condition
>> somewhere, like a use-after-free or something.
>>
>> It started as an incarnation of
>> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
>> failure, but on an x86_64 host this time (it's also mentioned in that
>> issue).
>>
>> However, that particular assertion failure is not the only possible
>> outcome. We're hitting multiple assertion failures or SIGSEGVs in
>> physmem.c and related files, - 4 or 5 different places so far.
>>
>> The problem here is that the bug is rather difficult to reproduce.
>> What I've been using so far was to make most host cores busy, and
>> specify amount of virtual CPUs close to actual host cores (threads).
>>
>> For example, on my 4-core, 8-threads notebook, I used `stress -c 8`
>> and ran qemu with -smp 10, to trigger this issue. However, on this
>> very notebook it is really difficult to trigger it, - it happens
>> every 30..50 runs or so.
>>
>> The reproducer I was using - it was just booting kernel, no user-
>> space is needed. Qemu crashes during kernel init, or it runs fine.
>>
>> I used regular kernel from debian sid:
>> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-
>> image-amd64_6.12.29-1_amd64.deb
>> Extract vmlinuz-6.12.29-amd64 from there.
>>
>> In order to simplify the reproducing, I created a tiny initrd with
>> just one executable in there, which does a poweroff:
>>
>> cat >poweroff.c <<'EOF'
>> #include <sys/reboot.h>
>> #include <unistd.h>
>>
>> int main(void) {
>> reboot(RB_POWER_OFF);
>> sleep(5);
>> return 0;
>> }
>> EOF
>> diet gcc -static -o init poweroff.c
>> echo init | cpio -o -H newc > initrd
>>
>> (it uses dietlibc, optional, just to make the initrd smaller).
>>
>> Now, the qemu invocation I used:
>>
>> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
>> -append "console=ttyS0" \
>> -vga none -display none \
>> -serial file:/dev/tty \
>> -monitor stdio \
>> -m 256 \
>> -smp 16
>>
>> This way, it either succeeds, terminating normally due to
>> the initrd hating the system, or it will segfault or assert
>> as per the issue.
>>
>> For a 64-core machine, I used -smp 64, and had 16..40 cores
>> being busy with other stuff. Also, adding `nice' in front
>> of that command apparently helps.
>>
>> Now, to the various issues/places I've hit. Here's a typical
>> output:
>>
>> ...
>> [ 3.129806] smpboot: x86: Booting SMP configuration:
>> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
>> #8 #9
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 0.000000] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494389] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494389] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494396] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494396] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494401] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494401] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494408] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494408] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 4.494415] calibrate_delay_direct() failed to get a good estimate
>> for loops_per_jiffy.
>> [ 4.494415] Probably due to long platform interrupts. Consider using
>> "lpj=" boot option.
>> [ 5.864038] smp: Brought up 1 node, 10 CPUs
>> [ 5.865772] smpboot: Total of 10 processors activated (25983.25
>> BogoMIPS)
>> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code,
>> 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K
>> cma-reserved)
>> [ 6.591933] devtmpfs: initialized
>> [ 6.635844] x86/mm: Memory block size: 128MB
>> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles:
>> 0xffffffff, max_idle_ns: 7645041785100000 ns
>> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes,
>> linear)
>> [ 6.840775] pinctrl core: initialized pinctrl subsystem
>> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
>> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
>> allocations
>> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
>> atomic allocations
>> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
>> atomic allocations
>> [ 7.195902] audit: initializing netlink subsys (disabled)
>> [ 7.223865] audit: type=2000 audit(1748628013.324:1):
>> state=initialized audit_enabled=0 res=1
>> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
>> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
>> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
>> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
>> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
>> [ 7.316281] cpuidle: using governor ladder
>> [ 7.331907] cpuidle: using governor menu
>> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>> [ 7.407802] PCI: Using configuration type 1 for base access
>> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
>> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
>> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
>> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
>> [ 7.420691] mtrr: corrected configuration.
>> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes
>> are optimized if possible.
>> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0
>> pages
>> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
>> [ 7.816900] ACPI: Added _OSI(Module Device)
>> [ 7.819950] ACPI: Added _OSI(Processor Device)
>> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
>> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
>> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
>> [ 8.355952] ACPI: Interpreter enabled
>> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
>> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
>> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary,
>> use "pci=nocrs" and report a bug
>> [ 8.468051] PCI: Using E820 reservations for host bridge windows
>> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
>> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
>> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
>> MSI HPX-Type3]
>> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS
>> requires [ExtendedConfig ASPM ClockPM MSI]
>> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't
>> access extended configuration space under this bridge
>> [ 9.297562] acpiphp: Slot [2] registered
>> ...
>> [ 9.369007] PCI host bridge to bus 0000:00
>> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7
>> window]
>> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff
>> window]
>> [ 9.383826] pci_bus 0000:00: root bus resource [mem
>> 0x000a0000-0x000bffff window]
>> [ 9.387818] pci_bus 0000:00: root bus resource [mem
>> 0x10000000-0xfebfffff window]
>> [ 9.393681] pci_bus 0000:00: root bus resource [mem
>> 0x100000000-0x17fffffff window]
>> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
>> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
>> conventional PCI endpoint
>> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
>> conventional PCI endpoint
>> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
>> conventional PCI endpoint
>> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
>> Segmentation fault
>>
>>
>> So it breaks somewhere in PCI init, after SMP/CPUs have been
>> initialized by the guest kernel.
>>
>> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307,
>> addr=182591488) at ../system/physmem.c:309
>> 309 return int128_gethi(section->size) ||
>> (gdb) p *section
>> Cannot access memory at address 0x7fff58307
>>
>> This one has been seen multiple times.
>>
>> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
>> 0x0000555555e30382 in memory_region_section_get_iotlb
>> (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
>> ../system/physmem.c:1002
>> 1002 return section - d->map.sections;
>> d is NULL here
>>
>>
>> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
>> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at
>> include/exec/memory.h:1756
>> 1756 if (mr->alias) {
>> (gdb) p *mr
>> Cannot access memory at address 0xffffffc1ffffffc1
>> (gdb) frame 1
>> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at
>> include/exec/memory.h:1757
>> 1757 return memory_region_get_iommu(mr->alias);
>> (gdb) p mr
>> $1 = (MemoryRegion *) 0x7fff54239a10
>>
>>
>> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
>> [
>> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>>
>> (gdb) l
>> 1004 /* Called from RCU critical section */
>> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>> 1006 MemoryRegionSection
>> *section)
>> 1007 {
>> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
>> 1009 return section - d->map.sections;
>> 1010 }
>> 1011
>> 1012 static int subpage_register(subpage_t *mmio, uint32_t start,
>> uint32_t end,
>> 1013 uint16_t section);
>>
>> (gdb) p *section
>> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv =
>> 0x7fff998f6fd0,
>> offset_within_region = 0, offset_within_address_space = 0, readonly =
>> false,
>> nonvolatile = false, unmergeable = 12}
>>
>> (gdb) p *section->fv
>> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges =
>> 0x7fff99486a60, nr = 0,
>> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>>
>> (gdb) bt
>> #0 0x0000555555e5118c in memory_region_section_get_iotlb
>> (cpu=cpu@entry=0x55555894fdf0,
>> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
>> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
>> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208,
>> full=full@entry=0x7ffebeffbd60)
>> at accel/tcg/cputlb.c:1088
>> #2 0x0000555555e70a92 in tlb_set_page_with_attrs
>> (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=...,
>> prot=<optimized out>,
>> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
>> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0,
>> addr=151138272,
>> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6,
>> probe=<optimized out>,
>> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
>> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>> addr=151138272,
>> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6,
>> memop=memop@entry=MO_8,
>> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/
>> tcg/cputlb.c:1251
>> #5 0x0000555555e6eb0d in probe_access_internal
>> (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
>> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
>> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8,
>> pfull=0x7ffebeffbfa0, retaddr=0,
>> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
>> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0,
>> addr=addr@entry=151138272,
>> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE,
>> mmu_idx=<optimized out>,
>> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/
>> tcg/cputlb.c:1439
>> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090,
>> addr=151138272)
>> at target/i386/tcg/system/excp_helper.c:68
>> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
>> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110,
>> err=err@entry=0x7ffebeffc120,
>> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
>> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0,
>> addr=18446741874686299840,
>> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110,
>> err=0x7ffebeffc120, ra=0)
>> at target/i386/tcg/system/excp_helper.c:597
>> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840,
>> size=<optimized out>,
>> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>,
>> retaddr=0)
>> at target/i386/tcg/system/excp_helper.c:617
>> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>> addr=18446741874686299840,
>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>> memop@entry=MO_32, size=-1739692016,
>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
>> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32,
>> mmu_idx=mmu_idx@entry=4,
>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>> accel/tcg/cputlb.c:1652
>> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/
>> tcg/cputlb.c:1755
>> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>> cputlb.c:2364
>> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0,
>> addr=18446741874686299840, oi=36,
>> ra=0) at accel/tcg/ldst_common.c.inc:165
>> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0,
>> addr=addr@entry=18446741874686299840,
>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>> ldst_common.c.inc:308
>> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236,
>> is_int=0, error_code=0,
>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236,
>> is_int=is_int@entry=0,
>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>> is_hw=is_hw@entry=1)
>> at target/i386/tcg/seg_helper.c:1213
>> #19 0x0000555555db884a in do_interrupt_x86_hardirq
>> (env=env@entry=0x5555589529b0,
>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>> seg_helper.c:1245
>> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>> seg_helper.c:209
>> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0,
>> last_tb=<synthetic pointer>)
>> at accel/tcg/cpu-exec.c:851
>> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0,
>> sc=sc@entry=0x7ffebeffc580)
>> at accel/tcg/cpu-exec.c:955
>> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
>> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
>> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at
>> accel/tcg/cpu-exec.c:1059
>> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
>> at accel/tcg/tcg-accel-ops.c:80
>> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>> (arg=arg@entry=0x55555894fdf0)
>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at
>> util/qemu-thread-posix.c:541
>> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>
>>
>> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr:
>> Assertion `offset_in_ramblock(block, offset)' failed.
>>
>> (gdb) bt
>> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>> block=<optimized out>)
>> at ./include/exec/ram_addr.h:91
>> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>,
>> offset=<optimized out>)
>> at system/physmem.c:2238
>> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized
>> out>, addr=<optimized out>,
>> size=0x0) at system/physmem.c:2430
>> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>)
>> at system/physmem.c:2443
>> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>)
>> at system/memory.c:2452
>> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
>> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
>> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
>> #8 0x0000555555e70a92 in tlb_set_page_with_attrs
>> (cpu=cpu@entry=0x5555589f9f50,
>> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696,
>> attrs=...,
>> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/
>> tcg/cputlb.c:1193
>> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50,
>> addr=18446741874686299840,
>> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4,
>> probe=<optimized out>, retaddr=0)
>> at target/i386/tcg/system/excp_helper.c:624
>> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50,
>> addr=18446741874686299840,
>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>> memop@entry=MO_32, size=-1115714056,
>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
>> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32,
>> mmu_idx=mmu_idx@entry=4,
>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>> accel/tcg/cputlb.c:1652
>> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/
>> tcg/cputlb.c:1755
>> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>> cputlb.c:2364
>> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10,
>> addr=18446741874686299840, oi=36,
>> ra=0) at accel/tcg/ldst_common.c.inc:165
>> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10,
>> addr=addr@entry=18446741874686299840,
>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>> ldst_common.c.inc:308
>> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236,
>> is_int=0, error_code=0,
>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236,
>> is_int=is_int@entry=0,
>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>> is_hw=is_hw@entry=1)
>> at target/i386/tcg/seg_helper.c:1213
>> #18 0x0000555555db884a in do_interrupt_x86_hardirq
>> (env=env@entry=0x5555589fcb10,
>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>> seg_helper.c:1245
>> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>> seg_helper.c:209
>> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50,
>> last_tb=<synthetic pointer>)
>> at accel/tcg/cpu-exec.c:851
>> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50,
>> sc=sc@entry=0x7ffebd7f9580)
>> at accel/tcg/cpu-exec.c:955
>> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
>> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
>> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at
>> accel/tcg/cpu-exec.c:1059
>> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
>> at accel/tcg/tcg-accel-ops.c:80
>> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>> (arg=arg@entry=0x5555589f9f50)
>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at
>> util/qemu-thread-posix.c:541
>> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>
>> (gdb) frame 2
>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>> block=<optimized out>)
>> at ./include/exec/ram_addr.h:91
>> 91 assert(offset_in_ramblock(block, offset));
>>
>> (gdb) l
>> 86 return (b && b->host && offset < b->used_length) ? true :
>> false;
>> 87 }
>> 88
>> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t
>> offset)
>> 90 {
>> 91 assert(offset_in_ramblock(block, offset));
>> 92 return (char *)block->host + offset;
>> 93 }
>> 94
>> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void
>> *host_addr,
>>
>>
>> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>>
>> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
>> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at
>> system/physmem.c:337
>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>> (gdb) l
>> 332 }
>> 333 p = nodes[lp.ptr];
>> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
>> 335 }
>> 336
>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>> 338 return &sections[lp.ptr];
>> 339 } else {
>> 340 return &sections[PHYS_SECTION_UNASSIGNED];
>> 341 }
>> (gdb)
>>
>>
>> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
>> this issue happening with 10.0 but not with 9.2. So some of the
>> above failures might come from somewhere in the middle between 9.2
>> and 10.0. However, I was able to trigger some of the failures with
>> 9.2.0 too, though with much lower probability. And some can be
>> triggered on current master too, with much higher probability.
>>
>> On my 4-core notebook, the above command line fails every 20..50 runs.
>>
>> I was never able to reproduce the assertion failure as shown in !1921.
>>
>> As of now, this issue is hitting Debian trixie in debci, when a
>> package which creates a guest image tries to run qemu; in the debci
>> environment there's no KVM available, so it falls back to TCG.
>>
>> On IRC, Manos Pitsidianakis noted that he was debugging use-after-free
>> with MemoryRegion recently, and posted a patch which can help a bit:
>> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>>
>> I'm not sure where to go from here.
>>
>> Just collecting everything we have now.
>>
>> Thanks,
>>
>> /mjt
>>
>
Looks like a good target for TSAN, which might expose the race without
really having to trigger it:
https://www.qemu.org/docs/master/devel/testing/main.html#building-and-testing-with-tsan
Otherwise, you can replay your run using rr record -h (chaos mode) [1],
which randomly schedules threads, until it catches the segfault; then
you'll have a reproducible case to debug.
[1] https://github.com/rr-debugger/rr
Regards,
Pierrick
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: apparent race condition in mttcg memory handling
2025-07-21 16:23 ` Pierrick Bouvier
@ 2025-07-21 16:29 ` Pierrick Bouvier
2025-07-21 17:14 ` Michael Tokarev
0 siblings, 1 reply; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 16:29 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael Tokarev, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
> Hi Michael,
>
> On 7/21/25 4:47 AM, Philippe Mathieu-Daudé wrote:
>> (Cc'ing few more developers)
>>
>> On 30/5/25 21:20, Michael Tokarev wrote:
>>> Hi!
>>>
>>> For quite some time (almost the whole day yesterday) I've been trying
>>> to find out what's going on with mttcg in qemu. There's apparently a
>>> race condition somewhere, like a use-after-free or something.
>>>
>>> It started as an incarnation of
>>> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
>>> failure, but on an x86_64 host this time (it's also mentioned in that
>>> issue).
>>>
>>> However, that particular assertion failure is not the only possible
>>> outcome. We're hitting multiple assertion failures or SIGSEGVs in
>>> physmem.c and related files, - 4 or 5 different places so far.
>>>
>>> The problem here is that the bug is rather difficult to reproduce.
>>> What I've been using so far is to make most host cores busy, and
>>> specify a number of virtual CPUs close to the actual host core
>>> (thread) count.
>>>
>>> For example, on my 4-core, 8-thread notebook, I used `stress -c 8`
>>> and ran qemu with -smp 10, to trigger this issue. However, on this
>>> very notebook it is really difficult to trigger: it happens only
>>> every 30..50 runs or so.
>>>
>>> The reproducer I was using - it was just booting kernel, no user-
>>> space is needed. Qemu crashes during kernel init, or it runs fine.
>>>
>>> I used regular kernel from debian sid:
>>> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-
>>> image-amd64_6.12.29-1_amd64.deb
>>> Extract vmlinuz-6.12.29-amd64 from there.
>>>
>>> In order to simplify the reproducing, I created a tiny initrd with
>>> just one executable in there, which does a poweroff:
>>>
>>> cat >poweroff.c <<'EOF'
>>> #include <sys/reboot.h>
>>> #include <unistd.h>
>>>
>>> int main(void) {
>>> reboot(RB_POWER_OFF);
>>> sleep(5);
>>> return 0;
>>> }
>>> EOF
>>> diet gcc -static -o init poweroff.c
>>> echo init | cpio -o -H newc > initrd
>>>
>>> (it uses dietlibc, optional, just to make the initrd smaller).
>>>
>>> Now, the qemu invocation I used:
>>>
>>> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
>>> -append "console=ttyS0" \
>>> -vga none -display none \
>>> -serial file:/dev/tty \
>>> -monitor stdio \
>>> -m 256 \
>>> -smp 16
>>>
>>> This way, it either succeeds, terminating normally due to
>>> the initrd halting the system, or it will segfault or assert
>>> as per the issue.
>>>
>>> For a 64-core machine, I used -smp 64, and had 16..40 cores
>>> being busy with other stuff. Also, adding `nice' in front
>>> of that command apparently helps.
>>>
>>> Now, to the various issues/places I've hit. Here's a typical
>>> output:
>>>
>>> ...
>>> [ 3.129806] smpboot: x86: Booting SMP configuration:
>>> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
>>> #8 #9
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 0.000000] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 0.000000] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494389] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494389] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494396] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494396] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494401] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494401] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494408] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494408] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 4.494415] calibrate_delay_direct() failed to get a good estimate
>>> for loops_per_jiffy.
>>> [ 4.494415] Probably due to long platform interrupts. Consider using
>>> "lpj=" boot option.
>>> [ 5.864038] smp: Brought up 1 node, 10 CPUs
>>> [ 5.865772] smpboot: Total of 10 processors activated (25983.25
>>> BogoMIPS)
>>> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code,
>>> 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K
>>> cma-reserved)
>>> [ 6.591933] devtmpfs: initialized
>>> [ 6.635844] x86/mm: Memory block size: 128MB
>>> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles:
>>> 0xffffffff, max_idle_ns: 7645041785100000 ns
>>> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes,
>>> linear)
>>> [ 6.840775] pinctrl core: initialized pinctrl subsystem
>>> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
>>> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic
>>> allocations
>>> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for
>>> atomic allocations
>>> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
>>> atomic allocations
>>> [ 7.195902] audit: initializing netlink subsys (disabled)
>>> [ 7.223865] audit: type=2000 audit(1748628013.324:1):
>>> state=initialized audit_enabled=0 res=1
>>> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
>>> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
>>> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
>>> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
>>> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
>>> [ 7.316281] cpuidle: using governor ladder
>>> [ 7.331907] cpuidle: using governor menu
>>> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
>>> [ 7.407802] PCI: Using configuration type 1 for base access
>>> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
>>> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
>>> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
>>> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
>>> [ 7.420691] mtrr: corrected configuration.
>>> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes
>>> are optimized if possible.
>>> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0
>>> pages
>>> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
>>> [ 7.816900] ACPI: Added _OSI(Module Device)
>>> [ 7.819950] ACPI: Added _OSI(Processor Device)
>>> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
>>> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
>>> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
>>> [ 8.355952] ACPI: Interpreter enabled
>>> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
>>> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
>>> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary,
>>> use "pci=nocrs" and report a bug
>>> [ 8.468051] PCI: Using E820 reservations for host bridge windows
>>> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
>>> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
>>> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments
>>> MSI HPX-Type3]
>>> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS
>>> requires [ExtendedConfig ASPM ClockPM MSI]
>>> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't
>>> access extended configuration space under this bridge
>>> [ 9.297562] acpiphp: Slot [2] registered
>>> ...
>>> [ 9.369007] PCI host bridge to bus 0000:00
>>> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7
>>> window]
>>> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff
>>> window]
>>> [ 9.383826] pci_bus 0000:00: root bus resource [mem
>>> 0x000a0000-0x000bffff window]
>>> [ 9.387818] pci_bus 0000:00: root bus resource [mem
>>> 0x10000000-0xfebfffff window]
>>> [ 9.393681] pci_bus 0000:00: root bus resource [mem
>>> 0x100000000-0x17fffffff window]
>>> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
>>> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
>>> conventional PCI endpoint
>>> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
>>> conventional PCI endpoint
>>> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
>>> conventional PCI endpoint
>>> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
>>> Segmentation fault
>>>
>>>
>>> So it breaks somewhere in PCI init, after SMP/CPUs have been
>>> initialized by the guest kernel.
>>>
>>> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307,
>>> addr=182591488) at ../system/physmem.c:309
>>> 309 return int128_gethi(section->size) ||
>>> (gdb) p *section
>>> Cannot access memory at address 0x7fff58307
>>>
>>> This one has been seen multiple times.
>>>
>>> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
>>> 0x0000555555e30382 in memory_region_section_get_iotlb
>>> (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
>>> ../system/physmem.c:1002
>>> 1002 return section - d->map.sections;
>>> d is NULL here
>>>
>>>
>>> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
>>> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at
>>> include/exec/memory.h:1756
>>> 1756 if (mr->alias) {
>>> (gdb) p *mr
>>> Cannot access memory at address 0xffffffc1ffffffc1
>>> (gdb) frame 1
>>> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at
>>> include/exec/memory.h:1757
>>> 1757 return memory_region_get_iommu(mr->alias);
>>> (gdb) p mr
>>> $1 = (MemoryRegion *) 0x7fff54239a10
>>>
>>>
>>> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
>>> [
>>> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>>>
>>> (gdb) l
>>> 1004 /* Called from RCU critical section */
>>> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>>> 1006 MemoryRegionSection
>>> *section)
>>> 1007 {
>>> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
>>> 1009 return section - d->map.sections;
>>> 1010 }
>>> 1011
>>> 1012 static int subpage_register(subpage_t *mmio, uint32_t start,
>>> uint32_t end,
>>> 1013 uint16_t section);
>>>
>>> (gdb) p *section
>>> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv =
>>> 0x7fff998f6fd0,
>>> offset_within_region = 0, offset_within_address_space = 0, readonly =
>>> false,
>>> nonvolatile = false, unmergeable = 12}
>>>
>>> (gdb) p *section->fv
>>> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges =
>>> 0x7fff99486a60, nr = 0,
>>> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>>>
>>> (gdb) bt
>>> #0 0x0000555555e5118c in memory_region_section_get_iotlb
>>> (cpu=cpu@entry=0x55555894fdf0,
>>> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
>>> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
>>> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208,
>>> full=full@entry=0x7ffebeffbd60)
>>> at accel/tcg/cputlb.c:1088
>>> #2 0x0000555555e70a92 in tlb_set_page_with_attrs
>>> (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=...,
>>> prot=<optimized out>,
>>> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
>>> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0,
>>> addr=151138272,
>>> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6,
>>> probe=<optimized out>,
>>> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
>>> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>>> addr=151138272,
>>> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6,
>>> memop=memop@entry=MO_8,
>>> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/
>>> tcg/cputlb.c:1251
>>> #5 0x0000555555e6eb0d in probe_access_internal
>>> (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
>>> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
>>> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8,
>>> pfull=0x7ffebeffbfa0, retaddr=0,
>>> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
>>> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0,
>>> addr=addr@entry=151138272,
>>> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE,
>>> mmu_idx=<optimized out>,
>>> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/
>>> tcg/cputlb.c:1439
>>> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090,
>>> addr=151138272)
>>> at target/i386/tcg/system/excp_helper.c:68
>>> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
>>> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110,
>>> err=err@entry=0x7ffebeffc120,
>>> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
>>> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0,
>>> addr=18446741874686299840,
>>> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110,
>>> err=0x7ffebeffc120, ra=0)
>>> at target/i386/tcg/system/excp_helper.c:597
>>> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840,
>>> size=<optimized out>,
>>> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>,
>>> retaddr=0)
>>> at target/i386/tcg/system/excp_helper.c:617
>>> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0,
>>> addr=18446741874686299840,
>>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>>> memop@entry=MO_32, size=-1739692016,
>>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>>> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
>>> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32,
>>> mmu_idx=mmu_idx@entry=4,
>>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>>> accel/tcg/cputlb.c:1652
>>> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/
>>> tcg/cputlb.c:1755
>>> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>>> cputlb.c:2364
>>> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0,
>>> addr=18446741874686299840, oi=36,
>>> ra=0) at accel/tcg/ldst_common.c.inc:165
>>> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0,
>>> addr=addr@entry=18446741874686299840,
>>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>>> ldst_common.c.inc:308
>>> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236,
>>> is_int=0, error_code=0,
>>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>>> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236,
>>> is_int=is_int@entry=0,
>>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>>> is_hw=is_hw@entry=1)
>>> at target/i386/tcg/seg_helper.c:1213
>>> #19 0x0000555555db884a in do_interrupt_x86_hardirq
>>> (env=env@entry=0x5555589529b0,
>>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>>> seg_helper.c:1245
>>> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
>>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>>> seg_helper.c:209
>>> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0,
>>> last_tb=<synthetic pointer>)
>>> at accel/tcg/cpu-exec.c:851
>>> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0,
>>> sc=sc@entry=0x7ffebeffc580)
>>> at accel/tcg/cpu-exec.c:955
>>> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
>>> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
>>> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at
>>> accel/tcg/cpu-exec.c:1059
>>> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
>>> at accel/tcg/tcg-accel-ops.c:80
>>> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>>> (arg=arg@entry=0x55555894fdf0)
>>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>>> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at
>>> util/qemu-thread-posix.c:541
>>> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>
>>>
>>> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr:
>>> Assertion `offset_in_ramblock(block, offset)' failed.
>>>
>>> (gdb) bt
>>> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>>> block=<optimized out>)
>>> at ./include/exec/ram_addr.h:91
>>> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>,
>>> offset=<optimized out>)
>>> at system/physmem.c:2238
>>> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized
>>> out>, addr=<optimized out>,
>>> size=0x0) at system/physmem.c:2430
>>> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>)
>>> at system/physmem.c:2443
>>> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>)
>>> at system/memory.c:2452
>>> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
>>> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
>>> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
>>> #8 0x0000555555e70a92 in tlb_set_page_with_attrs
>>> (cpu=cpu@entry=0x5555589f9f50,
>>> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696,
>>> attrs=...,
>>> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/
>>> tcg/cputlb.c:1193
>>> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50,
>>> addr=18446741874686299840,
>>> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4,
>>> probe=<optimized out>, retaddr=0)
>>> at target/i386/tcg/system/excp_helper.c:624
>>> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50,
>>> addr=18446741874686299840,
>>> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8,
>>> memop@entry=MO_32, size=-1115714056,
>>> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
>>> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
>>> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32,
>>> mmu_idx=mmu_idx@entry=4,
>>> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at
>>> accel/tcg/cputlb.c:1652
>>> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/
>>> tcg/cputlb.c:1755
>>> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
>>> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
>>> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/
>>> cputlb.c:2364
>>> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10,
>>> addr=18446741874686299840, oi=36,
>>> ra=0) at accel/tcg/ldst_common.c.inc:165
>>> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10,
>>> addr=addr@entry=18446741874686299840,
>>> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/
>>> ldst_common.c.inc:308
>>> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236,
>>> is_int=0, error_code=0,
>>> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
>>> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236,
>>> is_int=is_int@entry=0,
>>> error_code=error_code@entry=0, next_eip=next_eip@entry=0,
>>> is_hw=is_hw@entry=1)
>>> at target/i386/tcg/seg_helper.c:1213
>>> #18 0x0000555555db884a in do_interrupt_x86_hardirq
>>> (env=env@entry=0x5555589fcb10,
>>> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/
>>> seg_helper.c:1245
>>> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
>>> interrupt_request=<optimized out>) at target/i386/tcg/system/
>>> seg_helper.c:209
>>> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50,
>>> last_tb=<synthetic pointer>)
>>> at accel/tcg/cpu-exec.c:851
>>> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50,
>>> sc=sc@entry=0x7ffebd7f9580)
>>> at accel/tcg/cpu-exec.c:955
>>> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
>>> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
>>> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at
>>> accel/tcg/cpu-exec.c:1059
>>> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
>>> at accel/tcg/tcg-accel-ops.c:80
>>> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn
>>> (arg=arg@entry=0x5555589f9f50)
>>> at accel/tcg/tcg-accel-ops-mttcg.c:94
>>> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at
>>> util/qemu-thread-posix.c:541
>>> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>
>>> (gdb) frame 2
>>> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833,
>>> block=<optimized out>)
>>> at ./include/exec/ram_addr.h:91
>>> 91 assert(offset_in_ramblock(block, offset));
>>>
>>> (gdb) l
>>> 86 return (b && b->host && offset < b->used_length) ? true :
>>> false;
>>> 87 }
>>> 88
>>> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t
>>> offset)
>>> 90 {
>>> 91 assert(offset_in_ramblock(block, offset));
>>> 92 return (char *)block->host + offset;
>>> 93 }
>>> 94
>>> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void
>>> *host_addr,
>>>
>>>
>>> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>>>
>>> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
>>> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at
>>> system/physmem.c:337
>>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>>> (gdb) l
>>> 332 }
>>> 333 p = nodes[lp.ptr];
>>> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
>>> 335 }
>>> 336
>>> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
>>> 338 return &sections[lp.ptr];
>>> 339 } else {
>>> 340 return &sections[PHYS_SECTION_UNASSIGNED];
>>> 341 }
>>> (gdb)
>>>
>>>
>>> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
>>> this issue happening with 10.0 but not with 9.2. So some of the
>>> above failures might be from somewhere in the middle between 9.2 and
>>> 10.0. However, I was able to trigger some of the failures with
>>> 9.2.0 too, though with much lower probability. And some can be
>>> triggered on current master too, with much higher probability.
>>>
>>> On my 4-core notebook, the above command line fails once every 20..50 runs.
>>>
>>> I was never able to reproduce the assertion failure as shown in !1921.
>>>
>>> As of now, this issue is hitting debian trixie in debci: when a
>>> package which creates a guest image tries to run qemu, there's no
>>> kvm available in the debci environment, so it falls back to tcg.
>>>
>>> On IRC, Manos Pitsidianakis noted that he was debugging use-after-free
>>> with MemoryRegion recently, and posted a patch which can help a bit:
>>> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>>>
>>> I'm not sure where to go from here.
>>>
>>> Just collecting everything we have now.
>>>
>>> Thanks,
>>>
>>> /mjt
>>>
>>
>
> looks like a good target for TSAN, which might expose the race without
> really having to trigger it.
> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-testing-with-tsan
>
> Else, you can reproduce your run using rr record -h (chaos mode) [1],
> which randomly schedules threads, until it catches the segfault, and
> then you'll have a reproducible case to debug.
>
In case you never had opportunity to use rr, it is quite convenient,
because you can set a hardware watchpoint on your faulty pointer (watch
-l), do a reverse-continue, and in most cases, you'll directly reach
where the bug happened. Feels like cheating.
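For the archives, that workflow looks roughly like this (a sketch: the
watched location is a placeholder for whatever corrupted pointer the
crash exposes, and the qemu flags just mirror the reproducer from
earlier in the thread):

```
# record under chaos mode until a crashing run is captured
rr record -h ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0"

# replay deterministically; gdb stops at the same SIGSEGV every time
rr replay
(rr) watch -l section->fv        # hypothetical: watch the corrupted field
(rr) reverse-continue            # run backwards to the write that clobbered it
```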
> [1] https://github.com/rr-debugger/rr
>
> Regards,
> Pierrick
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: apparent race condition in mttcg memory handling
2025-07-21 16:29 ` Pierrick Bouvier
@ 2025-07-21 17:14 ` Michael Tokarev
2025-07-21 17:25 ` Pierrick Bouvier
0 siblings, 1 reply; 12+ messages in thread
From: Michael Tokarev @ 2025-07-21 17:14 UTC (permalink / raw)
To: Pierrick Bouvier, Philippe Mathieu-Daudé, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 21.07.2025 19:29, Pierrick Bouvier wrote:
> On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
..
>> looks like a good target for TSAN, which might expose the race without
>> really having to trigger it.
>> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-
>> testing-with-tsan
I think I tried with TSAN and it even gave something useful.
The problem now is for someone more familiar with this stuff
than me to reproduce it :)
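For reference, the TSAN build from the linked docs goes roughly like
this (a sketch: --enable-tsan, clang, and the in-tree suppressions file
are per the QEMU docs, but exact paths and flags may differ per version):

```shell
mkdir build-tsan && cd build-tsan
../configure --cc=clang --cxx=clang++ --enable-tsan
make -j"$(nproc)"
# a longer history helps TSAN reconstruct the second stack of a race report
TSAN_OPTIONS="suppressions=../tests/tsan/suppressions.tsan history_size=7" \
    ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0"
```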
>> Else, you can reproduce your run using rr record -h (chaos mode) [1],
>> which randomly schedules threads, until it catches the segfault, and
>> then you'll have a reproducible case to debug.
>
> In case you never had opportunity to use rr, it is quite convenient,
> because you can set a hardware watchpoint on your faulty pointer (watch
> -l), do a reverse-continue, and in most cases, you'll directly reach
> where the bug happened. Feels like cheating.
rr is the first thing I tried. Nope, it's absolutely hopeless. It
tried to boot just the kernel for over 30 minutes, after which I just
gave up.
Thanks,
/mjt
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:14 ` Michael Tokarev
@ 2025-07-21 17:25 ` Pierrick Bouvier
2025-07-21 17:28 ` Pierrick Bouvier
2025-07-21 17:31 ` Peter Maydell
0 siblings, 2 replies; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 17:25 UTC (permalink / raw)
To: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 10:14 AM, Michael Tokarev wrote:
> On 21.07.2025 19:29, Pierrick Bouvier wrote:
>> On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
> ..
>>> looks like a good target for TSAN, which might expose the race without
>>> really having to trigger it.
>>> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-
>>> testing-with-tsan
>
> I think I tried with TSAN and it even gave something useful.
> The problem now is for someone more familiar with this stuff
> than me to reproduce it :)
>
>>> Else, you can reproduce your run using rr record -h (chaos mode) [1],
>>> which randomly schedules threads, until it catches the segfault, and
>>> then you'll have a reproducible case to debug.
>>
>> In case you never had opportunity to use rr, it is quite convenient,
>> because you can set a hardware watchpoint on your faulty pointer (watch
>> -l), do a reverse-continue, and in most cases, you'll directly reach
>> where the bug happened. Feels like cheating.
>
> rr is the first thing I tried. Nope, it's absolutely hopeless. It
> tried to boot just the kernel for over 30 minutes, after which I just
> gave up.
>
I had a similar thing to debug recently, and with a simple loop, I
couldn't expose it easily. The bug I had was triggered with 3%
probability, which seems close to yours.
As rr record -h is single-threaded, I found it useful to write a wrapper
script [1] that runs one instance, and then run it in parallel using:
./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
With that, I could expose the bug in 2 minutes reliably (vs trying for
more than one hour before). With your 64 cores, I'm sure it will quickly
expose it.
Might be worth a try, as you need to only catch the bug once to be able
to reproduce it.
[1] https://github.com/pbo-linaro/qemu/blob/master/try_rme.sh
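A minimal version of such a wrapper (hypothetical; the real script is
linked as [1]) can just emit one command line per run and let parallel
fan them out, with rr's chaos mode (-h) randomizing thread scheduling
differently in each run:

```shell
#!/bin/sh
# emit_runs prints one rr-record command line per requested run; the qemu
# flags mirror the reproducer from earlier in the thread.
emit_runs() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'rr record -h ./qemu-system-x86_64 -smp 16 -m 256 -display none -kernel vmlinuz -initrd initrd -append console=ttyS0\n'
        i=$((i + 1))
    done
}

# e.g.: emit_runs 10000 | parallel --bar -j"$(nproc)"
```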
> Thanks,
>
> /mjt
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:25 ` Pierrick Bouvier
@ 2025-07-21 17:28 ` Pierrick Bouvier
2025-07-21 17:31 ` Peter Maydell
1 sibling, 0 replies; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 17:28 UTC (permalink / raw)
To: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development
Cc: Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 10:25 AM, Pierrick Bouvier wrote:
> On 7/21/25 10:14 AM, Michael Tokarev wrote:
>> On 21.07.2025 19:29, Pierrick Bouvier wrote:
>>> On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
>> ..
>>>> looks like a good target for TSAN, which might expose the race without
>>>> really having to trigger it.
>>>> https://www.qemu.org/docs/master/devel/testing/main.html#building-and-
>>>> testing-with-tsan
>>
>> I think I tried with TSAN and it even gave something useful.
>> The problem now is for someone more familiar with this stuff
>> than me to reproduce it :)
>>
>>>> Else, you can reproduce your run using rr record -h (chaos mode) [1],
>>>> which randomly schedules threads, until it catches the segfault, and
>>>> then you'll have a reproducible case to debug.
>>>
>>> In case you never had opportunity to use rr, it is quite convenient,
>>> because you can set a hardware watchpoint on your faulty pointer (watch
>>> -l), do a reverse-continue, and in most cases, you'll directly reach
>>> where the bug happened. Feels like cheating.
>>
>> rr is the first thing I tried. Nope, it's absolutely hopeless. It
>> tried to boot just the kernel for over 30 minutes, after which I just
>> gave up.
>>
>
> I had a similar thing to debug recently, and with a simple loop, I
> couldn't expose it easily. The bug I had was triggered with 3%
> probability, which seems close to yours.
> As rr record -h is single-threaded, I found it useful to write a wrapper
> script [1] that runs one instance, and then run it in parallel using:
> ./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
>
> With that, I could expose the bug in 2 minutes reliably (vs trying for
> more than one hour before). With your 64 cores, I'm sure it will quickly
> expose it.
>
> Might be worth a try, as you need to only catch the bug once to be able
> to reproduce it.
>
> [1] https://github.com/pbo-linaro/qemu/blob/master/try_rme.sh
>
In this script, I finally used qemu's own rr (record/replay) feature, as
QEMU itself was working fine but there was a bug in the software stack
running inside it, which I wanted to investigate under gdbstub. But above
I was suggesting the same approach using rr (the tool).
>> Thanks,
>>
>> /mjt
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:25 ` Pierrick Bouvier
2025-07-21 17:28 ` Pierrick Bouvier
@ 2025-07-21 17:31 ` Peter Maydell
2025-07-21 17:52 ` Pierrick Bouvier
1 sibling, 1 reply; 12+ messages in thread
From: Peter Maydell @ 2025-07-21 17:31 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development,
Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On Mon, 21 Jul 2025 at 18:26, Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 7/21/25 10:14 AM, Michael Tokarev wrote:
> > rr is the first thing I tried. Nope, it's absolutely hopeless. It
> > tried to boot just the kernel for over 30 minutes, after which I just
> > gave up.
> >
>
> I had a similar thing to debug recently, and with a simple loop, I
> couldn't expose it easily. The bug I had was triggered with 3%
> probability, which seems close to yours.
> As rr record -h is single-threaded, I found it useful to write a wrapper
> script [1] that runs one instance, and then run it in parallel using:
> ./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
>
> With that, I could expose the bug in 2 minutes reliably (vs trying for
> more than one hour before). With your 64 cores, I'm sure it will quickly
> expose it.
I think the problem here is that the whole runtime to get to
point-of-potential failure is too long, not that it takes too
many runs to get a failure.
For that kind of thing I have had success in the past with
making a QEMU snapshot close to the point of failure so that
the actual runtime that it's necessary to record under rr is
reduced.
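Concretely, that could look like the following (a sketch: savevm needs a
writable qcow2 somewhere to store the vmstate, which the kernel-only
reproducer doesn't have, so a scratch drive is added; the snapshot tag
is arbitrary):

```shell
qemu-img create -f qcow2 state.qcow2 1G

# boot once, and in the monitor take a snapshot shortly before PCI init:
#   (qemu) savevm pre-pci
./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none -monitor stdio \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0" \
    -drive file=state.qcow2,if=virtio

# then only the short window from the snapshot onwards runs under rr:
rr record -h ./qemu-system-x86_64 -smp 16 -m 256 -vga none -display none \
    -kernel vmlinuz -initrd initrd -append "console=ttyS0" \
    -drive file=state.qcow2,if=virtio -loadvm pre-pci
```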
-- PMM
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-07-21 17:31 ` Peter Maydell
@ 2025-07-21 17:52 ` Pierrick Bouvier
0 siblings, 0 replies; 12+ messages in thread
From: Pierrick Bouvier @ 2025-07-21 17:52 UTC (permalink / raw)
To: Peter Maydell
Cc: Michael Tokarev, Philippe Mathieu-Daudé, QEMU Development,
Jonathan Cameron, Alex Bennée, Richard Henderson,
Paolo Bonzini, Stefan Hajnoczi, Mark Cave-Ayland
On 7/21/25 10:31 AM, Peter Maydell wrote:
> On Mon, 21 Jul 2025 at 18:26, Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> On 7/21/25 10:14 AM, Michael Tokarev wrote:
>>> rr is the first thing I tried. Nope, it's absolutely hopeless. It
>>> tried to boot just the kernel for over 30 minutes, after which I just
>>> gave up.
>>>
>>
>> I had a similar thing to debug recently, and with a simple loop, I
>> couldn't expose it easily. The bug I had was triggered with 3%
>> probability, which seems close to yours.
>> As rr record -h is single-threaded, I found it useful to write a wrapper
>> script [1] that runs one instance, and then run it in parallel using:
>> ./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
>>
>> With that, I could expose the bug in 2 minutes reliably (vs trying for
>> more than one hour before). With your 64 cores, I'm sure it will quickly
>> expose it.
>
> I think the problem here is that the whole runtime to get to
> point-of-potential failure is too long, not that it takes too
> many runs to get a failure.
>
> For that kind of thing I have had success in the past with
> making a QEMU snapshot close to the point of failure so that
> the actual runtime that it's necessary to record under rr is
> reduced.
>
That's a good idea indeed. In the bug I had, it was due to the KASLR
address chosen, so by using a snapshot I would not have been able to
expose the random aspect.
In the case of the current bug, it seems to be a proper race condition,
so trying more combinations with a preloaded snapshot, saving a few
seconds per run, is a good idea.
> -- PMM
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: apparent race condition in mttcg memory handling
2025-05-30 19:20 apparent race condition in mttcg memory handling Michael Tokarev
2025-06-04 10:47 ` Michael Tokarev
2025-07-21 11:47 ` Philippe Mathieu-Daudé
@ 2025-07-22 20:11 ` Gustavo Romero
2025-07-23 6:31 ` Michael Tokarev
2 siblings, 1 reply; 12+ messages in thread
From: Gustavo Romero @ 2025-07-22 20:11 UTC (permalink / raw)
To: Michael Tokarev, QEMU Development
Hi Michael,
On 5/30/25 16:20, Michael Tokarev wrote:
> Hi!
>
> For quite some time (almost the whole day yesterday) I've been trying
> to find out what's going on with mttcg in qemu. There's apparently a
> race condition somewhere, like a use-after-free or something.
>
> It started as an incarnation of
> https://gitlab.com/qemu-project/qemu/-/issues/1921 -- the same assertion
> failure, but on an x86_64 host this time (it's also mentioned in that
> issue).
>
> However, that particular assertion failure is not the only possible
> outcome. We're hitting multiple assertion failures or SIGSEGVs in
> physmem.c and related files, - 4 or 5 different places so far.
>
> The problem here is that the bug is rather difficult to reproduce.
> What I've been using so far was to make most host cores busy, and
> specify amount of virtual CPUs close to actual host cores (threads).
>
> For example, on my 4-core, 8-thread notebook, I used `stress -c 8`
> and ran qemu with -smp 10 to trigger this issue. However, on this
> very notebook it is really difficult to trigger, - it happens once
> every 30..50 runs or so.
>
> The reproducer I was using just boots the kernel - no userspace
> is needed. Qemu either crashes during kernel init, or it runs fine.
>
> I used regular kernel from debian sid:
> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-image-amd64_6.12.29-1_amd64.deb
> Extract vmlinuz-6.12.29-amd64 from there.
The link above is broken. Googling for "linux-image-amd64_6.12.29-1_amd64.deb" wasn't successful either.
Could you point out another image to reproduce it?
Cheers,
Gustavo
> In order to simplify the reproducing, I created a tiny initrd with
> just one executable in there, which does a poweroff:
>
> cat >poweroff.c <<'EOF'
> #include <sys/reboot.h>
> #include <unistd.h>
>
> int main(void) {
> reboot(RB_POWER_OFF);
> sleep(5);
> return 0;
> }
> EOF
> diet gcc -static -o init poweroff.c
> echo init | cpio -o -H newc > initrd
>
> (it uses dietlibc, optional, just to make the initrd smaller).
>
> Now, the qemu invocation I used:
>
> qemu-system-x86_64 -kernel vmlinuz -initrd initrd \
> -append "console=ttyS0" \
> -vga none -display none \
> -serial file:/dev/tty \
> -monitor stdio \
> -m 256 \
> -smp 16
>
> This way, it either succeeds, terminating normally due to
> the initrd halting the system, or it will segfault or assert
> as per the issue.
>
> For a 64-core machine, I used -smp 64, and had 16..40 cores
> being busy with other stuff. Also, adding `nice' in front
> of that command apparently helps.
>
> Now, to the various issues/places I've hit. Here's a typical
> output:
>
> ...
> [ 3.129806] smpboot: x86: Booting SMP configuration:
> [ 3.135789] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 0.000000] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 0.000000] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494389] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494389] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494396] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494396] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494401] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494401] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494408] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494408] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 4.494415] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
> [ 4.494415] Probably due to long platform interrupts. Consider using "lpj=" boot option.
> [ 5.864038] smp: Brought up 1 node, 10 CPUs
> [ 5.865772] smpboot: Total of 10 processors activated (25983.25 BogoMIPS)
> [ 6.119683] Memory: 200320K/261624K available (16384K kernel code, 2486K rwdata, 11780K rodata, 4148K init, 4956K bss, 53176K reserved, 0K cma-reserved)
> [ 6.591933] devtmpfs: initialized
> [ 6.635844] x86/mm: Memory block size: 128MB
> [ 6.756849] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> [ 6.774545] futex hash table entries: 4096 (order: 6, 262144 bytes, linear)
> [ 6.840775] pinctrl core: initialized pinctrl subsystem
> [ 7.117085] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> [ 7.165883] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
> [ 7.184243] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
> [ 7.188322] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
> [ 7.195902] audit: initializing netlink subsys (disabled)
> [ 7.223865] audit: type=2000 audit(1748628013.324:1): state=initialized audit_enabled=0 res=1
> [ 7.290904] thermal_sys: Registered thermal governor 'fair_share'
> [ 7.291980] thermal_sys: Registered thermal governor 'bang_bang'
> [ 7.295875] thermal_sys: Registered thermal governor 'step_wise'
> [ 7.299817] thermal_sys: Registered thermal governor 'user_space'
> [ 7.303804] thermal_sys: Registered thermal governor 'power_allocator'
> [ 7.316281] cpuidle: using governor ladder
> [ 7.331907] cpuidle: using governor menu
> [ 7.348199] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [ 7.407802] PCI: Using configuration type 1 for base access
> [ 7.417386] mtrr: your CPUs had inconsistent fixed MTRR settings
> [ 7.418244] mtrr: your CPUs had inconsistent variable MTRR settings
> [ 7.419048] mtrr: your CPUs had inconsistent MTRRdefType settings
> [ 7.419938] mtrr: probably your BIOS does not setup all CPUs.
> [ 7.420691] mtrr: corrected configuration.
> [ 7.461270] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
> [ 7.591938] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
> [ 7.595986] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
> [ 7.816900] ACPI: Added _OSI(Module Device)
> [ 7.819950] ACPI: Added _OSI(Processor Device)
> [ 7.823873] ACPI: Added _OSI(3.0 _SCP Extensions)
> [ 7.827683] ACPI: Added _OSI(Processor Aggregator Device)
> [ 8.000944] ACPI: 1 ACPI AML tables successfully acquired and loaded
> [ 8.355952] ACPI: Interpreter enabled
> [ 8.406604] ACPI: PM: (supports S0 S3 S4 S5)
> [ 8.416143] ACPI: Using IOAPIC for interrupt routing
> [ 8.448173] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
> [ 8.468051] PCI: Using E820 reservations for host bridge windows
> [ 8.562534] ACPI: Enabled 2 GPEs in block 00 to 0F
> [ 9.153432] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [ 9.166585] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
> [ 9.168452] acpi PNP0A03:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
> [ 9.181933] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge
> [ 9.297562] acpiphp: Slot [2] registered
> ...
> [ 9.369007] PCI host bridge to bus 0000:00
> [ 9.376590] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
> [ 9.379987] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
> [ 9.383826] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> [ 9.387818] pci_bus 0000:00: root bus resource [mem 0x10000000-0xfebfffff window]
> [ 9.393681] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window]
> [ 9.396987] pci_bus 0000:00: root bus resource [bus 00-ff]
> [ 9.414378] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 conventional PCI endpoint
> [ 9.477179] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 conventional PCI endpoint
> [ 9.494836] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 conventional PCI endpoint
> [ 9.527173] pci 0000:00:01.1: BAR 4 [io 0xc040-0xc04f]
> Segmentation fault
>
>
> So it breaks somewhere in PCI init, after SMP/CPUs has been inited
> by the guest kernel.
>
> Thread 21 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555e2e9c0 in section_covers_addr (section=0x7fff58307, addr=182591488) at ../system/physmem.c:309
> 309 return int128_gethi(section->size) ||
> (gdb) p *section
> Cannot access memory at address 0x7fff58307
>
> This one has been seen multiple times.
>
> Thread 53 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe8a7fc6c0 (LWP 104067)]
> 0x0000555555e30382 in memory_region_section_get_iotlb (cpu=0x5555584e0a90, section=0x7fff58c3eac0) at
> ../system/physmem.c:1002
> 1002 return section - d->map.sections;
> d is NULL here
>
>
> Thread 22 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fff0bfff6c0 (LWP 57595)]
> 0x0000555555e42c9a in memory_region_get_iommu (mr=0xffffffc1ffffffc1) at include/exec/memory.h:1756
> 1756 if (mr->alias) {
> (gdb) p *mr
> Cannot access memory at address 0xffffffc1ffffffc1
> (gdb) frame 1
> #1 0x0000555555e42cb9 in memory_region_get_iommu (mr=0x7fff54239a10) at include/exec/memory.h:1757
> 1757 return memory_region_get_iommu(mr->alias);
> (gdb) p mr
> $1 = (MemoryRegion *) 0x7fff54239a10
>
>
> [ 9.222531] pci 0000:00:02.0: BAR 0 [mem 0xfebc0000-0xfebdffff]
> [
> Thread 54 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffebeffd6c0 (LWP 14977)]
>
> (gdb) l
> 1004 /* Called from RCU critical section */
> 1005 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> 1006 MemoryRegionSection *section)
> 1007 {
> 1008 AddressSpaceDispatch *d = flatview_to_dispatch(section->fv);
> 1009 return section - d->map.sections;
> 1010 }
> 1011
> 1012 static int subpage_register(subpage_t *mmio, uint32_t start, uint32_t end,
> 1013 uint16_t section);
>
> (gdb) p *section
> $1 = {size = 4940204083636081308795136, mr = 0x7fff98739760, fv = 0x7fff998f6fd0,
> offset_within_region = 0, offset_within_address_space = 0, readonly = false,
> nonvolatile = false, unmergeable = 12}
>
> (gdb) p *section->fv
> $2 = {rcu = {next = 0x0, func = 0x20281}, ref = 2555275280, ranges = 0x7fff99486a60, nr = 0,
> nr_allocated = 0, dispatch = 0x0, root = 0xffffffc1ffffffc1}
>
> (gdb) bt
> #0 0x0000555555e5118c in memory_region_section_get_iotlb (cpu=cpu@entry=0x55555894fdf0,
> section=section@entry=0x7fff984e6810) at system/physmem.c:1009
> #1 0x0000555555e6e07a in tlb_set_page_full (cpu=cpu@entry=0x55555894fdf0,
> mmu_idx=mmu_idx@entry=6, addr=addr@entry=151134208, full=full@entry=0x7ffebeffbd60)
> at accel/tcg/cputlb.c:1088
> #2 0x0000555555e70a92 in tlb_set_page_with_attrs (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151134208, paddr=paddr@entry=151134208, attrs=..., prot=<optimized out>,
> mmu_idx=mmu_idx@entry=6, size=4096) at accel/tcg/cputlb.c:1193
> #3 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=151138272,
> size=<optimized out>, access_type=MMU_DATA_STORE, mmu_idx=6, probe=<optimized out>,
> retaddr=0) at target/i386/tcg/system/excp_helper.c:624
> #4 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0, addr=151138272,
> type=MMU_DATA_LOAD, type@entry=MMU_DATA_STORE, mmu_idx=6, memop=memop@entry=MO_8,
> size=-1739692016, size@entry=151138272, probe=true, ra=0) at accel/tcg/cputlb.c:1251
> #5 0x0000555555e6eb0d in probe_access_internal (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=151138272, fault_size=fault_size@entry=0,
> access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
> nonfault=nonfault@entry=true, phost=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0, retaddr=0,
> check_mem_cbs=false) at accel/tcg/cputlb.c:1371
> #6 0x0000555555e70c84 in probe_access_full_mmu (env=0x5555589529b0, addr=addr@entry=151138272,
> size=size@entry=0, access_type=access_type@entry=MMU_DATA_STORE, mmu_idx=<optimized out>,
> phost=phost@entry=0x7ffebeffc0a8, pfull=0x7ffebeffbfa0) at accel/tcg/cputlb.c:1439
> #7 0x0000555555d497c9 in ptw_translate (inout=0x7ffebeffc090, addr=151138272)
> at target/i386/tcg/system/excp_helper.c:68
> #8 0x0000555555d49988 in mmu_translate (env=env@entry=0x5555589529b0,
> in=in@entry=0x7ffebeffc140, out=out@entry=0x7ffebeffc110, err=err@entry=0x7ffebeffc120,
> ra=ra@entry=0) at target/i386/tcg/system/excp_helper.c:198
> #9 0x0000555555d4aece in get_physical_address (env=0x5555589529b0, addr=18446741874686299840,
> access_type=MMU_DATA_LOAD, mmu_idx=4, out=0x7ffebeffc110, err=0x7ffebeffc120, ra=0)
> at target/i386/tcg/system/excp_helper.c:597
> #10 x86_cpu_tlb_fill (cs=0x55555894fdf0, addr=18446741874686299840, size=<optimized out>,
> access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>, retaddr=0)
> at target/i386/tcg/system/excp_helper.c:617
> #11 0x0000555555e6e8cf in tlb_fill_align (cpu=0x55555894fdf0, addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8, memop@entry=MO_32, size=-1739692016,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #12 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x55555894fdf0,
> data=data@entry=0x7ffebeffc310, memop=memop@entry=MO_32, mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at accel/tcg/cputlb.c:1652
> #13 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebeffc310) at accel/tcg/cputlb.c:1755
> #14 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x55555894fdf0,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/cputlb.c:2364
> #15 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589529b0, addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #16 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589529b0, addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/ldst_common.c.inc:308
> #17 0x0000555555db72da in do_interrupt64 (env=0x5555589529b0, intno=236, is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #18 do_interrupt_all (cpu=cpu@entry=0x55555894fdf0, intno=236, is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0, is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #19 0x0000555555db884a in do_interrupt_x86_hardirq (env=env@entry=0x5555589529b0,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/seg_helper.c:1245
> #20 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x55555894fdf0,
> interrupt_request=<optimized out>) at target/i386/tcg/system/seg_helper.c:209
> #21 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x55555894fdf0, last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #22 cpu_exec_loop (cpu=cpu@entry=0x55555894fdf0, sc=sc@entry=0x7ffebeffc580)
> at accel/tcg/cpu-exec.c:955
> #23 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x55555894fdf0,
> sc=sc@entry=0x7ffebeffc580) at accel/tcg/cpu-exec.c:1033
> #24 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x55555894fdf0) at accel/tcg/cpu-exec.c:1059
> #25 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops.c:80
> #26 0x0000555555d2c1c3 in mttcg_cpu_thread_fn (arg=arg@entry=0x55555894fdf0)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #27 0x0000555556056d90 in qemu_thread_start (args=0x5555589cdba0) at util/qemu-thread-posix.c:541
> #28 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #29 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
>
> qemu-system-x86_64: ./include/exec/ram_addr.h:91: ramblock_ptr: Assertion `offset_in_ramblock(block, offset)' failed.
>
> (gdb) bt
> #0 0x00007ffff6076507 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007ffff6076420 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833, block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> #3 0x0000555555a04c83 in ramblock_ptr (block=<optimized out>, offset=<optimized out>)
> at system/physmem.c:2238
> #4 qemu_ram_ptr_length (lock=false, is_write=true, block=<optimized out>, addr=<optimized out>,
> size=0x0) at system/physmem.c:2430
> #5 qemu_map_ram_ptr (ram_block=<optimized out>, addr=<optimized out>) at system/physmem.c:2443
> #6 0x0000555555e4af6b in memory_region_get_ram_ptr (mr=<optimized out>) at system/memory.c:2452
> #7 0x0000555555e6e024 in tlb_set_page_full (cpu=cpu@entry=0x5555589f9f50,
> mmu_idx=mmu_idx@entry=4, addr=addr@entry=18446741874686296064,
> full=full@entry=0x7ffebd7f90b0) at accel/tcg/cputlb.c:1065
> #8 0x0000555555e70a92 in tlb_set_page_with_attrs (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686296064, paddr=paddr@entry=206749696, attrs=...,
> prot=<optimized out>, mmu_idx=mmu_idx@entry=4, size=4096) at accel/tcg/cputlb.c:1193
> #9 0x0000555555d4ae44 in x86_cpu_tlb_fill (cs=0x5555589f9f50, addr=18446741874686299840,
> size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=4, probe=<optimized out>, retaddr=0)
> at target/i386/tcg/system/excp_helper.c:624
> #10 0x0000555555e6e8cf in tlb_fill_align (cpu=0x5555589f9f50, addr=18446741874686299840,
> type=type@entry=MMU_DATA_LOAD, mmu_idx=4, memop=MO_8, memop@entry=MO_32, size=-1115714056,
> size@entry=3776, probe=false, ra=0) at accel/tcg/cputlb.c:1251
> #11 0x0000555555e6ed4d in mmu_lookup1 (cpu=cpu@entry=0x5555589f9f50,
> data=data@entry=0x7ffebd7f9310, memop=memop@entry=MO_32, mmu_idx=mmu_idx@entry=4,
> access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at accel/tcg/cputlb.c:1652
> #12 0x0000555555e6eea5 in mmu_lookup (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> type=type@entry=MMU_DATA_LOAD, l=l@entry=0x7ffebd7f9310) at accel/tcg/cputlb.c:1755
> #13 0x0000555555e6f2f3 in do_ld4_mmu (cpu=cpu@entry=0x5555589f9f50,
> addr=addr@entry=18446741874686299840, oi=oi@entry=36, ra=ra@entry=0,
> access_type=access_type@entry=MMU_DATA_LOAD) at accel/tcg/cputlb.c:2364
> #14 0x0000555555e71dba in cpu_ldl_mmu (env=0x5555589fcb10, addr=18446741874686299840, oi=36,
> ra=0) at accel/tcg/ldst_common.c.inc:165
> #15 cpu_ldl_le_mmuidx_ra (env=env@entry=0x5555589fcb10, addr=addr@entry=18446741874686299840,
> mmu_idx=<optimized out>, ra=ra@entry=0) at accel/tcg/ldst_common.c.inc:308
> #16 0x0000555555db72da in do_interrupt64 (env=0x5555589fcb10, intno=236, is_int=0, error_code=0,
> next_eip=<optimized out>, is_hw=0) at target/i386/tcg/seg_helper.c:954
> #17 do_interrupt_all (cpu=cpu@entry=0x5555589f9f50, intno=236, is_int=is_int@entry=0,
> error_code=error_code@entry=0, next_eip=next_eip@entry=0, is_hw=is_hw@entry=1)
> at target/i386/tcg/seg_helper.c:1213
> #18 0x0000555555db884a in do_interrupt_x86_hardirq (env=env@entry=0x5555589fcb10,
> intno=<optimized out>, is_hw=is_hw@entry=1) at target/i386/tcg/seg_helper.c:1245
> #19 0x0000555555d4f06f in x86_cpu_exec_interrupt (cs=0x5555589f9f50,
> interrupt_request=<optimized out>) at target/i386/tcg/system/seg_helper.c:209
> #20 0x0000555555e660ed in cpu_handle_interrupt (cpu=0x5555589f9f50, last_tb=<synthetic pointer>)
> at accel/tcg/cpu-exec.c:851
> #21 cpu_exec_loop (cpu=cpu@entry=0x5555589f9f50, sc=sc@entry=0x7ffebd7f9580)
> at accel/tcg/cpu-exec.c:955
> #22 0x0000555555e663f1 in cpu_exec_setjmp (cpu=cpu@entry=0x5555589f9f50,
> sc=sc@entry=0x7ffebd7f9580) at accel/tcg/cpu-exec.c:1033
> #23 0x0000555555e66a5d in cpu_exec (cpu=cpu@entry=0x5555589f9f50) at accel/tcg/cpu-exec.c:1059
> #24 0x0000555555d2bdc7 in tcg_cpu_exec (cpu=cpu@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops.c:80
> #25 0x0000555555d2c1c3 in mttcg_cpu_thread_fn (arg=arg@entry=0x5555589f9f50)
> at accel/tcg/tcg-accel-ops-mttcg.c:94
> #26 0x0000555556056d90 in qemu_thread_start (args=0x55555856bf60) at util/qemu-thread-posix.c:541
> #27 0x00007ffff60e0b7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x00007ffff615e7b8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>
> (gdb) frame 2
> #2 0x0000555555a047fa in ramblock_ptr (offset=281471527758833, block=<optimized out>)
> at ./include/exec/ram_addr.h:91
> 91 assert(offset_in_ramblock(block, offset));
>
> (gdb) l
> 86 return (b && b->host && offset < b->used_length) ? true : false;
> 87 }
> 88
> 89 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
> 90 {
> 91 assert(offset_in_ramblock(block, offset));
> 92 return (char *)block->host + offset;
> 93 }
> 94
> 95 static inline unsigned long int ramblock_recv_bitmap_offset(void *host_addr,
>
>
> [ 9.439487] pci 0000:00:02.0: BAR 1 [io 0xc000-0xc03f]
>
> Thread 65 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffe9cff96c0 (LWP 15472)]
> phys_page_find (d=d@entry=0x7fff905ec880, addr=addr@entry=111288320) at system/physmem.c:337
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> (gdb) l
> 332 }
> 333 p = nodes[lp.ptr];
> 334 lp = p[(index >> (i * P_L2_BITS)) & (P_L2_SIZE - 1)];
> 335 }
> 336
> 337 if (section_covers_addr(&sections[lp.ptr], addr)) {
> 338 return &sections[lp.ptr];
> 339 } else {
> 340 return &sections[PHYS_SECTION_UNASSIGNED];
> 341 }
> (gdb)
>
>
> I was doing a bisection between 9.2.0 and 10.0.0, since we observed
> this issue happening with 10.0 but not with 9.2. So some of the
> above failures might be from somewhere in the middle between 9.2
> and 10.0. However, I was able to trigger some of the failures with
> 9.2.0, though with much lower probability. And some can be triggered
> on current master too, with much higher probability.
>
> On my 4-core notebook, the above command line fails once every 20..50 runs.
>
> I was never able to reproduce the assertion failure as shown in !1921.
>
> As of now, this issue is hitting debian trixie in debci: a package
> which creates a guest image tries to run qemu, but there's no kvm
> available in the debci environment, so it falls back to tcg.
>
> On IRC, Manos Pitsidianakis noted that he was debugging use-after-free
> with MemoryRegion recently, and posted a patch which can help a bit:
> https://people.linaro.org/~manos.pitsidianakis/backtrace.diff
>
> I'm not sure where to go from here.
>
> Just collecting everything we have now.
>
> Thanks,
>
> /mjt
>
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: apparent race condition in mttcg memory handling
2025-07-22 20:11 ` Gustavo Romero
@ 2025-07-23 6:31 ` Michael Tokarev
0 siblings, 0 replies; 12+ messages in thread
From: Michael Tokarev @ 2025-07-23 6:31 UTC (permalink / raw)
To: Gustavo Romero, QEMU Development
On 22.07.2025 23:11, Gustavo Romero wrote:
...
>> The reproducer I was using - it was just booting kernel, no user-
>> space is needed. Qemu crashes during kernel init, or it runs fine.
>>
>> I used regular kernel from debian sid:
>> http://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-
>> image-amd64_6.12.29-1_amd64.deb
>> Extract vmlinuz-6.12.29-amd64 from there.
>
> The link above is broken. Googling for "linux-image-
> amd64_6.12.29-1_amd64.deb" wasn't successful either.
>
> Could you point out another image to reproduce it?
Please see https://gitlab.com/qemu-project/qemu/-/issues/3040 --
actual kernel version isn't important, I guess any kernel will
do. Yesterday I used the current debian kernel, from
https://deb.debian.org/debian/pool/main/l/linux-signed-amd64/linux-image-6.12.37+deb13-amd64_6.12.37-1_amd64.deb
with current qemu master, and I was able to reproduce the issue
within 4 invocations on my laptop (4 cores, 8 threads), running
qemu with -smp 8 while running `stress -c10` at the same time.
Thanks,
/mjt
^ permalink raw reply [flat|nested] 12+ messages in thread