* Found that running runltp under the virtual machine caused the virtio-blk-pci device to hang!!!
From: lixianglai @ 2026-03-26 2:42 UTC (permalink / raw)
To: Michael S. Tsirkin, qemu-devel, Song Gao,
毛碧波, ltp, lixianglai
Hello everyone,

While running runltp inside a virtual machine, we found that the tpci
test case has a small probability of making the virtio-blk-pci disk in
the guest stop working, which eventually hangs the virtual machine.

Inside the guest, the situation looks like this:
[root@localhost ~]# ps aux | grep " D"
root         689  0.0  0.0      0     0 ?      D    09:42   0:00 [kworker/u256:2+flush-253:0]
root        1327  0.0  0.0      0     0 ?      D    09:42   0:00 [jbd2/dm-0-8]
root        8533  2.1  0.0 223664     0 pts/0  D+   09:48   0:01 /usr/bin/cp -rvf kvm_install_desktop_6.6.52+ kvm_install_desktop_6.6.52+-temp
root        8686  0.0  0.0 222992     0 ttyS0  S+   09:49   0:00 grep --color=auto D
[root@localhost ~]# cat /proc/689/stack
[<0>] rq_qos_wait+0xd0/0x190
[<0>] wbt_wait+0xac/0x168
[<0>] __rq_qos_throttle+0x34/0x58
[<0>] blk_mq_submit_bio+0x184/0x7b0
[<0>] __submit_bio_noacct+0x70/0x288
[<0>] submit_bio_noacct+0x1e8/0x798
[<0>] __block_write_full_folio+0x348/0x608
[<0>] writepage_cb+0x24/0x98
[<0>] write_cache_pages+0x18c/0x450
[<0>] do_writepages+0x120/0x290
[<0>] __writeback_single_inode+0x4c/0x4c8
[<0>] writeback_sb_inodes+0x28c/0x630
[<0>] __writeback_inodes_wb+0x78/0x168
[<0>] wb_writeback+0x308/0x400
[<0>] wb_workfn+0x430/0x580
[<0>] process_one_work+0x16c/0x418
[<0>] worker_thread+0x280/0x498
[<0>] kthread+0xec/0xf8
[<0>] ret_from_kernel_thread+0x28/0xc8
[<0>] ret_from_kernel_thread_asm+0xc/0x88
[root@localhost ~]# lspci
00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
00:01.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.6 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:01.7 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.4 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:02.5 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
01:00.0 Ethernet controller: Red Hat, Inc. Virtio 1.0 network device (rev 01)
02:00.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
04:00.0 SCSI storage controller: Red Hat, Inc. Virtio 1.0 block device (rev 01)
05:00.0 Communication controller: Red Hat, Inc. Virtio 1.0 console (rev 01)
06:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio 1.0 memory balloon (rev 01)
07:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio 1.0 RNG (rev 01)
08:00.0 Display controller: Red Hat, Inc. Virtio 1.0 GPU (rev 01)
dmesg contains the following output:
[  375.532574] LTP: starting tpci
[  375.544269] ltp_tpci: Starting module
[  375.544855] ltp_tpci: device registered

[  375.546348] ltp_tpci: found pci_dev '0000:04:00.0', bus 4, devfn 0
[  375.547159] ltp_tpci: Bus number: 4
[  375.547803] ltp_tpci: test-case 12
[  375.548328] ltp_tpci: assign resources
[  375.548793] ltp_tpci: assign resource #0
[  375.549391] ltp_tpci: name = 0000:04:00.0, flags = 0, start 0x0, end 0x0
[  375.549993] ltp_tpci: assign resource #1
[  375.550409] ltp_tpci: name = 0000:04:00.0, flags = 262656, start 0x40c00000, end 0x40c00fff
[  375.551422] ltp_tpci: assign resource #2
[  375.551936] ltp_tpci: name = 0000:04:00.0, flags = 0, start 0x0, end 0x0
[  375.552756] ltp_tpci: assign resource #3
[  375.553124] ltp_tpci: name = 0000:04:00.0, flags = 0, start 0x0, end 0x0
[  375.553747] ltp_tpci: assign resource #4
[  375.554104] ltp_tpci: name = 0000:04:00.0, flags = 1319436, start 0x40e00000, end 0x40e03fff
[  375.554880] virtio-pci 0000:04:00.0: BAR 4: releasing [mem 0x40e00000-0x40e03fff 64bit pref]
[  375.555796] virtio-pci 0000:04:00.0: BAR 4: assigned [mem 0x40e00000-0x40e03fff 64bit pref]
[  375.583052] ltp_tpci: assign resource to '4', ret '0'
[  375.585047] ltp_tpci: assign resource #5
[  375.585407] ltp_tpci: name = (null), flags = 0, start 0x0, end 0x0
[  375.585941] ltp_tpci: assign resource #6
[  375.586296] ltp_tpci: name = 0000:04:00.0, flags = 0, start 0x0, end 0x0
[  375.591294] ltp_tpci: device released

[  604.910783] INFO: task kworker/u256:2:689 blocked for more than 120 seconds.
[  604.911539]       Tainted: G           OE      6.6.52+ #16
[  604.912013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  604.912659] task:kworker/u256:2  state:D stack:0     pid:689   ppid:2      flags:0x00000800
[  604.913353] Workqueue: writeback wb_workfn (flush-253:0)
[  604.913834] Stack : 0000000006d64000 90000000025409a0 9000000001585af8 0000000000000000
[  604.914505]         0000000000000012 0000000000000000 90000000016d2220 0000000000000000
[  604.915221]         0000000000000002 e35c7a5dfd2412e2 9000000122cb4b40 fffffffffffffffd
[  604.916192]         0000000000000080 fffffffffffffffb 9000000000bd0918 90000001071b7548
[  604.916955]         9000000002341000 90000001071b75f8 900000012961a080 900000010698f600
[  604.917671]         90000000875db2c8 9000000001585af8 0000000000000000 9000000001585bf0
[  604.918369]         0000000000000002 9000000000ba9bfc 9000000122cb4bb8 0000000000000001
[  604.919290]         0000000000000000 9000000000ba93a0 900000012961a088 900000012961a088
[  604.920960]         900000010698f600 900000012961a080 9000000000bd0c20 90000001071b75f8
[  604.921992]         0000000000000000 e35c7a5dfd2412e2 0000100000000001 0000000000000000
[  604.922812]         ...
[  604.923355] Call Trace:
[  604.923359] [<9000000001585430>] __schedule+0x318/0x988
[  604.924802] [<9000000001585af4>] schedule+0x54/0xf0
[  604.925606] [<9000000001585bec>] io_schedule+0x44/0x68
[  604.926318] [<9000000000ba9bf8>] rq_qos_wait+0xd0/0x190
[  604.927201] [<9000000000bd0ef4>] wbt_wait+0xac/0x168
[  604.928033] [<9000000000ba963c>] __rq_qos_throttle+0x34/0x58
[  604.928900] [<9000000000b97acc>] blk_mq_submit_bio+0x184/0x7b0
[  604.929785] [<9000000000b82b18>] __submit_bio_noacct+0x70/0x288
[  604.930675] [<9000000000b83398>] submit_bio_noacct+0x1e8/0x798
[  604.931671] [<9000000000653a28>] __block_write_full_folio+0x348/0x608
[  604.932769] [<90000000004da37c>] writepage_cb+0x24/0x98
[  604.933611] [<90000000004dcfdc>] write_cache_pages+0x18c/0x450
[  604.934368] [<90000000004de368>] do_writepages+0x120/0x290
[  604.935132] [<900000000063dfe4>] __writeback_single_inode+0x4c/0x4c8
[  604.936191] [<900000000063eb04>] writeback_sb_inodes+0x28c/0x630
[  604.937090] [<900000000063ef20>] __writeback_inodes_wb+0x78/0x168
[  604.937973] [<900000000063f478>] wb_writeback+0x308/0x400
[  604.938829] [<9000000000640c18>] wb_workfn+0x430/0x580
[  604.939550] [<9000000000295fb4>] process_one_work+0x16c/0x418
[  604.940390] [<90000000002987a8>] worker_thread+0x280/0x498
[  604.941016] [<90000000002a25b4>] kthread+0xec/0xf8
[  604.941809] [<900000000157e660>] ret_from_kernel_thread+0x28/0xc8
[  604.942720] [<9000000000222084>] ret_from_kernel_thread_asm+0xc/0x88

[  604.944135] INFO: task jbd2/dm-0-8:1327 blocked for more than 120 seconds.
[  604.944785]       Tainted: G           OE      6.6.52+ #16
[  604.945634] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  604.946277] task:jbd2/dm-0-8     state:D stack:0     pid:1327  ppid:2      flags:0x00000800
[  604.947254] Stack : 0000000006be4000 90000000025409a0 9000000001585af8 0000000000000000
[  604.948286]         90000000a6de88c0 0000000000000000 90000000016d1c20 0000000000000000
[  604.949210]         0000000000000002 e35c7a5dfd2412e2 0000000000000400 90000000022d9b90
[  604.950130]         0000000000000002 0000000000000001 90000000022d9b00 9000000002341000
[  604.951084]         0000000000000006 0000000000000090 0000000000000001 9000000128355c80
[  604.952103]         0000000000000002 9000000001585af8 0000000000000000 9000000001585bf0
[  604.953150]         ffffff0000b22ec0 90000000004cd71c 900000011bb39a40 0000000000000001
[  604.954180]         0000000000000005 0000000000000000 900000012ad87998 90000000022d9b98
[  604.955240]         0000100000000001 0000000000000000 ffffff0000b22ec0 0000000000000001
[  604.956145]         0000000000000000 9000000128355c80 90000000004ca560 90000000022d9b98
[  604.956948]         ...
[  604.957460] Call Trace:
[  604.957462] [<9000000001585430>] __schedule+0x318/0x988
[  604.958913] [<9000000001585af4>] schedule+0x54/0xf0
[  604.959618] [<9000000001585bec>] io_schedule+0x44/0x68
[  604.960424] [<90000000004cd718>] folio_wait_bit_common+0x1d8/0x460
[  604.961336] [<90000000004da560>] folio_wait_writeback+0x48/0xd0
[  604.962097] [<90000000004cba00>] __filemap_fdatawait_range+0xb0/0x170
[  604.963284] [<90000000004cbad8>] filemap_fdatawait_range_keep_errors+0x18/0x68
[  604.964243] [<90000000007822c8>] journal_finish_inode_data_buffers+0xa8/0x278
[  604.965103] [<900000000078348c>] jbd2_journal_commit_transaction+0xfd4/0x1d68
[  604.965953] [<9000000000790154>] kjournald2+0x10c/0x3e0
[  604.966804] [<90000000002a25b4>] kthread+0xec/0xf8
[  604.967489] [<900000000157e660>] ret_from_kernel_thread+0x28/0xc8
[  604.968361] [<9000000000222084>] ret_from_kernel_thread_asm+0xc/0x88
From the information above you can see that the virtio-blk device is no
longer working properly. After analysis, we believe the following
sequence may lead to the hang:
1. One of the tpci subtests in runltp first releases the PCI device's
memory resources and then reassigns them, like this:

[  375.553747] ltp_tpci: assign resource #4
[  375.554104] ltp_tpci: name = 0000:04:00.0, flags = 1319436, start 0x40e00000, end 0x40e03fff
[  375.554880] virtio-pci 0000:04:00.0: BAR 4: releasing [mem 0x40e00000-0x40e03fff 64bit pref]
[  375.555796] virtio-pci 0000:04:00.0: BAR 4: assigned [mem 0x40e00000-0x40e03fff 64bit pref]
[  375.583052] ltp_tpci: assign resource to '4', ret '0'
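For reference, the "assign resources" part of LTP's ltp_tpci module
(test-case 12 in the log above) loops over the device's resources and
calls pci_assign_resource() on each memory BAR. This is a simplified
sketch of that kernel-side logic, not the exact LTP source:

	/* Sketch of ltp_tpci test-case 12: for every memory resource of
	 * the device, pci_assign_resource() releases the BAR and asks
	 * the PCI core to place it again. */
	static int test_assign_resources(struct pci_dev *dev)
	{
		int i, ret, rc = 0;

		for (i = 0; i < 7; i++) {	/* BARs 0-5 plus expansion ROM */
			struct resource *r = &dev->resource[i];

			pr_info("assign resource #%d\n", i);
			if (r->flags & IORESOURCE_MEM) {
				ret = pci_assign_resource(dev, i);
				if (ret)
					rc = ret;
			}
		}
		return rc;
	}

The "releasing"/"assigned" messages for BAR 4 in the dmesg output come
from exactly this release-then-reassign step.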
2. BAR 4 of the virtio PCI device is the virtio modern memory address
space; it is the region through which the virtio driver in the guest
interacts with the virtio device.
3. If the BAR 4 address space is released while the guest is writing to
the disk on another CPU, the driver still has to call vp_notify to kick
the virtio device. The notification written at that moment may be lost,
ultimately leaving the virtio-blk device blocked.
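For context, the guest-side notification is a single MMIO write into the
BAR 4 region. The upstream driver (drivers/virtio/virtio_pci_common.c)
implements it approximately like this:

	/* Kick the device: write the queue index into the notify
	 * register, which lives inside the BAR 4 (modern) region.
	 * If that BAR has just been released, the write hits an
	 * unmapped region and the kick never reaches the device. */
	bool vp_notify(struct virtqueue *vq)
	{
		/* we write the queue selector into the notification
		 * register to signal the other end */
		iowrite16(vq->index, (void __iomem *)vq->priv);
		return true;
	}

Since a lost kick is never retried, the request queue can then sit idle
forever, which matches the D-state writeback tasks shown above.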
So here I have a question: where exactly is the problem? Is it an issue
with the LTP test case itself, a defect in the virtio driver in the
guest, or a problem with the virtio device emulation in QEMU?

I have tested on both x86 and LoongArch virtual machines and hit this
problem on both. To be clear, the problem should not be related to the
CPU architecture or the software version, because I have also reproduced
it with the latest Linux kernel and QEMU from the upstream community.
Here is the x86 host environment:
[root@anolis ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: 12th Gen Intel(R) Core(TM) i7-12700
BIOS Model name: 12th Gen Intel(R) Core(TM) i7-12700 CPU @ 4.4GHz
BIOS CPU family: 198
CPU family: 6
Model: 151
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 18%
CPU max MHz: 4900.0000
CPU min MHz: 800.0000
BogoMIPS: 4224.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs
ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad
fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap
clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves
split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify
hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg
gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear
serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 512 KiB (12 instances)
L1i: 512 KiB (12 instances)
L2: 12 MiB (9 instances)
L3: 25 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerabilities:
Gather data sampling: Not affected
Indirect target selection: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Mitigation; Clear Register File
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Srbds: Not affected
Tsa: Not affected
Tsx async abort: Not affected
[root@anolis ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="23.3"
ID="anolis"
VERSION_ID="23.3"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23.3"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"
[root@anolis ~]# uname -a
Linux anolis 6.6.102-5.2.an23.x86_64 #1 SMP PREEMPT_DYNAMIC Thu
Nov 27 22:47:00 CST 2025 x86_64 x86_64 x86_64 GNU/Linux
The virtual machine environment for x86 is as follows:
[root@anolis ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
BIOS Vendor ID: QEMU
Model name: 12th Gen Intel(R) Core(TM) i7-12700
BIOS Model name: pc-i440fx-8.2 CPU @ 2.0GHz
BIOS CPU family: 1
CPU family: 6
Model: 151
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 20
Stepping: 2
BogoMIPS: 4224.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm
constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni
pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced
tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2
smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt
xsavec xgetbv1 xsaves avx_vnni arat vnmi umip pku ospke waitpkg gfni
vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize
flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 640 KiB (20 instances)
L1i: 640 KiB (20 instances)
L2: 80 MiB (20 instances)
L3: 320 MiB (20 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerabilities:
Gather data sampling: Not affected
Indirect target selection: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Mitigation; Clear Register File
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Srbds: Not affected
Tsa: Not affected
Tsx async abort: Not affected
[root@anolis ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="23.3"
ID="anolis"
VERSION_ID="23.3"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23.3"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"
[root@anolis ~]# uname -a
Linux anolis 6.6.102-5.2.an23.x86_64 #1 SMP PREEMPT_DYNAMIC Thu
Nov 27 22:47:00 CST 2025 x86_64 x86_64 x86_64 GNU/Linux
Finally, the steps to reproduce the problem:
1. The virtual machine must have multiple vCPUs; the issue does not
occur with a single core.
2. In the virtual machine, find the PCI address of the virtio-blk
device with the "lspci" command.
3. Modify the tpci test case of runltp so that it only runs the tpci
tests against the virtio-blk device.
4. Run a file-copying program in the virtual machine.
5. Run the tpci test case against the virtio-blk device.
6. After the test cases finish, use `ps aux | grep " D"` to check
whether any process is blocked for a long time and cannot recover. If
the problem has not reappeared, repeat steps 4 and 5 until it does.
Thanks!
Xianglai.