* [Bug 219166] New: ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
@ 2024-08-16 7:43 bugzilla-daemon
2024-08-16 16:40 ` [Bug 219166] " bugzilla-daemon
` (20 more replies)
0 siblings, 21 replies; 22+ messages in thread
From: bugzilla-daemon @ 2024-08-16 7:43 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
Bug ID: 219166
Summary: ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
Product: File System
Version: 2.5
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P3
Component: ext4
Assignee: fs_ext4@kernel-bugs.osdl.org
Reporter: rjones@redhat.com
Regression: No
kernel-6.11.0-0.rc3.30.fc41.x86_64
This hang seems new in 6.11.0.
Downstream bug:
https://bugzilla.redhat.com/show_bug.cgi?id=2303267
Trying to change the queue/scheduler for a device containing an ext4 filesystem
very rarely causes other processes which are using the filesystem to hang.
This is running inside qemu. Using 'crash' I was able to get a stack trace
from a couple of hanging processes (from different VM instances):
crash> set 230
PID: 230
COMMAND: "modprobe"
TASK: ffff98ce03ca3040 [THREAD_INFO: ffff98ce03ca3040]
CPU: 0
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 230 TASK: ffff98ce03ca3040 CPU: 0 COMMAND: "modprobe"
#0 [ffffaf9940307840] __schedule at ffffffff9618f6d0
#1 [ffffaf99403078f8] schedule at ffffffff96190a27
#2 [ffffaf9940307908] __bio_queue_enter at ffffffff957e121c
#3 [ffffaf9940307968] blk_mq_submit_bio at ffffffff957f358c
#4 [ffffaf99403079f0] __submit_bio at ffffffff957e1e3c
#5 [ffffaf9940307a58] submit_bio_noacct_nocheck at ffffffff957e2326
#6 [ffffaf9940307ac0] ext4_mpage_readpages at ffffffff955ceafc
#7 [ffffaf9940307be0] read_pages at ffffffff95381d1a
#8 [ffffaf9940307c40] page_cache_ra_unbounded at ffffffff95381ff5
#9 [ffffaf9940307ca8] filemap_fault at ffffffff953761b5
#10 [ffffaf9940307d48] __do_fault at ffffffff953d1895
#11 [ffffaf9940307d70] do_fault at ffffffff953d2425
#12 [ffffaf9940307da0] __handle_mm_fault at ffffffff953d8c6b
#13 [ffffaf9940307e88] handle_mm_fault at ffffffff953d95c2
#14 [ffffaf9940307ec8] do_user_addr_fault at ffffffff950b34ea
#15 [ffffaf9940307f28] exc_page_fault at ffffffff96186e4e
#16 [ffffaf9940307f50] asm_exc_page_fault at ffffffff962012a6
RIP: 0000556b7a7468d8 RSP: 00007ffde2ffb560 RFLAGS: 00000206
RAX: 00000000000bec82 RBX: 00007f5331a0dc82 RCX: 0000556b7a75b92a
RDX: 00007ffde2ffd8d0 RSI: 00000000200bec82 RDI: 0000556ba8edf960
RBP: 00007ffde2ffb7c0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00000000200bec82
R13: 0000556ba8edf960 R14: 00007ffde2ffd8d0 R15: 0000556b7a760708
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
crash> set 234
PID: 234
COMMAND: "modprobe"
TASK: ffff9f5ec3a22f40 [THREAD_INFO: ffff9f5ec3a22f40]
CPU: 0
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 234 TASK: ffff9f5ec3a22f40 CPU: 0 COMMAND: "modprobe"
#0 [ffffb21e002e7840] __schedule at ffffffffa718f6d0
#1 [ffffb21e002e78f8] schedule at ffffffffa7190a27
#2 [ffffb21e002e7908] __bio_queue_enter at ffffffffa67e121c
#3 [ffffb21e002e7968] blk_mq_submit_bio at ffffffffa67f358c
#4 [ffffb21e002e79f0] __submit_bio at ffffffffa67e1e3c
#5 [ffffb21e002e7a58] submit_bio_noacct_nocheck at ffffffffa67e2326
#6 [ffffb21e002e7ac0] ext4_mpage_readpages at ffffffffa65ceafc
#7 [ffffb21e002e7be0] read_pages at ffffffffa6381d17
#8 [ffffb21e002e7c40] page_cache_ra_unbounded at ffffffffa6381ff5
#9 [ffffb21e002e7ca8] filemap_fault at ffffffffa63761b5
#10 [ffffb21e002e7d48] __do_fault at ffffffffa63d1892
#11 [ffffb21e002e7d70] do_fault at ffffffffa63d2425
#12 [ffffb21e002e7da0] __handle_mm_fault at ffffffffa63d8c6b
#13 [ffffb21e002e7e88] handle_mm_fault at ffffffffa63d95c2
#14 [ffffb21e002e7ec8] do_user_addr_fault at ffffffffa60b34ea
#15 [ffffb21e002e7f28] exc_page_fault at ffffffffa7186e4e
#16 [ffffb21e002e7f50] asm_exc_page_fault at ffffffffa72012a6
RIP: 000055d16159f8d8 RSP: 00007ffdd4c1f340 RFLAGS: 00010206
RAX: 00000000000bec82 RBX: 00007ff2fd00dc82 RCX: 000055d1615b492a
RDX: 00007ffdd4c216b0 RSI: 00000000200bec82 RDI: 000055d185725960
RBP: 00007ffdd4c1f5a0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00000000200bec82
R13: 000055d185725960 R14: 00007ffdd4c216b0 R15: 000055d1615b9708
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
The hanging process was modprobe both times. I don't know if that is
significant.
Note that "noop" is not actually a valid scheduler (for about 10 years)! We
still observe this hang even when setting it to this impossible value.
# echo noop > /sys/block/sda/queue/scheduler
/init: line 108: echo: write error: Invalid argument
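For anyone following along: the names a device accepts are listed in the same sysfs file, with the active one in square brackets (for example "[mq-deadline] kyber bfq none"). The following is only a minimal sketch for checking a name before writing it; the function names are made up for illustration and are not part of libguestfs or any other tool.

```shell
#!/bin/sh
# List the schedulers offered by a queue/scheduler file, one per line.
# The kernel marks the active one with [brackets]; strip them.
list_schedulers() {
    tr -d '[]' < "$1" | tr ' ' '\n' | sed '/^$/d'
}

# Succeed only if the scheduler name $2 is offered by the file $1.
scheduler_is_valid() {
    list_schedulers "$1" | grep -qxF -- "$2"
}

# Example use; the device name sda is the one from the report.
f=/sys/block/sda/queue/scheduler
if [ -r "$f" ] && ! scheduler_is_valid "$f" noop; then
    echo "noop is not offered by $f"
fi
```

Checking first avoids the "Invalid argument" write error, although in this report the hang was observed even though the write was rejected.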
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-16 16:40 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
Theodore Tso (tytso@mit.edu) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tytso@mit.edu
--- Comment #1 from Theodore Tso (tytso@mit.edu) ---
This is a bug in the block layer, not in ext4. It's also a fairly
long-standing issue; I remember seeing something like it in our Data Center
kernels at $WORK at least five years ago. At the time, we dealt with
this via the simple expedient of "Doctor, Doctor, it hurts when I do that.
Well, then don't do that then!"
At the time it happened relatively rarely; if you're able to reliably reproduce
it, perhaps it would be useful to send a script with said reliable reproducer
to the linux-block list, and perhaps try to do a bisect to see when it became
trivially easy to trigger.
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-16 17:06 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #2 from Richard W.M. Jones (rjones@redhat.com) ---
We did indeed work around it by not doing it:
https://github.com/libguestfs/libguestfs/commit/b2d682a4730ead8b4ae07e5aaf6fa230c5eec305
so my interest in this bug is now limited. If you want to close it then that's
fine.
It's unfortunately triggered very rarely. The only test I have is:
$ while guestfish -a /dev/null run -vx >& /tmp/log ; do echo -n . ; done
but that hits the bug less than 1 in 200 iterations (although more often when
run in a qemu VM using software emulation, which is where we first hit this.)
I tried to write something that could trigger it more frequently by having a
loop in a guest writing to queue/scheduler, while another loop in the guest
would write to the disk, but I couldn't seem to hit the bug at all.
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-16 20:36 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #3 from Theodore Tso (tytso@mit.edu) ---
So FWIW, what we saw in our data center kernel was switching from one valid
scheduler to a different valid scheduler while I/O was in flight. And this was
with a kernel that didn't have any modules (or not any modules that would be
loaded under normal circumstances, so modprobe wouldn't have been in the
picture). It triggered rarely as well, and I don't remember whether it was an
oops or a hang --- if I remember correctly, it was an oops. So it might not be
the same thing, but our workaround was to quiesce the device before changing
the scheduler. Since this was happening in the boot sequence, it was something
we could do relatively easily, and like you we then lost interest. :-)
The question isn't whether or not I want to close it; the question is whether we
think it's worth asking the block layer developers to take a look
at it. Right now it's mostly only ext4 developers who are paying attention to
this bug component, so someone would need to take it up with the block developers,
and the thing that would be most useful is a reliable reproducer.
I don't know what guestfish is doing, but if we were trying to create a
reproducer from what I was seeing a few years ago, it would be something like
running fsstress or fio to exercise the block layer, and then try switching the
I/O scheduler and see if we can make it go *boom* regularly. Maybe with
something like that we could get a reproducer that doesn't require launching a
VM multiple times and only seeing the failure less than 0.5% of the time....
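A rough sketch of the scheduler-switching half of such a reproducer is below. It assumes fio or fsstress is already generating I/O on the device from another terminal, and that the script runs as root. The function names and the iteration count are invented for illustration, and nothing here is confirmed to trigger the problem.

```shell
#!/bin/sh
# Print every scheduler named in a queue/scheduler file except the
# active one, which the kernel marks with [brackets].
inactive_schedulers() {
    tr ' ' '\n' < "$1" | grep -v -e '^\[' -e '^$'
}

# Write each currently inactive scheduler to the file $1 in turn,
# repeating the whole pass $2 times. Write errors are ignored.
cycle_schedulers() {
    n=0
    while [ "$n" -lt "$2" ]; do
        for s in $(inactive_schedulers "$1"); do
            echo "$s" > "$1" 2>/dev/null
        done
        n=$((n + 1))
    done
}

# Example (needs root and a real device; 1000 is an arbitrary count):
# cycle_schedulers /sys/block/sda/queue/scheduler 1000
```

Unlike writing "noop", this switches between schedulers the kernel actually accepts, which is closer to the data center case described above.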
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-16 20:51 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #4 from Richard W.M. Jones (rjones@redhat.com) ---
I've got a job running now:
# while true; do echo noop > /sys/block/sda/queue/scheduler 2>/dev/null ; done
and fio doing some random writes on an XFS filesystem on sda1.
Baremetal, kernel 6.11.0-0.rc3.20240814git6b0f8db921ab.32.fc42.x86_64.
I'll leave that going overnight to see what happens.
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-17 10:22 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #5 from Richard W.M. Jones (rjones@redhat.com) ---
I'm afraid that didn't reproduce the issue. I have an idea to run the same
test, but in a software emulated virtual machine. The idea is that the much
slower performance will exaggerate any race conditions in the kernel. However
this will take time.
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-17 12:36 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #6 from Richard W.M. Jones (rjones@redhat.com) ---
Yes I can reproduce this inside a software emulated VM with another 6.11.0
Fedora kernel. I will bisect this later, but for now reproduction instructions
are given below.
(1) Install a Fedora 40 virtual machine. I used the command below but other
ways are available:
virt-builder fedora-40 --size=10G --root-password=password:123456
(2) Run the VM in qemu with software emulation (TCG):
qemu-system-x86_64 -machine accel=tcg -cpu qemu64 -m 4096 -drive file=/var/tmp/fedora-40.qcow2,format=qcow2,if=virtio
(3) Inside the VM, log in as root/123456, install fio, and update the kernel:
dnf install fedora-repos-rawhide
dnf install fio
dnf update kernel
reboot
(should upgrade to 6.11.0 and boot into that kernel).
(4) Inside the VM, in one terminal run:
while true; do echo noop > /sys/block/sda/queue/scheduler 2>/dev/null ; done
(5) Inside the VM, in another terminal run fio with the following config or
similar:
[global]
name=fio-rand-write
filename=/root/fio-rand-write
rw=randwrite
bs=4K
numjobs=4
time_based
runtime=1h
group_reporting
[file1]
size=1G
ioengine=libaio
iodepth=64
(6) After a while the fio process ETA will start counting up (since one or more
threads have got stuck and are not making progress). Also logging in is
problematic and many common commands like 'dmesg' or 'ps' hang. I could only
recover by rebooting.
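For convenience, the job from step (5) can be saved to a file and passed to fio. This is only a sketch: the function name and the /root/fio-rand-write.fio path are arbitrary choices, and the job contents are copied unchanged from step (5).

```shell
#!/bin/sh
# Write the fio job from step (5) to the path given as $1.
write_fio_job() {
    cat > "$1" <<'EOF'
[global]
name=fio-rand-write
filename=/root/fio-rand-write
rw=randwrite
bs=4K
numjobs=4
time_based
runtime=1h
group_reporting

[file1]
size=1G
ioengine=libaio
iodepth=64
EOF
}

# Example (inside the VM, in the second terminal):
# write_fio_job /root/fio-rand-write.fio
# fio /root/fio-rand-write.fio
```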
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-17 12:38 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #7 from Richard W.M. Jones (rjones@redhat.com) ---
> virt-builder fedora-40 --size=10G --root-password=password:123456
Should be:
virt-builder fedora-40 --size=10G --root-password=password:123456 --format=qcow2
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-17 12:41 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #8 from Richard W.M. Jones (rjones@redhat.com) ---
> while true; do echo noop > /sys/block/sda/queue/scheduler 2>/dev/null ; done
Should be:
while true; do echo noop > /sys/block/vda/queue/scheduler 2>/dev/null ; done
as the guest is using virtio. (Copied and pasted the instructions from my host
test, but I did use the correct command in the VM test.)
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-17 13:58 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #9 from Richard W.M. Jones (rjones@redhat.com) ---
Also this does *not* reproduce with a 6.8.5 kernel on the same VM, so I really
do think this is a regression.
This looks like a very tedious bisect but I'll have a go if I have time later.
* [Bug 219166] ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler
From: bugzilla-daemon @ 2024-08-20 15:33 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #10 from Richard W.M. Jones (rjones@redhat.com) ---
Just to close this one out ...
I couldn't reproduce this issue when I compiled the kernel myself, even though
I was using the exact same .config as Fedora uses, the same tag, building it on
Fedora 40, and Fedora itself does not have any significant downstream patches.
There were a few differences, for example I'm probably using a slightly
different version of gcc/binutils than the Fedora kernel builders.
So being unable to reproduce it in a self-compiled kernel, I cannot bisect it.
We have worked around the problem, so that's basically as far as I want to take
this bug. Feel free to close it if you want.
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-08-20 15:33 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
Richard W.M. Jones (rjones@redhat.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|ext4 hang when setting echo |occasional block layer hang
|noop > |when setting 'echo noop >
|/sys/block/sda/queue/schedu |/sys/block/sda/queue/schedu
|ler |ler'
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-05 1:04 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
Lei Ming (tom.leiming@gmail.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tom.leiming@gmail.com
--- Comment #11 from Lei Ming (tom.leiming@gmail.com) ---
Hi Richard,
Can you collect the blk-mq debugfs log for this virtio-blk device?
(cd /sys/kernel/debug/block/vda && find . -type f -exec grep -aH . {} \;)
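If it helps, the same command can be wrapped so the output is saved to a file for attaching to the bug. This is a sketch only: the function name and the output path are made up, and it assumes debugfs is mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Print "path:content" for every readable line of every file below
# the directory $1, using the same find/grep command as above.
dump_blk_debugfs() {
    (cd "$1" && find . -type f -exec grep -aH . {} \;)
}

# Example (as root; /tmp/vda-blkmq.txt is an arbitrary output path):
# dump_blk_debugfs /sys/kernel/debug/block/vda > /tmp/vda-blkmq.txt
```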
There is a known I/O hang risk in the MQ case:
https://lore.kernel.org/linux-block/20240903081653.65613-1-songmuchun@bytedance.com/T/#m0212e28431d08e00a3de132ec40b2b907bb07177
If you or anyone can reproduce the issue, please test the above patches.
Thanks,
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-05 7:25 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #12 from Richard W.M. Jones (rjones@redhat.com) ---
Is it possible to get this from the 'crash' utility (i.e. from kernel structs)? The
VM hangs fairly hard when this problem is hit.
I can try the kernel patches though, will see if I can do that today.
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-05 9:32 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #13 from Richard W.M. Jones (rjones@redhat.com) ---
To document for myself and others what I did to reproduce the bug and get the
kernel stack trace ...
(1) libguestfs from git with this patch reverted:
https://github.com/libguestfs/libguestfs/commit/b2d682a4730ead8b4ae07e5aaf6fa230c5eec305
(2) Run guestfish in a loop until it hangs:
$ while LIBGUESTFS_BACKEND_SETTINGS=force_tcg ./run guestfish -a /dev/null run -vx >& /tmp/log ; do echo -n . ; done
(3) Looking at /tmp/log we can see it hung just after trying to set the noop
scheduler:
$ tail -5 /tmp/log
+ echo 300
+ for f in /sys/block/sd*/device/timeout
+ echo 300
+ for f in /sys/block/{h,s,ub,v}d*/queue/scheduler
+ echo noop
(4) Check the log for the kernel version, install the corresponding kernel
debuginfo.
(5) Get virsh to produce a core dump of the VM:
$ virsh list
Id Name State
--------------------------------------------
1950 guestfs-lsdbxy71u4jg1w6x running
$ virsh dump 1950 /var/tmp/core --memory-only
Domain '1950' dumped to /var/tmp/core
(6) Open in 'crash':
$ crash /usr/lib/debug/lib/modules/6.11.0-0.rc5.20240830git20371ba12063.47.fc42.x86_64/vmlinux /var/tmp/core
(7) List processes and find the one which hung:
crash> ps
...
230 73 0 ffffa01f83c58000 UN 0.3 11608 3340 modprobe
(8) Get stack trace from the hung process:
crash> set 230
PID: 230
COMMAND: "modprobe"
TASK: ffffa01f83c58000 [THREAD_INFO: ffffa01f83c58000]
CPU: 0
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 230 TASK: ffffa01f83c58000 CPU: 0 COMMAND: "modprobe"
#0 [ffffc1db0030f840] __schedule at ffffffff921906d0
#1 [ffffc1db0030f8f8] schedule at ffffffff92191a27
#2 [ffffc1db0030f908] __bio_queue_enter at ffffffff917e17dc
#3 [ffffc1db0030f968] blk_mq_submit_bio at ffffffff917f3b4c
#4 [ffffc1db0030f9f0] __submit_bio at ffffffff917e23fc
#5 [ffffc1db0030fa58] submit_bio_noacct_nocheck at ffffffff917e28e6
#6 [ffffc1db0030fac0] ext4_mpage_readpages at ffffffff915cef7c
#7 [ffffc1db0030fbe0] read_pages at ffffffff91381cda
#8 [ffffc1db0030fc40] page_cache_ra_unbounded at ffffffff91381fb5
#9 [ffffc1db0030fca8] filemap_fault at ffffffff91376175
#10 [ffffc1db0030fd48] __do_fault at ffffffff913d1755
#11 [ffffc1db0030fd70] do_fault at ffffffff913d22e5
#12 [ffffc1db0030fda0] __handle_mm_fault at ffffffff913d8b2b
#13 [ffffc1db0030fe88] handle_mm_fault at ffffffff913d9472
#14 [ffffc1db0030fec8] do_user_addr_fault at ffffffff910b34ea
#15 [ffffc1db0030ff28] exc_page_fault at ffffffff92187e4e
#16 [ffffc1db0030ff50] asm_exc_page_fault at ffffffff922012a6
RIP: 000055bb085508d8 RSP: 00007ffc3e731900 RFLAGS: 00010206
RAX: 00000000000becd6 RBX: 00007f39925d1cd6 RCX: 000055bb0856592a
RDX: 00007ffc3e733c70 RSI: 00000000200becd6 RDI: 000055bb1a712970
RBP: 00007ffc3e731b60 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00000000200becd6
R13: 000055bb1a712970 R14: 00007ffc3e733c70 R15: 000055bb0856a708
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-06 19:46 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #14 from Richard W.M. Jones (rjones@redhat.com) ---
I think I have bisected this to:
commit af2814149883e2c1851866ea2afcd8eadc040f79
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Jun 17 08:04:38 2024 +0200
block: freeze the queue in queue_attr_store
queue_attr_store updates attributes used to control generating I/O, and
can cause malformed bios if changed with I/O in flight. Freeze the queue
in common code instead of adding it to almost every attribute.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240617060532.127975-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-06 20:28 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #15 from Richard W.M. Jones (rjones@redhat.com) ---
Created attachment 306825
--> https://bugzilla.kernel.org/attachment.cgi?id=306825&action=edit
'foreach bt' in crash utility
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-06 20:32 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #16 from Richard W.M. Jones (rjones@redhat.com) ---
Reverting commit af2814149883e2c1851866ea2afcd8eadc040f79 on top of current git
head (788220eee30d6) either fixes or hides the problem enough to fix my
reproducer.
The revert is not clean, but only 2 hunks fail and they were trivial to fix up.
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-07 7:49 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
--- Comment #17 from Richard W.M. Jones (rjones@redhat.com) ---
http://oirase.annexia.org/tmp/kbug-219166/
vmlinux and memory core dump captured when the bug happened. You
can open this using the 'crash' utility. The corresponding kernel is:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2538598
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-07 11:09 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
Richard W.M. Jones (rjones@redhat.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|ext4 |Block Layer
Product|File System |IO/Storage
--- Comment #18 from Richard W.M. Jones (rjones@redhat.com) ---
Upstream discussion:
https://lore.kernel.org/linux-block/20240907073522.GW1450@redhat.com/T/
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-07 11:10 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
Richard W.M. Jones (rjones@redhat.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
Kernel Version| |6.11.0
* [Bug 219166] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
From: bugzilla-daemon @ 2024-09-18 8:36 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=219166
Richard W.M. Jones (rjones@redhat.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
URL| |https://git.kernel.org/pub/
| |scm/linux/kernel/git/torval
| |ds/linux.git/commit/?id=734
| |e1a8603128ac31526c477a39456
| |be5f4092b6
Resolution|--- |CODE_FIX
--- Comment #19 from Richard W.M. Jones (rjones@redhat.com) ---
Closing as fixed in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=734e1a8603128ac31526c477a39456be5f4092b6