* [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
@ 2022-04-22 5:42 Matthew Ruffell
2022-04-22 15:23 ` Josef Bacik
0 siblings, 1 reply; 10+ messages in thread
From: Matthew Ruffell @ 2022-04-22 5:42 UTC (permalink / raw)
To: josef; +Cc: axboe, linux-block, nbd, linux-kernel
Dear maintainers of the nbd subsystem,
A user has come across an issue which causes the nbd module to hang after a
disconnect where a write has been made to a qemu qcow image file, with qemu-nbd
being the server.
The issue is easily reproducible with the following:
Ubuntu 20.04, 22.04 or Fedora 36
Linux 5.18-rc2 or earlier (have tested 5.18-rc2, 5.15, 5.4, 4.15)
QEMU 6.2 or earlier
Instructions to reproduce:
==========================
$ sudo apt install qemu-nbd
$ cat << 'EOF' > reproducer.sh
#!/bin/bash
sudo modprobe nbd
while :
do
qemu-img create -f qcow2 foo.img 500M
sudo qemu-nbd --disconnect /dev/nbd15 || true
sudo qemu-nbd --connect=/dev/nbd15 --cache=writeback --format=qcow2 foo.img
sudo mkfs.ext4 -L root -O "^64bit" -E nodiscard /dev/nbd15
sudo qemu-nbd --disconnect /dev/nbd15
done
EOF
$ chmod +x reproducer.sh
$ ./reproducer.sh
On Ubuntu, the terminal will pause within a minute or two, and dmesg will report
a lot of I/O errors, followed by hung task timeouts. On Fedora, it takes a
little longer, but it will break in the same way within 10 minutes.
An example kernel log is below:
https://paste.ubuntu.com/p/5ZjC5b8MR7/
Debugging done:
===============
Looking at syslog, it seems systemd-udevd gets stuck, and enters D state.
systemd-udevd[419]: nbd15: Worker [2004] processing SEQNUM=5661 is taking a long time
$ ps aux
...
419 1194 root D 0.1 systemd-udevd -
I rebooted, and disabled systemd-udevd and its sockets, with:
$ sudo systemctl stop systemd-udevd.service
$ sudo systemctl stop systemd-udevd-control.socket
$ sudo systemctl stop systemd-udevd-kernel.socket
When running the reproducer again, everything works fine, and the nbd subsystem
does not hang.
Turning to the udev rules, I compared those shipped in Ubuntu 18.04, where the
issue does not occur, with those in 20.04, where it does, and came across:
/usr/lib/udev/rules.d/60-block.rules
In 18.04:
# watch metadata changes, caused by tools closing the device node which was opened for writing
ACTION!="remove", SUBSYSTEM=="block", KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*|mmcblk*", OPTIONS+="watch"
In 20.04:
# watch metadata changes, caused by tools closing the device node which was opened for writing
ACTION!="remove", SUBSYSTEM=="block", \
KERNEL=="loop*|mmcblk*[0-9]|msblk*[0-9]|mspblk*[0-9]|nvme*|sd*|vd*|xvd*|bcache*|cciss*|dasd*|ubd*|ubi*|scm*|pmem*|nbd*|zd*", \
OPTIONS+="watch"
The difference is that OPTIONS+="watch" is now applied to nbd* devices for any
event other than remove.
When I deleted nbd* from the rule and ran the reproducer again, everything
worked smoothly.
Looking at the manpage for udev:
> watch
> Watch the device node with inotify; when the node is closed after being
> opened for writing, a change uevent is synthesized.
>
> nowatch
> Disable the watching of a device node with inotify.
It appears that when systemd-udevd uses inotify to watch the device node for
metadata updates, the handling of the resulting change uevent blocks a
subsequent disconnect request, causing it to fail with:
block nbd15: Send disconnect failed -32
After which we start seeing stuck requests:
block nbd15: Possible stuck request 000000007fcf62ba: control (read@523915264,24576B). Runtime 30 seconds
All userspace calls to the nbd module hang, and the system has to be rebooted.
Workaround:
===========
We can work around the issue by adding a higher priority udev rule that does
not watch nbd* devices.
$ cat << EOF | sudo tee /etc/udev/rules.d/97-nbd-device.rules
# Disable inotify watching of change events for NBD devices
ACTION=="add|change", KERNEL=="nbd*", OPTIONS:="nowatch"
EOF
$ sudo udevadm control --reload-rules
$ sudo udevadm trigger
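The workaround relies on udev applying rules files in lexical filename order
and on ":=" making an assignment final. A toy model (my own illustration, not
udev's actual parser) of why the 97- rule wins over 60-block.rules:

```python
# Toy model of udev rule-option resolution: files are applied in lexical
# order, "+=" appends to a list option, and ":=" assigns a final value that
# later rules may no longer change.
def resolve_options(rule_files):
    options = []
    final = False
    for _name, op, value in sorted(rule_files):
        if final:
            continue  # a previous ":=" made the option final
        if op == "+=":
            options.append(value)
        elif op == ":=":
            options = [value]
            final = True
    return options

rules = [
    ("60-block.rules", "+=", "watch"),         # stock Ubuntu 20.04 rule
    ("97-nbd-device.rules", ":=", "nowatch"),  # the workaround rule
]
print(resolve_options(rules))  # ['nowatch']
```

Since 97-nbd-device.rules sorts after 60-block.rules, its final assignment
replaces the earlier watch option, so the nbd* nodes are never watched.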
Help on debugging the problem:
==============================
I need some help debugging the problem, as I am not quite sure how to trace
the interactions between inotify and nbd.
I am happy to help debug the issue, or try any patches that gather debugging
data or any potential fixes.
Thanks,
Matthew
^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: Josef Bacik @ 2022-04-22 15:23 UTC (permalink / raw)
To: Matthew Ruffell; +Cc: Jens Axboe, linux-block, nbd, Linux Kernel

On Fri, Apr 22, 2022 at 1:42 AM Matthew Ruffell
<matthew.ruffell@canonical.com> wrote:
>
> Dear maintainers of the nbd subsystem,
>
> A user has come across an issue which causes the nbd module to hang after a
> disconnect where a write has been made to a qemu qcow image file, with qemu-nbd
> being the server.
>

Ok, there are two problems here, but I want to make sure I have the right
fix for the hang first. Can you apply this patch

https://paste.centos.org/view/b1a2d01a

and make sure the hang goes away? Once that part is fixed I'll fix
the IO errors; this is just us racing with systemd while we tear down
the device, and then we're triggering a partition read while the device
is going down and it's complaining loudly. Before, we would
set_capacity to 0 whenever we disconnected, but that causes problems
with file systems that may still have the device open. However, now we
only do this if the server does the CLEAR_SOCK ioctl, which clearly
can race with systemd poking the device, so I need to make it
set_capacity(0) when the last opener closes the device to prevent this
style of race.

Let me know if that patch fixes the hang, and then I'll work up
something for the capacity problem.

Thanks,

Josef
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: Matthew Ruffell @ 2022-04-25 21:47 UTC (permalink / raw)
To: Josef Bacik; +Cc: Jens Axboe, linux-block, nbd, Linux Kernel

Hi Josef,

The pastebin has expired the link, and I can't access your patch.
It seems to default to deletion after 1 day.

Could you please create a new paste or send the patch inline in this
email thread?

I am more than happy to try the patch out.

Thank you for your analysis.
Matthew

On Sat, Apr 23, 2022 at 3:24 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Ok there's two problems here, but I want to make sure I have the right
> fix for the hang first. Can you apply this patch
>
> https://paste.centos.org/view/b1a2d01a
>
> and make sure the hang goes away?
> [...]
> Let me know if that patch fixes the hang, and then I'll work up
> something for the capacity problem.
>
> Thanks,
>
> Josef
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: Matthew Ruffell @ 2022-05-13  2:56 UTC (permalink / raw)
To: Josef Bacik; +Cc: Jens Axboe, linux-block, nbd, Linux Kernel, yukuai3

Hi Josef,

Just a friendly ping, I am more than happy to test a patch, if you send it
inline in the email, since the pastebin you used expired after 1 day, and I
couldn't access it.

I came across and tested Yu Kuai's patches [1][2] which are for the same issue,
and they indeed fix the hang. Thank you Yu.

[1] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    https://lists.debian.org/nbd/2022/04/msg00212.html

[2] nbd: fix io hung while disconnecting device
    https://lists.debian.org/nbd/2022/04/msg00207.html

I am also happy to test any patches to fix the I/O errors.

Thanks,
Matthew

On Tue, Apr 26, 2022 at 9:47 AM Matthew Ruffell
<matthew.ruffell@canonical.com> wrote:
>
> Hi Josef,
>
> The pastebin has expired the link, and I can't access your patch.
> Seems to default to 1 day deletion.
>
> Could you please create a new paste or send the patch inline in this
> email thread?
>
> [...]
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: yukuai (C) @ 2022-05-13  3:20 UTC (permalink / raw)
To: Matthew Ruffell, Josef Bacik; +Cc: Jens Axboe, linux-block, nbd, Linux Kernel

On 2022/05/13 10:56, Matthew Ruffell wrote:
> Hi Josef,
>
> Just a friendly ping, I am more than happy to test a patch, if you send it
> inline in the email, since the pastebin you used expired after 1 day, and I
> couldn't access it.
>
> I came across and tested Yu Kuai's patches [1][2] which are for the same issue,
> and they indeed fix the hang. Thank you Yu.

Hi, Matthew

Thanks for your test.

> [1] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
> https://lists.debian.org/nbd/2022/04/msg00212.html
>
> [2] nbd: fix io hung while disconnecting device
> https://lists.debian.org/nbd/2022/04/msg00207.html
>
> I am also happy to test any patches to fix the I/O errors.

Sorry that I missed this thread.

IMO, if inflight requests are cleared by the ioctl NBD_CLEAR_SOCK after my
patch [2] (or by other callers of nbd_clear_que()), such io will return as
an error. Thus I don't think such io errors need to be fixed.

Josef, do you have other suggestions?

Thanks,
Kuai
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: Josef Bacik @ 2022-05-13 13:13 UTC (permalink / raw)
To: Matthew Ruffell; +Cc: Jens Axboe, linux-block, nbd, Linux Kernel, yukuai3

On Fri, May 13, 2022 at 02:56:18PM +1200, Matthew Ruffell wrote:
> Hi Josef,
>
> Just a friendly ping, I am more than happy to test a patch, if you send it
> inline in the email, since the pastebin you used expired after 1 day, and I
> couldn't access it.
>
> I came across and tested Yu Kuai's patches [1][2] which are for the same issue,
> and they indeed fix the hang. Thank you Yu.
>
> [1] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
> https://lists.debian.org/nbd/2022/04/msg00212.html
>
> [2] nbd: fix io hung while disconnecting device
> https://lists.debian.org/nbd/2022/04/msg00207.html
>
> I am also happy to test any patches to fix the I/O errors.
>

Sorry, you caught me on vacation before and I forgot to reply. Here's part one
of the patch I wanted you to try which fixes the io hung part. Thanks,

Josef

From 0a6123520380cb84de8ccefcccc5f112bce5efb6 Mon Sep 17 00:00:00 2001
Message-Id: <0a6123520380cb84de8ccefcccc5f112bce5efb6.1652447517.git.josef@toxicpanda.com>
From: Josef Bacik <josef@toxicpanda.com>
Date: Sat, 23 Apr 2022 23:51:23 -0400
Subject: [PATCH] timeout thing

---
 drivers/block/nbd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 526389351784..ab365c0e9c04 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1314,7 +1314,10 @@ static void nbd_config_put(struct nbd_device *nbd)
 	kfree(nbd->config);
 	nbd->config = NULL;
 
-	nbd->tag_set.timeout = 0;
+	/* Reset our timeout to something sane. */
+	nbd->tag_set.timeout = 30 * HZ;
+	blk_queue_rq_timeout(nbd->disk->queue, 30 * HZ);
+
 	nbd->disk->queue->limits.discard_granularity = 0;
 	nbd->disk->queue->limits.discard_alignment = 0;
 	blk_queue_max_discard_sectors(nbd->disk->queue, 0);
-- 
2.26.3
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: yukuai (C) @ 2022-05-14  3:39 UTC (permalink / raw)
To: Josef Bacik, Matthew Ruffell; +Cc: Jens Axboe, linux-block, nbd, Linux Kernel

On 2022/05/13 21:13, Josef Bacik wrote:
> Sorry, you caught me on vacation before and I forgot to reply. Here's part one
> of the patch I wanted you to try which fixes the io hung part. Thanks,
>
> Josef
>
> [...]
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 526389351784..ab365c0e9c04 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1314,7 +1314,10 @@ static void nbd_config_put(struct nbd_device *nbd)
>  	kfree(nbd->config);
>  	nbd->config = NULL;
>  
> -	nbd->tag_set.timeout = 0;
> +	/* Reset our timeout to something sane. */
> +	nbd->tag_set.timeout = 30 * HZ;
> +	blk_queue_rq_timeout(nbd->disk->queue, 30 * HZ);
> +
>  	nbd->disk->queue->limits.discard_granularity = 0;
>  	nbd->disk->queue->limits.discard_alignment = 0;
>  	blk_queue_max_discard_sectors(nbd->disk->queue, 0);
>

Hi, Josef

This seems to try to fix the same problem that I described here:

nbd: fix io hung while disconnecting device
https://lists.debian.org/nbd/2022/04/msg00207.html

There is still some io that is stuck, which means the device is
probably still opened. Thus nbd_config_put() can't reach here.
I'm afraid this patch can't fix the io hang.

Matthew, can you run a test with this patch together with my patch below
to confirm my thought?

nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
https://lists.debian.org/nbd/2022/04/msg00212.html

Thanks,
Kuai
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: Matthew Ruffell @ 2022-05-16  5:35 UTC (permalink / raw)
To: yukuai (C); +Cc: Josef Bacik, Jens Axboe, linux-block, nbd, Linux Kernel

Hi Josef, Kuai,

Josef, thank you for attaching your patch. No worries about being on vacation,
I hope you enjoyed your time off.

Josef, I built your patch on top of 5.18-rc6 with no other patches applied,
and ran the testcase in my original message. After 3 loops, a hang occurred,
and we see the usual -32 error:

May 16 03:38:35 focal-nbd kernel: block nbd15: NBD_DISCONNECT
May 16 03:38:35 focal-nbd kernel: block nbd15: Send disconnect failed -32

The hang lasted 30 seconds, no doubt caused by the "30 * HZ" timeout in your
patch, and things started moving forward again:

May 16 03:39:05 focal-nbd kernel: block nbd15: Connection timed out, retrying (0/1 alive)
May 16 03:39:05 focal-nbd kernel: block nbd15: Connection timed out, retrying (0/1 alive)
May 16 03:39:05 focal-nbd kernel: blk_print_req_error: 128 callbacks suppressed
May 16 03:39:05 focal-nbd kernel: I/O error, dev nbd15, sector 1023488 op 0x0:(READ) flags 0x80700 phys_seg 14 prio class 0
May 16 03:39:05 focal-nbd kernel: I/O error, dev nbd15, sector 1023608 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 0
May 16 03:39:05 focal-nbd kernel: block nbd15: Device being setup by another task

Note the timestamp increment of 30s. There were a whole host of I/O errors,
and after a few more loops, the hang occurred again, again lasting for 30
seconds, and then doing a few more loops before getting stuck again.

Pastebin of journalctl: https://paste.ubuntu.com/p/Cx6MBC8Vgj/

Unfortunately, your patch doesn't quite solve the issue.

Kuai, I tested your suspicions by building Josef's patch on top of 5.18-rc6
with your below patch applied:

nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
https://lists.debian.org/nbd/2022/04/msg00212.html

The behaviour was different this time from Josef's patch alone. On the very
second iteration of the loop, I got a bunch of I/O errors, and the nbd
subsystem hung, and did not recover. I started getting stuck request messages,
and the usual hung task timeout oops messages.

Pastebin of journalctl here: https://paste.ubuntu.com/p/C9rjckrWtp/

I went back and did some more testing of Kuai's two commits:

nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
https://lists.debian.org/nbd/2022/04/msg00212.html

nbd: fix io hung while disconnecting device
https://lists.debian.org/nbd/2022/04/msg00207.html

I left the testcase running for about 20 minutes, and it never hung. It did
get a bit racy from time to time trying to get a write lock for the qcow
image, where the disconnect completed after the call to mkfs.ext4 started, but
simply saying "y" let the loop run for another 5 minutes before the race
occurred again.

Formatting 'foo.img', fmt=qcow2 size=524288000 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: foo.img: Failed to get "write" lock
Is another process using the image [foo.img]?
/dev/nbd15 disconnected
mke2fs 1.45.5 (07-Jan-2020)
/dev/nbd15 contains a ext4 file system labelled 'root'
	created on Mon May 16 05:23:01 2022
Proceed anyway? (y,N)

Through my whole time testing Kuai's fixes, I never saw a hang. The behaviour
seen is the same as the workaround of preventing systemd from watching nbd
devices with inotify. I think we should go with Kuai's patches.

So for Kuai's two patches:

Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>

Thanks,
Matthew

On Sat, May 14, 2022 at 3:39 PM yukuai (C) <yukuai3@huawei.com> wrote:
>
> Hi, Josef
>
> This seems to try to fix the same problem that I described here:
>
> nbd: fix io hung while disconnecting device
> https://lists.debian.org/nbd/2022/04/msg00207.html
>
> There is still some io that is stuck, which means the device is
> probably still opened. Thus nbd_config_put() can't reach here.
> I'm afraid this patch can't fix the io hang.
>
> Matthew, can you run a test with this patch together with my patch below
> to confirm my thought?
>
> nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
> https://lists.debian.org/nbd/2022/04/msg00212.html
>
> Thanks,
> Kuai
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: Josef Bacik @ 2022-05-16 12:17 UTC (permalink / raw)
To: yukuai (C); +Cc: Matthew Ruffell, Jens Axboe, linux-block, nbd, Linux Kernel

On Sat, May 14, 2022 at 11:39:25AM +0800, yukuai (C) wrote:
> Hi, Josef
>
> This seems to try to fix the same problem that I described here:
>
> nbd: fix io hung while disconnecting device
> https://lists.debian.org/nbd/2022/04/msg00207.html
>
> There is still some io that is stuck, which means the device is
> probably still opened. Thus nbd_config_put() can't reach here.
> I'm afraid this patch can't fix the io hang.
>
> Matthew, can you run a test with this patch together with my patch below
> to confirm my thought?
>
> nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
> https://lists.debian.org/nbd/2022/04/msg00212.html
>

Re-submit this one, but fix it so we just test the bit to see if we need to skip
it, and change it so we only CLEAR when we're sure we're going to complete the
request. Thanks,

Josef
* Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes
From: yukuai (C) @ 2022-05-16 12:53 UTC (permalink / raw)
To: Josef Bacik; +Cc: Matthew Ruffell, Jens Axboe, linux-block, nbd, Linux Kernel

On 2022/05/16 20:17, Josef Bacik wrote:
> Re-submit this one, but fix it so we just test the bit to see if we need to skip
> it, and change it so we only CLEAR when we're sure we're going to complete the
> request. Thanks,
>
> Josef

Ok, thanks for your advice. I'll send a new version.

BTW, do you have any suggestions on the other patches of the patchset?
https://lore.kernel.org/all/20220426130746.885140-1-yukuai3@huawei.com/

Thanks,
Kuai
end of thread, other threads: [~2022-05-16 12:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-22  5:42 [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes Matthew Ruffell
2022-04-22 15:23 ` Josef Bacik
2022-04-25 21:47   ` Matthew Ruffell
2022-05-13  2:56     ` Matthew Ruffell
2022-05-13  3:20       ` yukuai (C)
2022-05-13 13:13       ` Josef Bacik
2022-05-14  3:39         ` yukuai (C)
2022-05-16  5:35           ` Matthew Ruffell
2022-05-16 12:17           ` Josef Bacik
2022-05-16 12:53             ` yukuai (C)