* [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
[not found] <20100902032246.GA31484@redhat.com>
@ 2010-09-09 15:26 ` Mike Snitzer
2010-09-09 15:44 ` Ryan Harper
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-09-09 15:26 UTC (permalink / raw)
To: Tejun Heo
Cc: Mikulas Patocka, dm-devel, Vivek Goyal, ryanh, john.cooper, rusty,
hch, kvm
[-- Attachment #1: Type: text/plain, Size: 4471 bytes --]
On Wed, Sep 01 2010 at 11:22pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Wed, Sep 01 2010 at 2:59pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
> > My hope was that the request-based deadlock I'm seeing would disappear
> > if that relaxed ordering patch wasn't applied. Unfortunately, I still
> > see the hang.
>
> Turns out I can reproduce the hang on a stock 2.6.36-rc3 (without _any_
> FLUSH+FUA patches)!
>
> I'll try to pin-point the root cause but I think my test is somehow
> exposing a bug in my virt setup.
[my virt setup == single kvm guest (RHEL6) with F13 host]
My gut turned out to be correct. I finally tracked down the regression
point to the following commit (cc'ing appropriate people):
commit a5eb9e4ff18a33e43557d44b205f953b0c1efade
Author: Ryan Harper <ryanh@us.ibm.com>
Date: Wed Jun 23 22:19:57 2010 -0500
virtio_blk: Add 'serial' attribute to virtio-blk devices (v2)
Create a new attribute for virtio-blk devices that will fetch the serial number
of the block device. This attribute can be used by udev to create disk/by-id
symlinks for devices that don't have a UUID (filesystem) associated with them.
ATA_IDENTIFY strings are special in that they can be up to 20 chars long
and aren't required to be nul-terminated. The buffer is also zero-padded
meaning that if the serial is 19 chars or less that we get a nul-terminated
string. When copying this value into a string buffer, we must be careful to
copy up to the nul (if it present) and only 20 if it is longer and not to
attempt to nul terminate; this isn't needed.
Changes since v1:
- Added BUILD_BUG_ON() for PAGE_SIZE check
- Removed min() since BUILD_BUG_ON() handles the check
- Replaced serial_sysfs() by copying id directly to buffer
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
So the first released kernel to have this regression is 2.6.36-rc1.
Some background:
I have been working with Tejun to test the barrier to FLUSH+FUA
conversion patchset. I crafted the attached script to test the DM
changes that are part of the FLUSH+FUA patchset.
Using this script with:
while true ; do ./test_dm_discard_mpath_scsi_debug.sh ; done
I can reliably trigger the following hang, always on the 5th iteration
in my testing, IFF commit a5eb9e4ff18a33e43557d44b205f953b0c1efade is
applied:
INFO: task lvcreate:2484 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
lvcreate D 0000000100064871 4960 2484 2350 0x00000080
ffff88007b87b978 0000000000000046 ffff88007b87b8e8 ffff880000000000
ffff88007b87bfd8 ffff8800724fa400 00000000001d4040 ffff88007b87bfd8
00000000001d4040 00000000001d4040 00000000001d4040 00000000001d4040
Call Trace:
[<ffffffff8136de23>] io_schedule+0x73/0xb5
[<ffffffff811b6882>] get_request_wait+0xf2/0x180
[<ffffffff8105d8da>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811b6deb>] __make_request+0x310/0x434
[<ffffffff811b5442>] generic_make_request+0x2f1/0x36e
[<ffffffff81062f78>] ? cpu_clock+0x43/0x5e
[<ffffffff811b559d>] submit_bio+0xde/0xfb
[<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff81129332>] dio_bio_submit+0x7b/0x9c
[<ffffffff8112939d>] dio_send_cur_page+0x4a/0xb0
[<ffffffff81129f1c>] __blockdev_direct_IO_newtrunc+0x7c5/0x97d
[<ffffffff81127f4f>] blkdev_direct_IO+0x57/0x59
[<ffffffff81127080>] ? blkdev_get_blocks+0x0/0x90
[<ffffffff810c2eee>] generic_file_aio_read+0xed/0x5b4
[<ffffffff810d70d4>] ? might_fault+0x5c/0xac
[<ffffffff810242bd>] ? pvclock_clocksource_read+0x50/0xb9
[<ffffffff81100813>] do_sync_read+0xcb/0x108
[<ffffffff8136e5ad>] ? __mutex_unlock_slowpath+0x119/0x12b
[<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8118cae7>] ? security_file_permission+0x16/0x18
[<ffffffff81100e7a>] vfs_read+0xab/0x108
[<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff81100f97>] sys_read+0x4a/0x6e
[<ffffffff81002bf2>] system_call_fastpath+0x16/0x1b
no locks held by lvcreate/2484.
lvcreate is just the first victim (sometimes it is the vgcreate). But
if the guest is left running other new processes get hung with
comparable traces (w/ get_request_wait). Until eventually the guest is
completely unresponsive.
Mike
[-- Attachment #2: test_dm_discard_mpath_scsi_debug.sh --]
[-- Type: application/x-sh, Size: 907 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 15:26 ` [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm] Mike Snitzer
@ 2010-09-09 15:44 ` Ryan Harper
2010-09-09 15:57 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Ryan Harper @ 2010-09-09 15:44 UTC (permalink / raw)
To: Mike Snitzer
Cc: Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal, ryanh,
john.cooper, rusty, hch, kvm
* Mike Snitzer <snitzer@redhat.com> [2010-09-09 10:29]:
> On Wed, Sep 01 2010 at 11:22pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
> > On Wed, Sep 01 2010 at 2:59pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > > My hope was that the request-based deadlock I'm seeing would disappear
> > > if that relaxed ordering patch wasn't applied. Unfortunately, I still
> > > see the hang.
> >
> > Turns out I can reproduce the hang on a stock 2.6.36-rc3 (without _any_
> > FLUSH+FUA patches)!
> >
> > I'll try to pin-point the root cause but I think my test is somehow
> > exposing a bug in my virt setup.
>
> [my virt setup == single kvm guest (RHEL6) with F13 host]
What's your kvm guest command line? And the guest is using stock RHEL6
kernel? What KVM userspace are you using on the host? What comes with
F13 or some updated version?
>
> My gut turned out to be correct. I finally tracked down the regression
> point to the following commit (cc'ing appropriate people):
>
> commit a5eb9e4ff18a33e43557d44b205f953b0c1efade
> Author: Ryan Harper <ryanh@us.ibm.com>
> Date: Wed Jun 23 22:19:57 2010 -0500
>
> virtio_blk: Add 'serial' attribute to virtio-blk devices (v2)
>
> Create a new attribute for virtio-blk devices that will fetch the serial number
> of the block device. This attribute can be used by udev to create disk/by-id
> symlinks for devices that don't have a UUID (filesystem) associated with them.
>
> ATA_IDENTIFY strings are special in that they can be up to 20 chars long
> and aren't required to be nul-terminated. The buffer is also zero-padded
> meaning that if the serial is 19 chars or less that we get a nul-terminated
> string. When copying this value into a string buffer, we must be careful to
> copy up to the nul (if it present) and only 20 if it is longer and not to
> attempt to nul terminate; this isn't needed.
>
> Changes since v1:
> - Added BUILD_BUG_ON() for PAGE_SIZE check
> - Removed min() since BUILD_BUG_ON() handles the check
> - Replaced serial_sysfs() by copying id directly to buffer
>
> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> Signed-off-by: john cooper <john.cooper@redhat.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> So the first released kernel to have this regression is 2.6.36-rc1.
>
> Some background:
> I have been working with Tejun to test the barrier to FLUSH+FUA
> conversion patchset. I crafted the attached script to test the DM
> changes that are part of the FLUSH+FUA patchset.
>
> Using this script with:
> while true ; do ./test_dm_discard_mpath_scsi_debug.sh ; done
>
> I can reliably trigger the following hang, always on the 5th iteration
> in my testing, IFF commit a5eb9e4ff18a33e43557d44b205f953b0c1efade is
> applied:
>
> INFO: task lvcreate:2484 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> lvcreate D 0000000100064871 4960 2484 2350 0x00000080
> ffff88007b87b978 0000000000000046 ffff88007b87b8e8 ffff880000000000
> ffff88007b87bfd8 ffff8800724fa400 00000000001d4040 ffff88007b87bfd8
> 00000000001d4040 00000000001d4040 00000000001d4040 00000000001d4040
> Call Trace:
> [<ffffffff8136de23>] io_schedule+0x73/0xb5
> [<ffffffff811b6882>] get_request_wait+0xf2/0x180
> [<ffffffff8105d8da>] ? autoremove_wake_function+0x0/0x39
> [<ffffffff811b6deb>] __make_request+0x310/0x434
> [<ffffffff811b5442>] generic_make_request+0x2f1/0x36e
> [<ffffffff81062f78>] ? cpu_clock+0x43/0x5e
> [<ffffffff811b559d>] submit_bio+0xde/0xfb
> [<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff81129332>] dio_bio_submit+0x7b/0x9c
> [<ffffffff8112939d>] dio_send_cur_page+0x4a/0xb0
> [<ffffffff81129f1c>] __blockdev_direct_IO_newtrunc+0x7c5/0x97d
> [<ffffffff81127f4f>] blkdev_direct_IO+0x57/0x59
> [<ffffffff81127080>] ? blkdev_get_blocks+0x0/0x90
> [<ffffffff810c2eee>] generic_file_aio_read+0xed/0x5b4
> [<ffffffff810d70d4>] ? might_fault+0x5c/0xac
> [<ffffffff810242bd>] ? pvclock_clocksource_read+0x50/0xb9
> [<ffffffff81100813>] do_sync_read+0xcb/0x108
> [<ffffffff8136e5ad>] ? __mutex_unlock_slowpath+0x119/0x12b
> [<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
> [<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff8118cae7>] ? security_file_permission+0x16/0x18
> [<ffffffff81100e7a>] vfs_read+0xab/0x108
> [<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
> [<ffffffff81100f97>] sys_read+0x4a/0x6e
> [<ffffffff81002bf2>] system_call_fastpath+0x16/0x1b
> no locks held by lvcreate/2484.
>
>
> lvcreate is just the first victim (sometimes it is the vgcreate). But
> if the guest is left running other new processes get hung with
> comparable traces (w/ get_request_wait). Until eventually the guest is
> completely unresponsive.
>
> Mike
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 15:44 ` Ryan Harper
@ 2010-09-09 15:57 ` Mike Snitzer
2010-09-09 16:03 ` Ryan Harper
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-09-09 15:57 UTC (permalink / raw)
To: Ryan Harper
Cc: Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal, john.cooper,
rusty, hch, kvm
On Thu, Sep 09 2010 at 11:44am -0400,
Ryan Harper <ryanh@us.ibm.com> wrote:
> * Mike Snitzer <snitzer@redhat.com> [2010-09-09 10:29]:
> > On Wed, Sep 01 2010 at 11:22pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > > On Wed, Sep 01 2010 at 2:59pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > >
> > > > My hope was that the request-based deadlock I'm seeing would disappear
> > > > if that relaxed ordering patch wasn't applied. Unfortunately, I still
> > > > see the hang.
> > >
> > > Turns out I can reproduce the hang on a stock 2.6.36-rc3 (without _any_
> > > FLUSH+FUA patches)!
> > >
> > > I'll try to pin-point the root cause but I think my test is somehow
> > > exposing a bug in my virt setup.
> >
> > [my virt setup == single kvm guest (RHEL6) with F13 host]
>
> What's your kvm guest command line?
I assume you mean qemu-kvm commandline:
/usr/bin/qemu-kvm -S -M pc-0.11 -enable-kvm -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name rhel6.x86_64 -uuid 9129e4e4-15d3-00e2-e9de-2c28a29feb52 -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/rhel6.x86_64.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot cd -drive file=/var/lib/libvirt/images/rhel6.x86_64.img,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/var/lib/libvirt/images/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:70:e8:23,bus=pci.0,addr=0x6 -net tap,fd=49,vlan=0,name=hostnet0 -charde
v pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
I'm using virtio-blk w/ cache=none for the root device. virtio-blk
isn't used for any other devices in the guest.
Here is the guest's kernel commandline (not that it is interesting):
ro root=UUID=e0236db2-5a38-4d48-8bf5-55675671dee6 console=ttyS0 rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us rd_plytheme=charge crashkernel=auto
> And the guest is using stock RHEL6 kernel?
No, the guest is using the upstream kernel.org kernel. Hence my report
of an upstream regression. The guest is using RHEL6 userspace (udev in
all its glory, etc).
> What KVM userspace are you using on the host? What comes with
> F13 or some updated version?
Just what comes with F13.
Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 15:57 ` Mike Snitzer
@ 2010-09-09 16:03 ` Ryan Harper
2010-09-09 17:55 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Ryan Harper @ 2010-09-09 16:03 UTC (permalink / raw)
To: Mike Snitzer
Cc: Ryan Harper, Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal,
john.cooper, rusty, hch, kvm
* Mike Snitzer <snitzer@redhat.com> [2010-09-09 10:58]:
> On Thu, Sep 09 2010 at 11:44am -0400,
> Ryan Harper <ryanh@us.ibm.com> wrote:
>
> > * Mike Snitzer <snitzer@redhat.com> [2010-09-09 10:29]:
> > > On Wed, Sep 01 2010 at 11:22pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > >
> > > > On Wed, Sep 01 2010 at 2:59pm -0400,
> > > > Mike Snitzer <snitzer@redhat.com> wrote:
> > > >
> > > > > My hope was that the request-based deadlock I'm seeing would disappear
> > > > > if that relaxed ordering patch wasn't applied. Unfortunately, I still
> > > > > see the hang.
> > > >
> > > > Turns out I can reproduce the hang on a stock 2.6.36-rc3 (without _any_
> > > > FLUSH+FUA patches)!
> > > >
> > > > I'll try to pin-point the root cause but I think my test is somehow
> > > > exposing a bug in my virt setup.
> > >
> > > [my virt setup == single kvm guest (RHEL6) with F13 host]
> >
> > What's your kvm guest command line?
>
> I assume you mean qemu-kvm commandline:
>
> /usr/bin/qemu-kvm -S -M pc-0.11 -enable-kvm -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name rhel6.x86_64 -uuid 9129e4e4-15d3-00e2-e9de-2c28a29feb52 -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/rhel6.x86_64.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot cd -drive file=/var/lib/libvirt/images/rhel6.x86_64.img,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/var/lib/libvirt/images/boot.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:70:e8:23,bus=pci.0,addr=0x6 -net tap,fd=49,vlan=0,name=hostnet0 -char
dev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
>
> I'm using virtio-blk w/ cache=none for the root device. virtio-blk
> isn't used for any other devices in the guest.
And you don't have any other disks in the guest (I see just the root and
the cdrom), the lv stuff is happening against some sort of dummy target?
>
> Here is the guest's kernel commandline (not that it is interesting):
> ro root=UUID=e0236db2-5a38-4d48-8bf5-55675671dee6 console=ttyS0 rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us rd_plytheme=charge crashkernel=auto
>
> > And the guest is using stock RHEL6 kernel?
>
> No, the guest is using the upstream kernel.org kernel. Hence my report
> of an upstream regression. The guest is using RHEL6 userspace (udev in
> all its glory, etc).
>
> > What KVM userspace are you using on the host? What comes with
> > F13 or some updated version?
>
> Just what comes with F13.
>
> Mike
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 16:03 ` Ryan Harper
@ 2010-09-09 17:55 ` Mike Snitzer
2010-09-09 18:35 ` Ryan Harper
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-09-09 17:55 UTC (permalink / raw)
To: Ryan Harper
Cc: Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal, john.cooper,
rusty, hch, kvm
On Thu, Sep 09 2010 at 12:03pm -0400,
Ryan Harper <ryanh@us.ibm.com> wrote:
> * Mike Snitzer <snitzer@redhat.com> [2010-09-09 10:58]:
> > I'm using virtio-blk w/ cache=none for the root device. virtio-blk
> > isn't used for any other devices in the guest.
>
> And you don't have any other disks in the guest (I see just the root and
> the cdrom), the lv stuff is happening against some sort of dummy target?
Correct. I have used variants of the script I provided against both
scsi-debug devices and iscsi devices in the guest. The script I shared
uses multipath on scsi-debug (ram-based) devices.
That script causes udev to run its various callouts via multipath and
LVM (both packages, upstream and RHEL6, now use udev).
I have verified that I no longer get the hang if I switch the root
device from virtio to ide.
Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 17:55 ` Mike Snitzer
@ 2010-09-09 18:35 ` Ryan Harper
2010-09-09 19:15 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Ryan Harper @ 2010-09-09 18:35 UTC (permalink / raw)
To: Mike Snitzer
Cc: Ryan Harper, Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal,
john.cooper, rusty, hch, kvm
* Mike Snitzer <snitzer@redhat.com> [2010-09-09 12:56]:
> On Thu, Sep 09 2010 at 12:03pm -0400,
> Ryan Harper <ryanh@us.ibm.com> wrote:
>
> > * Mike Snitzer <snitzer@redhat.com> [2010-09-09 10:58]:
>
> > > I'm using virtio-blk w/ cache=none for the root device. virtio-blk
> > > isn't used for any other devices in the guest.
> >
> > And you don't have any other disks in the guest (I see just the root and
> > the cdrom), the lv stuff is happening against some sort of dummy target?
>
> Correct. I have used variants of the script I provided against both
> scsi-debug devices and iscsi devices in the guest. The script I shared
> uses multipath on scsi-debug (ram-based) devices.
>
> That script causes udev to run its various callouts via multipath and
> LVM (both packages, upstream and RHEL6, now use udev).
>
> I have verified that I no longer get the hang if I switch the root
> device from virtio to ide.
And in the failing case, do you see:
/sys/block/vda/serial
attribute in sysfs?
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 18:35 ` Ryan Harper
@ 2010-09-09 19:15 ` Mike Snitzer
2010-09-09 19:43 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-09-09 19:15 UTC (permalink / raw)
To: Ryan Harper
Cc: Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal, john.cooper,
rusty, hch, kvm
On Thu, Sep 09 2010 at 2:35pm -0400,
Ryan Harper <ryanh@us.ibm.com> wrote:
> * Mike Snitzer <snitzer@redhat.com> [2010-09-09 12:56]:
> > I have verified that I no longer get the hang if I switch the root
> > device from virtio to ide.
>
> And in the failing case, do you see:
>
> /sys/block/vda/serial
>
> attribute in sysfs?
[root@rhel6 ~]# ls -al /sys/block/vda/serial
-r--r--r-- 1 root root 4096 Sep 9 15:07 /sys/block/vda/serial
[root@rhel6 ~]# cat /sys/block/vda/serial
[root@rhel6 ~]#
If I try to access 'serial' once the reproducer script has hung the cat
also hangs:
cat D 00000000fffe5d48 5664 2386 2049 0x00000080
ffff880075129ce8 0000000000000046 ffff880075129c88 ffff880000000000
ffff880075129fd8 ffff8800758aa400 00000000001d4040 ffff880075129fd8
00000000001d4040 00000000001d4040 00000000001d4040 00000000001d4040
Call Trace:
[<ffffffff8136de23>] io_schedule+0x73/0xb5
[<ffffffff811b6882>] get_request_wait+0xf2/0x180
[<ffffffff8105d8da>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811b6951>] blk_get_request+0x41/0x71
[<ffffffff811b69ad>] blk_make_request+0x2c/0x8b
[<ffffffffa0016073>] virtblk_get_id+0x57/0x93 [virtio_blk]
[<ffffffffa00160d0>] virtblk_serial_show+0x21/0x4d [virtio_blk]
[<ffffffff81265f7c>] dev_attr_show+0x27/0x4e
[<ffffffff81157f0b>] ? sysfs_read_file+0x94/0x17f
[<ffffffff810c6d6b>] ? __get_free_pages+0x18/0x55
[<ffffffff81157f34>] sysfs_read_file+0xbd/0x17f
[<ffffffff81100e7a>] vfs_read+0xab/0x108
[<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff81100f97>] sys_read+0x4a/0x6e
[<ffffffff81002bf2>] system_call_fastpath+0x16/0x1b
Taking a step back; why did you make the inability to add the 'serial'
attribute such a hard failure?
Strikes me as worthy of a non-fatal warning at most. The serial
attribute is added fine in this instance but I'm just curious.
Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 19:15 ` Mike Snitzer
@ 2010-09-09 19:43 ` Mike Snitzer
2010-09-09 20:14 ` Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-09-09 19:43 UTC (permalink / raw)
To: Ryan Harper
Cc: Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal, john.cooper,
rusty, hch, kvm
On Thu, Sep 09 2010 at 3:15pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Thu, Sep 09 2010 at 2:35pm -0400,
> Ryan Harper <ryanh@us.ibm.com> wrote:
>
> > * Mike Snitzer <snitzer@redhat.com> [2010-09-09 12:56]:
>
> > > I have verified that I no longer get the hang if I switch the root
> > > device from virtio to ide.
> >
> > And in the failing case, do you see:
> >
> > /sys/block/vda/serial
> >
> > attribute in sysfs?
Interestingly, just this loop:
while true ; do cat /sys/block/vda/serial && date && sleep 1 ; done
Thu Sep 9 15:29:30 EDT 2010
...
Thu Sep 9 15:31:19 EDT 2010
caused the following hang:
INFO: task cat:1825 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cat D 000000010000e784 5664 1825 1467 0x00000080
ffff88007bbcdce8 0000000000000046 ffff88007bbcdc88 ffff880000000000
ffff88007bbcdfd8 ffff88007b42a400 00000000001d4040 ffff88007bbcdfd8
00000000001d4040 00000000001d4040 00000000001d4040 00000000001d4040
Call Trace:
[<ffffffff8136de23>] io_schedule+0x73/0xb5
[<ffffffff811b6882>] get_request_wait+0xf2/0x180
[<ffffffff8105d8da>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811b6951>] blk_get_request+0x41/0x71
[<ffffffff811b69ad>] blk_make_request+0x2c/0x8b
[<ffffffffa0016073>] virtblk_get_id+0x57/0x93 [virtio_blk]
[<ffffffffa00160d0>] virtblk_serial_show+0x21/0x4d [virtio_blk]
[<ffffffff81265f7c>] dev_attr_show+0x27/0x4e
[<ffffffff81157f0b>] ? sysfs_read_file+0x94/0x17f
[<ffffffff810c6d6b>] ? __get_free_pages+0x18/0x55
[<ffffffff81157f34>] sysfs_read_file+0xbd/0x17f
[<ffffffff81100e7a>] vfs_read+0xab/0x108
[<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff81100f97>] sys_read+0x4a/0x6e
[<ffffffff81002bf2>] system_call_fastpath+0x16/0x1b
2 locks held by cat/1825:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81157eaf>] sysfs_read_file+0x38/0x17f
#1: (s_active#14){.+.+.+}, at: [<ffffffff81157f0b>] sysfs_read_file+0x94/0x17f
So it seems like the virtio requests aren't being properly cleaned up?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 19:43 ` Mike Snitzer
@ 2010-09-09 20:14 ` Mike Snitzer
2010-09-09 20:30 ` Ryan Harper
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2010-09-09 20:14 UTC (permalink / raw)
To: Ryan Harper
Cc: Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal, john.cooper,
rusty, hch, kvm
[-- Attachment #1: Type: text/plain, Size: 1122 bytes --]
On Thu, Sep 09 2010 at 3:43pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> Interestingly, just this loop:
>
> while true ; do cat /sys/block/vda/serial && date && sleep 1 ; done
> Thu Sep 9 15:29:30 EDT 2010
> ...
> Thu Sep 9 15:31:19 EDT 2010
>
> caused the following hang:
...
> So it seems like the virtio requests aren't being properly cleaned up?
Yeap, here is the result with the attached debug patch that Vivek wrote
last week to help chase this issue (which adds 'nr_requests_used'). We
thought the mpath device might be leaking requests; concern for other
devices wasn't on our radar:
# cat /sys/block/vda/queue/nr_requests
128
# while true ; do cat /sys/block/vda/queue/nr_requests_used && cat /sys/block/vda/serial && date && sleep 1 ; done
10
Thu Sep 9 16:04:40 EDT 2010
11
Thu Sep 9 16:04:41 EDT 2010
...
Thu Sep 9 16:06:38 EDT 2010
127
Thu Sep 9 16:06:39 EDT 2010
128
I'll have a quick look at the virtio-blk code to see if I can spot where
the request isn't getting cleaned up. But I welcome others to have a
look too (I've already spent entirely way to much time on this issue).
Mike
[-- Attachment #2: export-nr-requests-throuth-sysfs.patch --]
[-- Type: text/plain, Size: 1650 bytes --]
---
block/blk-sysfs.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
Index: linux-2.6/block/blk-sysfs.c
===================================================================
--- linux-2.6.orig/block/blk-sysfs.c 2010-09-01 09:23:55.000000000 -0400
+++ linux-2.6/block/blk-sysfs.c 2010-09-01 17:55:50.000000000 -0400
@@ -36,6 +36,19 @@ static ssize_t queue_requests_show(struc
return queue_var_show(q->nr_requests, (page));
}
+static ssize_t queue_requests_used_show(struct request_queue *q, char *page)
+{
+ struct request_list *rl = &q->rq;
+
+ printk("Vivek: count[sync]=%d count[async]=%d"
+ " congestion_on_thres=%d queue_congestion_off_threshold=%d\n",
+ rl->count[BLK_RW_SYNC], rl->count[BLK_RW_ASYNC],
+ queue_congestion_on_threshold(q),
+ queue_congestion_off_threshold(q));
+
+ return queue_var_show(rl->count[BLK_RW_SYNC], (page));
+}
+
static ssize_t
queue_requests_store(struct request_queue *q, const char *page, size_t count)
{
@@ -266,6 +279,11 @@ static struct queue_sysfs_entry queue_re
.store = queue_requests_store,
};
+static struct queue_sysfs_entry queue_requests_used_entry = {
+ .attr = {.name = "nr_requests_used", .mode = S_IRUGO | S_IWUSR },
+ .show = queue_requests_used_show,
+};
+
static struct queue_sysfs_entry queue_ra_entry = {
.attr = {.name = "read_ahead_kb", .mode = S_IRUGO | S_IWUSR },
.show = queue_ra_show,
@@ -371,6 +389,7 @@ static struct queue_sysfs_entry queue_ra
static struct attribute *default_attrs[] = {
&queue_requests_entry.attr,
+ &queue_requests_used_entry.attr,
&queue_ra_entry.attr,
&queue_max_hw_sectors_entry.attr,
&queue_max_sectors_entry.attr,
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
2010-09-09 20:14 ` Mike Snitzer
@ 2010-09-09 20:30 ` Ryan Harper
2010-09-09 21:00 ` [PATCH] virtio-blk: put request that was created to retrieve the device id Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Ryan Harper @ 2010-09-09 20:30 UTC (permalink / raw)
To: Mike Snitzer
Cc: Ryan Harper, Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal,
john.cooper, rusty, hch, kvm
* Mike Snitzer <snitzer@redhat.com> [2010-09-09 15:15]:
> On Thu, Sep 09 2010 at 3:43pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
> > Interestingly, just this loop:
> >
> > while true ; do cat /sys/block/vda/serial && date && sleep 1 ; done
> > Thu Sep 9 15:29:30 EDT 2010
> > ...
> > Thu Sep 9 15:31:19 EDT 2010
> >
> > caused the following hang:
> ...
> > So it seems like the virtio requests aren't being properly cleaned up?
>
> Yeap, here is the result with the attached debug patch that Vivek wrote
> last week to help chase this issue (which adds 'nr_requests_used'). We
> thought the mpath device might be leaking requests; concern for other
> devices wasn't on our radar:
>
> # cat /sys/block/vda/queue/nr_requests
> 128
>
> # while true ; do cat /sys/block/vda/queue/nr_requests_used && cat /sys/block/vda/serial && date && sleep 1 ; done
> 10
> Thu Sep 9 16:04:40 EDT 2010
> 11
> Thu Sep 9 16:04:41 EDT 2010
> ...
> Thu Sep 9 16:06:38 EDT 2010
> 127
> Thu Sep 9 16:06:39 EDT 2010
> 128
>
> I'll have a quick look at the virtio-blk code to see if I can spot where
> the request isn't getting cleaned up. But I welcome others to have a
> look too (I've already spent entirely way to much time on this issue).
The qemu on the host isn't new enough to handle the request. This
serial attribute should have had a feature bit with it (it did at one
point in one of the previous forms of the virtio-blk serial patch
series, but it isn't present now) so we don't expose the attribute
unless backend can handle the request type.
For immediate relief, it's probably easiest to revert the kernel-side
commit (or comment out the device_create_file() call after add_disk() in
virtblk_probe().
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] virtio-blk: put request that was created to retrieve the device id
2010-09-09 20:30 ` Ryan Harper
@ 2010-09-09 21:00 ` Mike Snitzer
2010-09-09 21:15 ` Christoph Hellwig
2010-10-09 1:41 ` [PATCH] " Rusty Russell
0 siblings, 2 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-09-09 21:00 UTC (permalink / raw)
To: Ryan Harper
Cc: Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal, john.cooper,
rusty, hch, kvm
On Thu, Sep 09 2010 at 4:30pm -0400,
Ryan Harper <ryanh@us.ibm.com> wrote:
> * Mike Snitzer <snitzer@redhat.com> [2010-09-09 15:15]:
> > On Thu, Sep 09 2010 at 3:43pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > > Interestingly, just this loop:
> > >
> > > while true ; do cat /sys/block/vda/serial && date && sleep 1 ; done
> > > Thu Sep 9 15:29:30 EDT 2010
> > > ...
> > > Thu Sep 9 15:31:19 EDT 2010
> > >
> > > caused the following hang:
> > ...
> > > So it seems like the virtio requests aren't being properly cleaned up?
> >
> > Yeap, here is the result with the attached debug patch that Vivek wrote
> > last week to help chase this issue (which adds 'nr_requests_used'). We
> > thought the mpath device might be leaking requests; concern for other
> > devices wasn't on our radar:
> >
> > # cat /sys/block/vda/queue/nr_requests
> > 128
> >
> > # while true ; do cat /sys/block/vda/queue/nr_requests_used && cat /sys/block/vda/serial && date && sleep 1 ; done
> > 10
> > Thu Sep 9 16:04:40 EDT 2010
> > 11
> > Thu Sep 9 16:04:41 EDT 2010
> > ...
> > Thu Sep 9 16:06:38 EDT 2010
> > 127
> > Thu Sep 9 16:06:39 EDT 2010
> > 128
> >
> > I'll have a quick look at the virtio-blk code to see if I can spot where
> > the request isn't getting cleaned up. But I welcome others to have a
> > look too (I've already spent entirely way to much time on this issue).
>
> The qemu on the host isn't new enough to handle the request. This
> serial attribute should have had a feature bit with it (it did at one
> point in one of the previous forms of the virtio-blk serial patch
> series, but it isn't present now) so we don't expose the attribute
> unless backend can handle the request type.
Be that as it may, it doesn't change the fact that the request created
in virtblk_get_id (via blk_make_request) isn't being properly cleaned
up.
> For immediate relief, it's probably easiest to revert the kernel-side
> commit (or comment out the device_create_file() call after add_disk() in
> virtblk_probe().
This patch fixes the issue for me; Rusty and/or Christoph please
review/advise.
From: Mike Snitzer <snitzer@redhat.com>
Subject: virtio-blk: put request that was created to retrieve the device id
Must drop reference taken by blk_make_request().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/block/virtio_blk.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1260628..831e75c 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -199,6 +199,7 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
struct virtio_blk *vblk = disk->private_data;
struct request *req;
struct bio *bio;
+ int err;
bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
GFP_KERNEL);
@@ -212,7 +213,10 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
}
req->cmd_type = REQ_TYPE_SPECIAL;
- return blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+ err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+ blk_put_request(req);
+
+ return err;
}
static int virtblk_locked_ioctl(struct block_device *bdev, fmode_t mode,
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] virtio-blk: put request that was created to retrieve the device id
2010-09-09 21:00 ` [PATCH] virtio-blk: put request that was created to retrieve the device id Mike Snitzer
@ 2010-09-09 21:15 ` Christoph Hellwig
2010-09-17 14:58 ` Ryan Harper
2010-10-09 1:41 ` [PATCH] " Rusty Russell
1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2010-09-09 21:15 UTC (permalink / raw)
To: Mike Snitzer
Cc: Ryan Harper, Tejun Heo, Mikulas Patocka, dm-devel, Vivek Goyal,
john.cooper, rusty, hch, kvm
On Thu, Sep 09, 2010 at 05:00:42PM -0400, Mike Snitzer wrote:
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 1260628..831e75c 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -199,6 +199,7 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
> struct virtio_blk *vblk = disk->private_data;
> struct request *req;
> struct bio *bio;
> + int err;
>
> bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
> GFP_KERNEL);
> @@ -212,7 +213,10 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
> }
>
> req->cmd_type = REQ_TYPE_SPECIAL;
> - return blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
> + err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
> + blk_put_request(req);
This looks correct as far as the request is concerned, but we're still
leaking the bio.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] virtio-blk: put request that was created to retrieve the device id
2010-09-09 21:15 ` Christoph Hellwig
@ 2010-09-17 14:58 ` Ryan Harper
2010-09-21 21:00 ` Christoph Hellwig
0 siblings, 1 reply; 16+ messages in thread
From: Ryan Harper @ 2010-09-17 14:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Ryan Harper, Tejun Heo, Mikulas Patocka, dm-devel,
Vivek Goyal, john.cooper, rusty, kvm
* Christoph Hellwig <hch@infradead.org> [2010-09-09 16:18]:
> On Thu, Sep 09, 2010 at 05:00:42PM -0400, Mike Snitzer wrote:
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 1260628..831e75c 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -199,6 +199,7 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
> > struct virtio_blk *vblk = disk->private_data;
> > struct request *req;
> > struct bio *bio;
> > + int err;
> >
> > bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
> > GFP_KERNEL);
> > @@ -212,7 +213,10 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
> > }
> >
> > req->cmd_type = REQ_TYPE_SPECIAL;
> > - return blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
> > + err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
> > + blk_put_request(req);
>
> This looks correct as far as the request is concerned, but we're still
> leaking the bio.
Since __bio_map_kern() sets up bio->bi_end_io = bio_map_kern_endio
(which does a bio_put(bio)) doesn't that ensure we don't leak?
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] virtio-blk: put request that was created to retrieve the device id
2010-09-17 14:58 ` Ryan Harper
@ 2010-09-21 21:00 ` Christoph Hellwig
2010-10-08 16:06 ` [2.6.36 REGRESSION] " Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2010-09-21 21:00 UTC (permalink / raw)
To: Ryan Harper
Cc: Christoph Hellwig, Mike Snitzer, Tejun Heo, Mikulas Patocka,
dm-devel, Vivek Goyal, john.cooper, rusty, kvm
On Fri, Sep 17, 2010 at 09:58:48AM -0500, Ryan Harper wrote:
> Since __bio_map_kern() sets up bio->bi_end_io = bio_map_kern_endio
> (which does a bio_put(bio)) doesn't that ensure we don't leak?
Indeed, that should take care of it.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [2.6.36 REGRESSION] Re: virtio-blk: put request that was created to retrieve the device id
2010-09-21 21:00 ` Christoph Hellwig
@ 2010-10-08 16:06 ` Mike Snitzer
0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2010-10-08 16:06 UTC (permalink / raw)
To: rusty
Cc: Christoph Hellwig, Ryan Harper, Tejun Heo, Mikulas Patocka,
dm-devel, Vivek Goyal, john.cooper, kvm, linux-kernel
Hi Rusty,
On Tue, Sep 21 2010 at 5:00pm -0400,
Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Sep 17, 2010 at 09:58:48AM -0500, Ryan Harper wrote:
> > Since __bio_map_kern() sets up bio->bi_end_io = bio_map_kern_endio
> > (which does a bio_put(bio)) doesn't that ensure we don't leak?
>
> Indeed, that should take care of it.
>
We need to fix this regression for 2.6.36. Not sure what else I need to
do to get this on your radar. It's a pretty significant show-stopper
for 2.6.36 guests that use virtio-blk with a udev enabled distro.
Here is a reference to my original patch (which Ryan and hch have both
reviewed): https://patchwork.kernel.org/patch/165571/
Thanks,
Mike
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] virtio-blk: put request that was created to retrieve the device id
2010-09-09 21:00 ` [PATCH] virtio-blk: put request that was created to retrieve the device id Mike Snitzer
2010-09-09 21:15 ` Christoph Hellwig
@ 2010-10-09 1:41 ` Rusty Russell
1 sibling, 0 replies; 16+ messages in thread
From: Rusty Russell @ 2010-10-09 1:41 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Ryan Harper, dm-devel, kvm, john.cooper
On Fri, 10 Sep 2010 06:30:42 am Mike Snitzer wrote:
> On Thu, Sep 09 2010 at 4:30pm -0400,
> Ryan Harper <ryanh@us.ibm.com> wrote:
>
> > * Mike Snitzer <snitzer@redhat.com> [2010-09-09 15:15]:
> > > On Thu, Sep 09 2010 at 3:43pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > > # while true ; do cat /sys/block/vda/queue/nr_requests_used && cat /sys/block/vda/serial && date && sleep 1 ; done
> > > 10
> > > Thu Sep 9 16:04:40 EDT 2010
> > > 11
...
> > The qemu on the host isn't new enough to handle the request. This
> > serial attribute should have had a feature bit with it (it did at one
> > point in one of the previous forms of the virtio-blk serial patch
> > series, but it isn't present now) so we don't expose the attribute
> > unless backend can handle the request type.
>
> Be that as it may, it doesn't change the fact that the request created
> in virtblk_get_id (via blk_make_request) isn't being properly cleaned
> up.
Thanks for re-sending Mike.
This patch confused me at first, but it's correct. Took me a few
minutes of checking though.
For those not familiar with the block layer, here are the key points:
1) blk_execute_rq waits for the request to finish.
2) blk_execute_rq grabs its own reference to the req.
3) Once qemu finishes with it and sends an interrupt blk_done()
releases that reference via __blk_end_request_all()
4) As caller of blk_make_request, it is our responsibility to
free it after it's finished, ie. after blk_execute_rq.
> This patch fixes the issue for me; Rusty and/or Christoph please
> review/advise.
Thanks, applied, and CC'd stable@kernel.org (it's in 2.6.35 as well).
From: Mike Snitzer <snitzer@redhat.com>
Subject: virtio-blk: fix request leak.
Must drop reference taken by blk_make_request().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .35.x
---
drivers/block/virtio_blk.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1260628..831e75c 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -199,6 +199,7 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
struct virtio_blk *vblk = disk->private_data;
struct request *req;
struct bio *bio;
+ int err;
bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
GFP_KERNEL);
@@ -212,7 +213,10 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
}
req->cmd_type = REQ_TYPE_SPECIAL;
- return blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+ err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+ blk_put_request(req);
+
+ return err;
}
static int virtblk_locked_ioctl(struct block_device *bdev, fmode_t mode,
^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2010-10-09 1:41 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20100902032246.GA31484@redhat.com>
2010-09-09 15:26 ` [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm] Mike Snitzer
2010-09-09 15:44 ` Ryan Harper
2010-09-09 15:57 ` Mike Snitzer
2010-09-09 16:03 ` Ryan Harper
2010-09-09 17:55 ` Mike Snitzer
2010-09-09 18:35 ` Ryan Harper
2010-09-09 19:15 ` Mike Snitzer
2010-09-09 19:43 ` Mike Snitzer
2010-09-09 20:14 ` Mike Snitzer
2010-09-09 20:30 ` Ryan Harper
2010-09-09 21:00 ` [PATCH] virtio-blk: put request that was created to retrieve the device id Mike Snitzer
2010-09-09 21:15 ` Christoph Hellwig
2010-09-17 14:58 ` Ryan Harper
2010-09-21 21:00 ` Christoph Hellwig
2010-10-08 16:06 ` [2.6.36 REGRESSION] " Mike Snitzer
2010-10-09 1:41 ` [PATCH] " Rusty Russell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox