From: Ryan Harper <ryanh@us.ibm.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Tejun Heo <tj@kernel.org>, Mikulas Patocka <mpatocka@redhat.com>,
dm-devel@redhat.com, Vivek Goyal <vgoyal@redhat.com>,
ryanh@us.ibm.com, john.cooper@redhat.com, rusty@rustcorp.com.au,
hch@infradead.org, kvm@vger.kernel.org
Subject: Re: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
Date: Thu, 9 Sep 2010 10:44:42 -0500 [thread overview]
Message-ID: <20100909154442.GI30086@us.ibm.com> (raw)
In-Reply-To: <20100909152658.GA8118@redhat.com>
* Mike Snitzer <snitzer@redhat.com> [2010-09-09 10:29]:
> On Wed, Sep 01 2010 at 11:22pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
> > On Wed, Sep 01 2010 at 2:59pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > > My hope was that the request-based deadlock I'm seeing would disappear
> > > if that relaxed ordering patch wasn't applied. Unfortunately, I still
> > > see the hang.
> >
> > Turns out I can reproduce the hang on a stock 2.6.36-rc3 (without _any_
> > FLUSH+FUA patches)!
> >
> > I'll try to pin-point the root cause but I think my test is somehow
> > exposing a bug in my virt setup.
>
> [my virt setup == single kvm guest (RHEL6) with F13 host]
What's your kvm guest command line? And the guest is using stock RHEL6
kernel? What KVM userspace are you using on the host? What comes with
F13 or some updated version?
>
> My gut turned out to be correct. I finally tracked down the regression
> point to the following commit (cc'ing appropriate people):
>
> commit a5eb9e4ff18a33e43557d44b205f953b0c1efade
> Author: Ryan Harper <ryanh@us.ibm.com>
> Date: Wed Jun 23 22:19:57 2010 -0500
>
> virtio_blk: Add 'serial' attribute to virtio-blk devices (v2)
>
> Create a new attribute for virtio-blk devices that will fetch the serial number
> of the block device. This attribute can be used by udev to create disk/by-id
> symlinks for devices that don't have a UUID (filesystem) associated with them.
>
> ATA_IDENTIFY strings are special in that they can be up to 20 chars long
> and aren't required to be nul-terminated. The buffer is also zero-padded
> meaning that if the serial is 19 chars or less that we get a nul-terminated
> string. When copying this value into a string buffer, we must be careful to
> copy up to the nul (if it present) and only 20 if it is longer and not to
> attempt to nul terminate; this isn't needed.
>
> Changes since v1:
> - Added BUILD_BUG_ON() for PAGE_SIZE check
> - Removed min() since BUILD_BUG_ON() handles the check
> - Replaced serial_sysfs() by copying id directly to buffer
>
> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> Signed-off-by: john cooper <john.cooper@redhat.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> So the first released kernel to have this regression is 2.6.36-rc1.
>
> Some background:
> I have been working with Tejun to test the barrier to FLUSH+FUA
> conversion patchset. I crafted the attached script to test the DM
> changes that are part of the FLUSH+FUA patchset.
>
> Using this script with:
> while true ; do ./test_dm_discard_mpath_scsi_debug.sh ; done
>
> I can reliably trigger the following hang, always on the 5th iteration
> in my testing, IFF commit a5eb9e4ff18a33e43557d44b205f953b0c1efade is
> applied:
>
> INFO: task lvcreate:2484 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> lvcreate D 0000000100064871 4960 2484 2350 0x00000080
> ffff88007b87b978 0000000000000046 ffff88007b87b8e8 ffff880000000000
> ffff88007b87bfd8 ffff8800724fa400 00000000001d4040 ffff88007b87bfd8
> 00000000001d4040 00000000001d4040 00000000001d4040 00000000001d4040
> Call Trace:
> [<ffffffff8136de23>] io_schedule+0x73/0xb5
> [<ffffffff811b6882>] get_request_wait+0xf2/0x180
> [<ffffffff8105d8da>] ? autoremove_wake_function+0x0/0x39
> [<ffffffff811b6deb>] __make_request+0x310/0x434
> [<ffffffff811b5442>] generic_make_request+0x2f1/0x36e
> [<ffffffff81062f78>] ? cpu_clock+0x43/0x5e
> [<ffffffff811b559d>] submit_bio+0xde/0xfb
> [<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff81129332>] dio_bio_submit+0x7b/0x9c
> [<ffffffff8112939d>] dio_send_cur_page+0x4a/0xb0
> [<ffffffff81129f1c>] __blockdev_direct_IO_newtrunc+0x7c5/0x97d
> [<ffffffff81127f4f>] blkdev_direct_IO+0x57/0x59
> [<ffffffff81127080>] ? blkdev_get_blocks+0x0/0x90
> [<ffffffff810c2eee>] generic_file_aio_read+0xed/0x5b4
> [<ffffffff810d70d4>] ? might_fault+0x5c/0xac
> [<ffffffff810242bd>] ? pvclock_clocksource_read+0x50/0xb9
> [<ffffffff81100813>] do_sync_read+0xcb/0x108
> [<ffffffff8136e5ad>] ? __mutex_unlock_slowpath+0x119/0x12b
> [<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
> [<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff8118cae7>] ? security_file_permission+0x16/0x18
> [<ffffffff81100e7a>] vfs_read+0xab/0x108
> [<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
> [<ffffffff81100f97>] sys_read+0x4a/0x6e
> [<ffffffff81002bf2>] system_call_fastpath+0x16/0x1b
> no locks held by lvcreate/2484.
>
>
> lvcreate is just the first victim (sometimes it is the vgcreate). But
> if the guest is left running other new processes get hung with
> comparable traces (w/ get_request_wait). Until eventually the guest is
> completely unresponsive.
>
> Mike
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
next prev parent reply other threads:[~2010-09-09 15:44 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-30 9:58 [PATCHSET 2.6.36-rc2] block, dm: finish REQ_FLUSH/FUA conversion, take#2 Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` [PATCH 1/5] block: make __blk_rq_prep_clone() copy most command flags Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-09-01 15:30 ` Christoph Hellwig
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` [PATCH 2/5] dm: implement REQ_FLUSH/FUA support for bio-based dm Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-09-01 13:43 ` Mike Snitzer
2010-09-01 13:50 ` Tejun Heo
2010-09-01 13:54 ` Mike Snitzer
2010-09-01 13:56 ` Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` [PATCH 3/5] dm: relax ordering of bio-based flush implementation Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-09-01 13:51 ` Mike Snitzer
2010-09-01 13:51 ` Mike Snitzer
2010-09-01 13:56 ` Tejun Heo
2010-09-01 13:56 ` Tejun Heo
2010-09-03 6:04 ` Kiyoshi Ueda
2010-09-03 9:42 ` Tejun Heo
2010-08-30 9:58 ` [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 13:28 ` Mike Snitzer
2010-08-30 13:28 ` Mike Snitzer
2010-08-30 13:59 ` Tejun Heo
2010-08-30 15:07 ` Tejun Heo
2010-08-30 19:08 ` Mike Snitzer
2010-08-30 19:08 ` Mike Snitzer
2010-08-30 21:28 ` Mike Snitzer
2010-08-31 10:29 ` Tejun Heo
2010-08-31 13:02 ` Mike Snitzer
2010-08-31 13:14 ` Tejun Heo
2010-08-30 15:07 ` Tejun Heo
2010-08-30 15:42 ` [PATCH] block: initialize flush request with WRITE_FLUSH instead of REQ_FLUSH Tejun Heo
2010-08-30 15:42 ` Tejun Heo
2010-08-30 15:45 ` [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Tejun Heo
2010-08-30 15:45 ` Tejun Heo
2010-08-30 19:18 ` Mike Snitzer
2010-08-30 19:18 ` Mike Snitzer
2010-09-01 7:15 ` Kiyoshi Ueda
2010-09-01 12:25 ` Mike Snitzer
2010-09-02 13:22 ` Tejun Heo
2010-09-02 13:32 ` Tejun Heo
2010-09-03 5:46 ` Kiyoshi Ueda
2010-09-02 17:43 ` [PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original Tejun Heo
2010-09-03 5:47 ` Kiyoshi Ueda
2010-09-03 9:33 ` Tejun Heo
2010-09-03 10:28 ` Kiyoshi Ueda
2010-09-03 11:42 ` Tejun Heo
2010-09-03 11:51 ` Kiyoshi Ueda
[not found] ` <20100830194731.GA10702@redhat.com>
2010-09-01 10:31 ` [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Mikulas Patocka
2010-09-01 11:20 ` Tejun Heo
2010-09-01 12:12 ` Mikulas Patocka
2010-09-01 12:42 ` Tejun Heo
2010-09-01 12:54 ` Mike Snitzer
2010-09-01 15:20 ` Mike Snitzer
2010-09-01 15:35 ` Mikulas Patocka
2010-09-01 17:07 ` Mike Snitzer
2010-09-01 18:59 ` Mike Snitzer
2010-09-02 3:22 ` Mike Snitzer
2010-09-02 10:24 ` Tejun Heo
2010-09-02 15:11 ` Mike Snitzer
2010-09-09 15:26 ` [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm] Mike Snitzer
2010-09-09 15:44 ` Ryan Harper [this message]
2010-09-09 15:57 ` Mike Snitzer
2010-09-09 16:03 ` Ryan Harper
2010-09-09 17:55 ` Mike Snitzer
2010-09-09 18:35 ` Ryan Harper
2010-09-09 19:15 ` Mike Snitzer
2010-09-09 19:43 ` Mike Snitzer
2010-09-09 20:14 ` Mike Snitzer
2010-09-09 20:30 ` Ryan Harper
2010-09-09 21:00 ` [PATCH] virtio-blk: put request that was created to retrieve the device id Mike Snitzer
2010-09-09 21:15 ` Christoph Hellwig
2010-09-17 14:58 ` Ryan Harper
2010-09-21 21:00 ` Christoph Hellwig
2010-10-08 16:06 ` [2.6.36 REGRESSION] " Mike Snitzer
2010-10-09 1:41 ` [PATCH] " Rusty Russell
2010-08-30 13:59 ` [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` [PATCH 5/5] block: remove the WRITE_BARRIER flag Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` Tejun Heo
2010-08-30 9:58 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100909154442.GI30086@us.ibm.com \
--to=ryanh@us.ibm.com \
--cc=dm-devel@redhat.com \
--cc=hch@infradead.org \
--cc=john.cooper@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=rusty@rustcorp.com.au \
--cc=snitzer@redhat.com \
--cc=tj@kernel.org \
--cc=vgoyal@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.