All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
	dm-devel@redhat.com, Vivek Goyal <vgoyal@redhat.com>,
	ryanh@us.ibm.com, john.cooper@redhat.com, rusty@rustcorp.com.au,
	hch@infradead.org, kvm@vger.kernel.org
Subject: [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm]
Date: Thu, 9 Sep 2010 11:26:58 -0400	[thread overview]
Message-ID: <20100909152658.GA8118@redhat.com> (raw)
In-Reply-To: <20100902032246.GA31484@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4471 bytes --]

On Wed, Sep 01 2010 at 11:22pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, Sep 01 2010 at  2:59pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > My hope was that the request-based deadlock I'm seeing would disappear
> > if that relaxed ordering patch wasn't applied.  Unfortunately, I still
> > see the hang.
> 
> Turns out I can reproduce the hang on a stock 2.6.36-rc3 (without _any_
> FLUSH+FUA patches)!
> 
> I'll try to pin-point the root cause but I think my test is somehow
> exposing a bug in my virt setup.

[my virt setup == single kvm guest (RHEL6) with F13 host]

My gut turned out to be correct.  I finally tracked down the regression
point to the following commit (cc'ing appropriate people):

commit a5eb9e4ff18a33e43557d44b205f953b0c1efade
Author: Ryan Harper <ryanh@us.ibm.com>
Date:   Wed Jun 23 22:19:57 2010 -0500

    virtio_blk: Add 'serial' attribute to virtio-blk devices (v2)
    
    Create a new attribute for virtio-blk devices that will fetch the serial number
    of the block device.  This attribute can be used by udev to create disk/by-id
    symlinks for devices that don't have a UUID (filesystem) associated with them.
    
    ATA_IDENTIFY strings are special in that they can be up to 20 chars long
    and aren't required to be nul-terminated.  The buffer is also zero-padded
    meaning that if the serial is 19 chars or less that we get a nul-terminated
    string.  When copying this value into a string buffer, we must be careful to
    copy up to the nul (if it present) and only 20 if it is longer and not to
    attempt to nul terminate; this isn't needed.
    
    Changes since v1:
    - Added BUILD_BUG_ON() for PAGE_SIZE check
    - Removed min() since BUILD_BUG_ON() handles the check
    - Replaced serial_sysfs() by copying id directly to buffer
    
    Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
    Signed-off-by: john cooper <john.cooper@redhat.com>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

So the first released kernel to have this regression is 2.6.36-rc1.

Some background:
I have been working with Tejun to test the barrier to FLUSH+FUA
conversion patchset.  I crafted the attached script to test the DM
changes that are part of the FLUSH+FUA patchset.

Using this script with:
while true ; do ./test_dm_discard_mpath_scsi_debug.sh ; done

I can reliably trigger the following hang, always on the 5th iteration
in my testing, IFF commit a5eb9e4ff18a33e43557d44b205f953b0c1efade is
applied:

INFO: task lvcreate:2484 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
lvcreate      D 0000000100064871  4960  2484   2350 0x00000080
 ffff88007b87b978 0000000000000046 ffff88007b87b8e8 ffff880000000000
 ffff88007b87bfd8 ffff8800724fa400 00000000001d4040 ffff88007b87bfd8
 00000000001d4040 00000000001d4040 00000000001d4040 00000000001d4040
Call Trace:
 [<ffffffff8136de23>] io_schedule+0x73/0xb5
 [<ffffffff811b6882>] get_request_wait+0xf2/0x180
 [<ffffffff8105d8da>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff811b6deb>] __make_request+0x310/0x434
 [<ffffffff811b5442>] generic_make_request+0x2f1/0x36e
 [<ffffffff81062f78>] ? cpu_clock+0x43/0x5e
 [<ffffffff811b559d>] submit_bio+0xde/0xfb
 [<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81129332>] dio_bio_submit+0x7b/0x9c
 [<ffffffff8112939d>] dio_send_cur_page+0x4a/0xb0
 [<ffffffff81129f1c>] __blockdev_direct_IO_newtrunc+0x7c5/0x97d
 [<ffffffff81127f4f>] blkdev_direct_IO+0x57/0x59
 [<ffffffff81127080>] ? blkdev_get_blocks+0x0/0x90
 [<ffffffff810c2eee>] generic_file_aio_read+0xed/0x5b4
 [<ffffffff810d70d4>] ? might_fault+0x5c/0xac
 [<ffffffff810242bd>] ? pvclock_clocksource_read+0x50/0xb9
 [<ffffffff81100813>] do_sync_read+0xcb/0x108
 [<ffffffff8136e5ad>] ? __mutex_unlock_slowpath+0x119/0x12b
 [<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
 [<ffffffff8106e459>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8118cae7>] ? security_file_permission+0x16/0x18
 [<ffffffff81100e7a>] vfs_read+0xab/0x108
 [<ffffffff8106e428>] ? trace_hardirqs_on_caller+0x11d/0x141
 [<ffffffff81100f97>] sys_read+0x4a/0x6e
 [<ffffffff81002bf2>] system_call_fastpath+0x16/0x1b
no locks held by lvcreate/2484.


lvcreate is just the first victim (sometimes it is the vgcreate).  But
if the guest is left running other new processes get hung with
comparable traces (w/ get_request_wait).  Until eventually the guest is
completely unresponsive.

Mike

[-- Attachment #2: test_dm_discard_mpath_scsi_debug.sh --]
[-- Type: application/x-sh, Size: 907 bytes --]

  parent reply	other threads:[~2010-09-09 15:26 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-30  9:58 [PATCHSET 2.6.36-rc2] block, dm: finish REQ_FLUSH/FUA conversion, take#2 Tejun Heo
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58 ` [PATCH 1/5] block: make __blk_rq_prep_clone() copy most command flags Tejun Heo
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58   ` Tejun Heo
2010-09-01 15:30   ` Christoph Hellwig
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58 ` [PATCH 2/5] dm: implement REQ_FLUSH/FUA support for bio-based dm Tejun Heo
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58   ` Tejun Heo
2010-09-01 13:43   ` Mike Snitzer
2010-09-01 13:50     ` Tejun Heo
2010-09-01 13:54       ` Mike Snitzer
2010-09-01 13:56         ` Tejun Heo
2010-08-30  9:58 ` [PATCH 3/5] dm: relax ordering of bio-based flush implementation Tejun Heo
2010-08-30  9:58   ` Tejun Heo
2010-09-01 13:51   ` Mike Snitzer
2010-09-01 13:51     ` Mike Snitzer
2010-09-01 13:56     ` Tejun Heo
2010-09-01 13:56       ` Tejun Heo
2010-09-03  6:04   ` Kiyoshi Ueda
2010-09-03  9:42     ` Tejun Heo
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58 ` [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Tejun Heo
2010-08-30  9:58   ` Tejun Heo
2010-08-30 13:28   ` Mike Snitzer
2010-08-30 13:28   ` Mike Snitzer
2010-08-30 13:59     ` Tejun Heo
2010-08-30 15:07       ` Tejun Heo
2010-08-30 19:08         ` Mike Snitzer
2010-08-30 21:28           ` Mike Snitzer
2010-08-31 10:29             ` Tejun Heo
2010-08-31 13:02               ` Mike Snitzer
2010-08-31 13:14                 ` Tejun Heo
2010-08-30 19:08         ` Mike Snitzer
2010-08-30 15:07       ` Tejun Heo
2010-08-30 15:42       ` [PATCH] block: initialize flush request with WRITE_FLUSH instead of REQ_FLUSH Tejun Heo
2010-08-30 15:42       ` Tejun Heo
2010-08-30 15:45       ` [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Tejun Heo
2010-08-30 15:45       ` Tejun Heo
2010-08-30 19:18         ` Mike Snitzer
2010-08-30 19:18         ` Mike Snitzer
2010-09-01  7:15         ` Kiyoshi Ueda
2010-09-01 12:25           ` Mike Snitzer
2010-09-02 13:22           ` Tejun Heo
2010-09-02 13:32             ` Tejun Heo
2010-09-03  5:46             ` Kiyoshi Ueda
2010-09-02 17:43           ` [PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original Tejun Heo
2010-09-03  5:47             ` Kiyoshi Ueda
2010-09-03  9:33               ` Tejun Heo
2010-09-03 10:28                 ` Kiyoshi Ueda
2010-09-03 11:42                   ` Tejun Heo
2010-09-03 11:51                     ` Kiyoshi Ueda
     [not found]         ` <20100830194731.GA10702@redhat.com>
2010-09-01 10:31           ` [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Mikulas Patocka
2010-09-01 11:20             ` Tejun Heo
2010-09-01 12:12               ` Mikulas Patocka
2010-09-01 12:42                 ` Tejun Heo
2010-09-01 12:54                   ` Mike Snitzer
2010-09-01 15:20                 ` Mike Snitzer
2010-09-01 15:35                   ` Mikulas Patocka
2010-09-01 17:07                     ` Mike Snitzer
2010-09-01 18:59                       ` Mike Snitzer
2010-09-02  3:22                         ` Mike Snitzer
2010-09-02 10:24                           ` Tejun Heo
2010-09-02 15:11                             ` Mike Snitzer
2010-09-09 15:26                           ` Mike Snitzer [this message]
2010-09-09 15:44                             ` [REGRESSION][BISECTED] virtio-blk serial attribute causes guest to hang [Was: Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm] Ryan Harper
2010-09-09 15:57                               ` Mike Snitzer
2010-09-09 16:03                                 ` Ryan Harper
2010-09-09 17:55                                   ` Mike Snitzer
2010-09-09 18:35                                     ` Ryan Harper
2010-09-09 19:15                                       ` Mike Snitzer
2010-09-09 19:43                                         ` Mike Snitzer
2010-09-09 20:14                                           ` Mike Snitzer
2010-09-09 20:30                                             ` Ryan Harper
2010-09-09 21:00                                               ` [PATCH] virtio-blk: put request that was created to retrieve the device id Mike Snitzer
2010-09-09 21:15                                                 ` Christoph Hellwig
2010-09-17 14:58                                                   ` Ryan Harper
2010-09-21 21:00                                                     ` Christoph Hellwig
2010-10-08 16:06                                                       ` [2.6.36 REGRESSION] " Mike Snitzer
2010-10-09  1:41                                                 ` [PATCH] " Rusty Russell
2010-08-30 13:59     ` [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm Tejun Heo
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58 ` [PATCH 5/5] block: remove the WRITE_BARRIER flag Tejun Heo
2010-08-30  9:58 ` Tejun Heo
2010-08-30  9:58   ` Tejun Heo
2010-08-30  9:58 ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100909152658.GA8118@redhat.com \
    --to=snitzer@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=john.cooper@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=rusty@rustcorp.com.au \
    --cc=ryanh@us.ibm.com \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.