From: Mike Snitzer <snitzer@redhat.com>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>,
	device-mapper development <dm-devel@redhat.com>
Subject: Re: v3.15 dm-mpath regression: cable pull test causes I/O hang
Date: Fri, 27 Jun 2014 09:33:45 -0400
Message-ID: <20140627133345.GA6150@redhat.com>
In-Reply-To: <53AD6B62.2020407@acm.org>

On Fri, Jun 27 2014 at  9:02am -0400,
Bart Van Assche <bvanassche@acm.org> wrote:

> Hello,
> 
> While running a cable pull simulation test with dm_multipath on top of
> the SRP initiator driver I noticed that after a few iterations I/O locks
> up instead of dm_multipath processing the path failure properly (see also
> below for a call trace). At least kernel versions 3.15 and 3.16-rc2 are
> vulnerable. This issue does not occur with kernel 3.14. I have tried to
> bisect this but gave up when I noticed that I/O locked up completely with
> a kernel built from git commit ID e809917735ebf1b9a56c24e877ce0d320baee2ec
> (dm mpath: push back requests instead of queueing). But with the bisect I
> have been able to narrow down this issue to one of the patches in "Merge
> tag 'dm-3.15-changes' of
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm".
> Does anyone have a suggestion for how to analyze this further or how to
> fix this?
> 
> Thanks,
> 
> Bart.
> 
> systemd-udevd   D ffff880831b58000     0  9926    356 0x00000006
>  ffff8807f8bb79b8 0000000000000002 ffff880831b58000 ffff8807f8bb7fd8
>  00000000000131c0 00000000000131c0 ffff88083b490000 ffff88085fc53ad0
>  ffff88085ff6bf38 ffff8807f8bb7a40 0000000000000002 ffffffff81135bd0
> Call Trace:
>  [<ffffffff814bba8d>] io_schedule+0x9d/0x130
>  [<ffffffff81135bde>] sleep_on_page+0xe/0x20
>  [<ffffffff814bc0d8>] __wait_on_bit_lock+0x48/0xb0
>  [<ffffffff81135cea>] __lock_page+0x6a/0x70
>  [<ffffffff811471df>] truncate_inode_pages_range+0x3ff/0x690
>  [<ffffffff81147485>] truncate_inode_pages+0x15/0x20
>  [<ffffffff811d2f85>] kill_bdev+0x35/0x40
>  [<ffffffff811d4509>] __blkdev_put+0x69/0x1b0
>  [<ffffffff811d4fb0>] blkdev_put+0x50/0x160
>  [<ffffffff811d5175>] blkdev_close+0x25/0x30
>  [<ffffffff81199eda>] __fput+0xea/0x1f0
>  [<ffffffff8119a02e>] ____fput+0xe/0x10
>  [<ffffffff81074d9c>] task_work_run+0xac/0xe0
>  [<ffffffff8104ff37>] do_exit+0x2c7/0xc60
>  [<ffffffff81051c7c>] do_group_exit+0x4c/0xc0
>  [<ffffffff81064261>] get_signal_to_deliver+0x2e1/0x940
>  [<ffffffff81002528>] do_signal+0x48/0x630
>  [<ffffffff81002b81>] do_notify_resume+0x71/0xc0
>  [<ffffffff814c1918>] int_signal+0x12/0x17

(We've seen sync on last close cause problems when the block device
isn't reachable.)

Do any other threads look suspect in the output from:
 echo t > /proc/sysrq-trigger
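
A full task dump is long, so it can help to pull out only the tasks in
uninterruptible sleep (state D), such as the systemd-udevd task quoted
above. The following is a minimal sketch, not a definitive tool: the
function name blocked_tasks is hypothetical, and it assumes the task
header layout shown in the quoted trace (comm, state, address, free
stack, pid, ppid, flags), which may differ on other kernel versions.

```shell
# Print "comm pid" for every task header line in state D read from stdin.
# Assumes the header layout seen in the trace above:
#   comm  state  addr  free  pid  ppid  flags
blocked_tasks() {
    awk '{
        # Drop a leading dmesg timestamp such as "[ 1234.567890] " if present.
        sub(/^\[[ 0-9.]+\] /, "")
        # A task header has state in field 2 and hex flags in the last field.
        if ($2 == "D" && $NF ~ /^0x[0-9a-f]+$/)
            print $1, $(NF-2)
    }'
}

# Usage on a live system (as root):
#   echo t > /proc/sysrq-trigger
#   dmesg | blocked_tasks
```

Writing w instead of t to /proc/sysrq-trigger asks the kernel to dump only
blocked tasks, which may be enough when the full dump is too noisy.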

Can you provide your dmsetup table output for the relevant mpath device?
Are you using queue_if_no_path?  Also, AFAIK you don't use
multipath-tools, but if by some chance you do please provide your
multipath.conf.  I'll attempt to reproduce.
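
One way to collect that information is sketched below. This is an
illustration under stated assumptions: the function name
uses_queue_if_no_path and the device name mpatha are placeholders, and the
check simply looks for the queue_if_no_path feature word in the table
line that dmsetup prints for a multipath target.

```shell
# Read a dm-multipath table line on stdin and report whether the
# queue_if_no_path feature is enabled ("yes") or not ("no").
uses_queue_if_no_path() {
    # -w matches the feature name as a whole word only.
    if grep -qw 'queue_if_no_path'; then
        echo yes
    else
        echo no
    fi
}

# Usage on a live system (as root; "mpatha" is a placeholder name):
#   dmsetup table mpatha
#   dmsetup table mpatha | uses_queue_if_no_path
```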

But I'm almost tempted to just revert _all_ of 3.15's dm-mpath changes
and only reintroduce them once they pass your testing.  I'd like to
avoid that, so Hannes and/or Jun'ichi, it is time to look at this
seriously; any help would be much appreciated.  For starters, have you
guys done cable pull tests with > 3.15?

Thread overview: 19+ messages
2014-06-27 13:02 v3.15 dm-mpath regression: cable pull test causes I/O hang Bart Van Assche
2014-06-27 13:33 ` Mike Snitzer [this message]
2014-06-27 14:18   ` Bart Van Assche
2014-07-02 22:02   ` Mike Snitzer
2014-07-03  5:43     ` Hannes Reinecke
2014-07-03 13:56     ` Bart Van Assche
2014-07-03 13:58       ` Hannes Reinecke
2014-07-03 14:05       ` Mike Snitzer
2014-07-03 14:15         ` Hannes Reinecke
2014-07-03 14:18           ` Mike Snitzer
2014-07-03 14:34         ` Bart Van Assche
2014-07-03 15:00           ` Mike Snitzer
2014-07-07 13:28             ` Bart Van Assche
2014-07-04  3:10           ` Junichi Nomura
2014-07-07 13:40             ` Bart Van Assche
2014-07-08  0:55               ` Junichi Nomura
2014-07-08  9:43                 ` Bart Van Assche
2014-07-08 16:33                 ` Mike Snitzer
2014-07-08 23:24                   ` Junichi Nomura
