dm-devel.redhat.com archive mirror
* [PATCH] multipath queues build invalid requests when all paths are lost
@ 2012-08-31 15:04 David Jeffery
  2012-09-04 14:58 ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: David Jeffery @ 2012-08-31 15:04 UTC (permalink / raw)
  To: dm-devel


The DM module recalculates queue limits based only on devices which currently
exist in the table.  This creates a problem in the event all devices are
temporarily removed such as all fibre channel paths being lost in multipath.
DM will reset the limits to the maximum permissible, which can then assemble
requests which exceed the limits of the paths when the paths are restored.  The
request will fail the blk_rq_check_limits() test when sent to a path with
lower limits, and will be retried without end by multipath.
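
A stand-alone sketch of the failure mode may help here (illustrative
only: struct fake_limits, check_limits() and the numbers are made-up
stand-ins, not the real queue_limits or blk_rq_check_limits() code):

  #include <limits.h>
  #include <stdio.h>

  /* Stand-in for the only queue_limits field this example needs. */
  struct fake_limits {
      unsigned int max_sectors;
  };

  /* Models the size check done when a cloned request is resubmitted. */
  static int check_limits(unsigned int rq_sectors,
                          const struct fake_limits *lim)
  {
      return rq_sectors > lim->max_sectors ? -1 /* -EIO */ : 0;
  }

  int main(void)
  {
      /* No paths left: DM falls back to "no limit" stacking defaults. */
      struct fake_limits dm_dev  = { .max_sectors = UINT_MAX };
      /* A restored path has a finite hardware limit (made-up value). */
      struct fake_limits sd_path = { .max_sectors = 1024 };

      unsigned int rq_sectors = 4096;   /* assembled while limits were wide open */

      printf("dm queue: %d\n", check_limits(rq_sectors, &dm_dev));  /* 0, accepted */
      printf("sd path:  %d\n", check_limits(rq_sectors, &sd_path)); /* -1, rejected */
      return 0;
  }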

This becomes a much bigger issue after fe86cdcef73ba19a2246a124f0ddbd19b14fb549.
Previously, most storage had max_sector limits which exceeded the default
value used.  This meant most setups wouldn't trigger this issue as the default
values used when there were no paths were still less than the limits of the
underlying devices.  Now that the default stacking values are no longer
constrained, any hardware setup can potentially hit this issue.

This proposed patch alters the DM limit behavior.  With the patch, DM queue
limits only go one way: more restrictive.  As paths are removed, the queue's
limits will maintain their current settings.  As paths are added, the queue's
limits may become more restrictive.
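
The "one way" behaviour amounts to always combining the queue's current
limits with each path's limits using the most restrictive value, rather
than starting over from the wide-open stacking defaults.  A tiny
stand-alone model of that (illustrative only, using a single field in
place of the full queue_limits stacking done by blk_stack_limits()):

  #include <stdio.h>

  /* Combining two limits keeps the more restrictive one. */
  static unsigned int stack(unsigned int top, unsigned int bottom)
  {
      return bottom < top ? bottom : top;
  }

  int main(void)
  {
      unsigned int max_sectors = 512;          /* made-up current queue limit */

      /* A path with a tighter limit is added: the queue limit tightens. */
      max_sectors = stack(max_sectors, 256);
      printf("path added:      %u\n", max_sectors);   /* 256 */

      /* All paths lost: nothing is stacked, so the queue keeps its last
       * known-good value instead of snapping back to UINT_MAX. */
      printf("all paths lost:  %u\n", max_sectors);   /* still 256 */
      return 0;
  }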

Signed-off-by: David Jeffery <djeffery@redhat.com>

---
 dm-table.c |    2 --
 dm.c       |    2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f900690..5e7e3ca 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1222,8 +1222,6 @@ int dm_calculate_queue_limits(struct dm_table *table,
 	struct queue_limits ti_limits;
 	unsigned i = 0;
 
-	blk_set_stacking_limits(limits);
-
 	while (i < dm_table_get_num_targets(table)) {
 		blk_set_stacking_limits(&ti_limits);
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4e09b6f..fbf89d5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1832,6 +1832,7 @@ static void dm_init_md_queue(struct mapped_device *md)
 	blk_queue_make_request(md->queue, dm_request);
 	blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
 	blk_queue_merge_bvec(md->queue, dm_merge_bvec);
+	blk_set_stacking_limits(&md->queue->limits);
 }
 
 /*
@@ -2419,6 +2420,7 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)
 	if (!dm_suspended_md(md))
 		goto out;
 
+	limits = md->queue->limits;
 	r = dm_calculate_queue_limits(table, &limits);
 	if (r) {
 		map = ERR_PTR(r);


* Re: multipath queues build invalid requests when all paths are lost
  2012-08-31 15:04 [PATCH] multipath queues build invalid requests when all paths are lost David Jeffery
@ 2012-09-04 14:58 ` Mike Snitzer
  2012-09-04 16:10   ` Mike Snitzer
  2012-09-12 19:37   ` [PATCH] dm table: do not allow queue limits that will exceed hardware limits Mike Snitzer
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2012-09-04 14:58 UTC (permalink / raw)
  To: David Jeffery; +Cc: dm-devel

On Fri, Aug 31 2012 at 11:04am -0400,
David Jeffery <djeffery@redhat.com> wrote:

> 
> The DM module recalculates queue limits based only on devices which currently
> exist in the table.  This creates a problem in the event all devices are
> temporarily removed such as all fibre channel paths being lost in multipath.
> DM will reset the limits to the maximum permissible, which can then assemble
> requests which exceed the limits of the paths when the paths are restored.  The
> request will fail the blk_rq_check_limits() test when sent to a path with
> lower limits, and will be retried without end by multipath.
> 
> This becomes a much bigger issue after fe86cdcef73ba19a2246a124f0ddbd19b14fb549.
> Previously, most storage had max_sector limits which exceeded the default
> value used.  This meant most setups wouldn't trigger this issue as the default
> values used when there were no paths were still less than the limits of the
> underlying devices.  Now that the default stacking values are no longer
> constrained, any hardware setup can potentially hit this issue.
> 
> This proposed patch alters the DM limit behavior.  With the patch, DM queue
> limits only go one way: more restrictive.  As paths are removed, the queue's
> limits will maintain their current settings.  As paths are added, the queue's
> limits may become more restrictive.

With your proposed patch you could still hit the problem if the
initial multipath table load were to occur when no paths exist, e.g.:
echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs 

(granted, this shouldn't ever happen.. as is evidenced by the fact
that doing so will trigger an existing mpath bug; commit a490a07a67b
"dm mpath: allow table load with no priority groups" clearly wasn't
tested with the initial table load having no priority groups)

But ignoring all that, what I really don't like about your patch is the
limits from a previous table load will be used as the basis for
subsequent table loads.  This could result in incorrect limit stacking.

I don't have an immediate counter-proposal but I'll continue looking and
will let you know.  Thanks for pointing this issue out.

Mike


* Re: multipath queues build invalid requests when all paths are lost
  2012-09-04 14:58 ` Mike Snitzer
@ 2012-09-04 16:10   ` Mike Snitzer
  2012-09-04 16:12     ` Mike Snitzer
  2012-09-12 19:37   ` [PATCH] dm table: do not allow queue limits that will exceed hardware limits Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2012-09-04 16:10 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, David Jeffery

On Tue, Sep 04 2012 at 10:58am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Fri, Aug 31 2012 at 11:04am -0400,
> David Jeffery <djeffery@redhat.com> wrote:
> 
> > 
> > The DM module recalculates queue limits based only on devices which currently
> > exist in the table.  This creates a problem in the event all devices are
> > temporarily removed such as all fibre channel paths being lost in multipath.
> > DM will reset the limits to the maximum permissible, which can then assemble
> > requests which exceed the limits of the paths when the paths are restored.  The
> > request will fail the blk_rq_check_limits() test when sent to a path with
> > lower limits, and will be retried without end by multipath.
> > 
> > This becomes a much bigger issue after fe86cdcef73ba19a2246a124f0ddbd19b14fb549.
> > Previously, most storage had max_sector limits which exceeded the default
> > value used.  This meant most setups wouldn't trigger this issue as the default
> > values used when there were no paths were still less than the limits of the
> > underlying devices.  Now that the default stacking values are no longer
> > constrained, any hardware setup can potentially hit this issue.
> > 
> > This proposed patch alters the DM limit behavior.  With the patch, DM queue
> > limits only go one way: more restrictive.  As paths are removed, the queue's
> > limits will maintain their current settings.  As paths are added, the queue's
> > limits may become more restrictive.
> 
> With your proposed patch you could still hit the problem if the
> initial multipath table load were to occur when no paths exist, e.g.:
> echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs 
> 
> (granted, this shouldn't ever happen.. as is evidenced by the fact
> that doing so will trigger an existing mpath bug; commit a490a07a67b
> "dm mpath: allow table load with no priority groups" clearly wasn't
> tested with the initial table load having no priority groups)

Hi Mikulas, 

It seems your new retry in multipath_ioctl (commit 3599165) is causing
problems for the above dmsetup create.

Here is the stack trace for a hang that resulted as a side-effect of
udev starting blkid for the newly created multipath device:

blkid           D 0000000000000002     0 23936      1 0x00000000
 ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
 ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
 ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
Call Trace:
 [<ffffffff814ce099>] schedule+0x29/0x70
 [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
 [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
 [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
 [<ffffffff8104f840>] msleep+0x20/0x30
 [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
 [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
 [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
 [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
 [<ffffffff811970ac>] block_ioctl+0x3c/0x40
 [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
 [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
 [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
 [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b


* Re: multipath queues build invalid requests when all paths are lost
  2012-09-04 16:10   ` Mike Snitzer
@ 2012-09-04 16:12     ` Mike Snitzer
  2012-09-08 16:50       ` Mikulas Patocka
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2012-09-04 16:12 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, David Jeffery

On Tue, Sep 04 2012 at 12:10pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Sep 04 2012 at 10:58am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Fri, Aug 31 2012 at 11:04am -0400,
> > David Jeffery <djeffery@redhat.com> wrote:
> > 
> > > 
> > > The DM module recalculates queue limits based only on devices which currently
> > > exist in the table.  This creates a problem in the event all devices are
> > > temporarily removed such as all fibre channel paths being lost in multipath.
> > > DM will reset the limits to the maximum permissible, which can then assemble
> > > requests which exceed the limits of the paths when the paths are restored.  The
> > > request will fail the blk_rq_check_limits() test when sent to a path with
> > > lower limits, and will be retried without end by multipath.
> > > 
> > > This becomes a much bigger issue after fe86cdcef73ba19a2246a124f0ddbd19b14fb549.
> > > Previously, most storage had max_sector limits which exceeded the default
> > > value used.  This meant most setups wouldn't trigger this issue as the default
> > > values used when there were no paths were still less than the limits of the
> > > underlying devices.  Now that the default stacking values are no longer
> > > constrained, any hardware setup can potentially hit this issue.
> > > 
> > > This proposed patch alters the DM limit behavior.  With the patch, DM queue
> > > limits only go one way: more restrictive.  As paths are removed, the queue's
> > > limits will maintain their current settings.  As paths are added, the queue's
> > > limits may become more restrictive.
> > 
> > With your proposed patch you could still hit the problem if the
> > initial multipath table load were to occur when no paths exist, e.g.:
> > echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs 
> > 
> > (granted, this shouldn't ever happen.. as is evidenced by the fact
> > that doing so will trigger an existing mpath bug; commit a490a07a67b
> > "dm mpath: allow table load with no priority groups" clearly wasn't
> > tested with the initial table load having no priority groups)
> 
> Hi Mikulas, 
> 
> It seems your new retry in multipath_ioctl (commit 3599165) is causing
> problems for the above dmsetup create.
> 
> Here is the stack trace for a hang that resulted as a side-effect of
> udev starting blkid for the newly created multipath device:
> 
> blkid           D 0000000000000002     0 23936      1 0x00000000
>  ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
>  ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
>  ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
> Call Trace:
>  [<ffffffff814ce099>] schedule+0x29/0x70
>  [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
>  [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
>  [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
>  [<ffffffff8104f840>] msleep+0x20/0x30
>  [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
>  [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
>  [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
>  [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
>  [<ffffffff811970ac>] block_ioctl+0x3c/0x40
>  [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
>  [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
>  [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
>  [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b

FYI, here is the full blkid command line:
/sbin/blkid -o udev -p /dev/dm-8


* Re: multipath queues build invalid requests when all paths are lost
  2012-09-04 16:12     ` Mike Snitzer
@ 2012-09-08 16:50       ` Mikulas Patocka
  2012-09-12 15:37         ` [PATCH] dm mpath: only retry ioctl if queue_if_no_path was configured Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Mikulas Patocka @ 2012-09-08 16:50 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, David Jeffery



On Tue, 4 Sep 2012, Mike Snitzer wrote:

> On Tue, Sep 04 2012 at 12:10pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Tue, Sep 04 2012 at 10:58am -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> > 
> > > On Fri, Aug 31 2012 at 11:04am -0400,
> > > David Jeffery <djeffery@redhat.com> wrote:
> > > 
> > > > 
> > > > The DM module recalculates queue limits based only on devices which currently
> > > > exist in the table.  This creates a problem in the event all devices are
> > > > temporarily removed such as all fibre channel paths being lost in multipath.
> > > > DM will reset the limits to the maximum permissible, which can then assemble
> > > > requests which exceed the limits of the paths when the paths are restored.  The
> > > > request will fail the blk_rq_check_limits() test when sent to a path with
> > > > lower limits, and will be retried without end by multipath.
> > > > 
> > > > This becomes a much bigger issue after fe86cdcef73ba19a2246a124f0ddbd19b14fb549.
> > > > Previously, most storage had max_sector limits which exceeded the default
> > > > value used.  This meant most setups wouldn't trigger this issue as the default
> > > > values used when there were no paths were still less than the limits of the
> > > > underlying devices.  Now that the default stacking values are no longer
> > > > constrained, any hardware setup can potentially hit this issue.
> > > > 
> > > > This proposed patch alters the DM limit behavior.  With the patch, DM queue
> > > > limits only go one way: more restrictive.  As paths are removed, the queue's
> > > > limits will maintain their current settings.  As paths are added, the queue's
> > > > limits may become more restrictive.
> > > 
> > > With your proposed patch you could still hit the problem if the
> > > initial multipath table load were to occur when no paths exist, e.g.:
> > > echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs 
> > > 
> > > (granted, this shouldn't ever happen.. as is evidenced by the fact
> > > that doing so will trigger an existing mpath bug; commit a490a07a67b
> > > "dm mpath: allow table load with no priority groups" clearly wasn't
> > > tested with the initial table load having no priority groups)
> > 
> > Hi Mikulas, 
> > 
> > It seems your new retry in multipath_ioctl (commit 3599165) is causing
> > problems for the above dmsetup create.
> > 
> > Here is the stack trace for a hang that resulted as a side-effect of
> > udev starting blkid for the newly created multipath device:
> > 
> > blkid           D 0000000000000002     0 23936      1 0x00000000
> >  ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
> >  ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
> >  ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
> > Call Trace:
> >  [<ffffffff814ce099>] schedule+0x29/0x70
> >  [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
> >  [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
> >  [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
> >  [<ffffffff8104f840>] msleep+0x20/0x30
> >  [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
> >  [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
> >  [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
> >  [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
> >  [<ffffffff811970ac>] block_ioctl+0x3c/0x40
> >  [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
> >  [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
> >  [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
> >  [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b
> 
> FYI, here is the full blkid command line:
> /sbin/blkid -o udev -p /dev/dm-8

It's hard to say what should be done correctly ... if you create a 
multipath device with "queue_if_no_path" and no active path, it should 
delay all requests until a path becomes available ... and it is doing 
that.

Maybe we could move the waiting loop up to dm_blk_ioctl so that it unlocks 
when someone reloads the target?
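
For reference, the waiting loop in question is the one the blkid stack
trace above is stuck in: multipath_ioctl() picks up -EAGAIN while I/O is
being queued, sleeps briefly and jumps back to its again: label.  Very
roughly (a paraphrase built from the identifiers visible in the trace
and in the diff below, not the verbatim upstream code):

  again:
      if (!m->current_pgpath)
          __choose_pgpath(m, 0);
      ...
      if (m->queue_io)
          r = -EAGAIN;            /* no usable path yet, retry */
      ...
      if (r == -EAGAIN) {
          msleep(10);             /* the msleep() in the blkid trace */
          goto again;             /* never exits if no path will appear */
      }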

BTW, there is also an -EAGAIN in dm_blk_ioctl if dm_suspended_md ... should this
-EAGAIN be removed too or not?

Mikulas


* [PATCH] dm mpath: only retry ioctl if queue_if_no_path was configured
  2012-09-08 16:50       ` Mikulas Patocka
@ 2012-09-12 15:37         ` Mike Snitzer
  2012-09-12 17:01           ` Mikulas Patocka
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2012-09-12 15:37 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, David Jeffery

On Sat, Sep 08 2012 at 12:50pm -0400,
Mikulas Patocka <mpatocka@redhat.com> wrote:
 
> It's hard to say what should be done correctly ... if you create a 
> multipath device with "queue_if_no_path" and no active path, it should 
> delay all requests until a path becomes available ... and it is doing 
> that.
> 
> Maybe we could move the waiting loop up to dm_blk_ioctl so that it unlocks 
> when someone reloads the target?

I think it is much more likely that multipathd will restore a path than that
an entirely new table will be loaded.
 
> BTW, there is also an -EAGAIN in dm_blk_ioctl if dm_suspended_md ... should this
> -EAGAIN be removed too or not?

Don't think it needs to be removed.  Here is a patch that addresses the
issue for me.

---

From: Mike Snitzer <snitzer@redhat.com>
Subject: [PATCH] dm mpath: only retry ioctl if queue_if_no_path was configured

commit 35991652b ("dm mpath: allow ioctls to trigger pg init") should
have checked if queue_if_no_path was configured before queueing IO.

Checking for the queue_if_no_path feature, like is done in map_io(),
allows the following table load to work without blocking in the
multipath_ioctl retry loop:

  echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs

Without this fix the multipath_ioctl will block with the following stack
trace:

  blkid           D 0000000000000002     0 23936      1 0x00000000
   ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
   ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
   ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
  Call Trace:
   [<ffffffff814ce099>] schedule+0x29/0x70
   [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
   [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
   [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
   [<ffffffff8104f840>] msleep+0x20/0x30
   [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
   [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
   [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
   [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
   [<ffffffff811970ac>] block_ioctl+0x3c/0x40
   [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
   [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
   [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
   [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-mpath.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index d8abb90..034233e 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1555,6 +1555,7 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 			   unsigned long arg)
 {
 	struct multipath *m = ti->private;
+	struct pgpath *pgpath;
 	struct block_device *bdev;
 	fmode_t mode;
 	unsigned long flags;
@@ -1570,12 +1571,14 @@ again:
 	if (!m->current_pgpath)
 		__choose_pgpath(m, 0);
 
-	if (m->current_pgpath) {
-		bdev = m->current_pgpath->path.dev->bdev;
-		mode = m->current_pgpath->path.dev->mode;
+	pgpath = m->current_pgpath;
+
+	if (pgpath) {
+		bdev = pgpath->path.dev->bdev;
+		mode = pgpath->path.dev->mode;
 	}
 
-	if (m->queue_io)
+	if ((pgpath && m->queue_io) || (!pgpath && m->queue_if_no_path))
 		r = -EAGAIN;
 	else if (!bdev)
 		r = -EIO;
-- 
1.7.1


* Re: [PATCH] dm mpath: only retry ioctl if queue_if_no_path was configured
  2012-09-12 15:37         ` [PATCH] dm mpath: only retry ioctl if queue_if_no_path was configured Mike Snitzer
@ 2012-09-12 17:01           ` Mikulas Patocka
  0 siblings, 0 replies; 15+ messages in thread
From: Mikulas Patocka @ 2012-09-12 17:01 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, David Jeffery

Acked-by: Mikulas Patocka <mpatocka@redhat.com>

On Wed, 12 Sep 2012, Mike Snitzer wrote:

> On Sat, Sep 08 2012 at 12:50pm -0400,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>  
> > It's hard to say what should be done correctly ... if you create a 
> > multipath device with "queue_if_no_path" and no active path, it should 
> > delay all requests until a path becomes available ... and it is doing 
> > that.
> > 
> > Maybe we could move the waiting loop up to dm_blk_ioctl so that it unlocks 
> > when someone reloads the target?
> 
> I think it is much more likely that multipathd will restore a path than that
> an entirely new table will be loaded.
>  
> > BTW, there is also an -EAGAIN in dm_blk_ioctl if dm_suspended_md ... should this
> > -EAGAIN be removed too or not?
> 
> Don't think it needs to be removed.  Here is a patch that addresses the
> issue for me.
> 
> ---
> 
> From: Mike Snitzer <snitzer@redhat.com>
> Subject: [PATCH] dm mpath: only retry ioctl if queue_if_no_path was configured
> 
> commit 35991652b ("dm mpath: allow ioctls to trigger pg init") should
> have checked if queue_if_no_path was configured before queueing IO.
> 
> Checking for the queue_if_no_path feature, like is done in map_io(),
> allows the following table load to work without blocking in the
> multipath_ioctl retry loop:
> 
>   echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs
> 
> Without this fix the multipath_ioctl will block with the following stack
> trace:
> 
>   blkid           D 0000000000000002     0 23936      1 0x00000000
>    ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
>    ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
>    ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
>   Call Trace:
>    [<ffffffff814ce099>] schedule+0x29/0x70
>    [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
>    [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
>    [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
>    [<ffffffff8104f840>] msleep+0x20/0x30
>    [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
>    [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
>    [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
>    [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
>    [<ffffffff811970ac>] block_ioctl+0x3c/0x40
>    [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
>    [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
>    [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
>    [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b
> 
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>  drivers/md/dm-mpath.c |   11 +++++++----
>  1 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index d8abb90..034233e 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1555,6 +1555,7 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
>  			   unsigned long arg)
>  {
>  	struct multipath *m = ti->private;
> +	struct pgpath *pgpath;
>  	struct block_device *bdev;
>  	fmode_t mode;
>  	unsigned long flags;
> @@ -1570,12 +1571,14 @@ again:
>  	if (!m->current_pgpath)
>  		__choose_pgpath(m, 0);
>  
> -	if (m->current_pgpath) {
> -		bdev = m->current_pgpath->path.dev->bdev;
> -		mode = m->current_pgpath->path.dev->mode;
> +	pgpath = m->current_pgpath;
> +
> +	if (pgpath) {
> +		bdev = pgpath->path.dev->bdev;
> +		mode = pgpath->path.dev->mode;
>  	}
>  
> -	if (m->queue_io)
> +	if ((pgpath && m->queue_io) || (!pgpath && m->queue_if_no_path))
>  		r = -EAGAIN;
>  	else if (!bdev)
>  		r = -EIO;
> -- 
> 1.7.1
> 


* [PATCH] dm table: do not allow queue limits that will exceed hardware limits
  2012-09-04 14:58 ` Mike Snitzer
  2012-09-04 16:10   ` Mike Snitzer
@ 2012-09-12 19:37   ` Mike Snitzer
  2012-09-14 20:41     ` [PATCH v2] " Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2012-09-12 19:37 UTC (permalink / raw)
  To: David Jeffery; +Cc: dm-devel

DM recalculates queue limits based only on devices which currently exist
in the table.  This creates a problem in the event all devices are
temporarily removed such as all fibre channel paths being lost in
multipath.  DM will reset the limits to the maximum permissible, which
can then assemble requests which exceed the limits of the paths when the
paths are restored.  The request will fail the blk_rq_check_limits()
test when sent to a path with lower limits, and will be retried without
end by multipath.

This becomes a much bigger issue after commit fe86cdcef ("block: do not
artificially constrain max_sectors for stacking drivers").  Previously,
most storage had max_sector limits which exceeded the default value
used.  This meant most setups wouldn't trigger this issue as the default
values used when there were no paths were still less than the limits of
the underlying devices.  Now that the default stacking values are no
longer constrained, any hardware setup can potentially hit this issue.

So add a safety net that will establish safe default limits, via
blk_set_default_limits, in the event that a table temporarily doesn't
have any component devices.

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-table.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f6979ad..9b931b4 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1264,6 +1264,15 @@ combine_limits:
 			       (unsigned long long) ti->len);
 	}
 
+	/*
+	 * If a table doesn't have any component devices (e.g. multipath
+	 * loses all paths) don't allow the queue_limits to be left at
+	 * their maximum (as established by blk_set_stacking_limits() so
+	 * limits could be inherited from component devices).
+	 */
+	if (limits->max_sectors == UINT_MAX)
+		blk_set_default_limits(limits);
+
 	return validate_hardware_logical_block_alignment(table, limits);
 }
 
-- 
1.7.1


* [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits
  2012-09-12 19:37   ` [PATCH] dm table: do not allow queue limits that will exceed hardware limits Mike Snitzer
@ 2012-09-14 20:41     ` Mike Snitzer
  2012-09-17 19:44       ` David Jeffery
  2012-09-17 20:24       ` [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits Alasdair G Kergon
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2012-09-14 20:41 UTC (permalink / raw)
  To: David Jeffery; +Cc: dm-devel

DM recalculates queue limits based only on devices which currently exist
in the table.  This creates a problem in the event all devices are
temporarily removed such as all fibre channel paths being lost in
multipath.  DM will reset the limits to the maximum permissible, which
can then assemble requests which exceed the limits of the paths when the
paths are restored.  The request will fail the blk_rq_check_limits()
test when sent to a path with lower limits, and will be retried without
end by multipath.  This became a much bigger issue after commit
fe86cdcef ("block: do not artificially constrain max_sectors for
stacking drivers").

Add a safety net that will establish safe default limits, via
blk_set_default_limits, in the event that a table temporarily doesn't
have any component devices.  The default limits may be too conservative
once paths are reinstated but adding a path to a multipath device
requires a table reload.  Once the table is reloaded proper limits
will be established based on the available component device(s).

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-table.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

v2: adjust the patch header to be more succinct

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f6979ad..9b931b4 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1264,6 +1264,15 @@ combine_limits:
 			       (unsigned long long) ti->len);
 	}
 
+	/*
+	 * If a table doesn't have any component devices (e.g. multipath
+	 * loses all paths) don't allow the queue_limits to be left at
+	 * their maximum (as established by blk_set_stacking_limits() so
+	 * limits could be inherited from component devices).
+	 */
+	if (limits->max_sectors == UINT_MAX)
+		blk_set_default_limits(limits);
+
 	return validate_hardware_logical_block_alignment(table, limits);
 }
 
-- 
1.7.1


* Re: [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits
  2012-09-14 20:41     ` [PATCH v2] " Mike Snitzer
@ 2012-09-17 19:44       ` David Jeffery
  2012-09-17 19:52         ` Alasdair G Kergon
  2012-09-18 11:40         ` Alasdair G Kergon
  2012-09-17 20:24       ` [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits Alasdair G Kergon
  1 sibling, 2 replies; 15+ messages in thread
From: David Jeffery @ 2012-09-17 19:44 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

----- Original Message -----
> v2: adjust the patch header to be more succinct
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index f6979ad..9b931b4 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1264,6 +1264,15 @@ combine_limits:
>  			       (unsigned long long) ti->len);
>  	}
>  
> +	/*
> +	 * If a table doesn't have any component devices (e.g. multipath
> +	 * loses all paths) don't allow the queue_limits to be left at
> +	 * their maximum (as established by blk_set_stacking_limits() so
> +	 * limits could be inherited from component devices).
> +	 */
> +	if (limits->max_sectors == UINT_MAX)
> +		blk_set_default_limits(limits);
> +
>  	return validate_hardware_logical_block_alignment(table, limits);
>  }
>  
> --
> 1.7.1
> 
> 

Instead of setting to defaults, how about maintaining the previous limits?
The initial queue setup sets defaults when a queue is first configured,
and this maintains known, working limits if all paths are temporarily
lost.  For example, I have a test setup with a lower-than-normal maximum
segment limit.  It can fail a test with the previous patch because the
default limits exceed the hardware limits.  But this setup will work if
we leave the queue limits unchanged in the special case of there being no
target devices.
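
A trivial illustration of that failure (the numbers are made up; the
point is only that the block layer's generic defaults can still exceed
a particular device's real hardware limits):

  #include <stdio.h>

  int main(void)
  {
      unsigned int default_max_segments = 128;  /* assumed stock default */
      unsigned int hw_max_segments      = 64;   /* hypothetical constrained HBA */

      if (default_max_segments > hw_max_segments)
          printf("requests may carry up to %u segments, but the path "
                 "only accepts %u\n",
                 default_max_segments, hw_max_segments);
      return 0;
  }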

David Jeffery

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4e09b6f..89ec9ee 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2425,6 +2425,15 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)
 		goto out;
 	}
 
+	/*
+	 * If a table doesn't have any component devices (e.g. multipath
+	 * loses all paths) don't allow the queue_limits to be left at
+	 * their maximum (as established by blk_set_stacking_limits()).
+	 * Instead, maintain the previous limits.
+	 */
+	if (limits.max_sectors == UINT_MAX)
+		limits = md->queue->limits;
+
 	map = __bind(md, table, &limits);
 
 out:


* Re: [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits
  2012-09-17 19:44       ` David Jeffery
@ 2012-09-17 19:52         ` Alasdair G Kergon
  2012-09-18 11:40         ` Alasdair G Kergon
  1 sibling, 0 replies; 15+ messages in thread
From: Alasdair G Kergon @ 2012-09-17 19:52 UTC (permalink / raw)
  To: device-mapper development; +Cc: Mike Snitzer

On Mon, Sep 17, 2012 at 03:44:29PM -0400, David Jeffery wrote:
> Instead of setting to defaults, how about maintaining previous limits?
> The initial queue setup sets defaults when a queue is first configured,
> and this maintains known, working limits if all paths are temporarly
> lost.  For example, I have a test setup with a lower than normal max
> segment list.  It can fail a test with the previous patch as the
> default limits exceed the hardware limits. But this setup will work if
> we leave queue limits unchanged in the special case of there being no
> target devices.
 
Firstly, the problem cannot be fixed completely - so let's make sure the
patch header doesn't claim that, and does explain how different
situations are handled and why.

Secondly, it's an mpath problem, so the solution should be a patch to 
dm-mpath that does NOT change the way any non-mpath dm devices are handled.

Now my question is whether this can be fixed adequately within the existing
interface, or whether userspace needs the ability to control whether
limits are reset or not (either by message or ioctl flag).

Alasdair


* Re: [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits
  2012-09-14 20:41     ` [PATCH v2] " Mike Snitzer
  2012-09-17 19:44       ` David Jeffery
@ 2012-09-17 20:24       ` Alasdair G Kergon
  1 sibling, 0 replies; 15+ messages in thread
From: Alasdair G Kergon @ 2012-09-17 20:24 UTC (permalink / raw)
  To: device-mapper development; +Cc: David Jeffery

On Fri, Sep 14, 2012 at 04:41:33PM -0400, Mike Snitzer wrote:
> Add a safety net that will establish safe default limits, via
> blk_set_default_limits, in the event that a table temporarily doesn't
> have any component devices.  

Under what circumstances is this a problem?

1) When a table part-way down a stack of devices is reloaded while bios
are already flowing through the upper parts.
- A general problem we're ignoring.

2) When queue_if_no_path is set, i/o is queued, a table is reloaded
with more restrictive limits than the i/o already in the system (that
gets pushed back).

  a) There was not previously a table.
     => Use this patch to fix a better default limit?
        - Needs an explanation why the limit is set in dm rather than block.

  b) There was previously a table.
     => Retain the limits from that previous table as a better estimate
        of what the limits might need to be, on the basis that disappeared
        paths might reappear later.
        - Needs a patch writing to do this.
        - Does userspace need an option to reset this?
          - Does it only apply if there is pushback?
	  - Does it apply in general to all target types if and only if there
            is pushback?
          - Or does it need a userspace flag to control whether or not it happens?

  If queue_if_no_path is NOT set, there is no problem so neither case applies?

Alasdair


* Re: [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits
  2012-09-17 19:44       ` David Jeffery
  2012-09-17 19:52         ` Alasdair G Kergon
@ 2012-09-18 11:40         ` Alasdair G Kergon
  2012-09-18 13:02           ` Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Alasdair G Kergon @ 2012-09-18 11:40 UTC (permalink / raw)
  To: David Jeffery; +Cc: dm-devel, Mike Snitzer

On Mon, Sep 17, 2012 at 03:44:29PM -0400, David Jeffery wrote:
> @@ -2425,6 +2425,15 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)

> +	if (limits.max_sectors == UINT_MAX)
  
Specifically, I don't want dm.c to be peering directly into limits.

It just called calculate_queue_limits() above that.
Why is calculate_queue_limits getting the limits wrong?

Alasdair


* Re: [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits
  2012-09-18 11:40         ` Alasdair G Kergon
@ 2012-09-18 13:02           ` Mike Snitzer
  2012-09-21 15:37             ` [PATCH v3] dm: re-use live table's limits if next table has no data devices Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2012-09-18 13:02 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel, David Jeffery

On Tue, Sep 18 2012 at  7:40am -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Mon, Sep 17, 2012 at 03:44:29PM -0400, David Jeffery wrote:
> > @@ -2425,6 +2425,15 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)
> 
> > +	if (limits.max_sectors == UINT_MAX)
>   
> Specifically, I don't want dm.c to be peering directly into limits.

That's fine.  I agree, it is wrong to have DM code be so fragile
relative to potential block layer changes.

> It just called calculate_queue_limits() above that.
> Why is calculate_queue_limits getting the limits wrong?

Because, in the mpath case David reported, there are no longer any
devices in the table.  So .iterate_devices doesn't have anything to work
with, which means the limits established by blk_set_stacking_limits() are
left unchanged (e.g. limits.max_sectors is UINT_MAX).

I'll try to think of a way to get dm_calculate_queue_limits() to detect:
1) that there are no devices in the table
2) that there is a live table that already has established limits
3) if 1 && 2, then use existing limits (should address David's request)


* [PATCH v3] dm: re-use live table's limits if next table has no data devices
  2012-09-18 13:02           ` Mike Snitzer
@ 2012-09-21 15:37             ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2012-09-21 15:37 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel, David Jeffery

DM recalculates queue limits based only on devices which currently exist
in the table.  This creates a problem in the event all devices are
temporarily removed such as all paths being lost in multipath.  DM will
reset the limits to the maximum permissible, which can then assemble
requests which exceed the limits of the paths when the paths are
restored.  The request will fail the blk_rq_check_limits() test when
sent to a path with lower limits, and will be retried without end by
multipath.  This became a much bigger issue after commit fe86cdcef
("block: do not artificially constrain max_sectors for stacking
drivers").

Add a safety net that will re-use the DM device's existing limits in the
event that DM device has a temporary table that doesn't have any
component devices.

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-table.c |   30 ++++++++++++++++++++++++++++++
 drivers/md/dm.c       |    8 +++++++-
 drivers/md/dm.h       |    1 +
 3 files changed, 38 insertions(+), 1 deletion(-)

Index: linux/drivers/md/dm-table.c
===================================================================
--- linux.orig/drivers/md/dm-table.c
+++ linux/drivers/md/dm-table.c
@@ -1212,6 +1212,36 @@ struct dm_target *dm_table_find_target(s
 	return &t->targets[(KEYS_PER_NODE * n) + k];
 }
 
+static int count_device(struct dm_target *ti, struct dm_dev *dev,
+			sector_t start, sector_t len, void *data)
+{
+	unsigned *num_devices_p = data;
+	*num_devices_p += 1;
+	return 0;
+}
+
+bool dm_table_has_no_data_devices(struct dm_table *table)
+{
+	struct dm_target *uninitialized_var(ti);
+	unsigned i = 0, num_devices = 0;
+
+	while (i < dm_table_get_num_targets(table)) {
+		ti = dm_table_get_target(table, i++);
+
+		if (!ti->type->iterate_devices) {
+			/* unfortunately, must assume there are devices */
+			num_devices = 1;
+			break;
+		}
+
+		ti->type->iterate_devices(ti, count_device, &num_devices);
+		if (num_devices)
+			break;
+	}
+
+	return num_devices == 0;
+}
+
 /*
  * Establish the new table's queue_limits and validate them.
  */
Index: linux/drivers/md/dm.c
===================================================================
--- linux.orig/drivers/md/dm.c
+++ linux/drivers/md/dm.c
@@ -2422,7 +2422,7 @@ static void dm_queue_flush(struct mapped
  */
 struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)
 {
-	struct dm_table *map = ERR_PTR(-EINVAL);
+	struct dm_table *live_map, *map = ERR_PTR(-EINVAL);
 	struct queue_limits limits;
 	int r;
 
@@ -2432,6 +2432,12 @@ struct dm_table *dm_swap_table(struct ma
 	if (!dm_suspended_md(md))
 		goto out;
 
+	/* re-use live table's limits if next table has no data devices */
+	live_map = dm_get_live_table(md);
+	if (live_map && dm_table_has_no_data_devices(table))
+		limits = md->queue->limits;
+	dm_table_put(live_map);
+
 	r = dm_calculate_queue_limits(table, &limits);
 	if (r) {
 		map = ERR_PTR(r);
Index: linux/drivers/md/dm.h
===================================================================
--- linux.orig/drivers/md/dm.h
+++ linux/drivers/md/dm.h
@@ -54,6 +54,7 @@ void dm_table_event_callback(struct dm_t
 			     void (*fn)(void *), void *context);
 struct dm_target *dm_table_get_target(struct dm_table *t, unsigned int index);
 struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector);
+bool dm_table_has_no_data_devices(struct dm_table *table);
 int dm_calculate_queue_limits(struct dm_table *table,
 			      struct queue_limits *limits);
 void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,


Thread overview: 15+ messages
2012-08-31 15:04 [PATCH] multipath queues build invalid requests when all paths are lost David Jeffery
2012-09-04 14:58 ` Mike Snitzer
2012-09-04 16:10   ` Mike Snitzer
2012-09-04 16:12     ` Mike Snitzer
2012-09-08 16:50       ` Mikulas Patocka
2012-09-12 15:37         ` [PATCH] dm mpath: only retry ioctl if queue_if_no_path was configured Mike Snitzer
2012-09-12 17:01           ` Mikulas Patocka
2012-09-12 19:37   ` [PATCH] dm table: do not allow queue limits that will exceed hardware limits Mike Snitzer
2012-09-14 20:41     ` [PATCH v2] " Mike Snitzer
2012-09-17 19:44       ` David Jeffery
2012-09-17 19:52         ` Alasdair G Kergon
2012-09-18 11:40         ` Alasdair G Kergon
2012-09-18 13:02           ` Mike Snitzer
2012-09-21 15:37             ` [PATCH v3] dm: re-use live table's limits if next table has no data devices Mike Snitzer
2012-09-17 20:24       ` [PATCH v2] dm table: do not allow queue limits that will exceed hardware limits Alasdair G Kergon
