[PATCH] block: Check that queue is alive in blk_insert_cloned

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
       [not found] <CAG4TOxMjuntpd3Q5jgn+xDa2tgB9tq+jLcDquM_eAdFHXLDrWA@mail.gmail.com>
@ 2011-07-08 23:04 ` Roland Dreier
  2011-07-09  9:05   ` Stefan Richter
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Roland Dreier @ 2011-07-08 23:04 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James Bottomley, Alan Stern, Heiko Carstens, linux-scsi,
	Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy,
	Seshagiri N. Ippili, linux-kernel

From: Roland Dreier <roland@purestorage.com>

This fixes crashes such as the below that I see when the storage
underlying a dm-multipath device is hot-removed.  The problem is that
dm requeues a request to a device whose block queue has already been
cleaned up, and blk_insert_cloned_request() doesn't check if the queue
is alive, but rather goes ahead and tries to queue the request.  This
ends up dereferencing the elevator that was already freed in
blk_cleanup_queue().

    general protection fault: 0000 [#1] SMP 
    CPU 7 
    Modules linked in: kvm_intel kvm serio_raw i7core_edac edac_core ioatdma dca pci_stub ses enclosure usbhid usb_storage qla2xxx mpt2sas ahci hid uas libahci e1000e scsi_transport_fc scsi_transport_sas mlx4_core raid_class scsi_tgt
    RIP: 0010:[<ffffffff81233867>]  [<ffffffff81233867>] elv_drain_elevator+0x27/0x80
    RSP: 0018:ffff880614e9fa48  EFLAGS: 00010096
    RAX: 6b6b6b6b6b6b6b6b RBX: ffff880610bb0000 RCX: 0000000000000000
    RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff880610bb0000
    RBP: ffff880614e9fa58 R08: 0000000000000000 R09: 0000000000000001
    R10: ffff880c0a7dca70 R11: ffff880615622440 R12: ffff880610bb0000
    R13: 0000000000000002 R14: 0000000000000002 R15: ffff880c0db01160
    FS:  00007fe46457a760(0000) GS:ffff880c3fc20000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fe463c86330 CR3: 0000000c0cc0c000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process blkid (pid: 12458, threadinfo ffff880614e9e000, task ffff8806190eae20)
    Stack:
     ffff880c0d9f03c8 ffff880c0d9f03c8 ffff880614e9fa88 ffffff
     ffff880c0d9f03c8 ffff880610bb0000 0000000000000002 ffffc9
     ffff880614e9fab8 ffffffff812366fd ffff880614e9fad8 ffff88
    Call Trace:
     [<ffffffff812339a8>] __elv_add_request+0xe8/0x280
     [<ffffffff812366fd>] add_acct_request+0x3d/0x50
     [<ffffffff81236775>] blk_insert_cloned_request+0x65/0x90
     [<ffffffff813b8d4e>] dm_dispatch_request+0x3e/0x70
     [<ffffffff813ba850>] dm_request_fn+0x160/0x250
     [<ffffffff81236e88>] queue_unplugged+0x48/0xd0
     [<ffffffff8123b03d>] blk_flush_plug_list+0x1ed/0x250
     [<ffffffff810e98f0>] ? sleep_on_page+0x20/0x20
     [<ffffffff814db895>] io_schedule+0x75/0xd0
     [<ffffffff810e98fe>] sleep_on_page_killable+0xe/0x40
     [<ffffffff814dbf9a>] __wait_on_bit_lock+0x5a/0xc0
     [<ffffffff810e9857>] __lock_page_killable+0x67/0x70
     [<ffffffff8106d650>] ? autoremove_wake_function+0x40/0x40
     [<ffffffff810eb5c5>] generic_file_aio_read+0x405/0x720
     [<ffffffff81146962>] do_sync_read+0xd2/0x110
     [<ffffffff8121e393>] ? security_file_permission+0x93/0xb0
     [<ffffffff81146c81>] ? rw_verify_area+0x61/0xf0
     [<ffffffff81147143>] vfs_read+0xc3/0x180
     [<ffffffff81147251>] sys_read+0x51/0x90
     [<ffffffff814e5942>] system_call_fastpath+0x16/0x1b
    Code: c3 0f 1f 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 89 fb 0f 1f 80 00 00 00 00 48 8b 43 18 be 01 00 00 00 48 89 df 48 8b 00 <ff> 50 28 85 c0 75 ea 8b 93 f0 03 00 00 85 d2 74 14 8b 05 fa e1 
    RIP  [<ffffffff81233867>] elv_drain_elevator+0x27/0x80
     RSP <ffff880614e9fa48>
     RSP <ffff880614e9fa48>
    ---[ end trace 5dbd03e7295023d7 ]---

Signed-off-by: Roland Dreier <roland@purestorage.com>
---
 block/blk-core.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d2f8f40..a13a9cc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1706,6 +1706,9 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
 		return -EIO;
 #endif
 
+	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+		return -EIO;
+
 	spin_lock_irqsave(q->queue_lock, flags);
 
 	/*

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier
@ 2011-07-09  9:05   ` Stefan Richter
  2011-07-11 22:40   ` Mike Snitzer
  2011-07-12  2:09   ` Vivek Goyal
  2 siblings, 0 replies; 16+ messages in thread
From: Stefan Richter @ 2011-07-09  9:05 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens,
	linux-scsi, Steffen Maier, Manvanthara B. Puttashankar,
	Tarak Reddy, Seshagiri N. Ippili, linux-kernel

On Jul 08 Roland Dreier wrote:
> This fixes crashes such as the below that I see when the storage
> underlying a dm-multipath device is hot-removed.  The problem is that
> dm requeues a request to a device whose block queue has already been
> cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> is alive, but rather goes ahead and tries to queue the request.  This
> ends up dereferencing the elevator that was already freed in
> blk_cleanup_queue().
[...]
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1706,6 +1706,9 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
>  		return -EIO;
>  #endif
>  
> +	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
> +		return -EIO;
> +
>  	spin_lock_irqsave(q->queue_lock, flags);
>  
>  	/*

Not knowing the calling contexts or having tried to learn about them, I am
wondering:  What prevents the elevator to be freed right after the flag
was checked?
-- 
Stefan Richter
-=====-==-== -=== -=--=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier
  2011-07-09  9:05   ` Stefan Richter
@ 2011-07-11 22:40   ` Mike Snitzer
  2011-07-12  0:52     ` Alan Stern
                       ` (2 more replies)
  2011-07-12  2:09   ` Vivek Goyal
  2 siblings, 3 replies; 16+ messages in thread
From: Mike Snitzer @ 2011-07-11 22:40 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens,
	linux-scsi, Steffen Maier, Manvanthara B. Puttashankar,
	Tarak Reddy, Seshagiri N. Ippili, linux-kernel,
	device-mapper development, Vivek Goyal, Tejun Heo

[cc'ing dm-devel, vivek and tejun]

On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> From: Roland Dreier <roland@purestorage.com>
>
> This fixes crashes such as the below that I see when the storage
> underlying a dm-multipath device is hot-removed.  The problem is that
> dm requeues a request to a device whose block queue has already been
> cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> is alive, but rather goes ahead and tries to queue the request.  This
> ends up dereferencing the elevator that was already freed in
> blk_cleanup_queue().

Your patch looks fine to me:
Acked-by: Mike Snitzer <snitzer@redhat.com>

And I looked at various code paths to arrive at the references DM takes.

A reference is taken on the underlying devices' block_device via
drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev().  open_dev()
also does bd_link_disk_holder(), resulting in the mpath device
becoming a holder of the underlying devices. e.g.:
/sys/block/sda/holders/dm-4

But at no point does DM-mpath get a reference to the underlying
devices' request_queue that gets assigned to clone->q (in
drivers/md/dm-mpath.c:map_io).

Seems we should, though AFAIK it won't help with the issue you've
pointed out (because the hotplugged device's driver already called
blk_cleanup_queue and nuked the elevator).

So I'm not sure getting the request_queue reference actually fixes
anything (maybe my imagination is lacking?).  But getting the
request_queue reference is "the right thing" if we're going to be
setting pointers to it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-11 22:40   ` Mike Snitzer
@ 2011-07-12  0:52     ` Alan Stern
  2011-07-12  1:22       ` Mike Snitzer
  2011-07-12 14:58     ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer
  2011-07-12 17:06     ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal
  2 siblings, 1 reply; 16+ messages in thread
From: Alan Stern @ 2011-07-12  0:52 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Roland Dreier, Jens Axboe, James Bottomley, Heiko Carstens,
	linux-scsi, Steffen Maier, Manvanthara B. Puttashankar,
	Tarak Reddy, Seshagiri N. Ippili, linux-kernel,
	device-mapper development, Vivek Goyal, Tejun Heo

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=UTF-8, Size: 904 bytes --]

On Mon, 11 Jul 2011, Mike Snitzer wrote:

> [cc'ing dm-devel, vivek and tejun]
> 
> On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> > From: Roland Dreier <roland@purestorage.com>
> >
> > This fixes crashes such as the below that I see when the storage
> > underlying a dm-multipath device is hot-removed.  The problem is that
> > dm requeues a request to a device whose block queue has already been
> > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > is alive, but rather goes ahead and tries to queue the request.  This
> > ends up dereferencing the elevator that was already freed in
> > blk_cleanup_queue().
> 
> Your patch looks fine to me:
> Acked-by: Mike Snitzer <snitzer@redhat.com>

There's still the issue that Stefan Richter pointed out: The test for a 
dead queue must be made _after_ acquiring the queue lock, not _before_.

Alan Stern


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12  0:52     ` Alan Stern
@ 2011-07-12  1:22       ` Mike Snitzer
  2011-07-12  1:46         ` Vivek Goyal
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2011-07-12  1:22 UTC (permalink / raw)
  To: Alan Stern
  Cc: Roland Dreier, Jens Axboe, James Bottomley, Heiko Carstens,
	linux-scsi, Steffen Maier, Manvanthara B. Puttashankar,
	Tarak Reddy, Seshagiri N. Ippili, linux-kernel,
	device-mapper development, Vivek Goyal, Tejun Heo, jaxboe

On Mon, Jul 11 2011 at  8:52pm -0400,
Alan Stern <stern@rowland.harvard.edu> wrote:

> On Mon, 11 Jul 2011, Mike Snitzer wrote:
> 
> > [cc'ing dm-devel, vivek and tejun]
> > 
> > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> > > From: Roland Dreier <roland@purestorage.com>
> > >
> > > This fixes crashes such as the below that I see when the storage
> > > underlying a dm-multipath device is hot-removed. ?The problem is that
> > > dm requeues a request to a device whose block queue has already been
> > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > > is alive, but rather goes ahead and tries to queue the request. ?This
> > > ends up dereferencing the elevator that was already freed in
> > > blk_cleanup_queue().
> > 
> > Your patch looks fine to me:
> > Acked-by: Mike Snitzer <snitzer@redhat.com>
> 
> There's still the issue that Stefan Richter pointed out: The test for a 
> dead queue must be made _after_ acquiring the queue lock, not _before_.

Yes, quite important.

Jens, can you tweak the patch or should Roland send a v2?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12  1:22       ` Mike Snitzer
@ 2011-07-12  1:46         ` Vivek Goyal
  2011-07-12 15:24           ` Alan Stern
  0 siblings, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2011-07-12  1:46 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Alan Stern, Roland Dreier, Jens Axboe, James Bottomley,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo, jaxboe

On Mon, Jul 11, 2011 at 09:22:06PM -0400, Mike Snitzer wrote:
> On Mon, Jul 11 2011 at  8:52pm -0400,
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > On Mon, 11 Jul 2011, Mike Snitzer wrote:
> > 
> > > [cc'ing dm-devel, vivek and tejun]
> > > 
> > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> > > > From: Roland Dreier <roland@purestorage.com>
> > > >
> > > > This fixes crashes such as the below that I see when the storage
> > > > underlying a dm-multipath device is hot-removed. ?The problem is that
> > > > dm requeues a request to a device whose block queue has already been
> > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > > > is alive, but rather goes ahead and tries to queue the request. ?This
> > > > ends up dereferencing the elevator that was already freed in
> > > > blk_cleanup_queue().
> > > 
> > > Your patch looks fine to me:
> > > Acked-by: Mike Snitzer <snitzer@redhat.com>
> > 
> > There's still the issue that Stefan Richter pointed out: The test for a 
> > dead queue must be made _after_ acquiring the queue lock, not _before_.
> 
> Yes, quite important.
> 
> Jens, can you tweak the patch or should Roland send a v2?

I do not think that we should do queue dead check after taking a spinlock.
The reason being that there are life time issues of two objects.

- Validity of request queue pointer
- Validity of q->spin_lock pointer

If the dm has taken the reference to the request queue in the beginning
then it can be sure request queue pointer is valid. But spin_lock might
be coming from driver and might be in one of driver allocated structures.
So it might happen that driver has called blk_cleanup_queue() and freed
up structures which contained the spin lock.

So if queue is not dead, we know that q->spin_lock is valid. I think
only race present here is that whole operation is not atomic. First
we check for queue not dead flag and then go on to acquire request
queue lock. So this leaves a small window for race. I think I have
seen other code written in such manner (__generic_make_request()). So
it proably reasonably safe to do here too.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12  1:46         ` Vivek Goyal
@ 2011-07-12 15:24           ` Alan Stern
  2011-07-12 17:10             ` Vivek Goyal
  0 siblings, 1 reply; 16+ messages in thread
From: Alan Stern @ 2011-07-12 15:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Mike Snitzer, Roland Dreier, Jens Axboe, James Bottomley,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo, jaxboe

On Mon, 11 Jul 2011, Vivek Goyal wrote:

> > > There's still the issue that Stefan Richter pointed out: The test for a 
> > > dead queue must be made _after_ acquiring the queue lock, not _before_.
> > 
> > Yes, quite important.
> > 
> > Jens, can you tweak the patch or should Roland send a v2?
> 
> I do not think that we should do queue dead check after taking a spinlock.
> The reason being that there are life time issues of two objects.
> 
> - Validity of request queue pointer
> - Validity of q->spin_lock pointer
> 
> If the dm has taken the reference to the request queue in the beginning
> then it can be sure request queue pointer is valid. But spin_lock might
> be coming from driver and might be in one of driver allocated structures.
> So it might happen that driver has called blk_cleanup_queue() and freed
> up structures which contained the spin lock.

Surely this is a bug in the design of the block layer?

> So if queue is not dead, we know that q->spin_lock is valid. I think
> only race present here is that whole operation is not atomic. First
> we check for queue not dead flag and then go on to acquire request
> queue lock. So this leaves a small window for race. I think I have
> seen other code written in such manner (__generic_make_request()). So
> it proably reasonably safe to do here too.

"Probably reasonably safe" = "unsafe".  The fact that it will usually
work out okay means that when it does fail, it will be very difficult
to track down.

It needs to be fixed _now_, when people are aware of the issue.  Not 
five years from now, when everybody has forgotten about it.

Alan Stern


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12 15:24           ` Alan Stern
@ 2011-07-12 17:10             ` Vivek Goyal
  0 siblings, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2011-07-12 17:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mike Snitzer, Roland Dreier, Jens Axboe, James Bottomley,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo, jaxboe

On Tue, Jul 12, 2011 at 11:24:54AM -0400, Alan Stern wrote:
> On Mon, 11 Jul 2011, Vivek Goyal wrote:
> 
> > > > There's still the issue that Stefan Richter pointed out: The test for a 
> > > > dead queue must be made _after_ acquiring the queue lock, not _before_.
> > > 
> > > Yes, quite important.
> > > 
> > > Jens, can you tweak the patch or should Roland send a v2?
> > 
> > I do not think that we should do queue dead check after taking a spinlock.
> > The reason being that there are life time issues of two objects.
> > 
> > - Validity of request queue pointer
> > - Validity of q->spin_lock pointer
> > 
> > If the dm has taken the reference to the request queue in the beginning
> > then it can be sure request queue pointer is valid. But spin_lock might
> > be coming from driver and might be in one of driver allocated structures.
> > So it might happen that driver has called blk_cleanup_queue() and freed
> > up structures which contained the spin lock.
> 
> Surely this is a bug in the design of the block layer?
> 
> > So if queue is not dead, we know that q->spin_lock is valid. I think
> > only race present here is that whole operation is not atomic. First
> > we check for queue not dead flag and then go on to acquire request
> > queue lock. So this leaves a small window for race. I think I have
> > seen other code written in such manner (__generic_make_request()). So
> > it proably reasonably safe to do here too.
> 
> "Probably reasonably safe" = "unsafe".  The fact that it will usually
> work out okay means that when it does fail, it will be very difficult
> to track down.
> 
> It needs to be fixed _now_, when people are aware of the issue.  Not 
> five years from now, when everybody has forgotten about it.

I agree that fixing would be good. Frankly speaking I don't even have
full understanding of the problem. I know little bit from request queue
side but have no idea about referencing mechanism at device level and
how that is supposed to work with request queue referencing.

So once we understand the problem well, probably we will have an answer
how to go about fixing it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] dm mpath: manage reference on request queue of underlying devices
  2011-07-11 22:40   ` Mike Snitzer
  2011-07-12  0:52     ` Alan Stern
@ 2011-07-12 14:58     ` Mike Snitzer
  2011-07-12 17:06     ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal
  2 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2011-07-12 14:58 UTC (permalink / raw)
  To: Alasdair G. Kergon
  Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens,
	linux-scsi, Steffen Maier, Manvanthara B. Puttashankar,
	Tarak Reddy, Seshagiri N. Ippili, linux-kernel,
	device-mapper development, Vivek Goyal, Tejun Heo, Roland Dreier

When processing a request, DM-mpath's map_io() set the cloned request's
request_queue to the appropriate underlying device's request_queue
without getting a reference on that request_queue.

DM-mpath now maintains a reference count on the underlying devices'
request_queue.  This change wasn't motivated by a specific report but
code, like blk_insert_cloned_request(), will access the request_queue
with the understanding that the request_queue is valid.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm-mpath.c |   33 +++++++++++++++++++++------------
 1 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index eadea8e..629efa3 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -161,11 +161,14 @@ static void free_pgpaths(struct list_head *pgpaths, struct dm_target *ti)
 {
 	struct pgpath *pgpath, *tmp;
 	struct multipath *m = ti->private;
+	struct request_queue *q;
 
 	list_for_each_entry_safe(pgpath, tmp, pgpaths, list) {
 		list_del(&pgpath->list);
+		q = bdev_get_queue(pgpath->path.dev->bdev);
 		if (m->hw_handler_name)
-			scsi_dh_detach(bdev_get_queue(pgpath->path.dev->bdev));
+			scsi_dh_detach(q);
+		blk_put_queue(q);
 		dm_put_device(ti, pgpath->path.dev);
 		free_pgpath(pgpath);
 	}
@@ -552,6 +555,7 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 {
 	int r;
 	struct pgpath *p;
+	struct request_queue *q;
 	struct multipath *m = ti->private;
 
 	/* we need at least a path arg */
@@ -571,9 +575,14 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 		goto bad;
 	}
 
-	if (m->hw_handler_name) {
-		struct request_queue *q = bdev_get_queue(p->path.dev->bdev);
+	q = bdev_get_queue(p->path.dev->bdev);
+	r = blk_get_queue(q);
+	if (r) {
+		ti->error = "error getting request queue";
+		goto bad_queue;
+	}
 
+	if (m->hw_handler_name) {
 		r = scsi_dh_attach(q, m->hw_handler_name);
 		if (r == -EBUSY) {
 			/*
@@ -586,8 +595,7 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 
 		if (r < 0) {
 			ti->error = "error attaching hardware handler";
-			dm_put_device(ti, p->path.dev);
-			goto bad;
+			goto bad_finish;
 		}
 
 		if (m->hw_handler_params) {
@@ -596,21 +604,22 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps
 				ti->error = "unable to set hardware "
 							"handler parameters";
 				scsi_dh_detach(q);
-				dm_put_device(ti, p->path.dev);
-				goto bad;
+				goto bad_finish;
 			}
 		}
 	}
 
 	r = ps->type->add_path(ps, &p->path, as->argc, as->argv, &ti->error);
-	if (r) {
-		dm_put_device(ti, p->path.dev);
-		goto bad;
-	}
+	if (r)
+		goto bad_finish;
 
 	return p;
 
- bad:
+bad_finish:
+	blk_put_queue(q);
+bad_queue:
+	dm_put_device(ti, p->path.dev);
+bad:
 	free_pgpath(p);
 	return ERR_PTR(r);
 }

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-11 22:40   ` Mike Snitzer
  2011-07-12  0:52     ` Alan Stern
  2011-07-12 14:58     ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer
@ 2011-07-12 17:06     ` Vivek Goyal
  2011-07-12 17:41       ` James Bottomley
  2 siblings, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2011-07-12 17:06 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Roland Dreier, Jens Axboe, James Bottomley, Alan Stern,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo

On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote:
> [cc'ing dm-devel, vivek and tejun]
> 
> On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> > From: Roland Dreier <roland@purestorage.com>
> >
> > This fixes crashes such as the below that I see when the storage
> > underlying a dm-multipath device is hot-removed.  The problem is that
> > dm requeues a request to a device whose block queue has already been
> > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > is alive, but rather goes ahead and tries to queue the request.  This
> > ends up dereferencing the elevator that was already freed in
> > blk_cleanup_queue().
> 
> Your patch looks fine to me:
> Acked-by: Mike Snitzer <snitzer@redhat.com>
> 
> And I looked at various code paths to arrive at the references DM takes.
> 
> A reference is taken on the underlying devices' block_device via
> drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev().  open_dev()
> also does bd_link_disk_holder(), resulting in the mpath device
> becoming a holder of the underlying devices. e.g.:
> /sys/block/sda/holders/dm-4
> 
> But at no point does DM-mpath get a reference to the underlying
> devices' request_queue that gets assigned to clone->q (in
> drivers/md/dm-mpath.c:map_io).
> 
> Seems we should, though AFAIK it won't help with the issue you've
> pointed out (because the hotplugged device's driver already called
> blk_cleanup_queue and nuked the elevator).

[Thinking loud]

Could it be a driver specific issue that it cleaned up the request
queue too early?

Is there any notion of device reference which higher layers take and
that should make sure request queue is intact till somebody is holding
device reference.

If yes, what't the connection between device reference and request
queue reference. IOW, why request queue reference is needed and why
device reference is not sufficient. (Because there is not necessarily
one to one mapping between request queue and device?)

I seem to just have lots of question about devices and referencing.
Hopefully somebody with more knowledge in this area be able to
shed some light on it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12 17:06     ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal
@ 2011-07-12 17:41       ` James Bottomley
  2011-07-12 18:02         ` Vivek Goyal
  0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2011-07-12 17:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo

On Tue, 2011-07-12 at 13:06 -0400, Vivek Goyal wrote:
> On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote:
> > [cc'ing dm-devel, vivek and tejun]
> > 
> > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> > > From: Roland Dreier <roland@purestorage.com>
> > >
> > > This fixes crashes such as the below that I see when the storage
> > > underlying a dm-multipath device is hot-removed.  The problem is that
> > > dm requeues a request to a device whose block queue has already been
> > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > > is alive, but rather goes ahead and tries to queue the request.  This
> > > ends up dereferencing the elevator that was already freed in
> > > blk_cleanup_queue().
> > 
> > Your patch looks fine to me:
> > Acked-by: Mike Snitzer <snitzer@redhat.com>
> > 
> > And I looked at various code paths to arrive at the references DM takes.
> > 
> > A reference is taken on the underlying devices' block_device via
> > drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev().  open_dev()
> > also does bd_link_disk_holder(), resulting in the mpath device
> > becoming a holder of the underlying devices. e.g.:
> > /sys/block/sda/holders/dm-4
> > 
> > But at no point does DM-mpath get a reference to the underlying
> > devices' request_queue that gets assigned to clone->q (in
> > drivers/md/dm-mpath.c:map_io).
> > 
> > Seems we should, though AFAIK it won't help with the issue you've
> > pointed out (because the hotplugged device's driver already called
> > blk_cleanup_queue and nuked the elevator).
> 
> [Thinking loud]
> 
> Could it be a driver specific issue that it cleaned up the request
> queue too early?

One could glibly answer yes to this.  However, the fact is that it's
currently SCSI which manages the queue, so SCSI cleans it up.  Now, the
only real thing dm is interested in is the queue itself, hence the need
to take a reference to the queue.  However, queue references don't pin
SCSI devices, so you can hold a queue reference all you like and SCSI
will still clean up the queue.

I think a better question is what should cleaning up the queue do?  SCSI
uses it to indicate that we're no longer processing requests, which
happens when the device goes into a DEL state, but queue cleanup tears
down the elevators and really makes the request queue non functional.
In this case, holding a reference isn't particularly helpful.

I'm starting to wonder if there's actually any value to
blk_cleanup_queue() and whether its functionality wouldn't be better
assumed by the queue release function on last put.

> Is there any notion of device reference which higher layers take and
> that should make sure request queue is intact till somebody is holding
> device reference.
> 
> If yes, what't the connection between device reference and request
> queue reference. IOW, why request queue reference is needed and why
> device reference is not sufficient. (Because there is not necessarily
> one to one mapping between request queue and device?)

For the model, these need to be independent but related.  Trying to
subordinate one to the other is going to lead to layering problems.

James


> I seem to just have lots of question about devices and referencing.
> Hopefully somebody with more knowledge in this area be able to
> shed some light on it.
> 
> Thanks
> Vivek



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12 17:41       ` James Bottomley
@ 2011-07-12 18:02         ` Vivek Goyal
  2011-07-12 18:28           ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2011-07-12 18:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo

On Tue, Jul 12, 2011 at 12:41:30PM -0500, James Bottomley wrote:
> On Tue, 2011-07-12 at 13:06 -0400, Vivek Goyal wrote:
> > On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote:
> > > [cc'ing dm-devel, vivek and tejun]
> > > 
> > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> > > > From: Roland Dreier <roland@purestorage.com>
> > > >
> > > > This fixes crashes such as the below that I see when the storage
> > > > underlying a dm-multipath device is hot-removed.  The problem is that
> > > > dm requeues a request to a device whose block queue has already been
> > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > > > is alive, but rather goes ahead and tries to queue the request.  This
> > > > ends up dereferencing the elevator that was already freed in
> > > > blk_cleanup_queue().
> > > 
> > > Your patch looks fine to me:
> > > Acked-by: Mike Snitzer <snitzer@redhat.com>
> > > 
> > > And I looked at various code paths to arrive at the references DM takes.
> > > 
> > > A reference is taken on the underlying devices' block_device via
> > > drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev().  open_dev()
> > > also does bd_link_disk_holder(), resulting in the mpath device
> > > becoming a holder of the underlying devices. e.g.:
> > > /sys/block/sda/holders/dm-4
> > > 
> > > But at no point does DM-mpath get a reference to the underlying
> > > devices' request_queue that gets assigned to clone->q (in
> > > drivers/md/dm-mpath.c:map_io).
> > > 
> > > Seems we should, though AFAIK it won't help with the issue you've
> > > pointed out (because the hotplugged device's driver already called
> > > blk_cleanup_queue and nuked the elevator).
> > 
> > [Thinking loud]
> > 
> > Could it be a driver specific issue that it cleaned up the request
> > queue too early?
> 
> One could glibly answer yes to this.  However, the fact is that it's
> currently SCSI which manages the queue, so SCSI cleans it up.  Now, the
> only real thing dm is interested in is the queue itself, hence the need
> to take a reference to the queue.  However, queue references don't pin
> SCSI devices, so you can hold a queue reference all you like and SCSI
> will still clean up the queue.
> 
> I think a better question is what should cleaning up the queue do?  SCSI
> uses it to indicate that we're no longer processing requests, which
> happens when the device goes into a DEL state, but queue cleanup tears
> down the elevators and really makes the request queue non functional.
> In this case, holding a reference isn't particularly helpful.
> 
> I'm starting to wonder if there's actually any value to
> blk_cleanup_queue() and whether its functionality wouldn't be better
> assumed by the queue release function on last put.

I think one problem point is q->queue_lock. If driver drops its reference
on queue and cleans up its data structures, then it will free up memory
associated with q->queue_lock too. (If driver provided its own queue
lock). In that case anything which is dependent on queue lock, needs
to be freed up on blk_cleanup_queue().

If we can make sure that request queue reference will keep the spin lock
alive, then i guess all cleanup part might be able to go in release
queue function.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12 18:02         ` Vivek Goyal
@ 2011-07-12 18:28           ` James Bottomley
  2011-07-12 18:54             ` Vivek Goyal
  2011-07-12 21:02             ` Alan Stern
  0 siblings, 2 replies; 16+ messages in thread
From: James Bottomley @ 2011-07-12 18:28 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo

On Tue, 2011-07-12 at 14:02 -0400, Vivek Goyal wrote:
> On Tue, Jul 12, 2011 at 12:41:30PM -0500, James Bottomley wrote:
> > On Tue, 2011-07-12 at 13:06 -0400, Vivek Goyal wrote:
> > > On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote:
> > > > [cc'ing dm-devel, vivek and tejun]
> > > > 
> > > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote:
> > > > > From: Roland Dreier <roland@purestorage.com>
> > > > >
> > > > > This fixes crashes such as the below that I see when the storage
> > > > > underlying a dm-multipath device is hot-removed.  The problem is that
> > > > > dm requeues a request to a device whose block queue has already been
> > > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> > > > > is alive, but rather goes ahead and tries to queue the request.  This
> > > > > ends up dereferencing the elevator that was already freed in
> > > > > blk_cleanup_queue().
> > > > 
> > > > Your patch looks fine to me:
> > > > Acked-by: Mike Snitzer <snitzer@redhat.com>
> > > > 
> > > > And I looked at various code paths to arrive at the references DM takes.
> > > > 
> > > > A reference is taken on the underlying devices' block_device via
> > > > drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev().  open_dev()
> > > > also does bd_link_disk_holder(), resulting in the mpath device
> > > > becoming a holder of the underlying devices. e.g.:
> > > > /sys/block/sda/holders/dm-4
> > > > 
> > > > But at no point does DM-mpath get a reference to the underlying
> > > > devices' request_queue that gets assigned to clone->q (in
> > > > drivers/md/dm-mpath.c:map_io).
> > > > 
> > > > Seems we should, though AFAIK it won't help with the issue you've
> > > > pointed out (because the hotplugged device's driver already called
> > > > blk_cleanup_queue and nuked the elevator).
> > > 
> > > [Thinking loud]
> > > 
> > > Could it be a driver specific issue that it cleaned up the request
> > > queue too early?
> > 
> > One could glibly answer yes to this.  However, the fact is that it's
> > currently SCSI which manages the queue, so SCSI cleans it up.  Now, the
> > only real thing dm is interested in is the queue itself, hence the need
> > to take a reference to the queue.  However, queue references don't pin
> > SCSI devices, so you can hold a queue reference all you like and SCSI
> > will still clean up the queue.
> > 
> > I think a better question is what should cleaning up the queue do?  SCSI
> > uses it to indicate that we're no longer processing requests, which
> > happens when the device goes into a DEL state, but queue cleanup tears
> > down the elevators and really makes the request queue non functional.
> > In this case, holding a reference isn't particularly helpful.
> > 
> > I'm starting to wonder if there's actually any value to
> > blk_cleanup_queue() and whether its functionality wouldn't be better
> > assumed by the queue release function on last put.
> 
> I think one problem point is q->queue_lock. If driver drops its reference
> on queue and cleans up its data structures, then it will free up memory
> associated with q->queue_lock too. (If driver provided its own queue
> lock). In that case anything which is dependent on queue lock, needs
> to be freed up on blk_cleanup_queue().

I don't quite follow.  blk_cleanup_queue() doesn't free anything (well,
except the elevator).  Final put will free the queue structure which
contains the lock, but if it's really a final put, you have no other
possible references, so no-one is using the lock ... well, assuming
there isn't a programming error, of course ...

> If we can make sure that request queue reference will keep the spin lock
> alive, then i guess all cleanup part might be able to go in release
> queue function.

As I said: cleanup doesn't free the structure containing the lock,
release does, so that piece wouldn't be altered by putting
blk_cleanup_queue() elsewhere.

James



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12 18:28           ` James Bottomley
@ 2011-07-12 18:54             ` Vivek Goyal
  2011-07-12 21:02             ` Alan Stern
  1 sibling, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2011-07-12 18:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo

On Tue, Jul 12, 2011 at 01:28:18PM -0500, James Bottomley wrote:

[..]
> > > I'm starting to wonder if there's actually any value to
> > > blk_cleanup_queue() and whether its functionality wouldn't be better
> > > assumed by the queue release function on last put.
> > 
> > I think one problem point is q->queue_lock. If driver drops its reference
> > on queue and cleans up its data structures, then it will free up memory
> > associated with q->queue_lock too. (If driver provided its own queue
> > lock). In that case anything which is dependent on queue lock, needs
> > to be freed up on blk_cleanup_queue().
> 
> I don't quite follow.  blk_cleanup_queue() doesn't free anything (well,
> except the elevator).  Final put will free the queue structure which
> contains the lock, but if it's really a final put, you have no other
> possible references, so no-one is using the lock ... well, assuming
> there isn't a programming error, of course ...
> 
> > If we can make sure that request queue reference will keep the spin lock
> > alive, then i guess all cleanup part might be able to go in release
> > queue function.
> 
> As I said: cleanup doesn't free the structure containing the lock,
> release does, so that piece wouldn't be altered by putting
> blk_cleanup_queue() elsewhere.

I thought a driver could either rely on spin lock provided by request
queue or override that by providing its own spinlock.

blk_init_allocated_queue_node()
        /* Override internal queue lock with supplied lock pointer */
        if (lock)
                q->queue_lock           = lock;

So if driver calls blk_cleanup_queue() and drops its reference on queue, then
it should be free to release any memory it has allocated for spinlock.
So though queue is around there are no gurantees that q->queue_lock is
still around. That memory might have been freed by driver and reused.

I see many drivers are providing their own locks. Some samples from
drivers/block.

/virtio_blk.c:	q = vblk->disk->queue = blk_init_queue(do_virtblk_request,
&vblk->lock);
./xd.c:	xd_queue = blk_init_queue(do_xd_request, &xd_lock);
./cpqarray.c:	q = blk_init_queue(do_ida_request, &hba[i]->lock);
./sx8.c:	q = blk_init_queue(carm_rq_fn, &host->lock);
./sx8.c:	q = blk_init_queue(carm_oob_rq_fn, &host->lock);
./floppy.c:	disks[dr]->queue = blk_init_queue(do_fd_request, &floppy_lock);
./viodasd.c:	q = blk_init_queue(do_viodasd_request, &d->q_lock);
./cciss.c:	disk->queue = blk_init_queue(do_cciss_request, &h->lock);
./hd.c:	hd_queue = blk_init_queue(do_hd_request, &hd_lock);
./DAC960.c:  	RequestQueue = blk_init_queue(DAC960_RequestFunction,&Controller->queue_lock);
./z2ram.c:    z2_queue = blk_init_queue(do_z2_request, &z2ram_lock);
./amiflop.c:	disk->queue = blk_init_queue(do_fd_request, &amiflop_lock);
./xen-blkfront.c:	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
./paride/pd.c:	pd_queue = blk_init_queue(do_pd_request, &pd_lock);
./paride/pf.c:	pf_queue = blk_init_queue(do_pf_request, &pf_spin_lock);
./paride/pcd.c:	pcd_queue = blk_init_queue(do_pcd_request, &pcd_lock);
./mg_disk.c:	host->breq = blk_init_queue(mg_request_poll, &host->lock);
./mg_disk.c:	host->breq = blk_init_queue(mg_request, &host->lock);
./rbd.c:	q = blk_init_queue(rbd_rq_fn, &rbd_dev->lock);
./sunvdc.c:	q = blk_init_queue(do_vdc_request, &port->vio.lock);
./swim.c:	swd->queue = blk_init_queue(do_fd_request, &swd->lock);
./xsysace.c:	ace->queue = blk_init_queue(ace_request, &ace->lock);
./osdblk.c:	q = blk_init_queue(osdblk_rq_fn, &osdev->lock);
./ps3disk.c:	queue = blk_init_queue(ps3disk_request, &priv->lock);
./swim3.c:	swim3_queue = blk_init_queue(do_fd_request, &swim3_lock);
./ub.c:	if ((q = blk_init_queue(ub_request_fn, sc->lock)) == NULL)
./nbd.c:	disk->queue = blk_init_queue(do_nbd_request, &nbd_lock);

Thanks
Vivek

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-12 18:28           ` James Bottomley
  2011-07-12 18:54             ` Vivek Goyal
@ 2011-07-12 21:02             ` Alan Stern
  1 sibling, 0 replies; 16+ messages in thread
From: Alan Stern @ 2011-07-12 21:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: Vivek Goyal, Mike Snitzer, Roland Dreier, Jens Axboe,
	Heiko Carstens, linux-scsi, Steffen Maier,
	Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili,
	linux-kernel, device-mapper development, Tejun Heo

On Tue, 12 Jul 2011, James Bottomley wrote:

> > I think one problem point is q->queue_lock. If driver drops its reference
> > on queue and cleans up its data structures, then it will free up memory
> > associated with q->queue_lock too. (If driver provided its own queue
> > lock). In that case anything which is dependent on queue lock, needs
> > to be freed up on blk_cleanup_queue().
> 
> I don't quite follow.  blk_cleanup_queue() doesn't free anything (well,
> except the elevator).  Final put will free the queue structure which
> contains the lock, but if it's really a final put, you have no other
> possible references, so no-one is using the lock ... well, assuming
> there isn't a programming error, of course ...
> 
> > If we can make sure that request queue reference will keep the spin lock
> > alive, then i guess all cleanup part might be able to go in release
> > queue function.
> 
> As I said: cleanup doesn't free the structure containing the lock,
> release does, so that piece wouldn't be altered by putting
> blk_cleanup_queue() elsewhere.

It's worth taking a closer look at what happened here.  Originally
request_queue->queuedata was set to NULL and blk_cleanup_queue() was
called from scsi_device_dev_release_usercontext() (by way of
scsi_free_queue()).  This caused a problem because there was a window
in which the last reference to the scsi_device had been dropped but the
release routine hadn't yet run, so the queue was still marked as active
(i.e., queuedata was still non-NULL).

Commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b ([SCSI] put stricter 
guards on queue dead checks) was written to fix this problem.  However 
it moved _both_ lines from the release routine into 
__scsi_remove_device().  There was no need for this; all that was 
needed was to make sure queuedata was NULL before the device reference 
was dropped.

And in fact, moving the call to scsi_free_queue() has opened up another
window for bugs.  Now it's possible for requests to get on the queue
(submitted by drivers from another layer that still hold a reference to
the scsi_device) even after the queue's elevator has been freed.  Yes,
the block layer is supposed to prevent this from happening, but there
are multiple paths and they don't all have the requisite checks.

And as Vivek points out, it's questionable whether they _can_ make the 
proper checks.  Checking requires the queue's lock, but the client may 
have deallocated the lock at the time the queue was cleaned up.  Maybe 
SCSI doesn't do this, but other clients of the block layer might.  The 
block layer can't make any assumptions.  The unavoidable conclusion is 
that the clients must be responsible for insuring that no requests are 
added to a queue after its lock has been released.

To help accomplish this in SCSI, I'm suggesting that the call to
scsi_free_queue() be moved back to where it used to be, in
scsi_device_dev_release_usercontext().  Then it won't be possible for
any wayward requests to be added to the queue after it is cleaned up,
because there won't be any outstanding references to the scsi_device.

Alan Stern

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request()
  2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier
  2011-07-09  9:05   ` Stefan Richter
  2011-07-11 22:40   ` Mike Snitzer
@ 2011-07-12  2:09   ` Vivek Goyal
  2 siblings, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2011-07-12  2:09 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens,
	linux-scsi, Steffen Maier, Manvanthara B. Puttashankar,
	Tarak Reddy, Seshagiri N. Ippili, linux-kernel

On Fri, Jul 08, 2011 at 04:04:30PM -0700, Roland Dreier wrote:
> From: Roland Dreier <roland@purestorage.com>
> 
> This fixes crashes such as the below that I see when the storage
> underlying a dm-multipath device is hot-removed.  The problem is that
> dm requeues a request to a device whose block queue has already been
> cleaned up, and blk_insert_cloned_request() doesn't check if the queue
> is alive, but rather goes ahead and tries to queue the request.  This
> ends up dereferencing the elevator that was already freed in
> blk_cleanup_queue().
> 
>     general protection fault: 0000 [#1] SMP 
>     CPU 7 
>     Modules linked in: kvm_intel kvm serio_raw i7core_edac edac_core ioatdma dca pci_stub ses enclosure usbhid usb_storage qla2xxx mpt2sas ahci hid uas libahci e1000e scsi_transport_fc scsi_transport_sas mlx4_core raid_class scsi_tgt
>     RIP: 0010:[<ffffffff81233867>]  [<ffffffff81233867>] elv_drain_elevator+0x27/0x80
>     RSP: 0018:ffff880614e9fa48  EFLAGS: 00010096
>     RAX: 6b6b6b6b6b6b6b6b RBX: ffff880610bb0000 RCX: 0000000000000000
>     RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff880610bb0000
>     RBP: ffff880614e9fa58 R08: 0000000000000000 R09: 0000000000000001
>     R10: ffff880c0a7dca70 R11: ffff880615622440 R12: ffff880610bb0000
>     R13: 0000000000000002 R14: 0000000000000002 R15: ffff880c0db01160
>     FS:  00007fe46457a760(0000) GS:ffff880c3fc20000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 00007fe463c86330 CR3: 0000000c0cc0c000 CR4: 00000000000006e0
>     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>     DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>     Process blkid (pid: 12458, threadinfo ffff880614e9e000, task ffff8806190eae20)
>     Stack:
>      ffff880c0d9f03c8 ffff880c0d9f03c8 ffff880614e9fa88 ffffff
>      ffff880c0d9f03c8 ffff880610bb0000 0000000000000002 ffffc9
>      ffff880614e9fab8 ffffffff812366fd ffff880614e9fad8 ffff88
>     Call Trace:
>      [<ffffffff812339a8>] __elv_add_request+0xe8/0x280
>      [<ffffffff812366fd>] add_acct_request+0x3d/0x50
>      [<ffffffff81236775>] blk_insert_cloned_request+0x65/0x90
>      [<ffffffff813b8d4e>] dm_dispatch_request+0x3e/0x70
>      [<ffffffff813ba850>] dm_request_fn+0x160/0x250
>      [<ffffffff81236e88>] queue_unplugged+0x48/0xd0
>      [<ffffffff8123b03d>] blk_flush_plug_list+0x1ed/0x250

Roland,

IIUC, this crash can happen even without dm-multipath being in picture.
Because it looks like it can happen that we have put requests on 
plug list and then device is hot removed which ends up cleaning elevator
and then blk_flush_plug_list() is called. 

Can you please try it without multipath target.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-07-12 21:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAG4TOxMjuntpd3Q5jgn+xDa2tgB9tq+jLcDquM_eAdFHXLDrWA@mail.gmail.com>
2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier
2011-07-09  9:05   ` Stefan Richter
2011-07-11 22:40   ` Mike Snitzer
2011-07-12  0:52     ` Alan Stern
2011-07-12  1:22       ` Mike Snitzer
2011-07-12  1:46         ` Vivek Goyal
2011-07-12 15:24           ` Alan Stern
2011-07-12 17:10             ` Vivek Goyal
2011-07-12 14:58     ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer
2011-07-12 17:06     ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal
2011-07-12 17:41       ` James Bottomley
2011-07-12 18:02         ` Vivek Goyal
2011-07-12 18:28           ` James Bottomley
2011-07-12 18:54             ` Vivek Goyal
2011-07-12 21:02             ` Alan Stern
2011-07-12  2:09   ` Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox