* [PATCH] block: Check that queue is alive in blk_insert_cloned_request() [not found] <CAG4TOxMjuntpd3Q5jgn+xDa2tgB9tq+jLcDquM_eAdFHXLDrWA@mail.gmail.com> @ 2011-07-08 23:04 ` Roland Dreier 2011-07-09 9:05 ` Stefan Richter ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Roland Dreier @ 2011-07-08 23:04 UTC (permalink / raw) To: Jens Axboe Cc: James Bottomley, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel From: Roland Dreier <roland@purestorage.com> This fixes crashes such as the below that I see when the storage underlying a dm-multipath device is hot-removed. The problem is that dm requeues a request to a device whose block queue has already been cleaned up, and blk_insert_cloned_request() doesn't check if the queue is alive, but rather goes ahead and tries to queue the request. This ends up dereferencing the elevator that was already freed in blk_cleanup_queue(). general protection fault: 0000 [#1] SMP CPU 7 Modules linked in: kvm_intel kvm serio_raw i7core_edac edac_core ioatdma dca pci_stub ses enclosure usbhid usb_storage qla2xxx mpt2sas ahci hid uas libahci e1000e scsi_transport_fc scsi_transport_sas mlx4_core raid_class scsi_tgt RIP: 0010:[<ffffffff81233867>] [<ffffffff81233867>] elv_drain_elevator+0x27/0x80 RSP: 0018:ffff880614e9fa48 EFLAGS: 00010096 RAX: 6b6b6b6b6b6b6b6b RBX: ffff880610bb0000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff880610bb0000 RBP: ffff880614e9fa58 R08: 0000000000000000 R09: 0000000000000001 R10: ffff880c0a7dca70 R11: ffff880615622440 R12: ffff880610bb0000 R13: 0000000000000002 R14: 0000000000000002 R15: ffff880c0db01160 FS: 00007fe46457a760(0000) GS:ffff880c3fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe463c86330 CR3: 0000000c0cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 
DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process blkid (pid: 12458, threadinfo ffff880614e9e000, task ffff8806190eae20) Stack: ffff880c0d9f03c8 ffff880c0d9f03c8 ffff880614e9fa88 ffffff ffff880c0d9f03c8 ffff880610bb0000 0000000000000002 ffffc9 ffff880614e9fab8 ffffffff812366fd ffff880614e9fad8 ffff88 Call Trace: [<ffffffff812339a8>] __elv_add_request+0xe8/0x280 [<ffffffff812366fd>] add_acct_request+0x3d/0x50 [<ffffffff81236775>] blk_insert_cloned_request+0x65/0x90 [<ffffffff813b8d4e>] dm_dispatch_request+0x3e/0x70 [<ffffffff813ba850>] dm_request_fn+0x160/0x250 [<ffffffff81236e88>] queue_unplugged+0x48/0xd0 [<ffffffff8123b03d>] blk_flush_plug_list+0x1ed/0x250 [<ffffffff810e98f0>] ? sleep_on_page+0x20/0x20 [<ffffffff814db895>] io_schedule+0x75/0xd0 [<ffffffff810e98fe>] sleep_on_page_killable+0xe/0x40 [<ffffffff814dbf9a>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff810e9857>] __lock_page_killable+0x67/0x70 [<ffffffff8106d650>] ? autoremove_wake_function+0x40/0x40 [<ffffffff810eb5c5>] generic_file_aio_read+0x405/0x720 [<ffffffff81146962>] do_sync_read+0xd2/0x110 [<ffffffff8121e393>] ? security_file_permission+0x93/0xb0 [<ffffffff81146c81>] ? 
rw_verify_area+0x61/0xf0 [<ffffffff81147143>] vfs_read+0xc3/0x180 [<ffffffff81147251>] sys_read+0x51/0x90 [<ffffffff814e5942>] system_call_fastpath+0x16/0x1b Code: c3 0f 1f 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 89 fb 0f 1f 80 00 00 00 00 48 8b 43 18 be 01 00 00 00 48 89 df 48 8b 00 <ff> 50 28 85 c0 75 ea 8b 93 f0 03 00 00 85 d2 74 14 8b 05 fa e1 RIP [<ffffffff81233867>] elv_drain_elevator+0x27/0x80 RSP <ffff880614e9fa48> RSP <ffff880614e9fa48> ---[ end trace 5dbd03e7295023d7 ]--- Signed-off-by: Roland Dreier <roland@purestorage.com> --- block/blk-core.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index d2f8f40..a13a9cc 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1706,6 +1706,9 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq) return -EIO; #endif + if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) + return -EIO; + spin_lock_irqsave(q->queue_lock, flags); /* ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier @ 2011-07-09 9:05 ` Stefan Richter 2011-07-11 22:40 ` Mike Snitzer 2011-07-12 2:09 ` Vivek Goyal 2 siblings, 0 replies; 16+ messages in thread From: Stefan Richter @ 2011-07-09 9:05 UTC (permalink / raw) To: Roland Dreier Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel On Jul 08 Roland Dreier wrote: > This fixes crashes such as the below that I see when the storage > underlying a dm-multipath device is hot-removed. The problem is that > dm requeues a request to a device whose block queue has already been > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > is alive, but rather goes ahead and tries to queue the request. This > ends up dereferencing the elevator that was already freed in > blk_cleanup_queue(). [...] > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -1706,6 +1706,9 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq) > return -EIO; > #endif > > + if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) > + return -EIO; > + > spin_lock_irqsave(q->queue_lock, flags); > > /* Not knowing the calling contexts or having tried to learn about them, I am wondering: What prevents the elevator from being freed right after the flag was checked? -- Stefan Richter -=====-==-== -=== -=--= http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier 2011-07-09 9:05 ` Stefan Richter @ 2011-07-11 22:40 ` Mike Snitzer 2011-07-12 0:52 ` Alan Stern ` (2 more replies) 2011-07-12 2:09 ` Vivek Goyal 2 siblings, 3 replies; 16+ messages in thread From: Mike Snitzer @ 2011-07-11 22:40 UTC (permalink / raw) To: Roland Dreier Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Vivek Goyal, Tejun Heo [cc'ing dm-devel, vivek and tejun] On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > From: Roland Dreier <roland@purestorage.com> > > This fixes crashes such as the below that I see when the storage > underlying a dm-multipath device is hot-removed. The problem is that > dm requeues a request to a device whose block queue has already been > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > is alive, but rather goes ahead and tries to queue the request. This > ends up dereferencing the elevator that was already freed in > blk_cleanup_queue(). Your patch looks fine to me: Acked-by: Mike Snitzer <snitzer@redhat.com> And I looked at various code paths to arrive at the references DM takes. A reference is taken on the underlying devices' block_device via drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev(). open_dev() also does bd_link_disk_holder(), resulting in the mpath device becoming a holder of the underlying devices. e.g.: /sys/block/sda/holders/dm-4 But at no point does DM-mpath get a reference to the underlying devices' request_queue that gets assigned to clone->q (in drivers/md/dm-mpath.c:map_io). 
Seems we should, though AFAIK it won't help with the issue you've pointed out (because the hotplugged device's driver already called blk_cleanup_queue and nuked the elevator). So I'm not sure getting the request_queue reference actually fixes anything (maybe my imagination is lacking?). But getting the request_queue reference is "the right thing" if we're going to be setting pointers to it. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-11 22:40 ` Mike Snitzer @ 2011-07-12 0:52 ` Alan Stern 2011-07-12 1:22 ` Mike Snitzer 2011-07-12 14:58 ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer 2011-07-12 17:06 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal 2 siblings, 1 reply; 16+ messages in thread From: Alan Stern @ 2011-07-12 0:52 UTC (permalink / raw) To: Mike Snitzer Cc: Roland Dreier, Jens Axboe, James Bottomley, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Vivek Goyal, Tejun Heo On Mon, 11 Jul 2011, Mike Snitzer wrote: > [cc'ing dm-devel, vivek and tejun] > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > > From: Roland Dreier <roland@purestorage.com> > > > > This fixes crashes such as the below that I see when the storage > > underlying a dm-multipath device is hot-removed. The problem is that > > dm requeues a request to a device whose block queue has already been > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > > is alive, but rather goes ahead and tries to queue the request. This > > ends up dereferencing the elevator that was already freed in > > blk_cleanup_queue(). > > Your patch looks fine to me: > Acked-by: Mike Snitzer <snitzer@redhat.com> There's still the issue that Stefan Richter pointed out: The test for a dead queue must be made _after_ acquiring the queue lock, not _before_. Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 0:52 ` Alan Stern @ 2011-07-12 1:22 ` Mike Snitzer 2011-07-12 1:46 ` Vivek Goyal 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2011-07-12 1:22 UTC (permalink / raw) To: Alan Stern Cc: Roland Dreier, Jens Axboe, James Bottomley, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Vivek Goyal, Tejun Heo, jaxboe On Mon, Jul 11 2011 at 8:52pm -0400, Alan Stern <stern@rowland.harvard.edu> wrote: > On Mon, 11 Jul 2011, Mike Snitzer wrote: > > > [cc'ing dm-devel, vivek and tejun] > > > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > > > From: Roland Dreier <roland@purestorage.com> > > > > > > This fixes crashes such as the below that I see when the storage > > > underlying a dm-multipath device is hot-removed. The problem is that > > > dm requeues a request to a device whose block queue has already been > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > > > is alive, but rather goes ahead and tries to queue the request. This > > > ends up dereferencing the elevator that was already freed in > > > blk_cleanup_queue(). > > > > Your patch looks fine to me: > > Acked-by: Mike Snitzer <snitzer@redhat.com> > > There's still the issue that Stefan Richter pointed out: The test for a > dead queue must be made _after_ acquiring the queue lock, not _before_. Yes, quite important. Jens, can you tweak the patch or should Roland send a v2? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 1:22 ` Mike Snitzer @ 2011-07-12 1:46 ` Vivek Goyal 2011-07-12 15:24 ` Alan Stern 0 siblings, 1 reply; 16+ messages in thread From: Vivek Goyal @ 2011-07-12 1:46 UTC (permalink / raw) To: Mike Snitzer Cc: Alan Stern, Roland Dreier, Jens Axboe, James Bottomley, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo, jaxboe On Mon, Jul 11, 2011 at 09:22:06PM -0400, Mike Snitzer wrote: > On Mon, Jul 11 2011 at 8:52pm -0400, > Alan Stern <stern@rowland.harvard.edu> wrote: > > > On Mon, 11 Jul 2011, Mike Snitzer wrote: > > > > > [cc'ing dm-devel, vivek and tejun] > > > > > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > > > > From: Roland Dreier <roland@purestorage.com> > > > > > > > > This fixes crashes such as the below that I see when the storage > > > > underlying a dm-multipath device is hot-removed. The problem is that > > > > dm requeues a request to a device whose block queue has already been > > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > > > > is alive, but rather goes ahead and tries to queue the request. This > > > > ends up dereferencing the elevator that was already freed in > > > > blk_cleanup_queue(). > > > > > > Your patch looks fine to me: > > > Acked-by: Mike Snitzer <snitzer@redhat.com> > > > > There's still the issue that Stefan Richter pointed out: The test for a > > dead queue must be made _after_ acquiring the queue lock, not _before_. > > Yes, quite important. > > Jens, can you tweak the patch or should Roland send a v2? I do not think that we should do the queue dead check after taking a spinlock. The reason being that there are lifetime issues with two objects. 
- Validity of request queue pointer - Validity of q->spin_lock pointer If the dm has taken the reference to the request queue in the beginning then it can be sure the request queue pointer is valid. But the spin_lock might be coming from the driver and might be in one of the driver-allocated structures. So it might happen that the driver has called blk_cleanup_queue() and freed up the structures which contained the spin lock. So if the queue is not dead, we know that q->spin_lock is valid. I think the only race present here is that the whole operation is not atomic. First we check for the queue-not-dead flag and then go on to acquire the request queue lock. So this leaves a small window for a race. I think I have seen other code written in such a manner (__generic_make_request()). So it is probably reasonably safe to do here too. Thanks Vivek ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 1:46 ` Vivek Goyal @ 2011-07-12 15:24 ` Alan Stern 2011-07-12 17:10 ` Vivek Goyal 0 siblings, 1 reply; 16+ messages in thread From: Alan Stern @ 2011-07-12 15:24 UTC (permalink / raw) To: Vivek Goyal Cc: Mike Snitzer, Roland Dreier, Jens Axboe, James Bottomley, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo, jaxboe On Mon, 11 Jul 2011, Vivek Goyal wrote: > > > There's still the issue that Stefan Richter pointed out: The test for a > > > dead queue must be made _after_ acquiring the queue lock, not _before_. > > > > Yes, quite important. > > > > Jens, can you tweak the patch or should Roland send a v2? > > I do not think that we should do queue dead check after taking a spinlock. > The reason being that there are life time issues of two objects. > > - Validity of request queue pointer > - Validity of q->spin_lock pointer > > If the dm has taken the reference to the request queue in the beginning > then it can be sure request queue pointer is valid. But spin_lock might > be coming from driver and might be in one of driver allocated structures. > So it might happen that driver has called blk_cleanup_queue() and freed > up structures which contained the spin lock. Surely this is a bug in the design of the block layer? > So if queue is not dead, we know that q->spin_lock is valid. I think > only race present here is that whole operation is not atomic. First > we check for queue not dead flag and then go on to acquire request > queue lock. So this leaves a small window for race. I think I have > seen other code written in such manner (__generic_make_request()). So > it proably reasonably safe to do here too. "Probably reasonably safe" = "unsafe". The fact that it will usually work out okay means that when it does fail, it will be very difficult to track down. 
It needs to be fixed _now_, when people are aware of the issue. Not five years from now, when everybody has forgotten about it. Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 15:24 ` Alan Stern @ 2011-07-12 17:10 ` Vivek Goyal 0 siblings, 0 replies; 16+ messages in thread From: Vivek Goyal @ 2011-07-12 17:10 UTC (permalink / raw) To: Alan Stern Cc: Mike Snitzer, Roland Dreier, Jens Axboe, James Bottomley, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo, jaxboe On Tue, Jul 12, 2011 at 11:24:54AM -0400, Alan Stern wrote: > On Mon, 11 Jul 2011, Vivek Goyal wrote: > > > > > There's still the issue that Stefan Richter pointed out: The test for a > > > > dead queue must be made _after_ acquiring the queue lock, not _before_. > > > > > > Yes, quite important. > > > > > > Jens, can you tweak the patch or should Roland send a v2? > > > > I do not think that we should do queue dead check after taking a spinlock. > > The reason being that there are life time issues of two objects. > > > > - Validity of request queue pointer > > - Validity of q->spin_lock pointer > > > > If the dm has taken the reference to the request queue in the beginning > > then it can be sure request queue pointer is valid. But spin_lock might > > be coming from driver and might be in one of driver allocated structures. > > So it might happen that driver has called blk_cleanup_queue() and freed > > up structures which contained the spin lock. > > Surely this is a bug in the design of the block layer? > > > So if queue is not dead, we know that q->spin_lock is valid. I think > > only race present here is that whole operation is not atomic. First > > we check for queue not dead flag and then go on to acquire request > > queue lock. So this leaves a small window for race. I think I have > > seen other code written in such manner (__generic_make_request()). So > > it proably reasonably safe to do here too. > > "Probably reasonably safe" = "unsafe". 
The fact that it will usually > work out okay means that when it does fail, it will be very difficult > to track down. > > It needs to be fixed _now_, when people are aware of the issue. Not > five years from now, when everybody has forgotten about it. I agree that fixing would be good. Frankly speaking, I don't even have a full understanding of the problem. I know a little bit from the request queue side but have no idea about the referencing mechanism at the device level and how that is supposed to work with request queue referencing. So once we understand the problem well, we will probably have an answer for how to go about fixing it. Thanks Vivek ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] dm mpath: manage reference on request queue of underlying devices 2011-07-11 22:40 ` Mike Snitzer 2011-07-12 0:52 ` Alan Stern @ 2011-07-12 14:58 ` Mike Snitzer 2011-07-12 17:06 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal 2 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2011-07-12 14:58 UTC (permalink / raw) To: Alasdair G. Kergon Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Vivek Goyal, Tejun Heo, Roland Dreier When processing a request, DM-mpath's map_io() sets the cloned request's request_queue to the appropriate underlying device's request_queue without getting a reference on that request_queue. DM-mpath now maintains a reference count on the underlying devices' request_queue. This change wasn't motivated by a specific report, but code like blk_insert_cloned_request() will access the request_queue with the understanding that it is valid. 
Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/md/dm-mpath.c | 33 +++++++++++++++++++++------------ 1 files changed, 21 insertions(+), 12 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index eadea8e..629efa3 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -161,11 +161,14 @@ static void free_pgpaths(struct list_head *pgpaths, struct dm_target *ti) { struct pgpath *pgpath, *tmp; struct multipath *m = ti->private; + struct request_queue *q; list_for_each_entry_safe(pgpath, tmp, pgpaths, list) { list_del(&pgpath->list); + q = bdev_get_queue(pgpath->path.dev->bdev); if (m->hw_handler_name) - scsi_dh_detach(bdev_get_queue(pgpath->path.dev->bdev)); + scsi_dh_detach(q); + blk_put_queue(q); dm_put_device(ti, pgpath->path.dev); free_pgpath(pgpath); } @@ -552,6 +555,7 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps { int r; struct pgpath *p; + struct request_queue *q; struct multipath *m = ti->private; /* we need at least a path arg */ @@ -571,9 +575,14 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps goto bad; } - if (m->hw_handler_name) { - struct request_queue *q = bdev_get_queue(p->path.dev->bdev); + q = bdev_get_queue(p->path.dev->bdev); + r = blk_get_queue(q); + if (r) { + ti->error = "error getting request queue"; + goto bad_queue; + } + if (m->hw_handler_name) { r = scsi_dh_attach(q, m->hw_handler_name); if (r == -EBUSY) { /* @@ -586,8 +595,7 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps if (r < 0) { ti->error = "error attaching hardware handler"; - dm_put_device(ti, p->path.dev); - goto bad; + goto bad_finish; } if (m->hw_handler_params) { @@ -596,21 +604,22 @@ static struct pgpath *parse_path(struct dm_arg_set *as, struct path_selector *ps ti->error = "unable to set hardware " "handler parameters"; scsi_dh_detach(q); - dm_put_device(ti, p->path.dev); - goto bad; + goto bad_finish; } } } r = 
ps->type->add_path(ps, &p->path, as->argc, as->argv, &ti->error); - if (r) { - dm_put_device(ti, p->path.dev); - goto bad; - } + if (r) + goto bad_finish; return p; - bad: +bad_finish: + blk_put_queue(q); +bad_queue: + dm_put_device(ti, p->path.dev); +bad: free_pgpath(p); return ERR_PTR(r); } ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-11 22:40 ` Mike Snitzer 2011-07-12 0:52 ` Alan Stern 2011-07-12 14:58 ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer @ 2011-07-12 17:06 ` Vivek Goyal 2011-07-12 17:41 ` James Bottomley 2 siblings, 1 reply; 16+ messages in thread From: Vivek Goyal @ 2011-07-12 17:06 UTC (permalink / raw) To: Mike Snitzer Cc: Roland Dreier, Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote: > [cc'ing dm-devel, vivek and tejun] > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > > From: Roland Dreier <roland@purestorage.com> > > > > This fixes crashes such as the below that I see when the storage > > underlying a dm-multipath device is hot-removed. The problem is that > > dm requeues a request to a device whose block queue has already been > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > > is alive, but rather goes ahead and tries to queue the request. This > > ends up dereferencing the elevator that was already freed in > > blk_cleanup_queue(). > > Your patch looks fine to me: > Acked-by: Mike Snitzer <snitzer@redhat.com> > > And I looked at various code paths to arrive at the references DM takes. > > A reference is taken on the underlying devices' block_device via > drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev(). open_dev() > also does bd_link_disk_holder(), resulting in the mpath device > becoming a holder of the underlying devices. e.g.: > /sys/block/sda/holders/dm-4 > > But at no point does DM-mpath get a reference to the underlying > devices' request_queue that gets assigned to clone->q (in > drivers/md/dm-mpath.c:map_io). 
> > Seems we should, though AFAIK it won't help with the issue you've > pointed out (because the hotplugged device's driver already called > blk_cleanup_queue and nuked the elevator). [Thinking loud] Could it be a driver-specific issue that it cleaned up the request queue too early? Is there any notion of a device reference which higher layers take and that should make sure the request queue is intact while somebody is holding a device reference? If yes, what's the connection between the device reference and the request queue reference? IOW, why is a request queue reference needed and why is a device reference not sufficient? (Because there is not necessarily a one-to-one mapping between request queue and device?) I seem to just have lots of questions about devices and referencing. Hopefully somebody with more knowledge in this area will be able to shed some light on it. Thanks Vivek ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 17:06 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal @ 2011-07-12 17:41 ` James Bottomley 2011-07-12 18:02 ` Vivek Goyal 0 siblings, 1 reply; 16+ messages in thread From: James Bottomley @ 2011-07-12 17:41 UTC (permalink / raw) To: Vivek Goyal Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo On Tue, 2011-07-12 at 13:06 -0400, Vivek Goyal wrote: > On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote: > > [cc'ing dm-devel, vivek and tejun] > > > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > > > From: Roland Dreier <roland@purestorage.com> > > > > > > This fixes crashes such as the below that I see when the storage > > > underlying a dm-multipath device is hot-removed. The problem is that > > > dm requeues a request to a device whose block queue has already been > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > > > is alive, but rather goes ahead and tries to queue the request. This > > > ends up dereferencing the elevator that was already freed in > > > blk_cleanup_queue(). > > > > Your patch looks fine to me: > > Acked-by: Mike Snitzer <snitzer@redhat.com> > > > > And I looked at various code paths to arrive at the references DM takes. > > > > A reference is taken on the underlying devices' block_device via > > drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev(). open_dev() > > also does bd_link_disk_holder(), resulting in the mpath device > > becoming a holder of the underlying devices. e.g.: > > /sys/block/sda/holders/dm-4 > > > > But at no point does DM-mpath get a reference to the underlying > > devices' request_queue that gets assigned to clone->q (in > > drivers/md/dm-mpath.c:map_io). 
> > > > Seems we should, though AFAIK it won't help with the issue you've > > pointed out (because the hotplugged device's driver already called > > blk_cleanup_queue and nuked the elevator). > > [Thinking loud] > > Could it be a driver specific issue that it cleaned up the request > queue too early? One could glibly answer yes to this. However, the fact is that it's currently SCSI which manages the queue, so SCSI cleans it up. Now, the only real thing dm is interested in is the queue itself, hence the need to take a reference to the queue. However, queue references don't pin SCSI devices, so you can hold a queue reference all you like and SCSI will still clean up the queue. I think a better question is what should cleaning up the queue do? SCSI uses it to indicate that we're no longer processing requests, which happens when the device goes into a DEL state, but queue cleanup tears down the elevators and really makes the request queue non functional. In this case, holding a reference isn't particularly helpful. I'm starting to wonder if there's actually any value to blk_cleanup_queue() and whether its functionality wouldn't be better assumed by the queue release function on last put. > Is there any notion of device reference which higher layers take and > that should make sure request queue is intact till somebody is holding > device reference. > > If yes, what't the connection between device reference and request > queue reference. IOW, why request queue reference is needed and why > device reference is not sufficient. (Because there is not necessarily > one to one mapping between request queue and device?) For the model, these need to be independent but related. Trying to subordinate one to the other is going to lead to layering problems. James > I seem to just have lots of question about devices and referencing. > Hopefully somebody with more knowledge in this area be able to > shed some light on it. 
> > Thanks > Vivek ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 17:41 ` James Bottomley @ 2011-07-12 18:02 ` Vivek Goyal 2011-07-12 18:28 ` James Bottomley 0 siblings, 1 reply; 16+ messages in thread From: Vivek Goyal @ 2011-07-12 18:02 UTC (permalink / raw) To: James Bottomley Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo On Tue, Jul 12, 2011 at 12:41:30PM -0500, James Bottomley wrote: > On Tue, 2011-07-12 at 13:06 -0400, Vivek Goyal wrote: > > On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote: > > > [cc'ing dm-devel, vivek and tejun] > > > > > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > > > > From: Roland Dreier <roland@purestorage.com> > > > > > > > > This fixes crashes such as the below that I see when the storage > > > > underlying a dm-multipath device is hot-removed. The problem is that > > > > dm requeues a request to a device whose block queue has already been > > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > > > > is alive, but rather goes ahead and tries to queue the request. This > > > > ends up dereferencing the elevator that was already freed in > > > > blk_cleanup_queue(). > > > > > > Your patch looks fine to me: > > > Acked-by: Mike Snitzer <snitzer@redhat.com> > > > > > > And I looked at various code paths to arrive at the references DM takes. > > > > > > A reference is taken on the underlying devices' block_device via > > > drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev(). open_dev() > > > also does bd_link_disk_holder(), resulting in the mpath device > > > becoming a holder of the underlying devices. 
e.g.: > > > /sys/block/sda/holders/dm-4 > > > > > > But at no point does DM-mpath get a reference to the underlying > > > devices' request_queue that gets assigned to clone->q (in > > > drivers/md/dm-mpath.c:map_io). > > > > > > Seems we should, though AFAIK it won't help with the issue you've > > > pointed out (because the hotplugged device's driver already called > > > blk_cleanup_queue and nuked the elevator). > > > > [Thinking loud] > > > > Could it be a driver specific issue that it cleaned up the request > > queue too early? > > One could glibly answer yes to this. However, the fact is that it's > currently SCSI which manages the queue, so SCSI cleans it up. Now, the > only real thing dm is interested in is the queue itself, hence the need > to take a reference to the queue. However, queue references don't pin > SCSI devices, so you can hold a queue reference all you like and SCSI > will still clean up the queue. > > I think a better question is what should cleaning up the queue do? SCSI > uses it to indicate that we're no longer processing requests, which > happens when the device goes into a DEL state, but queue cleanup tears > down the elevators and really makes the request queue non functional. > In this case, holding a reference isn't particularly helpful. > > I'm starting to wonder if there's actually any value to > blk_cleanup_queue() and whether its functionality wouldn't be better > assumed by the queue release function on last put. I think one problem point is q->queue_lock. If the driver drops its reference on the queue and cleans up its data structures, then it will free up the memory associated with q->queue_lock too (if the driver provided its own queue lock). In that case anything which is dependent on the queue lock needs to be freed up in blk_cleanup_queue(). If we can make sure that a request queue reference will keep the spin lock alive, then I guess all of the cleanup might be able to go in the queue release function. 
Thanks Vivek ^ permalink raw reply [flat|nested] 16+ messages in thread
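For context, the patch under discussion amounts to a guard of roughly this shape at the top of blk_insert_cloned_request() (a sketch against the 3.0-era block layer; the exact form of the flag test in Roland's patch may differ):

```c
int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
{
	unsigned long flags;

	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
		return -ENODEV;	/* queue already torn down by blk_cleanup_queue() */

	spin_lock_irqsave(q->queue_lock, flags);
	/* ... existing submission path: add_acct_request() ->
	 * __elv_add_request(), which is what dereferenced the
	 * already-freed elevator in the crash above ... */
	spin_unlock_irqrestore(q->queue_lock, flags);
	return 0;
}
```

As the rest of the thread points out, a check like this is inherently racy — the queue can still die between the test and the insertion — which is why the discussion moves on to queue lifetime rules rather than stopping at the guard.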
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 18:02 ` Vivek Goyal @ 2011-07-12 18:28 ` James Bottomley 2011-07-12 18:54 ` Vivek Goyal 2011-07-12 21:02 ` Alan Stern 0 siblings, 2 replies; 16+ messages in thread From: James Bottomley @ 2011-07-12 18:28 UTC (permalink / raw) To: Vivek Goyal Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo On Tue, 2011-07-12 at 14:02 -0400, Vivek Goyal wrote: > On Tue, Jul 12, 2011 at 12:41:30PM -0500, James Bottomley wrote: > > On Tue, 2011-07-12 at 13:06 -0400, Vivek Goyal wrote: > > > On Mon, Jul 11, 2011 at 06:40:11PM -0400, Mike Snitzer wrote: > > > > [cc'ing dm-devel, vivek and tejun] > > > > > > > > On Fri, Jul 8, 2011 at 7:04 PM, Roland Dreier <roland@kernel.org> wrote: > > > > > From: Roland Dreier <roland@purestorage.com> > > > > > > > > > > This fixes crashes such as the below that I see when the storage > > > > > underlying a dm-multipath device is hot-removed. The problem is that > > > > > dm requeues a request to a device whose block queue has already been > > > > > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > > > > > is alive, but rather goes ahead and tries to queue the request. This > > > > > ends up dereferencing the elevator that was already freed in > > > > > blk_cleanup_queue(). > > > > > > > > Your patch looks fine to me: > > > > Acked-by: Mike Snitzer <snitzer@redhat.com> > > > > > > > > And I looked at various code paths to arrive at the references DM takes. > > > > > > > > A reference is taken on the underlying devices' block_device via > > > > drivers/md/dm-table.c:open_dev() with blkdev_get_by_dev(). open_dev() > > > > also does bd_link_disk_holder(), resulting in the mpath device > > > > becoming a holder of the underlying devices. 
e.g.: > > > > /sys/block/sda/holders/dm-4 > > > > > > > > But at no point does DM-mpath get a reference to the underlying > > > > devices' request_queue that gets assigned to clone->q (in > > > > drivers/md/dm-mpath.c:map_io). > > > > > > > > Seems we should, though AFAIK it won't help with the issue you've > > > > pointed out (because the hotplugged device's driver already called > > > > blk_cleanup_queue and nuked the elevator). > > > > > > [Thinking loud] > > > > > > Could it be a driver specific issue that it cleaned up the request > > > queue too early? > > > > One could glibly answer yes to this. However, the fact is that it's > > currently SCSI which manages the queue, so SCSI cleans it up. Now, the > > only real thing dm is interested in is the queue itself, hence the need > > to take a reference to the queue. However, queue references don't pin > > SCSI devices, so you can hold a queue reference all you like and SCSI > > will still clean up the queue. > > > > I think a better question is what should cleaning up the queue do? SCSI > > uses it to indicate that we're no longer processing requests, which > > happens when the device goes into a DEL state, but queue cleanup tears > > down the elevators and really makes the request queue non functional. > > In this case, holding a reference isn't particularly helpful. > > > > I'm starting to wonder if there's actually any value to > > blk_cleanup_queue() and whether its functionality wouldn't be better > > assumed by the queue release function on last put. > > I think one problem point is q->queue_lock. If driver drops its reference > on queue and cleans up its data structures, then it will free up memory > associated with q->queue_lock too. (If driver provided its own queue > lock). In that case anything which is dependent on queue lock, needs > to be freed up on blk_cleanup_queue(). I don't quite follow. blk_cleanup_queue() doesn't free anything (well, except the elevator). 
Final put will free the queue structure which contains the lock, but if it's really a final put, you have no other possible references, so no-one is using the lock ... well, assuming there isn't a programming error, of course ... > If we can make sure that request queue reference will keep the spin lock > alive, then i guess all cleanup part might be able to go in release > queue function. As I said: cleanup doesn't free the structure containing the lock, release does, so that piece wouldn't be altered by putting blk_cleanup_queue() elsewhere. James ^ permalink raw reply [flat|nested] 16+ messages in thread
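The cleanup/release split James is describing can be summarized like this (a simplified sketch of the 3.0-era block core, with most function bodies elided; not the literal source):

```c
/* Called by the driver when the device goes away: marks the queue
 * dead and stops request processing, but frees almost nothing. */
void blk_cleanup_queue(struct request_queue *q)
{
	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
	if (q->elevator)
		elevator_exit(q->elevator);	/* the one thing freed here */
	blk_put_queue(q);			/* drop the driver's reference */
}

/* Runs only on the final put: this is what actually frees the
 * request_queue structure, including the embedded default lock. */
static void blk_release_queue(struct kobject *kobj)
{
	struct request_queue *q =
		container_of(kobj, struct request_queue, kobj);

	blk_trace_shutdown(q);
	kmem_cache_free(blk_requestq_cachep, q);
}
```

On this reading, blk_cleanup_queue() is really a "stop processing" marker plus the elevator teardown, and only the final put frees the structure containing the lock — which is exactly the asymmetry the thread keeps circling.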
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 18:28 ` James Bottomley @ 2011-07-12 18:54 ` Vivek Goyal 2011-07-12 21:02 ` Alan Stern 1 sibling, 0 replies; 16+ messages in thread From: Vivek Goyal @ 2011-07-12 18:54 UTC (permalink / raw) To: James Bottomley Cc: Mike Snitzer, Roland Dreier, Jens Axboe, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo On Tue, Jul 12, 2011 at 01:28:18PM -0500, James Bottomley wrote: [..] > > > I'm starting to wonder if there's actually any value to > > > blk_cleanup_queue() and whether its functionality wouldn't be better > > > assumed by the queue release function on last put. > > > > I think one problem point is q->queue_lock. If driver drops its reference > > on queue and cleans up its data structures, then it will free up memory > > associated with q->queue_lock too. (If driver provided its own queue > > lock). In that case anything which is dependent on queue lock, needs > > to be freed up on blk_cleanup_queue(). > > I don't quite follow. blk_cleanup_queue() doesn't free anything (well, > except the elevator). Final put will free the queue structure which > contains the lock, but if it's really a final put, you have no other > possible references, so no-one is using the lock ... well, assuming > there isn't a programming error, of course ... > > > If we can make sure that request queue reference will keep the spin lock > > alive, then i guess all cleanup part might be able to go in release > > queue function. > > As I said: cleanup doesn't free the structure containing the lock, > release does, so that piece wouldn't be altered by putting > blk_cleanup_queue() elsewhere. I thought a driver could either rely on spin lock provided by request queue or override that by providing its own spinlock. 
blk_init_allocated_queue_node():

	/* Override internal queue lock with supplied lock pointer */
	if (lock)
		q->queue_lock = lock;

So if a driver calls blk_cleanup_queue() and drops its reference on the queue, it should be free to release any memory it has allocated for the spinlock. So even though the queue is still around, there is no guarantee that q->queue_lock is still around. That memory might have been freed by the driver and reused.

I see many drivers providing their own locks. Some samples from drivers/block:

./virtio_blk.c: q = vblk->disk->queue = blk_init_queue(do_virtblk_request, &vblk->lock);
./xd.c: xd_queue = blk_init_queue(do_xd_request, &xd_lock);
./cpqarray.c: q = blk_init_queue(do_ida_request, &hba[i]->lock);
./sx8.c: q = blk_init_queue(carm_rq_fn, &host->lock);
./sx8.c: q = blk_init_queue(carm_oob_rq_fn, &host->lock);
./floppy.c: disks[dr]->queue = blk_init_queue(do_fd_request, &floppy_lock);
./viodasd.c: q = blk_init_queue(do_viodasd_request, &d->q_lock);
./cciss.c: disk->queue = blk_init_queue(do_cciss_request, &h->lock);
./hd.c: hd_queue = blk_init_queue(do_hd_request, &hd_lock);
./DAC960.c: RequestQueue = blk_init_queue(DAC960_RequestFunction,&Controller->queue_lock);
./z2ram.c: z2_queue = blk_init_queue(do_z2_request, &z2ram_lock);
./amiflop.c: disk->queue = blk_init_queue(do_fd_request, &amiflop_lock);
./xen-blkfront.c: rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
./paride/pd.c: pd_queue = blk_init_queue(do_pd_request, &pd_lock);
./paride/pf.c: pf_queue = blk_init_queue(do_pf_request, &pf_spin_lock);
./paride/pcd.c: pcd_queue = blk_init_queue(do_pcd_request, &pcd_lock);
./mg_disk.c: host->breq = blk_init_queue(mg_request_poll, &host->lock);
./mg_disk.c: host->breq = blk_init_queue(mg_request, &host->lock);
./rbd.c: q = blk_init_queue(rbd_rq_fn, &rbd_dev->lock);
./sunvdc.c: q = blk_init_queue(do_vdc_request, &port->vio.lock);
./swim.c: swd->queue = blk_init_queue(do_fd_request, &swd->lock);
./xsysace.c: ace->queue = blk_init_queue(ace_request, &ace->lock);
./osdblk.c: q = blk_init_queue(osdblk_rq_fn, &osdev->lock);
./ps3disk.c: queue = blk_init_queue(ps3disk_request, &priv->lock);
./swim3.c: swim3_queue = blk_init_queue(do_fd_request, &swim3_lock);
./ub.c: if ((q = blk_init_queue(ub_request_fn, sc->lock)) == NULL)
./nbd.c: disk->queue = blk_init_queue(do_nbd_request, &nbd_lock);

Thanks Vivek ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-12 18:28 ` James Bottomley 2011-07-12 18:54 ` Vivek Goyal @ 2011-07-12 21:02 ` Alan Stern 1 sibling, 0 replies; 16+ messages in thread From: Alan Stern @ 2011-07-12 21:02 UTC (permalink / raw) To: James Bottomley Cc: Vivek Goyal, Mike Snitzer, Roland Dreier, Jens Axboe, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel, device-mapper development, Tejun Heo On Tue, 12 Jul 2011, James Bottomley wrote: > > I think one problem point is q->queue_lock. If driver drops its reference > > on queue and cleans up its data structures, then it will free up memory > > associated with q->queue_lock too. (If driver provided its own queue > > lock). In that case anything which is dependent on queue lock, needs > > to be freed up on blk_cleanup_queue(). > > I don't quite follow. blk_cleanup_queue() doesn't free anything (well, > except the elevator). Final put will free the queue structure which > contains the lock, but if it's really a final put, you have no other > possible references, so no-one is using the lock ... well, assuming > there isn't a programming error, of course ... > > > If we can make sure that request queue reference will keep the spin lock > > alive, then i guess all cleanup part might be able to go in release > > queue function. > > As I said: cleanup doesn't free the structure containing the lock, > release does, so that piece wouldn't be altered by putting > blk_cleanup_queue() elsewhere. It's worth taking a closer look at what happened here. Originally request_queue->queuedata was set to NULL and blk_cleanup_queue() was called from scsi_device_dev_release_usercontext() (by way of scsi_free_queue()). 
This caused a problem because there was a window in which the last reference to the scsi_device had been dropped but the release routine hadn't yet run, so the queue was still marked as active (i.e., queuedata was still non-NULL). Commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b ([SCSI] put stricter guards on queue dead checks) was written to fix this problem. However it moved _both_ lines from the release routine into __scsi_remove_device(). There was no need for this; all that was needed was to make sure queuedata was NULL before the device reference was dropped. And in fact, moving the call to scsi_free_queue() has opened up another window for bugs. Now it's possible for requests to get on the queue (submitted by drivers from another layer that still hold a reference to the scsi_device) even after the queue's elevator has been freed. Yes, the block layer is supposed to prevent this from happening, but there are multiple paths and they don't all have the requisite checks. And as Vivek points out, it's questionable whether they _can_ make the proper checks. Checking requires the queue's lock, but the client may have deallocated the lock at the time the queue was cleaned up. Maybe SCSI doesn't do this, but other clients of the block layer might. The block layer can't make any assumptions. The unavoidable conclusion is that the clients must be responsible for insuring that no requests are added to a queue after its lock has been released. To help accomplish this in SCSI, I'm suggesting that the call to scsi_free_queue() be moved back to where it used to be, in scsi_device_dev_release_usercontext(). Then it won't be possible for any wayward requests to be added to the queue after it is cleaned up, because there won't be any outstanding references to the scsi_device. Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
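In code form, Alan's suggestion would look something like the following (a hypothetical sketch only — the surrounding bodies in 3.0-era drivers/scsi/scsi_sysfs.c are abridged, and the exact placement would come from reverting that part of commit 86cbfb56):

```c
static void scsi_device_dev_release_usercontext(struct work_struct *work)
{
	struct scsi_device *sdev = /* ... container_of dance elided ... */;

	/* ... */
	if (sdev->request_queue) {
		sdev->request_queue->queuedata = NULL;
		/* Moved back here from __scsi_remove_device(): by the
		 * time the last scsi_device reference is dropped, no
		 * client still holding a reference can be queueing
		 * requests, so tearing down the queue is safe. */
		scsi_free_queue(sdev->request_queue);
	}
	/* ... */
}
```

The corresponding scsi_free_queue() call in __scsi_remove_device() would be dropped, leaving only the queuedata = NULL there to mark the device as no longer accepting requests.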
* Re: [PATCH] block: Check that queue is alive in blk_insert_cloned_request() 2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier 2011-07-09 9:05 ` Stefan Richter 2011-07-11 22:40 ` Mike Snitzer @ 2011-07-12 2:09 ` Vivek Goyal 2 siblings, 0 replies; 16+ messages in thread From: Vivek Goyal @ 2011-07-12 2:09 UTC (permalink / raw) To: Roland Dreier Cc: Jens Axboe, James Bottomley, Alan Stern, Heiko Carstens, linux-scsi, Steffen Maier, Manvanthara B. Puttashankar, Tarak Reddy, Seshagiri N. Ippili, linux-kernel On Fri, Jul 08, 2011 at 04:04:30PM -0700, Roland Dreier wrote: > From: Roland Dreier <roland@purestorage.com> > > This fixes crashes such as the below that I see when the storage > underlying a dm-multipath device is hot-removed. The problem is that > dm requeues a request to a device whose block queue has already been > cleaned up, and blk_insert_cloned_request() doesn't check if the queue > is alive, but rather goes ahead and tries to queue the request. This > ends up dereferencing the elevator that was already freed in > blk_cleanup_queue(). 

> > general protection fault: 0000 [#1] SMP > CPU 7 > Modules linked in: kvm_intel kvm serio_raw i7core_edac edac_core ioatdma dca pci_stub ses enclosure usbhid usb_storage qla2xxx mpt2sas ahci hid uas libahci e1000e scsi_transport_fc scsi_transport_sas mlx4_core raid_class scsi_tgt > RIP: 0010:[<ffffffff81233867>] [<ffffffff81233867>] elv_drain_elevator+0x27/0x80 > RSP: 0018:ffff880614e9fa48 EFLAGS: 00010096 > RAX: 6b6b6b6b6b6b6b6b RBX: ffff880610bb0000 RCX: 0000000000000000 > RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff880610bb0000 > RBP: ffff880614e9fa58 R08: 0000000000000000 R09: 0000000000000001 > R10: ffff880c0a7dca70 R11: ffff880615622440 R12: ffff880610bb0000 > R13: 0000000000000002 R14: 0000000000000002 R15: ffff880c0db01160 > FS: 00007fe46457a760(0000) GS:ffff880c3fc20000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fe463c86330 CR3: 0000000c0cc0c000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process blkid (pid: 12458, threadinfo ffff880614e9e000, task ffff8806190eae20) > Stack: > ffff880c0d9f03c8 ffff880c0d9f03c8 ffff880614e9fa88 ffffff > ffff880c0d9f03c8 ffff880610bb0000 0000000000000002 ffffc9 > ffff880614e9fab8 ffffffff812366fd ffff880614e9fad8 ffff88 > Call Trace: > [<ffffffff812339a8>] __elv_add_request+0xe8/0x280 > [<ffffffff812366fd>] add_acct_request+0x3d/0x50 > [<ffffffff81236775>] blk_insert_cloned_request+0x65/0x90 > [<ffffffff813b8d4e>] dm_dispatch_request+0x3e/0x70 > [<ffffffff813ba850>] dm_request_fn+0x160/0x250 > [<ffffffff81236e88>] queue_unplugged+0x48/0xd0 > [<ffffffff8123b03d>] blk_flush_plug_list+0x1ed/0x250 Roland, IIUC, this crash can happen even without dm-multipath being in picture. Because it looks like it can happen that we have put requests on plug list and then device is hot removed which ends up cleaning elevator and then blk_flush_plug_list() is called. 
Can you please try it without the multipath target? Thanks Vivek ^ permalink raw reply [flat|nested] 16+ messages in thread
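The non-multipath scenario Vivek describes would be a sequence along these lines (sketched as a comment-only timeline; it is timing-dependent, so it may be hard to reproduce on demand):

```c
/*
 * CPU A (reader, e.g. blkid)           CPU B (hot-removal path)
 * ----------------------------         ----------------------------
 * blk_start_plug(&plug);
 * submit_bio();
 *   request sits on the per-task
 *   plug list, queue not touched yet
 *                                      device hot-removed
 *                                      blk_cleanup_queue(q);
 *                                        elevator_exit(q->elevator);
 * io_schedule();
 *   blk_flush_plug_list(&plug);
 *     queue_unplugged(q);
 *       __elv_add_request(q, ...);    <-- dereferences freed elevator
 */
```

This matches the call trace in the report (blk_flush_plug_list -> queue_unplugged -> ... -> __elv_add_request), which is why the question of whether dm-multipath is actually required arises.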
end of thread, other threads:[~2011-07-12 21:02 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAG4TOxMjuntpd3Q5jgn+xDa2tgB9tq+jLcDquM_eAdFHXLDrWA@mail.gmail.com>
2011-07-08 23:04 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Roland Dreier
2011-07-09 9:05 ` Stefan Richter
2011-07-11 22:40 ` Mike Snitzer
2011-07-12 0:52 ` Alan Stern
2011-07-12 1:22 ` Mike Snitzer
2011-07-12 1:46 ` Vivek Goyal
2011-07-12 15:24 ` Alan Stern
2011-07-12 17:10 ` Vivek Goyal
2011-07-12 14:58 ` [PATCH] dm mpath: manage reference on request queue of underlying devices Mike Snitzer
2011-07-12 17:06 ` [PATCH] block: Check that queue is alive in blk_insert_cloned_request() Vivek Goyal
2011-07-12 17:41 ` James Bottomley
2011-07-12 18:02 ` Vivek Goyal
2011-07-12 18:28 ` James Bottomley
2011-07-12 18:54 ` Vivek Goyal
2011-07-12 21:02 ` Alan Stern
2011-07-12 2:09 ` Vivek Goyal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox