From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bart Van Assche
Subject: Re: v3.8-rc7: Kernel oops in end_clone_bio()
Date: Wed, 20 Feb 2013 16:49:04 +0100
Message-ID: <5124F070.9060001@acm.org>
References: <5123C8CD.9010207@acm.org>
Reply-To: device-mapper development
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <5123C8CD.9010207@acm.org>
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
To: device-mapper development
Cc: Jens Axboe , Alasdair G Kergon
List-Id: dm-devel.ids

On 02/19/13 19:47, Bart Van Assche wrote:
> general protection fault: 0000 [#1] SMP
> RIP: 0010:[] [] mempool_free+0x24/0xb0
> Call Trace:
>
> [] bio_put+0x97/0xc0
> [] end_clone_bio+0x35/0x90 [dm_mod]
> [] bio_endio+0x1d/0x30
> [] req_bio_endio.isra.51+0xa3/0xe0
> [] blk_update_request+0x118/0x520
> [] blk_update_bidi_request+0x27/0xa0
> [] blk_end_bidi_request+0x2c/0x80
> [] blk_end_request+0x10/0x20
> [] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
> [] scsi_finish_command+0xbd/0x120 [scsi_mod]
> [] scsi_softirq_done+0x13f/0x160 [scsi_mod]
> [] blk_done_softirq+0x80/0xa0
> [] __do_softirq+0xf1/0x250
> [] call_softirq+0x1c/0x30
> [] do_softirq+0x8d/0xc0
> [] irq_exit+0xd5/0xe0
> [] do_IRQ+0x63/0xe0
> [] common_interrupt+0x6f/0x6f
>
> [] srp_queuecommand+0x8c/0xcb0 [ib_srp]
> [] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
> [] scsi_request_fn+0x31e/0x520 [scsi_mod]
> [] __blk_run_queue+0x37/0x50
> [] blk_delay_work+0x29/0x40
> [] process_one_work+0x1c3/0x5c0
> [] worker_thread+0x15e/0x440
> [] kthread+0xdb/0xe0
> [] ret_from_fork+0x7c/0xb0

(replying to my own e-mail)

Any opinions about the patch below ? It seems to fix the kernel oops
mentioned above.

[PATCH] Avoid destroying a dm device before request processing finished

diff --git a/block/blk-core.c b/block/blk-core.c
index c973249..77f4ea8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -304,10 +304,18 @@ EXPORT_SYMBOL(blk_sync_queue);
  * This variant runs the queue whether or not the queue has been
  * stopped. Must be called with the queue lock held and interrupts
  * disabled. See also @blk_run_queue.
+ *
+ * Note:
+ * Request handling functions that unlock and relock the queue lock
+ * internally are allowed to invoke blk_run_queue(). This will not result
+ * in a recursive call of the request handler. However, such request
+ * handling functions must, before they return, either reexamine the
+ * request queue or invoke blk_delay_queue() to avoid that queue processing
+ * stops.
  */
 inline void __blk_run_queue_uncond(struct request_queue *q)
 {
-	if (unlikely(blk_queue_dead(q)))
+	if (unlikely(blk_queue_dead(q) || q->request_fn_active))
 		return;
 
 	/*
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 314a0e2..28b7ad4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -728,14 +728,8 @@ static void rq_completed(struct mapped_device *md, int rw, int run_queue)
 	if (!md_in_flight(md))
 		wake_up(&md->wait);
 
-	/*
-	 * Run this off this callpath, as drivers could invoke end_io while
-	 * inside their request_fn (and holding the queue lock). Calling
-	 * back into ->request_fn() could deadlock attempting to grab the
-	 * queue lock again.
-	 */
 	if (run_queue)
-		blk_run_queue_async(md->queue);
+		blk_run_queue(md->queue);
 
 	/*
 	 * dm_put() must be at the end of this function. See the comment above