From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932775Ab3LEP6N (ORCPT ); Thu, 5 Dec 2013 10:58:13 -0500 Received: from merlin.infradead.org ([205.233.59.134]:51976 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756819Ab3LEP6M (ORCPT ); Thu, 5 Dec 2013 10:58:12 -0500 Message-ID: <52A0A258.1040404@kernel.dk> Date: Thu, 05 Dec 2013 08:57:12 -0700 From: Jens Axboe User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.1 MIME-Version: 1.0 To: Ming Lei , Dave Chinner CC: linux-kernel@vger.kernel.org Subject: Re: [OOPS, 3.13-rc2] null ptr in dio_complete() References: <20131203215940.GX10988@dastard> <20131204015837.GJ10988@dastard> <20131205222220.5d7f4a48@tom-ThinkPad-T410> In-Reply-To: <20131205222220.5d7f4a48@tom-ThinkPad-T410> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/05/2013 07:22 AM, Ming Lei wrote: > On Wed, 4 Dec 2013 12:58:38 +1100 > Dave Chinner wrote: > >> >> And I just hit this from running xfs_repair which is doing >> multithreaded direct IO directly on /dev/vdc: >> >> [ 1776.508599] BUG: unable to handle kernel NULL pointer dereference at 0000000000000328 >> [ 1776.510446] IP: [] blk_account_io_done+0x6a/0x180 >> [ 1776.511762] PGD 38e75b067 PUD 41987d067 PMD 0 >> [ 1776.512577] Oops: 0000 [#1] SMP >> [ 1776.512577] Modules linked in: >> [ 1776.512577] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-rc2-dgc+ #75 >> [ 1776.512577] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 >> [ 1776.512577] task: ffffffff81f37480 ti: ffffffff81f26000 task.ti: ffffffff81f26000 >> [ 1776.512577] RIP: 0010:[] [] blk_account_io_done+0x6a/0x180 >> [ 1776.512577] RSP: 0000:ffff88011bc03da8 EFLAGS: 00010046 >> [ 1776.512577] RAX: 0000000010000000 RBX: 0000000000000000 RCX: 0000000000000000 >> [ 1776.512577] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8800dba59fc0 >> [ 1776.512577] RBP: ffff88011bc03db8 R08: 0000000000000000 R09: ffff88021a828928 >> [ 1776.512577] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 >> [ 1776.512577] R13: 0000000000000000 R14: 0000000000002000 R15: 0000000000000000 >> [ 1776.512577] FS: 0000000000000000(0000) GS:ffff88011bc00000(0000) knlGS:0000000000000000 >> [ 1776.512577] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> [ 1776.512577] CR2: 0000000000000328 CR3: 000000041a4b0000 CR4: 00000000000006f0 >> [ 1776.512577] Stack: >> [ 1776.512577] 0000000000000000 ffff8800dba59fc0 ffff88011bc03df8 ffffffff8175e4b8 >> [ 1776.512577] 00000000000001c0 ffff8800dba59fc0 ffffe8fcfbc03a80 ffff88041aac0800 >> [ 1776.512577] ffff88011ad53b10 0000000000000096 ffff88011bc03e08 ffffffff8175e536 >> [ 1776.512577] Call Trace: >> [ 1776.512577] >> [ 1776.512577] [] blk_mq_complete_request+0xb8/0x120 >> [ 1776.512577] [] __blk_mq_end_io+0x16/0x20 >> [ 1776.512577] [] blk_mq_end_io+0x68/0xd0 >> [ 1776.512577] [] virtblk_done+0x67/0x110 >> [ 1776.512577] [] vring_interrupt+0x35/0x60 >> [ 1776.512577] [] handle_irq_event_percpu+0x54/0x1e0 >> [ 1776.512577] [] handle_irq_event+0x48/0x70 >> [ 1776.512577] [] handle_edge_irq+0x77/0x110 >> [ 1776.512577] [] handle_irq+0xbf/0x150 >> [ 1776.512577] [] ? __atomic_notifier_call_chain+0x12/0x20 >> [ 1776.512577] [] ? atomic_notifier_call_chain+0x16/0x20 >> [ 1776.512577] [] do_IRQ+0x5a/0xe0 >> [ 1776.512577] [] common_interrupt+0x6d/0x6d >> [ 1776.512577] >> [ 1776.512577] [] ? native_safe_halt+0x6/0x10 >> [ 1776.512577] [] default_idle+0x1f/0xc0 >> [ 1776.512577] [] arch_cpu_idle+0x26/0x30 >> [ 1776.512577] [] cpu_startup_entry+0x81/0x240 >> [ 1776.512577] [] rest_init+0x77/0x80 >> [ 1776.512577] [] start_kernel+0x3c8/0x3d5 >> [ 1776.512577] [] ? repair_env_string+0x5e/0x5e >> [ 1776.512577] [] ? early_idt_handlers+0x117/0x120 >> [ 1776.512577] [] x86_64_start_reservations+0x2a/0x2c >> [ 1776.512577] [] x86_64_start_kernel+0x146/0x155 >> [ 1776.512577] Code: 54 41 89 c4 53 41 83 e4 01 48 2b 97 d8 00 00 00 48 8b 9f d0 00 00 00 4d 63 e4 65 44 8b 04 25 1c b0 00 00 49 8d 74 24 02 49 63 c8 <48> 8b 83 28 03 >> [ 1776.512577] RIP [] blk_account_io_done+0x6a/0x180 >> [ 1776.512577] RSP >> [ 1776.512577] CR2: 0000000000000328 >> [ 1776.512577] ---[ end trace 33a5787221ff13b4 ]--- >> [ 1776.512577] Kernel panic - not syncing: Fatal exception in interrupt >> [ 1776.512577] Shutting down cpus with NMI >> >> So this is looking like another virtio+blk_mq problem.... > > This one might be caused by using-after-'free' request because the freed > request may be allocated to another CPU immediately, and the below patch > may be helpful, would you mind testing it? > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 6875736..2358bdf 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -311,12 +311,12 @@ void blk_mq_complete_request(struct request *rq, int error) > > blk_account_io_completion(rq, bytes); > > + blk_account_io_done(rq); > + > if (rq->end_io) > rq->end_io(rq, error); > else > blk_mq_free_request(rq); > - > - blk_account_io_done(rq); > } > > void __blk_mq_end_io(struct request *rq, int error) That's definitely a bug... Good catch, Ming. Applied. -- Jens Axboe