From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754960Ab2ASIVU (ORCPT ); Thu, 19 Jan 2012 03:21:20 -0500 Received: from casper.infradead.org ([85.118.1.10]:59675 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753907Ab2ASIVS (ORCPT ); Thu, 19 Jan 2012 03:21:18 -0500 Message-ID: <4F17D263.4090803@kernel.dk> Date: Thu, 19 Jan 2012 09:20:51 +0100 From: Jens Axboe MIME-Version: 1.0 To: Shaohua Li CC: Tejun Heo , Vivek Goyal , linux kernel mailing list Subject: Re: [patch]block: fix NULL icq_cache reference References: <20120117201823.GD19223@redhat.com> <20120117214834.GD26207@google.com> <20120117220715.GB23527@redhat.com> <20120118010323.GA32160@htj.dyndns.org> <20120118011112.GB32160@htj.dyndns.org> <1326850253.22361.619.camel@sli10-conroe> <1326866602.22361.624.camel@sli10-conroe> <20120118160713.GA30664@google.com> <1326937294.22361.649.camel@sli10-conroe> In-Reply-To: <1326937294.22361.649.camel@sli10-conroe> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 01/19/2012 02:41 AM, Shaohua Li wrote: > On Wed, 2012-01-18 at 08:07 -0800, Tejun Heo wrote: >> Hello, >> >> On Wed, Jan 18, 2012 at 02:03:22PM +0800, Shaohua Li wrote: >>> Subject: block: fix NULL icq_cache reference >>> >>> CPU0: CPU1: >>> switch from cfq to noop >>> elv_quiesce_start >>> C: get_request >>> A: ioc_create_icq >>> alloc icq with cfq >>> B: do elevator switch >>> ioc_clear_queue >>> elv_quiesce_end >>> insert icq to ioc >>> switch from noop to cfq >>> elv_quiesce_start >>> do elevator switch >>> ioc_clear_queue >>> elv_quiesce_end >>> CPU0 leaves some icq to ioc list after elevator is switching from cfq to noop. >>> in the second ioc_clear_queue, the ioc has icq in its list, but current >>> elevator is noop. so ioc_exit_icq will get a NULL et->icq_cache. >>> >>> In above cases, if A runs after B, ioc_create_icq will have a NULL >>> et->icq_cache, this will trigger another crash. >>> >>> Note, get_request caches et without lock hold. Between C and A, a elevator >>> switch can start. But we will have elvpriv++, elv_quiesce_start will drain >>> all requests first. So this will not trigger crash. >> >> Thanks a lot for tracking it down. >> >> Hmmm... but I'm having a difficult time following the description. >> Maybe we can simplify a bit? e.g. sth like the following? >> >> Once a queue is quiesced, it's not supposed to have any elvpriv data >> or icq's, and elevator switching depends on that. Request alloc >> path followed the rule for elvpriv data but forgot apply it to >> icq's leading to the following crash during elevator switch. >> >> >> >> Fix it by not allocating icq's if ELVPRIV is not set for the >> request. > I'm trying to explain why there is a crash, but fine to use your > description. > >>> Index: linux/block/blk-core.c >>> =================================================================== >>> --- linux.orig/block/blk-core.c 2012-01-18 12:44:13.000000000 +0800 >>> +++ linux/block/blk-core.c 2012-01-18 12:45:28.000000000 +0800 >>> @@ -872,11 +872,11 @@ retry: >>> spin_unlock_irq(q->queue_lock); >>> >>> /* create icq if missing */ >>> - if (unlikely(et->icq_cache && !icq)) >>> + if (unlikely(et->icq_cache && !icq && (rw_flags & REQ_ELVPRIV))) >>> icq = ioc_create_icq(q, gfp_mask); >>> >>> /* rqs are guaranteed to have icq on elv_set_request() if requested */ >>> - if (likely(!et->icq_cache || icq)) >>> + if (likely(!et->icq_cache || icq || !(rw_flags & REQ_ELVPRIV))) >>> rq = blk_alloc_request(q, icq, rw_flags, gfp_mask); >> >> Hmmm... I was trying to avoid adding a goto label with the double >> testing but with REQ_ELVPRIV test added, it looks more confusing. >> Maybe something like the following is better? >> >> /* rqs are guaranteed to have icq on elv_set_request() if requested */ >> if ((rw_flags & REQ_ELVPRIV) && unlikely(et->icq_cache && !icq)) { >> icq = ioc_create_icq(q, gfp_mask); >> if (!icq) >> goto fail_icq; >> } >> rq = blk_alloc_request(q, icq, rw_flags, gfp_mask); >> fail_icq: > this is ok. > > Subject: block: fix NULL icq_cache reference > > Vivek reported a kernel crash: > [ 94.217015] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c > [ 94.218004] IP: [] kmem_cache_free+0x5e/0x200 > [ 94.218004] PGD 13abda067 PUD 137d52067 PMD 0 > [ 94.218004] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC > [ 94.218004] CPU 0 > [ 94.218004] Modules linked in: [last unloaded: scsi_wait_scan] > [ 94.218004] > [ 94.218004] Pid: 0, comm: swapper/0 Not tainted 3.2.0+ #16 Hewlett-Packard HP xw6600 Workstation/0A9Ch > [ 94.218004] RIP: 0010:[] [] kmem_cache_free+0x5e/0x200 > [ 94.218004] RSP: 0018:ffff88013fc03de0 EFLAGS: 00010006 > [ 94.218004] RAX: ffffffff81e0d020 RBX: ffff880138b3c680 RCX: 00000001801c001b > [ 94.218004] RDX: 00000000003aac1d RSI: ffff880138b3c680 RDI: ffffffff81142fae > [ 94.218004] RBP: ffff88013fc03e10 R08: ffff880137830238 R09: 0000000000000001 > [ 94.218004] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > [ 94.218004] R13: ffffea0004e2cf00 R14: ffffffff812f6eb6 R15: 0000000000000246 > [ 94.218004] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000 > [ 94.218004] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 94.218004] CR2: 000000000000001c CR3: 00000001395ab000 CR4: 00000000000006f0 > [ 94.218004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 94.218004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 94.218004] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e0d020) > [ 94.218004] Stack: > [ 94.218004] 0000000000000102 ffff88013fc0db20 ffffffff81e22700 ffff880139500f00 > [ 94.218004] 0000000000000001 000000000000000a ffff88013fc03e20 ffffffff812f6eb6 > [ 94.218004] ffff88013fc03e90 ffffffff810c8da2 ffffffff81e01fd8 ffff880137830240 > [ 94.218004] Call Trace: > [ 94.218004] > [ 94.218004] [] icq_free_icq_rcu+0x16/0x20 > [ 94.218004] [] __rcu_process_callbacks+0x1c2/0x420 > [ 94.218004] [] rcu_process_callbacks+0x38/0x250 > [ 94.218004] [] __do_softirq+0xce/0x3e0 > [ 94.218004] [] ? clockevents_program_event+0x74/0x100 > [ 94.218004] [] ? tick_program_event+0x24/0x30 > [ 94.218004] [] call_softirq+0x1c/0x30 > [ 94.218004] [] do_softirq+0x8d/0xc0 > [ 94.218004] [] irq_exit+0xae/0xe0 > [ 94.218004] [] smp_apic_timer_interrupt+0x6e/0x99 > [ 94.218004] [] apic_timer_interrupt+0x70/0x80 > > Once a queue is quiesced, it's not supposed to have any elvpriv data or > icq's, and elevator switching depends on that. Request alloc path > followed the rule for elvpriv data but forgot apply it to icq's > leading to the following crash during elevator switch. Fix it by not > allocating icq's if ELVPRIV is not set for the request. > > Reported-by: Vivek Goyal > Tested-by: Vivek Goyal > Signed-off-by: Shaohua Li Thanks, applied. -- Jens Axboe