All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Shaohua Li <shaohua.li@intel.com>
Cc: Tejun Heo <tj@kernel.org>, Vivek Goyal <vgoyal@redhat.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [patch]block: fix NULL icq_cache reference
Date: Thu, 19 Jan 2012 09:20:51 +0100	[thread overview]
Message-ID: <4F17D263.4090803@kernel.dk> (raw)
In-Reply-To: <1326937294.22361.649.camel@sli10-conroe>

On 01/19/2012 02:41 AM, Shaohua Li wrote:
> On Wed, 2012-01-18 at 08:07 -0800, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Jan 18, 2012 at 02:03:22PM +0800, Shaohua Li wrote:
>>> Subject: block: fix NULL icq_cache reference
>>>
>>> CPU0:					CPU1:
>>> 					switch from cfq to noop
>>> 					   elv_quiesce_start
>>> C: get_request
>>> A:   ioc_create_icq
>>>       alloc icq with cfq
>>> 					   B: do elevator switch
>>> 					       ioc_clear_queue
>>> 					   elv_quiesce_end
>>>       insert icq to ioc
>>> 					switch from noop to cfq
>>> 					    elv_quiesce_start
>>> 					    do elevator switch
>>> 					       ioc_clear_queue
>>> 					    elv_quiesce_end
>>> CPU0 leaves some icq to ioc list after elevator is switching from cfq to noop.
>>> in the second ioc_clear_queue, the ioc has icq in its list, but current
>>> elevator is noop. so ioc_exit_icq will get a NULL et->icq_cache.
>>>
>>> In above cases, if A runs after B, ioc_create_icq will have a NULL
>>> et->icq_cache, this will trigger another crash.
>>>
>>> Note, get_request caches et without lock hold. Between C and A, a elevator
>>> switch can start. But we will have elvpriv++, elv_quiesce_start will drain
>>> all requests first. So this will not trigger crash.
>>
>> Thanks a lot for tracking it down.
>>
>> Hmmm... but I'm having a difficult time following the description.
>> Maybe we can simplify a bit?  e.g. sth like the following?
>>
>>   Once a queue is quiesced, it's not supposed to have any elvpriv data
>>   or icq's, and elevator switching depends on that.  Request alloc
>>   path followed the rule for elvpriv data but forgot apply it to
>>   icq's leading to the following crash during elevator switch.
>>
>>   <oops log>
>>
>>   Fix it by not allocating icq's if ELVPRIV is not set for the
>>   request.
> I'm trying to explain why there is a crash, but fine to use your
> description.
> 
>>> Index: linux/block/blk-core.c
>>> ===================================================================
>>> --- linux.orig/block/blk-core.c	2012-01-18 12:44:13.000000000 +0800
>>> +++ linux/block/blk-core.c	2012-01-18 12:45:28.000000000 +0800
>>> @@ -872,11 +872,11 @@ retry:
>>>  	spin_unlock_irq(q->queue_lock);
>>>  
>>>  	/* create icq if missing */
>>> -	if (unlikely(et->icq_cache && !icq))
>>> +	if (unlikely(et->icq_cache && !icq && (rw_flags & REQ_ELVPRIV)))
>>>  		icq = ioc_create_icq(q, gfp_mask);
>>>  
>>>  	/* rqs are guaranteed to have icq on elv_set_request() if requested */
>>> -	if (likely(!et->icq_cache || icq))
>>> +	if (likely(!et->icq_cache || icq || !(rw_flags & REQ_ELVPRIV)))
>>>  		rq = blk_alloc_request(q, icq, rw_flags, gfp_mask);
>>
>> Hmmm... I was trying to avoid adding a goto label with the double
>> testing but with REQ_ELVPRIV test added, it looks more confusing.
>> Maybe something like the following is better?
>>
>> 	/* rqs are guaranteed to have icq on elv_set_request() if requested */
>> 	if ((rw_flags & REQ_ELVPRIV) && unlikely(et->icq_cache && !icq)) {
>> 		icq = ioc_create_icq(q, gfp_mask);
>> 		if (!icq)
>> 			goto fail_icq;
>> 	}
>> 	rq = blk_alloc_request(q, icq, rw_flags, gfp_mask);
>>  fail_icq:
> this is ok.
> 
> Subject: block: fix NULL icq_cache reference
> 
> Vivek reported a kernel crash:
> [   94.217015] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
> [   94.218004] IP: [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
> [   94.218004] PGD 13abda067 PUD 137d52067 PMD 0
> [   94.218004] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [   94.218004] CPU 0
> [   94.218004] Modules linked in: [last unloaded: scsi_wait_scan]
> [   94.218004]
> [   94.218004] Pid: 0, comm: swapper/0 Not tainted 3.2.0+ #16 Hewlett-Packard HP xw6600 Workstation/0A9Ch
> [   94.218004] RIP: 0010:[<ffffffff81142fae>]  [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
> [   94.218004] RSP: 0018:ffff88013fc03de0  EFLAGS: 00010006
> [   94.218004] RAX: ffffffff81e0d020 RBX: ffff880138b3c680 RCX: 00000001801c001b
> [   94.218004] RDX: 00000000003aac1d RSI: ffff880138b3c680 RDI: ffffffff81142fae
> [   94.218004] RBP: ffff88013fc03e10 R08: ffff880137830238 R09: 0000000000000001
> [   94.218004] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [   94.218004] R13: ffffea0004e2cf00 R14: ffffffff812f6eb6 R15: 0000000000000246
> [   94.218004] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [   94.218004] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   94.218004] CR2: 000000000000001c CR3: 00000001395ab000 CR4: 00000000000006f0
> [   94.218004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   94.218004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   94.218004] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e0d020)
> [   94.218004] Stack:
> [   94.218004]  0000000000000102 ffff88013fc0db20 ffffffff81e22700 ffff880139500f00
> [   94.218004]  0000000000000001 000000000000000a ffff88013fc03e20 ffffffff812f6eb6
> [   94.218004]  ffff88013fc03e90 ffffffff810c8da2 ffffffff81e01fd8 ffff880137830240
> [   94.218004] Call Trace:
> [   94.218004]  <IRQ>
> [   94.218004]  [<ffffffff812f6eb6>] icq_free_icq_rcu+0x16/0x20
> [   94.218004]  [<ffffffff810c8da2>] __rcu_process_callbacks+0x1c2/0x420
> [   94.218004]  [<ffffffff810c9038>] rcu_process_callbacks+0x38/0x250
> [   94.218004]  [<ffffffff810405ee>] __do_softirq+0xce/0x3e0
> [   94.218004]  [<ffffffff8108ed04>] ? clockevents_program_event+0x74/0x100
> [   94.218004]  [<ffffffff81090104>] ? tick_program_event+0x24/0x30
> [   94.218004]  [<ffffffff8183ed1c>] call_softirq+0x1c/0x30
> [   94.218004]  [<ffffffff8100422d>] do_softirq+0x8d/0xc0
> [   94.218004]  [<ffffffff81040c3e>] irq_exit+0xae/0xe0
> [   94.218004]  [<ffffffff8183f4be>] smp_apic_timer_interrupt+0x6e/0x99
> [   94.218004]  [<ffffffff8183e330>] apic_timer_interrupt+0x70/0x80
> 
> Once a queue is quiesced, it's not supposed to have any elvpriv data or
> icq's, and elevator switching depends on that.  Request alloc path
> followed the rule for elvpriv data but forgot apply it to icq's
> leading to the following crash during elevator switch. Fix it by not
> allocating icq's if ELVPRIV is not set for the request.
> 
> Reported-by: Vivek Goyal <vgoyal@redhat.com>
> Tested-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>

Thanks, applied.

-- 
Jens Axboe


      parent reply	other threads:[~2012-01-19  8:21 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-17 20:18 Kernel crash in icq_free_icq_rcu Vivek Goyal
2012-01-17 20:19 ` Jens Axboe
2012-01-17 20:40   ` Vivek Goyal
2012-01-17 20:42     ` Jens Axboe
2012-01-17 20:58       ` Vivek Goyal
2012-01-17 21:01         ` Vivek Goyal
2012-01-17 21:48 ` Tejun Heo
2012-01-17 22:07   ` Vivek Goyal
2012-01-18  1:01     ` Shaohua Li
2012-01-18  1:03       ` Tejun Heo
2012-01-18  1:05         ` Shaohua Li
2012-01-18  1:11           ` Tejun Heo
2012-01-18  1:30             ` Shaohua Li
2012-01-18  2:26               ` Shaohua Li
2012-01-18  4:23                 ` Shaohua Li
2012-01-18  6:03               ` Shaohua Li
2012-01-18 13:51                 ` Vivek Goyal
2012-01-18 14:20                   ` Vivek Goyal
2012-01-18 16:09                     ` Tejun Heo
2012-01-18 16:24                       ` Jens Axboe
2012-01-18 16:31                         ` Jens Axboe
2012-01-18 16:36                           ` Vivek Goyal
2012-01-18 17:10                             ` Tejun Heo
2012-01-18 19:07                               ` Jens Axboe
2012-01-18 19:05                             ` Jens Axboe
2012-01-18 16:55                           ` Tejun Heo
2012-01-18 16:07                 ` Tejun Heo
2012-01-19  1:41                   ` [patch]block: fix NULL icq_cache reference Shaohua Li
2012-01-19  1:43                     ` Tejun Heo
2012-01-19  8:20                     ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F17D263.4090803@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shaohua.li@intel.com \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.