* NULL ptr deref at elv_drain_elevator @ 2011-11-03 16:08 Jiri Slaby 2011-11-03 16:14 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Jiri Slaby @ 2011-11-03 16:08 UTC (permalink / raw) To: Jens Axboe; +Cc: Tejun Heo, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby Hi, I'm seeing the NULL ptr dereference below on each boot of KVM virtual machine. q->elevator is NULL. This is next-20111025. I tried to apply Tejun's patch from: https://lkml.org/lkml/2011/4/30/87 but it doesn't help. Maybe I should revert something? Scanning for LVM volume groups... Reading all physical volumes. This may take a while... No volume groups found BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 PGD 46176067 PUD 452b5067 PMD 0 Oops: 0000 [#1] SMP CPU 0 Modules linked in: Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590 Bochs Bochs RIP: 0010:[<ffffffff8125a69c>] [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 RSP: 0018:ffff8800461abd00 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff880046948e00 RCX: 00000001820001ee RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff880046948e00 RBP: ffff8800461abd10 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001 R13: ffff8800469494e0 R14: ffff88004689c450 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880049600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000451f6000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/0:2 (pid: 830, threadinfo ffff8800461aa000, task ffff880046f28670) Stack: ffff88004789f888 ffff880046948e00 ffff8800461abd30 ffffffff8125da92 ffff880046948e00 ffff8800469491f4 ffff8800461abd60 ffffffff8125db90 ffff8800461abd40 ffff88004689c450 ffff88004689c400 ffff88004789f888 Call Trace: [<ffffffff8125da92>] blk_drain_queue+0x42/0x70 [<ffffffff8125db90>] blk_cleanup_queue+0xd0/0x1c0 [<ffffffff81469640>] md_free+0x50/0x70 [<ffffffff8126f43b>] kobject_release+0x8b/0x1d0 [<ffffffff8126f3b0>] ? kobject_del+0x40/0x40 [<ffffffff81469380>] ? bb_show+0x20/0x20 [<ffffffff81270d56>] kref_put+0x36/0xa0 [<ffffffff8126f2b7>] kobject_put+0x27/0x60 [<ffffffff814693af>] mddev_delayed_delete+0x2f/0x40 [<ffffffff81083450>] process_one_work+0x100/0x3b0 [<ffffffff8108527f>] worker_thread+0x15f/0x3a0 [<ffffffff81085120>] ? manage_workers.isra.32+0x240/0x240 [<ffffffff81089937>] kthread+0x87/0x90 [<ffffffff81621834>] kernel_thread_helper+0x4/0x10 [<ffffffff810898b0>] ? kthread_worker_fn+0x1a0/0x1a0 [<ffffffff81621830>] ? gs_change+0xb/0xb Code: c0 74 02 ff d0 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 0f 1f 40 00 48 8b 43 18 be 01 00 00 00 48 89 df 8b 00 ff 50 28 85 c0 75 ea 8b 93 80 04 00 00 85 d2 74 14 8b RIP [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 RSP <ffff8800461abd00> CR2: 0000000000000000 ---[ end trace 2b4616ccecf0982b ]--- thanks, -- js suse labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: NULL ptr deref at elv_drain_elevator 2011-11-03 16:08 NULL ptr deref at elv_drain_elevator Jiri Slaby @ 2011-11-03 16:14 ` Tejun Heo 2011-11-03 16:16 ` Jiri Slaby 2011-11-03 16:26 ` [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up Tejun Heo 0 siblings, 2 replies; 12+ messages in thread From: Tejun Heo @ 2011-11-03 16:14 UTC (permalink / raw) To: Jiri Slaby; +Cc: Jens Axboe, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby On Thu, Nov 03, 2011 at 05:08:40PM +0100, Jiri Slaby wrote: > Hi, > > I'm seeing the NULL ptr dereference below on each boot of KVM virtual > machine. q->elevator is NULL. This is next-20111025. > > I tried to apply Tejun's patch from: > https://lkml.org/lkml/2011/4/30/87 > but it doesn't help. Maybe I should revert something? > > Scanning for LVM volume groups... > Reading all physical volumes. This may take a while... > No volume groups found > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 > PGD 46176067 PUD 452b5067 PMD 0 > Oops: 0000 [#1] SMP > CPU 0 > Modules linked in: > > Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590 > Bochs Bochs > RIP: 0010:[<ffffffff8125a69c>] [<ffffffff8125a69c>] > elv_drain_elevator+0x1c/0x70 Heh, probably md is tearing down a queue which isn't fully setup. Does the following fix the problem? Thanks. diff --git a/block/blk-core.c b/block/blk-core.c index f658711..5292e31 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -408,7 +408,8 @@ void blk_cleanup_queue(struct request_queue *q) mutex_unlock(&q->sysfs_lock); /* drain all requests queued before DEAD marking */ - blk_drain_queue(q, true); + if (q->elevator) + blk_drain_queue(q, true); /* @q won't process any more request, flush async actions */ del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer); -- tejun ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: NULL ptr deref at elv_drain_elevator 2011-11-03 16:14 ` Tejun Heo @ 2011-11-03 16:16 ` Jiri Slaby 2011-11-03 16:26 ` [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up Tejun Heo 1 sibling, 0 replies; 12+ messages in thread From: Jiri Slaby @ 2011-11-03 16:16 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby On 11/03/2011 05:14 PM, Tejun Heo wrote: > On Thu, Nov 03, 2011 at 05:08:40PM +0100, Jiri Slaby wrote: >> Hi, >> >> I'm seeing the NULL ptr dereference below on each boot of KVM virtual >> machine. q->elevator is NULL. This is next-20111025. >> >> I tried to apply Tejun's patch from: >> https://lkml.org/lkml/2011/4/30/87 >> but it doesn't help. Maybe I should revert something? >> >> Scanning for LVM volume groups... >> Reading all physical volumes. This may take a while... >> No volume groups found >> BUG: unable to handle kernel NULL pointer dereference at (null) >> IP: [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 >> PGD 46176067 PUD 452b5067 PMD 0 >> Oops: 0000 [#1] SMP >> CPU 0 >> Modules linked in: >> >> Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590 >> Bochs Bochs >> RIP: 0010:[<ffffffff8125a69c>] [<ffffffff8125a69c>] >> elv_drain_elevator+0x1c/0x70 > > Heh, probably md is tearing down a queue which isn't fully setup. > Does the following fix the problem? Yes, it does. Thanks a lot. > diff --git a/block/blk-core.c b/block/blk-core.c > index f658711..5292e31 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -408,7 +408,8 @@ void blk_cleanup_queue(struct request_queue *q) > mutex_unlock(&q->sysfs_lock); > > /* drain all requests queued before DEAD marking */ > - blk_drain_queue(q, true); > + if (q->elevator) > + blk_drain_queue(q, true); > > /* @q won't process any more request, flush async actions */ > del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer); > > -- js suse labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-03 16:14 ` Tejun Heo 2011-11-03 16:16 ` Jiri Slaby @ 2011-11-03 16:26 ` Tejun Heo 2011-11-03 17:49 ` Jens Axboe 2011-11-04 9:40 ` Stefan Richter 1 sibling, 2 replies; 12+ messages in thread From: Tejun Heo @ 2011-11-03 16:26 UTC (permalink / raw) To: Jens Axboe; +Cc: Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby blk_cleanup_queue() may be called before elevator is set up on a queue which triggers the following oops. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 ... Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590 Bochs Bochs RIP: 0010:[<ffffffff8125a69c>] [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 ... Call Trace: [<ffffffff8125da92>] blk_drain_queue+0x42/0x70 [<ffffffff8125db90>] blk_cleanup_queue+0xd0/0x1c0 [<ffffffff81469640>] md_free+0x50/0x70 [<ffffffff8126f43b>] kobject_release+0x8b/0x1d0 [<ffffffff81270d56>] kref_put+0x36/0xa0 [<ffffffff8126f2b7>] kobject_put+0x27/0x60 [<ffffffff814693af>] mddev_delayed_delete+0x2f/0x40 [<ffffffff81083450>] process_one_work+0x100/0x3b0 [<ffffffff8108527f>] worker_thread+0x15f/0x3a0 [<ffffffff81089937>] kthread+0x87/0x90 [<ffffffff81621834>] kernel_thread_helper+0x4/0x10 Fix it by making blk_cleanup_queue() check whether q->elevator is set up before invoking blk_drain_queue. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz> --- block/blk-core.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index f658711..f43c8a5 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -407,8 +407,13 @@ void blk_cleanup_queue(struct request_queue *q) spin_unlock_irq(lock); mutex_unlock(&q->sysfs_lock); - /* drain all requests queued before DEAD marking */ - blk_drain_queue(q, true); + /* + * Drain all requests queued before DEAD marking. The caller might + * be trying to tear down @q before its elevator is initialized, in + * which case we don't want to call into draining. + */ + if (q->elevator) + blk_drain_queue(q, true); /* @q won't process any more request, flush async actions */ del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-03 16:26 ` [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up Tejun Heo @ 2011-11-03 17:49 ` Jens Axboe 2011-11-03 21:12 ` Tejun Heo 2011-11-04 9:40 ` Stefan Richter 1 sibling, 1 reply; 12+ messages in thread From: Jens Axboe @ 2011-11-03 17:49 UTC (permalink / raw) To: Tejun Heo; +Cc: Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby On 2011-11-03 17:26, Tejun Heo wrote: > blk_cleanup_queue() may be called before elevator is set up on a > queue which triggers the following oops. > > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 > ... > Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590 > Bochs Bochs > RIP: 0010:[<ffffffff8125a69c>] [<ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70 > ... > Call Trace: > [<ffffffff8125da92>] blk_drain_queue+0x42/0x70 > [<ffffffff8125db90>] blk_cleanup_queue+0xd0/0x1c0 > [<ffffffff81469640>] md_free+0x50/0x70 > [<ffffffff8126f43b>] kobject_release+0x8b/0x1d0 > [<ffffffff81270d56>] kref_put+0x36/0xa0 > [<ffffffff8126f2b7>] kobject_put+0x27/0x60 > [<ffffffff814693af>] mddev_delayed_delete+0x2f/0x40 > [<ffffffff81083450>] process_one_work+0x100/0x3b0 > [<ffffffff8108527f>] worker_thread+0x15f/0x3a0 > [<ffffffff81089937>] kthread+0x87/0x90 > [<ffffffff81621834>] kernel_thread_helper+0x4/0x10 > > Fix it by making blk_cleanup_queue() check whether q->elevator is set > up before invoking blk_drain_queue. Thanks, glad this got caught before the merge... -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-03 17:49 ` Jens Axboe @ 2011-11-03 21:12 ` Tejun Heo 2011-11-04 8:12 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2011-11-03 21:12 UTC (permalink / raw) To: Jens Axboe; +Cc: Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby On Thu, Nov 03, 2011 at 06:49:53PM +0100, Jens Axboe wrote: > > Fix it by making blk_cleanup_queue() check whether q->elevator is set > > up before invoking blk_drain_queue. > > Thanks, glad this got caught before the merge... Heh, yeah, definitely, and just to be paranoid, this whole thing is for the next merge window. Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-03 21:12 ` Tejun Heo @ 2011-11-04 8:12 ` Jens Axboe 2011-11-04 14:26 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2011-11-04 8:12 UTC (permalink / raw) To: Tejun Heo; +Cc: Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby On 2011-11-03 22:12, Tejun Heo wrote: > On Thu, Nov 03, 2011 at 06:49:53PM +0100, Jens Axboe wrote: >>> Fix it by making blk_cleanup_queue() check whether q->elevator is set >>> up before invoking blk_drain_queue. >> >> Thanks, glad this got caught before the merge... > > Heh, yeah, definitely, and just to be paranoid, this whole thing is > for the next merge window. Ehm, what parts? If the bug is in for-next, things are queued up for _this_ merge window. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-04 8:12 ` Jens Axboe @ 2011-11-04 14:26 ` Tejun Heo 2011-11-04 14:28 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2011-11-04 14:26 UTC (permalink / raw) To: Jens Axboe; +Cc: Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby Hello, On Fri, Nov 4, 2011 at 1:12 AM, Jens Axboe <axboe@kernel.dk> wrote: >> Heh, yeah, definitely, and just to be paranoid, this whole thing is >> for the next merge window. > > Ehm, what parts? If the bug is in for-next, things are queued up for > _this_ merge window. So, we had four patchsets - the original drain improvements, updates to drain improvements, cfq locking cleanup, and cfq api cleanup. Currently, the first one is in block tree but the other three are not. I was thinking all four were going mainline in the next merge window. Hmmm... yeah, the first and second patchsets kinda go together but well the first one definitely is pretty good bug fix without others, so I guess that split isn't too bad either. Alright, no objection. Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-04 14:26 ` Tejun Heo @ 2011-11-04 14:28 ` Jens Axboe 2011-11-04 14:30 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2011-11-04 14:28 UTC (permalink / raw) To: Tejun Heo; +Cc: Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby On 2011-11-04 15:26, Tejun Heo wrote: > Hello, > > On Fri, Nov 4, 2011 at 1:12 AM, Jens Axboe <axboe@kernel.dk> wrote: >>> Heh, yeah, definitely, and just to be paranoid, this whole thing is >>> for the next merge window. >> >> Ehm, what parts? If the bug is in for-next, things are queued up for >> _this_ merge window. > > So, we had four patchsets - the original drain improvements, updates > to drain improvements, cfq locking cleanup, and cfq api cleanup. > Currently, the first one is in block tree but the other three are not. > I was thinking all four were going mainline in the next merge window. > > Hmmm... yeah, the first and second patchsets kinda go together but > well the first one definitely is pretty good bug fix without others, > so I guess that split isn't too bad either. Alright, no objection. OK, I'll push off what I have now and then we can queue #2 up shortly. I would probably prefer pushing the cfq locking cleanup to 3.3 to get some more testing time on that, but it all depends on what your level of confidence in it is? -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-04 14:28 ` Jens Axboe @ 2011-11-04 14:30 ` Tejun Heo 0 siblings, 0 replies; 12+ messages in thread From: Tejun Heo @ 2011-11-04 14:30 UTC (permalink / raw) To: Jens Axboe; +Cc: Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby Hello, On Fri, Nov 4, 2011 at 7:28 AM, Jens Axboe <axboe@kernel.dk> wrote: >> Hmmm... yeah, the first and second patchsets kinda go together but >> well the first one definitely is pretty good bug fix without others, >> so I guess that split isn't too bad either. Alright, no objection. > > OK, I'll push off what I have now and then we can queue #2 up shortly. I > would probably prefer pushing the cfq locking cleanup to 3.3 to get some > more testing time on that, but it all depends on what your level of > confidence in it is? Not that confident. Those definitely belong to the next window. Thank you. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-03 16:26 ` [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up Tejun Heo 2011-11-03 17:49 ` Jens Axboe @ 2011-11-04 9:40 ` Stefan Richter 2011-11-04 14:38 ` Tejun Heo 1 sibling, 1 reply; 12+ messages in thread From: Stefan Richter @ 2011-11-04 9:40 UTC (permalink / raw) To: Tejun Heo Cc: Jens Axboe, Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby On Nov 03 Tejun Heo wrote: > blk_cleanup_queue() may be called before elevator is set up on a > queue which triggers the following oops. [...] > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -407,8 +407,13 @@ void blk_cleanup_queue(struct request_queue *q) > spin_unlock_irq(lock); > mutex_unlock(&q->sysfs_lock); > > - /* drain all requests queued before DEAD marking */ > - blk_drain_queue(q, true); > + /* > + * Drain all requests queued before DEAD marking. The caller might > + * be trying to tear down @q before its elevator is initialized, in > + * which case we don't want to call into draining. > + */ > + if (q->elevator) > + blk_drain_queue(q, true); > > /* @q won't process any more request, flush async actions */ > del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer); What prevents elevator_attach to be called and requests to be inserted between 'if (q-elevator)' and 'blk_put_queue(q)'? -- Stefan Richter -=====-==-== =-== --=-- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up 2011-11-04 9:40 ` Stefan Richter @ 2011-11-04 14:38 ` Tejun Heo 0 siblings, 0 replies; 12+ messages in thread From: Tejun Heo @ 2011-11-04 14:38 UTC (permalink / raw) To: Stefan Richter Cc: Jens Axboe, Jiri Slaby, James E.J. Bottomley, LKML, linux-scsi, Jiri Slaby Hello, On Fri, Nov 4, 2011 at 2:40 AM, Stefan Richter <stefanr@s5r6.in-berlin.de> wrote: >> /* @q won't process any more request, flush async actions */ >> del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer); > > What prevents elevator_attach to be called and requests to be inserted > between 'if (q-elevator)' and 'blk_put_queue(q)'? The fact that the queue owner has called blk_cleaup_queue()? Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-11-04 14:38 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-03 16:08 NULL ptr deref at elv_drain_elevator Jiri Slaby 2011-11-03 16:14 ` Tejun Heo 2011-11-03 16:16 ` Jiri Slaby 2011-11-03 16:26 ` [PATCH block/for-next] block: don't call blk_drain_queue() if elevator is not up Tejun Heo 2011-11-03 17:49 ` Jens Axboe 2011-11-03 21:12 ` Tejun Heo 2011-11-04 8:12 ` Jens Axboe 2011-11-04 14:26 ` Tejun Heo 2011-11-04 14:28 ` Jens Axboe 2011-11-04 14:30 ` Tejun Heo 2011-11-04 9:40 ` Stefan Richter 2011-11-04 14:38 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox