From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>,
Steffen Maier <maier@linux.vnet.ibm.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
Jens Axboe <axboe@kernel.dk>, Hannes Reinecke <hare@suse.de>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Alan Stern <stern@rowland.harvard.edu>,
Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>,
"Taraka R. Bodireddy" <tarak.reddy@in.ibm.com>,
"Seshagiri N. Ippili" <seshagiri.ippili@in.ibm.com>,
"Manvanthara B. Puttashankar" <mputtash@in.ibm.com>,
Jeff Moyer <jmoyer@redhat.com>, Shaohua Li <shaohua.li@intel.com>,
Mike Snitzer <snitzer@redhat.com>,
gmuelas@de.ibm.com
Subject: Re: [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue())
Date: Mon, 31 Oct 2011 14:42:26 +0400 [thread overview]
Message-ID: <1320057746.2964.1.camel@dabdike> (raw)
In-Reply-To: <20111031100557.GA2621@osiris.boeblingen.de.ibm.com>
On Mon, 2011-10-31 at 11:05 +0100, Heiko Carstens wrote:
> On Tue, Oct 18, 2011 at 11:29:40AM -0500, James Bottomley wrote:
> > On Tue, 2011-10-18 at 17:45 +0200, Heiko Carstens wrote:
> > > On Tue, Oct 18, 2011 at 10:31:20PM +0900, Jun'ichi Nomura wrote:
> > > > On 10/17/11 23:06, James Bottomley wrote:
> > > > > On Mon, 2011-10-17 at 17:46 +0900, Jun'ichi Nomura wrote:
> > > > >> On 10/15/11 01:03, James Bottomley wrote:
> > > > >>> On Thu, 2011-10-13 at 15:09 +0200, Steffen Maier wrote:
> > > > >>>> Initially, we encountered use-after-free bugs in
> > > > >>>> scsi_print_command / scsi_dispatch_cmd
> > > > >>>> http://marc.info/?l=linux-scsi&m=130824013229933&w=2
> > > > >>
> > > > >> It is interesting that both this and the older report
> > > > >> got oopsed in scsi_log_send(), while there are other
> > > > >> dereferences of 'cmd' around scsi_dispatch_cmd().
> > > > >> Are there any reason they are special? Just by accident?
> > > > >
> > > > > Right, that's why it looks like the command area got freed rather than
> > > > > the command pointer was bogus (6b is a poison free pattern). Perhaps if
> > > > > the reporter could pin down the failing source line, we'd know better
> > > > > what was going on?
> > > >
> > > > Yeah, that might be useful.
> > >
> > > The struct scsi_cmnd that was passed to scsi_log_send() was already freed
> > > (contents completely 6b6b6b...).
> > > Since SLUB debugging was turned on we can see that it was freed from
> > > __scsi_put_command(). Not too much of a surprise.
> >
> > But it does tell us the put must be racing with dispatch, since
> > dereferencing the command to find the device worked higher up in
> > scsi_dispatch_cmd().
> >
> > There is one way to invalidate the theory that we cloned something with
> > an attached command, and that's to put
> >
> > BUG_ON(rq->special)
> >
> > in blk_insert_cloned_request(). I think we're careful about clearing
> > it, so it should work (perhaps a warn on just in case).
>
> It _looks_ like we do not hit the BUG_ON() that. This time we get this instead:
>
> [ 4024.937870] Unable to handle kernel pointer dereference at virtual kernel address 000003e004d41000
> [ 4024.937886] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 4024.937899] Modules linked in: dm_round_robin sunrpc ipv6 qeth_l2 binfmt_misc dm_multipath scsi_dh dm_mod qeth ccwgroup [las
> t unloaded: scsi_wait_scan]
> [ 4024.937925] CPU: 1 Not tainted 3.0.7-50.x.20111021-s390xdefault #1
> [ 4024.937930] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50)
> [ 4024.937936] Krnl PSW : 0704000180000000 000003e00126263a (dm_softirq_done+0x72/0x140 [dm_mod])
> [ 4024.937959] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3
> [ 4024.937966] Krnl GPRS: 000000007b9156b0 000003e004d41100 000000000e14b600 000000000000006d
> [ 4024.937971] 00000000715332b0 000000000c140ce8 000000000090d2ef 0000000000000005
> [ 4024.937977] 0000000000000001 0000000000000101 000000000c140d00 0000000000000000
> [ 4024.937983] 000003e001260000 000003e00126f098 0000000073adfd08 0000000073adfcb8
> [ 4024.938001] Krnl Code: 000003e00126262a: f0a0000407f1 srp 4(11,%r0),2033,0
> [ 4024.938009] 000003e001262630: e31050080004 lg %r1,8(%r5)
> [ 4024.938017] 000003e001262636: 58b05180 l %r11,384(%r5)
> [ 4024.938024] >000003e00126263a: e31010080004 lg %r1,8(%r1)
> [ 4024.938031] 000003e001262640: e31010500004 lg %r1,80(%r1)
> [ 4024.938038] 000003e001262646: b9020011 ltgr %r1,%r1
> [ 4024.938045] 000003e00126264a: a784ffdf brc 8,3e001262608
> [ 4024.938053] 000003e00126264e: e32050080004 lg %r2,8(%r5)
> [ 4024.938060] Call Trace:
> [ 4024.938063] ([<070000000040716c>] 0x70000000040716c)
> [ 4024.938069] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0
> [ 4024.938080] [<00000000001587c2>] __do_softirq+0xda/0x398
> [ 4024.938088] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c
> [ 4024.938095] [<000000000017c2aa>] kthread+0xa6/0xb0
> [ 4024.938102] [<000000000061970e>] kernel_thread_starter+0x6/0xc
> [ 4024.938112] [<0000000000619708>] kernel_thread_starter+0x0/0xc
> [ 4024.938118] INFO: lockdep is turned off.
> [ 4024.938121] Last Breaking-Event-Address:
> [ 4024.938124] [<000003e001262600>] dm_softirq_done+0x38/0x140 [dm_mod]
> [ 4024.938135]
> [ 4024.938139] Kernel panic - not syncing: Fatal exception in interrupt
> [ 4024.938144] CPU: 1 Tainted: G D 3.0.7-50.x.20111021-s390xdefault #1
> [ 4024.938150] Process ksoftirqd/1 (pid: 1942, task: 0000000079c6c750, ksp: 0000000073adfc50)
> [ 4024.938155] 0000000073adf958 0000000073adf8d8 0000000000000002 0000000000000000
> [ 4024.938164] 0000000073adf978 0000000073adf8f0 0000000073adf8f0 000000000061386a
> [ 4024.938174] 0000000000000000 0000000000000000 0000000000000005 0000000000100ec6
> [ 4024.938184] 000000000000000d 000000000000000c 0000000073adf940 0000000000000000
> [ 4024.938194] 0000000000000000 0000000000100a18 0000000073adf8d8 0000000073adf918
> [ 4024.938205] Call Trace:
> [ 4024.938208] ([<0000000000100926>] show_trace+0xee/0x144)
> [ 4024.938216] [<0000000000613694>] panic+0xb0/0x234
> [ 4024.938224] [<0000000000100ec6>] die+0x15a/0x168
> [ 4024.938230] [<000000000011fb9e>] do_no_context+0xba/0xf8
> [ 4024.938306] [<000000000061c074>] do_dat_exception+0x378/0x3e4
> [ 4024.938314] [<0000000000619e02>] pgm_exit+0x0/0x4
> [ 4024.938319] [<000003e00126263a>] dm_softirq_done+0x72/0x140 [dm_mod]
> [ 4024.938329] ([<070000000040716c>] 0x70000000040716c)
> [ 4024.938334] [<000000000040d29c>] blk_done_softirq+0xd4/0xf0
> [ 4024.938341] [<00000000001587c2>] __do_softirq+0xda/0x398
> [ 4024.938347] [<0000000000158ba0>] run_ksoftirqd+0x120/0x23c
> [ 4024.938354] [<000000000017c2aa>] kthread+0xa6/0xb0
> [ 4024.938360] [<000000000061970e>] kernel_thread_starter+0x6/0xc
> [ 4024.938366] [<0000000000619708>] kernel_thread_starter+0x0/0xc
> [ 4024.938373] INFO: lockdep is turned off.
>
> So we thought we might as well upgrade to 3.1 but immediately got a
>
> kernel BUG at block/blk-flush.c:323!
>
> which was handled here https://lkml.org/lkml/2011/10/4/105 and
> here https://lkml.org/lkml/2011/10/12/408 .
>
> But no patches for that one went upstream AFAICS.
Well, all I can say is "hm". You put only a BUG_ON() in the code, which
wasn't triggered, but now we get a completely different oops. However,
I think it does point to the dm barrier handling code. Can you turn off
barriers and see if all oopses go away?
James
next prev parent reply other threads:[~2011-10-31 10:42 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-22 13:18 [PATCH] block: Free queue resources at blk_release_queue() Hannes Reinecke
2011-09-28 0:47 ` Jens Axboe
2011-09-28 0:55 ` Linus Torvalds
2011-09-28 1:15 ` Jens Axboe
2011-09-28 1:59 ` Linus Torvalds
2011-09-28 2:02 ` Jens Axboe
2011-09-28 4:10 ` James Bottomley
2011-09-28 14:08 ` Jens Axboe
2011-09-28 14:11 ` James Bottomley
2011-09-28 14:14 ` [GIT PULL] Queue free fix (was Re: [PATCH] block: Free queue resources at blk_release_queue()) Jens Axboe
2011-09-28 15:22 ` Linus Torvalds
2011-09-28 15:22 ` Linus Torvalds
2011-09-28 15:43 ` James Bottomley
2011-09-28 17:48 ` Vivek Goyal
2011-09-28 17:53 ` Christoph Hellwig
2011-09-28 18:09 ` Vivek Goyal
2011-09-28 18:16 ` Christoph Hellwig
2011-09-28 19:05 ` Eric Seppanen
2011-09-28 19:05 ` Eric Seppanen
2011-09-28 19:14 ` Christoph Hellwig
2011-11-30 10:18 ` Jens Axboe
2011-11-30 10:26 ` Christoph Hellwig
2011-09-28 22:34 ` Vivek Goyal
2011-09-28 17:59 ` James Bottomley
2011-10-13 13:09 ` Steffen Maier
2011-10-14 16:03 ` James Bottomley
2011-10-17 8:46 ` Jun'ichi Nomura
2011-10-17 14:06 ` James Bottomley
2011-10-18 13:31 ` Jun'ichi Nomura
2011-10-18 15:45 ` Heiko Carstens
2011-10-18 16:29 ` James Bottomley
2011-10-31 10:05 ` Heiko Carstens
2011-10-31 10:42 ` James Bottomley [this message]
2011-10-31 11:46 ` Jun'ichi Nomura
2011-10-31 13:00 ` Heiko Carstens
2011-11-02 12:37 ` Jun'ichi Nomura
2011-11-02 12:44 ` Hannes Reinecke
2011-11-02 12:44 ` Hannes Reinecke
2011-11-02 13:47 ` Heiko Carstens
2011-11-04 4:07 ` Jun'ichi Nomura
2011-11-04 9:12 ` Heiko Carstens
2011-11-03 18:25 ` Mike Snitzer
2011-11-04 9:19 ` Heiko Carstens
2011-11-04 13:30 ` Mike Snitzer
2011-11-04 13:37 ` Hannes Reinecke
2011-11-07 11:31 ` Jun'ichi Nomura
2011-11-07 13:42 ` Mike Snitzer
2011-11-07 12:23 ` Heiko Carstens
2011-11-07 11:30 ` Jun'ichi Nomura
2011-11-07 15:36 ` Mike Snitzer
2011-11-07 16:43 ` Heiko Carstens
2011-11-07 17:10 ` Mike Snitzer
2011-11-07 21:44 ` Mike Snitzer
2011-11-09 9:37 ` Hannes Reinecke
2011-11-10 16:10 ` Heiko Carstens
2011-11-17 16:29 ` Mike Snitzer
2011-11-29 12:00 ` Heiko Carstens
2011-11-29 20:18 ` Mike Snitzer
2011-11-30 7:25 ` Hannes Reinecke
2011-11-30 7:25 ` Hannes Reinecke
2011-12-12 12:39 ` Heiko Carstens
2011-12-13 16:50 ` Mike Snitzer
2011-10-31 13:21 ` Mike Snitzer
2011-10-31 13:40 ` Heiko Carstens
2011-10-31 14:01 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1320057746.2964.1.camel@dabdike \
--to=james.bottomley@hansenpartnership.com \
--cc=axboe@kernel.dk \
--cc=cascardo@linux.vnet.ibm.com \
--cc=gmuelas@de.ibm.com \
--cc=hare@suse.de \
--cc=heiko.carstens@de.ibm.com \
--cc=j-nomura@ce.jp.nec.com \
--cc=jmoyer@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=maier@linux.vnet.ibm.com \
--cc=mputtash@in.ibm.com \
--cc=seshagiri.ippili@in.ibm.com \
--cc=shaohua.li@intel.com \
--cc=snitzer@redhat.com \
--cc=stern@rowland.harvard.edu \
--cc=tarak.reddy@in.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.