From: Mike Anderson <andmike@linux.vnet.ibm.com>
To: bugme-daemon@bugzilla.kernel.org
Cc: linux-scsi@vger.kernel.org, Jens Axboe <Jens.Axboe@oracle.com>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [Bug 12020] scsi_times_out NULL pointer dereference
Date: Thu, 20 Nov 2008 11:36:33 -0800
Message-ID: <20081120193633.GB28370@linux.vnet.ibm.com>
In-Reply-To: <20081120151224.EB880108042@picon.linux-foundation.org>

I have two systems that are hitting similar signatures in scsi_times_out.

Note that my testing is using a distro kernel, but in this area the code
is very similar. I will work to get a reproduction on mainline.

..but..

I added some debug to scsi_times_out and noticed that the request with no
scmd set in req->special also did not have REQ_STARTED set.

I added a WARN_ON check to blk_add_timer for any request that we were
starting a timer for that did not have REQ_STARTED set. The resulting trace
is shown below. This does not look good, as elv_dequeue_request is being
called from elv_next_request in some cases.

Call Trace:
[c00000007b747580] [c00000000027808c] .blk_add_timer+0x74/0x134 (unreliable)
[c00000007b747610] [c00000000026f9b8] .elv_dequeue_request+0x78/0x8c
[c00000007b747680] [c000000000275830] .blk_do_ordered+0x8c/0x31c
[c00000007b747720] [c00000000026fc18] .elv_next_request+0x24c/0x2d4
[c00000007b7477c0] [d000000000368004] .scsi_request_fn+0xc8/0x628 [scsi_mod]
[c00000007b7478a0] [c00000000026fdf4] .elv_insert+0x154/0x38c
[c00000007b747940] [c000000000273ad0] .__make_request+0x4e4/0x568
[c00000007b7479f0] [c000000000271a68] .generic_make_request+0x3f4/0x468
[c00000007b747af0] [c000000000271bd8] .submit_bio+0xfc/0x124
[c00000007b747bb0] [c000000000160a00] .submit_bh+0x14c/0x198
[c00000007b747c40] [c0000000001630a0] .sync_dirty_buffer+0xbc/0x15c
[c00000007b747cd0] [c0000000001fcac0] .journal_commit_transaction+0x1014/0x158c
[c00000007b747e10] [c00000000020111c] .kjournald+0x104/0x2f4
[c00000007b747f00] [c0000000000a909c] .kthread+0x78/0xc4
[c00000007b747f90] [c00000000002ae2c] .kernel_thread+0x4c/0x68

I changed the previously mentioned WARN_ON to just return if the request
does not have REQ_STARTED set. This stopped the oops in scsi_times_out,
but it is just a hack.
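
For illustration only, here is a minimal userspace sketch of that
early-return guard. It is not the actual debug patch and not kernel code:
the mock_* names, the struct layout and the skip counter are all
hypothetical stand-ins for struct request, REQ_STARTED and blk_add_timer.
It assumes a timeout handler that, like scsi_times_out, needs
req->special to be set for anything found on the timeout list.

```c
/* Userspace model only. Every mock_* name is a hypothetical stand-in. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_REQ_STARTED (1u << 0)	/* stands in for REQ_STARTED */

struct mock_request {
	unsigned int cmd_flags;
	void *special;		/* would point at the scsi_cmnd once prepared */
	bool on_timeout_list;	/* models membership in the queue timeout list */
};

/* Counts requests for which the guard refused to arm a timer. */
static unsigned int mock_skipped;

/* Models blk_add_timer with the early-return hack applied. */
static void mock_blk_add_timer(struct mock_request *rq)
{
	assert(rq != NULL);

	/* The hack: never arm a timer for a request that was not started. */
	if (!(rq->cmd_flags & MOCK_REQ_STARTED)) {
		mock_skipped++;
		return;
	}
	rq->on_timeout_list = true;
}

/*
 * Models the precondition of a scsi_times_out-style handler: a request
 * on the timeout list must carry a command in ->special.
 */
static bool mock_timeout_is_safe(const struct mock_request *rq)
{
	return !rq->on_timeout_list || rq->special != NULL;
}
```

In this model a request dequeued without REQ_STARTED (as in the
blk_do_ordered path in the trace above) never reaches the timeout list,
so the handler never sees a NULL ->special. The guard only hides the
symptom; it does not explain why the dequeue happens.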

I hope this analysis is not flawed because of kernel deltas. It also may not
address the specific issue being seen in this bug, but it does appear to
indicate a possible path for a request to get on the timeout list without
req->special set.

I think we may need to look at some of the paths that are calling
blkdev_dequeue_request and understand how to prevent blk_add_timer from
being called if we are not really starting a SCSI cmd.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com
