From: Paul Smith <paul@mad-scientist.net>
To: Mike Anderson <andmike@linux.vnet.ibm.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [2.6.27.25] Hang in SCSI sync cache when a disk is removed--?
Date: Thu, 02 Jul 2009 16:12:59 -0400
Message-ID: <1246565579.9022.7226.camel@psmith-ubeta.netezza.com>
In-Reply-To: <20090702174151.GA17414@linux.vnet.ibm.com>
On Thu, 2009-07-02 at 10:41 -0700, Mike Anderson wrote:
> > As I mentioned, when we pull one of the disks from the EXP3000 the IO
> > subsystem completely hangs. Since we're running on a ramdisk this
> > doesn't hang our system completely, but any attempt to do any disk IO
> > thereafter hangs, so we have to power-cycle the blade (because reboot
> > tries to write to the disks). This is quite reproducible in our
> > environment BUT it is very timing-sensitive, as shown below. If we
> > enable too much logging, etc. it goes away.
>
> Have you tried a minimum level of logging like the following without the
> error going away?
> "sysctl -w dev.scsi.logging_level=4100"
We've been enabling some logging at the mptlinux driver level, but not
the generic SCSI level. We'll give this a try.
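For reference, dev.scsi.logging_level is a packed bitmask, not a single verbosity number. The sketch below decodes and re-encodes it. It assumes the layout in include/scsi/scsi_logging.h for kernels of this era: ten facilities, 3 bits each, in the order error, timeout, scan, mlqueue, mlcomplete, llqueue, llcomplete, hlqueue, hlcomplete, ioctl. Under that assumption, 4100 (0x1004) sets error-handler logging to 4 and mid-layer completion logging to 1.

```python
# Assumed layout (include/scsi/scsi_logging.h): 10 facilities, 3 bits each,
# starting at bit 0 with "error".
FACILITIES = [
    "error", "timeout", "scan", "mlqueue", "mlcomplete",
    "llqueue", "llcomplete", "hlqueue", "hlcomplete", "ioctl",
]
BITS_PER_FACILITY = 3
FIELD_MASK = (1 << BITS_PER_FACILITY) - 1  # 0b111, so levels range 0..7


def decode_logging_level(value: int) -> dict[str, int]:
    """Split a dev.scsi.logging_level value into per-facility levels."""
    return {
        name: (value >> (i * BITS_PER_FACILITY)) & FIELD_MASK
        for i, name in enumerate(FACILITIES)
    }


def encode_logging_level(levels: dict[str, int]) -> int:
    """Pack per-facility levels (0..7) back into one sysctl value."""
    value = 0
    for name, level in levels.items():
        if not 0 <= level <= FIELD_MASK:
            raise ValueError(f"{name}: level {level} is outside 0..{FIELD_MASK}")
        value |= level << (FACILITIES.index(name) * BITS_PER_FACILITY)
    return value


if __name__ == "__main__":
    nonzero = {k: v for k, v in decode_logging_level(4100).items() if v}
    print(nonzero)  # {'error': 4, 'mlcomplete': 1}
```

Going the other way, encode_logging_level({"error": 4, "mlcomplete": 1}) yields 4100, the value to pass to "sysctl -w".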
> Can you run "cat /sys/class/scsi_host/*/state" when you are in the hung
> state?
We'll try this as well. One thing we did discover was that the device
entry in /sys/class/scsi_device/* is already gone when the hang occurs
so we can't retrieve the state of the device that way.
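Because the per-device entry has already disappeared at that point, the per-host state is the remaining thing to check. A minimal sketch of that check follows. It reads every /sys/class/scsi_host/*/state file and flags any host whose state string contains "recovery", which would match Mike's suspicion that the error handler never finished. The root directory is a parameter so the logic can be exercised outside a real system. Treating "recovery" as the signal is an assumption about which state strings matter.

```python
import glob
import os


def read_host_states(root: str = "/sys/class/scsi_host") -> dict[str, str]:
    """Return {host name: state string} for every <root>/*/state file."""
    states = {}
    for path in sorted(glob.glob(os.path.join(root, "*", "state"))):
        host = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                states[host] = f.read().strip()
        except OSError:
            # The host may be torn down between the glob and the open.
            continue
    return states


def hosts_in_recovery(states: dict[str, str]) -> list[str]:
    """Hosts whose state mentions recovery (assumed: EH still active)."""
    return [host for host, state in states.items() if "recovery" in state]


if __name__ == "__main__":
    states = read_host_states()
    stuck = set(hosts_in_recovery(states))
    for host, state in states.items():
        marker = "  <-- error handler active" if host in stuck else ""
        print(f"{host}: {state}{marker}")
```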
> If the host is in recovery no IOs will move forward. I assume if you can
> get a run with the 4100 level of logging it will show a host reset sent,
> but no waking up host to restart (unless the reset is being generated for
> other reasons outside of the scsi error handler).
This seems likely but the question is, why isn't it waking up again?
Unfortunately one of the folks working on this with me left for the US
holiday weekend so we'll have to take it up again on Monday morning.
Thanks for your reply, Mike!