From: Mike Anderson <andmike@linux.vnet.ibm.com>
To: Paul Smith <paul@mad-scientist.net>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [2.6.27.25] Hang in SCSI sync cache when a disk is removed--?
Date: Thu, 2 Jul 2009 10:41:51 -0700
Message-ID: <20090702174151.GA17414@linux.vnet.ibm.com>
In-Reply-To: <1246551772.9022.7192.camel@psmith-ubeta.netezza.com>
Paul Smith <paul@mad-scientist.net> wrote:
> Hi all; we are seeing a problem where, when we pull a disk out of our
> disk array (even one that's not actively being used), the entire IO
> subsystem in Linux hangs. Here are some details:
>
> I have an IBM Bladecenter with an LSI EXP3000 SAS expander with 12 1TB
> Seagate SAS disks. Relevant lspci output for the SAS controllers:
>
> # lspci | grep LSI
> 02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
> 08:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)
> 14:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)
>
> On this system we are running an embedded/custom version of Linux in a
> ramdisk, based on Linux 2.6.27.25. Unfortunately it's quite
> difficult/impossible for us to upgrade to a newer kernel at this time,
> however if this problem rings a bell I'm happy to backport patches,
> fixes, etc.
>
> As I mentioned, when we pull one of the disks from the EXP3000 the IO
> subsystem completely hangs. Since we're running on a ramdisk this
> doesn't hang our system completely, but any attempt to do any disk IO
> thereafter hangs, so we have to power-cycle the blade (because reboot
> tries to write to the disks). This is quite reproducible in our
> environment BUT it is very timing-sensitive, as shown below. If we
> enable too much logging, etc. it goes away.
>
Have you tried a minimal level of logging, like the following, to see
whether the hang still reproduces with it enabled?
"sysctl -w dev.scsi.logging_level=4100"
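For reference, the value 4100 can be broken down into per-facility log levels. The sketch below assumes the field layout used in drivers/scsi/scsi_logging.h (ten facilities, three bits each, with the error facility in the lowest bits); the function name decode_logging_level is made up for illustration and is not part of any kernel or userspace tool.

```python
# Facility order is an assumption taken from drivers/scsi/scsi_logging.h:
# each facility occupies a 3-bit field, starting with "error" at bit 0.
FACILITIES = [
    "error", "timeout", "scan", "mlqueue", "mlcomplete",
    "llqueue", "llcomplete", "hlqueue", "hlcomplete", "ioctl",
]


def decode_logging_level(value):
    """Split a dev.scsi.logging_level value into per-facility levels (0-7)."""
    return {
        name: (value >> (3 * index)) & 0x7
        for index, name in enumerate(FACILITIES)
    }


if __name__ == "__main__":
    # Print only the facilities that are actually enabled.
    for name, level in decode_logging_level(4100).items():
        if level:
            print(f"{name}={level}")
```

Under that layout, 4100 (0x1004) enables only error handling and mid-layer completion logging, which is why it is a fairly light setting for a timing-sensitive problem.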
> We've been in touch with some driver folks at LSI and they seem to feel
> that the problem is a SCSI midlayer race condition, rather than in the
> mptlinux driver itself. So I'm hoping someone here has ideas.
>
> On a working disk pull we get log messages like this:
>
> mptscsih: ioc1: attempting host reset! (sc=ffff8804619e2640)
> mptscsih: ioc1: host reset: SUCCESS (sc=ffff8804619e2640)
> mptbase: ioc1: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
> mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
> sd 3:0:11:0: [sdx] Synchronizing SCSI cache
> sd 3:0:11:0: Device offlined - not ready after error recovery
> sg_cmd_done: device detached
>
> Note that the "host reset: SUCCESS" message here comes BEFORE the
> "Synchronizing SCSI cache" message. On a hanging disk pull we get log
> messages like this:
>
> mptscsih: ioc1: attempting host reset! (sc=ffff8804622b48c0)
> mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
> sd 3:0:11:0: [sdx] Synchronizing SCSI cache
>
> and it hangs right here. In this situation the host reset does not
> complete before we try to sync, and that appears to be the indicator of
> the problem. Here's a backtrace; note we're in sd_sync_cache():
>
> Call Trace:
> [<ffffffff8048d88f>] _spin_lock_irqsave+0x1f/0x50
> [<ffffffff8048daf2>] _spin_unlock_irqrestore+0x12/0x40
> [<ffffffffa00080fc>] scsi_get_command+0x8c/0xc0 [scsi_mod]
> [<ffffffff8048c11d>] schedule_timeout+0xad/0xf0
> [<ffffffff8034df1d>] elv_next_request+0x15d/0x290
> [<ffffffff8048b1ea>] wait_for_common+0xba/0x170
> [<ffffffff80237460>] default_wake_function+0x0/0x10
> [<ffffffff80353b77>] blk_execute_rq+0x67/0xa0
> [<ffffffff80350e71>] get_request_wait+0x21/0x1d0
> [<ffffffff8023e972>] vprintk+0x1f2/0x490
> [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
> [<ffffffffa000e5a4>] scsi_execute+0xf4/0x150 [scsi_mod]
> [<ffffffffa000e691>] scsi_execute_req+0x91/0x100 [scsi_mod]
> [<ffffffffa00f89bc>] sd_sync_cache+0xac/0x100 [sd_mod]
> [<ffffffff80360000>] compat_blkdev_ioctl+0x80/0x1740
> [<ffffffff80364062>] kobject_get+0x12/0x20
> [<ffffffffa00fac51>] sd_shutdown+0x71/0x160 [sd_mod]
> [<ffffffffa00fad7c>] sd_remove+0x3c/0x80 [sd_mod]
> [<ffffffffa0012122>] scsi_bus_remove+0x42/0x60 [scsi_mod]
> [<ffffffff803d8ba9>] __device_release_driver+0x99/0x100
> [<ffffffff803d8d08>] device_release_driver+0x28/0x40
> [<ffffffff803d8087>] bus_remove_device+0xb7/0xf0
> [<ffffffff803d66c9>] device_del+0x119/0x1a0
> [<ffffffffa001245c>] __scsi_remove_device+0x5c/0xb0 [scsi_mod]
> [<ffffffffa00124d8>] scsi_remove_device+0x28/0x40 [scsi_mod]
> [<ffffffffa00125a0>] __scsi_remove_target+0xa0/0xd0 [scsi_mod]
> [<ffffffffa0012640>] __remove_child+0x0/0x30 [scsi_mod]
> [<ffffffffa0012656>] __remove_child+0x16/0x30 [scsi_mod]
> [<ffffffff803d5c3b>] device_for_each_child+0x3b/0x60
> [<ffffffffa0012606>] scsi_remove_target+0x36/0x70 [scsi_mod]
> [<ffffffffa010c5f5>] sas_rphy_remove+0x75/0x80 [scsi_transport_sas]
> [<ffffffffa010c609>] sas_rphy_delete+0x9/0x20 [scsi_transport_sas]
> [<ffffffffa010c642>] sas_port_delete+0x22/0x140 [scsi_transport_sas]
> [<ffffffffa013c230>] mptsas_del_end_device+0x230/0x2c0 [mptsas]
> [<ffffffffa013c8a1>] mptsas_hotplug_work+0x291/0xb20 [mptsas]
> [<ffffffff80369c9a>] vsnprintf+0x2ea/0x7c0
> [<ffffffff80287dac>] free_hot_cold_page+0x1fc/0x2f0
> [<ffffffff80287ed8>] __pagevec_free+0x38/0x50
> [<ffffffff8028b730>] release_pages+0x180/0x1d0
> [<ffffffff80362789>] __next_cpu+0x19/0x30
> [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
> [<ffffffff80362789>] __next_cpu+0x19/0x30
> [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
> [<ffffffffa013e4a9>] mptsas_firmware_event_work+0xd29/0x1110 [mptsas]
> [<ffffffff8022dc94>] update_curr+0x84/0xd0
> [<ffffffff80230370>] __dequeue_entity+0x60/0x90
> [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
> [<ffffffff802364fb>] finish_task_switch+0x3b/0xd0
> [<ffffffff8048b911>] thread_return+0xa3/0x662
> [<ffffffffa013d780>] mptsas_firmware_event_work+0x0/0x1110 [mptsas]
> [<ffffffff80250e65>] run_workqueue+0x85/0x150
> [<ffffffff80250fcf>] worker_thread+0x9f/0x110
> [<ffffffff802553b0>] autoremove_wake_function+0x0/0x30
> [<ffffffff80250f30>] worker_thread+0x0/0x110
> [<ffffffff80254ef7>] kthread+0x47/0x90
> [<ffffffff80254eb0>] kthread+0x0/0x90
> [<ffffffff8020d5f9>] child_rip+0xa/0x11
> [<ffffffff80254eb0>] kthread+0x0/0x90
> [<ffffffff80254eb0>] kthread+0x0/0x90
> [<ffffffff8020d5ef>] child_rip+0x0/0x11
>
> According to sd.c:sd_sync_cache() it's supposed to retry the
> scsi_execute_req() three times then give up, but instead it never
> returns. It seems that if the host reset is not completed yet, then we
> find this event on the workqueue and get into some kind of deadlock
> situation.
>
> We're kind of stuck on this and I was wondering if anyone has any
> thoughts or avenues to look at to move us forward on resolving this?
Can you run "cat /sys/class/scsi_host/*/state" while the system is in the
hung state?
If the host is in recovery, no I/O will move forward. I assume that a run
with the 4100 level of logging will show a host reset being sent, but no
subsequent wake-up of the host to restart I/O (unless the reset is being
generated for other reasons outside of the SCSI error handler).
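If running the cat by hand is awkward on the hung system, the same check can be scripted. This is only a sketch: the helper names read_host_states and hosts_not_running are hypothetical, and it assumes a healthy host reports the string "running" in its state attribute, with anything else (for example "recovery") worth reporting.

```python
import glob
import os


def read_host_states(base="/sys/class/scsi_host"):
    """Return {host_name: state} for every host directory under base."""
    states = {}
    for path in sorted(glob.glob(os.path.join(base, "*", "state"))):
        host = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as handle:
                states[host] = handle.read().strip()
        except OSError as exc:
            # Record the failure instead of aborting the whole scan.
            states[host] = f"<unreadable: {exc}>"
    return states


def hosts_not_running(states):
    """Keep only hosts whose state is something other than 'running'."""
    return {host: state for host, state in states.items() if state != "running"}


if __name__ == "__main__":
    for host, state in read_host_states().items():
        print(f"{host}: {state}")
```

Since this only reads sysfs attributes and issues no disk I/O, it should still be usable from the ramdisk after the hang.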
-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com