Re: [2.6.27.25] Hang in SCSI sync cache when a disk is removed--?

From: Mike Anderson <andmike@linux.vnet.ibm.com>
To: Paul Smith <paul@mad-scientist.net>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [2.6.27.25] Hang in SCSI sync cache when a disk is removed--?
Date: Thu, 2 Jul 2009 10:41:51 -0700
Message-ID: <20090702174151.GA17414@linux.vnet.ibm.com>
In-Reply-To: <1246551772.9022.7192.camel@psmith-ubeta.netezza.com>

Paul Smith <paul@mad-scientist.net> wrote:
> Hi all; we are seeing a problem where, when we pull a disk out of our
> disk array (even one that's not actively being used), the entire IO
> subsystem in Linux hangs.  Here are some details:
> 
> I have an IBM Bladecenter with an LSI EXP3000 SAS expander with 12 1TB
> Seagate SAS disks.  Relevant lspci output for the SAS controllers:
> 
>         # lspci | grep LSI
>         02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 02)
>         08:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)
>         14:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 03)
> 
> On this system we are running an embedded/custom version of Linux in a
> ramdisk, based on Linux 2.6.27.25.  Unfortunately it's quite
> difficult/impossible for us to upgrade to a newer kernel at this time,
> however if this problem rings a bell I'm happy to backport patches,
> fixes, etc.
> 
> As I mentioned, when we pull one of the disks from the EXP3000 the IO
> subsystem completely hangs.  Since we're running on a ramdisk this
> doesn't hang our system completely, but any attempt to do any disk IO
> thereafter hangs, so we have to power-cycle the blade (because reboot
> tries to write to the disks).  This is quite reproducible in our
> environment BUT it is very timing-sensitive, as shown below.  If we
> enable too much logging, etc. it goes away.
> 

Have you tried a minimal level of logging, such as the following, to see
whether the hang still reproduces with it enabled?
"sysctl -w dev.scsi.logging_level=4100"
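
For reference, a rough sketch of how the value 4100 breaks down. It assumes
the packing described in drivers/scsi/scsi_logging.h (ten facilities, three
bits each, starting at bit 0); the helper name decode_logging_level is made
up for illustration and is not a kernel or sysctl interface.

```python
# Facility order assumed from drivers/scsi/scsi_logging.h: each facility
# occupies three bits, packed from bit 0 upward in this order.
SCSI_LOG_FACILITIES = [
    "ERROR", "TIMEOUT", "SCAN", "MLQUEUE", "MLCOMPLETE",
    "LLQUEUE", "LLCOMPLETE", "HLQUEUE", "HLCOMPLETE", "IOCTL",
]


def decode_logging_level(value):
    """Return {facility: level} for every facility with a non-zero level."""
    levels = {}
    for index, name in enumerate(SCSI_LOG_FACILITIES):
        level = (value >> (3 * index)) & 0x7  # extract this facility's 3 bits
        if level:
            levels[name] = level
    return levels


if __name__ == "__main__":
    print(decode_logging_level(4100))  # {'ERROR': 4, 'MLCOMPLETE': 1}
```

Under that assumption, 4100 (0x1004) enables only error logging at level 4
and mid-layer completion logging at level 1, which should keep the extra
output small.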


> We've been in touch with some driver folks at LSI and they seem to feel
> that the problem is a SCSI midlayer race condition, rather than in the
> mptlinux driver itself.  So I'm hoping someone here has ideas.
> 
> On a working disk pull we get log messages like this:
> 
>         mptscsih: ioc1: attempting host reset! (sc=ffff8804619e2640)
>         mptscsih: ioc1: host reset: SUCCESS (sc=ffff8804619e2640)
>         mptbase: ioc1: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
>         mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
>         sd 3:0:11:0: [sdx] Synchronizing SCSI cache
>         sd 3:0:11:0: Device offlined - not ready after error recovery
>         sg_cmd_done: device detached
> 
> Note that the "host reset: SUCCESS" message here comes BEFORE the
> "Synchronizing SCSI cache" message.  On a hanging disk pull we get log
> messages like this:
> 
>         mptscsih: ioc1: attempting host reset! (sc=ffff8804622b48c0)
>         mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 72, phy 11, sas_addr 0x5000c5000d2987b6
>         sd 3:0:11:0: [sdx] Synchronizing SCSI cache
> 
> and it hangs right here.  In this situation the host reset does not
> complete before we try to sync, and that appears to be the indicator of
> the problem.  Here's a backtrace; note we're in sd_sync_cache():
> 
>         Call Trace:
>          [<ffffffff8048d88f>] _spin_lock_irqsave+0x1f/0x50
>          [<ffffffff8048daf2>] _spin_unlock_irqrestore+0x12/0x40
>          [<ffffffffa00080fc>] scsi_get_command+0x8c/0xc0 [scsi_mod]
>          [<ffffffff8048c11d>] schedule_timeout+0xad/0xf0
>          [<ffffffff8034df1d>] elv_next_request+0x15d/0x290
>          [<ffffffff8048b1ea>] wait_for_common+0xba/0x170
>          [<ffffffff80237460>] default_wake_function+0x0/0x10
>          [<ffffffff80353b77>] blk_execute_rq+0x67/0xa0
>          [<ffffffff80350e71>] get_request_wait+0x21/0x1d0
>          [<ffffffff8023e972>] vprintk+0x1f2/0x490
>          [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
>          [<ffffffffa000e5a4>] scsi_execute+0xf4/0x150 [scsi_mod]
>          [<ffffffffa000e691>] scsi_execute_req+0x91/0x100 [scsi_mod]
>          [<ffffffffa00f89bc>] sd_sync_cache+0xac/0x100 [sd_mod]
>          [<ffffffff80360000>] compat_blkdev_ioctl+0x80/0x1740
>          [<ffffffff80364062>] kobject_get+0x12/0x20
>          [<ffffffffa00fac51>] sd_shutdown+0x71/0x160 [sd_mod]
>          [<ffffffffa00fad7c>] sd_remove+0x3c/0x80 [sd_mod]
>          [<ffffffffa0012122>] scsi_bus_remove+0x42/0x60 [scsi_mod]
>          [<ffffffff803d8ba9>] __device_release_driver+0x99/0x100
>          [<ffffffff803d8d08>] device_release_driver+0x28/0x40
>          [<ffffffff803d8087>] bus_remove_device+0xb7/0xf0
>          [<ffffffff803d66c9>] device_del+0x119/0x1a0
>          [<ffffffffa001245c>] __scsi_remove_device+0x5c/0xb0 [scsi_mod]
>          [<ffffffffa00124d8>] scsi_remove_device+0x28/0x40 [scsi_mod]
>          [<ffffffffa00125a0>] __scsi_remove_target+0xa0/0xd0 [scsi_mod]
>          [<ffffffffa0012640>] __remove_child+0x0/0x30 [scsi_mod]
>          [<ffffffffa0012656>] __remove_child+0x16/0x30 [scsi_mod]
>          [<ffffffff803d5c3b>] device_for_each_child+0x3b/0x60
>          [<ffffffffa0012606>] scsi_remove_target+0x36/0x70 [scsi_mod]
>          [<ffffffffa010c5f5>] sas_rphy_remove+0x75/0x80 [scsi_transport_sas]
>          [<ffffffffa010c609>] sas_rphy_delete+0x9/0x20 [scsi_transport_sas]
>          [<ffffffffa010c642>] sas_port_delete+0x22/0x140 [scsi_transport_sas]
>          [<ffffffffa013c230>] mptsas_del_end_device+0x230/0x2c0 [mptsas]
>          [<ffffffffa013c8a1>] mptsas_hotplug_work+0x291/0xb20 [mptsas]
>          [<ffffffff80369c9a>] vsnprintf+0x2ea/0x7c0
>          [<ffffffff80287dac>] free_hot_cold_page+0x1fc/0x2f0
>          [<ffffffff80287ed8>] __pagevec_free+0x38/0x50
>          [<ffffffff8028b730>] release_pages+0x180/0x1d0
>          [<ffffffff80362789>] __next_cpu+0x19/0x30
>          [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
>          [<ffffffff80362789>] __next_cpu+0x19/0x30
>          [<ffffffff802321ec>] find_busiest_group+0x1dc/0x960
>          [<ffffffffa013e4a9>] mptsas_firmware_event_work+0xd29/0x1110 [mptsas]
>          [<ffffffff8022dc94>] update_curr+0x84/0xd0
>          [<ffffffff80230370>] __dequeue_entity+0x60/0x90
>          [<ffffffff8048dab1>] _spin_unlock_irq+0x11/0x40
>          [<ffffffff802364fb>] finish_task_switch+0x3b/0xd0
>          [<ffffffff8048b911>] thread_return+0xa3/0x662
>          [<ffffffffa013d780>] mptsas_firmware_event_work+0x0/0x1110 [mptsas]
>          [<ffffffff80250e65>] run_workqueue+0x85/0x150
>          [<ffffffff80250fcf>] worker_thread+0x9f/0x110
>          [<ffffffff802553b0>] autoremove_wake_function+0x0/0x30
>          [<ffffffff80250f30>] worker_thread+0x0/0x110
>          [<ffffffff80254ef7>] kthread+0x47/0x90
>          [<ffffffff80254eb0>] kthread+0x0/0x90
>          [<ffffffff8020d5f9>] child_rip+0xa/0x11
>          [<ffffffff80254eb0>] kthread+0x0/0x90
>          [<ffffffff80254eb0>] kthread+0x0/0x90
>          [<ffffffff8020d5ef>] child_rip+0x0/0x11
> 
> According to sd.c:sd_sync_cache() it's supposed to retry the
> scsi_execute_req() three times then give up, but instead it never
> returns.  It seems that if the host reset is not completed yet, then we
> find this event on the workqueue and get into some kind of deadlock
> situation.
> 
> We're kind of stuck on this and I was wondering if anyone has any
> thoughts or avenues to look at to move us forward on resolving this?

Can you run "cat /sys/class/scsi_host/*/state" when you are in the hung
state?

If the host is in recovery, no I/O will move forward. I assume that if you
can get a run with the 4100 level of logging, it will show a host reset
sent, but no waking up host to restart (unless the reset is being generated
for other reasons outside of the SCSI error handler).
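
If it is easier to script than to run cat by hand, something along these
lines could report which hosts are stuck. It is only a sketch: the function
name hosts_in_recovery is hypothetical, and the set of state strings is an
assumption based on shost_states[] in drivers/scsi/scsi_sysfs.c.

```python
import glob
import os

# State strings assumed from shost_states[] in drivers/scsi/scsi_sysfs.c.
RECOVERY_STATES = {"recovery", "cancel/recovery", "deleted/recovery"}


def hosts_in_recovery(sysfs_root="/sys/class/scsi_host"):
    """Return {host_name: state} for hosts whose state is a recovery state."""
    stuck = {}
    for path in sorted(glob.glob(os.path.join(sysfs_root, "*", "state"))):
        host = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as state_file:
                state = state_file.read().strip()
        except OSError:
            continue  # host went away between the glob and the read
        if state in RECOVERY_STATES:
            stuck[host] = state
    return stuck


if __name__ == "__main__":
    for host, state in hosts_in_recovery().items():
        print(f"{host}: {state}")
```

An empty result while I/O is still hung would suggest the hosts are not
sitting in recovery and the problem lies elsewhere.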

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

Thread overview: 11+ messages
2009-07-02 16:22 [2.6.27.25] Hang in SCSI sync cache when a disk is removed--? Paul Smith
2009-07-02 17:41 ` Mike Anderson [this message]
2009-07-02 20:12   ` Paul Smith
2009-07-06 18:04   ` Paul Smith
2009-07-07  6:25     ` Mike Anderson
2009-07-07 13:58       ` James Bottomley
2009-07-07 14:33         ` Paul Smith
2009-07-07 20:24           ` Desai, Kashyap
2009-07-07 20:45             ` Mike Anderson
2009-07-07 21:10               ` Mike Anderson
2009-07-21 10:16                 ` Desai, Kashyap
