From: scameron@beardog.cce.hp.com
To: Douglas Gilbert <dgilbert@interlog.com>
Cc: Tomas Henzl <thenzl@redhat.com>,
james.bottomley@hansenpartnership.com, stephenmcameron@gmail.com,
mikem@beardog.cce.hp.com, linux-scsi@vger.kernel.org,
scott.teel@hp.com, scameron@beardog.cce.hp.com
Subject: Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
Date: Fri, 27 Sep 2013 12:41:34 -0500
Message-ID: <20130927174134.GD31476@beardog.cce.hp.com>
In-Reply-To: <5245B84F.6090605@interlog.com>

On Fri, Sep 27, 2013 at 12:54:39PM -0400, Douglas Gilbert wrote:
> On 13-09-27 10:41 AM, scameron@beardog.cce.hp.com wrote:
> >On Fri, Sep 27, 2013 at 04:01:30PM +0200, Tomas Henzl wrote:
> >>On 09/27/2013 03:34 PM, scameron@beardog.cce.hp.com wrote:
> >>>On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> >>>>On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> >>>>>From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >>>>>
> >>>>>SCSI mid layer doesn't seem to handle logical drives undergoing format
> >>>>>very well. scsi_add_device on such devices seems to result in hitting
> >>>>>those devices with a TUR at a rate of 3Hz for awhile, transitioning
> >>>>>to hitting them with a READ(10) at a much higher rate indefinitely,
> >>>>>and at boot time, this prevents the system from coming up. If we
> >>>>>do not expose such devices to the kernel, it isn't bothered by them.
> >>>>Is the result of this patch that the drive is no more visible for the
> >>>>user and he can't follow the formatting progress?
> >>>Yes (subsequent patch monitors the progress and brings the drive
> >>>online when it's ready).
> >>>
> >>>>I think a better option is to fix the kernel to handle formatting
> >>>>devices better
> >>>Yeah, you're probably right. (This is what comes of writing code for all
> >>>the distros then forward porting to kernel.org code.
> >>>Grumble-grumble-management grumble-grumble real-world problems.)
> >>>
> >>>>or harden the hpsa so it can cope with TURs or reads (ignore) from a
> >>>>formatting device.
> >>>I don't think hpsa driver had any problem with the TURs or READs though,
> >>>they would be returned to the mid layer just fine (TUR returned sense
> >>>data indicating not ready, format in progress, I forget what the reads
> >>>returned, whatever the firmware filled in for the sense data, which
> >>>was reasonable), but the mid-layer was relentless and just never
> >>>really proceeded, iirc.
> >>>
> >>>Since we were trying to make this work on existing OSes where fixing the
> >>>SCSI mid layer wasn't an option, we came up with this.
> >>
> >>I'm actually glad that you care about existing OSes :)
> >
> >And the pain of porting would be much the same regardless of
> >whether the port is forward or backward, I suppose.
> >
> >>
> >>Do you know whether the midlayer has similar problems with other drivers?
> >
> >No, not sure. One thing that's a bit unusual about hpsa is it uses
> >the scan_start and scan_finished members of scsi_host_template, so hpsa
> >does its own scanning, rather than let the midlayer do the scanning which
> >is due to Smart Array's weirdness around the vicinity of SCSI_REPORT_LUNS.
> >
> >I suspect that a lld driver calling scsi_add_device() on something which
> >is NOT READY/FORMAT IN PROGRESS is what provokes the trouble. Most drivers
> >do not call scsi_add_device() directly at all, so it's quite possible most
> >drivers do not experience such a problem. A few do call scsi_add_device()
> >directly, like ipr or pmcraid, so these might conceivably have a similar
> >problem.
> >
> >We ran into this problem with what we call "Rapid Parity Initialization",
> >which is what you get when the RAID controller leaves the logical volume
> >in a NOT READY/FORMAT IN PROGRESS state and devotes itself entirely to
> >initializing parity data and when that's done, then the volume starts
> >acting normally.
> >
> >Initializing the parity data can take quite a long time (hours), but not as
> >long as initializing it on the fly under load, which, with very large,
> >relatively slow drives can take nigh on forever, hence the "rapid" parity
> >initialization moniker. So, if those other RAID controllers don't have a
> >similar feature that produces a relatively long lived NOT READY/FORMAT IN
> >PROGRESS state, they may not bump into the problem.
>
> {0x04,0x04,"Logical unit not ready, format in progress"},
> {0x04,0x05,"Logical unit not ready, rebuild in progress"},
> {0x04,0x06,"Logical unit not ready, recalculation in progress"},
> {0x04,0x07,"Logical unit not ready, operation in progress"},
> ...
> {0x04,0x1b,"Logical unit not ready, sanitize in progress"},
>
> Wouldn't perhaps 0x4,0x5 be more accurate? If someone managed to
> send a FORMAT UNIT or SANITIZE to a physical drive behind your LV,
> that would be a completely different issue.
Perhaps, but 0x04/0x04 is what the firmware returns in this instance.
-- steve
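
For readers following the ASC/ASCQ exchange above, here is a minimal sketch
in C of how the "Logical unit not ready, format in progress" condition
(sense key 0x02, ASC 0x04, ASCQ 0x04) might be recognized in a raw sense
buffer. This is not code from the hpsa patch: the names parse_sense(),
lun_format_in_progress() and struct sense_triple are hypothetical and exist
only for illustration. The byte offsets assume the standard SPC fixed
(0x70/0x71) and descriptor (0x72/0x73) sense data layouts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the SPC sense key and ASC/ASCQ tables quoted in the thread. */
#define SENSE_KEY_NOT_READY      0x02
#define ASC_LUN_NOT_READY        0x04
#define ASCQ_FORMAT_IN_PROGRESS  0x04

/* Hypothetical helper type: the three fields that identify a condition. */
struct sense_triple {
    uint8_t key;
    uint8_t asc;
    uint8_t ascq;
};

/*
 * Extract sense key / ASC / ASCQ from a raw sense buffer.
 * Returns false if the buffer is too short or the format is unknown.
 */
bool parse_sense(const uint8_t *buf, size_t len, struct sense_triple *out)
{
    if (buf == NULL || out == NULL || len < 1)
        return false;

    switch (buf[0] & 0x7f) {        /* response code, VALID bit masked off */
    case 0x70:
    case 0x71:                      /* fixed format */
        if (len < 14)
            return false;
        out->key  = buf[2] & 0x0f;
        out->asc  = buf[12];
        out->ascq = buf[13];
        return true;
    case 0x72:
    case 0x73:                      /* descriptor format */
        if (len < 4)
            return false;
        out->key  = buf[1] & 0x0f;
        out->asc  = buf[2];
        out->ascq = buf[3];
        return true;
    default:
        return false;
    }
}

/* True only for NOT READY with ASC/ASCQ 0x04/0x04 (format in progress). */
bool lun_format_in_progress(const uint8_t *buf, size_t len)
{
    struct sense_triple s;

    if (!parse_sense(buf, len, &s))
        return false;
    return s.key == SENSE_KEY_NOT_READY &&
           s.asc == ASC_LUN_NOT_READY &&
           s.ascq == ASCQ_FORMAT_IN_PROGRESS;
}
```

A driver that does its own scanning could, in principle, use a check like
this on the sense data returned by a TEST UNIT READY to decide whether to
defer exposing a volume. Note that it deliberately does not match 0x04/0x05
(rebuild in progress), since 0x04/0x04 is what the firmware is said to
return in this case.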