Re: aic7xxx sets CDR offline, how to reset?

From: James Bottomley <James.Bottomley@steeleye.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Bottomley <James.Bottomley@SteelEye.com>,
	"Justin T. Gibbs" <gibbs@scsiguy.com>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: aic7xxx sets CDR offline, how to reset?
Date: Tue, 03 Sep 2002 16:32:38 -0500
Message-ID: <200209032132.g83LWdD09043@localhost.localdomain>
In-Reply-To: Message from Alan Cox <alan@lxorguk.ukuu.org.uk>  of "03 Sep 2002 21:59:37 BST." <1031086777.21579.33.camel@irongate.swansea.linux.org.uk>

alan@lxorguk.ukuu.org.uk said:
> What do we plan to do for the cases where reset is disabled because we
> have shared disk scsi so don't want to reset and hose the reservations

The reset gets issued and the reservation gets broken.  Good HA or other 
software knows the reservation may be lost and takes this into account in the 
cluster algorithm.

With SCSI-2 reservations, there's no way to preserve the reservation and have 
the reset be effective (I know, in theory, that this can be circumvented by 
the soft reset alternative, but I've never seen a device that implements it 
correctly).  I suppose we hope SCSI-3 Persistent Group Reservations come along 
quickly.
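
To make "takes this into account" a little more concrete, here is a minimal
sketch of how cluster software might track reservation ownership across a
reset.  It is not taken from any actual HA product: the state names and the
two helper functions are hypothetical.  The only real values are the SCSI
status codes GOOD (0x00) and RESERVATION CONFLICT (0x18).

```c
/* Hypothetical view a cluster node keeps of its SCSI-2 reservation. */
enum node_state {
    NODE_OWNER,               /* we believe we hold the reservation */
    NODE_RESERVATION_UNKNOWN, /* a reset happened; ownership must be re-proven */
    NODE_NOT_OWNER            /* another initiator holds the reservation */
};

#define SCSI_STATUS_GOOD                 0x00
#define SCSI_STATUS_RESERVATION_CONFLICT 0x18

/* A bus or device reset silently drops a SCSI-2 reservation, so an owner
 * can no longer assume it still holds the device. */
static enum node_state on_bus_reset(enum node_state s)
{
    return s == NODE_OWNER ? NODE_RESERVATION_UNKNOWN : s;
}

/* Interpret the status returned by a RESERVE issued after the reset. */
static enum node_state on_reserve_status(enum node_state s, unsigned char status)
{
    if (status == SCSI_STATUS_GOOD)
        return NODE_OWNER;      /* reservation re-established */
    if (status == SCSI_STATUS_RESERVATION_CONFLICT)
        return NODE_NOT_OWNER;  /* someone else got there first */
    return s;                   /* any other status: no conclusion yet */
}
```

The point of the intermediate "unknown" state is that the node stops treating
the disk as its own the moment a reset is seen, rather than when a later
command happens to fail.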

> If your error correction always requires all commands return to the
> block layer then the block layer is IMHO broken. Its messy enough
> doing that before you hit the fun situations where insert scsi
> commands of their own the block layer never initiated. 

This is part of the slim SCSI down approach.  The block layer already has 
handling for tag errors like this.  Inserted SCSI commands should now work 
correctly since we're deprecating scsi_do_cmnd() in favour of scsi_do_req(), 
which means the command is always associated with a request and goes into the 
block queue just like any other request.

I think the block layer, which already knows about the barrier ordering, is 
the appropriate place for this.  If you think the scsi error handler is a 
hairy wart now, just watch it grow into a stonking great carbuncle as I try to 
introduce it to the concept of command queue ordering and appropriate recovery.

> Next you only need to return stuff if commands have been issued
> between the aborting command and a barrier. Since most sane systems
> will never be causing REQ_BARRIER that should mean the general case
> for an abort is going to be fine. The CD burner example is also true
> for this. If we track barrier sequences then we will know the barrier
> count for the command we are aborting and the top barrier count for
> commands issued to the device. Finally you only need to go to the
> large hammer approach when you are dealing with a media changing
> command (ie WRITE*) - if we abort a read then knowing we don't queue
> overlapping read then write to disk we already know that the read will
> not break down the tag ordering as I understand it ? 

I agree with your reasoning.  However, errors occur infrequently enough (I 
hope) that it's just not worth the extra code complexity to make the error 
handler look for that case.
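
For reference, the check being proposed could be sketched roughly as below.
This is only my reading of the quoted suggestion, under the assumption that
each command records the barrier count in force when it was issued.  Both
structures and the function name are hypothetical; nothing like this exists
in the current SCSI or block code.

```c
#include <stdbool.h>

/* Hypothetical per-device bookkeeping. */
struct dev_queue_state {
    unsigned int top_barrier_seq; /* barrier count of the newest issued command */
};

/* Hypothetical per-command bookkeeping. */
struct queued_cmd {
    unsigned int barrier_seq;     /* barrier count when this command was issued */
    bool changes_media;           /* true for WRITE*-style commands */
};

/*
 * Returns true when aborting 'cmd' on its own cannot disturb ordering,
 * false when the outstanding commands would have to go back to the
 * block layer (the "large hammer").
 */
static bool abort_is_self_contained(const struct dev_queue_state *dev,
                                    const struct queued_cmd *cmd)
{
    /* Aborting a read does not change what ends up on the media. */
    if (!cmd->changes_media)
        return true;

    /* No barrier has been issued since this command was queued. */
    return dev->top_barrier_seq == cmd->barrier_seq;
}
```

Small as it is, this still means carrying a sequence number on every command
and every device purely for the benefit of the error path.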

However, in all honesty, I have to say that I just don't believe ABORTs are 
ever particularly effective.  As part of error recovery, if a device is 
tipping over into failure, adding another message isn't a good way to pull it 
back.  ABORT is really part of the I/O cancellation API, and, like all 
cancellation implementations, it's potentially full of holes.  The only uses 
it might have---like oops I didn't mean to fixate that CD, give it back to me 
now---aren't clearly defined in the SPEC to produce the desired effect (stop 
the fixation so the drive door can be opened).

> If we get to the point we need an abort we don't want to issue a
> reset. Not every device comes back sane from a reset and in some cases
> we have to issue a whole sequence of commands to get the state of the
> device back (door locking, power management, ..)

Well, this is SCSI---the first thing most controllers do for parallel SCSI at 
least is reset the BUS.  Some FC drivers do the FC equivalent as well (not 
that they should, but that's another issue).

The pain of coming back from a reset (and I grant, it isn't trivial) is well 
known and well implemented in SCSI.  It also, from error handling's point of 
view, sets the device back to a known point in the state model.

James
