From: Russell King <rmk@arm.linux.org.uk>
To: Patrick Mansfield <patmans@us.ibm.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>,
	linux-scsi@vger.kernel.org
Subject: Re: SCSI woes (followup)
Date: Tue, 24 Sep 2002 20:01:17 +0100
Message-ID: <20020924200117.A4409@flint.arm.linux.org.uk>
In-Reply-To: <20020924111847.A4151@eng2.beaverton.ibm.com>; from patmans@us.ibm.com on Tue, Sep 24, 2002 at 11:18:47AM -0700

On Tue, Sep 24, 2002 at 11:18:47AM -0700, Patrick Mansfield wrote:
> Moving the empty check up sounds like good and simple fix for 2.4, or
> check if queue_depth == 0. Anything else would be difficult to get right.

I disagree here.  Moving the empty check up means that we won't relock any
in-use but idle device.

As I said to James earlier today, Eric Young's comments in the code say
that it is in the wrong place.  Thinking about it for a while, I'd agree
with that statement for the reason above; any device that is in use by
user space (i.e., mounted as a filesystem) could well be idle for many
hours before a request comes through for it.

This means that the door would be unlocked on an in-use device, and the
media can be (accidentally) unloaded.
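
To make the ordering argument concrete, here is a toy model of the two
check orders.  It is only a sketch: struct toy_dev, its fields and both
functions are made-up names, not the real scsi_request_fn() code, and it
assumes the relock is triggered by a per-device "was reset" flag.

```c
#include <stdbool.h>

/* Toy device state; field names are illustrative, not the kernel's. */
struct toy_dev {
    bool was_reset;      /* bus reset seen, door lock state was lost   */
    bool removable;      /* device has removable media                 */
    int  queued;         /* requests waiting on this device's queue    */
    bool relock_issued;  /* set when a door lock would have been sent  */
};

/* One pass of a request function, with the empty check in either place. */
static void toy_request_fn_order(struct toy_dev *d, bool empty_check_first)
{
    if (empty_check_first && d->queued == 0)
        return;                      /* idle device: we leave before relocking */

    if (d->was_reset) {
        if (d->removable)
            d->relock_issued = true; /* this is where the door lock would go */
        d->was_reset = false;
    }

    if (d->queued == 0)
        return;                      /* original position of the empty check */
    d->queued--;                     /* dispatch one request */
}

/* An in-use but idle removable device that has just seen a bus reset. */
static bool toy_idle_relocked(bool empty_check_first)
{
    struct toy_dev d = { .was_reset = true, .removable = true, .queued = 0 };

    toy_request_fn_order(&d, empty_check_first);
    return d.relock_issued;
}
```

In this model toy_idle_relocked(true) returns false: with the empty check
first, the idle device stays unlocked until some request eventually arrives.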

> Moving the SCSI_IOCTL_DOORLOCK doesn't fix the problem if it is
> still called on an incompletely initialized device.

There are _many_ problems here.  Let me sort out a patch later tonight
and put together the gory details of each problem.  It's basically layer
upon layer of crap and fixme's.

I have a set of fixes piling up, some trivial, that address many of these
problems.  Hell, I almost have a completely stable SCSI system here which
I almost can't bring down.  And I'm giving it all sorts of hellish
conditions to deal with.

> And, perhaps do not allow the error handler to run during scanning, let
> later IO (to any discovered device) kick off the error handler. It's
> hard to say if this is good or not - for example, if this is your root
> device, you want it online. But if it is some other device, and we try hard
> to scan and use it, it can cause more problems (if it keeps getting errors,
> and we keep running the error handler/reset cycle, blocking other IO).

No.  You need the error handler to produce bus resets.  Without bus resets,
if your SCSI bus hangs (eg, in my case due to a permanent parity error to
one device brought on by test code) you don't want to continue queueing
IDENTIFY messages to other targets.  They have no hope whatsoever of getting
onto the bus.

You need the error handler to time out the connection to the bad device
and perform a bus reset.  A bus reset is the only way that an initiator
can clear down a stuck bus.

> The problem happens via:
> 
> 1) device A is found that has removable media during scan
> 
> 2) INQUIRY to another device B kicks off error handling before the
> scan has completed, so device A has no command blocks.
> 
> 3) Error handler completion calls scsi_request_fn() for A.
> 
> 4) scsi_request_fn() for A sees the reset happened, and calls scsi_ioctl().
> 
> 5) scsi_ioctl() calls scsi_request_fn(), it cannot get a Scsi_Cmnd, so
> it just returns, incorrectly assuming that another request must be
> outstanding.

Not quite.  It's even more disgusting.  That code is fundamentally wrong, as
proven by my later hangs when the devices have been initialised, and passed
to the device drivers.

1) We queue up a command for device A, _or_ we arrive via your step 3 above.

2) scsi_request_fn() gets called to start this device

3) scsi_request_fn() for A sees that a reset happened, and calls scsi_ioctl()

4) scsi_ioctl() calls scsi_request_fn().

5) the head of the request queue is _not_ the door lock, but the original
   request.

6) we kick off the original request and return from scsi_request_fn()

7) scsi_ioctl() waits for its door lock command to complete.

Ahem, well, it has _never_ been submitted to the host driver.  The done paths
for the original request don't kick the request function.  We deadlock.
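
The sequence above can be sketched as a small single-threaded model.  This
is an illustration under stated assumptions, not the kernel code: struct
toy_queue, toy_request_fn(), toy_ioctl_doorlock() and toy_simulate() are
invented names, the queue is a plain FIFO, and the sleep in scsi_ioctl()
is modelled by simply checking afterwards whether the door lock command
ever reached the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real kernel structures. */
enum req_kind { REQ_ORIGINAL, REQ_DOORLOCK };

struct toy_queue {
    enum req_kind pending[4];  /* FIFO of requests not yet sent to the host */
    size_t head, tail;
    bool was_reset;            /* a reset happened, door needs relocking    */
    bool doorlock_sent;        /* door lock command reached the host driver */
};

static void toy_request_fn(struct toy_queue *q);

/* Stand-in for the door lock ioctl: queue the command, kick the queue. */
static void toy_ioctl_doorlock(struct toy_queue *q)
{
    assert(q->tail < 4);
    q->pending[q->tail++] = REQ_DOORLOCK;
    toy_request_fn(q);         /* step 4: re-enter the request function */
    /* step 7: the real caller would now sleep until the command is done */
}

static void toy_request_fn(struct toy_queue *q)
{
    if (q->was_reset) {        /* step 3 */
        q->was_reset = false;
        toy_ioctl_doorlock(q);
        return;                /* in the real sequence we never get back here */
    }
    if (q->head == q->tail)
        return;
    /* steps 5 and 6: the head of the queue is dispatched, whatever it is */
    if (q->pending[q->head++] == REQ_DOORLOCK)
        q->doorlock_sent = true;
}

/* Returns true if the door lock command was ever handed to the host. */
static bool toy_simulate(bool done_path_kicks_queue)
{
    struct toy_queue q = { .was_reset = true };

    q.pending[q.tail++] = REQ_ORIGINAL;  /* step 1 */
    toy_request_fn(&q);                  /* step 2 */
    if (done_path_kicks_queue)           /* completion of the original request */
        toy_request_fn(&q);
    return q.doorlock_sent;
}
```

toy_simulate(false) returns false: the door lock command stays queued behind
the original request and nothing ever restarts the queue, so a caller waiting
on it would sleep forever.  The done_path_kicks_queue flag is there only for
contrast; it is not meant as the fix.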

Trying to lock the door in scsi_request_fn() in the way we do is
_fundamentally_ flawed.  Even Eric Young's comments agree!

Now, as I said above, I do have a whole raft of fixes accumulating here, and
if I can solve the last few problems, I'll get some patches out for you guys
to look at.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

