From: Douglas Gilbert <dougg@torque.net>
To: Patrick Mansfield <patmans@us.ibm.com>
Cc: Ralf Oehler <R.Oehler@GDAmbH.com>,
	Scsi <linux-scsi@vger.kernel.org>,
	linux-kernel@vger.kernel.org, Andrea Arcangeli <andrea@suse.de>,
	Jens Axboe <axboe@kernel.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: one-line-patch against SCSI-Read-Error-BUG()
Date: Thu, 07 Feb 2002 00:10:23 -0500
Message-ID: <3C620C3F.92DB3553@torque.net>
In-Reply-To: <XFMail.20020205153210.R.Oehler@GDAmbH.com> <20020206165911.A27458@eng2.beaverton.ibm.com>

Patrick Mansfield wrote:
> 
> On Tue, Feb 05, 2002 at 03:32:10PM +0100, Ralf Oehler wrote:
> > Hi, List
> >
> > I think, I found a very simple solution for this annoying BUG().
> >
> > Since at least kernel 2.4.16 there is a BUG() in pci.h,
> > that crashes the kernel on any attempt to read a SCSI-Sector
> > from an erased MO-Medium and on any attempt to read
> > a sector from a SCSI-disk, which returns "Read-Error".
> >
> > There seems to be a thinko in the corresponding code, which
> > does not take into account the case where a SCSI-READ
> > does not return any data because of a "sense code: read error"
> > or a "sense code: blank sector".
> 
> > Regards,
> >         Ralf
> 
> Ralf -
> 
> A retried IO for a host busy (queuecommand returns 1) or SCSI queue full
> case calls scsi_mlqueue_insert; it eventually calls __scsi_insert_special,
> where we then set:
> 
>     rq->cmd = SPECIAL
> 
> The IO is then resent, and if it completes, it looks like everything will
> complete OK.
> 
> But, if we then get a check condition for the same command, sd.c will call
> scsi_io_completion - it nulls out some buffers, including request_buffer,
> and then it requeues the command without modifying rq->cmd.
> 
> scsi_request_fn() sees that we now have cmd == SPECIAL, and a valid
> SCpnt, and does not re-init the IO - this means request_buffer is still
> NULL.
> 
> I can't tell from your earlier oops stack where exactly you are in the aic
> code, but it could be a dereference of request_buffer (stored in cur_seg),
> but I would think you would have seen a NULL pointer dereference rather
> than hitting the BUG().
> 
> I don't know what conditions on your system might lead to a host busy or
> queue full, but, it seems possible that we could have a queue full or host
> busy immediately followed by a check condition.
> 
> In 2.4, I don't know how this can be fixed without adding another field
> in Scsi_Cmnd, we could just try setting cmd back to not be SPECIAL, but
> the original value is not saved - you could try such a hack and see if
> it still oopses. You could also add a check to pci_map_sg to print
> out the value of sg when the BUG() is hit.
> 
> In 2.5, we might be able to clear REQ_CMD, then set REQ_SPECIAL (in
> __scsi_insert_special), and then clear REQ_SPECIAL and set REQ_CMD
> in scsi_queue_next_request (when SCpnt != NULL).
> 
> You can see if you are hitting the busy cases by turning on SCSI mlcomplete
> (to see queue full retries) and mlqueue logging (to see the host busy retries),
> and looking for log messages. (Turning on scsi logging with syslogd and/or
> klogd? running can flood your system with log messages.)
> 
> It also looks like the race condition checked for in scsi_mlqueue_insert
> is faulty (device_busy or host_busy == 0, but we are just about to
> decrement them, so they can't ever be 0, it should compare them to 1),
> but that would just hang your IO's (on queue full no matter what the
> comparison it could hang IO for clustered systems sharing SCSI devices).

My investigations show that an endless series of calls to
the adapter driver's queuecommand() starts when a
MEDIUM_ERROR is reported. I have added an option to
scsi_debug to simulate a MEDIUM_ERROR at a fixed address
(0x1234). If I use sg to read the scsi_debug ram disk it is
well behaved: sg_dd stops in an orderly fashion at the bad
block.

However, if that bad block is read via sd, then the scsi_debug
driver's queuecommand() entry point is called indefinitely,
trying to reread the range that contains the bad block.


A new scsi_debug driver, version 1.58, has been placed
on http://www.torque.net/sg/sdebug.html
Load the scsi_debug module with "scsi_debug_opts=3" to
get debug output and the medium error at block 0x1234.

So the problem has now been reported on these adapter
drivers: aic7xxx, sym53c8xx and scsi_debug. It is
interesting that sg doesn't have problems, probably
because it sets "retries" to 1 (i.e. don't retry).
It looks like the error handling logic is broken.

Doug Gilbert
