From: Douglas Gilbert <dougg@torque.net>
To: Patrick Mansfield <patmans@us.ibm.com>
Cc: Ralf Oehler <R.Oehler@GDAmbH.com>,
	Scsi <linux-scsi@vger.kernel.org>,
	linux-kernel@vger.kernel.org, Andrea Arcangeli <andrea@suse.de>,
	Jens Axboe <axboe@kernel.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: one-line-patch against SCSI-Read-Error-BUG()
Date: Thu, 07 Feb 2002 00:10:23 -0500
Message-ID: <3C620C3F.92DB3553@torque.net>
In-Reply-To: <XFMail.20020205153210.R.Oehler@GDAmbH.com> <20020206165911.A27458@eng2.beaverton.ibm.com>

Patrick Mansfield wrote:
> 
> On Tue, Feb 05, 2002 at 03:32:10PM +0100, Ralf Oehler wrote:
> > Hi, List
> >
> > I think, I found a very simple solution for this annoying BUG().
> >
> > Since at least kernel 2.4.16 there is a BUG() in pci.h,
> > that crashes the kernel on any attempt to read a SCSI-Sector
> > from an erased MO-Medium and on any attempt to read
> > a sector from a SCSI-disk, which returns "Read-Error".
> >
> > There seems to be a thinko in the corresponding code, which
> > does not take into account the case where a SCSI-READ
> > does not return any data because of a "sense code: read error"
> > or a "sense code: blank sector".
> 
> > Regards,
> >         Ralf
> 
> Ralf -
> 
> A retried IO for a host busy (queuecommand returns 1) or SCSI queue full
> case calls scsi_mlqueue_insert; it eventually calls __scsi_insert_special,
> where we then set:
> 
>     rq->cmd = SPECIAL
> 
> The IO is then resent, and if it completes, it looks like everything will
> complete OK.
> 
> But, if we then get a check condition for the same command, sd.c will call
> scsi_io_completion - it nulls out some buffers, including request_buffer,
> and then it requeues the command without modifying rq->cmd.
> 
> scsi_request_fn() sees that we now have cmd == SPECIAL, and a valid
> SCpnt, and does not re-init the IO - this means request_buffer is still
> NULL.
> 
> I can't tell from your earlier oops stack where exactly you are in the aic
> code, but it could be a dereference of request_buffer (stored in cur_seg),
> but I would think you would have seen a NULL pointer dereference rather
> than hitting the BUG().
> 
> I don't know what conditions on your system might lead to a host busy or
> queue full, but, it seems possible that we could have a queue full or host
> busy immediately followed by a check condition.
> 
> In 2.4, I don't know how this can be fixed without adding another field
> in Scsi_Cmnd, we could just try setting cmd back to not be SPECIAL, but
> the original value is not saved - you could try such a hack and see if
> it still oopses. You could also add a check to pci_map_sg to print
> out the value of sg when the BUG() is hit.
> 
> In 2.5, we might be able to clear REQ_CMD, then set REQ_SPECIAL (in
> __scsi_insert_special), and then clear REQ_SPECIAL and set REQ_CMD
> in scsi_queue_next_request (when SCpnt != NULL).
> 
> You can see if you are hitting the busy cases by turning on SCSI mlcomplete
> (to see queue full retries) and mlqueue logging (to see the host busy retries),
> and looking for log messages. (Turning on scsi logging with syslogd and/or
> klogd? running can flood your system with log messages.)
> 
> It also looks like the race condition checked for in scsi_mlqueue_insert
> is faulty (device_busy or host_busy == 0, but we are just about to
> decrement them, so they can't ever be 0; it should compare them to 1),
> but that would just hang your IOs (on queue full, no matter what the
> comparison, it could hang IO for clustered systems sharing SCSI devices).
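
The sequence described above (host busy or queue full, then a check
condition) can be modelled in a few lines of user-space C. This is
only a toy sketch of that description, not kernel code: struct
toy_cmd, toy_request_fn() and toy_final_buffer_is_null() are
hypothetical names, and the assumption that a non-SPECIAL request
always gets its buffer set up again is taken from the explanation
above rather than checked against the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_rq_cmd { TOY_READ, TOY_SPECIAL };  /* stands in for rq->cmd */

struct toy_cmd {
    enum toy_rq_cmd rq_cmd;   /* models rq->cmd */
    void *request_buffer;     /* models request_buffer in the command */
};

/* Models the request function: it sets up the buffer only when the
 * request is not marked SPECIAL (assumption, see lead-in). */
static void toy_request_fn(struct toy_cmd *c, void *data)
{
    assert(c != NULL);
    if (c->rq_cmd != TOY_SPECIAL)
        c->request_buffer = data;
}

/* Runs the sequence and reports whether the final issue of the
 * command would go out with a NULL buffer. */
static bool toy_final_buffer_is_null(bool busy_first)
{
    static char data[512];
    struct toy_cmd c = { TOY_READ, NULL };

    toy_request_fn(&c, data);      /* first issue, buffer set up */
    if (busy_first) {
        c.rq_cmd = TOY_SPECIAL;    /* busy or queue full: requeued as special */
        toy_request_fn(&c, data);  /* resent, buffer still valid */
    }
    c.request_buffer = NULL;       /* check condition: buffer cleared, rq_cmd untouched */
    toy_request_fn(&c, data);      /* requeued once more */
    return c.request_buffer == NULL;
}
```

In this model only the busy-then-check-condition ordering leaves the
buffer NULL; a check condition on its own gets the buffer set up again.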

My investigations show that an endless series of calls to
the adapter driver's queuecommand() starts when a MEDIUM
ERROR is reported. I have added an option to scsi_debug to
simulate a MEDIUM ERROR at a fixed address (0x1234). If I
use sg to read the scsi_debug ram disk it is well behaved:
sg_dd stops in an orderly fashion at the bad block.

However, if that bad block is read via sd, then the scsi_debug
driver's queuecommand() entry point is called indefinitely,
trying to reread the range that contains the bad block.


A new scsi_debug driver, version 1.58, has been placed
at http://www.torque.net/sg/sdebug.html
Load the scsi_debug module with "scsi_debug_opts=3" to
get debug output and the medium error at block 0x1234.
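
As a rough illustration of what "scsi_debug_opts=3" selects, the
value can be read as two flag bits: one for debug output and one for
the simulated medium error. The sketch below is a user-space toy, not
the driver's code. The bit assignment and all TOY_ names are
assumptions made for the example; only the address 0x1234 comes from
the text, and 0x03 is the SCSI sense key for MEDIUM ERROR.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_OPT_NOISE          1u       /* assumed: bit 0 turns on debug output */
#define TOY_OPT_MEDIUM_ERR     2u       /* assumed: bit 1 turns on the simulated error */
#define TOY_MEDIUM_ERR_ADDR    0x1234u  /* bad block address given in the text */
#define TOY_SENSE_NO_SENSE     0x00     /* SCSI sense key: NO SENSE */
#define TOY_SENSE_MEDIUM_ERROR 0x03     /* SCSI sense key: MEDIUM ERROR */

/* Return the sense key a read of `num` blocks starting at `lba`
 * would report under the given option bits. */
static int toy_read_sense_key(unsigned opts, unsigned lba, unsigned num)
{
    assert(num > 0);
    /* true when [lba, lba + num) contains the bad block */
    bool covers = lba <= TOY_MEDIUM_ERR_ADDR &&
                  TOY_MEDIUM_ERR_ADDR - lba < num;

    if ((opts & TOY_OPT_MEDIUM_ERR) && covers)
        return TOY_SENSE_MEDIUM_ERROR;
    return TOY_SENSE_NO_SENSE;
}
```

With opts set to 3, any read whose range covers block 0x1234 reports
the error; a read that ends just before it, or a run with the error
bit clear, does not.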

So the problem has now been reported on these adapter
drivers: aic7xxx, sym53c8xx and scsi_debug. It is
interesting that sg doesn't have problems, probably
because it sets "retries" to 1 (i.e. don't retry).
It looks like the error handling logic is broken.
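
The sg observation can be illustrated with a toy model of a bounded
attempt count. This is a sketch only: it does not reproduce the
kernel's retry logic or explain the loop seen via sd, and toy_queue(),
toy_read() and toy_calls are hypothetical names. It only shows that a
persistent error at one block costs exactly as many driver calls as
the bound allows, so a bound of 1 stops at the first failure.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BAD_BLOCK 0x1234u   /* block that always fails, as in the text */

static unsigned toy_calls;      /* counts calls into the toy driver entry point */

/* Toy driver entry point: fails whenever the bad block is requested. */
static bool toy_queue(unsigned lba)
{
    toy_calls++;
    return lba != TOY_BAD_BLOCK;
}

/* Issue a read, allowing at most `allowed` attempts in total.
 * Returns true on success, false once the attempts are used up. */
static bool toy_read(unsigned lba, unsigned allowed)
{
    assert(allowed > 0);
    for (unsigned attempt = 0; attempt < allowed; attempt++) {
        if (toy_queue(lba))
            return true;
    }
    return false;
}
```

Calling toy_read(0x1234, 1) fails after a single driver call, while a
larger bound just repeats the same failing call that many times.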

Doug Gilbert
