Re: Bugs in scsi system, need help to fix

public inbox for linux-scsi@vger.kernel.org

From: Mike Anderson <andmike@us.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: David Brownell <david-b@pacbell.net>,
	Linux SCSI list <linux-scsi@vger.kernel.org>
Subject: Re: Bugs in scsi system, need help to fix
Date: Sun, 13 Apr 2003 23:23:33 -0700
Message-ID: <20030414062333.GA11487@beaverton.ibm.com>
In-Reply-To: <Pine.LNX.4.44L0.0304121546330.3290-100000@netrider.rowland.org>

Alan Stern [stern@rowland.harvard.edu] wrote:
> The first problem is in the error-handling code in scsi_error.c.  The
> scsi_eh_lock_done() function is the callback for a special request
> inserted at the beginning of the queue while restarting normal operations.  
> (It locks the drive's door.)  But this function does not call
> scsi_io_completion(), scsi_end_request(), or scsi_queue_next_request(),
> with the result that the device's request queue stops once the door-lock
> command has been processed.  The callback needs to do _something_ to keep
> the queue going, but I don't know what.

The problem you are hitting is related to missing calls to
scsi_queue_next_request(). Patrick posted a patch in an earlier thread:
http://marc.theaimsgroup.com/?l=linux-scsi&m=104855818826887&w=2

but it does not cover the path that scsi_eh_lock_done() takes. It
appears that we should add a call to scsi_queue_next_request() in
scsi_release_request() when sr_command is set.

> 
> Maybe I'm wrong about that, and the problem doesn't lie in 
> scsi_eh_lock_done().  But there is no doubt that at the end of error 
> recovery, after the door-lock command finishes up no other commands are 
> processed.  Logging just stops after the "Notifying upper driver of 
> completion" message.
> 
> 
> The second problem is a coding mistake in scsi_check_device_busy(), in
> hosts.c.  Here is an excerpt from the source:
> 
> ---------------
> 
> static int scsi_check_device_busy(struct scsi_device *sdev)
> {
> 	struct Scsi_Host *shost = sdev->host;
> 	struct scsi_cmnd *scmd;
> 	unsigned long flags;
> 
> 	/*
> 	 * Loop over all of the commands associated with the
> 	 * device.  If any of them are busy, then set the state
> 	 * back to inactive and bail.
> 	 */
> 	spin_lock_irqsave(&sdev->list_lock, flags);
> 	list_for_each_entry(scmd, &sdev->cmd_list, list) {
> 		if (scmd->request && scmd->request->rq_status != RQ_INACTIVE)
> 			goto active;
> 
> <snip>
> 
> active:
> 	printk(KERN_ERR "SCSI device not inactive - rq_status=%d, target=%d, "
> 			"pid=%ld, state=%d, owner=%d.\n",
> 			scmd->request->rq_status, scmd->device->id,
> 			scmd->pid, scmd->state, scmd->owner);
> 
> (A)	list_for_each_entry(sdev, &shost->my_devices, siblings) {
> 		list_for_each_entry(scmd, &sdev->cmd_list, list) {
> (B)			if (scmd->request->rq_status == RQ_SCSI_DISCONNECTING)
> 				scmd->request->rq_status = RQ_INACTIVE;
> 		}
> 	}
> 
> (C)	spin_unlock_irqrestore(&sdev->list_lock, flags);
> 	printk(KERN_ERR "Device busy???\n");
> 	return 1;
> }
> 
> ---------------
> 
> The line labelled (A) is definitely wrong.  Maybe it doesn't belong there
> at all -- I don't see any reason why a function devoted to checking
> whether a particular device is busy needs to look at any other devices on
> the same host.  Furthermore, line (A) changes the value of the sdev
> variable, which means that line (C) unlocks the wrong spinlock.  Finally,
> judging from the style of the code earlier on, line (B) needs to test that
> scmd->request is non-NULL before dereferencing it.
> 

The code you point to is incorrect. It looks like a few lines were left
behind when the 2.4 code was migrated. If you are hitting a problem
here, the short-term fix is to remove line (A) and add a NULL check on
scmd->request to line (B). The better fix is to stop calling this code
altogether.

-andmike
--
Michael Anderson
andmike@us.ibm.com
