public inbox for linux-scsi@vger.kernel.org
From: Mike Anderson <andmike@us.ibm.com>
To: Charlie Brett <cfb@ldl.fc.hp.com>
Cc: Linux SCSI list <linux-scsi@vger.kernel.org>,
	dann frazier <dannf@hp.com>
Subject: Re: serial_number and serial_number_at_timeout in 2.4
Date: Mon, 15 Sep 2003 14:09:23 -0700
Message-ID: <20030915210923.GA1702@beaverton.ibm.com>
In-Reply-To: <002e01c37bc2$e838bc10$3906ee0f@americas.cpqcorp.net>

Charlie Brett [cfb@ldl.fc.hp.com] wrote:
> We discovered what appears to be a race problem in scsi_done() on the 2.4
> kernel. On MP systems, we found the following to occur:
> 
> 1) A command is scheduled through scsi_dispatch_cmd(). This will start the
> timeout timer and call queuecommand().
> 2) The command completes and the interrupt handler acquires the lock.
> 3) The timeout occurs before scsi_done()[scsi_old_done()] is called (which
> would have turned off the timeout).
> 4) The timeout routine, scsi_old_times_out(), waits for the lock.
> In scsi_done() the following lines are executed:
> 
>     SCpnt->serial_number = 0;
>     SCpnt->serial_number_at_timeout = 0;
> 
> 5) If it is decided that the command should be retried, the lock is
> released to call scsi_dispatch_cmd().
> 6) As soon as the lock is released, the timeout routine acquires the lock
> and starts to complete. It will either call scsi_abort() or scsi_reset(). In
> each of the routines, serial_number and serial_number_at_timeout are
> compared, which are now both 0, so the routines continue (calling
> scsi_done() a second time).
> 
> To minimize the changes, we would like to propose adding a check at the
> beginning of scsi_old_times_out() to see if the serial number is zero.
> This would prevent the continuation of an abort or reset on a command that
> scsi_done() has already processed.
> 
> Note: If scsi_dispatch_cmd() does start before the timeout routine, a
> new serial number is assigned, which causes the current test to work.
> 
> Any thoughts?
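
The interleaving in steps 2-6 of the report, and the effect of the proposed zero-serial-number check, can be replayed deterministically in a small user-space sketch. It is an illustration under stated assumptions, not the 2.4 code: the struct and the fake_* and replay names are hypothetical, there are no threads or locks (the ordering is hard-coded to the one in the report), the abort/reset path is reduced to "complete again if the two serial numbers match", and the retry in step 5 is assumed not to have reached scsi_dispatch_cmd() yet, so no new serial number has been assigned.

```c
/*
 * Hypothetical, heavily simplified model of the two Scsi_Cmnd fields
 * discussed in this thread; not the 2.4 kernel code. No locking or
 * threads: the interleaving from the report is replayed by hand.
 */
struct fake_cmnd {
    unsigned long serial_number;
    unsigned long serial_number_at_timeout;
    int completions;               /* times the command was completed */
};

/* Step 4: scsi_done() clears both serial numbers and completes. */
static void fake_done(struct fake_cmnd *cmd)
{
    cmd->serial_number = 0;
    cmd->serial_number_at_timeout = 0;
    cmd->completions++;
}

/*
 * Step 6: the timeout path. As described in the report, abort/reset
 * proceed when the two serial numbers match, which ends in a second
 * completion. The proposed fix returns early when serial_number is 0,
 * i.e. when scsi_done() has already processed the command.
 */
static void fake_times_out(struct fake_cmnd *cmd, int with_proposed_check)
{
    if (with_proposed_check && cmd->serial_number == 0)
        return;                    /* already completed: do nothing */
    if (cmd->serial_number == cmd->serial_number_at_timeout)
        fake_done(cmd);            /* abort/reset completes it again */
}

/*
 * Replays the report's ordering: the interrupt handler gets the lock
 * first, the timer fires meanwhile and runs once the lock is dropped.
 * Returns the number of completions seen by the command.
 */
static int replay(int with_proposed_check)
{
    struct fake_cmnd cmd = { 42, 0, 0 };   /* 42: arbitrary serial number */

    fake_done(&cmd);                       /* interrupt handler completes */
    fake_times_out(&cmd, with_proposed_check); /* timeout handler runs */
    return cmd.completions;
}
```

Called from a test main(), replay(0) returns 2 (the double completion from step 6) and replay(1) returns 1. The sketch only covers the ordering in the report; it says nothing about the opposite ordering, where the timeout handler takes the lock first.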

It has been a while since I looked at the old error handler, but
with your change you would still have an issue with the call to
scsi_old_done from the driver if scsi_old_times_out acquired the lock
first. The update_timeout function does not pass the return value
from scsi_delete_timer back to its caller, so scsi_old_done does not
know that the timeout is already running.
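The point about update_timeout can be illustrated with a small user-space sketch of what passing that return value back might look like. It assumes, hypothetically, del_timer()-style semantics for the delete call (1 if the timer was still pending and was deactivated, 0 if it had already fired), models a timer as a bare "pending" flag, and ignores locking and retries. All fake_* names are illustrative and are not the 2.4 functions.

```c
/*
 * Hypothetical stand-ins, not the 2.4 kernel code: a timer is just a
 * "pending" flag, 1 while armed and 0 once it has fired.
 */
struct fake_timer {
    int pending;
};

/* Like del_timer(): returns 1 if a pending timer was deactivated, else 0. */
static int fake_delete_timer(struct fake_timer *t)
{
    int was_pending = t->pending;

    t->pending = 0;
    return was_pending;
}

/*
 * Hypothetical update_timeout() variant that hands the delete result
 * back to its caller instead of dropping it. timeout <= 0 means
 * "cancel only"; a positive timeout re-arms the timer.
 */
static int fake_update_timeout(struct fake_timer *t, int timeout)
{
    int rtn = fake_delete_timer(t);

    if (timeout > 0)
        t->pending = 1;            /* re-arm with the new timeout */
    return rtn;
}

/*
 * Completion path: if the timer was no longer pending, the timeout
 * handler already owns the command, so back off rather than complete
 * it a second time. Returns 1 if this path completed the command.
 */
static int fake_old_done(struct fake_timer *t)
{
    if (!fake_update_timeout(t, 0))
        return 0;                  /* timeout handler won the race */
    return 1;                      /* normal completion */
}
```

With an armed timer, fake_old_done() cancels it and returns 1; with a timer that has already fired, it returns 0 and leaves the command to the timeout handler. A real fix would also need the timeout handler and the completion path to agree on ownership under the lock, which this sketch does not model.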

Moving to the new error handler would be a better answer, but depending
on your driver, plugging the hole in the old error handler may be your
only option.

-andmike
--
Michael Anderson
andmike@us.ibm.com


Thread overview: 5+ messages
2003-09-15  0:41 st.ko with usb-storage problems Matthew Dharm
2003-09-15 18:08 ` Kai Makisara
2003-09-15 19:52   ` serial_number and serial_number_at_timeout in 2.4 Charlie Brett
2003-09-15 21:09     ` Mike Anderson [this message]
2003-09-16 14:08       ` Charlie Brett
