linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bart Van Assche <bvanassche@acm.org>
To: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
	Tomas Henzl <thenzl@redhat.com>,
	"'linux-scsi@vger.kernel.org'" <linux-scsi@vger.kernel.org>,
	Stanislaw Gruszka <sgruszka@redhat.com>
Subject: Re: [RFC] How to fix an async scan - rmmod race?
Date: Thu, 12 Apr 2012 10:48:20 +0000	[thread overview]
Message-ID: <4F86B2F4.3020101@acm.org> (raw)
In-Reply-To: <4F860572.6090404@cs.wisc.edu>

On 04/11/12 22:28, Mike Christie wrote:

> On 04/11/2012 02:47 PM, Bart Van Assche wrote:
>> disadvantage is that this approach will only work fine if the LLD stops
>> I/O completion notifications before invoking scsi_remove_host(). Several
> 
> I don't think you would want to do that, because you have IO from the
> sd_shutdown path that you do want to execute. After the remove/shutdown
> callouts have been run then you do not want new IO to be sent to the LLD.
> 
> So scsi_remove_host sets the host state to cancel initially. It then
> calls scsi_forget_host which will loop over devices and remove them.
> That could cause IO to be sent by functions like sd_shutdown.


So that means that with an operational transport layer it's wrong for a
SCSI LLD to stop processing SCSI commands before scsi_remove_host()
finished ? It looks like several SCSI LLD authors are not aware of this.
I have found several examples of high-profile SCSI LLD drivers in the
kernel tree that cause newly submitted SCSI commands to fail during
kernel module removal even before scsi_remove_host() gets invoked.

> After the ULD code is run __scsi_remove_device will set the state to
> SDEV_DEL and scsi_remove_host will then set the state to SHOST_DEL. So
> that would prevent new IO from getting queued.
> 
> But then is there a race that you were hitting?


scsi_remove_host() can get invoked after the SCSI core has submitted a
request to the LLD via queuecommand() but before the LLD has received
the I/O completion notification that will be generated once that request
finishes. I see three alternatives to handle this:
- The LLD stops I/O completion notifications before invoking
  scsi_remove_host() (which is not correct because it prevents
  sd_shutdown() to send SCSI commands to the device).
- The SCSI core keeps the LLD around long enough until it is sure that
  no new I/O notifications will arrive.
- The SCSI LLD stops I/O completion notifications after having invoked
  scsi_remove_host() and kills all pending SCSI commands before
  continuing with LLD-specific host removal tasks. As far as I can see
  the SCSI core doesn't provide a function yet that would allow an
  LLD to kill all pending requests. Maybe blk_abort_queue() could be
  helpful here.

Bart.

  reply	other threads:[~2012-04-12 10:48 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-05 13:58 [RFC] How to fix an async scan - rmmod race? Tomas Henzl
2012-04-05 15:57 ` Mike Christie
2012-04-05 16:05   ` Mike Christie
2012-04-05 18:00 ` Bart Van Assche
2012-04-05 21:29   ` Mike Christie
2012-04-06  9:24     ` Bart Van Assche
2012-04-06 17:22       ` Mike Christie
2012-04-06 18:37         ` Bart Van Assche
2012-04-11 21:46         ` Mike Christie
2012-04-06  9:54     ` Tomas Henzl
2012-04-06 15:20       ` James Bottomley
2012-04-06 16:15         ` Bart Van Assche
2012-04-06 16:35           ` James Bottomley
2012-04-06 17:01             ` Bart Van Assche
2012-04-06 17:15               ` James Bottomley
2012-04-06 17:59                 ` Bart Van Assche
2012-04-08 17:38                 ` Bart Van Assche
2012-04-11 18:17                   ` Mike Christie
2012-04-11 18:30                     ` Mike Christie
2012-04-11 19:47                     ` Bart Van Assche
2012-04-11 22:28                       ` Mike Christie
2012-04-12 10:48                         ` Bart Van Assche [this message]
2012-04-06  9:39 ` Bart Van Assche
2012-04-06 10:14   ` Tomas Henzl
2012-04-06 13:13     ` Tomas Henzl
2012-04-06 14:38       ` Bart Van Assche
2012-04-06 15:32         ` Tomas Henzl
2012-04-12 12:48 ` [RFC] How to fix an async scan - rmmod race? try_module_get Tomas Henzl
2012-04-18 16:48   ` [RFC] How to fix an async scan - 'rmmod --wait' race? Tomas Henzl
2012-04-18 18:18     ` Bart Van Assche
2012-05-17  8:42     ` James Bottomley
2012-05-17  8:55       ` Bart Van Assche
2012-05-17  9:01         ` James Bottomley
2012-05-17 14:51           ` Tomas Henzl
2012-05-22 10:05             ` James Bottomley
2012-05-25 15:13               ` Tomas Henzl
2012-05-25 18:46                 ` Dan Williams
2012-05-28 11:58                   ` Tomas Henzl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F86B2F4.3020101@acm.org \
    --to=bvanassche@acm.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    --cc=sgruszka@redhat.com \
    --cc=thenzl@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).