From: Mike Christie <michaelc@cs.wisc.edu>
To: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi <linux-scsi@vger.kernel.org>,
James Bottomley <jbottomley@parallels.com>,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com>,
Stefan Richter <stefanr@s5r6.in-berlin.de>,
Tomas Henzl <thenzl@redhat.com>,
Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH 2/3] Stop accepting SCSI requests before removing a device
Date: Wed, 30 May 2012 12:27:36 -0500
Message-ID: <4FC65888.3000907@cs.wisc.edu>
In-Reply-To: <4FC5C488.4010307@acm.org>
On 05/30/2012 01:56 AM, Bart Van Assche wrote:
> On 05/29/12 17:35, Mike Christie wrote:
>
>> On 05/29/2012 10:00 AM, Bart Van Assche wrote:
>>> The patch below makes sure that blk_drain_queue() and blk_cleanup_queue()
>>> wait until all queuecommand invocations have finished and hence fixes a
>>> race between the SCSI error handler and __scsi_remove_device(). Any feedback
>>> is welcome.
>>>
>>> ---
>>> drivers/scsi/scsi_error.c | 14 +++++++++++++-
>>> 1 files changed, 13 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>> index 386f0c5..947f627 100644
>>> --- a/drivers/scsi/scsi_error.c
>>> +++ b/drivers/scsi/scsi_error.c
>>> @@ -781,10 +781,17 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>>> struct scsi_device *sdev = scmd->device;
>>> struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>>> struct Scsi_Host *shost = sdev->host;
>>> + struct request_queue *q = sdev->request_queue;
>>> DECLARE_COMPLETION_ONSTACK(done);
>>> unsigned long timeleft;
>>> struct scsi_eh_save ses;
>>> - int rtn;
>>> + int rtn = FAILED;
>>> +
>>> + spin_lock_irq(q->queue_lock);
>>> + if (blk_queue_dead(q))
>>> + goto out_unlock;
>>> + q->rq.count[BLK_RW_SYNC]++;
>>> + spin_unlock_irq(q->queue_lock);
>>
>> Are you hitting a case where a scsi_cmnd does not have a request struct
>> that was allocated through the block layer functions like
>> blk_get_request, but is getting sent through this path? What code is
>> doing this?
>>
>> Or, are you hitting a bug where somehow the request is freed (so the
>> rq.count is decremented) but the scsi eh is still working on a scsi_cmnd
>> that had a request struct allocated for it?
>
>
> I haven't hit any such bugs. This patch is what I came up with after
> analyzing what would be necessary to make sure that queuecommand isn't
> called anymore after blk_cleanup_queue() finished and also to make sure
> that blk_drain_queue() waits until all active queuecommand calls have

It should already be waiting if the scsi_cmnd has a request backing it,
shouldn't it? We allocate a request struct with blk_get_request or
one of the other blk helpers for each scsi_cmnd, and that increments
q->rq.count. If we then go down the error path because a cmd timed
out or because scsi_decide_disposition returned FAILED, that request
still backs the scsi_cmnd and the count is still incremented for it.
When we call scsi_send_eh_cmnd for eh operations, the request is still
there and has not been freed. The request gets freed later, when
scsi_eh_flush_done_q is called. There we either retry or call
scsi_finish_command, which goes through the normal completion process
and eventually calls __blk_put_request and freed_request.
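
To make the accounting argument above concrete, here is a minimal
user-space sketch of it. This is a toy model, not kernel code: the
toy_* types and functions are hypothetical stand-ins invented for
illustration, and the comments only note which kernel function each one
is meant to mirror. It assumes, as the discussion does, that draining
the queue waits for the allocation count to reach zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of request_queue that matters here. */
struct toy_queue {
	int rq_count;		/* mirrors q->rq.count: allocated, not yet freed */
};

/* Hypothetical stand-in for a scsi_cmnd and the request backing it. */
struct toy_cmnd {
	struct toy_queue *q;
	bool has_request;
};

/* Mirrors blk_get_request(): allocating a request bumps the count. */
static void toy_get_request(struct toy_queue *q, struct toy_cmnd *cmd)
{
	q->rq_count++;
	cmd->q = q;
	cmd->has_request = true;
}

/*
 * Mirrors scsi_send_eh_cmnd(): the error handler reuses the command,
 * so the backing request is still allocated and the count is untouched.
 */
static void toy_send_eh_cmnd(struct toy_cmnd *cmd)
{
	assert(cmd->has_request);
}

/*
 * Mirrors scsi_eh_flush_done_q() -> scsi_finish_command() ->
 * __blk_put_request() -> freed_request(): only here is the count dropped.
 */
static void toy_finish_command(struct toy_cmnd *cmd)
{
	cmd->has_request = false;
	cmd->q->rq_count--;
}

/* The condition a drain would wait for in this model. */
static bool toy_queue_drained(const struct toy_queue *q)
{
	return q->rq_count == 0;
}
```

In this model, a caller that does toy_get_request(), then
toy_send_eh_cmnd(), sees toy_queue_drained() stay false until
toy_finish_command() runs, which is the point being made: the count held
by the original request already covers the eh path.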