From: Mike Christie <michaelc@cs.wisc.edu>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Chanho Min <chanho0207@gmail.com>,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] fix NULL-pointer dereference on scsi_run_queue
Date: Thu, 02 Aug 2012 22:01:42 -0500
Message-ID: <501B3F16.3090308@cs.wisc.edu>
In-Reply-To: <1343900093.5073.15.camel@dabdike.int.hansenpartnership.com>

On 08/02/2012 04:34 AM, James Bottomley wrote:
> On Thu, 2012-08-02 at 18:28 +0900, Chanho Min wrote:
>> On Thu, Aug 2, 2012 at 5:57 PM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>>> On Thu, 2012-08-02 at 17:41 +0900, Chanho Min wrote:
>>>> This patch fixes an oops from a torn-down device. When
>>>> scsi_run_queue processes starved queues, scsi_request_fn can race with
>>>> scsi_remove_device. In this case, rarely, scsi_request_fn releases the
>>>> last reference and sets sdev->request_queue to NULL. This results in a
>>>> NULL-pointer dereference when spin_unlock is called on (NULL)->
>>>> queue_lock. We need to take an extra reference to the device on both
>>>> sides of the __blk_run_queue call to hold it until scsi_request_fn
>>>> has finished.
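To make the proposed fix concrete, here is a small userspace model in C. It assumes a plain integer refcount where dropping the last reference frees the queue and clears the pointer, as the description above reports; the model_* names are hypothetical and are not the kernel API. It only shows why a get/put pair around the run keeps request_queue valid until scsi_request_fn returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model: not the kernel's struct scsi_device. */
struct model_queue { int queue_lock; };

struct model_sdev {
	int refcount;
	struct model_queue *request_queue;
};

static void model_get(struct model_sdev *sdev)
{
	sdev->refcount++;
}

/* Dropping the last reference tears down the queue and clears the
 * pointer, mirroring what the patch description reports. */
static void model_put(struct model_sdev *sdev)
{
	if (--sdev->refcount == 0) {
		free(sdev->request_queue);
		sdev->request_queue = NULL;
	}
}

/* Stand-in for scsi_request_fn racing with removal: it releases the
 * reference it was holding. */
static void model_request_fn(struct model_sdev *sdev)
{
	model_put(sdev);
}

/* Stand-in for the starved-queue loop around __blk_run_queue. Returns
 * true if request_queue is still valid at the point where the caller
 * would unlock request_queue->queue_lock. */
static bool model_run_starved(struct model_sdev *sdev, bool pin)
{
	bool valid;

	if (pin)
		model_get(sdev);      /* extra reference before running */
	model_request_fn(sdev);
	valid = sdev->request_queue != NULL;
	if (pin)
		model_put(sdev);      /* matching put afterwards */
	return valid;
}
```

Without the pin, the queue is gone by the time the caller wants its lock; with it, the queue survives until the matching put. Whether that is a real fix or just hides the missing reference is what the rest of the thread questions.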
>>>
>>> You need a recent kernel with this patch:
>>>
>>> commit 940f5d47e2f2e1fa00443921a0abf4822335b54d
>>> Author: Bart Van Assche <bvanassche@acm.org>
>>> Date:   Fri Jun 29 15:34:26 2012 +0000
>>>
>>>     [SCSI] Avoid dangling pointer in scsi_requeue_command()
>>>
>>> James
>> It is different from my case. This occurred inside scsi_run_queue
>> while processing the starved_list.
>> Another sdev is obtained from the starved_list.
> 
> Does it occur with that patch applied?
> 
> If it does, the likely fix would be to take a copy of the queue ... but
> I'd like to understand why first.  An active command has an automatic
> reference to the sdev_gendev, so it shouldn't be the normal case.  This
> was broken by unprep because it releases the command from the queue and
> drops the reference.  We may have another case like unprep, but in that
> case, we need to find it ... trying to add extra get/put_device() calls
> will paper over the problem.
> 

I think the problem is that __scsi_remove_device now waits for
commands to be dequeued and run before proceeding, but we do not take a
device off the starved list until scsi_device_dev_release_usercontext
runs. Or, looking at it another way, scsi_kill_request does not
remove sdevs from the starved list when the device is being removed.

So let's say we hit the not_ready path in scsi_request_fn and put the
sdev on the starved list. Then we remove the device. We could end up
putting the device in SDEV_DEL and then calling scsi_request_fn via
blk_cleanup_queue's drain-queue call. scsi_request_fn would hit the
scsi_device_online check and fail the IO, but from what I can tell we
never take the sdev off the starved list.
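The sequence above can be modeled in userspace as follows. This is a sketch under assumptions: a tiny hand-rolled list in place of the kernel's list_head, and hypothetical model_* helpers standing in for scsi_request_fn's not_ready path and the drain-time online check. With unlink_on_kill false, the removed device is still on the host's starved list once its queue has drained; the unlink_on_kill branch is a hypothetical illustration of having the kill path take it off, not a tested patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly-linked list, loosely modeled on the kernel's
 * struct list_head (a userspace re-implementation, not the real one). */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

enum model_sdev_state { MODEL_SDEV_RUNNING, MODEL_SDEV_DEL };

struct model_sdev {
	enum model_sdev_state state;
	struct list_node starved_entry;   /* linked into host starved_list */
	int queued_io;
};

struct model_host { struct list_node starved_list; };

/* not_ready path: the device cannot get a command, so park it. */
static void model_request_fn_not_ready(struct model_host *h,
				       struct model_sdev *s)
{
	if (list_empty(&s->starved_entry))
		list_add_tail(&s->starved_entry, &h->starved_list);
}

/* Drain call after SDEV_DEL: the online check fails the IO. With
 * unlink_on_kill set (hypothetical fix), the sdev is also unlinked. */
static void model_request_fn_drain(struct model_sdev *s, bool unlink_on_kill)
{
	if (s->state == MODEL_SDEV_DEL) {
		s->queued_io = 0;             /* IO failed, queue now empty */
		if (unlink_on_kill)
			list_del_init(&s->starved_entry);
	}
}

/* Returns true if the removed device is still reachable from the
 * host's starved list after its queue has drained. */
static bool model_scenario(bool unlink_on_kill)
{
	struct model_host h;
	struct model_sdev s = { .state = MODEL_SDEV_RUNNING, .queued_io = 1 };

	list_init(&h.starved_list);
	list_init(&s.starved_entry);

	model_request_fn_not_ready(&h, &s);  /* sdev parked on starved list */
	s.state = MODEL_SDEV_DEL;            /* device removal begins */
	model_request_fn_drain(&s, unlink_on_kill);
	assert(s.queued_io == 0);            /* removal may now proceed */

	return !list_empty(&h.starved_list);
}
```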

Now there is no IO in the queue, so __scsi_remove_device continues.
It then calls scsi_device_dev_release_usercontext at the same time as
some other thread is calling scsi_run_queue, and we race. scsi_run_queue
splices the starved list, which contains the sdev we are trying to
remove, deletes that sdev's entry from the list, and drops the host
lock. But then scsi_device_dev_release_usercontext grabs the host lock,
runs the entire function, and frees the queue. Then scsi_run_queue
tries to access the sdev and the queue so it can grab the queue lock,
which was just freed, and kablewy.
