From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <ming.lei@canonical.com>,
	Dongsu Park <dongsu.park@profitbricks.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: panic with CPU hotplug + blk-mq + scsi-mq
Date: Sat, 18 Apr 2015 14:30:48 -0600
Message-ID: <5532BEF8.3070008@kernel.dk>
In-Reply-To: <CACVXFVN_NjC814BqjmAb48do2qMy9EjQ+n3OD2nLz8AngopJXg@mail.gmail.com>

On 04/17/2015 10:23 PM, Ming Lei wrote:
> Hi Dongsu,
>
> On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park
> <dongsu.park@profitbricks.com> wrote:
>> Hi,
>>
>> there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq.
>> Every time a CPU is offlined, some arbitrary range of kernel memory
>> seems to get corrupted. Then, after a while, the kernel panics at random
>> places when block IOs are issued. (for example, see the call traces below)
>
> Thanks for the report.
>
>>
>> This bug is easily reproducible with a QEMU VM running virtio-scsi,
>> when its guest kernel is 3.19-rc1 or higher and scsi-mq is loaded
>> with blk-mq enabled. And yes, the 4.0 release is still affected, as well as
>> Jens' for-4.1/core. How to reproduce:
>>
>>   # echo 0 > /sys/devices/system/cpu/cpu1/online
>>   (and issue some block IOs, that's it.)
>>
>> Bisecting between 3.18 and 3.19-rc1, it looks like this bug had been hidden
>> until commit ccbedf117f01 ("virtio_scsi: support multi hw queue of blk-mq"),
>> which started to allow virtio-scsi to map virtqueues to hardware queues of
>> blk-mq. Reverting that commit makes the bug go away. However, I suppose
>> reverting it could not be a correct solution.
>
> I agree, and that patch only enables multiple hw queues.
>
>>
>> More precisely, every time a CPU hotplug event gets triggered,
>> the call graph looks like the following:
>>
>>    blk_mq_queue_reinit_notify()
>>    -> blk_mq_queue_reinit()
>>     -> blk_mq_map_swqueue()
>>      -> blk_mq_free_rq_map()
>>       -> scsi_exit_request()
>>
>> From that point, as soon as any address in the request gets modified, an
>> arbitrary range of memory gets corrupted. My first guess was that
>> the exit routine could be trying to deallocate tags->rqs[] where invalid
>> addresses are stored. But it looks like that's not the case,
>> and cmd->sense_buffer also looks valid.
>> It's not obvious to me what exactly could go wrong.
>>
>> Does anyone have an idea?
>
> As far as I can see, at least two problems exist:
> - a race between timeout and CPU hotplug
> - with shared tags, the setting and checking of hctx->tags during
> CPU online handling
>
> So could you please test the attached two patches to see if they fix your issue?
>
> I ran them in my VM, and it looks like the oops does disappear.

Hard to comment on your patches directly when they are attached. Both 
look good to me. I'd perhaps change the ->tags check in #1 to use 
blk_mq_hw_queue_mapped() instead of checking directly. Might even be 
worth considering changing the normal iterator to skip unmapped queues, 
but that can be left for a later change.
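
To illustrate the difference between checking ->tags directly and going
through blk_mq_hw_queue_mapped(), here is a minimal user-space sketch. It is
not kernel code: the struct layouts are cut down to two fields, the helper
body (nr_ctx non-zero and tags non-NULL) is an assumption about what the
kernel helper tests, and count_mapped() is a hypothetical stand-in for an
iterator that skips unmapped queues.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumed layout). */
struct blk_mq_tags {
	unsigned int nr_tags;
};

struct blk_mq_hw_ctx {
	unsigned int nr_ctx;       /* software queues mapped to this hw queue */
	struct blk_mq_tags *tags;  /* NULL once the tag map has been freed */
};

/*
 * Assumed to mirror the kernel helper of the same name: a hardware queue
 * counts as mapped only if it has software queues AND a tag map.
 */
static bool blk_mq_hw_queue_mapped(const struct blk_mq_hw_ctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}

/*
 * Hypothetical helper: count the queues an iterator would visit if it
 * skipped unmapped hardware queues.
 */
static unsigned int count_mapped(const struct blk_mq_hw_ctx *hctxs, size_t n)
{
	unsigned int mapped = 0;

	for (size_t i = 0; i < n; i++)
		if (blk_mq_hw_queue_mapped(&hctxs[i]))
			mapped++;
	return mapped;
}
```

Under these assumptions, a queue whose tags pointer is still set but which
has no software queues left (nr_ctx == 0) passes a bare ->tags check yet is
reported as unmapped by the helper, which is the case the direct check
would miss.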

-- 
Jens Axboe



Thread overview: 9+ messages
2015-04-17  9:41 panic with CPU hotplug + blk-mq + scsi-mq Dongsu Park
2015-04-18  4:23 ` Ming Lei
2015-04-18 20:30   ` Jens Axboe [this message]
2015-04-19 14:31     ` Ming Lei
2015-04-20  8:07   ` Dongsu Park
2015-04-20 13:12     ` Ming Lei
2015-04-20 15:52       ` Dongsu Park
2015-04-20 16:48         ` Ming Lei
2015-04-20 18:36           ` Dongsu Park
