From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <ming.lei@canonical.com>,
	Dongsu Park <dongsu.park@profitbricks.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: panic with CPU hotplug + blk-mq + scsi-mq
Date: Sat, 18 Apr 2015 14:30:48 -0600
Message-ID: <5532BEF8.3070008@kernel.dk>
In-Reply-To: <CACVXFVN_NjC814BqjmAb48do2qMy9EjQ+n3OD2nLz8AngopJXg@mail.gmail.com>

On 04/17/2015 10:23 PM, Ming Lei wrote:
> Hi Dongsu,
>
> On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park
> <dongsu.park@profitbricks.com> wrote:
>> Hi,
>>
>> there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq.
>> Every time a CPU is offlined, some arbitrary range of kernel memory
>> seems to get corrupted. After a while, the kernel panics at random places
>> when block IOs are issued (for example, see the call traces below).
>
> Thanks for the report.
>
>>
>> This bug is easily reproducible with a QEMU VM running virtio-scsi,
>> when its guest kernel is 3.19-rc1 or higher and scsi-mq is loaded
>> with blk-mq enabled. And yes, the 4.0 release is still affected, as well as
>> Jens' for-4.1/core. How to reproduce:
>>
>>   # echo 0 > /sys/devices/system/cpu/cpu1/online
>>   (and issue some block IOs, that's it.)
>>
>> Bisecting between 3.18 and 3.19-rc1, it looks like this bug was hidden
>> until commit ccbedf117f01 ("virtio_scsi: support multi hw queue of blk-mq"),
>> which started to allow virtio-scsi to map virtqueues to hardware queues of
>> blk-mq. Reverting that commit makes the bug go away. However, I suppose
>> reverting it is not the correct solution.
>
> I agree, and that patch only enables multiple hw queues.
>
>>
>> More precisely, every time a CPU hotplug event is triggered,
>> the call graph looks like the following:
>>
>>    blk_mq_queue_reinit_notify()
>>    -> blk_mq_queue_reinit()
>>     -> blk_mq_map_swqueue()
>>      -> blk_mq_free_rq_map()
>>       -> scsi_exit_request()
>>
>> From that point, as soon as any address in the request gets modified, an
>> arbitrary range of memory gets corrupted. My first guess was that
>> the exit routine might try to deallocate tags->rqs[] where invalid
>> addresses are stored. But that does not appear to be the case,
>> and cmd->sense_buffer also looks valid.
>> It's not obvious to me what exactly could go wrong.
>>
>> Does anyone have an idea?
>
> As far as I can see, at least two problems exist:
> - a race between timeout handling and CPU hotplug
> - with shared tags, the setting and checking of hctx->tags during
> CPU online handling
>
> So could you please test the attached two patches to see if they fix your issue?
>
> I ran them in my VM, and the oops does seem to disappear.

Hard to comment on your patches directly when they are attached. Both 
look good to me. I'd perhaps change the ->tags check in #1 to use 
blk_mq_hw_queue_mapped() instead of checking directly. Might even be 
worth considering changing the normal iterator to skip unmapped queues, 
but that can be left for a later change.
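
For illustration only, the suggestion above can be sketched as a small
userspace model. This is not the attached patch and not kernel code: the
structs are cut-down stand-ins with only the fields needed here, and
count_mapped_hw_queues() is a hypothetical helper invented for the example.
blk_mq_hw_queue_mapped() is modeled on the kernel helper of the same name,
assuming it treats a hardware queue as mapped only when it has software
queues assigned and a tag map present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields relevant to the mapped/unmapped check are modeled). */
struct blk_mq_tags {
	unsigned int nr_tags;
};

struct blk_mq_hw_ctx {
	unsigned int nr_ctx;       /* software queues mapped to this hw queue */
	struct blk_mq_tags *tags;  /* NULL once the tag map has been freed */
};

struct request_queue {
	struct blk_mq_hw_ctx **queue_hw_ctx;
	unsigned int nr_hw_queues;
};

/* Modeled on the kernel helper: mapped means "has software queues AND
 * has a tag map", rather than checking ->tags alone. */
static inline bool blk_mq_hw_queue_mapped(const struct blk_mq_hw_ctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}

/* Hypothetical example of an iteration that skips unmapped hw queues. */
static unsigned int count_mapped_hw_queues(const struct request_queue *q)
{
	unsigned int i, mapped = 0;

	assert(q != NULL);
	for (i = 0; i < q->nr_hw_queues; i++) {
		const struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[i];

		if (!blk_mq_hw_queue_mapped(hctx))
			continue; /* no software queues, or tag map is gone */
		mapped++;
	}
	return mapped;
}
```

The point of going through the helper is that a queue with a non-NULL
->tags but no software queues mapped to it is also treated as unmapped,
which a bare ->tags check would miss.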

-- 
Jens Axboe


Thread overview: 9+ messages
2015-04-17  9:41 panic with CPU hotplug + blk-mq + scsi-mq Dongsu Park
2015-04-18  4:23 ` Ming Lei
2015-04-18 20:30   ` Jens Axboe [this message]
2015-04-19 14:31     ` Ming Lei
2015-04-20  8:07   ` Dongsu Park
2015-04-20 13:12     ` Ming Lei
2015-04-20 15:52       ` Dongsu Park
2015-04-20 16:48         ` Ming Lei
2015-04-20 18:36           ` Dongsu Park
