From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <ming.lei@canonical.com>,
Dongsu Park <dongsu.park@profitbricks.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: panic with CPU hotplug + blk-mq + scsi-mq
Date: Sat, 18 Apr 2015 14:30:48 -0600
Message-ID: <5532BEF8.3070008@kernel.dk>
In-Reply-To: <CACVXFVN_NjC814BqjmAb48do2qMy9EjQ+n3OD2nLz8AngopJXg@mail.gmail.com>

On 04/17/2015 10:23 PM, Ming Lei wrote:
> Hi Dongsu,
>
> On Fri, Apr 17, 2015 at 5:41 AM, Dongsu Park
> <dongsu.park@profitbricks.com> wrote:
>> Hi,
>>
>> there's a critical bug regarding CPU hotplug, blk-mq, and scsi-mq.
>> Every time a CPU is offlined, some arbitrary range of kernel memory
>> seems to get corrupted. Then, after a while, the kernel panics at random
>> places when block IOs are issued. (for example, see the call traces below)
>
> Thanks for the report.
>
>>
>> This bug is easily reproducible with a QEMU VM running virtio-scsi,
>> when its guest kernel is 3.19-rc1 or later, and when scsi-mq is loaded
>> with blk-mq enabled. And yes, the 4.0 release is still affected, as well as
>> Jens' for-4.1/core. How to reproduce:
>>
>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>> (and issue some block IOs, that's it.)
>>
>> Bisecting between 3.18 and 3.19-rc1, it looks like this bug had been hidden
>> until commit ccbedf117f01 ("virtio_scsi: support multi hw queue of blk-mq"),
>> which started to allow virtio-scsi to map virtqueues to hardware queues of
>> blk-mq. Reverting that commit makes the bug go away. However, I suppose
>> reverting it would not be the correct solution.
>
> I agree, and that patch only enables multiple hw queues.
>
>>
>> More precisely, every time a CPU hotplug event gets triggered,
>> the call graph looks like the following:
>>
>> blk_mq_queue_reinit_notify()
>>   -> blk_mq_queue_reinit()
>>     -> blk_mq_map_swqueue()
>>       -> blk_mq_free_rq_map()
>>         -> scsi_exit_request()
>>
>> From that point, as soon as any address in the request gets modified, an
>> arbitrary range of memory gets corrupted. My first guess was that the
>> exit routine might try to deallocate tags->rqs[] where invalid
>> addresses are stored. But that does not actually seem to be the case,
>> and cmd->sense_buffer also looks valid.
>> It's not obvious to me what exactly could go wrong.
>>
>> Does anyone have an idea?
>
> As far as I can see, at least two problems exist:
> - a race between timeout and CPU hotplug
> - with shared tags, the setting and checking of hctx->tags during
>   CPU online handling
>
> So could you please test the attached two patches to see if they fix your issue?
>
> I ran them in my VM, and it looks like the oops does disappear.

Hard to comment on your patches directly when they are attached. Both
look good to me. I'd perhaps change the ->tags check in #1 to use
blk_mq_hw_queue_mapped() instead of checking directly. Might even be
worth considering changing the normal iterator to skip unmapped queues,
but that can be left for a later change.
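
To illustrate the suggestion above, here is a minimal userspace sketch, not
kernel code. The struct definitions are simplified stand-ins, and
hw_queue_mapped() and count_mapped_queues() are hypothetical names. The
sketch assumes the real blk_mq_hw_queue_mapped() treats a hardware queue as
mapped only when it has both software queues (nr_ctx) and a tag set (tags);
check block/blk-mq.h in your tree before relying on that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not the
 * real definitions). */
struct blk_mq_tags {
	unsigned int nr_tags;
};

struct blk_mq_hw_ctx {
	unsigned int nr_ctx;       /* software queues mapped to this hw queue */
	struct blk_mq_tags *tags;  /* NULL once the request map has been freed */
};

/* Hypothetical mirror of blk_mq_hw_queue_mapped(): one predicate instead
 * of open-coded "hctx->tags" checks scattered across callers. */
static bool hw_queue_mapped(const struct blk_mq_hw_ctx *hctx)
{
	return hctx->nr_ctx && hctx->tags;
}

/* Sketch of an iterator that skips unmapped queues, as floated for a
 * later change. Here it only counts the queues it would visit. */
static size_t count_mapped_queues(const struct blk_mq_hw_ctx *hctxs, size_t n)
{
	size_t mapped = 0;

	assert(hctxs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!hw_queue_mapped(&hctxs[i]))
			continue;  /* no ctxs or no tags: do not touch it */
		mapped++;
	}
	return mapped;
}
```

The point of routing every caller through one predicate is that a queue
which lost its software queues on CPU offline, or whose tags were freed, is
rejected in the same way everywhere.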
--
Jens Axboe