From: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
linux-nvme@lists.infradead.org,
Keith Busch <keith.busch@intel.com>,
Brian King <brking@linux.vnet.ibm.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: Oops when completing request on the wrong queue
Date: Mon, 05 Sep 2016 09:02:56 -0300
Message-ID: <87mvjmzh8v.fsf@linux.vnet.ibm.com>
In-Reply-To: <5902c166-7aec-b2ae-72d7-07e8efeb5aa9@kernel.dk> (Jens Axboe's message of "Mon, 29 Aug 2016 12:40:32 -0600")
Jens Axboe <axboe@kernel.dk> writes:
> On 08/29/2016 12:06 PM, Gabriel Krisman Bertazi wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>>>> Can you try this patch? It's not perfect, but I'll be interested if it
>>>> makes a difference for you.
>>>
>>
>> Hi Jens,
>>
>> Sorry for the delay. I just got back to this and have been running your
>> patch on top of 4.8 without a crash for over 1 hour. I wanna give it
>> more time to make sure it's running properly, though.
>>
>> Let me get back to you after a few more rounds of test.
>
> Thanks, sounds good. The patches have landed in mainline too.

Hi Jens,

Our test teams ran stress tests on several machines over the last week
on a test kernel with your patches applied, and were no longer able to
reproduce the issue.

Thanks a lot for helping out on this one.

>>> This one should handle the WARN_ON() for running the hw queue on the
>>> wrong CPU as well.
>>
>> On the workaround you added to prevent WARN_ON, we surely need to
>> prevent blk_mq_hctx_next_cpu from scheduling dead cpus in the first
>> place, right.. How do you feel about the following RFC? I know it's
>> not a complete fix, but it feels like a good improvement to me.
>>
>> http://www.spinics.net/lists/linux-scsi/msg98608.html
>
> But we can't completely prevent it, and I don't think we have to. I just
> don't want to trigger a warning for something that's a valid condition.
> I want the warning to trigger if this happens without the CPU going
> offline, since then it's indicative of a real bug in the mapping. Your
> patch isn't going to prevent it either - it'll shrink the window, at the
> expense of making blk_mq_hctx_next_cpu() more expensive. So I don't
> think it's worthwhile.
Right, I see your point. Your patch does prevent the WARN_ON from
triggering on CPU hotplug events as well, so thanks a lot for the help
on that too :)
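
For illustration only, the condition discussed above (warn when the hw
queue runs on a CPU outside its mapping, but stay quiet when that is
explained by the target CPU having gone offline) can be modeled in
userspace roughly as below. This is a minimal sketch, not the kernel
code: struct hwq_model, cpu_in_mask() and wrong_cpu_is_a_bug() are
hypothetical names, and plain 64-bit masks stand in for cpumasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a hardware queue context. */
struct hwq_model {
    uint64_t cpumask;   /* bit n set: CPU n is mapped to this queue */
    unsigned next_cpu;  /* CPU that was chosen to run the queue */
};

/* Test one bit of a 64-bit CPU mask (assumption: at most 64 CPUs). */
static bool cpu_in_mask(uint64_t mask, unsigned cpu)
{
    assert(cpu < 64);
    return (mask >> cpu) & 1u;
}

/*
 * Return true only when running on current_cpu points at a real mapping
 * bug: the CPU is not mapped to the queue even though the CPU that was
 * chosen to run it is still online. If the chosen CPU went offline, the
 * work may legitimately end up elsewhere, so no warning is wanted.
 */
static bool wrong_cpu_is_a_bug(const struct hwq_model *q,
                               unsigned current_cpu,
                               uint64_t online_mask)
{
    bool on_mapped_cpu = cpu_in_mask(q->cpumask, current_cpu);
    bool target_online = cpu_in_mask(online_mask, q->next_cpu);

    return !on_mapped_cpu && target_online;
}
```

With a queue mapped to CPUs 0 and 1 and next_cpu set to 1, running on
CPU 2 counts as a bug while CPU 1 is online, and is tolerated once CPU 1
has been taken offline.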
--
Gabriel Krisman Bertazi