public inbox for linux-scsi@vger.kernel.org
From: "Zephaniah E. Loss-Cutler-Hull" <warp@aehallh.com>
To: Paolo Valente <paolo.valente@linaro.org>, Jens Axboe <axboe@kernel.dk>
Cc: "Zephaniah E. Loss-Cutler-Hull" <warp-spam_kernel@aehallh.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	linux-scsi@vger.kernel.org
Subject: Re: General protection fault with use_blk_mq=1.
Date: Thu, 29 Mar 2018 02:12:39 -0700
Message-ID: <0a96375b-5a2e-e828-fa1f-a14af192be4c@aehallh.com>
In-Reply-To: <882A26D2-BEB8-4CE3-B132-0DE31BFD5D28@linaro.org>



On 03/28/2018 10:13 PM, Paolo Valente wrote:
> 
> 
>> On 29 Mar 2018, at 05:22, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>>> On 03/28/2018 06:02 PM, Jens Axboe wrote:
>>>> On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
>>>>> I am not subscribed to any of the lists on the To list here, please CC
>>>>> me on any replies.
>>>>>
>>>>> I am encountering a fairly consistent crash anywhere from 15 minutes to
>>>>> 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1
>>>>>
>>>>> The crash looks like:
>>>>>
>>>
>>>>>
>>>>> Looking through the code, I'd guess that this is dying inside
>>>>> blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
>>>>> is pointing at.
>>>>
>>>> Leaving the whole thing here for Paolo - it's crashing off insertion of
>>>> a request coming out of SG_IO. Don't think we've seen this BFQ failure
>>>> case before.
>>>>
>>>> You can mitigate this by switching the scsi-mq devices to mq-deadline
>>>> instead.
>>>>
>>>
>>> I'm thinking that I should also be able to mitigate it by disabling
>>> CONFIG_DEBUG_BLK_CGROUP.
>>>
>>> That should remove that entire chunk of code.
>>>
>>> Of course, that won't help if this is actually a symptom of a bigger
>>> problem.
>>
>> Yes, it's not a given that it will fully mask the issue at hand. But
>> turning off BFQ has a much higher chance of working for you.
>>
>> This time actually CC'ing Paolo.
>>
> 
> Hi Zephaniah,
> if you are actually interested in the benefits of BFQ (low latency,
> high responsiveness, fairness, ...) then it may be worth trying what
> you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP.  That option
> also enables the heavy computation of debug cgroup statistics, which
> you probably don't use.

I definitely am.
> 
> In addition, the outcome of your attempt without
> CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
> - if no failure occurs, then the issue is likely to be confined to
> that debugging code (which, on the bright side, is likely to be of
> occasional interest, for only a handful of developers)
> - if the issue still shows up, then we may have new hints on this odd
> failure
> 
> Finally, consider that this issue has been reported to disappear from
> 4.16 [1], and, as a plus, that the service quality of BFQ had a
> further boost exactly from 4.16.

I look forward to that either way then.
> 
> Looking forward to your feedback, in case you try BFQ without
> CONFIG_DEBUG_BLK_CGROUP,

I'm running that now. Judging from past crashes, if it survives until
tomorrow evening then we're good, so I should hopefully know within the
next day.

Thank you,
Zephaniah E. Loss-Cutler-Hull.

> Paolo
> 
> [1] https://www.spinics.net/lists/linux-block/msg21422.html
> 
>>
>> -- 
>> Jens Axboe
> 
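
Editorial sketch, not part of the original message: the two mitigations
discussed above (moving the scsi-mq devices off BFQ, and building without
CONFIG_DEBUG_BLK_CGROUP) can both be checked from userspace. The helper
names below (active_scheduler, config_enabled, report) are hypothetical.
The sketch assumes the usual sysfs layout, where
/sys/block/<dev>/queue/scheduler lists the available schedulers with the
active one in brackets, and it reads /proc/config.gz, which exists only
on kernels built with CONFIG_IKCONFIG_PROC.

```python
import gzip
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the bracketed scheduler, e.g. 'bfq' for 'mq-deadline [bfq] none'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Assumption: some kernels print a bare name (such as "none") without brackets.
    return sysfs_text.strip()


def config_enabled(config_text: str, option: str) -> bool:
    """True if the kernel config text sets the option to y or m."""
    for line in config_text.splitlines():
        if line.strip() in (f"{option}=y", f"{option}=m"):
            return True
    return False


def report(sys_block: Path = Path("/sys/block")) -> None:
    """Print the active scheduler per block device and the debug option state."""
    for sched_file in sorted(sys_block.glob("*/queue/scheduler")):
        dev = sched_file.parent.parent.name
        print(dev, active_scheduler(sched_file.read_text()))
    proc_config = Path("/proc/config.gz")  # only with CONFIG_IKCONFIG_PROC
    if proc_config.exists():
        text = gzip.decompress(proc_config.read_bytes()).decode()
        print("CONFIG_DEBUG_BLK_CGROUP enabled:",
              config_enabled(text, "CONFIG_DEBUG_BLK_CGROUP"))


if __name__ == "__main__":
    report()
```

To apply the mq-deadline mitigation on a given device, root can write the
scheduler name to the same sysfs file, for example
"echo mq-deadline > /sys/block/sdX/queue/scheduler", where sdX is a
placeholder for the affected disk. The change does not persist across
reboots.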




Thread overview: 8+ messages
2018-03-28 23:03 General protection fault with use_blk_mq=1 Zephaniah E. Loss-Cutler-Hull
2018-03-29  1:02 ` Jens Axboe
2018-03-29  3:13   ` Zephaniah E. Loss-Cutler-Hull
2018-03-29  3:22     ` Jens Axboe
2018-03-29  5:13       ` Paolo Valente
2018-03-29  9:12         ` Zephaniah E. Loss-Cutler-Hull [this message]
2018-03-30  5:43           ` Zephaniah E. Loss-Cutler-Hull
2018-03-29  4:56   ` Paolo Valente
