From: Barto <mister.freeman@laposte.net>
To: "Elliott, Robert (Server Storage)" <Elliott@hp.com>,
Guenter Roeck <linux@roeck-us.net>,
Bjorn Helgaas <bhelgaas@google.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
Joe Perches <joe@perches.com>
Subject: Re: BUG in scsi_lib.c due to a bad commit
Date: Thu, 13 Nov 2014 10:38:24 +0100
Message-ID: <54647C10.4070506@laposte.net>
In-Reply-To: <94D0CD8314A33A4D9D801C0FE68B4029593A1880@G9W0745.americas.hpqcorp.net>
Hello,
> Were you running with scsi_mod.use_blk_mq=Y or =N?
I can't find this option in my kernel .config file, so I assume you
mean a kernel boot parameter set in grub?
I don't pass scsi_mod.use_blk_mq on the kernel command line in grub.
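For anyone who wants to check the value at runtime rather than in grub, here is a minimal sketch. It assumes the parameter is exposed by sysfs at /sys/module/scsi_mod/parameters/use_blk_mq (I have not confirmed that path in this thread), and read_bool_param() is just a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Assumed sysfs location of the parameter; not confirmed in this thread. */
#define USE_BLK_MQ_PARAM "/sys/module/scsi_mod/parameters/use_blk_mq"

/*
 * Read a boolean module parameter file (hypothetical helper).
 * Returns 1 for 'Y', 'y' or '1', 0 for 'N', 'n' or '0',
 * and -1 if the file cannot be opened or starts with anything else.
 */
int read_bool_param(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* Only the first character is needed to tell Y from N. */
	c = fgetc(f);
	fclose(f);

	switch (c) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}
```

Calling read_bool_param(USE_BLK_MQ_PARAM) would then return 1 or 0 when the file is present, or -1 when it is missing (for example if the kernel does not expose the parameter).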
I use Arch Linux with its default kernel config file for 64-bit CPUs.
You can find this file here:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
I don't have any SCSI devices, only 3 SATA hard disks and 2 IDE hard
disks.
Before kernel 3.17 I never had this random hang on boot, so we need to
find out what is going on. It is probably a change made in the last few
months to the SCSI source code (or somewhere else) that introduced a bug
which breaks the behaviour of some parts of scsi_lib.c (for example the
if statement that tests atomic_read(&sdev->device_busy) and calls
blk_delay_queue(q, SCSI_QUEUE_DELAY)).
Guenter Roeck was the first to be hit by this bug (his qemu tests hung
on boot). He thought the solution was to invert the polarity of the if
statement. That solves his problem, but unfortunately it now triggers a
new bug on some PC configurations.
It would be good to find a definitive solution that fixes both the
"qemu bug" and the "random bug on boot" seen on some PC configurations.
I'm not an expert on the SCSI code. Perhaps with some unit tests you
could spot the defective part of the SCSI source code, and check whether
those parts of the code really behave as they should?
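As a possible starting point for such a test, here is a small user-space model of the out_delay condition. It is only a sketch, not kernel code: should_delay_queue() and the enum names are hypothetical, device_busy is a plain int instead of an atomic_t, and I assume the "unconditional delay" variant suggested by Robert (quoted below) would still skip the delay when the device is blocked.

```c
#include <assert.h>
#include <stdbool.h>

/* The three variants of the out_delay condition discussed in this thread. */
enum delay_policy {
	DELAY_IF_BUSY,		/* device_busy && !blocked (before 045065d8a300) */
	DELAY_IF_IDLE,		/* !device_busy && !blocked (after 045065d8a300) */
	DELAY_UNLESS_BLOCKED	/* !blocked only (assumed reading of the suggestion) */
};

/*
 * Model of the decision taken at out_delay: returns true when
 * blk_delay_queue(q, SCSI_QUEUE_DELAY) would be called.
 */
bool should_delay_queue(enum delay_policy policy, int device_busy, bool blocked)
{
	if (blocked)
		return false;	/* every variant keeps !scsi_device_blocked(sdev) */

	switch (policy) {
	case DELAY_IF_BUSY:
		return device_busy != 0;
	case DELAY_IF_IDLE:
		return device_busy == 0;
	case DELAY_UNLESS_BLOCKED:
		return true;
	}
	return false;
}

/* Self-check: the two historical variants always disagree when unblocked. */
void check_delay_policies(void)
{
	int busy;

	for (busy = 0; busy < 4; busy++) {
		assert(should_delay_queue(DELAY_IF_BUSY, busy, false) !=
		       should_delay_queue(DELAY_IF_IDLE, busy, false));
		assert(should_delay_queue(DELAY_UNLESS_BLOCKED, busy, false));
		assert(!should_delay_queue(DELAY_UNLESS_BLOCKED, busy, true));
	}
}
```

This only captures which branch is taken, not the timing. The hang itself depends on who restarts the queue afterwards, so a model like this can document the intended behaviour but cannot reproduce the bug.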
On 13/11/2014 06:33, Elliott, Robert (Server Storage) wrote:
>
>
>> -----Original Message-----
>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>> owner@vger.kernel.org] On Behalf Of Barto
>> Sent: Wednesday, November 12, 2014 9:28 PM
>> To: Guenter Roeck; Bjorn Helgaas
>> Cc: linux-kernel@vger.kernel.org; linux-scsi@vger.kernel.org; Joe
>> Perches
>> Subject: Re: BUG in scsi_lib.c due to a bad commit
>>
>> reverting your commit 045065d8a300a37218c is a solution, but it's just a
>> temporary solution,
>>
>> it's better to search why your commit can create a random hang on boot
>> on some PC configurations,
>>
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
>>  	blk_requeue_request(q, req);
>>  	atomic_dec(&sdev->device_busy);
>>  out_delay:
>> -	if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
>> +	if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
>>  		blk_delay_queue(q, SCSI_QUEUE_DELAY);
>>  }
>>
>> perhaps the atomic_read() function doesn't do the expected job in some
>> rare circumstances, I have the same doubts about the blk_delay_queue()
>> function
>
> Were you running with scsi_mod.use_blk_mq=Y or =N?
>
> device_busy is the active queue depth for the device (e.g.
> 5 means there are 5 commands submitted but not yet completed).
>
> The function reaches this code if it has run out of tags, the host
> has reached its limit of outstanding commands, or the target has
> reached its limit. It requeues the request:
> * with delay if device_busy is zero
> * without delay if device_busy is non-zero
>
> I think this is the reasoning:
> If device_busy is zero, trying to process the request again will
> probably run into the same problem; a delay gives time for the
> situation to change. If device_busy is non-zero, then the
> requeued command goes behind others and might get a different
> result.
>
> With the polarity backwards, the lack of delay hung PA-RISC
> and SPARC64 systems, not just QEMU. So, I don't think reverting
> the fix is good.
>
> Changing it to an unconditional delay might be safe - delay
> regardless of device_busy (until the root cause is understood).
>
> Also, SCSI_QUEUE_DELAY seems like an arbitrary magic number;
> maybe that value isn't working correctly anymore?
>
>