public inbox for linux-scsi@vger.kernel.org
From: John Garry <john.g.garry@oracle.com>
To: Shin'ichiro Kawasaki <shinichiro@fastmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
	linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
	dgilbert@interlog.com
Subject: Re: blktests scsi/007 failure
Date: Thu, 20 Apr 2023 13:59:15 +0100
Message-ID: <15edb8ec-704c-cf0c-00a9-014391ba15f9@oracle.com>
In-Reply-To: <yqe6sjp6ukfoafaoetwacddkpo2y5mk4hsnxgw377iwholxo52@psw2zzelcmig>

On 20/04/2023 13:26, Shin'ichiro Kawasaki wrote:
>> Thanks for the notice. I think your changes were applied to 6.4/scsi-queue,
>> which I've not yet tried. Then it should not be related to your changes.
> I took a closer look at your changes for kernel v6.4, and noticed that they might
> affect the scsi/007 failure I observed with kernel v6.3-rcX. I did some trials
> and found the following:
> 
> - On kernel v6.3-rc7 without your changes, the test case scsi/007 fails with
>    unexpected read command success (the failure I found and reported).
> - On kernel v6.3-rc7 with your changes up to "scsi: scsi_debug: Dynamically
>    allocate sdebug_queued_cmd" [1], scsi/007 fails and causes a system hang.
>    The kernel reported "BUG sdebug_queued_cmd". When I reverted [1] from the
>    kernel, the failure symptom was the same as on v6.3-rc7 (no hang, no BUG).
> - On kernel v6.3-rc7 with your changes including [1] and "scsi: scsi_debug:
>    Abort commands from scsi_debug_device_reset()" [2], scsi/007 passes.
> 
> [1] https://lore.kernel.org/lkml/20230327074310.1862889-7-john.g.garry@oracle.com/
> [2] https://lore.kernel.org/linux-scsi/20230416175654.159163-1-john.g.garry@oracle.com/
> 
> Your fix [2] was intended to fix the BUG that [1] caused, but it also fixed the
> scsi/007 failure I found 😄

Great

> 
> 
> To understand the failure in more depth, I added debug prints to scsi_debug, using
> kernel v6.3-rc7 with your changes just before [1]. This kernel does not have the
> fix [2], so it does not abort commands at device reset. When the SCSI error
> handler does a BDR (bus device reset), scsi_debug does not cancel the hrtimer for
> the commands issued to scsi_debug, so this hrtimer stays alive across the reset.
> When that hrtimer expires, scsi_debug completes the command that was issued after
> the BDR: the hrtimer armed for the command before the BDR completes the command
> issued after the BDR, since the two commands reuse the same scsi_cmnd and rq
> objects. The command issued after the BDR then completes earlier than expected,
> which results in the unexpected read command success and the scsi/007 failure.
> 
> After applying the fix [2], scsi_debug cancels hrtimers at reset. Then, the
> hrtimers started before reset do not affect the commands issued after reset.
> 
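For illustration, the sequence described above can be modelled with a toy simulation. This is only a sketch under stated assumptions, not scsi_debug code: the names `Command`, `Simulator` and `completion_time_after_reset`, and the delay values, are made up for this example, and a heap of expiry times stands in for hrtimers.

```python
import heapq


class Command:
    """Hypothetical stand-in for a reused scsi_cmnd/rq slot."""

    def __init__(self):
        self.completed_at = None


class Simulator:
    """Toy model: each issued command arms one timer that completes its slot."""

    def __init__(self, cancel_on_reset):
        self.cancel_on_reset = cancel_on_reset
        self.now = 0
        self.timers = []  # heap of (expiry, sequence number, slot)
        self.seq = 0

    def issue(self, slot, delay):
        # Reusing the slot resets its completion state, as with a recycled request.
        slot.completed_at = None
        heapq.heappush(self.timers, (self.now + delay, self.seq, slot))
        self.seq += 1

    def device_reset(self):
        # With the fix modelled, pending timers are cancelled at reset.
        if self.cancel_on_reset:
            self.timers.clear()

    def run_until(self, t):
        # Fire every timer that expires at or before time t.
        while self.timers and self.timers[0][0] <= t:
            expiry, _, slot = heapq.heappop(self.timers)
            if slot.completed_at is None:
                slot.completed_at = expiry
        self.now = t


def completion_time_after_reset(cancel_on_reset):
    sim = Simulator(cancel_on_reset)
    slot = Command()
    sim.issue(slot, delay=10)  # first command, timer expires at t=10
    sim.run_until(3)           # command is considered timed out at t=3
    sim.device_reset()         # bus device reset
    sim.issue(slot, delay=10)  # same slot reused, should complete at t=13
    sim.run_until(20)
    return slot.completed_at
```

In this model, `completion_time_after_reset(False)` returns 10 because the stale timer completes the reissued command early, while `completion_time_after_reset(True)` returns 13, the time at which the reissued command's own timer fires.
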
> These findings mean that the scsi/007 failure I found with kernel v6.3-rc7
> indicated a bug in scsi_debug, and that commit [2] fixed it. I no longer think
> a blktests-side fix for scsi/007 is required. Good 😄

Do you know why you specifically were seeing this issue for v6.3-rc7? Is 
it just timing related? I seem to remember you mentioning debug configs 
earlier.

It would be nice to see this issue fixed for 6.3 and earlier, but, 
considering the circumstances, it doesn't look straightforward.

Thanks for the info,
John


Thread overview: 8+ messages
2023-04-14  7:36 blktests scsi/007 failure Shin'ichiro Kawasaki
2023-04-14  8:09 ` Chaitanya Kulkarni
2023-04-14  8:53   ` Shin'ichiro Kawasaki
2023-04-14  8:33 ` John Garry
2023-04-14  8:58   ` Shin'ichiro Kawasaki
2023-04-20 12:26     ` Shin'ichiro Kawasaki
2023-04-20 12:59       ` John Garry [this message]
2023-04-21  1:03         ` Shin'ichiro Kawasaki
