* [PATCH] scsi_debug: fix scp is NULL errors
@ 2020-08-13 15:57 Douglas Gilbert
2020-08-15 17:18 ` Lee Duncan
2020-08-18 3:12 ` Martin K. Petersen
0 siblings, 2 replies; 3+ messages in thread
From: Douglas Gilbert @ 2020-08-13 15:57 UTC (permalink / raw)
To: linux-scsi; +Cc: martin.petersen, jejb, hare, john.garry
John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures
that were mainly seen on aarch64 machines (e.g. RPi 4 with four
A72 CPUs). The problem was tracked down to a missing critical
section on a "short circuit" path. Namely, the time to process
the current command so far has already exceeded the requested
command duration (i.e. the number of nanoseconds in the ndelay
parameter).
The random=1 parameter setting was pivotal in finding this error.
The failure scenario involved first taking that "short circuit"
path (due to a very short command duration) and then taking the
more likely hrtimer_start() path (due to a longer command
duration). With random=1 each command's duration is taken from
the uniformly distributed [0..ndelay) interval.
The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
drivers/scsi/scsi_debug.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index d95822dceeb6..4b4e31af22bd 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -5471,9 +5471,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
u64 d = ktime_get_boottime_ns() - ns_from_boot;
if (kt <= d) { /* elapsed duration >= kt */
+ spin_lock_irqsave(&sqp->qc_lock, iflags);
sqcp->a_cmnd = NULL;
atomic_dec(&devip->num_in_q);
clear_bit(k, sqp->in_use_bm);
+ spin_unlock_irqrestore(&sqp->qc_lock, iflags);
if (new_sd_dp)
kfree(sd_dp);
/* call scsi_done() from this thread */
--
2.25.1
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] scsi_debug: fix scp is NULL errors
2020-08-13 15:57 [PATCH] scsi_debug: fix scp is NULL errors Douglas Gilbert
@ 2020-08-15 17:18 ` Lee Duncan
2020-08-18 3:12 ` Martin K. Petersen
1 sibling, 0 replies; 3+ messages in thread
From: Lee Duncan @ 2020-08-15 17:18 UTC (permalink / raw)
To: Douglas Gilbert, linux-scsi; +Cc: martin.petersen, jejb, hare, john.garry
On 8/13/20 8:57 AM, Douglas Gilbert wrote:
> John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures
> that were mainly seen on aarch64 machines (e.g. RPi 4 with four
> A72 CPUs). The problem was tracked down to a missing critical
> section on a "short circuit" path. Namely, the time to process
> the current command so far has already exceeded the requested
> command duration (i.e. the number of nanoseconds in the ndelay
> parameter).
>
> The random=1 parameter setting was pivotal in finding this error.
> The failure scenario involved first taking that "short circuit"
> path (due to a very short command duration) and then taking the
> more likely hrtimer_start() path (due to a longer command
> duration). With random=1 each command's duration is taken from
> the uniformly distributed [0..ndelay) interval.
> The fio utility also helped by reliably generating the error
> scenario at about once per minute on a RPi 4 (64 bit OS).
>
> Reported-by: John Garry <john.garry@huawei.com>
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
> drivers/scsi/scsi_debug.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index d95822dceeb6..4b4e31af22bd 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -5471,9 +5471,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
> u64 d = ktime_get_boottime_ns() - ns_from_boot;
>
> if (kt <= d) { /* elapsed duration >= kt */
> + spin_lock_irqsave(&sqp->qc_lock, iflags);
> sqcp->a_cmnd = NULL;
> atomic_dec(&devip->num_in_q);
> clear_bit(k, sqp->in_use_bm);
> + spin_unlock_irqrestore(&sqp->qc_lock, iflags);
> if (new_sd_dp)
> kfree(sd_dp);
> /* call scsi_done() from this thread */
>
Reviewed-by: Lee Duncan <lduncan@suse.com>
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH] scsi_debug: fix scp is NULL errors
2020-08-13 15:57 [PATCH] scsi_debug: fix scp is NULL errors Douglas Gilbert
2020-08-15 17:18 ` Lee Duncan
@ 2020-08-18 3:12 ` Martin K. Petersen
1 sibling, 0 replies; 3+ messages in thread
From: Martin K. Petersen @ 2020-08-18 3:12 UTC (permalink / raw)
To: Douglas Gilbert, linux-scsi; +Cc: Martin K . Petersen, john.garry, jejb, hare
On Thu, 13 Aug 2020 11:57:38 -0400, Douglas Gilbert wrote:
> John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures
> that were mainly seen on aarch64 machines (e.g. RPi 4 with four
> A72 CPUs). The problem was tracked down to a missing critical
> section on a "short circuit" path. Namely, the time to process
> the current command so far has already exceeded the requested
> command duration (i.e. the number of nanoseconds in the ndelay
> parameter).
>
> [...]
Applied to 5.9/scsi-fixes, thanks!
[1/1] scsi: scsi_debug: Fix scp is NULL errors
https://git.kernel.org/mkp/scsi/c/223f91b48079
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-08-18 3:12 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-13 15:57 [PATCH] scsi_debug: fix scp is NULL errors Douglas Gilbert
2020-08-15 17:18 ` Lee Duncan
2020-08-18 3:12 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox