public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi_debug: fix scp is NULL errors
@ 2020-08-13 15:57 Douglas Gilbert
  2020-08-15 17:18 ` Lee Duncan
  2020-08-18  3:12 ` Martin K. Petersen
  0 siblings, 2 replies; 3+ messages in thread
From: Douglas Gilbert @ 2020-08-13 15:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, john.garry

John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures
that were mainly seen on aarch64 machines (e.g. RPi 4 with four
A72 CPUs). The problem was tracked down to a missing critical
section on a "short circuit" path. Namely, the time to process
the current command so far has already exceeded the requested
command duration (i.e. the number of nanoseconds in the ndelay
parameter).

The random=1 parameter setting was pivotal in finding this error.
The failure scenario involved first taking that "short circuit"
path (due to a very short command duration) and then taking the
more likely hrtimer_start() path (due to a longer command
duration). With random=1 each command's duration is taken from
the uniformly distributed [0..ndelay) interval.
The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).

Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/scsi_debug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index d95822dceeb6..4b4e31af22bd 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -5471,9 +5471,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
 				u64 d = ktime_get_boottime_ns() - ns_from_boot;
 
 				if (kt <= d) {	/* elapsed duration >= kt */
+					spin_lock_irqsave(&sqp->qc_lock, iflags);
 					sqcp->a_cmnd = NULL;
 					atomic_dec(&devip->num_in_q);
 					clear_bit(k, sqp->in_use_bm);
+					spin_unlock_irqrestore(&sqp->qc_lock, iflags);
 					if (new_sd_dp)
 						kfree(sd_dp);
 					/* call scsi_done() from this thread */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] scsi_debug: fix scp is NULL errors
  2020-08-13 15:57 [PATCH] scsi_debug: fix scp is NULL errors Douglas Gilbert
@ 2020-08-15 17:18 ` Lee Duncan
  2020-08-18  3:12 ` Martin K. Petersen
  1 sibling, 0 replies; 3+ messages in thread
From: Lee Duncan @ 2020-08-15 17:18 UTC (permalink / raw)
  To: Douglas Gilbert, linux-scsi; +Cc: martin.petersen, jejb, hare, john.garry

On 8/13/20 8:57 AM, Douglas Gilbert wrote:
> John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures
> that were mainly seen on aarch64 machines (e.g. RPi 4 with four
> A72 CPUs). The problem was tracked down to a missing critical
> section on a "short circuit" path. Namely, the time to process
> the current command so far has already exceeded the requested
> command duration (i.e. the number of nanoseconds in the ndelay
> parameter).
> 
> The random=1 parameter setting was pivotal in finding this error.
> The failure scenario involved first taking that "short circuit"
> path (due to a very short command duration) and then taking the
> more likely hrtimer_start() path (due to a longer command
> duration). With random=1 each command's duration is taken from
> the uniformly distributed [0..ndelay) interval.
> The fio utility also helped by reliably generating the error
> scenario at about once per minute on a RPi 4 (64 bit OS).
> 
> Reported-by: John Garry <john.garry@huawei.com>
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>  drivers/scsi/scsi_debug.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index d95822dceeb6..4b4e31af22bd 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -5471,9 +5471,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
>  				u64 d = ktime_get_boottime_ns() - ns_from_boot;
>  
>  				if (kt <= d) {	/* elapsed duration >= kt */
> +					spin_lock_irqsave(&sqp->qc_lock, iflags);
>  					sqcp->a_cmnd = NULL;
>  					atomic_dec(&devip->num_in_q);
>  					clear_bit(k, sqp->in_use_bm);
> +					spin_unlock_irqrestore(&sqp->qc_lock, iflags);
>  					if (new_sd_dp)
>  						kfree(sd_dp);
>  					/* call scsi_done() from this thread */
> 

Reviewed-by: Lee Duncan <lduncan@suse.com>


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] scsi_debug: fix scp is NULL errors
  2020-08-13 15:57 [PATCH] scsi_debug: fix scp is NULL errors Douglas Gilbert
  2020-08-15 17:18 ` Lee Duncan
@ 2020-08-18  3:12 ` Martin K. Petersen
  1 sibling, 0 replies; 3+ messages in thread
From: Martin K. Petersen @ 2020-08-18  3:12 UTC (permalink / raw)
  To: Douglas Gilbert, linux-scsi; +Cc: Martin K . Petersen, john.garry, jejb, hare

On Thu, 13 Aug 2020 11:57:38 -0400, Douglas Gilbert wrote:

> John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures
> that were mainly seen on aarch64 machines (e.g. RPi 4 with four
> A72 CPUs). The problem was tracked down to a missing critical
> section on a "short circuit" path. Namely, the time to process
> the current command so far has already exceeded the requested
> command duration (i.e. the number of nanoseconds in the ndelay
> parameter).
> 
> [...]

Applied to 5.9/scsi-fixes, thanks!

[1/1] scsi: scsi_debug: Fix scp is NULL errors
      https://git.kernel.org/mkp/scsi/c/223f91b48079

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-08-18  3:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-13 15:57 [PATCH] scsi_debug: fix scp is NULL errors Douglas Gilbert
2020-08-15 17:18 ` Lee Duncan
2020-08-18  3:12 ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox