* [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
@ 2014-05-21 13:30 Bart Van Assche
2014-05-22 16:22 ` Paolo Bonzini
2014-05-23 6:09 ` Hannes Reinecke
0 siblings, 2 replies; 8+ messages in thread
From: Bart Van Assche @ 2014-05-21 13:30 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org
Cc: Hannes Reinecke, Paolo Bonzini, Christoph Hellwig, Jens Axboe,
Joe Lawrence
scmd->abort_work is only scheduled after the block layer has marked
the request associated with a command as complete and for commands
that are not on the eh_cmd_q list. A SCSI command is only requeued
after the scmd->abort_work handler has started (requeueing clears
the "complete" flag). This means that the cancel_delayed_work()
statement in scsi_put_command() is a no-op. Hence remove it.
Additionally, add WARN_ON_ONCE() statements that document how
concurrent invocation of scsi_finish_command() and the SCSI error
handler code for the same command is prevented. This should make
the SCSI error handler code less confusing for its readers.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Joe Lawrence <jdl1291@gmail.com>
---
block/blk-softirq.c | 6 ++++++
drivers/scsi/scsi.c | 2 --
drivers/scsi/scsi_error.c | 28 ++++++++++++++++++++++++++++
include/linux/blkdev.h | 1 +
4 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 53b1737..59bb52d 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -172,6 +172,12 @@ void blk_complete_request(struct request *req)
}
EXPORT_SYMBOL(blk_complete_request);
+bool blk_rq_completed(struct request *rq)
+{
+ return test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
+}
+EXPORT_SYMBOL(blk_rq_completed);
+
static __init int blk_softirq_init(void)
{
int i;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..04a282a 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -334,8 +334,6 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(&cmd->list);
spin_unlock_irqrestore(&cmd->device->list_lock, flags);
- cancel_delayed_work(&cmd->abort_work);
-
__scsi_put_command(cmd->device->host, cmd);
}
EXPORT_SYMBOL(scsi_put_command);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 14ce3b4..32a8cd1 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -108,6 +108,28 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
return 1;
}
+static bool scmd_being_handled_in_other_context(struct scsi_cmnd *scmd)
+{
+ struct Scsi_Host *shost = scmd->device->host;
+ struct scsi_cmnd *c;
+ unsigned long flags;
+ bool ret = false;
+
+ if (!blk_rq_completed(scmd->request))
+ return true;
+
+ spin_lock_irqsave(shost->host_lock, flags);
+ list_for_each_entry(c, &shost->eh_cmd_q, eh_entry) {
+ if (c == scmd) {
+ ret = true;
+ break;
+ }
+ }
+ spin_unlock_irqrestore(shost->host_lock, flags);
+
+ return ret;
+}
+
/**
* scmd_eh_abort_handler - Handle command aborts
* @work: command to be aborted.
@@ -120,6 +142,8 @@ scmd_eh_abort_handler(struct work_struct *work)
struct scsi_device *sdev = scmd->device;
int rtn;
+ WARN_ON_ONCE(scmd_being_handled_in_other_context(scmd));
+
if (scsi_host_eh_past_deadline(sdev->host)) {
SCSI_LOG_ERROR_RECOVERY(3,
scmd_printk(KERN_INFO, scmd,
@@ -185,6 +209,8 @@ scsi_abort_command(struct scsi_cmnd *scmd)
struct Scsi_Host *shost = sdev->host;
unsigned long flags;
+ WARN_ON_ONCE(scmd_being_handled_in_other_context(scmd));
+
if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) {
/*
* Retry after abort failed, escalate to next level.
@@ -237,6 +263,8 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
unsigned long flags;
int ret = 0;
+ WARN_ON_ONCE(scmd_being_handled_in_other_context(scmd));
+
if (!shost->ehandler)
return 0;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0d84981..a621bc5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -952,6 +952,7 @@ extern void blk_complete_request(struct request *);
extern void __blk_complete_request(struct request *);
extern void blk_abort_request(struct request *);
extern void blk_unprep_request(struct request *);
+extern bool blk_rq_completed(struct request *);
/*
* Access functions for manipulating queue properties
--
1.8.4.5
* Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
2014-05-21 13:30 [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command() Bart Van Assche
@ 2014-05-22 16:22 ` Paolo Bonzini
2014-05-22 17:41 ` Bart Van Assche
2014-05-23 6:09 ` Hannes Reinecke
1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2014-05-22 16:22 UTC (permalink / raw)
To: Bart Van Assche, linux-scsi@vger.kernel.org
Cc: Hannes Reinecke, Christoph Hellwig, Jens Axboe, Joe Lawrence
On 21/05/2014 15:30, Bart Van Assche wrote:
> +static bool scmd_being_handled_in_other_context(struct scsi_cmnd *scmd)
> +{
> + struct Scsi_Host *shost = scmd->device->host;
> + struct scsi_cmnd *c;
> + unsigned long flags;
> + bool ret = false;
> +
> + if (!blk_rq_completed(scmd->request))
> + return true;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + list_for_each_entry(c, &shost->eh_cmd_q, eh_entry) {
> + if (c == scmd) {
> + ret = true;
> + break;
> + }
> + }
> + spin_unlock_irqrestore(shost->host_lock, flags);
> +
> + return ret;
> +}
> +
> /**
> * scmd_eh_abort_handler - Handle command aborts
> * @work: command to be aborted.
> @@ -120,6 +142,8 @@ scmd_eh_abort_handler(struct work_struct *work)
> struct scsi_device *sdev = scmd->device;
> int rtn;
>
> + WARN_ON_ONCE(scmd_being_handled_in_other_context(scmd));
What about a simpler, though less accurate
WARN_ON(!blk_rq_completed(scmd->request));
that doesn't need the host_lock?
Paolo
* Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
2014-05-22 16:22 ` Paolo Bonzini
@ 2014-05-22 17:41 ` Bart Van Assche
0 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2014-05-22 17:41 UTC (permalink / raw)
To: Paolo Bonzini, linux-scsi@vger.kernel.org
Cc: Hannes Reinecke, Christoph Hellwig, Jens Axboe, Joe Lawrence
On 05/22/14 18:22, Paolo Bonzini wrote:
> On 21/05/2014 15:30, Bart Van Assche wrote:
>> +static bool scmd_being_handled_in_other_context(struct scsi_cmnd *scmd)
>> +{
>> + struct Scsi_Host *shost = scmd->device->host;
>> + struct scsi_cmnd *c;
>> + unsigned long flags;
>> + bool ret = false;
>> +
>> + if (!blk_rq_completed(scmd->request))
>> + return true;
>> +
>> + spin_lock_irqsave(shost->host_lock, flags);
>> + list_for_each_entry(c, &shost->eh_cmd_q, eh_entry) {
>> + if (c == scmd) {
>> + ret = true;
>> + break;
>> + }
>> + }
>> + spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>> + return ret;
>> +}
>> +
>> /**
>> * scmd_eh_abort_handler - Handle command aborts
>> * @work: command to be aborted.
>> @@ -120,6 +142,8 @@ scmd_eh_abort_handler(struct work_struct *work)
>> struct scsi_device *sdev = scmd->device;
>> int rtn;
>>
>> + WARN_ON_ONCE(scmd_being_handled_in_other_context(scmd));
>
> What about a simpler, though less accurate
>
> WARN_ON(!blk_rq_completed(scmd->request));
>
> that doesn't need the host_lock?
One reason why I posted this patch as an RFC was to invite feedback. I'm
fine with leaving out the loop over the eh_cmd_q list although I do not
expect that will make a significant performance difference. None of the
functions in which a check was added are in the hot path.
Bart.
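
Note on the blk_rq_completed() check discussed above: the "complete"
flag arbitrates between the completion path and the timeout/error
handling path. The following is a userspace model using C11 atomics,
not kernel code. struct model_request and the model_* helpers are
hypothetical names made up for this illustration, and the sketch
assumes that the flag is claimed with an atomic test-and-set and
cleared again on requeue, as the patch description states.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block layer request: only the flag. */
struct model_request {
	atomic_bool complete;
};

/*
 * Try to claim the request. Returns true for the one caller that
 * flipped the flag from false to true, false for everyone else.
 */
static bool model_mark_complete(struct model_request *rq)
{
	return !atomic_exchange(&rq->complete, true);
}

/* Read-only query, analogous in spirit to the proposed blk_rq_completed(). */
static bool model_rq_completed(struct model_request *rq)
{
	return atomic_load(&rq->complete);
}

/* Requeueing clears the flag, so the request can be claimed again. */
static void model_requeue(struct model_request *rq)
{
	atomic_store(&rq->complete, false);
}
```

In this model only one of two racing callers of model_mark_complete()
gets true, which is why a handler that runs after a successful claim
can expect model_rq_completed() to hold until the request is requeued.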
* Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
2014-05-21 13:30 [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command() Bart Van Assche
2014-05-22 16:22 ` Paolo Bonzini
@ 2014-05-23 6:09 ` Hannes Reinecke
2014-05-23 9:24 ` Paolo Bonzini
1 sibling, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2014-05-23 6:09 UTC (permalink / raw)
To: Bart Van Assche, linux-scsi@vger.kernel.org
Cc: Paolo Bonzini, Christoph Hellwig, Jens Axboe, Joe Lawrence
On 05/21/2014 03:30 PM, Bart Van Assche wrote:
> scmd->abort_work is only scheduled after the block layer has marked
> the request associated with a command as complete and for commands
> that are not on the eh_cmd_q list. A SCSI command is only requeued
> after the scmd->abort_work handler has started (requeueing clears
> the "complete" flag). This means that the cancel_delayed_work()
> statement in scsi_put_command() is a no-op. Hence remove it.
>
Hmm.
I've put in the cancel_delayed_work() as a safety guard, fully
aware that it's one of the "this cannot happen" kind of things.
But there is a workqueue and it might have elements on it.
And when freeing a command we absolutely need to make sure that
the workqueue is empty.
So calling cancel_delayed_work() was the obvious thing to do.
I'd be fine with adding a WARN_ON(!list_empty(&cmd->abort_work))
here, however. This will clear up the intent of this statement.
> Additionally, add WARN_ON_ONCE() statements that document how
> concurrent invocation of scsi_finish_command() and the SCSI error
> handler code for the same command is prevented. This should make
> the SCSI error handler code less confusing for its readers.
>
This I'd rather put into a separate patch, as it's really a
different issue.
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Joe Lawrence <jdl1291@gmail.com>
> ---
> block/blk-softirq.c | 6 ++++++
> drivers/scsi/scsi.c | 2 --
> drivers/scsi/scsi_error.c | 28 ++++++++++++++++++++++++++++
> include/linux/blkdev.h | 1 +
> 4 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-softirq.c b/block/blk-softirq.c
> index 53b1737..59bb52d 100644
> --- a/block/blk-softirq.c
> +++ b/block/blk-softirq.c
> @@ -172,6 +172,12 @@ void blk_complete_request(struct request *req)
> }
> EXPORT_SYMBOL(blk_complete_request);
>
> +bool blk_rq_completed(struct request *rq)
> +{
> + return test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
> +}
> +EXPORT_SYMBOL(blk_rq_completed);
> +
> static __init int blk_softirq_init(void)
> {
> int i;
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 88d46fe..04a282a 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -334,8 +334,6 @@ void scsi_put_command(struct scsi_cmnd *cmd)
> list_del_init(&cmd->list);
> spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>
> - cancel_delayed_work(&cmd->abort_work);
> -
> __scsi_put_command(cmd->device->host, cmd);
> }
> EXPORT_SYMBOL(scsi_put_command);
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 14ce3b4..32a8cd1 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -108,6 +108,28 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
> return 1;
> }
>
> +static bool scmd_being_handled_in_other_context(struct scsi_cmnd *scmd)
> +{
> + struct Scsi_Host *shost = scmd->device->host;
> + struct scsi_cmnd *c;
> + unsigned long flags;
> + bool ret = false;
> +
> + if (!blk_rq_completed(scmd->request))
> + return true;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + list_for_each_entry(c, &shost->eh_cmd_q, eh_entry) {
> + if (c == scmd) {
> + ret = true;
> + break;
> + }
> + }
> + spin_unlock_irqrestore(shost->host_lock, flags);
> +
> + return ret;
> +}
> +
> /**
> * scmd_eh_abort_handler - Handle command aborts
> * @work: command to be aborted.
Can't we just check for
!list_empty(&scmd->eh_entry)
here?
Should achieve the same with less computation...
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
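
Note on the list_empty(&scmd->eh_entry) suggestion above: the
difference between walking eh_cmd_q and inspecting the entry itself
can be shown with a small userspace model. This is a sketch, not
kernel code. The list helpers below are minimal re-implementations
that only mirror the names of the kernel's list API, and struct
model_cmnd is a hypothetical cut-down command. The O(1) check assumes
that eh_entry is always initialised and always removed with
list_del_init(); whether that holds in the SCSI midlayer is not
established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Unlink the entry and leave it pointing at itself again. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical command: only the field the check needs. */
struct model_cmnd {
	struct list_head eh_entry;
};

/* O(n): walk the queue looking for the command, as the posted helper does. */
static bool on_queue_by_walk(struct list_head *q, struct model_cmnd *cmd)
{
	struct list_head *p;

	for (p = q->next; p != q; p = p->next)
		if (p == &cmd->eh_entry)
			return true;
	return false;
}

/* O(1): a self-linked entry is on no list at all. */
static bool on_queue_by_entry(struct model_cmnd *cmd)
{
	return !list_empty(&cmd->eh_entry);
}
```

Note that the two checks are not strictly equivalent: the O(1) variant
reports "on some list", not "on this particular list", so it is the
broader test if a command can be moved to a different list.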
* Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
2014-05-23 6:09 ` Hannes Reinecke
@ 2014-05-23 9:24 ` Paolo Bonzini
2014-05-23 10:37 ` Bart Van Assche
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2014-05-23 9:24 UTC (permalink / raw)
To: Hannes Reinecke, Bart Van Assche, linux-scsi@vger.kernel.org
Cc: Christoph Hellwig, Jens Axboe, Joe Lawrence
On 23/05/2014 08:09, Hannes Reinecke wrote:
>
> And when freeing a command we absolutely need to make sure that
> the workqueue is empty.
> So calling cancel_delayed_work() was the obvious thing to do.
You would need cancel_delayed_work_sync, but if it really happened that
the work item is running, it would cause a double free.
> I'd be fine with adding a WARN_ON(!list_empty(&cmd->abort_work))
> here, however. This will clear up the intent of this statement.
BUG_ON even, since you'd get badness from the double free anyway.
Paolo
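
Note on the cancel_delayed_work() versus cancel_delayed_work_sync()
point above: a non-synchronous cancel can only remove work that is
still pending; it does not wait for a handler that has already
started. The following userspace state machine is a simplified model
of that behaviour, not the workqueue implementation. enum work_state,
struct model_work and both functions are hypothetical names used only
for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified life cycle of a work item in this model. */
enum work_state {
	WORK_IDLE,	/* not queued, handler not running */
	WORK_PENDING,	/* queued, handler has not started yet */
	WORK_RUNNING	/* handler is executing right now */
};

struct model_work {
	enum work_state state;
};

/*
 * Non-synchronous cancel: dequeue the item if it is still pending.
 * Returns true if something was cancelled. A running handler is
 * left untouched and keeps running.
 */
static bool model_cancel(struct model_work *w)
{
	if (w->state == WORK_PENDING) {
		w->state = WORK_IDLE;
		return true;
	}
	return false;
}

/*
 * May the object that embeds the work item be freed right after a
 * non-synchronous cancel? Only if no handler is still running.
 */
static bool safe_to_free_after_cancel(struct model_work *w)
{
	model_cancel(w);
	return w->state != WORK_RUNNING;
}
```

In the WORK_RUNNING case the model answers "not safe", which matches
the concern raised above: the caller would free the command while the
handler can still reach it and release it a second time.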
* Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
2014-05-23 9:24 ` Paolo Bonzini
@ 2014-05-23 10:37 ` Bart Van Assche
2014-05-23 11:28 ` Paolo Bonzini
0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2014-05-23 10:37 UTC (permalink / raw)
To: Paolo Bonzini, Hannes Reinecke, linux-scsi@vger.kernel.org
Cc: Christoph Hellwig, Jens Axboe, Joe Lawrence
On 05/23/14 11:24, Paolo Bonzini wrote:
> On 23/05/2014 08:09, Hannes Reinecke wrote:
>>
>> And when freeing a command we absolutely need to make sure that
>> the workqueue is empty.
>> So calling cancel_delayed_work() was the obvious thing to do.
>
> You would need cancel_delayed_work_sync, but if it really happened that
> the work item is running, it would cause a double free.
>
>> I'd be fine with adding a WARN_ON(!list_empty(&cmd->abort_work))
>> here, however. This will clear up the intent of this statement.
>
> BUG_ON even, since you'd get badness from the double free anyway.
Hello Paolo,
Are you aware that Linus strongly prefers WARN_ON_ONCE() over BUG_ON() ?
See e.g. https://lkml.org/lkml/2012/9/27/461 or
https://lkml.org/lkml/2014/4/28/657.
Bart.
* Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
2014-05-23 10:37 ` Bart Van Assche
@ 2014-05-23 11:28 ` Paolo Bonzini
2014-05-23 11:36 ` Hannes Reinecke
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2014-05-23 11:28 UTC (permalink / raw)
To: Bart Van Assche, Hannes Reinecke, linux-scsi@vger.kernel.org
Cc: Christoph Hellwig, Jens Axboe, Joe Lawrence
On 23/05/2014 12:37, Bart Van Assche wrote:
> On 05/23/14 11:24, Paolo Bonzini wrote:
>> On 23/05/2014 08:09, Hannes Reinecke wrote:
>>>
>>> And when freeing a command we absolutely need to make sure that
>>> the workqueue is empty.
>>> So calling cancel_delayed_work() was the obvious thing to do.
>>
>> You would need cancel_delayed_work_sync, but if it really happened that
>> the work item is running, it would cause a double free.
>>
>>> I'd be fine with adding a WARN_ON(!list_empty(&cmd->abort_work))
>>> here, however. This will clear up the intent of this statement.
>>
>> BUG_ON even, since you'd get badness from the double free anyway.
>
> Hello Paolo,
>
> Are you aware that Linus strongly prefers WARN_ON_ONCE() over BUG_ON() ?
> See e.g. https://lkml.org/lkml/2012/9/27/461 or
> https://lkml.org/lkml/2014/4/28/657.
Yes, I am and I even downgraded some KVM BUG_ONs recently.
But in this case I think that memory corruption is going to happen
anyway unless you consciously leak the Scsi_Cmnd * (because if you use
WARN_ON, you also need to return early as Linus suggested in the second
email).
So the WARN_ON/BUG_ON choice here should not just consider what makes
the problem easier to debug; hanging the machine before guaranteed
badness seems to me like a good use for BUG_ON.
Paolo
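
Note on the "warn and return early" alternative mentioned above: the
trade-off is that the command is deliberately leaked instead of being
freed while something may still reference it. The sketch below is a
userspace model under stated assumptions, not a proposed kernel
change. model_warn_on() is a hypothetical stand-in for a WARN_ON-style
check that reports every hit (it does not implement the "once"
behaviour), and struct model_cmnd with its abort_work_pending flag is
invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static int warn_count;	/* how many warnings were reported */
static int freed_count;	/* how many commands were actually freed */

/* Report a violated invariant and hand the condition back to the caller. */
static bool model_warn_on(bool cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical command: only the state the check looks at. */
struct model_cmnd {
	bool abort_work_pending;
};

/*
 * Release a command. Returns true if it was freed, false if it was
 * leaked on purpose because abort work might still touch it.
 */
static bool model_put_command(struct model_cmnd *cmd)
{
	if (model_warn_on(cmd->abort_work_pending,
			  "abort work still pending in put_command"))
		return false;	/* leak instead of risking a use-after-free */

	free(cmd);
	freed_count++;
	return true;
}
```

The early return keeps the machine running at the cost of a leaked
command; a BUG_ON-style check would instead stop at the point where
the invariant is violated.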
* Re: [PATCH RFC] Remove the cancel_delayed_work() call from scsi_put_command()
2014-05-23 11:28 ` Paolo Bonzini
@ 2014-05-23 11:36 ` Hannes Reinecke
0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2014-05-23 11:36 UTC (permalink / raw)
To: Paolo Bonzini, Bart Van Assche, linux-scsi@vger.kernel.org
Cc: Christoph Hellwig, Jens Axboe, Joe Lawrence
On 05/23/2014 01:28 PM, Paolo Bonzini wrote:
> On 23/05/2014 12:37, Bart Van Assche wrote:
>> On 05/23/14 11:24, Paolo Bonzini wrote:
>>> On 23/05/2014 08:09, Hannes Reinecke wrote:
>>>>
>>>> And when freeing a command we absolutely need to make sure that
>>>> the workqueue is empty.
>>>> So calling cancel_delayed_work() was the obvious thing to do.
>>>
>>> You would need cancel_delayed_work_sync, but if it really
>>> happened that
>>> the work item is running, it would cause a double free.
>>>
>>>> I'd be fine with adding a WARN_ON(!list_empty(&cmd->abort_work))
>>>> here, however. This will clear up the intent of this statement.
>>>
>>> BUG_ON even, since you'd get badness from the double free anyway.
>>
>> Hello Paolo,
>>
>> Are you aware that Linus strongly prefers WARN_ON_ONCE() over
>> BUG_ON() ?
>> See e.g. https://lkml.org/lkml/2012/9/27/461 or
>> https://lkml.org/lkml/2014/4/28/657.
>
> Yes, I am and I even downgraded some KVM BUG_ONs recently.
>
> But in this case I think that memory corruption is going to happen
> anyway unless you consciously leak the Scsi_Cmnd * (because if you
> use WARN_ON, you also need to return early as Linus suggested in the
> second email).
>
> So the WARN_ON/BUG_ON choice here should not just consider what
> makes the problem easier to debug; hanging the machine before
> guaranteed badness seems to me like a good use for BUG_ON.
>
So this should work, right?
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..53b8b94 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -334,7 +334,7 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(&cmd->list);
spin_unlock_irqrestore(&cmd->device->list_lock, flags);
- cancel_delayed_work(&cmd->abort_work);
+ BUG_ON(delayed_work_pending(&cmd->abort_work));
__scsi_put_command(cmd->device->host, cmd);
}
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)