* [RFC 1/9] block: make timeout_list protected by REQ_ATOM_COMPLETE bit
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:15 ` [RFC 2/9] scsi: ensure request is dequeued when finishing scmd Liu Ping Fan
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
The "request->timeout_list" is at risk of being modified by both the timeout
(abort) handler and a "finishing" handler. Use the REQ_ATOM_COMPLETE bit in
atomic_flags as a guard against this race.
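The reordering below can be modeled with C11 atomics: only the path that wins the completion bit may touch the timeout list. This is an illustrative user-space sketch; `mock_rq`, `mark_complete` and `check_expired` are made-up names, not the kernel API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a request: one completion bit plus a flag that
 * models membership on timeout_list. */
struct mock_rq {
	atomic_flag complete;      /* models REQ_ATOM_COMPLETE */
	bool on_timeout_list;
};

/* Returns true if the bit was already set, i.e. another path owns
 * completion of this request (mirrors blk_mark_rq_complete()). */
static bool mark_complete(struct mock_rq *rq)
{
	return atomic_flag_test_and_set(&rq->complete);
}

/* Models the fixed blk_rq_check_expired(): the list entry is removed
 * only AFTER winning the bit, so a racing completion path never sees
 * timeout_list mutated under it. Returns true if timeout fired. */
static bool check_expired(struct mock_rq *rq)
{
	if (mark_complete(rq))
		return false;              /* lost the race: leave the list alone */
	rq->on_timeout_list = false;   /* list_del_init() happens here now */
	return true;
}
```

With the old ordering, the list removal would happen unconditionally before the bit was taken, which is exactly the window this patch closes.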
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
block/blk-timeout.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 7a882f7..d3067f0 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -120,13 +120,14 @@ void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
unsigned int *next_set)
{
if (time_after_eq(jiffies, rq->deadline)) {
- list_del_init(&rq->timeout_list);
/*
* Check if we raced with end io completion
*/
- if (!blk_mark_rq_complete(rq))
+ if (!blk_mark_rq_complete(rq)) {
+ list_del_init(&rq->timeout_list);
blk_rq_timed_out(rq);
+ }
} else if (!*next_set || time_after(*next_timeout, rq->deadline)) {
*next_timeout = rq->deadline;
*next_set = 1;
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [RFC 2/9] scsi: ensure request is dequeued when finishing scmd
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
2014-05-30 8:15 ` [RFC 1/9] block: make timeout_list protected by REQ_ATOM_COMPLETE bit Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:15 ` [RFC 3/9] scsi: introduce new internal flag SUCCESS_REMOVE Liu Ping Fan
` (7 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
When a timer requeues a request, it clears the REQ_ATOM_COMPLETE bit.
Hence, if an in-flight scmd asks scsi-eh to handle an error by joining
the eh_entry list, then when scsi_error_handler() handles the scmd,
scmd->request may still be on the request_queue. This triggers
the BUG_ON(!list_empty(&req->queuelist)) in __blk_put_request().
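As a rough model of the invariant this patch restores (a toy intrusive list rather than the kernel's list.h, with hypothetical names), the dequeue must be idempotent and must happen before the request is put back:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular intrusive list, like the kernel's struct list_head. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *n) { n->next = n->prev = n; }
static bool list_empty(const struct list_node *n) { return n->next == n; }

static void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next; n->prev = head;
	head->next->prev = n; head->next = n;
}

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	list_init(n);
}

/* Models scsi_queue_dequeue() from the patch: a no-op when the request
 * is already off the queuelist, so it is safe to call on every path. */
static void queue_dequeue(struct list_node *req)
{
	if (list_empty(req))
		return;
	list_del_init(req);
}

/* __blk_put_request()'s BUG_ON, expressed as a boolean check. */
static bool safe_to_put(const struct list_node *req)
{
	return list_empty(req);
}
```

The idempotence is what lets the patch sprinkle `scsi_queue_dequeue()` over several completion paths without tracking which one ran first.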
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
note: I hit this bug in my test, and the explanation above is my guess. I hope
for more comments on it.
---
block/blk.h | 1 -
drivers/scsi/scsi_error.c | 1 +
drivers/scsi/scsi_lib.c | 20 +++++++++++++++++++-
drivers/scsi/scsi_priv.h | 1 +
include/linux/blkdev.h | 1 +
5 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/block/blk.h b/block/blk.h
index c90e1d8..a9cad62 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -29,7 +29,6 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq,
struct bio *bio);
void blk_queue_bypass_start(struct request_queue *q);
void blk_queue_bypass_end(struct request_queue *q);
-void blk_dequeue_request(struct request *rq);
void __blk_queue_free_tags(struct request_queue *q);
bool __blk_end_bidi_request(struct request *rq, int error,
unsigned int nr_bytes, unsigned int bidi_bytes);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c5f49cf..3b8b95b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2068,6 +2068,7 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush finish"
" cmd: %p\n",
current->comm, scmd));
+ scsi_queue_dequeue(scmd);
scsi_finish_command(scmd);
}
}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7fb2afe..e117579 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -210,6 +210,21 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
{
__scsi_queue_insert(cmd, reason, 1);
}
+
+void scsi_queue_dequeue(struct scsi_cmnd *cmd)
+{
+ struct scsi_device *device = cmd->device;
+ struct request_queue *q = device->request_queue;
+ struct request *req = cmd->request;
+ unsigned long flags;
+
+ if (list_empty(&cmd->request->queuelist))
+ return;
+ spin_lock_irqsave(q->queue_lock, flags);
+ blk_dequeue_request(req);
+ spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
/**
* scsi_execute - insert request and wait for the result
* @sdev: scsi device
@@ -859,6 +874,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
*/
req->next_rq->resid_len = scsi_in(cmd)->resid;
+ scsi_queue_dequeue(cmd);
scsi_release_buffers(cmd);
blk_end_request_all(req, 0);
@@ -1038,8 +1054,10 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
}
if (blk_end_request_err(req, error))
scsi_requeue_command(q, cmd);
- else
+ else {
+ scsi_queue_dequeue(cmd);
scsi_next_command(cmd);
+ }
break;
case ACTION_REPREP:
/* Unprep the request and put it back at the head of the queue.
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f079a59..601b964 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -84,6 +84,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd);
extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
extern void scsi_device_unbusy(struct scsi_device *sdev);
extern void scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
+extern void scsi_queue_dequeue(struct scsi_cmnd *cmd);
extern void scsi_next_command(struct scsi_cmnd *cmd);
extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
extern void scsi_run_host_queues(struct Scsi_Host *shost);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8168524..ee7ddcd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -916,6 +916,7 @@ static inline unsigned int blk_rq_count_bios(struct request *rq)
*/
extern struct request *blk_peek_request(struct request_queue *q);
extern void blk_start_request(struct request *rq);
+extern void blk_dequeue_request(struct request *rq);
extern struct request *blk_fetch_request(struct request_queue *q);
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [RFC 3/9] scsi: introduce new internal flag SUCCESS_REMOVE
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
2014-05-30 8:15 ` [RFC 1/9] block: make timeout_list protected by REQ_ATOM_COMPLETE bit Liu Ping Fan
2014-05-30 8:15 ` [RFC 2/9] scsi: ensure request is dequeued when finishing scmd Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:15 ` [RFC 4/9] blk: change the prototype of blk_complete_request() Liu Ping Fan
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
Currently, when aborting a scsi_cmnd, we cannot tell whether the cmd was
actually cancelled or simply not found by scsi_try_to_abort_cmd(). So
introduce this new flag. (In a succeeding patch, if the caller succeeds
in cancelling the scmd, it should drop the ref on the scmd; otherwise it
is a no-op.)
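A minimal sketch of the intended contract. The SUCCESS/FAILED values mirror the midlevel return codes in include/scsi/scsi.h; `try_to_abort` and `dropped_ref` are illustrative names, not the kernel API:

```c
#include <assert.h>

/* Midlevel eh return codes (values as in include/scsi/scsi.h), plus the
 * new internal one from this patch. */
#define SUCCESS        0x2002
#define FAILED         0x2003
#define SUCCESS_REMOVE 0x200a

/* Models scsi_try_to_abort_cmd() after this series: SUCCESS_REMOVE from
 * the driver's abort handler means "really cancelled, drop the scmd ref",
 * and is folded back to plain SUCCESS before returning to callers. */
static int try_to_abort(int handler_ret, int *dropped_ref)
{
	*dropped_ref = 0;
	if (handler_ret == SUCCESS_REMOVE) {
		*dropped_ref = 1;   /* scsi_put_command(scmd) in the real code */
		return SUCCESS;
	}
	return handler_ret;
}
```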
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
drivers/scsi/scsi_error.c | 8 +++++++-
include/scsi/scsi.h | 1 +
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 3b8b95b..8ddd8f5 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -869,10 +869,16 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd)
static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd)
{
+ int ret;
+
if (!hostt->eh_abort_handler)
return FAILED;
- return hostt->eh_abort_handler(scmd);
+ ret = hostt->eh_abort_handler(scmd);
+ /* After introducing ref on scsi_cmnd, here will handle the ref */
+ if (ret == SUCCESS_REMOVE)
+ ret = SUCCESS;
+ return ret;
}
static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index d477bfb..4183b01 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -488,6 +488,7 @@ static inline int scsi_is_wlun(unsigned int lun)
#define TIMEOUT_ERROR 0x2007
#define SCSI_RETURN_NOT_HANDLED 0x2008
#define FAST_IO_FAIL 0x2009
+#define SUCCESS_REMOVE 0x200a
/*
* Midlevel queue return values.
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [RFC 4/9] blk: change the prototype of blk_complete_request()
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
` (2 preceding siblings ...)
2014-05-30 8:15 ` [RFC 3/9] scsi: introduce new internal flag SUCCESS_REMOVE Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:15 ` Liu Ping Fan
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
This patch prepares for the upcoming refcount on scsi_cmnd. It allows the
scsi_done handler to run in parallel with the abort handler.
In a following patch, a thread that takes REQ_ATOM_COMPLETE drops its ref
at the end of the request, but a thread that fails to take
REQ_ATOM_COMPLETE must drop its ref immediately.
The return value of blk_complete_request() notifies the caller which case applies.
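The new contract can be sketched with a C11 atomic flag standing in for REQ_ATOM_COMPLETE (`mock_rq` and `complete_request` are mock names, not the block layer API):

```c
#include <assert.h>
#include <stdatomic.h>

struct mock_rq {
	atomic_flag complete;   /* models REQ_ATOM_COMPLETE */
};

/* Models the changed blk_complete_request(): return 0 if we won the bit
 * (and would queue the softirq completion), -1 if another path already
 * owns the request, in which case the caller drops its ref at once. */
static int complete_request(struct mock_rq *rq)
{
	if (atomic_flag_test_and_set(&rq->complete))
		return -1;   /* lost the race: caller must put its ref now */
	/* __blk_complete_request(rq) would run here */
	return 0;
}
```

The caller side (scsi_done() in patch 7) then reads: if the return is negative, `scsi_put_command(cmd)` immediately; otherwise the ref is dropped later on the completion path.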
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
block/blk-softirq.c | 13 ++++++++++---
include/linux/blkdev.h | 2 +-
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 467c8de..6a841ae 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -156,6 +156,9 @@ do_local:
* blk_complete_request - end I/O on a request
* @req: the request being processed
*
+ * Return -1 on failure to call __blk_complete_request(), which implies
+ * another routine is handling this req.
+ *
* Description:
* Ends all I/O on a request. It does not handle partial completions,
* unless the driver actually implements this in its completion callback
@@ -163,12 +166,16 @@ do_local:
* through a softirq handler. The user must have registered a completion
* callback through blk_queue_softirq_done().
**/
-void blk_complete_request(struct request *req)
+int blk_complete_request(struct request *req)
{
+ int ret = -1;
if (unlikely(blk_should_fake_timeout(req->q)))
- return;
- if (!blk_mark_rq_complete(req))
+ return ret;
+ if (!blk_mark_rq_complete(req)) {
__blk_complete_request(req);
+ ret = 0;
+ }
+ return ret;
}
EXPORT_SYMBOL(blk_complete_request);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ee7ddcd..23deadb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -945,7 +945,7 @@ extern void __blk_end_request_all(struct request *rq, int error);
extern bool __blk_end_request_cur(struct request *rq, int error);
extern bool __blk_end_request_err(struct request *rq, int error);
-extern void blk_complete_request(struct request *);
+extern int blk_complete_request(struct request *);
extern void __blk_complete_request(struct request *);
extern void blk_abort_request(struct request *);
extern void blk_unprep_request(struct request *);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [RFC 6/9] blk: split the reclaim of req from blk_finish_request()
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
` (4 preceding siblings ...)
2014-05-30 8:15 ` Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:15 ` [RFC 7/9] scsi: adopt ref on scsi_cmnd to avoid a race on request Liu Ping Fan
` (3 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
Later, the lower layer (scsi) can decide when to return the request's
memory via blk_reclaim_request().
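A toy state machine illustrating the split (names are hypothetical; the kernel versions also do I/O accounting, run end_io callbacks and free the request):

```c
#include <assert.h>

enum rq_state { RQ_ACTIVE, RQ_FINISHED, RQ_RECLAIMED };

struct mock_rq { enum rq_state state; };

/* Phase one: models __blk_finish_request() — bookkeeping only. */
static void finish_request(struct mock_rq *rq)
{
	rq->state = RQ_FINISHED;
}

/* Phase two: models blk_reclaim_request() — end_io or free. */
static void reclaim_request(struct mock_rq *rq)
{
	rq->state = RQ_RECLAIMED;
}

/* Models blk_end_request(): both phases, as before the patch. */
static void end_request(struct mock_rq *rq)
{
	finish_request(rq);
	reclaim_request(rq);
}

/* Models the _noreclaim variants: phase one only; the owner of the last
 * scmd reference reclaims the request later. */
static void end_request_noreclaim(struct mock_rq *rq)
{
	finish_request(rq);
}
```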
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
block/blk-core.c | 52 +++++++++++++++++++++++++++++++++-----------------
include/linux/blkdev.h | 6 ++++--
2 files changed, 39 insertions(+), 19 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index beaca2e..7e8a8ae 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2495,7 +2495,7 @@ EXPORT_SYMBOL_GPL(blk_unprep_request);
/*
* queue lock must be held
*/
-static void blk_finish_request(struct request *req, int error, int *drop_ref)
+static void __blk_finish_request(struct request *req, int error, int *drop_ref)
{
bool ref;
@@ -2517,7 +2517,10 @@ static void blk_finish_request(struct request *req, int error, int *drop_ref)
blk_unprep_request(req);
blk_account_io_done(req);
+}
+void blk_reclaim_request(struct request *req, int error)
+{
if (req->end_io)
req->end_io(req, error);
else {
@@ -2528,6 +2531,12 @@ static void blk_finish_request(struct request *req, int error, int *drop_ref)
}
}
+static void blk_finish_request(struct request *req, int error, int *drop_ref)
+{
+ __blk_finish_request(req, error, drop_ref);
+ blk_reclaim_request(req, error);
+}
+
/**
* blk_end_bidi_request - Complete a bidi request
* @rq: the request to complete
@@ -2555,7 +2564,11 @@ static bool blk_end_bidi_request(struct request *rq, int error,
return true;
spin_lock_irqsave(q->queue_lock, flags);
- blk_finish_request(rq, error, drop_ref);
+ /* low level wants to manage the ref, so do not reclaim rq here */
+ if (drop_ref)
+ __blk_finish_request(rq, error, drop_ref);
+ else
+ blk_finish_request(rq, error, drop_ref);
spin_unlock_irqrestore(q->queue_lock, flags);
return false;
@@ -2608,25 +2621,15 @@ bool blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
EXPORT_SYMBOL(blk_end_request);
/**
- * blk_end_request_ref - Helper function for drivers to complete the request.
- * @rq: the request being processed
- * @error: %0 for success, < %0 for error
- * @nr_bytes: number of bytes to complete
- *
- * Description:
- * Ends I/O on a number of bytes attached to @rq.
- * If @rq has leftover, sets it up for the next range of segments.
- *
- * Return:
- * %false - we are done with this request
- * %true - still buffers pending for this request
+ * blk_end_request_noreclaim - same as blk_end_request, but not reclaim rq
**/
-bool blk_end_request_ref(struct request *rq, int error, unsigned int nr_bytes,
- int *drop_ref)
+bool blk_end_request_noreclaim(struct request *rq, int error,
+ unsigned int nr_bytes, int *drop_ref)
{
+ BUG_ON(!drop_ref);
return blk_end_bidi_request(rq, error, nr_bytes, 0, drop_ref);
}
-EXPORT_SYMBOL(blk_end_request_ref);
+EXPORT_SYMBOL(blk_end_request_noreclaim);
/**
* blk_end_request_all - Helper function for drives to finish the request.
@@ -2650,6 +2653,21 @@ void blk_end_request_all(struct request *rq, int error)
}
EXPORT_SYMBOL(blk_end_request_all);
+void blk_end_request_all_noreclaim(struct request *rq, int error)
+{
+ bool pending;
+ unsigned int bidi_bytes = 0;
+ int ref;
+
+ if (unlikely(blk_bidi_rq(rq)))
+ bidi_bytes = blk_rq_bytes(rq->next_rq);
+
+ pending = blk_end_bidi_request(rq, error, blk_rq_bytes(rq), bidi_bytes,
+ &ref);
+ BUG_ON(pending);
+}
+EXPORT_SYMBOL(blk_end_request_all_noreclaim);
+
/**
* blk_end_request_cur - Helper function to finish the current request chunk.
* @rq: the request to finish the current chunk for
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7a2e79f..70c65d2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -772,6 +772,7 @@ extern void generic_make_request(struct bio *bio);
extern void blk_rq_init(struct request_queue *q, struct request *rq);
extern void blk_put_request(struct request *);
extern void __blk_put_request(struct request_queue *, struct request *);
+extern void blk_reclaim_request(struct request *req, int error);
extern struct request *blk_get_request(struct request_queue *, int, gfp_t);
extern struct request *blk_make_request(struct request_queue *, struct bio *,
gfp_t);
@@ -934,11 +935,12 @@ extern struct request *blk_fetch_request(struct request_queue *q);
*/
extern bool blk_update_request(struct request *rq, int error,
unsigned int nr_bytes);
-extern bool blk_end_request_ref(struct request *rq, int error,
- unsigned int nr_bytes, int *drop_ref);
extern bool blk_end_request(struct request *rq, int error,
unsigned int nr_bytes);
+extern bool blk_end_request_noreclaim(struct request *rq, int error,
+ unsigned int nr_bytes, int *drop_ref);
extern void blk_end_request_all(struct request *rq, int error);
+extern void blk_end_request_all_noreclaim(struct request *rq, int error);
extern bool blk_end_request_cur(struct request *rq, int error);
extern bool blk_end_request_err(struct request *rq, int error);
extern bool __blk_end_request(struct request *rq, int error,
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [RFC 7/9] scsi: adopt ref on scsi_cmnd to avoid a race on request
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
` (5 preceding siblings ...)
2014-05-30 8:15 ` [RFC 6/9] blk: split the reclaim of req from blk_finish_request() Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:15 ` [RFC 8/9] scsi: virtscsi: work around to abort a scmd Liu Ping Fan
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
When running an I/O stress test on a high-latency disk, e.g. a guest with an
image on NFS, we can trigger the
BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags)),
since there is a race between a slowly "finishing" scmd and the re-allocated scmd.
I.e. a request is still referenced by a scmd, but we have already returned it to
the slab. This patch introduces a ref on the scmd to exclude this issue.
Rules for incrementing the ref:
When a scmd is set up, the ref is initialized to 1. When a timer is added, the
ref is incremented by 1.
Rules for decrementing the ref:
-for the normal ref
scsi_done() drops the ref immediately when it fails to acquire REQ_ATOM_COMPLETE;
otherwise the ref is dropped by scsi_end_request(), or by returning SUCCESS_REMOVE.
-for the timer ref
When deleting the timer, if !list_empty(timeout_list), a timer ref exists, so
drop it.
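The ref rules above can be modeled as a plain atomic refcount (illustrative names; the real code keeps an atomic counter inside struct scsi_cmnd and frees via __scsi_put_command()):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mock_scmd {
	atomic_int ref;
	bool freed;
};

/* Base ref taken at setup, as in __scsi_get_command(). */
static void scmd_init(struct mock_scmd *c)
{
	atomic_init(&c->ref, 1);
	c->freed = false;
}

/* Extra ref while a timer is armed. */
static void scmd_get(struct mock_scmd *c)
{
	atomic_fetch_add(&c->ref, 1);
}

/* Models scsi_put_command(): storage is reclaimed only when the count
 * hits zero, so a late timeout handler never touches a freed command. */
static void scmd_put(struct mock_scmd *c)
{
	if (atomic_fetch_sub(&c->ref, 1) == 1)
		c->freed = true;   /* __scsi_put_command() equivalent */
}
```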
Oops call trace:
[ 3379.866773] ------------[ cut here ]------------
[ 3379.866997] kernel BUG at block/blk-core.c:2295!
[ 3379.867238] Oops: Exception in kernel mode, sig: 5 [#1]
[ 3379.867400] SMP NR_CPUS=2048 NUMA pSeries
[ 3379.867574] Modules linked in: nfsv3 sg rpcsec_gss_krb5 arc4 md4 nfsv4 nls_utf8
[ 3379.879310] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-105.el7.ppc64 #1
[ 3379.879586] task: c0000003f8bf6900 ti: c0000003fffd4000 task.ti: c0000003f8cb4000
[ 3379.879858] NIP: c000000000413324 LR: c000000000413380 CTR: c0000000005b2cf0
[ 3379.881647] REGS: c0000003fffd76b0 TRAP: 0700 Not tainted (3.10.0-105.el7.ppc64)
[ 3379.881932] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42022042 XER: 00000000
[ 3379.882533] SOFTE: 0
[ 3379.882626] CFAR: c000000000413388
[ 3379.882765]
GPR00: c0000000004132e4 c0000003fffd7930 c0000000012543b8 00000312efc08912
GPR04: c0000003f650f000 c0000003f650f000 c000000000c55e48 000000000009f49d
GPR08: c0000000012e43b8 0000000000000001 0000000000080000 000000002f7c7611
GPR12: 0000000022022044 c00000000fe01000 c000000000b270e8 0000000000010000
GPR16: c000000000b27128 c000000000b27110 c000000000b271c8 c000000000c54480
GPR20: c000000000ac9e00 0000000000000000 c000000000ac9c58 c000000000b271f0
GPR24: c000000003a81948 c000000003a81848 0000000000000000 0000000000000000
GPR28: c0000003f650f000 c0000003efab0000 c0000003efab0004 c0000003f650f000
[ 3379.886311] NIP [c000000000413324] .blk_start_request+0x84/0x100
[ 3379.886539] LR [c000000000413380] .blk_start_request+0xe0/0x100
[ 3379.886760] PACATMSCRATCH [8000000000009032]
[ 3379.886951] Call Trace:
[ 3379.887037] [c0000003fffd7930] [c0000000004132e4] .blk_start_request+0x44/0x100 (unreliable)
[ 3379.887381] [c0000003fffd79b0] [c0000000005b2ef8] .scsi_request_fn+0x208/0x7e0
[ 3379.887756] [c0000003fffd7ad0] [c0000000004141c8] .blk_run_queue+0x68/0xa0
[ 3379.889260] [c0000003fffd7b50] [c0000000005b2098] .scsi_run_queue+0x158/0x300
[ 3379.889645] [c0000003fffd7c10] [c0000000005b5bec] .scsi_io_completion+0x22c/0xe80
[ 3379.890083] [c0000003fffd7ce0] [c0000000005a58a8] .scsi_finish_command+0x108/0x1a0
[ 3379.890483] [c0000003fffd7d70] [c0000000005b5858] .scsi_softirq_done+0x1c8/0x240
[ 3379.890956] [c0000003fffd7e00] [c000000000422b24] .blk_done_softirq+0xa4/0xd0
[ 3379.891364] [c0000003fffd7e90] [c0000000000bb8a8] .__do_softirq+0x148/0x380
[ 3379.891632] [c0000003fffd7f90] [c00000000002462c] .call_do_softirq+0x14/0x24
[ 3379.892008] [c0000003fffd3df0] [c000000000010f70] .do_softirq+0x120/0x170
[ 3379.892429] [c0000003fffd3e80] [c0000000000bbe34] .irq_exit+0x1e4/0x1f0
[ 3379.893180] [c0000003fffd3f10] [c000000000010b60] .__do_irq+0xc0/0x1d0
[ 3379.893578] [c0000003fffd3f90] [c000000000024650] .call_do_irq+0x14/0x24
[ 3379.894018] [c0000003f8cb7830] [c000000000010cfc] .do_IRQ+0x8c/0x100
[ 3379.894539] [c0000003f8cb78d0] [c000000000002494] hardware_interrupt_common+0x114/0x180
[ 3379.895278] --- Exception: 501 at .plpar_hcall_norets+0x84/0xd4
[ 3379.895278] LR = .shared_cede_loop+0x40/0x90
[ 3379.895730] [c0000003f8cb7bc0] [0000000000000000] (null) (unreliable)
[ 3379.896356] [c0000003f8cb7c40] [c0000000006c5054] .cpuidle_idle_call+0x114/0x3c0
[ 3379.897882] [c0000003f8cb7d10] [c00000000007d6f0] .pseries_lpar_idle+0x10/0x50
[ 3379.898200] [c0000003f8cb7d80] [c0000000000186a4] .arch_cpu_idle+0x64/0x150
[ 3379.898475] [c0000003f8cb7e00] [c000000000135b84] .cpu_startup_entry+0x1a4/0x2e0
[ 3379.898860] [c0000003f8cb7ed0] [c0000000008b9a4c] .start_secondary+0x3a8/0x3b0
[ 3379.899173] [c0000003f8cb7f90] [c00000000000976c] .start_secondary_prolog+0x10/0x14
[ 3379.899496] Instruction dump:
[ 3379.899631] 792a77e3 41820010 815f0050 2f8a0001 419e004c e93f0178 815f0064 2fa90000
[ 3379.900085] 915f013c 40de0074 e93f0058 792907e0 <0b090000> 7fe3fb78 48010495 60000000
[ 3379.900546] ---[ end trace d57cacc25e1c8c73 ]---
[ 3379.905937]
[ 3379.906041] Sending IPI to other CPUs
[ 3389.939144] ERROR: 1 cpu(s) not responding
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
drivers/scsi/scsi.c | 33 +++++++++++++++++++++-------
drivers/scsi/scsi_error.c | 5 +++--
drivers/scsi/scsi_lib.c | 56 +++++++++++++++++++++++++++++++++--------------
drivers/scsi/scsi_priv.h | 3 +++
include/scsi/scsi_cmnd.h | 1 +
5 files changed, 72 insertions(+), 26 deletions(-)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8..2095edc 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -271,6 +271,8 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask)
}
}
+ if (likely(cmd))
+ atomic_set(&cmd->ref, 1);
return cmd;
}
EXPORT_SYMBOL_GPL(__scsi_get_command);
@@ -320,6 +322,7 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
{
unsigned long flags;
+ BUG_ON(atomic_read(&cmd->ref) != 0);
/* changing locks here, don't need to restore the irq state */
spin_lock_irqsave(&shost->free_list_lock, flags);
if (unlikely(list_empty(&shost->free_list))) {
@@ -348,15 +351,25 @@ void scsi_put_command(struct scsi_cmnd *cmd)
struct scsi_device *sdev = cmd->device;
unsigned long flags;
- /* serious error if the command hasn't come from a device list */
- spin_lock_irqsave(&cmd->device->list_lock, flags);
- BUG_ON(list_empty(&cmd->list));
- list_del_init(&cmd->list);
- spin_unlock_irqrestore(&cmd->device->list_lock, flags);
+ BUG_ON(atomic_read(&cmd->ref) <= 0);
+ if (atomic_dec_and_test(&cmd->ref)) {
+ smp_mb();
+ if (cmd->request) {
+ BUG_ON(!list_empty(&cmd->request->queuelist));
+ /* scsi layer has no access to req, so let it gone */
+ blk_reclaim_request(cmd->request, cmd->request->errors);
+ }
+ __scsi_release_buffers(cmd, 0);
+ /* serious error if the command hasn't come from a device list */
+ spin_lock_irqsave(&cmd->device->list_lock, flags);
+ BUG_ON(list_empty(&cmd->list));
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(&cmd->device->list_lock, flags);
- cancel_delayed_work(&cmd->abort_work);
+ cancel_delayed_work(&cmd->abort_work);
- __scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
+ __scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
+ }
}
EXPORT_SYMBOL(scsi_put_command);
@@ -757,8 +770,12 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
*/
static void scsi_done(struct scsi_cmnd *cmd)
{
+ int ret;
+
trace_scsi_dispatch_cmd_done(cmd);
- blk_complete_request(cmd->request);
+ ret = blk_complete_request(cmd->request);
+ if (ret < 0)
+ scsi_put_command(cmd);
}
/**
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 8ddd8f5..52e1adb 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -875,9 +875,10 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_c
return FAILED;
ret = hostt->eh_abort_handler(scmd);
- /* After introducing ref on scsi_cmnd, here will handle the ref */
- if (ret == SUCCESS_REMOVE)
+ if (ret == SUCCESS_REMOVE) {
+ scsi_put_command(scmd);
ret = SUCCESS;
+ }
return ret;
}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e117579..d5eb2df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -114,6 +114,7 @@ static void scsi_unprep_request(struct request *req)
struct scsi_cmnd *cmd = req->special;
blk_unprep_request(req);
+ cmd->request = NULL;
req->special = NULL;
scsi_put_command(cmd);
@@ -138,6 +139,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
struct scsi_target *starget = scsi_target(device);
struct request_queue *q = device->request_queue;
unsigned long flags;
+ bool drop;
SCSI_LOG_MLQUEUE(1,
printk("Inserting command %p into mlqueue\n", cmd));
@@ -182,7 +184,10 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
* before blk_cleanup_queue() finishes.
*/
spin_lock_irqsave(q->queue_lock, flags);
- blk_requeue_request(q, cmd->request);
+ drop = blk_requeue_request(q, cmd->request);
+ if (drop)
+ /* drop the ref held by the timer */
+ scsi_put_command(cmd);
kblockd_schedule_work(q, &device->requeue_work);
spin_unlock_irqrestore(q->queue_lock, flags);
}
@@ -587,8 +592,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
scsi_run_queue(sdev->request_queue);
}
-static void __scsi_release_buffers(struct scsi_cmnd *, int);
-
/*
* Function: scsi_end_request()
*
@@ -616,15 +619,16 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
{
struct request_queue *q = cmd->device->request_queue;
struct request *req = cmd->request;
+ int drop = 0;
/*
* If there are blocks left over at the end, set up the command
* to queue the remainder of them.
*/
- if (blk_end_request(req, error, bytes)) {
+ if (blk_end_request_noreclaim(req, error, bytes, &drop)) {
/* kill remainder if no retrys */
if (error && scsi_noretry_cmd(cmd))
- blk_end_request_all(req, error);
+ blk_end_request_all_noreclaim(req, error);
else {
if (requeue) {
/*
@@ -639,12 +643,15 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
return cmd;
}
}
+ /* drop the ref which is held by the timer */
+ if (drop)
+ scsi_put_command(cmd);
+ req->errors = error;
/*
* This will goose the queue request function at the end, so we don't
* need to worry about launching another command.
*/
- __scsi_release_buffers(cmd, 0);
scsi_next_command(cmd);
return NULL;
}
@@ -700,7 +707,7 @@ static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
__sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, scsi_sg_free);
}
-static void __scsi_release_buffers(struct scsi_cmnd *cmd, int do_bidi_check)
+void __scsi_release_buffers(struct scsi_cmnd *cmd, int do_bidi_check)
{
if (cmd->sdb.table.nents)
@@ -876,7 +883,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
scsi_queue_dequeue(cmd);
scsi_release_buffers(cmd);
- blk_end_request_all(req, 0);
+ blk_end_request_all_noreclaim(req, 0);
scsi_next_command(cmd);
return;
@@ -1179,6 +1186,7 @@ static struct scsi_cmnd *scsi_get_cmd_from_req(struct scsi_device *sdev,
req->special = cmd;
} else {
cmd = req->special;
+ BUG_ON(atomic_read(&cmd->ref) <= 0);
}
/* pull a tag out of the request if we have one */
@@ -1321,6 +1329,7 @@ scsi_prep_return(struct request_queue *q, struct request *req, int ret)
/* release the command and kill it */
if (req->special) {
struct scsi_cmnd *cmd = req->special;
+ cmd->request = NULL;
scsi_release_buffers(cmd);
scsi_put_command(cmd);
req->special = NULL;
@@ -1600,6 +1609,7 @@ static void scsi_request_fn(struct request_queue *q)
struct Scsi_Host *shost;
struct scsi_cmnd *cmd;
struct request *req;
+ bool tref;
if(!get_device(&sdev->sdev_gendev))
/* We must be tearing the block queue down already */
@@ -1612,6 +1622,8 @@ static void scsi_request_fn(struct request_queue *q)
shost = sdev->host;
for (;;) {
int rtn;
+
+ tref = false;
/*
* get next queueable request. We do this early to make sure
* that the request is fully prepared even if we cannot
@@ -1629,14 +1641,6 @@ static void scsi_request_fn(struct request_queue *q)
}
- /*
- * Remove the request from the request list.
- */
- if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
- blk_start_request(req);
- sdev->device_busy++;
-
- spin_unlock(q->queue_lock);
cmd = req->special;
if (unlikely(cmd == NULL)) {
printk(KERN_CRIT "impossible request in %s.\n"
@@ -1649,6 +1653,23 @@ static void scsi_request_fn(struct request_queue *q)
spin_lock(shost->host_lock);
/*
+ * Remove the request from the request list.
+ */
+ if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req))) {
+ blk_start_request(req);
+ if (likely(cmd)) {
+ tref = true;
+ /* a timer references the cmd */
+ atomic_inc(&cmd->ref);
+ /* usual 2, or 3 for done cmd in flight */
+ BUG_ON(atomic_read(&cmd->ref) > 3);
+ }
+ }
+ sdev->device_busy++;
+
+ spin_unlock(q->queue_lock);
+
+ /*
* We hit this when the driver is using a host wide
* tag map. For device level tag maps the queue_depth check
* in the device ready fn would prevent us from trying
@@ -1708,6 +1729,9 @@ static void scsi_request_fn(struct request_queue *q)
*/
spin_lock_irq(q->queue_lock);
blk_requeue_request(q, req);
+ if (tref)
+ /* drop the ref held by a timer */
+ scsi_put_command(cmd);
sdev->device_busy--;
out_delay:
if (sdev->device_busy == 0)
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 601b964..a8f630c 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -179,4 +179,7 @@ extern int scsi_internal_device_block(struct scsi_device *sdev);
extern int scsi_internal_device_unblock(struct scsi_device *sdev,
enum scsi_device_state new_state);
+extern void __scsi_release_buffers(struct scsi_cmnd *cmd, int do_bidi_check);
+
+
#endif /* _SCSI_PRIV_H */
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 029b076..5643026 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -58,6 +58,7 @@ struct scsi_cmnd {
struct delayed_work abort_work;
int eh_eflags; /* Used by error handlr */
+ atomic_t ref;
/*
* A SCSI Command is assigned a nonzero serial_number before passed
* to the driver's queue command function. The serial_number is
--
1.8.1.4
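The reference lifecycle this patch builds toward can be sketched in isolation. The following is a simplified, hypothetical user-space model of the rules (the struct and function names are illustrative, not the kernel's): the command starts with one reference for the normal completion path, arming the timeout timer takes a second one, and the command is freed only when the last reference is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the scmd lifetime rules described in this series. */
struct toy_scmd {
    atomic_int ref;
    bool freed;
};

static void toy_setup(struct toy_scmd *cmd)
{
    atomic_store(&cmd->ref, 1);   /* normal ref, held by the issuer */
    cmd->freed = false;
}

static void toy_get(struct toy_scmd *cmd)
{
    atomic_fetch_add(&cmd->ref, 1);   /* e.g. taken when the timer is armed */
}

static void toy_put(struct toy_scmd *cmd)
{
    if (atomic_fetch_sub(&cmd->ref, 1) == 1)
        cmd->freed = true;   /* last ref gone: safe to return to the slab */
}
```

With these rules, a late timeout handler and the completion path each drop only their own reference, so neither can free the command while the other still uses it.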
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [RFC 8/9] scsi: virtscsi: work around to abort a scmd
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
` (6 preceding siblings ...)
2014-05-30 8:15 ` [RFC 7/9] scsi: adopt ref on scsi_cmnd to avoid a race on request Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:15 ` [RFC 9/9] scsi: ibmvscsi: return SUCCESS_REMOVE when finding an abort cmd Liu Ping Fan
2014-05-30 8:26 ` [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Paolo Bonzini
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
This patch is just an example and test case for this series. The main
change is that, when aborting a scmd, the case in which the command is
removed is distinguished from the case in which it is not found. (I
think it would be better to remove the scmd directly from the vq, but
for the time being I do not know enough about this detail of virtio.)
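The found-vs-not-found distinction drawn here can be sketched as follows. This is a simplified single-threaded model with illustrative names (the real code holds cmds_lock around the walk): if the command is still queued, unlink it and mark it aborted so the completion path frees it silently; if it is absent, it has already been handed to the device and the caller must fall back to a real task-management request.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list modelling the driver's active-command list. */
struct toy_cmd {
    struct toy_cmd *next;
    void *sc;           /* the scsi_cmnd this request belongs to */
    int abort;          /* set when the command was aborted in place */
};

enum abort_result { ABORT_REMOVED, ABORT_NOT_FOUND };

static enum abort_result toy_abort(struct toy_cmd **head, void *sc)
{
    for (struct toy_cmd **pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->sc == sc) {
            struct toy_cmd *victim = *pp;
            *pp = victim->next;   /* unlink from the active list */
            victim->abort = 1;    /* completion path will skip it */
            return ABORT_REMOVED;
        }
    }
    return ABORT_NOT_FOUND;       /* already in flight at the device */
}
```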
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
drivers/scsi/virtio_scsi.c | 61 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 59 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index b26f1a5..d08aae5 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -31,8 +31,14 @@
#define VIRTIO_SCSI_EVENT_LEN 8
#define VIRTIO_SCSI_VQ_BASE 2
+/* should be per virtscsi dev */
+static struct list_head active_cmds = LIST_HEAD_INIT(active_cmds);
+static spinlock_t cmds_lock;
+
/* Command queue element */
struct virtio_scsi_cmd {
+ struct list_head list;
+ unsigned long abort;
struct scsi_cmnd *sc;
struct completion *comp;
union {
@@ -148,6 +154,18 @@ static void virtscsi_compute_resid(struct scsi_cmnd *sc, u32 resid)
static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
{
struct virtio_scsi_cmd *cmd = buf;
+ unsigned long flags;
+ int skip = 0;
+ spin_lock_irqsave(&cmds_lock, flags);
+ list_del_init(&cmd->list);
+ if (cmd->abort)
+ skip = 1;
+ spin_unlock_irqrestore(&cmds_lock, flags);
+ if (skip) {
+ mempool_free(cmd, virtscsi_cmd_pool);
+ return;
+ }
+
struct scsi_cmnd *sc = cmd->sc;
struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
struct virtio_scsi_target_state *tgt =
@@ -273,11 +291,16 @@ static void virtscsi_req_done(struct virtqueue *vq)
static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
{
struct virtio_scsi_cmd *cmd = buf;
+ unsigned long flags;
if (cmd->comp)
complete_all(cmd->comp);
- else
+ else {
+ spin_lock_irqsave(&cmds_lock, flags);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(&cmds_lock, flags);
mempool_free(cmd, virtscsi_cmd_pool);
+ }
}
static void virtscsi_ctrl_done(struct virtqueue *vq)
@@ -477,6 +500,10 @@ static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
int err;
bool needs_kick = false;
+ spin_lock_irqsave(&cmds_lock, flags);
+ list_add_tail(&cmd->list, &active_cmds);
+ spin_unlock_irqrestore(&cmds_lock, flags);
+
spin_lock_irqsave(&vq->vq_lock, flags);
err = virtscsi_add_cmd(vq->vq, cmd, req_size, resp_size, gfp);
if (!err)
@@ -495,6 +522,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
{
struct virtio_scsi_cmd *cmd;
int ret;
+ unsigned long flags;
struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
@@ -511,6 +539,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
goto out;
memset(cmd, 0, sizeof(*cmd));
+ INIT_LIST_HEAD(&cmd->list);
cmd->sc = sc;
cmd->req.cmd = (struct virtio_scsi_cmd_req){
.lun[0] = 1,
@@ -530,8 +559,12 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
GFP_ATOMIC) == 0)
ret = 0;
- else
+ else {
+ spin_lock_irqsave(&cmds_lock, flags);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(&cmds_lock, flags);
mempool_free(cmd, virtscsi_cmd_pool);
+ }
out:
return ret;
@@ -590,6 +623,7 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
{
DECLARE_COMPLETION_ONSTACK(comp);
int ret = FAILED;
+ unsigned long flags;
cmd->comp = &comp;
if (virtscsi_kick_cmd(&vscsi->ctrl_vq, cmd,
@@ -603,6 +637,9 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
ret = SUCCESS;
out:
+ spin_lock_irqsave(&cmds_lock, flags);
+ list_del_init(&cmd->list);
+ spin_unlock_irqrestore(&cmds_lock, flags);
mempool_free(cmd, virtscsi_cmd_pool);
return ret;
}
@@ -618,6 +655,7 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
return FAILED;
memset(cmd, 0, sizeof(*cmd));
+ INIT_LIST_HEAD(&cmd->list);
cmd->sc = sc;
cmd->req.tmf = (struct virtio_scsi_ctrl_tmf_req){
.type = VIRTIO_SCSI_T_TMF,
@@ -634,13 +672,31 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
{
struct virtio_scsi *vscsi = shost_priv(sc->device->host);
struct virtio_scsi_cmd *cmd;
+ struct virtio_scsi_cmd *entry, *tmp;
+ unsigned long flags, found = 0;
scmd_printk(KERN_INFO, sc, "abort\n");
+ spin_lock_irqsave(&cmds_lock, flags);
+ list_for_each_entry_safe(entry, tmp, &active_cmds, list) {
+ if (entry->sc == sc) {
+ list_del_init(&entry->list);
+ /* mark the virtio_scsi_cmd as aborted
+ */
+ entry->abort = 1;
+ found = 1;
+ }
+
+ }
+ spin_unlock_irqrestore(&cmds_lock, flags);
+ if (found)
+ return SUCCESS_REMOVE;
+
cmd = mempool_alloc(virtscsi_cmd_pool, GFP_NOIO);
if (!cmd)
return FAILED;
memset(cmd, 0, sizeof(*cmd));
+ INIT_LIST_HEAD(&cmd->list);
cmd->sc = sc;
cmd->req.tmf = (struct virtio_scsi_ctrl_tmf_req){
.type = VIRTIO_SCSI_T_TMF,
@@ -1030,6 +1086,7 @@ static int __init init(void)
if (ret < 0)
goto error;
+ spin_lock_init(&cmds_lock);
return 0;
error:
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread

* [RFC 9/9] scsi: ibmvscsi: return SUCCESS_REMOVE when finding an abort cmd
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
` (7 preceding siblings ...)
2014-05-30 8:15 ` [RFC 8/9] scsi: virtscsi: work around to abort a scmd Liu Ping Fan
@ 2014-05-30 8:15 ` Liu Ping Fan
2014-05-30 8:26 ` [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Paolo Bonzini
9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30 8:15 UTC (permalink / raw)
To: linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
Stefan Hajnoczi, Jeff Moyer
When SUCCESS_REMOVE is returned, the core can drop the reference held on the scsi_cmnd.
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index d0fa4b6..cf78be5 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1583,7 +1583,7 @@ static int ibmvscsi_eh_abort_handler(struct scsi_cmnd *cmd)
free_event_struct(&found_evt->hostdata->pool, found_evt);
spin_unlock_irqrestore(hostdata->host->host_lock, flags);
atomic_inc(&hostdata->request_limit);
- return SUCCESS;
+ return SUCCESS_REMOVE;
}
/**
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd
2014-05-30 8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
` (8 preceding siblings ...)
2014-05-30 8:15 ` [RFC 9/9] scsi: ibmvscsi: return SUCCESS_REMOVE when finding an abort cmd Liu Ping Fan
@ 2014-05-30 8:26 ` Paolo Bonzini
2014-05-30 8:31 ` liu ping fan
9 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-05-30 8:26 UTC (permalink / raw)
To: Liu Ping Fan, linux-scsi
Cc: Adaptec OEM Raid Solutions, Jens Axboe, Stefan Hajnoczi,
Jeff Moyer
Il 30/05/2014 10:15, Liu Ping Fan ha scritto:
> When running an I/O stress test on a large-latency scsi-disk, e.g. a guest with virtscsi
> on an nfs image, it can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags))
> in blk_start_request(),
> since there is a race between a long-latency "finishing" scmd and the re-allocated scmd,
> i.e. a request is still referred to by a scmd after we have already returned it to the
> slab.
>
> This series introduces a ref on the scmd to exclude this issue; the ref rules are as
> follows.
>
> inc ref rules:
> When a scmd is set up, its ref is initialized to 1. When a timer is added, the ref is
> incremented.
>
> dec ref rules:
> -for the normal ref
> scsi_done() drops the ref when it fails to acquire REQ_ATOM_COMPLETE immediately,
> or the ref is dropped by scsi_end_request(),
> or dropped by returning SUCCESS_REMOVE
> -for a timer ref
> when deleting the timer, if !list_empty(timeout_list), there is a timer ref, so
> drop it.
>
>
> patch 1-2: fix the current potential bug
> patch 3~6: prepare the mechanism for the ref
> patch 7: the ref-rules core
> patch 8-9: example and test-issue for the new mechanism. Since I lack virtscsi
> background, patch 8 may be poor and need improvement :)
>
>
> Note: all the patches are based on rhel7, whose kernel version is linux-3.10.
> I will rebase them onto the latest commit if my method proves practical.
This series is not necessary; this is a bug in the virtscsi driver. I
have a ten-line patch to fix it, but I haven't yet tested it properly.
Paolo
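The timer-ref deletion rule quoted above (drop the extra reference only if the request was still on timeout_list) can be modelled in isolation. This is a hypothetical, simplified sketch with illustrative names, not the kernel's code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a request sits on timeout_list while its timer ref is
 * live. Deleting the timer drops that ref only when the request was
 * still listed, mirroring the !list_empty(&rq->timeout_list) test. */
struct toy_req {
    bool on_timeout_list;
    int ref;
};

static void toy_del_timer(struct toy_req *rq)
{
    if (rq->on_timeout_list) {
        rq->on_timeout_list = false;
        rq->ref--;               /* drop the timer's reference */
    }
}
```

Because the drop is conditional on list membership, a completion path that already removed the request from timeout_list cannot cause a double put.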
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd
2014-05-30 8:26 ` [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Paolo Bonzini
@ 2014-05-30 8:31 ` liu ping fan
0 siblings, 0 replies; 12+ messages in thread
From: liu ping fan @ 2014-05-30 8:31 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Liu Ping Fan, linux-scsi, Adaptec OEM Raid Solutions, Jens Axboe,
Stefan Hajnoczi, Jeff Moyer
On Fri, May 30, 2014 at 4:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 30/05/2014 10:15, Liu Ping Fan ha scritto:
>
>> When running io stress test on large latency scsi-disk, e.g guest with
>> virtscsi
>> on a nfs image. It can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
>> &req->atomic_flags));
>> in blk_start_request().
>> Since there is a race between latency "finishing" scmd and the
>> re-allocated scmd.
>> I.e a request is still referred by a scmd, but we had turn it back to
>> slab.
>>
>> This series introduces the ref on scmd to exclude this issue, and the
>> following is ref rules.
>>
>> inc ref rules is like the following:
>> When setup a scmd, inited as 1. When add a timer inc 1.
>>
>> dec ref rules is like the following:
>> -for the normal ref
>> scsi_done() will drop the ref when fail to acquire
>> REQ_ATOM_COMPLETE immediately
>> or drop the ref by scsi_end_request()
>> or drop by return SUCCESS_REMOVE
>> -for a timer ref
>> when deleting timer, if !list_empty(timeout_list), then there is a
>> timer ref, and
>> drop it.
>>
>>
>> patch1-2: fix the current potential bug
>> patch3~6: prepare for the mechanism for the ref
>> patch7: the ref rules core
>> patch8-9: e.g and test-issue for the new mechanism. Since lack of many
>> virtscsi background,
>> patch8 may be poor and need to be improved :)
>>
>>
>> Note: all the patches are based on rhel7, whose kernel version is
>> linux-3.10.
>> I will rebase them onto the latest commit if my method is practical.
>
>
> This series is not necessary, this is a bug in the virtscsi driver. I have
> a ten line patch to fix it, but I haven't yet tested it properly.
>
Could I pick up your patch and test it?
Thx,
Fan
> Paolo
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread