[RFC 0/9] fix for the race issue between scsi timer and in-flight scmd

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd
@ 2014-05-30  8:15 Liu Ping Fan
  2014-05-30  8:15 ` [RFC 1/9] block: make timeout_list protectd by REQ_ATOM_COMPLETE bit Liu Ping Fan
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

When running io stress test on large latency scsi-disk, e.g guest with virtscsi
on a nfs image. It can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
in blk_start_request().
Since there is a race between latency "finishing" scmd and the re-allocated scmd.
I.e a request is still referred by a scmd, but we had turn it back to
slab. 

This series introduces the ref on scmd to exclude this issue, and the following is ref rules.

  inc ref rules is like the following:
    When setup a scmd, inited as 1. When add a timer inc 1.
  
  dec ref rules is like the following:
    -for the normal ref
       scsi_done() will drop the ref when fail to acquire REQ_ATOM_COMPLETE immediately
       or drop the ref by scsi_end_request()
       or drop by return SUCCESS_REMOVE
    -for a timer ref
       when deleting timer, if !list_empty(timeout_list), then there is a timer ref, and
       drop it.


patch1-2: fix the current potential bug 
patch3~6: prepare for the mechanism for the ref
patch7:   the ref rules core
patch8-9:  e.g and test-issue for the new mechanism. Since lack of many virtscsi background,
           patch8 may be poor and need to be improved :)


Note: all the patches are based on rhel7, whose kernel version is linux-3.10.
      I will rebase them onto the latest commit if my method is practical.


Liu Ping Fan (9):
  block: make timeout_list protectd by REQ_ATOM_COMPLETE bit
  scsi: ensure request is dequeue when finishing scmd
  scsi: introduce new internal flag SUCCESS_REMOVE
  blk: change the prototype of blk_complete_request()
  blk: change funcs' prototype to expose the ref of timer
  blk: split the reclaim of req from blk_finish_request()
  scsi: adopt ref on scsi_cmnd to avoid a race on request
  scsi: virtscsi: work around to abort a scmd
  scsi: ibmvscsi: return SUCCESS_REMOVE when finding a abort cmd

 block/blk-core.c                 | 67 ++++++++++++++++++++++++++++++-----
 block/blk-softirq.c              | 13 +++++--
 block/blk-timeout.c              | 15 +++++---
 block/blk.h                      |  3 +-
 drivers/scsi/ibmvscsi/ibmvscsi.c |  2 +-
 drivers/scsi/scsi.c              | 33 ++++++++++++-----
 drivers/scsi/scsi_error.c        | 10 +++++-
 drivers/scsi/scsi_lib.c          | 76 +++++++++++++++++++++++++++++++---------
 drivers/scsi/scsi_priv.h         |  4 +++
 drivers/scsi/virtio_scsi.c       | 61 ++++++++++++++++++++++++++++++--
 include/linux/blkdev.h           |  9 +++--
 include/scsi/scsi.h              |  1 +
 include/scsi/scsi_cmnd.h         |  1 +
 13 files changed, 246 insertions(+), 49 deletions(-)

-- 
1.8.1.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC 1/9] block: make timeout_list protectd by REQ_ATOM_COMPLETE bit
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` [RFC 2/9] scsi: ensure request is dequeue when finishing scmd Liu Ping Fan
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

The "request->timeout_list" is under the risk of modified by both timeout
(abort handler) and a "finishing" handler. Using bit REQ_ATOM_COMPLETE in
atomic_flags as a guard to shield this issue.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 block/blk-timeout.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 7a882f7..d3067f0 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -120,13 +120,14 @@ void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
 			  unsigned int *next_set)
 {
 	if (time_after_eq(jiffies, rq->deadline)) {
-		list_del_init(&rq->timeout_list);
 
 		/*
 		 * Check if we raced with end io completion
 		 */
-		if (!blk_mark_rq_complete(rq))
+		if (!blk_mark_rq_complete(rq)) {
+			list_del_init(&rq->timeout_list);
 			blk_rq_timed_out(rq);
+		}
 	} else if (!*next_set || time_after(*next_timeout, rq->deadline)) {
 		*next_timeout = rq->deadline;
 		*next_set = 1;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 2/9] scsi: ensure request is dequeue when finishing scmd
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
  2014-05-30  8:15 ` [RFC 1/9] block: make timeout_list protectd by REQ_ATOM_COMPLETE bit Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` [RFC 3/9] scsi: introduce new internal flag SUCCESS_REMOVE Liu Ping Fan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

When a timer requeue a request, it clears out REQ_ATOM_COMPLETE bit.
Hence if there is an in-flight scmd asks scsi-eh to handle an error by
joining the eh_entry list, then when scsi_error_handler() handles a scmd,
the scmd->request may be still on request_queue.  This will trigger
the BUG_ON(!list_empty(&req->queuelist)) in __blk_put_request().

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
note: I hit this bug in my test, and the above comment is my guess. Hope for
more comments about it.

---
 block/blk.h               |  1 -
 drivers/scsi/scsi_error.c |  1 +
 drivers/scsi/scsi_lib.c   | 20 +++++++++++++++++++-
 drivers/scsi/scsi_priv.h  |  1 +
 include/linux/blkdev.h    |  1 +
 5 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index c90e1d8..a9cad62 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -29,7 +29,6 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq,
 		      struct bio *bio);
 void blk_queue_bypass_start(struct request_queue *q);
 void blk_queue_bypass_end(struct request_queue *q);
-void blk_dequeue_request(struct request *rq);
 void __blk_queue_free_tags(struct request_queue *q);
 bool __blk_end_bidi_request(struct request *rq, int error,
 			    unsigned int nr_bytes, unsigned int bidi_bytes);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c5f49cf..3b8b95b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2068,6 +2068,7 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush finish"
 							" cmd: %p\n",
 							current->comm, scmd));
+			scsi_queue_dequeue(scmd);
 			scsi_finish_command(scmd);
 		}
 	}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7fb2afe..e117579 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -210,6 +210,21 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 {
 	__scsi_queue_insert(cmd, reason, 1);
 }
+
+void scsi_queue_dequeue(struct scsi_cmnd *cmd)
+{
+	struct scsi_device *device = cmd->device;
+	struct request_queue *q = device->request_queue;
+	struct request *req = cmd->request;
+	unsigned long flags;
+
+	if (list_empty(&cmd->request->queuelist))
+		return;
+	spin_lock_irqsave(q->queue_lock, flags);
+	blk_dequeue_request(req);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
 /**
  * scsi_execute - insert request and wait for the result
  * @sdev:	scsi device
@@ -859,6 +874,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 			 */
 			req->next_rq->resid_len = scsi_in(cmd)->resid;
 
+			scsi_queue_dequeue(cmd);
 			scsi_release_buffers(cmd);
 			blk_end_request_all(req, 0);
 
@@ -1038,8 +1054,10 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 		}
 		if (blk_end_request_err(req, error))
 			scsi_requeue_command(q, cmd);
-		else
+		else {
+			scsi_queue_dequeue(cmd);
 			scsi_next_command(cmd);
+		}
 		break;
 	case ACTION_REPREP:
 		/* Unprep the request and put it back at the head of the queue.
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f079a59..601b964 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -84,6 +84,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd);
 extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
 extern void scsi_device_unbusy(struct scsi_device *sdev);
 extern void scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
+extern void scsi_queue_dequeue(struct scsi_cmnd *cmd);
 extern void scsi_next_command(struct scsi_cmnd *cmd);
 extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
 extern void scsi_run_host_queues(struct Scsi_Host *shost);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8168524..ee7ddcd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -916,6 +916,7 @@ static inline unsigned int blk_rq_count_bios(struct request *rq)
  */
 extern struct request *blk_peek_request(struct request_queue *q);
 extern void blk_start_request(struct request *rq);
+extern void blk_dequeue_request(struct request *rq);
 extern struct request *blk_fetch_request(struct request_queue *q);
 
 /*
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 3/9] scsi: introduce new internal flag SUCCESS_REMOVE
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
  2014-05-30  8:15 ` [RFC 1/9] block: make timeout_list protectd by REQ_ATOM_COMPLETE bit Liu Ping Fan
  2014-05-30  8:15 ` [RFC 2/9] scsi: ensure request is dequeue when finishing scmd Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` [RFC 4/9] blk: change the prototype of blk_complete_request() Liu Ping Fan
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

Currently when aborting a scsi_cmnd, we can not tell whether this cmd is
canceled or has not been found by scsi_try_to_abort_cmd(). So introducing
this new flag. (In the succeeding patch, if caller success to cancel
the scmd, then it should drop the ref on scmd, otherwise no ops)

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 drivers/scsi/scsi_error.c | 8 +++++++-
 include/scsi/scsi.h       | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 3b8b95b..8ddd8f5 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -869,10 +869,16 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd)
 
 static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd)
 {
+	int ret;
+
 	if (!hostt->eh_abort_handler)
 		return FAILED;
 
-	return hostt->eh_abort_handler(scmd);
+	ret = hostt->eh_abort_handler(scmd);
+	/* After introducing ref on scsi_cmnd, here will handle the ref */
+	if (ret == SUCCESS_REMOVE)
+		ret = SUCCESS;
+	return ret;
 }
 
 static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index d477bfb..4183b01 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -488,6 +488,7 @@ static inline int scsi_is_wlun(unsigned int lun)
 #define TIMEOUT_ERROR   0x2007
 #define SCSI_RETURN_NOT_HANDLED   0x2008
 #define FAST_IO_FAIL	0x2009
+#define SUCCESS_REMOVE 0x200a
 
 /*
  * Midlevel queue return values.
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 4/9] blk: change the prototype of blk_complete_request()
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
                   ` (2 preceding siblings ...)
  2014-05-30  8:15 ` [RFC 3/9] scsi: introduce new internal flag SUCCESS_REMOVE Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` Liu Ping Fan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

This patch is for the incoming refcnt on scsi_cmnd. It can help the
scsi_done handler to run against the abort handler in parallel.

In the following patch, if a thread takes the REQ_ATOM_COMPLETE,
then it can drop the ref when end of this request. But if it fails
to take REQ_ATOM_COMPLETE, it should drop the ref immediately.
Here, using the return of blk_complete_request() to notice its caller.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 block/blk-softirq.c    | 13 ++++++++++---
 include/linux/blkdev.h |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 467c8de..6a841ae 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -156,6 +156,9 @@ do_local:
  * blk_complete_request - end I/O on a request
  * @req:      the request being processed
  *
+ * return -1 if fail to call __blk_complete_request which implies another
+ * routine is handling this req
+ *
  * Description:
  *     Ends all I/O on a request. It does not handle partial completions,
  *     unless the driver actually implements this in its completion callback
@@ -163,12 +166,16 @@ do_local:
  *     through a softirq handler. The user must have registered a completion
  *     callback through blk_queue_softirq_done().
  **/
-void blk_complete_request(struct request *req)
+int blk_complete_request(struct request *req)
 {
+	int ret = -1;
 	if (unlikely(blk_should_fake_timeout(req->q)))
-		return;
-	if (!blk_mark_rq_complete(req))
+		return ret;
+	if (!blk_mark_rq_complete(req)) {
 		__blk_complete_request(req);
+		ret = 0;
+	}
+	return ret;
 }
 EXPORT_SYMBOL(blk_complete_request);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ee7ddcd..23deadb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -945,7 +945,7 @@ extern void __blk_end_request_all(struct request *rq, int error);
 extern bool __blk_end_request_cur(struct request *rq, int error);
 extern bool __blk_end_request_err(struct request *rq, int error);
 
-extern void blk_complete_request(struct request *);
+extern int blk_complete_request(struct request *);
 extern void __blk_complete_request(struct request *);
 extern void blk_abort_request(struct request *);
 extern void blk_unprep_request(struct request *);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 4/9] blk: change the prototype of blk_complete_request()
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
                   ` (3 preceding siblings ...)
  2014-05-30  8:15 ` [RFC 4/9] blk: change the prototype of blk_complete_request() Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` [RFC 6/9] blk: split the reclaim of req from blk_finish_request() Liu Ping Fan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

This patch is for the incoming refcnt on scsi_cmnd. It can help the
scsi_done handler to run against the abort handler in parallel.

In the following patch, if a thread takes the REQ_ATOM_COMPLETE,
then it can drop the ref when end of this request. But if it fails
to take REQ_ATOM_COMPLETE, it should drop the ref immediately.
Here, using the return of blk_complete_request() to notice its caller.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 block/blk-softirq.c    | 13 ++++++++++---
 include/linux/blkdev.h |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 467c8de..6a841ae 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -156,6 +156,9 @@ do_local:
  * blk_complete_request - end I/O on a request
  * @req:      the request being processed
  *
+ * return -1 if fail to call __blk_complete_request which implies another
+ * routine is handling this req
+ *
  * Description:
  *     Ends all I/O on a request. It does not handle partial completions,
  *     unless the driver actually implements this in its completion callback
@@ -163,12 +166,16 @@ do_local:
  *     through a softirq handler. The user must have registered a completion
  *     callback through blk_queue_softirq_done().
  **/
-void blk_complete_request(struct request *req)
+int blk_complete_request(struct request *req)
 {
+	int ret = -1;
 	if (unlikely(blk_should_fake_timeout(req->q)))
-		return;
-	if (!blk_mark_rq_complete(req))
+		return ret;
+	if (!blk_mark_rq_complete(req)) {
 		__blk_complete_request(req);
+		ret = 0;
+	}
+	return ret;
 }
 EXPORT_SYMBOL(blk_complete_request);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ee7ddcd..23deadb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -945,7 +945,7 @@ extern void __blk_end_request_all(struct request *rq, int error);
 extern bool __blk_end_request_cur(struct request *rq, int error);
 extern bool __blk_end_request_err(struct request *rq, int error);
 
-extern void blk_complete_request(struct request *);
+extern int blk_complete_request(struct request *);
 extern void __blk_complete_request(struct request *);
 extern void blk_abort_request(struct request *);
 extern void blk_unprep_request(struct request *);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 6/9] blk: split the reclaim of req from blk_finish_request()
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
                   ` (4 preceding siblings ...)
  2014-05-30  8:15 ` Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` [RFC 7/9] scsi: adopt ref on scsi_cmnd to avoid a race on request Liu Ping Fan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

Later, the low layer (scsi) can decide when to turn back the mem of
req by blk_reclaim_request()

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 block/blk-core.c       | 52 +++++++++++++++++++++++++++++++++-----------------
 include/linux/blkdev.h |  6 ++++--
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index beaca2e..7e8a8ae 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2495,7 +2495,7 @@ EXPORT_SYMBOL_GPL(blk_unprep_request);
 /*
  * queue lock must be held
  */
-static void blk_finish_request(struct request *req, int error, int *drop_ref)
+static void __blk_finish_request(struct request *req, int error, int *drop_ref)
 {
 	bool ref;
 
@@ -2517,7 +2517,10 @@ static void blk_finish_request(struct request *req, int error, int *drop_ref)
 		blk_unprep_request(req);
 
 	blk_account_io_done(req);
+}
 
+void blk_reclaim_request(struct request *req, int error)
+{
 	if (req->end_io)
 		req->end_io(req, error);
 	else {
@@ -2528,6 +2531,12 @@ static void blk_finish_request(struct request *req, int error, int *drop_ref)
 	}
 }
 
+static void blk_finish_request(struct request *req, int error, int *drop_ref)
+{
+	__blk_finish_request(req, error, drop_ref);
+	blk_reclaim_request(req, error);
+}
+
 /**
  * blk_end_bidi_request - Complete a bidi request
  * @rq:         the request to complete
@@ -2555,7 +2564,11 @@ static bool blk_end_bidi_request(struct request *rq, int error,
 		return true;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	blk_finish_request(rq, error, drop_ref);
+	/* low level wants to manage ref, so not to reclaim rq */
+	if (drop_ref)
+		__blk_finish_request(rq, error, drop_ref);
+	else
+		blk_finish_request(rq, error, drop_ref);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	return false;
@@ -2608,25 +2621,15 @@ bool blk_end_request(struct request *rq, int error, unsigned int nr_bytes)
 EXPORT_SYMBOL(blk_end_request);
 
 /**
- * blk_end_request_ref - Helper function for drivers to complete the request.
- * @rq:       the request being processed
- * @error:    %0 for success, < %0 for error
- * @nr_bytes: number of bytes to complete
- *
- * Description:
- *     Ends I/O on a number of bytes attached to @rq.
- *     If @rq has leftover, sets it up for the next range of segments.
- *
- * Return:
- *     %false - we are done with this request
- *     %true  - still buffers pending for this request
+ * blk_end_request_noreclaim - same as blk_end_request, but not reclaim rq
  **/
-bool blk_end_request_ref(struct request *rq, int error, unsigned int nr_bytes,
-	int *drop_ref)
+bool blk_end_request_noreclaim(struct request *rq, int error,
+	unsigned int nr_bytes, int *drop_ref)
 {
+	BUG_ON(!drop_ref);
 	return blk_end_bidi_request(rq, error, nr_bytes, 0, drop_ref);
 }
-EXPORT_SYMBOL(blk_end_request_ref);
+EXPORT_SYMBOL(blk_end_request_noreclaim);
 
 /**
  * blk_end_request_all - Helper function for drives to finish the request.
@@ -2650,6 +2653,21 @@ void blk_end_request_all(struct request *rq, int error)
 }
 EXPORT_SYMBOL(blk_end_request_all);
 
+void blk_end_request_all_noreclaim(struct request *rq, int error)
+{
+	bool pending;
+	unsigned int bidi_bytes = 0;
+	int ref;
+
+	if (unlikely(blk_bidi_rq(rq)))
+		bidi_bytes = blk_rq_bytes(rq->next_rq);
+
+	pending = blk_end_bidi_request(rq, error, blk_rq_bytes(rq), bidi_bytes,
+		&ref);
+	BUG_ON(pending);
+}
+EXPORT_SYMBOL(blk_end_request_all_noreclaim);
+
 /**
  * blk_end_request_cur - Helper function to finish the current request chunk.
  * @rq: the request to finish the current chunk for
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7a2e79f..70c65d2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -772,6 +772,7 @@ extern void generic_make_request(struct bio *bio);
 extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
+extern void blk_reclaim_request(struct request *req, int error);
 extern struct request *blk_get_request(struct request_queue *, int, gfp_t);
 extern struct request *blk_make_request(struct request_queue *, struct bio *,
 					gfp_t);
@@ -934,11 +935,12 @@ extern struct request *blk_fetch_request(struct request_queue *q);
  */
 extern bool blk_update_request(struct request *rq, int error,
 			       unsigned int nr_bytes);
-extern bool blk_end_request_ref(struct request *rq, int error,
-	unsigned int nr_bytes, int *drop_ref);
 extern bool blk_end_request(struct request *rq, int error,
 			    unsigned int nr_bytes);
+extern bool blk_end_request_noreclaim(struct request *rq, int error,
+	unsigned int nr_bytes, int *drop_ref);
 extern void blk_end_request_all(struct request *rq, int error);
+extern void blk_end_request_all_noreclaim(struct request *rq, int error);
 extern bool blk_end_request_cur(struct request *rq, int error);
 extern bool blk_end_request_err(struct request *rq, int error);
 extern bool __blk_end_request(struct request *rq, int error,
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 7/9] scsi: adopt ref on scsi_cmnd to avoid a race on request
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
                   ` (5 preceding siblings ...)
  2014-05-30  8:15 ` [RFC 6/9] blk: split the reclaim of req from blk_finish_request() Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` [RFC 8/9] scsi: virtscsi: work around to abort a scmd Liu Ping Fan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

When running io stress test on large latency disk, e.g guest with a image on nfs.
It can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
Since there is a race between latency "finishing" scmd and the re-allocated scmd.
I.e a request is still referred by a scmd, but we had turn it back to
slab. This patch introduces the ref on scmd to exclude this issue.

inc ref rules is like the following:
  When setup a scmd, inited as 1. When add a timer inc 1.

dec ref rules is like the following:
  -for the normal ref
     scsi_done() will drop the ref when fail to acquire REQ_ATOM_COMPLETE immediately
     or drop the ref by scsi_end_request()
     or drop by return SUCCESS_REMOVE
  -for a timer ref
     when deleting timer, if !list_empty(timeout_list), then there is a timer ref, and
     drop it.

Oops call trace:

[ 3379.866773] ------------[ cut here ]------------
[ 3379.866997] kernel BUG at block/blk-core.c:2295!
[ 3379.867238] Oops: Exception in kernel mode, sig: 5 [#1]
[ 3379.867400] SMP NR_CPUS=2048 NUMA pSeries
[ 3379.867574] Modules linked in: nfsv3 sg rpcsec_gss_krb5 arc4 md4 nfsv4 nls_utf8
[ 3379.879310] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-105.el7.ppc64 #1
[ 3379.879586] task: c0000003f8bf6900 ti: c0000003fffd4000 task.ti: c0000003f8cb4000
[ 3379.879858] NIP: c000000000413324 LR: c000000000413380 CTR: c0000000005b2cf0
[ 3379.881647] REGS: c0000003fffd76b0 TRAP: 0700   Not tainted  (3.10.0-105.el7.ppc64)
[ 3379.881932] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 42022042  XER: 00000000
[ 3379.882533] SOFTE: 0
[ 3379.882626] CFAR: c000000000413388
[ 3379.882765]
GPR00: c0000000004132e4 c0000003fffd7930 c0000000012543b8 00000312efc08912
GPR04: c0000003f650f000 c0000003f650f000 c000000000c55e48 000000000009f49d
GPR08: c0000000012e43b8 0000000000000001 0000000000080000 000000002f7c7611
GPR12: 0000000022022044 c00000000fe01000 c000000000b270e8 0000000000010000
GPR16: c000000000b27128 c000000000b27110 c000000000b271c8 c000000000c54480
GPR20: c000000000ac9e00 0000000000000000 c000000000ac9c58 c000000000b271f0
GPR24: c000000003a81948 c000000003a81848 0000000000000000 0000000000000000
GPR28: c0000003f650f000 c0000003efab0000 c0000003efab0004 c0000003f650f000
[ 3379.886311] NIP [c000000000413324] .blk_start_request+0x84/0x100
[ 3379.886539] LR [c000000000413380] .blk_start_request+0xe0/0x100
[ 3379.886760] PACATMSCRATCH [8000000000009032]
[ 3379.886951] Call Trace:
[ 3379.887037] [c0000003fffd7930] [c0000000004132e4] .blk_start_request+0x44/0x100 (unreliable)
[ 3379.887381] [c0000003fffd79b0] [c0000000005b2ef8] .scsi_request_fn+0x208/0x7e0
[ 3379.887756] [c0000003fffd7ad0] [c0000000004141c8] .blk_run_queue+0x68/0xa0
[ 3379.889260] [c0000003fffd7b50] [c0000000005b2098] .scsi_run_queue+0x158/0x300
[ 3379.889645] [c0000003fffd7c10] [c0000000005b5bec] .scsi_io_completion+0x22c/0xe80
[ 3379.890083] [c0000003fffd7ce0] [c0000000005a58a8] .scsi_finish_command+0x108/0x1a0
[ 3379.890483] [c0000003fffd7d70] [c0000000005b5858] .scsi_softirq_done+0x1c8/0x240
[ 3379.890956] [c0000003fffd7e00] [c000000000422b24] .blk_done_softirq+0xa4/0xd0
[ 3379.891364] [c0000003fffd7e90] [c0000000000bb8a8] .__do_softirq+0x148/0x380
[ 3379.891632] [c0000003fffd7f90] [c00000000002462c] .call_do_softirq+0x14/0x24
[ 3379.892008] [c0000003fffd3df0] [c000000000010f70] .do_softirq+0x120/0x170
[ 3379.892429] [c0000003fffd3e80] [c0000000000bbe34] .irq_exit+0x1e4/0x1f0
[ 3379.893180] [c0000003fffd3f10] [c000000000010b60] .__do_irq+0xc0/0x1d0
[ 3379.893578] [c0000003fffd3f90] [c000000000024650] .call_do_irq+0x14/0x24
[ 3379.894018] [c0000003f8cb7830] [c000000000010cfc] .do_IRQ+0x8c/0x100
[ 3379.894539] [c0000003f8cb78d0] [c000000000002494] hardware_interrupt_common+0x114/0x180
[ 3379.895278] --- Exception: 501 at .plpar_hcall_norets+0x84/0xd4
[ 3379.895278]     LR = .shared_cede_loop+0x40/0x90
[ 3379.895730] [c0000003f8cb7bc0] [0000000000000000]           (null) (unreliable)
[ 3379.896356] [c0000003f8cb7c40] [c0000000006c5054] .cpuidle_idle_call+0x114/0x3c0
[ 3379.897882] [c0000003f8cb7d10] [c00000000007d6f0] .pseries_lpar_idle+0x10/0x50
[ 3379.898200] [c0000003f8cb7d80] [c0000000000186a4] .arch_cpu_idle+0x64/0x150
[ 3379.898475] [c0000003f8cb7e00] [c000000000135b84] .cpu_startup_entry+0x1a4/0x2e0
[ 3379.898860] [c0000003f8cb7ed0] [c0000000008b9a4c] .start_secondary+0x3a8/0x3b0
[ 3379.899173] [c0000003f8cb7f90] [c00000000000976c] .start_secondary_prolog+0x10/0x14
[ 3379.899496] Instruction dump:
[ 3379.899631] 792a77e3 41820010 815f0050 2f8a0001 419e004c e93f0178 815f0064 2fa90000
[ 3379.900085] 915f013c 40de0074 e93f0058 792907e0 <0b090000> 7fe3fb78 48010495 60000000
[ 3379.900546] ---[ end trace d57cacc25e1c8c73 ]---
[ 3379.905937]
[ 3379.906041] Sending IPI to other CPUs
[ 3389.939144] ERROR: 1 cpu(s) not responding

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 drivers/scsi/scsi.c       | 33 +++++++++++++++++++++-------
 drivers/scsi/scsi_error.c |  5 +++--
 drivers/scsi/scsi_lib.c   | 56 +++++++++++++++++++++++++++++++++--------------
 drivers/scsi/scsi_priv.h  |  3 +++
 include/scsi/scsi_cmnd.h  |  1 +
 5 files changed, 72 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d8afec8..2095edc 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -271,6 +271,8 @@ struct scsi_cmnd *__scsi_get_command(struct Scsi_Host *shost, gfp_t gfp_mask)
 		}
 	}
 
+	if (likely(cmd))
+		atomic_set(&cmd->ref, 1);
 	return cmd;
 }
 EXPORT_SYMBOL_GPL(__scsi_get_command);
@@ -320,6 +322,7 @@ void __scsi_put_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd,
 {
 	unsigned long flags;
 
+	BUG_ON(atomic_read(&cmd->ref) != 0);
 	/* changing locks here, don't need to restore the irq state */
 	spin_lock_irqsave(&shost->free_list_lock, flags);
 	if (unlikely(list_empty(&shost->free_list))) {
@@ -348,15 +351,25 @@ void scsi_put_command(struct scsi_cmnd *cmd)
 	struct scsi_device *sdev = cmd->device;
 	unsigned long flags;
 
-	/* serious error if the command hasn't come from a device list */
-	spin_lock_irqsave(&cmd->device->list_lock, flags);
-	BUG_ON(list_empty(&cmd->list));
-	list_del_init(&cmd->list);
-	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
+	BUG_ON(atomic_read(&cmd->ref) <= 0);
+	if (atomic_dec_and_test(&cmd->ref)) {
+		smp_mb();
+		if (cmd->request) {
+			BUG_ON(!list_empty(&cmd->request->queuelist));
+			/* scsi layer has no access to req, so let it gone */
+			blk_reclaim_request(cmd->request, cmd->request->errors);
+		}
+		__scsi_release_buffers(cmd, 0);
+		/* serious error if the command hasn't come from a device list */
+		spin_lock_irqsave(&cmd->device->list_lock, flags);
+		BUG_ON(list_empty(&cmd->list));
+		list_del_init(&cmd->list);
+		spin_unlock_irqrestore(&cmd->device->list_lock, flags);
 
-	cancel_delayed_work(&cmd->abort_work);
+		cancel_delayed_work(&cmd->abort_work);
 
-	__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
+		__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
+	}
 }
 EXPORT_SYMBOL(scsi_put_command);
 
@@ -757,8 +770,12 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
  */
 static void scsi_done(struct scsi_cmnd *cmd)
 {
+	int ret;
+
 	trace_scsi_dispatch_cmd_done(cmd);
-	blk_complete_request(cmd->request);
+	ret = blk_complete_request(cmd->request);
+	if (ret < 0)
+		scsi_put_command(cmd);
 }
 
 /**
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 8ddd8f5..52e1adb 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -875,9 +875,10 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_c
 		return FAILED;
 
 	ret = hostt->eh_abort_handler(scmd);
-	/* After introducing ref on scsi_cmnd, here will handle the ref */
-	if (ret == SUCCESS_REMOVE)
+	if (ret == SUCCESS_REMOVE) {
+		scsi_put_command(scmd);
 		ret = SUCCESS;
+	}
 	return ret;
 }
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e117579..d5eb2df 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -114,6 +114,7 @@ static void scsi_unprep_request(struct request *req)
 	struct scsi_cmnd *cmd = req->special;
 
 	blk_unprep_request(req);
+	cmd->request = NULL;
 	req->special = NULL;
 
 	scsi_put_command(cmd);
@@ -138,6 +139,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 	struct scsi_target *starget = scsi_target(device);
 	struct request_queue *q = device->request_queue;
 	unsigned long flags;
+	bool drop;
 
 	SCSI_LOG_MLQUEUE(1,
 		 printk("Inserting command %p into mlqueue\n", cmd));
@@ -182,7 +184,10 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 	 * before blk_cleanup_queue() finishes.
 	 */
 	spin_lock_irqsave(q->queue_lock, flags);
-	blk_requeue_request(q, cmd->request);
+	drop = blk_requeue_request(q, cmd->request);
+	if (drop)
+		/* drop the ref by a timer */
+		scsi_put_command(cmd);
 	kblockd_schedule_work(q, &device->requeue_work);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
@@ -587,8 +592,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
 		scsi_run_queue(sdev->request_queue);
 }
 
-static void __scsi_release_buffers(struct scsi_cmnd *, int);
-
 /*
  * Function:    scsi_end_request()
  *
@@ -616,15 +619,16 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
 {
 	struct request_queue *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
+	int drop = 0;
 
 	/*
 	 * If there are blocks left over at the end, set up the command
 	 * to queue the remainder of them.
 	 */
-	if (blk_end_request(req, error, bytes)) {
+	if (blk_end_request_noreclaim(req, error, bytes, &drop)) {
 		/* kill remainder if no retrys */
 		if (error && scsi_noretry_cmd(cmd))
-			blk_end_request_all(req, error);
+			blk_end_request_all_noreclaim(req, error);
 		else {
 			if (requeue) {
 				/*
@@ -639,12 +643,15 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
 			return cmd;
 		}
 	}
+	/* drop the ref which is hold by timer */
+	if (drop)
+		scsi_put_command(cmd);
 
+	req->errors = error;
 	/*
 	 * This will goose the queue request function at the end, so we don't
 	 * need to worry about launching another command.
 	 */
-	__scsi_release_buffers(cmd, 0);
 	scsi_next_command(cmd);
 	return NULL;
 }
@@ -700,7 +707,7 @@ static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
 	__sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, scsi_sg_free);
 }
 
-static void __scsi_release_buffers(struct scsi_cmnd *cmd, int do_bidi_check)
+void __scsi_release_buffers(struct scsi_cmnd *cmd, int do_bidi_check)
 {
 
 	if (cmd->sdb.table.nents)
@@ -876,7 +883,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 
 			scsi_queue_dequeue(cmd);
 			scsi_release_buffers(cmd);
-			blk_end_request_all(req, 0);
+			blk_end_request_all_noreclaim(req, 0);
 
 			scsi_next_command(cmd);
 			return;
@@ -1179,6 +1186,7 @@ static struct scsi_cmnd *scsi_get_cmd_from_req(struct scsi_device *sdev,
 		req->special = cmd;
 	} else {
 		cmd = req->special;
+		BUG_ON(atomic_read(&cmd->ref) <= 0);
 	}
 
 	/* pull a tag out of the request if we have one */
@@ -1321,6 +1329,7 @@ scsi_prep_return(struct request_queue *q, struct request *req, int ret)
 		/* release the command and kill it */
 		if (req->special) {
 			struct scsi_cmnd *cmd = req->special;
+			cmd->request = NULL;
 			scsi_release_buffers(cmd);
 			scsi_put_command(cmd);
 			req->special = NULL;
@@ -1600,6 +1609,7 @@ static void scsi_request_fn(struct request_queue *q)
 	struct Scsi_Host *shost;
 	struct scsi_cmnd *cmd;
 	struct request *req;
+	bool tref;
 
 	if(!get_device(&sdev->sdev_gendev))
 		/* We must be tearing the block queue down already */
@@ -1612,6 +1622,8 @@ static void scsi_request_fn(struct request_queue *q)
 	shost = sdev->host;
 	for (;;) {
 		int rtn;
+
+		tref = false;
 		/*
 		 * get next queueable request.  We do this early to make sure
 		 * that the request is fully prepared even if we cannot 
@@ -1629,14 +1641,6 @@ static void scsi_request_fn(struct request_queue *q)
 		}
 
 
-		/*
-		 * Remove the request from the request list.
-		 */
-		if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
-			blk_start_request(req);
-		sdev->device_busy++;
-
-		spin_unlock(q->queue_lock);
 		cmd = req->special;
 		if (unlikely(cmd == NULL)) {
 			printk(KERN_CRIT "impossible request in %s.\n"
@@ -1649,6 +1653,23 @@ static void scsi_request_fn(struct request_queue *q)
 		spin_lock(shost->host_lock);
 
 		/*
+		 * Remove the request from the request list.
+		 */
+		if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req))) {
+			blk_start_request(req);
+			if (likely(cmd)) {
+				tref = true;
+				/* a timer references the cmd */
+				atomic_inc(&cmd->ref);
+				/* usual 2, or 3 for done cmd in flight */
+				BUG_ON(atomic_read(&cmd->ref) > 3);
+			}
+		}
+		sdev->device_busy++;
+
+		spin_unlock(q->queue_lock);
+
+		/*
 		 * We hit this when the driver is using a host wide
 		 * tag map. For device level tag maps the queue_depth check
 		 * in the device ready fn would prevent us from trying
@@ -1708,6 +1729,9 @@ static void scsi_request_fn(struct request_queue *q)
 	 */
 	spin_lock_irq(q->queue_lock);
 	blk_requeue_request(q, req);
+	if (tref)
+		/* drop the ref holded by a timer*/
+		scsi_put_command(cmd);
 	sdev->device_busy--;
 out_delay:
 	if (sdev->device_busy == 0)
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 601b964..a8f630c 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -179,4 +179,7 @@ extern int scsi_internal_device_block(struct scsi_device *sdev);
 extern int scsi_internal_device_unblock(struct scsi_device *sdev,
 					enum scsi_device_state new_state);
 
+extern void __scsi_release_buffers(struct scsi_cmnd *cmd, int do_bidi_check);
+
+
 #endif /* _SCSI_PRIV_H */
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 029b076..5643026 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -58,6 +58,7 @@ struct scsi_cmnd {
 	struct delayed_work abort_work;
 	int eh_eflags;		/* Used by error handlr */
 
+	atomic_t ref;
 	/*
 	 * A SCSI Command is assigned a nonzero serial_number before passed
 	 * to the driver's queue command function.  The serial_number is
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 8/9] scsi: virtscsi: work around to abort a scmd
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
                   ` (6 preceding siblings ...)
  2014-05-30  8:15 ` [RFC 7/9] scsi: adopt ref on scsi_cmnd to avoid a race on request Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:15 ` [RFC 9/9] scsi: ibmvscsi: return SUCCESS_REMOVE when finding a abort cmd Liu Ping Fan
  2014-05-30  8:26 ` [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Paolo Bonzini
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

This patch is just an "eg and test-issue" for this series. The main
changes is that when abort a scmd, distinguishing the case of removing
from the case of not found. ( I think it should be better to remove the
scmd directly from vq, but for the time being, I have no idea about
this detail in virtio)

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 drivers/scsi/virtio_scsi.c | 61 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index b26f1a5..d08aae5 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -31,8 +31,14 @@
 #define VIRTIO_SCSI_EVENT_LEN 8
 #define VIRTIO_SCSI_VQ_BASE 2
 
+/* should be per virtscsi dev */
+static struct list_head active_cmds = LIST_HEAD_INIT(active_cmds);
+static spinlock_t cmds_lock;
+
 /* Command queue element */
 struct virtio_scsi_cmd {
+	struct list_head list;
+	unsigned long abort;
 	struct scsi_cmnd *sc;
 	struct completion *comp;
 	union {
@@ -148,6 +154,18 @@ static void virtscsi_compute_resid(struct scsi_cmnd *sc, u32 resid)
 static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 {
 	struct virtio_scsi_cmd *cmd = buf;
+	unsigned long flags;
+	int skip = 0;
+	spin_lock_irqsave(&cmds_lock, flags);
+	list_del_init(&cmd->list);
+	if (cmd->abort)
+		skip = 1;
+	spin_unlock_irqrestore(&cmds_lock, flags);
+	if (skip) {
+		mempool_free(cmd, virtscsi_cmd_pool);
+		return;
+	}
+
 	struct scsi_cmnd *sc = cmd->sc;
 	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
 	struct virtio_scsi_target_state *tgt =
@@ -273,11 +291,16 @@ static void virtscsi_req_done(struct virtqueue *vq)
 static void virtscsi_complete_free(struct virtio_scsi *vscsi, void *buf)
 {
 	struct virtio_scsi_cmd *cmd = buf;
+	unsigned long flags;
 
 	if (cmd->comp)
 		complete_all(cmd->comp);
-	else
+	else {
+		spin_lock_irqsave(&cmds_lock, flags);
+		list_del_init(&cmd->list);
+		spin_unlock_irqrestore(&cmds_lock, flags);
 		mempool_free(cmd, virtscsi_cmd_pool);
+	}
 }
 
 static void virtscsi_ctrl_done(struct virtqueue *vq)
@@ -477,6 +500,10 @@ static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
 	int err;
 	bool needs_kick = false;
 
+	spin_lock_irqsave(&cmds_lock, flags);
+	list_add_tail(&cmd->list, &active_cmds);
+	spin_unlock_irqrestore(&cmds_lock, flags);
+
 	spin_lock_irqsave(&vq->vq_lock, flags);
 	err = virtscsi_add_cmd(vq->vq, cmd, req_size, resp_size, gfp);
 	if (!err)
@@ -495,6 +522,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 {
 	struct virtio_scsi_cmd *cmd;
 	int ret;
+	unsigned long flags;
 
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
 	BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
@@ -511,6 +539,7 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 		goto out;
 
 	memset(cmd, 0, sizeof(*cmd));
+	INIT_LIST_HEAD(&cmd->list);
 	cmd->sc = sc;
 	cmd->req.cmd = (struct virtio_scsi_cmd_req){
 		.lun[0] = 1,
@@ -530,8 +559,12 @@ static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
 			      GFP_ATOMIC) == 0)
 		ret = 0;
-	else
+	else {
+		spin_lock_irqsave(&cmds_lock, flags);
+		list_del_init(&cmd->list);
+		spin_unlock_irqrestore(&cmds_lock, flags);
 		mempool_free(cmd, virtscsi_cmd_pool);
+	}
 
 out:
 	return ret;
@@ -590,6 +623,7 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
 {
 	DECLARE_COMPLETION_ONSTACK(comp);
 	int ret = FAILED;
+	unsigned long flags;
 
 	cmd->comp = &comp;
 	if (virtscsi_kick_cmd(&vscsi->ctrl_vq, cmd,
@@ -603,6 +637,9 @@ static int virtscsi_tmf(struct virtio_scsi *vscsi, struct virtio_scsi_cmd *cmd)
 		ret = SUCCESS;
 
 out:
+	spin_lock_irqsave(&cmds_lock, flags);
+	list_del_init(&cmd->list);
+	spin_unlock_irqrestore(&cmds_lock, flags);
 	mempool_free(cmd, virtscsi_cmd_pool);
 	return ret;
 }
@@ -618,6 +655,7 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
 		return FAILED;
 
 	memset(cmd, 0, sizeof(*cmd));
+	INIT_LIST_HEAD(&cmd->list);
 	cmd->sc = sc;
 	cmd->req.tmf = (struct virtio_scsi_ctrl_tmf_req){
 		.type = VIRTIO_SCSI_T_TMF,
@@ -634,13 +672,31 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sc->device->host);
 	struct virtio_scsi_cmd *cmd;
+	struct virtio_scsi_cmd *entry, *tmp;
+	unsigned long flags, found = 0;
 
 	scmd_printk(KERN_INFO, sc, "abort\n");
+	spin_lock_irqsave(&cmds_lock, flags);
+	list_for_each_entry_safe(entry, tmp, &active_cmds, list) {
+		if (entry->sc == sc) {
+			list_del_init(&entry->list);
+			/* mark the virtio_scsi_cmd as aborted
+			 */
+			entry->abort = 1;
+			found = 1;
+		}
+
+	}
+	spin_unlock_irqrestore(&cmds_lock, flags);
+	if (found)
+		return SUCCESS_REMOVE;
+
 	cmd = mempool_alloc(virtscsi_cmd_pool, GFP_NOIO);
 	if (!cmd)
 		return FAILED;
 
 	memset(cmd, 0, sizeof(*cmd));
+	INIT_LIST_HEAD(&cmd->list);
 	cmd->sc = sc;
 	cmd->req.tmf = (struct virtio_scsi_ctrl_tmf_req){
 		.type = VIRTIO_SCSI_T_TMF,
@@ -1030,6 +1086,7 @@ static int __init init(void)
 	if (ret < 0)
 		goto error;
 
+	spin_lock_init(&cmds_lock);
 	return 0;
 
 error:
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 9/9] scsi: ibmvscsi: return SUCCESS_REMOVE when finding a abort cmd
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
                   ` (7 preceding siblings ...)
  2014-05-30  8:15 ` [RFC 8/9] scsi: virtscsi: work around to abort a scmd Liu Ping Fan
@ 2014-05-30  8:15 ` Liu Ping Fan
  2014-05-30  8:26 ` [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Paolo Bonzini
  9 siblings, 0 replies; 12+ messages in thread
From: Liu Ping Fan @ 2014-05-30  8:15 UTC (permalink / raw)
  To: linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Paolo Bonzini,
	Stefan Hajnoczi, Jeff Moyer

When return SUCCESS_REMOVE, it can benifit the ref on scsi_cmnd

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index d0fa4b6..cf78be5 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1583,7 +1583,7 @@ static int ibmvscsi_eh_abort_handler(struct scsi_cmnd *cmd)
 	free_event_struct(&found_evt->hostdata->pool, found_evt);
 	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
 	atomic_inc(&hostdata->request_limit);
-	return SUCCESS;
+	return SUCCESS_REMOVE;
 }
 
 /**
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd
  2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
                   ` (8 preceding siblings ...)
  2014-05-30  8:15 ` [RFC 9/9] scsi: ibmvscsi: return SUCCESS_REMOVE when finding a abort cmd Liu Ping Fan
@ 2014-05-30  8:26 ` Paolo Bonzini
  2014-05-30  8:31   ` liu ping fan
  9 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2014-05-30  8:26 UTC (permalink / raw)
  To: Liu Ping Fan, linux-scsi
  Cc: Adaptec OEM Raid Solutions, Jens Axboe, Stefan Hajnoczi,
	Jeff Moyer

Il 30/05/2014 10:15, Liu Ping Fan ha scritto:
> When running io stress test on large latency scsi-disk, e.g guest with virtscsi
> on a nfs image. It can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));
> in blk_start_request().
> Since there is a race between latency "finishing" scmd and the re-allocated scmd.
> I.e a request is still referred by a scmd, but we had turn it back to
> slab.
>
> This series introduces the ref on scmd to exclude this issue, and the following is ref rules.
>
>   inc ref rules is like the following:
>     When setup a scmd, inited as 1. When add a timer inc 1.
>
>   dec ref rules is like the following:
>     -for the normal ref
>        scsi_done() will drop the ref when fail to acquire REQ_ATOM_COMPLETE immediately
>        or drop the ref by scsi_end_request()
>        or drop by return SUCCESS_REMOVE
>     -for a timer ref
>        when deleting timer, if !list_empty(timeout_list), then there is a timer ref, and
>        drop it.
>
>
> patch1-2: fix the current potential bug
> patch3~6: prepare for the mechanism for the ref
> patch7:   the ref rules core
> patch8-9:  e.g and test-issue for the new mechanism. Since lack of many virtscsi background,
>            patch8 may be poor and need to be improved :)
>
>
> Note: all the patches are based on rhel7, whose kernel version is linux-3.10.
>       I will rebase them onto the latest commit if my method is practical.

This series is not necessary, this is a bug in the virtscsi driver.  I 
have a ten line patch to fix it, but I haven't yet tested it properly.

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd
  2014-05-30  8:26 ` [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Paolo Bonzini
@ 2014-05-30  8:31   ` liu ping fan
  0 siblings, 0 replies; 12+ messages in thread
From: liu ping fan @ 2014-05-30  8:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Ping Fan, linux-scsi, Adaptec OEM Raid Solutions, Jens Axboe,
	Stefan Hajnoczi, Jeff Moyer

On Fri, May 30, 2014 at 4:26 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 30/05/2014 10:15, Liu Ping Fan ha scritto:
>
>> When running io stress test on large latency scsi-disk, e.g guest with
>> virtscsi
>> on a nfs image. It can trigger the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
>> &req->atomic_flags));
>> in blk_start_request().
>> Since there is a race between latency "finishing" scmd and the
>> re-allocated scmd.
>> I.e a request is still referred by a scmd, but we had turn it back to
>> slab.
>>
>> This series introduces the ref on scmd to exclude this issue, and the
>> following is ref rules.
>>
>>   inc ref rules is like the following:
>>     When setup a scmd, inited as 1. When add a timer inc 1.
>>
>>   dec ref rules is like the following:
>>     -for the normal ref
>>        scsi_done() will drop the ref when fail to acquire
>> REQ_ATOM_COMPLETE immediately
>>        or drop the ref by scsi_end_request()
>>        or drop by return SUCCESS_REMOVE
>>     -for a timer ref
>>        when deleting timer, if !list_empty(timeout_list), then there is a
>> timer ref, and
>>        drop it.
>>
>>
>> patch1-2: fix the current potential bug
>> patch3~6: prepare for the mechanism for the ref
>> patch7:   the ref rules core
>> patch8-9:  e.g and test-issue for the new mechanism. Since lack of many
>> virtscsi background,
>>            patch8 may be poor and need to be improved :)
>>
>>
>> Note: all the patches are based on rhel7, whose kernel version is
>> linux-3.10.
>>       I will rebase them onto the latest commit if my method is practical.
>
>
> This series is not necessary, this is a bug in the virtscsi driver.  I have
> a ten line patch to fix it, but I haven't yet tested it properly.
>
Could I pick up your patch, and test it?

Thx,
Fan

> Paolo
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-05-30  8:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-30  8:15 [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Liu Ping Fan
2014-05-30  8:15 ` [RFC 1/9] block: make timeout_list protectd by REQ_ATOM_COMPLETE bit Liu Ping Fan
2014-05-30  8:15 ` [RFC 2/9] scsi: ensure request is dequeue when finishing scmd Liu Ping Fan
2014-05-30  8:15 ` [RFC 3/9] scsi: introduce new internal flag SUCCESS_REMOVE Liu Ping Fan
2014-05-30  8:15 ` [RFC 4/9] blk: change the prototype of blk_complete_request() Liu Ping Fan
2014-05-30  8:15 ` Liu Ping Fan
2014-05-30  8:15 ` [RFC 6/9] blk: split the reclaim of req from blk_finish_request() Liu Ping Fan
2014-05-30  8:15 ` [RFC 7/9] scsi: adopt ref on scsi_cmnd to avoid a race on request Liu Ping Fan
2014-05-30  8:15 ` [RFC 8/9] scsi: virtscsi: work around to abort a scmd Liu Ping Fan
2014-05-30  8:15 ` [RFC 9/9] scsi: ibmvscsi: return SUCCESS_REMOVE when finding a abort cmd Liu Ping Fan
2014-05-30  8:26 ` [RFC 0/9] fix for the race issue between scsi timer and in-flight scmd Paolo Bonzini
2014-05-30  8:31   ` liu ping fan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).