linux-scsi.vger.kernel.org archive mirror
* [PATCH] Fix race between starved list processing and device removal
@ 2012-08-09 18:59 Bart Van Assche
       [not found] ` <001901cd8281$d49132d0$7db39870$@min@lge.com>
  2012-09-24 13:14 ` [PATCH, resend] " Bart Van Assche
  0 siblings, 2 replies; 6+ messages in thread
From: Bart Van Assche @ 2012-08-09 18:59 UTC
  To: linux-scsi, Chanho Min, Jens Axboe, Tejun Heo, James Bottomley

Prevent the sdev reference count from dropping to zero before
scsi_run_queue() runs the queue. Also prevent the sdev reference
count from dropping to zero while that function invokes
__blk_run_queue().

Reported-by: Chanho Min <chanho.min@lge.com>
Reference: http://lkml.org/lkml/2012/8/2/96
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_lib.c   |    5 +++++
 drivers/scsi/scsi_sysfs.c |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ffd7773..bd7daec 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
+		get_device(&sdev->sdev_gendev);
 		spin_unlock(shost->host_lock);
+
 		spin_lock(sdev->request_queue->queue_lock);
 		__blk_run_queue(sdev->request_queue);
 		spin_unlock(sdev->request_queue->queue_lock);
+
+		put_device(&sdev->sdev_gendev);
+
 		spin_lock(shost->host_lock);
 	}
 	/* put any unprocessed entries back */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 093d4f6..44f232e 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
-	list_del(&sdev->starved_entry);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
 	cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned long flags;
 
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_del(&sdev->starved_entry);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
-- 
1.7.7



* Re: [PATCH] Fix race between starved list processing and device removal
       [not found] ` <001901cd8281$d49132d0$7db39870$@min@lge.com>
@ 2012-08-25  6:44   ` Bart Van Assche
       [not found]     ` <001a01cd828e$944acf30$bce06d90$@min@lge.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2012-08-25  6:44 UTC
  To: Chanho Min
  Cc: 'linux-scsi', 'Jens Axboe', 'Tejun Heo',
	'James Bottomley'

On 08/25/12 05:23, Chanho Min wrote:
> I still have this race issue in our platform.
> If this patch is not accepted, I have to find another approach.
> Any comment will be appreciated.

Which kernel version are you using in your tests, and what oops
message does the kernel report?

Bart.



* Re: [PATCH] Fix race between starved list processing and device removal
       [not found]     ` <001a01cd828e$944acf30$bce06d90$@min@lge.com>
@ 2012-08-25  7:31       ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2012-08-25  7:31 UTC
  To: Chanho Min
  Cc: 'linux-scsi', 'Jens Axboe', 'Tejun Heo',
	'James Bottomley'

On 08/25/12 06:55, Chanho Min wrote:
>> On 08/25/12 05:23, Chanho Min wrote:
>>> I still have this race issue in our platform.
>>> If this patch is not accepted, I have to find another approach.
>>> Any comment will be appreciated.
>>
>> Which kernel version are you using in your tests, and what oops
>> message does the kernel report?
> 
> I mean it's solved if your patch is applied. I hope that this patch is
> accepted.

Ah, that's great. Thanks for testing this patch!

Bart.


* [PATCH, resend] Fix race between starved list processing and device removal
  2012-08-09 18:59 [PATCH] Fix race between starved list processing and device removal Bart Van Assche
       [not found] ` <001901cd8281$d49132d0$7db39870$@min@lge.com>
@ 2012-09-24 13:14 ` Bart Van Assche
  2012-10-07 10:47   ` James Bottomley
  1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2012-09-24 13:14 UTC
  To: linux-scsi, James Bottomley
  Cc: Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

Prevent the sdev reference count from dropping to zero before
scsi_run_queue() runs the queue. Also prevent the sdev reference
count from dropping to zero while that function invokes
__blk_run_queue().

Reported-by: Chanho Min <chanho.min@lge.com>
Tested-by: Chanho Min <chanho.min@lge.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Reference: http://lkml.org/lkml/2012/8/2/96
---
 drivers/scsi/scsi_lib.c   |    5 +++++
 drivers/scsi/scsi_sysfs.c |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ffd7773..bd7daec 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -452,10 +452,15 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
+		get_device(&sdev->sdev_gendev);
 		spin_unlock(shost->host_lock);
+
 		spin_lock(sdev->request_queue->queue_lock);
 		__blk_run_queue(sdev->request_queue);
 		spin_unlock(sdev->request_queue->queue_lock);
+
+		put_device(&sdev->sdev_gendev);
+
 		spin_lock(shost->host_lock);
 	}
 	/* put any unprocessed entries back */
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 093d4f6..44f232e 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	starget->reap_ref++;
 	list_del(&sdev->siblings);
 	list_del(&sdev->same_target_siblings);
-	list_del(&sdev->starved_entry);
 	spin_unlock_irqrestore(sdev->host->host_lock, flags);
 
 	cancel_work_sync(&sdev->event_work);
@@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
 void __scsi_remove_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned long flags;
 
 	if (sdev->is_visible) {
 		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
@@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_del(&sdev->starved_entry);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
 	if (sdev->host->hostt->slave_destroy)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
-- 
1.7.7



* Re: [PATCH, resend] Fix race between starved list processing and device removal
  2012-09-24 13:14 ` [PATCH, resend] " Bart Van Assche
@ 2012-10-07 10:47   ` James Bottomley
  2012-10-07 18:24     ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2012-10-07 10:47 UTC
  To: Bart Van Assche
  Cc: linux-scsi, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

On Mon, 2012-09-24 at 15:14 +0200, Bart Van Assche wrote:
> Prevent the sdev reference count from dropping to zero before
> scsi_run_queue() runs the queue. Also prevent the sdev reference
> count from dropping to zero while that function invokes
> __blk_run_queue().
[...]		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> @@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
>  	blk_cleanup_queue(sdev->request_queue);
>  	cancel_work_sync(&sdev->requeue_work);
>  
> +	spin_lock_irqsave(shost->host_lock, flags);
> +	list_del(&sdev->starved_entry);
> +	spin_unlock_irqrestore(shost->host_lock, flags);
> +

This hunk doesn't make much sense.  It seems to be orthogonal to the
problem listed in the changelog and this action is done on last put
anyway.

James




* Re: [PATCH, resend] Fix race between starved list processing and device removal
  2012-10-07 10:47   ` James Bottomley
@ 2012-10-07 18:24     ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2012-10-07 18:24 UTC
  To: James Bottomley
  Cc: linux-scsi, Mike Christie, Jens Axboe, Tejun Heo, Chanho Min

On 10/07/12 12:47, James Bottomley wrote:
> On Mon, 2012-09-24 at 15:14 +0200, Bart Van Assche wrote:
>> Prevent the sdev reference count from dropping to zero before
>> scsi_run_queue() runs the queue. Also prevent the sdev reference
>> count from dropping to zero while that function invokes
>> __blk_run_queue().
> [...]		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>> @@ -977,6 +978,10 @@ void __scsi_remove_device(struct scsi_device *sdev)
>>   	blk_cleanup_queue(sdev->request_queue);
>>   	cancel_work_sync(&sdev->requeue_work);
>>
>> +	spin_lock_irqsave(shost->host_lock, flags);
>> +	list_del(&sdev->starved_entry);
>> +	spin_unlock_irqrestore(shost->host_lock, flags);
>> +
>
> This hunk doesn't make much sense.  It seems to be orthogonal to the
> problem listed in the changelog and this action is done on last put
> anyway.

Removing an sdev from the starved list in __scsi_remove_device()
guarantees that the get_device() call added in scsi_run_queue() will
succeed. A possible alternative is to leave the starved list removal
code in scsi_device_dev_release_usercontext() and to invoke
__blk_run_queue() in scsi_run_queue() only if the get_device() call in
that function succeeded. Does this mean that you prefer the second
option, something like the (untested) code below?

		if (get_device(&sdev->sdev_gendev)) {
			spin_unlock(shost->host_lock);

			spin_lock(sdev->request_queue->queue_lock);
			__blk_run_queue(sdev->request_queue);
			spin_unlock(sdev->request_queue->queue_lock);

			put_device(&sdev->sdev_gendev);
			spin_lock(shost->host_lock);
		}

Bart.



end of thread, other threads:[~2012-10-07 18:24 UTC | newest]

Thread overview: 6+ messages
2012-08-09 18:59 [PATCH] Fix race between starved list processing and device removal Bart Van Assche
     [not found] ` <001901cd8281$d49132d0$7db39870$@min@lge.com>
2012-08-25  6:44   ` Bart Van Assche
     [not found]     ` <001a01cd828e$944acf30$bce06d90$@min@lge.com>
2012-08-25  7:31       ` Bart Van Assche
2012-09-24 13:14 ` [PATCH, resend] " Bart Van Assche
2012-10-07 10:47   ` James Bottomley
2012-10-07 18:24     ` Bart Van Assche
