public inbox for linux-scsi@vger.kernel.org
* list_for_each_entry_safe() regarded as unsafe
@ 2005-06-09 16:27 Alan Stern
  2005-06-09 21:59 ` Mike Anderson
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2005-06-09 16:27 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Dag Nygren, SCSI development list

Mike and whoever else may be interested:

The scsi_forget_host() and __scsi_remove_target() routines (in scsi_scan.c 
and scsi_sysfs.c) contain these lines respectively:

	list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {

	list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {

Neither loop is truly safe because they release shost->host_lock to do the
actual removals.  I've just seen a couple of different oopses caused when
__scsi_remove_target() was called during scanning.  Details available if 
you want them.
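
A minimal userspace sketch of the failure mode described above, assuming a simplified doubly linked list rather than the kernel's list.h. The names struct node, walk_safe() and demo_stale_next() are hypothetical and used only for illustration. The walk caches the successor in tmp, as list_for_each_entry_safe() does. While the lock would be dropped, a simulated second remover unlinks that cached successor. Entries are only flagged here, not freed, so the stale pointer can be detected without undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list entry; not the kernel's struct list_head. */
struct node {
	struct node *prev, *next;
	bool on_list;	/* false once unlinked; the kernel would have freed it */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
	head->on_list = true;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
	n->on_list = true;
}

static void list_del(struct node *n)
{
	assert(n->on_list);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->on_list = false;
}

/*
 * Walk the list the way list_for_each_entry_safe() does: cache the successor
 * in tmp before the body runs. If 'victim' is non-NULL, it is unlinked while
 * the lock would be dropped, simulating a concurrent remover.
 * Returns true if the cached successor went stale.
 */
static bool walk_safe(struct node *head, struct node *victim)
{
	struct node *pos, *tmp;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		/* lock dropped: another thread may change the list here */
		if (victim && victim->on_list && victim != pos)
			list_del(victim);
		list_del(pos);
		/* lock retaken: tmp was cached before the unlocked window */
		if (tmp != head && !tmp->on_list)
			return true;	/* following tmp would be a use-after-free */
	}
	return false;
}

/* Build a three-entry list and walk it, with or without interference. */
bool demo_stale_next(bool concurrent_removal)
{
	struct node head, a, b, c;

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);
	return walk_safe(&head, concurrent_removal ? &b : NULL);
}
```

The "safe" variant only protects against the loop body removing the current entry; it offers nothing once a second party can remove the cached successor.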

I don't know what the best way is to fix this.  Even if scsi_forget_host() 
acquired the host's scan_mutex, that wouldn't be enough to guarantee the 
__targets and __devices lists won't change, would it?  And it might cause 
interference with other pathways.

Maybe it's best simply to avoid using list_for_each_entry_safe, as in
the example below:

Alan Stern


Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -653,17 +653,19 @@ void __scsi_remove_target(struct scsi_ta
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 	unsigned long flags;
-	struct scsi_device *sdev, *tmp;
+	struct scsi_device *sdev;
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	starget->reap_ref++;
-	list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {
+restart:
+	list_for_each_entry(sdev, &shost->__devices, siblings) {
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id)
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
 		spin_lock_irqsave(shost->host_lock, flags);
+		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	scsi_target_reap(starget);




* Re: list_for_each_entry_safe() regarded as unsafe
  2005-06-09 16:27 list_for_each_entry_safe() regarded as unsafe Alan Stern
@ 2005-06-09 21:59 ` Mike Anderson
  2005-06-09 23:19   ` Alan Stern
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Anderson @ 2005-06-09 21:59 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dag Nygren, SCSI development list

Alan Stern [stern@rowland.harvard.edu] wrote:
> Mike and whoever else may be interested:
> 
> The scsi_forget_host() and __scsi_remove_target() routines (in scsi_scan.c 
> and scsi_sysfs.c) contain these lines respectively:
> 
> 	list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {
> 
> 	list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {
> 
> Neither loop is truly safe because they release shost->host_lock to do the
> actual removals.  I've just seen a couple of different oopses caused when
> __scsi_remove_target() was called during scanning.  Details available if 
> you want them.

Well, we need an updated scsi_host state model that would prevent scanning
while we are removing the host. I believe that if the oopses in
__scsi_remove_target were prevented, there may be some other oopses showing
up as the host started going away.

> 
> I don't know what the best way is to fix this.  Even if scsi_forget_host() 
> acquired the host's scan_mutex, that wouldn't be enough to guarantee the 
> __targets and __devices lists won't change, would it?  And it might cause 
> interference with other pathways.
> 

Yes, if scsi_forget_host acquired the scan_mutex, it would deadlock when
scsi_remove_device acquired it later on in the call stack.

> Maybe it's best simply to avoid using list_for_each_entry_safe, as in
> the example below:
> .. snip .. 
> +restart:
> +	list_for_each_entry(sdev, &shost->__devices, siblings) {
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id)
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);
>  		spin_lock_irqsave(shost->host_lock, flags);
> +		goto restart;
>  	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
>  	scsi_target_reap(starget);
> 

Since we are not guaranteed that scsi_remove_device will remove the device
from the list (i.e., the release may not be called after an unexpected
disconnect), you may get stuck on the same device for a bit.
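
One way to keep a restart-style loop from revisiting an entry that lingers on the list can be sketched as follows. This is a userspace illustration under stated assumptions, not the kernel code: struct dev, the del_pending marker and the lingers flag are all hypothetical, and remove_dev() merely simulates a removal whose unlink is deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dev {			/* hypothetical stand-in for a device entry */
	struct dev *next;
	int id;
	bool del_pending;	/* hypothetical "removal already requested" marker */
	bool lingers;		/* simulates a device whose release is deferred */
};

struct host {
	struct dev *devices;
};

/* Simulated removal: always mark the device, unlink it unless it lingers. */
static void remove_dev(struct host *h, struct dev *d)
{
	struct dev **pp;

	d->del_pending = true;
	if (d->lingers)
		return;
	for (pp = &h->devices; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			d->next = NULL;
			return;
		}
	}
}

/* Restart the scan after every removal, skipping entries already marked. */
static int remove_target(struct host *h, int id)
{
	struct dev *d;
	int calls = 0;

	assert(h != NULL);
restart:
	for (d = h->devices; d; d = d->next) {
		if (d->id != id || d->del_pending)
			continue;
		/* the list lock would be dropped around this call */
		remove_dev(h, d);
		calls++;
		goto restart;
	}
	return calls;
}

/* Two devices on target 1 (the first one lingers) and one on target 2. */
int demo_restart(void)
{
	struct dev c = { NULL, 2, false, false };
	struct dev b = { &c, 1, false, false };
	struct dev a = { &b, 1, false, true };
	struct host h = { &a };

	return remove_target(&h, 1);
}
```

Without the del_pending test, the loop would select the lingering entry on every restart and never reach the second device.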

-andmike
--
Michael Anderson
andmike@us.ibm.com



* Re: list_for_each_entry_safe() regarded as unsafe
  2005-06-09 21:59 ` Mike Anderson
@ 2005-06-09 23:19   ` Alan Stern
  2005-06-10 13:39     ` Brian King
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2005-06-09 23:19 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Dag Nygren, SCSI development list

On Thu, 9 Jun 2005, Mike Anderson wrote:

> Well, we need an updated scsi_host state model that would prevent scanning
> while we are removing the host. I believe that if the oopses in
> __scsi_remove_target were prevented, there may be some other oopses showing
> up as the host started going away.

More than that is needed -- you have to guarantee that two threads won't 
try to add a target or device to, or remove one from, the same host at the
same time.

> > I don't know what the best way is to fix this.  Even if scsi_forget_host() 
> > acquired the host's scan_mutex, that wouldn't be enough to guarantee the 
> > __targets and __devices lists won't change, would it?  And it might cause 
> > interference with other pathways.
> > 
> 
> Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
> scsi_remove_device acquired it later on in the call stack.

How about not acquiring the scan_mutex in scsi_remove_device, and 
insisting that the caller hold it instead?  There aren't that many places 
where it gets called.  In fact, one of those places (an error pathway in 
scsi_sysfs_add_sdev) looks like it already will cause a deadlock.

Then it would be necessary also to have scanning threads check whether the
host is in the process of removal.  This means that scsi_forget_host will
have to change the host state somehow.  What do you think would be the
best way to mark a host as being removed?

On the plus side, neither forget_host nor remove_target would need to
acquire the host_lock, because holding the scan_mutex would already
guarantee the necessary exclusion.
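
The idea of having scanners check for a host under removal can be sketched in userspace as follows. Everything here is an assumption made for illustration: the HOST_RUNNING and HOST_REMOVING states, the struct layout and the function names are hypothetical, and a pthread mutex stands in for the scan_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum host_state { HOST_RUNNING, HOST_REMOVING };	/* hypothetical states */

struct host {
	pthread_mutex_t scan_mutex;	/* stands in for the host's scan_mutex */
	enum host_state state;
	int ndevices;
};

/* A scanner adds a device only while the host is still running. */
bool scan_add_device(struct host *h)
{
	bool added = false;

	assert(h != NULL);
	pthread_mutex_lock(&h->scan_mutex);
	if (h->state == HOST_RUNNING) {
		h->ndevices++;
		added = true;
	}
	pthread_mutex_unlock(&h->scan_mutex);
	return added;
}

/* Removal marks the host first, so later scans back off, then empties it. */
void forget_host(struct host *h)
{
	assert(h != NULL);
	pthread_mutex_lock(&h->scan_mutex);
	h->state = HOST_REMOVING;
	h->ndevices = 0;
	pthread_mutex_unlock(&h->scan_mutex);
}

/* Returns bit 0 if the scan before removal succeeded, bit 1 if the one after did. */
int demo_scan_vs_remove(void)
{
	struct host h = { PTHREAD_MUTEX_INITIALIZER, HOST_RUNNING, 0 };
	bool before, after;

	before = scan_add_device(&h);
	forget_host(&h);
	after = scan_add_device(&h);
	return (before ? 1 : 0) | (after ? 2 : 0);
}
```

Because the state change and the list teardown happen under the same mutex the scanners take, a scan either completes before removal starts or sees the new state and does nothing.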

Alan Stern



* Re: list_for_each_entry_safe() regarded as unsafe
  2005-06-09 23:19   ` Alan Stern
@ 2005-06-10 13:39     ` Brian King
  2005-06-10 15:26       ` Alan Stern
  0 siblings, 1 reply; 5+ messages in thread
From: Brian King @ 2005-06-10 13:39 UTC (permalink / raw)
  To: Alan Stern; +Cc: Mike Anderson, Dag Nygren, SCSI development list

Alan Stern wrote:
> On Thu, 9 Jun 2005, Mike Anderson wrote:
> 
> 
>>Well, we need an updated scsi_host state model that would prevent scanning
>>while we are removing the host. I believe that if the oopses in
>>__scsi_remove_target were prevented, there may be some other oopses showing
>>up as the host started going away.
> 
> 
> More than that is needed -- you have to guarantee that two threads won't 
> try to add or remove a target or device to the same host at the same time.
> 
> 
>>>I don't know what the best way is to fix this.  Even if scsi_forget_host() 
>>>acquired the host's scan_mutex, that wouldn't be enough to guarantee the 
>>>__targets and __devices lists won't change, would it?  And it might cause 
>>>interference with other pathways.
>>>
>>
>>Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
>>scsi_remove_device acquired it later on in the call stack.
> 
> 
> How about not acquiring the scan_mutex in scsi_remove_device, and 
> insisting that the caller hold it instead?  There aren't that many places 
> where it gets called.  In fact, one of those places (an error pathway in 
> scsi_sysfs_add_sdev) looks like it already will cause a deadlock.

scsi_remove_device is an exported symbol, so requiring the caller to obtain
the scan_mutex prior to calling it would not work. A __scsi_remove_device
could be created, however, which would not grab the scan_mutex so that scsi
core could do the right thing.
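
The split suggested here, an exported function that takes the lock and an internal variant that expects it to be held, can be sketched in userspace like this. It is only an illustration under assumptions: a pthread mutex stands in for the scan_mutex, a counter stands in for the device list, and remove_device_locked() plays the role of the proposed __scsi_remove_device (the double-underscore prefix is avoided because it is reserved in userspace C).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct host {
	pthread_mutex_t scan_mutex;	/* stands in for the host's scan_mutex */
	int ndevices;			/* stands in for the device list */
};

/* Internal variant: the caller must already hold scan_mutex. */
static void remove_device_locked(struct host *h)
{
	if (h->ndevices > 0)
		h->ndevices--;
}

/* Exported variant: takes the mutex itself, so outside callers are unchanged. */
void remove_device(struct host *h)
{
	assert(h != NULL);
	pthread_mutex_lock(&h->scan_mutex);
	remove_device_locked(h);
	pthread_mutex_unlock(&h->scan_mutex);
}

/*
 * Core-internal teardown: holds the mutex across the whole walk and uses the
 * locked variant. Calling remove_device() here instead would try to take a
 * non-recursive mutex twice and deadlock.
 */
void forget_host(struct host *h)
{
	assert(h != NULL);
	pthread_mutex_lock(&h->scan_mutex);
	while (h->ndevices > 0)
		remove_device_locked(h);
	pthread_mutex_unlock(&h->scan_mutex);
}

/* One external removal followed by a full teardown; returns devices left. */
int demo_forget(void)
{
	struct host h = { PTHREAD_MUTEX_INITIALIZER, 3 };

	remove_device(&h);
	forget_host(&h);
	return h.ndevices;
}
```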


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


* Re: list_for_each_entry_safe() regarded as unsafe
  2005-06-10 13:39     ` Brian King
@ 2005-06-10 15:26       ` Alan Stern
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Stern @ 2005-06-10 15:26 UTC (permalink / raw)
  To: Brian King; +Cc: Mike Anderson, Dag Nygren, SCSI development list

On Fri, 10 Jun 2005, Brian King wrote:

> > How about not acquiring the scan_mutex in scsi_remove_device, and 
> > insisting that the caller hold it instead?  There aren't that many places 
> > where it gets called.  In fact, one of those places (an error pathway in 
> > scsi_sysfs_add_sdev) looks like it already will cause a deadlock.
> 
> scsi_remove_device is an exported symbol, so requiring the caller to obtain
> the scan_mutex prior to calling it would not work. A __scsi_remove_device
> could be created, however, which would not grab the scan_mutex so that scsi
> core could do the right thing.

Okay.

How should a host be marked to indicate it's being removed?  Add another 
bit to shost_state?

Alan Stern

