* list_for_each_entry_safe() regarded as unsafe
From: Alan Stern @ 2005-06-09 16:27 UTC
To: Mike Anderson; +Cc: Dag Nygren, SCSI development list

Mike and whoever else may be interested:

The scsi_forget_host() and __scsi_remove_target() routines (in scsi_scan.c
and scsi_sysfs.c) contain these lines respectively:

	list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {

	list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {

Neither loop is truly safe because they release shost->host_lock to do the
actual removals. I've just seen a couple of different oopses caused when
__scsi_remove_target() was called during scanning. Details available if
you want them.

I don't know what the best way is to fix this. Even if scsi_forget_host()
acquired the host's scan_mutex, that wouldn't be enough to guarantee the
__targets and __devices lists won't change, would it? And it might cause
interference with other pathways.

Maybe it's best simply to avoid using list_for_each_entry_safe, as in
the example below:

Alan Stern

Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -653,17 +653,19 @@ void __scsi_remove_target(struct scsi_ta
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 	unsigned long flags;
-	struct scsi_device *sdev, *tmp;
+	struct scsi_device *sdev;
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	starget->reap_ref++;
-	list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {
+restart:
+	list_for_each_entry(sdev, &shost->__devices, siblings) {
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id)
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
 		spin_lock_irqsave(shost->host_lock, flags);
+		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	scsi_target_reap(starget);
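
The hazard and the restart pattern can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: struct node, remove_unlocked() and remove_matching() are hypothetical stand-ins, not kernel code, and the "concurrent" removal is simulated in a single thread by having the removal routine also unlink the victim's matching successor. A "safe" iterator would have saved that successor as tmp before the lock was dropped and would then follow a stale pointer; the restart loop trusts no pointer saved across the unlocked region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel list entry (circular, doubly linked). */
struct node {
	struct node *prev, *next;
	int id;       /* plays the role of sdev->id */
	int on_list;  /* 1 while the node is linked */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
	head->id = -1;
	head->on_list = 0;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
	n->on_list = 1;
}

static void list_del(struct node *n)
{
	assert(n->on_list);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;  /* poison: following these would crash */
	n->on_list = 0;
}

/*
 * Stand-in for the removal done with the lock dropped. To mimic another
 * thread racing with us, it also unlinks the victim's successor when that
 * successor has the same id.
 */
static void remove_unlocked(struct node *head, struct node *victim)
{
	struct node *next = victim->next;

	list_del(victim);
	if (next != head && next->id == victim->id)
		list_del(next);
}

/* Restart-style removal; returns how many times a removal was started. */
static int remove_matching(struct node *head, int id)
{
	struct node *n;
	int calls = 0;

restart:
	for (n = head->next; n != head; n = n->next) {
		if (n->id != id)
			continue;
		/* the lock would be dropped here */
		remove_unlocked(head, n);
		/* lock re-taken: the list may have changed, so start over */
		calls++;
		goto restart;
	}
	return calls;
}
```

With a list holding ids 1, 2, 2, 3, remove_matching(&head, 2) starts one removal, both matching nodes end up unlinked, and the walk never dereferences a poisoned pointer.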
* Re: list_for_each_entry_safe() regarded as unsafe
From: Mike Anderson @ 2005-06-09 21:59 UTC
To: Alan Stern; +Cc: Dag Nygren, SCSI development list
Alan Stern [stern@rowland.harvard.edu] wrote:
> Mike and whoever else may be interested:
>
> The scsi_forget_host() and __scsi_remove_target() routines (in scsi_scan.c
> and scsi_sysfs.c) contain these lines respectively:
>
> list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {
>
> list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {
>
> Neither loop is truly safe because they release shost->host_lock to do the
> actual removals. I've just seen a couple of different oopses caused when
> __scsi_remove_target() was called during scanning. Details available if
> you want them.
Well, we need an updated scsi_host state model that would prevent scanning
while we are removing the host. I believe that if the oopses in
__scsi_remove_target were prevented, there may be some other oopses showing
up as the host started going away.
>
> I don't know what the best way is to fix this. Even if scsi_forget_host()
> acquired the host's scan_mutex, that wouldn't be enough to guarantee the
> __targets and __devices lists won't change, would it? And it might cause
> interference with other pathways.
>
Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
scsi_remove_device acquired it later on in the call stack.
> Maybe it's best simply to avoid using list_for_each_entry_safe, as in
> the example below:
> .. snip ..
> +restart:
> +	list_for_each_entry(sdev, &shost->__devices, siblings) {
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id)
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);
>  		spin_lock_irqsave(shost->host_lock, flags);
> +		goto restart;
>  	}
>  	spin_unlock_irqrestore(shost->host_lock, flags);
>  	scsi_target_reap(starget);
>
Since we are not guaranteed that scsi_remove_device will take the device
off the list (i.e. the release may not be called after an unexpected
disconnect), the restart loop may get stuck on the same device for a bit.
-andmike
--
Michael Anderson
andmike@us.ibm.com
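
One way to keep a restart loop from revisiting an entry that stays on the list is to mark the entry before the lock is dropped and skip marked entries on later passes. This is a userspace sketch under assumptions, not something proposed in the thread: struct dev, its deleting and sticky fields, and both functions are hypothetical, and the lock is only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record; none of these fields come from the kernel. */
struct dev {
	struct dev *next;
	int id;
	bool deleting;  /* assumed flag: removal has already been started */
	bool sticky;    /* simulates a device whose release is delayed */
};

/*
 * Stand-in for the removal routine: unlinks the device unless its release
 * is delayed. Returns true if the device left the list.
 */
static bool remove_dev(struct dev **head, struct dev *victim)
{
	struct dev **pp;

	assert(victim->deleting);  /* caller must have marked the device */
	if (victim->sticky)
		return false;      /* stays on the list for now */
	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			return true;
		}
	}
	return false;
}

/* Restart loop that skips marked entries; returns the number of attempts. */
static int remove_target_devs(struct dev **head, int id)
{
	struct dev *d;
	int attempts = 0;

restart:
	for (d = *head; d; d = d->next) {
		if (d->id != id || d->deleting)
			continue;
		d->deleting = true;  /* marked while the lock is still held */
		/* the lock would be dropped here */
		remove_dev(head, d);
		/* lock re-taken: restart, but never pick this entry again */
		attempts++;
		goto restart;
	}
	return attempts;
}
```

Each matching device is attempted exactly once, so the loop terminates even when a device cannot be unlinked right away.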
* Re: list_for_each_entry_safe() regarded as unsafe
From: Alan Stern @ 2005-06-09 23:19 UTC
To: Mike Anderson; +Cc: Dag Nygren, SCSI development list
On Thu, 9 Jun 2005, Mike Anderson wrote:
> Well, we need an updated scsi_host state model that would prevent scanning
> while we are removing the host. I believe that if the oopses in
> __scsi_remove_target were prevented, there may be some other oopses showing
> up as the host started going away.
More than that is needed -- you have to guarantee that two threads won't
try to add or remove a target or device to the same host at the same time.
> > I don't know what the best way is to fix this. Even if scsi_forget_host()
> > acquired the host's scan_mutex, that wouldn't be enough to guarantee the
> > __targets and __devices lists won't change, would it? And it might cause
> > interference with other pathways.
> >
>
> Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
> scsi_remove_device acquired it later on in the call stack.
How about not acquiring the scan_mutex in scsi_remove_device, and
insisting that the caller hold it instead? There aren't that many places
where it gets called. In fact, one of those places (an error pathway in
scsi_sysfs_add_sdev) looks like it already will cause a deadlock.
Then it would also be necessary to have scanning threads check whether the
host is in the process of removal. This means that scsi_forget_host will
have to change the host state somehow. What do you think would be the
best way to mark a host as being removed?
On the plus side, neither forget_host nor remove_target would need to
acquire the host_lock, because holding the scan_mutex would already
guarantee the necessary exclusion.
Alan Stern
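
The scheme described above (one mutex covering both scanning and removal, plus a check of the host state by scanning threads) might look roughly like this in userspace. It is a sketch under assumptions: a pthread mutex stands in for scan_mutex, and struct host, its removing field, and the three functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical host model; the field names are illustrative only. */
struct host {
	pthread_mutex_t scan_mutex;
	bool removing;  /* assumed marker set by the removal path */
	int ndevices;
};

static void host_init(struct host *h)
{
	pthread_mutex_init(&h->scan_mutex, NULL);
	h->removing = false;
	h->ndevices = 0;
}

/* Scan path: add one device unless the host is going away. 0 or -1. */
static int scan_add_device(struct host *h)
{
	int ret = -1;

	pthread_mutex_lock(&h->scan_mutex);
	if (!h->removing) {
		h->ndevices++;
		ret = 0;
	}
	pthread_mutex_unlock(&h->scan_mutex);
	return ret;
}

/* Removal path: mark the host, then tear everything down under the mutex. */
static void forget_host(struct host *h)
{
	pthread_mutex_lock(&h->scan_mutex);
	h->removing = true;
	while (h->ndevices > 0)
		h->ndevices--;  /* stand-in for a per-device removal */
	assert(h->ndevices == 0);
	pthread_mutex_unlock(&h->scan_mutex);
}
```

Because the mark and the teardown happen in one critical section, a scanner either finishes before removal starts or sees the mark and backs off; no separate list lock is needed in this simplified model.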
* Re: list_for_each_entry_safe() regarded as unsafe
From: Brian King @ 2005-06-10 13:39 UTC
To: Alan Stern; +Cc: Mike Anderson, Dag Nygren, SCSI development list
Alan Stern wrote:
> On Thu, 9 Jun 2005, Mike Anderson wrote:
>
>
>>Well, we need an updated scsi_host state model that would prevent scanning
>>while we are removing the host. I believe that if the oopses in
>>__scsi_remove_target were prevented, there may be some other oopses showing
>>up as the host started going away.
>
>
> More than that is needed -- you have to guarantee that two threads won't
> try to add or remove a target or device to the same host at the same time.
>
>
>>>I don't know what the best way is to fix this. Even if scsi_forget_host()
>>>acquired the host's scan_mutex, that wouldn't be enough to guarantee the
>>>__targets and __devices lists won't change, would it? And it might cause
>>>interference with other pathways.
>>>
>>
>>Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
>>scsi_remove_device acquired it later on in the call stack.
>
>
> How about not acquiring the scan_mutex in scsi_remove_device, and
> insisting that the caller hold it instead? There aren't that many places
> where it gets called. In fact, one of those places (an error pathway in
> scsi_sysfs_add_sdev) looks like it already will cause a deadlock.
scsi_remove_device is an exported symbol, so requiring the caller to obtain
the scan_mutex prior to calling it would not work. A __scsi_remove_device
could be created, however, which would not grab the scan_mutex so that scsi
core could do the right thing.
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
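
The split suggested above is the usual pairing of a public entry point that takes the lock with an internal helper that expects the lock to be held. A minimal userspace sketch, assuming pthreads in place of the kernel mutex: remove_device(), remove_device_nolock() and forget_host() are hypothetical names, and an error-checking mutex is used so that the self-deadlock shows up as EDEADLK instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical host model for illustration. */
struct host {
	pthread_mutex_t scan_mutex;
	int ndevices;
};

static void host_init(struct host *h)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* Relocking by the owner returns EDEADLK instead of hanging. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&h->scan_mutex, &attr);
	pthread_mutexattr_destroy(&attr);
	h->ndevices = 0;
}

/* Internal helper: the caller must already hold scan_mutex. */
static void remove_device_nolock(struct host *h)
{
	if (h->ndevices > 0)
		h->ndevices--;
}

/* Public entry point: takes the mutex itself. Returns 0 or an errno value. */
static int remove_device(struct host *h)
{
	int err = pthread_mutex_lock(&h->scan_mutex);

	if (err)
		return err;
	remove_device_nolock(h);
	pthread_mutex_unlock(&h->scan_mutex);
	return 0;
}

/*
 * Core path that holds the mutex across the whole teardown. Calling the
 * public entry point from here fails (it would deadlock on a normal
 * mutex), so the helper is used instead. Returns the error from the
 * attempted public call, for demonstration.
 */
static int forget_host(struct host *h)
{
	int err;

	pthread_mutex_lock(&h->scan_mutex);
	err = remove_device(h);
	while (h->ndevices > 0)
		remove_device_nolock(h);
	assert(h->ndevices == 0);
	pthread_mutex_unlock(&h->scan_mutex);
	return err;
}
```

External callers keep the simple locked call, while code that already owns the mutex uses the helper directly.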
* Re: list_for_each_entry_safe() regarded as unsafe
From: Alan Stern @ 2005-06-10 15:26 UTC
To: Brian King; +Cc: Mike Anderson, Dag Nygren, SCSI development list
On Fri, 10 Jun 2005, Brian King wrote:
> > How about not acquiring the scan_mutex in scsi_remove_device, and
> > insisting that the caller hold it instead? There aren't that many places
> > where it gets called. In fact, one of those places (an error pathway in
> > scsi_sysfs_add_sdev) looks like it already will cause a deadlock.
>
> scsi_remove_device is an exported symbol, so requiring the caller to obtain
> the scan_mutex prior to calling it would not work. A __scsi_remove_device
> could be created, however, which would not grab the scan_mutex so that scsi
> core could do the right thing.
Okay.
How should a host be marked to indicate it's being removed? Add another
bit to shost_state?
Alan Stern