* [Comments Needed] scan vs remove_target deadlock
From: James Smart @ 2006-04-10 18:25 UTC
To: linux-scsi
We've seen a very nasty deadlock condition between the scan code and
the scsi remove code, when the sdev block/unblock functionality is
used. The scsi_scan mutex is taken as a very coarse lock over the
scan code, and will be held across multiple SCSI i/o's while the
scan is proceeding. The scan may be on a single lun basis, or on a
target basis. The gist is - it's held a loooonnng time. Additionally,
the scan code uses the block request queue for scan i/o's. In the case
where the block/unblock interfaces are being used (fc transport), the
request queue can be stopped - which stops scanning. If the same or
unrelated sdev is then to be removed, we enter a deadlock waiting for
the scan mutex to be released. In most cases, a background timer fires
that unblocks the sdev and things eventually unclog (granted a *lot*
of time may have gone by). In a few cases, we are seeing the sdev
request queue get plugged, then this deadlock really locks up. One last
observation: don't mix scan code and other work on the same workq.
Workq flushing will fall over fast.
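To make the lock ordering concrete, here's a toy userspace model of what we're
hitting (pthreads stand in for the scan_mutex and the stopped request queue;
this is not kernel code, and all of the names are made up):

/* Toy model of the scan vs. remove stall (not kernel code).
 * "scan" takes scan_mutex and then waits for the (stopped) queue to run;
 * "remove" waits on scan_mutex behind it; only the background timer ever
 * restarts the queue, so both sit there until it fires (or forever, if
 * the queue stays plugged).  Build with: gcc -pthread -o model model.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t scan_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_runnable = PTHREAD_COND_INITIALIZER;
static int queue_stopped = 1;          /* sdev was blocked by the transport */

static void *scan_thread(void *arg)
{
	pthread_mutex_lock(&scan_mutex);   /* coarse lock over the whole scan */
	printf("scan: scan i/o queued, waiting for the queue to run\n");
	pthread_mutex_lock(&queue_lock);
	while (queue_stopped)              /* scan i/o sits on a stopped queue */
		pthread_cond_wait(&queue_runnable, &queue_lock);
	pthread_mutex_unlock(&queue_lock);
	printf("scan: i/o completed, releasing scan_mutex\n");
	pthread_mutex_unlock(&scan_mutex);
	return NULL;
}

static void *remove_thread(void *arg)
{
	sleep(1);                          /* transport decides to remove an sdev */
	printf("remove: waiting for scan_mutex...\n");
	pthread_mutex_lock(&scan_mutex);   /* stuck here until the scan finishes */
	printf("remove: got scan_mutex, removing device\n");
	pthread_mutex_unlock(&scan_mutex);
	return NULL;
}

static void *unblock_timer(void *arg)
{
	sleep(5);                          /* the background timer that unblocks */
	pthread_mutex_lock(&queue_lock);
	queue_stopped = 0;
	pthread_cond_broadcast(&queue_runnable);
	pthread_mutex_unlock(&queue_lock);
	printf("timer: unblocked the queue\n");
	return NULL;
}

int main(void)
{
	pthread_t t[3];
	int i;

	pthread_create(&t[0], NULL, scan_thread, NULL);
	pthread_create(&t[1], NULL, remove_thread, NULL);
	pthread_create(&t[2], NULL, unblock_timer, NULL);
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	return 0;
}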
I'd like to poll the wisdom of those on this list as to the best way
to approach this issue:
- The plugged queue logic needs to be tracked down. Anyone have any
insights ?
- The scan mutex, as coarse as it is, is really broken. It would be
great to reduce the lock holding so the lock isn't held while an
i/o is pending. This change would be extremely invasive to the scan
code. Any other alternatives ?
- If an sdev is "blocked", we shouldn't be wasting our time scanning it.
Should we be adding this check before sending each scan i/o, or is
there a better lower-level place for this check ? Should we be
creating an explicit return code, or continue to piggy-back on
DID_NO_CONNECT ? How do we deal with a scan i/o which may already be
queued when the device is blocked ?
- Similarly, we need to make sure error handling doesn't take the device
offline when it's blocked. scsi_eh_stu() and scsi_eh_bus_device_reset()
should gracefully handle error conditions (including blocked). Right
now, if a host replies with DID_IMM_RETRY or DID_BUS_BUSY, the device
could be taken offline.
- Anything else ?
Comments appreciated....
-- james s
* Re: [Comments Needed] scan vs remove_target deadlock
From: Mike Christie @ 2006-04-11 4:03 UTC
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
> We've seen a very nasty deadlock condition between the scan code and
> the scsi remove code, when the sdev block/unblock functionality is
> used. The scsi_scan mutex is taken as a very coarse lock over the
> scan code, and will be held across multiple SCSI i/o's while the
> scan is proceeding. The scan may be on a single lun basis, or on a
> target basis. The gist is - it's held a loooonnng time. Additionally,
> the scan code uses the block request queue for scan i/o's. In the case
> where the block/unblock interfaces are being used (fc transport), the
> request queue can be stopped - which stops scanning. If the same or
> unrelated sdev is then to be removed, we enter a deadlock waiting for
> the scan mutex to be released. In most cases, a background timer fires
> that unblocks the sdev and things eventually unclog (granted a *lot*
> of time may have gone by). In a few cases, we are seeing the sdev
> request queue get plugged, then this deadlock really locks up. One last
> observation: don't mix scan code and other work on the same workq.
> Workq flushing will fall over fast.
>
>
> I'd like to poll the wisdom of those on this list as to the best way
> to approach this issue:
>
> - The plugged queue logic needs to be tracked down. Anyone have any
> insights ?
Are you referring to a problem in a function like our prep or request fn
where we could plug the queue if the device state is not online? Do we
need the scan mutex to change the device state? I mean if a scan has the
mutex lock, and the transport class decides to remove the device it
tries to grab the scan_mutex by calling scsi_remove_device, but if we
moved the state change invocation before the scan_mutex is taken in
scsi_remove_device then, I assume eventually the device should get
unplugged, the prep or request_fn will see the new state and fail the
request.
I am also asking because if you change the state from userspace we do
not grab the scan_mutex, so I did not know if that was a bug.
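Roughly, what I'm picturing is something like the sketch below (untested, and
only meant to show the ordering; mutex_lock() is assumed here, older trees use
a semaphore for scan_mutex, and __scsi_remove_device()'s own SDEV_CANCEL
transition would have to tolerate the state already being set):

/* Untested sketch of the reordering: mark the device SDEV_CANCEL *before*
 * sleeping on scan_mutex, so that once the blocked queue runs again the
 * prep/request_fn fails the scan i/o and the scan drops the mutex. */
void scsi_remove_device(struct scsi_device *sdev)
{
	struct Scsi_Host *shost = sdev->host;

	/* state change moved ahead of the scan_mutex */
	if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
		return;

	mutex_lock(&shost->scan_mutex);
	__scsi_remove_device(sdev);
	mutex_unlock(&shost->scan_mutex);
}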
> - The scan mutex, as coarse as it is, is really broken. It would be
> great to reduce the lock holding so the lock isn't held while an
> i/o is pending. This change would be extremely invasive to the scan
> code. Any other alternatives ?
> - If an sdev is "blocked", we shouldn't be wasting our time scanning it.
> Should we be adding this check before sending each scan i/o, or is
> there a better lower-level place for this check ? Should we be
Are you referring to our request or prep fn, or scsi_dispatch_cmd, or
something else like where the scan code calls scsi_execute?
> creating an explicit return code, or continue to piggy-back on
> DID_NO_CONNECT ? How do we deal with a scan i/o which may already be
> queued when the device is blocked ?
Are you thinking about adding an abort/cancel_request() callback to the
request_queue? Something that could also be called by upper layers like
DM, and could eventually end up calling a scsi abort function?
* Re: [Comments Needed] scan vs remove_target deadlock
From: Stefan Richter @ 2006-04-11 8:53 UTC
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
> In the case
> where the block/unblock interfaces are being used (fc transport), the
> request queue can be stopped - which stops scanning. If the same or
> unrelated sdev is then to be removed, we enter a deadlock waiting for
> the scan mutex to be released.
Another driver which uses a block/unblock interface is sbp2. It blocks
shosts (because one shost == one SBP-2 LU at the moment) during 1394 bus
reset / 1394 node rescan / SBP-2 reconnect phases. I learned the hard way
that an shost (or sdev if you will) *must not be blocked* when an shost
(or sdev) is to be removed. This is generally true, not only during
scanning, because SCSI command set drivers' shutdown methods go into
deadlock as well if an shost (or sdev) is blocked. I posted an addition
for Documentation/scsi/scsi_mid_low_api.txt a while ago to document this
pitfall.
IOW before a transport may remove an sdev or shost, it has to unblock it
and it also has to make sure that all commands that were enqueued before
the blocking are being completed. But isn't it rather a responsibility
of the SCSI core to get a LU's or target's state transitions right? When
an sdev is "blocked" and the transport tells the core to transition it
to "to be removed", then the core should know that the sdev's LU cannot
be reached anymore and act accordingly.
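In code, the rule looks roughly like this on the transport side (the wrapper
name is invented for illustration; scsi_target_unblock() and
scsi_remove_target() are the real midlayer entry points):

/* Sketch of the ordering rule; transport_delete_target() is a made-up name. */
static void transport_delete_target(struct device *starget_parent)
{
	/* 1. Unblock, so commands queued before the block (and any scan that
	 *    is sleeping on the stopped queue with scan_mutex held) can make
	 *    progress again ... */
	scsi_target_unblock(starget_parent);

	/* 2. ... wait here until those earlier commands have actually
	 *    completed (the step that seems to be missing today) ... */

	/* 3. ... and only then remove, which goes back into the midlayer
	 *    paths that take scan_mutex. */
	scsi_remove_target(starget_parent);
}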
--
Stefan Richter
-=====-=-==- -=-- -=-==
http://arcgraph.de/sr/
* Re: [Comments Needed] scan vs remove_target deadlock
From: James Smart @ 2006-04-13 15:14 UTC
To: Mike Christie; +Cc: linux-scsi
Mike Christie wrote:
>> - The plugged queue logic needs to be tracked down. Anyone have any
>> insights ?
>
> Are you referring to a problem in a function like our prep or request fn
> where we could plug the queue if the device state is not online?
I believe so. One signature is that the scan i/o failed, starting recovery,
and right as recovery started, the sdev was blocked, which caused recovery to
fail and the sdev to be taken offline.
> Do we
> need the scan mutex to change the device state? I mean if a scan has the
> mutex lock, and the transport class decides to remove the device it
> tries to grab the scan_mutex by calling scsi_remove_device, but if we
> moved the state change invocation before the scan_mutex is taken in
> scsi_remove_device then, I assume eventually the device should get
> unplugged, the prep or request_fn will see the new state and fail the
> request.
This may be what's needed. I don't understand all of this path yet, so I
can only speculate (and likely w/ error). Thus, the questions.
> I am also asking because if you change the state from userspace we do
> not grab the scan_mutex so I did not know if that was bug.
>
>
>> - The scan mutex, as coarse as it is, is really broken. It would be
>> great to reduce the lock holding so the lock isn't held while an
>> i/o is pending. This change would be extremely invasive to the scan
>> code. Any other alternatives ?
>> - If an sdev is "blocked", we shouldn't be wasting our time scanning it.
>> Should we be adding this check before sending each scan i/o, or is
>> there a better lower-level place for this check ? Should we be
>
> are you referring to our request or prep fn or scsi_dispatch_cmd or
> something else like when the scan code calls the scsi_execute?
Well, we want to stop the i/o before it gets on the request queue, thus it
should be in scsi_execute(). We tried modifying scsi_execute() to bounce
i/o's (w/ DID_NO_CONNECT) if the sdev was offline. This made the deadlock
harder to hit, but didn't remove it. We'll probably be augmenting this
with a check for the blocked state as well.
Once the i/o is on the queue, we have to ensure the queues get run, and the
LLDD/transport will bounce the i/o. However, we would have liked to avoid
the delay while waiting on the queue to run.
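The check we tried is essentially the fragment below, at the top of
scsi_execute() (our local, untested form; SDEV_BLOCK is the part we'd be
adding, and DID_NO_CONNECT << 16 is the same result the caller would
eventually have gotten back from the LLDD anyway):

	/* fragment at the top of scsi_execute(); sketch only */
	if (sdev->sdev_state == SDEV_OFFLINE ||
	    sdev->sdev_state == SDEV_BLOCK)
		return DID_NO_CONNECT << 16;

	/* ... rest of scsi_execute(): blk_get_request(), blk_execute_rq(),
	 *     return req->errors ... */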
>
>> creating an explicit return code, or continue to piggy-back on
>> DID_NO_CONNECT ? How do we deal with a scan i/o which may already be
>> queued when the device is blocked ?
>
> Are you thinking about adding a abort/cancel_request() callback to the
> request_queue? Something that could also be called by upper layers like
> DM, and could eventually end up calling a scsi abort function?
Actually - yes, and I knew things like DM would have liked this. There are
two states we have to be aware of though... The 1st is where the i/o is
queued, but not yet given to the LLDD (which I believe is the stall case
that is hurting us here). The 2nd is where the i/o has been issued
to the LLDD, but the sdev is blocked by the time the abort would be issued.
Unfortunately, since the block state implies a loss of connectivity, we
can't send the abort, so we have to sit and wait until the block times out.
No real way to avoid this delay.
Hope this makes sense.
-- james s
* Re: [Comments Needed] scan vs remove_target deadlock
From: James Smart @ 2006-04-13 15:21 UTC
To: Stefan Richter; +Cc: linux-scsi
Thanks Stefan...
> Another driver which uses a block/unblock interface is sbp2. It blocks
> shosts (because one shost == one SBP-2 LU at the moment) during 1394 bus
> reset/ 1394 nodes rescan/ SBP-2 reconnect phases. I learned the hard way
> that an shost (or sdev if you will) *must not be blocked* when an shost
> (or sdev) is to be removed.
True. The FC transport explicitly performs an unblock prior to the remove
call. However, the remove is then deadlocking on the scan_mutex vs the
pending request queue (still trying to find out why it's really stuck).
> IOW before a transport may remove an sdev or shost, it has to unblock it
> and it also has to make sure that all commands that were enqueued before
> the blocking are being completed.
True. The FC transport explicitly performs an unblock prior to the remove
call. What I'm seeing would align with "not" making sure the prior queued
commands are completed before it removes.
> But isn't it rather a responsibility
> of the SCSI core to get a LU's or target's state transitions right?
Agreed. The real issue is - define the window for prior queued commands.
You may flush all that are there right now, but that may immediately
requeue a retry, etc - which means you have to start all over.
> When
> an sdev is "blocked" and the transport tells the core to transition it
> to "to be removed", then the core should know that the sdev's LU cannot
> be reached anymore and act accordingly.
I would assume - that's what we'll eventually get to, with the mutex
being the first onion layer to get pulled.
-- james s
* Re: [Comments Needed] scan vs remove_target deadlock
From: Mike Christie @ 2006-04-14 4:23 UTC
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
>> Do we
>> need the scan mutex to change the device state? I mean if a scan has
>> the mutex lock, and the transport class decides to remove the device
>> it tries to grab the scan_mutex by calling scsi_remove_device, but if
>> we moved the state change invocation before the scan_mutex is taken in
>> scsi_remove_device then, I assume eventually the device should get
>> unplugged, the prep or request_fn will see the new state and fail the
>> request.
>
> This may be what's needed. I don't understand all of this path yet, so I
> can only speculate (and likely w/ error). Thus, the questions.
Actually, maybe, I should not have brought this up as it could just be
more of a workaround of the core problem. For FC in fc_user_scan() do
you need some sort of lock around the rport loop?
* Re: [Comments Needed] scan vs remove_target deadlock
From: James Smart @ 2006-04-14 10:19 UTC
To: Mike Christie; +Cc: linux-scsi
> Actually, maybe, I should not have brought this up as it could just be
> more of a workaround of the core problem. For FC in fc_user_scan() do
> you need some sort of lock around the rport loop?
Yes, I had noticed this as well. However, I don't think this is influencing
the deadlock.
-- james s
* Re: [Comments Needed] scan vs remove_target deadlock
From: Mike Christie @ 2006-04-14 17:48 UTC
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
>
>> Actually, maybe, I should not have brought this up as it could just be
>> more of a workaround of the core problem. For FC in fc_user_scan() do
>> you need some sort of lock around the rport loop?
>
> Yes, I had noticed this as well. However, I don't think this is influencing
> the deadlock.
>
iscsi needed a lock too, and so we ended up just adding a semaphore
around the addition, deletion, and scanning of sessions. We also do
some weird things in that we initiate some tasks from userspace
(scanning and shutting down devices, transport shutdown, etc), but the
locking issues are similar, and by doing some of those things in userspace
iscsi is just trying to work around some of the issues. I was just
thinking that maybe your original idea of a more invasive locking
change is what is needed, instead of the workaround I was suggesting earlier
in the thread.
* Re: [Comments Needed] scan vs remove_target deadlock
From: Mike Christie @ 2006-04-14 17:58 UTC
To: Mike Christie; +Cc: James.Smart, linux-scsi
Mike Christie wrote:
> James Smart wrote:
>>> Actually, maybe, I should not have brought this up as it could just be
>>> more of a workaround of the core problem. For FC in fc_user_scan() do
>>> you need some sort of lock around the rport loop?
>> Yes, I had noticed this as well. However, I don't think this is influencing
>> the deadlock.
>>
>
> iscsi needed a lock too. And so we ended up just adding a semaphore
> around the addition and deletion and scanning of sessions. We also do
So what can happen is that we go from:

iscsi_user_scan()
    grab iscsi lock around sessions
    scsi-ml scan()
        grab scsi scan lock

to:

delete session
    grab iscsi lock
    remove session from sessions list
    scsi-ml host/device deletion
        grab scsi scan lock
and I am just saying I think we are duplicating some of the locking in
the transport class (due to some weirdness with the userspace
workarounds this is more or less true).
* Re: [Comments Needed] scan vs remove_target deadlock
From: Stefan Richter @ 2006-04-14 19:16 UTC
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
(I wrote)
...
>> before a transport may remove an sdev or shost, it has to unblock it
>> and it also has to make sure that all commands that were enqueued before
>> the blocking are being completed.
>
> True. The FC transport explicitly performs an unblock prior to the remove
> call. What I'm seeing would align with "not" making sure the prior queued
> commands are completed before it removes.
>
>> But isn't it rather a responsibility
>> of the SCSI core to get a LU's or target's state transitions right?
>
> Agreed. The real issue is - define the window for prior queued commands.
> You may flush all that are there right now, but that may immediately
> requeue a retry, etc - which means you have to start all over.
The retry problem is of course also to be solved in the SCSI core and/or
above, i.e. in the protocol driver (and perhaps in userspace apps doing
sg_io, although this is not very relevant to this discussion).
For example, sr_mod seems to issue and retry commands to LUs which are
even in "offline" state.
>> When
>> an sdev is "blocked" and the transport tells the core to transition it
>> to "to be removed", then the core should know that the sdev's LU cannot
>> be reached anymore and act accordingly.
>
> I would assume - that's what we'll eventually get to, with the mutex
> being the first onion layer to get pulled.
The deadlocks which plagued sbp2<->scsi in the past did not seem
scan_mutex related. But I will stress-test this again to see if the same
or a similar situation to the one you described can be provoked with sbp2.
Maybe it is completely impossible because /a/ sbp2 (currently) has only one LU
beneath each Scsi_Host and /b/ sdev removal will never be started before
the scanning has finished, due to how the IEEE 1394 subsystem (currently)
adds and removes units.
--
Stefan Richter
-=====-=-==- -=-- -===-
http://arcgraph.de/sr/
* Re: [Comments Needed] scan vs remove_target deadlock
From: Michael Reed @ 2006-04-18 20:09 UTC
To: James.Smart; +Cc: Stefan Richter, linux-scsi
James Smart wrote:
> Thanks Stefan...
>
>> Another driver which uses a block/unblock interface is sbp2. It blocks
>> shosts (because one shost == one SBP-2 LU at the moment) during 1394 bus
>> reset/ 1394 nodes rescan/ SBP-2 reconnect phases. I learned the hard way
>> that an shost (or sdev if you will) *must not be blocked* when an shost
>> (or sdev) is to be removed.
>
> True. The FC transport explicitly performs an unblock prior to the remove
> call. However, the remove is then deadlocking on the scan_mutex vs the
> pending request queue (still trying to find out why it's really stuck).
The remove is not for the target which holds the scsi host's scan mutex.
Hence, the unblock doesn't kick the [right] queue.
I think this means that transport cannot call scsi_remove_target() for any
target if a scan is running. So, transport has to wait until it can ensure
that no scan is running, perhaps a new mutex, and has to have a way of kicking
a blocked target which is being scanned, either when the LLDD unblocks
the target or the delete work for that target fires.
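Something along these lines is what I mean (entirely hypothetical; fc_scan_mutex,
fc_scan_rport_luns() and fc_delete_rport_target() are invented names, only
scsi_scan_target(), scsi_target_unblock() and scsi_remove_target() are real
entry points):

/* Hypothetical sketch of the "new mutex" idea. */
static DEFINE_MUTEX(fc_scan_mutex);

static void fc_scan_rport_luns(struct fc_rport *rport)
{
	mutex_lock(&fc_scan_mutex);
	scsi_scan_target(&rport->dev, rport->channel,
			 rport->scsi_target_id, SCAN_WILD_CARD, 1);
	mutex_unlock(&fc_scan_mutex);
}

static void fc_delete_rport_target(struct fc_rport *rport)
{
	/* kick the blocked target first, so a scan stuck on its stopped
	 * queue can drain and drop fc_scan_mutex ... */
	scsi_target_unblock(&rport->dev);

	/* ... then wait until no scan is running before removing */
	mutex_lock(&fc_scan_mutex);
	scsi_remove_target(&rport->dev);
	mutex_unlock(&fc_scan_mutex);
}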
Mike
>
>> IOW before a transport may remove an sdev or shost, it has to unblock it
>> and it also has to make sure that all commands that were enqueued before
>> the blocking are being completed.
>
> True. The FC transport explicitly performs an unblock prior to the remove
> call. What I'm seeing would align with "not" making sure the prior queued
> commands are completed before it removes.
>
>> But isn't it rather a responsibility
>> of the SCSI core to get a LU's or target's state transitions right?
>
> Agreed. The real issue is - define the window for prior queued commands.
> You may flush all that are there right now, but that may immediately
> requeue a retry, etc - which means you have to start all over.
>
>> When
>> an sdev is "blocked" and the transport tells the core to transition it
>> to "to be removed", then the core should know that the sdev's LU cannot
>> be reached anymore and act accordingly.
>
> I would assume - that's what we'll eventually get to, with the mutex
> being the first onion layer to get pulled.
>
> -- james s
* Re: [Comments Needed] scan vs remove_target deadlock
From: James Smart @ 2006-04-18 21:35 UTC
To: Michael Reed; +Cc: Stefan Richter, linux-scsi
Michael Reed wrote:
> The remove is not for the target which holds the scsi host's scan mutex.
> Hence, the unblock doesn't kick the [right] queue.
Certainly could be true.
> I think this means that transport cannot call scsi_remove_target() for any
> target if a scan is running. So, transport has to wait until it can assure
> that no scan is running, perhaps a new mutex, and has to have a way of kicking
> a blocked target which is being scanned, either when the LLDD unblocks
> the target or the delete work for that target fires.
Well - that's one way. Very difficult for the transport to know when this is
true (not all scans occur from the transport). It should be a midlayer thing
to ensure the proper things happen. Also highlights just how gross the
scan_lock is - which is where the real fix should be, although this will be
a rat's nest.
-- james s
* Re: [Comments Needed] scan vs remove_target deadlock
From: Michael Reed @ 2006-04-19 15:34 UTC
To: James.Smart; +Cc: Stefan Richter, linux-scsi
James Smart wrote:
> Michael Reed wrote:
>> The remove is not for the target which holds the scsi host's scan mutex.
>> Hence, the unblock doesn't kick the [right] queue.
>
> Certainly could be true.
I don't think it would deadlock if it wasn't. The scan mutex is a rather
gross lock.
>
>> I think this means that transport cannot call scsi_remove_target() for any
>> target if a scan is running. So, transport has to wait until it can ensure
>> that no scan is running, perhaps a new mutex, and has to have a way of kicking
>> a blocked target which is being scanned, either when the LLDD unblocks
>> the target or the delete work for that target fires.
>
> Well - that's one way. Very difficult for the transport to know when this is
> true (not all scans occur from the transport). It should be a midlayer thing
> to ensure the proper things happen. Also highlights just how gross the
> scan_lock is - which is where the real fix should be, although this will be
> a rat's nest.
There's fc_user_scan() which I believe handles scans initiated
via the sysfs/proc variables. There's fc_scsi_scan_rport() run via the scan work.
It appears that the routines that perform a scan, in a fibre channel context,
are all entered via the transport.
What am I missing?
Mike
>
> -- james s
>