Re: [Comments Needed] scan vs remove_target deadlock

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mike Christie <michaelc@cs.wisc.edu>
To: James.Smart@Emulex.Com
Cc: linux-scsi@vger.kernel.org
Subject: Re: [Comments Needed] scan vs remove_target deadlock
Date: Mon, 10 Apr 2006 23:03:05 -0500	[thread overview]
Message-ID: <443B2A79.4020600@cs.wisc.edu> (raw)
In-Reply-To: <1144693508.3820.33.camel@localhost.localdomain>

James Smart wrote:
> We've seen a very nasty deadlock condition between the scan code and
> the scsi remove code, when the sdev block/unblock functionality is
> used. The scsi_scan mutex is taken as a very coarse lock over the
> scan code, and will be held across multiple SCSI i/o's while the
> scan is proceeding. The scan may be on a single lun basis, or on a
> target basis. The jist is - it's held a loooonnng time. Additionally,
> the scan code uses the block request queue for scan i/o's.  In the case
> where the block/unblock interfaces are being used (fc transport), the
> request queue can be stopped - which stops scanning.  If the same or
> unrelated sdev is then to be removed, we enter a deadlock waiting for
> the scan mutex to be released. In most cases, a background timer fires
> that unblocks the sdev and things eventually unclog (granted a *lot*
> of time may have gone by). In a few cases, we are seeing the sdev
> request queue get plugged, then this deadlock really locks up. One last
> observation: don't mix scan code and other work on the same workq.
> Workq flushing will fall over fast.
> 
> 
> I'd like to poll the wisdom of those on this list as to the best way
> to approach this issue:
> 
> - The plugged queue logic needs to be tracked down. Anyone have any
>   insights ?

Are you referring to a problem in a function like our prep or request fn 
where we could plug the queue if the device state is not online? Do we 
need the scan mutex to change the device state? I mean if a scan has the 
mutex lock, and the transport class decides to remove the device it 
tries to grab the scan_mutex by calling scsi_remove_device, but if we 
moved the state change invocation before the scan_mutex is taken in 
scsi_remove_device then, I assume eventually the device should get 
unplugged, the prep or request_fn will see the new state and fail the 
request.

I am also asking because if you change the state from userspace we do 
not grab the scan_mutex so I did not know if that was bug.

> - The scan mutex, as coarse as it is, is really broken. It would be
>   great to reduce the lock holding so the lock isn't held while an
>   i/o is pending.  This change would be extremely invasive to the scan
>   code. Any other alternatives ?
> - If an sdev is "blocked", we shouldn't be wasting our time scanning it.
>   Should we be adding this checks before sending each scan i/o, or is
>   there a better lower-level function place this check ? Should we be

are you referring to our request or prep fn or scsi_dispatch_cmd or 
something else like when the scan code calls the scsi_execute?

>   creating an explicit return code, or continue to piggy-back on
>   DID_NO_CONNECT ? How do we deal with a scan i/o which may already be
>   queued when the device is blocked ?

Are you thinking about adding a abort/cancel_request() callback to the 
request_queue? Something that could also be called by upper layers like 
DM, and could eventually end up calling a scsi abort function?

next prev parent reply	other threads:[~2006-04-11  4:03 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-10 18:25 [Comments Needed] scan vs remove_target deadlock James Smart
2006-04-11  4:03 ` Mike Christie [this message]
2006-04-13 15:14   ` James Smart
2006-04-14  4:23     ` Mike Christie
2006-04-14 10:19       ` James Smart
2006-04-14 17:48         ` Mike Christie
2006-04-14 17:58           ` Mike Christie
2006-04-11  8:53 ` Stefan Richter
2006-04-13 15:21   ` James Smart
2006-04-14 19:16     ` Stefan Richter
2006-04-18 20:09     ` Michael Reed
2006-04-18 21:35       ` James Smart
2006-04-19 15:34         ` Michael Reed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=443B2A79.4020600@cs.wisc.edu \
    --to=michaelc@cs.wisc.edu \
    --cc=James.Smart@Emulex.Com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).