public inbox for linux-scsi@vger.kernel.org
From: James Smart <James.Smart@Emulex.Com>
To: linux-scsi@vger.kernel.org
Subject: [Comments Needed] scan vs remove_target deadlock
Date: Mon, 10 Apr 2006 14:25:08 -0400
Message-ID: <1144693508.3820.33.camel@localhost.localdomain>

We've seen a very nasty deadlock condition between the scan code and
the scsi remove code, when the sdev block/unblock functionality is
used. The scsi_scan mutex is taken as a very coarse lock over the
scan code, and will be held across multiple SCSI i/o's while the
scan is proceeding. The scan may be on a single lun basis, or on a
target basis. The gist is - it's held a loooonnng time. Additionally,
the scan code uses the block request queue for scan i/o's.  In the case
where the block/unblock interfaces are being used (fc transport), the
request queue can be stopped - which stops scanning.  If the same or an
unrelated sdev is then to be removed, we enter a deadlock waiting for
the scan mutex to be released. In most cases, a background timer fires
that unblocks the sdev and things eventually unclog (granted a *lot*
of time may have gone by). In a few cases, we are seeing the sdev
request queue get plugged, and then the deadlock is permanent. One last
observation: don't mix scan code and other work on the same workq.
Workq flushing will fall over fast.
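
To make the ordering concrete, here is a minimal userspace sketch of the
hang described above. It is only a model, not kernel code: `simulate()`,
`scan_mutex`, and `queue_unblocked` are made-up names, a Python lock
stands in for the scan mutex, and an event stands in for the blocked
request queue. The remove side uses a timeout purely so the example
terminates; the remove path described above simply waits.

```python
import threading


def simulate(unblock_after, remove_timeout=0.5):
    """Model the scan vs. remove ordering.

    unblock_after: seconds until the "background timer" unblocks the
                   queue, or None if it never fires (the plugged case).
    Returns True if the remove side obtained the scan mutex in time.
    """
    scan_mutex = threading.Lock()          # stand-in for the coarse scan mutex
    queue_unblocked = threading.Event()    # clear = request queue is blocked
    scan_holds_mutex = threading.Event()   # lets us order the two threads

    def scan():
        with scan_mutex:
            scan_holds_mutex.set()
            # The scan i/o sits behind the blocked queue, so the scan
            # keeps the mutex until the queue is unblocked.
            queue_unblocked.wait()

    scanner = threading.Thread(target=scan)
    scanner.start()
    scan_holds_mutex.wait()                # scan now owns the mutex

    timer = None
    if unblock_after is not None:
        # Models the background timer that eventually unblocks the sdev.
        timer = threading.Timer(unblock_after, queue_unblocked.set)
        timer.start()

    # Remove path: needs the same mutex the stalled scan is holding.
    acquired = scan_mutex.acquire(timeout=remove_timeout)
    if acquired:
        scan_mutex.release()

    # Cleanup so the model always terminates.
    queue_unblocked.set()
    scanner.join()
    if timer is not None:
        timer.cancel()
    return acquired
```

Calling `simulate(0.05)` models the common case where the timer fires and
things unclog; `simulate(None)` models the case where nothing ever
unblocks the queue and the remove side never gets the mutex.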


I'd like to poll the wisdom of those on this list as to the best way
to approach this issue:

- The plugged queue logic needs to be tracked down. Anyone have any
  insights ?
- The scan mutex, as coarse as it is, is really broken. It would be
  great to reduce the lock holding so the lock isn't held while an
  i/o is pending. This change would be extremely invasive to the scan
  code. Any other alternatives ?
- If an sdev is "blocked", we shouldn't be wasting our time scanning it.
  Should we be adding this check before sending each scan i/o, or is
  there a better lower-level place to put this check ? Should we be
  creating an explicit return code, or continue to piggy-back on
  DID_NO_CONNECT ? How do we deal with a scan i/o which may already be
  queued when the device is blocked ?
- Similarly, we need to make sure error handling doesn't take the device
  offline when it's blocked. scsi_eh_stu() and scsi_eh_bus_device_reset()
  should gracefully handle error conditions (including blocked). Right
  now, if a host replies with DID_IMM_RETRY or DID_BUS_BUSY, the device
  could be taken offline.
- Anything else ?
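
As one possible reading of the last two questions, the sketch below
models a per-i/o blocked check with an explicit return code, plus an
error-handling decision that never offlines a blocked device. Everything
here is hypothetical and for discussion only: `SdevState`, `ScanStatus`,
`DEV_BLOCKED`, `scan_lun()`, and `eh_may_offline()` are invented names,
not existing kernel interfaces, and the policy of treating DID_IMM_RETRY
and DID_BUS_BUSY as "do not offline" is an assumption. It does not
address a scan i/o that is already queued when the device gets blocked.

```python
from enum import Enum, auto


class SdevState(Enum):
    RUNNING = auto()
    BLOCKED = auto()


class ScanStatus(Enum):
    LUN_FOUND = auto()
    NO_CONNECT = auto()    # models piggy-backing on DID_NO_CONNECT
    DEV_BLOCKED = auto()   # hypothetical explicit "device is blocked" code


def scan_lun(state, issue_io):
    """Check the device state before issuing a scan i/o.

    issue_io is a callable that performs the i/o and returns True if
    the LUN answered. It is not called at all for a blocked device.
    """
    if state is SdevState.BLOCKED:
        return ScanStatus.DEV_BLOCKED
    return ScanStatus.LUN_FOUND if issue_io() else ScanStatus.NO_CONNECT


# Host results that, under this sketch's assumption, should lead to a
# retry rather than to the device being taken offline.
RETRYABLE_HOST_BYTES = {"DID_IMM_RETRY", "DID_BUS_BUSY"}


def eh_may_offline(state, host_byte):
    """Return True only if error handling may take the device offline."""
    if state is SdevState.BLOCKED:
        return False       # never offline a device while it is blocked
    return host_byte not in RETRYABLE_HOST_BYTES
```

With an explicit code such as `DEV_BLOCKED`, the caller can tell "no
device there" apart from "device temporarily unreachable", which the
DID_NO_CONNECT approach cannot.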

Comments appreciated....

-- james s



