From: Michael Reed <mdr@sgi.com>
To: linux-scsi <linux-scsi@vger.kernel.org>
Cc: James Smart <James.Smart@Emulex.Com>,
	Andrew Vasquez <andrew.vasquez@qlogic.com>,
	Jeremy Higdon <jeremy@sgi.com>
Subject: 2.6.31 - scsi scanning / target deletion deadlock
Date: Wed, 28 Oct 2009 15:18:22 -0500
Message-ID: <4AE8A70E.9070504@sgi.com>

Hi All,

I encountered the following deadlock on the Scsi_Host's scan_lock.
Target device glitches have caused the qla2xxx driver to delete and
later attempt to re-add a scsi device.  (Sorry, I cannot present the
exact sequence of events.)

scsi_wq_3 is executing a scan on host 3 and holds the host's scan_lock.
  It has queued i/o to target3:0:0, on rport 0xe00000b0f02d6c20, and is
  waiting for that i/o to complete.

qla2xxx_3_dpc is changing rport roles on rport 0xe00000b0f02d6c20.  Until
  this completes, the work on scsi_wq_3 cannot progress.  The change in
  rport roles results in a call to flush target delete work on fc_wq_3.

fc_wq_3 is trying to remove scsi target 0xe0000030f5e86488 on rport 0xe0000030f1f432d0
  and needs to acquire the scan_lock held by scsi_wq_3.
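
The three waits described above form a cycle. A minimal sketch of that
wait-for graph follows; the edges paraphrase the description, and the
names WAITS_FOR and find_cycle are hypothetical, chosen only for this
illustration:

```python
# Wait-for graph for the three blocked threads.
# An entry "A": "B" means "A cannot make progress until B does".
WAITS_FOR = {
    # scan i/o is stalled until the rport role change completes
    "scsi_wq_3": "qla2xxx_3_dpc",
    # the role change flushes target delete work on fc_wq_3
    "qla2xxx_3_dpc": "fc_wq_3",
    # target removal needs the scan_lock held by the scan
    "fc_wq_3": "scsi_wq_3",
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None  # the chain ended at a thread that is not waiting
    # Keep only the looping part and close it by repeating the first node.
    return path[path.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, "scsi_wq_3")
print(" -> ".join(cycle))
# prints: scsi_wq_3 -> qla2xxx_3_dpc -> fc_wq_3 -> scsi_wq_3
```

Removing any one edge breaks the cycle, for example if the target removal
did not need the same lock that the scan holds.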

Perhaps the granularity of scan_lock is too great?

Would anyone have any thoughts on how best to eliminate this deadlock?

Thanks,
 Mike



[0]kdb> btp 3790
Stack traceback for pid 3790
0xe0000034f5d30000     3790        2  0    1   D  0xe0000034f5d30570  fc_wq_3
0xa0000001007280a0 schedule+0x14e0
        args (0x4000, 0x0, 0x0, 0xa000000100729720, 0x813, 0xe0000034f5d3fdb0, 0x1111111111111111, 0x0, 0x1010095a6000)
0xa000000100729840 __mutex_lock_slowpath+0x320
        args (0xe0000034f4f24cf0, 0xe0000034f5d30000, 0x10095a6010, 0xe0000034f4f24cf4, 0xe0000034f4f24cf8, 0xa0000001011c2600, 0xa0000001011c1cb0, 0x7ffff00)
0xa000000100729ad0 mutex_lock+0x30
        args (0xe0000034f4f24d08, 0xa000000100471d30, 0x286, 0x10095a6010)
0xa000000100471d30 scsi_remove_device+0x30
        args (0xe0000030f5ea57a8, 0xe0000034f4f24cf0, 0xa000000100471f40, 0x48b, 0xe0000034f4f24c90)
0xa000000100471f40 __scsi_remove_target+0x180
        args (0xe0000030f5e86488, 0xe0000030f5ea57a8, 0xe0000034f4f24c90, 0xe0000034f4f24ce8, 0xe0000030f5e865f0, 0xe0000030f5e865ec, 0xa000000100472120, 0x205, 0xa00000010096c950)
0xa000000100472120 __remove_child+0x40
        args (0xe0000030f5e864b0, 0xa0000001004152c0, 0x389, 0x0)
0xa0000001004152c0 device_for_each_child+0x80
        args (0xe0000030f1f43338, 0x0, 0xa00000010096c200, 0x0, 0xa0000001004720b0, 0x288, 0xa0000001013a6540)
0xa0000001004720b0 scsi_remove_target+0x90
        args (0xe0000030f1f43330, 0xe0000030f1f43330, 0xa000000100485630, 0x205, 0xa0000001013a6540)
0xa000000100485630 fc_starget_delete+0x30
        args (0xe0000030f1f43528, 0xa0000001000cbd00, 0x50e, 0xa0000001000cbb80)
0xa0000001000cbd00 worker_thread+0x2a0
        args (0xe0000034f7f1b098, 0xa00000010096cec0, 0xe0000034f7f1b0a0, 0xe0000034f7f1b0c8, 0xe0000034f7f1b0a0, 0xe0000034f7f1b0b0, 0xffffffffbfffffff, 0xa0000001000d5bb0, 0x389)
0xa0000001000d5bb0 kthread+0x110
        args (0xe00000b073a1fcf8, 0xe0000034f5d3fe18, 0xe0000034f7f1b098, 0xa00000010096f650, 0xa000000100014a30, 0x286, 0xa0000001013a6540)
0xa000000100014a30 kernel_thread_helper+0xd0
        args (0xa00000010096ffd0, 0xe00000b073a1fcf8, 0xa00000010000a4c0, 0x2, 0xa0000001013a6540)
0xa00000010000a4c0 start_kernel_thread+0x20
        args (0xa00000010096ffd0, 0xe00000b073a1fcf8)




[0]kdb> btp 3789
Stack traceback for pid 3789
0xe0000034f5b50000     3789        2  0    1   D  0xe0000034f5b50570  scsi_wq_3
0xa0000001007280a0 schedule+0x14e0
        args (0xe0000034f4ec7008, 0xe0000034f4f24d70, 0xe0000034f4ec6fe8, 0xe0000034f3669508, 0xe0000034f3669500, 0xe0000034f3669508, 0xe0000034f36694f8, 0xa0000001011c2cd0, 0x1010095a6000)
0xa000000100728640 schedule_timeout+0x40
        args (0x7fffffffffffffff, 0x0, 0x0, 0xe0000034f64a6928, 0xa000000100726840, 0x50d, 0xe0000034f4ec7000)
0xa000000100726840 wait_for_common+0x1a0
        args (0xe0000034f5b5fce0, 0x7fffffffffffffff, 0x2, 0xe0000034f5b5fce8, 0xe0000034f5b50000, 0xe0000034f5b5fce8, 0xa000000100726ba0, 0x207, 0xa0000001013a6540)
0xa000000100726ba0 wait_for_completion+0x40
        args (0xe0000034f5b5fce0, 0xa0000001002b8460, 0x48e, 0x1)
0xa0000001002b8460 blk_execute_rq+0x140
        args (0xe0000034f36692d0, 0x0, 0xe000003441024250, 0x1, 0xa0000001002b7b60, 0xe000003441024360, 0xa0000001002b8510, 0x38b, 0xe000003441024300)
0xa0000001002b8510 scsi_execute_rq+0x30
        args (0xe0000034f36692d0, 0xe0000034f4ec6fb8, 0xe000003441024250, 0x1, 0xa000000100469050, 0x713, 0x713)
0xa000000100469050 scsi_execute+0x190
        args (0xe0000034f4ec6fb8, 0xe000003441024250, 0xe0000034f03ec500, 0x1000, 0xe000003440f3e278, 0x5dc, 0x3, 0x4000000)
0xa000000100469200 scsi_execute_req+0xe0
        args (0xe0000034f4ec6fb8, 0xe0000034f5b5fd8c, 0x2, 0xe0000034f03ec500, 0x1000, 0xe0000034f5b5fd84, 0x5dc, 0x3, 0xe000003440f3e278)
0xa00000010046da70 __scsi_scan_target+0x530
        args (0x0, 0x0, 0x1000, 0xe0000034f03ec500, 0x1, 0xe0000034f4ec6fb8, 0xe0000030f14b55e0, 0xa0000001011c2cd0, 0xe0000034f5b5fd70)
0xa00000010046f000 scsi_scan_target+0x120
        args (0xe00000b0f02d6c80, 0x0, 0x0, 0xffffffffffffffff, 0x1, 0xe0000034f4f24c90, 0xe0000034f4f24cf0, 0xa000000100485c20, 0x28a)
0xa000000100485c20 fc_scsi_scan_rport+0x140
        args (0xe00000b0f02d6c20, 0xe0000034f4f24ce8, 0xa0000001000cbd00, 0x50e, 0x50e)
0xa0000001000cbd00 worker_thread+0x2a0
        args (0xe0000034f7f1ada0, 0xa00000010096ceb0, 0xe0000034f7f1ada8, 0xe0000034f7f1add0, 0xe0000034f7f1ada8, 0xe0000034f7f1adb8, 0xffffffffbfffffff, 0xa0000001000d5bb0, 0x389)
0xa0000001000d5bb0 kthread+0x110
        args (0xe00000b073a1fd18, 0xe0000034f5b5fe18, 0xe0000034f7f1ada0, 0xa00000010096f650, 0xa000000100014a30, 0x286, 0xa0000001013a6540)
0xa000000100014a30 kernel_thread_helper+0xd0
        args (0xa00000010096ffd0, 0xe00000b073a1fd18, 0xa00000010000a4c0, 0x2, 0xa0000001013a6540)
0xa00000010000a4c0 start_kernel_thread+0x20
        args (0xa00000010096ffd0, 0xe00000b073a1fd18)




[0]kdb> btp 3788
Stack traceback for pid 3788
0xe0000034f3e40000     3788        2  0    0   D  0xe0000034f3e40570  qla2xxx_3_dpc
0xa0000001007280a0 schedule+0x14e0
        args (0x0, 0x1, 0xf, 0x43, 0xa000000100f6d300, 0x0, 0x0, 0xa0000001011e5c80, 0x1010095a6000)
0xa000000100728640 schedule_timeout+0x40
        args (0x7fffffffffffffff, 0x0, 0x0, 0xa0000001000cc150, 0xa000000100726840, 0x50d, 0xe0000034f7f1b0b0)
0xa000000100726840 wait_for_common+0x1a0
        args (0xe0000034f3e4fd00, 0x7fffffffffffffff, 0x2, 0xe0000034f3e4fd08, 0xe0000034f3e40000, 0xe0000034f3e4fd08, 0xa000000100726ba0, 0x207, 0xe0000034f7f1b0b0)
0xa000000100726ba0 wait_for_completion+0x40
        args (0xe0000034f3e4fd00, 0xa0000001000cc390, 0x288, 0xa0000001000cc350)
0xa0000001000cc390 flush_cpu_workqueue+0x110
        args (0xe0000034f7f1b098, 0x1, 0xa0000001000cc750, 0x38a, 0xe0000034f7f1b458)
0xa0000001000cc750 flush_workqueue+0x90
        args (0xe0000034f5c68140, 0x0, 0xa0000001007c13a8, 0xa000000100bd0200, 0xa000000100483850, 0x206, 0x4000)
0xa000000100483850 fc_flush_work+0xb0
        args (0xe0000034f4f24c90, 0xa000000100483b70, 0x48b, 0xe0000034f4f24ce0)
0xa000000100483b70 fc_remote_port_rolechg+0x2f0
        args (0xe00000b0f02d6c20, 0x1, 0xe00000b0f02d6c68, 0xe0000034f4f24ce8, 0xe0000030f442a608, 0xe0000034f4f24c90, 0xa000000206fdfa20, 0x38f, 0xe0000034f7d4d0c8)
0xa000000206fdfa20 [qla2xxx]qla2x00_update_fcport+0x880
        args (0xe00000b0f02d6c20, 0xe0000030f442a5b0, 0xe0000034f62131c8, 0xe0000030f442a5c0, 0xa000000206fdfc00, 0x38c, 0xa00000020700d058)
0xa000000206fdfc00 [qla2xxx]qla2x00_fabric_dev_login+0x160
        args (0xe0000034f4f250a8, 0xe0000030f442a5b0, 0x0, 0xe0000034f62131c8, 0xa000000206fe2900, 0x1634, 0xa00000020700d058)
0xa000000206fe2900 [qla2xxx]qla2x00_configure_loop+0x2cc0
        args (0xe0000034f4f250a8, 0xe0000030f442a5b0, 0xe0000034f4f251a4, 0xe0000034f3e4fd88, 0x300000000, 0x0, 0x1000, 0xe0000034f4f25108, 0xe000003440efdda2)
0xa000000206fe32b0 [qla2xxx]qla2x00_loop_resync+0x1b0
        args (0xe0000034f4f250a8, 0xe0000034f4f25108, 0x0, 0xfe, 0xe0000034f56dc000, 0xe0000034f4f251a4, 0xe0000034f4f2511c, 0xe0000034f4f25104, 0xe0000034f7f1bab0)
0xa000000206fd6d40 [qla2xxx]qla2x00_do_dpc+0x9a0
        args (0xe0000034f62131c8, 0x1, 0xe0000034f3e4fe00, 0xe0000034f4f25108, 0xe0000034f4f250a8, 0xe0000034f3e4fe00, 0xe0000034f4f250e8, 0xa000000207048958, 0xe0000034f4f250c8)
0xa0000001000d5bb0 kthread+0x110
        args (0xe00000b073a1fd28, 0xe0000034f3e4fe18, 0xe0000034f62131c8, 0xa00000020703cfd8, 0xa000000100014a30, 0x286, 0xa0000001013a6540)
0xa000000100014a30 kernel_thread_helper+0xd0
        args (0xa00000010096ffd0, 0xe00000b073a1fd28, 0xa00000010000a4c0, 0x2, 0xa0000001013a6540)
0xa00000010000a4c0 start_kernel_thread+0x20
        args (0xa00000010096ffd0, 0xe00000b073a1fd28)
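
The backtraces above are easier to compare once reduced to their call
chains. A small sketch for doing that, assuming frame lines of the form
"address [module]symbol+offset" as seen in this output; the helper
call_chain and the regular expression are illustrative, not part of kdb:

```python
import re

# A frame line looks like "0xADDR symbol+0xOFF" or "0xADDR [module]symbol+0xOFF".
# The task header line and the "args (...)" lines do not match this pattern.
FRAME = re.compile(r"^0x[0-9a-f]+\s+(?:\[(\w+)\])?(\w+)\+0x[0-9a-f]+$")


def call_chain(trace):
    """Return the symbols of a kdb backtrace, innermost frame first."""
    chain = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            module, symbol = match.groups()
            chain.append(f"{module}:{symbol}" if module else symbol)
    return chain


# A few lines copied from the pid 3788 trace (args lines omitted).
SAMPLE = """\
0xe0000034f3e40000     3788        2  0    0   D  0xe0000034f3e40570  qla2xxx_3_dpc
0xa0000001000cc750 flush_workqueue+0x90
0xa000000100483850 fc_flush_work+0xb0
0xa000000100483b70 fc_remote_port_rolechg+0x2f0
0xa000000206fdfa20 [qla2xxx]qla2x00_update_fcport+0x880
"""

chain = call_chain(SAMPLE)
print(" <- ".join(chain))
```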

Thread overview: 3+ messages
2009-10-28 20:18 Michael Reed [this message]
2010-03-23 20:28 ` [RFC PATCH] fc_transport: reduce scan_mutex contention. (was: Re: 2.6.31 - scsi scanning / target deletion deadlock) Andrew Vasquez
2010-03-24 19:12   ` [RFC PATCH] fc_transport: reduce scan_mutex contention Michael Reed
