linux-scsi.vger.kernel.org archive mirror
* 2.6.15-rc4 error messages with multiple qla2300 hba ports on fabric
From: Michael Reed @ 2005-12-02 15:26 UTC (permalink / raw)
  To: linux-scsi, James.Smart, Christoph Hellwig, Andrew Vasquez


Hello,

I've been testing with the qla2300 driver with 2.6.14.3 and 2.6.15-rc4.
I've observed two sets of error messages which are not present with
2.6.14.3.

First, the qla2300 driver is generating soft lockups.
Second, several error messages indicating that remote
ports are being deleted are being emitted.

 rport-2:0-16: blocked FC remote port time out: removing target and saving binding
 run_workqueue: recursion depth exceeded: 29

If the timing is just right, scsi errors are generated, though not evident
in the attached dmesg file.

I've observed similar behavior with my modified mpt fusion driver
when multiple hba ports are on the fabric.  The kernels tested
are as downloaded from kernel.org, without my mpt mods.

(Andrew, I'm not "blaming" your driver for the rport issues.  I chose
your driver to be the "victim" 'cause I didn't want to post this using
under development code with mpt fusion.)

Platform: SGI Altix IA64.

What additional information should I acquire?

Mike Reed
mdr@sgi.com


[-- Attachment #2: dmesg-2.6.15-rc4.bz2 --]
[-- Type: application/x-bzip2, Size: 10646 bytes --]

* RE: 2.6.15-rc4 error messages with multiple qla2300 hba ports on fabric
From: James.Smart @ 2005-12-02 16:47 UTC (permalink / raw)
  To: andrew.vasquez, mdr, linux-scsi, hch

We recently saw this as well. It's related to the number of targets
that go away simultaneously.

There end up being many delete-rport items on the work queue, and
when the first one stalls to flush the work queue, that starts the second,
which also stops to flush, and so on.

My inclination is to look at what we have on the work queue and see if we can
circumvent some of the flush calls.
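
[Editor's sketch] The chain described above can be modeled in userspace. This
is a toy model only, not kernel code: the WorkQueue class, delete_rport() and
simulate() are hypothetical names, and the warning limit of 3 is an assumption
taken from the first warning in the backtrace below, which reports depth 4. It
assumes that a flush issued from the worker itself runs the pending items by
hand, which is what the repeated flush_workqueue frames in the backtrace
suggest.

```python
from collections import deque

# Assumed threshold: the first warning in the log reports depth 4.
DEPTH_WARN_LIMIT = 3


class WorkQueue:
    """Toy single-threaded work queue that tracks how deeply run() nests."""

    def __init__(self):
        self.items = deque()
        self.run_depth = 0
        self.max_depth = 0
        self.warnings = []

    def queue_work(self, fn):
        self.items.append(fn)

    def run(self):
        self.run_depth += 1
        self.max_depth = max(self.max_depth, self.run_depth)
        if self.run_depth > DEPTH_WARN_LIMIT:
            # Stands in for the "recursion depth exceeded" message.
            self.warnings.append(self.run_depth)
        while self.items:
            fn = self.items.popleft()
            fn(self)
        self.run_depth -= 1

    def flush(self):
        # Flushing from inside a work item: run whatever is pending by hand.
        self.run()


def delete_rport(wq):
    # Hypothetical stand-in for a delete-rport work item that flushes
    # the queue it is running on.
    wq.flush()


def simulate(n_rports):
    wq = WorkQueue()
    for _ in range(n_rports):
        wq.queue_work(delete_rport)
    wq.run()
    return wq


if __name__ == "__main__":
    wq = simulate(30)
    print("max nesting depth:", wq.max_depth)
    print("warnings emitted:", len(wq.warnings))
```

In this model every queued deletion adds one level of nesting, so the depth
grows with the number of targets that go away at the same time.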

-- james s

Here's a backtrace:
rport-4:0-37: blocked FC remote port time out: removing target and saving binding
rport-4:0-42: blocked FC remote port time out: removing target and saving binding
rport-4:0-55: blocked FC remote port time out: removing target and saving binding
run_workqueue: recursion depth exceeded: 4
Call Trace:<ffffffff80146270>{flush_cpu_workqueue+96} <ffffffff8036a9b0>{_spin_lock_irqsave+32}
<ffffffff80146473>{flush_workqueue+115} <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
<ffffffff801462e0>{flush_cpu_workqueue+208} <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff8036a9b0>{_spin_lock_irqsave+32} <ffffffff80146473>{flush_workqueue+115}
<ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
<ffffffff801462e0>{flush_cpu_workqueue+208} <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff8036a9b0>{_spin_lock_irqsave+32} <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff80146473>{flush_workqueue+115} <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
<ffffffff8014617c>{worker_thread+508} <ffffffff8012f0e0>{default_wake_function+0}
<ffffffff8012f0e0>{default_wake_function+0} <ffffffff80145f80>{worker_thread+0}
<ffffffff8014a9c9>{kthread+217} <ffffffff8010edbe>{child_rip+8}
<ffffffff8014a8f0>{kthread+0} <ffffffff8010edb6>{child_rip+0}
rport-4:0-53: blocked FC remote port time out: removing target and saving binding
run_workqueue: recursion depth exceeded: 5
Call Trace:<ffffffff80146270>{flush_cpu_workqueue+96} <ffffffff8036a9b0>{_spin_lock_irqsave+32}
<ffffffff80146473>{flush_workqueue+115} <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
<ffffffff801462e0>{flush_cpu_workqueue+208} <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff8036a9b0>{_spin_lock_irqsave+32} <ffffffff80146473>{flush_workqueue+115}
<ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
<ffffffff801462e0>{flush_cpu_workqueue+208} <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff8036a9b0>{_spin_lock_irqsave+32} <ffffffff80146473>{flush_workqueue+115}
<ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
<ffffffff801462e0>{flush_cpu_workqueue+208} <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff8036a9b0>{_spin_lock_irqsave+32} <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
<ffffffff80146473>{flush_workqueue+115} <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
<ffffffff8014617c>{worker_thread+508} <ffffffff8012f0e0>{default_wake_function+0}
<ffffffff8012f0e0>{default_wake_function+0} <ffffffff80145f80>{worker_thread+0}
<ffffffff8014a9c9>{kthread+217} <ffffffff8010edbe>{child_rip+8}
<ffffffff8014a8f0>{kthread+0} <ffffffff8010edb6>{child_rip+0}
...
<ffffffff8014a9c9>{kthread+217} <ffffffff8010edbe>{child_rip+8}
<ffffffff8014a8f0>{kthread+0} <ffffffff8010edb6>{child_rip+0}
rport-4:0-38: blocked FC remote port time out: removing target and saving binding
run_workqueue: recursion depth exceeded: 30


-----Original Message-----
From: Andrew Vasquez [mailto:andrew.vasquez@qlogic.com]
Sent: Friday, December 02, 2005 11:29 AM
To: Michael Reed; linux-scsi@vger.kernel.org; Smart, James; Christoph Hellwig
Subject: RE: 2.6.15-rc4 error messages with multiple qla2300 hba ports on fabric




> From: Michael Reed [mailto:mdr@sgi.com]
>

Sidenote:  I'm on the east-coast until hopefully tonight -- won't
have a chance to look at debugging this for a couple of days...

> I've been testing with the qla2300 driver with 2.6.14.3 and 2.6.15-rc4.
> I've observed two sets of error messages which are not present with
> 2.6.14.3.
>
> First, the qla2300 driver is generating soft lockups.

Have a backtrace?

> Second, several error messages indicating that remote
> ports are being deleted are being emitted.
>
>  rport-2:0-16: blocked FC remote port time out: removing target and saving binding
>  run_workqueue: recursion depth exceeded: 29
>
> If the timing is just right, scsi errors are generated, though not evident
> in the attached dmesg file.
>
> I've observed similar behavior with my modified mpt fusion driver
> when multiple hba ports are on the fabric.  The kernels tested
> are as downloaded from kernel.org, without my mpt mods.
>
> (Andrew, I'm not "blaming" your driver for the rport issues.  I chose
> your driver to be the "victim" 'cause I didn't want to post this using
> under development code with mpt fusion.)
>
> Platform: SGI Altix IA64.
>
> What additional information should I acquire?

* RE: 2.6.15-rc4 error messages with multiple qla2300 hba ports on fabric
From: James.Smart @ 2005-12-07 21:41 UTC (permalink / raw)
  To: mdr; +Cc: andrew.vasquez, linux-scsi, hch

> > My inclination is to look at what we have on the work queue and see if we can
> > circumvent some of the flush calls.
> 
> Snooping the work queue?  That sounds a little, um, like a hack?
> If the code is correct, and the end result is correct, is the
> test for recursion level and the associated dump_stack() necessary?
> (Yeah, I know, newbie questions. :)

Well, my thinking is along the same lines... There is also too much other
work on the sdevs, etc., to really get a good feel for things, and that should
be handled by default in the other layers. However, I wouldn't address
it by eliminating the recursion level check.

We're testing a patch that deals with the recursion by not doing it. Oldest
trick in the book :)   Will keep you posted once we know the results.
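
[Editor's sketch] The patch itself is not shown in this thread, so the
following is NOT that patch. It is only a toy userspace model of one way
"not doing it" could look: the work item never flushes the queue it runs on,
and the step that depended on the flush is queued behind everything already
pending. WorkQueue, delete_rport(), remove_target() and simulate_deferred()
are all hypothetical names.

```python
from collections import deque


class WorkQueue:
    """Toy single-threaded work queue that tracks how deeply run() nests."""

    def __init__(self):
        self.items = deque()
        self.run_depth = 0
        self.max_depth = 0
        self.done = []

    def queue_work(self, fn, arg):
        self.items.append((fn, arg))

    def run(self):
        self.run_depth += 1
        self.max_depth = max(self.max_depth, self.run_depth)
        while self.items:
            fn, arg = self.items.popleft()
            fn(self, arg)
        self.run_depth -= 1


def delete_rport(wq, rport):
    # Assumption: instead of flushing here, defer the follow-up step to
    # the tail of the queue so earlier work still completes first.
    wq.done.append(("delete", rport))
    wq.queue_work(remove_target, rport)


def remove_target(wq, rport):
    wq.done.append(("remove", rport))


def simulate_deferred(n_rports):
    wq = WorkQueue()
    for rport in range(n_rports):
        wq.queue_work(delete_rport, rport)
    wq.run()
    return wq


if __name__ == "__main__":
    wq = simulate_deferred(30)
    print("max nesting depth:", wq.max_depth)
```

With no work item calling back into the queue, the nesting depth in this
model stays at 1 no matter how many rports are deleted together.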

-- james s


Thread overview: 7+ messages
2005-12-02 15:26 2.6.15-rc4 error messages with multiple qla2300 hba ports on fabric Michael Reed
     [not found] <B179AE41C1147041AA1121F44614F0B0012AD98D@AVEXCH02.qlogic.org>
2005-12-02 16:41 ` Michael Reed
2005-12-02 17:05   ` Michele Baldessari
2005-12-07 23:54   ` Andrew Vasquez
  -- strict thread matches above, loose matches on Subject: below --
2005-12-02 16:47 James.Smart
2005-12-05 19:06 ` Michael Reed
2005-12-07 21:41 James.Smart
