public inbox for linux-scsi@vger.kernel.org
* lpfc: System freezing if fiber is broken
@ 2005-07-26 17:57 Bodo Stroesser
  2005-07-26 18:48 ` Mike Anderson
From: Bodo Stroesser @ 2005-07-26 17:57 UTC
  To: James.Smart; +Cc: linux-scsi

Hi James,

Disrupting a working FC connection makes my i386 SMP server
(2.6.12.2) freeze just one or two seconds later.
I normally use lpfc_nodev_tmo = 1. When I change this to the
default value of 35, the system stalls about 36 seconds after
the disruption. So I guess the problem is caused by nodev_tmo
expiring.
I activated the nmi_watchdog, but got no output.

What can I do to analyze this problem?

Regards
	Bodo


BTW:
I haven't been able to reproduce the wrong-WWPN problem yet.

* Re: lpfc: System freezing if fiber is broken
  2005-07-26 17:57 lpfc: System freezing if fiber is broken Bodo Stroesser
@ 2005-07-26 18:48 ` Mike Anderson
  2005-07-27 13:45   ` Bodo Stroesser
From: Mike Anderson @ 2005-07-26 18:48 UTC
  To: Bodo Stroesser; +Cc: James.Smart, linux-scsi

Bodo Stroesser [bstroesser@fujitsu-siemens.com] wrote:
> Hi James,
> 
> Disrupting a working FC connection makes my i386 SMP server
> (2.6.12.2) freeze just one or two seconds later.
> I normally use lpfc_nodev_tmo = 1. When I change this to the
> default value of 35, the system stalls about 36 seconds after
> the disruption. So I guess the problem is caused by nodev_tmo
> expiring.
> I activated the nmi_watchdog, but got no output.
> 
> What can I do to analyze this problem?

Does changing the timeout for a SCSI device also alter the problem? In the
past, people have seen issues with nodev_tmo expiring near the SCSI
timeout. Those past cases led to devices being offlined, but maybe this
could be causing a different symptom on your system.

You can change the timeout for the device by echoing a higher value into
/sys/bus/scsi/devices/${nexus}/timeout.
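
For scripting this across several LUNs, here is a minimal sketch in Python of the same write. The sysfs path comes from the sentence above. The function name `set_scsi_timeout`, its `root` parameter, and the example nexus `1:0:0:0` are illustrative assumptions, not part of any driver or tool. Writing to sysfs needs root.

```python
from pathlib import Path

# Directory named in the message above; each entry is one device nexus.
SYSFS_SCSI_DEVICES = Path("/sys/bus/scsi/devices")


def set_scsi_timeout(nexus: str, seconds: int,
                     root: Path = SYSFS_SCSI_DEVICES) -> int:
    """Write a new command timeout (seconds) for one SCSI device.

    `nexus` is the device's directory name, e.g. "1:0:0:0" (hypothetical).
    `root` is only a parameter so the function can be tried on a scratch
    directory. Returns the value read back from the attribute afterwards.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    attr = Path(root) / nexus / "timeout"
    # Same effect as: echo <seconds> > <root>/<nexus>/timeout
    attr.write_text(f"{seconds}\n")
    return int(attr.read_text().strip())
```

For example, `set_scsi_timeout("1:0:0:0", 900)` would request a 900-second timeout for that (hypothetical) device, and the returned value shows what the kernel actually stored.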

Is this a full system freeze or only the controlling console?

-andmike
--
Michael Anderson
andmike@us.ibm.com

* Re: lpfc: System freezing if fiber is broken
  2005-07-26 18:48 ` Mike Anderson
@ 2005-07-27 13:45   ` Bodo Stroesser
From: Bodo Stroesser @ 2005-07-27 13:45 UTC
  To: Mike Anderson; +Cc: James.Smart, linux-scsi

Mike Anderson wrote:
> Bodo Stroesser [bstroesser@fujitsu-siemens.com] wrote:
> 
>>Hi James,
>>
>>Disrupting a working FC connection makes my i386 SMP server
>>(2.6.12.2) freeze just one or two seconds later.
>>I normally use lpfc_nodev_tmo = 1. When I change this to the
>>default value of 35, the system stalls about 36 seconds after
>>the disruption. So I guess the problem is caused by nodev_tmo
>>expiring.
>>I activated the nmi_watchdog, but got no output.
>>
>>What can I do to analyze this problem?
> 
> 
> Does changing the timeout for a SCSI device also alter the problem? In the
> past, people have seen issues with nodev_tmo expiring near the SCSI
> timeout. Those past cases led to devices being offlined, but maybe this
> could be causing a different symptom on your system.
The time between cutting the connection and the system freezing is nearly
the same as lpfc_nodev_tmo. Using the default nodev_tmo of 35 seconds
results in about 36 seconds, while setting nodev_tmo to 1 results in
2 seconds. As the devices on the Fibre Channel are tape drives, the SCSI
timeout is 900 seconds.
There are 8 tests running that write to 8 tape LUNs on the same SCSI target.
If the connection is broken, some of the tests immediately receive a bad
result for write(), while others keep waiting for a result.

Meanwhile I also did some tests with timeout set to 5 and nodev_tmo set to 35
(the test I'm running doesn't fail with that small timeout). Those tests that
do not receive a bad result stay waiting for a result even after the 5-second
timeout has expired. In most cases, the system doesn't freeze after nodev_tmo
with this test. But about 5 seconds after plugging the FC cable in again, it
freezes.

> You can change the timeout for the device by echoing a higher value into
> /sys/bus/scsi/devices/${nexus}/timeout.
> 
> Is this a full system freeze or only the controlling console?
Full freeze, no more replies via console or network.

> 
> -andmike
> --
> Michael Anderson
> andmike@us.ibm.com
