* lpfc: System freezing if fiber is broken
@ 2005-07-26 17:57 Bodo Stroesser
2005-07-26 18:48 ` Mike Anderson
From: Bodo Stroesser @ 2005-07-26 17:57 UTC
To: James.Smart; +Cc: linux-scsi
Hi James,
Disrupting a working FC connection makes my i386 SMP server
(kernel 2.6.12.2) freeze just one or two seconds later.
I normally use lpfc_nodev_tmo = 1. When I change this to the
default value of 35, the system stalls about 36 seconds after
the disruption. So I guess the problem is caused by nodev_tmo
expiring.
I activated the nmi_watchdog, but it produced no output.
What can I do to analyze this problem?
Regards
Bodo
BTW:
I couldn't reproduce the problem of a wrong WWPN yet.
* Re: lpfc: System freezing if fiber is broken
From: Mike Anderson @ 2005-07-26 18:48 UTC
To: Bodo Stroesser; +Cc: James.Smart, linux-scsi
Bodo Stroesser [bstroesser@fujitsu-siemens.com] wrote:
> Hi James,
>
> Disrupting a working FC connection makes my i386 SMP server
> (kernel 2.6.12.2) freeze just one or two seconds later.
> I normally use lpfc_nodev_tmo = 1. When I change this to the
> default value of 35, the system stalls about 36 seconds after
> the disruption. So I guess the problem is caused by nodev_tmo
> expiring.
> I activated the nmi_watchdog, but it produced no output.
>
> What can I do to analyze this problem?
Does changing the timeout for the SCSI device also alter the problem? In the
past, people have seen issues with the nodev_tmo expiring near the SCSI
timeout. Those past cases led to devices being offlined, but maybe this
could be causing a different symptom on your system.
You can change the timeout for the device by echoing a higher value into
/sys/bus/scsi/devices/${nexus}/timeout.
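A rough sketch of that step, for illustration only: the shell function below writes a new timeout (in seconds) to the sysfs attribute named above. The function name set_scsi_timeout, the SYSFS_ROOT override (useful for a dry run against a scratch directory), and the nexus 1:0:0:0 used in the usage note are assumptions; substitute the host:channel:id:lun of the affected device.

```shell
#!/bin/sh
# Sketch: set the SCSI command timeout (seconds) for one device via sysfs.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
set_scsi_timeout() {
    if [ "$#" -ne 2 ]; then
        echo "usage: set_scsi_timeout <nexus> <seconds>" >&2
        return 2
    fi
    nexus=$1      # host:channel:id:lun of the device
    seconds=$2    # new timeout in seconds
    attr="${SYSFS_ROOT:-/sys}/bus/scsi/devices/${nexus}/timeout"

    # Refuse to continue if the attribute is missing or not writable.
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr" >&2
        return 1
    fi

    echo "previous timeout: $(cat "$attr")"
    echo "$seconds" > "$attr"
    echo "new timeout: $(cat "$attr")"
}
```

Run as root, `set_scsi_timeout 1:0:0:0 1800` would raise the timeout on that (example) device to 1800 seconds. The setting lives only in sysfs, so it has to be reapplied after a reboot.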
Is this a full system freeze or only the controlling console?
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: lpfc: System freezing if fiber is broken
From: Bodo Stroesser @ 2005-07-27 13:45 UTC
To: Mike Anderson; +Cc: James.Smart, linux-scsi
Mike Anderson wrote:
> Bodo Stroesser [bstroesser@fujitsu-siemens.com] wrote:
>
>>Hi James,
>>
>>Disrupting a working FC connection makes my i386 SMP server
>>(kernel 2.6.12.2) freeze just one or two seconds later.
>>I normally use lpfc_nodev_tmo = 1. When I change this to the
>>default value of 35, the system stalls about 36 seconds after
>>the disruption. So I guess the problem is caused by nodev_tmo
>>expiring.
>>I activated the nmi_watchdog, but it produced no output.
>>
>>What can I do to analyze this problem?
>
>
> Does changing the timeout for the SCSI device also alter the problem? In the
> past, people have seen issues with the nodev_tmo expiring near the SCSI
> timeout. Those past cases led to devices being offlined, but maybe this
> could be causing a different symptom on your system.
The time between cutting the connection and the system freezing
is nearly the same as lpfc_nodev_tmo. Using the default nodev_tmo of 35 seconds
results in about 36 seconds, while setting nodev_tmo to 1 results in about
2 seconds. As the devices on the Fibre Channel are tape drives, the SCSI timeout is
900 seconds.
There are 8 tests running that write to 8 tape LUNs on the same SCSI target.
When the connection is broken, some of the tests immediately receive a bad
result for write(), while others keep waiting for a result.
Meanwhile I also ran some tests with timeout set to 5 and nodev_tmo set to 35
(the test I'm running doesn't fail with that small timeout). Those tests that
do not receive a bad result keep waiting for a result even after the 5-second timeout
has expired. In most cases, the system doesn't freeze after nodev_tmo expires in this test,
but about 5 seconds after plugging the FC cable back in, it freezes.
> You can change the timeout for the device by echoing a higher value into
> /sys/bus/scsi/devices/${nexus}/timeout.
>
> Is this a full system freeze or only the controlling console?
Full freeze, no more replies via console or network.
>
> -andmike
> --
> Michael Anderson
> andmike@us.ibm.com