From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
To: Mike Anderson <andmike@us.ibm.com>
Cc: James.Smart@Emulex.Com, linux-scsi@vger.kernel.org
Subject: Re: lpfc: System freezing if fiber is broken
Date: Wed, 27 Jul 2005 15:45:41 +0200
Message-ID: <42E79005.3020108@fujitsu-siemens.com>
In-Reply-To: <20050726184800.GA19810@us.ibm.com>
Mike Anderson wrote:
> Bodo Stroesser [bstroesser@fujitsu-siemens.com] wrote:
>
>>Hi James,
>>
>>disrupting a working FC connection makes my i386 SMP server
>>(2.6.12.2) freeze just one or two seconds after this.
>>I'm normally using lpfc_nodev_tmo = 1. When I change this to the
>>default value of 35, the system stalls about 36 seconds after
>>disruption. So I guess, the problem is caused by nodev_tmo
>>expiring.
>>I activated the nmi_watchdog, but no output.
>>
>>What can I do to analyze this problem?
>
>
> Does changing the timeout for a SCSI device also alter the problem? In the
> past, people have seen issues with nodev_tmo expiring near the SCSI
> timeout. Those past cases led to devices being offlined, but maybe this
> is causing a different symptom on your system.
The time between cutting the connection and the system freezing is nearly
the same as lpfc_nodev_tmo. With the default nodev_tmo of 35 seconds the
freeze comes after about 36 seconds, while with nodev_tmo set to 1 it comes
after about 2 seconds. Since the devices on the Fibre Channel are tape drives,
the SCSI timeout is 900 seconds.
There are 8 tests running that write to 8 tape LUNs on the same SCSI target.
When the connection is broken, some of the tests immediately receive a bad
result from write(), while others keep waiting for a result.
Meanwhile I also ran some tests with the SCSI timeout set to 5 and nodev_tmo
set to 35 (the test I'm running doesn't fail with such a small timeout). The
tests that do not receive a bad result keep waiting for one even after the
5-second timeout has expired. In most cases the system doesn't freeze after
nodev_tmo expires in this setup, but about 5 seconds after the FC cable is
plugged in again, it freezes.
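For anyone wanting to repeat the timeout variation across several LUNs, the
sysfs write could be scripted roughly as follows. This is only a minimal
sketch: the helper name set_scsi_timeout and the nexus value "0:0:0:0" in the
usage note are hypothetical, and the only thing taken from this thread is the
/sys/bus/scsi/devices/${nexus}/timeout path. It needs root privileges on a
real system.

```python
from pathlib import Path

# Base directory from this thread; the attribute is <base>/<nexus>/timeout.
SYSFS_SCSI_DEVICES = Path("/sys/bus/scsi/devices")


def set_scsi_timeout(nexus: str, seconds: int,
                     root: Path = SYSFS_SCSI_DEVICES) -> int:
    """Write a command timeout (in seconds) for one SCSI device.

    `nexus` is the device directory name (assumed here to look like
    host:channel:target:lun). Returns the value read back afterwards.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    attr = root / nexus / "timeout"
    attr.write_text(f"{seconds}\n")
    return int(attr.read_text().strip())
```

A call such as set_scsi_timeout("0:0:0:0", 5) would then correspond to the
5-second setting used in the test above; the `root` parameter exists only so
the helper can be tried against a scratch directory first.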
> You can change the timeout for the device by echoing a higher value into
> /sys/bus/scsi/devices/${nexus}/timeout.
>
> Is this a full system freeze or only the controlling console?
Full freeze, no more replies via console or network.
>
> -andmike
> --
> Michael Anderson
> andmike@us.ibm.com