From: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
To: Mike Anderson <andmike@us.ibm.com>
Cc: James.Smart@Emulex.Com, linux-scsi@vger.kernel.org
Subject: Re: lpfc: System freezing if fiber is broken
Date: Wed, 27 Jul 2005 15:45:41 +0200
Message-ID: <42E79005.3020108@fujitsu-siemens.com>
In-Reply-To: <20050726184800.GA19810@us.ibm.com>
Mike Anderson wrote:
> Bodo Stroesser [bstroesser@fujitsu-siemens.com] wrote:
>
>>Hi James,
>>
>>disrupting a working FC connection makes my i386 SMP server
>>(2.6.12.2) freeze just one or two seconds after this.
>>I'm normally using lpfc_nodev_tmo = 1. When I change this to the
>>default value of 35, the system stalls about 36 seconds after
>>disruption. So I guess, the problem is caused by nodev_tmo
>>expiring.
>>I activated the nmi_watchdog, but no output.
>>
>>What can I do to analyze this problem?
>
>
> Does changing the timeout for a scsi device also alter the problem? In the
> past people have seen issues with the nodev_tmo expiring near the scsi
> timeout. Those past cases led to devices being offlined, but maybe this
> could be causing a different symptom on your system.
The time between cutting the connection and the system freezing is nearly
the same as lpfc_nodev_tmo. With the default nodev_tmo of 35 seconds the
freeze comes after about 36 seconds, while with nodev_tmo set to 1 it comes
after about 2 seconds. As the devices on the Fibre Channel are tape drives,
the SCSI timeout is 900 seconds.

There are 8 tests running that write to 8 tape LUNs on the same SCSI target.
When the connection is broken, some of the tests immediately receive a bad
result for write(), while others keep waiting for a result.

Meanwhile I also did some tests with the timeout set to 5 and nodev_tmo set
to 35 (the test I'm running doesn't fail with that small timeout). Those
tests that do not receive a bad result keep waiting for a result even after
the 5 second timeout has expired. In most cases the system doesn't freeze
after nodev_tmo with this setup, but about 5 seconds after plugging the FC
cable in again, it freezes.
> You can change the timeout for the device by echoing a higher value into
> /sys/bus/scsi/devices/${nexus}/timeout.
>
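For anyone repeating this from a script, here is a minimal sketch of writing the per-device timeout through the sysfs path quoted above. The function names, the sysfs_root parameter (there only so the code can be tried against a scratch directory) and the example nexus "0:0:0:0" are assumptions for illustration, not part of any existing tool. Writing the real attribute needs root.

```python
import os

SYSFS_SCSI_DEVICES = "/sys/bus/scsi/devices"


def timeout_path(nexus, sysfs_root=SYSFS_SCSI_DEVICES):
    """Build the path of the timeout attribute for one SCSI device.

    nexus is the device directory name under sysfs_root, e.g. "0:0:0:0"
    (a placeholder, not a value taken from the report).
    """
    return os.path.join(sysfs_root, nexus, "timeout")


def get_scsi_timeout(nexus, sysfs_root=SYSFS_SCSI_DEVICES):
    """Read the current command timeout in seconds."""
    with open(timeout_path(nexus, sysfs_root)) as f:
        return int(f.read().strip())


def set_scsi_timeout(nexus, seconds, sysfs_root=SYSFS_SCSI_DEVICES):
    """Write a new command timeout in seconds, like `echo N > .../timeout`."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    with open(timeout_path(nexus, sysfs_root), "w") as f:
        f.write(f"{seconds}\n")
```

Usage would be along the lines of set_scsi_timeout("0:0:0:0", 900), followed by get_scsi_timeout("0:0:0:0") to check that the value was accepted.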
> Is this a full system freeze or only the controlling console?
Full freeze, no more replies via console or network.
>
> -andmike
> --
> Michael Anderson
> andmike@us.ibm.com
Thread overview: 3+ messages
2005-07-26 17:57 lpfc: System freezing if fiber is broken Bodo Stroesser
2005-07-26 18:48 ` Mike Anderson
2005-07-27 13:45 ` Bodo Stroesser [this message]