From: Mike Christie <michaelc@cs.wisc.edu>
To: James.Smart@Emulex.Com
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>,
Linux-SCSI Mailing List <linux-scsi@vger.kernel.org>,
James Bottomley <james.bottomley@steeleye.com>
Subject: Re: [PATCH 7/8] qla2xxx: Stall mid-layer error handlers while rport is blocked.
Date: Fri, 06 Oct 2006 12:01:45 -0500
Message-ID: <45268BF9.10007@cs.wisc.edu>
In-Reply-To: <452674E7.9080606@emulex.com>
James Smart wrote:
>
>
> Mike Christie wrote:
>> James Smart wrote:
>>> Given this is the 3rd instance of this (qla2xxxx, lpfc, mpt fusion),
>>> we should either:
>>>
>>> - Fix the error handler. (but we all know this is a lot of work,
>>> of which none of us have the time to do, nor expect it to
>>> be complete in time for our next distro delivery).
>>
>> I understand the bugs in the eh. I have worked around them in iscsi and
>> tried to fix them in scsi-ml :) (still working on the queuecommand
>> SCSI_ML_HOST/DEVICE_BUSY fix), but along with the problems in the eh
>> where we could get the device offlined there could really be times when
>> the device needs to be offlined and reonlined, right?
>
> True...
>
>> For iscsi we do
>> not really worry about either, in our userspace daemon we have code
>> where if the device was offlined and the daemon has corrected the
>> problem (or in qla4xxx case has been notified that the problem has been
>> corrected), then we online the devices.
>
> Ok - but that's not really the intent around offlining. Offlining implies
> that recovery steps were taken, but it didn't result in a functional
> device, thus retries are likely to fail as well - which implies that
> device media is corrupt and could use some user interaction to clean up
> (filesystem check and the like). So - it's not always the best ideal to
> simply online after resolving the link state for the device.

Yeah, OK, I can see your point, but there are some problems with this
currently. Maybe I am thinking about this wrong, too.

In order to run diagnostics like TUR or fsck, you have to online the
device first. If the device is offlined because the connection is down,
multipathd does not want to touch the online state. It does not know why
the device was offlined and does not think it can experiment there.
Should it? ChristopheV does not feel it should, so if iscsid knows the
device was offlined because of a connection failure, we online it so
multipathd can do its tests. If we are running a FS directly on a disk,
then we need to online the device so the user can run fsck. So I am
saying we online devices because we have corrected the problem on our
side, and now the user can do whatever tests they need to do.
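
For illustration only, here is a minimal sketch of how a userspace
daemon might online a device through the sysfs "state" attribute. It
assumes the attribute lives at a path like /sys/block/sdX/device/state
and accepts the keywords "running" and "offline". The helper name
sdev_set_state is hypothetical, and this is not necessarily how iscsid
does it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a state keyword (assumed: "running" or "offline") to a SCSI
 * device's sysfs state attribute. The caller supplies state_path,
 * e.g. /sys/block/sdX/device/state (the path layout is an assumption).
 * Returns 0 on success or a negative errno value on failure.
 */
static int sdev_set_state(const char *state_path, const char *state)
{
    int fd = open(state_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(state);
    ssize_t n = write(fd, state, len);
    int err = 0;

    if (n < 0)
        err = -errno;          /* the kernel rejected the write */
    else if ((size_t)n != len)
        err = -EIO;            /* short write, treat as failure */

    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```

A daemon that knows the connection is back could call
sdev_set_state(path, "running") and leave the device alone on failure.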

Maybe we need to fix up SDEV_QUIESCE so we can do diagnostic IOs with
SG_IO. Userspace can at least set the device to this state and do some
tests, but all other IO will not get through, and the upper layers do
not have to do special things like setting the device read-only or
marking the path state as failed.
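
As a rough sketch of the kind of diagnostic IO meant here, this is what
a TEST UNIT READY sent through the SG_IO ioctl could look like. The
helper names tur_prepare and tur_check, the 5 second timeout, and the
32-byte sense buffer are assumptions for illustration. Whether the
command gets through while the device is quiesced is exactly the open
question above.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/* Fill an sg_io_hdr for TEST UNIT READY: 6-byte CDB, opcode 0x00, no data. */
static void tur_prepare(struct sg_io_hdr *hdr, unsigned char cdb[6],
                        unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, 6);                     /* TUR is an all-zero CDB */
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';               /* required by the sg interface */
    hdr->cmd_len = 6;
    hdr->cmdp = cdb;
    hdr->dxfer_direction = SG_DXFER_NONE;  /* no data phase */
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 5000;                   /* milliseconds, arbitrary choice */
}

/*
 * Returns 0 if the device reported ready, 1 if the command completed
 * with a non-good status (sense data may be set), -1 if open or the
 * ioctl itself failed.
 */
static int tur_check(const char *dev_path)
{
    unsigned char cdb[6], sense[32];
    struct sg_io_hdr hdr;
    int fd = open(dev_path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    tur_prepare(&hdr, cdb, sense, sizeof(sense));
    int rc = ioctl(fd, SG_IO, &hdr);
    close(fd);
    if (rc < 0)
        return -1;
    return (hdr.info & SG_INFO_OK_MASK) == SG_INFO_OK ? 0 : 1;
}
```

A path checker could call tur_check("/dev/sdX") and treat anything other
than 0 as "path not usable yet".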

Or are you saying that even if we are able to relogin, there will be
problems that cannot be handled with the current tools? Something like
that one sense bug I was asking you about at OLS, right? I am not sure
what to do with that.