From: James Smart <James.Smart@Emulex.Com>
To: Mike Christie <michaelc@cs.wisc.edu>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>,
Linux-SCSI Mailing List <linux-scsi@vger.kernel.org>,
James Bottomley <james.bottomley@steeleye.com>
Subject: Re: [PATCH 7/8] qla2xxx: Stall mid-layer error handlers while rport is blocked.
Date: Fri, 06 Oct 2006 11:23:19 -0400
Message-ID: <452674E7.9080606@emulex.com>
In-Reply-To: <45252E1C.3000403@cs.wisc.edu>
Mike Christie wrote:
> James Smart wrote:
>> Given this is the 3rd instance of this (qla2xxx, lpfc, mpt fusion),
>> we should either:
>>
>> - Fix the error handler. (but we all know this is a lot of work,
>> of which none of us have the time to do, nor expect it to
>> be complete in time for our next distro delivery).
>
> I understand the bugs in the eh. I have worked around them in iscsi and
> tried to fix them in scsi-ml :) (still working on the queuecommand
> SCSI_ML_HOST/DEVICE_BUSY fix), but along with the problems in the eh
> where we could get the device offlined there could really be times when
> the device needs to be offlined and reonlined, right?
True...
> For iscsi we do
> not really worry about either, in our userspace daemon we have code
> where if the device was offlined and the daemon has corrected the
> problem (or in qla4xxx case has been notified that the problem has been
> corrected), then we online the devices.
Ok - but that's not really the intent around offlining. Offlining implies
that recovery steps were taken, but it didn't result in a functional device,
thus retries are likely to fail as well - which implies that device media
is corrupt and could use some user interaction to clean up (filesystem check
and the like). So - it's not always the best idea to simply online after
resolving the link state for the device.
That said, there are scenarios in which we lose connectivity altogether
to the device, but never end up taking it offline (e.g. we fail the i/o
outright in queuecommand without going through the error handler). I would
assume the device media is as screwed up as when offline was justified.
Perhaps it's because we expect the upper layers to be preserving the data
that was contained in the retry (consider block cache data) and to
eventually attempt to resync this data when connectivity to the device is
restored. If we have this scenario, it implies that simply onlining once
the link state is OK is all right.
> Since FC has added a netlink
> interface, could we add something like fc_rport state changed event
> support to some daemon? The daemon could online the device when the
> rport state is back up if needed.
Well - this is similar to what we talked about in the storage bof at
OLS. We decided to add kobject calls for block and unblock to the
block device. These are the events you could key off of. I'll complete
the patch for these events.
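To make the daemon idea concrete, here is a minimal sketch of what the
userspace side might look like. It is only an illustration under stated
assumptions: the event format (a dict with an "action" key of "block" or
"unblock") is hypothetical, since the kobject events are not merged yet, and
the names handle_event, online_if_offline and media_ok are made up for this
example. The one thing it relies on from the kernel is the SCSI device
"state" sysfs attribute (e.g. /sys/block/sdX/device/state), which reads
"running" or "offline".

```python
from pathlib import Path


def read_state(state_path):
    """Return the current contents of the device state attribute."""
    return Path(state_path).read_text().strip()


def online_if_offline(state_path):
    """Write "running" to the state attribute only if it reads "offline".

    Returns True when a write was made, False otherwise.
    """
    if read_state(state_path) != "offline":
        return False
    Path(state_path).write_text("running\n")
    return True


def handle_event(event, state_path, media_ok=lambda: True):
    """Hypothetical handler for a block/unblock event.

    `event` is an assumed dict with an "action" key; `media_ok` is an
    assumed policy hook so the daemon does not online unconditionally
    (e.g. it could require a filesystem check to have passed first).
    """
    if event.get("action") != "unblock":
        return False
    if not media_ok():
        return False
    return online_if_offline(state_path)
```

The media_ok hook is there because onlining should stay a policy decision:
the daemon acts only on an unblock event, only if the device is actually
offline, and only if whatever check the administrator configured agrees.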
> I was also thinking that the iscsi code has some common features and
> maybe iscsi and fc could share something in some sort of blktool daemon.
Sounds useful - we just have to make sure it's keeping the system sane.
> Or do you think the userspace daemon is more of a hack in userspace? I
> cannot tell when I am hacking around something in the kernel or doing
> something nifty in userspace anymore :)
As pointed out above, unconditionally onlining a device is not really
the right solution, although it is what users unconditionally do after
losing connectivity and encountering the offline state. Does anyone else
have thoughts here? Users really don't understand the offline state
or the manual onlining they do once they believe they have restored
connectivity to the device.
-- james