From: Michael Reed <mdr@sgi.com>
To: James Bottomley <James.Bottomley@SteelEye.com>
Cc: James.Smart@Emulex.Com, Christoph Hellwig <hch@infradead.org>,
linux-scsi <linux-scsi@vger.kernel.org>, Jim Nead <jnead@sgi.com>,
Jeremy Higdon <jeremy@sgi.com>, Gary Hagensen <gwh@sgi.com>
Subject: Re: [PATCH] make fc transport removal of target configurable
Date: Tue, 13 Jun 2006 14:37:39 -0500
Message-ID: <448F1403.6090903@sgi.com>
In-Reply-To: <1150221599.3441.54.camel@mulgrave.il.steeleye.com>
James Bottomley wrote:
> On Tue, 2006-06-13 at 10:42 -0500, Michael Reed wrote:
>> Mounted file systems have no clue either. Even with no activity on the
>> fs, if the target stays missing beyond the device loss timeout and then
>> returns, the file system cannot be accessed without intervention.
>>
>> When the target does return, the file system has to be unmounted and
>> remounted on a new "sd" device. This is even if there was no activity
>> on the file system while its target was absent, i.e., it wouldn't otherwise
>> require an unmount/remount.
>
> But let's examine the options: If you leave an uncontactable target
> hanging around, the SCSI error handler will activate anyway when the
> command timeout passes (currently 30s) and the device will be offlined.
Not really true: the transport holds off the error handler until the
transport's dev loss timer expires.
After that, commands are returned immediately with DID_NO_CONNECT.
With my patch applied, the device is never offlined.
> Bringing it back online will require user intervention and likely
> necessitate an unmount and a remount to repair the filesystem anyway.
With the unpatched code, the device transitions ONLINE -> BLOCKED ->
CANCEL -> DEL, and then the infrastructure is removed. With the new code,
it transitions ONLINE -> BLOCKED -> ONLINE. Subsequent access to the
device results in i/o errors with a status of DID_NO_CONNECT:
duck /root# dd if=/dev/md0 bs=128k count=1 of=/dev/null
sd 5:0:13:0: SCSI error: return code = 0x10000
end_request: I/O error, dev sdj, sector 0
Buffer I/O error on device md0, logical block 0
sd 5:0:11:0: SCSI error: return code = 0x10000
end_request: I/O error, dev sdh, sector 0
Buffer I/O error on device md0, logical block 4
sd 5:0:13:0: SCSI error: return code = 0x10000
dd: reading `/dev/md0': Input/output error
The layer issuing the i/o can decide what to do with the device.
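As a rough illustration of how the "return code = 0x10000" in the log above maps to DID_NO_CONNECT, here is a minimal sketch. It assumes the usual Linux SCSI result layout, with the host byte in bits 16-23 and the target's SCSI status in the low byte. The helper names are made up for this example and are not kernel functions.

```python
# Host-byte codes as defined by the Linux SCSI midlayer; only the two
# needed for this illustration are listed.
DID_OK = 0x00
DID_NO_CONNECT = 0x01


def host_byte(result: int) -> int:
    """Return bits 16-23 of a SCSI result word (the host byte)."""
    return (result >> 16) & 0xFF


def scsi_status(result: int) -> int:
    """Return the low byte, i.e. the SCSI status reported by the target."""
    return result & 0xFF


result = 0x10000  # value printed as "return code" in the log above

if host_byte(result) == DID_NO_CONNECT:
    print("host byte: DID_NO_CONNECT")
print("target status byte:", hex(scsi_status(result)))
```

The target status byte is zero because the command never reached a target; the failure is reported entirely by the host side.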
> Even if you go further and hold off the error handler, what this will do
> is slowly hang the system since anything that touches an inode on the
> blocked target will be put into D wait. I really think pro-actively
> removing the target is better than either of the other two options.
The error handler is only held off during the dev loss period. Once
the timer expires, the target is unblocked and pending commands are issued
and terminate with DID_NO_CONNECT. If there are no pending commands,
nothing bad happens. Many multi-path drivers know to switch paths when
"EIO" is returned, so without EIO there is no path switch, even if the
target is absent for a prolonged period.
The system does not slowly hang. It remains responsive and behaves in
an expected manner.
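To make the path-switch argument concrete, here is a small sketch of a multipath-style reader that moves to the next path when one fails with EIO. The `Path` class and `multipath_read` function are hypothetical stand-ins for this example only, not the interface of any real multipath driver.

```python
import errno
from typing import List, Tuple


class Path:
    """Hypothetical path to a LUN; `connected` stands in for target presence."""

    def __init__(self, name: str, connected: bool = True):
        self.name = name
        self.connected = connected

    def read(self, sector: int) -> bytes:
        # Models the behaviour after the dev loss timer has expired: the
        # command fails at once (DID_NO_CONNECT), which callers see as EIO.
        if not self.connected:
            raise OSError(errno.EIO, f"{self.name}: no connect")
        return b"\0" * 512


def multipath_read(paths: List[Path], sector: int) -> Tuple[str, bytes]:
    """Try each path in order, moving on when one fails with EIO."""
    for path in paths:
        try:
            return path.name, path.read(sector)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # only EIO triggers a path switch in this sketch
    raise OSError(errno.EIO, "all paths failed")


# Path names borrowed from the log above, purely for illustration.
used, data = multipath_read([Path("sdh", connected=False), Path("sdj")], 0)
print("read served by", used)
```

The point of the sketch is that the failover decision depends on the error being delivered promptly; a command that blocks indefinitely never reaches the `except` branch.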
>
> The device loss timer represents an acceptable compromise between the
> need to keep the target across short disconnect/reconnect events and the
> need to keep the system functioning.
The new parameter doesn't really change the usage of the device loss timer.
Its expiry still results in failed i/o. The parameter just leaves the
infrastructure around so that if/when the target returns, the reference
holders can resume using it. This is the desired behavior.
The system remains fully functional with no unexpected delays.
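The two device lifecycles discussed in this message can be modeled roughly as follows. This is a toy model, not transport code: the state names mirror the ones used above, and `keep_target` is a hypothetical name for the behaviour the patch makes configurable.

```python
from enum import Enum
from typing import List


class State(Enum):
    ONLINE = "online"
    BLOCKED = "blocked"
    CANCEL = "cancel"
    DEL = "del"


def on_dev_loss_timeout(keep_target: bool) -> List[State]:
    """Return the states a device passes through when the timer expires."""
    history = [State.ONLINE, State.BLOCKED]  # target vanished, device blocked
    if keep_target:
        # Patched behaviour: unblock; i/o now fails with DID_NO_CONNECT.
        history.append(State.ONLINE)
    else:
        # Unpatched behaviour: the device and its infrastructure are removed.
        history += [State.CANCEL, State.DEL]
    return history


def usable_after_return(history: List[State]) -> bool:
    """True if existing reference holders can resume once the target is back."""
    return history[-1] is State.ONLINE


for keep in (False, True):
    states = on_dev_loss_timeout(keep)
    print(keep, [s.name for s in states], usable_after_return(states))
```

In the unpatched case the final state is DEL, so a returning target shows up as a new "sd" device and the old references are useless.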
Mike
>
> James
>
>