From: Liu Bo <bo.li.liu@oracle.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v8 2/2] btrfs: check device for critical errors and mark failed
Date: Fri, 13 Oct 2017 11:46:28 -0700 [thread overview]
Message-ID: <20171013184628.GD1553@lim.localdomain> (raw)
In-Reply-To: <634a15da-a831-d196-55c6-8684a7288572@oracle.com>
On Sun, Oct 08, 2017 at 10:23:58PM +0800, Anand Jain wrote:
>
>
> On 10/07/2017 07:56 AM, Liu Bo wrote:
> > On Thu, Oct 05, 2017 at 09:56:59PM +0800, Anand Jain wrote:
> > >
> > >
> > > On 10/05/2017 04:11 AM, Liu Bo wrote:
> > > > On Tue, Oct 03, 2017 at 11:59:20PM +0800, Anand Jain wrote:
> > > > > From: Anand Jain <Anand.Jain@oracle.com>
> > > > >
> > > > > Write and flush errors are critical errors, upon which the device fd
> > > > > must be closed and marked as failed.
> > > > >
> > > >
> > > > Can we defer the job of closing device to umount?
> > >
> > > Originally I think it was designed to handle the bad disk
> > > same as missing device. as below [1]. This patch just does that.
>
> > > But I am curious to know if there is any issue to close failed
> > > device ?
>
> Anything on this ?
No big deal, it's just different with md's behavior, but surely it
doesn't make a big difference whether we close device fd here or not.
>
>
> > > [1]
> > > -----
> > > static int read_one_dev(struct btrfs_fs_info *fs_info,
> > > struct extent_buffer *leaf,
> > > struct btrfs_dev_item *dev_item)
> > > ::
> > > /*
> > > * this happens when a device that was properly setup
> > > * in the device info lists suddenly goes bad.
> > > * device->bdev is NULL, and so we have to set
> > > * device->missing to one here
> > > */
> > > ------
> > >
> > > So here I misuse the device missing scenario (device->bdev = NULL)
> > > to the failed device scenario. (code comment mentioned it).
> > >
> >
> > This is OK to me, we can unify these stuff later, for now, ->missing
> > is to bypass this failed disk for r/w.
> >
> > > Secondly as in the patch 1/2 commit log and also Austin mentioned it,
> > > if failed device close is deferred it will also defer the cleanup of
> > > the failed device at the block layer.
> > >
> > > But yes block layer can still clean up when btrfs closes the
> > > device at the time of replace, also where in the long run, pulling
> > > out of the failed disk would (planned) trigger a udev notification
> > > into the btrfs sysfs interface and it can close the device.
> > >
> > > Anything that I have missed ?
> > >
> >
> > I'm more inclined to do cleanup by making udev call 'device remove',
> > which is supposed to do all the validation checks you need to done
> > here.
>
> Validation check like what ?
OK, just realize that I was talking about what we should do upon
device getting pulled out, in that case, validation check about raid
profile is needed, such as can we keep the raid profile or do we need
to degrade raid proflie, and 'device delete' can do that.
Although I agree with the fact that write/flush errors are critical
ones, it's still too strict if it got only a few write/flush errors,
given cable issues or timeout from some layers may also lead to those
errors. A compromised way is to have a threshold to give drives
another chance.
thanks,
-liubo
>
> Thanks, Anand
>
> > > Further, jfyi closing of failed device isn't related to the
> > > looping for device failure though.
> > >
> > > > We can go mark the device failed and skip it while doing read/write,
> > > > and umount can do the cleanup work.
> > >
> > > In servers a need of umount is considered as downtime, things
> > > shouldn't wait for umount to cleanup.
> > >
> >
> > OK.
> >
> > > > That way we don't need a dedicated thread looping around to detect a
> > > > rare situation.
> > >
> > > I think its better to make it event based instead of thread looping,
> > > pls take a look..
> > >
> > > [PATCH v8.1 2/2] btrfs: mark device failed for write and flush errors
> > >
> >
> > Event based is good, for anything about failing disk, we would have to
> > follow what md is doing eventually, i.e. let udev handle it.
> >
> > thanks,
> > -liubo
> >
next prev parent reply other threads:[~2017-10-13 18:50 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-03 15:59 [PATCH v8 0/2] [RFC] Introduce device state 'failed' Anand Jain
2017-10-03 15:59 ` [PATCH v8 1/2] btrfs: introduce device dynamic state transition to failed Anand Jain
2017-10-13 18:47 ` Liu Bo
2017-10-16 6:09 ` Anand Jain
2017-10-03 15:59 ` [PATCH v8 2/2] btrfs: check device for critical errors and mark failed Anand Jain
2017-10-04 20:11 ` Liu Bo
2017-10-05 11:07 ` Austin S. Hemmelgarn
2017-10-06 23:33 ` Liu Bo
2017-10-09 11:58 ` Austin S. Hemmelgarn
2017-10-05 13:56 ` Anand Jain
2017-10-06 23:56 ` Liu Bo
2017-10-08 14:23 ` Anand Jain
2017-10-13 18:46 ` Liu Bo [this message]
2017-10-16 6:09 ` Anand Jain
2017-10-05 13:54 ` [PATCH v8.1 2/2] btrfs: mark device failed for write and flush errors Anand Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171013184628.GD1553@lim.localdomain \
--to=bo.li.liu@oracle.com \
--cc=anand.jain@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).