From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f42.google.com ([209.85.214.42]:48106 "EHLO mail-it0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751840AbdJIL6z (ORCPT ); Mon, 9 Oct 2017 07:58:55 -0400 Received: by mail-it0-f42.google.com with SMTP id p138so13347250itp.2 for ; Mon, 09 Oct 2017 04:58:55 -0700 (PDT) Subject: Re: [PATCH v8 2/2] btrfs: check device for critical errors and mark failed To: bo.li.liu@oracle.com Cc: Anand Jain , linux-btrfs@vger.kernel.org References: <20171003155920.24925-1-anand.jain@oracle.com> <20171003155920.24925-3-anand.jain@oracle.com> <20171004201154.GB4902@dhcp-10-211-47-181.usdhcp.oraclecorp.com> <20171006233357.GB19068@lim.localdomain> From: "Austin S. Hemmelgarn" Message-ID: <7c4f262c-2843-0876-8914-761b371f9371@gmail.com> Date: Mon, 9 Oct 2017 07:58:49 -0400 MIME-Version: 1.0 In-Reply-To: <20171006233357.GB19068@lim.localdomain> Content-Type: text/plain; charset=utf-8; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 2017-10-06 19:33, Liu Bo wrote: > On Thu, Oct 05, 2017 at 07:07:44AM -0400, Austin S. Hemmelgarn wrote: >> On 2017-10-04 16:11, Liu Bo wrote: >>> On Tue, Oct 03, 2017 at 11:59:20PM +0800, Anand Jain wrote: >>>> From: Anand Jain >>>> >>>> Write and flush errors are critical errors, upon which the device fd >>>> must be closed and marked as failed. >>>> >>> >>> Can we defer the job of closing device to umount? >>> >>> We can go mark the device failed and skip it while doing read/write, >>> and umount can do the cleanup work. >>> >>> That way we don't need a dedicated thread looping around to detect a >>> rare situation. >> If BTRFS doesn't close the device, then it's 100% guaranteed if it >> reconnects that it will show up under a different device node. It would >> also mean that the device node stays visible when there is in fact no device >> connected to it, which is a pain from a monitoring perspective. > > I see, you're assuming that these errors are due to disconnection of > disks, it could be bad sectors (although almost impossible from > enterprise hard disks) or some other errors across the stack. And we need to handle any of these cases. If BTRFS is going to claim handling of device failures, it needs to handle all types of device failure it reasonably can without making assumptions about what type of failure occurred. > > I do agree that cleanup needs to be done if disk got disconnected, but > not doing cleanup here, a udev rule is needed to handle such an event. And why exactly does it need to involve userspace beyond notifying that something bad happened? We know the failure happened, why should we tell userspace about it and require userspace to tell us to handle it? It's not like this is a performance critical path since it won't happen very often, and the primary concern is data safety, not performance, so why exactly do we need to push something that isn't a policy decision (and arguably isn't a decision, if we're not using a device, we shouldn't be holding it open, period) out to userspace? Looking at it a bit differently though, every time I've seen any write or flush errors, either the connection to the device was unstable, or the storage device was failing, and neither case is something that should require userspace to make a decision about how to handle things. To be entirely honest though, from a system administrator's perspective, I would actually expect a device getting disconnected (or otherwise disappearing off the bus) to end up in a 'Missing' state, not a 'Failed' state. I see no reason to treat it any differently than if it disappeared before the volume was mounted, other than the requirement to actually handle it. That type of handling _will_ require userspace involvement (because I seriously doubt there will ever be proper notification from the block layer to holders that a device is gone), but that's also largely orthogonal to this patch set (there's no reason that we need to handle write or flush errors the same way).