Re: [PATCH 1/1] xfs: fallback to readonly during recovery

Linux XFS filesystem development
From: Vincent Fazio <vfazio@xes-inc.com>
To: Brian Foster <bfoster@redhat.com>, Eric Sandeen <sandeen@sandeen.net>
Cc: Aaron Sierra <asierra@xes-inc.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/1] xfs: fallback to readonly during recovery
Date: Tue, 11 Feb 2020 08:04:01 -0600
Message-ID: <e8169b53-252b-b133-7bc5-ee5dc206c402@xes-inc.com>
In-Reply-To: <20200211125504.GA2951@bfoster>

All,

On 2/11/20 6:55 AM, Brian Foster wrote:
> On Mon, Feb 10, 2020 at 05:40:03PM -0600, Eric Sandeen wrote:
>> On 2/10/20 4:31 PM, Aaron Sierra wrote:
>>>> From: "Eric Sandeen" <sandeen@sandeen.net>
>>>> Sent: Monday, February 10, 2020 3:43:50 PM
>>>> On 2/10/20 3:10 PM, Vincent Fazio wrote:
>>>>> Previously, XFS would fail to mount if there was an error during log
>>>>> recovery. This can occur as a result of inevitable I/O errors when
>>>>> trying to apply the log on read-only ATA devices since the ATA layer
>>>>> does not support reporting a device as read-only.
>>>>>
>>>>> Now, if there's an error during log recovery, fall back to norecovery
>>>>> mode and mark the filesystem as read-only in the XFS and VFS layers.
>>>>>
>>>>> This roughly approximates the 'errors=remount-ro' mount option in ext4
>>>>> but is implicit and the scope only covers errors during log recovery.
>>>>> Since XFS is the default filesystem for some distributions, this change
>>>>> allows users to continue to use XFS on these read-only ATA devices.
>>>> What is the workload or scenario where you need this behavior?
>>>>
>>>> I'm not a big fan of ~silently mounting a filesystem with latent errors,
>>>> tbh, but maybe you can explain a bit more about the problem you're solving
>>>> here?
>>> Hi Eric,
>>>
>>> We use SSDs from multiple vendors that can be configured at power-on (via
>>> GPIO) to be read-write or write-protected. When write-protected we get I/O
>>> errors for any writes that reach the device. We believe that behavior is
>>> correct.
>>>
>>> We have found that XFS fails during log recovery even when the log is clean
>>> (apparently due to metadata writes immediately before actual recovery).
>> There should be no log recovery if it's clean ...
>>
>> And I don't see that here - a clean log on a readonly device simply mounts
>> RO for me by default, with no muss, no fuss.
>>
>> # mkfs.xfs -f fsfile
>> ...
>> # losetup /dev/loop0 fsfile
>> # mount /dev/loop0 mnt
>> # touch mnt/blah
>> # umount mnt
>> # blockdev --setro /dev/loop0
>> # dd if=/dev/zero of=/dev/loop0 bs=4k count=1
>> dd: error writing ‘/dev/loop0’: Operation not permitted
>> # mount /dev/loop0 mnt
>> mount: /dev/loop0 is write-protected, mounting read-only
>> # dmesg
>> [  419.941649] /dev/loop0: Can't open blockdev
>> [  419.947106] XFS (loop0): Mounting V5 Filesystem
>> [  419.952895] XFS (loop0): Ending clean mount
>> # uname -r
>> 5.5.0
>>
I think it's important to note that you're calling `blockdev --setro` 
here, which sets the device RO at the block layer...

As mentioned in the commit message, the SSDs we work with are ATA 
devices and there is no such mechanism in the ATA spec to report to the 
block layer that the device is RO. What we run into is this:

xfs_log_mount
     xlog_recover
         xlog_find_tail
             xlog_clear_stale_blocks
                 xlog_write_log_records
                     xlog_bwrite

The xlog_bwrite call fails and triggers the call to xfs_force_shutdown.
In this specific scenario we know the log is clean, because
XFS_MOUNT_WAS_CLEAN is set in the mount flags, but the stale blocks
cannot be cleared because the device is write-protected. The call to
xlog_clear_stale_blocks cannot be avoided because, as mentioned above,
ATA devices have no mechanism to report that they are read-only.
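
To make the difference from the `blockdev --setro` test concrete, here
is a toy model in Python (not kernel code). The Device class, its two
attributes, and mount_clean_log() are hypothetical names invented for
this sketch. It assumes the mount path skips the stale-block write only
when the block layer itself reports the device as read-only:

```python
class Device:
    """Toy block device; both attributes are illustrative, not kernel fields."""

    def __init__(self, reports_read_only, write_protected):
        self.reports_read_only = reports_read_only  # what the block layer believes
        self.write_protected = write_protected      # what the hardware enforces

    def write(self, data):
        # A write-protected device rejects every write with an I/O error.
        if self.write_protected:
            raise OSError("I/O error: device rejected write")


def mount_clean_log(dev):
    """Model of mounting a filesystem whose log is already clean."""
    if dev.reports_read_only:
        # The block layer knows the device is RO, so no write is attempted.
        return "mounted read-only"
    try:
        # Stands in for the stale-block write that ends in xlog_bwrite.
        dev.write(b"stale block reset")
    except OSError:
        return "mount failed: shutdown"
    return "mounted read-write"


setro_loop = Device(reports_read_only=True, write_protected=True)
plain_disk = Device(reports_read_only=False, write_protected=False)
gpio_ssd = Device(reports_read_only=False, write_protected=True)

for name, dev in [
    ("blockdev --setro", setro_loop),
    ("writable disk", plain_disk),
    ("write-protected ATA", gpio_ssd),
]:
    print(f"{name}: {mount_clean_log(dev)}")
```

Only the third case fails: the device rejects writes, but nothing tells
the block layer about it, so the write is attempted anyway.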

>>> Vincent and I believe that mounting read-only without recovery should be
>>> fine even when the log is not clean, since the filesystem will be consistent,
>>> even if out-of-date.
>> I think that you may be making too many assumptions here, i.e. that "log
>> recovery failure leaves the filesystem in a consistent state" - and that
>> may not be true in all cases.
>>
>> IOW, transitioning to a new RO state for your particular case may be safe,
>> but I'm not sure that's universally true for all log replay failures.
>>
> Agreed. Just to double down on this bit, this is definitely a misguided
> assumption. Generally speaking, XFS logging places ordering rules on
> metadata writes to the filesystem such that we can guarantee we can
> always recover to a consistent point after a crash. By skipping recovery
> of a dirty log, you are actively bypassing that mechanism.
>
> For example, if a filesystem transaction modifies several objects, those
> objects are logged in a transaction and committed to the physical log.
> Once the transaction is committed to the physical log, the individual
> objects are free to be written back in any arbitrary order because of
> the transactional guarantee that log recovery provides. So nothing
> prevents one object from being written back while another is reused (and
> re-pinned) before a crash that leaves the filesystem in a corrupted
> state. Log recovery is required to update the associated metadata
> objects and make the fs consistent again.
>
> In short, it's probably safer to assume any filesystem mounted with a
> dirty log and norecovery is in fact corrupted as opposed to the other
> way around.
>
> Brian
>
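
The ordering hazard described above can be illustrated with a small toy
model in Python. It is a deliberate simplification and not how XFS
stores metadata: the two fields, the "sum to 100" invariant, and the
helper names are all made up for this sketch. It shows one committed
transaction whose objects are only partially written back before a
crash, and compares the on-disk view without recovery to the view after
replaying the log:

```python
TOTAL = 100  # toy invariant: inode_size + free_blocks must equal TOTAL

disk = {"inode_size": 0, "free_blocks": 100}  # consistent starting state
log = []                                      # committed transactions, in order


def commit(changes):
    """Record a transaction in the log; nothing reaches the disk yet."""
    log.append(dict(changes))


def write_back(key):
    """Write back one object, using its most recently logged value."""
    for txn in log:
        if key in txn:
            disk[key] = txn[key]


def replay(disk_image, committed):
    """Log recovery: reapply every committed transaction in order."""
    recovered = dict(disk_image)
    for txn in committed:
        recovered.update(txn)
    return recovered


def consistent(image):
    return image["inode_size"] + image["free_blocks"] == TOTAL


# One transaction modifies two objects and is committed to the log.
commit({"inode_size": 10, "free_blocks": 90})

# Writeback order is arbitrary: only one object reaches disk, then a crash.
write_back("inode_size")

norecovery_view = dict(disk)        # what a norecovery mount would see
recovered_view = replay(disk, log)  # what a mount with log replay would see

print("norecovery consistent:", consistent(norecovery_view))
print("recovered consistent: ", consistent(recovered_view))
```

In this model the norecovery view mixes new and old object states and
violates the invariant, while replaying the log restores it.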
>>> Our customers' use often requires nonvolatile memory to be write-protected
>>> or not based on the device being installed in a development or deployed
>>> system. It is ideal for them to be able to mount their filesystems read-
>>> write when possible and read-only when not without having to alter mount
>>> options.
>> From my example above, I'd like to understand more why/how you have a
>> clean log that fails to mount by default on a readonly block device...
>> in my testing, no writes get sent to the device when mounting a clean
>> log.
>>
>> -Eric
>>
-- 
Vincent Fazio
Embedded Software Engineer - Linux
Extreme Engineering Solutions, Inc
http://www.xes-inc.com

Thread overview: 9+ messages
2020-02-10 21:10 [PATCH 1/1] xfs: fallback to readonly during recovery Vincent Fazio
2020-02-10 21:43 ` Eric Sandeen
2020-02-10 22:31   ` Aaron Sierra
2020-02-10 23:40     ` Eric Sandeen
2020-02-11 12:55       ` Brian Foster
2020-02-11 14:04         ` Vincent Fazio [this message]
2020-02-11 14:29           ` Eric Sandeen
2020-02-11 15:10             ` Darrick J. Wong
2020-02-11 20:04           ` Dave Chinner