Re: [PATCH v4 00/14] forcealign for xfs

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
	chandan.babu@oracle.com, djwong@kernel.org, dchinner@redhat.com,
	hch@lst.de, viro@zeniv.linux.org.uk, brauner@kernel.org,
	jack@suse.cz, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	catherine.hoang@oracle.com, martin.petersen@oracle.com
Subject: Re: [PATCH v4 00/14] forcealign for xfs
Date: Wed, 18 Sep 2024 08:59:41 +0100	[thread overview]
Message-ID: <8e13fa74-f8f7-49d3-b640-0daf50da5acb@oracle.com> (raw)
In-Reply-To: <Zun+yci6CeiuNS2o@dread.disaster.area>

On 17/09/2024 23:12, Dave Chinner wrote:
> On Mon, Sep 16, 2024 at 11:24:56AM +0100, John Garry wrote:
>> On 16/09/2024 08:03, Dave Chinner wrote:
>>> OTOH, we can't do this with atomic writes. Atomic writes require
>>> some mkfs help because they require explicit physical alignment of
>>> the filesystem to the underlying storage.
>>
>> If we are enabling atomic writes at mkfs time, then we can ensure agsize %
>> extsize == 0. That provides the physical alignment guarantee. It also makes
>> sense to ensure extsize is a power-of-2.
> 
> No, mkfs does not want to align anything to "extsize". It needs to
> align the filesystem geometry to be compatible with the underlying
> block device atomic write alignment parameters.
> 
> We just don't care if extsize is not an exact multiple of agsize.
> As long as extsize is aligned to the atomic write boundaries and the
> start of the AG is aligned to atomic write boundaries, we can
> allocate hardware aligned extsize sized extents from the AG.
> 
> AGs are always going to contain lots of non-aligned, randomly sized
> extents for other stuff like metadata and unaligned file data.
> Aligned allocation is all about finding extsized aligned free space
> within the AG and has nothing to do with the size of the AG itself.

Fine, we can go the way of aligning the agsize to the atomic write unit 
max for mkfs.

> 
>> However, extsize is re-configurble per inode. So, for an inode enabled for
>> atomic writes, we must still ensure agsize % new extsize == 0 (and also new
>> extsize is a power-of-2)
> 
> Ensuring that the extsize is aligned to the hardware atomic write
> limits is a kernel runtime check when enabling atomic writes on an
> inode.
> 
> In this case, we do not care what the AG size is - it is completely
> irrelevant to these per-inode runtime checks because mkfs has
> already guaranteed that the AG is correctly aligned to the
> underlying hardware. That means is extsize is also aligned to the
> underlying hardware, physical extent layout is guaranteed to be
> compatible with the hardware constraints for atomic writes...

Sure, we would just need to enforce that extsize is a power-of-2 then.

> 
>>> Hence we'll eventually end
>>> up with atomic writes needing to be enabled at mkfs time, but force
>>> align will be an upgradeable feature flag.
>>
>> Could atomic writes also be an upgradeable feature? We just need to ensure
>> that agsize % extsize == 0 for an inode enabled for atomic writes.
> 
> To turn the superblock feature bit on, we have to check the AGs are
> correctly aligned to the *underlying hardware*. If they aren't
> correctly aligned (and there is a good chance they will not be)
> then we can't enable atomic writes at all. The only way to change
> this is to physically move AGs around in the block device (i.e. via
> xfs_expand tool I proposed).
 > > i.e. the mkfs dependency on having the AGs aligned to the underlying
> atomic write capabilities of the block device never goes away, even
> if we want to make the feature dynamically enabled.
> 
> IOWs, yes, an existing filesystem -could- be upgradeable, but there
> is no guarantee that is will be.
> 
> Quite frankly, we aren't going to see block devices that filesystems
> already exist on suddenly sprout support for atomic writes mid-life.

I would not be so sure. Some SCSI devices used in production which I 
know implicitly write 32KB atomically. And we would like to use them for 
atomic writes. 32KB is small and I guess that there is a small chance of 
pre-existing AGs not being 32KB aligned. I would need to check if there 
is even a min alignment for AGs...

> Hence if mkfs detects atomic write support in the underlying device,
> it should *always* modify the geometry to be compatible with atomic
> writes and enable atomic write support.

The current solution is to enable via commandline.

> 
> Yes, that means the "incompat with reflink" issue needs to be fixed
> before we take atomic writes out of experimental (i.e. we consistently
> apply the same "full support" criteria we applied to DAX).

In the meantime, if mkfs auto-enables atomic writes (when the HW 
supports), what will it do to reflink feature (in terms of enabling)?

> 
> Hence by the time atomic writes are a fully supported feature, we're
> going to be able to enable them by default at mkfs time for any
> hardware that supports them...
> 
>> Valid
>> extsize values may be quite limited, though, depending on the value of
>> agsize.
> 
> No. The only limit agsize puts on extsize is that a single aligned
> extent can't be larger than half the AG size. Forced alignment and
> atomic writes don't change that.
> 

ok

Thanks,
John

next prev parent reply	other threads:[~2024-09-18  8:00 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-13 16:36 [PATCH v4 00/14] forcealign for xfs John Garry
2024-08-13 16:36 ` [PATCH v4 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-08-23 16:28   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 02/14] xfs: always tail align maxlen allocations John Garry
2024-08-23 16:31   ` Darrick J. Wong
2024-08-29 17:58     ` John Garry
2024-08-29 21:34       ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 03/14] xfs: simplify extent allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 04/14] xfs: make EOF allocation simpler John Garry
2024-09-04 18:25   ` Ritesh Harjani
2024-09-05  7:51     ` John Garry
2024-08-13 16:36 ` [PATCH v4 05/14] xfs: introduce forced allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 06/14] xfs: align args->minlen for " John Garry
2024-08-13 16:36 ` [PATCH v4 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
2024-08-13 16:36 ` [PATCH v4 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 09/14] xfs: Update xfs_setattr_size() " John Garry
2024-08-13 16:36 ` [PATCH v4 10/14] xfs: Do not free EOF blocks " John Garry
2024-08-13 16:36 ` [PATCH v4 11/14] xfs: Only free full extents " John Garry
2024-08-13 16:36 ` [PATCH v4 12/14] xfs: Unmap blocks according to forcealign John Garry
2024-08-23 16:35   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 13/14] xfs: Don't revert allocated offset for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 14/14] xfs: Enable file data forcealign feature John Garry
2024-09-04 18:14 ` [PATCH v4 00/14] forcealign for xfs Ritesh Harjani
2024-09-04 23:20   ` Dave Chinner
2024-09-05  3:56     ` Ritesh Harjani
2024-09-05  6:33       ` Dave Chinner
2024-09-10  2:51         ` Ritesh Harjani
2024-09-16  6:33           ` Dave Chinner
2024-09-10 12:33         ` Ritesh Harjani
2024-09-16  7:03           ` Dave Chinner
2024-09-16 10:24             ` John Garry
2024-09-17 20:54               ` Darrick J. Wong
2024-09-17 23:34                 ` Dave Chinner
2024-09-17 22:12               ` Dave Chinner
2024-09-18  7:59                 ` John Garry [this message]
2024-09-23  2:57                   ` Dave Chinner
2024-09-23  3:33                     ` Christoph Hellwig
2024-09-23  8:16                       ` John Garry
2024-09-23 12:07                         ` Christoph Hellwig
2024-09-23 12:33                           ` John Garry
2024-09-24  6:17                             ` Christoph Hellwig
2024-09-24  9:48                               ` John Garry
2024-11-29 11:36                                 ` John Garry
2024-09-23  8:00                     ` John Garry
2024-09-05 10:15     ` John Garry
2024-09-05 21:47       ` Dave Chinner
2024-09-06 14:31         ` John Garry
2024-09-08 22:49           ` Dave Chinner
2024-09-09 16:18             ` John Garry
2024-09-16  5:25               ` Dave Chinner
2024-09-16  9:44                 ` John Garry
2024-09-17 22:27                   ` Dave Chinner
2024-09-18 10:12                     ` John Garry
2024-11-14 12:48                       ` Long Li
2024-11-14 16:22                         ` John Garry
2024-11-14 20:07                         ` Dave Chinner
2024-11-15  8:14                           ` John Garry
2024-11-15 11:20                           ` Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8e13fa74-f8f7-49d3-b640-0daf50da5acb@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=brauner@kernel.org \
    --cc=catherine.hoang@oracle.com \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ritesh.list@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).