From: John Garry <john.g.garry@oracle.com>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>, lsf-pc@lists.linux-foundation.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
djwong@kernel.org, dchinner@redhat.com, hch@lst.de,
ritesh.list@gmail.com, jack@suse.cz, tytso@mit.edu,
linux-ext4@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] extsize and forcealign design in filesystems for atomic writes
Date: Wed, 29 Jan 2025 08:59:15 +0000
Message-ID: <35939b19-088b-450e-8fa6-49165b95b1d3@oracle.com>
In-Reply-To: <Z5nTaQgLGdD6hSvL@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
On 29/01/2025 07:06, Ojaswin Mujoo wrote:
Hi Ojaswin,
>
> I would like to submit a proposal to discuss the design of extsize and
> forcealign and various open questions around it.
>
> ** Background **
>
> Modern NVMe/SCSI disks with atomic write capabilities can write a
> multi-KB range on disk atomically. This feature has a wide variety of use
> cases, especially for databases like MySQL and PostgreSQL, which can leverage
> atomic writes for significant performance gains. However, in order to enable
> atomic writes on Linux, the underlying disk may impose size and alignment
> constraints that upper layers such as filesystems must follow. extsize with
> forcealign is one of the ways filesystems can make sure the IO submitted to the
> disk adheres to the atomic write constraints.
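A minimal sketch of the kind of constraint meant here, assuming the rules Linux applies to RWF_ATOMIC direct I/O writes: the length must be a power of two between the device's minimum and maximum atomic write unit (reported by statx in stx_atomic_write_unit_min / stx_atomic_write_unit_max), and the file offset must be naturally aligned to that length. The helper names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def atomic_write_ok(offset: int, length: int, unit_min: int, unit_max: int) -> bool:
    """Check a write against an assumed RWF_ATOMIC-style model:
      * length is a power of two within [unit_min, unit_max]
      * offset is naturally aligned to length
    """
    if not is_power_of_two(length):
        return False
    if length < unit_min or length > unit_max:
        return False
    return offset % length == 0
```

For example, with 4 KiB to 64 KiB atomic units, a 16 KiB write at offset 16 KiB passes, while the same write at offset 8 KiB does not.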
>
> extsize is a hint to the FS to allocate extents at a certain logical alignment
> and size. forcealign builds on this by forcing the allocator to enforce the
> alignment guarantees for physical blocks as well, which is essential for atomic
> writes.
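To make the logical vs physical distinction concrete, here is a hypothetical model with extents expressed in filesystem blocks. It only illustrates the idea; it is not the XFS or ext4 allocator logic.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """One file mapping, in filesystem blocks (hypothetical model)."""
    lblk: int  # logical start block within the file
    pblk: int  # physical start block on disk
    len: int   # number of blocks


def extsize_aligned_range(lblk: int, count: int, extsize: int) -> tuple[int, int]:
    """Round a logical block range out to extsize boundaries, i.e. what an
    extsize hint asks the allocator to try for (it may not always succeed)."""
    start = (lblk // extsize) * extsize
    end = -(-(lblk + count) // extsize) * extsize  # ceiling division
    return start, end - start


def forcealign_ok(ext: Extent, extsize: int) -> bool:
    """forcealign additionally requires the physical start to be aligned,
    not just the logical one."""
    return ext.lblk % extsize == 0 and ext.pblk % extsize == 0
```

Integer division and modulo keep this model valid for non-power-of-2 extsize values; an implementation limited to powers of 2 could use bit masks instead.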
>
> ** Points of discussion **
>
> The extsize hint feature is already supported by XFS [1], with forcealign still
> under development and discussion [2].
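For reference, an extsize hint can already be set on XFS through the FS_IOC_FSSETXATTR ioctl described in [1] (xfs_io's `extsize` command does the same). The sketch below assumes the generic _IOC ioctl-number encoding used on x86 and arm; architectures such as powerpc, mips and sparc encode ioctl numbers differently. It only sets the existing FS_XFLAG_EXTSIZE flag; no forcealign flag is used, since that interface is still under discussion. The function name is hypothetical.

```python
import fcntl
import os
import struct

# struct fsxattr from <linux/fs.h>: fsx_xflags, fsx_extsize, fsx_nextents,
# fsx_projid, fsx_cowextsize (all __u32), followed by 8 bytes of padding.
FSXATTR_FMT = "=5I8x"
FSXATTR_SIZE = struct.calcsize(FSXATTR_FMT)  # 28 bytes

FS_XFLAG_EXTSIZE = 0x00000800  # extent size allocator hint


def _ioc(direction: int, kind: str, nr: int, size: int) -> int:
    """Generic Linux _IOC encoding (x86, arm, ...); powerpc/mips/sparc differ."""
    return (direction << 30) | (size << 16) | (ord(kind) << 8) | nr


FS_IOC_FSGETXATTR = _ioc(2, "X", 31, FSXATTR_SIZE)  # _IOR('X', 31, struct fsxattr)
FS_IOC_FSSETXATTR = _ioc(1, "X", 32, FSXATTR_SIZE)  # _IOW('X', 32, struct fsxattr)


def set_extsize_hint(path: str, extsize_bytes: int) -> None:
    """Set an extsize hint in bytes (a multiple of the FS block size).

    XFS is expected to reject a changed extsize on a regular file that already
    has allocated extents, so call this on a freshly created, empty file.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(FSXATTR_SIZE)
        fcntl.ioctl(fd, FS_IOC_FSGETXATTR, buf)  # mutable buffer filled in place
        xflags, _old, nextents, projid, cowextsize = struct.unpack(FSXATTR_FMT, buf)
        new = struct.pack(FSXATTR_FMT, xflags | FS_XFLAG_EXTSIZE, extsize_bytes,
                          nextents, projid, cowextsize)
        fcntl.ioctl(fd, FS_IOC_FSSETXATTR, new)
    finally:
        os.close(fd)
```

For example, set_extsize_hint(path, 64 * 1024) on a newly created file on XFS would request 64 KiB extent allocation.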

As discussed in the thread at
https://lore.kernel.org/linux-xfs/20241212013433.GC6678@frogsfrogsfrogs/
the alternative to forcealign for XFS is to use a software-emulated
fallback for unaligned atomic writes. I am looking at a PoC
implementation now. Note that this does rely on CoW.

There has been pushback on forcealign for XFS, so we need to
prove or disprove that this software-emulated fallback can work; see
https://lore.kernel.org/linux-xfs/20240924061719.GA11211@lst.de/

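One way the choice between the hardware path and a software-emulated fallback could be expressed, as a hypothetical sketch (it is not the PoC mentioned above, whose design is not described here): issue a hardware atomic write only when the whole range sits inside one extent whose physical start is naturally aligned to the write length, and otherwise take a CoW path that writes new blocks and then remaps them atomically.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """A hypothetical file-to-disk mapping, in filesystem blocks."""
    lblk: int  # logical start block in the file
    pblk: int  # physical start block on disk
    len: int   # length in blocks


def choose_atomic_path(maps: list[Mapping], lblk: int, count: int) -> str:
    """Return "hw" when [lblk, lblk + count) lies inside a single extent whose
    physical start is naturally aligned to count blocks, else "cow".

    count is assumed to be a power of two that already passed the device's
    atomic write unit checks. For simplicity, a range spanning two extents is
    always sent to the fallback, even if they are physically contiguous.
    """
    for m in maps:
        if m.lblk <= lblk and lblk + count <= m.lblk + m.len:
            pstart = m.pblk + (lblk - m.lblk)
            return "hw" if pstart % count == 0 else "cow"
    # Hole or multi-extent range: emulate the atomic write via CoW.
    return "cow"
```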
> After taking a look at ext4's multi-block
> allocator design, extsize with forcealign can be supported in ext4 as
> well. An RFC has been posted which adds support for the extsize hint feature in
> ext4 [3]. However, there are some caveats and deviations from the XFS design. With
> these in mind, I would like to propose an LSF/MM topic on:
>
> * The exact semantics of extsize with forcealign, so that ext4, XFS and
> possibly any other FS that plans to implement them in the future share a
> consistent interface.
>
> * Documenting how forcealign with extsize should behave with various FS
> operations like fallocate, truncate, punch hole, insert/collapse range, etc.
>
> * Implementing extsize with delayed allocation and the challenges there.
>
> * Tooling support for forcealign: how do we plan to maintain block
> alignment guarantees during fsck, resize and other operations that might
> need to move blocks around?
>
> * Documenting any areas where filesystems might differ in their
> implementations. For example, ext4 does not plan to support non-power-of-2
> extsizes, whereas XFS supports them.
>
> Hopefully this discussion will help define consistent semantics for
> extsize hints and forcealign, which may also prove useful to other FS
> developers.
>
> Thoughts and suggestions are welcome.
>
> References:
> [1] https://man7.org/linux/man-pages/man2/ioctl_xfs_fsgetxattr.2.html
> [2] https://lore.kernel.org/linux-xfs/20240813163638.3751939-1-john.g.garry@oracle.com/
> [3] https://lore.kernel.org/linux-ext4/cover.1733901374.git.ojaswin@linux.ibm.com/
>
> Regards,
> ojaswin