linux-fsdevel.vger.kernel.org archive mirror
From: John Garry <john.g.garry@oracle.com>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>, lsf-pc@lists.linux-foundation.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	djwong@kernel.org, dchinner@redhat.com, hch@lst.de,
	ritesh.list@gmail.com, jack@suse.cz, tytso@mit.edu,
	linux-ext4@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] extsize and forcealign design in filesystems for atomic writes
Date: Wed, 29 Jan 2025 08:59:15 +0000
Message-ID: <35939b19-088b-450e-8fa6-49165b95b1d3@oracle.com>
In-Reply-To: <Z5nTaQgLGdD6hSvL@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>

On 29/01/2025 07:06, Ojaswin Mujoo wrote:

Hi Ojaswin,

> 
> I would like to submit a proposal to discuss the design of extsize and
> forcealign and various open questions around it.
> 
>   ** Background **
> 
> Modern NVMe/SCSI disks with atomic write capabilities allow a write to a
> multi-KB range on disk to complete atomically. This feature has a wide variety
> of use cases, especially for databases like MySQL and PostgreSQL, which can
> leverage atomic writes to gain significant performance. However, to enable
> atomic writes on Linux, the underlying disk may impose size and alignment
> constraints that upper layers such as filesystems must follow. extsize with
> forcealign is one way filesystems can ensure that the IO submitted to the
> disk adheres to the atomic write constraints.
> 
> extsize is a hint to the FS to allocate extents at a certain logical alignment
> and size. forcealign builds on this by forcing the allocator to enforce the
> alignment guarantees for physical blocks as well, which is essential for atomic
> writes.
> 
>   ** Points of discussion **
> 
> The extsize hint feature is already supported by XFS [1], while forcealign is
> still under development and discussion [2].

From the
https://lore.kernel.org/linux-xfs/20241212013433.GC6678@frogsfrogsfrogs/
thread, the alternative solution to forcealign for XFS is to use a
software-emulated fallback for unaligned atomic writes. I am looking at
a PoC implementation now. Note that this does rely on CoW.

There has been pushback on forcealign for XFS, so we need to
prove or disprove that this software-emulated fallback can work; see
https://lore.kernel.org/linux-xfs/20240924061719.GA11211@lst.de/

> Based on ext4's multi-block allocator design, extsize with forcealign can be
> supported in ext4 as well. There is an RFC that adds support for the extsize
> hint feature in ext4 [3]. However, there are some caveats and deviations from
> the XFS design. With these in mind, I would like to propose an LSF/MM topic on:
> 
>   * The exact semantics of extsize with forcealign, to provide a consistent
>     interface across ext4, XFS, and possibly any other FS that plans to
>     implement them in the future.
> 
>   * Documenting how forcealign with extsize should behave with various FS
>     operations like fallocate, truncate, punch hole, insert/collapse range, etc.
> 
>   * Implementing extsize with delayed allocation, and the challenges there.
> 
>   * Tooling support for forcealign: how do we plan to maintain block
>     alignment guarantees during fsck, resize, and other times when we might
>     need to move blocks around?
> 
>   * Documenting any areas where filesystems might differ in their
>     implementations. For example, ext4 doesn't plan to support non-power-of-2
>     extsizes, whereas XFS supports them.
> 
> Hopefully this discussion will help define consistent semantics for extsize
> hints and forcealign, which may also prove useful to other FS developers.
> 
> Thoughts and suggestions are welcome.
> 
> References:
> [1] https://man7.org/linux/man-pages/man2/ioctl_xfs_fsgetxattr.2.html
> [2] https://lore.kernel.org/linux-xfs/20240813163638.3751939-1-john.g.garry@oracle.com/
> [3] https://lore.kernel.org/linux-ext4/cover.1733901374.git.ojaswin@linux.ibm.com/
> 
> Regards,
> ojaswin

