Re: [PATCH RFC 5/6] fs: xfs: iomap atomic write support

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	hch@lst.de, viro@zeniv.linux.org.uk, brauner@kernel.org,
	dchinner@redhat.com, jack@suse.cz, chandan.babu@oracle.com,
	martin.petersen@oracle.com, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tytso@mit.edu, jbongio@google.com, ojaswin@linux.ibm.com
Subject: Re: [PATCH RFC 5/6] fs: xfs: iomap atomic write support
Date: Thu, 15 Feb 2024 09:53:47 +0000	[thread overview]
Message-ID: <c0d1d513-991b-4893-a132-66ee02c0c880@oracle.com> (raw)
In-Reply-To: <Zc1GwE/7QJisKZCX@dread.disaster.area>


>>
>> Yes, I was just looking at adding a mkfs option to format for atomic writes,
>> which would check physical information about the volume and whether it suits
>> rtextsize and then subsequently also set XFS_SB_FEAT_RO_COMPAT_ATOMICWRITES.
> 
> FWIW, atomic writes need to be implemented in XFS in a way that
> isn't specific to the rtdev. There is no reason they cannot be
> applied to the data device (via superblock max atomic write size
> field or extent size hints and the align flag) so
> please don't get hung up on rtextsize as the only thing that atomic
> write alignment might apply to.

Sure

> 
>>> Yes, mkfs allows the user to override the hardware configsi it
>>> probes, but it also warns when the override is doing something
>>> sub-optimal (like aligning all AG headers to the same disk in a
>>> stripe).
>>>
>>> IOWs, mkfs should be pulling this atomic write info from the
>>> hardware and configuring the filesysetm around that information.
>>> That's the target we should be aiming the kernel implementation at
>>> and optimising for - a filesystem that is correctly configured
>>> according to published hardware capability.
>>
>> Right
>>
>> So, for example, if the atomic writes option is set and the rtextsize set by
>> the user is so much larger than what HW can support in terms of atomic
>> writes, then we should let the user know about this.
> 
> Well, this is part of the problem I mention above: you're focussing
> entirely on the rtdev setup and not the general "atomic writes
> require BMBT extent alignment constraints".

I'm really just saying what I was considering based on this series only.

> 
> So, maybe, yes, we might want to warn that the rtextsize is much
> bigger than the atomic write size of that device, but now there's
> something else we need to take into account: the rtdev could have a
> different atomic write size comxpapred to the data device.
> 
> What now?
> 
> IOWs, focussing on the rtdev misses key considerations for making
> the functionality generic, and we most definitely don't want to have
> to rev the on disk format a second time to support atomic writes
> for the data device. Hence we likely need two variables for atomic
> write sizes in the superblock - one for the rtdev, and one for the
> data device. And this then feeds through to Darrick's multi-rtdev
> stuff - each rtdev will need to have an attribute that tracks this
> information.

ok


>>>
>>> What the patchset does is try to extend and infer things from
>>> existing allocation alignment constraints, but then not apply those
>>> constraints to critical extent state operations (pure BMBT
>>> modifications) that atomic writes also need constrained to work
>>> correctly and efficiently.
>>
>> Right. Previously I also did mention that we could explicitly request the
>> atomic write size per-inode, but a drawback is that this would require an
>> on-disk format change.
> 
> We're already having to change the on-disk format for this (inode
> flag, superblock feature bit), so we really should be trying to make
> this generic and properly featured so that it can be used for
> anything that requires physical alignment of file data extents, not
> just atomic writes...

ok

...

>> Another motivation for this flag is that we can explicitly enable some
>> software-based atomic write support for an inode when the backing device
>> does not have HW support.
> 
> That's orthogonal to the aligment issue. If the BMBT extents are
> always aligned in a way that is compatible with atomic writes, we
> don't need and aomtic writes flag to tell the filesystem it should
> do an atomic write. 

Any instruction to do an atomic write should be encoded in the userspace 
write operation. Or maybe the file open operation in future - I still 
get questions about O_ATOMIC.

> That comes from userspace via the IOCB_ATOMIC
> flag.
> 
> It is the IOCB_ATOMIC that triggers the software fallback when the
> hardware doesn't support atomic writes, not an inode flag. 

To me, any such fallback seems like something which we should be 
explicitly enabling.

> All the
> filesystem has to do is guarantee all extent manipulations are
> correctly aligned and IOCB_ATOMIC can always be executed regardless
> of whether it is hardware or software that does it.
> 
> 
>> In addition, in this series setting FS_XFLAG_ATOMICWRITES means
>> XFS_DIFLAG2_ATOMICWRITES gets set, and I would expect it to do something
>> similar for other OSes, and for those other OSes it may also mean some other
>> special alignment feature enabled. We want a consistent user experience.
> 
> I don't care about other OSes - they don't implement the
> FS_IOC_FSSETXATTR interfaces, so we just don't care about cross-OS
> compatibility for the user API.

Other FSes need to support FS_IOC_FSSETXATTR for atomic writes. Or at 
least should support it....

> 
> Fundamentally, atomic writes are *not a property of the filesystem
> on-disk format*. They require extent tracking constraints (i.e.
> alignment), and that's the property of the filesystem on-disk format
> that we need to manipulate here.
> 
> Users are not going to care if the setup ioctl for atomic writes
> is to set FS_XFLAG_ALIGN_EXTENTS vs FS_XFLAG_ATOMICWRITES. They
> already know they have to align their IO properly for atomic writes,
> so it's not like this is something they will be completely
> unfamiliar with.
> 
> Indeed, FS_XFLAG_ALIGN_EXTENTS | FS_XFLAG_EXTSIZE w/ fsx.fsx_extsize
> = max_atomic_write_size as a user interface to set up the inode for
> atomic writes is pretty concise and easy to use. It also isn't
> specific to atomic writes, and so this fucntionality can be used for
> anything that needs aligned extent manipulation for performant
> functionality.

This is where there seems to be a difference between how you would like 
atomic writes to be enabled and how Christoph would, judging by this:
https://lore.kernel.org/linux-fsdevel/20240110091929.GA31003@lst.de/

I think that if the FS and the userspace util can between them figure 
out what to do, then that is ok. This is something like what I proposed 
previously:

xfs_io -c "atomic-writes 64K" mnt/file

where the userspace util would use for the FS_IOC_FSSETXATTR ioctl:

FS_XFLAG_ATOMIC_WRITES | FS_XFLAG_ALIGN_EXTENTS | FS_XFLAG_EXTSIZE w/ 
fsx.fsx_extsize

There is a question on the purpose of the FS_XFLAG_ATOMIC_WRITES flag, 
but, to me, it does seem useful for future feature support.

We could just have FS_XFLAG_ATOMIC_WRITES | FS_XFLAG_EXTSIZE w/ 
fsx.fsx_extsize, and the kernel auto-enables FS_XFLAG_ALIGN_EXTENTS, but 
the other way seems better


> 
>>> to behave in a particular way - forced alignment - not that atomic
>>> writes are being used in the filesystem....
>>>
>>> At this point, the filesystem can do all the extent modification
>>> alignment stuff that atomic writes require without caring if the
>>> block device supports atomic writes or even if the
>>> application is using atomic writes.
>>>
>>> This means we can test the BMBT functionality in fstests without
>>> requiring hardware (or emulation) that supports atomic writes - all
>>> we need to do is set the forced align flag, an extent size hint and
>>> go run fsx on it...
>>>
>>
>> The current idea was that the forcealign feature would be required for the
>> second phase for atomic write support - non-rtvol support. Darrick did send
>> that series out separately over the New Year's break.
> 
> This is the wrong approach: this needs to be integrated into the
> same patchset so we can review the changes for atomic writes as a
> whole, not as two separate, independent on-disk format changes. The
> on-disk format change that atomic writes need is aligned BMBT extent
> manipulations, regardless of whether we are targeting the rtdev or
> the datadev, and trying to separate them is leading you down
> entirely the wrong path.
> 

ok, fine.

BTW, going back to the original discussion on the extent zeroing and 
your idea to do this in the iomap dio path, my impression is that we 
require changes like this:

- in iomap_dio_bio_iter(), we need to zero out the extent according to 
extsize (and not just FS block size)
- xfs_dio_write_end_io() -> xfs_iomap_write_unwritten() also needs to 
consider this extent being written, and not assume a FS block
- per-inode locking similar to what is done in 
xfs_file_dio_write_unaligned() - I need to check that further, though, 
as I am not yet sure on how we decide to use this exclusive lock or not.

Does this sound sane?

Thanks,
John

next prev parent reply	other threads:[~2024-02-15  9:54 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-24 14:26 [PATCH 0/6] block atomic writes for XFS John Garry
2024-01-24 14:26 ` [PATCH 1/6] fs: iomap: Atomic write support John Garry
2024-02-02 17:25   ` Darrick J. Wong
2024-02-05 11:29     ` John Garry
2024-02-13  6:55       ` Christoph Hellwig
2024-02-13  8:20         ` John Garry
2024-02-15 11:08           ` John Garry
2024-02-13 18:08       ` Darrick J. Wong
2024-02-05 15:20   ` Pankaj Raghav (Samsung)
2024-02-05 15:41     ` John Garry
2024-01-24 14:26 ` [PATCH 2/6] fs: Add FS_XFLAG_ATOMICWRITES flag John Garry
2024-02-02 17:57   ` Darrick J. Wong
2024-02-05 12:58     ` John Garry
2024-02-13  6:56       ` Christoph Hellwig
2024-02-13 17:08       ` Darrick J. Wong
2024-01-24 14:26 ` [PATCH 3/6] fs: xfs: Support FS_XFLAG_ATOMICWRITES for rtvol John Garry
2024-02-02 17:52   ` Darrick J. Wong
2024-02-03  7:40     ` Ojaswin Mujoo
2024-02-05 12:51     ` John Garry
2024-02-13 17:22       ` Darrick J. Wong
2024-02-14 12:19         ` John Garry
2024-01-24 14:26 ` [PATCH 4/6] fs: xfs: Support atomic write for statx John Garry
2024-02-02 18:05   ` Darrick J. Wong
2024-02-05 13:10     ` John Garry
2024-02-13 17:37       ` Darrick J. Wong
2024-02-14 12:26         ` John Garry
2024-02-09  7:00   ` Ojaswin Mujoo
2024-02-09 17:30     ` John Garry
2024-02-12 11:48       ` Ojaswin Mujoo
2024-02-12 12:05       ` Ojaswin Mujoo
2024-01-24 14:26 ` [PATCH RFC 5/6] fs: xfs: iomap atomic write support John Garry
2024-02-02 18:47   ` Darrick J. Wong
2024-02-05 13:36     ` John Garry
2024-02-06  1:15       ` Dave Chinner
2024-02-06  9:53         ` John Garry
2024-02-07  0:06           ` Dave Chinner
2024-02-07 14:13             ` John Garry
2024-02-09  1:40               ` Dave Chinner
2024-02-09 12:47                 ` John Garry
2024-02-13 23:41                   ` Dave Chinner
2024-02-14 11:06                     ` John Garry
2024-02-14 23:03                       ` Dave Chinner
2024-02-15  9:53                         ` John Garry [this message]
2024-02-13 17:50       ` Darrick J. Wong
2024-02-14 12:13         ` John Garry
2024-01-24 14:26 ` [PATCH 6/6] fs: xfs: Set FMODE_CAN_ATOMIC_WRITE for FS_XFLAG_ATOMICWRITES set John Garry
2024-02-02 18:06   ` Darrick J. Wong
2024-02-05 10:26     ` John Garry
2024-02-13 17:59       ` Darrick J. Wong
2024-02-14 12:36         ` John Garry
2024-02-21 17:00           ` Darrick J. Wong
2024-02-21 17:38             ` John Garry
2024-02-24  4:18               ` Darrick J. Wong
2024-02-09  7:14 ` [PATCH 0/6] block atomic writes for XFS Ojaswin Mujoo
2024-02-09  9:22   ` John Garry
2024-02-12 12:06     ` Ojaswin Mujoo
2024-02-13  7:22 ` Christoph Hellwig
2024-02-13 17:55   ` Darrick J. Wong
2024-02-14  7:45     ` Christoph Hellwig
2024-02-21 16:56       ` Darrick J. Wong
2024-02-23  6:57         ` Christoph Hellwig
2024-02-13 23:50   ` Dave Chinner
2024-02-14  7:38     ` Christoph Hellwig
2024-02-13  7:45 ` Ritesh Harjani
2024-02-13  8:41   ` John Garry
2024-02-13  9:10     ` Ritesh Harjani
2024-02-13 22:49     ` Dave Chinner
2024-02-14 10:10       ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c0d1d513-991b-4893-a132-66ee02c0c880@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jbongio@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ojaswin@linux.ibm.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).