From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
"Darrick J. Wong" <djwong@kernel.org>,
axboe@kernel.dk, kbusch@kernel.org, sagi@grimberg.me,
jejb@linux.ibm.com, martin.petersen@oracle.com,
viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
jack@suse.cz, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org,
ming.lei@redhat.com, bvanassche@acm.org, ojaswin@linux.ibm.com
Subject: Re: [PATCH v2 00/16] block atomic writes
Date: Wed, 10 Jan 2024 08:55:06 +0000
Message-ID: <aaa33b4f-dea7-4596-82ce-8c7e6cdaa6ef@oracle.com>
In-Reply-To: <ZZ3Q4GPrKYo91NQ0@dread.disaster.area>
On 09/01/2024 23:04, Dave Chinner wrote:
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -118,7 +118,8 @@ struct fsxattr {
>> __u32 fsx_nextents; /* nextents field value (get) */
>> __u32 fsx_projid; /* project identifier (get/set) */
>> __u32 fsx_cowextsize; /* CoW extsize field value (get/set)*/
>> - unsigned char fsx_pad[8];
>> + __u32 fsx_atomicwrites_size; /* unit max */
>> + unsigned char fsx_pad[4];
>> };
>>
>> /*
>> @@ -140,6 +141,7 @@ struct fsxattr {
>> #define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
>> #define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
>> #define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
>> +#define FS_XFLAG_ATOMICWRITES 0x00020000
>> #define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
>>
>> /* the read-only stuff doesn't really belong here, but any other place is
>>
>> Having FS_XFLAG_ATOMICWRITES set will lead to FMODE_CAN_ATOMIC_WRITE being
>> set.
>>
>> So a user can issue:
>>
>>> xfs_io -c "atomic-writes 64K" mnt/file
>>> xfs_io -c "atomic-writes" mnt/file
>> [65536] mnt/file
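A note on the quoted fsxattr change: FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR encode sizeof(struct fsxattr) in the ioctl number, so taking fsx_atomicwrites_size out of fsx_pad only keeps the ABI intact if the struct size and the existing field offsets stay the same. The sketch below checks this at compile time on local copies of the old and new layouts. The names fsxattr_old, fsxattr_new and fsxattr_size() are hypothetical, and uint32_t stands in for __u32 so that it builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local copy of struct fsxattr before the quoted change. */
struct fsxattr_old {
	uint32_t fsx_xflags;
	uint32_t fsx_extsize;
	uint32_t fsx_nextents;
	uint32_t fsx_projid;
	uint32_t fsx_cowextsize;
	unsigned char fsx_pad[8];
};

/* Hypothetical local copy of struct fsxattr after the quoted change. */
struct fsxattr_new {
	uint32_t fsx_xflags;
	uint32_t fsx_extsize;
	uint32_t fsx_nextents;
	uint32_t fsx_projid;
	uint32_t fsx_cowextsize;
	uint32_t fsx_atomicwrites_size;	/* unit max */
	unsigned char fsx_pad[4];
};

/* The ioctl numbers embed the struct size, so it must not change. */
_Static_assert(sizeof(struct fsxattr_old) == sizeof(struct fsxattr_new),
	       "fsxattr size must not change");

/* The new field must sit exactly where the old padding started. */
_Static_assert(offsetof(struct fsxattr_old, fsx_pad) ==
	       offsetof(struct fsxattr_new, fsx_atomicwrites_size),
	       "new field must occupy the old padding");

/* Hypothetical helper so the layout can also be checked at run time. */
size_t fsxattr_size(void)
{
	return sizeof(struct fsxattr_new);
}
```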
> Where are you going to store this value in the inode? It requires a
> new field in the inode and so is a change of on-disk format, right?
It would require an on-disk format change, unless we can find an
alternative way to store the value, such as:
a. re-use the pre-existing extsize or even cowextsize field; 'xfs_io -c
"atomic-writes $SIZE"' would update that field, and
FS_XFLAG_ATOMICWRITES would be incompatible with FS_XFLAG_COWEXTSIZE or
FS_XFLAG_EXTSIZE
b. require that FS_XFLAG_EXTSIZE and extsize also be set to enable atomic
writes, with extsize used as the atomic write unit max
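A minimal sketch of the validation option (b) might imply when the flag is set. It assumes the unit max must be a power of two, a multiple of the filesystem block size, and no larger than the device's reported atomic write unit max. The helper atomicwrites_setting_ok() and its parameters are hypothetical. FS_XFLAG_EXTSIZE uses the existing uapi value, while FS_XFLAG_ATOMICWRITES is only the value proposed in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Existing uapi value for the extent size hint flag. */
#define FS_XFLAG_EXTSIZE	0x00000800
/* Value proposed in the quoted diff, not a merged uapi constant. */
#define FS_XFLAG_ATOMICWRITES	0x00020000

static bool is_pow2_u32(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical check for option (b): atomic writes require an extent size
 * hint, and that hint doubles as the atomic write unit max.
 */
bool atomicwrites_setting_ok(uint32_t xflags, uint32_t extsize,
			     uint32_t blksz, uint32_t hw_unit_max)
{
	if (!(xflags & FS_XFLAG_ATOMICWRITES))
		return true;		/* nothing to validate */
	if (!(xflags & FS_XFLAG_EXTSIZE))
		return false;		/* option (b): extsize must be set too */
	if (blksz == 0 || !is_pow2_u32(extsize) || extsize % blksz != 0)
		return false;		/* unit max must be a pow2 block multiple */
	return extsize <= hw_unit_max;	/* device must support this unit */
}
```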
I have tried to think of ways to avoid requiring a value, but I don't see
any good options. Candidates would be:
- make the atomic write unit max a compile-time option
- require mkfs stripe alignment/width to be set and use that as the basis
for the atomic write unit max
We could just use the atomic write unit max that the HW provides, but that
could be 1MB or more, which will hardly give efficient space usage for
small files. Maybe we don't care about that if we expect this feature to
be used only on DB files, which can be huge anyway. However, I still have
a concern: we require that value to be fixed, but a disk firmware update
could increase it, which could leave us with pre-existing extents that are
misaligned for the new value.
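To make the firmware concern concrete, here is a sketch assuming the unit max is a power of two and that an extent can only service unit-max atomic writes if both its disk offset and its length are multiples of the unit max. The struct extent and count_misaligned() names are hypothetical. Extents allocated under a 64K unit max all pass, but the same extents checked against a 1M unit max mostly do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent: byte offset and length on disk. */
struct extent {
	uint64_t disk_offset;
	uint64_t len;
};

/*
 * Count extents that are not aligned to unit_max. unit_max is assumed to
 * be a power of two, so "& (unit_max - 1)" is the remainder.
 */
size_t count_misaligned(const struct extent *ext, size_t n, uint64_t unit_max)
{
	uint64_t mask = unit_max - 1;
	size_t bad = 0;

	for (size_t i = 0; i < n; i++) {
		if ((ext[i].disk_offset & mask) != 0 || (ext[i].len & mask) != 0)
			bad++;
	}
	return bad;
}
```

The design point is that alignment is only guaranteed relative to the unit max in force at allocation time; raising the unit max later cannot retroactively realign extents that already exist.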
>
> As it is, I really don't see this as a better solution than the
> original generic "force align" flag that simply makes the extent
> size hint alignment a hard physical alignment requirement rather
> than just a hint. This has multiple uses (DAX PMD alignment is
> another), so I just don't see why something that has a single,
> application specific API that implements a hard physical alignment
> is desirable.
I would still hope that we will support forcealign separately for those
purposes.
>
> Indeed, the whole reason that extent size hints are so versatile is
> that they implement a generic allocation alignment/size function
> that can be used for anything your imagination extends to. If they
> were implemented as a "only allow RAID stripe aligned/sized
> allocation" for the original use case then that functionality would
> have been far less useful than it has proven to be over the past
> couple of decades.
>
> Hence history teaches us that we should be designing the API around
> the generic filesystem function required (hard alignment of physical
> extent allocation), not the specific use case that requires that
> functionality.
I understand your concern. However, I am not even sure that forcealign
gives us everything we want to enable atomic writes. There is an issue
where we were required to pre-zero a file prior to issuing atomic writes,
to ensure that extents are suitably sized, and FS_XFLAG_ATOMICWRITES would
make the FS do what is required to avoid that pre-zeroing (although that
pre-zeroing requirement does sound like a forcealign issue...).
Furthermore, there was some desire to support atomic writes on block
devices with no HW support by using a CoW-based solution, and forcealign
would not be relevant there.
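For the pre-zeroing point, a rough sketch of why extent layout matters. It assumes the block-layer constraints from this series (power-of-two length within [unit_min, unit_max], naturally aligned offset) and adds a filesystem-side condition: the whole range must be backed by one extent. The sketch conservatively treats every extent boundary as a break, even where two extents happen to be physically contiguous. can_issue_hw_atomic() and struct extent are hypothetical. A file written in small pieces can end up with 64K extents that block a 128K atomic write, which is roughly the situation pre-zeroing works around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified file mapping: one extent in file offsets. */
struct extent {
	uint64_t file_off;	/* start offset in the file */
	uint64_t len;		/* length in bytes */
};

static bool is_pow2_u64(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Can [pos, pos + len) be issued as a single HW atomic write? */
bool can_issue_hw_atomic(const struct extent *ext, size_t n,
			 uint64_t pos, uint64_t len,
			 uint64_t unit_min, uint64_t unit_max)
{
	if (!is_pow2_u64(len) || len < unit_min || len > unit_max)
		return false;		/* size outside supported units */
	if (pos & (len - 1))
		return false;		/* not naturally aligned */
	for (size_t i = 0; i < n; i++) {
		/* The whole range must fall inside a single extent. */
		if (pos >= ext[i].file_off &&
		    pos + len <= ext[i].file_off + ext[i].len)
			return true;
	}
	return false;
}
```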
Thanks,
John