* [PATCH v4 0/3] man2: Document RWF_ATOMIC
@ 2024-07-17 9:36 John Garry
From: John Garry @ 2024-07-17 9:36 UTC
To: alx
Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
martin.petersen, John Garry
Document the RWF_ATOMIC flag for pwritev2().

RWF_ATOMIC is used to enable torn-write protection.

We use the name RWF_ATOMIC because it is the legacy name of a similar
feature proposed in the past.

Kernel support has now been merged into Linus' tree, to be released in
v6.11.

Differences to v3:
- Formatting changes (Alex)
  - semantic newlines
  - Add missing .TP in statx
  - Combine description of atomic write unit min and max
  - misc others

Differences to v2:
- rebase

Differences to v1:
- Add statx max segments param
- Expand readv.2 description
- Document EINVAL

Himanshu Madhani (2):
  statx.2: Document STATX_WRITE_ATOMIC
  readv.2: Document RWF_ATOMIC flag

John Garry (1):
  io_submit.2: Document RWF_ATOMIC

man/man2/io_submit.2 | 19 +++++++++++
man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++
man/man2/statx.2 | 27 ++++++++++++++++
3 files changed, 122 insertions(+)
--
2.31.1

* [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC
From: John Garry @ 2024-07-17 9:36 UTC
To: alx
Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
martin.petersen, Himanshu Madhani, John Garry

From: Himanshu Madhani <himanshu.madhani@oracle.com>

Add the text to the statx man page.

Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
man/man2/statx.2 | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

diff --git a/man/man2/statx.2 b/man/man2/statx.2
index 3d47319c6..a7cdc0097 100644
--- a/man/man2/statx.2
+++ b/man/man2/statx.2
@@ -70,6 +70,11 @@ struct statx {
__u32 stx_dio_offset_align;
\&
__u64 stx_subvol; /* Subvolume identifier */
+\&
+ /* Direct I/O atomic write limits */
+ __u32 stx_atomic_write_unit_min;
+ __u32 stx_atomic_write_unit_max;
+ __u32 stx_atomic_write_segments_max;
};
.EE
.in
@@ -259,6 +264,9 @@ STATX_DIOALIGN Want stx_dio_mem_align and stx_dio_offset_align
STATX_MNT_ID_UNIQUE Want unique stx_mnt_id (since Linux 6.8)
STATX_SUBVOL Want stx_subvol
(since Linux 6.10; support varies by filesystem)
+STATX_WRITE_ATOMIC Want stx_atomic_write_unit_min, stx_atomic_write_unit_max,
+ and stx_atomic_write_segments_max.
+ (since Linux 6.11; support varies by filesystem)
.TE
.in
.P
@@ -463,6 +471,22 @@ Subvolumes are fancy directories,
i.e. they form a tree structure that may be walked recursively.
Support varies by filesystem;
it is supported by bcachefs and btrfs since Linux 6.10.
+.TP
+.I stx_atomic_write_unit_min
+.TQ
+.I stx_atomic_write_unit_max
+The minimum and maximum sizes (in bytes) supported for direct I/O
+.RB ( O_DIRECT )
+on the file to be written with torn-write protection.
+These values are each guaranteed to be a power-of-2.
+.TP
+.I stx_atomic_write_segments_max
+The maximum number of elements in an array of vectors for a write with
+torn-write protection enabled.
+See
+.BR RWF_ATOMIC
+flag for
+.BR pwritev2 (2).
.P
For further information on the above fields, see
.BR inode (7).
@@ -516,6 +540,9 @@ It cannot be written to, and all reads from it will be verified
against a cryptographic hash that covers the
entire file (e.g., via a Merkle tree).
.TP
+.BR STATX_ATTR_WRITE_ATOMIC " (since Linux 6.11)"
+The file supports torn-write protection.
+.TP
.BR STATX_ATTR_DAX " (since Linux 5.8)"
The file is in the DAX (cpu direct access) state.
DAX state attempts to
--
2.31.1
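A minimal C sketch of how a program might read the three limits documented above. It assumes glibc 2.28 or later, whose `<sys/stat.h>` (with `_GNU_SOURCE`) declares `statx()` and, when Linux 6.11 UAPI headers are installed, exposes the kernel's `struct statx` with the new fields. `struct atomic_limits`, `atomic_limits_usable()` and `query_atomic_limits()` are hypothetical names for illustration; treating zero values as "unsupported" is also an assumption of the sketch. The `#ifdef` lets it compile against older headers, where it simply reports no support.

```c
#define _GNU_SOURCE
#include <fcntl.h>     /* AT_FDCWD */
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>  /* statx(), struct statx, STATX_* (glibc >= 2.28) */

/* Hypothetical holder for the limits reported via STATX_WRITE_ATOMIC. */
struct atomic_limits {
    uint32_t unit_min;      /* stx_atomic_write_unit_min, in bytes */
    uint32_t unit_max;      /* stx_atomic_write_unit_max, in bytes */
    uint32_t segments_max;  /* stx_atomic_write_segments_max */
};

static bool is_pow2_u32(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Usable only if both units are non-zero powers of 2, min <= max, and at
 * least one iovec segment is allowed (zero values are treated as "no
 * support", which is an assumption of this sketch). */
bool atomic_limits_usable(const struct atomic_limits *l)
{
    return is_pow2_u32(l->unit_min) && is_pow2_u32(l->unit_max) &&
           l->unit_min <= l->unit_max && l->segments_max >= 1;
}

/* Fill *out from statx(2). Returns 0 if torn-write protection looks usable
 * for path, or -1 on error, old headers, or no filesystem support. */
int query_atomic_limits(const char *path, struct atomic_limits *out)
{
#ifdef STATX_WRITE_ATOMIC
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_WRITE_ATOMIC, &stx) != 0)
        return -1;
    if (!(stx.stx_mask & STATX_WRITE_ATOMIC))
        return -1;  /* kernel or filesystem did not report the fields */
    out->unit_min = stx.stx_atomic_write_unit_min;
    out->unit_max = stx.stx_atomic_write_unit_max;
    out->segments_max = stx.stx_atomic_write_segments_max;
    return atomic_limits_usable(out) ? 0 : -1;
#else
    (void)path;
    (void)out;
    return -1;  /* headers predate Linux 6.11 */
#endif
}
```

A caller would typically size its `O_DIRECT` writes from `unit_min`..`unit_max` only after `query_atomic_limits()` returns 0; the `STATX_ATTR_WRITE_ATOMIC` bit in `stx_attributes` is another signal a real program might check.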

* Re: [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC
From: Darrick J. Wong @ 2024-07-17 21:36 UTC
To: John Garry
Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
martin.petersen, Himanshu Madhani

On Wed, Jul 17, 2024 at 09:36:17AM +0000, John Garry wrote:
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
>
> Add the text to the statx man page.
>
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> man/man2/statx.2 | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/man/man2/statx.2 b/man/man2/statx.2
> index 3d47319c6..a7cdc0097 100644
> --- a/man/man2/statx.2
> +++ b/man/man2/statx.2
> @@ -70,6 +70,11 @@ struct statx {
> __u32 stx_dio_offset_align;
> \&
> __u64 stx_subvol; /* Subvolume identifier */
> +\&
> + /* Direct I/O atomic write limits */
> + __u32 stx_atomic_write_unit_min;
> + __u32 stx_atomic_write_unit_max;
> + __u32 stx_atomic_write_segments_max;
> };
> .EE
> .in
> @@ -259,6 +264,9 @@ STATX_DIOALIGN Want stx_dio_mem_align and stx_dio_offset_align
> STATX_MNT_ID_UNIQUE Want unique stx_mnt_id (since Linux 6.8)
> STATX_SUBVOL Want stx_subvol
> (since Linux 6.10; support varies by filesystem)
> +STATX_WRITE_ATOMIC Want stx_atomic_write_unit_min, stx_atomic_write_unit_max,
> + and stx_atomic_write_segments_max.
> + (since Linux 6.11; support varies by filesystem)

Congratulations ^^^^^^^^^ on getting this merged!

> .TE
> .in
> .P
> @@ -463,6 +471,22 @@ Subvolumes are fancy directories,
> i.e. they form a tree structure that may be walked recursively.
> Support varies by filesystem;
> it is supported by bcachefs and btrfs since Linux 6.10.
> +.TP
> +.I stx_atomic_write_unit_min
> +.TQ
> +.I stx_atomic_write_unit_max
> +The minimum and maximum sizes (in bytes) supported for direct I/O
> +.RB ( O_DIRECT )
> +on the file to be written with torn-write protection.

I'm tempted to be nitpicky and say "...supported for direct I/O writes
to the file to have torn-write protection" but... eh. It's hot out and
I'm not that fussed if you want to ignore that.

Either way,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +These values are each guaranteed to be a power-of-2.
> +.TP
> +.I stx_atomic_write_segments_max
> +The maximum number of elements in an array of vectors for a write with
> +torn-write protection enabled.
> +See
> +.BR RWF_ATOMIC
> +flag for
> +.BR pwritev2 (2).
> .P
> For further information on the above fields, see
> .BR inode (7).
> @@ -516,6 +540,9 @@ It cannot be written to, and all reads from it will be verified
> against a cryptographic hash that covers the
> entire file (e.g., via a Merkle tree).
> .TP
> +.BR STATX_ATTR_WRITE_ATOMIC " (since Linux 6.11)"
> +The file supports torn-write protection.
> +.TP
> .BR STATX_ATTR_DAX " (since Linux 5.8)"
> The file is in the DAX (cpu direct access) state.
> DAX state attempts to
> --
> 2.31.1
>

* [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
From: John Garry @ 2024-07-17 9:36 UTC
To: alx
Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
martin.petersen, Himanshu Madhani, John Garry

From: Himanshu Madhani <himanshu.madhani@oracle.com>

Add RWF_ATOMIC flag description for pwritev2().

Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
[jpg: complete rewrite]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)

diff --git a/man/man2/readv.2 b/man/man2/readv.2
index eecde06dc..9c8a11324 100644
--- a/man/man2/readv.2
+++ b/man/man2/readv.2
@@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
.B O_DIRECT
flag.)
.TP
+.BR RWF_ATOMIC " (since Linux 6.11)"
+Requires that writes to regular files in block-based filesystems be issued with
+torn-write protection.
+Torn-write protection means that for a power or any other hardware failure,
+all or none of the data from the write will be stored,
+but never a mix of old and new data.
+This flag is meaningful only for
+.BR pwritev2 (),
+and its effect applies only to the data range written by the system call.
+The total write length must be power-of-2 and must be sized in the range
+.RI [ stx_atomic_write_unit_min ,
+.IR stx_atomic_write_unit_max ].
+The write must be at a naturally-aligned offset within the file with respect to
+the total write length -
+for example,
+a write of length 32KB at a file offset of 32KB is permitted,
+however a write of length 32KB at a file offset of 48KB is not permitted.
+The upper limit of
+.I iovcnt
+for
+.BR pwritev2 ()
+is in
+.I stx_atomic_write_segments_max.
+Torn-write protection only works with
+.B O_DIRECT
+flag, i.e. buffered writes are not supported.
+To guarantee consistency from the write between a file's in-core state with the
+storage device,
+.BR fdatasync (2),
+or
+.BR fsync (2),
+or
+.BR open (2)
+and either
+.B O_SYNC
+or
+.B O_DSYNC,
+or
+.B pwritev2 ()
+and either
+.B RWF_SYNC
+or
+.B RWF_DSYNC
+is required. Flags
+.B O_SYNC
+or
+.B RWF_SYNC
+provide the strongest guarantees for
+.BR RWF_ATOMIC,
+in that all data and also file metadata updates will be persisted for a
+successfully completed write.
+Just using either flags
+.B O_DSYNC
+or
+.B RWF_DSYNC
+means that all data and any file updates will be persisted for a successfully
+completed write.
+Not using any sync flags means that there is no guarantee that data or
+filesystem updates are persisted.
+.TP
.BR RWF_SYNC " (since Linux 4.7)"
.\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
Provide a per-write equivalent of the
@@ -279,10 +339,26 @@ values overflows an
.I ssize_t
value.
.TP
+.B EINVAL
+ For
+.BR RWF_ATOMIC
+set,
+the combination of the sum of the
+.I iov_len
+values and the
+.I offset
+value does not comply with the length and offset torn-write protection rules.
+.TP
.B EINVAL
The vector count,
.IR iovcnt ,
is less than zero or greater than the permitted maximum.
+For
+.BR RWF_ATOMIC
+set, this maximum is in
+.I stx_atomic_write_segments_max
+from
+.I statx.
.TP
.B EOPNOTSUPP
An unknown flag is specified in \fIflags\fP.
--
2.31.1
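The length, alignment, and segment-count rules in the RWF_ATOMIC text above can be checked in userspace before calling pwritev2(). This is a minimal sketch, assuming the limits were already obtained from statx(2). `rwf_atomic_ok()` is a hypothetical helper, and requiring at least one iovec is an assumption of the sketch. The kernel remains the authority: per the patch, it fails the call with EINVAL when these rules are not met.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical pre-flight check mirroring the documented RWF_ATOMIC rules:
 *  - iovcnt must be between 1 (assumption) and stx_atomic_write_segments_max;
 *  - the total length must be a power of 2 in [unit_min, unit_max];
 *  - the file offset must be a multiple of the total length
 *    ("naturally aligned"). */
bool rwf_atomic_ok(const struct iovec *iov, int iovcnt, uint64_t offset,
                   uint32_t unit_min, uint32_t unit_max, uint32_t segments_max)
{
    uint64_t len = 0;

    if (iovcnt < 1 || (uint32_t)iovcnt > segments_max)
        return false;
    for (int i = 0; i < iovcnt; i++)
        len += iov[i].iov_len;

    if (len == 0 || (len & (len - 1)) != 0)   /* not a power of 2 */
        return false;
    if (len < unit_min || len > unit_max)     /* outside the unit range */
        return false;
    return (offset & (len - 1)) == 0;         /* natural alignment */
}
```

With unit_min = 4 KiB and unit_max = 64 KiB, a single 32 KiB iovec passes at offset 32 KiB and fails at offset 48 KiB, matching the example in the patch text.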

* Re: [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
From: Darrick J. Wong @ 2024-07-17 21:44 UTC
To: John Garry
Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
martin.petersen, Himanshu Madhani

On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
>
> Add RWF_ATOMIC flag description for pwritev2().
>
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> [jpg: complete rewrite]
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 76 insertions(+)
>
> diff --git a/man/man2/readv.2 b/man/man2/readv.2
> index eecde06dc..9c8a11324 100644
> --- a/man/man2/readv.2
> +++ b/man/man2/readv.2
> @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
> .B O_DIRECT
> flag.)
> .TP
> +.BR RWF_ATOMIC " (since Linux 6.11)"
> +Requires that writes to regular files in block-based filesystems be issued with
> +torn-write protection.
> +Torn-write protection means that for a power or any other hardware failure,
> +all or none of the data from the write will be stored,
> +but never a mix of old and new data.
> +This flag is meaningful only for
> +.BR pwritev2 (),
> +and its effect applies only to the data range written by the system call.
> +The total write length must be power-of-2 and must be sized in the range
> +.RI [ stx_atomic_write_unit_min ,
> +.IR stx_atomic_write_unit_max ].
> +The write must be at a naturally-aligned offset within the file with respect to
> +the total write length -
> +for example,

Nit: these could be two sentences

"The write must be at a naturally-aligned offset within the file with
respect to the total write length. For example, ..."

> +a write of length 32KB at a file offset of 32KB is permitted,
> +however a write of length 32KB at a file offset of 48KB is not permitted.

Pickier nit: KiB, not KB.

> +The upper limit of
> +.I iovcnt
> +for
> +.BR pwritev2 ()
> +is in

"is given by" ?

> +.I stx_atomic_write_segments_max.
> +Torn-write protection only works with
> +.B O_DIRECT
> +flag, i.e. buffered writes are not supported.
> +To guarantee consistency from the write between a file's in-core state with the
> +storage device,
> +.BR fdatasync (2),
> +or
> +.BR fsync (2),
> +or
> +.BR open (2)
> +and either
> +.B O_SYNC
> +or
> +.B O_DSYNC,
> +or
> +.B pwritev2 ()
> +and either
> +.B RWF_SYNC
> +or
> +.B RWF_DSYNC
> +is required. Flags

This sentence ^^ should start on a new line.

> +.B O_SYNC
> +or
> +.B RWF_SYNC
> +provide the strongest guarantees for
> +.BR RWF_ATOMIC,
> +in that all data and also file metadata updates will be persisted for a
> +successfully completed write.
> +Just using either flags
> +.B O_DSYNC
> +or
> +.B RWF_DSYNC
> +means that all data and any file updates will be persisted for a successfully
> +completed write.

"any file updates" ? I /think/ the difference between O_SYNC and
O_DSYNC is that O_DSYNC persists all data and file metadata updates for
the file range that was written, whereas O_SYNC persists all data and
file metadata updates for the entire file.

Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
successfully completed write." should instead refer readers to the notes
about synchronized I/O flags in the openat manpage?

> +Not using any sync flags means that there is no guarantee that data or
> +filesystem updates are persisted.
> +.TP
> .BR RWF_SYNC " (since Linux 4.7)"
> .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
> Provide a per-write equivalent of the
> @@ -279,10 +339,26 @@ values overflows an
> .I ssize_t
> value.
> .TP
> +.B EINVAL
> + For
> +.BR RWF_ATOMIC
> +set,

"If RWF_ATOMIC is specified..." ?

(to be a bit more consistent with the language around the AT_* flags in
openat)

> +the combination of the sum of the
> +.I iov_len
> +values and the
> +.I offset
> +value does not comply with the length and offset torn-write protection rules.
> +.TP
> .B EINVAL
> The vector count,
> .IR iovcnt ,
> is less than zero or greater than the permitted maximum.
> +For
> +.BR RWF_ATOMIC
> +set, this maximum is in

(same)

--D

> +.I stx_atomic_write_segments_max
> +from
> +.I statx.
> .TP
> .B EOPNOTSUPP
> An unknown flag is specified in \fIflags\fP.
> --
> 2.31.1
>

* Re: [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
From: John Garry @ 2024-07-18 14:07 UTC
To: Darrick J. Wong
Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
martin.petersen, Himanshu Madhani

On 17/07/2024 22:44, Darrick J. Wong wrote:
> On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
>> From: Himanshu Madhani <himanshu.madhani@oracle.com>
>>
>> Add RWF_ATOMIC flag description for pwritev2().
>>
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
>> [jpg: complete rewrite]
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>> man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 76 insertions(+)
>>
>> diff --git a/man/man2/readv.2 b/man/man2/readv.2
>> index eecde06dc..9c8a11324 100644
>> --- a/man/man2/readv.2
>> +++ b/man/man2/readv.2
>> @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
>> .B O_DIRECT
>> flag.)
>> .TP
>> +.BR RWF_ATOMIC " (since Linux 6.11)"
>> +Requires that writes to regular files in block-based filesystems be issued with
>> +torn-write protection.
>> +Torn-write protection means that for a power or any other hardware failure,
>> +all or none of the data from the write will be stored,
>> +but never a mix of old and new data.
>> +This flag is meaningful only for
>> +.BR pwritev2 (),
>> +and its effect applies only to the data range written by the system call.
>> +The total write length must be power-of-2 and must be sized in the range
>> +.RI [ stx_atomic_write_unit_min ,
>> +.IR stx_atomic_write_unit_max ].
>> +The write must be at a naturally-aligned offset within the file with respect to
>> +the total write length -
>> +for example,
>
> Nit: these could be two sentences
>
> "The write must be at a naturally-aligned offset within the file with
> respect to the total write length. For example, ..."

ok, sure

>
>> +a write of length 32KB at a file offset of 32KB is permitted,
>> +however a write of length 32KB at a file offset of 48KB is not permitted.
>
> Pickier nit: KiB, not KB.

ok

>
>> +The upper limit of
>> +.I iovcnt
>> +for
>> +.BR pwritev2 ()
>> +is in
>
> "is given by" ?

ok, fine, I don't mind

>
>> +.I stx_atomic_write_segments_max.
>> +Torn-write protection only works with
>> +.B O_DIRECT
>> +flag, i.e. buffered writes are not supported.
>> +To guarantee consistency from the write between a file's in-core state with the
>> +storage device,
>> +.BR fdatasync (2),
>> +or
>> +.BR fsync (2),
>> +or
>> +.BR open (2)
>> +and either
>> +.B O_SYNC
>> +or
>> +.B O_DSYNC,
>> +or
>> +.B pwritev2 ()
>> +and either
>> +.B RWF_SYNC
>> +or
>> +.B RWF_DSYNC
>> +is required. Flags
>
> This sentence ^^ should start on a new line.

yes

>
>> +.B O_SYNC
>> +or
>> +.B RWF_SYNC
>> +provide the strongest guarantees for
>> +.BR RWF_ATOMIC,
>> +in that all data and also file metadata updates will be persisted for a
>> +successfully completed write.
>> +Just using either flags
>> +.B O_DSYNC
>> +or
>> +.B RWF_DSYNC
>> +means that all data and any file updates will be persisted for a successfully
>> +completed write.
>

ughh, this is hard to word both concisely and accurately...

> "any file updates" ? I /think/ the difference between O_SYNC and
> O_DSYNC is that O_DSYNC persists all data and file metadata updates for
> the file range that was written, whereas O_SYNC persists all data and
> file metadata updates for the entire file.

I think that https://man7.org/linux/man-pages/man2/open.2.html#NOTES
describes it best.

>
> Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
> successfully completed write." should instead refer readers to the notes
> about synchronized I/O flags in the openat manpage?

Maybe that would be better, but we just need to make it clear that
RWF_ATOMIC provides the guarantee that the data is atomically updated only
in addition to whatever guarantee we have for metadata updates from
O_SYNC/O_DSYNC.

So maybe:
RWF_ATOMIC provides the guarantee that any data is written with torn-write
protection, and additional flags O_SYNC or O_DSYNC provide
same Synchronized I/O guarantees as documented in <openat manpage reference>

OK?

>
>> +Not using any sync flags means that there is no guarantee that data or
>> +filesystem updates are persisted.
>> +.TP
>> .BR RWF_SYNC " (since Linux 4.7)"
>> .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
>> Provide a per-write equivalent of the
>> @@ -279,10 +339,26 @@ values overflows an
>> .I ssize_t
>> value.
>> .TP
>> +.B EINVAL
>> + For
>> +.BR RWF_ATOMIC
>> +set,
>
> "If RWF_ATOMIC is specified..." ?
>
> (to be a bit more consistent with the language around the AT_* flags in
> openat)

ok, fine

>
>> +the combination of the sum of the
>> +.I iov_len
>> +values and the
>> +.I offset
>> +value does not comply with the length and offset torn-write protection rules.
>> +.TP
>> .B EINVAL
>> The vector count,
>> .IR iovcnt ,
>> is less than zero or greater than the permitted maximum.
>> +For
>> +.BR RWF_ATOMIC
>> +set, this maximum is in
>
> (same)
>
> --D
>

Thanks for checking,
John

* Re: [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
From: Darrick J. Wong @ 2024-07-18 15:05 UTC
To: John Garry
Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
martin.petersen, Himanshu Madhani

On Thu, Jul 18, 2024 at 03:07:59PM +0100, John Garry wrote:
> On 17/07/2024 22:44, Darrick J. Wong wrote:
> > On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
> > > From: Himanshu Madhani <himanshu.madhani@oracle.com>
> > >
> > > Add RWF_ATOMIC flag description for pwritev2().
> > >
> > > Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> > > [jpg: complete rewrite]
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > > ---
> > > man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 76 insertions(+)
> > >
> > > diff --git a/man/man2/readv.2 b/man/man2/readv.2
> > > index eecde06dc..9c8a11324 100644
> > > --- a/man/man2/readv.2
> > > +++ b/man/man2/readv.2
> > > @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
> > > .B O_DIRECT
> > > flag.)
> > > .TP
> > > +.BR RWF_ATOMIC " (since Linux 6.11)"
> > > +Requires that writes to regular files in block-based filesystems be issued with
> > > +torn-write protection.
> > > +Torn-write protection means that for a power or any other hardware failure,
> > > +all or none of the data from the write will be stored,
> > > +but never a mix of old and new data.
> > > +This flag is meaningful only for
> > > +.BR pwritev2 (),
> > > +and its effect applies only to the data range written by the system call.
> > > +The total write length must be power-of-2 and must be sized in the range
> > > +.RI [ stx_atomic_write_unit_min ,
> > > +.IR stx_atomic_write_unit_max ].
> > > +The write must be at a naturally-aligned offset within the file with respect to
> > > +the total write length -
> > > +for example,
> >
> > Nit: these could be two sentences
> >
> > "The write must be at a naturally-aligned offset within the file with
> > respect to the total write length. For example, ..."
>
> ok, sure
>
> >
> > > +a write of length 32KB at a file offset of 32KB is permitted,
> > > +however a write of length 32KB at a file offset of 48KB is not permitted.
> >
> > Pickier nit: KiB, not KB.
>
> ok
>
> >
> > > +The upper limit of
> > > +.I iovcnt
> > > +for
> > > +.BR pwritev2 ()
> > > +is in
> >
> > "is given by" ?
>
> ok, fine, I don't mind
>
> >
> > > +.I stx_atomic_write_segments_max.
> > > +Torn-write protection only works with
> > > +.B O_DIRECT
> > > +flag, i.e. buffered writes are not supported.
> > > +To guarantee consistency from the write between a file's in-core state with the
> > > +storage device,
> > > +.BR fdatasync (2),
> > > +or
> > > +.BR fsync (2),
> > > +or
> > > +.BR open (2)
> > > +and either
> > > +.B O_SYNC
> > > +or
> > > +.B O_DSYNC,
> > > +or
> > > +.B pwritev2 ()
> > > +and either
> > > +.B RWF_SYNC
> > > +or
> > > +.B RWF_DSYNC
> > > +is required. Flags
> >
> > This sentence ^^ should start on a new line.
>
> yes
>
> >
> > > +.B O_SYNC
> > > +or
> > > +.B RWF_SYNC
> > > +provide the strongest guarantees for
> > > +.BR RWF_ATOMIC,
> > > +in that all data and also file metadata updates will be persisted for a
> > > +successfully completed write.
> > > +Just using either flags
> > > +.B O_DSYNC
> > > +or
> > > +.B RWF_DSYNC
> > > +means that all data and any file updates will be persisted for a successfully
> > > +completed write.
> >
>
> ughh, this is hard to word both concisely and accurately...
>
> > "any file updates" ? I /think/ the difference between O_SYNC and
> > O_DSYNC is that O_DSYNC persists all data and file metadata updates for
> > the file range that was written, whereas O_SYNC persists all data and
> > file metadata updates for the entire file.
>
> I think that https://man7.org/linux/man-pages/man2/open.2.html#NOTES
> describes it best.
>
> >
> > Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
> > successfully completed write." should instead refer readers to the notes
> > about synchronized I/O flags in the openat manpage?
>
> Maybe that would be better, but we just need to make it clear that
> RWF_ATOMIC provides the guarantee that the data is atomically updated only
> in addition to whatever guarantee we have for metadata updates from
> O_SYNC/O_DSYNC.
>
> So maybe:
> RWF_ATOMIC provides the guarantee that any data is written with torn-write
> protection, and additional flags O_SYNC or O_DSYNC provide
> same Synchronized I/O guarantees as documented in <openat manpage reference>
  ^ the same
>
> OK?

Yes.

> > > +Not using any sync flags means that there is no guarantee that data or
> > > +filesystem updates are persisted.
> > > +.TP
> > > .BR RWF_SYNC " (since Linux 4.7)"
> > > .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
> > > Provide a per-write equivalent of the
> > > @@ -279,10 +339,26 @@ values overflows an
> > > .I ssize_t
> > > value.
> > > .TP
> > > +.B EINVAL
> > > + For
> > > +.BR RWF_ATOMIC
> > > +set,
> >
> > "If RWF_ATOMIC is specified..." ?
> >
> > (to be a bit more consistent with the language around the AT_* flags in
> > openat)
>
> ok, fine
>
> >
> > > +the combination of the sum of the
> > > +.I iov_len
> > > +values and the
> > > +.I offset
> > > +value does not comply with the length and offset torn-write protection rules.
> > > +.TP
> > > .B EINVAL
> > > The vector count,
> > > .IR iovcnt ,
> > > is less than zero or greater than the permitted maximum.
> > > +For
> > > +.BR RWF_ATOMIC
> > > +set, this maximum is in
> >
> > (same)
> >
> > --D
> >
>
> Thanks for checking,

NP. :)

--D

> John
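The sub-thread above concludes that RWF_ATOMIC only adds torn-write protection, while the usual synchronized I/O flags decide what is persisted. A per-write combination might look like the minimal sketch below. It assumes glibc's pwritev2() wrapper (glibc 2.26 or later) and a `<sys/uio.h>` new enough to define RWF_ATOMIC. If that macro is missing, the sketch reports EOPNOTSUPP instead of issuing the write. `atomic_dsync_write()` is a hypothetical name. The descriptor is assumed to be open with O_DIRECT, and the buffer, length, and offset are assumed to already satisfy the direct I/O alignment and RWF_ATOMIC rules.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>   /* pwritev2(), RWF_DSYNC, RWF_ATOMIC (if defined) */

/* Write one buffer with torn-write protection (RWF_ATOMIC) and request
 * per-write O_DSYNC semantics (RWF_DSYNC), so the data is durable once the
 * call succeeds. fd is assumed to be open with O_DIRECT.
 * Returns the number of bytes written, or -1 with errno set. */
ssize_t atomic_dsync_write(int fd, void *buf, size_t len, off_t offset)
{
#ifdef RWF_ATOMIC
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    return pwritev2(fd, &iov, 1, offset, RWF_ATOMIC | RWF_DSYNC);
#else
    (void)fd;
    (void)buf;
    (void)len;
    (void)offset;
    errno = EOPNOTSUPP;   /* headers too old to request RWF_ATOMIC */
    return -1;
#endif
}
```

The other option listed in the patch is to drop RWF_DSYNC and call fdatasync(2) after the write. A per-write flag avoids flushing unrelated dirty data for the file, while fdatasync(2) can amortize one flush over several atomic writes.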

* [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC
From: John Garry @ 2024-07-17 9:36 UTC
To: alx
Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
martin.petersen, John Garry

Document RWF_ATOMIC for asynchronous I/O.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
man/man2/io_submit.2 | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/man/man2/io_submit.2 b/man/man2/io_submit.2
index c53ae9aaf..12b4a72d7 100644
--- a/man/man2/io_submit.2
+++ b/man/man2/io_submit.2
@@ -140,6 +140,25 @@ as well the description of
.B O_SYNC
in
.BR open (2).
+.TP
+.BR RWF_ATOMIC " (since Linux 6.11)"
+Write a block of data such that a write will never be torn from power fail or
+similar.
+See the description of
+.B RWF_ATOMIC
+in
+.BR pwritev2 (2).
+For usage with
+.BR IOCB_CMD_PWRITEV,
+the upper vector limit is in
+.I stx_atomic_write_segments_max.
+See
+.B STATX_WRITE_ATOMIC
+and
+.I stx_atomic_write_segments_max
+description
+in
+.BR statx (2).
.RE
.TP
.I aio_lio_opcode
--
2.31.1
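For the asynchronous path in this patch, the flag travels in the `aio_rw_flags` field of `struct iocb`. This is a minimal sketch, assuming the raw UAPI headers `<linux/aio_abi.h>` (for `struct iocb` and `IOCB_CMD_PWRITEV`) and `<linux/fs.h>` (for the `RWF_*` values). `prep_atomic_pwritev()` is a hypothetical helper. Submitting the control block with io_setup(2)/io_submit(2), through syscall(2) or libaio, is deliberately left out.

```c
#include <stdint.h>
#include <string.h>
#include <linux/aio_abi.h>  /* struct iocb, IOCB_CMD_PWRITEV */
#include <linux/fs.h>       /* RWF_* per-I/O flags */

/* Prepare (but do not submit) an IOCB_CMD_PWRITEV control block that asks
 * for torn-write protection. iov points to an array of iovcnt struct iovec;
 * iovcnt must not exceed stx_atomic_write_segments_max.
 * Returns 0, or -1 if the headers predate RWF_ATOMIC. */
int prep_atomic_pwritev(struct iocb *cb, int fd, const void *iov,
                        unsigned int iovcnt, int64_t offset)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_lio_opcode = IOCB_CMD_PWRITEV;
    cb->aio_fildes = (uint32_t)fd;
    cb->aio_buf = (uint64_t)(uintptr_t)iov;  /* vectored: address of iovec array */
    cb->aio_nbytes = iovcnt;                 /* vectored: number of iovecs */
    cb->aio_offset = offset;
#ifdef RWF_ATOMIC
    cb->aio_rw_flags = RWF_ATOMIC;
    return 0;
#else
    return -1;
#endif
}
```

For the vectored opcodes, `aio_buf` and `aio_nbytes` carry the iovec array and its element count rather than a flat buffer and byte length. That is why the segment limit from statx(2) applies here just as it does to `iovcnt` in pwritev2().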

* Re: [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC
From: Darrick J. Wong @ 2024-07-17 21:44 UTC
To: John Garry
Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
martin.petersen

On Wed, Jul 17, 2024 at 09:36:19AM +0000, John Garry wrote:
> Document RWF_ATOMIC for asynchronous I/O.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> man/man2/io_submit.2 | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/man/man2/io_submit.2 b/man/man2/io_submit.2
> index c53ae9aaf..12b4a72d7 100644
> --- a/man/man2/io_submit.2
> +++ b/man/man2/io_submit.2
> @@ -140,6 +140,25 @@ as well the description of
> .B O_SYNC
> in
> .BR open (2).
> +.TP
> +.BR RWF_ATOMIC " (since Linux 6.11)"
> +Write a block of data such that a write will never be torn from power fail or
> +similar.
> +See the description of
> +.B RWF_ATOMIC
> +in
> +.BR pwritev2 (2).
> +For usage with
> +.BR IOCB_CMD_PWRITEV,
> +the upper vector limit is in
> +.I stx_atomic_write_segments_max.
> +See
> +.B STATX_WRITE_ATOMIC
> +and
> +.I stx_atomic_write_segments_max
> +description
> +in
> +.BR statx (2).

Sounds good to me!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> .RE
> .TP
> .I aio_lio_opcode
> --
> 2.31.1
>