linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v4 0/3] man2: Document RWF_ATOMIC
@ 2024-07-17  9:36 John Garry
  2024-07-17  9:36 ` [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC John Garry
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: John Garry @ 2024-07-17  9:36 UTC (permalink / raw)
  To: alx
  Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
	martin.petersen, John Garry

Document RWF_ATOMIC flag for pwritev2().

RWF_ATOMIC is used for enabling torn-write protection.

We use the name RWF_ATOMIC because it is the legacy name of a similar feature
proposed in the past.

Kernel support has now been merged into Linus' tree and is to be released in
v6.11.

Differences to v3:
- Formatting changes (Alex)
 - semantic newlines
 - Add missing .TP in statx
 - Combine description of atomic write unit min and max
 - misc others

Differences to v2:
- rebase

Differences to v1:
- Add statx max segments param
- Expand readv.2 description
- Document EINVAL

Himanshu Madhani (2):
  statx.2: Document STATX_WRITE_ATOMIC
  readv.2: Document RWF_ATOMIC flag

John Garry (1):
  io_submit.2: Document RWF_ATOMIC

 man/man2/io_submit.2 | 19 +++++++++++
 man/man2/readv.2     | 76 ++++++++++++++++++++++++++++++++++++++++++++
 man/man2/statx.2     | 27 ++++++++++++++++
 3 files changed, 122 insertions(+)

-- 
2.31.1
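
For orientation, the flag documented by this series is passed to pwritev2()
in the flags argument, like the other RWF_* flags. The following is a minimal
sketch, not taken from the patches: the wrapper name atomic_pwrite() is
hypothetical, the fallback definition of RWF_ATOMIC is only there for
toolchains whose headers predate Linux 6.11, and the caller is assumed to have
opened the file with O_DIRECT and to pass a buffer, length, and offset that
already satisfy the direct I/O and torn-write protection rules.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for the pwritev2() declaration */
#endif
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040   /* fallback for headers older than Linux 6.11 */
#endif

/*
 * Hypothetical wrapper: write a single buffer with torn-write protection.
 *
 * Assumptions (not checked here): fd was opened with O_DIRECT, and buf, len,
 * and offset meet the direct I/O alignment and RWF_ATOMIC length/offset rules.
 *
 * Returns whatever pwritev2() returns: the number of bytes written,
 * or -1 with errno set.
 */
static ssize_t atomic_pwrite(int fd, void *buf, size_t len, off_t offset)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    return pwritev2(fd, &iov, 1, offset, RWF_ATOMIC);
}
```

If persistence is also required, RWF_DSYNC or RWF_SYNC can be ORed into the
flags, or the file can be opened with O_DSYNC or O_SYNC, as discussed in the
review of patch 2/3.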


* [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC
  2024-07-17  9:36 [PATCH v4 0/3] man2: Document RWF_ATOMIC John Garry
@ 2024-07-17  9:36 ` John Garry
  2024-07-17 21:36   ` Darrick J. Wong
  2024-07-17  9:36 ` [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag John Garry
  2024-07-17  9:36 ` [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC John Garry
  2 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2024-07-17  9:36 UTC (permalink / raw)
  To: alx
  Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
	martin.petersen, Himanshu Madhani, John Garry

From: Himanshu Madhani <himanshu.madhani@oracle.com>

Add the text to the statx man page.

Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 man/man2/statx.2 | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/man/man2/statx.2 b/man/man2/statx.2
index 3d47319c6..a7cdc0097 100644
--- a/man/man2/statx.2
+++ b/man/man2/statx.2
@@ -70,6 +70,11 @@ struct statx {
     __u32 stx_dio_offset_align;
 \&
     __u64 stx_subvol;      /* Subvolume identifier */
+\&
+    /* Direct I/O atomic write limits */
+    __u32 stx_atomic_write_unit_min;
+    __u32 stx_atomic_write_unit_max;
+    __u32 stx_atomic_write_segments_max;
 };
 .EE
 .in
@@ -259,6 +264,9 @@ STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
 STATX_MNT_ID_UNIQUE	Want unique stx_mnt_id (since Linux 6.8)
 STATX_SUBVOL	Want stx_subvol
 	(since Linux 6.10; support varies by filesystem)
+STATX_WRITE_ATOMIC	Want stx_atomic_write_unit_min, stx_atomic_write_unit_max,
+	and stx_atomic_write_segments_max.
+	(since Linux 6.11; support varies by filesystem)
 .TE
 .in
 .P
@@ -463,6 +471,22 @@ Subvolumes are fancy directories,
 i.e. they form a tree structure that may be walked recursively.
 Support varies by filesystem;
 it is supported by bcachefs and btrfs since Linux 6.10.
+.TP
+.I stx_atomic_write_unit_min
+.TQ
+.I stx_atomic_write_unit_max
+The minimum and maximum sizes (in bytes) supported for direct I/O
+.RB ( O_DIRECT )
+on the file to be written with torn-write protection.
+These values are each guaranteed to be a power-of-2.
+.TP
+.I stx_atomic_write_segments_max
+The maximum number of elements in an array of vectors for a write with
+torn-write protection enabled.
+See
+.BR RWF_ATOMIC
+flag for
+.BR pwritev2 (2).
 .P
 For further information on the above fields, see
 .BR inode (7).
@@ -516,6 +540,9 @@ It cannot be written to, and all reads from it will be verified
 against a cryptographic hash that covers the
 entire file (e.g., via a Merkle tree).
 .TP
+.BR STATX_ATTR_WRITE_ATOMIC " (since Linux 6.11)"
+The file supports torn-write protection.
+.TP
 .BR STATX_ATTR_DAX " (since Linux 5.8)"
 The file is in the DAX (cpu direct access) state.
 DAX state attempts to
-- 
2.31.1


* [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
  2024-07-17  9:36 [PATCH v4 0/3] man2: Document RWF_ATOMIC John Garry
  2024-07-17  9:36 ` [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC John Garry
@ 2024-07-17  9:36 ` John Garry
  2024-07-17 21:44   ` Darrick J. Wong
  2024-07-17  9:36 ` [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC John Garry
  2 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2024-07-17  9:36 UTC (permalink / raw)
  To: alx
  Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
	martin.petersen, Himanshu Madhani, John Garry

From: Himanshu Madhani <himanshu.madhani@oracle.com>

Add RWF_ATOMIC flag description for pwritev2().

Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
[jpg: complete rewrite]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/man/man2/readv.2 b/man/man2/readv.2
index eecde06dc..9c8a11324 100644
--- a/man/man2/readv.2
+++ b/man/man2/readv.2
@@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
 .B O_DIRECT
 flag.)
 .TP
+.BR RWF_ATOMIC " (since Linux 6.11)"
+Requires that writes to regular files in block-based filesystems be issued with
+torn-write protection.
+Torn-write protection means that for a power or any other hardware failure,
+all or none of the data from the write will be stored,
+but never a mix of old and new data.
+This flag is meaningful only for
+.BR pwritev2 (),
+and its effect applies only to the data range written by the system call.
+The total write length must be power-of-2 and must be sized in the range
+.RI [ stx_atomic_write_unit_min ,
+.IR stx_atomic_write_unit_max ].
+The write must be at a naturally-aligned offset within the file with respect to
+the total write length -
+for example,
+a write of length 32KB at a file offset of 32KB is permitted,
+however a write of length 32KB at a file offset of 48KB is not permitted.
+The upper limit of
+.I iovcnt
+for
+.BR pwritev2 ()
+is in
+.I stx_atomic_write_segments_max.
+Torn-write protection only works with
+.B O_DIRECT
+flag, i.e. buffered writes are not supported.
+To guarantee consistency from the write between a file's in-core state with the
+storage device,
+.BR fdatasync (2),
+or
+.BR fsync (2),
+or
+.BR open (2)
+and either
+.B O_SYNC
+or
+.B O_DSYNC,
+or
+.B pwritev2 ()
+and either
+.B RWF_SYNC
+or
+.B RWF_DSYNC
+is required. Flags
+.B O_SYNC
+or
+.B RWF_SYNC
+provide the strongest guarantees for
+.BR RWF_ATOMIC,
+in that all data and also file metadata updates will be persisted for a
+successfully completed write.
+Just using either flags
+.B O_DSYNC
+or
+.B RWF_DSYNC
+means that all data and any file updates will be persisted for a successfully
+completed write.
+Not using any sync flags means that there is no guarantee that data or
+filesystem updates are persisted.
+.TP
 .BR RWF_SYNC " (since Linux 4.7)"
 .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
 Provide a per-write equivalent of the
@@ -279,10 +339,26 @@ values overflows an
 .I ssize_t
 value.
 .TP
+.B EINVAL
+ For
+.BR RWF_ATOMIC
+set,
+the combination of the sum of the
+.I iov_len
+values and the
+.I offset
+value does not comply with the length and offset torn-write protection rules.
+.TP
 .B EINVAL
 The vector count,
 .IR iovcnt ,
 is less than zero or greater than the permitted maximum.
+For
+.BR RWF_ATOMIC
+set, this maximum is in
+.I stx_atomic_write_segments_max
+from
+.I statx.
 .TP
 .B EOPNOTSUPP
 An unknown flag is specified in \fIflags\fP.
-- 
2.31.1
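
The length and offset rules in the RWF_ATOMIC text above can be sketched as a
small check that an application might run before issuing the write. This is
an illustration only: atomic_write_ok() and is_pow2() are hypothetical helper
names, and unit_min and unit_max are assumed to be the
stx_atomic_write_unit_min and stx_atomic_write_unit_max values obtained from
statx().

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper (not a kernel or libc API): check the documented
 * RWF_ATOMIC rules for a write of total length len at file offset offset.
 *
 *  - len must be a power of 2;
 *  - len must lie in the range [unit_min, unit_max];
 *  - offset must be naturally aligned with respect to len.
 */
static bool atomic_write_ok(uint64_t offset, uint64_t len,
                            uint32_t unit_min, uint32_t unit_max)
{
    if (!is_pow2(len))
        return false;
    if (len < unit_min || len > unit_max)
        return false;
    /* len is a power of 2, so the mask test is an alignment test. */
    return (offset & (len - 1)) == 0;
}
```

With unit_min of 4096 and unit_max of 65536, the two cases from the man page
text behave as described: a 32 KiB write at offset 32 KiB passes the check,
and a 32 KiB write at offset 48 KiB does not.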


* [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC
  2024-07-17  9:36 [PATCH v4 0/3] man2: Document RWF_ATOMIC John Garry
  2024-07-17  9:36 ` [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC John Garry
  2024-07-17  9:36 ` [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag John Garry
@ 2024-07-17  9:36 ` John Garry
  2024-07-17 21:44   ` Darrick J. Wong
  2 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2024-07-17  9:36 UTC (permalink / raw)
  To: alx
  Cc: linux-man, linux-fsdevel, axboe, hch, djwong, dchinner,
	martin.petersen, John Garry

Document RWF_ATOMIC for asynchronous I/O.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 man/man2/io_submit.2 | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/man/man2/io_submit.2 b/man/man2/io_submit.2
index c53ae9aaf..12b4a72d7 100644
--- a/man/man2/io_submit.2
+++ b/man/man2/io_submit.2
@@ -140,6 +140,25 @@ as well the description of
 .B O_SYNC
 in
 .BR open (2).
+.TP
+.BR RWF_ATOMIC " (since Linux 6.11)"
+Write a block of data such that a write will never be torn from power fail or
+similar.
+See the description of
+.B RWF_ATOMIC
+in
+.BR pwritev2 (2).
+For usage with
+.BR IOCB_CMD_PWRITEV,
+the upper vector limit is in
+.I stx_atomic_write_segments_max.
+See
+.B STATX_WRITE_ATOMIC
+and
+.I stx_atomic_write_segments_max
+description
+in
+.BR statx (2).
 .RE
 .TP
 .I aio_lio_opcode
-- 
2.31.1
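
For the asynchronous case described above, the flag goes in the aio_rw_flags
field of the iocb. The following is a rough sketch of filling in a control
block for IOCB_CMD_PWRITEV; prep_atomic_pwritev() is a hypothetical helper
name, the iovec array is passed as an opaque pointer to keep the example to
kernel headers only, and the fallback definition of RWF_ATOMIC is an
assumption for headers older than Linux 6.11.

```c
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040   /* fallback for headers older than Linux 6.11 */
#endif

/*
 * Hypothetical helper: prepare an iocb for a vectored write with
 * torn-write protection.
 *
 * iov points to an array of iovcnt struct iovec elements; iovcnt is assumed
 * not to exceed stx_atomic_write_segments_max, and the total length and
 * offset are assumed to follow the RWF_ATOMIC rules.
 */
static void prep_atomic_pwritev(struct iocb *cb, int fd, const void *iov,
                                unsigned int iovcnt, long long offset)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_lio_opcode = IOCB_CMD_PWRITEV;
    cb->aio_fildes = fd;
    cb->aio_buf = (uint64_t)(uintptr_t)iov;   /* address of the iovec array */
    cb->aio_nbytes = iovcnt;                  /* vector count for PWRITEV */
    cb->aio_offset = offset;
    cb->aio_rw_flags = RWF_ATOMIC;
}
```

The prepared iocb would then be submitted with io_submit(), either through
libaio or through syscall(2), since glibc provides no wrapper for it.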


* Re: [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC
  2024-07-17  9:36 ` [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC John Garry
@ 2024-07-17 21:36   ` Darrick J. Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2024-07-17 21:36 UTC (permalink / raw)
  To: John Garry
  Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
	martin.petersen, Himanshu Madhani

On Wed, Jul 17, 2024 at 09:36:17AM +0000, John Garry wrote:
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
> 
> Add the text to the statx man page.
> 
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  man/man2/statx.2 | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/man/man2/statx.2 b/man/man2/statx.2
> index 3d47319c6..a7cdc0097 100644
> --- a/man/man2/statx.2
> +++ b/man/man2/statx.2
> @@ -70,6 +70,11 @@ struct statx {
>      __u32 stx_dio_offset_align;
>  \&
>      __u64 stx_subvol;      /* Subvolume identifier */
> +\&
> +    /* Direct I/O atomic write limits */
> +    __u32 stx_atomic_write_unit_min;
> +    __u32 stx_atomic_write_unit_max;
> +    __u32 stx_atomic_write_segments_max;
>  };
>  .EE
>  .in
> @@ -259,6 +264,9 @@ STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
>  STATX_MNT_ID_UNIQUE	Want unique stx_mnt_id (since Linux 6.8)
>  STATX_SUBVOL	Want stx_subvol
>  	(since Linux 6.10; support varies by filesystem)
> +STATX_WRITE_ATOMIC	Want stx_atomic_write_unit_min, stx_atomic_write_unit_max,
> +	and stx_atomic_write_segments_max.
> +	(since Linux 6.11; support varies by filesystem)

Congratulations ^^^^^^^^^ on getting this merged!

>  .TE
>  .in
>  .P
> @@ -463,6 +471,22 @@ Subvolumes are fancy directories,
>  i.e. they form a tree structure that may be walked recursively.
>  Support varies by filesystem;
>  it is supported by bcachefs and btrfs since Linux 6.10.
> +.TP
> +.I stx_atomic_write_unit_min
> +.TQ
> +.I stx_atomic_write_unit_max
> +The minimum and maximum sizes (in bytes) supported for direct I/O
> +.RB ( O_DIRECT )
> +on the file to be written with torn-write protection.

I'm tempted to be nitpicky and say "...supported for direct I/O writes
to the file to have torn-write protection" but... eh.  It's hot out
and I'm not that fussed if you want to ignore that.

Either way,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> +These values are each guaranteed to be a power-of-2.
> +.TP
> +.I stx_atomic_write_segments_max
> +The maximum number of elements in an array of vectors for a write with
> +torn-write protection enabled.
> +See
> +.BR RWF_ATOMIC
> +flag for
> +.BR pwritev2 (2).
>  .P
>  For further information on the above fields, see
>  .BR inode (7).
> @@ -516,6 +540,9 @@ It cannot be written to, and all reads from it will be verified
>  against a cryptographic hash that covers the
>  entire file (e.g., via a Merkle tree).
>  .TP
> +.BR STATX_ATTR_WRITE_ATOMIC " (since Linux 6.11)"
> +The file supports torn-write protection.
> +.TP
>  .BR STATX_ATTR_DAX " (since Linux 5.8)"
>  The file is in the DAX (cpu direct access) state.
>  DAX state attempts to
> -- 
> 2.31.1
> 

* Re: [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
  2024-07-17  9:36 ` [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag John Garry
@ 2024-07-17 21:44   ` Darrick J. Wong
  2024-07-18 14:07     ` John Garry
  0 siblings, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2024-07-17 21:44 UTC (permalink / raw)
  To: John Garry
  Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
	martin.petersen, Himanshu Madhani

On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
> 
> Add RWF_ATOMIC flag description for pwritev2().
> 
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> [jpg: complete rewrite]
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
> 
> diff --git a/man/man2/readv.2 b/man/man2/readv.2
> index eecde06dc..9c8a11324 100644
> --- a/man/man2/readv.2
> +++ b/man/man2/readv.2
> @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
>  .B O_DIRECT
>  flag.)
>  .TP
> +.BR RWF_ATOMIC " (since Linux 6.11)"
> +Requires that writes to regular files in block-based filesystems be issued with
> +torn-write protection.
> +Torn-write protection means that for a power or any other hardware failure,
> +all or none of the data from the write will be stored,
> +but never a mix of old and new data.
> +This flag is meaningful only for
> +.BR pwritev2 (),
> +and its effect applies only to the data range written by the system call.
> +The total write length must be power-of-2 and must be sized in the range
> +.RI [ stx_atomic_write_unit_min ,
> +.IR stx_atomic_write_unit_max ].
> +The write must be at a naturally-aligned offset within the file with respect to
> +the total write length -
> +for example,

Nit: these could be two sentences

"The write must be at a naturally-aligned offset within the file with
respect to the total write length.  For example, ..."

> +a write of length 32KB at a file offset of 32KB is permitted,
> +however a write of length 32KB at a file offset of 48KB is not permitted.

Pickier nit: KiB, not KB.

> +The upper limit of
> +.I iovcnt
> +for
> +.BR pwritev2 ()
> +is in

"is given by" ?

> +.I stx_atomic_write_segments_max.
> +Torn-write protection only works with
> +.B O_DIRECT
> +flag, i.e. buffered writes are not supported.
> +To guarantee consistency from the write between a file's in-core state with the
> +storage device,
> +.BR fdatasync (2),
> +or
> +.BR fsync (2),
> +or
> +.BR open (2)
> +and either
> +.B O_SYNC
> +or
> +.B O_DSYNC,
> +or
> +.B pwritev2 ()
> +and either
> +.B RWF_SYNC
> +or
> +.B RWF_DSYNC
> +is required. Flags

This sentence   ^^ should start on a new line.

> +.B O_SYNC
> +or
> +.B RWF_SYNC
> +provide the strongest guarantees for
> +.BR RWF_ATOMIC,
> +in that all data and also file metadata updates will be persisted for a
> +successfully completed write.
> +Just using either flags
> +.B O_DSYNC
> +or
> +.B RWF_DSYNC
> +means that all data and any file updates will be persisted for a successfully
> +completed write.

"any file updates" ?  I /think/ the difference between O_SYNC and
O_DSYNC is that O_DSYNC persists all data and file metadata updates for
the file range that was written, whereas O_SYNC persists all data and
file metadata updates for the entire file.

Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
successfully completed write." should instead refer readers to the notes
about synchronized I/O flags in the openat manpage?

> +Not using any sync flags means that there is no guarantee that data or
> +filesystem updates are persisted.
> +.TP
>  .BR RWF_SYNC " (since Linux 4.7)"
>  .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
>  Provide a per-write equivalent of the
> @@ -279,10 +339,26 @@ values overflows an
>  .I ssize_t
>  value.
>  .TP
> +.B EINVAL
> + For
> +.BR RWF_ATOMIC
> +set,

"If RWF_ATOMIC is specified..." ?

(to be a bit more consistent with the language around the AT_* flags in
openat)

> +the combination of the sum of the
> +.I iov_len
> +values and the
> +.I offset
> +value does not comply with the length and offset torn-write protection rules.
> +.TP
>  .B EINVAL
>  The vector count,
>  .IR iovcnt ,
>  is less than zero or greater than the permitted maximum.
> +For
> +.BR RWF_ATOMIC
> +set, this maximum is in

(same)

--D

> +.I stx_atomic_write_segments_max
> +from
> +.I statx.
>  .TP
>  .B EOPNOTSUPP
>  An unknown flag is specified in \fIflags\fP.
> -- 
> 2.31.1
> 

* Re: [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC
  2024-07-17  9:36 ` [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC John Garry
@ 2024-07-17 21:44   ` Darrick J. Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2024-07-17 21:44 UTC (permalink / raw)
  To: John Garry
  Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
	martin.petersen

On Wed, Jul 17, 2024 at 09:36:19AM +0000, John Garry wrote:
> Document RWF_ATOMIC for asynchronous I/O.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  man/man2/io_submit.2 | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/man/man2/io_submit.2 b/man/man2/io_submit.2
> index c53ae9aaf..12b4a72d7 100644
> --- a/man/man2/io_submit.2
> +++ b/man/man2/io_submit.2
> @@ -140,6 +140,25 @@ as well the description of
>  .B O_SYNC
>  in
>  .BR open (2).
> +.TP
> +.BR RWF_ATOMIC " (since Linux 6.11)"
> +Write a block of data such that a write will never be torn from power fail or
> +similar.
> +See the description of
> +.B RWF_ATOMIC
> +in
> +.BR pwritev2 (2).
> +For usage with
> +.BR IOCB_CMD_PWRITEV,
> +the upper vector limit is in
> +.I stx_atomic_write_segments_max.
> +See
> +.B STATX_WRITE_ATOMIC
> +and
> +.I stx_atomic_write_segments_max
> +description
> +in
> +.BR statx (2).

Sounds good to me!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  .RE
>  .TP
>  .I aio_lio_opcode
> -- 
> 2.31.1
> 

* Re: [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
  2024-07-17 21:44   ` Darrick J. Wong
@ 2024-07-18 14:07     ` John Garry
  2024-07-18 15:05       ` Darrick J. Wong
  0 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2024-07-18 14:07 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
	martin.petersen, Himanshu Madhani

On 17/07/2024 22:44, Darrick J. Wong wrote:
> On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
>> From: Himanshu Madhani <himanshu.madhani@oracle.com>
>>
>> Add RWF_ATOMIC flag description for pwritev2().
>>
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
>> [jpg: complete rewrite]
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 76 insertions(+)
>>
>> diff --git a/man/man2/readv.2 b/man/man2/readv.2
>> index eecde06dc..9c8a11324 100644
>> --- a/man/man2/readv.2
>> +++ b/man/man2/readv.2
>> @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
>>   .B O_DIRECT
>>   flag.)
>>   .TP
>> +.BR RWF_ATOMIC " (since Linux 6.11)"
>> +Requires that writes to regular files in block-based filesystems be issued with
>> +torn-write protection.
>> +Torn-write protection means that for a power or any other hardware failure,
>> +all or none of the data from the write will be stored,
>> +but never a mix of old and new data.
>> +This flag is meaningful only for
>> +.BR pwritev2 (),
>> +and its effect applies only to the data range written by the system call.
>> +The total write length must be power-of-2 and must be sized in the range
>> +.RI [ stx_atomic_write_unit_min ,
>> +.IR stx_atomic_write_unit_max ].
>> +The write must be at a naturally-aligned offset within the file with respect to
>> +the total write length -
>> +for example,
> 
> Nit: these could be two sentences
> 
> "The write must be at a naturally-aligned offset within the file with
> respect to the total write length.  For example, ..."

ok, sure

> 
>> +a write of length 32KB at a file offset of 32KB is permitted,
>> +however a write of length 32KB at a file offset of 48KB is not permitted.
> 
> Pickier nit: KiB, not KB.

ok

> 
>> +The upper limit of
>> +.I iovcnt
>> +for
>> +.BR pwritev2 ()
>> +is in
> 
> "is given by" ?

ok, fine, I don't mind

> 
>> +.I stx_atomic_write_segments_max.
>> +Torn-write protection only works with
>> +.B O_DIRECT
>> +flag, i.e. buffered writes are not supported.
>> +To guarantee consistency from the write between a file's in-core state with the
>> +storage device,
>> +.BR fdatasync (2),
>> +or
>> +.BR fsync (2),
>> +or
>> +.BR open (2)
>> +and either
>> +.B O_SYNC
>> +or
>> +.B O_DSYNC,
>> +or
>> +.B pwritev2 ()
>> +and either
>> +.B RWF_SYNC
>> +or
>> +.B RWF_DSYNC
>> +is required. Flags
> 
> This sentence   ^^ should start on a new line.

yes

> 
>> +.B O_SYNC
>> +or
>> +.B RWF_SYNC
>> +provide the strongest guarantees for
>> +.BR RWF_ATOMIC,
>> +in that all data and also file metadata updates will be persisted for a
>> +successfully completed write.
>> +Just using either flags
>> +.B O_DSYNC
>> +or
>> +.B RWF_DSYNC
>> +means that all data and any file updates will be persisted for a successfully
>> +completed write.
> 

ughh, this is hard to word both concisely and accurately...

> "any file updates" ?  I /think/ the difference between O_SYNC and
> O_DSYNC is that O_DSYNC persists all data and file metadata updates for
> the file range that was written, whereas O_SYNC persists all data and
> file metadata updates for the entire file.

I think that https://man7.org/linux/man-pages/man2/open.2.html#NOTES 
describes it best.

> 
> Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
> successfully completed write." should instead refer readers to the notes
> about synchronized I/O flags in the openat manpage?

Maybe that would be better, but we just need to make it clear that 
RWF_ATOMIC provides the guarantee that the data is atomically updated 
only in addition to whatever guarantee we have for metadata updates from 
O_SYNC/O_DSYNC.


So maybe:
RWF_ATOMIC provides the guarantee that any data is written with 
torn-write protection, and additional flags O_SYNC or O_DSYNC provide
same Synchronized I/O guarantees as documented in <openat manpage reference>

OK?


> 
>> +Not using any sync flags means that there is no guarantee that data or
>> +filesystem updates are persisted.
>> +.TP
>>   .BR RWF_SYNC " (since Linux 4.7)"
>>   .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
>>   Provide a per-write equivalent of the
>> @@ -279,10 +339,26 @@ values overflows an
>>   .I ssize_t
>>   value.
>>   .TP
>> +.B EINVAL
>> + For
>> +.BR RWF_ATOMIC
>> +set,
> 
> "If RWF_ATOMIC is specified..." ?
> 
> (to be a bit more consistent with the language around the AT_* flags in
> openat)

ok, fine

> 
>> +the combination of the sum of the
>> +.I iov_len
>> +values and the
>> +.I offset
>> +value does not comply with the length and offset torn-write protection rules.
>> +.TP
>>   .B EINVAL
>>   The vector count,
>>   .IR iovcnt ,
>>   is less than zero or greater than the permitted maximum.
>> +For
>> +.BR RWF_ATOMIC
>> +set, this maximum is in
> 
> (same)
> 
> --D
> 

Thanks for checking,
John


* Re: [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
  2024-07-18 14:07     ` John Garry
@ 2024-07-18 15:05       ` Darrick J. Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Darrick J. Wong @ 2024-07-18 15:05 UTC (permalink / raw)
  To: John Garry
  Cc: alx, linux-man, linux-fsdevel, axboe, hch, dchinner,
	martin.petersen, Himanshu Madhani

On Thu, Jul 18, 2024 at 03:07:59PM +0100, John Garry wrote:
> On 17/07/2024 22:44, Darrick J. Wong wrote:
> > On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
> > > From: Himanshu Madhani <himanshu.madhani@oracle.com>
> > > 
> > > Add RWF_ATOMIC flag description for pwritev2().
> > > 
> > > Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> > > [jpg: complete rewrite]
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > > ---
> > >   man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 76 insertions(+)
> > > 
> > > diff --git a/man/man2/readv.2 b/man/man2/readv.2
> > > index eecde06dc..9c8a11324 100644
> > > --- a/man/man2/readv.2
> > > +++ b/man/man2/readv.2
> > > @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
> > >   .B O_DIRECT
> > >   flag.)
> > >   .TP
> > > +.BR RWF_ATOMIC " (since Linux 6.11)"
> > > +Requires that writes to regular files in block-based filesystems be issued with
> > > +torn-write protection.
> > > +Torn-write protection means that for a power or any other hardware failure,
> > > +all or none of the data from the write will be stored,
> > > +but never a mix of old and new data.
> > > +This flag is meaningful only for
> > > +.BR pwritev2 (),
> > > +and its effect applies only to the data range written by the system call.
> > > +The total write length must be power-of-2 and must be sized in the range
> > > +.RI [ stx_atomic_write_unit_min ,
> > > +.IR stx_atomic_write_unit_max ].
> > > +The write must be at a naturally-aligned offset within the file with respect to
> > > +the total write length -
> > > +for example,
> > 
> > Nit: these could be two sentences
> > 
> > "The write must be at a naturally-aligned offset within the file with
> > respect to the total write length.  For example, ..."
> 
> ok, sure
> 
> > 
> > > +a write of length 32KB at a file offset of 32KB is permitted,
> > > +however a write of length 32KB at a file offset of 48KB is not permitted.
> > 
> > Pickier nit: KiB, not KB.
> 
> ok
> 
> > 
> > > +The upper limit of
> > > +.I iovcnt
> > > +for
> > > +.BR pwritev2 ()
> > > +is in
> > 
> > "is given by" ?
> 
> ok, fine, I don't mind
> 
> > 
> > > +.I stx_atomic_write_segments_max.
> > > +Torn-write protection only works with
> > > +.B O_DIRECT
> > > +flag, i.e. buffered writes are not supported.
> > > +To guarantee consistency from the write between a file's in-core state with the
> > > +storage device,
> > > +.BR fdatasync (2),
> > > +or
> > > +.BR fsync (2),
> > > +or
> > > +.BR open (2)
> > > +and either
> > > +.B O_SYNC
> > > +or
> > > +.B O_DSYNC,
> > > +or
> > > +.B pwritev2 ()
> > > +and either
> > > +.B RWF_SYNC
> > > +or
> > > +.B RWF_DSYNC
> > > +is required. Flags
> > 
> > This sentence   ^^ should start on a new line.
> 
> yes
> 
> > 
> > > +.B O_SYNC
> > > +or
> > > +.B RWF_SYNC
> > > +provide the strongest guarantees for
> > > +.BR RWF_ATOMIC,
> > > +in that all data and also file metadata updates will be persisted for a
> > > +successfully completed write.
> > > +Just using either flags
> > > +.B O_DSYNC
> > > +or
> > > +.B RWF_DSYNC
> > > +means that all data and any file updates will be persisted for a successfully
> > > +completed write.
> > 
> 
> ughh, this is hard to word both concisely and accurately...
> 
> > "any file updates" ?  I /think/ the difference between O_SYNC and
> > O_DSYNC is that O_DSYNC persists all data and file metadata updates for
> > the file range that was written, whereas O_SYNC persists all data and
> > file metadata updates for the entire file.
> 
> I think that https://man7.org/linux/man-pages/man2/open.2.html#NOTES
> describes it best.
> 
> > 
> > Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
> > successfully completed write." should instead refer readers to the notes
> > about synchronized I/O flags in the openat manpage?
> 
> Maybe that would be better, but we just need to make it clear that
> RWF_ATOMIC provides the guarantee that the data is atomically updated only
> in addition to whatever guarantee we have for metadata updates from
> O_SYNC/O_DSYNC.
> 
> 
> So maybe:
> RWF_ATOMIC provides the guarantee that any data is written with torn-write
> protection, and additional flags O_SYNC or O_DSYNC provide
> same Synchronized I/O guarantees as documented in <openat manpage reference>

  ^ the same

> 
> OK?

Yes.

> > > +Not using any sync flags means that there is no guarantee that data or
> > > +filesystem updates are persisted.
> > > +.TP
> > >   .BR RWF_SYNC " (since Linux 4.7)"
> > >   .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
> > >   Provide a per-write equivalent of the
> > > @@ -279,10 +339,26 @@ values overflows an
> > >   .I ssize_t
> > >   value.
> > >   .TP
> > > +.B EINVAL
> > > + For
> > > +.BR RWF_ATOMIC
> > > +set,
> > 
> > "If RWF_ATOMIC is specified..." ?
> > 
> > (to be a bit more consistent with the language around the AT_* flags in
> > openat)
> 
> ok, fine
> 
> > 
> > > +the combination of the sum of the
> > > +.I iov_len
> > > +values and the
> > > +.I offset
> > > +value does not comply with the length and offset torn-write protection rules.
> > > +.TP
> > >   .B EINVAL
> > >   The vector count,
> > >   .IR iovcnt ,
> > >   is less than zero or greater than the permitted maximum.
> > > +For
> > > +.BR RWF_ATOMIC
> > > +set, this maximum is in
> > 
> > (same)
> > 
> > --D
> > 
> 
> Thanks for checking,

NP. :)

--D

> John
> 
> 
