linux-fsdevel.vger.kernel.org archive mirror
From: John Garry <john.g.garry@oracle.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: alx@kernel.org, linux-man@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, axboe@kernel.dk, hch@lst.de,
	dchinner@redhat.com, martin.petersen@oracle.com,
	Himanshu Madhani <himanshu.madhani@oracle.com>
Subject: Re: [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag
Date: Thu, 18 Jul 2024 15:07:59 +0100
Message-ID: <2eb8c7b7-7758-49a3-b837-2e2a622c0ed9@oracle.com>
In-Reply-To: <20240717214423.GI1998502@frogsfrogsfrogs>

On 17/07/2024 22:44, Darrick J. Wong wrote:
> On Wed, Jul 17, 2024 at 09:36:18AM +0000, John Garry wrote:
>> From: Himanshu Madhani <himanshu.madhani@oracle.com>
>>
>> Add RWF_ATOMIC flag description for pwritev2().
>>
>> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
>> [jpg: complete rewrite]
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   man/man2/readv.2 | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 76 insertions(+)
>>
>> diff --git a/man/man2/readv.2 b/man/man2/readv.2
>> index eecde06dc..9c8a11324 100644
>> --- a/man/man2/readv.2
>> +++ b/man/man2/readv.2
>> @@ -193,6 +193,66 @@ which provides lower latency, but may use additional resources.
>>   .B O_DIRECT
>>   flag.)
>>   .TP
>> +.BR RWF_ATOMIC " (since Linux 6.11)"
>> +Requires that writes to regular files in block-based filesystems be issued with
>> +torn-write protection.
>> +Torn-write protection means that in the event of a power failure or any other hardware failure,
>> +all or none of the data from the write will be stored,
>> +but never a mix of old and new data.
>> +This flag is meaningful only for
>> +.BR pwritev2 (),
>> +and its effect applies only to the data range written by the system call.
>> +The total write length must be a power of 2 and must be in the range
>> +.RI [ stx_atomic_write_unit_min ,
>> +.IR stx_atomic_write_unit_max ].
>> +The write must be at a naturally-aligned offset within the file with respect to
>> +the total write length -
>> +for example,
> 
> Nit: these could be two sentences
> 
> "The write must be at a naturally-aligned offset within the file with
> respect to the total write length.  For example, ..."

ok, sure

> 
>> +a write of length 32KB at a file offset of 32KB is permitted,
>> +however a write of length 32KB at a file offset of 48KB is not permitted.
> 
> Pickier nit: KiB, not KB.

ok
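For illustration, a minimal C sketch of the length and offset rules in this hunk. The helper name rwf_atomic_geometry_ok() is hypothetical (it is not a kernel or libc API), and unit_min/unit_max are assumed to have been read by the caller from stx_atomic_write_unit_min and stx_atomic_write_unit_max:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or libc API): check whether a write of
 * 'len' bytes at file offset 'off' follows the RWF_ATOMIC rules above.
 * unit_min/unit_max are assumed to come from stx_atomic_write_unit_min and
 * stx_atomic_write_unit_max.
 */
static bool rwf_atomic_geometry_ok(uint64_t off, uint64_t len,
                                   uint64_t unit_min, uint64_t unit_max)
{
    /* The total length must be a non-zero power of 2... */
    if (len == 0 || (len & (len - 1)) != 0)
        return false;

    /* ...and lie within [unit_min, unit_max]. */
    if (len < unit_min || len > unit_max)
        return false;

    /* Natural alignment: off must be a multiple of len. Since len is a
     * power of 2, the low log2(len) bits of off must be zero. */
    return (off & (len - 1)) == 0;
}
```

With unit_min = 4096 and unit_max = 65536, rwf_atomic_geometry_ok(32768, 32768, 4096, 65536) is true (32 KiB at offset 32 KiB), while rwf_atomic_geometry_ok(49152, 32768, 4096, 65536) is false (32 KiB at offset 48 KiB leaves 16 KiB in the low bits).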

> 
>> +The upper limit of
>> +.I iovcnt
>> +for
>> +.BR pwritev2 ()
>> +is in
> 
> "is given by" ?

ok, fine, I don't mind
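For reference, a sketch of how an application might read these limits with statx(2). It assumes Linux 6.11 or later UAPI headers (STATX_WRITE_ATOMIC, STATX_ATTR_WRITE_ATOMIC and the stx_atomic_write_* fields); the struct and function names are hypothetical, and the #ifdef fallback only exists so the sketch still compiles against older headers:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>     /* AT_EMPTY_PATH */
#include <stdint.h>
#include <sys/stat.h>  /* statx(); STATX_WRITE_ATOMIC with Linux >= 6.11 headers */

/* Hypothetical container for the limits discussed in this thread. */
struct atomic_write_limits {
    uint32_t unit_min;     /* stx_atomic_write_unit_min */
    uint32_t unit_max;     /* stx_atomic_write_unit_max */
    uint32_t segments_max; /* stx_atomic_write_segments_max: upper limit of iovcnt */
};

/* Returns 0 and fills *lim if 'fd' supports torn-write protection,
 * otherwise -1 with errno set. */
static int get_atomic_write_limits(int fd, struct atomic_write_limits *lim)
{
#ifdef STATX_WRITE_ATOMIC
    struct statx stx;

    /* Query the open file itself (empty path relative to fd). */
    if (statx(fd, "", AT_EMPTY_PATH, STATX_WRITE_ATOMIC, &stx) != 0)
        return -1;

    /* The kernel must both understand the request and report support. */
    if (!(stx.stx_mask & STATX_WRITE_ATOMIC) ||
        !(stx.stx_attributes & STATX_ATTR_WRITE_ATOMIC)) {
        errno = EOPNOTSUPP;
        return -1;
    }

    lim->unit_min = stx.stx_atomic_write_unit_min;
    lim->unit_max = stx.stx_atomic_write_unit_max;
    lim->segments_max = stx.stx_atomic_write_segments_max;
    return 0;
#else
    /* Headers predate atomic write support in statx. */
    (void)fd;
    (void)lim;
    errno = ENOSYS;
    return -1;
#endif
}
```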

> 
>> +.IR stx_atomic_write_segments_max .
>> +Torn-write protection only works with the
>> +.B O_DIRECT
>> +flag; buffered writes are not supported.
>> +To guarantee consistency from the write between a file's in-core state and the
>> +storage device,
>> +.BR fdatasync (2),
>> +or
>> +.BR fsync (2),
>> +or
>> +.BR open (2)
>> +and either
>> +.B O_SYNC
>> +or
>> +.BR O_DSYNC ,
>> +or
>> +.BR pwritev2 ()
>> +and either
>> +.B RWF_SYNC
>> +or
>> +.B RWF_DSYNC
>> +is required. Flags
> 
> This sentence   ^^ should start on a new line.

yes

> 
>> +.B O_SYNC
>> +or
>> +.B RWF_SYNC
>> +provide the strongest guarantees for
>> +.BR RWF_ATOMIC ,
>> +in that all data and also file metadata updates will be persisted for a
>> +successfully completed write.
>> +Using only one of the flags
>> +.B O_DSYNC
>> +or
>> +.B RWF_DSYNC
>> +means that all data and any file updates will be persisted for a successfully
>> +completed write.
> 

ughh, this is hard to word both concisely and accurately...

> "any file updates" ?  I /think/ the difference between O_SYNC and
> O_DSYNC is that O_DSYNC persists all data and file metadata updates for
> the file range that was written, whereas O_SYNC persists all data and
> file metadata updates for the entire file.

I think that https://man7.org/linux/man-pages/man2/open.2.html#NOTES 
describes it best.

> 
> Perhaps everything between "Flags O_SYNC or RWF_SYNC..." and "...for a
> successfully completed write." should instead refer readers to the notes
> about synchronized I/O flags in the openat manpage?

Maybe that would be better, but we still need to make it clear that
RWF_ATOMIC only guarantees that the data is updated atomically; that
guarantee comes in addition to whatever guarantees O_SYNC/O_DSYNC
provide for metadata updates.


So maybe:
RWF_ATOMIC provides the guarantee that any data is written with
torn-write protection, and the additional flags O_SYNC or O_DSYNC provide
the same synchronized I/O guarantees as documented in <openat manpage reference>

OK?
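To make that split concrete, a sketch of a write that asks for both guarantees: RWF_ATOMIC for torn-write protection and RWF_DSYNC for the synchronized I/O semantics. It assumes fd was opened with O_DIRECT and that buf, len and off already satisfy the O_DIRECT alignment and the RWF_ATOMIC length/offset rules; atomic_dsync_write() is a hypothetical name, and the fallback RWF_ATOMIC value is copied from the Linux 6.11 <linux/fs.h> for libc headers that do not define it yet:

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>   /* pwritev2(), RWF_DSYNC */

#ifndef RWF_ATOMIC
#define RWF_ATOMIC 0x00000040 /* value from Linux 6.11 <linux/fs.h> */
#endif

/*
 * Hypothetical wrapper: one torn-write-protected, data-synchronized write.
 * Assumes fd was opened with O_DIRECT and that buf/len/off satisfy both the
 * O_DIRECT alignment and the RWF_ATOMIC length/offset rules.
 */
static ssize_t atomic_dsync_write(int fd, const void *buf, size_t len,
                                  off_t off)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

    /* RWF_ATOMIC: all or nothing for this range on failure.
     * RWF_DSYNC: data (and metadata needed to read it) persisted on return. */
    return pwritev2(fd, &iov, 1, off, RWF_ATOMIC | RWF_DSYNC);
}
```

Dropping RWF_DSYNC would still request torn-write protection for the range, but with no guarantee that the data has been persisted when the call returns.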


> 
>> +Not using any sync flags means that there is no guarantee that data or
>> +filesystem updates are persisted.
>> +.TP
>>   .BR RWF_SYNC " (since Linux 4.7)"
>>   .\" commit e864f39569f4092c2b2bc72c773b6e486c7e3bd9
>>   Provide a per-write equivalent of the
>> @@ -279,10 +339,26 @@ values overflows an
>>   .I ssize_t
>>   value.
>>   .TP
>> +.B EINVAL
>> +For
>> +.BR RWF_ATOMIC
>> +set,
> 
> "If RWF_ATOMIC is specified..." ?
> 
> (to be a bit more consistent with the language around the AT_* flags in
> openat)

ok, fine
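A hypothetical userspace pre-check corresponding to the two EINVAL cases, sketched in C: it rejects an iovcnt outside [1, stx_atomic_write_segments_max] and returns the sum of the iov_len values, which must then still satisfy the length and offset rules. Treating an empty vector as invalid is an assumption, on the grounds that a zero total length is not a power of 2:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: validate iovcnt against segments_max (assumed to come
 * from stx_atomic_write_segments_max) and store the sum of iov_len in *total.
 * Returns 0 on success or -EINVAL.
 */
static int rwf_atomic_iov_total(const struct iovec *iov, int iovcnt,
                                unsigned int segments_max, size_t *total)
{
    size_t sum = 0;

    /* Empty vectors and vectors longer than the permitted maximum fail. */
    if (iovcnt < 1 || (unsigned int)iovcnt > segments_max)
        return -EINVAL;

    for (int i = 0; i < iovcnt; i++) {
        /* Refuse a total that would overflow size_t. */
        if (iov[i].iov_len > SIZE_MAX - sum)
            return -EINVAL;
        sum += iov[i].iov_len;
    }

    *total = sum;
    return 0;
}
```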

> 
>> +the combination of the sum of the
>> +.I iov_len
>> +values and the
>> +.I offset
>> +value does not comply with the length and offset torn-write protection rules.
>> +.TP
>>   .B EINVAL
>>   The vector count,
>>   .IR iovcnt ,
>>   is less than zero or greater than the permitted maximum.
>> +For
>> +.BR RWF_ATOMIC
>> +set, this maximum is in
> 
> (same)
> 
> --D
> 

Thanks for checking,
John


Thread overview: 9+ messages
2024-07-17  9:36 [PATCH v4 0/3] man2: Document RWF_ATOMIC John Garry
2024-07-17  9:36 ` [PATCH v4 1/3] statx.2: Document STATX_WRITE_ATOMIC John Garry
2024-07-17 21:36   ` Darrick J. Wong
2024-07-17  9:36 ` [PATCH v4 2/3] readv.2: Document RWF_ATOMIC flag John Garry
2024-07-17 21:44   ` Darrick J. Wong
2024-07-18 14:07     ` John Garry [this message]
2024-07-18 15:05       ` Darrick J. Wong
2024-07-17  9:36 ` [PATCH v4 3/3] io_submit.2: Document RWF_ATOMIC John Garry
2024-07-17 21:44   ` Darrick J. Wong
