Subject: Re: [PATCH man-pages v6] Document encoded I/O
From: "Alejandro Colomar (man-pages)"
To: Omar Sandoval, Michael Kerrisk
Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org, Al Viro, Christoph Hellwig, Dave Chinner, Jann Horn, Amir Goldstein, Aleksa Sarai, linux-api@vger.kernel.org, kernel-team@fb.com, linux-man
References: <4d1430aa-a374-7565-4009-7ec5139bf311@gmail.com>
Message-ID: <05e1f13c-5776-961b-edc4-0d09d02b7829@gmail.com>
Date: Fri, 20 Nov 2020 16:03:44 +0100
User-Agent: Mozilla/5.0
 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-api@vger.kernel.org

Hi Omar,

I found a wording of mine to be a bit confusing.
Please see below.

Thanks,

Alex

On 11/20/20 3:06 PM, Alejandro Colomar (man-pages) wrote:
> Hi Omar and Michael,
>
> please, see below.
>
> Thanks,
>
> Alex
>
> On 11/20/20 12:29 AM, Alejandro Colomar (mailing lists; readonly) wrote:
>> Hi Omar,
>>
>> Please, see some fixes below:
>>
>> Michael, I've also some questions for you below
>> (you can grep for mtk to find those).
>>
>> Thanks,
>>
>> Alex
>>
>> On 11/18/20 8:18 PM, Omar Sandoval wrote:
>>> From: Omar Sandoval
>>>
>>> This adds a new page, encoded_io(7), providing an overview of encoded
>>> I/O and updates fcntl(2), open(2), and preadv2(2)/pwritev2(2) to
>>> reference it.
>>>
>>> Cc: Michael Kerrisk
>>> Cc: linux-man
>>> Signed-off-by: Omar Sandoval
>>> ---
>>> This feature is not yet upstream.
>>>
>>>  man2/fcntl.2      |  10 +-
>>>  man2/open.2       |  23 +++
>>>  man2/readv.2      |  70 +++++++++
>>>  man7/encoded_io.7 | 369 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  4 files changed, 471 insertions(+), 1 deletion(-)
>>>  create mode 100644 man7/encoded_io.7
>>>
>>> diff --git a/man2/fcntl.2 b/man2/fcntl.2
>>> index 546016617..b0d7fa2c3 100644
>>> --- a/man2/fcntl.2
>>> +++ b/man2/fcntl.2
>>> @@ -221,8 +221,9 @@ On Linux, this command can change only the
>>>  .BR O_ASYNC ,
>>>  .BR O_DIRECT ,
>>>  .BR O_NOATIME ,
>>> +.BR O_NONBLOCK ,
>>>  and
>>> -.B O_NONBLOCK
>>> +.B O_ALLOW_ENCODED
>>>  flags.
>>>  It is not possible to change the
>>>  .BR O_DSYNC
>>> @@ -1820,6 +1821,13 @@ Attempted to clear the
>>>  flag on a file that has the append-only attribute set.
>>>  .TP
>>>  .B EPERM
>>> +Attempted to set the
>>> +.B O_ALLOW_ENCODED
>>> +flag and the calling process did not have the
>>> +.B CAP_SYS_ADMIN
>>> +capability.
>>> +.TP
>>> +.B EPERM
>>>  .I cmd
>>>  was
>>>  .BR F_ADD_SEALS ,
>>> diff --git a/man2/open.2 b/man2/open.2
>>> index f587b0d95..84697dfa8 100644
>>> --- a/man2/open.2
>>> +++ b/man2/open.2
>>> @@ -437,6 +437,16 @@ was followed by a call to
>>>  .BR fdatasync (2)).
>>>  .IR "See NOTES below" .
>>>  .TP
>>> +.B O_ALLOW_ENCODED
>>
>> The list is alphabetically sorted;
>> please, follow that
>> (O_ALLOW_ENCODED should be the first one).
>>
>>> +Open the file with encoded I/O permissions;
>>> +see
>>> +.BR encoded_io (7).
>>> +.B O_CLOEXEC
>>> +must be specified in conjunction with this flag.
>>> +The caller must have the
>>> +.B CAP_SYS_ADMIN
>>> +capability.
>>> +.TP
>>>  .B O_EXCL
>>>  Ensure that this call creates the file:
>>>  if this flag is specified in conjunction with
>>> @@ -1082,6 +1092,14 @@ is invalid
>>>  (e.g., it contains characters not permitted by the underlying filesystem).
>>>  .TP
>>>  .B EINVAL
>>> +.B O_ALLOW_ENCODED
>>> +was specified in
>>> +.IR flags ,
>>> +but
>>> +.B O_CLOEXEC
>>> +was not specified.
>>> +.TP
>>> +.B EINVAL
>>>  The final component ("basename") of
>>>  .I pathname
>>>  is invalid
>>> @@ -1238,6 +1256,11 @@ did not match the owner of the file and the caller was not privileged.
>>>  The operation was prevented by a file seal; see
>>>  .BR fcntl (2).
>>>  .TP
>>> +.B EPERM
>>> +The
>>> +.B O_ALLOW_ENCODED
>>> +flag was specified, but the caller was not privileged.
>>> +.TP
>>>  .B EROFS
>>>  .I pathname
>>>  refers to a file on a read-only filesystem and write access was
>>> diff --git a/man2/readv.2 b/man2/readv.2
>>> index 5a8b74168..c9933acf0 100644
>>> --- a/man2/readv.2
>>> +++ b/man2/readv.2
>>> @@ -264,6 +264,11 @@ the data is always appended to the end of the file.
>>>  However, if the
>>>  .I offset
>>>  argument is \-1, the current file offset is updated.
>>> +.TP
>>> +.BR RWF_ENCODED " (since Linux 5.12)"
>>> +Read or write encoded (e.g., compressed) data.
>>> +See
>>> +.BR encoded_io (7).
>>>  .SH RETURN VALUE
>>>  On success,
>>>  .BR readv (),
>>> @@ -283,6 +288,13 @@ than requested (see
>>>  and
>>>  .BR write (2)).
>>>  .PP
>>> +If
>>> +.B
>>> +RWF_ENCODED
>>
>> RWF_ENCODED should go in the same line as .B:
>>
>> [
>> .B RWF_ENCODED
>> ]
>>
>>> +was specified in
>>> +.IR flags ,
>>> +then the return value is the number of encoded bytes.
>>> +.PP
>>>  On error, \-1 is returned, and \fIerrno\fP is set appropriately.
>>>  .SH ERRORS
>>>  The errors are as given for
>>> @@ -313,6 +325,64 @@ is less than zero or greater than the permitted maximum.
>>>  .TP
>>>  .B EOPNOTSUPP
>>>  An unknown flag is specified in \fIflags\fP.
>>> +.TP
>>> +.B EOPNOTSUPP
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the filesystem does not implement encoded I/O.
>>> +.TP
>>> +.B EPERM
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the file was not opened with the
>>> +.B O_ALLOW_ENCODED
>>> +flag.
>>> +.PP
>>> +.BR preadv2 ()
>>> +can fail for the following reasons:
>>
>> The wording is a bit unclear:
>>
>> Above your additions (old text, not yours),
>> it says that some errors apply to preadv2
>> (as well as to other functions):
>>
>> [
>> ERRORS
>>        The errors are as given for read(2) and write(2). Furthermore,
>>        preadv(), preadv2(), pwritev(), and pwritev2() can also fail
>>        for the same reasons as lseek(2). Additionally, the following
>>        errors are defined:
>>
>>        EINVAL The sum of the iov_len values overflows an ssize_t
>>               value.
>>
>>        EINVAL The vector count, iovcnt, is less than zero or greater
>>               than the permitted maximum.
>>
>>        EOPNOTSUPP
>>               An unknown flag is specified in flags.
>>
>>        EOPNOTSUPP
>>               RWF_ENCODED is specified in flags and the filesystem
>>               does not implement encoded I/O.
>>
>>        EPERM  RWF_ENCODED is specified in flags and the file was not
>>               opened with the O_ALLOW_ENCODED flag.
>> ]
>>
>> And then you added a line that says:
>>
>> [
>> preadv2() can fail for the following reasons:
>> ]
>>
>> Which, if read strictly, says that [only] the following errors apply.
>>
>> Did you mean that
>> "preadv2() can _additionally_ fail for the following reasons"?
>>
>> Could you please be a bit more specific?
>>
>> The same applies for pwritev2() below.
>>
>>> +.TP
>>> +.B E2BIG
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and
>>> +.I iov[0]
>>> +is not large enough to return the encoding metadata.
>>> +.TP
>>> +.B ENOBUFS
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the buffers in
>>> +.I iov
>>> +are not big enough to return the encoded data.
>>> +.PP
>>> +.BR pwritev2 ()
>>> +can fail for the following reasons:
>>> +.TP
>>> +.B E2BIG
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and
>>> +.I iov[0]
>>> +contains non-zero fields
>>> +after the kernel's
>>> +.IR "sizeof(struct\ encoded_iov)" .
>>
>> Don't escape the space, if the string is already in "".
>>
>>> +.TP
>>> +.B EINVAL
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the encoding is unknown or not supported by the filesystem.
>>> +.TP
>>> +.B EINVAL
>>> +.B RWF_ENCODED
>>> +is specified in
>>> +.I flags
>>> +and the alignment and/or size requirements are not met.
>>>  .SH VERSIONS
>>>  .BR preadv ()
>>>  and
>>> diff --git a/man7/encoded_io.7 b/man7/encoded_io.7
>>> new file mode 100644
>>> index 000000000..106fa587b
>>> --- /dev/null
>>> +++ b/man7/encoded_io.7
>>> @@ -0,0 +1,369 @@
>>> +.\" Copyright (c) 2020 by Omar Sandoval
>>> +.\"
>>> +.\" %%%LICENSE_START(VERBATIM)
>>> +.\" Permission is granted to make and distribute verbatim copies of this
>>> +.\" manual provided the copyright notice and this permission notice are
>>> +.\" preserved on all copies.
>>> +.\"
>>> +.\" Permission is granted to copy and distribute modified versions of this
>>> +.\" manual under the conditions for verbatim copying, provided that the
>>> +.\" entire resulting derived work is distributed under the terms of a
>>> +.\" permission notice identical to this one.
>>> +.\"
>>> +.\" Since the Linux kernel and libraries are constantly changing, this
>>> +.\" manual page may be incorrect or out-of-date. The author(s) assume no
>>> +.\" responsibility for errors or omissions, or for damages resulting from
>>> +.\" the use of the information contained herein. The author(s) may not
>>> +.\" have taken the same level of care in the production of this manual,
>>> +.\" which is licensed free of charge, as they might when working
>>> +.\" professionally.
>>> +.\"
>>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>>> +.\" the source, must acknowledge the copyright and authors of this work.
>>> +.\" %%%LICENSE_END
>>> +.\"
>>> +.\"
>>> +.TH ENCODED_IO 7 2020-11-11 "Linux" "Linux Programmer's Manual"
>>> +.SH NAME
>>> +encoded_io \- overview of encoded I/O
>>> +.SH DESCRIPTION
>>> +Several filesystems (e.g., Btrfs) support transparent encoding
>>> +(e.g., compression, encryption) of data on disk:
>>> +written data is encoded by the kernel before it is written to disk,
>>> +and read data is decoded before being returned to the user.
>>> +In some cases, it is useful to skip this encoding step.
>>
>> Here I would use ';' instead of '.'
>> (and next letter would be lowercase, then).
>>
>>> +For example, the user may want to read the compressed contents of a file
>>> +or write pre-compressed data directly to a file.
>>> +This is referred to as "encoded I/O".
>>> +.SS Encoded I/O API
>>> +Encoded I/O is specified with the
>>> +.B RWF_ENCODED
>>> +flag to
>>> +.BR preadv2 (2)
>>> +and
>>> +.BR pwritev2 (2).
>>> +If
>>> +.B RWF_ENCODED
>>> +is specified, then
>>> +.I iov[0].iov_base
>>> +points to an
>>> +.I
>>> +encoded_iov
>>
>> On the same line, please.
>>
>>> +structure, defined in
>>> +.I
>>> +as:
>>> +.PP
>>> +.in +4n
>>> +.EX
>>> +struct encoded_iov {
>>> +    __aligned_u64 len;
>>> +    __aligned_u64 unencoded_len;
>>> +    __aligned_u64 unencoded_offset;
>>> +    __u32 compression;
>>> +    __u32 encryption;
>>> +};
>>> +.EE
>>> +.in
>>> +.PP
>>> +This may be extended in the future, so
>>> +.I iov[0].iov_len
>>> +must be set to
>>> +.I "sizeof(struct\ encoded_iov)"
>>> +for forward/backward compatibility.
>>> +The remaining buffers contain the encoded data.
>>> +.PP
>>> +.I compression
>>> +and
>>> +.I encryption
>>> +are the encoding fields.
>>> +.I compression
>>> +is
>>> +.B ENCODED_IOV_COMPRESSION_NONE
>>> +(zero)
>>> +or a filesystem-specific
>>> +.B ENCODED_IOV_COMPRESSION
>>
>> Maybe s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_*/
>
> Or s/ENCODED_IOV_COMPRESSION/ENCODED_IOV_COMPRESSION_/
>
> I'm not sure about existing practice.
>
> Michael (mtk), what would you do here?
>
>>
>>> +constant;
>>> +see
>>> +.BR Filesystem\ support .
>>
>> Please, write it as [.BR "Filesystem support" .]
>>
>> and maybe I would change it, to be more specific, to the following:
>>
>> [
>> see
>> .B Filesystem support
>> below.
>> ]
>>
>> So that the reader clearly understands it's on the same page.
>>
>>> +.I encryption
>>> +is currently always
>>> +.B ENCODED_IOV_ENCRYPTION_NONE
>>> +(zero).
>>> +.PP
>>> +.I unencoded_len
>>> +is the length of the unencoded (i.e., decrypted and decompressed) data.
>>> +.I unencoded_offset
>>> +is the offset into the unencoded data where the data in the file begins
>>
>> The above wording is a bit unclear to me.
>>
>> I suggest the following:
>>
>> [
>> .I unencoded_offset
>> is the offset from the beginning of the file
>> to the first byte of the unencoded data
>> ]

Now I've read it again, and my wording was even worse than yours.
I think yours can be understood after a few reads.

However, I'll still try to reword mine to see if I add some value:

[
.I unencoded_offset
is the offset from the first byte of the unencoded data
to the first byte of logical data.
]

If you prefer yours, or a mix, that's fine.

>>
>>> +(less than or equal to
>>> +.IR unencoded_len ).
>>> +.I len
>>> +is the length of the data in the file
>>> +(less than or equal to
>>> +.I unencoded_len
>>> +-
>>
>> Here's a question for Michael (mtk):
>>
>> I've seen (many) cases where these math operations
>> are written without spaces,
>> and in the same line (e.g., [.IR a + b]).
>>
>> I'd like to know your preferences on this,
>> or what is actually more extended in the manual pages,
>> to stick with only one of them.
>>
>>> +.IR unencoded_offset ).
>>> +See
>>> +.B Extent layout
>>> +below for some examples.
>>> +.I
>>
>> Were you maybe going to add something there?
>>
>> If not, please remove that [.I].
>>
>>> +.PP
>>> +If the unencoded data is actually longer than
>>> +.IR unencoded_len ,
>>> +then it is truncated;
>>> +if it is shorter, then it is extended with zeroes.
>>> +.PP
>>> +
>>
>> Please, remove that blank line.
>>
>>> +.BR pwritev2 ()
>>
>> Should be [.BR pwritev2 (2)]
>>
>> Michael (mtk),
>>
>> Am I right in that? Please, confirm.
>>
>>> +uses the metadata specified in
>>> +.IR iov[0] ,
>>> +writes the encoded data from the remaining buffers,
>>> +and returns the number of encoded bytes written
>>> +(that is, the sum of
>>> +.I iov[n].iov_len
>>> +for 1 <=
>>> +.I n
>>> +<
>>> +.IR iovcnt ;
>>> +partial writes will not occur).
>>> +At least one encoding field must be non-zero.
>>> +Note that the encoded data is not validated when it is written;
>>> +if it is not valid (e.g., it cannot be decompressed),
>>> +then a subsequent read may return an error.
>>> +If the
>>> +.I offset
>>> +argument to
>>> +.BR pwritev2 ()
>>
>> Same as above: specify (2).
>>
>>> +is -1, then the file offset is incremented by
>>> +.IR len .
>>> +If
>>> +.I iov[0].iov_len
>>> +is less than
>>> +.I "sizeof(struct\ encoded_iov)"
>>
>> [.I] allows spaces, so it should be:
>>
>> [
>> .I sizeof(struct encoded_iov)
>> ]
>>
>>> +in the kernel,
>>> +then any fields unknown to userspace are treated as if they were zero;
>>
>> s/userspace/user space/
>>
>> See man-pages(7)::STYLE GUIDE::Preferred terms
>>
>>> +if it is greater and any fields unknown to the kernel are non-zero,
>>> +then this returns -1 and sets
>>> +.I errno
>>> +to
>>> +.BR E2BIG .
>>> +.PP
>>> +.BR preadv2 ()
>>
>> Same as above: specify (2).
>>
>>> +populates the metadata in
>>> +.IR iov[0] ,
>>> +the encoded data in the remaining buffers,
>>> +and returns the number of encoded bytes read.
>>> +This will only return one extent per call.
>>> +This can also read data which is not encoded;
>>> +all encoding fields will be zero in that case.
>>> +If the
>>> +.I offset
>>> +argument to
>>> +.BR preadv2 ()
>>
>> Same as above: specify (2).
>>
>>> +is -1, then the file offset is incremented by
>>> +.IR len .
>>> +If
>>> +.I iov[0].iov_len
>>> +is less than
>>> +.I "sizeof(struct\ encoded_iov)"
>>
>> Don't need '"' nor '\', as above.
>>
>>> +in the kernel and any fields unknown to userspace are non-zero,
>>
>> s/userspace/user space/
>>
>>> +then
>>> +.BR preadv2 ()
>>
>> (2)
>>
>>> +returns -1 and sets
>>> +.I errno
>>> +to
>>> +.BR E2BIG ;
>>> +if it is greater,
>>> +then any fields unknown to the kernel are returned as zero.
>>> +If the provided buffers are not large enough to return an entire encoded
>>> +extent,
>>
>> Please use semantic newlines.
>> I haven't checked that in the text above,
>> so if you happen to find that there's any other line
>> that should also be fixed in that sense, please do so.
>>
>> To understand 'semantic newlines',
>> please have a look at
>> man-pages(7)::STYLE GUIDE::Use semantic newlines
>>
>> Basically, split lines at the most natural separation point,
>> instead of just when the line gets over the margin.
>>
>>> +then
>>> +.BR preadv2 ()
>>
>> (2)
>>
>>> +returns -1 and sets
>>> +.I errno
>>> +to
>>> +.BR ENOBUFS .
>>> +.PP
>>> +As the filesystem page cache typically contains decoded data,
>>> +encoded I/O bypasses the page cache.
>>> +.SS Extent layout
>>> +By using
>>> +.IR len ,
>>> +.IR unencoded_len ,
>>> +and
>>> +.IR unencoded_offset ,
>>> +it is possible to refer to a subset of an unencoded extent.
>>> +.PP
>>> +In the simplest case,
>>> +.I len
>>> +is equal to
>>> +.I unencoded_len
>>> +and
>>> +.I unencoded_offset
>>> +is zero.
>>> +This means that the entire unencoded extent is used.
>>> +.PP
>>> +However, suppose we read 50 bytes into a file
>>> +which contains a single compressed extent.
>>> +The filesystem must still return the entire compressed extent
>>> +for us to be able to decompress it,
>>> +so
>>> +.I unencoded_len
>>> +would be the length of the entire decompressed extent.
>>> +However, because the read was at offset 50,
>>> +the first 50 bytes should be ignored.
>>> +Therefore,
>>> +.I unencoded_offset
>>> +would be 50,
>>> +and
>>> +.I len
>>> +would accordingly be
>>> +.IR unencoded_len\ -\ 50 .
>>
>> This formats everything as I, except for the last dot.
>> Replace by:
>>
>> [
>> .I unencoded_len
>> - 50.
>> ]
>>
>> Michael (mtk), same as above:
>> to space, or not to space? That is the question :p
>>
>> Personally, I find spaces clearer.
>>
>>> +.PP
>>> +Additionally, suppose we want to create an encrypted file with length 500,
>>> +but the file is encrypted with a block cipher using a block size of 4096.
>>> +The unencoded data would therefore include the appropriate padding,
>>> +and
>>> +.I unencoded_len
>>> +would be 4096.
>>> +However, to represent the logical size of the file,
>>> +.I len
>>> +would be 500
>>> +(and
>>> +.I unencoded_offset
>>> +would be 0).
>>> +.PP
>>> +Similar situations can arise in other cases:
>>> +.IP * 3
>>> +If the filesystem pads data to the filesystem block size before compressing,
>>> +then compressed files with a size unaligned to the filesystem block size will
>>> +end with an extent with
>>> +.I len
>>> +<
>>> +.IR unencoded_len .
>>> +.IP *
>>> +Extents cloned from the middle of a larger encoded extent with
>>> +.B FICLONERANGE
>>> +may have a non-zero
>>> +.I unencoded_offset
>>> +and/or
>>> +.I len
>>> +<
>>> +.IR unencoded_len .
>>> +.IP *
>>> +If the middle of an encoded extent is overwritten,
>>> +the filesystem may create extents with a non-zero
>>> +.I unencoded_offset
>>> +and/or
>>> +.I len
>>> +<
>>> +.I unencoded_len
>>> +for the parts that were not overwritten.
>>> +.SS Security
>>> +Encoded I/O creates the potential for some security issues:
>>> +.IP * 3
>>> +Encoded writes allow writing arbitrary data which the kernel will decode on
>>> +a subsequent read. Decompression algorithms are complex and may have bugs
>>> +which can be exploited by maliciously crafted data.
>>> +.IP *
>>> +Encoded reads may return data which is not logically present in the file
>>> +(see the discussion of
>>> +.I len
>>> +vs.
>>
>> Please, s/vs./vs/
>> See the reasons below:
>>
>> Michael (mtk),
>>
>> Here the renderer outputs a double space
>> (as for separating two sentences).
>>
>> Are you okay with that?
>>
>> I haven't found any other "\\.".
>> However, I've found a few "\[^\.]".
>>
>>> +.I unencoded_len
>>> +above).
>>> +It may not be intended for this data to be readable.
>>> +.PP
>>> +Therefore, encoded I/O requires privilege.
>>> +Namely, the
>>> +.B RWF_ENCODED
>>> +flag may only be used when the file was opened with the
>>> +.B O_ALLOW_ENCODED
>>> +flag to
>>> +.BR open (2),
>>> +which requires the
>>> +.B CAP_SYS_ADMIN
>>> +capability.
>>> +The
>>> +.B O_CLOEXEC
>>> +flag must be specified in conjunction with
>>> +.BR O_ALLOW_ENCODED .
>>> +This avoids accidentally leaking the encoded I/O privilege
>>> +(it is not cleared on
>>> +.BR fork (2)
>>> +or
>>> +.BR execve (2)
>>> +otherwise).
>>> +If
>>> +.B O_ALLOW_ENCODED
>>> +without
>>> +.B O_CLOEXEC
>>> +is desired,
>>> +.B O_CLOEXEC
>>> +can be cleared afterwards with
>>> +.BR fcntl (2).
>>> +.BR fcntl (2)
>>> +can also clear or set
>>> +.B O_ALLOW_ENCODED
>>> +(including without
>>> +.BR O_CLOEXEC ).
>>> +.SS Filesystem support
>>> +Encoded I/O is supported on the following filesystems:
>>> +.TP
>>> +Btrfs (since Linux 5.12)
>>> +.IP
>>> +Btrfs supports encoded reads and writes of compressed data.
>>> +The data is encoded as follows:
>>> +.RS
>>> +.IP * 3
>>> +If
>>> +.I compression
>>> +is
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_ZLIB ,
>>> +then the encoded data is a single zlib stream.
>>> +.IP *
>>> +If
>>> +.I compression
>>> +is
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_ZSTD ,
>>> +then the encoded data is a single zstd frame compressed with the
>>> +.I windowLog
>>> +compression parameter set to no more than 17.
>>> +.IP *
>>> +If
>>> +.I compression
>>> +is one of
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_4K ,
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_8K ,
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_16K ,
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_32K ,
>>> +or
>>> +.BR ENCODED_IOV_COMPRESSION_BTRFS_LZO_64K ,
>>> +then the encoded data is compressed page by page
>>> +(using the page size indicated by the name of the constant)
>>> +with LZO1X
>>> +and wrapped in the format documented in the Linux kernel source file
>>> +.IR fs/btrfs/lzo.c .
>>> +.RE
>>> +.IP
>>> +Additionally, there are some restrictions on
>>> +.BR pwritev2 ():
>>
>> (2)
>>
>>> +.RS
>>> +.IP * 3
>>> +.I offset
>>> +(or the current file offset if
>>> +.I offset
>>> +is -1) must be aligned to the sector size of the filesystem.
>>> +.IP *
>>> +.I len
>>> +must be aligned to the sector size of the filesystem
>>> +unless the data ends at or beyond the current end of the file.
>>> +.IP *
>>> +.I unencoded_len
>>> +and the length of the encoded data must each be no more than 128 KiB.
>>> +This limit may increase in the future.
>>> +.IP *
>>> +The length of the encoded data must be less than or equal to
>>> +.IR unencoded_len .
>>> +.IP *
>>> +If using LZO, the filesystem's page size must match the compression page size.
>>> +.RE
>>>
>>
>> Please, add a SEE ALSO section, which should at least point to
>> preadv2(2) (or pwritev2(2), if you prefer):
>>
>> [
>> .SH SEE ALSO
>> .BR preadv2 (2)
>> ]
>>
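
Since the wording around len, unencoded_len, and unencoded_offset was the hardest part to pin down, here is a minimal C sketch of the constraints as the page states them (unencoded_offset <= unencoded_len, and len <= unencoded_len - unencoded_offset). It is only an illustration: struct encoded_iov_sketch is a local stand-in that uses fixed-width types in place of __aligned_u64 and __u32, and both function names are hypothetical. Nothing here calls the proposed kernel interface, which is not upstream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-in for the proposed struct encoded_iov.
 * Field names follow the patch; the types are an assumption
 * (uint64_t/uint32_t instead of __aligned_u64/__u32).
 */
struct encoded_iov_sketch {
    uint64_t len;              /* length of the data in the file */
    uint64_t unencoded_len;    /* length of the fully decoded data */
    uint64_t unencoded_offset; /* where the file data starts in it */
    uint32_t compression;
    uint32_t encryption;
};

/*
 * Hypothetical helper: check the two bounds described in the page.
 * The subtraction cannot wrap because the first test already
 * guarantees unencoded_offset <= unencoded_len.
 */
bool encoded_iov_layout_ok(const struct encoded_iov_sketch *e)
{
    if (e->unencoded_offset > e->unencoded_len)
        return false;
    return e->len <= e->unencoded_len - e->unencoded_offset;
}

/*
 * Hypothetical helper: metadata for a read that starts `skip` bytes
 * into a single extent that decodes to `unencoded_len` bytes, as in
 * the "read 50 bytes into a file" example. The encoding fields are
 * left at zero.
 */
struct encoded_iov_sketch extent_from_offset(uint64_t unencoded_len,
                                             uint64_t skip)
{
    struct encoded_iov_sketch e = {0};

    assert(skip <= unencoded_len);
    e.unencoded_len = unencoded_len;
    e.unencoded_offset = skip;
    e.len = unencoded_len - skip;
    return e;
}
```

With these helpers, extent_from_offset(4096, 50) yields unencoded_offset = 50 and len = 4046, which passes encoded_iov_layout_ok(). The padded block-cipher example from the page (len = 500, unencoded_len = 4096, unencoded_offset = 0) passes as well, since len only has to be less than or equal to the remaining unencoded bytes.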