Netdev List
 help / color / mirror / Atom feed
* [PATCH net] tls: avoid zc receive for file-backed pages
       [not found] <2026052150-stylus-germicide-780e@gregkh>
@ 2026-05-21 16:53 ` Yeonju Bae
  2026-05-25 17:54   ` Jakub Kicinski
  0 siblings, 1 reply; 2+ messages in thread
From: Yeonju Bae @ 2026-05-21 16:53 UTC (permalink / raw)
  To: netdev
  Cc: gregkh, security, john.fastabend, kuba, sd, davem, edumazet,
	pabeni, horms, linux-kernel, Yeonju Bae

kTLS RX zc decrypt writes unauthenticated AEAD output directly into
pages pinned from the recvmsg iterator via tls_setup_from_iter().
For MAP_SHARED, PROT_WRITE file-backed destinations, those pages are
live page-cache pages rather than anonymous copies: MAP_SHARED does not
trigger copy-on-write, so FOLL_WRITE returns the actual page-cache page.

crypto_aead_decrypt() writes CTR-mode decryption output into the
scatter-gather list before the authentication tag is verified.  If the
tag check fails (-EBADMSG), the plaintext-like output is already
resident in the page-cache page.  exit_free_pages() calls put_page()
without any content cleanup, so the modification persists through the
backing file.  An independent open(O_RDONLY)/read() of the same file
returns different content and its SHA-256 changes.  MAP_PRIVATE is safe
via COW; PROT_READ-only destinations fail at iov_iter_get_pages2()
before any decryption occurs.

Avoid zc receive for file-backed destination pages.  In
tls_setup_from_iter(), after iov_iter_get_pages2() pins pages, check
each page with folio_mapping(page_folio(page)).  If any pinned page is
file-backed (mapping != NULL), release the pinned pages and return
-EOPNOTSUPP.  Handle -EOPNOTSUPP in tls_decrypt_sw() by clearing
darg->zc and retrying, which causes tls_decrypt_sg() to allocate a
kernel bounce buffer instead.  Decryption output never reaches the
file-backed page; on tag failure the bounce buffer is discarded.

This follows the existing opportunistic zc retry pattern already used
for TLS 1.3 record type mismatches in tls_decrypt_sw().

Verified on linux-7.0-rc3 QEMU (x86-64), four destination types:
  MAP_SHARED+RW:   file_changed=0  (was 4077/4096 bytes before patch)
  MAP_PRIVATE+RW:  file_changed=0  (COW isolation; unchanged)
  anonymous heap:  no file backing  (unchanged)
  PROT_READ only:  file_changed=0  (EFAULT before decrypt; unchanged)

Signed-off-by: Yeonju Bae <iwasbaeyz@gmail.com>
---
 net/tls/tls_sw.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index a977b0434..c312a83b4 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -36,6 +36,7 @@
  */
 
 #include <linux/bug.h>
+#include <linux/pagemap.h>
 #include <linux/sched/signal.h>
 #include <linux/module.h>
 #include <linux/kernel.h>
@@ -1443,6 +1444,34 @@ static int tls_setup_from_iter(struct iov_iter *from,
 
 		length -= copied;
 		size += copied;
+		/* Reject file-backed destination pages.  Writing unauthenticated
+		 * AEAD output into a page-cache page before tag verification
+		 * leaves the backing file modified even when recvmsg() returns
+		 * -EBADMSG.  Return -EOPNOTSUPP so the caller retries via the
+		 * non-ZC bounce-buffer path.
+		 */
+		{
+			ssize_t remain = copied;
+			size_t  off    = offset;
+			int     np = 0, j;
+
+			while (remain > 0) {
+				remain -= min_t(ssize_t, remain,
+						(ssize_t)(PAGE_SIZE - off));
+				off = 0;
+				np++;
+			}
+			for (j = 0; j < np; j++) {
+				if (folio_mapping(page_folio(pages[j]))) {
+					int k;
+
+					for (k = 0; k < np; k++)
+						put_page(pages[k]);
+					rc = -EOPNOTSUPP;
+					goto out;
+				}
+			}
+		}
 		while (copied) {
 			use = min_t(int, copied, PAGE_SIZE - offset);
 
@@ -1699,6 +1728,14 @@ tls_decrypt_sw(struct sock *sk, struct tls_context *tls_ctx,
 	if (err < 0) {
 		if (err == -EBADMSG)
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR);
+		if (err == -EOPNOTSUPP && darg->zc) {
+			/* tls_setup_from_iter detected file-backed destination
+			 * pages; retry without ZC via the bounce-buffer path.
+			 */
+			darg->zc = false;
+			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTRETRY);
+			return tls_decrypt_sw(sk, tls_ctx, msg, darg);
+		}
 		return err;
 	}
 	/* keep going even for ->async, the code below is TLS 1.3 */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net] tls: avoid zc receive for file-backed pages
  2026-05-21 16:53 ` [PATCH net] tls: avoid zc receive for file-backed pages Yeonju Bae
@ 2026-05-25 17:54   ` Jakub Kicinski
  0 siblings, 0 replies; 2+ messages in thread
From: Jakub Kicinski @ 2026-05-25 17:54 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Yeonju Bae, netdev, gregkh, security, john.fastabend, sd, davem,
	edumazet, pabeni, horms, linux-kernel, linux-fsdevel

On Fri, 22 May 2026 01:53:28 +0900 Yeonju Bae wrote:
> kTLS RX zc decrypt writes unauthenticated AEAD output directly into
> pages pinned from the recvmsg iterator via tls_setup_from_iter().
> For MAP_SHARED, PROT_WRITE file-backed destinations, those pages are
> live page-cache pages rather than anonymous copies: MAP_SHARED does not
> trigger copy-on-write, so FOLL_WRITE returns the actual page-cache page.
> 
> crypto_aead_decrypt() writes CTR-mode decryption output into the
> scatter-gather list before the authentication tag is verified.  If the
> tag check fails (-EBADMSG), the plaintext-like output is already
> resident in the page-cache page.  exit_free_pages() calls put_page()
> without any content cleanup, so the modification persists through the
> backing file.  An independent open(O_RDONLY)/read() of the same file
> returns different content and its SHA-256 changes.  MAP_PRIVATE is safe
> via COW; PROT_READ-only destinations fail at iov_iter_get_pages2()
> before any decryption occurs.
> 
> Avoid zc receive for file-backed destination pages.  In
> tls_setup_from_iter(), after iov_iter_get_pages2() pins pages, check
> each page with folio_mapping(page_folio(page)).  If any pinned page is
> file-backed (mapping != NULL), release the pinned pages and return
> -EOPNOTSUPP.  Handle -EOPNOTSUPP in tls_decrypt_sw() by clearing
> darg->zc and retrying, which causes tls_decrypt_sg() to allocate a
> kernel bounce buffer instead.  Decryption output never reaches the
> file-backed page; on tag failure the bounce buffer is discarded.
> 
> This follows the existing opportunistic zc retry pattern already used
> for TLS 1.3 record type mismatches in tls_decrypt_sw().
> 
> Verified on linux-7.0-rc3 QEMU (x86-64), four destination types:
>   MAP_SHARED+RW:   file_changed=0  (was 4077/4096 bytes before patch)
>   MAP_PRIVATE+RW:  file_changed=0  (COW isolation; unchanged)
>   anonymous heap:  no file backing  (unchanged)
>   PROT_READ only:  file_changed=0  (EFAULT before decrypt; unchanged)

I'm not seeing anything unusual here from high level API use size. 
We feed the iov_iter constructed by recvmsg in socket code into 
iov_iter_get_pages2(). Either:
 - the way we construct the iov_iter is wrong; or
 - iov_iter_get_pages2() should be return an error; or
 - we should use a different iov_* API; or
 - the current behavior you describe is expected / correct.

I don't think that TLS open-coding page checks is the right move.

Al, would you mind glancing over this?
I have no idea what's the expect page cache behavior in this scenario.

> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index a977b0434..c312a83b4 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -36,6 +36,7 @@
>   */
>  
>  #include <linux/bug.h>
> +#include <linux/pagemap.h>
>  #include <linux/sched/signal.h>
>  #include <linux/module.h>
>  #include <linux/kernel.h>
> @@ -1443,6 +1444,34 @@ static int tls_setup_from_iter(struct iov_iter *from,
>  
>  		length -= copied;
>  		size += copied;
> +		/* Reject file-backed destination pages.  Writing unauthenticated
> +		 * AEAD output into a page-cache page before tag verification
> +		 * leaves the backing file modified even when recvmsg() returns
> +		 * -EBADMSG.  Return -EOPNOTSUPP so the caller retries via the
> +		 * non-ZC bounce-buffer path.
> +		 */
> +		{
> +			ssize_t remain = copied;
> +			size_t  off    = offset;
> +			int     np = 0, j;
> +
> +			while (remain > 0) {
> +				remain -= min_t(ssize_t, remain,
> +						(ssize_t)(PAGE_SIZE - off));
> +				off = 0;
> +				np++;
> +			}
> +			for (j = 0; j < np; j++) {
> +				if (folio_mapping(page_folio(pages[j]))) {
> +					int k;
> +
> +					for (k = 0; k < np; k++)
> +						put_page(pages[k]);
> +					rc = -EOPNOTSUPP;
> +					goto out;
> +				}
> +			}
> +		}
>  		while (copied) {
>  			use = min_t(int, copied, PAGE_SIZE - offset);
>  
> @@ -1699,6 +1728,14 @@ tls_decrypt_sw(struct sock *sk, struct tls_context *tls_ctx,
>  	if (err < 0) {
>  		if (err == -EBADMSG)
>  			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR);
> +		if (err == -EOPNOTSUPP && darg->zc) {
> +			/* tls_setup_from_iter detected file-backed destination
> +			 * pages; retry without ZC via the bounce-buffer path.
> +			 */
> +			darg->zc = false;
> +			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTRETRY);
> +			return tls_decrypt_sw(sk, tls_ctx, msg, darg);
> +		}
>  		return err;
>  	}
>  	/* keep going even for ->async, the code below is TLS 1.3 */


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-05-25 17:55 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <2026052150-stylus-germicide-780e@gregkh>
2026-05-21 16:53 ` [PATCH net] tls: avoid zc receive for file-backed pages Yeonju Bae
2026-05-25 17:54   ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox