Linux filesystem development
 help / color / mirror / Atom feed
* Re: [PATCH net] tls: avoid zc receive for file-backed pages
       [not found] ` <20260521165328.16112-1-iwasbaeyz@gmail.com>
@ 2026-05-25 17:54   ` Jakub Kicinski
  2026-05-26  7:05     ` Christoph Hellwig
  0 siblings, 1 reply; 2+ messages in thread
From: Jakub Kicinski @ 2026-05-25 17:54 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Yeonju Bae, netdev, gregkh, security, john.fastabend, sd, davem,
	edumazet, pabeni, horms, linux-kernel, linux-fsdevel

On Fri, 22 May 2026 01:53:28 +0900 Yeonju Bae wrote:
> kTLS RX zc decrypt writes unauthenticated AEAD output directly into
> pages pinned from the recvmsg iterator via tls_setup_from_iter().
> For MAP_SHARED, PROT_WRITE file-backed destinations, those pages are
> live page-cache pages rather than anonymous copies: MAP_SHARED does not
> trigger copy-on-write, so FOLL_WRITE returns the actual page-cache page.
> 
> crypto_aead_decrypt() writes CTR-mode decryption output into the
> scatter-gather list before the authentication tag is verified.  If the
> tag check fails (-EBADMSG), the plaintext-like output is already
> resident in the page-cache page.  exit_free_pages() calls put_page()
> without any content cleanup, so the modification persists through the
> backing file.  An independent open(O_RDONLY)/read() of the same file
> returns different content and its SHA-256 changes.  MAP_PRIVATE is safe
> via COW; PROT_READ-only destinations fail at iov_iter_get_pages2()
> before any decryption occurs.
> 
> Avoid zc receive for file-backed destination pages.  In
> tls_setup_from_iter(), after iov_iter_get_pages2() pins pages, check
> each page with folio_mapping(page_folio(page)).  If any pinned page is
> file-backed (mapping != NULL), release the pinned pages and return
> -EOPNOTSUPP.  Handle -EOPNOTSUPP in tls_decrypt_sw() by clearing
> darg->zc and retrying, which causes tls_decrypt_sg() to allocate a
> kernel bounce buffer instead.  Decryption output never reaches the
> file-backed page; on tag failure the bounce buffer is discarded.
> 
> This follows the existing opportunistic zc retry pattern already used
> for TLS 1.3 record type mismatches in tls_decrypt_sw().
> 
> Verified on linux-7.0-rc3 QEMU (x86-64), four destination types:
>   MAP_SHARED+RW:   file_changed=0  (was 4077/4096 bytes before patch)
>   MAP_PRIVATE+RW:  file_changed=0  (COW isolation; unchanged)
>   anonymous heap:  no file backing  (unchanged)
>   PROT_READ only:  file_changed=0  (EFAULT before decrypt; unchanged)

I'm not seeing anything unusual here from high level API use size. 
We feed the iov_iter constructed by recvmsg in socket code into 
iov_iter_get_pages2(). Either:
 - the way we construct the iov_iter is wrong; or
 - iov_iter_get_pages2() should be return an error; or
 - we should use a different iov_* API; or
 - the current behavior you describe is expected / correct.

I don't think that TLS open-coding page checks is the right move.

Al, would you mind glancing over this?
I have no idea what's the expect page cache behavior in this scenario.

> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index a977b0434..c312a83b4 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -36,6 +36,7 @@
>   */
>  
>  #include <linux/bug.h>
> +#include <linux/pagemap.h>
>  #include <linux/sched/signal.h>
>  #include <linux/module.h>
>  #include <linux/kernel.h>
> @@ -1443,6 +1444,34 @@ static int tls_setup_from_iter(struct iov_iter *from,
>  
>  		length -= copied;
>  		size += copied;
> +		/* Reject file-backed destination pages.  Writing unauthenticated
> +		 * AEAD output into a page-cache page before tag verification
> +		 * leaves the backing file modified even when recvmsg() returns
> +		 * -EBADMSG.  Return -EOPNOTSUPP so the caller retries via the
> +		 * non-ZC bounce-buffer path.
> +		 */
> +		{
> +			ssize_t remain = copied;
> +			size_t  off    = offset;
> +			int     np = 0, j;
> +
> +			while (remain > 0) {
> +				remain -= min_t(ssize_t, remain,
> +						(ssize_t)(PAGE_SIZE - off));
> +				off = 0;
> +				np++;
> +			}
> +			for (j = 0; j < np; j++) {
> +				if (folio_mapping(page_folio(pages[j]))) {
> +					int k;
> +
> +					for (k = 0; k < np; k++)
> +						put_page(pages[k]);
> +					rc = -EOPNOTSUPP;
> +					goto out;
> +				}
> +			}
> +		}
>  		while (copied) {
>  			use = min_t(int, copied, PAGE_SIZE - offset);
>  
> @@ -1699,6 +1728,14 @@ tls_decrypt_sw(struct sock *sk, struct tls_context *tls_ctx,
>  	if (err < 0) {
>  		if (err == -EBADMSG)
>  			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR);
> +		if (err == -EOPNOTSUPP && darg->zc) {
> +			/* tls_setup_from_iter detected file-backed destination
> +			 * pages; retry without ZC via the bounce-buffer path.
> +			 */
> +			darg->zc = false;
> +			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTRETRY);
> +			return tls_decrypt_sw(sk, tls_ctx, msg, darg);
> +		}
>  		return err;
>  	}
>  	/* keep going even for ->async, the code below is TLS 1.3 */


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH net] tls: avoid zc receive for file-backed pages
  2026-05-25 17:54   ` [PATCH net] tls: avoid zc receive for file-backed pages Jakub Kicinski
@ 2026-05-26  7:05     ` Christoph Hellwig
  0 siblings, 0 replies; 2+ messages in thread
From: Christoph Hellwig @ 2026-05-26  7:05 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexander Viro, Yeonju Bae, netdev, gregkh, security,
	john.fastabend, sd, davem, edumazet, pabeni, horms, linux-kernel,
	linux-fsdevel

On Mon, May 25, 2026 at 10:54:59AM -0700, Jakub Kicinski wrote:
> > kTLS RX zc decrypt writes unauthenticated AEAD output directly into
> > pages pinned from the recvmsg iterator via tls_setup_from_iter().
> > For MAP_SHARED, PROT_WRITE file-backed destinations, those pages are
> > live page-cache pages rather than anonymous copies: MAP_SHARED does not
> > trigger copy-on-write, so FOLL_WRITE returns the actual page-cache page.

As does MAP_SHARED for any other mapping.

> > via COW; PROT_READ-only destinations fail at iov_iter_get_pages2()
> > before any decryption occurs.

Btw, this really needs to stop using iov_iter_get_pages2 and switch to
iov_iter_extract_pages / iov_iter_extract_bvecs.  This does not fix
your probleb, but other potentially exploitable races.

iov_iter_get_pages2 and friends must never be used for writing,
and preferably should go away entirely.

> > Avoid zc receive for file-backed destination pages.  In
> > tls_setup_from_iter(), after iov_iter_get_pages2() pins pages, check
> > each page with folio_mapping(page_folio(page)).  If any pinned page is
> > file-backed (mapping != NULL), release the pinned pages and return
> > -EOPNOTSUPP.  Handle -EOPNOTSUPP in tls_decrypt_sw() by clearing
> > darg->zc and retrying, which causes tls_decrypt_sg() to allocate a
> > kernel bounce buffer instead.  Decryption output never reaches the
> > file-backed page; on tag failure the bounce buffer is discarded.

I can't see how this is not a problem for non-file backed shared
mappings.


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-05-26  7:05 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <2026052150-stylus-germicide-780e@gregkh>
     [not found] ` <20260521165328.16112-1-iwasbaeyz@gmail.com>
2026-05-25 17:54   ` [PATCH net] tls: avoid zc receive for file-backed pages Jakub Kicinski
2026-05-26  7:05     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox