* [PATCH net] tls: avoid zc receive for file-backed pages [not found] <2026052150-stylus-germicide-780e@gregkh> @ 2026-05-21 16:53 ` Yeonju Bae 2026-05-25 17:54 ` Jakub Kicinski 0 siblings, 1 reply; 3+ messages in thread From: Yeonju Bae @ 2026-05-21 16:53 UTC (permalink / raw) To: netdev Cc: gregkh, security, john.fastabend, kuba, sd, davem, edumazet, pabeni, horms, linux-kernel, Yeonju Bae kTLS RX zc decrypt writes unauthenticated AEAD output directly into pages pinned from the recvmsg iterator via tls_setup_from_iter(). For MAP_SHARED, PROT_WRITE file-backed destinations, those pages are live page-cache pages rather than anonymous copies: MAP_SHARED does not trigger copy-on-write, so FOLL_WRITE returns the actual page-cache page. crypto_aead_decrypt() writes CTR-mode decryption output into the scatter-gather list before the authentication tag is verified. If the tag check fails (-EBADMSG), the plaintext-like output is already resident in the page-cache page. exit_free_pages() calls put_page() without any content cleanup, so the modification persists through the backing file. An independent open(O_RDONLY)/read() of the same file returns different content and its SHA-256 changes. MAP_PRIVATE is safe via COW; PROT_READ-only destinations fail at iov_iter_get_pages2() before any decryption occurs. Avoid zc receive for file-backed destination pages. In tls_setup_from_iter(), after iov_iter_get_pages2() pins pages, check each page with folio_mapping(page_folio(page)). If any pinned page is file-backed (mapping != NULL), release the pinned pages and return -EOPNOTSUPP. Handle -EOPNOTSUPP in tls_decrypt_sw() by clearing darg->zc and retrying, which causes tls_decrypt_sg() to allocate a kernel bounce buffer instead. Decryption output never reaches the file-backed page; on tag failure the bounce buffer is discarded. This follows the existing opportunistic zc retry pattern already used for TLS 1.3 record type mismatches in tls_decrypt_sw(). Verified on linux-7.0-rc3 QEMU (x86-64), four destination types: MAP_SHARED+RW: file_changed=0 (was 4077/4096 bytes before patch) MAP_PRIVATE+RW: file_changed=0 (COW isolation; unchanged) anonymous heap: no file backing (unchanged) PROT_READ only: file_changed=0 (EFAULT before decrypt; unchanged) Signed-off-by: Yeonju Bae <iwasbaeyz@gmail.com> --- net/tls/tls_sw.c | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index a977b0434..c312a83b4 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -36,6 +36,7 @@ */ #include <linux/bug.h> +#include <linux/pagemap.h> #include <linux/sched/signal.h> #include <linux/module.h> #include <linux/kernel.h> @@ -1443,6 +1444,34 @@ static int tls_setup_from_iter(struct iov_iter *from, length -= copied; size += copied; + /* Reject file-backed destination pages. Writing unauthenticated + * AEAD output into a page-cache page before tag verification + * leaves the backing file modified even when recvmsg() returns + * -EBADMSG. Return -EOPNOTSUPP so the caller retries via the + * non-ZC bounce-buffer path. + */ + { + ssize_t remain = copied; + size_t off = offset; + int np = 0, j; + + while (remain > 0) { + remain -= min_t(ssize_t, remain, + (ssize_t)(PAGE_SIZE - off)); + off = 0; + np++; + } + for (j = 0; j < np; j++) { + if (folio_mapping(page_folio(pages[j]))) { + int k; + + for (k = 0; k < np; k++) + put_page(pages[k]); + rc = -EOPNOTSUPP; + goto out; + } + } + } while (copied) { use = min_t(int, copied, PAGE_SIZE - offset); @@ -1699,6 +1728,14 @@ tls_decrypt_sw(struct sock *sk, struct tls_context *tls_ctx, if (err < 0) { if (err == -EBADMSG) TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR); + if (err == -EOPNOTSUPP && darg->zc) { + /* tls_setup_from_iter detected file-backed destination + * pages; retry without ZC via the bounce-buffer path. + */ + darg->zc = false; + TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTRETRY); + return tls_decrypt_sw(sk, tls_ctx, msg, darg); + } return err; } /* keep going even for ->async, the code below is TLS 1.3 */ -- 2.43.0 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net] tls: avoid zc receive for file-backed pages 2026-05-21 16:53 ` [PATCH net] tls: avoid zc receive for file-backed pages Yeonju Bae @ 2026-05-25 17:54 ` Jakub Kicinski 2026-05-26 7:05 ` Christoph Hellwig 0 siblings, 1 reply; 3+ messages in thread From: Jakub Kicinski @ 2026-05-25 17:54 UTC (permalink / raw) To: Alexander Viro Cc: Yeonju Bae, netdev, gregkh, security, john.fastabend, sd, davem, edumazet, pabeni, horms, linux-kernel, linux-fsdevel On Fri, 22 May 2026 01:53:28 +0900 Yeonju Bae wrote: > kTLS RX zc decrypt writes unauthenticated AEAD output directly into > pages pinned from the recvmsg iterator via tls_setup_from_iter(). > For MAP_SHARED, PROT_WRITE file-backed destinations, those pages are > live page-cache pages rather than anonymous copies: MAP_SHARED does not > trigger copy-on-write, so FOLL_WRITE returns the actual page-cache page. > > crypto_aead_decrypt() writes CTR-mode decryption output into the > scatter-gather list before the authentication tag is verified. If the > tag check fails (-EBADMSG), the plaintext-like output is already > resident in the page-cache page. exit_free_pages() calls put_page() > without any content cleanup, so the modification persists through the > backing file. An independent open(O_RDONLY)/read() of the same file > returns different content and its SHA-256 changes. MAP_PRIVATE is safe > via COW; PROT_READ-only destinations fail at iov_iter_get_pages2() > before any decryption occurs. > > Avoid zc receive for file-backed destination pages. In > tls_setup_from_iter(), after iov_iter_get_pages2() pins pages, check > each page with folio_mapping(page_folio(page)). If any pinned page is > file-backed (mapping != NULL), release the pinned pages and return > -EOPNOTSUPP. Handle -EOPNOTSUPP in tls_decrypt_sw() by clearing > darg->zc and retrying, which causes tls_decrypt_sg() to allocate a > kernel bounce buffer instead. Decryption output never reaches the > file-backed page; on tag failure the bounce buffer is discarded. > > This follows the existing opportunistic zc retry pattern already used > for TLS 1.3 record type mismatches in tls_decrypt_sw(). > > Verified on linux-7.0-rc3 QEMU (x86-64), four destination types: > MAP_SHARED+RW: file_changed=0 (was 4077/4096 bytes before patch) > MAP_PRIVATE+RW: file_changed=0 (COW isolation; unchanged) > anonymous heap: no file backing (unchanged) > PROT_READ only: file_changed=0 (EFAULT before decrypt; unchanged) I'm not seeing anything unusual here from high level API use size. We feed the iov_iter constructed by recvmsg in socket code into iov_iter_get_pages2(). Either: - the way we construct the iov_iter is wrong; or - iov_iter_get_pages2() should be return an error; or - we should use a different iov_* API; or - the current behavior you describe is expected / correct. I don't think that TLS open-coding page checks is the right move. Al, would you mind glancing over this? I have no idea what's the expect page cache behavior in this scenario. > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c > index a977b0434..c312a83b4 100644 > --- a/net/tls/tls_sw.c > +++ b/net/tls/tls_sw.c > @@ -36,6 +36,7 @@ > */ > > #include <linux/bug.h> > +#include <linux/pagemap.h> > #include <linux/sched/signal.h> > #include <linux/module.h> > #include <linux/kernel.h> > @@ -1443,6 +1444,34 @@ static int tls_setup_from_iter(struct iov_iter *from, > > length -= copied; > size += copied; > + /* Reject file-backed destination pages. Writing unauthenticated > + * AEAD output into a page-cache page before tag verification > + * leaves the backing file modified even when recvmsg() returns > + * -EBADMSG. Return -EOPNOTSUPP so the caller retries via the > + * non-ZC bounce-buffer path. > + */ > + { > + ssize_t remain = copied; > + size_t off = offset; > + int np = 0, j; > + > + while (remain > 0) { > + remain -= min_t(ssize_t, remain, > + (ssize_t)(PAGE_SIZE - off)); > + off = 0; > + np++; > + } > + for (j = 0; j < np; j++) { > + if (folio_mapping(page_folio(pages[j]))) { > + int k; > + > + for (k = 0; k < np; k++) > + put_page(pages[k]); > + rc = -EOPNOTSUPP; > + goto out; > + } > + } > + } > while (copied) { > use = min_t(int, copied, PAGE_SIZE - offset); > > @@ -1699,6 +1728,14 @@ tls_decrypt_sw(struct sock *sk, struct tls_context *tls_ctx, > if (err < 0) { > if (err == -EBADMSG) > TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTERROR); > + if (err == -EOPNOTSUPP && darg->zc) { > + /* tls_setup_from_iter detected file-backed destination > + * pages; retry without ZC via the bounce-buffer path. > + */ > + darg->zc = false; > + TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSDECRYPTRETRY); > + return tls_decrypt_sw(sk, tls_ctx, msg, darg); > + } > return err; > } > /* keep going even for ->async, the code below is TLS 1.3 */ ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net] tls: avoid zc receive for file-backed pages 2026-05-25 17:54 ` Jakub Kicinski @ 2026-05-26 7:05 ` Christoph Hellwig 0 siblings, 0 replies; 3+ messages in thread From: Christoph Hellwig @ 2026-05-26 7:05 UTC (permalink / raw) To: Jakub Kicinski Cc: Alexander Viro, Yeonju Bae, netdev, gregkh, security, john.fastabend, sd, davem, edumazet, pabeni, horms, linux-kernel, linux-fsdevel On Mon, May 25, 2026 at 10:54:59AM -0700, Jakub Kicinski wrote: > > kTLS RX zc decrypt writes unauthenticated AEAD output directly into > > pages pinned from the recvmsg iterator via tls_setup_from_iter(). > > For MAP_SHARED, PROT_WRITE file-backed destinations, those pages are > > live page-cache pages rather than anonymous copies: MAP_SHARED does not > > trigger copy-on-write, so FOLL_WRITE returns the actual page-cache page. As does MAP_SHARED for any other mapping. > > via COW; PROT_READ-only destinations fail at iov_iter_get_pages2() > > before any decryption occurs. Btw, this really needs to stop using iov_iter_get_pages2 and switch to iov_iter_extract_pages / iov_iter_extract_bvecs. This does not fix your probleb, but other potentially exploitable races. iov_iter_get_pages2 and friends must never be used for writing, and preferably should go away entirely. > > Avoid zc receive for file-backed destination pages. In > > tls_setup_from_iter(), after iov_iter_get_pages2() pins pages, check > > each page with folio_mapping(page_folio(page)). If any pinned page is > > file-backed (mapping != NULL), release the pinned pages and return > > -EOPNOTSUPP. Handle -EOPNOTSUPP in tls_decrypt_sw() by clearing > > darg->zc and retrying, which causes tls_decrypt_sg() to allocate a > > kernel bounce buffer instead. Decryption output never reaches the > > file-backed page; on tag failure the bounce buffer is discarded. I can't see how this is not a problem for non-file backed shared mappings. ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-05-26 7:05 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <2026052150-stylus-germicide-780e@gregkh>
2026-05-21 16:53 ` [PATCH net] tls: avoid zc receive for file-backed pages Yeonju Bae
2026-05-25 17:54 ` Jakub Kicinski
2026-05-26 7:05 ` Christoph Hellwig
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.