* [PATCH net v4] net: skbuff: propagate shared-frag marker through frag-transfer helpers
@ 2026-05-15 5:55 Hyunwoo Kim
2026-05-15 6:07 ` Hyunwoo Kim
2026-05-15 6:26 ` Sultan Alsawaf
0 siblings, 2 replies; 4+ messages in thread
From: Hyunwoo Kim @ 2026-05-15 5:55 UTC
To: davem, edumazet, kuba, pabeni, horms, kerneljasonxing, kuniyu,
mhal, jiayuan.chen, steffen.klassert, ben, herbert, dsahern,
sultan, sd
Cc: netdev, stable, imv4bel

Two frag-transfer helpers (__pskb_copy_fclone() and skb_shift()) fail
to propagate the SKBFL_SHARED_FRAG bit in skb_shinfo()->flags when
moving frags from source to destination. __pskb_copy_fclone() defers
the rest of the shinfo metadata to skb_copy_header() after copying
frag descriptors, but that helper only carries over gso_{size,segs,
type} and never touches skb_shinfo()->flags; skb_shift() moves frag
descriptors directly and leaves flags untouched. As a result, the
destination skb keeps a reference to the same externally-owned or
page-cache-backed pages while reporting skb_has_shared_frag() as
false.

The mismatch is harmful in any in-place writer that uses
skb_has_shared_frag() to decide whether shared pages must be detoured
through skb_cow_data(). ESP input is one such writer (esp4.c,
esp6.c), and a single nft 'dup to <local>' rule -- or any other
nf_dup_ipv4() / xt_TEE caller -- is enough to land a pskb_copy()'d
skb in esp_input() with the marker stripped, letting an unprivileged
user write into the page cache of a root-owned read-only file via
authencesn-ESN stray writes.

Set SKBFL_SHARED_FRAG on the destination whenever frag descriptors
were actually moved from the source. skb_copy() and skb_copy_expand()
share skb_copy_header() too but linearize all paged data into freshly
allocated head storage and emerge with nr_frags == 0, so
skb_has_shared_frag() returns false on its own; they need no change.

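The rule stated above can be sketched outside the kernel. The following is a minimal userspace model, not kernel code: struct toy_skb, TOY_SHARED_FRAG and all toy_* functions are hypothetical stand-ins, and toy_has_shared_frag() simplifies the real check to "has frags and carries the marker". It only illustrates why a partial copy that shares frag references must inherit the marker while a linearizing full copy does not need it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; every toy_* name is hypothetical. */
#define TOY_SHARED_FRAG 0x1u /* models SKBFL_SHARED_FRAG (value arbitrary) */
#define TOY_MAX_FRAGS   4

struct toy_skb {
	unsigned int flags;               /* models skb_shinfo()->flags */
	size_t nr_frags;                  /* models skb_shinfo()->nr_frags */
	const void *frags[TOY_MAX_FRAGS]; /* models references to frag pages */
};

/* Simplified model of skb_has_shared_frag(): paged data plus the marker. */
static bool toy_has_shared_frag(const struct toy_skb *skb)
{
	return skb->nr_frags > 0 && (skb->flags & TOY_SHARED_FRAG) != 0;
}

/* Partial copy: frag references are shared, so the marker follows them. */
static void toy_pskb_copy(struct toy_skb *dst, const struct toy_skb *src)
{
	size_t i;

	dst->flags = 0;
	for (i = 0; i < src->nr_frags; i++)
		dst->frags[i] = src->frags[i];
	dst->nr_frags = i;
	if (i)
		dst->flags |= src->flags & TOY_SHARED_FRAG;
}

/* Full copy: payload is linearized, so no frag reference survives. */
static void toy_full_copy(struct toy_skb *dst, const struct toy_skb *src)
{
	(void)src; /* the payload bytes themselves are not modelled */
	dst->flags = 0;
	dst->nr_frags = 0;
}

/* Returns 0 if both copy flavours report shared frags as described. */
static int toy_demo(void)
{
	static const char page_cache_page[1];
	struct toy_skb src = { TOY_SHARED_FRAG, 1, { page_cache_page } };
	struct toy_skb partial, full;

	toy_pskb_copy(&partial, &src);
	toy_full_copy(&full, &src);

	assert(partial.frags[0] == src.frags[0]); /* same page referenced */
	assert(toy_has_shared_frag(&partial));    /* marker propagated */
	assert(!toy_has_shared_frag(&full));      /* nothing shared */
	return 0;
}
```

Calling toy_demo() from a main() exercises both paths. Dropping the masked OR in toy_pskb_copy() reproduces the mismatch in the model: the copy still references the same page but no longer reports it as shared.
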

The same omission exists in skb_gro_receive() and skb_gro_receive_list().
The former moves the incoming skb's frag descriptors into the
accumulator's last sub-skb via two paths (a direct frag-move loop and
the head_frag + memcpy path); the latter chains the incoming skb whole
onto p's frag_list. Downstream skb_segment() reads only
skb_shinfo(p)->flags, and skb_segment_list() reuses each sub-skb's
shinfo as the nskb -- both p and lp must carry the marker.

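As a rough illustration of why both p and lp need the bit, here is a small userspace sketch. It is an assumption-laden model, not the GRO code: struct toy_skb, TOY_SHARED_FRAG and toy_gro_merge() are hypothetical names, and the merge only tracks frag counts and flags.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SHARED_FRAG 0x1u /* models SKBFL_SHARED_FRAG (value arbitrary) */

struct toy_skb {
	unsigned int flags; /* models skb_shinfo()->flags */
	size_t nr_frags;    /* models skb_shinfo()->nr_frags */
};

/*
 * Models the tail of a GRO merge: skb's frags end up in lp, the last
 * sub-skb of the accumulator p (lp == p while p has no frag_list yet).
 * The head p and the sub-skb lp are read by different consumers later,
 * so both inherit the marker.
 */
static void toy_gro_merge(struct toy_skb *p, struct toy_skb *lp,
			  struct toy_skb *skb)
{
	lp->nr_frags += skb->nr_frags;
	skb->nr_frags = 0;

	p->flags |= skb->flags & TOY_SHARED_FRAG;
	if (lp != p)
		lp->flags |= skb->flags & TOY_SHARED_FRAG;
}

/* Returns 0 if the marker spreads only when a shared skb is merged. */
static int toy_gro_demo(void)
{
	struct toy_skb p = { 0, 1 }, lp = { 0, 1 };
	struct toy_skb clean = { 0, 1 }, shared = { TOY_SHARED_FRAG, 2 };

	toy_gro_merge(&p, &lp, &clean);
	assert(p.flags == 0 && lp.flags == 0); /* nothing to inherit */

	toy_gro_merge(&p, &lp, &shared);
	assert(lp.nr_frags == 4);              /* 1 + 1 + 2 frags */
	assert(p.flags & TOY_SHARED_FRAG);     /* head carries the marker */
	assert(lp.flags & TOY_SHARED_FRAG);    /* so does the sub-skb */
	return 0;
}
```

The masked OR keeps the operation monotonic: merging a clean skb after a shared one never clears the bit on p or lp.
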

The same omission also exists in tcp_clone_payload(), which builds an
MTU probe skb by moving frag descriptors from skbs on sk_write_queue
into a freshly allocated nskb. The helper falls into the same family
and warrants the same fix for consistency; no TCP TX-side in-place
writer is currently known to reach a user page through this gap, but
a future consumer depending on the marker would regress silently.

Fixes: cef401de7be8 ("net: fix possible wrong checksum generation")
Fixes: f4c50a4034e6 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
Changes in v4:
- Include the tcp_clone_payload() propagation suggested by Sabrina.
- Drop the skb_try_coalesce() change; addressed by commit f84eca581739.
- v3: https://lore.kernel.org/all/agW4vC0r8QOUKtRT@v4bel/
Changes in v3:
- Include the skb_gro_receive() audit patch suggested by Sultan
- v2: https://lore.kernel.org/all/agToIEDI4TaTNLRb@v4bel/
Changes in v2:
- Also propagate SHARED_FRAG in skb_try_coalesce() and skb_shift()
- v1: https://lore.kernel.org/all/agRfuVOeMI5pbHhY@v4bel/
---
 net/core/gro.c        | 4 ++++
 net/core/skbuff.c     | 3 +++
 net/ipv4/tcp_output.c | 1 +
 3 files changed, 8 insertions(+)

diff --git a/net/core/gro.c b/net/core/gro.c
index 31d21de5b15a..9f8960789b2c 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -213,10 +213,12 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	p->data_len += len;
 	p->truesize += delta_truesize;
 	p->len += len;
+	skb_shinfo(p)->flags |= skbinfo->flags & SKBFL_SHARED_FRAG;
 	if (lp != p) {
 		lp->data_len += len;
 		lp->truesize += delta_truesize;
 		lp->len += len;
+		skb_shinfo(lp)->flags |= skbinfo->flags & SKBFL_SHARED_FRAG;
 	}
 	NAPI_GRO_CB(skb)->same_flow = 1;
 	return 0;
@@ -244,6 +246,8 @@ int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
 	p->truesize += skb->truesize;
 	p->len += skb->len;
 
+	skb_shinfo(p)->flags |= skb_shinfo(skb)->flags & SKBFL_SHARED_FRAG;
+
 	NAPI_GRO_CB(skb)->same_flow = 1;
 
 	return 0;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9c4e8d331d6d..7cd388504297 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2248,6 +2248,7 @@ struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom,
 			skb_frag_ref(skb, i);
 		}
 		skb_shinfo(n)->nr_frags = i;
+		skb_shinfo(n)->flags |= skb_shinfo(skb)->flags & SKBFL_SHARED_FRAG;
 	}
 
 	if (skb_has_frag_list(skb)) {
@@ -4349,6 +4350,8 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 	tgt->ip_summed = CHECKSUM_PARTIAL;
 	skb->ip_summed = CHECKSUM_PARTIAL;
 
+	skb_shinfo(tgt)->flags |= skb_shinfo(skb)->flags & SKBFL_SHARED_FRAG;
+
 	skb_len_add(skb, -shiftlen);
 	skb_len_add(tgt, shiftlen);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f9d8755705f7..6e4bb411dc04 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2626,6 +2626,7 @@ static int tcp_clone_payload(struct sock *sk, struct sk_buff *to,
 			todo = min_t(int, skb_frag_size(fragfrom),
 				     probe_size - len);
 			len += todo;
+			skb_shinfo(to)->flags |= skb_shinfo(skb)->flags & SKBFL_SHARED_FRAG;
 			if (lastfrag &&
 			    skb_frag_page(fragfrom) == skb_frag_page(lastfrag) &&
 			    skb_frag_off(fragfrom) == skb_frag_off(lastfrag) +
--
2.43.0
* Re: [PATCH net v4] net: skbuff: propagate shared-frag marker through frag-transfer helpers
From: Hyunwoo Kim @ 2026-05-15 6:07 UTC
To: davem, edumazet, kuba, pabeni, horms, kerneljasonxing, kuniyu,
mhal, jiayuan.chen, steffen.klassert, ben, herbert, dsahern,
sultan, sd, malin89, tanjingguo
Cc: netdev, stable, imv4bel
On Fri, May 15, 2026 at 02:55:35PM +0900, Hyunwoo Kim wrote:
> Fixes: cef401de7be8 ("net: fix possible wrong checksum generation")
> Fixes: f4c50a4034e6 ("xfrm: esp: avoid in-place decrypt on shared skb frags")
> Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
> Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
> Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Since they are asking for credit, I will add them:
Suggested-by: Lin Ma <malin89@huawei.com>
Suggested-by: Jingguo Tan <tanjingguo@huawei.com>
* Re: [PATCH net v4] net: skbuff: propagate shared-frag marker through frag-transfer helpers
From: Sultan Alsawaf @ 2026-05-15 6:26 UTC
To: Hyunwoo Kim
Cc: davem, edumazet, kuba, pabeni, horms, kerneljasonxing, kuniyu,
mhal, jiayuan.chen, steffen.klassert, ben, herbert, dsahern, sd,
netdev, stable
[-- Attachment #1: Type: text/plain, Size: 953 bytes --]
On Fri, May 15, 2026 at 02:55:35PM +0900, Hyunwoo Kim wrote:
> Changes in v4:
> - Include the tcp_clone_payload() propagation suggested by Sabrina.
> - Drop the skb_try_coalesce() change; addressed by commit f84eca581739.
> - v3: https://lore.kernel.org/all/agW4vC0r8QOUKtRT@v4bel/
>
> Changes in v3:
> - Include the skb_gro_receive() audit patch suggested by Sultan
> - v2: https://lore.kernel.org/all/agToIEDI4TaTNLRb@v4bel/
>
> Changes in v2:
> - Also propagate SHARED_FRAG in skb_try_coalesce() and skb_shift()
> - v1: https://lore.kernel.org/all/agRfuVOeMI5pbHhY@v4bel/
Hi Hyunwoo,
As you asked me to use AI to explore relevant paths [1], I've attached my
findings from a fairly thorough day of hunting for these with Claude.
None of the findings appear to be currently exploitable.
Please let me know if you have any questions, and I hope you find this helpful.
[1] https://lore.kernel.org/all/agWUdie1xBvBu22I@v4bel/
Thanks,
Sultan
[-- Attachment #2: shared-frag-audit.txt --]
[-- Type: text/plain, Size: 5057 bytes --]

Line numbers below are against netdev commit 5db89c99566fc ("net: ifb: report
ethtool stats over num_tx_queues").

We audited the netdev tree for remaining sites where frag descriptors are
transferred between skbs without propagating SKBFL_SHARED_FRAG. Hyunwoo Kim's
v4 fix covers __pskb_copy_fclone, skb_shift, skb_gro_receive,
skb_gro_receive_list, and tcp_clone_payload; the standalone f84eca5817390
covers skb_try_coalesce. Several other sites in newer code have the same class
of bug.

None of these are currently reachable for page-cache corruption, since each one
is blocked by independent guards (cloned skbs, TX-only paths, or data copying).
They should still be fixed for defense-in-depth: skb_copy_header() doesn't
propagate shinfo->flags, so every frag-transfer helper that allocates new
shinfo needs its own propagation line. That makes the bug class likely to
recur whenever someone writes a new helper without realizing it.

--- Findings ---

1. unix_stream_sendmsg() -- net/unix/af_unix.c:2461

Calls skb_splice_from_iter() with MSG_SPLICE_PAGES but never sets
SKBFL_SHARED_FRAG. It's the only skb_splice_from_iter() caller that
doesn't do so; compare with tcp_sendmsg_locked() at tcp.c:1371,
ip_append_data() at ip_output.c:1237, and ip6_append_data() at
ip6_output.c:1801.

Not reachable since AF_UNIX skbs don't enter the network stack. When
forwarded via splice (unix -> pipe -> tcp), the destination protocol's
sendmsg sets the flag independently.

Fix: add the same flag-set after skb_splice_from_iter(), matching the TCP
pattern.

2. iptfs_consume_frags() -- net/xfrm/xfrm_iptfs.c:2152

memcpy() of the frag array plus iptfs_skb_head_to_frag() conversion. Zero
references to SKBFL_SHARED_FRAG in the entire 2700-line file.

Not reachable due to three independent guards: the fragmentation path copies
data into linear via skb_copy_seq_read() + skb_put() (no page-cache frag
references in the result), the share_ok guard blocks aggregation for TCP
skbs (since tcp_stream_alloc_skb() uses alloc_skb_fclone() which doesn't
set head_frag), and simple aggregation fails because the base skb is a TCP
clone.

3. iptfs_skb_add_frags() -- net/xfrm/xfrm_iptfs.c:458

*tofrag = *frag + __skb_frag_ref() without flag propagation. The frag walk
struct doesn't carry source flags.

RX path frags come from NIC RX buffers (not page cache). TX path has the
same guards as iptfs_consume_frags().

4. tcp_clone_payload() -- net/ipv4/tcp_output.c:2607
**Now fixed in v4** (suggested by Sabrina Dubroca).

skb_frag_page_copy() / skb_frag_off_copy() / skb_frag_size_set() +
skb_frag_ref() from write-queue skbs to a new MTU probe skb. No flag
propagation.

TX-only (called by tcp_mtu_probe()). The probe skb goes to
tcp_transmit_skb() which clones it before sending. Can't reach ESP input.

5. skb_zerocopy() -- net/core/skbuff.c:3843

Frag descriptor assignment + skb_frag_ref(). Calls skb_zerocopy_clone()
which handles the zerocopy uarg but not SKBFL_SHARED_FRAG.

All callers (nfnetlink_queue, openvswitch) send the copy to userspace via
netlink. The original skb continues through the stack with its flags
intact.

6. chcr_ktls_copy_record_in_skb() -- drivers/.../chcr_ktls.c:1654

Frag descriptor assignment from TLS record + __skb_frag_ref(). No flag
propagation.

TX-only, hardware-specific (Chelsio T6 kTLS offload).

7. esp_output_head() -- net/ipv4/esp4.c:426

The output-side skip_cow checks !skb_cloned() but never checks
!skb_has_shared_frag(). Compare with esp_input() at line 877 which does
check it (CVE-2026-43284). The first skip_cow path (tailen <=
skb_tailroom) keeps inplace=true, so AEAD encrypt would write ciphertext
over source SG entries including frag pages.

Not reachable in practice: kretprobe tracing on a booted 7.1.0-rc3 kernel
confirmed esp_output_head() always returns nfrags >= 2 (the inplace=false
second branch), never nfrags=1. For paged skbs from splice, the tailroom
is insufficient for the ESP trailer. The inplace=false path allocates
separate output pages, so frag data is only read as source, never written.

esp_output_head() should still add the !skb_has_shared_frag() check to
match esp_input(), since a future change to skb allocation sizing could
make the first skip_cow path reachable.

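The check proposed in finding 7 can be read as a simple predicate. The sketch below is a hypothetical userspace model, not the esp4.c code: toy_may_skip_cow_in_place() and its parameters are invented for illustration and only encode the three conditions the finding names (not cloned, no shared frags, trailer fits in the tailroom).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the first skip_cow decision on the output side.
 * In-place encryption is only safe when the skb is not cloned, owns all
 * of its pages, and has room for the ESP trailer.
 */
static bool toy_may_skip_cow_in_place(bool cloned, bool has_shared_frag,
				      int tailen, int tailroom)
{
	if (cloned)
		return false;
	if (has_shared_frag)
		return false; /* the check the audit proposes adding */
	return tailen <= tailroom;
}

/* Returns 0 if the predicate behaves as the finding describes. */
static int toy_esp_demo(void)
{
	/* Private, uncloned skb with enough tailroom: in-place is fine. */
	assert(toy_may_skip_cow_in_place(false, false, 16, 64));
	/* Shared frags must force the non-in-place path. */
	assert(!toy_may_skip_cow_in_place(false, true, 16, 64));
	/* Cloned skbs and short tailroom are rejected as before. */
	assert(!toy_may_skip_cow_in_place(true, false, 16, 64));
	assert(!toy_may_skip_cow_in_place(false, false, 64, 16));
	return 0;
}
```

In this model the extra condition changes nothing for the cases the finding reports as observed in practice (insufficient tailroom), which is consistent with describing it as a defensive check.
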

--- Root cause ---

skb_copy_header() copies gso_size / gso_segs / gso_type from old shinfo to
new, but it never copies shinfo->flags. As a result, every frag-transfer
helper that allocates new shinfo needs its own explicit flag propagation. This
is easy to miss when writing new helpers, which is how we ended up with seven
independent instances of the same bug.

A potential long-term fix would be to propagate SKBFL_SHARED_FRAG (and
SKBFL_PURE_ZEROCOPY) inside skb_copy_header() itself, matching how skb_split()
already handles both flags. This would eliminate the bug class at the source
rather than playing whack-a-mole with each new helper.
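
To make the long-term idea concrete, here is a minimal userspace sketch under stated assumptions. It is not a proposed kernel patch: struct toy_skb, the TOY_* flag values, toy_copy_header() and toy_new_frag_helper() are all hypothetical, and the model ignores everything about an skb except flags, gso_size and the frag count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flag bits; the values are arbitrary and not the kernel's. */
#define TOY_SHARED_FRAG   0x1u /* models SKBFL_SHARED_FRAG */
#define TOY_PURE_ZEROCOPY 0x2u /* models SKBFL_PURE_ZEROCOPY */
#define TOY_UNRELATED     0x4u /* some flag that must not be copied */
#define TOY_PROPAGATED    (TOY_SHARED_FRAG | TOY_PURE_ZEROCOPY)

struct toy_skb {
	unsigned int flags;    /* models skb_shinfo()->flags */
	unsigned int gso_size; /* models skb_shinfo()->gso_size */
	size_t nr_frags;       /* models skb_shinfo()->nr_frags */
};

/* Central helper: metadata and the ownership markers travel together. */
static void toy_copy_header(struct toy_skb *new_skb, const struct toy_skb *old)
{
	new_skb->gso_size = old->gso_size;
	new_skb->flags |= old->flags & TOY_PROPAGATED;
}

/* A "new" frag-transfer helper that has no propagation line of its own. */
static void toy_new_frag_helper(struct toy_skb *dst, const struct toy_skb *src)
{
	dst->flags = 0;
	dst->nr_frags = src->nr_frags; /* frag references are shared */
	toy_copy_header(dst, src);
}

/* Simplified model of skb_has_shared_frag(). */
static bool toy_has_shared_frag(const struct toy_skb *skb)
{
	return skb->nr_frags > 0 && (skb->flags & TOY_SHARED_FRAG) != 0;
}

/* Returns 0 if the helper inherits the marker without asking for it. */
static int toy_root_cause_demo(void)
{
	struct toy_skb src = { TOY_SHARED_FRAG | TOY_UNRELATED, 1448, 2 };
	struct toy_skb dst;

	toy_new_frag_helper(&dst, &src);
	assert(dst.gso_size == 1448);
	assert(toy_has_shared_frag(&dst));    /* marker came via the header copy */
	assert(!(dst.flags & TOY_UNRELATED)); /* only the masked bits propagate */
	return 0;
}
```

Because the model's toy_has_shared_frag() also requires nr_frags > 0, a linearizing copy that goes through the same central helper would still report no shared frags even with the bit set, which mirrors the reasoning given for skb_copy() in the patch description.
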
* Re: [PATCH net v4] net: skbuff: propagate shared-frag marker through frag-transfer helpers
From: Hyunwoo Kim @ 2026-05-15 6:36 UTC
To: Sultan Alsawaf
Cc: davem, edumazet, kuba, pabeni, horms, kerneljasonxing, kuniyu,
mhal, jiayuan.chen, steffen.klassert, ben, herbert, dsahern, sd,
netdev, stable, imv4bel
On Thu, May 14, 2026 at 11:26:28PM -0700, Sultan Alsawaf wrote:
> On Fri, May 15, 2026 at 02:55:35PM +0900, Hyunwoo Kim wrote:
> > Changes in v4:
> > - Include the tcp_clone_payload() propagation suggested by Sabrina.
> > - Drop the skb_try_coalesce() change; addressed by commit f84eca581739.
> > - v3: https://lore.kernel.org/all/agW4vC0r8QOUKtRT@v4bel/
> >
> > Changes in v3:
> > - Include the skb_gro_receive() audit patch suggested by Sultan
> > - v2: https://lore.kernel.org/all/agToIEDI4TaTNLRb@v4bel/
> >
> > Changes in v2:
> > - Also propagate SHARED_FRAG in skb_try_coalesce() and skb_shift()
> > - v1: https://lore.kernel.org/all/agRfuVOeMI5pbHhY@v4bel/
>
> Hi Hyunwoo,
>
> As you asked me to use AI to explore relevant paths [1], I've attached my
> findings from a fairly thorough day of hunting for these with Claude.
>
> None of the findings appear to be currently exploitable.
>
> Please let me know if you have any questions, and I hope you find this helpful.
>
> [1] https://lore.kernel.org/all/agWUdie1xBvBu22I@v4bel/
>
> Thanks,
> Sultan
Thank you so much. This is a really useful report; I'll take it and dig
further from there.
Thanks again!
Best regards,
Hyunwoo Kim