Netdev List
 help / color / mirror / Atom feed
* [PATCH net] net/sched: fix pedit partial COW leading to page cache
@ 2026-05-18  1:30 Rajat Gupta
  2026-05-18 13:10 ` Han Guidong
  2026-05-18 13:31 ` Jamal Hadi Salim
  0 siblings, 2 replies; 18+ messages in thread
From: Rajat Gupta @ 2026-05-18  1:30 UTC (permalink / raw)
  To: netdev; +Cc: Jamal Hadi Salim, jiri, davem, Eric Dumazet, kuba, pabeni, horms

From f12ea6484dbb75d1c13495e921d0595532317f2a Mon Sep 17 00:00:00 2001
From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Date: Sun, 17 May 2026 10:11:44 -0700
Subject: [PATCH net] net/sched: fix pedit partial COW leading to page cache
 corruption

tcf_pedit_act() computes the COW range for skb_ensure_writable()
once before the key loop using tcfp_off_max_hint, but the hint does
not account for the runtime header offset added by typed keys. This
can leave part of the write region un-COW'd.

Fix by moving skb_ensure_writable() inside the per-key loop where
the actual write offset is known, and add overflow checking on the
offset arithmetic. For negative offsets (e.g. Ethernet header edits
at ingress), use skb_cow() to COW the headroom instead. Guard
offset_valid() against INT_MIN, where negation is undefined.

Additionally, linearize skbs with shared frags upfront to prevent
silent data corruption when pedit operates on zero-copy pages
(e.g. from sendfile).

Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Reported-by: Han Guidong <2045gemini@gmail.com>
Reported-by: Zhang Cen <rollkingzzc@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
---
 net/sched/act_pedit.c | 52 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index bc20f08a2..e077a2e82 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -16,6 +16,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/slab.h>
+#include <linux/overflow.h>
 #include <net/ipv6.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
@@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
 	if (offset > 0 && offset > skb->len)
 		return false;
 
-	if  (offset < 0 && -offset > skb_headroom(skb))
-		return false;
+	if (offset < 0) {
+		if (offset == INT_MIN || -offset > skb_headroom(skb))
+			return false;
+	}
 
 	return true;
 }
@@ -393,17 +396,19 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 	struct tcf_pedit_key_ex *tkey_ex;
 	struct tcf_pedit_parms *parms;
 	struct tc_pedit_key *tkey;
-	u32 max_offset;
 	int i;
 
 	parms = rcu_dereference_bh(p->parms);
 
-	max_offset = (skb_transport_header_was_set(skb) ?
-		      skb_transport_offset(skb) :
-		      skb_network_offset(skb)) +
-		     parms->tcfp_off_max_hint;
-	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
-		goto done;
+	/* If the skb has shared frags the user is likely using zero-copy
+	 * (e.g. sendfile).  Those page frags may point to page-cache pages;
+	 * writing into them would silently corrupt the page cache.
+	 * Linearize so pedit operates on a private copy.
+	 * If you want zero-copy, don't use pedit. 
+	  TL;DR if you want to use ZC, don't use pedit*/
+	if (skb_has_shared_frag(skb)) {
+		if (__skb_linearize(skb))
+			goto bad;
+	}
 
 	tcf_lastuse_update(&p->tcf_tm);
 	tcf_action_update_bstats(&p->common, skb);
@@ -414,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 	for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
 		int offset = tkey->off;
 		int hoffset = 0;
+		int write_offset, write_len;
 		u32 *ptr, hdata;
 		u32 val;
 		int rc;
@@ -451,12 +457,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 			}
 		}
 
-		if (!offset_valid(skb, hoffset + offset)) {
-			pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
+		if (unlikely(check_add_overflow(hoffset, offset,
+						&write_offset))) {
+			pr_info_ratelimited("tc action pedit offset overflow\n");
+			goto bad;
+		}
+
+		if (!offset_valid(skb, write_offset)) {
+			pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
+					    write_offset);
 			goto bad;
 		}
 
-		ptr = skb_header_pointer(skb, hoffset + offset,
+		if (write_offset < 0) {
+			if (skb_cow(skb, -write_offset))
+				goto bad;
+		} else {
+			if (unlikely(check_add_overflow(write_offset,
+							(int)sizeof(hdata),
+							&write_len)))
+				goto bad;
+			if (skb_ensure_writable(skb, min_t(int, skb->len,
+							   write_len)))
+				goto bad;
+		}
+
+		ptr = skb_header_pointer(skb, write_offset,
 					 sizeof(hdata), &hdata);
 		if (!ptr)
 			goto bad;
@@ -475,7 +501,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 
 		*ptr = ((*ptr & tkey->mask) ^ val);
 		if (ptr == &hdata)
-			skb_store_bits(skb, hoffset + offset, ptr, 4);
+			skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
 	}
 
 	goto done;
-- 
2.51.2.windows.1

Thank you,
Rajat

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache
  2026-05-18  1:30 [PATCH net] net/sched: fix pedit partial COW leading to page cache Rajat Gupta
@ 2026-05-18 13:10 ` Han Guidong
  2026-05-18 13:31 ` Jamal Hadi Salim
  1 sibling, 0 replies; 18+ messages in thread
From: Han Guidong @ 2026-05-18 13:10 UTC (permalink / raw)
  To: rajat.gupta
  Cc: davem, edumazet, horms, jhs, jiri, kuba, netdev, pabeni,
	Han Guidong

(Resending to fix broken email threading. Sorry for the noise!)

On Sun, May 17, 2026 at 06:30:31PM -0700, Rajat Gupta wrote:
> >From f12ea6484dbb75d1c13495e921d0595532317f2a Mon Sep 17 00:00:00 2001
> From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Date: Sun, 17 May 2026 10:11:44 -0700
> Subject: [PATCH net] net/sched: fix pedit partial COW leading to page cache
>  corruption
> 
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
> 
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
> 
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
> 
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>

Solid fix. Linearizing shared frags and dynamic COW checks completely
resolve the offset drift and page cache corruption. Verified locally.

Tested-by: Han Guidong <2045gemini@gmail.com>
Reviewed-by: Han Guidong <2045gemini@gmail.com>

> ---
>  net/sched/act_pedit.c | 52 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 39 insertions(+), 13 deletions(-)
> 
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2..e077a2e82 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
>  #include <linux/ip.h>
>  #include <linux/ipv6.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <net/ipv6.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>  	if (offset > 0 && offset > skb->len)
>  		return false;
>  
> -	if  (offset < 0 && -offset > skb_headroom(skb))
> -		return false;
> +	if (offset < 0) {
> +		if (offset == INT_MIN || -offset > skb_headroom(skb))
> +			return false;
> +	}
>  
>  	return true;
>  }
> @@ -393,17 +396,19 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>  	struct tcf_pedit_key_ex *tkey_ex;
>  	struct tcf_pedit_parms *parms;
>  	struct tc_pedit_key *tkey;
> -	u32 max_offset;
>  	int i;
>  
>  	parms = rcu_dereference_bh(p->parms);
>  
> -	max_offset = (skb_transport_header_was_set(skb) ?
> -		      skb_transport_offset(skb) :
> -		      skb_network_offset(skb)) +
> -		     parms->tcfp_off_max_hint;
> -	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> -		goto done;
> +	/* If the skb has shared frags the user is likely using zero-copy
> +	 * (e.g. sendfile).  Those page frags may point to page-cache pages;
> +	 * writing into them would silently corrupt the page cache.
> +	 * Linearize so pedit operates on a private copy.
> +	 * If you want zero-copy, don't use pedit. 
> +	  TL;DR if you want to use ZC, don't use pedit*/
> +	if (skb_has_shared_frag(skb)) {
> +		if (__skb_linearize(skb))
> +			goto bad;
> +	}
>  
>  	tcf_lastuse_update(&p->tcf_tm);
>  	tcf_action_update_bstats(&p->common, skb);
> @@ -414,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>  	for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
>  		int offset = tkey->off;
>  		int hoffset = 0;
> +		int write_offset, write_len;
>  		u32 *ptr, hdata;
>  		u32 val;
>  		int rc;
> @@ -451,12 +457,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>  			}
>  		}
>  
> -		if (!offset_valid(skb, hoffset + offset)) {
> -			pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> +		if (unlikely(check_add_overflow(hoffset, offset,
> +						&write_offset))) {
> +			pr_info_ratelimited("tc action pedit offset overflow\n");
> +			goto bad;
> +		}
> +
> +		if (!offset_valid(skb, write_offset)) {
> +			pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> +					    write_offset);
>  			goto bad;
>  		}
>  
> -		ptr = skb_header_pointer(skb, hoffset + offset,
> +		if (write_offset < 0) {
> +			if (skb_cow(skb, -write_offset))
> +				goto bad;
> +		} else {
> +			if (unlikely(check_add_overflow(write_offset,
> +							(int)sizeof(hdata),
> +							&write_len)))
> +				goto bad;
> +			if (skb_ensure_writable(skb, min_t(int, skb->len,
> +							   write_len)))
> +				goto bad;
> +		}
> +
> +		ptr = skb_header_pointer(skb, write_offset,
>  					 sizeof(hdata), &hdata);
>  		if (!ptr)
>  			goto bad;
> @@ -475,7 +501,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>  
>  		*ptr = ((*ptr & tkey->mask) ^ val);
>  		if (ptr == &hdata)
> -			skb_store_bits(skb, hoffset + offset, ptr, 4);
> +			skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
>  	}
>  
>  	goto done;
> -- 
> 2.51.2.windows.1
> 
> Thank you,
> Rajat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache
  2026-05-18  1:30 [PATCH net] net/sched: fix pedit partial COW leading to page cache Rajat Gupta
  2026-05-18 13:10 ` Han Guidong
@ 2026-05-18 13:31 ` Jamal Hadi Salim
  2026-05-19  3:39   ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
  1 sibling, 1 reply; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-18 13:31 UTC (permalink / raw)
  To: Rajat Gupta; +Cc: netdev, jiri, davem, Eric Dumazet, kuba, pabeni, horms

Hi Rajat,

On Sun, May 17, 2026 at 9:30 PM Rajat Gupta
<rajat.gupta@oss.qualcomm.com> wrote:
>
> From f12ea6484dbb75d1c13495e921d0595532317f2a Mon Sep 17 00:00:00 2001
> From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Date: Sun, 17 May 2026 10:11:44 -0700
> Subject: [PATCH net] net/sched: fix pedit partial COW leading to page cache
>  corruption
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>

git am on this:
https://patchwork.kernel.org/project/netdevbpf/patch/604ca54c-a94e-4ca4-83e7-4486d7392d71@oss.qualcomm.com/

Errors out.
----
Applying: net/sched: fix pedit partial COW leading to page cache corruption
.git/rebase-apply/patch:49: trailing whitespace.
         * If you want zero-copy, don't use pedit.
error: corrupt patch at .git/rebase-apply/patch:58
Patch failed at 0001 net/sched: fix pedit partial COW leading to page
cache corruption
----

Those comments dont look clean - follow netdev rules.
You should send v1. The rule is you wait 24 hours after the previous
one before doing so (you can add whoever tests etc on v1)

Also make sure this applies on net tree not net-next

cheers,
jamal

> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
>  net/sched/act_pedit.c | 52 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2..e077a2e82 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
>  #include <linux/ip.h>
>  #include <linux/ipv6.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <net/ipv6.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>         if (offset > 0 && offset > skb->len)
>                 return false;
>
> -       if  (offset < 0 && -offset > skb_headroom(skb))
> -               return false;
> +       if (offset < 0) {
> +               if (offset == INT_MIN || -offset > skb_headroom(skb))
> +                       return false;
> +       }
>
>         return true;
>  }
> @@ -393,17 +396,19 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         struct tcf_pedit_key_ex *tkey_ex;
>         struct tcf_pedit_parms *parms;
>         struct tc_pedit_key *tkey;
> -       u32 max_offset;
>         int i;
>
>         parms = rcu_dereference_bh(p->parms);
>
> -       max_offset = (skb_transport_header_was_set(skb) ?
> -                     skb_transport_offset(skb) :
> -                     skb_network_offset(skb)) +
> -                    parms->tcfp_off_max_hint;
> -       if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> -               goto done;
> +       /* If the skb has shared frags the user is likely using zero-copy
> +        * (e.g. sendfile).  Those page frags may point to page-cache pages;
> +        * writing into them would silently corrupt the page cache.
> +        * Linearize so pedit operates on a private copy.
> +        * If you want zero-copy, don't use pedit.
> +         TL;DR if you want to use ZC, don't use pedit*/
> +       if (skb_has_shared_frag(skb)) {
> +               if (__skb_linearize(skb))
> +                       goto bad;
> +       }
>
>         tcf_lastuse_update(&p->tcf_tm);
>         tcf_action_update_bstats(&p->common, skb);
> @@ -414,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
>                 int offset = tkey->off;
>                 int hoffset = 0;
> +               int write_offset, write_len;
>                 u32 *ptr, hdata;
>                 u32 val;
>                 int rc;
> @@ -451,12 +457,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>                         }
>                 }
>
> -               if (!offset_valid(skb, hoffset + offset)) {
> -                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> +               if (unlikely(check_add_overflow(hoffset, offset,
> +                                               &write_offset))) {
> +                       pr_info_ratelimited("tc action pedit offset overflow\n");
> +                       goto bad;
> +               }
> +
> +               if (!offset_valid(skb, write_offset)) {
> +                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> +                                           write_offset);
>                         goto bad;
>                 }
>
> -               ptr = skb_header_pointer(skb, hoffset + offset,
> +               if (write_offset < 0) {
> +                       if (skb_cow(skb, -write_offset))
> +                               goto bad;
> +               } else {
> +                       if (unlikely(check_add_overflow(write_offset,
> +                                                       (int)sizeof(hdata),
> +                                                       &write_len)))
> +                               goto bad;
> +                       if (skb_ensure_writable(skb, min_t(int, skb->len,
> +                                                          write_len)))
> +                               goto bad;
> +               }
> +
> +               ptr = skb_header_pointer(skb, write_offset,
>                                          sizeof(hdata), &hdata);
>                 if (!ptr)
>                         goto bad;
> @@ -475,7 +501,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>
>                 *ptr = ((*ptr & tkey->mask) ^ val);
>                 if (ptr == &hdata)
> -                       skb_store_bits(skb, hoffset + offset, ptr, 4);
> +                       skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
>         }
>
>         goto done;
> --
> 2.51.2.windows.1
>
> Thank you,
> Rajat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-18 13:31 ` Jamal Hadi Salim
@ 2026-05-19  3:39   ` Rajat Gupta
  2026-05-19 11:18     ` Toke Høiland-Jørgensen
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Rajat Gupta @ 2026-05-19  3:39 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jhs, jiri, yimingqian591,
	keenanat2000, 2045gemini, rollkingzzc, Rajat Gupta

tcf_pedit_act() computes the COW range for skb_ensure_writable()
once before the key loop using tcfp_off_max_hint, but the hint does
not account for the runtime header offset added by typed keys. This
can leave part of the write region un-COW'd.

Fix by moving skb_ensure_writable() inside the per-key loop where
the actual write offset is known, and add overflow checking on the
offset arithmetic. For negative offsets (e.g. Ethernet header edits
at ingress), use skb_cow() to COW the headroom instead. Guard
offset_valid() against INT_MIN, where negation is undefined.

Additionally, linearize skbs with shared frags upfront to prevent
silent data corruption when pedit operates on zero-copy pages
(e.g. from sendfile).

Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Reported-by: Han Guidong <2045gemini@gmail.com>
Reported-by: Zhang Cen <rollkingzzc@gmail.com>
Tested-by: Han Guidong <2045gemini@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
---
 net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 13 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index bc20f08a2..79921b8d8 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -16,6 +16,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/slab.h>
+#include <linux/overflow.h>
 #include <net/ipv6.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
@@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
 	if (offset > 0 && offset > skb->len)
 		return false;
 
-	if  (offset < 0 && -offset > skb_headroom(skb))
-		return false;
+	if (offset < 0) {
+		if (offset == INT_MIN || -offset > skb_headroom(skb))
+			return false;
+	}
 
 	return true;
 }
@@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 	struct tcf_pedit_key_ex *tkey_ex;
 	struct tcf_pedit_parms *parms;
 	struct tc_pedit_key *tkey;
-	u32 max_offset;
 	int i;
 
 	parms = rcu_dereference_bh(p->parms);
 
-	max_offset = (skb_transport_header_was_set(skb) ?
-		      skb_transport_offset(skb) :
-		      skb_network_offset(skb)) +
-		     parms->tcfp_off_max_hint;
-	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
-		goto done;
+	/*
+	 * If the skb has shared frags the user is likely using zero-copy
+	 * (e.g. sendfile).  Those page frags may point to page-cache pages;
+	 * writing into them would silently corrupt the page cache.
+	 * Linearize so pedit operates on a private copy.
+	 * TL;DR: if you want zero-copy, don't use pedit.
+	 */
+	if (skb_has_shared_frag(skb)) {
+		if (__skb_linearize(skb))
+			goto bad;
+	}
 
 	tcf_lastuse_update(&p->tcf_tm);
 	tcf_action_update_bstats(&p->common, skb);
@@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 	tkey_ex = parms->tcfp_keys_ex;
 
 	for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
+		int write_offset, write_len;
 		int offset = tkey->off;
 		int hoffset = 0;
 		u32 *ptr, hdata;
@@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 			}
 		}
 
-		if (!offset_valid(skb, hoffset + offset)) {
-			pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
+		if (unlikely(check_add_overflow(hoffset, offset,
+						&write_offset))) {
+			pr_info_ratelimited("tc action pedit offset overflow\n");
+			goto bad;
+		}
+
+		if (!offset_valid(skb, write_offset)) {
+			pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
+					    write_offset);
 			goto bad;
 		}
 
-		ptr = skb_header_pointer(skb, hoffset + offset,
+		if (write_offset < 0) {
+			if (skb_cow(skb, -write_offset))
+				goto bad;
+		} else {
+			if (unlikely(check_add_overflow(write_offset,
+							(int)sizeof(hdata),
+							&write_len)))
+				goto bad;
+			if (skb_ensure_writable(skb, min_t(int, skb->len,
+							   write_len)))
+				goto bad;
+		}
+
+		ptr = skb_header_pointer(skb, write_offset,
 					 sizeof(hdata), &hdata);
 		if (!ptr)
 			goto bad;
@@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 
 		*ptr = ((*ptr & tkey->mask) ^ val);
 		if (ptr == &hdata)
-			skb_store_bits(skb, hoffset + offset, ptr, 4);
+			skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
 	}
 
 	goto done;
-- 
2.51.2.windows.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-19  3:39   ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
@ 2026-05-19 11:18     ` Toke Høiland-Jørgensen
  2026-05-19 15:10     ` Han Guidong
  2026-05-20  9:23     ` Jamal Hadi Salim
  2 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-05-19 11:18 UTC (permalink / raw)
  To: Rajat Gupta, netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jhs, jiri, yimingqian591,
	keenanat2000, 2045gemini, rollkingzzc, Rajat Gupta

Rajat Gupta <rajat.gupta@oss.qualcomm.com> writes:

> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>

Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>


Also applied this to -net and ran the TC pedit selftests and the
pedit_* scripts in net/forwarding, none of which turned up any
regressions, so:

Tested-by: Toke Høiland-Jørgensen <toke@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-19  3:39   ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
  2026-05-19 11:18     ` Toke Høiland-Jørgensen
@ 2026-05-19 15:10     ` Han Guidong
  2026-05-20  9:12       ` Jamal Hadi Salim
  2026-05-20  9:23     ` Jamal Hadi Salim
  2 siblings, 1 reply; 18+ messages in thread
From: Han Guidong @ 2026-05-19 15:10 UTC (permalink / raw)
  To: Rajat Gupta
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jhs, jiri,
	yimingqian591, keenanat2000, rollkingzzc

On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
<rajat.gupta@oss.qualcomm.com> wrote:
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")

Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
Cc: stable@vger.kernel.org

I took a closer look at the code, and this patch is really addressing
two issues.

The negative-offset write into shared head/headroom appears to be from
8b796475fd78, while the shared-frag write, and thus possible
page-cache corruption, appears to be the later regression from
6c02568fd1ae. So it may be worth adding an additional Fixes tag for
6c02568fd1ae, and I think an explicit Cc: stable@vger.kernel.org would
also make sense here.

> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
>  net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2..79921b8d8 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
>  #include <linux/ip.h>
>  #include <linux/ipv6.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <net/ipv6.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>         if (offset > 0 && offset > skb->len)
>                 return false;
>
> -       if  (offset < 0 && -offset > skb_headroom(skb))
> -               return false;
> +       if (offset < 0) {
> +               if (offset == INT_MIN || -offset > skb_headroom(skb))
> +                       return false;
> +       }
>
>         return true;
>  }
> @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         struct tcf_pedit_key_ex *tkey_ex;
>         struct tcf_pedit_parms *parms;
>         struct tc_pedit_key *tkey;
> -       u32 max_offset;
>         int i;
>
>         parms = rcu_dereference_bh(p->parms);
>
> -       max_offset = (skb_transport_header_was_set(skb) ?
> -                     skb_transport_offset(skb) :
> -                     skb_network_offset(skb)) +
> -                    parms->tcfp_off_max_hint;
> -       if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> -               goto done;
> +       /*
> +        * If the skb has shared frags the user is likely using zero-copy
> +        * (e.g. sendfile).  Those page frags may point to page-cache pages;
> +        * writing into them would silently corrupt the page cache.
> +        * Linearize so pedit operates on a private copy.
> +        * TL;DR: if you want zero-copy, don't use pedit.
> +        */
> +       if (skb_has_shared_frag(skb)) {
> +               if (__skb_linearize(skb))
> +                       goto bad;
> +       }
>
>         tcf_lastuse_update(&p->tcf_tm);
>         tcf_action_update_bstats(&p->common, skb);
> @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         tkey_ex = parms->tcfp_keys_ex;
>
>         for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> +               int write_offset, write_len;
>                 int offset = tkey->off;
>                 int hoffset = 0;
>                 u32 *ptr, hdata;
> @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>                         }
>                 }
>
> -               if (!offset_valid(skb, hoffset + offset)) {
> -                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> +               if (unlikely(check_add_overflow(hoffset, offset,
> +                                               &write_offset))) {

I also noticed two remaining unchecked additions, e.g. hoffset +
tkey->at and offset += (*d & offmask) >> shift. These look more like
wrap/type-semantics cleanup than a practical signed-overflow/UB issue
to me, so I am OK with this patch as-is.

> +                       pr_info_ratelimited("tc action pedit offset overflow\n");
> +                       goto bad;
> +               }
> +
> +               if (!offset_valid(skb, write_offset)) {
> +                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> +                                           write_offset);
>                         goto bad;
>                 }
>
> -               ptr = skb_header_pointer(skb, hoffset + offset,
> +               if (write_offset < 0) {

There is also a small negative-offset 4-byte-write corner case, e.g.
an ETH-relative edit with hoffset=-14 and offset=12, giving a final
write_offset=-2; if the linear head after skb->data is only 1 byte,
that store spans -2..1. That seems unlikely in practice; fixing it
would be nice, but I would not hold this patch for it.

Reviewed-by: Han Guidong <2045gemini@gmail.com>


> +                       if (skb_cow(skb, -write_offset))
> +                               goto bad;
> +               } else {
> +                       if (unlikely(check_add_overflow(write_offset,
> +                                                       (int)sizeof(hdata),
> +                                                       &write_len)))
> +                               goto bad;
> +                       if (skb_ensure_writable(skb, min_t(int, skb->len,
> +                                                          write_len)))
> +                               goto bad;
> +               }
> +
> +               ptr = skb_header_pointer(skb, write_offset,
>                                          sizeof(hdata), &hdata);
>                 if (!ptr)
>                         goto bad;
> @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>
>                 *ptr = ((*ptr & tkey->mask) ^ val);
>                 if (ptr == &hdata)
> -                       skb_store_bits(skb, hoffset + offset, ptr, 4);
> +                       skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
>         }
>
>         goto done;
> --
> 2.51.2.windows.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-19 15:10     ` Han Guidong
@ 2026-05-20  9:12       ` Jamal Hadi Salim
  2026-05-20 10:04         ` Han Guidong
  2026-05-20 10:36         ` Han Guidong
  0 siblings, 2 replies; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20  9:12 UTC (permalink / raw)
  To: Han Guidong
  Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
	yimingqian591, keenanat2000, rollkingzzc

On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
>
> On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> <rajat.gupta@oss.qualcomm.com> wrote:
> >
> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > once before the key loop using tcfp_off_max_hint, but the hint does
> > not account for the runtime header offset added by typed keys. This
> > can leave part of the write region un-COW'd.
> >
> > Fix by moving skb_ensure_writable() inside the per-key loop where
> > the actual write offset is known, and add overflow checking on the
> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > at ingress), use skb_cow() to COW the headroom instead. Guard
> > offset_valid() against INT_MIN, where negation is undefined.
> >
> > Additionally, linearize skbs with shared frags upfront to prevent
> > silent data corruption when pedit operates on zero-copy pages
> > (e.g. from sendfile).
> >
> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
>
> Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> Cc: stable@vger.kernel.org
>
> I took a closer look at the code, and this patch is really addressing
> two issues.
>
> The negative-offset write into shared head/headroom appears to be from
> 8b796475fd78, while the shared-frag write, and thus possible
> page-cache corruption, appears to be the later regression from
> 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> 6c02568fd1ae,

Pushing it a little - but sure we should also point to that commit.

> and I think an explicit Cc: stable@vger.kernel.org would also make sense here.

Please study the rules.

> > Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > Reported-by: Han Guidong <2045gemini@gmail.com>
> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > Tested-by: Han Guidong <2045gemini@gmail.com>
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > ---
> >  net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 41 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index bc20f08a2..79921b8d8 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/ip.h>
> >  #include <linux/ipv6.h>
> >  #include <linux/slab.h>
> > +#include <linux/overflow.h>
> >  #include <net/ipv6.h>
> >  #include <net/netlink.h>
> >  #include <net/pkt_sched.h>
> > @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> >         if (offset > 0 && offset > skb->len)
> >                 return false;
> >
> > -       if  (offset < 0 && -offset > skb_headroom(skb))
> > -               return false;
> > +       if (offset < 0) {
> > +               if (offset == INT_MIN || -offset > skb_headroom(skb))
> > +                       return false;
> > +       }
> >
> >         return true;
> >  }
> > @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >         struct tcf_pedit_key_ex *tkey_ex;
> >         struct tcf_pedit_parms *parms;
> >         struct tc_pedit_key *tkey;
> > -       u32 max_offset;
> >         int i;
> >
> >         parms = rcu_dereference_bh(p->parms);
> >
> > -       max_offset = (skb_transport_header_was_set(skb) ?
> > -                     skb_transport_offset(skb) :
> > -                     skb_network_offset(skb)) +
> > -                    parms->tcfp_off_max_hint;
> > -       if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > -               goto done;
> > +       /*
> > +        * If the skb has shared frags the user is likely using zero-copy
> > +        * (e.g. sendfile).  Those page frags may point to page-cache pages;
> > +        * writing into them would silently corrupt the page cache.
> > +        * Linearize so pedit operates on a private copy.
> > +        * TL;DR: if you want zero-copy, don't use pedit.
> > +        */
> > +       if (skb_has_shared_frag(skb)) {
> > +               if (__skb_linearize(skb))
> > +                       goto bad;
> > +       }
> >
> >         tcf_lastuse_update(&p->tcf_tm);
> >         tcf_action_update_bstats(&p->common, skb);
> > @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >         tkey_ex = parms->tcfp_keys_ex;
> >
> >         for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > +               int write_offset, write_len;
> >                 int offset = tkey->off;
> >                 int hoffset = 0;
> >                 u32 *ptr, hdata;
> > @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >                         }
> >                 }
> >
> > -               if (!offset_valid(skb, hoffset + offset)) {
> > -                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > +               if (unlikely(check_add_overflow(hoffset, offset,
> > +                                               &write_offset))) {
>
> I also noticed two remaining unchecked additions, e.g. hoffset +
> tkey->at and offset += (*d & offmask) >> shift. These look more like
> wrap/type-semantics cleanup than a practical signed-overflow/UB issue
> to me, so I am OK with this patch as-is.

Did you get this from some AI?
None of these are possible. Both additions you mention are safe in
practice because any potential wrap around or overflow is caught
_before any memory access occurs_. Example see:
offset_valid() which implictly casts hoffset + key to signed int.

> > +                       pr_info_ratelimited("tc action pedit offset overflow\n");
> > +                       goto bad;
> > +               }
> > +
> > +               if (!offset_valid(skb, write_offset)) {
> > +                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > +                                           write_offset);
> >                         goto bad;
> >                 }
> >
> > -               ptr = skb_header_pointer(skb, hoffset + offset,
> > +               if (write_offset < 0) {
>
> There is also a small negative-offset 4-byte-write corner case, e.g.
> an ETH-relative edit with hoffset=-14 and offset=12, giving a final
> write_offset=-2; if the linear head after skb->data is only 1 byte,
> that store spans -2..1. That seems unlikely in practice; fixing it
> would be nice, but I would not hold this patch for it.
>

Again smells AI generated. I spent a lot of time trying to determine
the practicality and validity of this (and your previous claim), only
to find that this whole claim is based on  a view that a headroom of 1
byte is a valid scenario in practice. Is it theoretically possible
from the math? yes. Practically, I don't see how it makes any sense.
Can you describe a scenario where this would happen in practise?

> Reviewed-by: Han Guidong <2045gemini@gmail.com>

More like reviewed by my AI who goes by the name foobar.

cheers,
jamal

>
>
> > +                       if (skb_cow(skb, -write_offset))
> > +                               goto bad;
> > +               } else {
> > +                       if (unlikely(check_add_overflow(write_offset,
> > +                                                       (int)sizeof(hdata),
> > +                                                       &write_len)))
> > +                               goto bad;
> > +                       if (skb_ensure_writable(skb, min_t(int, skb->len,
> > +                                                          write_len)))
> > +                               goto bad;
> > +               }
> > +
> > +               ptr = skb_header_pointer(skb, write_offset,
> >                                          sizeof(hdata), &hdata);
> >                 if (!ptr)
> >                         goto bad;
> > @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >
> >                 *ptr = ((*ptr & tkey->mask) ^ val);
> >                 if (ptr == &hdata)
> > -                       skb_store_bits(skb, hoffset + offset, ptr, 4);
> > +                       skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
> >         }
> >
> >         goto done;
> > --
> > 2.51.2.windows.1
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-19  3:39   ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
  2026-05-19 11:18     ` Toke Høiland-Jørgensen
  2026-05-19 15:10     ` Han Guidong
@ 2026-05-20  9:23     ` Jamal Hadi Salim
  2026-05-20 20:00       ` Jamal Hadi Salim
  2 siblings, 1 reply; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20  9:23 UTC (permalink / raw)
  To: Rajat Gupta
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
	keenanat2000, 2045gemini, rollkingzzc

On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
<rajat.gupta@oss.qualcomm.com> wrote:
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>

Sashiko makes the claim that:
"
....
.....
If the payload part extending past offset 0 resides in a shared
fragment for a cloned skb, wouldn't skb_store_bits() write the
positive offset  portion directly into the shared page, causing a data
corruption regression?
"

It missed the earlier code (before the quoted code is hit) which is
very well documented!
/*
 * If the skb has shared frags the user is likely using zero-copy
 * (e.g. sendfile).  Those page frags may point to page-cache pages;
 * writing into them would silently corrupt the page cache.
 * Linearize so pedit operates on a private copy.
 * TL;DR: if you want zero-copy, don't use pedit.
 */
 if (skb_has_shared_frag(skb)) {
        if (__skb_linearize(skb))
             goto bad;
 }

TLDR:
If the  skb  has shared frags, it is linearized upfront using
__skb_linearize() which means any risk of modifying a shared fragment
is non-existent.

cheers,
jamal

> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
>  net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2..79921b8d8 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
>  #include <linux/ip.h>
>  #include <linux/ipv6.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <net/ipv6.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>         if (offset > 0 && offset > skb->len)
>                 return false;
>
> -       if  (offset < 0 && -offset > skb_headroom(skb))
> -               return false;
> +       if (offset < 0) {
> +               if (offset == INT_MIN || -offset > skb_headroom(skb))
> +                       return false;
> +       }
>
>         return true;
>  }
> @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         struct tcf_pedit_key_ex *tkey_ex;
>         struct tcf_pedit_parms *parms;
>         struct tc_pedit_key *tkey;
> -       u32 max_offset;
>         int i;
>
>         parms = rcu_dereference_bh(p->parms);
>
> -       max_offset = (skb_transport_header_was_set(skb) ?
> -                     skb_transport_offset(skb) :
> -                     skb_network_offset(skb)) +
> -                    parms->tcfp_off_max_hint;
> -       if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> -               goto done;
> +       /*
> +        * If the skb has shared frags the user is likely using zero-copy
> +        * (e.g. sendfile).  Those page frags may point to page-cache pages;
> +        * writing into them would silently corrupt the page cache.
> +        * Linearize so pedit operates on a private copy.
> +        * TL;DR: if you want zero-copy, don't use pedit.
> +        */
> +       if (skb_has_shared_frag(skb)) {
> +               if (__skb_linearize(skb))
> +                       goto bad;
> +       }
>
>         tcf_lastuse_update(&p->tcf_tm);
>         tcf_action_update_bstats(&p->common, skb);
> @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         tkey_ex = parms->tcfp_keys_ex;
>
>         for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> +               int write_offset, write_len;
>                 int offset = tkey->off;
>                 int hoffset = 0;
>                 u32 *ptr, hdata;
> @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>                         }
>                 }
>
> -               if (!offset_valid(skb, hoffset + offset)) {
> -                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> +               if (unlikely(check_add_overflow(hoffset, offset,
> +                                               &write_offset))) {
> +                       pr_info_ratelimited("tc action pedit offset overflow\n");
> +                       goto bad;
> +               }
> +
> +               if (!offset_valid(skb, write_offset)) {
> +                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> +                                           write_offset);
>                         goto bad;
>                 }
>
> -               ptr = skb_header_pointer(skb, hoffset + offset,
> +               if (write_offset < 0) {
> +                       if (skb_cow(skb, -write_offset))
> +                               goto bad;
> +               } else {
> +                       if (unlikely(check_add_overflow(write_offset,
> +                                                       (int)sizeof(hdata),
> +                                                       &write_len)))
> +                               goto bad;
> +                       if (skb_ensure_writable(skb, min_t(int, skb->len,
> +                                                          write_len)))
> +                               goto bad;
> +               }
> +
> +               ptr = skb_header_pointer(skb, write_offset,
>                                          sizeof(hdata), &hdata);
>                 if (!ptr)
>                         goto bad;
> @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>
>                 *ptr = ((*ptr & tkey->mask) ^ val);
>                 if (ptr == &hdata)
> -                       skb_store_bits(skb, hoffset + offset, ptr, 4);
> +                       skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
>         }
>
>         goto done;
> --
> 2.51.2.windows.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-20  9:12       ` Jamal Hadi Salim
@ 2026-05-20 10:04         ` Han Guidong
  2026-05-20 10:36         ` Han Guidong
  1 sibling, 0 replies; 18+ messages in thread
From: Han Guidong @ 2026-05-20 10:04 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
	yimingqian591, keenanat2000, rollkingzzc

On Wed, May 20, 2026 at 5:12 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> > <rajat.gupta@oss.qualcomm.com> wrote:
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> >
> > Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> > Cc: stable@vger.kernel.org
> >
> > I took a closer look at the code, and this patch is really addressing
> > two issues.
> >
> > The negative-offset write into shared head/headroom appears to be from
> > 8b796475fd78, while the shared-frag write, and thus possible
> > page-cache corruption, appears to be the later regression from
> > 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> > 6c02568fd1ae,
>
> Pushing it a little - but sure we should also point to that commit.
>
> > and I think an explicit Cc: stable@vger.kernel.org would also make sense here.
>
> Please study the rules.

I will review the rules. Sorry for the noise.

>
> > > Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > Tested-by: Han Guidong <2045gemini@gmail.com>
> > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > ---
> > >  net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> > >  1 file changed, 41 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > index bc20f08a2..79921b8d8 100644
> > > --- a/net/sched/act_pedit.c
> > > +++ b/net/sched/act_pedit.c
> > > @@ -16,6 +16,7 @@
> > >  #include <linux/ip.h>
> > >  #include <linux/ipv6.h>
> > >  #include <linux/slab.h>
> > > +#include <linux/overflow.h>
> > >  #include <net/ipv6.h>
> > >  #include <net/netlink.h>
> > >  #include <net/pkt_sched.h>
> > > @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > >         if (offset > 0 && offset > skb->len)
> > >                 return false;
> > >
> > > -       if  (offset < 0 && -offset > skb_headroom(skb))
> > > -               return false;
> > > +       if (offset < 0) {
> > > +               if (offset == INT_MIN || -offset > skb_headroom(skb))
> > > +                       return false;
> > > +       }
> > >
> > >         return true;
> > >  }
> > > @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >         struct tcf_pedit_key_ex *tkey_ex;
> > >         struct tcf_pedit_parms *parms;
> > >         struct tc_pedit_key *tkey;
> > > -       u32 max_offset;
> > >         int i;
> > >
> > >         parms = rcu_dereference_bh(p->parms);
> > >
> > > -       max_offset = (skb_transport_header_was_set(skb) ?
> > > -                     skb_transport_offset(skb) :
> > > -                     skb_network_offset(skb)) +
> > > -                    parms->tcfp_off_max_hint;
> > > -       if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > > -               goto done;
> > > +       /*
> > > +        * If the skb has shared frags the user is likely using zero-copy
> > > +        * (e.g. sendfile).  Those page frags may point to page-cache pages;
> > > +        * writing into them would silently corrupt the page cache.
> > > +        * Linearize so pedit operates on a private copy.
> > > +        * TL;DR: if you want zero-copy, don't use pedit.
> > > +        */
> > > +       if (skb_has_shared_frag(skb)) {
> > > +               if (__skb_linearize(skb))
> > > +                       goto bad;
> > > +       }
> > >
> > >         tcf_lastuse_update(&p->tcf_tm);
> > >         tcf_action_update_bstats(&p->common, skb);
> > > @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >         tkey_ex = parms->tcfp_keys_ex;
> > >
> > >         for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > > +               int write_offset, write_len;
> > >                 int offset = tkey->off;
> > >                 int hoffset = 0;
> > >                 u32 *ptr, hdata;
> > > @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >                         }
> > >                 }
> > >
> > > -               if (!offset_valid(skb, hoffset + offset)) {
> > > -                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > > +               if (unlikely(check_add_overflow(hoffset, offset,
> > > +                                               &write_offset))) {
> >
> > I also noticed two remaining unchecked additions, e.g. hoffset +
> > tkey->at and offset += (*d & offmask) >> shift. These look more like
> > wrap/type-semantics cleanup than a practical signed-overflow/UB issue
> > to me, so I am OK with this patch as-is.
>
> Did you get this from some AI?
> None of these are possible. Both additions you mention are safe in
> practice because any potential wrap around or overflow is caught
> _before any memory access occurs_. Example see:
> offset_valid() which implictly casts hoffset + key to signed int.

Not AI-generated. I was looking at the code and noticed that
tc_pedit_key fields are unsigned, but the results are stored in int
offset and int hoffset. I was just thinking about odd edge cases where
an unsigned-to-signed conversion results in a negative number. I agree
with you that offset_valid() catches this anyway, so it causes no
actual harm.

>
> > > +                       pr_info_ratelimited("tc action pedit offset overflow\n");
> > > +                       goto bad;
> > > +               }
> > > +
> > > +               if (!offset_valid(skb, write_offset)) {
> > > +                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > > +                                           write_offset);
> > >                         goto bad;
> > >                 }
> > >
> > > -               ptr = skb_header_pointer(skb, hoffset + offset,
> > > +               if (write_offset < 0) {
> >
> > There is also a small negative-offset 4-byte-write corner case, e.g.
> > an ETH-relative edit with hoffset=-14 and offset=12, giving a final
> > write_offset=-2; if the linear head after skb->data is only 1 byte,
> > that store spans -2..1. That seems unlikely in practice; fixing it
> > would be nice, but I would not hold this patch for it.
> >
>
> Again smells AI generated. I spent a lot of time trying to determine
> the practicality and validity of this (and your previous claim), only
> to find that this whole claim is based on  a view that a headroom of 1
> byte is a valid scenario in practice. Is it theoretically possible
> from the math? yes. Practically, I don't see how it makes any sense.
> Can you describe a scenario where this would happen in practise?

As I mentioned in the previous mail, I agree it is unlikely in
practice. For context, this was an edge case brought up by sashiko
regarding our patch. I wrote a test kernel module and verified that if
an skb is artificially constructed with exactly 1 byte of headroom,
skb_cow() is indeed insufficient. However, I also checked real-world
packets and the headroom is always plenty, so I agree it doesn't
happen in the real world.

>
> > Reviewed-by: Han Guidong <2045gemini@gmail.com>
>
> More like reviewed by my AI who goes by the name foobar.

I will work on improving my review quality going forward. Sorry again
for the noise.

Thanks.

>
> cheers,
> jamal
>
> >
> >
> > > +                       if (skb_cow(skb, -write_offset))
> > > +                               goto bad;
> > > +               } else {
> > > +                       if (unlikely(check_add_overflow(write_offset,
> > > +                                                       (int)sizeof(hdata),
> > > +                                                       &write_len)))
> > > +                               goto bad;
> > > +                       if (skb_ensure_writable(skb, min_t(int, skb->len,
> > > +                                                          write_len)))
> > > +                               goto bad;
> > > +               }
> > > +
> > > +               ptr = skb_header_pointer(skb, write_offset,
> > >                                          sizeof(hdata), &hdata);
> > >                 if (!ptr)
> > >                         goto bad;
> > > @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >
> > >                 *ptr = ((*ptr & tkey->mask) ^ val);
> > >                 if (ptr == &hdata)
> > > -                       skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > +                       skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
> > >         }
> > >
> > >         goto done;
> > > --
> > > 2.51.2.windows.1
> > >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-20  9:12       ` Jamal Hadi Salim
  2026-05-20 10:04         ` Han Guidong
@ 2026-05-20 10:36         ` Han Guidong
  2026-05-20 11:40           ` Jamal Hadi Salim
  1 sibling, 1 reply; 18+ messages in thread
From: Han Guidong @ 2026-05-20 10:36 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
	yimingqian591, keenanat2000, rollkingzzc

On Wed, May 20, 2026 at 5:12 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> > <rajat.gupta@oss.qualcomm.com> wrote:
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> >
> > Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> > Cc: stable@vger.kernel.org
> >
> > I took a closer look at the code, and this patch is really addressing
> > two issues.
> >
> > The negative-offset write into shared head/headroom appears to be from
> > 8b796475fd78, while the shared-frag write, and thus possible
> > page-cache corruption, appears to be the later regression from
> > 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> > 6c02568fd1ae,
>
> Pushing it a little - but sure we should also point to that commit.
>
> > and I think an explicit Cc: stable@vger.kernel.org would also make sense here.
>
> Please study the rules.

Hi Jamal,

Apologies, I'm still a newbie learning the process. I checked the docs
and saw commit dbbe7c962c3a ("docs: networking: drop special stable
handling"), but I'm still a bit confused.

Did the rule change to allow explicit "Cc: stable@vger.kernel.org"
tags for netdev?

If I completely missed the correct documentation, I would be very
grateful if you could point me to it.

Thanks for your patience!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-20 10:36         ` Han Guidong
@ 2026-05-20 11:40           ` Jamal Hadi Salim
  0 siblings, 0 replies; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20 11:40 UTC (permalink / raw)
  To: Han Guidong
  Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
	yimingqian591, keenanat2000, rollkingzzc

On Wed, May 20, 2026 at 6:37 AM Han Guidong <2045gemini@gmail.com> wrote:
>
> On Wed, May 20, 2026 at 5:12 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
> > >
> > > On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> > > <rajat.gupta@oss.qualcomm.com> wrote:
> > > >
> > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > not account for the runtime header offset added by typed keys. This
> > > > can leave part of the write region un-COW'd.
> > > >
> > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > the actual write offset is known, and add overflow checking on the
> > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > offset_valid() against INT_MIN, where negation is undefined.
> > > >
> > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > silent data corruption when pedit operates on zero-copy pages
> > > > (e.g. from sendfile).
> > > >
> > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > >
> > > Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> > > Cc: stable@vger.kernel.org
> > >
> > > I took a closer look at the code, and this patch is really addressing
> > > two issues.
> > >
> > > The negative-offset write into shared head/headroom appears to be from
> > > 8b796475fd78, while the shared-frag write, and thus possible
> > > page-cache corruption, appears to be the later regression from
> > > 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> > > 6c02568fd1ae,
> >
> > Pushing it a little - but sure we should also point to that commit.
> >
> > > and I think an explicit Cc: stable@vger.kernel.org would also make sense here.
> >
> > Please study the rules.
>
> Hi Jamal,
>
> Apologies, I'm still a newbie learning the process. I checked the docs
> and saw commit dbbe7c962c3a ("docs: networking: drop special stable
> handling"), but I'm still a bit confused.
>
> Did the rule change to allow explicit "Cc: stable@vger.kernel.org"
> tags for netdev?
>
> If I completely missed the correct documentation, I would be very
> grateful if you could point me to it.
>

This is a security fix. AFAIK, It does not follow the stable rules.
It goes via the net tree (then propagated via net-next) before making
it to linux main. It gets picked up by the stable tree.
But: You are correct that Ccing stable is useful when the vulnerberity
hasnt been published. I apologize for being harsh - too many AI emails
claiming bugs on which i spend too much time and you bore the brunt of
it. So we should Cc them.

cheers,
jamal

> Thanks for your patience!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-20  9:23     ` Jamal Hadi Salim
@ 2026-05-20 20:00       ` Jamal Hadi Salim
  2026-05-21  9:53         ` Jamal Hadi Salim
  0 siblings, 1 reply; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20 20:00 UTC (permalink / raw)
  To: Rajat Gupta
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
	keenanat2000, 2045gemini, rollkingzzc

On Wed, May 20, 2026 at 5:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
> <rajat.gupta@oss.qualcomm.com> wrote:
> >
> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > once before the key loop using tcfp_off_max_hint, but the hint does
> > not account for the runtime header offset added by typed keys. This
> > can leave part of the write region un-COW'd.
> >
> > Fix by moving skb_ensure_writable() inside the per-key loop where
> > the actual write offset is known, and add overflow checking on the
> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > at ingress), use skb_cow() to COW the headroom instead. Guard
> > offset_valid() against INT_MIN, where negation is undefined.
> >
> > Additionally, linearize skbs with shared frags upfront to prevent
> > silent data corruption when pedit operates on zero-copy pages
> > (e.g. from sendfile).
> >
>
> Sashiko makes the claim that:
> "
> ....
> .....
> If the payload part extending past offset 0 resides in a shared
> fragment for a cloned skb, wouldn't skb_store_bits() write the
> positive offset  portion directly into the shared page, causing a data
> corruption regression?
> "
>
> It missed the earlier code (before the quoted code is hit) which is
> very well documented!
> /*
>  * If the skb has shared frags the user is likely using zero-copy
>  * (e.g. sendfile).  Those page frags may point to page-cache pages;
>  * writing into them would silently corrupt the page cache.
>  * Linearize so pedit operates on a private copy.
>  * TL;DR: if you want zero-copy, don't use pedit.
>  */
>  if (skb_has_shared_frag(skb)) {
>         if (__skb_linearize(skb))
>              goto bad;
>  }
>
> TLDR:
> If the  skb  has shared frags, it is linearized upfront using
> __skb_linearize() which means any risk of modifying a shared fragment
> is non-existent.

It seems there's a second sashiko ive been made aware of - these
things are just multiplying ;->
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com

Both points it makes are valid. Dont have time right now.
And infact looking at the second point sashiko 2 made I realize I was
only paying attention to externally shared fragments which the poc
used.
So probably the sashiko1 had a point as well. Anyways i dont have time
- will respond by am tommorow.

cheers,
jamal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-20 20:00       ` Jamal Hadi Salim
@ 2026-05-21  9:53         ` Jamal Hadi Salim
  2026-05-21 10:15           ` Jamal Hadi Salim
  0 siblings, 1 reply; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-21  9:53 UTC (permalink / raw)
  To: Rajat Gupta
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
	keenanat2000, 2045gemini, rollkingzzc

::

On Wed, May 20, 2026 at 4:00 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Wed, May 20, 2026 at 5:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
> > <rajat.gupta@oss.qualcomm.com> wrote:
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> >
> > Sashiko makes the claim that:
> > "
> > ....
> > .....
> > If the payload part extending past offset 0 resides in a shared
> > fragment for a cloned skb, wouldn't skb_store_bits() write the
> > positive offset  portion directly into the shared page, causing a data
> > corruption regression?
> > "
> >
> > It missed the earlier code (before the quoted code is hit) which is
> > very well documented!
> > /*
> >  * If the skb has shared frags the user is likely using zero-copy
> >  * (e.g. sendfile).  Those page frags may point to page-cache pages;
> >  * writing into them would silently corrupt the page cache.
> >  * Linearize so pedit operates on a private copy.
> >  * TL;DR: if you want zero-copy, don't use pedit.
> >  */
> >  if (skb_has_shared_frag(skb)) {
> >         if (__skb_linearize(skb))
> >              goto bad;
> >  }
> >
> > TLDR:
> > If the  skb  has shared frags, it is linearized upfront using
> > __skb_linearize() which means any risk of modifying a shared fragment
> > is non-existent.
>
> It seems there's a second sashiko ive been made aware of - these
> things are just multiplying ;->
> https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
>
> Both points it makes are valid. Dont have time right now.
> And infact looking at the second point sashiko 2 made I realize I was
> only paying attention to externally shared fragments which the poc
> used.
> So probably the sashiko1 had a point as well. Anyways i dont have time
> - will respond by am tommorow.
>

Sashiko2, claim 1 (rephrased for brevity):
"With the pre-loop skb_ensure_writable() removed, tcfp_off_max_hint
and related use is no longer needed.

I was going to say "separate patch" but since the cause is this patch
that removed the use, at the expense of a much bigger patch we should
remove it (delete it from include/net/tc_act/tc_pedit.h::struct
tcf_pedit_parms and net/sched/act_pedit.c - as described in sashiko2
claim.

Sashiko2, claim 2 (summarized):
"..skb_store_bits() will memcpy() the tail bytes of that 4-byte write
onto frag pages that are still referenced by the original clone, so a
clone held by a tap, AF_PACKET listener, or mirroring path would see
the modified bytes on its shared pages."

This is the same claim as sashiko1 but sashiko2 gave a much more
convincing description ;->
skb_has_shared_frag() is only true if the frags are flagged as
SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
frags from eg a driver on ingress and that skb gets cloned with frags
we won't catch it.
One approach is to do an if (skb_has_any_shared_frags(skb)) and then
do a skb_linearize_cow() but that sounds like overkill.
Another which will make the patch even uglier (but less expensive) is
to add an extra check insde the patch's "if (write_offset < 0)"
to do:  if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}

cheers,
jamal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-21  9:53         ` Jamal Hadi Salim
@ 2026-05-21 10:15           ` Jamal Hadi Salim
  2026-05-21 14:35             ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-21 10:15 UTC (permalink / raw)
  To: Rajat Gupta
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
	keenanat2000, 2045gemini, rollkingzzc

[-- Attachment #1: Type: text/plain, Size: 4508 bytes --]

On Thu, May 21, 2026 at 5:53 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> ::
>
> On Wed, May 20, 2026 at 4:00 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Wed, May 20, 2026 at 5:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > >
> > > On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
> > > <rajat.gupta@oss.qualcomm.com> wrote:
> > > >
> > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > not account for the runtime header offset added by typed keys. This
> > > > can leave part of the write region un-COW'd.
> > > >
> > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > the actual write offset is known, and add overflow checking on the
> > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > offset_valid() against INT_MIN, where negation is undefined.
> > > >
> > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > silent data corruption when pedit operates on zero-copy pages
> > > > (e.g. from sendfile).
> > > >
> > >
> > > Sashiko makes the claim that:
> > > "
> > > ....
> > > .....
> > > If the payload part extending past offset 0 resides in a shared
> > > fragment for a cloned skb, wouldn't skb_store_bits() write the
> > > positive offset  portion directly into the shared page, causing a data
> > > corruption regression?
> > > "
> > >
> > > It missed the earlier code (before the quoted code is hit) which is
> > > very well documented!
> > > /*
> > >  * If the skb has shared frags the user is likely using zero-copy
> > >  * (e.g. sendfile).  Those page frags may point to page-cache pages;
> > >  * writing into them would silently corrupt the page cache.
> > >  * Linearize so pedit operates on a private copy.
> > >  * TL;DR: if you want zero-copy, don't use pedit.
> > >  */
> > >  if (skb_has_shared_frag(skb)) {
> > >         if (__skb_linearize(skb))
> > >              goto bad;
> > >  }
> > >
> > > TLDR:
> > > If the  skb  has shared frags, it is linearized upfront using
> > > __skb_linearize() which means any risk of modifying a shared fragment
> > > is non-existent.
> >
> > It seems there's a second sashiko ive been made aware of - these
> > things are just multiplying ;->
> > https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
> >
> > Both points it makes are valid. Dont have time right now.
> > And infact looking at the second point sashiko 2 made I realize I was
> > only paying attention to externally shared fragments which the poc
> > used.
> > So probably the sashiko1 had a point as well. Anyways i dont have time
> > - will respond by am tommorow.
> >
>
> Sashiko2, claim 1 (rephrased for brevity):
> "With the pre-loop skb_ensure_writable() removed, tcfp_off_max_hint
> and related use is no longer needed.
>
> I was going to say "separate patch" but since the cause is this patch
> that removed the use, at the expense of a much bigger patch we should
> remove it (delete it from include/net/tc_act/tc_pedit.h::struct
> tcf_pedit_parms and net/sched/act_pedit.c - as described in sashiko2
> claim.
>
> Sashiko2, claim 2 (summarized):
> "..skb_store_bits() will memcpy() the tail bytes of that 4-byte write
> onto frag pages that are still referenced by the original clone, so a
> clone held by a tap, AF_PACKET listener, or mirroring path would see
> the modified bytes on its shared pages."
>
> This is the same claim as sashiko1 but sashiko2 gave a much more
> convincing description ;->
> skb_has_shared_frag() is only true if the frags are flagged as
> SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> frags from eg a driver on ingress and that skb gets cloned with frags
> we won't catch it.
> One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> do a skb_linearize_cow() but that sounds like overkill.

Yeah, this would be overkill - imagine running tcpdump 100% will be cloned

> Another which will make the patch even uglier (but less expensive) is
> to add an extra check insde the patch's "if (write_offset < 0)"
> to do:  if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
>

To be precise, something like attached (untested, uncompiled)

cheers,
jamal

[-- Attachment #2: patchlet-rajat --]
[-- Type: application/octet-stream, Size: 585 bytes --]

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 79921b8d89ba..8f0f84b50c85 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 		if (write_offset < 0) {
 			if (skb_cow(skb, -write_offset))
 				goto bad;
+			if (write_offset + (int)sizeof(hdata) > 0) {
+				if (skb_ensure_writable(skb,
+						min_t(int, skb->len,
+						      write_offset + (int)sizeof(hdata))))
+					goto bad;
+			}
 		} else {
 			if (unlikely(check_add_overflow(write_offset,
 							(int)sizeof(hdata),

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-21 10:15           ` Jamal Hadi Salim
@ 2026-05-21 14:35             ` Jakub Kicinski
  2026-05-21 15:16               ` Jamal Hadi Salim
       [not found]               ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Jakub Kicinski @ 2026-05-21 14:35 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
	yimingqian591, keenanat2000, 2045gemini, rollkingzzc

On Thu, 21 May 2026 06:15:17 -0400 Jamal Hadi Salim wrote:
> > This is the same claim as sashiko1 but sashiko2 gave a much more
> > convincing description ;->
> > skb_has_shared_frag() is only true if the frags are flagged as
> > SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> > frags from eg a driver on ingress and that skb gets cloned with frags
> > we won't catch it.
> > One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> > do a skb_linearize_cow() but that sounds like overkill.  
> 
> Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
> 
> > Another which will make the patch even uglier (but less expensive) is
> > to add an extra check insde the patch's "if (write_offset < 0)"
> > to do:  if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
> >  
> 
> To be precise, something like attached (untested, uncompiled)

Can we not pull the headers? Do you know of anyone modifying payloads
with pedit?

The concept of "shared frags" is silly IMHO, if I'm checking right only
rxrpc and xfrm think that it's a thing. I'm hoping to delete that and
reclaim the flag id in net-next...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-21 14:35             ` Jakub Kicinski
@ 2026-05-21 15:16               ` Jamal Hadi Salim
  2026-05-21 15:46                 ` Jakub Kicinski
       [not found]               ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
  1 sibling, 1 reply; 18+ messages in thread
From: Jamal Hadi Salim @ 2026-05-21 15:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
	yimingqian591, keenanat2000, 2045gemini, rollkingzzc

On Thu, May 21, 2026 at 10:35 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 21 May 2026 06:15:17 -0400 Jamal Hadi Salim wrote:
> > > This is the same claim as sashiko1 but sashiko2 gave a much more
> > > convincing description ;->
> > > skb_has_shared_frag() is only true if the frags are flagged as
> > > SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> > > frags from eg a driver on ingress and that skb gets cloned with frags
> > > we won't catch it.
> > > One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> > > do a skb_linearize_cow() but that sounds like overkill.
> >
> > Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
> >
> > > Another which will make the patch even uglier (but less expensive) is
> > > to add an extra check insde the patch's "if (write_offset < 0)"
> > > to do:  if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
> > >
> >
> > To be precise, something like attached (untested, uncompiled)
>
> Can we not pull the headers? Do you know of anyone modifying payloads
> with pedit?
>

not sure - but it is not unreasonable if someone used it for such a case.

> The concept of "shared frags" is silly IMHO, if I'm checking right only
> rxrpc and xfrm think that it's a thing. I'm hoping to delete that and
> reclaim the flag id in net-next...

All these issues stem from the shared frags point, but the patchlet i
showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
cloned+frags.
If you remove it in net-next - does that necessitate removing all
"fixes" that tried to address that issue? It seems like that would be
sensible, but it makes me wonder why the shared frags concept exists
in the first place.

cheers,
jamal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-21 15:16               ` Jamal Hadi Salim
@ 2026-05-21 15:46                 ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2026-05-21 15:46 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
	yimingqian591, keenanat2000, 2045gemini, rollkingzzc

On Thu, 21 May 2026 11:16:35 -0400 Jamal Hadi Salim wrote:
> On Thu, May 21, 2026 at 10:35 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > To be precise, something like attached (untested, uncompiled)  
> >
> > Can we not pull the headers? Do you know of anyone modifying payloads
> > with pedit?
> 
> not sure - but it is not unreasonable if someone used it for such a case.

Ack, we'd still support modifying any offset in the packet.
But not optimize for modifying frags until we know user exists.

> > The concept of "shared frags" is silly IMHO, if I'm checking right only
> > rxrpc and xfrm think that it's a thing. I'm hoping to delete that and
> > reclaim the flag id in net-next...  
> 
> All these issues stem from the shared frags point, but the patchlet i
> showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
> cloned+frags.

Right, right. Just pointing it out cause Rajat's patch adds another
skb_has_shared_frag(). Which we'll then have to delete in net-next :\

> If you remove it in net-next - does that necessitate removing all
> "fixes" that tried to address that issue? It seems like that would be
> sensible, but it makes me wonder why the shared frags concept exists
> in the first place.

IDK. We have in the tree plenty of "good ideas that went nowhere"..
As I said elsewhere IMO having the maintain this flag correctly
100% of the time will only create more CVEs, and there's no practical
reason to optimize for frags not being shared.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
       [not found]               ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
@ 2026-05-21 15:56                 ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2026-05-21 15:56 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Jamal Hadi Salim, Rajat Gupta, netdev, davem, edumazet, pabeni,
	horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc

On Thu, 21 May 2026 17:51:25 +0200 Davide Caratti wrote:
> >  Do you know of anyone modifying payloads with pedit?  
> 
> yes: MPTCP kselftest does that on purpose, to mimic a middlebox and trigger
> DSS checksum failures [1]

As I said in another reply (looks like the ML is having a significant
lag this morning :() - we would still support it, the shared frag
tracking is just an optimization to avoid a copy. And maintaining that
optimization for an occasional CI test is not the right eng tradeoff.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2026-05-21 15:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18  1:30 [PATCH net] net/sched: fix pedit partial COW leading to page cache Rajat Gupta
2026-05-18 13:10 ` Han Guidong
2026-05-18 13:31 ` Jamal Hadi Salim
2026-05-19  3:39   ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
2026-05-19 11:18     ` Toke Høiland-Jørgensen
2026-05-19 15:10     ` Han Guidong
2026-05-20  9:12       ` Jamal Hadi Salim
2026-05-20 10:04         ` Han Guidong
2026-05-20 10:36         ` Han Guidong
2026-05-20 11:40           ` Jamal Hadi Salim
2026-05-20  9:23     ` Jamal Hadi Salim
2026-05-20 20:00       ` Jamal Hadi Salim
2026-05-21  9:53         ` Jamal Hadi Salim
2026-05-21 10:15           ` Jamal Hadi Salim
2026-05-21 14:35             ` Jakub Kicinski
2026-05-21 15:16               ` Jamal Hadi Salim
2026-05-21 15:46                 ` Jakub Kicinski
     [not found]               ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
2026-05-21 15:56                 ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox