Netdev List
* [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
@ 2026-05-26 15:59 Jamal Hadi Salim
  2026-05-26 19:22 ` Toke Høiland-Jørgensen
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jamal Hadi Salim @ 2026-05-26 15:59 UTC
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jiri, victor,
	david.laight.linux, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, toke, dcaratti, security, linux-kernel, Rajat Gupta,
	Jamal Hadi Salim

From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>

tcf_pedit_act() computes the COW range for skb_ensure_writable()
once before the key loop using tcfp_off_max_hint, but the hint does
not account for the runtime header offset added by typed keys. This
can leave part of the write region un-COW'd.

Fix by moving skb_ensure_writable() inside the per-key loop where
the actual write offset is known, and add overflow checking on the
offset arithmetic. For negative offsets (e.g. Ethernet header edits
at ingress), use skb_cow() to COW the headroom instead. Guard
offset_valid() against INT_MIN, where negation is undefined.

Additionally, linearize skbs with shared frags upfront to prevent
silent data corruption when pedit operates on zero-copy pages
(e.g. from sendfile).

Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Reported-by: Han Guidong <2045gemini@gmail.com>
Reported-by: Zhang Cen <rollkingzzc@gmail.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
---
Changes v1->v2:
 1. Do better boundary analysis to cover cloned skbs with frags. Pointed
    out by sashiko-nipa:
    https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
 2. As a result of fix #1, remove the skb_has_shared_frag() check, which is
    now unnecessary. Also, Jakub has plans under which shared frags will no
    longer be a "thing".
 3. Make small adjustments everywhere for integer checks, suggested by D. Laight
 4. Remove all reviewers and testers since this is a large enough change.
    Please retest and re-review.
 5. Remove Rajat as reporter since he is the author (which implies he is a reporter)
---
 net/sched/act_pedit.c | 49 ++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index bc20f08a2789..719bee335e1f 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -16,6 +16,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/slab.h>
+#include <linux/overflow.h>
 #include <net/ipv6.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
@@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
 	if (offset > 0 && offset > skb->len)
 		return false;
 
-	if  (offset < 0 && -offset > skb_headroom(skb))
+	if (offset < 0 && offset < -(int)skb_headroom(skb))
 		return false;
 
 	return true;
@@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 	struct tcf_pedit_key_ex *tkey_ex;
 	struct tcf_pedit_parms *parms;
 	struct tc_pedit_key *tkey;
-	u32 max_offset;
 	int i;
 
 	parms = rcu_dereference_bh(p->parms);
 
-	max_offset = (skb_transport_header_was_set(skb) ?
-		      skb_transport_offset(skb) :
-		      skb_network_offset(skb)) +
-		     parms->tcfp_off_max_hint;
-	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
-		goto done;
-
 	tcf_lastuse_update(&p->tcf_tm);
 	tcf_action_update_bstats(&p->common, skb);
 
@@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 	tkey_ex = parms->tcfp_keys_ex;
 
 	for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
+		int write_offset, write_len;
 		int offset = tkey->off;
 		int hoffset = 0;
-		u32 *ptr, hdata;
+		u32 *ptr;
 		u32 val;
 		int rc;
 
@@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 			}
 		}
 
-		if (!offset_valid(skb, hoffset + offset)) {
-			pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
+		if (unlikely(check_add_overflow(hoffset, offset,
+						&write_offset))) {
+			pr_info_ratelimited("tc action pedit offset overflow\n");
 			goto bad;
 		}
 
-		ptr = skb_header_pointer(skb, hoffset + offset,
-					 sizeof(hdata), &hdata);
-		if (!ptr)
+		if (!offset_valid(skb, write_offset)) {
+			pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
+					    write_offset);
 			goto bad;
+		}
+
+		if (write_offset < 0) {
+			if (skb_cow(skb, -write_offset))
+				goto bad;
+			if (write_offset + (int)sizeof(*ptr) > 0) {
+				if (skb_ensure_writable(skb,
+							min(skb->len,
+							    write_offset + sizeof(*ptr))))
+					goto bad;
+			}
+		} else {
+			if (unlikely(check_add_overflow(write_offset,
+							(int)sizeof(*ptr),
+							&write_len)))
+				goto bad;
+			if (skb_ensure_writable(skb, min_t(int, skb->len,
+							   write_len)))
+				goto bad;
+		}
+
+		ptr = (u32 *)(skb->data + write_offset);
 		/* just do it, baby */
 		switch (cmd) {
 		case TCA_PEDIT_KEY_EX_CMD_SET:
@@ -474,8 +491,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
 		}
 
 		*ptr = ((*ptr & tkey->mask) ^ val);
-		if (ptr == &hdata)
-			skb_store_bits(skb, hoffset + offset, ptr, 4);
 	}
 
 	goto done;
-- 
2.34.1
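
For readers following the offset arithmetic, a minimal userspace sketch of
the per-key computation is below. It is not the kernel code: struct fake_skb
and pedit_write_offset() are hypothetical stand-ins invented for
illustration, and the GCC/Clang builtin __builtin_add_overflow() is used
here in place of the kernel's check_add_overflow(). The bounds check mirrors
the comparisons in the patched offset_valid(), which never negates offset.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two skb properties the check needs. */
struct fake_skb {
	unsigned int len;      /* plays the role of skb->len */
	unsigned int headroom; /* plays the role of skb_headroom(skb) */
};

/* Same comparisons as the patched offset_valid(): offset is never negated. */
static bool offset_valid(const struct fake_skb *skb, int offset)
{
	if (offset > 0 && (unsigned int)offset > skb->len)
		return false;
	if (offset < 0 && offset < -(int)skb->headroom)
		return false;
	return true;
}

/*
 * Compute hoffset + offset as the per-key loop does. Returns false on
 * signed overflow or when the sum falls outside [-headroom, len].
 */
static bool pedit_write_offset(const struct fake_skb *skb, int hoffset,
			       int offset, int *write_offset)
{
	/* __builtin_add_overflow() returns true when the sum does not fit. */
	if (__builtin_add_overflow(hoffset, offset, write_offset))
		return false;
	return offset_valid(skb, *write_offset);
}
```

With len = 100 and headroom = 16, an offset of INT_MIN is rejected by the
second comparison without any negation taking place, and INT_MAX + 1 is
rejected by the overflow check before the bounds check runs.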



* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-26 15:59 [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption Jamal Hadi Salim
@ 2026-05-26 19:22 ` Toke Høiland-Jørgensen
  2026-05-27  9:00   ` David Laight
  2026-05-27 14:56   ` Jamal Hadi Salim
  2026-05-26 21:29 ` Davide Caratti
  2026-05-27  2:36 ` Han Guidong
  2 siblings, 2 replies; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-05-26 19:22 UTC
  To: Jamal Hadi Salim, netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jiri, victor,
	david.laight.linux, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, dcaratti, security, linux-kernel, Rajat Gupta,
	Jamal Hadi Salim

Jamal Hadi Salim <jhs@mojatatu.com> writes:

> From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Victor Nogueira <victor@mojatatu.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>

Re-ran the tests, and everything looks good, so:

Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>

Also looked at the code, and I have a few nits below, but I'm really
nitpicking here, so whether you end up fixing those or not:

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>


[...]

> @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>  	if (offset > 0 && offset > skb->len)
>  		return false;
>  
> -	if  (offset < 0 && -offset > skb_headroom(skb))
> +	if (offset < 0 && offset < -(int)skb_headroom(skb))
>  		return false;

This change makes it really obvious that this is really just:

	if (offset < -(int)skb_headroom(skb))
		return false;

so, well, that would be clearer, IMO.

But then I guess the same could be said of the positive case, so:

static bool offset_valid(struct sk_buff *skb, int offset)
{
	if (offset > skb->len || offset < -(int)skb_headroom(skb))
		return false;

	return true;
}

> @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>  	struct tcf_pedit_key_ex *tkey_ex;
>  	struct tcf_pedit_parms *parms;
>  	struct tc_pedit_key *tkey;
> -	u32 max_offset;
>  	int i;
>  
>  	parms = rcu_dereference_bh(p->parms);
>  
> -	max_offset = (skb_transport_header_was_set(skb) ?
> -		      skb_transport_offset(skb) :
> -		      skb_network_offset(skb)) +
> -		     parms->tcfp_off_max_hint;
> -	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> -		goto done;
> -
>  	tcf_lastuse_update(&p->tcf_tm);
>  	tcf_action_update_bstats(&p->common, skb);
>  
> @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>  	tkey_ex = parms->tcfp_keys_ex;
>  
>  	for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> +		int write_offset, write_len;
>  		int offset = tkey->off;
>  		int hoffset = 0;
> -		u32 *ptr, hdata;
> +		u32 *ptr;
>  		u32 val;
>  		int rc;
>  
> @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>  			}
>  		}
>  
> -		if (!offset_valid(skb, hoffset + offset)) {
> -			pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> +		if (unlikely(check_add_overflow(hoffset, offset,

It's a bit weird that this has the unlikely(), but the offset_valid()
check doesn't?

> +						&write_offset))) {
> +			pr_info_ratelimited("tc action pedit offset overflow\n");
>  			goto bad;
>  		}
>  
> -		ptr = skb_header_pointer(skb, hoffset + offset,
> -					 sizeof(hdata), &hdata);
> -		if (!ptr)
> +		if (!offset_valid(skb, write_offset)) {
> +			pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> +					    write_offset);
>  			goto bad;
> +		}
> +
> +		if (write_offset < 0) {
> +			if (skb_cow(skb, -write_offset))
> +				goto bad;
> +			if (write_offset + (int)sizeof(*ptr) > 0) {
> +				if (skb_ensure_writable(skb,
> +							min(skb->len,
> +							    write_offset + sizeof(*ptr))))

Combining these with && instead of the double indentation would be more
readable IMO (shorter lines, aligning the 'goto bad' labels).

> +					goto bad;
> +			}
> +		} else {
> +			if (unlikely(check_add_overflow(write_offset,
> +							(int)sizeof(*ptr),
> +							&write_len)))

Same comment wrt unlikely()



* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-26 15:59 [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption Jamal Hadi Salim
  2026-05-26 19:22 ` Toke Høiland-Jørgensen
@ 2026-05-26 21:29 ` Davide Caratti
  2026-05-27  2:36 ` Han Guidong
  2 siblings, 0 replies; 12+ messages in thread
From: Davide Caratti @ 2026-05-26 21:29 UTC
  To: Jamal Hadi Salim
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor,
	david.laight.linux, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, toke, security, linux-kernel, Rajat Gupta

On Tue, May 26, 2026 at 11:59:13AM -0400, Jamal Hadi Salim wrote:
> From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> 
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
> 
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
> 
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
> 
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Victor Nogueira <victor@mojatatu.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
> Changes v1->v2:
>  1. Do better boundary analysis to cover cloned skbs with frags. Pointed
>     out by sashiko-nipa:
>     https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
>  2. As a result of fix #1, remove the skb_has_shared_frag() check, which is
>     now unnecessary. Also, Jakub has plans under which shared frags will no
>     longer be a "thing".
>  3. Make small adjustments everywhere for integer checks, suggested by D. Laight
>  4. Remove all reviewers and testers since this is a large enough change.
>     Please retest and re-review.
>  5. Remove Rajat as reporter since he is the author (which implies he is a reporter)

Re-ran the mp_join, pedit_l4port and pedit_ip kselftests on patch v2; no issues found.

Reviewed-and-tested-by: Davide Caratti <dcaratti@redhat.com>

Thanks!
-- 
davide



* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-26 15:59 [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption Jamal Hadi Salim
  2026-05-26 19:22 ` Toke Høiland-Jørgensen
  2026-05-26 21:29 ` Davide Caratti
@ 2026-05-27  2:36 ` Han Guidong
  2 siblings, 0 replies; 12+ messages in thread
From: Han Guidong @ 2026-05-27  2:36 UTC
  To: Jamal Hadi Salim
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor,
	david.laight.linux, yimingqian591, keenanat2000, rollkingzzc,
	toke, dcaratti, security, linux-kernel, Rajat Gupta

On Tue, May 26, 2026 at 11:59 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Victor Nogueira <victor@mojatatu.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>

Reviewed-and-tested-by: Han Guidong <2045gemini@gmail.com>

I tested this v2 both with the PoC and with the module-based corner
case we discussed earlier, and both look good to me.

Thanks for the careful and thorough rework here.

> ---
> Changes v1->v2:
>  1. Do better boundary analysis to cover cloned skbs with frags. Pointed
>     out by sashiko-nipa:
>     https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
>  2. As a result of fix #1, remove the skb_has_shared_frag() check, which is
>     now unnecessary. Also, Jakub has plans under which shared frags will no
>     longer be a "thing".
>  3. Make small adjustments everywhere for integer checks, suggested by D. Laight
>  4. Remove all reviewers and testers since this is a large enough change.
>     Please retest and re-review.
>  5. Remove Rajat as reporter since he is the author (which implies he is a reporter)
> ---
>  net/sched/act_pedit.c | 49 ++++++++++++++++++++++++++++---------------
>  1 file changed, 32 insertions(+), 17 deletions(-)
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2789..719bee335e1f 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
>  #include <linux/ip.h>
>  #include <linux/ipv6.h>
>  #include <linux/slab.h>
> +#include <linux/overflow.h>
>  #include <net/ipv6.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>         if (offset > 0 && offset > skb->len)
>                 return false;
>
> -       if  (offset < 0 && -offset > skb_headroom(skb))
> +       if (offset < 0 && offset < -(int)skb_headroom(skb))
>                 return false;
>
>         return true;
> @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         struct tcf_pedit_key_ex *tkey_ex;
>         struct tcf_pedit_parms *parms;
>         struct tc_pedit_key *tkey;
> -       u32 max_offset;
>         int i;
>
>         parms = rcu_dereference_bh(p->parms);
>
> -       max_offset = (skb_transport_header_was_set(skb) ?
> -                     skb_transport_offset(skb) :
> -                     skb_network_offset(skb)) +
> -                    parms->tcfp_off_max_hint;
> -       if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> -               goto done;
> -
>         tcf_lastuse_update(&p->tcf_tm);
>         tcf_action_update_bstats(&p->common, skb);
>
> @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>         tkey_ex = parms->tcfp_keys_ex;
>
>         for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> +               int write_offset, write_len;
>                 int offset = tkey->off;
>                 int hoffset = 0;
> -               u32 *ptr, hdata;
> +               u32 *ptr;
>                 u32 val;
>                 int rc;
>
> @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>                         }
>                 }
>
> -               if (!offset_valid(skb, hoffset + offset)) {
> -                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> +               if (unlikely(check_add_overflow(hoffset, offset,
> +                                               &write_offset))) {
> +                       pr_info_ratelimited("tc action pedit offset overflow\n");
>                         goto bad;
>                 }
>
> -               ptr = skb_header_pointer(skb, hoffset + offset,
> -                                        sizeof(hdata), &hdata);
> -               if (!ptr)
> +               if (!offset_valid(skb, write_offset)) {
> +                       pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> +                                           write_offset);
>                         goto bad;
> +               }
> +
> +               if (write_offset < 0) {
> +                       if (skb_cow(skb, -write_offset))
> +                               goto bad;
> +                       if (write_offset + (int)sizeof(*ptr) > 0) {
> +                               if (skb_ensure_writable(skb,
> +                                                       min(skb->len,
> +                                                           write_offset + sizeof(*ptr))))
> +                                       goto bad;
> +                       }
> +               } else {
> +                       if (unlikely(check_add_overflow(write_offset,
> +                                                       (int)sizeof(*ptr),
> +                                                       &write_len)))
> +                               goto bad;
> +                       if (skb_ensure_writable(skb, min_t(int, skb->len,
> +                                                          write_len)))
> +                               goto bad;
> +               }
> +
> +               ptr = (u32 *)(skb->data + write_offset);
>                 /* just do it, baby */
>                 switch (cmd) {
>                 case TCA_PEDIT_KEY_EX_CMD_SET:
> @@ -474,8 +491,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>                 }
>
>                 *ptr = ((*ptr & tkey->mask) ^ val);
> -               if (ptr == &hdata)
> -                       skb_store_bits(skb, hoffset + offset, ptr, 4);
>         }
>
>         goto done;
> --
> 2.34.1
>


* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-26 19:22 ` Toke Høiland-Jørgensen
@ 2026-05-27  9:00   ` David Laight
  2026-05-27 10:21     ` Toke Høiland-Jørgensen
  2026-05-27 14:56   ` Jamal Hadi Salim
  1 sibling, 1 reply; 12+ messages in thread
From: David Laight @ 2026-05-27  9:00 UTC
  To: Toke Høiland-Jørgensen
  Cc: Jamal Hadi Salim, netdev, davem, edumazet, kuba, pabeni, horms,
	jiri, victor, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, dcaratti, security, linux-kernel, Rajat Gupta

On Tue, 26 May 2026 21:22:52 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Jamal Hadi Salim <jhs@mojatatu.com> writes:
> 
> > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> >
> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > once before the key loop using tcfp_off_max_hint, but the hint does
> > not account for the runtime header offset added by typed keys. This
> > can leave part of the write region un-COW'd.
> >
> > Fix by moving skb_ensure_writable() inside the per-key loop where
> > the actual write offset is known, and add overflow checking on the
> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > at ingress), use skb_cow() to COW the headroom instead. Guard
> > offset_valid() against INT_MIN, where negation is undefined.
> >
> > Additionally, linearize skbs with shared frags upfront to prevent
> > silent data corruption when pedit operates on zero-copy pages
> > (e.g. from sendfile).
> >
> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > Reported-by: Han Guidong <2045gemini@gmail.com>
> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > Tested-by: Victor Nogueira <victor@mojatatu.com>
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>  
> 
> Re-ran the tests, and everything looks good, so:
> 
> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> Also looked at the code, and I have a few nits below, but I'm really
> nitpicking here, so whether you end up fixing those or not:
> 
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> 
> [...]
> 
> > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> >  	if (offset > 0 && offset > skb->len)
> >  		return false;
> >  
> > -	if  (offset < 0 && -offset > skb_headroom(skb))
> > +	if (offset < 0 && offset < -(int)skb_headroom(skb))
> >  		return false;  
> 
> This change makes it really obvious that this is really just:
> 
> 	if (offset < -(int)skb_headroom(skb))
> 		return false;
> 
> so, well, that would be clearer, IMO.
> 
> But then I guess the same could be said of the positive case, so:
> 
> static bool offset_valid(struct sk_buff *skb, int offset)
> {
> 	if (offset > skb->len || offset < -(int)skb_headroom(skb))
> 		return false;
> 
> 	return true;
> }

There are all sorts of integer conversions going on.
IIRC both skb->len and skb_headroom() are 32bit unsigned.
skb_headroom() is relatively small, skb->len can be over 64k but nowhere
near INT_MAX.
offset is signed 32bit and the code is allowing for it being -INT_MAX
(but I'm not at all sure whether that can happen without overflow being likely).

So I think the single test:
	if (offset + skb_headroom(skb) >= skb->len + skb_headroom(skb))
		return false;
is correct.
If offset is 'too negative' the LHS will be 'very large positive' and
the test fails.
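
A minimal userspace sketch of that single-comparison check, for
illustration only. The function name offset_valid_single() and the plain
unsigned int parameters are assumptions standing in for skb->len and
skb_headroom(skb); the cast just spells out the conversion the compiler
already applies when an int is added to an unsigned int.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace version of the single unsigned comparison.
 * len and headroom stand in for skb->len and skb_headroom(skb).
 */
static bool offset_valid_single(unsigned int len, unsigned int headroom,
				int offset)
{
	/*
	 * Shifting by headroom maps the valid range [-headroom, len) onto
	 * [0, len + headroom). An offset below -headroom wraps around to a
	 * very large unsigned value and so fails the same comparison.
	 */
	if ((unsigned int)offset + headroom >= len + headroom)
		return false;
	return true;
}
```

One detail worth noting when comparing the variants: as written with >=,
this form rejects offset == len, whereas the two-comparison check in the
patch (offset > skb->len) accepts it.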

-- David

> 
> > @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >  	struct tcf_pedit_key_ex *tkey_ex;
> >  	struct tcf_pedit_parms *parms;
> >  	struct tc_pedit_key *tkey;
> > -	u32 max_offset;
> >  	int i;
> >  
> >  	parms = rcu_dereference_bh(p->parms);
> >  
> > -	max_offset = (skb_transport_header_was_set(skb) ?
> > -		      skb_transport_offset(skb) :
> > -		      skb_network_offset(skb)) +
> > -		     parms->tcfp_off_max_hint;
> > -	if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > -		goto done;
> > -
> >  	tcf_lastuse_update(&p->tcf_tm);
> >  	tcf_action_update_bstats(&p->common, skb);
> >  
> > @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >  	tkey_ex = parms->tcfp_keys_ex;
> >  
> >  	for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > +		int write_offset, write_len;
> >  		int offset = tkey->off;
> >  		int hoffset = 0;
> > -		u32 *ptr, hdata;
> > +		u32 *ptr;
> >  		u32 val;
> >  		int rc;
> >  
> > @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >  			}
> >  		}
> >  
> > -		if (!offset_valid(skb, hoffset + offset)) {
> > -			pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > +		if (unlikely(check_add_overflow(hoffset, offset,  
> 
> It's a bit weird that this has the unlikely(), but the offset_valid()
> check doesn't?
> 
> > +						&write_offset))) {
> > +			pr_info_ratelimited("tc action pedit offset overflow\n");
> >  			goto bad;
> >  		}
> >  
> > -		ptr = skb_header_pointer(skb, hoffset + offset,
> > -					 sizeof(hdata), &hdata);
> > -		if (!ptr)
> > +		if (!offset_valid(skb, write_offset)) {
> > +			pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > +					    write_offset);
> >  			goto bad;
> > +		}
> > +
> > +		if (write_offset < 0) {
> > +			if (skb_cow(skb, -write_offset))
> > +				goto bad;
> > +			if (write_offset + (int)sizeof(*ptr) > 0) {
> > +				if (skb_ensure_writable(skb,
> > +							min(skb->len,
> > +							    write_offset + sizeof(*ptr))))  
> 
> Combining these with && instead of the double indentation would be more
> readable IMO (shorter lines, aligning the 'goto bad' labels).
> 
> > +					goto bad;
> > +			}
> > +		} else {
> > +			if (unlikely(check_add_overflow(write_offset,
> > +							(int)sizeof(*ptr),
> > +							&write_len)))  
> 
> Same comment wrt unlikely()
> 



* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-27  9:00   ` David Laight
@ 2026-05-27 10:21     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-05-27 10:21 UTC
  To: David Laight
  Cc: Jamal Hadi Salim, netdev, davem, edumazet, kuba, pabeni, horms,
	jiri, victor, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, dcaratti, security, linux-kernel, Rajat Gupta

David Laight <david.laight.linux@gmail.com> writes:

> On Tue, 26 May 2026 21:22:52 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jamal Hadi Salim <jhs@mojatatu.com> writes:
>> 
>> > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
>> >
>> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
>> > once before the key loop using tcfp_off_max_hint, but the hint does
>> > not account for the runtime header offset added by typed keys. This
>> > can leave part of the write region un-COW'd.
>> >
>> > Fix by moving skb_ensure_writable() inside the per-key loop where
>> > the actual write offset is known, and add overflow checking on the
>> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
>> > at ingress), use skb_cow() to COW the headroom instead. Guard
>> > offset_valid() against INT_MIN, where negation is undefined.
>> >
>> > Additionally, linearize skbs with shared frags upfront to prevent
>> > silent data corruption when pedit operates on zero-copy pages
>> > (e.g. from sendfile).
>> >
>> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
>> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
>> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
>> > Reported-by: Han Guidong <2045gemini@gmail.com>
>> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
>> > Tested-by: Victor Nogueira <victor@mojatatu.com>
>> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>  
>> 
>> Re-ran the tests, and everything looks good, so:
>> 
>> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> Also looked at the code, and I have a few nits below, but I'm really
>> nitpicking here, so whether you end up fixing those or not:
>> 
>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> 
>> [...]
>> 
>> > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>> >  	if (offset > 0 && offset > skb->len)
>> >  		return false;
>> >  
>> > -	if  (offset < 0 && -offset > skb_headroom(skb))
>> > +	if (offset < 0 && offset < -(int)skb_headroom(skb))
>> >  		return false;  
>> 
>> This change makes it really obvious that this is really just:
>> 
>> 	if (offset < -(int)skb_headroom(skb))
>> 		return false;
>> 
>> so, well, that would be clearer, IMO.
>> 
>> But then I guess the same could be said of the positive case, so:
>> 
>> static bool offset_valid(struct sk_buff *skb, int offset)
>> {
>> 	if (offset > skb->len || offset < -(int)skb_headroom(skb))
>> 		return false;
>> 
>> 	return true;
>> }
>
> There are all sorts of integer conversions going on.
> IIRC both skb->len and skb_headroom() are 32bit unsigned.
> skb_headroom() is relatively small, skb->len can be over 64k but nowhere
> near INT_MAX.

Right, I had the implicit signed/unsigned conversions the wrong way
'round in my head. So what's missing above is a cast of skb->len to int,
i.e.:

 	if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))
 		return false;

right?
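
To make the conversion issue concrete, here is a small userspace sketch
comparing the two forms. The function names and the plain unsigned int
parameters are hypothetical stand-ins for skb->len and skb_headroom(skb),
assuming both are unsigned int as recalled above. In the uncast form the
explicit (unsigned int) cast only spells out what the usual arithmetic
conversions do implicitly when an int is compared with an unsigned int.

```c
#include <stdbool.h>

/*
 * Without the (int) cast on len: offset is converted to unsigned before
 * the first comparison, so every negative offset looks huge and is
 * rejected, even when it lies within the headroom.
 */
static bool offset_valid_nocast(unsigned int len, unsigned int headroom,
				int offset)
{
	if ((unsigned int)offset > len || offset < -(int)headroom)
		return false;
	return true;
}

/* With the cast: both comparisons are done in signed arithmetic. */
static bool offset_valid_cast(unsigned int len, unsigned int headroom,
			      int offset)
{
	if (offset > (int)len || offset < -(int)headroom)
		return false;
	return true;
}
```

With len = 100 and headroom = 16, an offset of -4 is rejected by the uncast
form and accepted by the cast form; both agree for non-negative offsets.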

> offset is signed 32bit and the code is allowing for it being -INT_MAX
> (but I'm not at all sure whether that can happen without overflow being likely).
>
> So I think the single test:
> 	if (offset + skb_headroom(skb) >= skb->len + skb_headroom(skb))
> 		return false;
> is correct.
> If offset is 'too negative' the LHS will be 'very large positive' and
> the test fails.

Yeah, I agree. FWIW, they turn out to be the same number of
instructions, the difference being that one contains a jump and the
other doesn't. It seems clang needs an unlikely() to put that jump in
the 'false' path, though:

https://godbolt.org/z/5o5Gxqe3W

OTOH, I think the two tests are more readable; the single-test version
would need a comment explaining your "too negative" rationale. But given
the subtleties of the implicit integer conversions, maybe that's better
anyway.

And they're both half the instructions of the version currently in the
patch, so if we're code golfing (which I guess we are at this point) I
guess we should pick one of those :)

-Toke



* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-26 19:22 ` Toke Høiland-Jørgensen
  2026-05-27  9:00   ` David Laight
@ 2026-05-27 14:56   ` Jamal Hadi Salim
  2026-05-27 15:12     ` Toke Høiland-Jørgensen
  2026-05-27 16:13     ` David Laight
  1 sibling, 2 replies; 12+ messages in thread
From: Jamal Hadi Salim @ 2026-05-27 14:56 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor,
	david.laight.linux, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, dcaratti, security, linux-kernel, Rajat Gupta


On Tue, May 26, 2026 at 3:22 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jamal Hadi Salim <jhs@mojatatu.com> writes:
>
> > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> >
> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > once before the key loop using tcfp_off_max_hint, but the hint does
> > not account for the runtime header offset added by typed keys. This
> > can leave part of the write region un-COW'd.
> >
> > Fix by moving skb_ensure_writable() inside the per-key loop where
> > the actual write offset is known, and add overflow checking on the
> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > at ingress), use skb_cow() to COW the headroom instead. Guard
> > offset_valid() against INT_MIN, where negation is undefined.
> >
> > Additionally, linearize skbs with shared frags upfront to prevent
> > silent data corruption when pedit operates on zero-copy pages
> > (e.g. from sendfile).
> >
> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > Reported-by: Han Guidong <2045gemini@gmail.com>
> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > Tested-by: Victor Nogueira <victor@mojatatu.com>
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
>
> Re-ran the tests, and everything looks good, so:
>
> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Also looked at the code, and I have a few nits below, but I'm really
> nitpicking here, so whether you end up fixing those or not:
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
>
> [...]
>
> > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> >       if (offset > 0 && offset > skb->len)
> >               return false;
> >
> > -     if  (offset < 0 && -offset > skb_headroom(skb))
> > +     if (offset < 0 && offset < -(int)skb_headroom(skb))
> >               return false;
>
> This change makes it really obvious that this is really just:
>
>         if (offset < -(int)skb_headroom(skb))
>                 return false;
>
> so, well, that would be clearer, IMO.
>
> But then I guess the same could be said of the positive case, so:
>
> static bool offset_valid(struct sk_buff *skb, int offset)
> {
>         if (offset > skb->len || offset < -(int)skb_headroom(skb))
>                 return false;
>
>         return true;
> }
>

Yes, that improves readability. If I understood the discussion between
you and David L., something like this one-liner would be reasonable?

if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))


> > @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >       struct tcf_pedit_key_ex *tkey_ex;
> >       struct tcf_pedit_parms *parms;
> >       struct tc_pedit_key *tkey;
> > -     u32 max_offset;
> >       int i;
> >
> >       parms = rcu_dereference_bh(p->parms);
> >
> > -     max_offset = (skb_transport_header_was_set(skb) ?
> > -                   skb_transport_offset(skb) :
> > -                   skb_network_offset(skb)) +
> > -                  parms->tcfp_off_max_hint;
> > -     if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > -             goto done;
> > -
> >       tcf_lastuse_update(&p->tcf_tm);
> >       tcf_action_update_bstats(&p->common, skb);
> >
> > @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >       tkey_ex = parms->tcfp_keys_ex;
> >
> >       for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > +             int write_offset, write_len;
> >               int offset = tkey->off;
> >               int hoffset = 0;
> > -             u32 *ptr, hdata;
> > +             u32 *ptr;
> >               u32 val;
> >               int rc;
> >
> > @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >                       }
> >               }
> >
> > -             if (!offset_valid(skb, hoffset + offset)) {
> > -                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > +             if (unlikely(check_add_overflow(hoffset, offset,
>
> It's a bit weird that this has the unlikely(), but the offset_valid()
> check doesn't?
>

I focused on the bigger solution and worried more about timeliness to
get this patch in - but it seems the distros had already picked up the
first posted patch, so hakuna matata (Still, I should have caught
these unlikelies ;->). Yes on the second one you pointed out.
Will remove them.

> > +                                             &write_offset))) {
> > +                     pr_info_ratelimited("tc action pedit offset overflow\n");
> >                       goto bad;
> >               }
> >
> > -             ptr = skb_header_pointer(skb, hoffset + offset,
> > -                                      sizeof(hdata), &hdata);
> > -             if (!ptr)
> > +             if (!offset_valid(skb, write_offset)) {
> > +                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > +                                         write_offset);
> >                       goto bad;
> > +             }
> > +
> > +             if (write_offset < 0) {
> > +                     if (skb_cow(skb, -write_offset))
> > +                             goto bad;
> > +                     if (write_offset + (int)sizeof(*ptr) > 0) {
> > +                             if (skb_ensure_writable(skb,
> > +                                                     min(skb->len,
> > +                                                         write_offset + sizeof(*ptr))))
>
> Combining these with && instead of the double indentation would be more
> readable IMO (shorter lines, aligning the 'goto bad' labels).

hrm. So:
if (write_offset < 0 && skb_cow(skb, -write_offset) &&
    write_offset + (int)sizeof(*ptr) > 0 &&
    skb_ensure_writable(skb, min(skb->len, write_offset + sizeof(*ptr)))) {
            goto bad;
} else {
  ...
}

Not sure which is more readable.

cheers,
jamal
>
> > +                                     goto bad;
> > +                     }
> > +             } else {
> > +                     if (unlikely(check_add_overflow(write_offset,
> > +                                                     (int)sizeof(*ptr),
> > +                                                     &write_len)))
>
> Same comment wrt unlikely()
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-27 14:56   ` Jamal Hadi Salim
@ 2026-05-27 15:12     ` Toke Høiland-Jørgensen
  2026-05-27 16:44       ` Jamal Hadi Salim
  2026-05-27 16:13     ` David Laight
  1 sibling, 1 reply; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-05-27 15:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor,
	david.laight.linux, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, dcaratti, security, linux-kernel, Rajat Gupta

Jamal Hadi Salim <jhs@mojatatu.com> writes:

> On Tue, May 26, 2026 at 3:22 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Jamal Hadi Salim <jhs@mojatatu.com> writes:
>>
>> > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
>> >
>> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
>> > once before the key loop using tcfp_off_max_hint, but the hint does
>> > not account for the runtime header offset added by typed keys. This
>> > can leave part of the write region un-COW'd.
>> >
>> > Fix by moving skb_ensure_writable() inside the per-key loop where
>> > the actual write offset is known, and add overflow checking on the
>> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
>> > at ingress), use skb_cow() to COW the headroom instead. Guard
>> > offset_valid() against INT_MIN, where negation is undefined.
>> >
>> > Additionally, linearize skbs with shared frags upfront to prevent
>> > silent data corruption when pedit operates on zero-copy pages
>> > (e.g. from sendfile).
>> >
>> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
>> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
>> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
>> > Reported-by: Han Guidong <2045gemini@gmail.com>
>> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
>> > Tested-by: Victor Nogueira <victor@mojatatu.com>
>> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
>>
>> Re-ran the tests, and everything looks good, so:
>>
>> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Also looked at the code, and I have a few nits below, but I'm really
>> nitpicking here, so whether you end up fixing those or not:
>>
>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>>
>> [...]
>>
>> > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
>> >       if (offset > 0 && offset > skb->len)
>> >               return false;
>> >
>> > -     if  (offset < 0 && -offset > skb_headroom(skb))
>> > +     if (offset < 0 && offset < -(int)skb_headroom(skb))
>> >               return false;
>>
>> This change makes it really obvious that this is really just:
>>
>>         if (offset < -(int)skb_headroom(skb))
>>                 return false;
>>
>> so, well, that would be clearer, IMO.
>>
>> But then I guess the same could be said of the positive case, so:
>>
>> static bool offset_valid(struct sk_buff *skb, int offset)
>> {
>>         if (offset > skb->len || offset < -(int)skb_headroom(skb))
>>                 return false;
>>
>>         return true;
>> }
>>
>
> Yes, that improves readability. If i understood the discussion between
> you and David L. something like this one liner would be reasonable?
>
> if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))

Yes, I believe that one is correct wrt the integer conversion rules.

>> > @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>> >       struct tcf_pedit_key_ex *tkey_ex;
>> >       struct tcf_pedit_parms *parms;
>> >       struct tc_pedit_key *tkey;
>> > -     u32 max_offset;
>> >       int i;
>> >
>> >       parms = rcu_dereference_bh(p->parms);
>> >
>> > -     max_offset = (skb_transport_header_was_set(skb) ?
>> > -                   skb_transport_offset(skb) :
>> > -                   skb_network_offset(skb)) +
>> > -                  parms->tcfp_off_max_hint;
>> > -     if (skb_ensure_writable(skb, min(skb->len, max_offset)))
>> > -             goto done;
>> > -
>> >       tcf_lastuse_update(&p->tcf_tm);
>> >       tcf_action_update_bstats(&p->common, skb);
>> >
>> > @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>> >       tkey_ex = parms->tcfp_keys_ex;
>> >
>> >       for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
>> > +             int write_offset, write_len;
>> >               int offset = tkey->off;
>> >               int hoffset = 0;
>> > -             u32 *ptr, hdata;
>> > +             u32 *ptr;
>> >               u32 val;
>> >               int rc;
>> >
>> > @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>> >                       }
>> >               }
>> >
>> > -             if (!offset_valid(skb, hoffset + offset)) {
>> > -                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
>> > +             if (unlikely(check_add_overflow(hoffset, offset,
>>
>> It's a bit weird that this has the unlikely(), but the offset_valid()
>> check doesn't?
>>
>
> I focused on the bigger solution and worried more about timeliness to
> get this patch in - but it seems the distros had already picked up the
> first posted patch, so hakuna matata (Still, I should have caught
> these unlikelies ;->). Yes on the second one you pointed out.
> Will remove them.

Cool.

>> > +                                             &write_offset))) {
>> > +                     pr_info_ratelimited("tc action pedit offset overflow\n");
>> >                       goto bad;
>> >               }
>> >
>> > -             ptr = skb_header_pointer(skb, hoffset + offset,
>> > -                                      sizeof(hdata), &hdata);
>> > -             if (!ptr)
>> > +             if (!offset_valid(skb, write_offset)) {
>> > +                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
>> > +                                         write_offset);
>> >                       goto bad;
>> > +             }
>> > +
>> > +             if (write_offset < 0) {
>> > +                     if (skb_cow(skb, -write_offset))
>> > +                             goto bad;
>> > +                     if (write_offset + (int)sizeof(*ptr) > 0) {
>> > +                             if (skb_ensure_writable(skb,
>> > +                                                     min(skb->len,
>> > +                                                         write_offset + sizeof(*ptr))))
>>
>> Combining these with && instead of the double indentation would be more
>> readable IMO (shorter lines, aligning the 'goto bad' labels).
>
> hrm. So:
> if ((write_offset < 0 && ((skb_cow(skb, -write_offset)) &&
> (write_offset + (int)sizeof(*ptr) > 0) &&
> (skb_ensure_writable(skb,min(skb->len,write_offset + sizeof(*ptr)))) {
>             goto bad;
> else {
>   ...
> }
>
> Not sure which is more readable.

No, I meant just collapsing the two inner levels - this, on top of your
patch:

diff --git i/net/sched/act_pedit.c w/net/sched/act_pedit.c
index 719bee335e1f..eb61d73730c4 100644
--- i/net/sched/act_pedit.c
+++ w/net/sched/act_pedit.c
@@ -460,12 +460,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
                if (write_offset < 0) {
                        if (skb_cow(skb, -write_offset))
                                goto bad;
-                       if (write_offset + (int)sizeof(*ptr) > 0) {
-                               if (skb_ensure_writable(skb,
-                                                       min(skb->len,
-                                                           write_offset + sizeof(*ptr))))
-                                       goto bad;
-                       }
+                       if (write_offset + (int)sizeof(*ptr) > 0 &&
+                           skb_ensure_writable(skb,
+                                               min(skb->len,
+                                                   write_offset + sizeof(*ptr))))
+                               goto bad;
                } else {
                        if (unlikely(check_add_overflow(write_offset,
                                                        (int)sizeof(*ptr),

(You could do the other thing you suggested, but as you point out that
quickly becomes a too dense soup of && :)
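
A small userspace sketch of why collapsing just the two inner levels
keeps the behaviour unchanged. struct stub and the function names are
hypothetical stand-ins for skb_cow() and skb_ensure_writable(), with
nonzero meaning failure; the call counter shows that short-circuit
evaluation of && skips the second call in the same cases as the nested
form:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: counts calls and returns a preset error. */
struct stub {
	int err;	/* nonzero models a failing call */
	int calls;	/* how many times the stub was invoked */
};

static int stub_call(struct stub *s)
{
	assert(s != NULL);
	s->calls++;
	return s->err;
}

/* Nested form; returns true where the kernel code would "goto bad". */
bool nested_form(int write_offset, struct stub *cow, struct stub *writable)
{
	if (stub_call(cow))
		return true;
	if (write_offset + (int)sizeof(uint32_t) > 0) {
		if (stub_call(writable))
			return true;
	}
	return false;
}

/* Collapsed form: && only evaluates the call when the write spills
 * past offset 0, exactly as the nested form does. */
bool collapsed_form(int write_offset, struct stub *cow, struct stub *writable)
{
	if (stub_call(cow))
		return true;
	if (write_offset + (int)sizeof(uint32_t) > 0 && stub_call(writable))
		return true;
	return false;
}
```

For write_offset = -8 the 4-byte write ends before offset 0, so neither
form calls the second stub; for write_offset = -2 both do.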

-Toke


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-27 14:56   ` Jamal Hadi Salim
  2026-05-27 15:12     ` Toke Høiland-Jørgensen
@ 2026-05-27 16:13     ` David Laight
  2026-05-27 16:48       ` Jamal Hadi Salim
  1 sibling, 1 reply; 12+ messages in thread
From: David Laight @ 2026-05-27 16:13 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Toke Høiland-Jørgensen, netdev, davem, edumazet, kuba,
	pabeni, horms, jiri, victor, yimingqian591, keenanat2000,
	2045gemini, rollkingzzc, dcaratti, security, linux-kernel,
	Rajat Gupta

On Wed, 27 May 2026 10:56:57 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On Tue, May 26, 2026 at 3:22 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Jamal Hadi Salim <jhs@mojatatu.com> writes:
> >  
> > > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > Tested-by: Victor Nogueira <victor@mojatatu.com>
> > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>  
> >
> > Re-ran the tests, and everything looks good, so:
> >
> > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >
> > Also looked at the code, and I have a few nits below, but I'm really
> > nitpicking here, so whether you end up fixing those or not:
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >
> >
> > [...]
> >  
> > > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > >       if (offset > 0 && offset > skb->len)
> > >               return false;
> > >
> > > -     if  (offset < 0 && -offset > skb_headroom(skb))
> > > +     if (offset < 0 && offset < -(int)skb_headroom(skb))
> > >               return false;  
> >
> > This change makes it really obvious that this is really just:
> >
> >         if (offset < -(int)skb_headroom(skb))
> >                 return false;
> >
> > so, well, that would be clearer, IMO.
> >
> > But then I guess the same could be said of the positive case, so:
> >
> > static bool offset_valid(struct sk_buff *skb, int offset)
> > {
> >         if (offset > skb->len || offset < -(int)skb_headroom(skb))
> >                 return false;
> >
> >         return true;
> > }
> >  
> 
> Yes, that improves readability. If i understood the discussion between
> you and David L. something like this one liner would be reasonable?
> 
> if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))

I doubt it will be measurable whatever you pick.

Thinks, can 'we' get away with the much more readable:
	offset + (int)skb_headroom(skb) < 0
No need to worry about the '+' wrapping: for large positive offsets the
first test will already have failed, and large -ve offsets are going to
stay negative.
So it does look all right.
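
A userspace sketch of that variant, under the assumption that skb->len
fits in an int. struct fake_skb and offset_valid_sum() are made-up
names. The signed addition is only reached when offset <= len, so it
cannot overflow upwards, and adding a non-negative headroom to a
negative offset cannot overflow downwards:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two skb properties involved. */
struct fake_skb {
	unsigned int len;	/* models skb->len */
	unsigned int headroom;	/* models skb_headroom(skb) */
};

bool offset_valid_sum(const struct fake_skb *skb, int offset)
{
	assert(skb->len <= INT_MAX);	/* assumption: len fits in an int */

	/* || short-circuits: the addition is skipped for offset > len. */
	if (offset > (int)skb->len || offset + (int)skb->headroom < 0)
		return false;
	return true;
}
```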

> 
> 
> > > @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >       struct tcf_pedit_key_ex *tkey_ex;
> > >       struct tcf_pedit_parms *parms;
> > >       struct tc_pedit_key *tkey;
> > > -     u32 max_offset;
> > >       int i;
> > >
> > >       parms = rcu_dereference_bh(p->parms);
> > >
> > > -     max_offset = (skb_transport_header_was_set(skb) ?
> > > -                   skb_transport_offset(skb) :
> > > -                   skb_network_offset(skb)) +
> > > -                  parms->tcfp_off_max_hint;
> > > -     if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > > -             goto done;
> > > -
> > >       tcf_lastuse_update(&p->tcf_tm);
> > >       tcf_action_update_bstats(&p->common, skb);
> > >
> > > @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >       tkey_ex = parms->tcfp_keys_ex;
> > >
> > >       for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > > +             int write_offset, write_len;
> > >               int offset = tkey->off;
> > >               int hoffset = 0;
> > > -             u32 *ptr, hdata;
> > > +             u32 *ptr;
> > >               u32 val;
> > >               int rc;
> > >
> > > @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >                       }
> > >               }
> > >
> > > -             if (!offset_valid(skb, hoffset + offset)) {
> > > -                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > > +             if (unlikely(check_add_overflow(hoffset, offset,  
> >
> > It's a bit weird that this has the unlikely(), but the offset_valid()
> > check doesn't?
> >  
> 
> I focused on the bigger solution and worried more about timeliness to
> get this patch in - but it seems the distros had already picked up the
> first posted patch, so hakuna matata (Still, I should have caught
> these unlikelies ;->). Yes on the second one you pointed out.
> Will remove them.
> 
> > > +                                             &write_offset))) {
> > > +                     pr_info_ratelimited("tc action pedit offset overflow\n");
> > >                       goto bad;
> > >               }
> > >
> > > -             ptr = skb_header_pointer(skb, hoffset + offset,
> > > -                                      sizeof(hdata), &hdata);
> > > -             if (!ptr)
> > > +             if (!offset_valid(skb, write_offset)) {
> > > +                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > > +                                         write_offset);
> > >                       goto bad;
> > > +             }
> > > +
> > > +             if (write_offset < 0) {
> > > +                     if (skb_cow(skb, -write_offset))
> > > +                             goto bad;
> > > +                     if (write_offset + (int)sizeof(*ptr) > 0) {
> > > +                             if (skb_ensure_writable(skb,
> > > +                                                     min(skb->len,
> > > +                                                         write_offset + sizeof(*ptr))))  
> >
> > Combining these with && instead of the double indentation would be more
> > readable IMO (shorter lines, aligning the 'goto bad' labels).  
> 
> hrm. So:
> if ((write_offset < 0 && ((skb_cow(skb, -write_offset)) &&
> (write_offset + (int)sizeof(*ptr) > 0) &&
> (skb_ensure_writable(skb,min(skb->len,write_offset + sizeof(*ptr)))) {
>             goto bad;
> else {
>   ...
> }
> 
> Not sure which is more readable.

I think he meant just the last two 'if'.

-- David

> 
> cheers,
> jamal
> >  
> > > +                                     goto bad;
> > > +                     }
> > > +             } else {
> > > +                     if (unlikely(check_add_overflow(write_offset,
> > > +                                                     (int)sizeof(*ptr),
> > > +                                                     &write_len)))  
> >
> > Same comment wrt unlikely()
> >  


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-27 15:12     ` Toke Høiland-Jørgensen
@ 2026-05-27 16:44       ` Jamal Hadi Salim
  0 siblings, 0 replies; 12+ messages in thread
From: Jamal Hadi Salim @ 2026-05-27 16:44 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor,
	david.laight.linux, yimingqian591, keenanat2000, 2045gemini,
	rollkingzzc, dcaratti, security, linux-kernel, Rajat Gupta

On Wed, May 27, 2026 at 11:12 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jamal Hadi Salim <jhs@mojatatu.com> writes:
>
> > On Tue, May 26, 2026 at 3:22 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Jamal Hadi Salim <jhs@mojatatu.com> writes:
> >>
> >> > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> >> >
> >> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> >> > once before the key loop using tcfp_off_max_hint, but the hint does
> >> > not account for the runtime header offset added by typed keys. This
> >> > can leave part of the write region un-COW'd.
> >> >
> >> > Fix by moving skb_ensure_writable() inside the per-key loop where
> >> > the actual write offset is known, and add overflow checking on the
> >> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> >> > at ingress), use skb_cow() to COW the headroom instead. Guard
> >> > offset_valid() against INT_MIN, where negation is undefined.
> >> >
> >> > Additionally, linearize skbs with shared frags upfront to prevent
> >> > silent data corruption when pedit operates on zero-copy pages
> >> > (e.g. from sendfile).
> >> >
> >> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> >> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> >> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> >> > Reported-by: Han Guidong <2045gemini@gmail.com>
> >> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> >> > Tested-by: Victor Nogueira <victor@mojatatu.com>
> >> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> >> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> >>
> >> Re-ran the tests, and everything looks good, so:
> >>
> >> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >>
> >> Also looked at the code, and I have a few nits below, but I'm really
> >> nitpicking here, so whether you end up fixing those or not:
> >>
> >> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >>
> >>
> >> [...]
> >>
> >> > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> >> >       if (offset > 0 && offset > skb->len)
> >> >               return false;
> >> >
> >> > -     if  (offset < 0 && -offset > skb_headroom(skb))
> >> > +     if (offset < 0 && offset < -(int)skb_headroom(skb))
> >> >               return false;
> >>
> >> This change makes it really obvious that this is really just:
> >>
> >>         if (offset < -(int)skb_headroom(skb))
> >>                 return false;
> >>
> >> so, well, that would be clearer, IMO.
> >>
> >> But then I guess the same could be said of the positive case, so:
> >>
> >> static bool offset_valid(struct sk_buff *skb, int offset)
> >> {
> >>         if (offset > skb->len || offset < -(int)skb_headroom(skb))
> >>                 return false;
> >>
> >>         return true;
> >> }
> >>
> >
> > Yes, that improves readability. If i understood the discussion between
> > you and David L. something like this one liner would be reasonable?
> >
> > if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))
>
> Yes, I believe that one is correct wrt the integer conversion rules.
>
> >> > @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >> >       struct tcf_pedit_key_ex *tkey_ex;
> >> >       struct tcf_pedit_parms *parms;
> >> >       struct tc_pedit_key *tkey;
> >> > -     u32 max_offset;
> >> >       int i;
> >> >
> >> >       parms = rcu_dereference_bh(p->parms);
> >> >
> >> > -     max_offset = (skb_transport_header_was_set(skb) ?
> >> > -                   skb_transport_offset(skb) :
> >> > -                   skb_network_offset(skb)) +
> >> > -                  parms->tcfp_off_max_hint;
> >> > -     if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> >> > -             goto done;
> >> > -
> >> >       tcf_lastuse_update(&p->tcf_tm);
> >> >       tcf_action_update_bstats(&p->common, skb);
> >> >
> >> > @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >> >       tkey_ex = parms->tcfp_keys_ex;
> >> >
> >> >       for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> >> > +             int write_offset, write_len;
> >> >               int offset = tkey->off;
> >> >               int hoffset = 0;
> >> > -             u32 *ptr, hdata;
> >> > +             u32 *ptr;
> >> >               u32 val;
> >> >               int rc;
> >> >
> >> > @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >> >                       }
> >> >               }
> >> >
> >> > -             if (!offset_valid(skb, hoffset + offset)) {
> >> > -                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> >> > +             if (unlikely(check_add_overflow(hoffset, offset,
> >>
> >> It's a bit weird that this has the unlikely(), but the offset_valid()
> >> check doesn't?
> >>
> >
> > I focused on the bigger solution and worried more about timeliness to
> > get this patch in - but it seems the distros had already picked up the
> > first posted patch, so hakuna matata (Still, I should have caught
> > these unlikelies ;->). Yes on the second one you pointed out.
> > Will remove them.
>
> Cool.
>
> >> > +                                             &write_offset))) {
> >> > +                     pr_info_ratelimited("tc action pedit offset overflow\n");
> >> >                       goto bad;
> >> >               }
> >> >
> >> > -             ptr = skb_header_pointer(skb, hoffset + offset,
> >> > -                                      sizeof(hdata), &hdata);
> >> > -             if (!ptr)
> >> > +             if (!offset_valid(skb, write_offset)) {
> >> > +                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> >> > +                                         write_offset);
> >> >                       goto bad;
> >> > +             }
> >> > +
> >> > +             if (write_offset < 0) {
> >> > +                     if (skb_cow(skb, -write_offset))
> >> > +                             goto bad;
> >> > +                     if (write_offset + (int)sizeof(*ptr) > 0) {
> >> > +                             if (skb_ensure_writable(skb,
> >> > +                                                     min(skb->len,
> >> > +                                                         write_offset + sizeof(*ptr))))
> >>
> >> Combining these with && instead of the double indentation would be more
> >> readable IMO (shorter lines, aligning the 'goto bad' labels).
> >
> > hrm. So:
> > if ((write_offset < 0 && ((skb_cow(skb, -write_offset)) &&
> > (write_offset + (int)sizeof(*ptr) > 0) &&
> > (skb_ensure_writable(skb,min(skb->len,write_offset + sizeof(*ptr)))) {
> >             goto bad;
> > else {
> >   ...
> > }
> >
> > Not sure which is more readable.
>
> No, I meant just collapsing the two inner levels - this, on top of your
> patch:
>
> diff --git i/net/sched/act_pedit.c w/net/sched/act_pedit.c
> index 719bee335e1f..eb61d73730c4 100644
> --- i/net/sched/act_pedit.c
> +++ w/net/sched/act_pedit.c
> @@ -460,12 +460,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>                 if (write_offset < 0) {
>                         if (skb_cow(skb, -write_offset))
>                                 goto bad;
> -                       if (write_offset + (int)sizeof(*ptr) > 0) {
> -                               if (skb_ensure_writable(skb,
> -                                                       min(skb->len,
> -                                                           write_offset + sizeof(*ptr))))
> -                                       goto bad;
> -                       }
> +                       if (write_offset + (int)sizeof(*ptr) > 0 &&
> +                           skb_ensure_writable(skb,
> +                                               min(skb->len,
> +                                                   write_offset + sizeof(*ptr))))
> +                               goto bad;
>                 } else {
>                         if (unlikely(check_add_overflow(write_offset,
>                                                         (int)sizeof(*ptr),
>
> (You could do the other thing you suggested, but as you point out that
> quickly becomes a too dense soup of && :)
>

What you shared is better. Will use that one.

cheers,
jamal
> -Toke
>
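
A minimal user-space sketch of why the collapsed `&&` form agreed on above behaves the same as the nested form. The names `cow_model`, `model_cow()` and `model_ensure_writable()` are hypothetical stand-ins for illustration, not kernel APIs. They only model "nonzero return means failure", and `uint32_t` stands in for the `*ptr` type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two helpers; nonzero rc means failure. */
struct cow_model {
	int cow_rc;          /* what the skb_cow() stand-in returns */
	int writable_rc;     /* what the skb_ensure_writable() stand-in returns */
	int writable_calls;  /* how often the second helper was invoked */
};

static int model_cow(struct cow_model *m)
{
	return m->cow_rc;
}

static int model_ensure_writable(struct cow_model *m)
{
	m->writable_calls++;
	return m->writable_rc;
}

/* Nested form from the posted patch; returns true for "goto bad". */
static bool nested_form(struct cow_model *m, int write_offset)
{
	if (model_cow(m))
		return true;
	if (write_offset + (int)sizeof(uint32_t) > 0) {
		if (model_ensure_writable(m))
			return true;
	}
	return false;
}

/* Collapsed form: && short-circuits, so the helper is only called
 * when the write actually crosses from headroom into packet data.
 */
static bool collapsed_form(struct cow_model *m, int write_offset)
{
	if (model_cow(m))
		return true;
	if (write_offset + (int)sizeof(uint32_t) > 0 &&
	    model_ensure_writable(m))
		return true;
	return false;
}
```

Because `&&` evaluates its right operand only when the left one is true, both forms call the second helper in exactly the same cases. The difference is purely one of indentation depth.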

* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-27 16:13     ` David Laight
@ 2026-05-27 16:48       ` Jamal Hadi Salim
  2026-05-27 18:25         ` Jamal Hadi Salim
  0 siblings, 1 reply; 12+ messages in thread
From: Jamal Hadi Salim @ 2026-05-27 16:48 UTC (permalink / raw)
  To: David Laight
  Cc: Toke Høiland-Jørgensen, netdev, davem, edumazet, kuba,
	pabeni, horms, jiri, victor, yimingqian591, keenanat2000,
	2045gemini, rollkingzzc, dcaratti, security, linux-kernel,
	Rajat Gupta

On Wed, May 27, 2026 at 12:13 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 27 May 2026 10:56:57 -0400
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> >  &&
> >
> > On Tue, May 26, 2026 at 3:22 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > Jamal Hadi Salim <jhs@mojatatu.com> writes:
> > >
> > > > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > >
> > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > not account for the runtime header offset added by typed keys. This
> > > > can leave part of the write region un-COW'd.
> > > >
> > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > the actual write offset is known, and add overflow checking on the
> > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > offset_valid() against INT_MIN, where negation is undefined.
> > > >
> > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > silent data corruption when pedit operates on zero-copy pages
> > > > (e.g. from sendfile).
> > > >
> > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > > Tested-by: Victor Nogueira <victor@mojatatu.com>
> > > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > >
> > > Re-ran the tests, and everything looks good, so:
> > >
> > > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > >
> > > Also looked at the code, and I have a few nits below, but I'm really
> > > nitpicking here, so whether you end up fixing those or not:
> > >
> > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > >
> > >
> > > [...]
> > >
> > > > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > > >       if (offset > 0 && offset > skb->len)
> > > >               return false;
> > > >
> > > > -     if  (offset < 0 && -offset > skb_headroom(skb))
> > > > +     if (offset < 0 && offset < -(int)skb_headroom(skb))
> > > >               return false;
> > >
> > > This change makes it really obvious that this is really just:
> > >
> > >         if (offset < -(int)skb_headroom(skb))
> > >                 return false;
> > >
> > > so, well, that would be clearer, IMO.
> > >
> > > But then I guess the same could be said of the positive case, so:
> > >
> > > static bool offset_valid(struct sk_buff *skb, int offset)
> > > {
> > >         if (offset > skb->len || offset < -(int)skb_headroom(skb))
> > >                 return false;
> > >
> > >         return true;
> > > }
> > >
> >
> > Yes, that improves readability. If i understood the discussion between
> > you and David L. something like this one liner would be reasonable?
> >
> > if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))
>
> I doubt it will be measurable whatever you pick.
>
> Thinks, can 'we' get away with the much more readable:
>         offset + (int)skb_headroom(skb) < 0
> No need to worry about the '+' wrapping,

By that you mean if offset was _very large_ then things will overflow
and wrap around to a negative number?

> the first test will fail.
> Large -ve offset are going to stay negative.
> So it does look all right.
>

To use Toke's code golfing analogy, maybe let's just stay with the
last optimization I posted, i.e. "(offset > (int)skb->len || offset <
-(int)skb_headroom(skb))"?

cheers,
jamal
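
A small user-space sketch of the one-liner discussed above, assuming `len` and `headroom` are `unsigned int` (as `skb->len` and the return value of `skb_headroom()` are). The function names are hypothetical. The second function models, as an assumption about why the `(int)` cast on `skb->len` matters, what the comparison does without it.

```c
#include <limits.h>
#include <stdbool.h>

/* Model of the proposed check: both bounds are compared as signed ints,
 * and nothing is negated, so offset == INT_MIN is handled safely.
 */
static bool offset_valid_cast(unsigned int len, unsigned int headroom,
			      int offset)
{
	return !(offset > (int)len || offset < -(int)headroom);
}

/* Model of the same check without the (int) cast on len. In C,
 * "offset > len" converts offset to unsigned int first; the explicit
 * cast below spells out that implicit conversion.
 */
static bool offset_valid_uncast(unsigned int len, unsigned int headroom,
				int offset)
{
	return !((unsigned int)offset > len || offset < -(int)headroom);
}
```

With `len = 100` and `headroom = 32`, the cast version accepts offsets from -32 to 100 inclusive. The uncast version rejects every negative offset, because a negative `int` becomes a very large `unsigned int` before the comparison.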

> >
> >
> > > > @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > >       struct tcf_pedit_key_ex *tkey_ex;
> > > >       struct tcf_pedit_parms *parms;
> > > >       struct tc_pedit_key *tkey;
> > > > -     u32 max_offset;
> > > >       int i;
> > > >
> > > >       parms = rcu_dereference_bh(p->parms);
> > > >
> > > > -     max_offset = (skb_transport_header_was_set(skb) ?
> > > > -                   skb_transport_offset(skb) :
> > > > -                   skb_network_offset(skb)) +
> > > > -                  parms->tcfp_off_max_hint;
> > > > -     if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > > > -             goto done;
> > > > -
> > > >       tcf_lastuse_update(&p->tcf_tm);
> > > >       tcf_action_update_bstats(&p->common, skb);
> > > >
> > > > @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > >       tkey_ex = parms->tcfp_keys_ex;
> > > >
> > > >       for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > > > +             int write_offset, write_len;
> > > >               int offset = tkey->off;
> > > >               int hoffset = 0;
> > > > -             u32 *ptr, hdata;
> > > > +             u32 *ptr;
> > > >               u32 val;
> > > >               int rc;
> > > >
> > > > @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > >                       }
> > > >               }
> > > >
> > > > -             if (!offset_valid(skb, hoffset + offset)) {
> > > > -                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > > > +             if (unlikely(check_add_overflow(hoffset, offset,
> > >
> > > It's a bit weird that this has the unlikely(), but the offset_valid()
> > > check doesn't?
> > >
> >
> > I focused on the bigger solution and worried more about timeliness to
> > get this patch in - but it seems the distros had already picked up the
> > first posted patch, so hakuna matata (Still, I should have caught
> > these unlikelies ;->). Yes on the second one you pointed out.
> > Will remove them.
> >
> > > > +                                             &write_offset))) {
> > > > +                     pr_info_ratelimited("tc action pedit offset overflow\n");
> > > >                       goto bad;
> > > >               }
> > > >
> > > > -             ptr = skb_header_pointer(skb, hoffset + offset,
> > > > -                                      sizeof(hdata), &hdata);
> > > > -             if (!ptr)
> > > > +             if (!offset_valid(skb, write_offset)) {
> > > > +                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > > > +                                         write_offset);
> > > >                       goto bad;
> > > > +             }
> > > > +
> > > > +             if (write_offset < 0) {
> > > > +                     if (skb_cow(skb, -write_offset))
> > > > +                             goto bad;
> > > > +                     if (write_offset + (int)sizeof(*ptr) > 0) {
> > > > +                             if (skb_ensure_writable(skb,
> > > > +                                                     min(skb->len,
> > > > +                                                         write_offset + sizeof(*ptr))))
> > >
> > > Combining these with && instead of the double indentation would be more
> > > readable IMO (shorter lines, aligning the 'goto bad' labels).
> >
> > hrm. So:
> > if ((write_offset < 0 && ((skb_cow(skb, -write_offset)) &&
> > (write_offset + (int)sizeof(*ptr) > 0) &&
> > (skb_ensure_writable(skb,min(skb->len,write_offset + sizeof(*ptr)))) {
> >             goto bad;
> > else {
> >   ...
> > }
> >
> > Not sure which is more readable.
>
> I think he meant just the last two 'if'.
>
> -- David
>
> >
> > cheers,
> > jamal
> > >
> > > > +                                     goto bad;
> > > > +                     }
> > > > +             } else {
> > > > +                     if (unlikely(check_add_overflow(write_offset,
> > > > +                                                     (int)sizeof(*ptr),
> > > > +                                                     &write_len)))
> > >
> > > Same comment wrt unlikely()
> > >
>
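
The overflow check on `hoffset + offset` quoted above can be modeled in user space. This is a sketch only: `add_offsets()` is a hypothetical name, and it uses the GCC/Clang builtin `__builtin_add_overflow()` directly rather than the kernel's `check_add_overflow()` macro.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores the sum in *write_offset when hoffset + offset
 * fits in an int; returns false when the addition would overflow.
 */
static bool add_offsets(int hoffset, int offset, int *write_offset)
{
	return !__builtin_add_overflow(hoffset, offset, write_offset);
}
```

A caller would treat a `false` return the way the patch treats an overflow, that is, by refusing to edit the packet rather than continuing with a wrapped offset.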

* Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
  2026-05-27 16:48       ` Jamal Hadi Salim
@ 2026-05-27 18:25         ` Jamal Hadi Salim
  0 siblings, 0 replies; 12+ messages in thread
From: Jamal Hadi Salim @ 2026-05-27 18:25 UTC (permalink / raw)
  To: David Laight
  Cc: Toke Høiland-Jørgensen, netdev, davem, edumazet, kuba,
	pabeni, horms, jiri, victor, yimingqian591, keenanat2000,
	2045gemini, rollkingzzc, dcaratti, security, linux-kernel,
	Rajat Gupta

On Wed, May 27, 2026 at 12:48 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Wed, May 27, 2026 at 12:13 PM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Wed, 27 May 2026 10:56:57 -0400
> > Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > >  &&
> > >
> > > On Tue, May 26, 2026 at 3:22 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > >
> > > > Jamal Hadi Salim <jhs@mojatatu.com> writes:
> > > >
> > > > > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > > >
> > > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > > not account for the runtime header offset added by typed keys. This
> > > > > can leave part of the write region un-COW'd.
> > > > >
> > > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > > the actual write offset is known, and add overflow checking on the
> > > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > > offset_valid() against INT_MIN, where negation is undefined.
> > > > >
> > > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > > silent data corruption when pedit operates on zero-copy pages
> > > > > (e.g. from sendfile).
> > > > >
> > > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > > > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > > > Tested-by: Victor Nogueira <victor@mojatatu.com>
> > > > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > >
> > > > Re-ran the tests, and everything looks good, so:
> > > >
> > > > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > >
> > > > Also looked at the code, and I have a few nits below, but I'm really
> > > > nitpicking here, so whether you end up fixing those or not:
> > > >
> > > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > >
> > > >
> > > > [...]
> > > >
> > > > > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > > > >       if (offset > 0 && offset > skb->len)
> > > > >               return false;
> > > > >
> > > > > -     if  (offset < 0 && -offset > skb_headroom(skb))
> > > > > +     if (offset < 0 && offset < -(int)skb_headroom(skb))
> > > > >               return false;
> > > >
> > > > This change makes it really obvious that this is really just:
> > > >
> > > >         if (offset < -(int)skb_headroom(skb))
> > > >                 return false;
> > > >
> > > > so, well, that would be clearer, IMO.
> > > >
> > > > But then I guess the same could be said of the positive case, so:
> > > >
> > > > static bool offset_valid(struct sk_buff *skb, int offset)
> > > > {
> > > >         if (offset > skb->len || offset < -(int)skb_headroom(skb))
> > > >                 return false;
> > > >
> > > >         return true;
> > > > }
> > > >
> > >
> > > Yes, that improves readability. If i understood the discussion between
> > > you and David L. something like this one liner would be reasonable?
> > >
> > > if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))
> >
> > I doubt it will be measurable whatever you pick.
> >
> > Thinks, can 'we' get away with the much more readable:
> >         offset + (int)skb_headroom(skb) < 0
> > No need to worry about the '+' wrapping,
>
> By that you mean if offset was _very large_ then things will overflow
> and wrap around to a negative number?
>
> > the first test will fail.
> > Large -ve offset are going to stay negative.
> > So it does look all right.
> >
>
> To use Toke's code golfing analogy, maybe let's just stay with the
> last optimization I posted i.e " (offset > (int)skb->len || offset <
> -(int)skb_headroom(skb))"  ?
>

I sent v3. I added the Tested-by/Reviewed-by tags, but because I made
changes, even though they are simple, please test if you can. I don't
expect anything to break, and Victor has already reviewed and tested.

Please no more changes. Hold your tongue if you see some missing comma
or period.

cheers,
jamal

end of thread, other threads:[~2026-05-27 18:25 UTC | newest]

Thread overview: 12+ messages
2026-05-26 15:59 [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption Jamal Hadi Salim
2026-05-26 19:22 ` Toke Høiland-Jørgensen
2026-05-27  9:00   ` David Laight
2026-05-27 10:21     ` Toke Høiland-Jørgensen
2026-05-27 14:56   ` Jamal Hadi Salim
2026-05-27 15:12     ` Toke Høiland-Jørgensen
2026-05-27 16:44       ` Jamal Hadi Salim
2026-05-27 16:13     ` David Laight
2026-05-27 16:48       ` Jamal Hadi Salim
2026-05-27 18:25         ` Jamal Hadi Salim
2026-05-26 21:29 ` Davide Caratti
2026-05-27  2:36 ` Han Guidong
