* [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption
@ 2026-05-31 12:32 Jamal Hadi Salim
2026-05-31 16:15 ` David Laight
0 siblings, 1 reply; 6+ messages in thread
From: Jamal Hadi Salim @ 2026-05-31 12:32 UTC (permalink / raw)
To: netdev
Cc: davem, edumazet, kuba, pabeni, horms, jiri, victor,
david.laight.linux, yimingqian591, keenanat2000, 2045gemini,
rollkingzzc, toke, dcaratti, security, linux-kernel, stable,
Rajat Gupta, Jamal Hadi Salim
From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
tcf_pedit_act() computes the COW range for skb_ensure_writable()
once before the key loop using tcfp_off_max_hint, but the hint does
not account for the runtime header offset added by typed keys. This
can leave part of the write region un-COW'd.
Fix by moving skb_ensure_writable() inside the per-key loop where
the actual write offset is known, and add overflow checking on the
offset arithmetic. For negative offsets (e.g. Ethernet header edits
at ingress), use skb_cow() to COW the headroom instead. Guard
offset_valid() against INT_MIN, where negation is undefined.
Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Reported-by: Keenan Dong <keenanat2000@gmail.com>
Reported-by: Han Guidong <2045gemini@gmail.com>
Reported-by: Zhang Cen <rollkingzzc@gmail.com>
Reviewed-by: Han Guidong <2045gemini@gmail.com>
Tested-by: Han Guidong <2045gemini@gmail.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
---
v4->v5:
1) Dead code removal obsoleted after removing tcfp_off_max_hint as identified
by sashiko-claude[1]. Remove dead code updating "cur".
2) Addition of hoffset at tkey->at promotes hoffset to unsigned int,
wraps on overflow. Claim made by both sashiko-claude[1] and sashiko-gemini[2]
I think this is far fetched but possible. Cast tkey->at to int.
3) Signedness mismatch in min() macro may potentially cause build issues.
Claim made by both sashiko-claude[1] and sashiko-gemini[2].
Go back to min_t() for now. David L. can make a better proposal later..
[1]https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com
[2]https://sashiko.dev/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com
---
include/net/tc_act/tc_pedit.h | 1 -
net/sched/act_pedit.c | 77 +++++++++++++++++++----------------
2 files changed, 41 insertions(+), 37 deletions(-)
diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h
index f58ee15cd858..cb7b82f2cbc7 100644
--- a/include/net/tc_act/tc_pedit.h
+++ b/include/net/tc_act/tc_pedit.h
@@ -15,7 +15,6 @@ struct tcf_pedit_parms {
struct tc_pedit_key *tcfp_keys;
struct tcf_pedit_key_ex *tcfp_keys_ex;
int action;
- u32 tcfp_off_max_hint;
unsigned char tcfp_nkeys;
unsigned char tcfp_flags;
struct rcu_head rcu;
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index bc20f08a2789..bd3b1da3cd63 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -16,6 +16,8 @@
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/slab.h>
+#include <linux/overflow.h>
+#include <linux/unaligned.h>
#include <net/ipv6.h>
#include <net/netlink.h>
#include <net/pkt_sched.h>
@@ -242,7 +244,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
goto out_free_ex;
}
- nparms->tcfp_off_max_hint = 0;
nparms->tcfp_flags = parm->flags;
nparms->tcfp_nkeys = parm->nkeys;
@@ -268,14 +269,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
BITS_PER_TYPE(int) - 1,
nparms->tcfp_keys[i].shift);
- /* The AT option can read a single byte, we can bound the actual
- * value with uchar max.
- */
- cur += (0xff & offmask) >> nparms->tcfp_keys[i].shift;
-
- /* Each key touches 4 bytes starting from the computed offset */
- nparms->tcfp_off_max_hint =
- max(nparms->tcfp_off_max_hint, cur + 4);
}
p = to_pedit(*a);
@@ -318,15 +311,12 @@ static void tcf_pedit_cleanup(struct tc_action *a)
call_rcu(&parms->rcu, tcf_pedit_cleanup_rcu);
}
-static bool offset_valid(struct sk_buff *skb, int offset)
+static bool offset_valid(struct sk_buff *skb, int offset, int len)
{
- if (offset > 0 && offset > skb->len)
- return false;
-
- if (offset < 0 && -offset > skb_headroom(skb))
+ if (offset < -(int)skb_headroom(skb))
return false;
- return true;
+ return offset <= (int)skb->len - len;
}
static int pedit_l4_skb_offset(struct sk_buff *skb, int *hoffset, const int header_type)
@@ -393,18 +383,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
struct tcf_pedit_key_ex *tkey_ex;
struct tcf_pedit_parms *parms;
struct tc_pedit_key *tkey;
- u32 max_offset;
int i;
parms = rcu_dereference_bh(p->parms);
- max_offset = (skb_transport_header_was_set(skb) ?
- skb_transport_offset(skb) :
- skb_network_offset(skb)) +
- parms->tcfp_off_max_hint;
- if (skb_ensure_writable(skb, min(skb->len, max_offset)))
- goto done;
-
tcf_lastuse_update(&p->tcf_tm);
tcf_action_update_bstats(&p->common, skb);
@@ -412,10 +394,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
tkey_ex = parms->tcfp_keys_ex;
for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
+ int write_offset, write_len;
int offset = tkey->off;
int hoffset = 0;
- u32 *ptr, hdata;
- u32 val;
+ u32 cur_val, val;
+ u32 *ptr;
int rc;
if (tkey_ex) {
@@ -433,13 +416,15 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
if (tkey->offmask) {
u8 *d, _d;
+ int at_offset;
- if (!offset_valid(skb, hoffset + tkey->at)) {
+ if (check_add_overflow(hoffset, (int)tkey->at, &at_offset) ||
+ !offset_valid(skb, at_offset, sizeof(_d))) {
pr_info_ratelimited("tc action pedit 'at' offset %d out of bounds\n",
hoffset + tkey->at);
goto bad;
}
- d = skb_header_pointer(skb, hoffset + tkey->at,
+ d = skb_header_pointer(skb, at_offset,
sizeof(_d), &_d);
if (!d)
goto bad;
@@ -451,31 +436,51 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
}
}
- if (!offset_valid(skb, hoffset + offset)) {
- pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
+ if (check_add_overflow(hoffset, offset, &write_offset)) {
+ pr_info_ratelimited("tc action pedit offset overflow\n");
goto bad;
}
- ptr = skb_header_pointer(skb, hoffset + offset,
- sizeof(hdata), &hdata);
- if (!ptr)
+ if (!offset_valid(skb, write_offset, sizeof(*ptr))) {
+ pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
+ write_offset);
goto bad;
+ }
+
+ if (write_offset < 0) {
+ if (skb_cow(skb, -write_offset))
+ goto bad;
+ if (write_offset + (int)sizeof(*ptr) > 0) {
+ if (skb_ensure_writable(skb,
+ min_t(int, skb->len,
+ write_offset + (int)sizeof(*ptr))))
+ goto bad;
+ }
+ } else {
+ if (check_add_overflow(write_offset, (int)sizeof(*ptr),
+ &write_len))
+ goto bad;
+ if (skb_ensure_writable(skb, min_t(int, skb->len,
+ write_len)))
+ goto bad;
+ }
+
+ ptr = (u32 *)(skb->data + write_offset);
+ cur_val = get_unaligned(ptr);
/* just do it, baby */
switch (cmd) {
case TCA_PEDIT_KEY_EX_CMD_SET:
val = tkey->val;
break;
case TCA_PEDIT_KEY_EX_CMD_ADD:
- val = (*ptr + tkey->val) & ~tkey->mask;
+ val = (cur_val + tkey->val) & ~tkey->mask;
break;
default:
pr_info_ratelimited("tc action pedit bad command (%d)\n", cmd);
goto bad;
}
- *ptr = ((*ptr & tkey->mask) ^ val);
- if (ptr == &hdata)
- skb_store_bits(skb, hoffset + offset, ptr, 4);
+ put_unaligned((cur_val & tkey->mask) ^ val, ptr);
}
goto done;
--
2.34.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption 2026-05-31 12:32 [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption Jamal Hadi Salim @ 2026-05-31 16:15 ` David Laight 2026-05-31 17:25 ` Jamal Hadi Salim 0 siblings, 1 reply; 6+ messages in thread From: David Laight @ 2026-05-31 16:15 UTC (permalink / raw) To: Jamal Hadi Salim Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor, yimingqian591, keenanat2000, 2045gemini, rollkingzzc, toke, dcaratti, security, linux-kernel, stable, Rajat Gupta On Sun, 31 May 2026 08:32:21 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote: > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > > tcf_pedit_act() computes the COW range for skb_ensure_writable() > once before the key loop using tcfp_off_max_hint, but the hint does > not account for the runtime header offset added by typed keys. This > can leave part of the write region un-COW'd. > > Fix by moving skb_ensure_writable() inside the per-key loop where > the actual write offset is known, and add overflow checking on the > offset arithmetic. For negative offsets (e.g. Ethernet header edits > at ingress), use skb_cow() to COW the headroom instead. Guard > offset_valid() against INT_MIN, where negation is undefined. This code has all got out of hand somewhere. I've only been looking at offset_valid() code. > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") > Reported-by: Yiming Qian <yimingqian591@gmail.com> > Reported-by: Keenan Dong <keenanat2000@gmail.com> > Reported-by: Han Guidong <2045gemini@gmail.com> > Reported-by: Zhang Cen <rollkingzzc@gmail.com> > Reviewed-by: Han Guidong <2045gemini@gmail.com> > Tested-by: Han Guidong <2045gemini@gmail.com> > Reviewed-by: Davide Caratti <dcaratti@redhat.com> > Tested-by: Davide Caratti <dcaratti@redhat.com> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> > Reviewed-by: Victor Nogueira <victor@mojatatu.com> > Tested-by: Victor Nogueira <victor@mojatatu.com> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > --- > v4->v5: > 1) Dead code removal obsoleted after removing tcfp_off_max_hint as identified > by sashiko-claude[1]. Remove dead code updating "cur". > 2) Addition of hoffset at tkey->at promotes hoffset to unsigned int, > wraps on overflow. Claim made by both sashiko-claude[1] and sashiko-gemini[2] > I think this is far fetched but possible. Cast tkey->at to int. > 3) Signedness mismatch in min() macro may potentially cause build issues. > Claim made by both sashiko-claude[1] and sashiko-gemini[2]. > Go back to min_t() for now. David L. can make a better proposal later.. > > [1]https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > [2]https://sashiko.dev/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > --- > include/net/tc_act/tc_pedit.h | 1 - > net/sched/act_pedit.c | 77 +++++++++++++++++++---------------- > 2 files changed, 41 insertions(+), 37 deletions(-) > > diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h > index f58ee15cd858..cb7b82f2cbc7 100644 > --- a/include/net/tc_act/tc_pedit.h > +++ b/include/net/tc_act/tc_pedit.h > @@ -15,7 +15,6 @@ struct tcf_pedit_parms { > struct tc_pedit_key *tcfp_keys; > struct tcf_pedit_key_ex *tcfp_keys_ex; > int action; > - u32 tcfp_off_max_hint; > unsigned char tcfp_nkeys; > unsigned char tcfp_flags; > struct rcu_head rcu; > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c > index bc20f08a2789..bd3b1da3cd63 100644 > --- a/net/sched/act_pedit.c > +++ b/net/sched/act_pedit.c > @@ -16,6 +16,8 @@ > #include <linux/ip.h> > #include <linux/ipv6.h> > #include <linux/slab.h> > +#include <linux/overflow.h> > +#include <linux/unaligned.h> > #include <net/ipv6.h> > #include <net/netlink.h> > #include <net/pkt_sched.h> > @@ -242,7 +244,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > goto out_free_ex; > } > > - nparms->tcfp_off_max_hint = 0; > nparms->tcfp_flags = parm->flags; > nparms->tcfp_nkeys = parm->nkeys; > > @@ -268,14 +269,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > BITS_PER_TYPE(int) - 1, > nparms->tcfp_keys[i].shift); > > - /* The AT option can read a single byte, we can bound the actual > - * value with uchar max. > - */ > - cur += (0xff & offmask) >> nparms->tcfp_keys[i].shift; > - > - /* Each key touches 4 bytes starting from the computed offset */ > - nparms->tcfp_off_max_hint = > - max(nparms->tcfp_off_max_hint, cur + 4); > } > > p = to_pedit(*a); > @@ -318,15 +311,12 @@ static void tcf_pedit_cleanup(struct tc_action *a) > call_rcu(&parms->rcu, tcf_pedit_cleanup_rcu); > } > > -static bool offset_valid(struct sk_buff *skb, int offset) > +static bool offset_valid(struct sk_buff *skb, int offset, int len) > { > - if (offset > 0 && offset > skb->len) > - return false; > - > - if (offset < 0 && -offset > skb_headroom(skb)) > + if (offset < -(int)skb_headroom(skb)) > return false; > > - return true; > + return offset <= (int)skb->len - len; > } > > static int pedit_l4_skb_offset(struct sk_buff *skb, int *hoffset, const int header_type) > @@ -393,18 +383,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > struct tcf_pedit_key_ex *tkey_ex; > struct tcf_pedit_parms *parms; > struct tc_pedit_key *tkey; > - u32 max_offset; > int i; > > parms = rcu_dereference_bh(p->parms); > > - max_offset = (skb_transport_header_was_set(skb) ? > - skb_transport_offset(skb) : > - skb_network_offset(skb)) + > - parms->tcfp_off_max_hint; > - if (skb_ensure_writable(skb, min(skb->len, max_offset))) > - goto done; > - > tcf_lastuse_update(&p->tcf_tm); > tcf_action_update_bstats(&p->common, skb); > > @@ -412,10 +394,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > tkey_ex = parms->tcfp_keys_ex; > > for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { > + int write_offset, write_len; > int offset = tkey->off; > int hoffset = 0; > - u32 *ptr, hdata; > - u32 val; > + u32 cur_val, val; > + u32 *ptr; > int rc; > > if (tkey_ex) { > @@ -433,13 +416,15 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > if (tkey->offmask) { > u8 *d, _d; > + int at_offset; > > - if (!offset_valid(skb, hoffset + tkey->at)) { > + if (check_add_overflow(hoffset, (int)tkey->at, &at_offset) || There is no real reason for checking the add. All that matters is that offset_valid() and skb_header_pointer() are passed the same offset. Assigning it to a local would make that clearer. > + !offset_valid(skb, at_offset, sizeof(_d))) { > pr_info_ratelimited("tc action pedit 'at' offset %d out of bounds\n", > hoffset + tkey->at); > goto bad; > } > - d = skb_header_pointer(skb, hoffset + tkey->at, > + d = skb_header_pointer(skb, at_offset, > sizeof(_d), &_d); > if (!d) > goto bad; > @@ -451,31 +436,51 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > } > } > > - if (!offset_valid(skb, hoffset + offset)) { > - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset); > + if (check_add_overflow(hoffset, offset, &write_offset)) { > + pr_info_ratelimited("tc action pedit offset overflow\n"); > goto bad; > } > > - ptr = skb_header_pointer(skb, hoffset + offset, > - sizeof(hdata), &hdata); > - if (!ptr) > + if (!offset_valid(skb, write_offset, sizeof(*ptr))) { > + pr_info_ratelimited("tc action pedit offset %d out of bounds\n", > + write_offset); > goto bad; > + } Again all that matters is that the same value is passed to offset_valid() and skb_header_pointer(). > + > + if (write_offset < 0) { > + if (skb_cow(skb, -write_offset)) > + goto bad; > + if (write_offset + (int)sizeof(*ptr) > 0) { > + if (skb_ensure_writable(skb, > + min_t(int, skb->len, > + write_offset + (int)sizeof(*ptr)))) That min isn't needed (same as below). > + goto bad; > + } > + } else { > + if (check_add_overflow(write_offset, (int)sizeof(*ptr), > + &write_len)) That can't fail because of the 'return offset <= (int)skb->len - len;' check. > + goto bad; > + if (skb_ensure_writable(skb, min_t(int, skb->len, > + write_len))) Similarly that min_t() will return 'write_len'. But is that even correct? The code wants to write 4 bytes at skb->data + write_offset. I can't see where the offset is used. All the overflow tests have made the logic far too hard to follow. I think that whole lot can just be: hoffset += offset; if (offset_valid(skb, hoffset, sizeof(hdata))) goto bad; /* Ensure any needed headroom is writeable */ if (hoffset < 0 && skb_cow(skb, -hoffset) goto bad; /* Ensure the skb is writeable to the end of the area to be written. * The data is pulled into the skb header. */ if ((hoffset >= 0 || hoffset + (int)sizeof(hdata) > 0) && skb_ensure_writable(skb, hoffset + (int)sizeof(hdata)) goto bad; The call to offset_valid() does all the other checks. (The hoffset >= 0 check is the inverse of the earlier check and will be optimised away - skipping the second second test which is then always true.) If you really want a warning message, put a generic one on 'bad;'. > + goto bad; > + } > + > + ptr = (u32 *)(skb->data + write_offset); > + cur_val = get_unaligned(ptr); Given you are doing an unaligned read you could remove the check in the 'offmask' path that the final offset is a multiple of 4. Nothing checked that tkey->off is a multiple of 4, and I suspect that is user-specifiable? I'm also not entirely that the mac_header is aligned. -- David > /* just do it, baby */ > switch (cmd) { > case TCA_PEDIT_KEY_EX_CMD_SET: > val = tkey->val; > break; > case TCA_PEDIT_KEY_EX_CMD_ADD: > - val = (*ptr + tkey->val) & ~tkey->mask; > + val = (cur_val + tkey->val) & ~tkey->mask; > break; > default: > pr_info_ratelimited("tc action pedit bad command (%d)\n", cmd); > goto bad; > } > > - *ptr = ((*ptr & tkey->mask) ^ val); > - if (ptr == &hdata) > - skb_store_bits(skb, hoffset + offset, ptr, 4); > + put_unaligned((cur_val & tkey->mask) ^ val, ptr); > } > > goto done; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption 2026-05-31 16:15 ` David Laight @ 2026-05-31 17:25 ` Jamal Hadi Salim 2026-05-31 17:28 ` Jamal Hadi Salim 2026-05-31 18:42 ` David Laight 0 siblings, 2 replies; 6+ messages in thread From: Jamal Hadi Salim @ 2026-05-31 17:25 UTC (permalink / raw) To: David Laight Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor, yimingqian591, keenanat2000, 2045gemini, rollkingzzc, toke, dcaratti, security, linux-kernel, stable, Rajat Gupta On Sun, May 31, 2026 at 12:15 PM David Laight <david.laight.linux@gmail.com> wrote: > > On Sun, 31 May 2026 08:32:21 -0400 > Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > > > > tcf_pedit_act() computes the COW range for skb_ensure_writable() > > once before the key loop using tcfp_off_max_hint, but the hint does > > not account for the runtime header offset added by typed keys. This > > can leave part of the write region un-COW'd. > > > > Fix by moving skb_ensure_writable() inside the per-key loop where > > the actual write offset is known, and add overflow checking on the > > offset arithmetic. For negative offsets (e.g. Ethernet header edits > > at ingress), use skb_cow() to COW the headroom instead. Guard > > offset_valid() against INT_MIN, where negation is undefined. > > This code has all got out of hand somewhere. > I've only been looking at offset_valid() code. > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") > > Reported-by: Yiming Qian <yimingqian591@gmail.com> > > Reported-by: Keenan Dong <keenanat2000@gmail.com> > > Reported-by: Han Guidong <2045gemini@gmail.com> > > Reported-by: Zhang Cen <rollkingzzc@gmail.com> > > Reviewed-by: Han Guidong <2045gemini@gmail.com> > > Tested-by: Han Guidong <2045gemini@gmail.com> > > Reviewed-by: Davide Caratti <dcaratti@redhat.com> > > Tested-by: Davide Caratti <dcaratti@redhat.com> > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> > > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> > > Reviewed-by: Victor Nogueira <victor@mojatatu.com> > > Tested-by: Victor Nogueira <victor@mojatatu.com> > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > > --- > > v4->v5: > > 1) Dead code removal obsoleted after removing tcfp_off_max_hint as identified > > by sashiko-claude[1]. Remove dead code updating "cur". > > 2) Addition of hoffset at tkey->at promotes hoffset to unsigned int, > > wraps on overflow. Claim made by both sashiko-claude[1] and sashiko-gemini[2] > > I think this is far fetched but possible. Cast tkey->at to int. > > 3) Signedness mismatch in min() macro may potentially cause build issues. > > Claim made by both sashiko-claude[1] and sashiko-gemini[2]. > > Go back to min_t() for now. David L. can make a better proposal later.. > > > > [1]https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > > [2]https://sashiko.dev/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > > --- > > include/net/tc_act/tc_pedit.h | 1 - > > net/sched/act_pedit.c | 77 +++++++++++++++++++---------------- > > 2 files changed, 41 insertions(+), 37 deletions(-) > > > > diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h > > index f58ee15cd858..cb7b82f2cbc7 100644 > > --- a/include/net/tc_act/tc_pedit.h > > +++ b/include/net/tc_act/tc_pedit.h > > @@ -15,7 +15,6 @@ struct tcf_pedit_parms { > > struct tc_pedit_key *tcfp_keys; > > struct tcf_pedit_key_ex *tcfp_keys_ex; > > int action; > > - u32 tcfp_off_max_hint; > > unsigned char tcfp_nkeys; > > unsigned char tcfp_flags; > > struct rcu_head rcu; > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c > > index bc20f08a2789..bd3b1da3cd63 100644 > > --- a/net/sched/act_pedit.c > > +++ b/net/sched/act_pedit.c > > @@ -16,6 +16,8 @@ > > #include <linux/ip.h> > > #include <linux/ipv6.h> > > #include <linux/slab.h> > > +#include <linux/overflow.h> > > +#include <linux/unaligned.h> > > #include <net/ipv6.h> > > #include <net/netlink.h> > > #include <net/pkt_sched.h> > > @@ -242,7 +244,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > > goto out_free_ex; > > } > > > > - nparms->tcfp_off_max_hint = 0; > > nparms->tcfp_flags = parm->flags; > > nparms->tcfp_nkeys = parm->nkeys; > > > > @@ -268,14 +269,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > > BITS_PER_TYPE(int) - 1, > > nparms->tcfp_keys[i].shift); > > > > - /* The AT option can read a single byte, we can bound the actual > > - * value with uchar max. > > - */ > > - cur += (0xff & offmask) >> nparms->tcfp_keys[i].shift; > > - > > - /* Each key touches 4 bytes starting from the computed offset */ > > - nparms->tcfp_off_max_hint = > > - max(nparms->tcfp_off_max_hint, cur + 4); > > } > > > > p = to_pedit(*a); > > @@ -318,15 +311,12 @@ static void tcf_pedit_cleanup(struct tc_action *a) > > call_rcu(&parms->rcu, tcf_pedit_cleanup_rcu); > > } > > > > -static bool offset_valid(struct sk_buff *skb, int offset) > > +static bool offset_valid(struct sk_buff *skb, int offset, int len) > > { > > - if (offset > 0 && offset > skb->len) > > - return false; > > - > > - if (offset < 0 && -offset > skb_headroom(skb)) > > + if (offset < -(int)skb_headroom(skb)) > > return false; > > > > - return true; > > + return offset <= (int)skb->len - len; > > } > > > > static int pedit_l4_skb_offset(struct sk_buff *skb, int *hoffset, const int header_type) > > @@ -393,18 +383,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > struct tcf_pedit_key_ex *tkey_ex; > > struct tcf_pedit_parms *parms; > > struct tc_pedit_key *tkey; > > - u32 max_offset; > > int i; > > > > parms = rcu_dereference_bh(p->parms); > > > > - max_offset = (skb_transport_header_was_set(skb) ? > > - skb_transport_offset(skb) : > > - skb_network_offset(skb)) + > > - parms->tcfp_off_max_hint; > > - if (skb_ensure_writable(skb, min(skb->len, max_offset))) > > - goto done; > > - > > tcf_lastuse_update(&p->tcf_tm); > > tcf_action_update_bstats(&p->common, skb); > > > > @@ -412,10 +394,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > tkey_ex = parms->tcfp_keys_ex; > > > > for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { > > + int write_offset, write_len; > > int offset = tkey->off; > > int hoffset = 0; > > - u32 *ptr, hdata; > > - u32 val; > > + u32 cur_val, val; > > + u32 *ptr; > > int rc; > > > > if (tkey_ex) { > > @@ -433,13 +416,15 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > > if (tkey->offmask) { > > u8 *d, _d; > > + int at_offset; > > > > - if (!offset_valid(skb, hoffset + tkey->at)) { > > + if (check_add_overflow(hoffset, (int)tkey->at, &at_offset) || > > There is no real reason for checking the add. > All that matters is that offset_valid() and skb_header_pointer() are > passed the same offset. > Assigning it to a local would make that clearer. You are technically correct - but earlier discussion was I believe that signed integer overflow as undefined behavior and those checks are there to avoid that and logic bypasses. > > > + !offset_valid(skb, at_offset, sizeof(_d))) { > > pr_info_ratelimited("tc action pedit 'at' offset %d out of bounds\n", > > hoffset + tkey->at); > > goto bad; > > } > > - d = skb_header_pointer(skb, hoffset + tkey->at, > > + d = skb_header_pointer(skb, at_offset, > > sizeof(_d), &_d); > > if (!d) > > goto bad; > > @@ -451,31 +436,51 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > } > > } > > > > - if (!offset_valid(skb, hoffset + offset)) { > > - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset); > > + if (check_add_overflow(hoffset, offset, &write_offset)) { > > + pr_info_ratelimited("tc action pedit offset overflow\n"); > > goto bad; > > } > > > > - ptr = skb_header_pointer(skb, hoffset + offset, > > - sizeof(hdata), &hdata); > > - if (!ptr) > > + if (!offset_valid(skb, write_offset, sizeof(*ptr))) { > > + pr_info_ratelimited("tc action pedit offset %d out of bounds\n", > > + write_offset); > > goto bad; > > + } > > Again all that matters is that the same value is passed to offset_valid() > and skb_header_pointer(). Same comment as earlier. Note: we replaced skb_header_pointer() with direct memory access "ptr = (u32 *)(skb->data + write_offset)" further down. so extra validation of write_offset is justifiable, no? > > + > > + if (write_offset < 0) { > > + if (skb_cow(skb, -write_offset)) > > + goto bad; > > + if (write_offset + (int)sizeof(*ptr) > 0) { > > + if (skb_ensure_writable(skb, > > + min_t(int, skb->len, > > + write_offset + (int)sizeof(*ptr)))) > > That min isn't needed (same as below). Ok - yes, i see that. Same with the min(). Why did we think we needed min/min_t() before? > > + goto bad; > > + } > > + } else { > > + if (check_add_overflow(write_offset, (int)sizeof(*ptr), > > + &write_len)) > > That can't fail because of the 'return offset <= (int)skb->len - len;' check. True. > > > + goto bad; > > + if (skb_ensure_writable(skb, min_t(int, skb->len, > > + write_len))) > > Similarly that min_t() will return 'write_len'. > But is that even correct? > The code wants to write 4 bytes at skb->data + write_offset. > I can't see where the offset is used. > All the overflow tests have made the logic far too hard to follow. > > I think that whole lot can just be: > hoffset += offset; > if (offset_valid(skb, hoffset, sizeof(hdata))) > goto bad; > > /* Ensure any needed headroom is writeable */ > if (hoffset < 0 && skb_cow(skb, -hoffset) > goto bad; > /* Ensure the skb is writeable to the end of the area to be written. > * The data is pulled into the skb header. */ > if ((hoffset >= 0 || hoffset + (int)sizeof(hdata) > 0) && > skb_ensure_writable(skb, hoffset + (int)sizeof(hdata)) > goto bad; > The call to offset_valid() does all the other checks. > (The hoffset >= 0 check is the inverse of the earlier check and will > be optimised away - skipping the second second test which is then > always true.) Yes, that code is more readable. > If you really want a warning message, put a generic one on 'bad;'. > > > + goto bad; > > + } > > + > > + ptr = (u32 *)(skb->data + write_offset); > > + cur_val = get_unaligned(ptr); > > Given you are doing an unaligned read you could remove the check in the > 'offmask' path that the final offset is a multiple of 4. > Nothing checked that tkey->off is a multiple of 4, and I suspect that > is user-specifiable? The only reason i will keep that check is because we have always checked for the 32b alignment from an API perspective. I am less worried about than potentially breaking something in the hardware offload path when a user passes unaligned offsets down given that these code paths (towards hardware) have been well vetted. I am trying to get this security fix in. Do you see anything in what you stated as a show stopper? IOW introducing regression or plain wrong? If not, I will wait for the AI review or if someone finds an issue which breaks something - and if it is worth going back then i will consider your suggestions as part of the next update. Does that sound reasonable? cheers, jamal > I'm also not entirely that the mac_header is aligned. > > -- David > > > /* just do it, baby */ > > switch (cmd) { > > case TCA_PEDIT_KEY_EX_CMD_SET: > > val = tkey->val; > > break; > > case TCA_PEDIT_KEY_EX_CMD_ADD: > > - val = (*ptr + tkey->val) & ~tkey->mask; > > + val = (cur_val + tkey->val) & ~tkey->mask; > > break; > > default: > > pr_info_ratelimited("tc action pedit bad command (%d)\n", cmd); > > goto bad; > > } > > > > - *ptr = ((*ptr & tkey->mask) ^ val); > > - if (ptr == &hdata) > > - skb_store_bits(skb, hoffset + offset, ptr, 4); > > + put_unaligned((cur_val & tkey->mask) ^ val, ptr); > > } > > > > goto done; > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption 2026-05-31 17:25 ` Jamal Hadi Salim @ 2026-05-31 17:28 ` Jamal Hadi Salim 2026-05-31 18:42 ` David Laight 1 sibling, 0 replies; 6+ messages in thread From: Jamal Hadi Salim @ 2026-05-31 17:28 UTC (permalink / raw) To: David Laight Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor, yimingqian591, keenanat2000, 2045gemini, rollkingzzc, toke, dcaratti, security, linux-kernel, stable, Rajat Gupta [-- Attachment #1: Type: text/plain, Size: 14754 bytes --] On Sun, May 31, 2026 at 1:25 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On Sun, May 31, 2026 at 12:15 PM David Laight > <david.laight.linux@gmail.com> wrote: > > > > On Sun, 31 May 2026 08:32:21 -0400 > > Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > > > > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > > > > > > tcf_pedit_act() computes the COW range for skb_ensure_writable() > > > once before the key loop using tcfp_off_max_hint, but the hint does > > > not account for the runtime header offset added by typed keys. This > > > can leave part of the write region un-COW'd. > > > > > > Fix by moving skb_ensure_writable() inside the per-key loop where > > > the actual write offset is known, and add overflow checking on the > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits > > > at ingress), use skb_cow() to COW the headroom instead. Guard > > > offset_valid() against INT_MIN, where negation is undefined. > > > > This code has all got out of hand somewhere. > > I've only been looking at offset_valid() code. > > > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") > > > Reported-by: Yiming Qian <yimingqian591@gmail.com> > > > Reported-by: Keenan Dong <keenanat2000@gmail.com> > > > Reported-by: Han Guidong <2045gemini@gmail.com> > > > Reported-by: Zhang Cen <rollkingzzc@gmail.com> > > > Reviewed-by: Han Guidong <2045gemini@gmail.com> > > > Tested-by: Han Guidong <2045gemini@gmail.com> > > > Reviewed-by: Davide Caratti <dcaratti@redhat.com> > > > Tested-by: Davide Caratti <dcaratti@redhat.com> > > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> > > > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> > > > Reviewed-by: Victor Nogueira <victor@mojatatu.com> > > > Tested-by: Victor Nogueira <victor@mojatatu.com> > > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> > > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > > > --- > > > v4->v5: > > > 1) Dead code removal obsoleted after removing tcfp_off_max_hint as identified > > > by sashiko-claude[1]. Remove dead code updating "cur". > > > 2) Addition of hoffset at tkey->at promotes hoffset to unsigned int, > > > wraps on overflow. Claim made by both sashiko-claude[1] and sashiko-gemini[2] > > > I think this is far fetched but possible. Cast tkey->at to int. > > > 3) Signedness mismatch in min() macro may potentially cause build issues. > > > Claim made by both sashiko-claude[1] and sashiko-gemini[2]. > > > Go back to min_t() for now. David L. can make a better proposal later.. > > > > > > [1]https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > > > [2]https://sashiko.dev/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > > > --- > > > include/net/tc_act/tc_pedit.h | 1 - > > > net/sched/act_pedit.c | 77 +++++++++++++++++++---------------- > > > 2 files changed, 41 insertions(+), 37 deletions(-) > > > > > > diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h > > > index f58ee15cd858..cb7b82f2cbc7 100644 > > > --- a/include/net/tc_act/tc_pedit.h > > > +++ b/include/net/tc_act/tc_pedit.h > > > @@ -15,7 +15,6 @@ struct tcf_pedit_parms { > > > struct tc_pedit_key *tcfp_keys; > > > struct tcf_pedit_key_ex *tcfp_keys_ex; > > > int action; > > > - u32 tcfp_off_max_hint; > > > unsigned char tcfp_nkeys; > > > unsigned char tcfp_flags; > > > struct rcu_head rcu; > > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c > > > index bc20f08a2789..bd3b1da3cd63 100644 > > > --- a/net/sched/act_pedit.c > > > +++ b/net/sched/act_pedit.c > > > @@ -16,6 +16,8 @@ > > > #include <linux/ip.h> > > > #include <linux/ipv6.h> > > > #include <linux/slab.h> > > > +#include <linux/overflow.h> > > > +#include <linux/unaligned.h> > > > #include <net/ipv6.h> > > > #include <net/netlink.h> > > > #include <net/pkt_sched.h> > > > @@ -242,7 +244,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > > > goto out_free_ex; > > > } > > > > > > - nparms->tcfp_off_max_hint = 0; > > > nparms->tcfp_flags = parm->flags; > > > nparms->tcfp_nkeys = parm->nkeys; > > > > > > @@ -268,14 +269,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > > > BITS_PER_TYPE(int) - 1, > > > nparms->tcfp_keys[i].shift); > > > > > > - /* The AT option can read a single byte, we can bound the actual > > > - * value with uchar max. > > > - */ > > > - cur += (0xff & offmask) >> nparms->tcfp_keys[i].shift; > > > - > > > - /* Each key touches 4 bytes starting from the computed offset */ > > > - nparms->tcfp_off_max_hint = > > > - max(nparms->tcfp_off_max_hint, cur + 4); > > > } > > > > > > p = to_pedit(*a); > > > @@ -318,15 +311,12 @@ static void tcf_pedit_cleanup(struct tc_action *a) > > > call_rcu(&parms->rcu, tcf_pedit_cleanup_rcu); > > > } > > > > > > -static bool offset_valid(struct sk_buff *skb, int offset) > > > +static bool offset_valid(struct sk_buff *skb, int offset, int len) > > > { > > > - if (offset > 0 && offset > skb->len) > > > - return false; > > > - > > > - if (offset < 0 && -offset > skb_headroom(skb)) > > > + if (offset < -(int)skb_headroom(skb)) > > > return false; > > > > > > - return true; > > > + return offset <= (int)skb->len - len; > > > } > > > > > > static int pedit_l4_skb_offset(struct sk_buff *skb, int *hoffset, const int header_type) > > > @@ -393,18 +383,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > struct tcf_pedit_key_ex *tkey_ex; > > > struct tcf_pedit_parms *parms; > > > struct tc_pedit_key *tkey; > > > - u32 max_offset; > > > int i; > > > > > > parms = rcu_dereference_bh(p->parms); > > > > > > - max_offset = (skb_transport_header_was_set(skb) ? > > > - skb_transport_offset(skb) : > > > - skb_network_offset(skb)) + > > > - parms->tcfp_off_max_hint; > > > - if (skb_ensure_writable(skb, min(skb->len, max_offset))) > > > - goto done; > > > - > > > tcf_lastuse_update(&p->tcf_tm); > > > tcf_action_update_bstats(&p->common, skb); > > > > > > @@ -412,10 +394,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > tkey_ex = parms->tcfp_keys_ex; > > > > > > for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { > > > + int write_offset, write_len; > > > int offset = tkey->off; > > > int hoffset = 0; > > > - u32 *ptr, hdata; > > > - u32 val; > > > + u32 cur_val, val; > > > + u32 *ptr; > > > int rc; > > > > > > if (tkey_ex) { > > > @@ -433,13 +416,15 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > > > > if (tkey->offmask) { > > > u8 *d, _d; > > > + int at_offset; > > > > > > - if (!offset_valid(skb, hoffset + tkey->at)) { > > > + if (check_add_overflow(hoffset, (int)tkey->at, &at_offset) || > > > > There is no real reason for checking the add. > > All that matters is that offset_valid() and skb_header_pointer() are > > passed the same offset. > > Assigning it to a local would make that clearer. > > You are technically correct - but earlier discussion was I believe > that signed integer overflow as undefined behavior and those checks > are there to avoid that and logic bypasses. > > > > > > + !offset_valid(skb, at_offset, sizeof(_d))) { > > > pr_info_ratelimited("tc action pedit 'at' offset %d out of bounds\n", > > > hoffset + tkey->at); > > > goto bad; > > > } > > > - d = skb_header_pointer(skb, hoffset + tkey->at, > > > + d = skb_header_pointer(skb, at_offset, > > > sizeof(_d), &_d); > > > if (!d) > > > goto bad; > > > @@ -451,31 +436,51 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > } > > > } > > > > > > - if (!offset_valid(skb, hoffset + offset)) { > > > - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset); > > > + if (check_add_overflow(hoffset, offset, &write_offset)) { > > > + pr_info_ratelimited("tc action pedit offset overflow\n"); > > > goto bad; > > > } > > > > > > - ptr = skb_header_pointer(skb, hoffset + offset, > > > - sizeof(hdata), &hdata); > > > - if (!ptr) > > > + if (!offset_valid(skb, write_offset, sizeof(*ptr))) { > > > + pr_info_ratelimited("tc action pedit offset %d out of bounds\n", > > > + write_offset); > > > goto bad; > > > + } > > > > Again all that matters is that the same value is passed to offset_valid() > > and skb_header_pointer(). > > Same comment as earlier. > Note: we replaced skb_header_pointer() with direct memory access "ptr > = (u32 *)(skb->data + write_offset)" further down. > so extra validation of write_offset is justifiable, no? > > > > > + > > > + if (write_offset < 0) { > > > + if (skb_cow(skb, -write_offset)) > > > + goto bad; > > > + if (write_offset + (int)sizeof(*ptr) > 0) { > > > + if (skb_ensure_writable(skb, > > > + min_t(int, skb->len, > > > + write_offset + (int)sizeof(*ptr)))) > > > > That min isn't needed (same as below). > > Ok - yes, i see that. Same with the min(). Why did we think we needed > min/min_t() before? > > > > + goto bad; > > > + } > > > + } else { > > > + if (check_add_overflow(write_offset, (int)sizeof(*ptr), > > > + &write_len)) > > > > That can't fail because of the 'return offset <= (int)skb->len - len;' check. > > True. > > > > > > + goto bad; > > > + if (skb_ensure_writable(skb, min_t(int, skb->len, > > > + write_len))) > > > > Similarly that min_t() will return 'write_len'. > > But is that even correct? > > The code wants to write 4 bytes at skb->data + write_offset. > > I can't see where the offset is used. > > All the overflow tests have made the logic far too hard to follow. > > > > I think that whole lot can just be: > > hoffset += offset; > > if (offset_valid(skb, hoffset, sizeof(hdata))) > > goto bad; > > > > /* Ensure any needed headroom is writeable */ > > if (hoffset < 0 && skb_cow(skb, -hoffset) > > goto bad; > > /* Ensure the skb is writeable to the end of the area to be written. > > * The data is pulled into the skb header. */ > > if ((hoffset >= 0 || hoffset + (int)sizeof(hdata) > 0) && > > skb_ensure_writable(skb, hoffset + (int)sizeof(hdata)) > > goto bad; > > The call to offset_valid() does all the other checks. > > (The hoffset >= 0 check is the inverse of the earlier check and will > > be optimised away - skipping the second second test which is then > > always true.) > > Yes, that code is more readable. > > > If you really want a warning message, put a generic one on 'bad;'. > > > > > + goto bad; > > > + } > > > + > > > + ptr = (u32 *)(skb->data + write_offset); > > > + cur_val = get_unaligned(ptr); > > > > Given you are doing an unaligned read you could remove the check in the > > 'offmask' path that the final offset is a multiple of 4. > > Nothing checked that tkey->off is a multiple of 4, and I suspect that > > is user-specifiable? > > The only reason i will keep that check is because we have always > checked for the 32b alignment from an API perspective. I am less > worried about than potentially breaking something in the hardware > offload path when a user passes unaligned offsets down given that > these code paths (towards hardware) have been well vetted. > > I am trying to get this security fix in. Do you see anything in what > you stated as a show stopper? IOW introducing regression or plain > wrong? > If not, I will wait for the AI review or if someone finds an issue > which breaks something - and if it is worth going back then i will > consider your suggestions as part of the next update. > Does that sound reasonable? > Something like attached (not compile tested) basically if i didnt miss what else you suggested. cheers, jamal > cheers, > jamal > > > I'm also not entirely that the mac_header is aligned. > > > > -- David > > > > > /* just do it, baby */ > > > switch (cmd) { > > > case TCA_PEDIT_KEY_EX_CMD_SET: > > > val = tkey->val; > > > break; > > > case TCA_PEDIT_KEY_EX_CMD_ADD: > > > - val = (*ptr + tkey->val) & ~tkey->mask; > > > + val = (cur_val + tkey->val) & ~tkey->mask; > > > break; > > > default: > > > pr_info_ratelimited("tc action pedit bad command (%d)\n", cmd); > > > goto bad; > > > } > > > > > > - *ptr = ((*ptr & tkey->mask) ^ val); > > > - if (ptr == &hdata) > > > - skb_store_bits(skb, hoffset + offset, ptr, 4); > > > + put_unaligned((cur_val & tkey->mask) ^ val, ptr); > > > } > > > > > > goto done; > > [-- Attachment #2: patchlet-david.laight --] [-- Type: application/octet-stream, Size: 1315 bytes --] diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index bd3b1da3cd63..e7da01804891 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -394,7 +394,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, tkey_ex = parms->tcfp_keys_ex; for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { - int write_offset, write_len; + int write_offset; int offset = tkey->off; int hoffset = 0; u32 cur_val, val; @@ -447,23 +447,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, goto bad; } - if (write_offset < 0) { - if (skb_cow(skb, -write_offset)) - goto bad; - if (write_offset + (int)sizeof(*ptr) > 0) { - if (skb_ensure_writable(skb, - min_t(int, skb->len, - write_offset + (int)sizeof(*ptr)))) - goto bad; - } - } else { - if (check_add_overflow(write_offset, (int)sizeof(*ptr), - &write_len)) - goto bad; - if (skb_ensure_writable(skb, min_t(int, skb->len, - write_len))) - goto bad; - } + if (write_offset < 0 && skb_cow(skb, -write_offset)) + goto bad; + + if ((write_offset >= 0 || write_offset + (int)sizeof(*ptr) > 0) && + skb_ensure_writable(skb, write_offset + (int)sizeof(*ptr))) + goto bad; ptr = (u32 *)(skb->data + write_offset); cur_val = get_unaligned(ptr); ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption 2026-05-31 17:25 ` Jamal Hadi Salim 2026-05-31 17:28 ` Jamal Hadi Salim @ 2026-05-31 18:42 ` David Laight 2026-05-31 18:52 ` David Laight 1 sibling, 1 reply; 6+ messages in thread From: David Laight @ 2026-05-31 18:42 UTC (permalink / raw) To: Jamal Hadi Salim Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor, yimingqian591, keenanat2000, 2045gemini, rollkingzzc, toke, dcaratti, security, linux-kernel, stable, Rajat Gupta On Sun, 31 May 2026 13:25:36 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote: > On Sun, May 31, 2026 at 12:15 PM David Laight > <david.laight.linux@gmail.com> wrote: > > > > On Sun, 31 May 2026 08:32:21 -0400 > > Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > > > > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > > > > > > tcf_pedit_act() computes the COW range for skb_ensure_writable() > > > once before the key loop using tcfp_off_max_hint, but the hint does > > > not account for the runtime header offset added by typed keys. This > > > can leave part of the write region un-COW'd. > > > > > > Fix by moving skb_ensure_writable() inside the per-key loop where > > > the actual write offset is known, and add overflow checking on the > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits > > > at ingress), use skb_cow() to COW the headroom instead. Guard > > > offset_valid() against INT_MIN, where negation is undefined. > > > > This code has all got out of hand somewhere. > > I've only been looking at offset_valid() code. > > > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") > > > Reported-by: Yiming Qian <yimingqian591@gmail.com> > > > Reported-by: Keenan Dong <keenanat2000@gmail.com> > > > Reported-by: Han Guidong <2045gemini@gmail.com> > > > Reported-by: Zhang Cen <rollkingzzc@gmail.com> > > > Reviewed-by: Han Guidong <2045gemini@gmail.com> > > > Tested-by: Han Guidong <2045gemini@gmail.com> > > > Reviewed-by: Davide Caratti <dcaratti@redhat.com> > > > Tested-by: Davide Caratti <dcaratti@redhat.com> > > > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> > > > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> > > > Reviewed-by: Victor Nogueira <victor@mojatatu.com> > > > Tested-by: Victor Nogueira <victor@mojatatu.com> > > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> > > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> > > > --- > > > v4->v5: > > > 1) Dead code removal obsoleted after removing tcfp_off_max_hint as identified > > > by sashiko-claude[1]. Remove dead code updating "cur". > > > 2) Addition of hoffset at tkey->at promotes hoffset to unsigned int, > > > wraps on overflow. Claim made by both sashiko-claude[1] and sashiko-gemini[2] > > > I think this is far fetched but possible. Cast tkey->at to int. > > > 3) Signedness mismatch in min() macro may potentially cause build issues. > > > Claim made by both sashiko-claude[1] and sashiko-gemini[2]. > > > Go back to min_t() for now. David L. can make a better proposal later.. > > > > > > [1]https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > > > [2]https://sashiko.dev/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com > > > --- > > > include/net/tc_act/tc_pedit.h | 1 - > > > net/sched/act_pedit.c | 77 +++++++++++++++++++---------------- > > > 2 files changed, 41 insertions(+), 37 deletions(-) > > > > > > diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h > > > index f58ee15cd858..cb7b82f2cbc7 100644 > > > --- a/include/net/tc_act/tc_pedit.h > > > +++ b/include/net/tc_act/tc_pedit.h > > > @@ -15,7 +15,6 @@ struct tcf_pedit_parms { > > > struct tc_pedit_key *tcfp_keys; > > > struct tcf_pedit_key_ex *tcfp_keys_ex; > > > int action; > > > - u32 tcfp_off_max_hint; > > > unsigned char tcfp_nkeys; > > > unsigned char tcfp_flags; > > > struct rcu_head rcu; > > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c > > > index bc20f08a2789..bd3b1da3cd63 100644 > > > --- a/net/sched/act_pedit.c > > > +++ b/net/sched/act_pedit.c > > > @@ -16,6 +16,8 @@ > > > #include <linux/ip.h> > > > #include <linux/ipv6.h> > > > #include <linux/slab.h> > > > +#include <linux/overflow.h> > > > +#include <linux/unaligned.h> > > > #include <net/ipv6.h> > > > #include <net/netlink.h> > > > #include <net/pkt_sched.h> > > > @@ -242,7 +244,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > > > goto out_free_ex; > > > } > > > > > > - nparms->tcfp_off_max_hint = 0; > > > nparms->tcfp_flags = parm->flags; > > > nparms->tcfp_nkeys = parm->nkeys; > > > > > > @@ -268,14 +269,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, > > > BITS_PER_TYPE(int) - 1, > > > nparms->tcfp_keys[i].shift); > > > > > > - /* The AT option can read a single byte, we can bound the actual > > > - * value with uchar max. > > > - */ > > > - cur += (0xff & offmask) >> nparms->tcfp_keys[i].shift; > > > - > > > - /* Each key touches 4 bytes starting from the computed offset */ > > > - nparms->tcfp_off_max_hint = > > > - max(nparms->tcfp_off_max_hint, cur + 4); > > > } > > > > > > p = to_pedit(*a); > > > @@ -318,15 +311,12 @@ static void tcf_pedit_cleanup(struct tc_action *a) > > > call_rcu(&parms->rcu, tcf_pedit_cleanup_rcu); > > > } > > > > > > -static bool offset_valid(struct sk_buff *skb, int offset) > > > +static bool offset_valid(struct sk_buff *skb, int offset, int len) > > > { > > > - if (offset > 0 && offset > skb->len) > > > - return false; > > > - > > > - if (offset < 0 && -offset > skb_headroom(skb)) > > > + if (offset < -(int)skb_headroom(skb)) > > > return false; > > > > > > - return true; > > > + return offset <= (int)skb->len - len; > > > } > > > > > > static int pedit_l4_skb_offset(struct sk_buff *skb, int *hoffset, const int header_type) > > > @@ -393,18 +383,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > struct tcf_pedit_key_ex *tkey_ex; > > > struct tcf_pedit_parms *parms; > > > struct tc_pedit_key *tkey; > > > - u32 max_offset; > > > int i; > > > > > > parms = rcu_dereference_bh(p->parms); > > > > > > - max_offset = (skb_transport_header_was_set(skb) ? > > > - skb_transport_offset(skb) : > > > - skb_network_offset(skb)) + > > > - parms->tcfp_off_max_hint; > > > - if (skb_ensure_writable(skb, min(skb->len, max_offset))) > > > - goto done; > > > - > > > tcf_lastuse_update(&p->tcf_tm); > > > tcf_action_update_bstats(&p->common, skb); > > > > > > @@ -412,10 +394,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > tkey_ex = parms->tcfp_keys_ex; > > > > > > for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) { > > > + int write_offset, write_len; > > > int offset = tkey->off; > > > int hoffset = 0; > > > - u32 *ptr, hdata; > > > - u32 val; > > > + u32 cur_val, val; > > > + u32 *ptr; > > > int rc; > > > > > > if (tkey_ex) { > > > @@ -433,13 +416,15 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > > > > if (tkey->offmask) { > > > u8 *d, _d; > > > + int at_offset; > > > > > > - if (!offset_valid(skb, hoffset + tkey->at)) { > > > + if (check_add_overflow(hoffset, (int)tkey->at, &at_offset) || > > > > There is no real reason for checking the add. > > All that matters is that offset_valid() and skb_header_pointer() are > > passed the same offset. > > Assigning it to a local would make that clearer. > > You are technically correct - but earlier discussion was I believe > that signed integer overflow as undefined behavior and those checks > are there to avoid that and logic bypasses. The kernel is compiled with -Wno-strict-overflow so that integers wrap the way you might expect (and hope). A lot of code relies on that. So not only is signed integer overflow going to wrap (rather than saturate, fault, add 0 (a cobol interpreter I once used did that - annoying to debug), return a random value) the compiler is also not allowed to assume it doesn't wrap. If you are feeling really pedantic: at_offset = hoffeset + key->at; OPTIMZER_HIDE_VAR(at_offset); will ensure that the value that is tested is the value that is used. > > > > > > + !offset_valid(skb, at_offset, sizeof(_d))) { > > > pr_info_ratelimited("tc action pedit 'at' offset %d out of bounds\n", > > > hoffset + tkey->at); > > > goto bad; > > > } > > > - d = skb_header_pointer(skb, hoffset + tkey->at, > > > + d = skb_header_pointer(skb, at_offset, > > > sizeof(_d), &_d); > > > if (!d) > > > goto bad; > > > @@ -451,31 +436,51 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb, > > > } > > > } > > > > > > - if (!offset_valid(skb, hoffset + offset)) { > > > - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset); > > > + if (check_add_overflow(hoffset, offset, &write_offset)) { > > > + pr_info_ratelimited("tc action pedit offset overflow\n"); > > > goto bad; > > > } > > > > > > - ptr = skb_header_pointer(skb, hoffset + offset, > > > - sizeof(hdata), &hdata); > > > - if (!ptr) > > > + if (!offset_valid(skb, write_offset, sizeof(*ptr))) { > > > + pr_info_ratelimited("tc action pedit offset %d out of bounds\n", > > > + write_offset); > > > goto bad; > > > + } > > > > Again all that matters is that the same value is passed to offset_valid() > > and skb_header_pointer(). > > Same comment as earlier. > Note: we replaced skb_header_pointer() with direct memory access "ptr > = (u32 *)(skb->data + write_offset)" further down. > so extra validation of write_offset is justifiable, no? AFACT offset_valid() does all the required checks. I did initially miss that skb_ensure_writable() has a side effect of pulling all the data into the header. I don't 100% know all the forms an skb can have, I have used (and designed) too many buffer schemes for my brain to just know the answers. I don't see an obvious reason why it shouldn't be valid to write into data that isn't in the header. > > > > > + > > > + if (write_offset < 0) { > > > + if (skb_cow(skb, -write_offset)) > > > + goto bad; > > > + if (write_offset + (int)sizeof(*ptr) > 0) { > > > + if (skb_ensure_writable(skb, > > > + min_t(int, skb->len, > > > + write_offset + (int)sizeof(*ptr)))) > > > > That min isn't needed (same as below). > > Ok - yes, i see that. Same with the min(). Why did we think we needed > min/min_t() before? > > > > + goto bad; > > > + } > > > + } else { > > > + if (check_add_overflow(write_offset, (int)sizeof(*ptr), > > > + &write_len)) > > > > That can't fail because of the 'return offset <= (int)skb->len - len;' check. > > True. > > > > > > + goto bad; > > > + if (skb_ensure_writable(skb, min_t(int, skb->len, > > > + write_len))) You've completely missed that the above has to include the write_offset. > > > > Similarly that min_t() will return 'write_len'. > > But is that even correct? > > The code wants to write 4 bytes at skb->data + write_offset. > > I can't see where the offset is used. > > All the overflow tests have made the logic far too hard to follow. > > > > I think that whole lot can just be: > > hoffset += offset; > > if (offset_valid(skb, hoffset, sizeof(hdata))) > > goto bad; > > > > /* Ensure any needed headroom is writeable */ > > if (hoffset < 0 && skb_cow(skb, -hoffset) > > goto bad; > > /* Ensure the skb is writeable to the end of the area to be written. > > * The data is pulled into the skb header. */ > > if ((hoffset >= 0 || hoffset + (int)sizeof(hdata) > 0) && > > skb_ensure_writable(skb, hoffset + (int)sizeof(hdata)) > > goto bad; > > The call to offset_valid() does all the other checks. > > (The hoffset >= 0 check is the inverse of the earlier check and will > > be optimised away - skipping the second second test which is then > > always true.) > > Yes, that code is more readable. > > > If you really want a warning message, put a generic one on 'bad;'. > > > > > + goto bad; > > > + } > > > + > > > + ptr = (u32 *)(skb->data + write_offset); > > > + cur_val = get_unaligned(ptr); > > > > Given you are doing an unaligned read you could remove the check in the > > 'offmask' path that the final offset is a multiple of 4. > > Nothing checked that tkey->off is a multiple of 4, and I suspect that > > is user-specifiable? > > The only reason i will keep that check is because we have always > checked for the 32b alignment from an API perspective. I am less > worried about than potentially breaking something in the hardware > offload path when a user passes unaligned offsets down given that > these code paths (towards hardware) have been well vetted. > > I am trying to get this security fix in. Do you see anything in what > you stated as a show stopper? IOW introducing regression or plain > wrong? One thing I'm not sure about is whether applications might be setting a mask in order to update a single byte. If that byte is one of the last three of the skb data then the RMW will read/write bytes beyond the end. That used to be ok, but will now be rejected. -- David > If not, I will wait for the AI review or if someone finds an issue > which breaks something - and if it is worth going back then i will > consider your suggestions as part of the next update. > Does that sound reasonable? > > cheers, > jamal > > > I'm also not entirely that the mac_header is aligned. > > > > -- David > > > > > /* just do it, baby */ > > > switch (cmd) { > > > case TCA_PEDIT_KEY_EX_CMD_SET: > > > val = tkey->val; > > > break; > > > case TCA_PEDIT_KEY_EX_CMD_ADD: > > > - val = (*ptr + tkey->val) & ~tkey->mask; > > > + val = (cur_val + tkey->val) & ~tkey->mask; > > > break; > > > default: > > > pr_info_ratelimited("tc action pedit bad command (%d)\n", cmd); > > > goto bad; > > > } > > > > > > - *ptr = ((*ptr & tkey->mask) ^ val); > > > - if (ptr == &hdata) > > > - skb_store_bits(skb, hoffset + offset, ptr, 4); > > > + put_unaligned((cur_val & tkey->mask) ^ val, ptr); > > > } > > > > > > goto done; > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption 2026-05-31 18:42 ` David Laight @ 2026-05-31 18:52 ` David Laight 0 siblings, 0 replies; 6+ messages in thread From: David Laight @ 2026-05-31 18:52 UTC (permalink / raw) To: Jamal Hadi Salim Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, victor, yimingqian591, keenanat2000, 2045gemini, rollkingzzc, toke, dcaratti, security, linux-kernel, stable, Rajat Gupta On Sun, 31 May 2026 19:42:41 +0100 David Laight <david.laight.linux@gmail.com> wrote: > On Sun, 31 May 2026 13:25:36 -0400 > Jamal Hadi Salim <jhs@mojatatu.com> wrote: .... > > > > > > > + goto bad; > > > > + if (skb_ensure_writable(skb, min_t(int, skb->len, > > > > + write_len))) > > You've completely missed that the above has to include the write_offset. And I've missed that write_len isn't the length you are going to write :-( This version is unreadable.... A few one line comments can make all the difference. I put these two in for a reason. It took some effort to find out exactly what they did. > > /* Ensure any needed headroom is writeable */ > > if (hoffset < 0 && skb_cow(skb, -hoffset) > > goto bad; > > /* Ensure the skb is writeable to the end of the area to be written. > > * The data is pulled into the skb header. */ -- David ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-05-31 18:52 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-31 12:32 [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption Jamal Hadi Salim 2026-05-31 16:15 ` David Laight 2026-05-31 17:25 ` Jamal Hadi Salim 2026-05-31 17:28 ` Jamal Hadi Salim 2026-05-31 18:42 ` David Laight 2026-05-31 18:52 ` David Laight
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox