From: David Laight <david.laight.linux@gmail.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
jiri@resnulli.us, victor@mojatatu.com, yimingqian591@gmail.com,
keenanat2000@gmail.com, 2045gemini@gmail.com,
rollkingzzc@gmail.com, toke@redhat.com, dcaratti@redhat.com,
security@kernel.org, linux-kernel@vger.kernel.org,
stable@kernel.org, Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Subject: Re: [PATCH net v5 1/1] net/sched: fix pedit partial COW leading to page cache corruption
Date: Sun, 31 May 2026 17:15:34 +0100
Message-ID: <20260531171534.6aae381a@pumpkin>
In-Reply-To: <20260531123221.48732-1-jhs@mojatatu.com>
On Sun, 31 May 2026 08:32:21 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
This code has all got out of hand somewhere.
I've only been looking at offset_valid() code.
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Reviewed-by: Han Guidong <2045gemini@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Reviewed-by: Davide Caratti <dcaratti@redhat.com>
> Tested-by: Davide Caratti <dcaratti@redhat.com>
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Reviewed-by: Victor Nogueira <victor@mojatatu.com>
> Tested-by: Victor Nogueira <victor@mojatatu.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
> v4->v5:
> 1) Dead code removal obsoleted after removing tcfp_off_max_hint as identified
> by sashiko-claude[1]. Remove dead code updating "cur".
> 2) Addition of hoffset at tkey->at promotes hoffset to unsigned int,
> wraps on overflow. Claim made by both sashiko-claude[1] and sashiko-gemini[2]
> I think this is far fetched but possible. Cast tkey->at to int.
> 3) Signedness mismatch in min() macro may potentially cause build issues.
> Claim made by both sashiko-claude[1] and sashiko-gemini[2].
> Go back to min_t() for now. David L. can make a better proposal later..
>
> [1]https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com
> [2]https://sashiko.dev/#/patchset/20260530080643.1345521-1-jhs%40mojatatu.com
> ---
> include/net/tc_act/tc_pedit.h | 1 -
> net/sched/act_pedit.c | 77 +++++++++++++++++++----------------
> 2 files changed, 41 insertions(+), 37 deletions(-)
>
> diff --git a/include/net/tc_act/tc_pedit.h b/include/net/tc_act/tc_pedit.h
> index f58ee15cd858..cb7b82f2cbc7 100644
> --- a/include/net/tc_act/tc_pedit.h
> +++ b/include/net/tc_act/tc_pedit.h
> @@ -15,7 +15,6 @@ struct tcf_pedit_parms {
> struct tc_pedit_key *tcfp_keys;
> struct tcf_pedit_key_ex *tcfp_keys_ex;
> int action;
> - u32 tcfp_off_max_hint;
> unsigned char tcfp_nkeys;
> unsigned char tcfp_flags;
> struct rcu_head rcu;
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2789..bd3b1da3cd63 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,8 @@
> #include <linux/ip.h>
> #include <linux/ipv6.h>
> #include <linux/slab.h>
> +#include <linux/overflow.h>
> +#include <linux/unaligned.h>
> #include <net/ipv6.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> @@ -242,7 +244,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
> goto out_free_ex;
> }
>
> - nparms->tcfp_off_max_hint = 0;
> nparms->tcfp_flags = parm->flags;
> nparms->tcfp_nkeys = parm->nkeys;
>
> @@ -268,14 +269,6 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
> BITS_PER_TYPE(int) - 1,
> nparms->tcfp_keys[i].shift);
>
> - /* The AT option can read a single byte, we can bound the actual
> - * value with uchar max.
> - */
> - cur += (0xff & offmask) >> nparms->tcfp_keys[i].shift;
> -
> - /* Each key touches 4 bytes starting from the computed offset */
> - nparms->tcfp_off_max_hint =
> - max(nparms->tcfp_off_max_hint, cur + 4);
> }
>
> p = to_pedit(*a);
> @@ -318,15 +311,12 @@ static void tcf_pedit_cleanup(struct tc_action *a)
> call_rcu(&parms->rcu, tcf_pedit_cleanup_rcu);
> }
>
> -static bool offset_valid(struct sk_buff *skb, int offset)
> +static bool offset_valid(struct sk_buff *skb, int offset, int len)
> {
> - if (offset > 0 && offset > skb->len)
> - return false;
> -
> - if (offset < 0 && -offset > skb_headroom(skb))
> + if (offset < -(int)skb_headroom(skb))
> return false;
>
> - return true;
> + return offset <= (int)skb->len - len;
> }
>
> static int pedit_l4_skb_offset(struct sk_buff *skb, int *hoffset, const int header_type)
> @@ -393,18 +383,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> struct tcf_pedit_key_ex *tkey_ex;
> struct tcf_pedit_parms *parms;
> struct tc_pedit_key *tkey;
> - u32 max_offset;
> int i;
>
> parms = rcu_dereference_bh(p->parms);
>
> - max_offset = (skb_transport_header_was_set(skb) ?
> - skb_transport_offset(skb) :
> - skb_network_offset(skb)) +
> - parms->tcfp_off_max_hint;
> - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> - goto done;
> -
> tcf_lastuse_update(&p->tcf_tm);
> tcf_action_update_bstats(&p->common, skb);
>
> @@ -412,10 +394,11 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> tkey_ex = parms->tcfp_keys_ex;
>
> for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> + int write_offset, write_len;
> int offset = tkey->off;
> int hoffset = 0;
> - u32 *ptr, hdata;
> - u32 val;
> + u32 cur_val, val;
> + u32 *ptr;
> int rc;
>
> if (tkey_ex) {
> @@ -433,13 +416,15 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>
> if (tkey->offmask) {
> u8 *d, _d;
> + int at_offset;
>
> - if (!offset_valid(skb, hoffset + tkey->at)) {
> + if (check_add_overflow(hoffset, (int)tkey->at, &at_offset) ||
There is no real reason for checking the add.
All that matters is that offset_valid() and skb_header_pointer() are
passed the same offset.
Assigning it to a local would make that clearer.
> + !offset_valid(skb, at_offset, sizeof(_d))) {
> pr_info_ratelimited("tc action pedit 'at' offset %d out of bounds\n",
> hoffset + tkey->at);
> goto bad;
> }
> - d = skb_header_pointer(skb, hoffset + tkey->at,
> + d = skb_header_pointer(skb, at_offset,
> sizeof(_d), &_d);
> if (!d)
> goto bad;
> @@ -451,31 +436,51 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> }
> }
>
> - if (!offset_valid(skb, hoffset + offset)) {
> - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> + if (check_add_overflow(hoffset, offset, &write_offset)) {
> + pr_info_ratelimited("tc action pedit offset overflow\n");
> goto bad;
> }
>
> - ptr = skb_header_pointer(skb, hoffset + offset,
> - sizeof(hdata), &hdata);
> - if (!ptr)
> + if (!offset_valid(skb, write_offset, sizeof(*ptr))) {
> + pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> + write_offset);
> goto bad;
> + }
Again all that matters is that the same value is passed to offset_valid()
and skb_header_pointer().
> +
> + if (write_offset < 0) {
> + if (skb_cow(skb, -write_offset))
> + goto bad;
> + if (write_offset + (int)sizeof(*ptr) > 0) {
> + if (skb_ensure_writable(skb,
> + min_t(int, skb->len,
> + write_offset + (int)sizeof(*ptr))))
That min isn't needed (same as below).
> + goto bad;
> + }
> + } else {
> + if (check_add_overflow(write_offset, (int)sizeof(*ptr),
> + &write_len))
That can't fail because of the 'return offset <= (int)skb->len - len;' check.
> + goto bad;
> + if (skb_ensure_writable(skb, min_t(int, skb->len,
> + write_len)))
Similarly that min_t() will return 'write_len'.
But is that even correct?
The code wants to write 4 bytes at skb->data + write_offset.
I can't see where the offset is used.
All the overflow tests have made the logic far too hard to follow.
I think that whole lot can just be:
	hoffset += offset;
	if (!offset_valid(skb, hoffset, sizeof(hdata)))
		goto bad;
	/* Ensure any needed headroom is writeable */
	if (hoffset < 0 && skb_cow(skb, -hoffset))
		goto bad;
	/* Ensure the skb is writeable to the end of the area to be written.
	 * The data is pulled into the skb header. */
	if ((hoffset >= 0 || hoffset + (int)sizeof(hdata) > 0) &&
	    skb_ensure_writable(skb, hoffset + (int)sizeof(hdata)))
		goto bad;
The call to offset_valid() does all the other checks.
(The hoffset >= 0 check is the inverse of the earlier check and will
be optimised away - skipping the second test, which is then
always true.)
If you really want a warning message, put a generic one on 'bad:'.
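To make the argument about the redundant overflow check and min_t() concrete, here is a minimal userspace sketch of the bounds logic. It is a model, not kernel code: struct model_skb, model_offset_valid() and model_writable_len() are hypothetical names, and it assumes skb->len and the headroom both fit in an int.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sk_buff values the check reads. */
struct model_skb {
	unsigned int len;	/* bytes of packet data, like skb->len */
	unsigned int headroom;	/* bytes before the data, like skb_headroom() */
};

/*
 * Mirrors the patched offset_valid(): the range [offset, offset + len)
 * must start no earlier than -headroom and end no later than skb->len.
 * Assumes skb->len and headroom fit in an int.
 */
static bool model_offset_valid(const struct model_skb *skb, int offset, int len)
{
	if (offset < -(int)skb->headroom)
		return false;

	return offset <= (int)skb->len - len;
}

/*
 * Bytes that must be writable from the start of the data for a write of
 * 'len' bytes at 'offset'; 0 when the write lies entirely in the headroom.
 * Only meaningful after model_offset_valid() returned true, in which case
 * offset + len <= skb->len, so the add cannot overflow and no min() against
 * skb->len is needed.
 */
static int model_writable_len(int offset, int len)
{
	int end = offset + len;

	return end > 0 ? end : 0;
}
```

With len = 100 and headroom = 14, a 4-byte write is accepted for offsets -14 through 96 and rejected outside that range, including INT_MIN and INT_MAX, without any explicit overflow test.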
> + goto bad;
> + }
> +
> + ptr = (u32 *)(skb->data + write_offset);
> + cur_val = get_unaligned(ptr);
Given you are doing an unaligned read you could remove the check in the
'offmask' path that the final offset is a multiple of 4.
Nothing checked that tkey->off is a multiple of 4, and I suspect that
is user-specifiable?
I'm also not entirely sure that the mac_header is aligned.
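For illustration only, a userspace sketch of why the alignment of the final offset stops mattering once the access is done through unaligned helpers. load_u32(), store_u32() and edit_u32() are hypothetical stand-ins for get_unaligned(), put_unaligned() and the masked update in the patch; they use memcpy(), which is an assumption about how such helpers can be modelled, not a statement about the kernel implementation.

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for get_unaligned() on a 32-bit value. */
static uint32_t load_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Userspace stand-in for put_unaligned() on a 32-bit value. */
static void store_u32(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}

/*
 * SET-style update as in the patch: keep the bits selected by 'mask'
 * and XOR in 'val'. Works at any byte offset.
 */
static void edit_u32(void *p, uint32_t mask, uint32_t val)
{
	store_u32(p, (load_u32(p) & mask) ^ val);
}
```

Calling edit_u32(buf + 1, 0, 0xffffffff) on a byte buffer rewrites bytes 1 to 4 and leaves bytes 0 and 5 untouched, regardless of how buf itself is aligned.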
-- David
> /* just do it, baby */
> switch (cmd) {
> case TCA_PEDIT_KEY_EX_CMD_SET:
> val = tkey->val;
> break;
> case TCA_PEDIT_KEY_EX_CMD_ADD:
> - val = (*ptr + tkey->val) & ~tkey->mask;
> + val = (cur_val + tkey->val) & ~tkey->mask;
> break;
> default:
> pr_info_ratelimited("tc action pedit bad command (%d)\n", cmd);
> goto bad;
> }
>
> - *ptr = ((*ptr & tkey->mask) ^ val);
> - if (ptr == &hdata)
> - skb_store_bits(skb, hoffset + offset, ptr, 4);
> + put_unaligned((cur_val & tkey->mask) ^ val, ptr);
> }
>
> goto done;