* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-19 3:39 ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
@ 2026-05-19 11:18 ` Toke Høiland-Jørgensen
2026-05-19 15:10 ` Han Guidong
` (2 subsequent siblings)
3 siblings, 0 replies; 44+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-05-19 11:18 UTC (permalink / raw)
To: Rajat Gupta, netdev
Cc: davem, edumazet, kuba, pabeni, horms, jhs, jiri, yimingqian591,
keenanat2000, 2045gemini, rollkingzzc, Rajat Gupta
Rajat Gupta <rajat.gupta@oss.qualcomm.com> writes:
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Also applied this to -net and ran the TC pedit selftests and the
pedit_* scripts in net/forwarding, none of which turned up any
regressions, so:
Tested-by: Toke Høiland-Jørgensen <toke@kernel.org>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-19 3:39 ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
2026-05-19 11:18 ` Toke Høiland-Jørgensen
@ 2026-05-19 15:10 ` Han Guidong
2026-05-20 9:12 ` Jamal Hadi Salim
2026-05-20 9:23 ` Jamal Hadi Salim
2026-05-26 9:53 ` David Laight
3 siblings, 1 reply; 44+ messages in thread
From: Han Guidong @ 2026-05-19 15:10 UTC (permalink / raw)
To: Rajat Gupta
Cc: netdev, davem, edumazet, kuba, pabeni, horms, jhs, jiri,
yimingqian591, keenanat2000, rollkingzzc
On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
<rajat.gupta@oss.qualcomm.com> wrote:
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
Cc: stable@vger.kernel.org
I took a closer look at the code, and this patch is really addressing
two issues.
The negative-offset write into shared head/headroom appears to be from
8b796475fd78, while the shared-frag write, and thus possible
page-cache corruption, appears to be the later regression from
6c02568fd1ae. So it may be worth adding an additional Fixes tag for
6c02568fd1ae, and I think an explicit Cc: stable@vger.kernel.org would
also make sense here.
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
> net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> 1 file changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2..79921b8d8 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
> #include <linux/ip.h>
> #include <linux/ipv6.h>
> #include <linux/slab.h>
> +#include <linux/overflow.h>
> #include <net/ipv6.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> if (offset > 0 && offset > skb->len)
> return false;
>
> - if (offset < 0 && -offset > skb_headroom(skb))
> - return false;
> + if (offset < 0) {
> + if (offset == INT_MIN || -offset > skb_headroom(skb))
> + return false;
> + }
>
> return true;
> }
> @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> struct tcf_pedit_key_ex *tkey_ex;
> struct tcf_pedit_parms *parms;
> struct tc_pedit_key *tkey;
> - u32 max_offset;
> int i;
>
> parms = rcu_dereference_bh(p->parms);
>
> - max_offset = (skb_transport_header_was_set(skb) ?
> - skb_transport_offset(skb) :
> - skb_network_offset(skb)) +
> - parms->tcfp_off_max_hint;
> - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> - goto done;
> + /*
> + * If the skb has shared frags the user is likely using zero-copy
> + * (e.g. sendfile). Those page frags may point to page-cache pages;
> + * writing into them would silently corrupt the page cache.
> + * Linearize so pedit operates on a private copy.
> + * TL;DR: if you want zero-copy, don't use pedit.
> + */
> + if (skb_has_shared_frag(skb)) {
> + if (__skb_linearize(skb))
> + goto bad;
> + }
>
> tcf_lastuse_update(&p->tcf_tm);
> tcf_action_update_bstats(&p->common, skb);
> @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> tkey_ex = parms->tcfp_keys_ex;
>
> for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> + int write_offset, write_len;
> int offset = tkey->off;
> int hoffset = 0;
> u32 *ptr, hdata;
> @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> }
> }
>
> - if (!offset_valid(skb, hoffset + offset)) {
> - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> + if (unlikely(check_add_overflow(hoffset, offset,
> + &write_offset))) {
I also noticed two remaining unchecked additions, e.g. hoffset +
tkey->at and offset += (*d & offmask) >> shift. These look more like
wrap/type-semantics cleanup than a practical signed-overflow/UB issue
to me, so I am OK with this patch as-is.
> + pr_info_ratelimited("tc action pedit offset overflow\n");
> + goto bad;
> + }
> +
> + if (!offset_valid(skb, write_offset)) {
> + pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> + write_offset);
> goto bad;
> }
>
> - ptr = skb_header_pointer(skb, hoffset + offset,
> + if (write_offset < 0) {
There is also a small negative-offset 4-byte-write corner case, e.g.
an ETH-relative edit with hoffset=-14 and offset=12, giving a final
write_offset=-2; if the linear head after skb->data is only 1 byte,
that store spans -2..1. That seems unlikely in practice; fixing it
would be nice, but I would not hold this patch for it.
Reviewed-by: Han Guidong <2045gemini@gmail.com>
> + if (skb_cow(skb, -write_offset))
> + goto bad;
> + } else {
> + if (unlikely(check_add_overflow(write_offset,
> + (int)sizeof(hdata),
> + &write_len)))
> + goto bad;
> + if (skb_ensure_writable(skb, min_t(int, skb->len,
> + write_len)))
> + goto bad;
> + }
> +
> + ptr = skb_header_pointer(skb, write_offset,
> sizeof(hdata), &hdata);
> if (!ptr)
> goto bad;
> @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>
> *ptr = ((*ptr & tkey->mask) ^ val);
> if (ptr == &hdata)
> - skb_store_bits(skb, hoffset + offset, ptr, 4);
> + skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
> }
>
> goto done;
> --
> 2.51.2.windows.1
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-19 15:10 ` Han Guidong
@ 2026-05-20 9:12 ` Jamal Hadi Salim
2026-05-20 10:04 ` Han Guidong
2026-05-20 10:36 ` Han Guidong
0 siblings, 2 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20 9:12 UTC (permalink / raw)
To: Han Guidong
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, rollkingzzc
On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
>
> On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> <rajat.gupta@oss.qualcomm.com> wrote:
> >
> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > once before the key loop using tcfp_off_max_hint, but the hint does
> > not account for the runtime header offset added by typed keys. This
> > can leave part of the write region un-COW'd.
> >
> > Fix by moving skb_ensure_writable() inside the per-key loop where
> > the actual write offset is known, and add overflow checking on the
> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > at ingress), use skb_cow() to COW the headroom instead. Guard
> > offset_valid() against INT_MIN, where negation is undefined.
> >
> > Additionally, linearize skbs with shared frags upfront to prevent
> > silent data corruption when pedit operates on zero-copy pages
> > (e.g. from sendfile).
> >
> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
>
> Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> Cc: stable@vger.kernel.org
>
> I took a closer look at the code, and this patch is really addressing
> two issues.
>
> The negative-offset write into shared head/headroom appears to be from
> 8b796475fd78, while the shared-frag write, and thus possible
> page-cache corruption, appears to be the later regression from
> 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> 6c02568fd1ae,
Pushing it a little - but sure we should also point to that commit.
> and I think an explicit Cc: stable@vger.kernel.org would also make sense here.
Please study the rules.
> > Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > Reported-by: Han Guidong <2045gemini@gmail.com>
> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > Tested-by: Han Guidong <2045gemini@gmail.com>
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > ---
> > net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 41 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index bc20f08a2..79921b8d8 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -16,6 +16,7 @@
> > #include <linux/ip.h>
> > #include <linux/ipv6.h>
> > #include <linux/slab.h>
> > +#include <linux/overflow.h>
> > #include <net/ipv6.h>
> > #include <net/netlink.h>
> > #include <net/pkt_sched.h>
> > @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > if (offset > 0 && offset > skb->len)
> > return false;
> >
> > - if (offset < 0 && -offset > skb_headroom(skb))
> > - return false;
> > + if (offset < 0) {
> > + if (offset == INT_MIN || -offset > skb_headroom(skb))
> > + return false;
> > + }
> >
> > return true;
> > }
> > @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > struct tcf_pedit_key_ex *tkey_ex;
> > struct tcf_pedit_parms *parms;
> > struct tc_pedit_key *tkey;
> > - u32 max_offset;
> > int i;
> >
> > parms = rcu_dereference_bh(p->parms);
> >
> > - max_offset = (skb_transport_header_was_set(skb) ?
> > - skb_transport_offset(skb) :
> > - skb_network_offset(skb)) +
> > - parms->tcfp_off_max_hint;
> > - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > - goto done;
> > + /*
> > + * If the skb has shared frags the user is likely using zero-copy
> > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > + * writing into them would silently corrupt the page cache.
> > + * Linearize so pedit operates on a private copy.
> > + * TL;DR: if you want zero-copy, don't use pedit.
> > + */
> > + if (skb_has_shared_frag(skb)) {
> > + if (__skb_linearize(skb))
> > + goto bad;
> > + }
> >
> > tcf_lastuse_update(&p->tcf_tm);
> > tcf_action_update_bstats(&p->common, skb);
> > @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > tkey_ex = parms->tcfp_keys_ex;
> >
> > for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > + int write_offset, write_len;
> > int offset = tkey->off;
> > int hoffset = 0;
> > u32 *ptr, hdata;
> > @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > }
> > }
> >
> > - if (!offset_valid(skb, hoffset + offset)) {
> > - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > + if (unlikely(check_add_overflow(hoffset, offset,
> > + &write_offset))) {
>
> I also noticed two remaining unchecked additions, e.g. hoffset +
> tkey->at and offset += (*d & offmask) >> shift. These look more like
> wrap/type-semantics cleanup than a practical signed-overflow/UB issue
> to me, so I am OK with this patch as-is.
Did you get this from some AI?
None of these are possible. Both additions you mention are safe in
practice because any potential wrap around or overflow is caught
_before any memory access occurs_. Example see:
offset_valid() which implictly casts hoffset + key to signed int.
> > + pr_info_ratelimited("tc action pedit offset overflow\n");
> > + goto bad;
> > + }
> > +
> > + if (!offset_valid(skb, write_offset)) {
> > + pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > + write_offset);
> > goto bad;
> > }
> >
> > - ptr = skb_header_pointer(skb, hoffset + offset,
> > + if (write_offset < 0) {
>
> There is also a small negative-offset 4-byte-write corner case, e.g.
> an ETH-relative edit with hoffset=-14 and offset=12, giving a final
> write_offset=-2; if the linear head after skb->data is only 1 byte,
> that store spans -2..1. That seems unlikely in practice; fixing it
> would be nice, but I would not hold this patch for it.
>
Again smells AI generated. I spent a lot of time trying to determine
the practicality and validity of this (and your previous claim), only
to find that this whole claim is based on a view that a headroom of 1
byte is a valid scenario in practice. Is it theoretically possible
from the math? yes. Practically, I don't see how it makes any sense.
Can you describe a scenario where this would happen in practise?
> Reviewed-by: Han Guidong <2045gemini@gmail.com>
More like reviewed by my AI who goes by the name foobar.
cheers,
jamal
>
>
> > + if (skb_cow(skb, -write_offset))
> > + goto bad;
> > + } else {
> > + if (unlikely(check_add_overflow(write_offset,
> > + (int)sizeof(hdata),
> > + &write_len)))
> > + goto bad;
> > + if (skb_ensure_writable(skb, min_t(int, skb->len,
> > + write_len)))
> > + goto bad;
> > + }
> > +
> > + ptr = skb_header_pointer(skb, write_offset,
> > sizeof(hdata), &hdata);
> > if (!ptr)
> > goto bad;
> > @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> >
> > *ptr = ((*ptr & tkey->mask) ^ val);
> > if (ptr == &hdata)
> > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > + skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
> > }
> >
> > goto done;
> > --
> > 2.51.2.windows.1
> >
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-20 9:12 ` Jamal Hadi Salim
@ 2026-05-20 10:04 ` Han Guidong
2026-05-20 10:36 ` Han Guidong
1 sibling, 0 replies; 44+ messages in thread
From: Han Guidong @ 2026-05-20 10:04 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, rollkingzzc
On Wed, May 20, 2026 at 5:12 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> > <rajat.gupta@oss.qualcomm.com> wrote:
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> >
> > Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> > Cc: stable@vger.kernel.org
> >
> > I took a closer look at the code, and this patch is really addressing
> > two issues.
> >
> > The negative-offset write into shared head/headroom appears to be from
> > 8b796475fd78, while the shared-frag write, and thus possible
> > page-cache corruption, appears to be the later regression from
> > 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> > 6c02568fd1ae,
>
> Pushing it a little - but sure we should also point to that commit.
>
> > and I think an explicit Cc: stable@vger.kernel.org would also make sense here.
>
> Please study the rules.
I will review the rules. Sorry for the noise.
>
> > > Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > Tested-by: Han Guidong <2045gemini@gmail.com>
> > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > ---
> > > net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> > > 1 file changed, 41 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > index bc20f08a2..79921b8d8 100644
> > > --- a/net/sched/act_pedit.c
> > > +++ b/net/sched/act_pedit.c
> > > @@ -16,6 +16,7 @@
> > > #include <linux/ip.h>
> > > #include <linux/ipv6.h>
> > > #include <linux/slab.h>
> > > +#include <linux/overflow.h>
> > > #include <net/ipv6.h>
> > > #include <net/netlink.h>
> > > #include <net/pkt_sched.h>
> > > @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > > if (offset > 0 && offset > skb->len)
> > > return false;
> > >
> > > - if (offset < 0 && -offset > skb_headroom(skb))
> > > - return false;
> > > + if (offset < 0) {
> > > + if (offset == INT_MIN || -offset > skb_headroom(skb))
> > > + return false;
> > > + }
> > >
> > > return true;
> > > }
> > > @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > struct tcf_pedit_key_ex *tkey_ex;
> > > struct tcf_pedit_parms *parms;
> > > struct tc_pedit_key *tkey;
> > > - u32 max_offset;
> > > int i;
> > >
> > > parms = rcu_dereference_bh(p->parms);
> > >
> > > - max_offset = (skb_transport_header_was_set(skb) ?
> > > - skb_transport_offset(skb) :
> > > - skb_network_offset(skb)) +
> > > - parms->tcfp_off_max_hint;
> > > - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > > - goto done;
> > > + /*
> > > + * If the skb has shared frags the user is likely using zero-copy
> > > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > > + * writing into them would silently corrupt the page cache.
> > > + * Linearize so pedit operates on a private copy.
> > > + * TL;DR: if you want zero-copy, don't use pedit.
> > > + */
> > > + if (skb_has_shared_frag(skb)) {
> > > + if (__skb_linearize(skb))
> > > + goto bad;
> > > + }
> > >
> > > tcf_lastuse_update(&p->tcf_tm);
> > > tcf_action_update_bstats(&p->common, skb);
> > > @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > tkey_ex = parms->tcfp_keys_ex;
> > >
> > > for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > > + int write_offset, write_len;
> > > int offset = tkey->off;
> > > int hoffset = 0;
> > > u32 *ptr, hdata;
> > > @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > }
> > > }
> > >
> > > - if (!offset_valid(skb, hoffset + offset)) {
> > > - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > > + if (unlikely(check_add_overflow(hoffset, offset,
> > > + &write_offset))) {
> >
> > I also noticed two remaining unchecked additions, e.g. hoffset +
> > tkey->at and offset += (*d & offmask) >> shift. These look more like
> > wrap/type-semantics cleanup than a practical signed-overflow/UB issue
> > to me, so I am OK with this patch as-is.
>
> Did you get this from some AI?
> None of these are possible. Both additions you mention are safe in
> practice because any potential wrap around or overflow is caught
> _before any memory access occurs_. Example see:
> offset_valid() which implictly casts hoffset + key to signed int.
Not AI-generated. I was looking at the code and noticed that
tc_pedit_key fields are unsigned, but the results are stored in int
offset and int hoffset. I was just thinking about odd edge cases where
an unsigned-to-signed conversion results in a negative number. I agree
with you that offset_valid() catches this anyway, so it causes no
actual harm.
>
> > > + pr_info_ratelimited("tc action pedit offset overflow\n");
> > > + goto bad;
> > > + }
> > > +
> > > + if (!offset_valid(skb, write_offset)) {
> > > + pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > > + write_offset);
> > > goto bad;
> > > }
> > >
> > > - ptr = skb_header_pointer(skb, hoffset + offset,
> > > + if (write_offset < 0) {
> >
> > There is also a small negative-offset 4-byte-write corner case, e.g.
> > an ETH-relative edit with hoffset=-14 and offset=12, giving a final
> > write_offset=-2; if the linear head after skb->data is only 1 byte,
> > that store spans -2..1. That seems unlikely in practice; fixing it
> > would be nice, but I would not hold this patch for it.
> >
>
> Again smells AI generated. I spent a lot of time trying to determine
> the practicality and validity of this (and your previous claim), only
> to find that this whole claim is based on a view that a headroom of 1
> byte is a valid scenario in practice. Is it theoretically possible
> from the math? yes. Practically, I don't see how it makes any sense.
> Can you describe a scenario where this would happen in practise?
As I mentioned in the previous mail, I agree it is unlikely in
practice. For context, this was an edge case brought up by sashiko
regarding our patch. I wrote a test kernel module and verified that if
an skb is artificially constructed with exactly 1 byte of headroom,
skb_cow() is indeed insufficient. However, I also checked real-world
packets and the headroom is always plenty, so I agree it doesn't
happen in the real world.
>
> > Reviewed-by: Han Guidong <2045gemini@gmail.com>
>
> More like reviewed by my AI who goes by the name foobar.
I will work on improving my review quality going forward. Sorry again
for the noise.
Thanks.
>
> cheers,
> jamal
>
> >
> >
> > > + if (skb_cow(skb, -write_offset))
> > > + goto bad;
> > > + } else {
> > > + if (unlikely(check_add_overflow(write_offset,
> > > + (int)sizeof(hdata),
> > > + &write_len)))
> > > + goto bad;
> > > + if (skb_ensure_writable(skb, min_t(int, skb->len,
> > > + write_len)))
> > > + goto bad;
> > > + }
> > > +
> > > + ptr = skb_header_pointer(skb, write_offset,
> > > sizeof(hdata), &hdata);
> > > if (!ptr)
> > > goto bad;
> > > @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >
> > > *ptr = ((*ptr & tkey->mask) ^ val);
> > > if (ptr == &hdata)
> > > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > + skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
> > > }
> > >
> > > goto done;
> > > --
> > > 2.51.2.windows.1
> > >
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-20 9:12 ` Jamal Hadi Salim
2026-05-20 10:04 ` Han Guidong
@ 2026-05-20 10:36 ` Han Guidong
2026-05-20 11:40 ` Jamal Hadi Salim
1 sibling, 1 reply; 44+ messages in thread
From: Han Guidong @ 2026-05-20 10:36 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, rollkingzzc
On Wed, May 20, 2026 at 5:12 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> > <rajat.gupta@oss.qualcomm.com> wrote:
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> >
> > Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> > Cc: stable@vger.kernel.org
> >
> > I took a closer look at the code, and this patch is really addressing
> > two issues.
> >
> > The negative-offset write into shared head/headroom appears to be from
> > 8b796475fd78, while the shared-frag write, and thus possible
> > page-cache corruption, appears to be the later regression from
> > 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> > 6c02568fd1ae,
>
> Pushing it a little - but sure we should also point to that commit.
>
> > and I think an explicit Cc: stable@vger.kernel.org would also make sense here.
>
> Please study the rules.
Hi Jamal,
Apologies, I'm still a newbie learning the process. I checked the docs
and saw commit dbbe7c962c3a ("docs: networking: drop special stable
handling"), but I'm still a bit confused.
Did the rule change to allow explicit "Cc: stable@vger.kernel.org"
tags for netdev?
If I completely missed the correct documentation, I would be very
grateful if you could point me to it.
Thanks for your patience!
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-20 10:36 ` Han Guidong
@ 2026-05-20 11:40 ` Jamal Hadi Salim
0 siblings, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20 11:40 UTC (permalink / raw)
To: Han Guidong
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, rollkingzzc
On Wed, May 20, 2026 at 6:37 AM Han Guidong <2045gemini@gmail.com> wrote:
>
> On Wed, May 20, 2026 at 5:12 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Tue, May 19, 2026 at 11:11 AM Han Guidong <2045gemini@gmail.com> wrote:
> > >
> > > On Tue, May 19, 2026 at 11:42 AM Rajat Gupta
> > > <rajat.gupta@oss.qualcomm.com> wrote:
> > > >
> > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > not account for the runtime header offset added by typed keys. This
> > > > can leave part of the write region un-COW'd.
> > > >
> > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > the actual write offset is known, and add overflow checking on the
> > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > offset_valid() against INT_MIN, where negation is undefined.
> > > >
> > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > silent data corruption when pedit operates on zero-copy pages
> > > > (e.g. from sendfile).
> > > >
> > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > >
> > > Fixes: 6c02568fd1ae ("net/sched: act_pedit: Parse L3 Header for L4 offset")
> > > Cc: stable@vger.kernel.org
> > >
> > > I took a closer look at the code, and this patch is really addressing
> > > two issues.
> > >
> > > The negative-offset write into shared head/headroom appears to be from
> > > 8b796475fd78, while the shared-frag write, and thus possible
> > > page-cache corruption, appears to be the later regression from
> > > 6c02568fd1ae. So it may be worth adding an additional Fixes tag for
> > > 6c02568fd1ae,
> >
> > Pushing it a little - but sure we should also point to that commit.
> >
> > > and I think an explicit Cc: stable@vger.kernel.org would also make sense here.
> >
> > Please study the rules.
>
> Hi Jamal,
>
> Apologies, I'm still a newbie learning the process. I checked the docs
> and saw commit dbbe7c962c3a ("docs: networking: drop special stable
> handling"), but I'm still a bit confused.
>
> Did the rule change to allow explicit "Cc: stable@vger.kernel.org"
> tags for netdev?
>
> If I completely missed the correct documentation, I would be very
> grateful if you could point me to it.
>
This is a security fix. AFAIK, It does not follow the stable rules.
It goes via the net tree (then propagated via net-next) before making
it to linux main. It gets picked up by the stable tree.
But: You are correct that Ccing stable is useful when the vulnerberity
hasnt been published. I apologize for being harsh - too many AI emails
claiming bugs on which i spend too much time and you bore the brunt of
it. So we should Cc them.
cheers,
jamal
> Thanks for your patience!
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-19 3:39 ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
2026-05-19 11:18 ` Toke Høiland-Jørgensen
2026-05-19 15:10 ` Han Guidong
@ 2026-05-20 9:23 ` Jamal Hadi Salim
2026-05-20 20:00 ` Jamal Hadi Salim
2026-05-26 9:53 ` David Laight
3 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20 9:23 UTC (permalink / raw)
To: Rajat Gupta
Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
keenanat2000, 2045gemini, rollkingzzc
On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
<rajat.gupta@oss.qualcomm.com> wrote:
>
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
Sashiko makes the claim that:
"
....
.....
If the payload part extending past offset 0 resides in a shared
fragment for a cloned skb, wouldn't skb_store_bits() write the
positive offset portion directly into the shared page, causing a data
corruption regression?
"
It missed the earlier code (before the quoted code is hit) which is
very well documented!
/*
* If the skb has shared frags the user is likely using zero-copy
* (e.g. sendfile). Those page frags may point to page-cache pages;
* writing into them would silently corrupt the page cache.
* Linearize so pedit operates on a private copy.
* TL;DR: if you want zero-copy, don't use pedit.
*/
if (skb_has_shared_frag(skb)) {
if (__skb_linearize(skb))
goto bad;
}
TLDR:
If the skb has shared frags, it is linearized upfront using
__skb_linearize() which means any risk of modifying a shared fragment
is non-existent.
cheers,
jamal
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
> net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> 1 file changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2..79921b8d8 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
> #include <linux/ip.h>
> #include <linux/ipv6.h>
> #include <linux/slab.h>
> +#include <linux/overflow.h>
> #include <net/ipv6.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> if (offset > 0 && offset > skb->len)
> return false;
>
> - if (offset < 0 && -offset > skb_headroom(skb))
> - return false;
> + if (offset < 0) {
> + if (offset == INT_MIN || -offset > skb_headroom(skb))
> + return false;
> + }
>
> return true;
> }
> @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> struct tcf_pedit_key_ex *tkey_ex;
> struct tcf_pedit_parms *parms;
> struct tc_pedit_key *tkey;
> - u32 max_offset;
> int i;
>
> parms = rcu_dereference_bh(p->parms);
>
> - max_offset = (skb_transport_header_was_set(skb) ?
> - skb_transport_offset(skb) :
> - skb_network_offset(skb)) +
> - parms->tcfp_off_max_hint;
> - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> - goto done;
> + /*
> + * If the skb has shared frags the user is likely using zero-copy
> + * (e.g. sendfile). Those page frags may point to page-cache pages;
> + * writing into them would silently corrupt the page cache.
> + * Linearize so pedit operates on a private copy.
> + * TL;DR: if you want zero-copy, don't use pedit.
> + */
> + if (skb_has_shared_frag(skb)) {
> + if (__skb_linearize(skb))
> + goto bad;
> + }
>
> tcf_lastuse_update(&p->tcf_tm);
> tcf_action_update_bstats(&p->common, skb);
> @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> tkey_ex = parms->tcfp_keys_ex;
>
> for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> + int write_offset, write_len;
> int offset = tkey->off;
> int hoffset = 0;
> u32 *ptr, hdata;
> @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> }
> }
>
> - if (!offset_valid(skb, hoffset + offset)) {
> - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> + if (unlikely(check_add_overflow(hoffset, offset,
> + &write_offset))) {
> + pr_info_ratelimited("tc action pedit offset overflow\n");
> + goto bad;
> + }
> +
> + if (!offset_valid(skb, write_offset)) {
> + pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> + write_offset);
> goto bad;
> }
>
> - ptr = skb_header_pointer(skb, hoffset + offset,
> + if (write_offset < 0) {
> + if (skb_cow(skb, -write_offset))
> + goto bad;
> + } else {
> + if (unlikely(check_add_overflow(write_offset,
> + (int)sizeof(hdata),
> + &write_len)))
> + goto bad;
> + if (skb_ensure_writable(skb, min_t(int, skb->len,
> + write_len)))
> + goto bad;
> + }
> +
> + ptr = skb_header_pointer(skb, write_offset,
> sizeof(hdata), &hdata);
> if (!ptr)
> goto bad;
> @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>
> *ptr = ((*ptr & tkey->mask) ^ val);
> if (ptr == &hdata)
> - skb_store_bits(skb, hoffset + offset, ptr, 4);
> + skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
> }
>
> goto done;
> --
> 2.51.2.windows.1
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-20 9:23 ` Jamal Hadi Salim
@ 2026-05-20 20:00 ` Jamal Hadi Salim
2026-05-21 9:53 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-20 20:00 UTC (permalink / raw)
To: Rajat Gupta
Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
keenanat2000, 2045gemini, rollkingzzc
On Wed, May 20, 2026 at 5:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
> <rajat.gupta@oss.qualcomm.com> wrote:
> >
> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > once before the key loop using tcfp_off_max_hint, but the hint does
> > not account for the runtime header offset added by typed keys. This
> > can leave part of the write region un-COW'd.
> >
> > Fix by moving skb_ensure_writable() inside the per-key loop where
> > the actual write offset is known, and add overflow checking on the
> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > at ingress), use skb_cow() to COW the headroom instead. Guard
> > offset_valid() against INT_MIN, where negation is undefined.
> >
> > Additionally, linearize skbs with shared frags upfront to prevent
> > silent data corruption when pedit operates on zero-copy pages
> > (e.g. from sendfile).
> >
>
> Sashiko makes the claim that:
> "
> ....
> .....
> If the payload part extending past offset 0 resides in a shared
> fragment for a cloned skb, wouldn't skb_store_bits() write the
> positive offset portion directly into the shared page, causing a data
> corruption regression?
> "
>
> It missed the earlier code (before the quoted code is hit) which is
> very well documented!
> /*
> * If the skb has shared frags the user is likely using zero-copy
> * (e.g. sendfile). Those page frags may point to page-cache pages;
> * writing into them would silently corrupt the page cache.
> * Linearize so pedit operates on a private copy.
> * TL;DR: if you want zero-copy, don't use pedit.
> */
> if (skb_has_shared_frag(skb)) {
> if (__skb_linearize(skb))
> goto bad;
> }
>
> TLDR:
> If the skb has shared frags, it is linearized upfront using
> __skb_linearize() which means any risk of modifying a shared fragment
> is non-existent.
It seems there's a second sashiko ive been made aware of - these
things are just multiplying ;->
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
Both points it makes are valid. Dont have time right now.
And infact looking at the second point sashiko 2 made I realize I was
only paying attention to externally shared fragments which the poc
used.
So probably the sashiko1 had a point as well. Anyways i dont have time
- will respond by am tommorow.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-20 20:00 ` Jamal Hadi Salim
@ 2026-05-21 9:53 ` Jamal Hadi Salim
2026-05-21 10:15 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-21 9:53 UTC (permalink / raw)
To: Rajat Gupta
Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
keenanat2000, 2045gemini, rollkingzzc
::
On Wed, May 20, 2026 at 4:00 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Wed, May 20, 2026 at 5:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
> > <rajat.gupta@oss.qualcomm.com> wrote:
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> >
> > Sashiko makes the claim that:
> > "
> > ....
> > .....
> > If the payload part extending past offset 0 resides in a shared
> > fragment for a cloned skb, wouldn't skb_store_bits() write the
> > positive offset portion directly into the shared page, causing a data
> > corruption regression?
> > "
> >
> > It missed the earlier code (before the quoted code is hit) which is
> > very well documented!
> > /*
> > * If the skb has shared frags the user is likely using zero-copy
> > * (e.g. sendfile). Those page frags may point to page-cache pages;
> > * writing into them would silently corrupt the page cache.
> > * Linearize so pedit operates on a private copy.
> > * TL;DR: if you want zero-copy, don't use pedit.
> > */
> > if (skb_has_shared_frag(skb)) {
> > if (__skb_linearize(skb))
> > goto bad;
> > }
> >
> > TLDR:
> > If the skb has shared frags, it is linearized upfront using
> > __skb_linearize() which means any risk of modifying a shared fragment
> > is non-existent.
>
> It seems there's a second sashiko ive been made aware of - these
> things are just multiplying ;->
> https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
>
> Both points it makes are valid. Dont have time right now.
> And infact looking at the second point sashiko 2 made I realize I was
> only paying attention to externally shared fragments which the poc
> used.
> So probably the sashiko1 had a point as well. Anyways i dont have time
> - will respond by am tommorow.
>
Sashiko2, claim 1 (rephrased for brevity):
"With the pre-loop skb_ensure_writable() removed, tcfp_off_max_hint
and related use is no longer needed.
I was going to say "separate patch" but since the cause is this patch
that removed the use, at the expense of a much bigger patch we should
remove it (delete it from include/net/tc_act/tc_pedit.h::struct
tcf_pedit_parms and net/sched/act_pedit.c - as described in sashiko2
claim.
Sashiko2, claim 2 (summarized):
"..skb_store_bits() will memcpy() the tail bytes of that 4-byte write
onto frag pages that are still referenced by the original clone, so a
clone held by a tap, AF_PACKET listener, or mirroring path would see
the modified bytes on its shared pages."
This is the same claim as sashiko1 but sashiko2 gave a much more
convincing description ;->
skb_has_shared_frag() is only true if the frags are flagged as
SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
frags from eg a driver on ingress and that skb gets cloned with frags
we won't catch it.
One approach is to do an if (skb_has_any_shared_frags(skb)) and then
do a skb_linearize_cow() but that sounds like overkill.
Another which will make the patch even uglier (but less expensive) is
to add an extra check insde the patch's "if (write_offset < 0)"
to do: if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-21 9:53 ` Jamal Hadi Salim
@ 2026-05-21 10:15 ` Jamal Hadi Salim
2026-05-21 14:35 ` Jakub Kicinski
2026-05-22 7:49 ` Han Guidong
0 siblings, 2 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-21 10:15 UTC (permalink / raw)
To: Rajat Gupta
Cc: netdev, davem, edumazet, kuba, pabeni, horms, jiri, yimingqian591,
keenanat2000, 2045gemini, rollkingzzc
[-- Attachment #1: Type: text/plain, Size: 4508 bytes --]
On Thu, May 21, 2026 at 5:53 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> ::
>
> On Wed, May 20, 2026 at 4:00 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Wed, May 20, 2026 at 5:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > >
> > > On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
> > > <rajat.gupta@oss.qualcomm.com> wrote:
> > > >
> > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > not account for the runtime header offset added by typed keys. This
> > > > can leave part of the write region un-COW'd.
> > > >
> > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > the actual write offset is known, and add overflow checking on the
> > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > offset_valid() against INT_MIN, where negation is undefined.
> > > >
> > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > silent data corruption when pedit operates on zero-copy pages
> > > > (e.g. from sendfile).
> > > >
> > >
> > > Sashiko makes the claim that:
> > > "
> > > ....
> > > .....
> > > If the payload part extending past offset 0 resides in a shared
> > > fragment for a cloned skb, wouldn't skb_store_bits() write the
> > > positive offset portion directly into the shared page, causing a data
> > > corruption regression?
> > > "
> > >
> > > It missed the earlier code (before the quoted code is hit) which is
> > > very well documented!
> > > /*
> > > * If the skb has shared frags the user is likely using zero-copy
> > > * (e.g. sendfile). Those page frags may point to page-cache pages;
> > > * writing into them would silently corrupt the page cache.
> > > * Linearize so pedit operates on a private copy.
> > > * TL;DR: if you want zero-copy, don't use pedit.
> > > */
> > > if (skb_has_shared_frag(skb)) {
> > > if (__skb_linearize(skb))
> > > goto bad;
> > > }
> > >
> > > TLDR:
> > > If the skb has shared frags, it is linearized upfront using
> > > __skb_linearize() which means any risk of modifying a shared fragment
> > > is non-existent.
> >
> > It seems there's a second sashiko ive been made aware of - these
> > things are just multiplying ;->
> > https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
> >
> > Both points it makes are valid. Dont have time right now.
> > And infact looking at the second point sashiko 2 made I realize I was
> > only paying attention to externally shared fragments which the poc
> > used.
> > So probably the sashiko1 had a point as well. Anyways i dont have time
> > - will respond by am tommorow.
> >
>
> Sashiko2, claim 1 (rephrased for brevity):
> "With the pre-loop skb_ensure_writable() removed, tcfp_off_max_hint
> and related use is no longer needed.
>
> I was going to say "separate patch" but since the cause is this patch
> that removed the use, at the expense of a much bigger patch we should
> remove it (delete it from include/net/tc_act/tc_pedit.h::struct
> tcf_pedit_parms and net/sched/act_pedit.c - as described in sashiko2
> claim.
>
> Sashiko2, claim 2 (summarized):
> "..skb_store_bits() will memcpy() the tail bytes of that 4-byte write
> onto frag pages that are still referenced by the original clone, so a
> clone held by a tap, AF_PACKET listener, or mirroring path would see
> the modified bytes on its shared pages."
>
> This is the same claim as sashiko1 but sashiko2 gave a much more
> convincing description ;->
> skb_has_shared_frag() is only true if the frags are flagged as
> SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> frags from eg a driver on ingress and that skb gets cloned with frags
> we won't catch it.
> One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> do a skb_linearize_cow() but that sounds like overkill.
Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
> Another which will make the patch even uglier (but less expensive) is
> to add an extra check insde the patch's "if (write_offset < 0)"
> to do: if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
>
To be precise, something like attached (untested, uncompiled)
cheers,
jamal
[-- Attachment #2: patchlet-rajat --]
[-- Type: application/octet-stream, Size: 585 bytes --]
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 79921b8d89ba..8f0f84b50c85 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
if (write_offset < 0) {
if (skb_cow(skb, -write_offset))
goto bad;
+ if (write_offset + (int)sizeof(hdata) > 0) {
+ if (skb_ensure_writable(skb,
+ min_t(int, skb->len,
+ write_offset + (int)sizeof(hdata))))
+ goto bad;
+ }
} else {
if (unlikely(check_add_overflow(write_offset,
(int)sizeof(hdata),
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-21 10:15 ` Jamal Hadi Salim
@ 2026-05-21 14:35 ` Jakub Kicinski
2026-05-21 15:16 ` Jamal Hadi Salim
[not found] ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
2026-05-22 7:49 ` Han Guidong
1 sibling, 2 replies; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-21 14:35 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Thu, 21 May 2026 06:15:17 -0400 Jamal Hadi Salim wrote:
> > This is the same claim as sashiko1 but sashiko2 gave a much more
> > convincing description ;->
> > skb_has_shared_frag() is only true if the frags are flagged as
> > SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> > frags from eg a driver on ingress and that skb gets cloned with frags
> > we won't catch it.
> > One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> > do a skb_linearize_cow() but that sounds like overkill.
>
> Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
>
> > Another which will make the patch even uglier (but less expensive) is
> > to add an extra check insde the patch's "if (write_offset < 0)"
> > to do: if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
> >
>
> To be precise, something like attached (untested, uncompiled)
Can we not pull the headers? Do you know of anyone modifying payloads
with pedit?
The concept of "shared frags" is silly IMHO, if I'm checking right only
rxrpc and xfrm think that it's a thing. I'm hoping to delete that and
reclaim the flag id in net-next...
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-21 14:35 ` Jakub Kicinski
@ 2026-05-21 15:16 ` Jamal Hadi Salim
2026-05-21 15:46 ` Jakub Kicinski
[not found] ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
1 sibling, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-21 15:16 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Thu, May 21, 2026 at 10:35 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 21 May 2026 06:15:17 -0400 Jamal Hadi Salim wrote:
> > > This is the same claim as sashiko1 but sashiko2 gave a much more
> > > convincing description ;->
> > > skb_has_shared_frag() is only true if the frags are flagged as
> > > SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> > > frags from eg a driver on ingress and that skb gets cloned with frags
> > > we won't catch it.
> > > One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> > > do a skb_linearize_cow() but that sounds like overkill.
> >
> > Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
> >
> > > Another which will make the patch even uglier (but less expensive) is
> > > to add an extra check insde the patch's "if (write_offset < 0)"
> > > to do: if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
> > >
> >
> > To be precise, something like attached (untested, uncompiled)
>
> Can we not pull the headers? Do you know of anyone modifying payloads
> with pedit?
>
not sure - but it is not unreasonable if someone used it for such a case.
> The concept of "shared frags" is silly IMHO, if I'm checking right only
> rxrpc and xfrm think that it's a thing. I'm hoping to delete that and
> reclaim the flag id in net-next...
All these issues stem from the shared frags point, but the patchlet i
showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
cloned+frags.
If you remove it in net-next - does that necessitate removing all
"fixes" that tried to address that issue? It seems like that would be
sensible, but it makes me wonder why the shared frags concept exists
in the first place.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-21 15:16 ` Jamal Hadi Salim
@ 2026-05-21 15:46 ` Jakub Kicinski
2026-05-22 11:47 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-21 15:46 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Thu, 21 May 2026 11:16:35 -0400 Jamal Hadi Salim wrote:
> On Thu, May 21, 2026 at 10:35 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > To be precise, something like attached (untested, uncompiled)
> >
> > Can we not pull the headers? Do you know of anyone modifying payloads
> > with pedit?
>
> not sure - but it is not unreasonable if someone used it for such a case.
Ack, we'd still support modifying any offset in the packet.
But not optimize for modifying frags until we know user exists.
> > The concept of "shared frags" is silly IMHO, if I'm checking right only
> > rxrpc and xfrm think that it's a thing. I'm hoping to delete that and
> > reclaim the flag id in net-next...
>
> All these issues stem from the shared frags point, but the patchlet i
> showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
> cloned+frags.
Right, right. Just pointing it out cause Rajat's patch adds another
skb_has_shared_frag(). Which we'll then have to delete in net-next :\
> If you remove it in net-next - does that necessitate removing all
> "fixes" that tried to address that issue? It seems like that would be
> sensible, but it makes me wonder why the shared frags concept exists
> in the first place.
IDK. We have in the tree plenty of "good ideas that went nowhere"..
As I said elsewhere IMO having the maintain this flag correctly
100% of the time will only create more CVEs, and there's no practical
reason to optimize for frags not being shared.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-21 15:46 ` Jakub Kicinski
@ 2026-05-22 11:47 ` Jamal Hadi Salim
2026-05-22 15:46 ` Jakub Kicinski
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-22 11:47 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Thu, May 21, 2026 at 11:46 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 21 May 2026 11:16:35 -0400 Jamal Hadi Salim wrote:
> > On Thu, May 21, 2026 at 10:35 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > To be precise, something like attached (untested, uncompiled)
> > >
> > > Can we not pull the headers? Do you know of anyone modifying payloads
> > > with pedit?
> >
> > not sure - but it is not unreasonable if someone used it for such a case.
>
> Ack, we'd still support modifying any offset in the packet.
> But not optimize for modifying frags until we know user exists.
>
> > > The concept of "shared frags" is silly IMHO, if I'm checking right only
> > > rxrpc and xfrm think that it's a thing. I'm hoping to delete that and
> > > reclaim the flag id in net-next...
> >
> > All these issues stem from the shared frags point, but the patchlet i
> > showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
> > cloned+frags.
>
> Right, right. Just pointing it out cause Rajat's patch adds another
> skb_has_shared_frag(). Which we'll then have to delete in net-next :\
>
With a note that the exploits currently take advantage of that
specific cache issue. How about we allow it for net now and then
schedule a follow-up phase to remove it when you get rid of "shared
frags"?
cheers,
jamal
> > If you remove it in net-next - does that necessitate removing all
> > "fixes" that tried to address that issue? It seems like that would be
> > sensible, but it makes me wonder why the shared frags concept exists
> > in the first place.
>
> IDK. We have in the tree plenty of "good ideas that went nowhere"..
> As I said elsewhere IMO having the maintain this flag correctly
> 100% of the time will only create more CVEs, and there's no practical
> reason to optimize for frags not being shared.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-22 11:47 ` Jamal Hadi Salim
@ 2026-05-22 15:46 ` Jakub Kicinski
2026-05-22 16:37 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-22 15:46 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Fri, 22 May 2026 07:47:32 -0400 Jamal Hadi Salim wrote:
> > > All these issues stem from the shared frags point, but the patchlet i
> > > showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
> > > cloned+frags.
> >
> > Right, right. Just pointing it out cause Rajat's patch adds another
> > skb_has_shared_frag(). Which we'll then have to delete in net-next :\
>
> With a note that the exploits currently take advantage of that
> specific cache issue. How about we allow it for net now and then
> schedule a follow-up phase to remove it when you get rid of "shared
> frags"?
I must be missing something. What's the problem with changing the patch
to pull headers instead? I mean - if we agree that this is where we'll
end up - we should just do it now. It's the long standing kernel policy
to "fix things right" instead of creating temporary fixes which then
have to be reworked in -next
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-22 15:46 ` Jakub Kicinski
@ 2026-05-22 16:37 ` Jamal Hadi Salim
2026-05-22 17:01 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-22 16:37 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Fri, May 22, 2026 at 11:46 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 22 May 2026 07:47:32 -0400 Jamal Hadi Salim wrote:
> > > > All these issues stem from the shared frags point, but the patchlet i
> > > > showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
> > > > cloned+frags.
> > >
> > > Right, right. Just pointing it out cause Rajat's patch adds another
> > > skb_has_shared_frag(). Which we'll then have to delete in net-next :\
> >
> > With a note that the exploits currently take advantage of that
> > specific cache issue. How about we allow it for net now and then
> > schedule a follow-up phase to remove it when you get rid of "shared
> > frags"?
>
> I must be missing something. What's the problem with changing the patch
> to pull headers instead? I mean - if we agree that this is where we'll
> end up - we should just do it now. It's the long standing kernel policy
> to "fix things right" instead of creating temporary fixes which then
> have to be reworked in -next
I may be the one missing things. You main concern is with this:
+ /*
+ * If the skb has shared frags the user is likely using zero-copy
+ * (e.g. sendfile). Those page frags may point to page-cache pages;
+ * writing into them would silently corrupt the page cache.
+ * Linearize so pedit operates on a private copy.
+ * TL;DR: if you want zero-copy, don't use pedit.
+ */
+ if (skb_has_shared_frag(skb)) {
+ if (__skb_linearize(skb))
+ goto bad;
+ }
+
i.e you want that gone, correct?
And my concern was whether removing this still exposes things to
exploit even if temporarily.
Likely it wont. I have time, let me test the exploit with that code
ifdef'ed out.
I didnt get the "pull headers instead" part.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-22 16:37 ` Jamal Hadi Salim
@ 2026-05-22 17:01 ` Jamal Hadi Salim
2026-05-23 0:55 ` Jakub Kicinski
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-22 17:01 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Fri, May 22, 2026 at 12:37 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, May 22, 2026 at 11:46 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 22 May 2026 07:47:32 -0400 Jamal Hadi Salim wrote:
> > > > > All these issues stem from the shared frags point, but the patchlet i
> > > > > showed is unrelated to SKBFL_SHARED_FRAG, it is more related to
> > > > > cloned+frags.
> > > >
> > > > Right, right. Just pointing it out cause Rajat's patch adds another
> > > > skb_has_shared_frag(). Which we'll then have to delete in net-next :\
> > >
> > > With a note that the exploits currently take advantage of that
> > > specific cache issue. How about we allow it for net now and then
> > > schedule a follow-up phase to remove it when you get rid of "shared
> > > frags"?
> >
> > I must be missing something. What's the problem with changing the patch
> > to pull headers instead? I mean - if we agree that this is where we'll
> > end up - we should just do it now. It's the long standing kernel policy
> > to "fix things right" instead of creating temporary fixes which then
> > have to be reworked in -next
>
> I may be the one missing things. You main concern is with this:
> + /*
> + * If the skb has shared frags the user is likely using zero-copy
> + * (e.g. sendfile). Those page frags may point to page-cache pages;
> + * writing into them would silently corrupt the page cache.
> + * Linearize so pedit operates on a private copy.
> + * TL;DR: if you want zero-copy, don't use pedit.
> + */
> + if (skb_has_shared_frag(skb)) {
> + if (__skb_linearize(skb))
> + goto bad;
> + }
> +
>
> i.e you want that gone, correct?
> And my concern was whether removing this still exposes things to
> exploit even if temporarily.
> Likely it wont. I have time, let me test the exploit with that code
> ifdef'ed out.
Ok, it fixes the issue even i ifdef that out.
We still need the little patchlet i sent ...
cheers,
jamal
> I didnt get the "pull headers instead" part.
>
> cheers,
> jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-22 17:01 ` Jamal Hadi Salim
@ 2026-05-23 0:55 ` Jakub Kicinski
2026-05-23 12:07 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-23 0:55 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Fri, 22 May 2026 13:01:49 -0400 Jamal Hadi Salim wrote:
> > > I must be missing something. What's the problem with changing the patch
> > > to pull headers instead? I mean - if we agree that this is where we'll
> > > end up - we should just do it now. It's the long standing kernel policy
> > > to "fix things right" instead of creating temporary fixes which then
> > > have to be reworked in -next
> >
> > I may be the one missing things. You main concern is with this:
> > + /*
> > + * If the skb has shared frags the user is likely using zero-copy
> > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > + * writing into them would silently corrupt the page cache.
> > + * Linearize so pedit operates on a private copy.
> > + * TL;DR: if you want zero-copy, don't use pedit.
> > + */
> > + if (skb_has_shared_frag(skb)) {
> > + if (__skb_linearize(skb))
> > + goto bad;
> > + }
> > +
> >
> > i.e you want that gone, correct?
> > And my concern was whether removing this still exposes things to
> > exploit even if temporarily.
> > Likely it wont. I have time, let me test the exploit with that code
> > ifdef'ed out.
>
> Ok, it fixes the issue even i ifdef that out.
> We still need the little patchlet i sent ...
More and more I feel like I'm completely missing the plot but
for the portion of the problem that's "we are writing to frags"
the fix I was trying to describe is:
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index bc20f08a2789..3a74cef58e17 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -414,7 +414,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
int offset = tkey->off;
int hoffset = 0;
- u32 *ptr, hdata;
+ u32 *ptr;
u32 val;
int rc;
@@ -456,10 +456,9 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
goto bad;
}
- ptr = skb_header_pointer(skb, hoffset + offset,
- sizeof(hdata), &hdata);
- if (!ptr)
+ if (!pskb_may_pull(skb, hoffset + offset + sizeof(*ptr)))
goto bad;
+ ptr = (u32 *)(skb->data + hoffset + offset);
/* just do it, baby */
switch (cmd) {
case TCA_PEDIT_KEY_EX_CMD_SET:
@@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
}
*ptr = ((*ptr & tkey->mask) ^ val);
- if (ptr == &hdata)
- skb_store_bits(skb, hoffset + offset, ptr, 4);
}
goto done;
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-23 0:55 ` Jakub Kicinski
@ 2026-05-23 12:07 ` Jamal Hadi Salim
2026-05-23 12:13 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-23 12:07 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Fri, May 22, 2026 at 8:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 22 May 2026 13:01:49 -0400 Jamal Hadi Salim wrote:
> > > > I must be missing something. What's the problem with changing the patch
> > > > to pull headers instead? I mean - if we agree that this is where we'll
> > > > end up - we should just do it now. It's the long standing kernel policy
> > > > to "fix things right" instead of creating temporary fixes which then
> > > > have to be reworked in -next
> > >
> > > I may be the one missing things. You main concern is with this:
> > > + /*
> > > + * If the skb has shared frags the user is likely using zero-copy
> > > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > > + * writing into them would silently corrupt the page cache.
> > > + * Linearize so pedit operates on a private copy.
> > > + * TL;DR: if you want zero-copy, don't use pedit.
> > > + */
> > > + if (skb_has_shared_frag(skb)) {
> > > + if (__skb_linearize(skb))
> > > + goto bad;
> > > + }
> > > +
> > >
> > > i.e you want that gone, correct?
> > > And my concern was whether removing this still exposes things to
> > > exploit even if temporarily.
> > > Likely it wont. I have time, let me test the exploit with that code
> > > ifdef'ed out.
> >
> > Ok, it fixes the issue even i ifdef that out.
> > We still need the little patchlet i sent ...
>
> More and more I feel like I'm completely missing the plot but
> for the portion of the problem that's "we are writing to frags"
> the fix I was trying to describe is:
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2789..3a74cef58e17 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -414,7 +414,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> int offset = tkey->off;
> int hoffset = 0;
> - u32 *ptr, hdata;
> + u32 *ptr;
> u32 val;
> int rc;
>
> @@ -456,10 +456,9 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> goto bad;
> }
>
> - ptr = skb_header_pointer(skb, hoffset + offset,
> - sizeof(hdata), &hdata);
> - if (!ptr)
> + if (!pskb_may_pull(skb, hoffset + offset + sizeof(*ptr)))
> goto bad;
> + ptr = (u32 *)(skb->data + hoffset + offset);
> /* just do it, baby */
> switch (cmd) {
> case TCA_PEDIT_KEY_EX_CMD_SET:
> @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> }
>
> *ptr = ((*ptr & tkey->mask) ^ val);
> - if (ptr == &hdata)
> - skb_store_bits(skb, hoffset + offset, ptr, 4);
> }
>
> goto done;
I see you are trying to get rid of the skb_header_pointer() /
skb_store_bits() piece. Sure looks cleaner if we must linearize.
My primary concern would be it would break negative offsets - which
that patchlet and much of the dance in the Rajat patch accounts for.
ex: pskb_may_pull len being an unsigned int implies hoffset + offset +
sizeof(*ptr) will be intepreted as a very large positive number
I dont have time to test but something like:
# Change Destination MAC to 00:11:22:33:44:55
# Offset -14 is the start of the Ethernet header (relative to IP header)
tc filter add dev eth0 ingress protocol ip flower \
action pedit munge offset -14 u32 set 0x00112233
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-23 12:07 ` Jamal Hadi Salim
@ 2026-05-23 12:13 ` Jamal Hadi Salim
2026-05-23 16:46 ` Jakub Kicinski
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-23 12:13 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
()
On Sat, May 23, 2026 at 8:07 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Fri, May 22, 2026 at 8:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 22 May 2026 13:01:49 -0400 Jamal Hadi Salim wrote:
> > > > > I must be missing something. What's the problem with changing the patch
> > > > > to pull headers instead? I mean - if we agree that this is where we'll
> > > > > end up - we should just do it now. It's the long standing kernel policy
> > > > > to "fix things right" instead of creating temporary fixes which then
> > > > > have to be reworked in -next
> > > >
> > > > I may be the one missing things. You main concern is with this:
> > > > + /*
> > > > + * If the skb has shared frags the user is likely using zero-copy
> > > > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > > > + * writing into them would silently corrupt the page cache.
> > > > + * Linearize so pedit operates on a private copy.
> > > > + * TL;DR: if you want zero-copy, don't use pedit.
> > > > + */
> > > > + if (skb_has_shared_frag(skb)) {
> > > > + if (__skb_linearize(skb))
> > > > + goto bad;
> > > > + }
> > > > +
> > > >
> > > > i.e you want that gone, correct?
> > > > And my concern was whether removing this still exposes things to
> > > > exploit even if temporarily.
> > > > Likely it wont. I have time, let me test the exploit with that code
> > > > ifdef'ed out.
> > >
> > > Ok, it fixes the issue even i ifdef that out.
> > > We still need the little patchlet i sent ...
> >
> > More and more I feel like I'm completely missing the plot but
> > for the portion of the problem that's "we are writing to frags"
> > the fix I was trying to describe is:
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index bc20f08a2789..3a74cef58e17 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -414,7 +414,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > int offset = tkey->off;
> > int hoffset = 0;
> > - u32 *ptr, hdata;
> > + u32 *ptr;
> > u32 val;
> > int rc;
> >
> > @@ -456,10 +456,9 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > goto bad;
> > }
> >
> > - ptr = skb_header_pointer(skb, hoffset + offset,
> > - sizeof(hdata), &hdata);
> > - if (!ptr)
> > + if (!pskb_may_pull(skb, hoffset + offset + sizeof(*ptr)))
> > goto bad;
> > + ptr = (u32 *)(skb->data + hoffset + offset);
> > /* just do it, baby */
> > switch (cmd) {
> > case TCA_PEDIT_KEY_EX_CMD_SET:
> > @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > }
> >
> > *ptr = ((*ptr & tkey->mask) ^ val);
> > - if (ptr == &hdata)
> > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > }
> >
> > goto done;
>
> I see you are trying to get rid of the skb_header_pointer() /
> skb_store_bits() piece. Sure looks cleaner if we must linearize.
The other thing (i may be over thinking) with pskb_may_pull is: if the
data is already linear (in a clone), wouldn't we corrupt the shared
linear data of the clone?
cheers,
jamal
> My primary concern would be it would break negative offsets - which
> that patchlet and much of the dance in the Rajat patch accounts for.
> ex: pskb_may_pull len being an unsigned int implies hoffset + offset +
> sizeof(*ptr) will be intepreted as a very large positive number
>
> I dont have time to test but something like:
> # Change Destination MAC to 00:11:22:33:44:55
> # Offset -14 is the start of the Ethernet header (relative to IP header)
> tc filter add dev eth0 ingress protocol ip flower \
> action pedit munge offset -14 u32 set 0x00112233
>
> cheers,
> jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-23 12:13 ` Jamal Hadi Salim
@ 2026-05-23 16:46 ` Jakub Kicinski
2026-05-23 16:57 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-23 16:46 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> > > @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > }
> > >
> > > *ptr = ((*ptr & tkey->mask) ^ val);
> > > - if (ptr == &hdata)
> > > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > }
> > >
> > > goto done;
> >
> > I see you are trying to get rid of the skb_header_pointer() /
> > skb_store_bits() piece. Sure looks cleaner if we must linearize.
>
> The other thing (i may be over thinking) with pskb_may_pull is: if the
> data is already linear (in a clone), wouldn't we corrupt the shared
> linear data of the clone?
I said
for the portion of the problem that's "we are writing to frags"
IOW not replacing the rest of the patch (assuming we care).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-23 16:46 ` Jakub Kicinski
@ 2026-05-23 16:57 ` Jamal Hadi Salim
2026-05-25 15:39 ` Jakub Kicinski
2026-05-26 9:48 ` David Laight
0 siblings, 2 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-23 16:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Sat, May 23, 2026 at 12:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> > > > @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > > }
> > > >
> > > > *ptr = ((*ptr & tkey->mask) ^ val);
> > > > - if (ptr == &hdata)
> > > > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > > }
> > > >
> > > > goto done;
> > >
> > > I see you are trying to get rid of the skb_header_pointer() /
> > > skb_store_bits() piece. Sure looks cleaner if we must linearize.
> >
> > The other thing (i may be over thinking) with pskb_may_pull is: if the
> > data is already linear (in a clone), wouldn't we corrupt the shared
> > linear data of the clone?
>
> I said
>
> for the portion of the problem that's "we are writing to frags"
>
> IOW not replacing the rest of the patch (assuming we care).
So as an alternative to the piece i posted? i.e this:
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 79921b8d89ba..8f0f84b50c85 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
if (write_offset < 0) {
if (skb_cow(skb, -write_offset))
goto bad;
+ if (write_offset + (int)sizeof(hdata) > 0) {
+ if (skb_ensure_writable(skb,
+ min_t(int, skb->len,
+ write_offset +
(int)sizeof(hdata))))
+ goto bad;
+ }
} else {
if (unlikely(check_add_overflow(write_offset,
(int)sizeof(hdata),
cheers,
jamal
^ permalink raw reply related [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-23 16:57 ` Jamal Hadi Salim
@ 2026-05-25 15:39 ` Jakub Kicinski
2026-05-25 16:22 ` Jamal Hadi Salim
2026-05-26 9:48 ` David Laight
1 sibling, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-25 15:39 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Sat, 23 May 2026 12:57:08 -0400 Jamal Hadi Salim wrote:
> On Sat, May 23, 2026 at 12:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> [...]
> [...]
> > >
> > > The other thing (i may be over thinking) with pskb_may_pull is: if the
> > > data is already linear (in a clone), wouldn't we corrupt the shared
> > > linear data of the clone?
> >
> > I said
> >
> > for the portion of the problem that's "we are writing to frags"
> >
> > IOW not replacing the rest of the patch (assuming we care).
>
> So as an alternative to the piece i posted? i.e this:
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index 79921b8d89ba..8f0f84b50c85 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> if (write_offset < 0) {
> if (skb_cow(skb, -write_offset))
> goto bad;
> + if (write_offset + (int)sizeof(hdata) > 0) {
> + if (skb_ensure_writable(skb,
> + min_t(int, skb->len,
> + write_offset + (int)sizeof(hdata))))
> + goto bad;
> + }
> } else {
> if (unlikely(check_add_overflow(write_offset,
> (int)sizeof(hdata),
Yup! Even better.
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-25 15:39 ` Jakub Kicinski
@ 2026-05-25 16:22 ` Jamal Hadi Salim
2026-05-25 17:34 ` Jakub Kicinski
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-25 16:22 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Mon, May 25, 2026 at 11:39 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 23 May 2026 12:57:08 -0400 Jamal Hadi Salim wrote:
> > On Sat, May 23, 2026 at 12:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> > [...]
> > [...]
> > > >
> > > > The other thing (i may be over thinking) with pskb_may_pull is: if the
> > > > data is already linear (in a clone), wouldn't we corrupt the shared
> > > > linear data of the clone?
> > >
> > > I said
> > >
> > > for the portion of the problem that's "we are writing to frags"
> > >
> > > IOW not replacing the rest of the patch (assuming we care).
> >
> > So as an alternative to the piece i posted? i.e this:
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index 79921b8d89ba..8f0f84b50c85 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > if (write_offset < 0) {
> > if (skb_cow(skb, -write_offset))
> > goto bad;
> > + if (write_offset + (int)sizeof(hdata) > 0) {
> > + if (skb_ensure_writable(skb,
> > + min_t(int, skb->len,
> > + write_offset + (int)sizeof(hdata))))
> > + goto bad;
> > + }
> > } else {
> > if (unlikely(check_add_overflow(write_offset,
> > (int)sizeof(hdata),
>
> Yup! Even better.
Dude, it's hard to follow you sometimes ;-> It's hard to grok what you
mean the problem of "we are writing to frags".
Let me try to be verbose and you can narrow it down. There are _two_
codelets dealing with "frags" both of which can be written to.
1) The patch from Rajat deals with zero copy from app with shared
flags. That's whats being exploited in the wild.
There's a piece of code there that does this in Rajat's patch to handle it:
+ /*
+ * If the skb has shared frags the user is likely using zero-copy
+ * (e.g. sendfile). Those page frags may point to page-cache pages;
+ * writing into them would silently corrupt the page cache.
+ * Linearize so pedit operates on a private copy.
+ * TL;DR: if you want zero-copy, don't use pedit.
+ */
+ if (skb_has_shared_frag(skb)) {
+ if (__skb_linearize(skb))
+ goto bad;
+ }
After you posted, i thought this is the "we are writing to frags"
issue you were referring to.
I if-zeroed that code and indeed it does not seem that the exploit can
be executed even with that taken out.
2) There's the frags coming from the other (driver) direction in
particular when skbs get cloned. That is what sashiko2 (nipa variant)
pointed out. IOW, there's no active report for that specific one but
Han Guidong responded after i posted this:
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
if (write_offset < 0) {
if (skb_cow(skb, -write_offset))
goto bad;
+ if (write_offset + (int)sizeof(hdata) > 0) {
+ if (skb_ensure_writable(skb,
+ min_t(int, skb->len,
+ write_offset +
(int)sizeof(hdata))))
+ goto bad;
+ }
} else {
if (unlikely(check_add_overflow(write_offset,
(int)sizeof(hdata),
Saying he was able to recreate that scenario with a kernel module. And
that this patchlet fixed it.
Hope you are still following at this point ;->
So when you said we can use skb pulls - I thought you were referring
to removing the above patchlet and instead to use an skb pull approach
(for which you posted a sample patch). I mentioned the two issues:
1) It will likely break the negative offset that work with pedit
already (skb pull could conceivably be tricked to assume a large
positive number)
2) that skb clones could result in writting into the shared data
(which i said i may be overthinking).
So which one of the two are you referring to? Or maybe it is both.
Should we keep #1? or this the one that should be replaced?
Are you ok with patchlet for number #2? Or do you want that replaced?
Provide as much context as you can so we dont go back and forth ;->
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-25 16:22 ` Jamal Hadi Salim
@ 2026-05-25 17:34 ` Jakub Kicinski
2026-05-25 19:03 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-25 17:34 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Mon, 25 May 2026 12:22:40 -0400 Jamal Hadi Salim wrote:
> On Mon, May 25, 2026 at 11:39 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > So as an alternative to the piece i posted? i.e this:
> > >
> > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > index 79921b8d89ba..8f0f84b50c85 100644
> > > --- a/net/sched/act_pedit.c
> > > +++ b/net/sched/act_pedit.c
> > > @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > if (write_offset < 0) {
> > > if (skb_cow(skb, -write_offset))
> > > goto bad;
> > > + if (write_offset + (int)sizeof(hdata) > 0) {
> > > + if (skb_ensure_writable(skb,
> > > + min_t(int, skb->len,
> > > + write_offset + (int)sizeof(hdata))))
> > > + goto bad;
> > > + }
> > > } else {
> > > if (unlikely(check_add_overflow(write_offset,
> > > (int)sizeof(hdata),
> >
> > Yup! Even better.
>
> Dude, it's hard to follow you sometimes ;-> It's hard to grok what you
> mean the problem of "we are writing to frags".
Long threads have the tendency of losing focus.
Better to just repost the whole diff than tossing snippets at some point.. ;)
> Let me try to be verbose and you can narrow it down. There are _two_
> codelets dealing with "frags" both of which can be written to.
>
> 1) The patch from Rajat deals with zero copy from app with shared
> flags. That's whats being exploited in the wild.
> There's a piece of code there that does this in Rajat's patch to handle it:
>
> + /*
> + * If the skb has shared frags the user is likely using zero-copy
> + * (e.g. sendfile). Those page frags may point to page-cache pages;
> + * writing into them would silently corrupt the page cache.
> + * Linearize so pedit operates on a private copy.
> + * TL;DR: if you want zero-copy, don't use pedit.
> + */
> + if (skb_has_shared_frag(skb)) {
> + if (__skb_linearize(skb))
> + goto bad;
> + }
>
> After you posted, i thought this is the "we are writing to frags"
> issue you were referring to.
> I if-zeroed that code and indeed it does not seem that the exploit can
> be executed even with that taken out.
I don't want the skb_has_shared_frag() being added, that's my ask.
TBH I missed that skb_ensure_writable() when reading the patch.
If we use skb_ensure_writable() before all the writes we can delete
the skb_has_shared_frag() (as your test proves). This is the extent
to which I care about the patch, anything else - no strong opinion :)
> 2) There's the frags coming from the other (driver) direction in
> particular when skbs get cloned. That is what sashiko2 (nipa variant)
> pointed out. IOW, there's no active report for that specific one but
> Han Guidong responded after i posted this:
>
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> if (write_offset < 0) {
> if (skb_cow(skb, -write_offset))
> goto bad;
> + if (write_offset + (int)sizeof(hdata) > 0) {
> + if (skb_ensure_writable(skb,
> + min_t(int, skb->len,
> + write_offset + (int)sizeof(hdata))))
> + goto bad;
> + }
> } else {
> if (unlikely(check_add_overflow(write_offset,
> (int)sizeof(hdata),
>
> Saying he was able to recreate that scenario with a kernel module. And
> that this patchlet fixed it.
>
> Hope you are still following at this point ;->
> So when you said we can use skb pulls - I thought you were referring
> to removing the above patchlet and instead to use an skb pull approach
> (for which you posted a sample patch).
Yes, skb_ensure_writable() does a pull already. So the only problem
with existing patch was that the negative offset branch was missing
a skb_ensure_writable(). Your snippet added it, plugging that hole,
so skb_has_shared_frag() can now be 100% safely removed. Hence my
"LGTM".
Please also remove the skb_store_bits(), and skb_header_pointer().
Unless I'm missing something these are dead code.
> I mentioned the two issues:
> 1) It will likely break the negative offset that work with pedit
> already (skb pull could conceivably be tricked to assume a large
> positive number)
Your snippet looked fine tho unnecessarily complex in practice.
(AI generated?). I'd go with skb_ensure_writable(sizeof(*ptr))
as the worst case. Packets shorter than 4B are irrelevant in practice.
But again, up to you.
> 2) that skb clones could result in writting into the shared data
> (which i said i may be overthinking).
>
> So which one of the two are you referring to? Or maybe it is both.
> Should we keep #1? or this the one that should be replaced?
> Are you ok with patchlet for number #2? Or do you want that replaced?
>
> Provide as much context as you can so we dont go back and forth ;->
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-25 17:34 ` Jakub Kicinski
@ 2026-05-25 19:03 ` Jamal Hadi Salim
2026-05-26 2:06 ` Rajat Gupta
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-25 19:03 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Rajat Gupta, netdev, davem, edumazet, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Mon, May 25, 2026 at 1:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 25 May 2026 12:22:40 -0400 Jamal Hadi Salim wrote:
> > On Mon, May 25, 2026 at 11:39 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > So as an alternative to the piece i posted? i.e this:
> > > >
> > > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > > index 79921b8d89ba..8f0f84b50c85 100644
> > > > --- a/net/sched/act_pedit.c
> > > > +++ b/net/sched/act_pedit.c
> > > > @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > > if (write_offset < 0) {
> > > > if (skb_cow(skb, -write_offset))
> > > > goto bad;
> > > > + if (write_offset + (int)sizeof(hdata) > 0) {
> > > > + if (skb_ensure_writable(skb,
> > > > + min_t(int, skb->len,
> > > > + write_offset + (int)sizeof(hdata))))
> > > > + goto bad;
> > > > + }
> > > > } else {
> > > > if (unlikely(check_add_overflow(write_offset,
> > > > (int)sizeof(hdata),
> > >
> > > Yup! Even better.
> >
> > Dude, it's hard to follow you sometimes ;-> It's hard to grok what you
> > mean the problem of "we are writing to frags".
>
> Long threads have the tendency of losing focus.
> Better to just repost the whole diff than tossing snippets at some point.. ;)
>
> > Let me try to be verbose and you can narrow it down. There are _two_
> > codelets dealing with "frags" both of which can be written to.
> >
> > 1) The patch from Rajat deals with zero copy from app with shared
> > flags. That's whats being exploited in the wild.
> > There's a piece of code there that does this in Rajat's patch to handle it:
> >
> > + /*
> > + * If the skb has shared frags the user is likely using zero-copy
> > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > + * writing into them would silently corrupt the page cache.
> > + * Linearize so pedit operates on a private copy.
> > + * TL;DR: if you want zero-copy, don't use pedit.
> > + */
> > + if (skb_has_shared_frag(skb)) {
> > + if (__skb_linearize(skb))
> > + goto bad;
> > + }
> >
> > After you posted, i thought this is the "we are writing to frags"
> > issue you were referring to.
> > I if-zeroed that code and indeed it does not seem that the exploit can
> > be executed even with that taken out.
>
> I don't want the skb_has_shared_frag() being added, that's my ask.
> TBH I missed that skb_ensure_writable() when reading the patch.
>
> If we use skb_ensure_writable() before all the writes we can delete
> the skb_has_shared_frag() (as your test proves). This is the extent
> to which I care about the patch, anything else - no strong opinion :)
>
Ok, I will send the patch probably tommorow AM because Rajat may be
confused by now ;->
> > 2) There's the frags coming from the other (driver) direction in
> > particular when skbs get cloned. That is what sashiko2 (nipa variant)
> > pointed out. IOW, there's no active report for that specific one but
> > Han Guidong responded after i posted this:
> >
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > if (write_offset < 0) {
> > if (skb_cow(skb, -write_offset))
> > goto bad;
> > + if (write_offset + (int)sizeof(hdata) > 0) {
> > + if (skb_ensure_writable(skb,
> > + min_t(int, skb->len,
> > + write_offset + (int)sizeof(hdata))))
> > + goto bad;
> > + }
> > } else {
> > if (unlikely(check_add_overflow(write_offset,
> > (int)sizeof(hdata),
> >
> > Saying he was able to recreate that scenario with a kernel module. And
> > that this patchlet fixed it.
> >
> > Hope you are still following at this point ;->
> > So when you said we can use skb pulls - I thought you were referring
> > to removing the above patchlet and instead to use an skb pull approach
> > (for which you posted a sample patch).
>
> Yes, skb_ensure_writable() does a pull already. So the only problem
> with existing patch was that the negative offset branch was missing
> a skb_ensure_writable(). Your snippet added it, plugging that hole,
> so skb_has_shared_frag() can now be 100% safely removed. Hence my
> "LGTM".
>
> Please also remove the skb_store_bits(), and skb_header_pointer().
> Unless I'm missing something these are dead code.
>
That also seems sensible. But needs testing.
> > I mentioned the two issues:
> > 1) It will likely break the negative offset that work with pedit
> > already (skb pull could conceivably be tricked to assume a large
> > positive number)
>
> Your snippet looked fine tho unnecessarily complex in practice.
> (AI generated?). I'd go with skb_ensure_writable(sizeof(*ptr))
> as the worst case. Packets shorter than 4B are irrelevant in practice.
> But again, up to you.
>
Sashiko-nipa provided very good context and implicitly suggested a way
to solve it (and it doesnt seem to be a mechanical turk ;->).
We need a session at netdev conf to discuss all this tooling. Maybe
some of the security people can show up and share their secret trade -
it's overloading.
cheers,
jamal
> > 2) that skb clones could result in writting into the shared data
> > (which i said i may be overthinking).
> >
> > So which one of the two are you referring to? Or maybe it is both.
> > Should we keep #1? or this the one that should be replaced?
> > Are you ok with patchlet for number #2? Or do you want that replaced?
> >
> > Provide as much context as you can so we dont go back and forth ;->
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-25 19:03 ` Jamal Hadi Salim
@ 2026-05-26 2:06 ` Rajat Gupta
0 siblings, 0 replies; 44+ messages in thread
From: Rajat Gupta @ 2026-05-26 2:06 UTC (permalink / raw)
To: Jamal Hadi Salim, Jakub Kicinski
Cc: netdev, davem, edumazet, pabeni, horms, jiri, yimingqian591,
keenanat2000, 2045gemini, rollkingzzc
On 5/25/2026 12:03 PM, Jamal Hadi Salim wrote:
> On Mon, May 25, 2026 at 1:34 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Mon, 25 May 2026 12:22:40 -0400 Jamal Hadi Salim wrote:
>>> On Mon, May 25, 2026 at 11:39 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>> So as an alternative to the piece i posted? i.e this:
>>>>>
>>>>> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
>>>>> index 79921b8d89ba..8f0f84b50c85 100644
>>>>> --- a/net/sched/act_pedit.c
>>>>> +++ b/net/sched/act_pedit.c
>>>>> @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>>>>> if (write_offset < 0) {
>>>>> if (skb_cow(skb, -write_offset))
>>>>> goto bad;
>>>>> + if (write_offset + (int)sizeof(hdata) > 0) {
>>>>> + if (skb_ensure_writable(skb,
>>>>> + min_t(int, skb->len,
>>>>> + write_offset + (int)sizeof(hdata))))
>>>>> + goto bad;
>>>>> + }
>>>>> } else {
>>>>> if (unlikely(check_add_overflow(write_offset,
>>>>> (int)sizeof(hdata),
>>>>
>>>> Yup! Even better.
>>>
>>> Dude, it's hard to follow you sometimes ;-> It's hard to grok what you
>>> mean the problem of "we are writing to frags".
>>
>> Long threads have the tendency of losing focus.
>> Better to just repost the whole diff than tossing snippets at some point.. ;)
>>
>>> Let me try to be verbose and you can narrow it down. There are _two_
>>> codelets dealing with "frags" both of which can be written to.
>>>
>>> 1) The patch from Rajat deals with zero copy from app with shared
>>> flags. That's whats being exploited in the wild.
>>> There's a piece of code there that does this in Rajat's patch to handle it:
>>>
>>> + /*
>>> + * If the skb has shared frags the user is likely using zero-copy
>>> + * (e.g. sendfile). Those page frags may point to page-cache pages;
>>> + * writing into them would silently corrupt the page cache.
>>> + * Linearize so pedit operates on a private copy.
>>> + * TL;DR: if you want zero-copy, don't use pedit.
>>> + */
>>> + if (skb_has_shared_frag(skb)) {
>>> + if (__skb_linearize(skb))
>>> + goto bad;
>>> + }
>>>
>>> After you posted, i thought this is the "we are writing to frags"
>>> issue you were referring to.
>>> I if-zeroed that code and indeed it does not seem that the exploit can
>>> be executed even with that taken out.
>>
>> I don't want the skb_has_shared_frag() being added, that's my ask.
>> TBH I missed that skb_ensure_writable() when reading the patch.
>>
>> If we use skb_ensure_writable() before all the writes we can delete
>> the skb_has_shared_frag() (as your test proves). This is the extent
>> to which I care about the patch, anything else - no strong opinion :)
>>
>
> Ok, I will send the patch probably tommorow AM because Rajat may be
> confused by now ;->
Ahah! Thanks Jamal, I indeed kind-of lost track. But if you need me
to test/help-with anything, I'm always here :-) >
>>> 2) There's the frags coming from the other (driver) direction in
>>> particular when skbs get cloned. That is what sashiko2 (nipa variant)
>>> pointed out. IOW, there's no active report for that specific one but
>>> Han Guidong responded after i posted this:
>>>
>>> --- a/net/sched/act_pedit.c
>>> +++ b/net/sched/act_pedit.c
>>> @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>>> if (write_offset < 0) {
>>> if (skb_cow(skb, -write_offset))
>>> goto bad;
>>> + if (write_offset + (int)sizeof(hdata) > 0) {
>>> + if (skb_ensure_writable(skb,
>>> + min_t(int, skb->len,
>>> + write_offset + (int)sizeof(hdata))))
>>> + goto bad;
>>> + }
>>> } else {
>>> if (unlikely(check_add_overflow(write_offset,
>>> (int)sizeof(hdata),
>>>
>>> Saying he was able to recreate that scenario with a kernel module. And
>>> that this patchlet fixed it.
>>>
>>> Hope you are still following at this point ;->
>>> So when you said we can use skb pulls - I thought you were referring
>>> to removing the above patchlet and instead to use an skb pull approach
>>> (for which you posted a sample patch).
>>
>> Yes, skb_ensure_writable() does a pull already. So the only problem
>> with existing patch was that the negative offset branch was missing
>> a skb_ensure_writable(). Your snippet added it, plugging that hole,
>> so skb_has_shared_frag() can now be 100% safely removed. Hence my
>> "LGTM".
>>
>> Please also remove the skb_store_bits(), and skb_header_pointer().
>> Unless I'm missing something these are dead code.
>>
>
> That also seems sensible. But needs testing.
>
>>> I mentioned the two issues:
>>> 1) It will likely break the negative offset that work with pedit
>>> already (skb pull could conceivably be tricked to assume a large
>>> positive number)
>>
>> Your snippet looked fine tho unnecessarily complex in practice.
>> (AI generated?). I'd go with skb_ensure_writable(sizeof(*ptr))
>> as the worst case. Packets shorter than 4B are irrelevant in practice.
>> But again, up to you.
>>
>
> Sashiko-nipa provided very good context and implicitly suggested a way
> to solve it (and it doesnt seem to be a mechanical turk ;->).
>
> We need a session at netdev conf to discuss all this tooling. Maybe
> some of the security people can show up and share their secret trade -
> it's overloading.
>
> cheers,
> jamal
>>> 2) that skb clones could result in writting into the shared data
>>> (which i said i may be overthinking).
>>>
>>> So which one of the two are you referring to? Or maybe it is both.
>>> Should we keep #1? or this the one that should be replaced?
>>> Are you ok with patchlet for number #2? Or do you want that replaced?
>>>
>>> Provide as much context as you can so we dont go back and forth ;->
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-23 16:57 ` Jamal Hadi Salim
2026-05-25 15:39 ` Jakub Kicinski
@ 2026-05-26 9:48 ` David Laight
2026-05-26 11:57 ` Jamal Hadi Salim
1 sibling, 1 reply; 44+ messages in thread
From: David Laight @ 2026-05-26 9:48 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jakub Kicinski, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Sat, 23 May 2026 12:57:08 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On Sat, May 23, 2026 at 12:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> > > > > @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > > > }
> > > > >
> > > > > *ptr = ((*ptr & tkey->mask) ^ val);
> > > > > - if (ptr == &hdata)
> > > > > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > > > }
> > > > >
> > > > > goto done;
> > > >
> > > > I see you are trying to get rid of the skb_header_pointer() /
> > > > skb_store_bits() piece. Sure looks cleaner if we must linearize.
> > >
> > > The other thing (i may be over thinking) with pskb_may_pull is: if the
> > > data is already linear (in a clone), wouldn't we corrupt the shared
> > > linear data of the clone?
> >
> > I said
> >
> > for the portion of the problem that's "we are writing to frags"
> >
> > IOW not replacing the rest of the patch (assuming we care).
>
> So as an alternative to the piece i posted? i.e this:
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index 79921b8d89ba..8f0f84b50c85 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> if (write_offset < 0) {
> if (skb_cow(skb, -write_offset))
> goto bad;
> + if (write_offset + (int)sizeof(hdata) > 0) {
> + if (skb_ensure_writable(skb,
> + min_t(int, skb->len,
> + write_offset +
> (int)sizeof(hdata))))
I don't think that needs to be min_t(), a plain min() will do.
(Possibly with the (int) cast removed from the sizeof.)
-- David
> + goto bad;
> + }
> } else {
> if (unlikely(check_add_overflow(write_offset,
> (int)sizeof(hdata),
>
> cheers,
> jamal
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-26 9:48 ` David Laight
@ 2026-05-26 11:57 ` Jamal Hadi Salim
2026-05-26 13:08 ` David Laight
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-26 11:57 UTC (permalink / raw)
To: David Laight
Cc: Jakub Kicinski, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Tue, May 26, 2026 at 5:48 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Sat, 23 May 2026 12:57:08 -0400
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> > On Sat, May 23, 2026 at 12:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> > > > > > @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > > > > }
> > > > > >
> > > > > > *ptr = ((*ptr & tkey->mask) ^ val);
> > > > > > - if (ptr == &hdata)
> > > > > > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > > > > }
> > > > > >
> > > > > > goto done;
> > > > >
> > > > > I see you are trying to get rid of the skb_header_pointer() /
> > > > > skb_store_bits() piece. Sure looks cleaner if we must linearize.
> > > >
> > > > The other thing (i may be over thinking) with pskb_may_pull is: if the
> > > > data is already linear (in a clone), wouldn't we corrupt the shared
> > > > linear data of the clone?
> > >
> > > I said
> > >
> > > for the portion of the problem that's "we are writing to frags"
> > >
> > > IOW not replacing the rest of the patch (assuming we care).
> >
> > So as an alternative to the piece i posted? i.e this:
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index 79921b8d89ba..8f0f84b50c85 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > if (write_offset < 0) {
> > if (skb_cow(skb, -write_offset))
> > goto bad;
> > + if (write_offset + (int)sizeof(hdata) > 0) {
> > + if (skb_ensure_writable(skb,
> > + min_t(int, skb->len,
> > + write_offset +
> > (int)sizeof(hdata))))
>
> I don't think that needs to be min_t(), a plain min() will do.
> (Possibly with the (int) cast removed from the sizeof.)
>
what is the benefit? would min() allow a mixed type (eg skb_len not being int)?
cheers,
jamal
> -- David
>
> > + goto bad;
> > + }
> > } else {
> > if (unlikely(check_add_overflow(write_offset,
> > (int)sizeof(hdata),
> >
> > cheers,
> > jamal
> >
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-26 11:57 ` Jamal Hadi Salim
@ 2026-05-26 13:08 ` David Laight
2026-05-26 14:22 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: David Laight @ 2026-05-26 13:08 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jakub Kicinski, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Tue, 26 May 2026 07:57:08 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On Tue, May 26, 2026 at 5:48 AM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Sat, 23 May 2026 12:57:08 -0400
> > Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > > On Sat, May 23, 2026 at 12:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> > > > > > > @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > > > > > }
> > > > > > >
> > > > > > > *ptr = ((*ptr & tkey->mask) ^ val);
> > > > > > > - if (ptr == &hdata)
> > > > > > > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > > > > > }
> > > > > > >
> > > > > > > goto done;
> > > > > >
> > > > > > I see you are trying to get rid of the skb_header_pointer() /
> > > > > > skb_store_bits() piece. Sure looks cleaner if we must linearize.
> > > > >
> > > > > The other thing (i may be over thinking) with pskb_may_pull is: if the
> > > > > data is already linear (in a clone), wouldn't we corrupt the shared
> > > > > linear data of the clone?
> > > >
> > > > I said
> > > >
> > > > for the portion of the problem that's "we are writing to frags"
> > > >
> > > > IOW not replacing the rest of the patch (assuming we care).
> > >
> > > So as an alternative to the piece i posted? i.e this:
> > >
> > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > index 79921b8d89ba..8f0f84b50c85 100644
> > > --- a/net/sched/act_pedit.c
> > > +++ b/net/sched/act_pedit.c
> > > @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > if (write_offset < 0) {
> > > if (skb_cow(skb, -write_offset))
> > > goto bad;
> > > + if (write_offset + (int)sizeof(hdata) > 0) {
> > > + if (skb_ensure_writable(skb,
> > > + min_t(int, skb->len,
> > > + write_offset +
> > > (int)sizeof(hdata))))
> >
> > I don't think that needs to be min_t(), a plain min() will do.
> > (Possibly with the (int) cast removed from the sizeof.)
> >
>
> what is the benefit? would min() allow a mixed type (eg skb_len not being int)?
min_t() is really a broken idea.
You wouldn't really write:
if ((int)skb->len < (int)(write_offset + (int)sizeof(hdata)))
which it what it effectively expands to.
IMHO it was always better to cast the one side that was 'wrong'.
Then there wouldn't be all the broken/pointless max_t(u8, x, 255).
Here min(skb->len, write_offset + (int)sizeof(hdata)) is actually likely to
be allowed even though they types are unsigned and signed because the
compiler knows the preceding 'if' stops negative values getting through.
(Sometimes the KASAN (etc) builds (randomly) error such cases.)
But min(skb->len, write_offset + sizeof(hdata)) will always be fine.
Both sides are unsigned, a negative 32bit write_offset is sign extended
to 64bits before being converted to unsigned (as 2s compliment) and the
sum is positive.
But the compiler might not do CSE with the previous condition.
-- David
>
> cheers,
> jamal
>
> > -- David
> >
> > > + goto bad;
> > > + }
> > > } else {
> > > if (unlikely(check_add_overflow(write_offset,
> > > (int)sizeof(hdata),
> > >
> > > cheers,
> > > jamal
> > >
> >
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-26 13:08 ` David Laight
@ 2026-05-26 14:22 ` Jamal Hadi Salim
0 siblings, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-26 14:22 UTC (permalink / raw)
To: David Laight
Cc: Jakub Kicinski, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Tue, May 26, 2026 at 9:08 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Tue, 26 May 2026 07:57:08 -0400
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> > On Tue, May 26, 2026 at 5:48 AM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Sat, 23 May 2026 12:57:08 -0400
> > > Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > >
> > > > On Sat, May 23, 2026 at 12:46 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > On Sat, 23 May 2026 08:13:21 -0400 Jamal Hadi Salim wrote:
> > > > > > > > @@ -474,8 +473,6 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > > > > > > }
> > > > > > > >
> > > > > > > > *ptr = ((*ptr & tkey->mask) ^ val);
> > > > > > > > - if (ptr == &hdata)
> > > > > > > > - skb_store_bits(skb, hoffset + offset, ptr, 4);
> > > > > > > > }
> > > > > > > >
> > > > > > > > goto done;
> > > > > > >
> > > > > > > I see you are trying to get rid of the skb_header_pointer() /
> > > > > > > skb_store_bits() piece. Sure looks cleaner if we must linearize.
> > > > > >
> > > > > > The other thing (i may be over thinking) with pskb_may_pull is: if the
> > > > > > data is already linear (in a clone), wouldn't we corrupt the shared
> > > > > > linear data of the clone?
> > > > >
> > > > > I said
> > > > >
> > > > > for the portion of the problem that's "we are writing to frags"
> > > > >
> > > > > IOW not replacing the rest of the patch (assuming we care).
> > > >
> > > > So as an alternative to the piece i posted? i.e this:
> > > >
> > > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > > index 79921b8d89ba..8f0f84b50c85 100644
> > > > --- a/net/sched/act_pedit.c
> > > > +++ b/net/sched/act_pedit.c
> > > > @@ -474,6 +474,12 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > > if (write_offset < 0) {
> > > > if (skb_cow(skb, -write_offset))
> > > > goto bad;
> > > > + if (write_offset + (int)sizeof(hdata) > 0) {
> > > > + if (skb_ensure_writable(skb,
> > > > + min_t(int, skb->len,
> > > > + write_offset +
> > > > (int)sizeof(hdata))))
> > >
> > > I don't think that needs to be min_t(), a plain min() will do.
> > > (Possibly with the (int) cast removed from the sizeof.)
> > >
> >
> > what is the benefit? would min() allow a mixed type (eg skb_len not being int)?
>
> min_t() is really a broken idea.
> You wouldn't really write:
> if ((int)skb->len < (int)(write_offset + (int)sizeof(hdata)))
> which it what it effectively expands to.
> IMHO it was always better to cast the one side that was 'wrong'.
> Then there wouldn't be all the broken/pointless max_t(u8, x, 255).
>
> Here min(skb->len, write_offset + (int)sizeof(hdata)) is actually likely to
> be allowed even though they types are unsigned and signed because the
> compiler knows the preceding 'if' stops negative values getting through.
> (Sometimes the KASAN (etc) builds (randomly) error such cases.)
>
> But min(skb->len, write_offset + sizeof(hdata)) will always be fine.
So i am looking at min(a,b) macro
It does a strict type check to ensure that both arguments have the
same _exact_ type.
But i will take your word for it since you seem to be keeping on top
of these things.
I will update this in the next patch. And if the test passes i will post
cheers,
jamal
> Both sides are unsigned, a negative 32bit write_offset is sign extended
> to 64bits before being converted to unsigned (as 2s compliment) and the
> sum is positive.
> But the compiler might not do CSE with the previous condition.
>
> -- David
>
> >
> > cheers,
> > jamal
> >
> > > -- David
> > >
> > > > + goto bad;
> > > > + }
> > > > } else {
> > > > if (unlikely(check_add_overflow(write_offset,
> > > > (int)sizeof(hdata),
> > > >
> > > > cheers,
> > > > jamal
> > > >
> > >
>
^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>]
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
[not found] ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
@ 2026-05-21 15:56 ` Jakub Kicinski
2026-05-22 11:49 ` Jamal Hadi Salim
1 sibling, 0 replies; 44+ messages in thread
From: Jakub Kicinski @ 2026-05-21 15:56 UTC (permalink / raw)
To: Davide Caratti
Cc: Jamal Hadi Salim, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Thu, 21 May 2026 17:51:25 +0200 Davide Caratti wrote:
> > Do you know of anyone modifying payloads with pedit?
>
> yes: MPTCP kselftest does that on purpose, to mimic a middlebox and trigger
> DSS checksum failures [1]
As I said in another reply (looks like the ML is having a significant
lag this morning :() - we would still support it, the shared frag
tracking is just an optimization to avoid a copy. And maintaining that
optimization for an occasional CI test is not the right eng tradeoff.
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
[not found] ` <CAKa-r6soz=iMBiYG0Grhhc12yhdw9vMNV+XjjEPCmtgKK6+rhA@mail.gmail.com>
2026-05-21 15:56 ` Jakub Kicinski
@ 2026-05-22 11:49 ` Jamal Hadi Salim
2026-05-22 12:00 ` Toke Høiland-Jørgensen
2026-05-22 14:49 ` Davide Caratti
1 sibling, 2 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-22 11:49 UTC (permalink / raw)
To: Davide Caratti
Cc: Jakub Kicinski, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc,
Toke Høiland-Jørgensen
On Thu, May 21, 2026 at 11:51 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> On Thu, May 21, 2026 at 4:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 21 May 2026 06:15:17 -0400 Jamal Hadi Salim wrote:
> > > > This is the same claim as sashiko1 but sashiko2 gave a much more
> > > > convincing description ;->
> > > > skb_has_shared_frag() is only true if the frags are flagged as
> > > > SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> > > > frags from eg a driver on ingress and that skb gets cloned with frags
> > > > we won't catch it.
> > > > One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> > > > do a skb_linearize_cow() but that sounds like overkill.
> > >
> > > Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
> > >
> > > > Another which will make the patch even uglier (but less expensive) is
> > > > to add an extra check insde the patch's "if (write_offset < 0)"
> > > > to do: if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
> > > >
> > >
> > > To be precise, something like attached (untested, uncompiled)
>
> hi Jamal,
>
> I tested Rajat's patch with your latest addition; it compiles and passes with the same subset of tests ran earlier by Toke.
> Agree some follow-ups can be done (e.g. removing the hint, and maybe another smaller thing not yet detected by Sashiko) but AFAICT you can add my Reviewed-by: when sending v2.
>
Thanks Davide. And a Tested-by as well? I guess the same goes for Toke.
Rajat, please send a v2 with all the feedback once we hear from Jakub
what he thinks of keeping the shared frags check.
cheers,
jamal
> > Do you know of anyone modifying payloads with pedit?
>
> hi Jakub,
> yes: MPTCP kselftest does that on purpose, to mimic a middlebox and trigger DSS checksum failures [1]
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=b6e074e171bc5cc83eb91b03c2558c2d9fccf26e
>
> thanks!
> --
> davide
>
>
>
>
>
>
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-22 11:49 ` Jamal Hadi Salim
@ 2026-05-22 12:00 ` Toke Høiland-Jørgensen
2026-05-22 14:49 ` Davide Caratti
1 sibling, 0 replies; 44+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-05-22 12:00 UTC (permalink / raw)
To: Jamal Hadi Salim, Davide Caratti
Cc: Jakub Kicinski, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc
Jamal Hadi Salim <jhs@mojatatu.com> writes:
> On Thu, May 21, 2026 at 11:51 AM Davide Caratti <dcaratti@redhat.com> wrote:
>>
>> On Thu, May 21, 2026 at 4:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >
>> > On Thu, 21 May 2026 06:15:17 -0400 Jamal Hadi Salim wrote:
>> > > > This is the same claim as sashiko1 but sashiko2 gave a much more
>> > > > convincing description ;->
>> > > > skb_has_shared_frag() is only true if the frags are flagged as
>> > > > SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
>> > > > frags from eg a driver on ingress and that skb gets cloned with frags
>> > > > we won't catch it.
>> > > > One approach is to do an if (skb_has_any_shared_frags(skb)) and then
>> > > > do a skb_linearize_cow() but that sounds like overkill.
>> > >
>> > > Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
>> > >
>> > > > Another which will make the patch even uglier (but less expensive) is
>> > > > to add an extra check insde the patch's "if (write_offset < 0)"
>> > > > to do: if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
>> > > >
>> > >
>> > > To be precise, something like attached (untested, uncompiled)
>>
>> hi Jamal,
>>
>> I tested Rajat's patch with your latest addition; it compiles and passes with the same subset of tests ran earlier by Toke.
>> Agree some follow-ups can be done (e.g. removing the hint, and maybe another smaller thing not yet detected by Sashiko) but AFAICT you can add my Reviewed-by: when sending v2.
>>
>
> Thanks Davide. And a Tested-by as well? I guess the same goes for
> Toke.
Applied your hunk on top Rajat's patch and re-ran the tests. So feel
free to apply my reviewed-by and tested-by to the combination on
resubmit :)
-Toke
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-22 11:49 ` Jamal Hadi Salim
2026-05-22 12:00 ` Toke Høiland-Jørgensen
@ 2026-05-22 14:49 ` Davide Caratti
1 sibling, 0 replies; 44+ messages in thread
From: Davide Caratti @ 2026-05-22 14:49 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Jakub Kicinski, Rajat Gupta, netdev, davem, edumazet, pabeni,
horms, jiri, yimingqian591, keenanat2000, 2045gemini, rollkingzzc,
Toke Høiland-Jørgensen
On Fri, May 22, 2026 at 07:49:59AM -0400, Jamal Hadi Salim wrote:
> On Thu, May 21, 2026 at 11:51 AM Davide Caratti <dcaratti@redhat.com> wrote:
> >
> > On Thu, May 21, 2026 at 4:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
[...]
> > > > To be precise, something like attached (untested, uncompiled)
> >
> > hi Jamal,
> >
> > I tested Rajat's patch with your latest addition; it compiles and passes with the same subset of tests ran earlier by Toke.
> > Agree some follow-ups can be done (e.g. removing the hint, and maybe another smaller thing not yet detected by Sashiko) but AFAICT you can add my Reviewed-by: when sending v2.
> >
>
> Thanks Davide. And a Tested-by as well? I guess the same goes for Toke.
yes,
Tssted-by: Davide Caratti <dcaratti@redhat.com>
(for the patch with Jamal's addition for negative offsets)
thanks a lot !!!
--
davide
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-21 10:15 ` Jamal Hadi Salim
2026-05-21 14:35 ` Jakub Kicinski
@ 2026-05-22 7:49 ` Han Guidong
1 sibling, 0 replies; 44+ messages in thread
From: Han Guidong @ 2026-05-22 7:49 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, rollkingzzc
On Thu, May 21, 2026 at 6:15 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Thu, May 21, 2026 at 5:53 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > ::
> >
> > On Wed, May 20, 2026 at 4:00 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > >
> > > On Wed, May 20, 2026 at 5:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > >
> > > > On Mon, May 18, 2026 at 11:42 PM Rajat Gupta
> > > > <rajat.gupta@oss.qualcomm.com> wrote:
> > > > >
> > > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > > not account for the runtime header offset added by typed keys. This
> > > > > can leave part of the write region un-COW'd.
> > > > >
> > > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > > the actual write offset is known, and add overflow checking on the
> > > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > > offset_valid() against INT_MIN, where negation is undefined.
> > > > >
> > > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > > silent data corruption when pedit operates on zero-copy pages
> > > > > (e.g. from sendfile).
> > > > >
> > > >
> > > > Sashiko makes the claim that:
> > > > "
> > > > ....
> > > > .....
> > > > If the payload part extending past offset 0 resides in a shared
> > > > fragment for a cloned skb, wouldn't skb_store_bits() write the
> > > > positive offset portion directly into the shared page, causing a data
> > > > corruption regression?
> > > > "
> > > >
> > > > It missed the earlier code (before the quoted code is hit) which is
> > > > very well documented!
> > > > /*
> > > > * If the skb has shared frags the user is likely using zero-copy
> > > > * (e.g. sendfile). Those page frags may point to page-cache pages;
> > > > * writing into them would silently corrupt the page cache.
> > > > * Linearize so pedit operates on a private copy.
> > > > * TL;DR: if you want zero-copy, don't use pedit.
> > > > */
> > > > if (skb_has_shared_frag(skb)) {
> > > > if (__skb_linearize(skb))
> > > > goto bad;
> > > > }
> > > >
> > > > TLDR:
> > > > If the skb has shared frags, it is linearized upfront using
> > > > __skb_linearize() which means any risk of modifying a shared fragment
> > > > is non-existent.
> > >
> > > It seems there's a second sashiko ive been made aware of - these
> > > things are just multiplying ;->
> > > https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260519033950.2037-1-rajat.gupta%40oss.qualcomm.com
> > >
> > > Both points it makes are valid. Dont have time right now.
> > > And infact looking at the second point sashiko 2 made I realize I was
> > > only paying attention to externally shared fragments which the poc
> > > used.
> > > So probably the sashiko1 had a point as well. Anyways i dont have time
> > > - will respond by am tommorow.
> > >
> >
> > Sashiko2, claim 1 (rephrased for brevity):
> > "With the pre-loop skb_ensure_writable() removed, tcfp_off_max_hint
> > and related use is no longer needed.
> >
> > I was going to say "separate patch" but since the cause is this patch
> > that removed the use, at the expense of a much bigger patch we should
> > remove it (delete it from include/net/tc_act/tc_pedit.h::struct
> > tcf_pedit_parms and net/sched/act_pedit.c - as described in sashiko2
> > claim.
> >
> > Sashiko2, claim 2 (summarized):
> > "..skb_store_bits() will memcpy() the tail bytes of that 4-byte write
> > onto frag pages that are still referenced by the original clone, so a
> > clone held by a tap, AF_PACKET listener, or mirroring path would see
> > the modified bytes on its shared pages."
> >
> > This is the same claim as sashiko1 but sashiko2 gave a much more
> > convincing description ;->
> > skb_has_shared_frag() is only true if the frags are flagged as
> > SKBFL_SHARED_FRAG (which is what the repro did); however, if we get
> > frags from eg a driver on ingress and that skb gets cloned with frags
> > we won't catch it.
> > One approach is to do an if (skb_has_any_shared_frags(skb)) and then
> > do a skb_linearize_cow() but that sounds like overkill.
>
> Yeah, this would be overkill - imagine running tcpdump 100% will be cloned
>
> > Another which will make the patch even uglier (but less expensive) is
> > to add an extra check insde the patch's "if (write_offset < 0)"
> > to do: if (write_offset + (int)sizeof(hdata) > 0) { skb_ensure_writable()}
> >
>
> To be precise, something like attached (untested, uncompiled)
Tested this and it looks correct.
I used a small kernel module to inject a cloned skb through the real
tc ingress path with a crafted layout: crafted skb_mac_offset(),
tiny/no linear head after skb->data, and the rest in a clone-shared
frag tail. With Rajat's patch alone, the peer clone could still be
changed. With this added hunk, the peer clone stayed unchanged.
Thanks.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-19 3:39 ` [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption Rajat Gupta
` (2 preceding siblings ...)
2026-05-20 9:23 ` Jamal Hadi Salim
@ 2026-05-26 9:53 ` David Laight
2026-05-26 12:01 ` Jamal Hadi Salim
3 siblings, 1 reply; 44+ messages in thread
From: David Laight @ 2026-05-26 9:53 UTC (permalink / raw)
To: Rajat Gupta
Cc: netdev, davem, edumazet, kuba, pabeni, horms, jhs, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Mon, 18 May 2026 20:39:50 -0700
Rajat Gupta <rajat.gupta@oss.qualcomm.com> wrote:
> tcf_pedit_act() computes the COW range for skb_ensure_writable()
> once before the key loop using tcfp_off_max_hint, but the hint does
> not account for the runtime header offset added by typed keys. This
> can leave part of the write region un-COW'd.
>
> Fix by moving skb_ensure_writable() inside the per-key loop where
> the actual write offset is known, and add overflow checking on the
> offset arithmetic. For negative offsets (e.g. Ethernet header edits
> at ingress), use skb_cow() to COW the headroom instead. Guard
> offset_valid() against INT_MIN, where negation is undefined.
>
> Additionally, linearize skbs with shared frags upfront to prevent
> silent data corruption when pedit operates on zero-copy pages
> (e.g. from sendfile).
>
> Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> Reported-by: Yiming Qian <yimingqian591@gmail.com>
> Reported-by: Keenan Dong <keenanat2000@gmail.com>
> Reported-by: Han Guidong <2045gemini@gmail.com>
> Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> Tested-by: Han Guidong <2045gemini@gmail.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> ---
> net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> 1 file changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> index bc20f08a2..79921b8d8 100644
> --- a/net/sched/act_pedit.c
> +++ b/net/sched/act_pedit.c
> @@ -16,6 +16,7 @@
> #include <linux/ip.h>
> #include <linux/ipv6.h>
> #include <linux/slab.h>
> +#include <linux/overflow.h>
> #include <net/ipv6.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> if (offset > 0 && offset > skb->len)
> return false;
>
> - if (offset < 0 && -offset > skb_headroom(skb))
> - return false;
> + if (offset < 0) {
> + if (offset == INT_MIN || -offset > skb_headroom(skb))
> + return false;
> + }
Can't you negate skb_headroom() instead - that cannot be INT_MIN.
So:
if (offset < 0 && offset < -skb_headroom(skb))
>
> return true;
> }
> @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> struct tcf_pedit_key_ex *tkey_ex;
> struct tcf_pedit_parms *parms;
> struct tc_pedit_key *tkey;
> - u32 max_offset;
> int i;
>
> parms = rcu_dereference_bh(p->parms);
>
> - max_offset = (skb_transport_header_was_set(skb) ?
> - skb_transport_offset(skb) :
> - skb_network_offset(skb)) +
> - parms->tcfp_off_max_hint;
> - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> - goto done;
> + /*
> + * If the skb has shared frags the user is likely using zero-copy
> + * (e.g. sendfile). Those page frags may point to page-cache pages;
> + * writing into them would silently corrupt the page cache.
> + * Linearize so pedit operates on a private copy.
> + * TL;DR: if you want zero-copy, don't use pedit.
> + */
> + if (skb_has_shared_frag(skb)) {
> + if (__skb_linearize(skb))
> + goto bad;
> + }
Should there be a way of 'unsharing' frags by just copying the frags
rather than doing a full linearize?
That would be much less likely to fail for big skb.
-- David
>
> tcf_lastuse_update(&p->tcf_tm);
> tcf_action_update_bstats(&p->common, skb);
> @@ -412,6 +419,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> tkey_ex = parms->tcfp_keys_ex;
>
> for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> + int write_offset, write_len;
> int offset = tkey->off;
> int hoffset = 0;
> u32 *ptr, hdata;
> @@ -451,12 +459,32 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> }
> }
>
> - if (!offset_valid(skb, hoffset + offset)) {
> - pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> + if (unlikely(check_add_overflow(hoffset, offset,
> + &write_offset))) {
> + pr_info_ratelimited("tc action pedit offset overflow\n");
> + goto bad;
> + }
> +
> + if (!offset_valid(skb, write_offset)) {
> + pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> + write_offset);
> goto bad;
> }
>
> - ptr = skb_header_pointer(skb, hoffset + offset,
> + if (write_offset < 0) {
> + if (skb_cow(skb, -write_offset))
> + goto bad;
> + } else {
> + if (unlikely(check_add_overflow(write_offset,
> + (int)sizeof(hdata),
> + &write_len)))
> + goto bad;
> + if (skb_ensure_writable(skb, min_t(int, skb->len,
> + write_len)))
> + goto bad;
> + }
> +
> + ptr = skb_header_pointer(skb, write_offset,
> sizeof(hdata), &hdata);
> if (!ptr)
> goto bad;
> @@ -475,7 +503,7 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
>
> *ptr = ((*ptr & tkey->mask) ^ val);
> if (ptr == &hdata)
> - skb_store_bits(skb, hoffset + offset, ptr, 4);
> + skb_store_bits(skb, write_offset, ptr, sizeof(hdata));
> }
>
> goto done;
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-26 9:53 ` David Laight
@ 2026-05-26 12:01 ` Jamal Hadi Salim
2026-05-26 12:47 ` David Laight
0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-26 12:01 UTC (permalink / raw)
To: David Laight
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Tue, May 26, 2026 at 5:53 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Mon, 18 May 2026 20:39:50 -0700
> Rajat Gupta <rajat.gupta@oss.qualcomm.com> wrote:
>
> > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > once before the key loop using tcfp_off_max_hint, but the hint does
> > not account for the runtime header offset added by typed keys. This
> > can leave part of the write region un-COW'd.
> >
> > Fix by moving skb_ensure_writable() inside the per-key loop where
> > the actual write offset is known, and add overflow checking on the
> > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > at ingress), use skb_cow() to COW the headroom instead. Guard
> > offset_valid() against INT_MIN, where negation is undefined.
> >
> > Additionally, linearize skbs with shared frags upfront to prevent
> > silent data corruption when pedit operates on zero-copy pages
> > (e.g. from sendfile).
> >
> > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > Reported-by: Han Guidong <2045gemini@gmail.com>
> > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > Tested-by: Han Guidong <2045gemini@gmail.com>
> > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > ---
> > net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 41 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index bc20f08a2..79921b8d8 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -16,6 +16,7 @@
> > #include <linux/ip.h>
> > #include <linux/ipv6.h>
> > #include <linux/slab.h>
> > +#include <linux/overflow.h>
> > #include <net/ipv6.h>
> > #include <net/netlink.h>
> > #include <net/pkt_sched.h>
> > @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > if (offset > 0 && offset > skb->len)
> > return false;
> >
> > - if (offset < 0 && -offset > skb_headroom(skb))
> > - return false;
> > + if (offset < 0) {
> > + if (offset == INT_MIN || -offset > skb_headroom(skb))
> > + return false;
> > + }
>
> Can't you negate skb_headroom() instead - that cannot be INT_MIN.
> So:
> if (offset < 0 && offset < -skb_headroom(skb))
>
You mean something like this? if (offset < 0 && offset <
-(int)skb_headroom(skb))
That does feel cleaner, yes.
> >
> > return true;
> > }
> > @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > struct tcf_pedit_key_ex *tkey_ex;
> > struct tcf_pedit_parms *parms;
> > struct tc_pedit_key *tkey;
> > - u32 max_offset;
> > int i;
> >
> > parms = rcu_dereference_bh(p->parms);
> >
> > - max_offset = (skb_transport_header_was_set(skb) ?
> > - skb_transport_offset(skb) :
> > - skb_network_offset(skb)) +
> > - parms->tcfp_off_max_hint;
> > - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > - goto done;
> > + /*
> > + * If the skb has shared frags the user is likely using zero-copy
> > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > + * writing into them would silently corrupt the page cache.
> > + * Linearize so pedit operates on a private copy.
> > + * TL;DR: if you want zero-copy, don't use pedit.
> > + */
> > + if (skb_has_shared_frag(skb)) {
> > + if (__skb_linearize(skb))
> > + goto bad;
> > + }
>
> Should there be a way of 'unsharing' frags by just copying the frags
> rather than doing a full linearize?
> That would be much less likely to fail for big skb.
>
It has been agreed that this chunk is unnecessary, so it will be
removed in the next update.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-26 12:01 ` Jamal Hadi Salim
@ 2026-05-26 12:47 ` David Laight
2026-05-26 12:48 ` Jamal Hadi Salim
0 siblings, 1 reply; 44+ messages in thread
From: David Laight @ 2026-05-26 12:47 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Tue, 26 May 2026 08:01:08 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On Tue, May 26, 2026 at 5:53 AM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Mon, 18 May 2026 20:39:50 -0700
> > Rajat Gupta <rajat.gupta@oss.qualcomm.com> wrote:
> >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > > Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > Tested-by: Han Guidong <2045gemini@gmail.com>
> > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > ---
> > > net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> > > 1 file changed, 41 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > index bc20f08a2..79921b8d8 100644
> > > --- a/net/sched/act_pedit.c
> > > +++ b/net/sched/act_pedit.c
> > > @@ -16,6 +16,7 @@
> > > #include <linux/ip.h>
> > > #include <linux/ipv6.h>
> > > #include <linux/slab.h>
> > > +#include <linux/overflow.h>
> > > #include <net/ipv6.h>
> > > #include <net/netlink.h>
> > > #include <net/pkt_sched.h>
> > > @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > > if (offset > 0 && offset > skb->len)
> > > return false;
> > >
> > > - if (offset < 0 && -offset > skb_headroom(skb))
> > > - return false;
> > > + if (offset < 0) {
> > > + if (offset == INT_MIN || -offset > skb_headroom(skb))
> > > + return false;
> > > + }
> >
> > Can't you negate skb_headroom() instead - that cannot be INT_MIN.
> > So:
> > if (offset < 0 && offset < -skb_headroom(skb))
> >
>
> You mean something like this? if (offset < 0 && offset <
> -(int)skb_headroom(skb))
> That does feel cleaner, yes.
yes - it does need the cast...
-- David
>
> > >
> > > return true;
> > > }
> > > @@ -393,17 +396,21 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > > struct tcf_pedit_key_ex *tkey_ex;
> > > struct tcf_pedit_parms *parms;
> > > struct tc_pedit_key *tkey;
> > > - u32 max_offset;
> > > int i;
> > >
> > > parms = rcu_dereference_bh(p->parms);
> > >
> > > - max_offset = (skb_transport_header_was_set(skb) ?
> > > - skb_transport_offset(skb) :
> > > - skb_network_offset(skb)) +
> > > - parms->tcfp_off_max_hint;
> > > - if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > > - goto done;
> > > + /*
> > > + * If the skb has shared frags the user is likely using zero-copy
> > > + * (e.g. sendfile). Those page frags may point to page-cache pages;
> > > + * writing into them would silently corrupt the page cache.
> > > + * Linearize so pedit operates on a private copy.
> > > + * TL;DR: if you want zero-copy, don't use pedit.
> > > + */
> > > + if (skb_has_shared_frag(skb)) {
> > > + if (__skb_linearize(skb))
> > > + goto bad;
> > > + }
> >
> > Should there be a way of 'unsharing' frags by just copying the frags
> > rather than doing a full linearize?
> > That would be much less likely to fail for big skb.
> >
>
> It has been agreed that this chunk is unnecessary, so it will be
> removed in the next update.
>
> cheers,
> jamal
^ permalink raw reply [flat|nested] 44+ messages in thread* Re: [PATCH net] net/sched: fix pedit partial COW leading to page cache corruption
2026-05-26 12:47 ` David Laight
@ 2026-05-26 12:48 ` Jamal Hadi Salim
0 siblings, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2026-05-26 12:48 UTC (permalink / raw)
To: David Laight
Cc: Rajat Gupta, netdev, davem, edumazet, kuba, pabeni, horms, jiri,
yimingqian591, keenanat2000, 2045gemini, rollkingzzc
On Tue, May 26, 2026 at 8:47 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Tue, 26 May 2026 08:01:08 -0400
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> > On Tue, May 26, 2026 at 5:53 AM David Laight
> > <david.laight.linux@gmail.com> wrote:
> > >
> > > On Mon, 18 May 2026 20:39:50 -0700
> > > Rajat Gupta <rajat.gupta@oss.qualcomm.com> wrote:
> > >
> > > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > > not account for the runtime header offset added by typed keys. This
> > > > can leave part of the write region un-COW'd.
> > > >
> > > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > > the actual write offset is known, and add overflow checking on the
> > > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > > offset_valid() against INT_MIN, where negation is undefined.
> > > >
> > > > Additionally, linearize skbs with shared frags upfront to prevent
> > > > silent data corruption when pedit operates on zero-copy pages
> > > > (e.g. from sendfile).
> > > >
> > > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > > > Reported-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > > Tested-by: Han Guidong <2045gemini@gmail.com>
> > > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > > > ---
> > > > net/sched/act_pedit.c | 54 ++++++++++++++++++++++++++++++++-----------
> > > > 1 file changed, 41 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > > > index bc20f08a2..79921b8d8 100644
> > > > --- a/net/sched/act_pedit.c
> > > > +++ b/net/sched/act_pedit.c
> > > > @@ -16,6 +16,7 @@
> > > > #include <linux/ip.h>
> > > > #include <linux/ipv6.h>
> > > > #include <linux/slab.h>
> > > > +#include <linux/overflow.h>
> > > > #include <net/ipv6.h>
> > > > #include <net/netlink.h>
> > > > #include <net/pkt_sched.h>
> > > > @@ -323,8 +324,10 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > > > if (offset > 0 && offset > skb->len)
> > > > return false;
> > > >
> > > > - if (offset < 0 && -offset > skb_headroom(skb))
> > > > - return false;
> > > > + if (offset < 0) {
> > > > + if (offset == INT_MIN || -offset > skb_headroom(skb))
> > > > + return false;
> > > > + }
> > >
> > > Can't you negate skb_headroom() instead - that cannot be INT_MIN.
> > > So:
> > > if (offset < 0 && offset < -skb_headroom(skb))
> > >
> >
> > You mean something like this? if (offset < 0 && offset <
> > -(int)skb_headroom(skb))
> > That does feel cleaner, yes.
>
> yes - it does need the cast...
Ok, I'll update with that change.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread