From: David Laight <david.laight.linux@gmail.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	jiri@resnulli.us, victor@mojatatu.com, yimingqian591@gmail.com,
	keenanat2000@gmail.com, 2045gemini@gmail.com,
	rollkingzzc@gmail.com, dcaratti@redhat.com, security@kernel.org,
	linux-kernel@vger.kernel.org,
	"Rajat Gupta" <rajat.gupta@oss.qualcomm.com>
Subject: Re: [PATCH net v2 1/1] net/sched: fix pedit partial COW leading to page cache corruption
Date: Wed, 27 May 2026 17:13:10 +0100
Message-ID: <20260527171310.0d5ed340@pumpkin>
In-Reply-To: <CAM0EoMm_FcS0xSNmCjdXaRMfZh12jU71xM92Z50eVE8uk7tKFQ@mail.gmail.com>

On Wed, 27 May 2026 10:56:57 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On Tue, May 26, 2026 at 3:22 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Jamal Hadi Salim <jhs@mojatatu.com> writes:
> >  
> > > From: Rajat Gupta <rajat.gupta@oss.qualcomm.com>
> > >
> > > tcf_pedit_act() computes the COW range for skb_ensure_writable()
> > > once before the key loop using tcfp_off_max_hint, but the hint does
> > > not account for the runtime header offset added by typed keys. This
> > > can leave part of the write region un-COW'd.
> > >
> > > Fix by moving skb_ensure_writable() inside the per-key loop where
> > > the actual write offset is known, and add overflow checking on the
> > > offset arithmetic. For negative offsets (e.g. Ethernet header edits
> > > at ingress), use skb_cow() to COW the headroom instead. Guard
> > > offset_valid() against INT_MIN, where negation is undefined.
> > >
> > > Additionally, linearize skbs with shared frags upfront to prevent
> > > silent data corruption when pedit operates on zero-copy pages
> > > (e.g. from sendfile).
> > >
> > > Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable")
> > > Reported-by: Yiming Qian <yimingqian591@gmail.com>
> > > Reported-by: Keenan Dong <keenanat2000@gmail.com>
> > > Reported-by: Han Guidong <2045gemini@gmail.com>
> > > Reported-by: Zhang Cen <rollkingzzc@gmail.com>
> > > Tested-by: Victor Nogueira <victor@mojatatu.com>
> > > Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com>  
> >
> > Re-ran the tests, and everything looks good, so:
> >
> > Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >
> > Also looked at the code, and I have a few nits below, but I'm really
> > nitpicking here, so whether you end up fixing those or not:
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >
> >
> > [...]
> >  
> > > @@ -323,7 +324,7 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> > >       if (offset > 0 && offset > skb->len)
> > >               return false;
> > >
> > > -     if  (offset < 0 && -offset > skb_headroom(skb))
> > > +     if (offset < 0 && offset < -(int)skb_headroom(skb))
> > >               return false;  
> >
> > This change makes it really obvious that this is really just:
> >
> >         if (offset < -(int)skb_headroom(skb))
> >                 return false;
> >
> > so, well, that would be clearer, IMO.
> >
> > But then I guess the same could be said of the positive case, so:
> >
> > static bool offset_valid(struct sk_buff *skb, int offset)
> > {
> >         if (offset > skb->len || offset < -(int)skb_headroom(skb))
> >                 return false;
> >
> >         return true;
> > }
> >  
> 
> Yes, that improves readability. If I understood the discussion between
> you and David L., would something like this one-liner be reasonable?
> 
> if (offset > (int)skb->len || offset < -(int)skb_headroom(skb))

I doubt it will be measurable whatever you pick.

Thinking about it, can 'we' get away with the much more readable:
	offset + (int)skb_headroom(skb) < 0
There is no need to worry about the '+' wrapping: a large positive offset
is already rejected by the first test, and a large negative offset stays
negative after the headroom is added.
So it does look all right.
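
As a rough userspace sketch of why that holds: struct fake_skb and the
function names below are made up for illustration, not kernel code, and it
assumes len and headroom are both far below INT_MAX. It also shows why the
upper bound wants the (int) cast once the 'offset > 0 &&' guard is dropped.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sk_buff values the check uses. */
struct fake_skb {
	unsigned int len;
	unsigned int headroom;
};

/* What 'offset > skb->len' does without a cast: offset is converted to
 * unsigned, so any negative offset compares as a huge value.
 */
bool above_len_nocast(const struct fake_skb *skb, int offset)
{
	return (unsigned int)offset > skb->len;
}

/* One-liner from the thread: both bounds compared as int. */
bool offset_valid_cast(const struct fake_skb *skb, int offset)
{
	return !(offset > (int)skb->len || offset < -(int)skb->headroom);
}

/* Lower bound written as an addition. The '||' short-circuits, so the sum
 * is only evaluated when offset <= len, and a negative offset plus a
 * non-negative headroom cannot overflow.
 */
bool offset_valid_add(const struct fake_skb *skb, int offset)
{
	return !(offset > (int)skb->len || offset + (int)skb->headroom < 0);
}

/* Checks that both forms agree at the boundaries and at the extremes. */
void offset_valid_selfcheck(void)
{
	const struct fake_skb skb = { .len = 100, .headroom = 14 };
	const int probes[] = { INT_MIN, -15, -14, -1, 0, 100, 101, INT_MAX };

	for (unsigned int i = 0; i < sizeof(probes) / sizeof(probes[0]); i++)
		assert(offset_valid_cast(&skb, probes[i]) ==
		       offset_valid_add(&skb, probes[i]));

	assert(above_len_nocast(&skb, -1));	/* -1 wrongly looks "above len" */
	assert(!offset_valid_add(&skb, INT_MIN));
	assert(offset_valid_add(&skb, -14));
}
```

Calling offset_valid_selfcheck() from a test main() exercises INT_MIN and
INT_MAX without ever evaluating a sum that could wrap.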

> 
> 
> > > @@ -393,18 +394,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >       struct tcf_pedit_key_ex *tkey_ex;
> > >       struct tcf_pedit_parms *parms;
> > >       struct tc_pedit_key *tkey;
> > > -     u32 max_offset;
> > >       int i;
> > >
> > >       parms = rcu_dereference_bh(p->parms);
> > >
> > > -     max_offset = (skb_transport_header_was_set(skb) ?
> > > -                   skb_transport_offset(skb) :
> > > -                   skb_network_offset(skb)) +
> > > -                  parms->tcfp_off_max_hint;
> > > -     if (skb_ensure_writable(skb, min(skb->len, max_offset)))
> > > -             goto done;
> > > -
> > >       tcf_lastuse_update(&p->tcf_tm);
> > >       tcf_action_update_bstats(&p->common, skb);
> > >
> > > @@ -412,9 +405,10 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >       tkey_ex = parms->tcfp_keys_ex;
> > >
> > >       for (i = parms->tcfp_nkeys; i > 0; i--, tkey++) {
> > > +             int write_offset, write_len;
> > >               int offset = tkey->off;
> > >               int hoffset = 0;
> > > -             u32 *ptr, hdata;
> > > +             u32 *ptr;
> > >               u32 val;
> > >               int rc;
> > >
> > > @@ -451,15 +445,38 @@ TC_INDIRECT_SCOPE int tcf_pedit_act(struct sk_buff *skb,
> > >                       }
> > >               }
> > >
> > > -             if (!offset_valid(skb, hoffset + offset)) {
> > > -                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n", hoffset + offset);
> > > +             if (unlikely(check_add_overflow(hoffset, offset,  
> >
> > It's a bit weird that this has the unlikely(), but the offset_valid()
> > check doesn't?
> >  
> 
> I focused on the bigger solution and worried more about timeliness to
> get this patch in - but it seems the distros had already picked up the
> first posted patch, so hakuna matata (Still, I should have caught
> these unlikelies ;->). Yes on the second one you pointed out.
> Will remove them.
> 
> > > +                                             &write_offset))) {
> > > +                     pr_info_ratelimited("tc action pedit offset overflow\n");
> > >                       goto bad;
> > >               }
> > >
> > > -             ptr = skb_header_pointer(skb, hoffset + offset,
> > > -                                      sizeof(hdata), &hdata);
> > > -             if (!ptr)
> > > +             if (!offset_valid(skb, write_offset)) {
> > > +                     pr_info_ratelimited("tc action pedit offset %d out of bounds\n",
> > > +                                         write_offset);
> > >                       goto bad;
> > > +             }
> > > +
> > > +             if (write_offset < 0) {
> > > +                     if (skb_cow(skb, -write_offset))
> > > +                             goto bad;
> > > +                     if (write_offset + (int)sizeof(*ptr) > 0) {
> > > +                             if (skb_ensure_writable(skb,
> > > +                                                     min(skb->len,
> > > +                                                         write_offset + sizeof(*ptr))))  
> >
> > Combining these with && instead of the double indentation would be more
> > readable IMO (shorter lines, aligning the 'goto bad' labels).  
> 
> hrm. So:
> if (write_offset < 0 && skb_cow(skb, -write_offset) &&
>     write_offset + (int)sizeof(*ptr) > 0 &&
>     skb_ensure_writable(skb, min(skb->len, write_offset + sizeof(*ptr)))) {
>         goto bad;
> } else {
>   ...
> }
> 
> Not sure which is more readable.

I think he meant just the last two 'if'.
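
If so, the negative-offset branch would look roughly like the sketch below.
It is userspace C with stubbed helpers: fake_skb, fake_skb_cow() and
fake_skb_ensure_writable() are invented names standing in for the kernel
ones, and it assumes write_offset has already been validated as negative
and not INT_MIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff; it records what the stubs saw. */
struct fake_skb {
	unsigned int len;
	bool fail_cow;
	int cow_calls;
	int writable_calls;
	unsigned int writable_len;
};

/* Stub for skb_cow(): returns 0 on success, nonzero on failure. */
int fake_skb_cow(struct fake_skb *skb, unsigned int headroom)
{
	(void)headroom;
	skb->cow_calls++;
	return skb->fail_cow ? -1 : 0;
}

/* Stub for skb_ensure_writable(): always succeeds, remembers the length. */
int fake_skb_ensure_writable(struct fake_skb *skb, unsigned int write_len)
{
	skb->writable_calls++;
	skb->writable_len = write_len;
	return 0;
}

unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Returns 0 on success, -1 where the patch would 'goto bad'. */
int prepare_negative_write(struct fake_skb *skb, int write_offset)
{
	/* End of the 32-bit write, relative to the start of the packet data. */
	int end = write_offset + (int)sizeof(uint32_t);

	if (fake_skb_cow(skb, (unsigned int)-write_offset))
		return -1;
	/* The last two 'if's merged: only touch the data area when the
	 * write straddles offset 0.
	 */
	if (end > 0 &&
	    fake_skb_ensure_writable(skb, min_uint(skb->len, (unsigned int)end)))
		return -1;
	return 0;
}
```

The behaviour is the same as the nested version; only one indentation
level goes away and the failure paths line up.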

-- David

> 
> cheers,
> jamal
> >  
> > > +                                     goto bad;
> > > +                     }
> > > +             } else {
> > > +                     if (unlikely(check_add_overflow(write_offset,
> > > +                                                     (int)sizeof(*ptr),
> > > +                                                     &write_len)))  
> >
> > Same comment wrt unlikely()
> >  

