netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Eric Dumazet <edumazet@google.com>
Cc: netfilter-devel@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com,
	fw@strlen.de
Subject: Re: [PATCH net 4/4] netfilter: nft_inner: incorrect percpu area handling under softirq
Date: Mon, 2 Dec 2024 02:24:09 +0100	[thread overview]
Message-ID: <Z00MOYmYgmlrrpWN@calendula> (raw)
In-Reply-To: <CANn89iKGUKxLGY4UG9JrAVQR5bahHrUurf7TVwPcE4rf4g3cAQ@mail.gmail.com>

Hi Eric,

On Fri, Nov 29, 2024 at 10:14:34AM +0100, Eric Dumazet wrote:
> On Thu, Nov 28, 2024 at 1:38 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > Softirq can interrupt packet from process context which walks over the
> > percpu area.
> >
> > Add routines to disable bh while restoring and saving the tunnel parser
> > context from percpu area to stack. Add a skbuff owner for this percpu
> > area to catch softirq interference to exercise the packet tunnel parser
> > again in such case.
> >
> > Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com
> > Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  include/net/netfilter/nf_tables_core.h |  1 +
> >  net/netfilter/nft_inner.c              | 56 ++++++++++++++++++++------
> >  2 files changed, 45 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
> > index ff27cb2e1662..dae0e7592934 100644
> > --- a/include/net/netfilter/nf_tables_core.h
> > +++ b/include/net/netfilter/nf_tables_core.h
> > @@ -161,6 +161,7 @@ enum {
> >  };
> >
> >  struct nft_inner_tun_ctx {
> > +       struct sk_buff *skb;    /* percpu area owner */
> >         u16     type;
> >         u16     inner_tunoff;
> >         u16     inner_lloff;
> > diff --git a/net/netfilter/nft_inner.c b/net/netfilter/nft_inner.c
> > index 928312d01eb1..fcaa126ac8da 100644
> > --- a/net/netfilter/nft_inner.c
> > +++ b/net/netfilter/nft_inner.c
> > @@ -210,35 +210,65 @@ static int nft_inner_parse(const struct nft_inner *priv,
> >                            struct nft_pktinfo *pkt,
> >                            struct nft_inner_tun_ctx *tun_ctx)
> >  {
> > -       struct nft_inner_tun_ctx ctx = {};
> >         u32 off = pkt->inneroff;
> >
> >         if (priv->flags & NFT_INNER_HDRSIZE &&
> > -           nft_inner_parse_tunhdr(priv, pkt, &ctx, &off) < 0)
> > +           nft_inner_parse_tunhdr(priv, pkt, tun_ctx, &off) < 0)
> >                 return -1;
> >
> >         if (priv->flags & (NFT_INNER_LL | NFT_INNER_NH)) {
> > -               if (nft_inner_parse_l2l3(priv, pkt, &ctx, off) < 0)
> > +               if (nft_inner_parse_l2l3(priv, pkt, tun_ctx, off) < 0)
> >                         return -1;
> >         } else if (priv->flags & NFT_INNER_TH) {
> > -               ctx.inner_thoff = off;
> > -               ctx.flags |= NFT_PAYLOAD_CTX_INNER_TH;
> > +               tun_ctx->inner_thoff = off;
> > +               tun_ctx->flags |= NFT_PAYLOAD_CTX_INNER_TH;
> >         }
> >
> > -       *tun_ctx = ctx;
> >         tun_ctx->type = priv->type;
> > +       tun_ctx->skb = pkt->skb;
> >         pkt->flags |= NFT_PKTINFO_INNER_FULL;
> >
> >         return 0;
> >  }
> >
> > +static bool nft_inner_restore_tun_ctx(const struct nft_pktinfo *pkt,
> > +                                     struct nft_inner_tun_ctx *tun_ctx)
> > +{
> > +       struct nft_inner_tun_ctx *this_cpu_tun_ctx;
> > +
> > +       local_bh_disable();
> > +       this_cpu_tun_ctx = this_cpu_ptr(&nft_pcpu_tun_ctx);
> > +       if (this_cpu_tun_ctx->skb != pkt->skb) {
> 
> I must say I do not understand this patch.
> 
> If a context is used by a save/restore more than one time per packet
> traversal, then this means we can not use per-cpu storage,
> or risk flakes.
> 
> Also, skb could be freed and re-allocated ?
> 
> Perhaps describe a bit more what is going on in the changelog.

There is an on-stack nft_pktinfo structure with a flags field. This
nft_pktinfo is a wrapper for the sk_buff, containing header offsets
and metainformation. This is initialize when entering this chain.

If the NFT_PKTINFO_INNER_FULL flag is set on, then the percpu area
could contain information on the inner header offsets (ie. packet was
already parsed in this chain).

There is a check to validate that the percpu area refers to this
skbuff. If there is a mismatch, then this needs to parse the inner
headers because the data in the percpu area is stale. Otherwise, if
there is a match, the percpu area is copied on-stack.

After processing (payload/meta fetching), the on-stack copy is stored
back to the percpu area. I can make an improvement on this patch to
check if this skbuff still owns the percpu area in the store/exit
section of this inner evaluation routine, to avoid a unnecessary update.

So, it is basically the NFT_PKTINFO_INNER_FULL flag that handles the
skbuff reallocation scenario. This is not blindly trusting this percpu
area per-se.

One comestic change I can apply to this: I can also turn the struct
sk_buff into unsigned long so it clear to the reader this pointer to
skbuff is not meant to be used for being dereferenced.

If the explaination above is sufficient, I can revamp to extend the
changelog as you suggest and post a new version of this patch.

Thanks.

  reply	other threads:[~2024-12-02  1:24 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-28 12:38 [PATCH net,v2 0/4] Netfilter fixes for net Pablo Neira Ayuso
2024-11-28 12:38 ` [PATCH net 1/4] ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init() Pablo Neira Ayuso
2024-11-28 12:38 ` [PATCH net 2/4] netfilter: x_tables: fix LED ID check in led_tg_check() Pablo Neira Ayuso
2024-11-28 12:38 ` [PATCH net 3/4] netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level Pablo Neira Ayuso
2024-11-28 12:38 ` [PATCH net 4/4] netfilter: nft_inner: incorrect percpu area handling under softirq Pablo Neira Ayuso
2024-11-29  9:14   ` Eric Dumazet
2024-12-02  1:24     ` Pablo Neira Ayuso [this message]
2024-12-02  9:17       ` Eric Dumazet
2024-12-02  9:28         ` Pablo Neira Ayuso
2024-12-03 20:22         ` Pablo Neira Ayuso
2024-11-28 14:33 ` [PATCH net,v2 0/4] Netfilter fixes for net Paolo Abeni
2024-11-28 14:41   ` Pablo Neira Ayuso
  -- strict thread matches above, loose matches on Subject: below --
2024-11-28 12:23 [PATCH net " Pablo Neira Ayuso
2024-11-28 12:23 ` [PATCH net 4/4] netfilter: nft_inner: incorrect percpu area handling under softirq Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z00MOYmYgmlrrpWN@calendula \
    --to=pablo@netfilter.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).