From: Hyunwoo Kim <imv4bel@gmail.com>
To: Eric Dumazet <edumazet@google.com>
Cc: dsahern@kernel.org, idosch@nvidia.com, davem@davemloft.net,
kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
netdev@vger.kernel.org, imv4bel@gmail.com
Subject: Re: [PATCH net] inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush
Date: Mon, 1 Jun 2026 19:49:21 +0900
Message-ID: <ah1jsYSmm2PzZ3V1@v4bel>
In-Reply-To: <CANn89i+q8uKxsjCCe6Gz-YcHTwGZQsDWRKNO2FP=+0Z4S7uQCg@mail.gmail.com>

On Mon, Jun 01, 2026 at 02:56:37AM -0700, Eric Dumazet wrote:
> On Mon, Jun 1, 2026 at 2:37 AM Hyunwoo Kim <imv4bel@gmail.com> wrote:
> >
> > On netns teardown, fqdir_pre_exit() walks the fqdir rhashtable and
> > flushes every fragment queue that is not yet complete using
> > inet_frag_queue_flush(). That helper frees all the skbs queued on the
> > fragment queue but does not set INET_FRAG_COMPLETE, and leaves
> > q->fragments_tail and q->last_run_head pointing at the freed skbs.
> > The queue itself stays in the rhashtable.
> >
> > fqdir_pre_exit() first lowers high_thresh to 0 to stop new queue lookups,
> > but it cannot stop a fragment that already obtained the queue through
> > inet_frag_find() earlier and stalled just before taking the queue lock.
> > Once that fragment resumes after the flush and takes the queue lock,
> > it passes the INET_FRAG_COMPLETE check and then dereferences the freed
> > fragments_tail. inet_frag_queue_insert() reads FRAG_CB() and ->len of
> > that pointer and, on the append path, writes ->next_frag, causing a
> > slab use-after-free. IPv6, nf_conntrack_reasm6 and 6lowpan reassembly
> > share the same flush path and are affected as well.
> >
> > Mark the queue complete and reset its remaining pointers under the same
> > lock right after the flush. With INET_FRAG_COMPLETE set, the insert in
> > each reassembly path bails out at its check as soon as it takes the
> > queue lock and no longer accesses the freed fragments_tail.
> >
> > Fixes: 006a5035b495 ("inet: frags: flush pending skbs in fqdir_pre_exit()")
> > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > ---
> > net/ipv4/inet_fragment.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > index 393770920abd..d532f6182c8a 100644
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -243,8 +243,13 @@ void fqdir_pre_exit(struct fqdir *fqdir)
> >  			continue;
> >  		}
> >  		spin_lock_bh(&fq->lock);
> > -		if (!(fq->flags & INET_FRAG_COMPLETE))
> > +		if (!(fq->flags & INET_FRAG_COMPLETE)) {
> >  			inet_frag_queue_flush(fq, 0);
> > +			fq->flags |= INET_FRAG_COMPLETE;
> > +			fq->rb_fragments = RB_ROOT;
> > +			fq->fragments_tail = NULL;
> > +			fq->last_run_head = NULL;
> > +		}
>
>
> Any reason this is not done from inet_frag_queue_flush() so that we can
> remove the related code from ip_frag_reinit()?

I looked at the callers and agree that doing this in
inet_frag_queue_flush() is the right direction.

>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 86b100694659ee51292625216113f9411b98a351..6c5f373e55d3a39a581a6364599d911782469f77
> 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -326,6 +326,10 @@ void inet_frag_queue_flush(struct inet_frag_queue *q,
>  	reason = reason ?: SKB_DROP_REASON_FRAG_REASM_TIMEOUT;
>  	sum = inet_frag_rbtree_purge(&q->rb_fragments, reason);
>  	sub_frag_mem_limit(q->fqdir, sum);
> +	q->flags |= INET_FRAG_COMPLETE;

While testing, though, I found that setting INET_FRAG_COMPLETE there
leaks the inet_frag_queue. A queue flushed by fqdir_pre_exit() then
reaches inet_frags_free_cb() with INET_FRAG_COMPLETE set but
INET_FRAG_HASH_DEAD clear, so neither branch there drops its hash
reference. So the INET_FRAG_COMPLETE assignment should be dropped.

I'll send v2 after 24 hours.

Best regards,
Hyunwoo Kim

> +	q->rb_fragments = RB_ROOT;
> +	q->fragments_tail = NULL;
> +	q->last_run_head = NULL;
>  }
> EXPORT_SYMBOL(inet_frag_queue_flush);
>
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 56b0f738d2f27b6b4c4b55f5ca9368305ce1eb4f..c790d2f494870e1debd7e73b2d67df017a29f8a8
> 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -250,9 +250,6 @@ static int ip_frag_reinit(struct ipq *qp)
>  	qp->q.flags = 0;
>  	qp->q.len = 0;
>  	qp->q.meat = 0;
> -	qp->q.rb_fragments = RB_ROOT;
> -	qp->q.fragments_tail = NULL;
> -	qp->q.last_run_head = NULL;
>  	qp->iif = 0;
>  	qp->ecn = 0;
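
The leak described in the reply above can be illustrated with a minimal
userspace sketch. This is not kernel code: the model_* names, the flag
values and the single tracked reference (the hash reference only, with
the timer reference ignored) are assumptions made for illustration. It
only models the two branches the reply attributes to
inet_frags_free_cb(): a queue that arrives with the "complete" flag set
but the "hash dead" flag clear has no reference dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; these are NOT the kernel's definitions. */
enum {
	MODEL_FRAG_COMPLETE  = 1 << 0,
	MODEL_FRAG_HASH_DEAD = 1 << 1,
};

/* Hypothetical stand-in for a fragment queue: flags plus a refcount. */
struct model_queue {
	unsigned int flags;
	int refcnt;
};

/*
 * Model of the final free callback as described in the thread:
 * the hash reference is dropped either when the queue is not yet
 * complete, or when it is complete and already marked hash-dead.
 * Returns the number of references dropped.
 */
static int model_free_cb(struct model_queue *q)
{
	int count = 0;

	if (!(q->flags & MODEL_FRAG_COMPLETE)) {
		q->flags |= MODEL_FRAG_COMPLETE;
		count++;
	} else if (q->flags & MODEL_FRAG_HASH_DEAD) {
		count++;
	}

	assert(count <= q->refcnt);
	q->refcnt -= count;
	return count;
}

/*
 * Simulate teardown of one queue that holds only its hash reference.
 * flush_sets_complete selects whether the flush step also sets the
 * "complete" flag. Returns true if a reference is left over (a leak).
 */
static bool model_teardown_leaks(bool flush_sets_complete)
{
	struct model_queue q = { .flags = 0, .refcnt = 1 };

	if (flush_sets_complete)
		q.flags |= MODEL_FRAG_COMPLETE;

	model_free_cb(&q);
	return q.refcnt != 0;
}
```

In this model, model_teardown_leaks(true) reports a leftover reference
while model_teardown_leaks(false) does not, which matches the reasoning
for dropping the INET_FRAG_COMPLETE assignment from the flush helper.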
Thread overview: 3+ messages
2026-06-01 9:37 [PATCH net] inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush Hyunwoo Kim
2026-06-01 9:56 ` Eric Dumazet
2026-06-01 10:49 ` Hyunwoo Kim [this message]