Netdev List
 help / color / mirror / Atom feed
* [PATCH net] inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush
@ 2026-06-01  9:37 Hyunwoo Kim
  2026-06-01  9:56 ` Eric Dumazet
  0 siblings, 1 reply; 3+ messages in thread
From: Hyunwoo Kim @ 2026-06-01  9:37 UTC (permalink / raw)
  To: dsahern, idosch, davem, edumazet, kuba, pabeni, horms; +Cc: netdev, imv4bel

On netns teardown, fqdir_pre_exit() walks the fqdir rhashtable and
flushes every fragment queue that is not yet complete using
inet_frag_queue_flush(). That helper frees all the skbs queued on the
fragment queue but does not set INET_FRAG_COMPLETE, and leaves
q->fragments_tail and q->last_run_head pointing at the freed skbs.
The queue itself stays in the rhashtable.

fqdir_pre_exit() first lowers high_thresh to 0 to stop new queue lookups,
but it cannot stop a fragment that already obtained the queue through
inet_frag_find() earlier and stalled just before taking the queue lock.
Once that fragment resumes after the flush and takes the queue lock,
it passes the INET_FRAG_COMPLETE check and then dereferences the freed
fragments_tail. inet_frag_queue_insert() reads FRAG_CB() and ->len of
that pointer and, on the append path, writes ->next_frag, causing a
slab use-after-free. IPv6, nf_conntrack_reasm6 and 6lowpan reassembly
share the same flush path and are affected as well.

Mark the queue complete and reset its remaining pointers under the same
lock right after the flush. With INET_FRAG_COMPLETE set, the insert in
each reassembly path bails out at its check as soon as it takes the
queue lock and no longer accesses the freed fragments_tail.

Fixes: 006a5035b495 ("inet: frags: flush pending skbs in fqdir_pre_exit()")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
 net/ipv4/inet_fragment.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 393770920abd..d532f6182c8a 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -243,8 +243,13 @@ void fqdir_pre_exit(struct fqdir *fqdir)
 			continue;
 		}
 		spin_lock_bh(&fq->lock);
-		if (!(fq->flags & INET_FRAG_COMPLETE))
+		if (!(fq->flags & INET_FRAG_COMPLETE)) {
 			inet_frag_queue_flush(fq, 0);
+			fq->flags |= INET_FRAG_COMPLETE;
+			fq->rb_fragments = RB_ROOT;
+			fq->fragments_tail = NULL;
+			fq->last_run_head = NULL;
+		}
 		spin_unlock_bh(&fq->lock);
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net] inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush
  2026-06-01  9:37 [PATCH net] inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush Hyunwoo Kim
@ 2026-06-01  9:56 ` Eric Dumazet
  2026-06-01 10:49   ` Hyunwoo Kim
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2026-06-01  9:56 UTC (permalink / raw)
  To: Hyunwoo Kim; +Cc: dsahern, idosch, davem, kuba, pabeni, horms, netdev

On Mon, Jun 1, 2026 at 2:37 AM Hyunwoo Kim <imv4bel@gmail.com> wrote:
>
> On netns teardown, fqdir_pre_exit() walks the fqdir rhashtable and
> flushes every fragment queue that is not yet complete using
> inet_frag_queue_flush(). That helper frees all the skbs queued on the
> fragment queue but does not set INET_FRAG_COMPLETE, and leaves
> q->fragments_tail and q->last_run_head pointing at the freed skbs.
> The queue itself stays in the rhashtable.
>
> fqdir_pre_exit() first lowers high_thresh to 0 to stop new queue lookups,
> but it cannot stop a fragment that already obtained the queue through
> inet_frag_find() earlier and stalled just before taking the queue lock.
> Once that fragment resumes after the flush and takes the queue lock,
> it passes the INET_FRAG_COMPLETE check and then dereferences the freed
> fragments_tail. inet_frag_queue_insert() reads FRAG_CB() and ->len of
> that pointer and, on the append path, writes ->next_frag, causing a
> slab use-after-free. IPv6, nf_conntrack_reasm6 and 6lowpan reassembly
> share the same flush path and are affected as well.
>
> Mark the queue complete and reset its remaining pointers under the same
> lock right after the flush. With INET_FRAG_COMPLETE set, the insert in
> each reassembly path bails out at its check as soon as it takes the
> queue lock and no longer accesses the freed fragments_tail.
>
> Fixes: 006a5035b495 ("inet: frags: flush pending skbs in fqdir_pre_exit()")
> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> ---
>  net/ipv4/inet_fragment.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 393770920abd..d532f6182c8a 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -243,8 +243,13 @@ void fqdir_pre_exit(struct fqdir *fqdir)
>                         continue;
>                 }
>                 spin_lock_bh(&fq->lock);
> -               if (!(fq->flags & INET_FRAG_COMPLETE))
> +               if (!(fq->flags & INET_FRAG_COMPLETE)) {
>                         inet_frag_queue_flush(fq, 0);
> +                       fq->flags |= INET_FRAG_COMPLETE;
> +                       fq->rb_fragments = RB_ROOT;
> +                       fq->fragments_tail = NULL;
> +                       fq->last_run_head = NULL;
> +               }


Any reason this is not done from inet_frag_queue_flush() so that we can
remove the related code from ip_frag_reinit()?

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 86b100694659ee51292625216113f9411b98a351..6c5f373e55d3a39a581a6364599d911782469f77
100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -326,6 +326,10 @@ void inet_frag_queue_flush(struct inet_frag_queue *q,
        reason = reason ?: SKB_DROP_REASON_FRAG_REASM_TIMEOUT;
        sum = inet_frag_rbtree_purge(&q->rb_fragments, reason);
        sub_frag_mem_limit(q->fqdir, sum);
+       q->flags |= INET_FRAG_COMPLETE;
+       q->rb_fragments = RB_ROOT;
+       q->fragments_tail = NULL;
+       q->last_run_head = NULL;
 }
 EXPORT_SYMBOL(inet_frag_queue_flush);

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 56b0f738d2f27b6b4c4b55f5ca9368305ce1eb4f..c790d2f494870e1debd7e73b2d67df017a29f8a8
100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -250,9 +250,6 @@ static int ip_frag_reinit(struct ipq *qp)
        qp->q.flags = 0;
        qp->q.len = 0;
        qp->q.meat = 0;
-       qp->q.rb_fragments = RB_ROOT;
-       qp->q.fragments_tail = NULL;
-       qp->q.last_run_head = NULL;
        qp->iif = 0;
        qp->ecn = 0;

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net] inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush
  2026-06-01  9:56 ` Eric Dumazet
@ 2026-06-01 10:49   ` Hyunwoo Kim
  0 siblings, 0 replies; 3+ messages in thread
From: Hyunwoo Kim @ 2026-06-01 10:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: dsahern, idosch, davem, kuba, pabeni, horms, netdev, imv4bel

On Mon, Jun 01, 2026 at 02:56:37AM -0700, Eric Dumazet wrote:
> On Mon, Jun 1, 2026 at 2:37 AM Hyunwoo Kim <imv4bel@gmail.com> wrote:
> >
> > On netns teardown, fqdir_pre_exit() walks the fqdir rhashtable and
> > flushes every fragment queue that is not yet complete using
> > inet_frag_queue_flush(). That helper frees all the skbs queued on the
> > fragment queue but does not set INET_FRAG_COMPLETE, and leaves
> > q->fragments_tail and q->last_run_head pointing at the freed skbs.
> > The queue itself stays in the rhashtable.
> >
> > fqdir_pre_exit() first lowers high_thresh to 0 to stop new queue lookups,
> > but it cannot stop a fragment that already obtained the queue through
> > inet_frag_find() earlier and stalled just before taking the queue lock.
> > Once that fragment resumes after the flush and takes the queue lock,
> > it passes the INET_FRAG_COMPLETE check and then dereferences the freed
> > fragments_tail. inet_frag_queue_insert() reads FRAG_CB() and ->len of
> > that pointer and, on the append path, writes ->next_frag, causing a
> > slab use-after-free. IPv6, nf_conntrack_reasm6 and 6lowpan reassembly
> > share the same flush path and are affected as well.
> >
> > Mark the queue complete and reset its remaining pointers under the same
> > lock right after the flush. With INET_FRAG_COMPLETE set, the insert in
> > each reassembly path bails out at its check as soon as it takes the
> > queue lock and no longer accesses the freed fragments_tail.
> >
> > Fixes: 006a5035b495 ("inet: frags: flush pending skbs in fqdir_pre_exit()")
> > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > ---
> >  net/ipv4/inet_fragment.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > index 393770920abd..d532f6182c8a 100644
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -243,8 +243,13 @@ void fqdir_pre_exit(struct fqdir *fqdir)
> >                         continue;
> >                 }
> >                 spin_lock_bh(&fq->lock);
> > -               if (!(fq->flags & INET_FRAG_COMPLETE))
> > +               if (!(fq->flags & INET_FRAG_COMPLETE)) {
> >                         inet_frag_queue_flush(fq, 0);
> > +                       fq->flags |= INET_FRAG_COMPLETE;
> > +                       fq->rb_fragments = RB_ROOT;
> > +                       fq->fragments_tail = NULL;
> > +                       fq->last_run_head = NULL;
> > +               }
> 
> 
> Any reason this is not done from inet_frag_queue_flush() so that we can
> remove the related code from ip_frag_reinit()?

I looked at the callers and agree that doing this in 
inet_frag_queue_flush() is the right direction.

> 
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 86b100694659ee51292625216113f9411b98a351..6c5f373e55d3a39a581a6364599d911782469f77
> 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -326,6 +326,10 @@ void inet_frag_queue_flush(struct inet_frag_queue *q,
>         reason = reason ?: SKB_DROP_REASON_FRAG_REASM_TIMEOUT;
>         sum = inet_frag_rbtree_purge(&q->rb_fragments, reason);
>         sub_frag_mem_limit(q->fqdir, sum);
> +       q->flags |= INET_FRAG_COMPLETE;

While testing, though, I found that setting INET_FRAG_COMPLETE there 
leaks the inet_frag_queue. A queue flushed by fqdir_pre_exit() then 
reaches inet_frags_free_cb() with INET_FRAG_COMPLETE set but 
INET_FRAG_HASH_DEAD clear, so neither branch there drops its hash 
reference. So the INET_FRAG_COMPLETE assignment should be dropped.

I'll send v2 after 24 hours.


Best regards,
Hyunwoo Kim

> +       q->rb_fragments = RB_ROOT;
> +       q->fragments_tail = NULL;
> +       q->last_run_head = NULL;
>  }
>  EXPORT_SYMBOL(inet_frag_queue_flush);
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 56b0f738d2f27b6b4c4b55f5ca9368305ce1eb4f..c790d2f494870e1debd7e73b2d67df017a29f8a8
> 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -250,9 +250,6 @@ static int ip_frag_reinit(struct ipq *qp)
>         qp->q.flags = 0;
>         qp->q.len = 0;
>         qp->q.meat = 0;
> -       qp->q.rb_fragments = RB_ROOT;
> -       qp->q.fragments_tail = NULL;
> -       qp->q.last_run_head = NULL;
>         qp->iif = 0;
>         qp->ecn = 0;

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-01 10:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-01  9:37 [PATCH net] inet: frags: fix use-after-free caused by the fqdir_pre_exit() flush Hyunwoo Kim
2026-06-01  9:56 ` Eric Dumazet
2026-06-01 10:49   ` Hyunwoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox