[PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash

Netdev List
 help / color / mirror / Atom feed

* [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
@ 2026-07-02 18:58 Xiang Mei (Microsoft)
  2026-07-02 19:19 ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Xiang Mei (Microsoft) @ 2026-07-02 18:58 UTC (permalink / raw)
  To: steffen.klassert, herbert, davem, netdev
  Cc: horms, fw, edumazet, kuba, pabeni, AutonomousCodeSecurity,
	tgopinath, kys, Xiang Mei (Microsoft)

xfrm_hash_rebuild() unlinks each policy from its bydst chain with
hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes
through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC
allocation; on failure the error path only WARN_ONCE()s and continues, so the
policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild
calls hlist_del_rcu() on that node again, dereferences the poison, and takes a
general protection fault.

Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed
(pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu()
is then a no-op for it, and the non-failing case is unchanged.

The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in
practice this is only reached under memory pressure; the crash below was
reproduced deterministically by forcing that allocation to fail with fault
injection (failslab).

Crash:
  Oops: general protection fault, probably for non-canonical address
  0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI
  KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
  ...
  Workqueue: events xfrm_hash_rebuild
  RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190
  RAX: dead000000000122   (LIST_POISON2 + offset)
  ...
  Call Trace:
   hlist_del_rcu (include/linux/rculist.h:599)
   xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365)
   process_one_work (kernel/workqueue.c:3322)
   worker_thread (kernel/workqueue.c:3486)
   kthread (kernel/kthread.c:436)
   ret_from_fork (arch/x86/kernel/process.c:158)
   ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
   ...
  Kernel panic - not syncing: Fatal exception in interrupt

Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype")
Reported-by: AutonomousCodeSecurity@microsoft.com
Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
---
 net/xfrm/xfrm_policy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7ef861a0e823..2612a405542b 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 		if (xfrm_policy_is_dead_or_sk(policy))
 			continue;

-		hlist_del_rcu(&policy->bydst);
+		hlist_del_init_rcu(&policy->bydst);

 		newpos = NULL;
 		dir = xfrm_policy_id2dir(policy->index);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
  2026-07-02 18:58 [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison Xiang Mei (Microsoft)
@ 2026-07-02 19:19 ` Florian Westphal
  2026-07-02 22:11   ` Xiang Mei
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-07-02 19:19 UTC (permalink / raw)
  To: Xiang Mei (Microsoft)
  Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
	pabeni, AutonomousCodeSecurity, tgopinath, kys

Xiang Mei (Microsoft) <xmei5@asu.edu> wrote:
> xfrm_hash_rebuild() unlinks each policy from its bydst chain with
> hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes
> through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC
> allocation; on failure the error path only WARN_ONCE()s and continues, so the
> policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild
> calls hlist_del_rcu() on that node again, dereferences the poison, and takes a
> general protection fault.
> 
> Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed
> (pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu()
> is then a no-op for it, and the non-failing case is unchanged.
> 
> The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in
> practice this is only reached under memory pressure; the crash below was
> reproduced deterministically by forcing that allocation to fail with fault
> injection (failslab).
> 
> Crash:
>   Oops: general protection fault, probably for non-canonical address
>   0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI
>   KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
>   ...
>   Workqueue: events xfrm_hash_rebuild
>   RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190
>   RAX: dead000000000122   (LIST_POISON2 + offset)
>   ...
>   Call Trace:
>    hlist_del_rcu (include/linux/rculist.h:599)
>    xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365)
>    process_one_work (kernel/workqueue.c:3322)
>    worker_thread (kernel/workqueue.c:3486)
>    kthread (kernel/kthread.c:436)
>    ret_from_fork (arch/x86/kernel/process.c:158)
>    ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
>    ...
>   Kernel panic - not syncing: Fatal exception in interrupt
> 
> Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype")
> Reported-by: AutonomousCodeSecurity@microsoft.com
> Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
> ---
>  net/xfrm/xfrm_policy.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7ef861a0e823..2612a405542b 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
>  		if (xfrm_policy_is_dead_or_sk(policy))
>  			continue;
>  
> -		hlist_del_rcu(&policy->bydst);
> +		hlist_del_init_rcu(&policy->bydst);

This patch is dubious.  I looks to me as if it papers over the
actual bug.

Why is there a memory allocation error?

The first loop -- before unlink -- is supposed to preallocate the new
bins and chain heads.

This is also why there is a WARN. No memory allocations are supposed to
occur after the hlist_del_rcu(), there is supposed to be a guarantee
that the insertion succeeds.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
  2026-07-02 19:19 ` Florian Westphal
@ 2026-07-02 22:11   ` Xiang Mei
  2026-07-03  3:58     ` Florian Westphal
  0 siblings, 1 reply; 5+ messages in thread
From: Xiang Mei @ 2026-07-02 22:11 UTC (permalink / raw)
  To: Florian Westphal
  Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
	pabeni, AutonomousCodeSecurity, tgopinath, kys

On Thu, Jul 2, 2026 at 12:19 PM Florian Westphal <fw@strlen.de> wrote:
>
> Xiang Mei (Microsoft) <xmei5@asu.edu> wrote:
> > xfrm_hash_rebuild() unlinks each policy from its bydst chain with
> > hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes
> > through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC
> > allocation; on failure the error path only WARN_ONCE()s and continues, so the
> > policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild
> > calls hlist_del_rcu() on that node again, dereferences the poison, and takes a
> > general protection fault.
> >
> > Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed
> > (pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu()
> > is then a no-op for it, and the non-failing case is unchanged.
> >
> > The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in
> > practice this is only reached under memory pressure; the crash below was
> > reproduced deterministically by forcing that allocation to fail with fault
> > injection (failslab).
> >
> > Crash:
> >   Oops: general protection fault, probably for non-canonical address
> >   0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI
> >   KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
> >   ...
> >   Workqueue: events xfrm_hash_rebuild
> >   RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190
> >   RAX: dead000000000122   (LIST_POISON2 + offset)
> >   ...
> >   Call Trace:
> >    hlist_del_rcu (include/linux/rculist.h:599)
> >    xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365)
> >    process_one_work (kernel/workqueue.c:3322)
> >    worker_thread (kernel/workqueue.c:3486)
> >    kthread (kernel/kthread.c:436)
> >    ret_from_fork (arch/x86/kernel/process.c:158)
> >    ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
> >    ...
> >   Kernel panic - not syncing: Fatal exception in interrupt
> >
> > Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype")
> > Reported-by: AutonomousCodeSecurity@microsoft.com
> > Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
> > ---
> >  net/xfrm/xfrm_policy.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> > index 7ef861a0e823..2612a405542b 100644
> > --- a/net/xfrm/xfrm_policy.c
> > +++ b/net/xfrm/xfrm_policy.c
> > @@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
> >               if (xfrm_policy_is_dead_or_sk(policy))
> >                       continue;
> >
> > -             hlist_del_rcu(&policy->bydst);
> > +             hlist_del_init_rcu(&policy->bydst);
>
> This patch is dubious.  I looks to me as if it papers over the
> actual bug.
>
> Why is there a memory allocation error?
>
> The first loop -- before unlink -- is supposed to preallocate the new
> bins and chain heads.
>
Agreed, and my patch hides it instead of avoiding it. The problem is
the prep loop's guard is inverted:

        if (policy->selector.prefixlen_d < dbits ||
            policy->selector.prefixlen_s < sbits)
                continue;

That skips exactly the policies reinserted via the tree (prefixlen <
threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()), and
preallocates for the exact ones instead, which never allocate and get
pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC
after the hlist_del_rcu(), and that's the failure the WARN catches.

The reproducer lowers then raises the threshold so those bins are pruned
and reallocated during reinsert; failslab just makes the failure
deterministic, OOM would do the same.

v2 inverts the guard so prep prepares the set that's actually reinserted:

        -       if (policy->selector.prefixlen_d < dbits ||
        -           policy->selector.prefixlen_s < sbits)
        +       if (policy->selector.prefixlen_d >= dbits &&
        +           policy->selector.prefixlen_s >= sbits)
                        continue;

Then inexact_insert() finds bin+node present and allocates nothing, so the
reinsert can't fail; a prep failure still hits goto out_unlock before any
unlink. hlist_del_rcu vs _init_ then no longer matters.

---
  net/xfrm/xfrm_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7ef861a0e823..932a313b9460 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1329,8 +1329,8 @@ static void xfrm_hash_rebuild(struct work_struct *work)
                        }
                }

-               if (policy->selector.prefixlen_d < dbits ||
-                   policy->selector.prefixlen_s < sbits)
+               if (policy->selector.prefixlen_d >= dbits &&
+                   policy->selector.prefixlen_s >= sbits)
                        continue;

                bin = xfrm_policy_inexact_alloc_bin(policy, dir);
---


I checked the new patch on the reproducer, and the crash can't be triggered.
If you agree with this new patch, I'll send this as v2. I can also share
the reproducer if you need that to do further checks.

Xiang


Xiang

> This is also why there is a WARN. No memory allocations are supposed to
> occur after the hlist_del_rcu(), there is supposed to be a guarantee
> that the insertion succeeds.
>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
  2026-07-02 22:11   ` Xiang Mei
@ 2026-07-03  3:58     ` Florian Westphal
  2026-07-03  5:22       ` Xiang Mei
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-07-03  3:58 UTC (permalink / raw)
  To: Xiang Mei
  Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
	pabeni, AutonomousCodeSecurity, tgopinath, kys

Xiang Mei <xmei5@asu.edu> wrote:
> Agreed, and my patch hides it instead of avoiding it. The problem is
> the prep loop's guard is inverted:
> 
>         if (policy->selector.prefixlen_d < dbits ||
>             policy->selector.prefixlen_s < sbits)
>                 continue;
> 
> That skips exactly the policies reinserted via the tree (prefixlen <
> threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()),

Indeed.

> locates for the exact ones instead, which never allocate and get
> pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC
> after the hlist_del_rcu(), and that's the failure the WARN catches.
> 
> The reproducer lowers then raises the threshold so those bins are pruned
> and reallocated during reinsert; failslab just makes the failure
> deterministic, OOM would do the same.
> 
> v2 inverts the guard so prep prepares the set that's actually reinserted:
> 
>         -       if (policy->selector.prefixlen_d < dbits ||
>         -           policy->selector.prefixlen_s < sbits)
>         +       if (policy->selector.prefixlen_d >= dbits &&
>         +           policy->selector.prefixlen_s >= sbits)
>                         continue;

Much better!

> I checked the new patch on the reproducer, and the crash can't be triggered.
> If you agree with this new patch, I'll send this as v2.

Please do, thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
  2026-07-03  3:58     ` Florian Westphal
@ 2026-07-03  5:22       ` Xiang Mei
  0 siblings, 0 replies; 5+ messages in thread
From: Xiang Mei @ 2026-07-03  5:22 UTC (permalink / raw)
  To: Florian Westphal
  Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
	pabeni, AutonomousCodeSecurity, tgopinath, kys

On Thu, Jul 2, 2026 at 8:58 PM Florian Westphal <fw@strlen.de> wrote:
>
> Xiang Mei <xmei5@asu.edu> wrote:
> > Agreed, and my patch hides it instead of avoiding it. The problem is
> > the prep loop's guard is inverted:
> >
> >         if (policy->selector.prefixlen_d < dbits ||
> >             policy->selector.prefixlen_s < sbits)
> >                 continue;
> >
> > That skips exactly the policies reinserted via the tree (prefixlen <
> > threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()),
>
> Indeed.
>
> > locates for the exact ones instead, which never allocate and get
> > pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC
> > after the hlist_del_rcu(), and that's the failure the WARN catches.
> >
> > The reproducer lowers then raises the threshold so those bins are pruned
> > and reallocated during reinsert; failslab just makes the failure
> > deterministic, OOM would do the same.
> >
> > v2 inverts the guard so prep prepares the set that's actually reinserted:
> >
> >         -       if (policy->selector.prefixlen_d < dbits ||
> >         -           policy->selector.prefixlen_s < sbits)
> >         +       if (policy->selector.prefixlen_d >= dbits &&
> >         +           policy->selector.prefixlen_s >= sbits)
> >                         continue;
>
> Much better!
>
> > I checked the new patch on the reproducer, and the crash can't be triggered.
> > If you agree with this new patch, I'll send this as v2.
>
> Please do, thanks!

Thanks for checking with me. The v2 has been sent at:
https://lore.kernel.org/netdev/20260703051932.966884-1-xmei5@asu.edu/

Xiang

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-07-03  5:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-02 18:58 [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison Xiang Mei (Microsoft)
2026-07-02 19:19 ` Florian Westphal
2026-07-02 22:11   ` Xiang Mei
2026-07-03  3:58     ` Florian Westphal
2026-07-03  5:22       ` Xiang Mei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox