* [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
@ 2026-07-02 18:58 Xiang Mei (Microsoft)
2026-07-02 19:19 ` Florian Westphal
0 siblings, 1 reply; 5+ messages in thread
From: Xiang Mei (Microsoft) @ 2026-07-02 18:58 UTC (permalink / raw)
To: steffen.klassert, herbert, davem, netdev
Cc: horms, fw, edumazet, kuba, pabeni, AutonomousCodeSecurity,
tgopinath, kys, Xiang Mei (Microsoft)
xfrm_hash_rebuild() unlinks each policy from its bydst chain with
hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes
through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC
allocation; on failure the error path only WARN_ONCE()s and continues, so the
policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild
calls hlist_del_rcu() on that node again, dereferences the poison, and takes a
general protection fault.
Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed
(pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu()
is then a no-op for it, and the non-failing case is unchanged.
The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in
practice this is only reached under memory pressure; the crash below was
reproduced deterministically by forcing that allocation to fail with fault
injection (failslab).
Crash:
Oops: general protection fault, probably for non-canonical address
0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
...
Workqueue: events xfrm_hash_rebuild
RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190
RAX: dead000000000122 (LIST_POISON2 + offset)
...
Call Trace:
hlist_del_rcu (include/linux/rculist.h:599)
xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365)
process_one_work (kernel/workqueue.c:3322)
worker_thread (kernel/workqueue.c:3486)
kthread (kernel/kthread.c:436)
ret_from_fork (arch/x86/kernel/process.c:158)
ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
...
Kernel panic - not syncing: Fatal exception in interrupt
Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype")
Reported-by: AutonomousCodeSecurity@microsoft.com
Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
---
net/xfrm/xfrm_policy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7ef861a0e823..2612a405542b 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
if (xfrm_policy_is_dead_or_sk(policy))
continue;
- hlist_del_rcu(&policy->bydst);
+ hlist_del_init_rcu(&policy->bydst);
newpos = NULL;
dir = xfrm_policy_id2dir(policy->index);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison 2026-07-02 18:58 [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison Xiang Mei (Microsoft) @ 2026-07-02 19:19 ` Florian Westphal 2026-07-02 22:11 ` Xiang Mei 0 siblings, 1 reply; 5+ messages in thread From: Florian Westphal @ 2026-07-02 19:19 UTC (permalink / raw) To: Xiang Mei (Microsoft) Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba, pabeni, AutonomousCodeSecurity, tgopinath, kys Xiang Mei (Microsoft) <xmei5@asu.edu> wrote: > xfrm_hash_rebuild() unlinks each policy from its bydst chain with > hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes > through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC > allocation; on failure the error path only WARN_ONCE()s and continues, so the > policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild > calls hlist_del_rcu() on that node again, dereferences the poison, and takes a > general protection fault. > > Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed > (pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu() > is then a no-op for it, and the non-failing case is unchanged. > > The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in > practice this is only reached under memory pressure; the crash below was > reproduced deterministically by forcing that allocation to fail with fault > injection (failslab). > > Crash: > Oops: general protection fault, probably for non-canonical address > 0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI > KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127] > ... > Workqueue: events xfrm_hash_rebuild > RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190 > RAX: dead000000000122 (LIST_POISON2 + offset) > ... > Call Trace: > hlist_del_rcu (include/linux/rculist.h:599) > xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365) > process_one_work (kernel/workqueue.c:3322) > worker_thread (kernel/workqueue.c:3486) > kthread (kernel/kthread.c:436) > ret_from_fork (arch/x86/kernel/process.c:158) > ret_from_fork_asm (arch/x86/entry/entry_64.S:245) > ... > Kernel panic - not syncing: Fatal exception in interrupt > > Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") > Reported-by: AutonomousCodeSecurity@microsoft.com > Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu> > --- > net/xfrm/xfrm_policy.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c > index 7ef861a0e823..2612a405542b 100644 > --- a/net/xfrm/xfrm_policy.c > +++ b/net/xfrm/xfrm_policy.c > @@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work) > if (xfrm_policy_is_dead_or_sk(policy)) > continue; > > - hlist_del_rcu(&policy->bydst); > + hlist_del_init_rcu(&policy->bydst); This patch is dubious. I looks to me as if it papers over the actual bug. Why is there a memory allocation error? The first loop -- before unlink -- is supposed to preallocate the new bins and chain heads. This is also why there is a WARN. No memory allocations are supposed to occur after the hlist_del_rcu(), there is supposed to be a guarantee that the insertion succeeds. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison 2026-07-02 19:19 ` Florian Westphal @ 2026-07-02 22:11 ` Xiang Mei 2026-07-03 3:58 ` Florian Westphal 0 siblings, 1 reply; 5+ messages in thread From: Xiang Mei @ 2026-07-02 22:11 UTC (permalink / raw) To: Florian Westphal Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba, pabeni, AutonomousCodeSecurity, tgopinath, kys On Thu, Jul 2, 2026 at 12:19 PM Florian Westphal <fw@strlen.de> wrote: > > Xiang Mei (Microsoft) <xmei5@asu.edu> wrote: > > xfrm_hash_rebuild() unlinks each policy from its bydst chain with > > hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes > > through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC > > allocation; on failure the error path only WARN_ONCE()s and continues, so the > > policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild > > calls hlist_del_rcu() on that node again, dereferences the poison, and takes a > > general protection fault. > > > > Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed > > (pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu() > > is then a no-op for it, and the non-failing case is unchanged. > > > > The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in > > practice this is only reached under memory pressure; the crash below was > > reproduced deterministically by forcing that allocation to fail with fault > > injection (failslab). > > > > Crash: > > Oops: general protection fault, probably for non-canonical address > > 0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI > > KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127] > > ... > > Workqueue: events xfrm_hash_rebuild > > RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190 > > RAX: dead000000000122 (LIST_POISON2 + offset) > > ... > > Call Trace: > > hlist_del_rcu (include/linux/rculist.h:599) > > xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365) > > process_one_work (kernel/workqueue.c:3322) > > worker_thread (kernel/workqueue.c:3486) > > kthread (kernel/kthread.c:436) > > ret_from_fork (arch/x86/kernel/process.c:158) > > ret_from_fork_asm (arch/x86/entry/entry_64.S:245) > > ... > > Kernel panic - not syncing: Fatal exception in interrupt > > > > Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") > > Reported-by: AutonomousCodeSecurity@microsoft.com > > Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu> > > --- > > net/xfrm/xfrm_policy.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c > > index 7ef861a0e823..2612a405542b 100644 > > --- a/net/xfrm/xfrm_policy.c > > +++ b/net/xfrm/xfrm_policy.c > > @@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work) > > if (xfrm_policy_is_dead_or_sk(policy)) > > continue; > > > > - hlist_del_rcu(&policy->bydst); > > + hlist_del_init_rcu(&policy->bydst); > > This patch is dubious. I looks to me as if it papers over the > actual bug. > > Why is there a memory allocation error? > > The first loop -- before unlink -- is supposed to preallocate the new > bins and chain heads. > Agreed, and my patch hides it instead of avoiding it. The problem is the prep loop's guard is inverted: if (policy->selector.prefixlen_d < dbits || policy->selector.prefixlen_s < sbits) continue; That skips exactly the policies reinserted via the tree (prefixlen < threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()), and preallocates for the exact ones instead, which never allocate and get pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC after the hlist_del_rcu(), and that's the failure the WARN catches. The reproducer lowers then raises the threshold so those bins are pruned and reallocated during reinsert; failslab just makes the failure deterministic, OOM would do the same. v2 inverts the guard so prep prepares the set that's actually reinserted: - if (policy->selector.prefixlen_d < dbits || - policy->selector.prefixlen_s < sbits) + if (policy->selector.prefixlen_d >= dbits && + policy->selector.prefixlen_s >= sbits) continue; Then inexact_insert() finds bin+node present and allocates nothing, so the reinsert can't fail; a prep failure still hits goto out_unlock before any unlink. hlist_del_rcu vs _init_ then no longer matters. --- net/xfrm/xfrm_policy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 7ef861a0e823..932a313b9460 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1329,8 +1329,8 @@ static void xfrm_hash_rebuild(struct work_struct *work) } } - if (policy->selector.prefixlen_d < dbits || - policy->selector.prefixlen_s < sbits) + if (policy->selector.prefixlen_d >= dbits && + policy->selector.prefixlen_s >= sbits) continue; bin = xfrm_policy_inexact_alloc_bin(policy, dir); --- I checked the new patch on the reproducer, and the crash can't be triggered. If you agree with this new patch, I'll send this as v2. I can also share the reproducer if you need that to do further checks. Xiang Xiang > This is also why there is a WARN. No memory allocations are supposed to > occur after the hlist_del_rcu(), there is supposed to be a guarantee > that the insertion succeeds. > ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison 2026-07-02 22:11 ` Xiang Mei @ 2026-07-03 3:58 ` Florian Westphal 2026-07-03 5:22 ` Xiang Mei 0 siblings, 1 reply; 5+ messages in thread From: Florian Westphal @ 2026-07-03 3:58 UTC (permalink / raw) To: Xiang Mei Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba, pabeni, AutonomousCodeSecurity, tgopinath, kys Xiang Mei <xmei5@asu.edu> wrote: > Agreed, and my patch hides it instead of avoiding it. The problem is > the prep loop's guard is inverted: > > if (policy->selector.prefixlen_d < dbits || > policy->selector.prefixlen_s < sbits) > continue; > > That skips exactly the policies reinserted via the tree (prefixlen < > threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()), Indeed. > locates for the exact ones instead, which never allocate and get > pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC > after the hlist_del_rcu(), and that's the failure the WARN catches. > > The reproducer lowers then raises the threshold so those bins are pruned > and reallocated during reinsert; failslab just makes the failure > deterministic, OOM would do the same. > > v2 inverts the guard so prep prepares the set that's actually reinserted: > > - if (policy->selector.prefixlen_d < dbits || > - policy->selector.prefixlen_s < sbits) > + if (policy->selector.prefixlen_d >= dbits && > + policy->selector.prefixlen_s >= sbits) > continue; Much better! > I checked the new patch on the reproducer, and the crash can't be triggered. > If you agree with this new patch, I'll send this as v2. Please do, thanks! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison 2026-07-03 3:58 ` Florian Westphal @ 2026-07-03 5:22 ` Xiang Mei 0 siblings, 0 replies; 5+ messages in thread From: Xiang Mei @ 2026-07-03 5:22 UTC (permalink / raw) To: Florian Westphal Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba, pabeni, AutonomousCodeSecurity, tgopinath, kys On Thu, Jul 2, 2026 at 8:58 PM Florian Westphal <fw@strlen.de> wrote: > > Xiang Mei <xmei5@asu.edu> wrote: > > Agreed, and my patch hides it instead of avoiding it. The problem is > > the prep loop's guard is inverted: > > > > if (policy->selector.prefixlen_d < dbits || > > policy->selector.prefixlen_s < sbits) > > continue; > > > > That skips exactly the policies reinserted via the tree (prefixlen < > > threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()), > > Indeed. > > > locates for the exact ones instead, which never allocate and get > > pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC > > after the hlist_del_rcu(), and that's the failure the WARN catches. > > > > The reproducer lowers then raises the threshold so those bins are pruned > > and reallocated during reinsert; failslab just makes the failure > > deterministic, OOM would do the same. > > > > v2 inverts the guard so prep prepares the set that's actually reinserted: > > > > - if (policy->selector.prefixlen_d < dbits || > > - policy->selector.prefixlen_s < sbits) > > + if (policy->selector.prefixlen_d >= dbits && > > + policy->selector.prefixlen_s >= sbits) > > continue; > > Much better! > > > I checked the new patch on the reproducer, and the crash can't be triggered. > > If you agree with this new patch, I'll send this as v2. > > Please do, thanks! Thanks for checking with me. The v2 has been sent at: https://lore.kernel.org/netdev/20260703051932.966884-1-xmei5@asu.edu/ Xiang ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-07-03 5:22 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-07-02 18:58 [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison Xiang Mei (Microsoft) 2026-07-02 19:19 ` Florian Westphal 2026-07-02 22:11 ` Xiang Mei 2026-07-03 3:58 ` Florian Westphal 2026-07-03 5:22 ` Xiang Mei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox