* [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
@ 2026-07-02 18:58 Xiang Mei (Microsoft)
2026-07-02 19:19 ` Florian Westphal
0 siblings, 1 reply; 5+ messages in thread
From: Xiang Mei (Microsoft) @ 2026-07-02 18:58 UTC (permalink / raw)
To: steffen.klassert, herbert, davem, netdev
Cc: horms, fw, edumazet, kuba, pabeni, AutonomousCodeSecurity,
tgopinath, kys, Xiang Mei (Microsoft)
xfrm_hash_rebuild() unlinks each policy from its bydst chain with
hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes
through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC
allocation; on failure the error path only WARN_ONCE()s and continues, so the
policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild
calls hlist_del_rcu() on that node again, dereferences the poison, and takes a
general protection fault.
Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed
(pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu()
is then a no-op for it, and the non-failing case is unchanged.
The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in
practice this is only reached under memory pressure; the crash below was
reproduced deterministically by forcing that allocation to fail with fault
injection (failslab).
Crash:
Oops: general protection fault, probably for non-canonical address
0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
...
Workqueue: events xfrm_hash_rebuild
RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190
RAX: dead000000000122 (LIST_POISON2 + offset)
...
Call Trace:
hlist_del_rcu (include/linux/rculist.h:599)
xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365)
process_one_work (kernel/workqueue.c:3322)
worker_thread (kernel/workqueue.c:3486)
kthread (kernel/kthread.c:436)
ret_from_fork (arch/x86/kernel/process.c:158)
ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
...
Kernel panic - not syncing: Fatal exception in interrupt
Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype")
Reported-by: AutonomousCodeSecurity@microsoft.com
Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
---
net/xfrm/xfrm_policy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7ef861a0e823..2612a405542b 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
if (xfrm_policy_is_dead_or_sk(policy))
continue;
- hlist_del_rcu(&policy->bydst);
+ hlist_del_init_rcu(&policy->bydst);
newpos = NULL;
dir = xfrm_policy_id2dir(policy->index);
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
2026-07-02 18:58 [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison Xiang Mei (Microsoft)
@ 2026-07-02 19:19 ` Florian Westphal
2026-07-02 22:11 ` Xiang Mei
0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-07-02 19:19 UTC (permalink / raw)
To: Xiang Mei (Microsoft)
Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
pabeni, AutonomousCodeSecurity, tgopinath, kys
Xiang Mei (Microsoft) <xmei5@asu.edu> wrote:
> xfrm_hash_rebuild() unlinks each policy from its bydst chain with
> hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes
> through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC
> allocation; on failure the error path only WARN_ONCE()s and continues, so the
> policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild
> calls hlist_del_rcu() on that node again, dereferences the poison, and takes a
> general protection fault.
>
> Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed
> (pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu()
> is then a no-op for it, and the non-failing case is unchanged.
>
> The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in
> practice this is only reached under memory pressure; the crash below was
> reproduced deterministically by forcing that allocation to fail with fault
> injection (failslab).
>
> Crash:
> Oops: general protection fault, probably for non-canonical address
> 0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI
> KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
> ...
> Workqueue: events xfrm_hash_rebuild
> RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190
> RAX: dead000000000122 (LIST_POISON2 + offset)
> ...
> Call Trace:
> hlist_del_rcu (include/linux/rculist.h:599)
> xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365)
> process_one_work (kernel/workqueue.c:3322)
> worker_thread (kernel/workqueue.c:3486)
> kthread (kernel/kthread.c:436)
> ret_from_fork (arch/x86/kernel/process.c:158)
> ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
> ...
> Kernel panic - not syncing: Fatal exception in interrupt
>
> Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype")
> Reported-by: AutonomousCodeSecurity@microsoft.com
> Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
> ---
> net/xfrm/xfrm_policy.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7ef861a0e823..2612a405542b 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
> if (xfrm_policy_is_dead_or_sk(policy))
> continue;
>
> - hlist_del_rcu(&policy->bydst);
> + hlist_del_init_rcu(&policy->bydst);
This patch is dubious. I looks to me as if it papers over the
actual bug.
Why is there a memory allocation error?
The first loop -- before unlink -- is supposed to preallocate the new
bins and chain heads.
This is also why there is a WARN. No memory allocations are supposed to
occur after the hlist_del_rcu(), there is supposed to be a guarantee
that the insertion succeeds.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
2026-07-02 19:19 ` Florian Westphal
@ 2026-07-02 22:11 ` Xiang Mei
2026-07-03 3:58 ` Florian Westphal
0 siblings, 1 reply; 5+ messages in thread
From: Xiang Mei @ 2026-07-02 22:11 UTC (permalink / raw)
To: Florian Westphal
Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
pabeni, AutonomousCodeSecurity, tgopinath, kys
On Thu, Jul 2, 2026 at 12:19 PM Florian Westphal <fw@strlen.de> wrote:
>
> Xiang Mei (Microsoft) <xmei5@asu.edu> wrote:
> > xfrm_hash_rebuild() unlinks each policy from its bydst chain with
> > hlist_del_rcu() and re-inserts it. For an inexact policy the re-insert goes
> > through xfrm_policy_inexact_insert(), which can fail on a GFP_ATOMIC
> > allocation; on failure the error path only WARN_ONCE()s and continues, so the
> > policy is left with a poisoned bydst node (LIST_POISON2). The next rebuild
> > calls hlist_del_rcu() on that node again, dereferences the poison, and takes a
> > general protection fault.
> >
> > Use hlist_del_init_rcu() instead, so a failed-reinsert node is left unhashed
> > (pprev == NULL) rather than poisoned. The next rebuild's hlist_del_init_rcu()
> > is then a no-op for it, and the non-failing case is unchanged.
> >
> > The reinsert allocation is GFP_ATOMIC (it runs under xfrm_policy_lock), so in
> > practice this is only reached under memory pressure; the crash below was
> > reproduced deterministically by forcing that allocation to fail with fault
> > injection (failslab).
> >
> > Crash:
> > Oops: general protection fault, probably for non-canonical address
> > 0xfbd59c0000000024: 0000 [#1] SMP KASAN NOPTI
> > KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127]
> > ...
> > Workqueue: events xfrm_hash_rebuild
> > RIP: 0010:xfrm_hash_rebuild+0x5b3/0x1190
> > RAX: dead000000000122 (LIST_POISON2 + offset)
> > ...
> > Call Trace:
> > hlist_del_rcu (include/linux/rculist.h:599)
> > xfrm_hash_rebuild (net/xfrm/xfrm_policy.c:1365)
> > process_one_work (kernel/workqueue.c:3322)
> > worker_thread (kernel/workqueue.c:3486)
> > kthread (kernel/kthread.c:436)
> > ret_from_fork (arch/x86/kernel/process.c:158)
> > ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
> > ...
> > Kernel panic - not syncing: Fatal exception in interrupt
> >
> > Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype")
> > Reported-by: AutonomousCodeSecurity@microsoft.com
> > Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
> > ---
> > net/xfrm/xfrm_policy.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> > index 7ef861a0e823..2612a405542b 100644
> > --- a/net/xfrm/xfrm_policy.c
> > +++ b/net/xfrm/xfrm_policy.c
> > @@ -1362,7 +1362,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
> > if (xfrm_policy_is_dead_or_sk(policy))
> > continue;
> >
> > - hlist_del_rcu(&policy->bydst);
> > + hlist_del_init_rcu(&policy->bydst);
>
> This patch is dubious. I looks to me as if it papers over the
> actual bug.
>
> Why is there a memory allocation error?
>
> The first loop -- before unlink -- is supposed to preallocate the new
> bins and chain heads.
>
Agreed, and my patch hides it instead of avoiding it. The problem is
the prep loop's guard is inverted:
if (policy->selector.prefixlen_d < dbits ||
policy->selector.prefixlen_s < sbits)
continue;
That skips exactly the policies reinserted via the tree (prefixlen <
threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()), and
preallocates for the exact ones instead, which never allocate and get
pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC
after the hlist_del_rcu(), and that's the failure the WARN catches.
The reproducer lowers then raises the threshold so those bins are pruned
and reallocated during reinsert; failslab just makes the failure
deterministic, OOM would do the same.
v2 inverts the guard so prep prepares the set that's actually reinserted:
- if (policy->selector.prefixlen_d < dbits ||
- policy->selector.prefixlen_s < sbits)
+ if (policy->selector.prefixlen_d >= dbits &&
+ policy->selector.prefixlen_s >= sbits)
continue;
Then inexact_insert() finds bin+node present and allocates nothing, so the
reinsert can't fail; a prep failure still hits goto out_unlock before any
unlink. hlist_del_rcu vs _init_ then no longer matters.
---
net/xfrm/xfrm_policy.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7ef861a0e823..932a313b9460 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1329,8 +1329,8 @@ static void xfrm_hash_rebuild(struct work_struct *work)
}
}
- if (policy->selector.prefixlen_d < dbits ||
- policy->selector.prefixlen_s < sbits)
+ if (policy->selector.prefixlen_d >= dbits &&
+ policy->selector.prefixlen_s >= sbits)
continue;
bin = xfrm_policy_inexact_alloc_bin(policy, dir);
---
I checked the new patch on the reproducer, and the crash can't be triggered.
If you agree with this new patch, I'll send this as v2. I can also share
the reproducer if you need that to do further checks.
Xiang
Xiang
> This is also why there is a WARN. No memory allocations are supposed to
> occur after the hlist_del_rcu(), there is supposed to be a guarantee
> that the insertion succeeds.
>
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
2026-07-02 22:11 ` Xiang Mei
@ 2026-07-03 3:58 ` Florian Westphal
2026-07-03 5:22 ` Xiang Mei
0 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-07-03 3:58 UTC (permalink / raw)
To: Xiang Mei
Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
pabeni, AutonomousCodeSecurity, tgopinath, kys
Xiang Mei <xmei5@asu.edu> wrote:
> Agreed, and my patch hides it instead of avoiding it. The problem is
> the prep loop's guard is inverted:
>
> if (policy->selector.prefixlen_d < dbits ||
> policy->selector.prefixlen_s < sbits)
> continue;
>
> That skips exactly the policies reinserted via the tree (prefixlen <
> threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()),
Indeed.
> locates for the exact ones instead, which never allocate and get
> pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC
> after the hlist_del_rcu(), and that's the failure the WARN catches.
>
> The reproducer lowers then raises the threshold so those bins are pruned
> and reallocated during reinsert; failslab just makes the failure
> deterministic, OOM would do the same.
>
> v2 inverts the guard so prep prepares the set that's actually reinserted:
>
> - if (policy->selector.prefixlen_d < dbits ||
> - policy->selector.prefixlen_s < sbits)
> + if (policy->selector.prefixlen_d >= dbits &&
> + policy->selector.prefixlen_s >= sbits)
> continue;
Much better!
> I checked the new patch on the reproducer, and the crash can't be triggered.
> If you agree with this new patch, I'll send this as v2.
Please do, thanks!
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison
2026-07-03 3:58 ` Florian Westphal
@ 2026-07-03 5:22 ` Xiang Mei
0 siblings, 0 replies; 5+ messages in thread
From: Xiang Mei @ 2026-07-03 5:22 UTC (permalink / raw)
To: Florian Westphal
Cc: steffen.klassert, herbert, davem, netdev, horms, edumazet, kuba,
pabeni, AutonomousCodeSecurity, tgopinath, kys
On Thu, Jul 2, 2026 at 8:58 PM Florian Westphal <fw@strlen.de> wrote:
>
> Xiang Mei <xmei5@asu.edu> wrote:
> > Agreed, and my patch hides it instead of avoiding it. The problem is
> > the prep loop's guard is inverted:
> >
> > if (policy->selector.prefixlen_d < dbits ||
> > policy->selector.prefixlen_s < sbits)
> > continue;
> >
> > That skips exactly the policies reinserted via the tree (prefixlen <
> > threshold => policy_hash_bysel() NULL => xfrm_policy_inexact_insert()),
>
> Indeed.
>
> > locates for the exact ones instead, which never allocate and get
> > pruned again at out_unlock. So the inexact bin/node is allocated GFP_ATOMIC
> > after the hlist_del_rcu(), and that's the failure the WARN catches.
> >
> > The reproducer lowers then raises the threshold so those bins are pruned
> > and reallocated during reinsert; failslab just makes the failure
> > deterministic, OOM would do the same.
> >
> > v2 inverts the guard so prep prepares the set that's actually reinserted:
> >
> > - if (policy->selector.prefixlen_d < dbits ||
> > - policy->selector.prefixlen_s < sbits)
> > + if (policy->selector.prefixlen_d >= dbits &&
> > + policy->selector.prefixlen_s >= sbits)
> > continue;
>
> Much better!
>
> > I checked the new patch on the reproducer, and the crash can't be triggered.
> > If you agree with this new patch, I'll send this as v2.
>
> Please do, thanks!
Thanks for checking with me. The v2 has been sent at:
https://lore.kernel.org/netdev/20260703051932.966884-1-xmei5@asu.edu/
Xiang
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-07-03 5:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-02 18:58 [PATCH ipsec] xfrm: policy: use hlist_del_init_rcu in xfrm_hash_rebuild to avoid bydst poison Xiang Mei (Microsoft)
2026-07-02 19:19 ` Florian Westphal
2026-07-02 22:11 ` Xiang Mei
2026-07-03 3:58 ` Florian Westphal
2026-07-03 5:22 ` Xiang Mei
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox