netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] vxlan: Fix nexthop hash size
@ 2023-07-31 20:02 Benjamin Poirier
  2023-08-01  9:53 ` Simon Horman
  2023-08-02 10:00 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Benjamin Poirier @ 2023-07-31 20:02 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Roopa Prabhu, Ido Schimmel, netdev

The nexthop code expects a 31 bit hash, such as what is returned by
fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
returned by skb_get_hash() can lead to problems related to the fact that
'int hash' is a negative number when the MSB is set.

In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
will disproportionately select the first nexthop group entry. In the case
of resilient nexthop groups, nexthop_select_path_res() may do an out of
bounds access in nh_buckets[], for example:
    hash = -912054133
    num_nh_buckets = 2
    bucket_index = 65535

which leads to the following panic:

BUG: unable to handle page fault for address: ffffc900025910c8
PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0
Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die+0x23/0x70
 ? page_fault_oops+0x1ee/0x5c0
 ? __pfx_is_prefetch.constprop.0+0x10/0x10
 ? __pfx_page_fault_oops+0x10/0x10
 ? search_bpf_extables+0xfe/0x1c0
 ? fixup_exception+0x3b/0x470
 ? exc_page_fault+0xf6/0x110
 ? asm_exc_page_fault+0x26/0x30
 ? nexthop_select_path+0x197/0xbf0
 ? nexthop_select_path+0x197/0xbf0
 ? lock_is_held_type+0xe7/0x140
 vxlan_xmit+0x5b2/0x2340
 ? __lock_acquire+0x92b/0x3370
 ? __pfx_vxlan_xmit+0x10/0x10
 ? __pfx___lock_acquire+0x10/0x10
 ? __pfx_register_lock_class+0x10/0x10
 ? skb_network_protocol+0xce/0x2d0
 ? dev_hard_start_xmit+0xca/0x350
 ? __pfx_vxlan_xmit+0x10/0x10
 dev_hard_start_xmit+0xca/0x350
 __dev_queue_xmit+0x513/0x1e20
 ? __pfx___dev_queue_xmit+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? mark_held_locks+0x44/0x90
 ? skb_push+0x4c/0x80
 ? eth_header+0x81/0xe0
 ? __pfx_eth_header+0x10/0x10
 ? neigh_resolve_output+0x215/0x310
 ? ip6_finish_output2+0x2ba/0xc90
 ip6_finish_output2+0x2ba/0xc90
 ? lock_release+0x236/0x3e0
 ? ip6_mtu+0xbb/0x240
 ? __pfx_ip6_finish_output2+0x10/0x10
 ? find_held_lock+0x83/0xa0
 ? lock_is_held_type+0xe7/0x140
 ip6_finish_output+0x1ee/0x780
 ip6_output+0x138/0x460
 ? __pfx_ip6_output+0x10/0x10
 ? __pfx___lock_acquire+0x10/0x10
 ? __pfx_ip6_finish_output+0x10/0x10
 NF_HOOK.constprop.0+0xc0/0x420
 ? __pfx_NF_HOOK.constprop.0+0x10/0x10
 ? ndisc_send_skb+0x2c0/0x960
 ? __pfx_lock_release+0x10/0x10
 ? __local_bh_enable_ip+0x93/0x110
 ? lock_is_held_type+0xe7/0x140
 ndisc_send_skb+0x4be/0x960
 ? __pfx_ndisc_send_skb+0x10/0x10
 ? mark_held_locks+0x65/0x90
 ? find_held_lock+0x83/0xa0
 ndisc_send_ns+0xb0/0x110
 ? __pfx_ndisc_send_ns+0x10/0x10
 addrconf_dad_work+0x631/0x8e0
 ? lock_acquire+0x180/0x3f0
 ? __pfx_addrconf_dad_work+0x10/0x10
 ? mark_held_locks+0x24/0x90
 process_one_work+0x582/0x9c0
 ? __pfx_process_one_work+0x10/0x10
 ? __pfx_do_raw_spin_lock+0x10/0x10
 ? mark_held_locks+0x24/0x90
 worker_thread+0x93/0x630
 ? __kthread_parkme+0xdc/0x100
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x1a5/0x1e0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x60
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: ffffc900025910c8
---[ end trace 0000000000000000 ]---
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fix this problem by ensuring the MSB of hash is 0 using a right shift - the
same approach used in fib_multipath_hash() and rt6_multipath_hash().

Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 include/net/vxlan.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 1648240c9668..6a9f8a5f387c 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -556,12 +556,12 @@ static inline void vxlan_flag_attr_error(int attrtype,
 }
 
 static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
-					    int hash,
+					    u32 hash,
 					    struct vxlan_rdst *rdst)
 {
 	struct fib_nh_common *nhc;
 
-	nhc = nexthop_path_fdb_result(nh, hash);
+	nhc = nexthop_path_fdb_result(nh, hash >> 1);
 	if (unlikely(!nhc))
 		return false;
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net] vxlan: Fix nexthop hash size
  2023-07-31 20:02 [PATCH net] vxlan: Fix nexthop hash size Benjamin Poirier
@ 2023-08-01  9:53 ` Simon Horman
  2023-08-02 10:00 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Simon Horman @ 2023-08-01  9:53 UTC (permalink / raw)
  To: Benjamin Poirier
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Roopa Prabhu, Ido Schimmel, netdev

On Mon, Jul 31, 2023 at 04:02:08PM -0400, Benjamin Poirier wrote:
> The nexthop code expects a 31 bit hash, such as what is returned by
> fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
> returned by skb_get_hash() can lead to problems related to the fact that
> 'int hash' is a negative number when the MSB is set.
> 
> In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
> will disproportionately select the first nexthop group entry. In the case
> of resilient nexthop groups, nexthop_select_path_res() may do an out of
> bounds access in nh_buckets[], for example:
>     hash = -912054133
>     num_nh_buckets = 2
>     bucket_index = 65535
> 
> which leads to the following panic:
> 
> BUG: unable to handle page fault for address: ffffc900025910c8
> PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0
> Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Workqueue: ipv6_addrconf addrconf_dad_work
> RIP: 0010:nexthop_select_path+0x197/0xbf0
> Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
> RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
> RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
> RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
> R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
> R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
> FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  ? __die+0x23/0x70
>  ? page_fault_oops+0x1ee/0x5c0
>  ? __pfx_is_prefetch.constprop.0+0x10/0x10
>  ? __pfx_page_fault_oops+0x10/0x10
>  ? search_bpf_extables+0xfe/0x1c0
>  ? fixup_exception+0x3b/0x470
>  ? exc_page_fault+0xf6/0x110
>  ? asm_exc_page_fault+0x26/0x30
>  ? nexthop_select_path+0x197/0xbf0
>  ? nexthop_select_path+0x197/0xbf0
>  ? lock_is_held_type+0xe7/0x140
>  vxlan_xmit+0x5b2/0x2340
>  ? __lock_acquire+0x92b/0x3370
>  ? __pfx_vxlan_xmit+0x10/0x10
>  ? __pfx___lock_acquire+0x10/0x10
>  ? __pfx_register_lock_class+0x10/0x10
>  ? skb_network_protocol+0xce/0x2d0
>  ? dev_hard_start_xmit+0xca/0x350
>  ? __pfx_vxlan_xmit+0x10/0x10
>  dev_hard_start_xmit+0xca/0x350
>  __dev_queue_xmit+0x513/0x1e20
>  ? __pfx___dev_queue_xmit+0x10/0x10
>  ? __pfx_lock_release+0x10/0x10
>  ? mark_held_locks+0x44/0x90
>  ? skb_push+0x4c/0x80
>  ? eth_header+0x81/0xe0
>  ? __pfx_eth_header+0x10/0x10
>  ? neigh_resolve_output+0x215/0x310
>  ? ip6_finish_output2+0x2ba/0xc90
>  ip6_finish_output2+0x2ba/0xc90
>  ? lock_release+0x236/0x3e0
>  ? ip6_mtu+0xbb/0x240
>  ? __pfx_ip6_finish_output2+0x10/0x10
>  ? find_held_lock+0x83/0xa0
>  ? lock_is_held_type+0xe7/0x140
>  ip6_finish_output+0x1ee/0x780
>  ip6_output+0x138/0x460
>  ? __pfx_ip6_output+0x10/0x10
>  ? __pfx___lock_acquire+0x10/0x10
>  ? __pfx_ip6_finish_output+0x10/0x10
>  NF_HOOK.constprop.0+0xc0/0x420
>  ? __pfx_NF_HOOK.constprop.0+0x10/0x10
>  ? ndisc_send_skb+0x2c0/0x960
>  ? __pfx_lock_release+0x10/0x10
>  ? __local_bh_enable_ip+0x93/0x110
>  ? lock_is_held_type+0xe7/0x140
>  ndisc_send_skb+0x4be/0x960
>  ? __pfx_ndisc_send_skb+0x10/0x10
>  ? mark_held_locks+0x65/0x90
>  ? find_held_lock+0x83/0xa0
>  ndisc_send_ns+0xb0/0x110
>  ? __pfx_ndisc_send_ns+0x10/0x10
>  addrconf_dad_work+0x631/0x8e0
>  ? lock_acquire+0x180/0x3f0
>  ? __pfx_addrconf_dad_work+0x10/0x10
>  ? mark_held_locks+0x24/0x90
>  process_one_work+0x582/0x9c0
>  ? __pfx_process_one_work+0x10/0x10
>  ? __pfx_do_raw_spin_lock+0x10/0x10
>  ? mark_held_locks+0x24/0x90
>  worker_thread+0x93/0x630
>  ? __kthread_parkme+0xdc/0x100
>  ? __pfx_worker_thread+0x10/0x10
>  kthread+0x1a5/0x1e0
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x34/0x60
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
> RIP: 0000:0x0
> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  </TASK>
> Modules linked in:
> CR2: ffffc900025910c8
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:nexthop_select_path+0x197/0xbf0
> Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
> RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
> RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
> RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
> R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
> R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
> FS:  0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0
> PKRU: 55555554
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> 
> Fix this problem by ensuring the MSB of hash is 0 using a right shift - the
> same approach used in fib_multipath_hash() and rt6_multipath_hash().
> 
> Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net] vxlan: Fix nexthop hash size
  2023-07-31 20:02 [PATCH net] vxlan: Fix nexthop hash size Benjamin Poirier
  2023-08-01  9:53 ` Simon Horman
@ 2023-08-02 10:00 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-08-02 10:00 UTC (permalink / raw)
  To: Benjamin Poirier; +Cc: davem, edumazet, kuba, pabeni, roopa, idosch, netdev

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 31 Jul 2023 16:02:08 -0400 you wrote:
> The nexthop code expects a 31 bit hash, such as what is returned by
> fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
> returned by skb_get_hash() can lead to problems related to the fact that
> 'int hash' is a negative number when the MSB is set.
> 
> In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
> will disproportionately select the first nexthop group entry. In the case
> of resilient nexthop groups, nexthop_select_path_res() may do an out of
> bounds access in nh_buckets[], for example:
>     hash = -912054133
>     num_nh_buckets = 2
>     bucket_index = 65535
> 
> [...]

Here is the summary with links:
  - [net] vxlan: Fix nexthop hash size
    https://git.kernel.org/netdev/net/c/0756384fb1bd

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-08-02 10:00 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-31 20:02 [PATCH net] vxlan: Fix nexthop hash size Benjamin Poirier
2023-08-01  9:53 ` Simon Horman
2023-08-02 10:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).