* [PATCH net 0/4] bonding: fix xfrm offload bugs
@ 2024-08-16 11:48 Nikolay Aleksandrov
2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
` (5 more replies)
0 siblings, 6 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
To: netdev
Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
Nikolay Aleksandrov
Hi,
I noticed these problems while reviewing a bond xfrm patch recently.
The fixes are straight-forward, please review carefully the last one
because it has side-effects. This set has passed bond's selftests
and my custom bond stress tests which crash without these fixes.
Note the first patch is not critical, but it simplifies the next fix.
Thanks,
Nik
Nikolay Aleksandrov (4):
bonding: fix bond_ipsec_offload_ok return type
bonding: fix null pointer deref in bond_ipsec_offload_ok
bonding: fix xfrm real_dev null pointer dereference
bonding: fix xfrm state handling when clearing active slave
drivers/net/bonding/bond_main.c | 21 ++++++++-------------
drivers/net/bonding/bond_options.c | 2 +-
2 files changed, 9 insertions(+), 14 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
2024-08-19 2:43 ` Hangbin Liu
2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
` (4 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
To: netdev
Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
Nikolay Aleksandrov
Fix the return type which should be bool.
Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
drivers/net/bonding/bond_main.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1cd92c12e782..85b5868deeea 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -599,34 +599,28 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
struct net_device *real_dev;
struct slave *curr_active;
struct bonding *bond;
- int err;
+ bool ok = false;
bond = netdev_priv(bond_dev);
rcu_read_lock();
curr_active = rcu_dereference(bond->curr_active_slave);
real_dev = curr_active->dev;
- if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
- err = false;
+ if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
goto out;
- }
- if (!xs->xso.real_dev) {
- err = false;
+ if (!xs->xso.real_dev)
goto out;
- }
if (!real_dev->xfrmdev_ops ||
!real_dev->xfrmdev_ops->xdo_dev_offload_ok ||
- netif_is_bond_master(real_dev)) {
- err = false;
+ netif_is_bond_master(real_dev))
goto out;
- }
- err = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
+ ok = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
out:
rcu_read_unlock();
- return err;
+ return ok;
}
static const struct xfrmdev_ops bond_xfrmdev_ops = {
--
2.44.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
2024-08-19 2:45 ` Hangbin Liu
` (2 more replies)
2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
` (3 subsequent siblings)
5 siblings, 3 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
To: netdev
Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
Nikolay Aleksandrov
We must check if there is an active slave before dereferencing the pointer.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
drivers/net/bonding/bond_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 85b5868deeea..65ddb71eebcd 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
bond = netdev_priv(bond_dev);
rcu_read_lock();
curr_active = rcu_dereference(bond->curr_active_slave);
+ if (!curr_active)
+ goto out;
real_dev = curr_active->dev;
if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
--
2.44.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
2024-08-19 2:54 ` Hangbin Liu
2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
` (2 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
To: netdev
Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
Nikolay Aleksandrov
We shouldn't set real_dev to NULL because packets can be in transit and
xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
real_dev is set.
Example trace:
kernel: BUG: unable to handle page fault for address: 0000000000001030
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0002 [#1] PREEMPT SMP
kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel:
kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: Call Trace:
kernel: <TASK>
kernel: ? __die+0x1f/0x60
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? page_fault_oops+0x142/0x4c0
kernel: ? do_user_addr_fault+0x65/0x670
kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: ? exc_page_fault+0x7b/0x180
kernel: ? asm_exc_page_fault+0x22/0x30
kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding]
kernel: xfrm_output+0x61/0x3b0
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ip_push_pending_frames+0x56/0x80
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
drivers/net/bonding/bond_main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 65ddb71eebcd..f74bacf071fc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
} else {
slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
}
- ipsec->xs->xso.real_dev = NULL;
}
spin_unlock_bh(&bond->ipsec_lock);
rcu_read_unlock();
--
2.44.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
` (2 preceding siblings ...)
2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
2024-08-19 3:05 ` Hangbin Liu
2024-08-16 11:52 ` [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
2024-08-20 13:40 ` patchwork-bot+netdevbpf
5 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
To: netdev
Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
Nikolay Aleksandrov
If the active slave is cleared manually the xfrm state is not flushed.
This leads to xfrm add/del imbalance and adding the same state multiple
times. For example when the device cannot handle anymore states we get:
[ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
because it's filled with the same state after multiple active slave
clearings. This change also has a few nice side effects: user-space
gets a notification for the change, the old device gets its mac address
and promisc/mcast adjusted properly.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
Please review this one more carefully. I plan to add a selftest with
netdevsim for this as well.
drivers/net/bonding/bond_options.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index bc80fb6397dc..95d59a18c022 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
/* check to see if we are clearing active */
if (!slave_dev) {
netdev_dbg(bond->dev, "Clearing current active slave\n");
- RCU_INIT_POINTER(bond->curr_active_slave, NULL);
+ bond_change_active_slave(bond, NULL);
bond_select_active_slave(bond);
} else {
struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
--
2.44.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH net 0/4] bonding: fix xfrm offload bugs
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
` (3 preceding siblings ...)
2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
@ 2024-08-16 11:52 ` Nikolay Aleksandrov
2024-08-20 13:40 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:52 UTC (permalink / raw)
To: netdev
Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
Hangbin Liu
On 16/08/2024 14:48, Nikolay Aleksandrov wrote:
> Hi,
> I noticed these problems while reviewing a bond xfrm patch recently.
> The fixes are straight-forward, please review carefully the last one
> because it has side-effects. This set has passed bond's selftests
> and my custom bond stress tests which crash without these fixes.
>
> Note the first patch is not critical, but it simplifies the next fix.
>
> Thanks,
> Nik
>
>
> Nikolay Aleksandrov (4):
> bonding: fix bond_ipsec_offload_ok return type
> bonding: fix null pointer deref in bond_ipsec_offload_ok
> bonding: fix xfrm real_dev null pointer dereference
> bonding: fix xfrm state handling when clearing active slave
>
> drivers/net/bonding/bond_main.c | 21 ++++++++-------------
> drivers/net/bonding/bond_options.c | 2 +-
> 2 files changed, 9 insertions(+), 14 deletions(-)
>
Oops forgot to CC Hangbin, sorry about that (CCed).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type
2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
@ 2024-08-19 2:43 ` Hangbin Liu
0 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19 2:43 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Fri, Aug 16, 2024 at 02:48:10PM +0300, Nikolay Aleksandrov wrote:
> Fix the return type which should be bool.
>
> Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
> drivers/net/bonding/bond_main.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 1cd92c12e782..85b5868deeea 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -599,34 +599,28 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
> struct net_device *real_dev;
> struct slave *curr_active;
> struct bonding *bond;
> - int err;
> + bool ok = false;
>
> bond = netdev_priv(bond_dev);
> rcu_read_lock();
> curr_active = rcu_dereference(bond->curr_active_slave);
> real_dev = curr_active->dev;
>
> - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
> - err = false;
> + if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> goto out;
> - }
>
> - if (!xs->xso.real_dev) {
> - err = false;
> + if (!xs->xso.real_dev)
> goto out;
> - }
>
> if (!real_dev->xfrmdev_ops ||
> !real_dev->xfrmdev_ops->xdo_dev_offload_ok ||
> - netif_is_bond_master(real_dev)) {
> - err = false;
> + netif_is_bond_master(real_dev))
> goto out;
> - }
>
> - err = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
> + ok = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
> out:
> rcu_read_unlock();
> - return err;
> + return ok;
> }
>
> static const struct xfrmdev_ops bond_xfrmdev_ops = {
> --
> 2.44.0
>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
@ 2024-08-19 2:45 ` Hangbin Liu
2024-08-19 2:53 ` Hangbin Liu
2024-08-19 7:26 ` Eric Dumazet
2 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19 2:45 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
> We must check if there is an active slave before dereferencing the pointer.
>
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
> drivers/net/bonding/bond_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 85b5868deeea..65ddb71eebcd 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
> bond = netdev_priv(bond_dev);
> rcu_read_lock();
> curr_active = rcu_dereference(bond->curr_active_slave);
> + if (!curr_active)
> + goto out;
> real_dev = curr_active->dev;
>
> if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> --
> 2.44.0
>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
2024-08-19 2:45 ` Hangbin Liu
@ 2024-08-19 2:53 ` Hangbin Liu
2024-08-19 7:25 ` Nikolay Aleksandrov
2024-08-19 7:26 ` Eric Dumazet
2 siblings, 1 reply; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19 2:53 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
> We must check if there is an active slave before dereferencing the pointer.
>
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
> drivers/net/bonding/bond_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 85b5868deeea..65ddb71eebcd 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
> bond = netdev_priv(bond_dev);
> rcu_read_lock();
> curr_active = rcu_dereference(bond->curr_active_slave);
> + if (!curr_active)
> + goto out;
> real_dev = curr_active->dev;
>
> if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> --
> 2.44.0
>
BTW, the bond_ipsec_offload_ok() only checks !xs->xso.real_dev, should we also
add WARN_ON(xs->xso.real_dev != slave->dev) checking?
Thanks
Hangbin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
@ 2024-08-19 2:54 ` Hangbin Liu
2024-08-19 7:34 ` Nikolay Aleksandrov
0 siblings, 1 reply; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19 2:54 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
> We shouldn't set real_dev to NULL because packets can be in transit and
> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
> real_dev is set.
>
> Example trace:
> kernel: BUG: unable to handle page fault for address: 0000000000001030
> kernel: bond0: (slave eni0np1): making interface the new active one
> kernel: #PF: supervisor write access in kernel mode
> kernel: #PF: error_code(0x0002) - not-present page
> kernel: PGD 0 P4D 0
> kernel: Oops: 0002 [#1] PREEMPT SMP
> kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
I saw the errors are during bond_ipsec_add_sa_all, which also
set ipsec->xs->xso.real_dev = NULL. Should we fix it there?
Thanks
Hangbin
> kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
> kernel: bond0: (slave eni0np1): making interface the new active one
> kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> kernel:
> kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
> kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
> kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
> kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
> kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
> kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
> kernel: bond0: (slave eni0np1): making interface the new active one
> kernel: Call Trace:
> kernel: <TASK>
> kernel: ? __die+0x1f/0x60
> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> kernel: ? page_fault_oops+0x142/0x4c0
> kernel: ? do_user_addr_fault+0x65/0x670
> kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50
> kernel: bond0: (slave eni0np1): making interface the new active one
> kernel: ? exc_page_fault+0x7b/0x180
> kernel: ? asm_exc_page_fault+0x22/0x30
> kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
> kernel: bond0: (slave eni0np1): making interface the new active one
> kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding]
> kernel: xfrm_output+0x61/0x3b0
> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> kernel: ip_push_pending_frames+0x56/0x80
>
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
> drivers/net/bonding/bond_main.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 65ddb71eebcd..f74bacf071fc 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
> } else {
> slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
> }
> - ipsec->xs->xso.real_dev = NULL;
> }
> spin_unlock_bh(&bond->ipsec_lock);
> rcu_read_unlock();
> --
> 2.44.0
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
@ 2024-08-19 3:05 ` Hangbin Liu
2024-08-19 7:38 ` Nikolay Aleksandrov
0 siblings, 1 reply; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19 3:05 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Fri, Aug 16, 2024 at 02:48:13PM +0300, Nikolay Aleksandrov wrote:
> If the active slave is cleared manually the xfrm state is not flushed.
> This leads to xfrm add/del imbalance and adding the same state multiple
> times. For example when the device cannot handle anymore states we get:
> [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> because it's filled with the same state after multiple active slave
> clearings. This change also has a few nice side effects: user-space
> gets a notification for the change, the old device gets its mac address
> and promisc/mcast adjusted properly.
>
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
> Please review this one more carefully. I plan to add a selftest with
> netdevsim for this as well.
>
> drivers/net/bonding/bond_options.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index bc80fb6397dc..95d59a18c022 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
> /* check to see if we are clearing active */
> if (!slave_dev) {
> netdev_dbg(bond->dev, "Clearing current active slave\n");
> - RCU_INIT_POINTER(bond->curr_active_slave, NULL);
> + bond_change_active_slave(bond, NULL);
The good part of this is we can do bond_ipsec_del_sa_all and
bond_ipsec_add_sa_all. I'm not sure if we should do promisc/mcast adjustment
when set active_slave to null.
Jay should know better.
Thanks
Hangbin
> bond_select_active_slave(bond);
> } else {
> struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
> --
> 2.44.0
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
2024-08-19 2:53 ` Hangbin Liu
@ 2024-08-19 7:25 ` Nikolay Aleksandrov
2024-08-19 8:12 ` Hangbin Liu
0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19 7:25 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On 19/08/2024 05:53, Hangbin Liu wrote:
> On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
>> We must check if there is an active slave before dereferencing the pointer.
>>
>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>> drivers/net/bonding/bond_main.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 85b5868deeea..65ddb71eebcd 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
>> bond = netdev_priv(bond_dev);
>> rcu_read_lock();
>> curr_active = rcu_dereference(bond->curr_active_slave);
>> + if (!curr_active)
>> + goto out;
>> real_dev = curr_active->dev;
>>
>> if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
>> --
>> 2.44.0
>>
>
> BTW, the bond_ipsec_offload_ok() only checks !xs->xso.real_dev, should we also
> add WARN_ON(xs->xso.real_dev != slave->dev) checking?
>
> Thanks
> Hangbin
We could, but not a warn_on() because I bet it can be easily triggered
by changing the active slave in parallel. real_dev is read without a
lock here so we cannot guarantee a sane state if policies are changed
under us. I think the callback should handle it by checking that the
new device doesn't have the policy setup yet, because the case happens
when an active slave changes which means policies are about to be
installed on the new one.
Cheers,
Nik
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
2024-08-19 2:45 ` Hangbin Liu
2024-08-19 2:53 ` Hangbin Liu
@ 2024-08-19 7:26 ` Eric Dumazet
2 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-08-19 7:26 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, kuba, pabeni, jarod
On Fri, Aug 16, 2024 at 1:51 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> We must check if there is an active slave before dereferencing the pointer.
>
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
2024-08-19 2:54 ` Hangbin Liu
@ 2024-08-19 7:34 ` Nikolay Aleksandrov
2024-08-19 8:25 ` Nikolay Aleksandrov
2024-08-19 8:34 ` Hangbin Liu
0 siblings, 2 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19 7:34 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On 19/08/2024 05:54, Hangbin Liu wrote:
> On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
>> We shouldn't set real_dev to NULL because packets can be in transit and
>> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
>> real_dev is set.
>>
>> Example trace:
>> kernel: BUG: unable to handle page fault for address: 0000000000001030
>> kernel: bond0: (slave eni0np1): making interface the new active one
>> kernel: #PF: supervisor write access in kernel mode
>> kernel: #PF: error_code(0x0002) - not-present page
>> kernel: PGD 0 P4D 0
>> kernel: Oops: 0002 [#1] PREEMPT SMP
>> kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>> kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>
> I saw the errors are during bond_ipsec_add_sa_all, which also
> set ipsec->xs->xso.real_dev = NULL. Should we fix it there?
>
> Thanks
> Hangbin
Correct, I saw it too but I didn't remove it on purpose. I know it can lead to a
similar error, but the fix is more complicated. I don't believe it's correct to
set real_dev if the SA add failed, so we need to think about a different way
to sync it. To be fair in real life it would be more difficult to hit it because
the device must be in a state where the SA add fails, although it supports
xfrm offload. The problem is that real_dev must be set before attempting the SA
add in the first place.
>> kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
>> kernel: bond0: (slave eni0np1): making interface the new active one
>> kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>> kernel:
>> kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
>> kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
>> kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
>> kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
>> kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
>> kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
>> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
>> kernel: bond0: (slave eni0np1): making interface the new active one
>> kernel: Call Trace:
>> kernel: <TASK>
>> kernel: ? __die+0x1f/0x60
>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>> kernel: ? page_fault_oops+0x142/0x4c0
>> kernel: ? do_user_addr_fault+0x65/0x670
>> kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50
>> kernel: bond0: (slave eni0np1): making interface the new active one
>> kernel: ? exc_page_fault+0x7b/0x180
>> kernel: ? asm_exc_page_fault+0x22/0x30
>> kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>> kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>> kernel: bond0: (slave eni0np1): making interface the new active one
>> kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding]
>> kernel: xfrm_output+0x61/0x3b0
>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>> kernel: ip_push_pending_frames+0x56/0x80
>>
>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>> drivers/net/bonding/bond_main.c | 1 -
>> 1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 65ddb71eebcd..f74bacf071fc 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>> } else {
>> slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
>> }
>> - ipsec->xs->xso.real_dev = NULL;
>> }
>> spin_unlock_bh(&bond->ipsec_lock);
>> rcu_read_unlock();
>> --
>> 2.44.0
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
2024-08-19 3:05 ` Hangbin Liu
@ 2024-08-19 7:38 ` Nikolay Aleksandrov
2024-08-20 3:48 ` Hangbin Liu
0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19 7:38 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On 19/08/2024 06:05, Hangbin Liu wrote:
> On Fri, Aug 16, 2024 at 02:48:13PM +0300, Nikolay Aleksandrov wrote:
>> If the active slave is cleared manually the xfrm state is not flushed.
>> This leads to xfrm add/del imbalance and adding the same state multiple
>> times. For example when the device cannot handle anymore states we get:
>> [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>> because it's filled with the same state after multiple active slave
>> clearings. This change also has a few nice side effects: user-space
>> gets a notification for the change, the old device gets its mac address
>> and promisc/mcast adjusted properly.
>>
>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>> Please review this one more carefully. I plan to add a selftest with
>> netdevsim for this as well.
>>
>> drivers/net/bonding/bond_options.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>> index bc80fb6397dc..95d59a18c022 100644
>> --- a/drivers/net/bonding/bond_options.c
>> +++ b/drivers/net/bonding/bond_options.c
>> @@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
>> /* check to see if we are clearing active */
>> if (!slave_dev) {
>> netdev_dbg(bond->dev, "Clearing current active slave\n");
>> - RCU_INIT_POINTER(bond->curr_active_slave, NULL);
>> + bond_change_active_slave(bond, NULL);
>
> The good part of this is we can do bond_ipsec_del_sa_all and
> bond_ipsec_add_sa_all. I'm not sure if we should do promisc/mcast adjustment
> when set active_slave to null.
>
> Jay should know better.
>
> Thanks
> Hangbin
Jay please correct me, but I'm pretty sure we should adjust them. They get adjusted on
every active slave change, this is no different. In fact I'd argue that it's a long
standing bug because they don't get adjusted when the active slave is cleared
manually and if a new one is chosen (we call bond_select_active_slave() right after)
then the old one would still have them set. During normal operations and automatic
curr active slave changes, it is always adjusted.
>> bond_select_active_slave(bond);
>> } else {
>> struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
>> --
>> 2.44.0
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
2024-08-19 7:25 ` Nikolay Aleksandrov
@ 2024-08-19 8:12 ` Hangbin Liu
0 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19 8:12 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Mon, Aug 19, 2024 at 10:25:37AM +0300, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:53, Hangbin Liu wrote:
> > On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
> >> We must check if there is an active slave before dereferencing the pointer.
> >>
> >> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> >> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> >> ---
> >> drivers/net/bonding/bond_main.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >> index 85b5868deeea..65ddb71eebcd 100644
> >> --- a/drivers/net/bonding/bond_main.c
> >> +++ b/drivers/net/bonding/bond_main.c
> >> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
> >> bond = netdev_priv(bond_dev);
> >> rcu_read_lock();
> >> curr_active = rcu_dereference(bond->curr_active_slave);
> >> + if (!curr_active)
> >> + goto out;
> >> real_dev = curr_active->dev;
> >>
> >> if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> >> --
> >> 2.44.0
> >>
> >
> > BTW, the bond_ipsec_offload_ok() only checks !xs->xso.real_dev, should we also
> > add WARN_ON(xs->xso.real_dev != slave->dev) checking?
> >
> > Thanks
> > Hangbin
>
> We could, but not a warn_on() because I bet it can be easily triggered
> by changing the active slave in parallel. real_dev is read without a
OK, maybe a pr_warn or salve_warn()?
> lock here so we cannot guarantee a sane state if policies are changed
> under us. I think the callback should handle it by checking that the
> new device doesn't have the policy setup yet, because the case happens
> when an active slave changes which means policies are about to be
> installed on the new one.
Hmm, how to check if a device has policy setup except ipsec->xs->xso.real_dev?
Hangbin
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
2024-08-19 7:34 ` Nikolay Aleksandrov
@ 2024-08-19 8:25 ` Nikolay Aleksandrov
2024-08-19 8:34 ` Hangbin Liu
1 sibling, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19 8:25 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On 19/08/2024 10:34, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:54, Hangbin Liu wrote:
>> On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
>>> We shouldn't set real_dev to NULL because packets can be in transit and
>>> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
>>> real_dev is set.
>>>
>>> Example trace:
>>> kernel: BUG: unable to handle page fault for address: 0000000000001030
>>> kernel: bond0: (slave eni0np1): making interface the new active one
>>> kernel: #PF: supervisor write access in kernel mode
>>> kernel: #PF: error_code(0x0002) - not-present page
>>> kernel: PGD 0 P4D 0
>>> kernel: Oops: 0002 [#1] PREEMPT SMP
>>> kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>>> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>> kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>
>> I saw the errors are during bond_ipsec_add_sa_all, which also
>> set ipsec->xs->xso.real_dev = NULL. Should we fix it there?
>>
>> Thanks
>> Hangbin
>
> Correct, I saw it too but I didn't remove it on purpose. I know it can lead to a
> similar error, but the fix is more complicated. I don't believe it's correct to
> set real_dev if the SA add failed, so we need to think about a different way
> to sync it. To be fair in real life it would be more difficult to hit it because
> the device must be in a state where the SA add fails, although it supports
> xfrm offload. The problem is that real_dev must be set before attempting the SA
> add in the first place.
>
Just fyi I do have an idea about an additional bit that is set on successful ops
in combination with a call_rcu to wait for a grace period on error, I'll test it
this week and send a patch if it's good.
>>> kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
>>> kernel: bond0: (slave eni0np1): making interface the new active one
>>> kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
>>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>> kernel:
>>> kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
>>> kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
>>> kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
>>> kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
>>> kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
>>> kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
>>> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
>>> kernel: bond0: (slave eni0np1): making interface the new active one
>>> kernel: Call Trace:
>>> kernel: <TASK>
>>> kernel: ? __die+0x1f/0x60
>>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>> kernel: ? page_fault_oops+0x142/0x4c0
>>> kernel: ? do_user_addr_fault+0x65/0x670
>>> kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50
>>> kernel: bond0: (slave eni0np1): making interface the new active one
>>> kernel: ? exc_page_fault+0x7b/0x180
>>> kernel: ? asm_exc_page_fault+0x22/0x30
>>> kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
>>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>> kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>> kernel: bond0: (slave eni0np1): making interface the new active one
>>> kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding]
>>> kernel: xfrm_output+0x61/0x3b0
>>> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>> kernel: ip_push_pending_frames+0x56/0x80
>>>
>>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>>> ---
>>> drivers/net/bonding/bond_main.c | 1 -
>>> 1 file changed, 1 deletion(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 65ddb71eebcd..f74bacf071fc 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>>> } else {
>>> slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
>>> }
>>> - ipsec->xs->xso.real_dev = NULL;
>>> }
>>> spin_unlock_bh(&bond->ipsec_lock);
>>> rcu_read_unlock();
>>> --
>>> 2.44.0
>>>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
2024-08-19 7:34 ` Nikolay Aleksandrov
2024-08-19 8:25 ` Nikolay Aleksandrov
@ 2024-08-19 8:34 ` Hangbin Liu
1 sibling, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19 8:34 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Mon, Aug 19, 2024 at 10:34:16AM +0300, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:54, Hangbin Liu wrote:
> > On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
> >> We shouldn't set real_dev to NULL because packets can be in transit and
> >> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
> >> real_dev is set.
> >>
> >> Example trace:
> >> kernel: BUG: unable to handle page fault for address: 0000000000001030
> >> kernel: bond0: (slave eni0np1): making interface the new active one
> >> kernel: #PF: supervisor write access in kernel mode
> >> kernel: #PF: error_code(0x0002) - not-present page
> >> kernel: PGD 0 P4D 0
> >> kernel: Oops: 0002 [#1] PREEMPT SMP
> >> kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
> >> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> >> kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
> >> kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> >
> > I saw the errors are during bond_ipsec_add_sa_all, which also
> > set ipsec->xs->xso.real_dev = NULL. Should we fix it there?
> >
> > Thanks
> > Hangbin
>
> Correct, I saw it too but I didn't remove it on purpose. I know it can lead to a
> similar error, but the fix is more complicated. I don't believe it's correct to
> set real_dev if the SA add failed, so we need to think about a different way
> to sync it. To be fair in real life it would be more difficult to hit it because
> the device must be in a state where the SA add fails, although it supports
> xfrm offload. The problem is that real_dev must be set before attempting the SA
> add in the first place.
Got it, so this time we only fix the delete path.
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
2024-08-19 7:38 ` Nikolay Aleksandrov
@ 2024-08-20 3:48 ` Hangbin Liu
0 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-20 3:48 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
jarod
On Mon, Aug 19, 2024 at 10:38:01AM +0300, Nikolay Aleksandrov wrote:
> >> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> >> index bc80fb6397dc..95d59a18c022 100644
> >> --- a/drivers/net/bonding/bond_options.c
> >> +++ b/drivers/net/bonding/bond_options.c
> >> @@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
> >> /* check to see if we are clearing active */
> >> if (!slave_dev) {
> >> netdev_dbg(bond->dev, "Clearing current active slave\n");
> >> - RCU_INIT_POINTER(bond->curr_active_slave, NULL);
> >> + bond_change_active_slave(bond, NULL);
> >
> > The good part of this is we can do bond_ipsec_del_sa_all and
> > bond_ipsec_add_sa_all. I'm not sure if we should do promisc/mcast adjustment
> > when set active_slave to null.
> >
> > Jay should know better.
> >
> > Thanks
> > Hangbin
>
> Jay please correct me, but I'm pretty sure we should adjust them. They get adjusted on
> every active slave change, this is no different. In fact I'd argue that it's a long
> standing bug because they don't get adjusted when the active slave is cleared
> manually and if a new one is chosen (we call bond_select_active_slave() right after)
> then the old one would still have them set. During normal operations and automatic
> curr active slave changes, it is always adjusted.
OK, I rechecked the code. The mcast resend only happens when there is a new
new_active or in rr mode. But bond_option_active_slave_set() only called with
active-backup/alb/tlb mode. So this should be safe.
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH net 0/4] bonding: fix xfrm offload bugs
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
` (4 preceding siblings ...)
2024-08-16 11:52 ` [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
@ 2024-08-20 13:40 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-08-20 13:40 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, ap420073, davem, jv, andy, edumazet, kuba, pabeni, jarod
Hello:
This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Fri, 16 Aug 2024 14:48:09 +0300 you wrote:
> Hi,
> I noticed these problems while reviewing a bond xfrm patch recently.
> The fixes are straight-forward, please review carefully the last one
> because it has side-effects. This set has passed bond's selftests
> and my custom bond stress tests which crash without these fixes.
>
> Note the first patch is not critical, but it simplifies the next fix.
>
> [...]
Here is the summary with links:
- [net,1/4] bonding: fix bond_ipsec_offload_ok return type
https://git.kernel.org/netdev/net/c/fc59b9a5f720
- [net,2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
https://git.kernel.org/netdev/net/c/95c90e4ad89d
- [net,3/4] bonding: fix xfrm real_dev null pointer dereference
https://git.kernel.org/netdev/net/c/f8cde9805981
- [net,4/4] bonding: fix xfrm state handling when clearing active slave
https://git.kernel.org/netdev/net/c/c4c5c5d2ef40
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-08-20 13:40 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
2024-08-19 2:43 ` Hangbin Liu
2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
2024-08-19 2:45 ` Hangbin Liu
2024-08-19 2:53 ` Hangbin Liu
2024-08-19 7:25 ` Nikolay Aleksandrov
2024-08-19 8:12 ` Hangbin Liu
2024-08-19 7:26 ` Eric Dumazet
2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
2024-08-19 2:54 ` Hangbin Liu
2024-08-19 7:34 ` Nikolay Aleksandrov
2024-08-19 8:25 ` Nikolay Aleksandrov
2024-08-19 8:34 ` Hangbin Liu
2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
2024-08-19 3:05 ` Hangbin Liu
2024-08-19 7:38 ` Nikolay Aleksandrov
2024-08-20 3:48 ` Hangbin Liu
2024-08-16 11:52 ` [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
2024-08-20 13:40 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).