netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/4] bonding: fix xfrm offload bugs
@ 2024-08-16 11:48 Nikolay Aleksandrov
  2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
  To: netdev
  Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
	Nikolay Aleksandrov

Hi,
I noticed these problems while reviewing a bond xfrm patch recently.
The fixes are straight-forward, please review carefully the last one
because it has side-effects. This set has passed bond's selftests
and my custom bond stress tests which crash without these fixes.

Note the first patch is not critical, but it simplifies the next fix.

Thanks,
 Nik


Nikolay Aleksandrov (4):
  bonding: fix bond_ipsec_offload_ok return type
  bonding: fix null pointer deref in bond_ipsec_offload_ok
  bonding: fix xfrm real_dev null pointer dereference
  bonding: fix xfrm state handling when clearing active slave

 drivers/net/bonding/bond_main.c    | 21 ++++++++-------------
 drivers/net/bonding/bond_options.c |  2 +-
 2 files changed, 9 insertions(+), 14 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type
  2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
  2024-08-19  2:43   ` Hangbin Liu
  2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
  To: netdev
  Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
	Nikolay Aleksandrov

Fix the return type which should be bool.

Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
 drivers/net/bonding/bond_main.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1cd92c12e782..85b5868deeea 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -599,34 +599,28 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
 	struct net_device *real_dev;
 	struct slave *curr_active;
 	struct bonding *bond;
-	int err;
+	bool ok = false;
 
 	bond = netdev_priv(bond_dev);
 	rcu_read_lock();
 	curr_active = rcu_dereference(bond->curr_active_slave);
 	real_dev = curr_active->dev;
 
-	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
-		err = false;
+	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
 		goto out;
-	}
 
-	if (!xs->xso.real_dev) {
-		err = false;
+	if (!xs->xso.real_dev)
 		goto out;
-	}
 
 	if (!real_dev->xfrmdev_ops ||
 	    !real_dev->xfrmdev_ops->xdo_dev_offload_ok ||
-	    netif_is_bond_master(real_dev)) {
-		err = false;
+	    netif_is_bond_master(real_dev))
 		goto out;
-	}
 
-	err = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
+	ok = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
 out:
 	rcu_read_unlock();
-	return err;
+	return ok;
 }
 
 static const struct xfrmdev_ops bond_xfrmdev_ops = {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
  2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
  2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
  2024-08-19  2:45   ` Hangbin Liu
                     ` (2 more replies)
  2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
  To: netdev
  Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
	Nikolay Aleksandrov

We must check if there is an active slave before dereferencing the pointer.

Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
 drivers/net/bonding/bond_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 85b5868deeea..65ddb71eebcd 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
 	bond = netdev_priv(bond_dev);
 	rcu_read_lock();
 	curr_active = rcu_dereference(bond->curr_active_slave);
+	if (!curr_active)
+		goto out;
 	real_dev = curr_active->dev;
 
 	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
  2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
  2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
  2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
  2024-08-19  2:54   ` Hangbin Liu
  2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
  To: netdev
  Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
	Nikolay Aleksandrov

We shouldn't set real_dev to NULL because packets can be in transit and
xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
real_dev is set.

 Example trace:
 kernel: BUG: unable to handle page fault for address: 0000000000001030
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel: #PF: supervisor write access in kernel mode
 kernel: #PF: error_code(0x0002) - not-present page
 kernel: PGD 0 P4D 0
 kernel: Oops: 0002 [#1] PREEMPT SMP
 kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
 kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:
 kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
 kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
 kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
 kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
 kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
 kernel: FS:  00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel: Call Trace:
 kernel:  <TASK>
 kernel:  ? __die+0x1f/0x60
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:  ? page_fault_oops+0x142/0x4c0
 kernel:  ? do_user_addr_fault+0x65/0x670
 kernel:  ? kvm_read_and_reset_apf_flags+0x3b/0x50
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel:  ? exc_page_fault+0x7b/0x180
 kernel:  ? asm_exc_page_fault+0x22/0x30
 kernel:  ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:  ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
 kernel: bond0: (slave eni0np1): making interface the new active one
 kernel:  bond_ipsec_offload_ok+0x7b/0x90 [bonding]
 kernel:  xfrm_output+0x61/0x3b0
 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
 kernel:  ip_push_pending_frames+0x56/0x80

Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
 drivers/net/bonding/bond_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 65ddb71eebcd..f74bacf071fc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 		} else {
 			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
 		}
-		ipsec->xs->xso.real_dev = NULL;
 	}
 	spin_unlock_bh(&bond->ipsec_lock);
 	rcu_read_unlock();
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
  2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
                   ` (2 preceding siblings ...)
  2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
@ 2024-08-16 11:48 ` Nikolay Aleksandrov
  2024-08-19  3:05   ` Hangbin Liu
  2024-08-16 11:52 ` [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
  2024-08-20 13:40 ` patchwork-bot+netdevbpf
  5 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:48 UTC (permalink / raw)
  To: netdev
  Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
	Nikolay Aleksandrov

If the active slave is cleared manually the xfrm state is not flushed.
This leads to xfrm add/del imbalance and adding the same state multiple
times. For example when the device cannot handle anymore states we get:
 [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
because it's filled with the same state after multiple active slave
clearings. This change also has a few nice side effects: user-space
gets a notification for the change, the old device gets its mac address
and promisc/mcast adjusted properly.

Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
Please review this one more carefully. I plan to add a selftest with
netdevsim for this as well.

 drivers/net/bonding/bond_options.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index bc80fb6397dc..95d59a18c022 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
 	/* check to see if we are clearing active */
 	if (!slave_dev) {
 		netdev_dbg(bond->dev, "Clearing current active slave\n");
-		RCU_INIT_POINTER(bond->curr_active_slave, NULL);
+		bond_change_active_slave(bond, NULL);
 		bond_select_active_slave(bond);
 	} else {
 		struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net 0/4] bonding: fix xfrm offload bugs
  2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
                   ` (3 preceding siblings ...)
  2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
@ 2024-08-16 11:52 ` Nikolay Aleksandrov
  2024-08-20 13:40 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-16 11:52 UTC (permalink / raw)
  To: netdev
  Cc: Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni, jarod,
	Hangbin Liu

On 16/08/2024 14:48, Nikolay Aleksandrov wrote:
> Hi,
> I noticed these problems while reviewing a bond xfrm patch recently.
> The fixes are straight-forward, please review carefully the last one
> because it has side-effects. This set has passed bond's selftests
> and my custom bond stress tests which crash without these fixes.
> 
> Note the first patch is not critical, but it simplifies the next fix.
> 
> Thanks,
>  Nik
> 
> 
> Nikolay Aleksandrov (4):
>   bonding: fix bond_ipsec_offload_ok return type
>   bonding: fix null pointer deref in bond_ipsec_offload_ok
>   bonding: fix xfrm real_dev null pointer dereference
>   bonding: fix xfrm state handling when clearing active slave
> 
>  drivers/net/bonding/bond_main.c    | 21 ++++++++-------------
>  drivers/net/bonding/bond_options.c |  2 +-
>  2 files changed, 9 insertions(+), 14 deletions(-)
> 

Oops forgot to CC Hangbin, sorry about that (CCed).



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type
  2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
@ 2024-08-19  2:43   ` Hangbin Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19  2:43 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Fri, Aug 16, 2024 at 02:48:10PM +0300, Nikolay Aleksandrov wrote:
> Fix the return type which should be bool.
> 
> Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
>  drivers/net/bonding/bond_main.c | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 1cd92c12e782..85b5868deeea 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -599,34 +599,28 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
>  	struct net_device *real_dev;
>  	struct slave *curr_active;
>  	struct bonding *bond;
> -	int err;
> +	bool ok = false;
>  
>  	bond = netdev_priv(bond_dev);
>  	rcu_read_lock();
>  	curr_active = rcu_dereference(bond->curr_active_slave);
>  	real_dev = curr_active->dev;
>  
> -	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
> -		err = false;
> +	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
>  		goto out;
> -	}
>  
> -	if (!xs->xso.real_dev) {
> -		err = false;
> +	if (!xs->xso.real_dev)
>  		goto out;
> -	}
>  
>  	if (!real_dev->xfrmdev_ops ||
>  	    !real_dev->xfrmdev_ops->xdo_dev_offload_ok ||
> -	    netif_is_bond_master(real_dev)) {
> -		err = false;
> +	    netif_is_bond_master(real_dev))
>  		goto out;
> -	}
>  
> -	err = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
> +	ok = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs);
>  out:
>  	rcu_read_unlock();
> -	return err;
> +	return ok;
>  }
>  
>  static const struct xfrmdev_ops bond_xfrmdev_ops = {
> -- 
> 2.44.0
> 

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
  2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
@ 2024-08-19  2:45   ` Hangbin Liu
  2024-08-19  2:53   ` Hangbin Liu
  2024-08-19  7:26   ` Eric Dumazet
  2 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19  2:45 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
> We must check if there is an active slave before dereferencing the pointer.
> 
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
>  drivers/net/bonding/bond_main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 85b5868deeea..65ddb71eebcd 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
>  	bond = netdev_priv(bond_dev);
>  	rcu_read_lock();
>  	curr_active = rcu_dereference(bond->curr_active_slave);
> +	if (!curr_active)
> +		goto out;
>  	real_dev = curr_active->dev;
>  
>  	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> -- 
> 2.44.0
> 

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
  2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
  2024-08-19  2:45   ` Hangbin Liu
@ 2024-08-19  2:53   ` Hangbin Liu
  2024-08-19  7:25     ` Nikolay Aleksandrov
  2024-08-19  7:26   ` Eric Dumazet
  2 siblings, 1 reply; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19  2:53 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
> We must check if there is an active slave before dereferencing the pointer.
> 
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
>  drivers/net/bonding/bond_main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 85b5868deeea..65ddb71eebcd 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
>  	bond = netdev_priv(bond_dev);
>  	rcu_read_lock();
>  	curr_active = rcu_dereference(bond->curr_active_slave);
> +	if (!curr_active)
> +		goto out;
>  	real_dev = curr_active->dev;
>  
>  	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> -- 
> 2.44.0
> 

BTW, the bond_ipsec_offload_ok() only checks !xs->xso.real_dev, should we also
add WARN_ON(xs->xso.real_dev != slave->dev) checking?

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
  2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
@ 2024-08-19  2:54   ` Hangbin Liu
  2024-08-19  7:34     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19  2:54 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
> We shouldn't set real_dev to NULL because packets can be in transit and
> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
> real_dev is set.
> 
>  Example trace:
>  kernel: BUG: unable to handle page fault for address: 0000000000001030
>  kernel: bond0: (slave eni0np1): making interface the new active one
>  kernel: #PF: supervisor write access in kernel mode
>  kernel: #PF: error_code(0x0002) - not-present page
>  kernel: PGD 0 P4D 0
>  kernel: Oops: 0002 [#1] PREEMPT SMP
>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA

I saw the errors are during bond_ipsec_add_sa_all, which also
set ipsec->xs->xso.real_dev = NULL. Should we fix it there?

Thanks
Hangbin
>  kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
>  kernel: bond0: (slave eni0np1): making interface the new active one
>  kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>  kernel:
>  kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
>  kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
>  kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
>  kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
>  kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
>  kernel: FS:  00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
>  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
>  kernel: bond0: (slave eni0np1): making interface the new active one
>  kernel: Call Trace:
>  kernel:  <TASK>
>  kernel:  ? __die+0x1f/0x60
>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>  kernel:  ? page_fault_oops+0x142/0x4c0
>  kernel:  ? do_user_addr_fault+0x65/0x670
>  kernel:  ? kvm_read_and_reset_apf_flags+0x3b/0x50
>  kernel: bond0: (slave eni0np1): making interface the new active one
>  kernel:  ? exc_page_fault+0x7b/0x180
>  kernel:  ? asm_exc_page_fault+0x22/0x30
>  kernel:  ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>  kernel:  ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>  kernel: bond0: (slave eni0np1): making interface the new active one
>  kernel:  bond_ipsec_offload_ok+0x7b/0x90 [bonding]
>  kernel:  xfrm_output+0x61/0x3b0
>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>  kernel:  ip_push_pending_frames+0x56/0x80
> 
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
>  drivers/net/bonding/bond_main.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 65ddb71eebcd..f74bacf071fc 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>  		} else {
>  			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
>  		}
> -		ipsec->xs->xso.real_dev = NULL;
>  	}
>  	spin_unlock_bh(&bond->ipsec_lock);
>  	rcu_read_unlock();
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
  2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
@ 2024-08-19  3:05   ` Hangbin Liu
  2024-08-19  7:38     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19  3:05 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Fri, Aug 16, 2024 at 02:48:13PM +0300, Nikolay Aleksandrov wrote:
> If the active slave is cleared manually the xfrm state is not flushed.
> This leads to xfrm add/del imbalance and adding the same state multiple
> times. For example when the device cannot handle anymore states we get:
>  [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> because it's filled with the same state after multiple active slave
> clearings. This change also has a few nice side effects: user-space
> gets a notification for the change, the old device gets its mac address
> and promisc/mcast adjusted properly.
> 
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---
> Please review this one more carefully. I plan to add a selftest with
> netdevsim for this as well.
> 
>  drivers/net/bonding/bond_options.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index bc80fb6397dc..95d59a18c022 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
>  	/* check to see if we are clearing active */
>  	if (!slave_dev) {
>  		netdev_dbg(bond->dev, "Clearing current active slave\n");
> -		RCU_INIT_POINTER(bond->curr_active_slave, NULL);
> +		bond_change_active_slave(bond, NULL);

The good part of this is we can do bond_ipsec_del_sa_all and
bond_ipsec_add_sa_all. I'm not sure if we should do promisc/mcast adjustment
when set active_slave to null.

Jay should know better.

Thanks
Hangbin
>  		bond_select_active_slave(bond);
>  	} else {
>  		struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
  2024-08-19  2:53   ` Hangbin Liu
@ 2024-08-19  7:25     ` Nikolay Aleksandrov
  2024-08-19  8:12       ` Hangbin Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19  7:25 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On 19/08/2024 05:53, Hangbin Liu wrote:
> On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
>> We must check if there is an active slave before dereferencing the pointer.
>>
>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>>  drivers/net/bonding/bond_main.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 85b5868deeea..65ddb71eebcd 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
>>  	bond = netdev_priv(bond_dev);
>>  	rcu_read_lock();
>>  	curr_active = rcu_dereference(bond->curr_active_slave);
>> +	if (!curr_active)
>> +		goto out;
>>  	real_dev = curr_active->dev;
>>  
>>  	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
>> -- 
>> 2.44.0
>>
> 
> BTW, the bond_ipsec_offload_ok() only checks !xs->xso.real_dev, should we also
> add WARN_ON(xs->xso.real_dev != slave->dev) checking?
> 
> Thanks
> Hangbin

We could, but not a warn_on() because I bet it can be easily triggered
by changing the active slave in parallel. real_dev is read without a
lock here so we cannot guarantee a sane state if policies are changed
under us. I think the callback should handle it by checking that the
new device doesn't have the policy setup yet, because the case happens
when an active slave changes which means policies are about to be
installed on the new one.

Cheers,
 Nik


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
  2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
  2024-08-19  2:45   ` Hangbin Liu
  2024-08-19  2:53   ` Hangbin Liu
@ 2024-08-19  7:26   ` Eric Dumazet
  2 siblings, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2024-08-19  7:26 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, kuba, pabeni, jarod

On Fri, Aug 16, 2024 at 1:51 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> We must check if there is an active slave before dereferencing the pointer.
>
> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
  2024-08-19  2:54   ` Hangbin Liu
@ 2024-08-19  7:34     ` Nikolay Aleksandrov
  2024-08-19  8:25       ` Nikolay Aleksandrov
  2024-08-19  8:34       ` Hangbin Liu
  0 siblings, 2 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19  7:34 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On 19/08/2024 05:54, Hangbin Liu wrote:
> On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
>> We shouldn't set real_dev to NULL because packets can be in transit and
>> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
>> real_dev is set.
>>
>>  Example trace:
>>  kernel: BUG: unable to handle page fault for address: 0000000000001030
>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>  kernel: #PF: supervisor write access in kernel mode
>>  kernel: #PF: error_code(0x0002) - not-present page
>>  kernel: PGD 0 P4D 0
>>  kernel: Oops: 0002 [#1] PREEMPT SMP
>>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> 
> I saw the errors are during bond_ipsec_add_sa_all, which also
> set ipsec->xs->xso.real_dev = NULL. Should we fix it there?
> 
> Thanks
> Hangbin

Correct, I saw it too but I didn't remove it on purpose. I know it can lead to a
similar error, but the fix is more complicated. I don't believe it's correct to
set real_dev if the SA add failed, so we need to think about a different way
to sync it. To be fair in real life it would be more difficult to hit it because
the device must be in a state where the SA add fails, although it supports
xfrm offload. The problem is that real_dev must be set before attempting the SA
add in the first place.

>>  kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>  kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>  kernel:
>>  kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
>>  kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
>>  kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
>>  kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
>>  kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
>>  kernel: FS:  00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
>>  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>  kernel: Call Trace:
>>  kernel:  <TASK>
>>  kernel:  ? __die+0x1f/0x60
>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>  kernel:  ? page_fault_oops+0x142/0x4c0
>>  kernel:  ? do_user_addr_fault+0x65/0x670
>>  kernel:  ? kvm_read_and_reset_apf_flags+0x3b/0x50
>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>  kernel:  ? exc_page_fault+0x7b/0x180
>>  kernel:  ? asm_exc_page_fault+0x22/0x30
>>  kernel:  ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>  kernel:  ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>  kernel:  bond_ipsec_offload_ok+0x7b/0x90 [bonding]
>>  kernel:  xfrm_output+0x61/0x3b0
>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>  kernel:  ip_push_pending_frames+0x56/0x80
>>
>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>>  drivers/net/bonding/bond_main.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 65ddb71eebcd..f74bacf071fc 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>>  		} else {
>>  			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
>>  		}
>> -		ipsec->xs->xso.real_dev = NULL;
>>  	}
>>  	spin_unlock_bh(&bond->ipsec_lock);
>>  	rcu_read_unlock();
>> -- 
>> 2.44.0
>>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
  2024-08-19  3:05   ` Hangbin Liu
@ 2024-08-19  7:38     ` Nikolay Aleksandrov
  2024-08-20  3:48       ` Hangbin Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19  7:38 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On 19/08/2024 06:05, Hangbin Liu wrote:
> On Fri, Aug 16, 2024 at 02:48:13PM +0300, Nikolay Aleksandrov wrote:
>> If the active slave is cleared manually the xfrm state is not flushed.
>> This leads to xfrm add/del imbalance and adding the same state multiple
>> times. For example when the device cannot handle anymore states we get:
>>  [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>> because it's filled with the same state after multiple active slave
>> clearings. This change also has a few nice side effects: user-space
>> gets a notification for the change, the old device gets its mac address
>> and promisc/mcast adjusted properly.
>>
>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>> ---
>> Please review this one more carefully. I plan to add a selftest with
>> netdevsim for this as well.
>>
>>  drivers/net/bonding/bond_options.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>> index bc80fb6397dc..95d59a18c022 100644
>> --- a/drivers/net/bonding/bond_options.c
>> +++ b/drivers/net/bonding/bond_options.c
>> @@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
>>  	/* check to see if we are clearing active */
>>  	if (!slave_dev) {
>>  		netdev_dbg(bond->dev, "Clearing current active slave\n");
>> -		RCU_INIT_POINTER(bond->curr_active_slave, NULL);
>> +		bond_change_active_slave(bond, NULL);
> 
> The good part of this is we can do bond_ipsec_del_sa_all and
> bond_ipsec_add_sa_all. I'm not sure if we should do promisc/mcast adjustment
> when set active_slave to null.
> 
> Jay should know better.
> 
> Thanks
> Hangbin

Jay please correct me, but I'm pretty sure we should adjust them. They get adjusted on
every active slave change, this is no different. In fact I'd argue that it's a long
standing bug because they don't get adjusted when the active slave is cleared
manually and if a new one is chosen (we call bond_select_active_slave() right after)
then the old one would still have them set. During normal operations and automatic
curr active slave changes, it is always adjusted.

>>  		bond_select_active_slave(bond);
>>  	} else {
>>  		struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
>> -- 
>> 2.44.0
>>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
  2024-08-19  7:25     ` Nikolay Aleksandrov
@ 2024-08-19  8:12       ` Hangbin Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19  8:12 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Mon, Aug 19, 2024 at 10:25:37AM +0300, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:53, Hangbin Liu wrote:
> > On Fri, Aug 16, 2024 at 02:48:11PM +0300, Nikolay Aleksandrov wrote:
> >> We must check if there is an active slave before dereferencing the pointer.
> >>
> >> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
> >> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> >> ---
> >>  drivers/net/bonding/bond_main.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >> index 85b5868deeea..65ddb71eebcd 100644
> >> --- a/drivers/net/bonding/bond_main.c
> >> +++ b/drivers/net/bonding/bond_main.c
> >> @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
> >>  	bond = netdev_priv(bond_dev);
> >>  	rcu_read_lock();
> >>  	curr_active = rcu_dereference(bond->curr_active_slave);
> >> +	if (!curr_active)
> >> +		goto out;
> >>  	real_dev = curr_active->dev;
> >>  
> >>  	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
> >> -- 
> >> 2.44.0
> >>
> > 
> > BTW, the bond_ipsec_offload_ok() only checks !xs->xso.real_dev, should we also
> > add WARN_ON(xs->xso.real_dev != slave->dev) checking?
> > 
> > Thanks
> > Hangbin
> 
> We could, but not a warn_on() because I bet it can be easily triggered
> by changing the active slave in parallel. real_dev is read without a

OK, maybe a pr_warn or salve_warn()?

> lock here so we cannot guarantee a sane state if policies are changed
> under us. I think the callback should handle it by checking that the
> new device doesn't have the policy setup yet, because the case happens
> when an active slave changes which means policies are about to be
> installed on the new one.

Hmm, how to check if a device has policy setup except ipsec->xs->xso.real_dev?

Hangbin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
  2024-08-19  7:34     ` Nikolay Aleksandrov
@ 2024-08-19  8:25       ` Nikolay Aleksandrov
  2024-08-19  8:34       ` Hangbin Liu
  1 sibling, 0 replies; 20+ messages in thread
From: Nikolay Aleksandrov @ 2024-08-19  8:25 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On 19/08/2024 10:34, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:54, Hangbin Liu wrote:
>> On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
>>> We shouldn't set real_dev to NULL because packets can be in transit and
>>> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
>>> real_dev is set.
>>>
>>>  Example trace:
>>>  kernel: BUG: unable to handle page fault for address: 0000000000001030
>>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>>  kernel: #PF: supervisor write access in kernel mode
>>>  kernel: #PF: error_code(0x0002) - not-present page
>>>  kernel: PGD 0 P4D 0
>>>  kernel: Oops: 0002 [#1] PREEMPT SMP
>>>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
>>>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>
>> I saw the errors are during bond_ipsec_add_sa_all, which also
>> set ipsec->xs->xso.real_dev = NULL. Should we fix it there?
>>
>> Thanks
>> Hangbin
> 
> Correct, I saw it too but I didn't remove it on purpose. I know it can lead to a
> similar error, but the fix is more complicated. I don't believe it's correct to
> set real_dev if the SA add failed, so we need to think about a different way
> to sync it. To be fair in real life it would be more difficult to hit it because
> the device must be in a state where the SA add fails, although it supports
> xfrm offload. The problem is that real_dev must be set before attempting the SA
> add in the first place.
> 

Just fyi I do have an idea about an additional bit that is set on successful ops
in combination with a call_rcu to wait for a grace period on error, I'll test it
this week and send a patch if it's good.

>>>  kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
>>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>>  kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
>>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>>  kernel:
>>>  kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
>>>  kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
>>>  kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
>>>  kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
>>>  kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
>>>  kernel: FS:  00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
>>>  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>  kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
>>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>>  kernel: Call Trace:
>>>  kernel:  <TASK>
>>>  kernel:  ? __die+0x1f/0x60
>>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>>  kernel:  ? page_fault_oops+0x142/0x4c0
>>>  kernel:  ? do_user_addr_fault+0x65/0x670
>>>  kernel:  ? kvm_read_and_reset_apf_flags+0x3b/0x50
>>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>>  kernel:  ? exc_page_fault+0x7b/0x180
>>>  kernel:  ? asm_exc_page_fault+0x22/0x30
>>>  kernel:  ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
>>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>>  kernel:  ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
>>>  kernel: bond0: (slave eni0np1): making interface the new active one
>>>  kernel:  bond_ipsec_offload_ok+0x7b/0x90 [bonding]
>>>  kernel:  xfrm_output+0x61/0x3b0
>>>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
>>>  kernel:  ip_push_pending_frames+0x56/0x80
>>>
>>> Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
>>> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
>>> ---
>>>  drivers/net/bonding/bond_main.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 65ddb71eebcd..f74bacf071fc 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
>>>  		} else {
>>>  			slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs);
>>>  		}
>>> -		ipsec->xs->xso.real_dev = NULL;
>>>  	}
>>>  	spin_unlock_bh(&bond->ipsec_lock);
>>>  	rcu_read_unlock();
>>> -- 
>>> 2.44.0
>>>
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference
  2024-08-19  7:34     ` Nikolay Aleksandrov
  2024-08-19  8:25       ` Nikolay Aleksandrov
@ 2024-08-19  8:34       ` Hangbin Liu
  1 sibling, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-19  8:34 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Mon, Aug 19, 2024 at 10:34:16AM +0300, Nikolay Aleksandrov wrote:
> On 19/08/2024 05:54, Hangbin Liu wrote:
> > On Fri, Aug 16, 2024 at 02:48:12PM +0300, Nikolay Aleksandrov wrote:
> >> We shouldn't set real_dev to NULL because packets can be in transit and
> >> xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
> >> real_dev is set.
> >>
> >>  Example trace:
> >>  kernel: BUG: unable to handle page fault for address: 0000000000001030
> >>  kernel: bond0: (slave eni0np1): making interface the new active one
> >>  kernel: #PF: supervisor write access in kernel mode
> >>  kernel: #PF: error_code(0x0002) - not-present page
> >>  kernel: PGD 0 P4D 0
> >>  kernel: Oops: 0002 [#1] PREEMPT SMP
> >>  kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
> >>  kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> >>  kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
> >>  kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
> > 
> > I saw the errors are during bond_ipsec_add_sa_all, which also
> > set ipsec->xs->xso.real_dev = NULL. Should we fix it there?
> > 
> > Thanks
> > Hangbin
> 
> Correct, I saw it too but I didn't remove it on purpose. I know it can lead to a
> similar error, but the fix is more complicated. I don't believe it's correct to
> set real_dev if the SA add failed, so we need to think about a different way
> to sync it. To be fair in real life it would be more difficult to hit it because
> the device must be in a state where the SA add fails, although it supports
> xfrm offload. The problem is that real_dev must be set before attempting the SA
> add in the first place.

Got it, so this time we only fix the delete path.

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave
  2024-08-19  7:38     ` Nikolay Aleksandrov
@ 2024-08-20  3:48       ` Hangbin Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Hangbin Liu @ 2024-08-20  3:48 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Taehee Yoo, davem, jv, andy, edumazet, kuba, pabeni,
	jarod

On Mon, Aug 19, 2024 at 10:38:01AM +0300, Nikolay Aleksandrov wrote:
> >> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> >> index bc80fb6397dc..95d59a18c022 100644
> >> --- a/drivers/net/bonding/bond_options.c
> >> +++ b/drivers/net/bonding/bond_options.c
> >> @@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond,
> >>  	/* check to see if we are clearing active */
> >>  	if (!slave_dev) {
> >>  		netdev_dbg(bond->dev, "Clearing current active slave\n");
> >> -		RCU_INIT_POINTER(bond->curr_active_slave, NULL);
> >> +		bond_change_active_slave(bond, NULL);
> > 
> > The good part of this is we can do bond_ipsec_del_sa_all and
> > bond_ipsec_add_sa_all. I'm not sure if we should do promisc/mcast adjustment
> > when set active_slave to null.
> > 
> > Jay should know better.
> > 
> > Thanks
> > Hangbin
> 
> Jay please correct me, but I'm pretty sure we should adjust them. They get adjusted on
> every active slave change, this is no different. In fact I'd argue that it's a long
> standing bug because they don't get adjusted when the active slave is cleared
> manually and if a new one is chosen (we call bond_select_active_slave() right after)
> then the old one would still have them set. During normal operations and automatic
> curr active slave changes, it is always adjusted.

OK, I rechecked the code. The mcast resend only happens when there is a new
new_active or in rr mode. But bond_option_active_slave_set() only called with
active-backup/alb/tlb mode. So this should be safe.

Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net 0/4] bonding: fix xfrm offload bugs
  2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
                   ` (4 preceding siblings ...)
  2024-08-16 11:52 ` [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
@ 2024-08-20 13:40 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-08-20 13:40 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, ap420073, davem, jv, andy, edumazet, kuba, pabeni, jarod

Hello:

This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Fri, 16 Aug 2024 14:48:09 +0300 you wrote:
> Hi,
> I noticed these problems while reviewing a bond xfrm patch recently.
> The fixes are straight-forward, please review carefully the last one
> because it has side-effects. This set has passed bond's selftests
> and my custom bond stress tests which crash without these fixes.
> 
> Note the first patch is not critical, but it simplifies the next fix.
> 
> [...]

Here is the summary with links:
  - [net,1/4] bonding: fix bond_ipsec_offload_ok return type
    https://git.kernel.org/netdev/net/c/fc59b9a5f720
  - [net,2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok
    https://git.kernel.org/netdev/net/c/95c90e4ad89d
  - [net,3/4] bonding: fix xfrm real_dev null pointer dereference
    https://git.kernel.org/netdev/net/c/f8cde9805981
  - [net,4/4] bonding: fix xfrm state handling when clearing active slave
    https://git.kernel.org/netdev/net/c/c4c5c5d2ef40

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-08-20 13:40 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-16 11:48 [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
2024-08-16 11:48 ` [PATCH net 1/4] bonding: fix bond_ipsec_offload_ok return type Nikolay Aleksandrov
2024-08-19  2:43   ` Hangbin Liu
2024-08-16 11:48 ` [PATCH net 2/4] bonding: fix null pointer deref in bond_ipsec_offload_ok Nikolay Aleksandrov
2024-08-19  2:45   ` Hangbin Liu
2024-08-19  2:53   ` Hangbin Liu
2024-08-19  7:25     ` Nikolay Aleksandrov
2024-08-19  8:12       ` Hangbin Liu
2024-08-19  7:26   ` Eric Dumazet
2024-08-16 11:48 ` [PATCH net 3/4] bonding: fix xfrm real_dev null pointer dereference Nikolay Aleksandrov
2024-08-19  2:54   ` Hangbin Liu
2024-08-19  7:34     ` Nikolay Aleksandrov
2024-08-19  8:25       ` Nikolay Aleksandrov
2024-08-19  8:34       ` Hangbin Liu
2024-08-16 11:48 ` [PATCH net 4/4] bonding: fix xfrm state handling when clearing active slave Nikolay Aleksandrov
2024-08-19  3:05   ` Hangbin Liu
2024-08-19  7:38     ` Nikolay Aleksandrov
2024-08-20  3:48       ` Hangbin Liu
2024-08-16 11:52 ` [PATCH net 0/4] bonding: fix xfrm offload bugs Nikolay Aleksandrov
2024-08-20 13:40 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).