* [PATCH net 0/2] bond: fix xfrm offload feature during init
@ 2024-12-11 7:11 Hangbin Liu
2024-12-11 7:11 ` [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Hangbin Liu @ 2024-12-11 7:11 UTC
To: netdev
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Andrew Lunn, Shuah Khan, linux-kselftest, linux-kernel,
Hangbin Liu
The first patch fixes xfrm offload feature setup in active-backup mode.
The second patch adds an IPsec offload test.
Hangbin Liu (2):
bonding: fix xfrm offload feature setup on active-backup mode
selftests: bonding: add ipsec offload test
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_netlink.c | 17 +-
include/net/bonding.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_ipsec_offload.sh | 155 ++++++++++++++++++
.../selftests/drivers/net/bonding/config | 4 +
6 files changed, 173 insertions(+), 9 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
--
2.39.5 (Apple Git-154)
* [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode
2024-12-11 7:11 [PATCH net 0/2] bond: fix xfrm offload feature during init Hangbin Liu
@ 2024-12-11 7:11 ` Hangbin Liu
2024-12-12 9:19 ` Nikolay Aleksandrov
2024-12-11 7:11 ` [PATCH net 2/2] selftests: bonding: add ipsec offload test Hangbin Liu
2024-12-12 14:27 ` [PATCH net 0/2] bond: fix xfrm offload feature during init Jakub Kicinski
2 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2024-12-11 7:11 UTC
To: netdev
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Andrew Lunn, Shuah Khan, linux-kselftest, linux-kernel,
Hangbin Liu
The active-backup bonding mode supports XFRM ESP offload. However, when
a bond is added using a command like `ip link add bond0 type bond mode 1
miimon 100`, the `ethtool -k` command shows that XFRM ESP offload is
disabled. This occurs because bond_newlink() changes the bond link
parameters first and registers the bond device later. The XFRM feature
update in bond_option_mode_set() is therefore skipped, as the bond device
is not yet registered, and the offload feature is never set.
To resolve this issue, we can modify the code order in bond_newlink() to
ensure that the bond device is registered first before changing the bond
link parameters. This change will allow the XFRM ESP offload feature to be
correctly enabled.
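One way to observe the symptom is to look at the `esp-hw-offload` line that `ethtool -k` prints. The sketch below is a hypothetical helper, not part of this patch: `esp_offload_enabled` and `report` are invented names, and the two sample feature listings are illustrative text rather than captured output.

```bash
#!/bin/bash
# Sketch: report whether ESP offload is enabled, given `ethtool -k` output
# on stdin. Helper names and sample listings are illustrative assumptions.

esp_offload_enabled()
{
	# Matches "esp-hw-offload: on", optionally followed by e.g. "[fixed]".
	grep -Eq '^[[:space:]]*esp-hw-offload:[[:space:]]+on([[:space:]]|$)'
}

# usage: report LABEL FEATURE_LISTING
report()
{
	if printf '%s\n' "$2" | esp_offload_enabled; then
		echo "$1: ESP offload enabled"
	else
		echo "$1: ESP offload disabled"
	fi
}

# Made-up listings standing in for `ethtool -k bond0` output.
sample_off='rx-checksumming: off
esp-hw-offload: off'

sample_on='rx-checksumming: off
esp-hw-offload: on'

report "sample_off" "$sample_off"
report "sample_on" "$sample_on"
```

On a live system the same helper could be fed directly, e.g. `ethtool -k bond0 | esp_offload_enabled`.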
Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_netlink.c | 17 ++++++++++-------
include/net/bonding.h | 1 +
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 49dd4fe195e5..7daeab67e7b5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4389,7 +4389,7 @@ void bond_work_init_all(struct bonding *bond)
INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
}
-static void bond_work_cancel_all(struct bonding *bond)
+void bond_work_cancel_all(struct bonding *bond)
{
cancel_delayed_work_sync(&bond->mii_work);
cancel_delayed_work_sync(&bond->arp_work);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 2a6a424806aa..7fe8c62366eb 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -568,18 +568,21 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
+ struct bonding *bond = netdev_priv(bond_dev);
int err;
- err = bond_changelink(bond_dev, tb, data, extack);
- if (err < 0)
+ err = register_netdevice(bond_dev);
+ if (err)
return err;
- err = register_netdevice(bond_dev);
- if (!err) {
- struct bonding *bond = netdev_priv(bond_dev);
+ netif_carrier_off(bond_dev);
+ bond_work_init_all(bond);
- netif_carrier_off(bond_dev);
- bond_work_init_all(bond);
+ err = bond_changelink(bond_dev, tb, data, extack);
+ if (err) {
+ bond_work_cancel_all(bond);
+ netif_carrier_on(bond_dev);
+ unregister_netdevice(bond_dev);
}
return err;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 8bb5f016969f..e5e005cd2e17 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -707,6 +707,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
void bond_work_init_all(struct bonding *bond);
+void bond_work_cancel_all(struct bonding *bond);
#ifdef CONFIG_PROC_FS
void bond_create_proc_entry(struct bonding *bond);
--
2.39.5 (Apple Git-154)
* [PATCH net 2/2] selftests: bonding: add ipsec offload test
2024-12-11 7:11 [PATCH net 0/2] bond: fix xfrm offload feature during init Hangbin Liu
2024-12-11 7:11 ` [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
@ 2024-12-11 7:11 ` Hangbin Liu
2024-12-12 14:27 ` [PATCH net 0/2] bond: fix xfrm offload feature during init Jakub Kicinski
2 siblings, 0 replies; 29+ messages in thread
From: Hangbin Liu @ 2024-12-11 7:11 UTC
To: netdev
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Andrew Lunn, Shuah Khan, linux-kselftest, linux-kernel,
Hangbin Liu
This introduces a test for IPSec offload over bonding, utilizing netdevsim
for the testing process, as veth interfaces do not support IPSec offload.
The test will ensure that the IPSec offload functionality remains operational
even after a failover event occurs in the bonding configuration.
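The core check can be sketched in isolation: the script in this patch greps the netdevsim ipsec debugfs file for a `SA count=2 tx=3` summary line. The helper below is a hypothetical variant (`ipsec_counters_match` is an invented name) that anchors the pattern so that, for example, `tx=30` does not pass as `tx=3`. It assumes the summary line ends right after the tx value, and it runs against a temporary file standing in for the debugfs path.

```bash
#!/bin/bash
# Sketch: compare the SA and Tx counters in a netdevsim ipsec dump against
# expected values. Only the "SA count=N tx=M" summary line is relied on; its
# format is taken from the grep patterns used in the selftest.

# usage: ipsec_counters_match FILE SA_COUNT TX_COUNT
ipsec_counters_match()
{
	# Anchored at both ends so that tx=30 is not accepted for tx=3.
	grep -q "^SA count=$2 tx=$3\$" "$1"
}

# Illustrative stand-in for
# /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
dump=$(mktemp)
printf 'SA count=2 tx=3\n' > "$dump"

if ipsec_counters_match "$dump" 2 3; then
	echo "PASS: counters match"
else
	echo "FAIL: counters differ"
fi
rm -f "$dump"
```

Whether the stricter match is wanted depends on how stable the debugfs format is; the unanchored grep in the patch is more tolerant of extra fields on the line.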
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_ipsec_offload.sh | 155 ++++++++++++++++++
.../selftests/drivers/net/bonding/config | 4 +
3 files changed, 161 insertions(+), 1 deletion(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 03a089165d3f..c938475fdefa 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -10,7 +10,8 @@ TEST_PROGS := \
mode-2-recovery-updelay.sh \
bond_options.sh \
bond-eth-type-change.sh \
- bond_macvlan.sh
+ bond_macvlan.sh \
+ bond_ipsec_offload.sh
TEST_FILES := \
lag_lib.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh b/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
new file mode 100755
index 000000000000..868f22ad11aa
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
@@ -0,0 +1,155 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# IPsec over bonding offload test:
+#
+# +----------------+
+# | bond0 |
+# | | |
+# | eth0 eth1 |
+# +---+-------+----+
+#
+# We use netdevsim instead of physical interfaces
+#-------------------------------------------------------------------
+# Example commands
+# ip x s add proto esp src 192.0.2.1 dst 192.0.2.2 \
+# spi 0x07 mode transport reqid 0x07 replay-window 32 \
+# aead 'rfc4106(gcm(aes))' 1234567890123456dcba 128 \
+# sel src 192.0.2.1/24 dst 192.0.2.2/24 \
+# offload dev bond0 dir out
+# ip x p add dir out src 192.0.2.1/24 dst 192.0.2.2/24 \
+# tmpl proto esp src 192.0.2.1 dst 192.0.2.2 \
+# spi 0x07 mode transport reqid 0x07
+#
+#-------------------------------------------------------------------
+
+lib_dir=$(dirname "$0")
+source "$lib_dir"/../../../net/lib.sh
+algo="aead rfc4106(gcm(aes)) 0x3132333435363738393031323334353664636261 128"
+srcip=192.0.2.1
+dstip=192.0.2.2
+ipsec0=/sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
+ipsec1=/sys/kernel/debug/netdevsim/netdevsim0/ports/1/ipsec
+ret=0
+
+cleanup()
+{
+ modprobe -r netdevsim
+ cleanup_ns $ns
+}
+
+active_slave_changed()
+{
+ local old_active_slave=$1
+ local new_active_slave=$(ip -n ${ns} -d -j link show bond0 | \
+ jq -r ".[].linkinfo.info_data.active_slave")
+ [ "$new_active_slave" != "$old_active_slave" -a "$new_active_slave" != "null" ]
+}
+
+test_offload()
+{
+ # use ping to exercise the Tx path
+ ip netns exec $ns ping -I bond0 -c 3 -W 1 -i 0 $dstip >/dev/null
+
+ active_slave=$(ip -n ${ns} -d -j link show bond0 | \
+ jq -r ".[].linkinfo.info_data.active_slave")
+
+ if [ "$active_slave" = "$nic0" ]; then
+ sysfs=$ipsec0
+ elif [ "$active_slave" = "$nic1" ]; then
+ sysfs=$ipsec1
+ else
+ echo "FAIL: bond_ipsec_offload invalid active_slave $active_slave"
+ ret=1
+ fi
+
+ # The tx/rx order in sysfs may change after failover
+ if grep -q "SA count=2 tx=3" "$sysfs" && grep -q "tx ipaddr=$dstip" "$sysfs"; then
+ echo "PASS: bond_ipsec_offload has correct tx count with link ${active_slave}"
+ else
+ echo "FAIL: bond_ipsec_offload incorrect tx count with link ${active_slave}"
+ ret=1
+ fi
+}
+
+if ! mount | grep -q debugfs; then
+ mount -t debugfs none /sys/kernel/debug/ &> /dev/null
+fi
+
+# set up netdevsim since dummy/veth devices don't have offload support
+if [ ! -w /sys/bus/netdevsim/new_device ] ; then
+ modprobe -q netdevsim
+ if [ $? -ne 0 ]; then
+ echo "SKIP: can't load netdevsim for ipsec offload"
+ exit $ksft_skip
+ fi
+fi
+
+trap cleanup EXIT
+
+setup_ns ns
+ip -n $ns link add bond0 type bond mode active-backup miimon 100
+ip -n $ns addr add $srcip/24 dev bond0
+ip -n $ns link set bond0 up
+
+ifaces=$(ip netns exec $ns bash -c '
+ sysfsnet=/sys/bus/netdevsim/devices/netdevsim0/net/
+ echo "0 2" > /sys/bus/netdevsim/new_device
+ while [ ! -d $sysfsnet ] ; do :; done
+ udevadm settle
+ ls $sysfsnet
+')
+nic0=$(echo $ifaces | cut -f1 -d ' ')
+nic1=$(echo $ifaces | cut -f2 -d ' ')
+ip -n $ns link set $nic0 master bond0
+ip -n $ns link set $nic1 master bond0
+
+# create offloaded SAs, both in and out
+ip -n $ns x p add dir out src $srcip/24 dst $dstip/24 \
+ tmpl proto esp src $srcip dst $dstip spi 9 \
+ mode transport reqid 42
+
+ip -n $ns x p add dir in src $dstip/24 dst $srcip/24 \
+ tmpl proto esp src $dstip dst $srcip spi 9 \
+ mode transport reqid 42
+
+ip -n $ns x s add proto esp src $srcip dst $dstip spi 9 \
+ mode transport reqid 42 $algo sel src $srcip/24 dst $dstip/24 \
+ offload dev bond0 dir out
+
+ip -n $ns x s add proto esp src $dstip dst $srcip spi 9 \
+ mode transport reqid 42 $algo sel src $dstip/24 dst $srcip/24 \
+ offload dev bond0 dir in
+
+# does offload show up in ip output
+lines=`ip -n $ns x s list | grep -c "crypto offload parameters: dev bond0 dir"`
+if [ $lines -ne 2 ] ; then
+ echo "FAIL: bond_ipsec_offload SA offload missing from list output"
+ ret=1
+fi
+
+# we didn't create a peer, so make sure we can Tx by adding a permanent neighbour;
+# this needs to be added after enslaving
+ip -n $ns neigh add $dstip dev bond0 lladdr 00:11:22:33:44:55
+
+# start Offload testing
+test_offload
+
+# do failover
+ip -n $ns link set $active_slave down
+slowwait 5 active_slave_changed $active_slave
+test_offload
+
+# make sure the offload gets removed from the driver
+ip -n $ns x s flush
+ip -n $ns x p flush
+line0=$(grep -c "SA count=0" $ipsec0)
+line1=$(grep -c "SA count=0" $ipsec1)
+if [ $line0 -ne 1 -o $line1 -ne 1 ] ; then
+ echo "FAIL: bond_ipsec_offload SA not removed from driver"
+ ret=1
+else
+ echo "PASS: bond_ipsec_offload SA removed from driver"
+fi
+
+exit $ret
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
index 899d7fb6ea8e..91c581abe79c 100644
--- a/tools/testing/selftests/drivers/net/bonding/config
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -8,3 +8,7 @@ CONFIG_NET_CLS_FLOWER=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NLMON=y
CONFIG_VETH=y
+CONFIG_INET_ESP=y
+CONFIG_INET_ESP_OFFLOAD=y
+CONFIG_XFRM_USER=m
+CONFIG_NETDEVSIM=m
--
2.39.5 (Apple Git-154)
* Re: [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode
2024-12-11 7:11 ` [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
@ 2024-12-12 9:19 ` Nikolay Aleksandrov
2024-12-12 9:39 ` Hangbin Liu
0 siblings, 1 reply; 29+ messages in thread
From: Nikolay Aleksandrov @ 2024-12-12 9:19 UTC
To: Hangbin Liu, netdev
Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
Shuah Khan, linux-kselftest, linux-kernel
On 12/11/24 09:11, Hangbin Liu wrote:
> The active-backup bonding mode supports XFRM ESP offload. However, when
> a bond is added using command like `ip link add bond0 type bond mode 1
> miimon 100`, the `ethtool -k` command shows that the XFRM ESP offload is
> disabled. This occurs because, in bond_newlink(), we change bond link
> first and register bond device later. So the XFRM feature update in
> bond_option_mode_set() is not called as the bond device is not yet
> registered, leading to the offload feature not being set successfully.
>
> To resolve this issue, we can modify the code order in bond_newlink() to
> ensure that the bond device is registered first before changing the bond
> link parameters. This change will allow the XFRM ESP offload feature to be
> correctly enabled.
>
> Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> drivers/net/bonding/bond_main.c | 2 +-
> drivers/net/bonding/bond_netlink.c | 17 ++++++++++-------
> include/net/bonding.h | 1 +
> 3 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 49dd4fe195e5..7daeab67e7b5 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4389,7 +4389,7 @@ void bond_work_init_all(struct bonding *bond)
> INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
> }
>
> -static void bond_work_cancel_all(struct bonding *bond)
> +void bond_work_cancel_all(struct bonding *bond)
> {
> cancel_delayed_work_sync(&bond->mii_work);
> cancel_delayed_work_sync(&bond->arp_work);
> diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
> index 2a6a424806aa..7fe8c62366eb 100644
> --- a/drivers/net/bonding/bond_netlink.c
> +++ b/drivers/net/bonding/bond_netlink.c
> @@ -568,18 +568,21 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
> struct nlattr *tb[], struct nlattr *data[],
> struct netlink_ext_ack *extack)
> {
> + struct bonding *bond = netdev_priv(bond_dev);
> int err;
>
> - err = bond_changelink(bond_dev, tb, data, extack);
> - if (err < 0)
> + err = register_netdevice(bond_dev);
> + if (err)
> return err;
>
> - err = register_netdevice(bond_dev);
> - if (!err) {
> - struct bonding *bond = netdev_priv(bond_dev);
> + netif_carrier_off(bond_dev);
> + bond_work_init_all(bond);
>
> - netif_carrier_off(bond_dev);
> - bond_work_init_all(bond);
> + err = bond_changelink(bond_dev, tb, data, extack);
> + if (err) {
> + bond_work_cancel_all(bond);
> + netif_carrier_on(bond_dev);
The patch looks good, but I'm curious why the carrier on here?
> + unregister_netdevice(bond_dev);
> }
>
> return err;
> diff --git a/include/net/bonding.h b/include/net/bonding.h
> index 8bb5f016969f..e5e005cd2e17 100644
> --- a/include/net/bonding.h
> +++ b/include/net/bonding.h
> @@ -707,6 +707,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
> int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
> void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
> void bond_work_init_all(struct bonding *bond);
> +void bond_work_cancel_all(struct bonding *bond);
>
> #ifdef CONFIG_PROC_FS
> void bond_create_proc_entry(struct bonding *bond);
* Re: [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode
2024-12-12 9:19 ` Nikolay Aleksandrov
@ 2024-12-12 9:39 ` Hangbin Liu
2024-12-12 9:43 ` Nikolay Aleksandrov
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2024-12-12 9:39 UTC
To: Nikolay Aleksandrov
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Andrew Lunn, Shuah Khan, linux-kselftest, linux-kernel
On Thu, Dec 12, 2024 at 11:19:33AM +0200, Nikolay Aleksandrov wrote:
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index 49dd4fe195e5..7daeab67e7b5 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -4389,7 +4389,7 @@ void bond_work_init_all(struct bonding *bond)
> > INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
> > }
> >
> > -static void bond_work_cancel_all(struct bonding *bond)
> > +void bond_work_cancel_all(struct bonding *bond)
> > {
> > cancel_delayed_work_sync(&bond->mii_work);
> > cancel_delayed_work_sync(&bond->arp_work);
> > diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
> > index 2a6a424806aa..7fe8c62366eb 100644
> > --- a/drivers/net/bonding/bond_netlink.c
> > +++ b/drivers/net/bonding/bond_netlink.c
> > @@ -568,18 +568,21 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
> > struct nlattr *tb[], struct nlattr *data[],
> > struct netlink_ext_ack *extack)
> > {
> > + struct bonding *bond = netdev_priv(bond_dev);
> > int err;
> >
> > - err = bond_changelink(bond_dev, tb, data, extack);
> > - if (err < 0)
> > + err = register_netdevice(bond_dev);
> > + if (err)
> > return err;
> >
> > - err = register_netdevice(bond_dev);
> > - if (!err) {
> > - struct bonding *bond = netdev_priv(bond_dev);
> > + netif_carrier_off(bond_dev);
> > + bond_work_init_all(bond);
> >
> > - netif_carrier_off(bond_dev);
> > - bond_work_init_all(bond);
> > + err = bond_changelink(bond_dev, tb, data, extack);
> > + if (err) {
> > + bond_work_cancel_all(bond);
> > + netif_carrier_on(bond_dev);
>
> The patch looks good, but I'm curious why the carrier on here?
The current code calls netif_carrier_off(bond_dev) after register_netdevice()
succeeds, so I turn the carrier back on in the error path.
Thanks
hangbin
>
> > + unregister_netdevice(bond_dev);
> > }
> >
> > return err;
* Re: [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode
2024-12-12 9:39 ` Hangbin Liu
@ 2024-12-12 9:43 ` Nikolay Aleksandrov
2024-12-13 3:10 ` Hangbin Liu
0 siblings, 1 reply; 29+ messages in thread
From: Nikolay Aleksandrov @ 2024-12-12 9:43 UTC
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Andrew Lunn, Shuah Khan, linux-kselftest, linux-kernel
On 12/12/24 11:39, Hangbin Liu wrote:
> On Thu, Dec 12, 2024 at 11:19:33AM +0200, Nikolay Aleksandrov wrote:
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 49dd4fe195e5..7daeab67e7b5 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -4389,7 +4389,7 @@ void bond_work_init_all(struct bonding *bond)
>>> INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
>>> }
>>>
>>> -static void bond_work_cancel_all(struct bonding *bond)
>>> +void bond_work_cancel_all(struct bonding *bond)
>>> {
>>> cancel_delayed_work_sync(&bond->mii_work);
>>> cancel_delayed_work_sync(&bond->arp_work);
>>> diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
>>> index 2a6a424806aa..7fe8c62366eb 100644
>>> --- a/drivers/net/bonding/bond_netlink.c
>>> +++ b/drivers/net/bonding/bond_netlink.c
>>> @@ -568,18 +568,21 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
>>> struct nlattr *tb[], struct nlattr *data[],
>>> struct netlink_ext_ack *extack)
>>> {
>>> + struct bonding *bond = netdev_priv(bond_dev);
>>> int err;
>>>
>>> - err = bond_changelink(bond_dev, tb, data, extack);
>>> - if (err < 0)
>>> + err = register_netdevice(bond_dev);
>>> + if (err)
>>> return err;
>>>
>>> - err = register_netdevice(bond_dev);
>>> - if (!err) {
>>> - struct bonding *bond = netdev_priv(bond_dev);
>>> + netif_carrier_off(bond_dev);
>>> + bond_work_init_all(bond);
>>>
>>> - netif_carrier_off(bond_dev);
>>> - bond_work_init_all(bond);
>>> + err = bond_changelink(bond_dev, tb, data, extack);
>>> + if (err) {
>>> + bond_work_cancel_all(bond);
>>> + netif_carrier_on(bond_dev);
>>
>> The patch looks good, but I'm curious why the carrier on here?
>
> The current code set netif_carrier_off(bond_dev) after register_netdevice()
> success, So I make it on if register failed.
>
> Thanks
> hangbin
I don't like adding code for symmetry alone; I think you should drop it
unless there is an actual reason to turn the carrier on.
>>
>>> + unregister_netdevice(bond_dev);
>>> }
>>>
>>> return err;
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2024-12-11 7:11 [PATCH net 0/2] bond: fix xfrm offload feature during init Hangbin Liu
2024-12-11 7:11 ` [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
2024-12-11 7:11 ` [PATCH net 2/2] selftests: bonding: add ipsec offload test Hangbin Liu
@ 2024-12-12 14:27 ` Jakub Kicinski
2024-12-13 7:18 ` Hangbin Liu
2 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2024-12-12 14:27 UTC
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Andrew Lunn, Shuah Khan, linux-kselftest, linux-kernel
On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> The first patch fixes the xfrm offload feature during setup active-backup
> mode. The second patch add a ipsec offload testing.
Looks like the test is too good, is there a fix pending somewhere for
the BUG below? We can't merge the test before that:
https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/900082/11-bond-ipsec-offload-sh/stderr
[ 859.672652][ C3] bond_xfrm_update_stats: eth0 doesn't support xdo_dev_state_update_stats
[ 860.467189][ T8677] bond0: (slave eth0): link status definitely down, disabling slave
[ 860.467664][ T8677] bond0: (slave eth1): making interface the new active one
[ 860.831042][ T9677] bond_xfrm_update_stats: eth1 doesn't support xdo_dev_state_update_stats
[ 862.195271][ T9683] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
[ 862.195880][ T9683] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9683, name: ip
[ 862.196189][ T9683] preempt_count: 201, expected: 0
[ 862.196396][ T9683] RCU nest depth: 0, expected: 0
[ 862.196591][ T9683] 2 locks held by ip/9683:
[ 862.196818][ T9683] #0: ffff88800a829558 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x65/0x90 [xfrm_user]
[ 862.197264][ T9683] #1: ffff88800f460548 (&x->lock){+.-.}-{3:3}, at: xfrm_state_flush+0x1b3/0x3a0
[ 862.197629][ T9683] CPU: 3 UID: 0 PID: 9683 Comm: ip Not tainted 6.13.0-rc1-virtme #1
[ 862.197967][ T9683] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 862.198204][ T9683] Call Trace:
[ 862.198352][ T9683] <TASK>
[ 862.198458][ T9683] dump_stack_lvl+0xb0/0xd0
[ 862.198659][ T9683] __might_resched+0x2f8/0x530
[ 862.198852][ T9683] ? kfree+0x2d/0x330
[ 862.199005][ T9683] __mutex_lock+0xd9/0xbc0
[ 862.199202][ T9683] ? ref_tracker_free+0x35e/0x910
[ 862.199401][ T9683] ? bond_ipsec_del_sa+0x2c1/0x790
[ 862.199937][ T9683] ? find_held_lock+0x2c/0x110
[ 862.200133][ T9683] ? __pfx___mutex_lock+0x10/0x10
[ 862.200329][ T9683] ? bond_ipsec_del_sa+0x280/0x790
[ 862.200519][ T9683] ? xfrm_dev_state_delete+0x97/0x170
[ 862.200711][ T9683] ? __xfrm_state_delete+0x681/0x8e0
[ 862.200907][ T9683] ? xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
[ 862.201151][ T9683] ? netlink_rcv_skb+0x130/0x360
[ 862.201347][ T9683] ? xfrm_netlink_rcv+0x74/0x90 [xfrm_user]
[ 862.201587][ T9683] ? netlink_unicast+0x44b/0x710
[ 862.201780][ T9683] ? netlink_sendmsg+0x723/0xbe0
[ 862.201973][ T9683] ? ____sys_sendmsg+0x7ac/0xa10
[ 862.202164][ T9683] ? ___sys_sendmsg+0xee/0x170
[ 862.202355][ T9683] ? __sys_sendmsg+0x109/0x1a0
[ 862.202546][ T9683] ? do_syscall_64+0xc1/0x1d0
[ 862.202738][ T9683] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 862.202986][ T9683] ? __pfx_nsim_ipsec_del_sa+0x10/0x10 [netdevsim]
[ 862.203251][ T9683] ? bond_ipsec_del_sa+0x2c1/0x790
[ 862.203457][ T9683] bond_ipsec_del_sa+0x2c1/0x790
[ 862.203648][ T9683] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 862.203845][ T9683] ? __pfx_bond_ipsec_del_sa+0x10/0x10
[ 862.204034][ T9683] ? do_raw_spin_lock+0x131/0x270
[ 862.204225][ T9683] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 862.204468][ T9683] xfrm_dev_state_delete+0x97/0x170
[ 862.204665][ T9683] __xfrm_state_delete+0x681/0x8e0
[ 862.204858][ T9683] xfrm_state_flush+0x1bb/0x3a0
[ 862.205057][ T9683] xfrm_flush_sa+0xf0/0x270 [xfrm_user]
[ 862.205290][ T9683] ? __pfx_xfrm_flush_sa+0x10/0x10 [xfrm_user]
[ 862.205537][ T9683] ? __nla_validate_parse+0x48/0x3d0
[ 862.205744][ T9683] xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
[ 862.205985][ T9683] ? __pfx___lock_release+0x10/0x10
[ 862.206174][ T9683] ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
[ 862.206412][ T9683] ? __pfx_validate_chain+0x10/0x10
[ 862.206614][ T9683] ? hlock_class+0x4e/0x130
[ 862.206807][ T9683] ? mark_lock+0x38/0x3e0
[ 862.206986][ T9683] ? __mutex_trylock_common+0xfa/0x260
[ 862.207181][ T9683] ? __pfx___mutex_trylock_common+0x10/0x10
[ 862.207425][ T9683] netlink_rcv_skb+0x130/0x360
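When reproducing this report, the BUG line and the held locks can be pulled out of a saved console log. The sketch below is a hypothetical helper (`atomic_sleep_report` is an invented name); it assumes the log uses the same line format as the trace above and is fed a few lines copied from it.

```bash
#!/bin/bash
# Sketch: extract the "sleeping function called from invalid context" report
# and the lockdep held-lock lines from a saved kernel log.

# usage: atomic_sleep_report LOGFILE
# Prints the BUG line and every "#N: ... at: ..." held-lock line.
# Returns 1 if the log contains no such BUG.
atomic_sleep_report()
{
	grep -q 'BUG: sleeping function called from invalid context' "$1" || return 1
	grep -E 'BUG: sleeping function called from invalid context|#[0-9]+: .* at: ' "$1"
}

# Lines copied from the trace in this thread, used as sample input.
log=$(mktemp)
cat > "$log" <<'EOF'
[ 860.467664][ T8677] bond0: (slave eth1): making interface the new active one
[ 862.195271][ T9683] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
[ 862.196591][ T9683] 2 locks held by ip/9683:
[ 862.196818][ T9683] #0: ffff88800a829558 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x65/0x90 [xfrm_user]
[ 862.197264][ T9683] #1: ffff88800f460548 (&x->lock){+.-.}-{3:3}, at: xfrm_state_flush+0x1b3/0x3a0
EOF

atomic_sleep_report "$log" || echo "no atomic-sleep BUG found"
rm -f "$log"
```

For the sample input this prints the BUG line followed by the two held-lock lines, which is enough to see that `x->lock`, taken in xfrm_state_flush, is still held when `__mutex_lock` is reached.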
* Re: [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode
2024-12-12 9:43 ` Nikolay Aleksandrov
@ 2024-12-13 3:10 ` Hangbin Liu
0 siblings, 0 replies; 29+ messages in thread
From: Hangbin Liu @ 2024-12-13 3:10 UTC
To: Nikolay Aleksandrov
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Andrew Lunn, Shuah Khan, linux-kselftest, linux-kernel
On Thu, Dec 12, 2024 at 11:43:15AM +0200, Nikolay Aleksandrov wrote:
> >>> --- a/drivers/net/bonding/bond_netlink.c
> >>> +++ b/drivers/net/bonding/bond_netlink.c
> >>> @@ -568,18 +568,21 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
> >>> struct nlattr *tb[], struct nlattr *data[],
> >>> struct netlink_ext_ack *extack)
> >>> {
> >>> + struct bonding *bond = netdev_priv(bond_dev);
> >>> int err;
> >>>
> >>> - err = bond_changelink(bond_dev, tb, data, extack);
> >>> - if (err < 0)
> >>> + err = register_netdevice(bond_dev);
> >>> + if (err)
> >>> return err;
> >>>
> >>> - err = register_netdevice(bond_dev);
> >>> - if (!err) {
> >>> - struct bonding *bond = netdev_priv(bond_dev);
> >>> + netif_carrier_off(bond_dev);
> >>> + bond_work_init_all(bond);
> >>>
> >>> - netif_carrier_off(bond_dev);
> >>> - bond_work_init_all(bond);
> >>> + err = bond_changelink(bond_dev, tb, data, extack);
> >>> + if (err) {
> >>> + bond_work_cancel_all(bond);
> >>> + netif_carrier_on(bond_dev);
> >>
> >> The patch looks good, but I'm curious why the carrier on here?
> >
> > The current code set netif_carrier_off(bond_dev) after register_netdevice()
> > success, So I make it on if register failed.
> >
> > Thanks
> > hangbin
>
> I don't like adding code just for symmetry alone, I think you should drop it
> unless there is an actual reason to turn carrier on.
OK, I will drop it.
Thanks
Hangbin
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2024-12-12 14:27 ` [PATCH net 0/2] bond: fix xfrm offload feature during init Jakub Kicinski
@ 2024-12-13 7:18 ` Hangbin Liu
2024-12-14 3:31 ` Jakub Kicinski
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2024-12-13 7:18 UTC
To: Jakub Kicinski
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Jianbo Liu, Tariq Toukan, Andrew Lunn, Shuah Khan,
linux-kselftest, linux-kernel
On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
> On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> > The first patch fixes the xfrm offload feature during setup active-backup
> > mode. The second patch add a ipsec offload testing.
>
> Looks like the test is too good, is there a fix pending somewhere for
> the BUG below? We can't merge the test before that:
This looks like a regression from 2aeeef906d5a ("bonding: change ipsec_lock from
spin lock to mutex"): xfrm_state_delete() takes spin_lock_bh(&x->lock) around the
state delete, so bond_ipsec_del_sa() ends up locking a mutex in atomic context.
But I'm not sure it's proper to release the spin lock in bond code.
This seems too specific.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7daeab67e7b5..69563bc958ca 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
out:
netdev_put(real_dev, &tracker);
+ spin_unlock_bh(&xs->lock);
mutex_lock(&bond->ipsec_lock);
list_for_each_entry(ipsec, &bond->ipsec_list, list) {
if (ipsec->xs == xs) {
@@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
}
}
mutex_unlock(&bond->ipsec_lock);
+ spin_lock_bh(&xs->lock);
}
What do you think?
Thanks
Hangbin
>
> https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/900082/11-bond-ipsec-offload-sh/stderr
>
> [ 859.672652][ C3] bond_xfrm_update_stats: eth0 doesn't support xdo_dev_state_update_stats
> [ 860.467189][ T8677] bond0: (slave eth0): link status definitely down, disabling slave
> [ 860.467664][ T8677] bond0: (slave eth1): making interface the new active one
> [ 860.831042][ T9677] bond_xfrm_update_stats: eth1 doesn't support xdo_dev_state_update_stats
> [ 862.195271][ T9683] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
> [ 862.195880][ T9683] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9683, name: ip
> [ 862.196189][ T9683] preempt_count: 201, expected: 0
> [ 862.196396][ T9683] RCU nest depth: 0, expected: 0
> [ 862.196591][ T9683] 2 locks held by ip/9683:
> [ 862.196818][ T9683] #0: ffff88800a829558 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x65/0x90 [xfrm_user]
> [ 862.197264][ T9683] #1: ffff88800f460548 (&x->lock){+.-.}-{3:3}, at: xfrm_state_flush+0x1b3/0x3a0
> [ 862.197629][ T9683] CPU: 3 UID: 0 PID: 9683 Comm: ip Not tainted 6.13.0-rc1-virtme #1
> [ 862.197967][ T9683] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 862.198204][ T9683] Call Trace:
> [ 862.198352][ T9683] <TASK>
> [ 862.198458][ T9683] dump_stack_lvl+0xb0/0xd0
> [ 862.198659][ T9683] __might_resched+0x2f8/0x530
> [ 862.198852][ T9683] ? kfree+0x2d/0x330
> [ 862.199005][ T9683] __mutex_lock+0xd9/0xbc0
> [ 862.199202][ T9683] ? ref_tracker_free+0x35e/0x910
> [ 862.199401][ T9683] ? bond_ipsec_del_sa+0x2c1/0x790
> [ 862.199937][ T9683] ? find_held_lock+0x2c/0x110
> [ 862.200133][ T9683] ? __pfx___mutex_lock+0x10/0x10
> [ 862.200329][ T9683] ? bond_ipsec_del_sa+0x280/0x790
> [ 862.200519][ T9683] ? xfrm_dev_state_delete+0x97/0x170
> [ 862.200711][ T9683] ? __xfrm_state_delete+0x681/0x8e0
> [ 862.200907][ T9683] ? xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
> [ 862.201151][ T9683] ? netlink_rcv_skb+0x130/0x360
> [ 862.201347][ T9683] ? xfrm_netlink_rcv+0x74/0x90 [xfrm_user]
> [ 862.201587][ T9683] ? netlink_unicast+0x44b/0x710
> [ 862.201780][ T9683] ? netlink_sendmsg+0x723/0xbe0
> [ 862.201973][ T9683] ? ____sys_sendmsg+0x7ac/0xa10
> [ 862.202164][ T9683] ? ___sys_sendmsg+0xee/0x170
> [ 862.202355][ T9683] ? __sys_sendmsg+0x109/0x1a0
> [ 862.202546][ T9683] ? do_syscall_64+0xc1/0x1d0
> [ 862.202738][ T9683] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [ 862.202986][ T9683] ? __pfx_nsim_ipsec_del_sa+0x10/0x10 [netdevsim]
> [ 862.203251][ T9683] ? bond_ipsec_del_sa+0x2c1/0x790
> [ 862.203457][ T9683] bond_ipsec_del_sa+0x2c1/0x790
> [ 862.203648][ T9683] ? __pfx_lock_acquire.part.0+0x10/0x10
> [ 862.203845][ T9683] ? __pfx_bond_ipsec_del_sa+0x10/0x10
> [ 862.204034][ T9683] ? do_raw_spin_lock+0x131/0x270
> [ 862.204225][ T9683] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 862.204468][ T9683] xfrm_dev_state_delete+0x97/0x170
> [ 862.204665][ T9683] __xfrm_state_delete+0x681/0x8e0
> [ 862.204858][ T9683] xfrm_state_flush+0x1bb/0x3a0
> [ 862.205057][ T9683] xfrm_flush_sa+0xf0/0x270 [xfrm_user]
> [ 862.205290][ T9683] ? __pfx_xfrm_flush_sa+0x10/0x10 [xfrm_user]
> [ 862.205537][ T9683] ? __nla_validate_parse+0x48/0x3d0
> [ 862.205744][ T9683] xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
> [ 862.205985][ T9683] ? __pfx___lock_release+0x10/0x10
> [ 862.206174][ T9683] ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
> [ 862.206412][ T9683] ? __pfx_validate_chain+0x10/0x10
> [ 862.206614][ T9683] ? hlock_class+0x4e/0x130
> [ 862.206807][ T9683] ? mark_lock+0x38/0x3e0
> [ 862.206986][ T9683] ? __mutex_trylock_common+0xfa/0x260
> [ 862.207181][ T9683] ? __pfx___mutex_trylock_common+0x10/0x10
> [ 862.207425][ T9683] netlink_rcv_skb+0x130/0x360
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2024-12-13 7:18 ` Hangbin Liu
@ 2024-12-14 3:31 ` Jakub Kicinski
2025-01-02 2:44 ` Hangbin Liu
0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2024-12-14 3:31 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Jianbo Liu, Tariq Toukan, Andrew Lunn, Shuah Khan,
linux-kselftest, linux-kernel
On Fri, 13 Dec 2024 07:18:08 +0000 Hangbin Liu wrote:
> On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
> > On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> > > The first patch fixes the xfrm offload feature during setup active-backup
> > > mode. The second patch add a ipsec offload testing.
> >
> > Looks like the test is too good, is there a fix pending somewhere for
> > the BUG below? We can't merge the test before that:
>
> This should be a regression of 2aeeef906d5a ("bonding: change ipsec_lock from
> spin lock to mutex"). As in xfrm_state_delete we called spin_lock_bh(&x->lock)
> for the xfrm state delete.
>
> But I'm not sure if it's proper to release the spin lock in bond code.
> This seems too specific.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 7daeab67e7b5..69563bc958ca 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> out:
> netdev_put(real_dev, &tracker);
> + spin_unlock_bh(&xs->lock);
> mutex_lock(&bond->ipsec_lock);
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> if (ipsec->xs == xs) {
> @@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> }
> }
> mutex_unlock(&bond->ipsec_lock);
> + spin_lock_bh(&xs->lock);
> }
>
>
> What do you think?
Re-locking doesn't look great, but glancing at the code I don't see any
obviously better workarounds. The easiest fix would be to not let the
drivers sleep in the callbacks, and then we can go back to a spin lock.
Maybe the nvidia people have better ideas; I'm not familiar with this
offload.
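For readers following the locking problem, here is a minimal user-space
sketch of why the BUG above fires and what the re-locking proposal changes.
Everything prefixed toy_ is hypothetical: it only models spin_lock_bh(),
mutex_lock() and the kernel's might-sleep check with a counter, and it is
not kernel code.

```c
#include <assert.h>

/* Toy stand-ins for kernel primitives; NOT the real kernel API. */
struct toy_ctx {
	int atomic_depth;	/* > 0 while a toy spin lock is held */
	int sleep_bugs;		/* sleeping lock taken in atomic context */
};

static void toy_spin_lock(struct toy_ctx *c)
{
	c->atomic_depth++;
}

static void toy_spin_unlock(struct toy_ctx *c)
{
	assert(c->atomic_depth > 0);
	c->atomic_depth--;
}

/* Models mutex_lock(): it may sleep, so it is only legal when
 * no spin lock is held (atomic_depth == 0).
 */
static void toy_mutex_lock(struct toy_ctx *c)
{
	if (c->atomic_depth > 0)
		c->sleep_bugs++;	/* the kernel would report a BUG here */
}

static void toy_mutex_unlock(struct toy_ctx *c)
{
	(void)c;
}

/* Models the current callback: takes the mutex unconditionally. */
void toy_del_sa(struct toy_ctx *c)
{
	toy_mutex_lock(c);
	toy_mutex_unlock(c);
}

/* Models the re-locking proposal: drop the caller's spin lock around
 * the mutex section, then take it back before returning.
 */
void toy_del_sa_relock(struct toy_ctx *c)
{
	toy_spin_unlock(c);
	toy_mutex_lock(c);
	toy_mutex_unlock(c);
	toy_spin_lock(c);
}

/* Models the delete path: the callback runs under the state spin lock.
 * Returns how many sleep-in-atomic violations were recorded.
 */
int toy_state_delete(void (*del_sa)(struct toy_ctx *))
{
	struct toy_ctx c = { 0, 0 };

	toy_spin_lock(&c);
	del_sa(&c);
	toy_spin_unlock(&c);
	return c.sleep_bugs;
}
```

In this model the plain callback records one violation and the re-locking
variant records none. The cost, as discussed above, is that the callee
silently drops a lock its caller believes it still holds.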
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2024-12-14 3:31 ` Jakub Kicinski
@ 2025-01-02 2:44 ` Hangbin Liu
2025-01-02 3:33 ` Jianbo Liu
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-01-02 2:44 UTC (permalink / raw)
To: Jakub Kicinski, Jianbo Liu
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Jianbo Liu, Tariq Toukan, Andrew Lunn, Shuah Khan,
linux-kselftest, linux-kernel
On Fri, Dec 13, 2024 at 07:31:27PM -0800, Jakub Kicinski wrote:
> On Fri, 13 Dec 2024 07:18:08 +0000 Hangbin Liu wrote:
> > On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
> > > On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> > > > The first patch fixes the xfrm offload feature during setup active-backup
> > > > mode. The second patch add a ipsec offload testing.
> > >
> > > Looks like the test is too good, is there a fix pending somewhere for
> > > the BUG below? We can't merge the test before that:
> >
> > This should be a regression of 2aeeef906d5a ("bonding: change ipsec_lock from
> > spin lock to mutex"). As in xfrm_state_delete we called spin_lock_bh(&x->lock)
> > for the xfrm state delete.
> >
> > But I'm not sure if it's proper to release the spin lock in bond code.
> > This seems too specific.
> >
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index 7daeab67e7b5..69563bc958ca 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> > real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
> > out:
> > netdev_put(real_dev, &tracker);
> > + spin_unlock_bh(&xs->lock);
> > mutex_lock(&bond->ipsec_lock);
> > list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > if (ipsec->xs == xs) {
> > @@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
> > }
> > }
> > mutex_unlock(&bond->ipsec_lock);
> > + spin_lock_bh(&xs->lock);
> > }
> >
> >
> > What do you think?
>
> Re-locking doesn't look great, glancing at the code I don't see any
> obvious better workarounds. Easiest fix would be to don't let the
> drivers sleep in the callbacks and then we can go back to a spin lock.
> Maybe nvidia people have better ideas, I'm not familiar with this
> offload.
I don't know how to keep bonding from sleeping, since we use mutex_lock() now.
Hi Jianbo, do you have any ideas?
Thanks
Hangbin
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-02 2:44 ` Hangbin Liu
@ 2025-01-02 3:33 ` Jianbo Liu
2025-01-03 11:05 ` Hangbin Liu
2025-01-06 10:47 ` Hangbin Liu
0 siblings, 2 replies; 29+ messages in thread
From: Jianbo Liu @ 2025-01-02 3:33 UTC (permalink / raw)
To: Hangbin Liu, Jakub Kicinski
Cc: netdev, Jay Vosburgh, Andy Gospodarek, David S. Miller,
Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov, Simon Horman,
Tariq Toukan, Andrew Lunn, Shuah Khan, linux-kselftest,
linux-kernel
On 1/2/2025 10:44 AM, Hangbin Liu wrote:
> On Fri, Dec 13, 2024 at 07:31:27PM -0800, Jakub Kicinski wrote:
>> On Fri, 13 Dec 2024 07:18:08 +0000 Hangbin Liu wrote:
>>> On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
>>>> On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
>>>>> The first patch fixes the xfrm offload feature during setup active-backup
>>>>> mode. The second patch add a ipsec offload testing.
>>>>
>>>> Looks like the test is too good, is there a fix pending somewhere for
>>>> the BUG below? We can't merge the test before that:
>>>
>>> This should be a regression of 2aeeef906d5a ("bonding: change ipsec_lock from
>>> spin lock to mutex"). As in xfrm_state_delete we called spin_lock_bh(&x->lock)
>>> for the xfrm state delete.
>>>
>>> But I'm not sure if it's proper to release the spin lock in bond code.
>>> This seems too specific.
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 7daeab67e7b5..69563bc958ca 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>>> real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>>> out:
>>> netdev_put(real_dev, &tracker);
>>> + spin_unlock_bh(&xs->lock);
>>> mutex_lock(&bond->ipsec_lock);
>>> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>>> if (ipsec->xs == xs) {
>>> @@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>>> }
>>> }
>>> mutex_unlock(&bond->ipsec_lock);
>>> + spin_lock_bh(&xs->lock);
>>> }
>>>
>>>
>>> What do you think?
>>
>> Re-locking doesn't look great, glancing at the code I don't see any
>> obvious better workarounds. Easiest fix would be to don't let the
>> drivers sleep in the callbacks and then we can go back to a spin lock.
>> Maybe nvidia people have better ideas, I'm not familiar with this
>> offload.
>
> I don't know how to disable bonding sleeping since we use mutex_lock now.
> Hi Jianbo, do you have any idea?
>
I think we should allow drivers to sleep in the callbacks. So maybe
it's better to move the driver's xdo_dev_state_delete out of the state's spin lock.
Thanks!
Jianbo
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-02 3:33 ` Jianbo Liu
@ 2025-01-03 11:05 ` Hangbin Liu
2025-01-06 10:47 ` Hangbin Liu
1 sibling, 0 replies; 29+ messages in thread
From: Hangbin Liu @ 2025-01-03 11:05 UTC (permalink / raw)
To: Jianbo Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
linux-kselftest, linux-kernel
On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
> > > Re-locking doesn't look great, glancing at the code I don't see any
> > > obvious better workarounds. Easiest fix would be to don't let the
> > > drivers sleep in the callbacks and then we can go back to a spin lock.
> > > Maybe nvidia people have better ideas, I'm not familiar with this
> > > offload.
> >
> > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > Hi Jianbo, do you have any idea?
> >
>
> I think we should allow drivers to sleep in the callbacks. So, maybe it's
> better to move driver's xdo_dev_state_delete out of state's spin lock.
Thanks for the suggestion, let me have a try first.
Hangbin
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-02 3:33 ` Jianbo Liu
2025-01-03 11:05 ` Hangbin Liu
@ 2025-01-06 10:47 ` Hangbin Liu
2025-01-08 2:46 ` Hangbin Liu
1 sibling, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-01-06 10:47 UTC (permalink / raw)
To: Jianbo Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
linux-kselftest, linux-kernel
On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
> > > Re-locking doesn't look great, glancing at the code I don't see any
> > > obvious better workarounds. Easiest fix would be to don't let the
> > > drivers sleep in the callbacks and then we can go back to a spin lock.
> > > Maybe nvidia people have better ideas, I'm not familiar with this
> > > offload.
> >
> > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > Hi Jianbo, do you have any idea?
> >
>
> I think we should allow drivers to sleep in the callbacks. So, maybe it's
> better to move driver's xdo_dev_state_delete out of state's spin lock.
I just checked the code: xfrm_dev_state_delete() and the later
dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
checks. Can we really move it out of the spin lock in xfrm_state_delete()?
Thanks
Hangbin
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-06 10:47 ` Hangbin Liu
@ 2025-01-08 2:46 ` Hangbin Liu
2025-01-08 3:40 ` Jianbo Liu
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-01-08 2:46 UTC (permalink / raw)
To: Jianbo Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
> On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
> > > > Re-locking doesn't look great, glancing at the code I don't see any
> > > > obvious better workarounds. Easiest fix would be to don't let the
> > > > drivers sleep in the callbacks and then we can go back to a spin lock.
> > > > Maybe nvidia people have better ideas, I'm not familiar with this
> > > > offload.
> > >
> > > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > > Hi Jianbo, do you have any idea?
> > >
> >
> > I think we should allow drivers to sleep in the callbacks. So, maybe it's
> > better to move driver's xdo_dev_state_delete out of state's spin lock.
>
> I just check the code, xfrm_dev_state_delete() and later
> dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
> checks. Can we really move it out of spin lock from xfrm_state_delete()
I tried to move the mutex lock code to a work queue, but found that we need to
check (ipsec->xs == xs) in bonding. So we still need the xfrm_state x during bond
ipsec gc.
So either we add a new lock for xfrm_state, or we need to release the spin lock in
bond_ipsec_del_sa().
Cc'ing the IPsec experts to see if they have any comments.
Background: xfrm_dev_state_delete() in xfrm_state_delete() is protected
by a spin lock, but the driver delete op dev->xfrmdev_ops->xdo_dev_state_delete(x)
may sleep, e.g. bond_ipsec_del_sa(). How should we deal with this issue?
Thanks
Hangbin
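To make the (ipsec->xs == xs) point concrete, here is a small user-space
sketch of a lookup-and-remove keyed on the state pointer. The toy_ names
and the singly linked list are assumptions made for illustration; the real
code walks bond->ipsec_list under bond->ipsec_lock. The sketch only shows
that the removal needs the xs pointer as its key, which is why deferring it
to a work queue still needs the state around.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on the bond's ipsec list. */
struct toy_entry {
	const void *xs;			/* key: pointer to the offloaded state */
	struct toy_entry *next;
};

struct toy_bond {
	struct toy_entry *head;		/* caller is assumed to serialize access */
};

/* Track a new state pointer; returns 0 on success, -1 on allocation failure. */
int toy_bond_add(struct toy_bond *b, const void *xs)
{
	struct toy_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->xs = xs;
	e->next = b->head;
	b->head = e;
	return 0;
}

/* Remove the entry whose key equals xs. Only the pointer value is
 * compared; the state is never dereferenced here.
 * Returns 1 if an entry was removed, 0 if none matched.
 */
int toy_bond_del(struct toy_bond *b, const void *xs)
{
	struct toy_entry **pp, *e;

	for (pp = &b->head; (e = *pp) != NULL; pp = &e->next) {
		if (e->xs == xs) {
			*pp = e->next;
			free(e);
			return 1;
		}
	}
	return 0;
}
```

Since only the pointer value is compared, a deferred removal would need
the pointer to stay unique, i.e. the state must not be freed and its
address reused before the removal runs.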
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-08 2:46 ` Hangbin Liu
@ 2025-01-08 3:40 ` Jianbo Liu
2025-01-08 7:14 ` Hangbin Liu
0 siblings, 1 reply; 29+ messages in thread
From: Jianbo Liu @ 2025-01-08 3:40 UTC (permalink / raw)
To: Hangbin Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On 1/8/2025 10:46 AM, Hangbin Liu wrote:
> On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
>> On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
>>>>> Re-locking doesn't look great, glancing at the code I don't see any
>>>>> obvious better workarounds. Easiest fix would be to don't let the
>>>>> drivers sleep in the callbacks and then we can go back to a spin lock.
>>>>> Maybe nvidia people have better ideas, I'm not familiar with this
>>>>> offload.
>>>>
>>>> I don't know how to disable bonding sleeping since we use mutex_lock now.
>>>> Hi Jianbo, do you have any idea?
>>>>
>>>
>>> I think we should allow drivers to sleep in the callbacks. So, maybe it's
>>> better to move driver's xdo_dev_state_delete out of state's spin lock.
>>
>> I just check the code, xfrm_dev_state_delete() and later
>> dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
>> checks. Can we really move it out of spin lock from xfrm_state_delete()
>
> I tried to move the mutex lock code to a work queue, but found we need to
> check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond
Maybe I'm missing something, but why do we need to hold the spin lock? You can
keep the xfrm state alive through its refcnt.
> ipsec gc.
>
> So either we add a new lock for xfrm_state, or we need to unlock spin lock in
> bonding bond_ipsec_del_sa().
>
> Cc IPsec experts to see if they have any comments.
>
> Background: The xfrm_dev_state_delete() in xfrm_state_delete() is protected
> by spin lock. But the driver delete ops dev->xfrmdev_ops->xdo_dev_state_delete(x)
> may sleep, e.g. bond_ipsec_del_sa(). What we should deal with this issue?
>
> Thanks
> Hangbin
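The refcnt argument above can be sketched in user space as follows. This
is only an illustration under stated assumptions: toy_hold() and toy_put()
are hypothetical stand-ins for xfrm_state_hold() and xfrm_state_put(), and
the freed flag replaces the real destructor.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted xfrm_state. */
struct toy_state {
	atomic_int refcnt;
	bool freed;		/* set when the last reference is dropped */
};

void toy_state_init(struct toy_state *s)
{
	atomic_init(&s->refcnt, 1);	/* the allocation reference */
	s->freed = false;
}

void toy_hold(struct toy_state *s)
{
	atomic_fetch_add(&s->refcnt, 1);
}

void toy_put(struct toy_state *s)
{
	/* atomic_fetch_sub() returns the value before the decrement */
	if (atomic_fetch_sub(&s->refcnt, 1) == 1)
		s->freed = true;
}

/* Delete path with no lock held around the driver callback.
 * Returns true if the state was still alive when the callback ran.
 */
bool toy_delete_with_ref(struct toy_state *s)
{
	bool alive_in_callback;

	toy_hold(s);			/* caller's own reference */
	toy_put(s);			/* drop the allocation reference */
	alive_in_callback = !s->freed;	/* a sleeping callback would run here */
	toy_put(s);			/* caller's reference; may free */
	return alive_in_callback;
}
```

The reference guarantees that the object is not freed during the callback.
It does not serialize concurrent changes to the object's fields, which is a
separate question from lifetime.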
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-08 3:40 ` Jianbo Liu
@ 2025-01-08 7:14 ` Hangbin Liu
2025-01-09 1:26 ` Jianbo Liu
2025-01-15 9:19 ` Hangbin Liu
0 siblings, 2 replies; 29+ messages in thread
From: Hangbin Liu @ 2025-01-08 7:14 UTC (permalink / raw)
To: Jianbo Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On Wed, Jan 08, 2025 at 11:40:05AM +0800, Jianbo Liu wrote:
>
>
> On 1/8/2025 10:46 AM, Hangbin Liu wrote:
> > On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
> > > On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
> > > > > > Re-locking doesn't look great, glancing at the code I don't see any
> > > > > > obvious better workarounds. Easiest fix would be to don't let the
> > > > > > drivers sleep in the callbacks and then we can go back to a spin lock.
> > > > > > Maybe nvidia people have better ideas, I'm not familiar with this
> > > > > > offload.
> > > > >
> > > > > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > > > > Hi Jianbo, do you have any idea?
> > > > >
> > > >
> > > > I think we should allow drivers to sleep in the callbacks. So, maybe it's
> > > > better to move driver's xdo_dev_state_delete out of state's spin lock.
> > >
> > > I just check the code, xfrm_dev_state_delete() and later
> > > dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
> > > checks. Can we really move it out of spin lock from xfrm_state_delete()
> >
> > I tried to move the mutex lock code to a work queue, but found we need to
> > check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond
>
> Maybe I miss something, but why need to hold spin lock. You can keep xfrm
> state by its refcnt.
Do you mean moving xfrm_dev_state_delete() out of the spin lock directly, like:
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 67ca7ac955a3..6881ddeb4360 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -766,13 +766,6 @@ int __xfrm_state_delete(struct xfrm_state *x)
 		if (x->encap_sk)
 			sock_put(rcu_dereference_raw(x->encap_sk));

-		xfrm_dev_state_delete(x);
-
-		/* All xfrm_state objects are created by xfrm_state_alloc.
-		 * The xfrm_state_alloc call gives a reference, and that
-		 * is what we are dropping here.
-		 */
-		xfrm_state_put(x);
 		err = 0;
 	}

@@ -787,8 +780,20 @@ int xfrm_state_delete(struct xfrm_state *x)
 	spin_lock_bh(&x->lock);
 	err = __xfrm_state_delete(x);
 	spin_unlock_bh(&x->lock);
+	if (err)
+		return err;

-	return err;
+	if (x->km.state == XFRM_STATE_DEAD) {
+		xfrm_dev_state_delete(x);
+
+		/* All xfrm_state objects are created by xfrm_state_alloc.
+		 * The xfrm_state_alloc call gives a reference, and that
+		 * is what we are dropping here.
+		 */
+		xfrm_state_put(x);
+	}
+
+	return 0;
 }
 EXPORT_SYMBOL(xfrm_state_delete);
Then why do we need the spin lock in xfrm_state_delete()?
Hangbin
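The shape of the proposal in this message (mark the state dead under the
lock, run the driver callback only after the lock is dropped) can be
modelled in user space as below. All toy_ names are hypothetical, and the
locked flag merely stands in for x->lock; the sketch assumes, as in
__xfrm_state_delete(), that deleting an already dead state returns -ESRCH.

```c
#include <errno.h>
#include <stdbool.h>

enum toy_km_state { TOY_VALID, TOY_DEAD };

/* Hypothetical stand-in for struct xfrm_state. */
struct toy_xs {
	enum toy_km_state state;
	bool locked;		/* models x->lock being held */
	bool cb_ran_locked;	/* set if the callback ran under the lock */
	int cb_calls;		/* how often the driver callback ran */
};

/* Models the driver delete op, which may sleep. */
static void toy_driver_delete(struct toy_xs *x)
{
	x->cb_calls++;
	if (x->locked)
		x->cb_ran_locked = true;
}

/* Models the locked part: state bookkeeping only, no driver call. */
static int toy_mark_dead(struct toy_xs *x)
{
	if (x->state == TOY_DEAD)
		return -ESRCH;
	x->state = TOY_DEAD;
	return 0;
}

/* Models the reworked delete: callback only after the lock is dropped. */
int toy_state_delete(struct toy_xs *x)
{
	int err;

	x->locked = true;		/* spin_lock_bh() in the real code */
	err = toy_mark_dead(x);
	x->locked = false;		/* spin_unlock_bh() */
	if (err)
		return err;

	toy_driver_delete(x);		/* safe to sleep here */
	return 0;
}
```

A second delete of the same state fails before reaching the callback, so
in this model the driver op runs at most once per state even though it now
sits outside the locked region.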
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-08 7:14 ` Hangbin Liu
@ 2025-01-09 1:26 ` Jianbo Liu
2025-01-09 8:37 ` Hangbin Liu
2025-01-15 9:19 ` Hangbin Liu
1 sibling, 1 reply; 29+ messages in thread
From: Jianbo Liu @ 2025-01-09 1:26 UTC (permalink / raw)
To: Hangbin Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On 1/8/2025 3:14 PM, Hangbin Liu wrote:
> On Wed, Jan 08, 2025 at 11:40:05AM +0800, Jianbo Liu wrote:
>>
>>
>> On 1/8/2025 10:46 AM, Hangbin Liu wrote:
>>> On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
>>>> On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
>>>>>>> Re-locking doesn't look great, glancing at the code I don't see any
>>>>>>> obvious better workarounds. Easiest fix would be to don't let the
>>>>>>> drivers sleep in the callbacks and then we can go back to a spin lock.
>>>>>>> Maybe nvidia people have better ideas, I'm not familiar with this
>>>>>>> offload.
>>>>>>
>>>>>> I don't know how to disable bonding sleeping since we use mutex_lock now.
>>>>>> Hi Jianbo, do you have any idea?
>>>>>>
>>>>>
>>>>> I think we should allow drivers to sleep in the callbacks. So, maybe it's
>>>>> better to move driver's xdo_dev_state_delete out of state's spin lock.
>>>>
>>>> I just check the code, xfrm_dev_state_delete() and later
>>>> dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
>>>> checks. Can we really move it out of spin lock from xfrm_state_delete()
>>>
>>> I tried to move the mutex lock code to a work queue, but found we need to
>>> check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond
>>
>> Maybe I miss something, but why need to hold spin lock. You can keep xfrm
>> state by its refcnt.
>
> Do you mean move the xfrm_dev_state_delete() out of spin lock directly like:
>
Yes. Not feasible?
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 67ca7ac955a3..6881ddeb4360 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -766,13 +766,6 @@ int __xfrm_state_delete(struct xfrm_state *x)
> if (x->encap_sk)
> sock_put(rcu_dereference_raw(x->encap_sk));
>
> - xfrm_dev_state_delete(x);
> -
> - /* All xfrm_state objects are created by xfrm_state_alloc.
> - * The xfrm_state_alloc call gives a reference, and that
> - * is what we are dropping here.
> - */
> - xfrm_state_put(x);
> err = 0;
> }
>
> @@ -787,8 +780,20 @@ int xfrm_state_delete(struct xfrm_state *x)
> spin_lock_bh(&x->lock);
> err = __xfrm_state_delete(x);
> spin_unlock_bh(&x->lock);
> + if (err)
> + return err;
>
> - return err;
> + if (x->km.state == XFRM_STATE_DEAD) {
> + xfrm_dev_state_delete(x);
> +
> + /* All xfrm_state objects are created by xfrm_state_alloc.
> + * The xfrm_state_alloc call gives a reference, and that
> + * is what we are dropping here.
> + */
> + xfrm_state_put(x);
> + }
> +
> + return 0;
> }
> EXPORT_SYMBOL(xfrm_state_delete);
>
>
> Then why we need the spin lock in xfrm_state_delete?
>
No, we don't need it. But I am trying to understand what you said in your
last email about adding a new lock, or unlocking the spin lock in
bond_ipsec_del_sa(). Did I miss anything?
Thanks!
Jianbo
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-09 1:26 ` Jianbo Liu
@ 2025-01-09 8:37 ` Hangbin Liu
2025-01-09 9:51 ` Jianbo Liu
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-01-09 8:37 UTC (permalink / raw)
To: Jianbo Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On Thu, Jan 09, 2025 at 09:26:38AM +0800, Jianbo Liu wrote:
>
>
> On 1/8/2025 3:14 PM, Hangbin Liu wrote:
> > On Wed, Jan 08, 2025 at 11:40:05AM +0800, Jianbo Liu wrote:
> > >
> > >
> > > On 1/8/2025 10:46 AM, Hangbin Liu wrote:
> > > > On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
> > > > > On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
> > > > > > > > Re-locking doesn't look great, glancing at the code I don't see any
> > > > > > > > obvious better workarounds. Easiest fix would be to don't let the
> > > > > > > > drivers sleep in the callbacks and then we can go back to a spin lock.
> > > > > > > > Maybe nvidia people have better ideas, I'm not familiar with this
> > > > > > > > offload.
> > > > > > >
> > > > > > > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > > > > > > Hi Jianbo, do you have any idea?
> > > > > > >
> > > > > >
> > > > > > I think we should allow drivers to sleep in the callbacks. So, maybe it's
> > > > > > better to move driver's xdo_dev_state_delete out of state's spin lock.
> > > > >
> > > > > I just check the code, xfrm_dev_state_delete() and later
> > > > > dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
> > > > > checks. Can we really move it out of spin lock from xfrm_state_delete()
> > > >
> > > > I tried to move the mutex lock code to a work queue, but found we need to
> > > > check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond
> > >
> > > Maybe I miss something, but why need to hold spin lock. You can keep xfrm
> > > state by its refcnt.
> >
> > Do you mean move the xfrm_dev_state_delete() out of spin lock directly like:
> >
>
> Yes. Not feasible?
>
> > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > index 67ca7ac955a3..6881ddeb4360 100644
> > --- a/net/xfrm/xfrm_state.c
> > +++ b/net/xfrm/xfrm_state.c
> > @@ -766,13 +766,6 @@ int __xfrm_state_delete(struct xfrm_state *x)
> > if (x->encap_sk)
> > sock_put(rcu_dereference_raw(x->encap_sk));
> > - xfrm_dev_state_delete(x);
> > -
> > - /* All xfrm_state objects are created by xfrm_state_alloc.
> > - * The xfrm_state_alloc call gives a reference, and that
> > - * is what we are dropping here.
> > - */
> > - xfrm_state_put(x);
> > err = 0;
> > }
> > @@ -787,8 +780,20 @@ int xfrm_state_delete(struct xfrm_state *x)
> > spin_lock_bh(&x->lock);
> > err = __xfrm_state_delete(x);
> > spin_unlock_bh(&x->lock);
> > + if (err)
> > + return err;
> > - return err;
> > + if (x->km.state == XFRM_STATE_DEAD) {
> > + xfrm_dev_state_delete(x);
> > +
> > + /* All xfrm_state objects are created by xfrm_state_alloc.
> > + * The xfrm_state_alloc call gives a reference, and that
> > + * is what we are dropping here.
> > + */
> > + xfrm_state_put(x);
> > + }
> > +
> > + return 0;
> > }
> > EXPORT_SYMBOL(xfrm_state_delete);
> >
> > Then why we need the spin lock in xfrm_state_delete?
> >
>
> No, we don't need. But I am trying to understand what you said in your last
> email about adding a new lock, or unlocking spin lock in
I *thought* we needed the spin lock in xfrm_state_delete(). So to protect the
xfrm_state, we would need a new lock, although it looks redundant. e.g.
int xfrm_state_delete(struct xfrm_state *x)
{
	int err;

	spin_lock_bh(&x->lock);
	err = __xfrm_state_delete(x);
	spin_unlock_bh(&x->lock);
	if (err)
		return err;

	another_lock(&x->other_lock);
	if (x->km.state == XFRM_STATE_DEAD) {
		xfrm_dev_state_delete(x);
		xfrm_state_put(x);
	}
	another_unlock(&x->other_lock);

	return 0;
}
> bond_ipsec_del_sa(). Anything I missed?
Unlocking the spin lock in bond_ipsec_del_sa() looks like this:
https://lore.kernel.org/netdev/Z1vfsAyuxcohT7th@fedora/
Thanks
Hangbin
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-09 8:37 ` Hangbin Liu
@ 2025-01-09 9:51 ` Jianbo Liu
2025-01-09 10:17 ` Hangbin Liu
0 siblings, 1 reply; 29+ messages in thread
From: Jianbo Liu @ 2025-01-09 9:51 UTC (permalink / raw)
To: Hangbin Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On 1/9/2025 4:37 PM, Hangbin Liu wrote:
> On Thu, Jan 09, 2025 at 09:26:38AM +0800, Jianbo Liu wrote:
>>
>>
>> On 1/8/2025 3:14 PM, Hangbin Liu wrote:
>>> On Wed, Jan 08, 2025 at 11:40:05AM +0800, Jianbo Liu wrote:
>>>>
>>>>
>>>> On 1/8/2025 10:46 AM, Hangbin Liu wrote:
>>>>> On Mon, Jan 06, 2025 at 10:47:16AM +0000, Hangbin Liu wrote:
>>>>>> On Thu, Jan 02, 2025 at 11:33:34AM +0800, Jianbo Liu wrote:
>>>>>>>>> Re-locking doesn't look great, glancing at the code I don't see any
>>>>>>>>> obvious better workarounds. Easiest fix would be to don't let the
>>>>>>>>> drivers sleep in the callbacks and then we can go back to a spin lock.
>>>>>>>>> Maybe nvidia people have better ideas, I'm not familiar with this
>>>>>>>>> offload.
>>>>>>>>
>>>>>>>> I don't know how to disable bonding sleeping since we use mutex_lock now.
>>>>>>>> Hi Jianbo, do you have any idea?
>>>>>>>>
>>>>>>>
>>>>>>> I think we should allow drivers to sleep in the callbacks. So, maybe it's
>>>>>>> better to move driver's xdo_dev_state_delete out of state's spin lock.
>>>>>>
>>>>>> I just check the code, xfrm_dev_state_delete() and later
>>>>>> dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
>>>>>> checks. Can we really move it out of spin lock from xfrm_state_delete()
>>>>>
>>>>> I tried to move the mutex lock code to a work queue, but found we need to
>>>>> check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond
>>>>
>>>> Maybe I miss something, but why need to hold spin lock. You can keep xfrm
>>>> state by its refcnt.
>>>
>>> Do you mean move the xfrm_dev_state_delete() out of spin lock directly like:
>>>
>>
>> Yes. Not feasible?
>>
>>> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
>>> index 67ca7ac955a3..6881ddeb4360 100644
>>> --- a/net/xfrm/xfrm_state.c
>>> +++ b/net/xfrm/xfrm_state.c
>>> @@ -766,13 +766,6 @@ int __xfrm_state_delete(struct xfrm_state *x)
>>> if (x->encap_sk)
>>> sock_put(rcu_dereference_raw(x->encap_sk));
>>> - xfrm_dev_state_delete(x);
>>> -
>>> - /* All xfrm_state objects are created by xfrm_state_alloc.
>>> - * The xfrm_state_alloc call gives a reference, and that
>>> - * is what we are dropping here.
>>> - */
>>> - xfrm_state_put(x);
>>> err = 0;
>>> }
>>> @@ -787,8 +780,20 @@ int xfrm_state_delete(struct xfrm_state *x)
>>> spin_lock_bh(&x->lock);
>>> err = __xfrm_state_delete(x);
>>> spin_unlock_bh(&x->lock);
>>> + if (err)
>>> + return err;
>>> - return err;
>>> + if (x->km.state == XFRM_STATE_DEAD) {
>>> + xfrm_dev_state_delete(x);
>>> +
>>> + /* All xfrm_state objects are created by xfrm_state_alloc.
>>> + * The xfrm_state_alloc call gives a reference, and that
>>> + * is what we are dropping here.
>>> + */
>>> + xfrm_state_put(x);
>>> + }
>>> +
>>> + return 0;
>>> }
>>> EXPORT_SYMBOL(xfrm_state_delete);
>>>
>>> Then why we need the spin lock in xfrm_state_delete?
>>>
>>
>> No, we don't need. But I am trying to understand what you said in your last
>> email about adding a new lock, or unlocking spin lock in
>
> I *thought* we need the spin lock in xfrm_state_delete(). So to protect xfrm_state,
But it's not needed in bond_ipsec_del_sa(), because the state is still held via
xfrm_state_hold(), right?
> we need a new lock. Although it looks redundant. e.g.
>
> int xfrm_state_delete(struct xfrm_state *x)
> {
> int err;
>
> spin_lock_bh(&x->lock);
> err = __xfrm_state_delete(x);
> spin_unlock_bh(&x->lock);
> if (err)
> return err;
>
> another_lock(&x->other_lock)
> if (x->km.state == XFRM_STATE_DEAD) {
> xfrm_dev_state_delete(x);
> xfrm_state_put(x);
> }
> another_unlock(&x->other_lock)
>
> return 0;
> }
>> bond_ipsec_del_sa(). Anything I missed?
>
> The unlock spin lock in bond_ipsec_del_sa looks like
> https://lore.kernel.org/netdev/Z1vfsAyuxcohT7th@fedora/
>
> Thanks
> Hangbin
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-09 9:51 ` Jianbo Liu
@ 2025-01-09 10:17 ` Hangbin Liu
2025-01-09 12:21 ` Jianbo Liu
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-01-09 10:17 UTC (permalink / raw)
To: Jianbo Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On Thu, Jan 09, 2025 at 05:51:07PM +0800, Jianbo Liu wrote:
> > > No, we don't need. But I am trying to understand what you said in your last
> > > email about adding a new lock, or unlocking spin lock in
> >
> > I *thought* we need the spin lock in xfrm_state_delete(). So to protect xfrm_state,
>
> But not need in bond_ipsec_del_sa() because the state still hold by
> xfrm_state_hold(), right?
Hmm, I'm not sure. If xfrm_state_hold() is safe, why not just remove the spin
lock in xfrm_state_delete()? That is more straightforward, e.g.:
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 67ca7ac955a3..150562abf513 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -784,9 +784,7 @@ int xfrm_state_delete(struct xfrm_state *x)
 {
 	int err;
 
-	spin_lock_bh(&x->lock);
 	err = __xfrm_state_delete(x);
-	spin_unlock_bh(&x->lock);
 
 	return err;
 }
We could even rename __xfrm_state_delete() to xfrm_state_delete() directly.
Thanks
Hangbin
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-09 10:17 ` Hangbin Liu
@ 2025-01-09 12:21 ` Jianbo Liu
0 siblings, 0 replies; 29+ messages in thread
From: Jianbo Liu @ 2025-01-09 12:21 UTC (permalink / raw)
To: Hangbin Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan,
Steffen Klassert, Herbert Xu, Sabrina Dubroca, linux-kselftest,
linux-kernel
On 1/9/2025 6:17 PM, Hangbin Liu wrote:
> On Thu, Jan 09, 2025 at 05:51:07PM +0800, Jianbo Liu wrote:
>>>> No, we don't need. But I am trying to understand what you said in your last
>>>> email about adding a new lock, or unlocking spin lock in
>>>
>>> I *thought* we need the spin lock in xfrm_state_delete(). So to protect xfrm_state,
>>
>> But not need in bond_ipsec_del_sa() because the state still hold by
>> xfrm_state_hold(), right?
>
> Hmm, I'm not sure. If xfrm_state_hold() is safe. Why not just remove the spin
> lock in xfrm_state_delete(). This is more straightforward. e.g.
>
We can't remove the spin lock in xfrm_state_delete(), but I think we can
access the state while holding a reference to it, for example to check
(ipsec->xs == xs) as you mentioned before, because the memory is not freed yet.
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 67ca7ac955a3..150562abf513 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -784,9 +784,7 @@ int xfrm_state_delete(struct xfrm_state *x)
> {
> int err;
>
> - spin_lock_bh(&x->lock);
> err = __xfrm_state_delete(x);
> - spin_unlock_bh(&x->lock);
>
> return err;
> }
>
> We could even rename __xfrm_state_delete() to xfrm_state_delete() directly.
>
> Thanks
> Hangbin
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-08 7:14 ` Hangbin Liu
2025-01-09 1:26 ` Jianbo Liu
@ 2025-01-15 9:19 ` Hangbin Liu
2025-01-17 7:54 ` Steffen Klassert
1 sibling, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-01-15 9:19 UTC (permalink / raw)
To: Steffen Klassert, Jianbo Liu
Cc: Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan, Herbert Xu,
Sabrina Dubroca, linux-kselftest, linux-kernel
On Wed, Jan 08, 2025 at 07:15:00AM +0000, Hangbin Liu wrote:
> > > > > > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > > > > > Hi Jianbo, do you have any idea?
> > > > > >
> > > > >
> > > > > I think we should allow drivers to sleep in the callbacks. So, maybe it's
> > > > > better to move driver's xdo_dev_state_delete out of state's spin lock.
> > > >
> > > > I just check the code, xfrm_dev_state_delete() and later
> > > > dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
> > > > checks. Can we really move it out of spin lock from xfrm_state_delete()
> > >
> > > I tried to move the mutex lock code to a work queue, but found we need to
> > > check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond
> >
> > Maybe I miss something, but why need to hold spin lock. You can keep xfrm
> > state by its refcnt.
>
> Do you mean move the xfrm_dev_state_delete() out of spin lock directly like:
>
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 67ca7ac955a3..6881ddeb4360 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -766,13 +766,6 @@ int __xfrm_state_delete(struct xfrm_state *x)
> if (x->encap_sk)
> sock_put(rcu_dereference_raw(x->encap_sk));
>
> - xfrm_dev_state_delete(x);
> -
> - /* All xfrm_state objects are created by xfrm_state_alloc.
> - * The xfrm_state_alloc call gives a reference, and that
> - * is what we are dropping here.
> - */
> - xfrm_state_put(x);
> err = 0;
> }
>
> @@ -787,8 +780,20 @@ int xfrm_state_delete(struct xfrm_state *x)
> spin_lock_bh(&x->lock);
> err = __xfrm_state_delete(x);
> spin_unlock_bh(&x->lock);
> + if (err)
> + return err;
>
> - return err;
> + if (x->km.state == XFRM_STATE_DEAD) {
> + xfrm_dev_state_delete(x);
> +
> + /* All xfrm_state objects are created by xfrm_state_alloc.
> + * The xfrm_state_alloc call gives a reference, and that
> + * is what we are dropping here.
> + */
> + xfrm_state_put(x);
> + }
> +
> + return 0;
> }
> EXPORT_SYMBOL(xfrm_state_delete);
>
Hi Jianbo,
I talked with Sabrina, and it looks like we can't simply do this, because both
xfrm_add_sa_expire() and xfrm_timer_handler() call __xfrm_state_delete() under the
spin lock. If we move xfrm_dev_state_delete() out of __xfrm_state_delete(),
all of those call sites need to be handled correctly.
Also, xfrm_timer_handler() calls xfrm_dev_state_update_stats() before
__xfrm_state_delete(). Should we take care of that as well, to make sure the state
change and the delete happen together?
Hi Steffen, do you have any comments?
Thanks
Hangbin
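
The call-site concern in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name below (xs_model, state_delete_locked_model, and so on) is hypothetical and stands in for the kernel functions discussed in the thread, the lock is modelled as a plain flag, and -1 stands in for the kernel's error return. It shows that once the device callback is split out of the locked part, any caller that invokes only the locked part skips the callback unless it is changed too.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum state_model { STATE_VALID, STATE_DEAD };

struct xs_model {
	enum state_model state;
	int lock_held;   /* models x->lock being held */
	int dev_deleted; /* set once the device callback has run */
};

/* Models a driver callback that may sleep, so it must not run under the lock. */
static void dev_state_delete_model(struct xs_model *x)
{
	assert(!x->lock_held);
	x->dev_deleted = 1;
}

/*
 * Phase 1: only the state change. The caller must hold the lock.
 * Returns 0 on success, -1 if the state was already dead.
 */
static int state_delete_locked_model(struct xs_model *x)
{
	assert(x->lock_held);
	if (x->state == STATE_DEAD)
		return -1;
	x->state = STATE_DEAD;
	return 0;
}

/* Phase 1 under the lock, then phase 2 (device callback) after unlocking. */
static int state_delete_model(struct xs_model *x)
{
	int err;

	x->lock_held = 1;
	err = state_delete_locked_model(x);
	x->lock_held = 0;
	if (err)
		return err;

	dev_state_delete_model(x);
	return 0;
}
```

A caller that goes through state_delete_model() gets both phases. A caller that takes the lock itself and calls only state_delete_locked_model(), as the timer and expire paths are described as doing with __xfrm_state_delete(), ends with dev_deleted still 0, which is the gap every such call site would have to close.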
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-15 9:19 ` Hangbin Liu
@ 2025-01-17 7:54 ` Steffen Klassert
2025-01-20 16:16 ` Cosmin Ratiu
0 siblings, 1 reply; 29+ messages in thread
From: Steffen Klassert @ 2025-01-17 7:54 UTC (permalink / raw)
To: Hangbin Liu
Cc: Jianbo Liu, Jakub Kicinski, netdev, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Paolo Abeni, Nikolay Aleksandrov,
Simon Horman, Tariq Toukan, Andrew Lunn, Shuah Khan, Herbert Xu,
Sabrina Dubroca, linux-kselftest, linux-kernel
On Wed, Jan 15, 2025 at 09:19:33AM +0000, Hangbin Liu wrote:
> On Wed, Jan 08, 2025 at 07:15:00AM +0000, Hangbin Liu wrote:
> > > > > > > I don't know how to disable bonding sleeping since we use mutex_lock now.
> > > > > > > Hi Jianbo, do you have any idea?
> > > > > > >
> > > > > >
> > > > > > I think we should allow drivers to sleep in the callbacks. So, maybe it's
> > > > > > better to move driver's xdo_dev_state_delete out of state's spin lock.
> > > > >
> > > > > I just check the code, xfrm_dev_state_delete() and later
> > > > > dev->xfrmdev_ops->xdo_dev_state_delete(x) have too many xfrm_state x
> > > > > checks. Can we really move it out of spin lock from xfrm_state_delete()
> > > >
> > > > I tried to move the mutex lock code to a work queue, but found we need to
> > > > check (ipsec->xs == xs) in bonding. So we still need xfrm_state x during bond
> > >
> > > Maybe I miss something, but why need to hold spin lock. You can keep xfrm
> > > state by its refcnt.
> >
> > Do you mean move the xfrm_dev_state_delete() out of spin lock directly like:
> >
> > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> > index 67ca7ac955a3..6881ddeb4360 100644
> > --- a/net/xfrm/xfrm_state.c
> > +++ b/net/xfrm/xfrm_state.c
> > @@ -766,13 +766,6 @@ int __xfrm_state_delete(struct xfrm_state *x)
> > if (x->encap_sk)
> > sock_put(rcu_dereference_raw(x->encap_sk));
> >
> > - xfrm_dev_state_delete(x);
> > -
> > - /* All xfrm_state objects are created by xfrm_state_alloc.
> > - * The xfrm_state_alloc call gives a reference, and that
> > - * is what we are dropping here.
> > - */
> > - xfrm_state_put(x);
> > err = 0;
> > }
> >
> > @@ -787,8 +780,20 @@ int xfrm_state_delete(struct xfrm_state *x)
> > spin_lock_bh(&x->lock);
> > err = __xfrm_state_delete(x);
> > spin_unlock_bh(&x->lock);
> > + if (err)
> > + return err;
> >
> > - return err;
> > + if (x->km.state == XFRM_STATE_DEAD) {
> > + xfrm_dev_state_delete(x);
> > +
> > + /* All xfrm_state objects are created by xfrm_state_alloc.
> > + * The xfrm_state_alloc call gives a reference, and that
> > + * is what we are dropping here.
> > + */
> > + xfrm_state_put(x);
> > + }
> > +
> > + return 0;
> > }
> > EXPORT_SYMBOL(xfrm_state_delete);
> >
>
> Hi Jianbo,
>
> I talked with Sabrina and it looks we can't simply do this. Because both
> xfrm_add_sa_expire() and xfrm_timer_handler() calling __xfrm_state_delete() under
> spin lock. If we move the xfrm_dev_state_delete() out of __xfrm_state_delete(),
> all the places need to be handled correctly.
>
> At the same time xfrm_timer_handler() calling xfrm_dev_state_update_stats before
> __xfrm_state_delete(). Should we also take care of it to make sure the state
> change and delete are called at the same time?
>
> Hi Steffen, do you have any comments?
Can't you just fix this in bonding? xfrm_timer_handler() can't sleep
anyway; even if you remove the spinlock, it is a timer function.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-17 7:54 ` Steffen Klassert
@ 2025-01-20 16:16 ` Cosmin Ratiu
2025-01-20 23:59 ` Hangbin Liu
0 siblings, 1 reply; 29+ messages in thread
From: Cosmin Ratiu @ 2025-01-20 16:16 UTC (permalink / raw)
To: steffen.klassert@secunet.com, liuhangbin@gmail.com
Cc: shuah@kernel.org, andrew+netdev@lunn.ch, davem@davemloft.net,
jv@jvosburgh.net, sd@queasysnail.net, andy@greyhouse.net,
linux-kernel@vger.kernel.org, pabeni@redhat.com,
edumazet@google.com, razor@blackwall.org, Jianbo Liu,
horms@kernel.org, kuba@kernel.org, Tariq Toukan,
herbert@gondor.apana.org.au, netdev@vger.kernel.org,
linux-kselftest@vger.kernel.org
On Fri, 2025-01-17 at 08:54 +0100, Steffen Klassert wrote:
> >
> > Hi Jianbo,
> >
> > I talked with Sabrina and it looks we can't simply do this. Because
> > both
> > xfrm_add_sa_expire() and xfrm_timer_handler() calling
> > __xfrm_state_delete() under
> > spin lock. If we move the xfrm_dev_state_delete() out of
> > __xfrm_state_delete(),
> > all the places need to be handled correctly.
> >
> > At the same time xfrm_timer_handler() calling
> > xfrm_dev_state_update_stats before
> > __xfrm_state_delete(). Should we also take care of it to make sure
> > the state
> > change and delete are called at the same time?
> >
> > Hi Steffen, do you have any comments?
>
> Can't you just fix this in bonding? xfrm_timer_handler() can't sleep
> anyway, even if you remove the spinlock, it is a timer function.
>
I am not sure this can be fixed in bonding given that the
xdo_dev_state_delete op could, in the general case, sleep while talking
to the hardware. I don't think it's reasonable to expect devices to
offload xfrm while the kernel holds a spinlock.
Bonding just exposed this assumption mismatch because of the mutex that
was added to replace a spinlock which exhibited the same problem we are
talking about here.
Do the dev offload operations need to be synchronous? Couldn't
__xfrm_state_delete instead schedule a wq to do the dev offload? I saw
there's already an xfrm_state_gc_task that's invoked to call
xfrm_dev_state_free, perhaps that could be used to do the delete as
well?
Cosmin.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-20 16:16 ` Cosmin Ratiu
@ 2025-01-20 23:59 ` Hangbin Liu
2025-02-20 10:48 ` Cosmin Ratiu
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-01-20 23:59 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: steffen.klassert@secunet.com, shuah@kernel.org,
andrew+netdev@lunn.ch, davem@davemloft.net, jv@jvosburgh.net,
sd@queasysnail.net, andy@greyhouse.net,
linux-kernel@vger.kernel.org, pabeni@redhat.com,
edumazet@google.com, razor@blackwall.org, Jianbo Liu,
horms@kernel.org, kuba@kernel.org, Tariq Toukan,
herbert@gondor.apana.org.au, netdev@vger.kernel.org,
linux-kselftest@vger.kernel.org
On Mon, Jan 20, 2025 at 04:16:49PM +0000, Cosmin Ratiu wrote:
> On Fri, 2025-01-17 at 08:54 +0100, Steffen Klassert wrote:
> > >
> > > Hi Jianbo,
> > >
> > > I talked with Sabrina and it looks we can't simply do this. Because
> > > both
> > > xfrm_add_sa_expire() and xfrm_timer_handler() calling
> > > __xfrm_state_delete() under
> > > spin lock. If we move the xfrm_dev_state_delete() out of
> > > __xfrm_state_delete(),
> > > all the places need to be handled correctly.
> > >
> > > At the same time xfrm_timer_handler() calling
> > > xfrm_dev_state_update_stats before
> > > __xfrm_state_delete(). Should we also take care of it to make sure
> > > the state
> > > change and delete are called at the same time?
> > >
> > > Hi Steffen, do you have any comments?
> >
> > Can't you just fix this in bonding? xfrm_timer_handler() can't sleep
> > anyway, even if you remove the spinlock, it is a timer function.
> >
>
> I am not sure this can be fixed in bonding given that the
> xdo_dev_state_delete op could, in the general case, sleep while talking
> to the hardware. I don't think it's reasonable to expect devices to
> offload xfrm while the kernel holds a spinlock.
> Bonding just exposed this assumption mismatch because of the mutex that
> was added to replace a spinlock which exhibited the same problem we are
> talking about here.
>
> Do the dev offload operations need to be synchronous? Couldn't
> __xfrm_state_delete instead schedule a wq to do the dev offload? I saw
> there's already an xfrm_state_gc_task that's invoked to call
> xfrm_dev_state_free, perhaps that could be used to do the delete as
> well?
Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa() to a wq
in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/, i.e. to move the
following part out of the spin lock via a wq:
	mutex_lock(&bond->ipsec_lock);
	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
		if (ipsec->xs == xs) {
			list_del(&ipsec->list);
			kfree(ipsec);
			break;
		}
	}
	mutex_unlock(&bond->ipsec_lock);
But note the (ipsec->xs == xs) check there: we still need to make sure the
xs is not released. Can we take an xs reference in bond_ipsec_del_sa()
to achieve this?
Thanks
Hangbin
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-01-20 23:59 ` Hangbin Liu
@ 2025-02-20 10:48 ` Cosmin Ratiu
2025-02-20 11:18 ` Hangbin Liu
0 siblings, 1 reply; 29+ messages in thread
From: Cosmin Ratiu @ 2025-02-20 10:48 UTC (permalink / raw)
To: liuhangbin@gmail.com
Cc: shuah@kernel.org, andrew+netdev@lunn.ch, davem@davemloft.net,
jv@jvosburgh.net, herbert@gondor.apana.org.au, andy@greyhouse.net,
linux-kernel@vger.kernel.org, pabeni@redhat.com,
edumazet@google.com, sd@queasysnail.net, Jianbo Liu,
horms@kernel.org, kuba@kernel.org, Tariq Toukan,
razor@blackwall.org, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Mon, 2025-01-20 at 23:59 +0000, Hangbin Liu wrote:
> >
> > I am not sure this can be fixed in bonding given that the
> > xdo_dev_state_delete op could, in the general case, sleep while
> > talking
> > to the hardware. I don't think it's reasonable to expect devices to
> > offload xfrm while the kernel holds a spinlock.
> > Bonding just exposed this assumption mismatch because of the mutex
> > that
> > was added to replace a spinlock which exhibited the same problem we
> > are
> > talking about here.
> >
> > Do the dev offload operations need to be synchronous? Couldn't
> > __xfrm_state_delete instead schedule a wq to do the dev offload? I
> > saw
> > there's already an xfrm_state_gc_task that's invoked to call
> > xfrm_dev_state_free, perhaps that could be used to do the delete as
> > well?
>
> Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa()
> to a wq
> in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/. i.e. move
> the
> following part out of spin lock via wq.
>
> mutex_lock(&bond->ipsec_lock);
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> if (ipsec->xs == xs) {
> list_del(&ipsec->list);
> kfree(ipsec);
> break;
> }
> }
> mutex_unlock(&bond->ipsec_lock);
>
> But we can see there is an (ipsec->xs == xs). So we still need to
> make
> sure the xs is not released. Can we add a xs reference in
> bond_ipsec_del_sa()
> to achieve this?
Hello,
After staring at the issue a while longer, I am also converging on just
moving that mutex part from bond_ipsec_del_sa out to a wq. I browsed
through all driver implementations of .xdo_dev_state_delete() and found
none that sleeps or allocates memory with GFP_KERNEL. So if we only fix
bond_ipsec_del_sa, that would be enough to make it all work again.
So it should be perfectly safe to add a ref to xs in bond_ipsec_del_sa
before firing up a wq to do the mutex lock + list traversal, before
releasing the ref.
xfrm_state is already unlinked from everything by __xfrm_state_delete
before xfrm_dev_state_delete is called and the xfrm_state_alloc
reference is dropped by the end of xfrm_dev_state_delete, so the only
thing keeping it alive would be the reference added in
bond_ipsec_del_sa. When that is put after the list traversal,
__xfrm_state_destroy gets called with sync == false, which passes on
the baton to another wq to do the gc for xs.
This all sounds reasonable.
Will you chase this down or do you prefer me to send the proposed fix?
Cosmin.
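
The approach described in this message (pin xs with a reference, defer the mutex-protected list traversal to a worker, then drop the reference) can be sketched as a single-threaded userspace model. This is a minimal sketch, not the proposed patch: all names (xs_model, bond_model, bond_model_del_sa, bond_model_run_work) are hypothetical, the worker is run by calling a function explicitly, and "destroyed" is a flag standing in for the gc handoff that the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct xs_model {
	int refcnt;    /* models the xfrm_state reference count */
	int destroyed; /* set when the last reference is dropped */
};

struct ipsec_entry {
	struct xs_model *xs;
	struct ipsec_entry *next;
};

struct del_work {
	struct xs_model *xs;
	struct del_work *next;
};

struct bond_model {
	struct ipsec_entry *ipsec_list; /* models bond->ipsec_list */
	struct del_work *pending;       /* models work items queued on a wq */
};

static void xs_hold(struct xs_model *xs)
{
	assert(xs->refcnt > 0);
	xs->refcnt++;
}

static void xs_put(struct xs_model *xs)
{
	assert(xs->refcnt > 0);
	if (--xs->refcnt == 0)
		xs->destroyed = 1; /* real code would hand off to gc here */
}

/* Track a state on the bond. Returns 0, or -1 on allocation failure. */
static int bond_model_add_sa(struct bond_model *bond, struct xs_model *xs)
{
	struct ipsec_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->xs = xs;
	e->next = bond->ipsec_list;
	bond->ipsec_list = e;
	return 0;
}

/* Atomic-context half: pin xs and queue the work, without touching the list. */
static int bond_model_del_sa(struct bond_model *bond, struct xs_model *xs)
{
	struct del_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	xs_hold(xs); /* keeps xs valid until the worker has compared pointers */
	w->xs = xs;
	w->next = bond->pending;
	bond->pending = w;
	return 0;
}

/* Worker half: may sleep, so this is where the mutex would be taken. */
static void bond_model_run_work(struct bond_model *bond)
{
	while (bond->pending) {
		struct del_work *w = bond->pending;
		struct ipsec_entry **pp;

		bond->pending = w->next;
		for (pp = &bond->ipsec_list; *pp; pp = &(*pp)->next) {
			if ((*pp)->xs == w->xs) {
				struct ipsec_entry *victim = *pp;

				*pp = victim->next;
				free(victim);
				break;
			}
		}
		xs_put(w->xs); /* drop the reference taken in del_sa */
		free(w);
	}
}
```

In this model, if the creator's reference is dropped between bond_model_del_sa() and bond_model_run_work(), the state stays alive until the worker has finished its list traversal, and is only marked destroyed by the worker's final put.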
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-02-20 10:48 ` Cosmin Ratiu
@ 2025-02-20 11:18 ` Hangbin Liu
2025-02-20 11:33 ` Cosmin Ratiu
0 siblings, 1 reply; 29+ messages in thread
From: Hangbin Liu @ 2025-02-20 11:18 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: shuah@kernel.org, andrew+netdev@lunn.ch, davem@davemloft.net,
jv@jvosburgh.net, herbert@gondor.apana.org.au, andy@greyhouse.net,
linux-kernel@vger.kernel.org, pabeni@redhat.com,
edumazet@google.com, sd@queasysnail.net, Jianbo Liu,
horms@kernel.org, kuba@kernel.org, Tariq Toukan,
razor@blackwall.org, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Thu, Feb 20, 2025 at 10:48:43AM +0000, Cosmin Ratiu wrote:
> On Mon, 2025-01-20 at 23:59 +0000, Hangbin Liu wrote:
> > >
> > > I am not sure this can be fixed in bonding given that the
> > > xdo_dev_state_delete op could, in the general case, sleep while
> > > talking
> > > to the hardware. I don't think it's reasonable to expect devices to
> > > offload xfrm while the kernel holds a spinlock.
> > > Bonding just exposed this assumption mismatch because of the mutex
> > > that
> > > was added to replace a spinlock which exhibited the same problem we
> > > are
> > > talking about here.
> > >
> > > Do the dev offload operations need to be synchronous? Couldn't
> > > __xfrm_state_delete instead schedule a wq to do the dev offload? I
> > > saw
> > > there's already an xfrm_state_gc_task that's invoked to call
> > > xfrm_dev_state_free, perhaps that could be used to do the delete as
> > > well?
> >
> > Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa()
> > to a wq
> > in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/. i.e. move
> > the
> > following part out of spin lock via wq.
> >
> > mutex_lock(&bond->ipsec_lock);
> > list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> > if (ipsec->xs == xs) {
> > list_del(&ipsec->list);
> > kfree(ipsec);
> > break;
> > }
> > }
> > mutex_unlock(&bond->ipsec_lock);
> >
> > But we can see there is an (ipsec->xs == xs). So we still need to
> > make
> > sure the xs is not released. Can we add a xs reference in
> > bond_ipsec_del_sa()
> > to achieve this?
>
> Hello,
>
> After staring at the issue a while longer, I am also converging on just
> moving that mutex part from bond_ipsec_del_sa out to a wq. I browsed
> through all driver implementations of .xdo_dev_state_delete() and found
> none that sleeps or allocates memory with GFP_KERNEL. So if we only fix
> bond_ipsec_del_sa, that would be enough to make it all work again.
>
> So it should be perfectly safe to add a ref to xs in bond_ipsec_del_sa
> before firing up a wq to do the mutex lock + list traversal, before
> releasing the ref.
> xfrm_state is already unlinked from everything by __xfrm_state_delete
> before xfrm_dev_state_delete is called and the xfrm_state_alloc
> reference is dropped by the end of xfrm_dev_state_delete, so the only
> thing keeping it alive would be the reference added in
> bond_ipsec_del_sa. When that is put after the list traversal,
> __xfrm_state_destroy gets called with sync == false, which passes on
> the baton to another wq to do the gc for xs.
>
> This all sounds reasonable.
> Will you chase this down or do you prefer me to send the proposed fix?
Thanks for the feedback and confirmation. Let me try it first. I hope
unregistering the bond doesn't affect the gc work.
Thanks
Hangbin
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH net 0/2] bond: fix xfrm offload feature during init
2025-02-20 11:18 ` Hangbin Liu
@ 2025-02-20 11:33 ` Cosmin Ratiu
0 siblings, 0 replies; 29+ messages in thread
From: Cosmin Ratiu @ 2025-02-20 11:33 UTC (permalink / raw)
To: liuhangbin@gmail.com
Cc: shuah@kernel.org, andrew+netdev@lunn.ch, davem@davemloft.net,
jv@jvosburgh.net, sd@queasysnail.net, andy@greyhouse.net,
linux-kernel@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, herbert@gondor.apana.org.au, Jianbo Liu,
horms@kernel.org, kuba@kernel.org, Tariq Toukan,
razor@blackwall.org, netdev@vger.kernel.org,
steffen.klassert@secunet.com, linux-kselftest@vger.kernel.org
On Thu, 2025-02-20 at 11:18 +0000, Hangbin Liu wrote:
>
> Thanks for the feedback and confirmation. Let me try it first. Hope
> unregistering bond doesn't affect the gc works.
I think this should be handled naturally as part of the bond device
tear down. A quick peek shows:
bond_uninit -> (for each slave) __bond_release_one ->
bond_change_active_slave (_, NULL) -> bond_ipsec_del_sa_all deletes all
matching xfrm_state entries from bond->ipsec_list, with the mutex held.
Presumably, a new step after all slaves are deleted should call
drain_workqueue on the new workqueue to wait for any scheduled work to
be done before finally destroying the workqueue.
Cosmin.
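
The teardown ordering suggested in this message (wait for queued work, then destroy the queue) can be sketched with a tiny single-threaded model. This is an illustration only: wq_model and its functions are hypothetical names and not the kernel workqueue API, the queue is a fixed-size array, and draining simply runs the queued callbacks in order.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_MODEL_MAX 8

typedef void (*work_fn)(void *arg);

/* Hypothetical fixed-size work queue model; not the kernel workqueue API. */
struct wq_model {
	work_fn fn[WQ_MODEL_MAX];
	void *arg[WQ_MODEL_MAX];
	size_t count;
	int destroyed;
};

/* Queue one item. Returns 0, or -1 if the queue is full or already destroyed. */
static int wq_model_queue(struct wq_model *wq, work_fn fn, void *arg)
{
	if (wq->destroyed || wq->count == WQ_MODEL_MAX)
		return -1;
	wq->fn[wq->count] = fn;
	wq->arg[wq->count] = arg;
	wq->count++;
	return 0;
}

/*
 * Run everything queued, in FIFO order. count is re-read on each
 * iteration, so work queued by a running item is picked up as well.
 */
static void wq_model_drain(struct wq_model *wq)
{
	size_t i;

	for (i = 0; i < wq->count; i++)
		wq->fn[i](wq->arg[i]);
	wq->count = 0;
}

/* Destroying a queue that still has pending work would be a bug. */
static void wq_model_destroy(struct wq_model *wq)
{
	assert(wq->count == 0);
	wq->destroyed = 1;
}

/* Teardown order from the message: drain first, destroy last. */
static void wq_model_teardown(struct wq_model *wq)
{
	wq_model_drain(wq);
	wq_model_destroy(wq);
}

/* Example work item: counts how many cleanups have run. */
static void count_cleanup(void *arg)
{
	(*(int *)arg)++;
}
```

With two cleanups queued before wq_model_teardown(), both run before the queue is marked destroyed, and any later attempt to queue work is rejected.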
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2025-02-20 11:33 UTC | newest]
Thread overview: 29+ messages
2024-12-11 7:11 [PATCH net 0/2] bond: fix xfrm offload feature during init Hangbin Liu
2024-12-11 7:11 ` [PATCH net 1/2] bonding: fix xfrm offload feature setup on active-backup mode Hangbin Liu
2024-12-12 9:19 ` Nikolay Aleksandrov
2024-12-12 9:39 ` Hangbin Liu
2024-12-12 9:43 ` Nikolay Aleksandrov
2024-12-13 3:10 ` Hangbin Liu
2024-12-11 7:11 ` [PATCH net 2/2] selftests: bonding: add ipsec offload test Hangbin Liu
2024-12-12 14:27 ` [PATCH net 0/2] bond: fix xfrm offload feature during init Jakub Kicinski
2024-12-13 7:18 ` Hangbin Liu
2024-12-14 3:31 ` Jakub Kicinski
2025-01-02 2:44 ` Hangbin Liu
2025-01-02 3:33 ` Jianbo Liu
2025-01-03 11:05 ` Hangbin Liu
2025-01-06 10:47 ` Hangbin Liu
2025-01-08 2:46 ` Hangbin Liu
2025-01-08 3:40 ` Jianbo Liu
2025-01-08 7:14 ` Hangbin Liu
2025-01-09 1:26 ` Jianbo Liu
2025-01-09 8:37 ` Hangbin Liu
2025-01-09 9:51 ` Jianbo Liu
2025-01-09 10:17 ` Hangbin Liu
2025-01-09 12:21 ` Jianbo Liu
2025-01-15 9:19 ` Hangbin Liu
2025-01-17 7:54 ` Steffen Klassert
2025-01-20 16:16 ` Cosmin Ratiu
2025-01-20 23:59 ` Hangbin Liu
2025-02-20 10:48 ` Cosmin Ratiu
2025-02-20 11:18 ` Hangbin Liu
2025-02-20 11:33 ` Cosmin Ratiu