Re: [PATCH RFC net-next 0/4] bonding: support LAG IPsec offload with replicated SAs

From: Jihong Min <hurryman2212@gmail.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: netdev@vger.kernel.org, Jay Vosburgh <jv@jvosburgh.net>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next 0/4] bonding: support LAG IPsec offload with replicated SAs
Date: Thu, 11 Jun 2026 17:56:17 +0900
Message-ID: <a7f661e5-61ee-42d7-be9e-5569e0f16e28@gmail.com>
In-Reply-To: <20260610141843.GI327369@unreal>

Hi,

On 6/10/26 23:18, Leon Romanovsky wrote:
> On Wed, May 20, 2026 at 05:10:00PM +0900, Jihong Min wrote:
>> This RFC adds a bonding model for IPsec/XFRM hardware offload on
>> 802.3ad and balance-xor LAG devices when the transmit hash policy is
>> layer3+4. This is an intentional scope limit rather than a hard limit,
>> as this is the configuration I can test with my gear.
>>
>> The main idea is to leave the existing upstream single-lower-device XFRM
>> offload path for active-backup intentionally untouched, while adding a
>> replicated state model for LAG.
>>
>> For LAG bonds, the bonding driver installs the same XFRM state on every
>> eligible running slave and stores the per-slave hardware handles in
>> bonding-private state. Lower drivers that support this model can then
>> resolve the handle for the concrete lower netdev used by the datapath.
>>
>> LAG IPsec features are user controlled. Newly eligible LAG bonds start
>> with the ESP/XFRM features disabled, but advertise supported mutable
>> features when all running eligible slaves can support them. Users can
>> then opt in with ethtool. Feature enable is propagated to the lower
>> devices and rolled back if a lower device cannot enable the requested
>> features.
>>
>> The series also handles LAG membership and eligibility changes by adding
>> replicated SAs to newly usable slaves, removing the departing lower
>> instance on down/remove, and flushing bond-owned XFRM offload state when
>> the bond leaves the supported mode or hash-policy configuration.
>>
>> This series does not convert any physical NIC driver. A lower driver
>> must explicitly opt in to the replicated-upper-device model before it can
>> use these bond-owned states in its datapath.
>>
>> For example, a driver such as mlx5 would opt in by marking its
>> xfrmdev_ops and by resolving datapath handles through the helper:
>>
>>         static const struct xfrmdev_ops mlx5e_ipsec_xfrmdev_ops = {
>>                 ...
>>                 .xdo_dev_state_lower_handle = NULL,
>>                 .flags = XFRMDEV_OPS_F_LOWER_HANDLE,
>>         };
>>
>>         handle = xfrm_dev_state_lower_handle(x, netdev);
>>         if (!handle)
>>                 goto drop;
>>
>>         sa_entry = (struct mlx5e_ipsec_sa_entry *)handle;
> 
> I’m curious how you replicate and maintain the hardware state across these
> devices. How are you handling the anti-replay window?
> 
> Thanks
> 

The short answer is that the RFC I sent was not complete enough in this
area.

The long answer is:

At that time my preliminary test setup was an Airoha AN7581 board with
two 10G PHYs bonded together. I had ESP hardware offload working by
modifying both airoha_eth and an EIP93 driver that was tied into
airoha_eth in a rather ugly way. For this RFC, I tried to extract only
the generic bonding/XFRM infrastructure and leave out the
Airoha-specific pieces, but that split was not clean enough.

The mlx5 example in the cover letter was not tested. I used it only as
an example because the modified airoha_eth + EIP93 setup was not a good
thing to show as a driver model. Looking back, that was misleading.

After doing more work on this, I agree that the original RFC did not
handle the replay issue well enough. The current version is quite
different. I now have a driver for the SOE (Secure Offload Engine) block
in the AN7581, which is the SoC's offload engine for ESP crypto and
packet encap/decap (including NAT-T), and it is linked directly from
airoha_eth. With that version I tested XFRM/strongSwan (IPsec/IKEv2)
over the same LAG of two 10G PHYs, in 802.3ad mode with layer3+4
hashing, and throughput reaches up to about 5 Gbps.

If I were to write the driver opt-in example again, I would not use only
XFRMDEV_OPS_F_LOWER_HANDLE. That flag only says that the lower driver
resolves the hardware handle through xfrm_dev_state_lower_handle()
instead of using x->xso.offload_handle directly. It does not say
anything about whether the replicated LAG state is safe for sequence and
replay handling.
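
To make the scope of that flag concrete, here is a minimal user-space
sketch of what "resolving the handle for the concrete lower device"
amounts to. It is not the code from the RFC: struct demo_lag_sa,
demo_add_lower_handle() and demo_lower_handle() are hypothetical names
that only model the idea behind xfrm_dev_state_lower_handle(), and the
fixed-size table is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SLAVES 8

/* Hypothetical model: one hardware handle per lower (slave) device. */
struct demo_lower_handle {
	int ifindex;		/* lower device that owns the handle */
	uintptr_t handle;	/* opaque per-device handle, 0 means "none" */
};

/* Hypothetical model of one SA replicated across the slaves of a bond. */
struct demo_lag_sa {
	size_t nr_slaves;
	struct demo_lower_handle slaves[DEMO_MAX_SLAVES];
};

/* Record the handle a slave produced when the SA was installed on it. */
static int demo_add_lower_handle(struct demo_lag_sa *sa, int ifindex,
				 uintptr_t handle)
{
	assert(sa != NULL);
	if (sa->nr_slaves >= DEMO_MAX_SLAVES)
		return -1;
	sa->slaves[sa->nr_slaves].ifindex = ifindex;
	sa->slaves[sa->nr_slaves].handle = handle;
	sa->nr_slaves++;
	return 0;
}

/* Look up the handle for the device the packet is actually using.
 * A return value of 0 means no state is installed there, so the
 * caller should drop the packet.
 */
static uintptr_t demo_lower_handle(const struct demo_lag_sa *sa, int ifindex)
{
	size_t i;

	assert(sa != NULL);
	for (i = 0; i < sa->nr_slaves; i++) {
		if (sa->slaves[i].ifindex == ifindex)
			return sa->slaves[i].handle;
	}
	return 0;
}
```

Note that nothing in this lookup touches sequence numbers or replay
state, which is exactly why the flag alone is not a sufficient opt-in.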

The opt-in would need to describe those guarantees explicitly, for example:

        static const struct xfrmdev_ops mlx5e_ipsec_xfrmdev_ops = {
                ...
                .xdo_dev_packet_xmit = mlx5e_ipsec_packet_xmit,
                .flags = XFRMDEV_OPS_F_LOWER_HANDLE |
                         XFRMDEV_OPS_F_LAG_SHARED_TX_SEQ |
                         XFRMDEV_OPS_F_LAG_SHARED_RX_REPLAY,
        };

        handle = xfrm_dev_state_lower_handle(x, skb->dev);
        if (!handle)
                goto drop;

        sa_entry = (struct mlx5e_ipsec_sa_entry *)handle;

Here XFRMDEV_OPS_F_LOWER_HANDLE means that the driver uses the
lower-handle resolver in its datapath. XFRMDEV_OPS_F_LAG_SHARED_TX_SEQ
would mean that the driver/hardware can keep the outbound sequence state
correct when an SA is used through a LAG upper device.
XFRMDEV_OPS_F_LAG_SHARED_RX_REPLAY would mean that inbound packets for
the same SA are checked against one valid replay state, so the same
packet cannot be accepted again just because it arrived on another slave.

The exact flag names are only illustrative, but the point is that the
lower-driver opt-in needs to describe the sequence/replay guarantees,
not only the handle lookup mechanism.

I am currently developing and publishing an OpenWrt tree that carries
the newer bonding/XFRM LAG offload work and the Airoha SOE/PPE
integration. One snapshot is the commit "kernel: add bonding LAG XFRM
offload infrastructure and Airoha support":

https://github.com/hurryman2212/OpenW1700k-test/commit/fbfe8f919f836bb62b3849f803865a4d9b8dc76f

It carries both the generic bonding/XFRM patches and the
Airoha-specific SOE pieces. I do not think it is ready for resubmission
yet: some logic that currently lives only in the Airoha path needs to
be generalized and moved into the bonding/XFRM code, and the TX/RX
sequence and replay protection rules are still incomplete. Once that is
done, I plan to submit a new version; I have not decided yet whether it
will include the Airoha driver code or only the generic part.

Sincerely,
Jihong Min

>>
>> Jihong Min (4):
>>   xfrm: add a lower-device offload handle resolver
>>   bonding: replicate XFRM offload state across LAG slaves
>>   bonding: expose user-controlled IPsec features for LAG
>>   bonding: handle replicated IPsec SAs across LAG changes
>>
>>  drivers/net/bonding/bond_main.c    | 855 ++++++++++++++++++++++++++++-
>>  drivers/net/bonding/bond_options.c |  59 +-
>>  include/linux/netdevice.h          |  27 +
>>  include/net/bonding.h              |  29 +-
>>  include/net/xfrm.h                 |  48 +-
>>  net/xfrm/xfrm_state.c              |   1 +
>>  6 files changed, 1000 insertions(+), 19 deletions(-)
>>
>>
>> base-commit: 27fa82620cbaa89a7fc11ac3057701d598813e87
>> -- 
>> 2.53.0
>>
