[REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
@ 2026-01-28  8:43 Yan Yan
  2026-02-02  9:44 ` Steffen Klassert
  2026-02-02 14:27 ` Antony Antony
  0 siblings, 2 replies; 9+ messages in thread
From: Yan Yan @ 2026-01-28  8:43 UTC (permalink / raw)
  To: netdev
  Cc: Steffen Klassert, Herbert Xu, antony.antony, David S . Miller,
	Eric Dumazet, Jakub Kicinski, pabeni, horms, saakashkumar,
	akamluddin, Nathan Harold, greg

Hi all,

I am writing because unfortunately commit: 94f39804d891 ("xfrm:
Duplicate SPI Handling") has caused a regression in the Android OS, so
we would like to gain some context to help determine next steps. The
issue is caused by the new requirement for global SPI uniqueness
during allocation. Based on our review of RFC 4301 and the previous
review history, we would like to highlight a few concerns:

1. Regression on Android:
This change breaks production behavior in Android, which uses
XFRM_MSG_ALLOCSPI to create larval SAs for both directions. Since RFC
4301 allows duplicate outbound SPIs, and larval SAs are often created
before direction or selectors are finalized, the allocator must remain
permissive (at least in our current design).
This also aligns with a concern Herbert Xu raised during the initial
review regarding compatibility:
>    "It's also dangerous to unilaterally do this since existing deployments
>    could rely on the old behaviour. You'd need to add a toggle for
>    compatibility."

2. Inbound SPI uniqueness should not be a hard requirement:
The justification for enforcing global SPI uniqueness often cites the
statement in RFC 4301, Section 4.1, that for unicast traffic, the SPI
"by itself suffices to specify an SA." However, we don’t think this
means inbound SPI uniqueness is a hard requirement because of the two
following reasons:

– Another statement implies that SPI uniqueness is just an
implementation choice:
>    "Each entry in the SA Database (SAD) MUST
>    indicate whether the SA lookup makes use of the destination IP address, or the
>    destination and source IP addresses, in addition to the SPI."

– There is a "Longest Match" mandate which makes SPI uniqueness unnecessary:
>    "For each inbound, IPsec-protected packet, an implementation MUST
>    conduct its search of the SAD such that it finds the entry that
>    matches the 'longest' SA identifier. In this context, if two or more
>    SAD entries match based on the SPI value, then the entry that also
>    matches based on destination address... is the 'longest' match."

3. Further clarification on the specific problem being addressed would
be helpful. The "real-world" problem this commit intends to fix
remains unclear. The patch mentions:
>    "This behavior causes inconsistencies during SPI lookups for inbound packets.
>    Since the lookup may return an arbitrary SA among those with the same SPI,
>    packet processing can fail, resulting in packet drops."

However, Linux kernel lookups using the triplet (SPI, Protocol, Daddr)
are deterministic. The lookup will not return an "arbitrary" SA
because the destination address is used to disambiguate the state.

As Antony suggested, this change may cater to SPI-only hardware
indexing. If that is the case, we are concerned about applying such
hardware-specific limits to the software stack, especially if the
behavior is not opt-in, as it appears to require an overly-narrow
reading of the RFC 4301.

Given these concerns, would it be possible to discuss a revert or,
alternatively, could further context be provided regarding the
specific real-world problem this commit was intended to address? Once
the underlying issue is clearly defined, we can work together to find
a backward-compatible solution that satisfies all requirements.

Review threads are attached for easy reference:
https://lore.kernel.org/netdev/aDQhZ_ikHEt_pLn_@gondor.apana.org.au/T/#r45c1786651ce5af730f757aca7438474d494a323
https://lore.kernel.org/netdev/20250616100621.837472-1-saakashkumar@marvell.com/T/#u

Best regards,

Yan (on behalf of the Android IPsec team)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-01-28  8:43 [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling" Yan Yan
@ 2026-02-02  9:44 ` Steffen Klassert
  2026-02-02 14:27 ` Antony Antony
  1 sibling, 0 replies; 9+ messages in thread
From: Steffen Klassert @ 2026-02-02  9:44 UTC (permalink / raw)
  To: Yan Yan
  Cc: netdev, Herbert Xu, antony.antony, David S . Miller, Eric Dumazet,
	Jakub Kicinski, pabeni, horms, saakashkumar, akamluddin,
	Nathan Harold, greg

On Wed, Jan 28, 2026 at 04:43:06PM +0800, Yan Yan wrote:
> Hi all,
> 
> I am writing because unfortunately commit: 94f39804d891 ("xfrm:
> Duplicate SPI Handling") has caused a regression in the Android OS, so
> we would like to gain some context to help determine next steps. The
> issue is caused by the new requirement for global SPI uniqueness
> during allocation. Based on our review of RFC 4301 and the previous
> review history, we would like to highlight a few concerns:
> 
> 1. Regression on Android:
> This change breaks production behavior in Android, which uses
> XFRM_MSG_ALLOCSPI to create larval SAs for both directions. Since RFC
> 4301 allows duplicate outbound SPIs, and larval SAs are often created
> before direction or selectors are finalized, the allocator must remain
> permissive (at least in our current design).
> This also aligns with a concern Herbert Xu raised during the initial
> review regarding compatibility:
> >    "It's also dangerous to unilaterally do this since existing deployments
> >    could rely on the old behaviour. You'd need to add a toggle for
> >    compatibility."

Indeed, it was...

> 2. Inbound SPI uniqueness should not be a hard requirement:
> The justification for enforcing global SPI uniqueness often cites the
> statement in RFC 4301, Section 4.1, that for unicast traffic, the SPI
> "by itself suffices to specify an SA." However, we don’t think this
> means inbound SPI uniqueness is a hard requirement because of the two
> following reasons:
> 
> – Another statement implies that SPI uniqueness is just an
> implementation choice:
> >    "Each entry in the SA Database (SAD) MUST
> >    indicate whether the SA lookup makes use of the destination IP address, or the
> >    destination and source IP addresses, in addition to the SPI."

When reading RFC 4301, I'd interpret this as 'The SA
must indicate whether it is unicast or multicast'.

> 
> – There is a "Longest Match" mandate which makes SPI uniqueness unnecessary:
> >    "For each inbound, IPsec-protected packet, an implementation MUST
> >    conduct its search of the SAD such that it finds the entry that
> >    matches the 'longest' SA identifier. In this context, if two or more
> >    SAD entries match based on the SPI value, then the entry that also
> >    matches based on destination address... is the 'longest' match."

This is meant for multicast SAs, we don't support multicast SAs.

> 3. Further clarification on the specific problem being addressed would
> be helpful. The "real-world" problem this commit intends to fix
> remains unclear. The patch mentions:
> >    "This behavior causes inconsistencies during SPI lookups for inbound packets.
> >    Since the lookup may return an arbitrary SA among those with the same SPI,
> >    packet processing can fail, resulting in packet drops."
> 
> However, Linux kernel lookups using the triplet (SPI, Protocol, Daddr)
> are deterministic. The lookup will not return an "arbitrary" SA
> because the destination address is used to disambiguate the state.

That was how RFC 2401 did it:

A security association is uniquely identified by a triple consisting
of a Security Parameter Index (SPI), an IP Destination Address, and a
security protocol (AH or ESP) identifier.

That was obsoleted by RFC RFC 4301 saying the SPI alone identifies
the SA for unicast traffic.

Unfortunately our implementation is somewhat between RFC 2401
and RFC 4301. There was no clear cut to support both versions.

Maybe it is time to do this now and implement RFC 4301
with a toggle to fallback to RFC 2401. This would avoid
these kind of problems in the future, at least.

> As Antony suggested, this change may cater to SPI-only hardware
> indexing. If that is the case, we are concerned about applying such
> hardware-specific limits to the software stack, especially if the
> behavior is not opt-in, as it appears to require an overly-narrow
> reading of the RFC 4301.
> 
> Given these concerns, would it be possible to discuss a revert or,
> alternatively, could further context be provided regarding the
> specific real-world problem this commit was intended to address? Once
> the underlying issue is clearly defined, we can work together to find
> a backward-compatible solution that satisfies all requirements.

Here we end up with the question about the default behaviour,
should the default be RFC 4301 or RFC 2401?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-01-28  8:43 [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling" Yan Yan
  2026-02-02  9:44 ` Steffen Klassert
@ 2026-02-02 14:27 ` Antony Antony
  2026-02-09  3:03   ` Yan Yan
  1 sibling, 1 reply; 9+ messages in thread
From: Antony Antony @ 2026-02-02 14:27 UTC (permalink / raw)
  To: Yan Yan
  Cc: netdev, Steffen Klassert, Herbert Xu, antony.antony,
	David S . Miller, Eric Dumazet, Jakub Kicinski, pabeni, horms,
	saakashkumar, akamluddin, Nathan Harold, greg

Hi Yan,

On Wed, Jan 28, 2026 at 16:43:06 +0800, Yan Yan wrote:
> Hi all,
> 
> I am writing because unfortunately commit: 94f39804d891 ("xfrm:
> Duplicate SPI Handling") has caused a regression in the Android OS, so
> we would like to gain some context to help determine next steps. The
> issue is caused by the new requirement for global SPI uniqueness
> during allocation. Based on our review of RFC 4301 and the previous
> review history, we would like to highlight a few concerns:
> 
> 1. Regression on Android:
> This change breaks production behavior in Android, which uses
> XFRM_MSG_ALLOCSPI to create larval SAs for both directions. Since RFC
> 4301 allows duplicate outbound SPIs, and larval SAs are often created

To clarify, how are larval outbound SAs created in Android?
Are they created with zero or non-zero SPIs? Multiple zero SPIs (acquire states) are allowed. However, I guess there is no user API for managing these acquire states. Only the kernel can create them and handle expiration or deletion via SA updates with acq_req?

A user API (UAPI) for creating and deleting acquire states might be a possible solution.

I haven’t been able to consistently reproduce the issue Marvell reported,
but I suspect the bug could also affect outbound SAs with non-zero SPIs.
Also when one peer is behind NAT. 

I wonder wouldn't duplicate SPI when behind NAT cause issues for output SAs? 
Because the triplet is SPI, Protocol, Daddr. There is  no dport in it.

> before direction or selectors are finalized, the allocator must remain
> permissive (at least in our current design).
> This also aligns with a concern Herbert Xu raised during the initial
> review regarding compatibility:
> >    "It's also dangerous to unilaterally do this since existing deployments
> >    could rely on the old behaviour. You'd need to add a toggle for
> >    compatibility."
> 
> 2. Inbound SPI uniqueness should not be a hard requirement:
> The justification for enforcing global SPI uniqueness often cites the
> statement in RFC 4301, Section 4.1, that for unicast traffic, the SPI
> "by itself suffices to specify an SA." However, we don’t think this
> means inbound SPI uniqueness is a hard requirement because of the two
> following reasons:
> 
> – Another statement implies that SPI uniqueness is just an
> implementation choice:
> >    "Each entry in the SA Database (SAD) MUST
> >    indicate whether the SA lookup makes use of the destination IP address, or the
> >    destination and source IP addresses, in addition to the SPI."
> 
> – There is a "Longest Match" mandate which makes SPI uniqueness unnecessary:
> >    "For each inbound, IPsec-protected packet, an implementation MUST
> >    conduct its search of the SAD such that it finds the entry that
> >    matches the 'longest' SA identifier. In this context, if two or more
> >    SAD entries match based on the SPI value, then the entry that also
> >    matches based on destination address... is the 'longest' match."
> 
> 3. Further clarification on the specific problem being addressed would
> be helpful. The "real-world" problem this commit intends to fix
> remains unclear. The patch mentions:
> >    "This behavior causes inconsistencies during SPI lookups for inbound packets.
> >    Since the lookup may return an arbitrary SA among those with the same SPI,
> >    packet processing can fail, resulting in packet drops."
> 
> However, Linux kernel lookups using the triplet (SPI, Protocol, Daddr)
> are deterministic. The lookup will not return an "arbitrary" SA
> because the destination address is used to disambiguate the state.
> 
> As Antony suggested, this change may cater to SPI-only hardware
> indexing. If that is the case, we are concerned about applying such
> hardware-specific limits to the software stack, especially if the
> behavior is not opt-in, as it appears to require an overly-narrow
> reading of the RFC 4301. 

I agree with your suggestion that making the behavior opt-in. 
I would prefer the Default : to allow duplicate.

> Given these concerns, would it be possible to discuss a revert or,
> alternatively, could further context be provided regarding the
> specific real-world problem this commit was intended to address? Once
> the underlying issue is clearly defined, we can work together to find
> a backward-compatible solution that satisfies all requirements.
> 
> Review threads are attached for easy reference:
> https://lore.kernel.org/netdev/aDQhZ_ikHEt_pLn_@gondor.apana.org.au/T/#r45c1786651ce5af730f757aca7438474d494a323
> https://lore.kernel.org/netdev/20250616100621.837472-1-saakashkumar@marvell.com/T/#u

-antony

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-02-02 14:27 ` Antony Antony
@ 2026-02-09  3:03   ` Yan Yan
  2026-02-09 19:05     ` Nathan Harold
  2026-02-10 10:17     ` Tobias Brunner
  0 siblings, 2 replies; 9+ messages in thread
From: Yan Yan @ 2026-02-09  3:03 UTC (permalink / raw)
  To: antony.antony, Steffen Klassert, paul
  Cc: netdev, Herbert Xu, David S . Miller, Eric Dumazet,
	Jakub Kicinski, pabeni, horms, saakashkumar, akamluddin,
	Nathan Harold, greg

Thank you all for the reply! Please see my response inline below

> This is meant for multicast SAs, we don't support multicast SAs.
I reread the RFC and that makes sense to me. Thanks for the explanation.

> To clarify, how are larval outbound SAs created in Android?
> Are they created with zero or non-zero SPIs? Multiple zero SPIs (acquire states) are allowed.

In Android, larval outbound SAs are created with non-zero SPIs because
Android userspace uses XFRM_MSG_ALLOCSPI to manually reserve a unique
value before the SA is fully finalized. These are not kernel-generated
"acquire" states.

Regarding the default behavior:

I'm fine with defaulting to RFC 4301 for the upstream kernel to align
with the modern standard. As long as a toggle exists to fallback to
the legacy RFC 2401 behavior, Android’s compatibility requirements are
met.




On Mon, Feb 2, 2026 at 10:27 PM Antony Antony <antony.antony@secunet.com> wrote:
>
> Hi Yan,
>
> On Wed, Jan 28, 2026 at 16:43:06 +0800, Yan Yan wrote:
> > Hi all,
> >
> > I am writing because unfortunately commit: 94f39804d891 ("xfrm:
> > Duplicate SPI Handling") has caused a regression in the Android OS, so
> > we would like to gain some context to help determine next steps. The
> > issue is caused by the new requirement for global SPI uniqueness
> > during allocation. Based on our review of RFC 4301 and the previous
> > review history, we would like to highlight a few concerns:
> >
> > 1. Regression on Android:
> > This change breaks production behavior in Android, which uses
> > XFRM_MSG_ALLOCSPI to create larval SAs for both directions. Since RFC
> > 4301 allows duplicate outbound SPIs, and larval SAs are often created
>
> To clarify, how are larval outbound SAs created in Android?
> Are they created with zero or non-zero SPIs? Multiple zero SPIs (acquire states) are allowed. However, I guess there is no user API for managing these acquire states. Only the kernel can create them and handle expiration or deletion via SA updates with acq_req?
>
> A user API (UAPI) for creating and deleting acquire states might be a possible solution.
>
> I haven’t been able to consistently reproduce the issue Marvell reported,
> but I suspect the bug could also affect outbound SAs with non-zero SPIs.
> Also when one peer is behind NAT.
>
> I wonder wouldn't duplicate SPI when behind NAT cause issues for output SAs?
> Because the triplet is SPI, Protocol, Daddr. There is  no dport in it.
>
> > before direction or selectors are finalized, the allocator must remain
> > permissive (at least in our current design).
> > This also aligns with a concern Herbert Xu raised during the initial
> > review regarding compatibility:
> > >    "It's also dangerous to unilaterally do this since existing deployments
> > >    could rely on the old behaviour. You'd need to add a toggle for
> > >    compatibility."
> >
> > 2. Inbound SPI uniqueness should not be a hard requirement:
> > The justification for enforcing global SPI uniqueness often cites the
> > statement in RFC 4301, Section 4.1, that for unicast traffic, the SPI
> > "by itself suffices to specify an SA." However, we don’t think this
> > means inbound SPI uniqueness is a hard requirement because of the two
> > following reasons:
> >
> > – Another statement implies that SPI uniqueness is just an
> > implementation choice:
> > >    "Each entry in the SA Database (SAD) MUST
> > >    indicate whether the SA lookup makes use of the destination IP address, or the
> > >    destination and source IP addresses, in addition to the SPI."
> >
> > – There is a "Longest Match" mandate which makes SPI uniqueness unnecessary:
> > >    "For each inbound, IPsec-protected packet, an implementation MUST
> > >    conduct its search of the SAD such that it finds the entry that
> > >    matches the 'longest' SA identifier. In this context, if two or more
> > >    SAD entries match based on the SPI value, then the entry that also
> > >    matches based on destination address... is the 'longest' match."
> >
> > 3. Further clarification on the specific problem being addressed would
> > be helpful. The "real-world" problem this commit intends to fix
> > remains unclear. The patch mentions:
> > >    "This behavior causes inconsistencies during SPI lookups for inbound packets.
> > >    Since the lookup may return an arbitrary SA among those with the same SPI,
> > >    packet processing can fail, resulting in packet drops."
> >
> > However, Linux kernel lookups using the triplet (SPI, Protocol, Daddr)
> > are deterministic. The lookup will not return an "arbitrary" SA
> > because the destination address is used to disambiguate the state.
> >
> > As Antony suggested, this change may cater to SPI-only hardware
> > indexing. If that is the case, we are concerned about applying such
> > hardware-specific limits to the software stack, especially if the
> > behavior is not opt-in, as it appears to require an overly-narrow
> > reading of the RFC 4301.
>
> I agree with your suggestion that making the behavior opt-in.
> I would prefer the Default : to allow duplicate.
>
> > Given these concerns, would it be possible to discuss a revert or,
> > alternatively, could further context be provided regarding the
> > specific real-world problem this commit was intended to address? Once
> > the underlying issue is clearly defined, we can work together to find
> > a backward-compatible solution that satisfies all requirements.
> >
> > Review threads are attached for easy reference:
> > https://lore.kernel.org/netdev/aDQhZ_ikHEt_pLn_@gondor.apana.org.au/T/#r45c1786651ce5af730f757aca7438474d494a323
> > https://lore.kernel.org/netdev/20250616100621.837472-1-saakashkumar@marvell.com/T/#u
>
> -antony



--
--
Best,
Yan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-02-09  3:03   ` Yan Yan
@ 2026-02-09 19:05     ` Nathan Harold
  2026-02-10 10:17     ` Tobias Brunner
  1 sibling, 0 replies; 9+ messages in thread
From: Nathan Harold @ 2026-02-09 19:05 UTC (permalink / raw)
  To: Yan Yan
  Cc: antony.antony, Steffen Klassert, paul, netdev, Herbert Xu,
	David S . Miller, Eric Dumazet, Jakub Kicinski, pabeni, horms,
	saakashkumar, akamluddin, greg

Unfortunately, Android's requirements significantly limit our ability
to implement compatibility fixes between the OS and kernel due to
interface freezes and cross-version compatibility requirements, so
thank you for taking us into consideration! Generally, without a
userspace mechanism to opt in to new behavior though, it's tough. For
now, the Android kernel is just reverting these patches, but that's
not a long-term strategy.

One approach, which seems generally sound to me for this specific
issue (and perhaps more broadly), would be to leverage the relatively
new XFRM_SA_DIR attribute to distinguish between legacy behavior and
stricter 4301 compliance. Much of the optimization is
direction-specific, so we can infer that an implementation using the
newer API and opting in to direction-specific behavior wants 4301/new
feature support--also, Android doesn't yet use that attribute :-) .
Or, as Steffen suggested, there could be a new toggle that indicates a
standards-based target behavior (like 4301), which sounds great if the
SA_DIR flag is deemed insufficient. Thoughts?

-Nathan


On Sun, Feb 8, 2026 at 7:03 PM Yan Yan <evitayan@google.com> wrote:
>
> Thank you all for the reply! Please see my response inline below
>
> > This is meant for multicast SAs, we don't support multicast SAs.
> I reread the RFC and that makes sense to me. Thanks for the explanation.
>
> > To clarify, how are larval outbound SAs created in Android?
> > Are they created with zero or non-zero SPIs? Multiple zero SPIs (acquire states) are allowed.
>
> In Android, larval outbound SAs are created with non-zero SPIs because
> Android userspace uses XFRM_MSG_ALLOCSPI to manually reserve a unique
> value before the SA is fully finalized. These are not kernel-generated
> "acquire" states.
>
> Regarding the default behavior:
>
> I'm fine with defaulting to RFC 4301 for the upstream kernel to align
> with the modern standard. As long as a toggle exists to fallback to
> the legacy RFC 2401 behavior, Android’s compatibility requirements are
> met.
>
>
>
>
> On Mon, Feb 2, 2026 at 10:27 PM Antony Antony <antony.antony@secunet.com> wrote:
> >
> > Hi Yan,
> >
> > On Wed, Jan 28, 2026 at 16:43:06 +0800, Yan Yan wrote:
> > > Hi all,
> > >
> > > I am writing because unfortunately commit: 94f39804d891 ("xfrm:
> > > Duplicate SPI Handling") has caused a regression in the Android OS, so
> > > we would like to gain some context to help determine next steps. The
> > > issue is caused by the new requirement for global SPI uniqueness
> > > during allocation. Based on our review of RFC 4301 and the previous
> > > review history, we would like to highlight a few concerns:
> > >
> > > 1. Regression on Android:
> > > This change breaks production behavior in Android, which uses
> > > XFRM_MSG_ALLOCSPI to create larval SAs for both directions. Since RFC
> > > 4301 allows duplicate outbound SPIs, and larval SAs are often created
> >
> > To clarify, how are larval outbound SAs created in Android?
> > Are they created with zero or non-zero SPIs? Multiple zero SPIs (acquire states) are allowed. However, I guess there is no user API for managing these acquire states. Only the kernel can create them and handle expiration or deletion via SA updates with acq_req?
> >
> > A user API (UAPI) for creating and deleting acquire states might be a possible solution.
> >
> > I haven’t been able to consistently reproduce the issue Marvell reported,
> > but I suspect the bug could also affect outbound SAs with non-zero SPIs.
> > Also when one peer is behind NAT.
> >
> > I wonder wouldn't duplicate SPI when behind NAT cause issues for output SAs?
> > Because the triplet is SPI, Protocol, Daddr. There is  no dport in it.
> >
> > > before direction or selectors are finalized, the allocator must remain
> > > permissive (at least in our current design).
> > > This also aligns with a concern Herbert Xu raised during the initial
> > > review regarding compatibility:
> > > >    "It's also dangerous to unilaterally do this since existing deployments
> > > >    could rely on the old behaviour. You'd need to add a toggle for
> > > >    compatibility."
> > >
> > > 2. Inbound SPI uniqueness should not be a hard requirement:
> > > The justification for enforcing global SPI uniqueness often cites the
> > > statement in RFC 4301, Section 4.1, that for unicast traffic, the SPI
> > > "by itself suffices to specify an SA." However, we don’t think this
> > > means inbound SPI uniqueness is a hard requirement because of the two
> > > following reasons:
> > >
> > > – Another statement implies that SPI uniqueness is just an
> > > implementation choice:
> > > >    "Each entry in the SA Database (SAD) MUST
> > > >    indicate whether the SA lookup makes use of the destination IP address, or the
> > > >    destination and source IP addresses, in addition to the SPI."
> > >
> > > – There is a "Longest Match" mandate which makes SPI uniqueness unnecessary:
> > > >    "For each inbound, IPsec-protected packet, an implementation MUST
> > > >    conduct its search of the SAD such that it finds the entry that
> > > >    matches the 'longest' SA identifier. In this context, if two or more
> > > >    SAD entries match based on the SPI value, then the entry that also
> > > >    matches based on destination address... is the 'longest' match."
> > >
> > > 3. Further clarification on the specific problem being addressed would
> > > be helpful. The "real-world" problem this commit intends to fix
> > > remains unclear. The patch mentions:
> > > >    "This behavior causes inconsistencies during SPI lookups for inbound packets.
> > > >    Since the lookup may return an arbitrary SA among those with the same SPI,
> > > >    packet processing can fail, resulting in packet drops."
> > >
> > > However, Linux kernel lookups using the triplet (SPI, Protocol, Daddr)
> > > are deterministic. The lookup will not return an "arbitrary" SA
> > > because the destination address is used to disambiguate the state.
> > >
> > > As Antony suggested, this change may cater to SPI-only hardware
> > > indexing. If that is the case, we are concerned about applying such
> > > hardware-specific limits to the software stack, especially if the
> > > behavior is not opt-in, as it appears to require an overly-narrow
> > > reading of the RFC 4301.
> >
> > I agree with your suggestion that making the behavior opt-in.
> > I would prefer the Default : to allow duplicate.
> >
> > > Given these concerns, would it be possible to discuss a revert or,
> > > alternatively, could further context be provided regarding the
> > > specific real-world problem this commit was intended to address? Once
> > > the underlying issue is clearly defined, we can work together to find
> > > a backward-compatible solution that satisfies all requirements.
> > >
> > > Review threads are attached for easy reference:
> > > https://lore.kernel.org/netdev/aDQhZ_ikHEt_pLn_@gondor.apana.org.au/T/#r45c1786651ce5af730f757aca7438474d494a323
> > > https://lore.kernel.org/netdev/20250616100621.837472-1-saakashkumar@marvell.com/T/#u
> >
> > -antony
>
>
>
> --
> --
> Best,
> Yan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-02-09  3:03   ` Yan Yan
  2026-02-09 19:05     ` Nathan Harold
@ 2026-02-10 10:17     ` Tobias Brunner
  2026-02-18  4:36       ` Yan Yan
  1 sibling, 1 reply; 9+ messages in thread
From: Tobias Brunner @ 2026-02-10 10:17 UTC (permalink / raw)
  To: Yan Yan, antony.antony, Steffen Klassert, paul
  Cc: netdev, Herbert Xu, David S . Miller, Eric Dumazet,
	Jakub Kicinski, pabeni, horms, saakashkumar, akamluddin,
	Nathan Harold, greg

Hi Yan,

>> To clarify, how are larval outbound SAs created in Android?
>> Are they created with zero or non-zero SPIs? Multiple zero SPIs (acquire states) are allowed.
> 
> In Android, larval outbound SAs are created with non-zero SPIs because
> Android userspace uses XFRM_MSG_ALLOCSPI to manually reserve a unique
> value before the SA is fully finalized. These are not kernel-generated
> "acquire" states.

Could you clarify what exactly the reasoning is behind this?  Do you use
this for every outbound SA or just for special use cases?  Do you set
the same SPI for in- and outbound SA?  (If so, what happens later when
the SPI of the outbound SA is received from the peer?)  Or do you use
ALLOCSPI once you already know the peer's SPI?  (And only run into the
issue if both allocate the same SPI by chance.)  If so, why would you
use ALLOCSPI and not just install the outbound SA?  Is it to avoid
differences for in- and outbound SAs (ALLOCSPI+UPDSA vs. NEWSA)?

Regards,
Tobias

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-02-10 10:17     ` Tobias Brunner
@ 2026-02-18  4:36       ` Yan Yan
  2026-02-18  8:41         ` Tobias Brunner
  0 siblings, 1 reply; 9+ messages in thread
From: Yan Yan @ 2026-02-18  4:36 UTC (permalink / raw)
  To: Tobias Brunner
  Cc: antony.antony, Steffen Klassert, paul, netdev, Herbert Xu,
	David S . Miller, Eric Dumazet, Jakub Kicinski, pabeni, horms,
	saakashkumar, akamluddin, Nathan Harold, greg

Hi Tobias,

Regarding your questions:

For every inbound SA, we allocate SPIs before negotiation. For
outbound SAs, we allocate SPIs once requested by the peer. We only
require the (SPI, destination address) combo to be unique. Thus, we
may have an inbound and outbound SA sharing an SPI with different
destinations, or multiple outbound SAs to different peers sharing an
SPI.

> If so, why would you use ALLOCSPI and not just install the outbound SA? Is it to avoid differences for in- and outbound SAs (ALLOCSPI+UPDSA vs. NEWSA)?"

Exactly—it is primarily for code symmetry. By using ALLOCSPI + UPDSA
for both directions, we maintain a consistent larval lifecycle and
make it easier to maintain.


On Tue, Feb 10, 2026 at 6:17 PM Tobias Brunner <tobias@strongswan.org> wrote:
>
> Hi Yan,
>
> >> To clarify, how are larval outbound SAs created in Android?
> >> Are they created with zero or non-zero SPIs? Multiple zero SPIs (acquire states) are allowed.
> >
> > In Android, larval outbound SAs are created with non-zero SPIs because
> > Android userspace uses XFRM_MSG_ALLOCSPI to manually reserve a unique
> > value before the SA is fully finalized. These are not kernel-generated
> > "acquire" states.
>
> Could you clarify what exactly the reasoning is behind this?  Do you use
> this for every outbound SA or just for special use cases?  Do you set
> the same SPI for in- and outbound SA?  (If so, what happens later when
> the SPI of the outbound SA is received from the peer?)  Or do you use
> ALLOCSPI once you already know the peer's SPI?  (And only run into the
> issue if both allocate the same SPI by chance.)  If so, why would you
> use ALLOCSPI and not just install the outbound SA?  Is it to avoid
> differences for in- and outbound SAs (ALLOCSPI+UPDSA vs. NEWSA)?
>
> Regards,
> Tobias
>


-- 
--
Best,
Yan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-02-18  4:36       ` Yan Yan
@ 2026-02-18  8:41         ` Tobias Brunner
  2026-02-24 23:53           ` Nathan Harold
  0 siblings, 1 reply; 9+ messages in thread
From: Tobias Brunner @ 2026-02-18  8:41 UTC (permalink / raw)
  To: Yan Yan
  Cc: antony.antony, Steffen Klassert, paul, netdev, Herbert Xu,
	David S . Miller, Eric Dumazet, Jakub Kicinski, pabeni, horms,
	saakashkumar, akamluddin, Nathan Harold, greg

Hi Yan,

> For every inbound SA, we allocate SPIs before negotiation. For
> outbound SAs, we allocate SPIs once requested by the peer. We only
> require the (SPI, destination address) combo to be unique. Thus, we
> may have an inbound and outbound SA sharing an SPI with different
> destinations, or multiple outbound SAs to different peers sharing an
> SPI.

That should still be allowed when using the intended APIs (i.e. ALLOCSPI
for the inbound and NEWSA for the outbound SA).  ALLOCSPI might enforce
a unique SPI without considering the address, as that's intended for
local, inbound SAs, where the kernel has full control (looking at the
the patch, it's certainly not ideal, as it goes through all installed
SAs to find a duplicate and it prevents an inbound SPI that matches an
existing outbound SPI - I guess that could be resolved by using separate
tables for in- and outbound SAs).  But that must not prevent installing
outbound SAs with the same SPI to another peer using NEWSA, which still
uses a hash that includes the destination address (that must always be
the case because peers are free to allocate whatever SPI they want).

>> If so, why would you use ALLOCSPI and not just install the outbound SA? Is it to avoid differences for in- and outbound SAs (ALLOCSPI+UPDSA vs. NEWSA)?"
> 
> Exactly—it is primarily for code symmetry. By using ALLOCSPI + UPDSA
> for both directions, we maintain a consistent larval lifecycle and
> make it easier to maintain.

In my opinion, you are using the API incorrectly.  ALLOCSPI is intended
to allocate a free local SPI for an inbound SA.  That is, reserve it
before and while the details of the SA are negotiated with the peer
using IKE.  This step isn't necessary for outbound SAs and forcing such
an allocation, after all the details are known, to the responder's SPI
(which I assume you do via min/max option) doesn't feel right.  I also
don't think there are any benefits in that "consistent larval lifecycle"
(if you found any, please let us know).  And the difference between
UPDSA and NEWSA is the nlmsg_type (there are some attributes that are
different for in- and outbound SAs, especially if you set the direction
in newer kernels, but that's the case regardless of the message type).

By the way, are you using the min/max option for inbound SAs as well,
with an SPI generated in userland?  That would seem like a violation of
the intention of the API as well (i.e. letting the kernel control the
local SPIs).

As the XFRM API basically mirrors PF_KEYv2 here, you can find more about
the two ways to install SAs in RFC 2367 (SADB_GETSPI/UPDATE vs. SADB_ADD).

Regards,
Tobias

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling"
  2026-02-18  8:41         ` Tobias Brunner
@ 2026-02-24 23:53           ` Nathan Harold
  0 siblings, 0 replies; 9+ messages in thread
From: Nathan Harold @ 2026-02-24 23:53 UTC (permalink / raw)
  To: Tobias Brunner
  Cc: Yan Yan, antony.antony, Steffen Klassert, paul, netdev,
	Herbert Xu, David S . Miller, Eric Dumazet, Jakub Kicinski,
	pabeni, horms, saakashkumar, akamluddin, greg

> That should still be allowed when using the intended APIs (i.e. ALLOCSPI
> for the inbound and NEWSA for the outbound SA).  ALLOCSPI might enforce
> a unique SPI without considering the address, as that's intended for
> local, inbound SAs, where the kernel has full control (looking at the
> the patch, it's certainly not ideal, as it goes through all installed
> SAs to find a duplicate and it prevents an inbound SPI that matches an
> existing outbound SPI - I guess that could be resolved by using separate
> tables for in- and outbound SAs).  But that must not prevent installing
> outbound SAs with the same SPI to another peer using NEWSA, which still
> uses a hash that includes the destination address (that must always be
> the case because peers are free to allocate whatever SPI they want).

Agreed that there are some unfortunate limitations with the current
patch. Keying off the inclusion of XFRM_SA_DIR would resolve the issue
you noted (conflating inbound and outbound SPIs) and function as an
opt-in for this enforcement. Whatever the mechanism though, the new
behavior should be opt-in rather than opt-out in order to maintain
backwards compatibility.

> In my opinion, you are using the API incorrectly...  I also
> don't think there are any benefits in that "consistent larval lifecycle"
> (if you found any, please let us know).

The Android architecture is multi-tenant and allows userspace apps to
establish SAs. At the time we designed it, this felt like the cleanest
way to facilitate leak-free resource management because the chain of
associations between kernel resources could be symmetrical (and
managing them was already quite complicated). Mea culpa (Nathan). But,
correctly or not, it has/had worked for many years.

> By the way, are you using the min/max option for inbound SAs as well,
> with an SPI generated in userland?  That would seem like a violation of
> the intention of the API as well (i.e. letting the kernel control the
> local SPIs).

We provide following two Android APIs for app developers:
https://developer.android.com/reference/android/net/IpSecManager#allocateSecurityParameterIndex(java.net.InetAddress)
https://developer.android.com/reference/android/net/IpSecManager#allocateSecurityParameterIndex(java.net.InetAddress,%20int)

Indeed, allocateSecurityParameterIndex is direction-agnostic; both
overloads are implemented internally by including the min and max
values in ALLOCSPI. For the variant where app developers provide a
specific SPI (which is also useful in testing), Android simply sets
both the min and max parameters to that exact value. Our understanding
of #xfrm_alloc_spi  is that min/max are required for ALLOCSPI, and
otherwise ENOENT will be returned.

Note that we also use the DADDR as a mandatory part of the tuple
because of the issue mentioned above: SPIs are only unique in
conjunction with a DADDR, regardless of direction, and accordingly,
that’s how Android is expecting the uniqueness requirement be
enforced. In this way, 5 duplicate SPIs can be used on 5 unique IP
addresses on the same machine; therefore, a strict "SPI only"
interpretation for ALLOCSPI (or SPI handling in general) is curious.
We feel that ALLOCSPI should really enforce the same uniqueness
requirements as the SAD.

Best,

Nathan and Yan




-Nathan


On Wed, Feb 18, 2026 at 12:42 AM Tobias Brunner <tobias@strongswan.org> wrote:
>
> Hi Yan,
>
> > For every inbound SA, we allocate SPIs before negotiation. For
> > outbound SAs, we allocate SPIs once requested by the peer. We only
> > require the (SPI, destination address) combo to be unique. Thus, we
> > may have an inbound and outbound SA sharing an SPI with different
> > destinations, or multiple outbound SAs to different peers sharing an
> > SPI.
>
> That should still be allowed when using the intended APIs (i.e. ALLOCSPI
> for the inbound and NEWSA for the outbound SA).  ALLOCSPI might enforce
> a unique SPI without considering the address, as that's intended for
> local, inbound SAs, where the kernel has full control (looking at the
> the patch, it's certainly not ideal, as it goes through all installed
> SAs to find a duplicate and it prevents an inbound SPI that matches an
> existing outbound SPI - I guess that could be resolved by using separate
> tables for in- and outbound SAs).  But that must not prevent installing
> outbound SAs with the same SPI to another peer using NEWSA, which still
> uses a hash that includes the destination address (that must always be
> the case because peers are free to allocate whatever SPI they want).
>
> >> If so, why would you use ALLOCSPI and not just install the outbound SA? Is it to avoid differences for in- and outbound SAs (ALLOCSPI+UPDSA vs. NEWSA)?"
> >
> > Exactly—it is primarily for code symmetry. By using ALLOCSPI + UPDSA
> > for both directions, we maintain a consistent larval lifecycle and
> > make it easier to maintain.
>
> In my opinion, you are using the API incorrectly.  ALLOCSPI is intended
> to allocate a free local SPI for an inbound SA.  That is, reserve it
> before and while the details of the SA are negotiated with the peer
> using IKE.  This step isn't necessary for outbound SAs and forcing such
> an allocation, after all the details are known, to the responder's SPI
> (which I assume you do via min/max option) doesn't feel right.  I also
> don't think there are any benefits in that "consistent larval lifecycle"
> (if you found any, please let us know).  And the difference between
> UPDSA and NEWSA is the nlmsg_type (there are some attributes that are
> different for in- and outbound SAs, especially if you set the direction
> in newer kernels, but that's the case regardless of the message type).
>
> By the way, are you using the min/max option for inbound SAs as well,
> with an SPI generated in userland?  That would seem like a violation of
> the intention of the API as well (i.e. letting the kernel control the
> local SPIs).
>
> As the XFRM API basically mirrors PF_KEYv2 here, you can find more about
> the two ways to install SAs in RFC 2367 (SADB_GETSPI/UPDATE vs. SADB_ADD).
>
> Regards,
> Tobias
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-02-24 23:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-28  8:43 [REGRESSION] Discussion on "xfrm: Duplicate SPI Handling" Yan Yan
2026-02-02  9:44 ` Steffen Klassert
2026-02-02 14:27 ` Antony Antony
2026-02-09  3:03   ` Yan Yan
2026-02-09 19:05     ` Nathan Harold
2026-02-10 10:17     ` Tobias Brunner
2026-02-18  4:36       ` Yan Yan
2026-02-18  8:41         ` Tobias Brunner
2026-02-24 23:53           ` Nathan Harold

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox