* Re: [PATCH 0/2] iio: adc: Initialize completions before requesting IRQs
From: Jonathan Cameron @ 2026-06-14 13:51 UTC
To: Maxwell Doose
Cc: David Lechner, Nuno Sá, Andy Shevchenko, Vladimir Zapolskiy,
Piotr Wojtaszczyk, Hartmut Knaack,
open list:IIO SUBSYSTEM AND DRIVERS,
moderated list:ARM/LPC32XX SOC SUPPORT, open list, Sangyun Kim,
Kyungwook Boo, Jaeyoung Chung
In-Reply-To: <20260613005812.160572-1-m32285159@gmail.com>
On Fri, 12 Jun 2026 19:58:09 -0500
Maxwell Doose <m32285159@gmail.com> wrote:
> Hi all,
>
> This short patch series fixes the issues raised by Jaeyoung Chung,
> Sangyun Kim, and Kyungwook Boo regarding init_completion() and spurious
> IRQs. The report is linked below [1], but I will also put it here
> inline:
>
> "lpc32xx_adc_probe() in drivers/iio/adc/lpc32xx_adc.c and
> spear_adc_probe() in drivers/iio/adc/spear_adc.c register their
> interrupt handler with devm_request_irq() before they initialize
> st->completion with init_completion(). If an interrupt arrives after
> devm_request_irq() and before init_completion(), the handler calls
> complete() on an uninitialized completion, causing a kernel panic.
>
> The probe path, in lpc32xx_adc_probe():
>
> iodev = devm_iio_device_alloc(&pdev->dev, sizeof(*st)); /* st kzalloc-zeroed */
> ...
> retval = devm_request_irq(&pdev->dev, irq, lpc32xx_adc_isr, 0,
> LPC32XXAD_NAME, st); /* register handler */
> ...
> init_completion(&st->completion); /* initialize completion */
>
> spear_adc_probe() has the same ordering: devm_request_irq() for
> spear_adc_isr() before init_completion(&st->completion).
>
> Both interrupt handlers, lpc32xx_adc_isr() and spear_adc_isr(), call
> complete():
>
> complete(&st->completion);
>
> If the device raises an interrupt before init_completion() runs,
> complete() acquires the uninitialized wait.lock and walks the zeroed
> task_list in swake_up_locked(). The zeroed task_list makes list_empty()
> return false, so swake_up_locked() dereferences a NULL list entry,
> triggering a KASAN wild-memory-access.
>
> Suggested fix: move init_completion(&st->completion) above
> devm_request_irq(), so the completion is valid before the handler can run.
>
> Reported-by: Sangyun Kim <sangyun.kim@snu.ac.kr>
> Reported-by: Kyungwook Boo <bookyungwook@gmail.com>"
>
> + Reported-by: Jaeyoung Chung <jjy600901@snu.ac.kr>
>
> Quick note, I ended up editing the report a little in the individual
> commits to match the driver we were fixing.
>
> [1] Link: https://lore.kernel.org/linux-iio/20260610115700.774689-1-jjy600901@snu.ac.kr/
Applied to the fixes-togreg branch of iio.git and marked for stable.
Note that I'll be rebasing on rc1 once available.
Thanks
Jonathan
>
> Maxwell Doose (2):
> iio: adc: lpc32xx: Initialize completion before requesting IRQ
> iio: adc: spear: Initialize completion before requesting IRQ
>
> drivers/iio/adc/lpc32xx_adc.c | 4 ++--
> drivers/iio/adc/spear_adc.c | 3 +--
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
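The ordering problem in the quoted report can be modelled in userspace. What follows is a minimal sketch under stated assumptions, not driver code: struct fake_completion, fake_request_irq(), fake_raise_irq() and fake_probe() are hypothetical stand-ins for the kernel's completion and devm_request_irq() machinery, and the initialized flag exists only to make the misuse observable instead of crashing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct completion. */
struct fake_completion {
	bool initialized;	/* set by fake_init_completion() */
	unsigned int done;	/* number of successful completions */
};

struct fake_adc {
	struct fake_completion completion;
};

typedef int (*fake_irq_handler_t)(void *dev_id);

/* A single registered handler is enough for this model. */
static fake_irq_handler_t registered_handler;
static void *registered_dev_id;

static void fake_init_completion(struct fake_completion *c)
{
	c->done = 0;
	c->initialized = true;
}

/* Returns 0 on success, -1 if the completion was never initialized. */
static int fake_complete(struct fake_completion *c)
{
	if (!c->initialized)
		return -1;	/* the real kernel would walk a zeroed list here */
	c->done++;
	return 0;
}

static int fake_adc_isr(void *dev_id)
{
	struct fake_adc *st = dev_id;

	return fake_complete(&st->completion);
}

/* Once this returns, the handler may run at any time. */
static void fake_request_irq(fake_irq_handler_t handler, void *dev_id)
{
	registered_handler = handler;
	registered_dev_id = dev_id;
}

/* Simulate the device raising its interrupt. */
static int fake_raise_irq(void)
{
	if (!registered_handler)
		return 0;
	return registered_handler(registered_dev_id);
}

/*
 * init_first == true models the fixed probe ordering, false models the
 * reported one. An interrupt is raised right after registration in
 * both cases; the handler's result is returned.
 */
static int fake_probe(struct fake_adc *st, bool init_first)
{
	int ret;

	if (init_first)
		fake_init_completion(&st->completion);

	fake_request_irq(fake_adc_isr, st);
	ret = fake_raise_irq();

	if (!init_first)
		fake_init_completion(&st->completion);

	return ret;
}
```

Calling fake_probe() with init_first set to true lets the early interrupt complete normally, while false makes the handler hit the uninitialized completion, which is the window the two patches close.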
* Re: [PATCH v1 2/4] KVM: arm64: Move kvm_define_hypevents.h to arch/arm64/kvm/
From: Fuad Tabba @ 2026-06-14 13:41 UTC
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, rostedt, linux-arm-kernel, kvmarm,
kernel-team, qerret
In-Reply-To: <20260612142245.1015744-3-vdonnefort@google.com>
Hi Vincent,
On Fri, 12 Jun 2026 at 15:22, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> kvm_define_hypevents.h is used to define the kernel-side structures for
> hypervisor events. It doesn't need to be used anywhere else than in
> hyp_trace.c.
>
> Move it to arch/arm64/kvm/
nit: rename and move
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
>
> diff --git a/arch/arm64/include/asm/kvm_define_hypevents.h b/arch/arm64/kvm/define_hypevents.h
> similarity index 100%
> rename from arch/arm64/include/asm/kvm_define_hypevents.h
> rename to arch/arm64/kvm/define_hypevents.h
> diff --git a/arch/arm64/kvm/hyp_trace.c b/arch/arm64/kvm/hyp_trace.c
> index c4b3ee552131..821bc93ecdd1 100644
> --- a/arch/arm64/kvm/hyp_trace.c
> +++ b/arch/arm64/kvm/hyp_trace.c
> @@ -391,7 +391,7 @@ static struct trace_remote_callbacks trace_remote_callbacks = {
>
> static const char *__hyp_enter_exit_reason_str(u8 reason);
>
> -#include <asm/kvm_define_hypevents.h>
> +#include "define_hypevents.h"
>
> static const char *__hyp_enter_exit_reason_str(u8 reason)
> {
> --
> 2.54.0.1136.gdb2ca164c4-goog
>
* Re: [PATCH v1 1/4] KVM: arm64: Allow early calls to pKVM host_share/unshare_hyp
From: Fuad Tabba @ 2026-06-14 13:39 UTC
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, rostedt, linux-arm-kernel, kvmarm,
kernel-team
In-Reply-To: <20260612142245.1015744-2-vdonnefort@google.com>
On Fri, 12 Jun 2026 at 15:22, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> The hypervisor tracing for pKVM relies on the __pkvm_host_share_hyp and
> __pkvm_host_unshare_hyp HVCs. In order to start tracing as early as
> possible, allow those two HVCs before the host is deprivileged.
>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
To my good friend Sashiko:
https://lore.kernel.org/all/20260529121755.2923500-1-tabba@google.com/
The hyp_trace_load() issue seems legit though.
As for this patch itself:
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 043495f7fc78..fb049c40d04f 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -89,12 +89,12 @@ enum __kvm_host_smccc_func {
> __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
> __KVM_HOST_SMCCC_FUNC___vgic_v5_save_apr,
> __KVM_HOST_SMCCC_FUNC___vgic_v5_restore_vmcr_apr,
> + __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
> + __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
>
> MARKER(__KVM_HOST_SMCCC_FUNC_PKVM_ONLY),
>
> /* Hypercalls that are available only when pKVM has finalised. */
> - __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
> - __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> __KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,
> __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 06db299c37a8..f0c52667cf52 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -721,9 +721,9 @@ static const hcall_t host_hcall[] = {
> HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
> HANDLE_FUNC(__vgic_v5_save_apr),
> HANDLE_FUNC(__vgic_v5_restore_vmcr_apr),
> -
> HANDLE_FUNC(__pkvm_host_share_hyp),
> HANDLE_FUNC(__pkvm_host_unshare_hyp),
> +
> HANDLE_FUNC(__pkvm_host_donate_guest),
> HANDLE_FUNC(__pkvm_host_share_guest),
> HANDLE_FUNC(__pkvm_host_unshare_guest),
> --
> 2.54.0.1136.gdb2ca164c4-goog
>
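The effect of moving the two enum entries above the marker can be illustrated with a small model. This is only a sketch: the enum below is a hypothetical, shortened list, and fake_hcall_allowed() assumes the dispatcher refuses IDs placed after the marker until pKVM has finalised, as the comment in the quoted hunk suggests. The real check in hyp-main.c may be structured differently.

```c
#include <stdbool.h>

/*
 * Hypothetical, shortened hypercall ID list. Only the position of each
 * ID relative to the marker matters for this model.
 */
enum fake_hcall {
	FAKE_HCALL_VGIC_SAVE,
	FAKE_HCALL_HOST_SHARE_HYP,	/* above the marker after the patch */
	FAKE_HCALL_HOST_UNSHARE_HYP,	/* above the marker after the patch */

	FAKE_HCALL_PKVM_ONLY_MARKER,	/* IDs after this need finalised pKVM */

	FAKE_HCALL_HOST_DONATE_GUEST,
	FAKE_HCALL_HOST_SHARE_GUEST,
	FAKE_HCALL_COUNT,
};

/* Assumed gating rule: anything past the marker waits for finalisation. */
static bool fake_hcall_allowed(enum fake_hcall id, bool pkvm_finalised)
{
	/* The marker itself and out-of-range values are never callable. */
	if (id == FAKE_HCALL_PKVM_ONLY_MARKER || id >= FAKE_HCALL_COUNT)
		return false;

	if (id > FAKE_HCALL_PKVM_ONLY_MARKER)
		return pkvm_finalised;

	return true;
}
```

Under this assumed rule, the share/unshare calls become usable before finalisation purely because of where they sit in the enum, which is why the patch needs no change beyond reordering.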
* Re: [PATCH 4/7] drivers: staging: media: sunxi: cedrus: add H616 variant
From: Jernej Škrabec @ 2026-06-14 13:36 UTC
To: wens
Cc: Maxime Ripard, Paul Kocialkowski, Mauro Carvalho Chehab,
Jernej Skrabec, Samuel Holland, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Greg Kroah-Hartman, linux-media, linux-staging,
devicetree, linux-sunxi, linux-arm-kernel, linux-kernel
In-Reply-To: <CAGb2v677Pi9s3eWC7aXh8j=+eJh7AV5Mucr7SDXMN9Lo8yA2nA@mail.gmail.com>
On Saturday, 13 June 2026 at 16:34:00 Central European Summer Time, Chen-Yu Tsai wrote:
> On Sat, Jun 13, 2026 at 6:33 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> >
> > On Saturday, 30 May 2026 at 18:43:05 Central European Summer Time, Chen-Yu Tsai wrote:
> > > On Tue, May 5, 2026 at 7:18 PM Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> > > >
> > > > On Tuesday, 5 May 2026 at 15:48:08 Central European Summer Time, Chen-Yu Tsai wrote:
> > > > > The Allwinner H616 SoC has a video engine hardware block like the one
> > > > > found on previous generations such as the H6. In addition to the
> > > > > currently supported features of the H6, it is also supposed to include
> > > >
> > > > Remove "supposed".
> > >
> > > I can't actually verify that, so "supposed" is accurate from my point of
> > > view.
> >
> > Isn't info from manual good enough?
>
> The manual says the SoC supports it. Same was said for the H6. Then
> we discovered that the VP9 decoder was a separate Hantro block.
>
> So again, *I* cannot claim in the commit message that the hardware
> block supports VP9 decoding, because I have not verified it.
>
> > In the interest of unblocking this, I would be fine with "supposed" too,
> > but manual and all my experiments show VP9 is supported.
>
> Please give an ack or reviewed-by with a comment at the end stating
> VP9 verified.
Well, just go with the original text.
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Best regards,
Jernej
>
>
> Thanks
> ChenYu
>
>
> > Best regards,
> > Jernej
> >
> > >
> > > ChenYu
> > >
> > > > > a VP9 decoder. However software support for this is currently missing
> > > > > and still needs to be reverse engineered from the vendor BSP.
> > > > >
> > > > > Add the compatible for the H616 variant, using the H6 variant data.
> > > > >
> > > > > Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
> > > >
> > > > With that:
> > > > Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
> > > >
> > > > Best regards,
> > > > Jernej
> > > >
> > > >
> > >
> >
> >
> >
> >
> >
>
* Re: [PATCH v10 4/6] dt-bindings: sun6i-a31-mipi-dphy: Add V3s SoC compatible entry
From: Paul Kocialkowski @ 2026-06-14 13:28 UTC
To: Krzysztof Kozlowski
Cc: linux-media, devicetree, linux-arm-kernel, linux-sunxi,
linux-kernel, Yong Deng, Mauro Carvalho Chehab, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Chen-Yu Tsai, Jernej Skrabec,
Samuel Holland, Michael Turquette, Stephen Boyd, Brian Masney,
Maxime Ripard
In-Reply-To: <20260613-nondescript-sociable-goat-aee13a@quoll>
Hi,
On Sat 13 Jun 26, 20:22, Krzysztof Kozlowski wrote:
> On Sat, Jun 13, 2026 at 05:26:53PM +0200, Paul Kocialkowski wrote:
> > The V3s/V3/S3 comes with a rx-only D-PHY paired with the MIPI CSI-2
> > controller. It is compatible with the D-PHY found on the A31.
> >
> > Add an entry with a new compatible and the A31 compatible as fallback.
> >
> > Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
> > ---
> > .../devicetree/bindings/phy/allwinner,sun6i-a31-mipi-dphy.yaml | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/phy/allwinner,sun6i-a31-mipi-dphy.yaml b/Documentation/devicetree/bindings/phy/allwinner,sun6i-a31-mipi-dphy.yaml
> > index 6a4fd4929959..3ca1a1c47032 100644
> > --- a/Documentation/devicetree/bindings/phy/allwinner,sun6i-a31-mipi-dphy.yaml
> > +++ b/Documentation/devicetree/bindings/phy/allwinner,sun6i-a31-mipi-dphy.yaml
> > @@ -21,6 +21,9 @@ properties:
> > - items:
> > - const: allwinner,sun50i-a64-mipi-dphy
> > - const: allwinner,sun6i-a31-mipi-dphy
> > + - items:
> > + - const: allwinner,sun8i-v3s-mipi-dphy
>
> So that's enum with previous first entry (50i-a64) - same fallback.
Ah sorry about that. Thanks for the review!
All the best,
Paul
--
Paul Kocialkowski,
Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/
Expert in multimedia, graphics and embedded hardware support with Linux.
* Re: [PATCH] ARM: dts: exynos: Add bluetooth support to manta
From: Krzysztof Kozlowski @ 2026-06-14 13:25 UTC
To: Lukas Timmermann, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Alim Akhtar
Cc: devicetree, linux-arm-kernel, linux-samsung-soc, linux-kernel,
Alexandre Marquet
In-Reply-To: <ai2OYGlCx0x4NUs8@archstation>
On 13/06/2026 19:11, Lukas Timmermann wrote:
> On Mon, Apr 27, 2026 at 03:49:34PM +0200, Krzysztof Kozlowski wrote:
>> On 08/04/2026 13:56, Lukas Timmermann wrote:
>>> Enable the bcm4330-bt device for manta boards on serial0.
>>> Also adds the necessary pin definitions and interrupt handling for
>>> wakeup.
>>>
>>> Signed-off-by: Lukas Timmermann <linux@timmermann.space>
>>> Co-developed-by: Alexandre Marquet <tb@a-marquet.fr>
>>> Signed-off-by: Alexandre Marquet <tb@a-marquet.fr>
>>
>> Incomplete/incorrect DCO chain. Please do not reorder tags. Git does
>> them correctly, so you HAD to change them manually.
>>
>> You send the patch or you apply the patch so you must commit with sign off.
> I developed the actual patch based on his findings. We both don't really
> care about who is mentioned first or anything.
>
> Sorry. Yes I rearranged stuff. So it should be:
>
> co-dev: alex
> sign-off: alex
> co-dev: me
> sign-off: me
>
> Correct?
>>
Yes
Best regards,
Krzysztof
* Re: [PATCH v1 0/4] trace_hyp_printk() for pKVM/nVHE hypervisor
From: Fuad Tabba @ 2026-06-14 12:57 UTC
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, rostedt, linux-arm-kernel, kvmarm,
kernel-team, qerret
In-Reply-To: <20260612142245.1015744-1-vdonnefort@google.com>
Hi Vincent,
On Fri, 12 Jun 2026 at 15:22, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> Hi all,
>
> This series adds a hypervisor event "hyp_printk" which enables
> developers to log pretty much anything into the hypervisor tracing
> buffer, just like the kernel function trace_printk().
>
> This enables rich logging from the hypervisor, while leaving all the
> string parsing burden to the kernel. This has been the main way of
> debugging pKVM in Android.
I tested the series on v7.1-rc7 under QEMU (cortex-a53 CPU, pKVM nVHE):
- Booted a host under pKVM with a non-protected kvmtool guest (npVM)
and a protected kvmtool guest (pVM).
- Functional test: added a temporary trace_hyp_printk() call site in
handle___kvm_vcpu_run() with 0-arg, 1-arg, and 2-arg calls. Mounted
tracefs, enabled the hyp_printk event, ran a kvmtool guest to trigger
vcpu_run, read the trace buffer. All expected entries appeared with
correctly formatted output.
One question: kvm_hyp_trace_init() returns early when
is_kernel_in_hyp_mode() is true. On VHE-capable hardware, pKVM uses
hVHE. So it seems that the entire hyp tracing subsystem (not just
hyp_printk) is non-functional in hVHE mode. Is hVHE support
intentionally deferred?
Cheers,
/fuad
>
> Even though not strictly related to trace_hyp_printk, I have added the
> following two patches:
>
> * KVM: arm64: Allow early calls to pKVM host_share/unshare_hyp
>
> This one mainly intends to support one of the new features I have
> posted here [1], which allows to enable tracing as early as
> possible. I have added it here to limit cross-posting.
>
> * KVM: arm64: Move kvm_define_hypevents.h to arch/arm64/kvm/
>
> This one is just a cleanup.
>
> [1] https://lore.kernel.org/all/20260605163825.1762953-1-vdonnefort@google.com/
>
> Vincent Donnefort (4):
> KVM: arm64: Allow early calls to pKVM host_share/unshare_hyp
> KVM: arm64: Move kvm_define_hypevents.h to arch/arm64/kvm/
> tracing/remotes: Add REMOTE_EVENT_CUSTOM_PRINTK() helper
> KVM: arm64: Add hyp_printk event to nVHE/pKVM hyp
>
> arch/arm64/include/asm/kvm_asm.h | 4 +-
> arch/arm64/include/asm/kvm_hypevents.h | 14 ++++
> arch/arm64/include/asm/kvm_hyptrace.h | 8 +++
> arch/arm64/kernel/image-vars.h | 1 +
> arch/arm64/kernel/vmlinux.lds.S | 4 ++
> .../define_hypevents.h} | 0
> .../kvm/hyp/include/nvhe/define_events.h | 2 -
> arch/arm64/kvm/hyp/include/nvhe/trace.h | 65 +++++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/events.c | 6 ++
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 2 +-
> arch/arm64/kvm/hyp_trace.c | 60 ++++++++++++++++-
> include/trace/define_remote_events.h | 19 +++++-
> 12 files changed, 176 insertions(+), 9 deletions(-)
> rename arch/arm64/{include/asm/kvm_define_hypevents.h => kvm/define_hypevents.h} (100%)
>
>
> base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
> --
> 2.54.0.1136.gdb2ca164c4-goog
>
* Re: [PATCH RFC 4/4] Documentation/kernel-parameters: add/update printk_delay/boot_delay
From: Andrew Murray @ 2026-06-14 12:55 UTC
To: Petr Mladek
Cc: Jonathan Corbet, Shuah Khan, Russell King, Florian Fainelli,
Broadcom internal kernel review list, Ray Jui, Scott Branden,
Steven Rostedt, John Ogness, Sergey Senozhatsky, Andrew Morton,
Sebastian Andrzej Siewior, Clark Williams, Randy Dunlap,
Linus Torvalds, linux-doc, linux-kernel, linux-arm-kernel,
linux-rpi-kernel, linux-rt-devel
In-Reply-To: <aibfFQpK0Se-SiaT@pathway.suse.cz>
On Mon, 8 Jun 2026 at 16:26, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2026-06-01 00:17:40, Andrew Murray wrote:
> > boot_delay has been deprecated in favour of an extended printk_delay,
> > let's update kernel-parameters to reflect the addition of printk_delay
> > and the deprecation of boot_delay.
> >
> > Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
>
> LGTM:
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
>
> Best Regards,
> Petr
Thanks for the reviews!
Andrew Murray
* Re: [PATCH RFC 3/4] printk: nbcon: move printk_delay to console emiting code
From: Andrew Murray @ 2026-06-14 12:55 UTC
To: Petr Mladek
Cc: Jonathan Corbet, Shuah Khan, Russell King, Florian Fainelli,
Broadcom internal kernel review list, Ray Jui, Scott Branden,
Steven Rostedt, John Ogness, Sergey Senozhatsky, Andrew Morton,
Sebastian Andrzej Siewior, Clark Williams, Randy Dunlap,
Linus Torvalds, linux-doc, linux-kernel, linux-arm-kernel,
linux-rpi-kernel, linux-rt-devel
In-Reply-To: <aibe12WcrLxVWTez@pathway.suse.cz>
On Mon, 8 Jun 2026 at 16:25, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2026-06-01 00:17:39, Andrew Murray wrote:
> > The printk_delay and boot_delay features are helpful for debugging
> > as kernel output can be slowed down during boot allowing messages to
> > be seen before scrolling off the screen, or to correlate timing between
> > some physical event and console output.
> >
> > However, since the introduction of nbcon and the legacy printer thread
> > for PREEMPT_RT kernels, printk records are now emitted to the console
> > asynchronously to the caller of printk. Thus, any printk delay added by
> > boot_delay/printk_delay continues to slow down the calling process but
> > may not have any impact on the rate at which records are emitted to the
> > console.
> >
> > Let's address this by moving the printk delay from the calling code
> > to the console emitting code instead. Whilst this ensures that delays
> > are still observed (especially for slower consoles), it doesn't improve
> > the use-case of using boot_delay/printk_delay to correlate timings
> > between physical events and console output.
> >
> > --- a/include/linux/printk.h
> > +++ b/include/linux/printk.h
>
> The declaration is needed just inside kernel/printk/ directory.
> It should better be done via kernel/printk/internal.h
OK.
>
> > @@ -209,6 +209,7 @@ extern bool nbcon_device_try_acquire(struct console *con);
> > extern void nbcon_device_release(struct console *con);
> > void nbcon_atomic_flush_unsafe(void);
> > bool pr_flush(int timeout_ms, bool reset_on_progress);
> > +void printk_delay(bool use_atomic);
> > #else
> > static inline __printf(1, 0)
> > int vprintk(const char *s, va_list args)
> > @@ -326,6 +327,9 @@ static inline bool pr_flush(int timeout_ms, bool reset_on_progress)
> > {
> > return true;
> > }
> > +static inline void printk_delay(bool use_atomic)
> > +{
> > +}
> >
> > #endif
> >
> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> > index d7044a7a214bdd4537a5e20d876d99bc3ffe8b3a..a507a2fed5bf4366e24330f763b842a698ecf6f7 100644
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -1267,11 +1267,16 @@ static int nbcon_kthread_func(void *__console)
> >
> > con_flags = console_srcu_read_flags(con);
> >
> > + wctxt.len = 0;
> > +
> > if (console_is_usable(con, con_flags, false))
> > backlog = nbcon_emit_one(&wctxt, false);
> >
> > console_srcu_read_unlock(cookie);
> >
> > + if (backlog && wctxt.len > 0)
>
> Heh, this is tricky. It might probably work but it is not guaranteed
> by design.
>
> The "backlog" name is a bit misleading. The value is basically
> wctxt.ctxt.backlog. The real meaning is that printk_get_next_message()
> was able to read a message. It means that there _was_ a backlog.
> But it is not clear whether there are still pending messages or not.
Yes I found that to be the case (see my notes in the cover letter) -
backlog is only true if a record was successfully retrieved, though
that record may be one that is suppressed.
>
> Also it is not clear whether the message was pushed to the
> console or not. It might have been suppressed in which case
> (wctxt.len == 0). But it might also be emitted only partially
> when a higher priority context took over the console context
> ownership.
You say it might probably work but isn't guaranteed by design, I'm
struggling to see what I've missed...
As far as I could tell, nbcon_emit_next_record() only returns true when
a record has been printed and it still has the context. The only
exception is where pmsg.outbuf_len is zero (suppressed), in which case
it may also return true. Thus if (nbcon_emit_next_record() &&
pmsg.outbuf_len) then we can be sure a record was printed. To apply
this test from the various callers:
For nbcon_emit_one(): this returns ctxt->backlog if
nbcon_emit_next_record() returned true. But backlog is *always* true
when nbcon_emit_next_record() returns true. Thus the test (backlog &&
wctxt.len) is equivalent to (nbcon_emit_next_record() &&
pmsg.outbuf_len).
So I still think this implementation is valid.
>
> I would prefer to explicitly set some flag when
> nbcon_emit_next_record() really called con->write*().
> See below.
>
> > + printk_delay(false);
> > +
> > cond_resched();
> >
> > } while (backlog);
> > @@ -1525,6 +1530,8 @@ bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
> > }
> >
> > progress = nbcon_emit_one(&wctxt, use_atomic);
> > + if (progress && wctxt.len > 0)
>
> Same here.
>
> > + printk_delay(use_atomic);
> >
> > if (use_atomic) {
> > start_critical_timings();
> > @@ -1584,6 +1591,8 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> > if (!nbcon_context_try_acquire(ctxt, false))
> > return -EPERM;
> >
> > + wctxt.len = 0;
> > +
> > /*
> > * nbcon_emit_next_record() returns false when
> > * the console was handed over or taken over.
> > @@ -1595,7 +1604,9 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> > nbcon_context_release(ctxt);
> > }
> >
> > - if (!ctxt->backlog) {
> > + if (ctxt->backlog && wctxt.len > 0) {
> > + printk_delay(true);
> > + } else {
>
> This changes the semantics. The original code called this when
> no message was read. The new code would call this path also
> when the output was suppressed. It would probably work.
> But still.
Ah, good spot! I missed that.
>
> > /* Are there reserved but not yet finalized records? */
> > if (nbcon_seq_read(con) < stop_seq)
> > err = -ENOENT;
>
>
> As mentioned above, I would add a flag which would be set when
> con->write*() was called.
I'm not sure why I tried to avoid adding members to nbcon_context, but
I prefer your solution, it isn't so fragile, and makes it easier to
understand. I'll update for my next revision.
>
> It modifies the type of unsafe_takeover in struct nbcon_write_context.
> But it actually makes it more compatible with struct nbcon_state.
What is the intent of this change (bool to unsigned char)?
>
> My proposal (on top of this patch):
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 5520e4477ad7..5a86942e55ef 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -290,6 +290,7 @@ struct nbcon_context {
> * @outbuf: Pointer to the text buffer for output
> * @len: Length to write
> * @unsafe_takeover: If a hostile takeover in an unsafe state has occurred
> + * @emitted: The write context tried to emit the message. Might be incomplete.
> * @cpu: CPU on which the message was generated
> * @pid: PID of the task that generated the message
> * @comm: Name of the task that generated the message
> @@ -298,7 +299,8 @@ struct nbcon_write_context {
> struct nbcon_context __private ctxt;
> char *outbuf;
> unsigned int len;
> - bool unsafe_takeover;
> + unsigned char unsafe_takeover : 1;
> + unsigned char emitted : 1;
> #ifdef CONFIG_PRINTK_EXECUTION_CTX
> int cpu;
> pid_t pid;
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index a507a2fed5bf..060534becefc 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1069,6 +1069,9 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
> else
> con->write_thread(con, wctxt);
>
> + /* Tried to emit something. Might be incomplete. */
> + wctxt->emitted = 1;
> +
> if (!wctxt->outbuf) {
> /*
> * Ownership was lost and reacquired by the driver. Handle it
> @@ -1267,14 +1270,14 @@ static int nbcon_kthread_func(void *__console)
>
> con_flags = console_srcu_read_flags(con);
>
> - wctxt.len = 0;
> + wctxt.emitted = 0;
>
> if (console_is_usable(con, con_flags, false))
> backlog = nbcon_emit_one(&wctxt, false);
>
> console_srcu_read_unlock(cookie);
>
> - if (backlog && wctxt.len > 0)
> + if (wctxt.emitted)
> printk_delay(false);
>
> cond_resched();
> @@ -1530,7 +1533,7 @@ bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
> }
>
> progress = nbcon_emit_one(&wctxt, use_atomic);
> - if (progress && wctxt.len > 0)
> + if (wctxt.emitted)
> printk_delay(use_atomic);
>
> if (use_atomic) {
> @@ -1591,7 +1594,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> if (!nbcon_context_try_acquire(ctxt, false))
> return -EPERM;
>
> - wctxt.len = 0;
> + wctxt.emitted = 0;
>
> /*
> * nbcon_emit_next_record() returns false when
> @@ -1604,9 +1607,10 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> nbcon_context_release(ctxt);
> }
>
> - if (ctxt->backlog && wctxt.len > 0) {
> + if (wctxt.emitted)
> printk_delay(true);
> - } else {
> +
> + if (!ctxt->backlog) {
> /* Are there reserved but not yet finalized records? */
> if (nbcon_seq_read(con) < stop_seq)
> err = -ENOENT;
Thanks,
Andrew Murray
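The flag-based approach discussed above can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the nbcon code: struct fake_write_ctx, fake_emit_next_record(), fake_con_write() and fake_printk_delay() are hypothetical stand-ins, a NULL message models "no record to read", and a zero length models a suppressed record.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down version of the write context. */
struct fake_write_ctx {
	const char *outbuf;
	unsigned int len;
	unsigned char unsafe_takeover : 1;
	unsigned char emitted : 1;	/* a write callback was invoked */
};

static unsigned int fake_delay_calls;	/* simulated printk_delay() calls */
static unsigned int fake_write_calls;	/* simulated con->write*() calls */

static void fake_printk_delay(void)
{
	fake_delay_calls++;
}

static void fake_con_write(struct fake_write_ctx *wctxt)
{
	(void)wctxt;
	fake_write_calls++;
}

/*
 * Returns true when a record was read, even if it was suppressed.
 * The emitted flag is set only when the write callback actually ran.
 */
static bool fake_emit_next_record(struct fake_write_ctx *wctxt,
				  const char *msg, unsigned int len)
{
	if (!msg)
		return false;	/* nothing to read */

	wctxt->outbuf = msg;
	wctxt->len = len;

	if (len == 0)
		return true;	/* suppressed: consumed but not written */

	fake_con_write(wctxt);
	wctxt->emitted = 1;	/* tried to emit; might be incomplete */
	return true;
}

/* Caller pattern: clear the flag, emit, delay only if something went out. */
static bool fake_emit_and_delay(const char *msg, unsigned int len)
{
	struct fake_write_ctx wctxt = { 0 };
	bool backlog;

	wctxt.emitted = 0;
	backlog = fake_emit_next_record(&wctxt, msg, len);

	if (wctxt.emitted)
		fake_printk_delay();

	return backlog;
}
```

In this model the delay decision no longer depends on how the return value and len happen to combine; a suppressed record still reports a backlog but never triggers the delay.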
* Re: [PATCH v2] Input: apple_z2 - bound the device-reported finger count
From: Joshua Peisach @ 2026-06-14 12:24 UTC
To: hexlabsecurity, Sasha Finkelstein, Dmitry Torokhov
Cc: linux-kernel, Janne Grunau, linux-arm-kernel, linux-input,
Sven Peter, asahi, Neal Gompa
In-Reply-To: <20260613-b4-disp-4ebcbd68-v2-1-0161acfbd688@proton.me>
On Sat Jun 13, 2026 at 9:22 PM EDT, Bryam Vargas via B4 Relay wrote:
> From: Bryam Vargas <hexlabsecurity@proton.me>
>
> apple_z2_parse_touches() takes the finger count from the touch
> controller's report and loops over that many fixed-size finger records
> without ever checking the count against the length of the report:
>
> nfingers = msg[APPLE_Z2_NUM_FINGERS_OFFSET];
> fingers = (struct apple_z2_finger *)(msg + APPLE_Z2_FINGERS_OFFSET);
> for (i = 0; i < nfingers; i++)
> /* read fingers[i] ... */
>
> msg points into the fixed 4000-byte z2->rx_buf and nfingers is a single
> device-supplied byte, so it can be as large as 255. A malicious,
> malfunctioning or counterfeit controller (or an interposer on the SPI
> bus) can report a large finger count in a short packet, making the loop
> read up to 255 * sizeof(struct apple_z2_finger) bytes starting 24 bytes
> into msg -- far past the 4000-byte buffer. This is a controller-driven
> heap out-of-bounds read, and the finger fields that are read (position,
> pressure, touch and tool dimensions) are forwarded to userspace as input
> events, leaking adjacent kernel memory.
>
> Bound the device-reported count to the number of finger records the
> report actually carries.
>
> Reported-by: sashiko-bot@kernel.org
> Closes: https://lore.kernel.org/all/20260613215358.329921F000E9@smtp.kernel.org/
> Fixes: 471a92f8a21a ("Input: apple_z2 - add a driver for Apple Z2 touchscreens")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
> ---
> Changes since v1 [1]:
> - Keep the early-return at NUM_FINGERS_OFFSET instead of moving it to
> FINGERS_OFFSET, so a short zero-finger ("all lifted") report still
> reaches input_mt_sync_frame()/input_sync() and does not leave touches
> stuck on the screen (caught by the sashiko-bot review of v1 [2]). A
> packet too short to hold even one finger record clamps nfingers to 0
> instead of being dropped.
>
> [1] https://lore.kernel.org/all/20260613-b4-disp-f0148c89-v1-1-868a48b2a187@proton.me/
> [2] https://lore.kernel.org/all/20260614000725.6B8D11F000E9@smtp.kernel.org/
>
> Reachable on every touch interrupt once the controller is booted
> (apple_z2_irq -> apple_z2_read_packet -> apple_z2_parse_touches).
>
> nfingers is bounded here by the message length; the message length is in
> turn bounded by the companion "Input: apple_z2 - bound the device-reported
> packet length" change (in flight), which caps the device-reported pkt_len
> to the 4000-byte receive buffer. The two together close the device-driven
> out-of-bounds accesses in apple_z2_parse_touches() / apple_z2_read_packet().
>
> Verified with a faithful in-kernel KASAN litmus (the verbatim 4000-byte
> buffer, the struct apple_z2_finger layout and the parse loop),
> CONFIG_KASAN=y on x86_64:
>
> Arm A, nfingers = 255 in a short packet (msg_len 19):
> BUG: KASAN: slab-out-of-bounds in apple_z2_parse_touches
> Read of size 2 ... 1 bytes to the right of allocated 4000-byte region
> ... cache kmalloc-4k of size 4096
> Arm B, with this patch: a zero-finger report (msg_len 19) reaches the
> sync; a 255-finger claim is clamped to what the packet holds; clean.
> Arm C, benign device (3 fingers): clean
>
> AddressSanitizer (x86_64 and i386): heap-buffer-overflow READ, both ABIs.
>
> Reproducer and full logs available on request.
> ---
> drivers/input/touchscreen/apple_z2.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/input/touchscreen/apple_z2.c b/drivers/input/touchscreen/apple_z2.c
> index 271ababf0ad5..39ade83ef0de 100644
> --- a/drivers/input/touchscreen/apple_z2.c
> +++ b/drivers/input/touchscreen/apple_z2.c
> @@ -92,6 +92,12 @@ static void apple_z2_parse_touches(struct apple_z2 *z2,
> return;
> nfingers = msg[APPLE_Z2_NUM_FINGERS_OFFSET];
> fingers = (struct apple_z2_finger *)(msg + APPLE_Z2_FINGERS_OFFSET);
> + /* a malicious controller can claim more fingers than the packet holds */
> + if (msg_len < APPLE_Z2_FINGERS_OFFSET)
> + nfingers = 0;
> + else
> + nfingers = min_t(int, nfingers,
> + (msg_len - APPLE_Z2_FINGERS_OFFSET) / sizeof(*fingers));
> for (i = 0; i < nfingers; i++) {
> slot = input_mt_get_slot_by_key(z2->input_dev, fingers[i].finger);
> if (slot < 0) {
>
> ---
> base-commit: 8e65320d91cdc3b241d4b94855c88459b91abf66
> change-id: 20260613-b4-disp-4ebcbd68-ed8a28672ccc
>
> Best regards,
Reviewed-by: Joshua Peisach <jpeisach@ubuntu.com>
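The clamping rule from the quoted hunk can be checked in isolation. The sketch below is a userspace model only: FAKE_FINGERS_OFFSET uses the 24-byte offset mentioned in the commit message, while FAKE_FINGER_SIZE is a hypothetical record size chosen for illustration, not the real sizeof(struct apple_z2_finger).

```c
#include <stddef.h>

#define FAKE_FINGERS_OFFSET	24	/* offset quoted in the commit message */
#define FAKE_FINGER_SIZE	30	/* hypothetical record size */

/*
 * Bound the device-claimed finger count by the number of whole finger
 * records that fit in the received message.
 */
static size_t fake_clamp_nfingers(size_t msg_len, unsigned char claimed)
{
	size_t max_records;

	/* Too short to hold any finger record: treat as zero fingers. */
	if (msg_len < FAKE_FINGERS_OFFSET)
		return 0;

	max_records = (msg_len - FAKE_FINGERS_OFFSET) / FAKE_FINGER_SIZE;

	return claimed < max_records ? claimed : max_records;
}
```

With these assumed sizes, a 19-byte report claiming 255 fingers clamps to zero rather than being dropped, which matches the v2 behaviour of still reaching the frame sync for an "all lifted" report.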
* Re: [PATCH] ARM: disable broken eBPF JIT on the Risc PC
From: David Laight @ 2026-06-14 11:58 UTC
To: Ethan Nelson-Moore
Cc: linux-arm-kernel, linux-kernel, stable, Russell King,
Russell King (Oracle), Arnd Bergmann, Linus Walleij, Kees Cook,
Nathan Chancellor, Thomas Weissschuh, Peter Zijlstra,
Shubham Bansal, David S. Miller
In-Reply-To: <20260518014920.135011-1-enelsonmoore@gmail.com>
On Sun, 17 May 2026 18:49:17 -0700
Ethan Nelson-Moore <enelsonmoore@gmail.com> wrote:
> The eBPF JIT unconditionally generates ldrh/strh instructions, which do
> not function correctly on the Risc PC because its bus is unable to
> signal half-word accesses. Work around this issue by disabling the eBPF
> JIT when building for ARMv3 (the Risc PC is the only currently
> supported ARMv3 machine).
Isn't it more the case that the ldrh/strh instructions were only added for armv4?
Whether the bus supports 16-bit accesses is an entirely different matter.
I'm guessing that WRITE_ONCE() gets implemented as two 8-bit writes and
the code 'just hopes' that an ISR won't care and won't do an update.
David
>
> Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
> ---
> arch/arm/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 1155c78bb6aa..8185d013e5d1 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -98,7 +98,7 @@ config ARM
> select HAVE_ARCH_TRACEHOOK
> select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARM_LPAE
> select HAVE_ARM_SMCCC if CPU_V7
> - select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
> + select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32 && !CPU_32v3
> select HAVE_CONTEXT_TRACKING_USER
> select HAVE_C_RECORDMCOUNT
> select HAVE_BUILDTIME_MCOUNT_SORT
^ permalink raw reply
* Re: [PATCH RFC 2/4] printk: deprecate boot_delay in favour of printk_delay
From: Andrew Murray @ 2026-06-14 11:45 UTC (permalink / raw)
To: Petr Mladek
Cc: Jonathan Corbet, Shuah Khan, Russell King, Florian Fainelli,
Broadcom internal kernel review list, Ray Jui, Scott Branden,
Steven Rostedt, John Ogness, Sergey Senozhatsky, Andrew Morton,
Sebastian Andrzej Siewior, Clark Williams, Randy Dunlap,
Linus Torvalds, linux-doc, linux-kernel, linux-arm-kernel,
linux-rpi-kernel, linux-rt-devel
In-Reply-To: <aibMr16r55xE26rU@pathway.suse.cz>
On Mon, 8 Jun 2026 at 15:07, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2026-06-01 00:17:38, Andrew Murray wrote:
> > The boot_delay (BOOT_PRINTK_DELAY) kernel parameter and printk_delay sysctl
> > are two distinct mechanisms for providing similar functionality which add a
> > delay prior to each printed printk message.
> >
> > boot_delay provides a kernel parameter for delaying printk output from
> > kernel start through to boot (SYSTEM_RUNNING), whereas printk_delay is
> > configurable only via sysctl and thus is only used post boot.
> >
> > Let's deprecate the boot_delay feature in favour of printk_delay. In order
> > to preserve functionality, we'll also extend printk_delay such that it can
> > additionally be configured via a kernel parameter.
>
> I would make it clear and say: "via an early kernel parameter".
>
> Note that there are also kernel parameters which can be modified at runtime
> via /sys/module/kernel/parameters/<parameter>
OK thanks, I will update.
>
> Also I would make it clear that this changes the behavior, for
> example:
>
> <proposal>
> Behavior change:
>
> The delay enabled by both "boot_delay" and "printk_delay" continues
> working even in SYSTEM_RUNNING state. It must be explicitly stopped
> by setting printk_delay=0 via sysctl.
>
> The delay is skipped when the message is suppressed in all system
> states. It used to be skipped only for boot_delay.
> </proposal>
Yes, I'm happy to make that clearer.
>
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1339,11 +1327,34 @@ static void boot_delay_msec(int level)
> > }
> > }
> > #else
> > -static inline void boot_delay_msec(int level)
> > +static inline void __init printk_delay_calculate(void)
> > +{
> > +}
> > +
> > +static inline void early_boot_delay_msec(void)
> > {
>
> It would be nice to print a warning that the early boot delay
> does not work, something like:
>
> pr_warn_once("Early boot delay does not work without CONFIG_GENERIC_CALIBRATE_DELAY enabled.\n");
>
> > }
> > #endif
> >
> > +static int __init printk_delay_setup(char *str)
> > +{
> > + get_option(&str, &printk_delay_msec);
> > + if (printk_delay_msec > 10 * 1000)
> > + printk_delay_msec = 0;
>
> Sashiko AI warns that this code accepts negative values.
> It might cause long delays, see
> https://sashiko.dev/#/patchset/20260601-deprecate_boot_delay-v1-0-c34c187142a6%40thegoodpenguin.co.uk
>
> The problem has already been there even before. But it would be nice
> to fix it.
Thanks for pointing out Sashiko, I hadn't seen its review on my
patches. Are authors expected to get emails from it, as I didn't?
In any case, it's a good spot, so I'll address it.
>
> > +
> > + printk_delay_calculate();
> > +
> > + return 0;
> > +}
> > +early_param("printk_delay", printk_delay_setup);
> > +
> > +static int __init boot_delay_setup(char *str)
> > +{
> > + pr_warn("boot_delay will soon be deprecated, please use printk_delay instead");
> > + return printk_delay_setup(str);
> > +}
> > +early_param("boot_delay", boot_delay_setup);
> > +
> > static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
> > module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
>
> Otherwise, it looks good to me.
>
> Best Regards,
> Petr
Thanks,
Andrew Murray
^ permalink raw reply
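The negative-value problem raised in the review above can be illustrated with a small userspace sketch. It is not the kernel code: the patch uses get_option(), whereas this sketch substitutes strtol(), and `parse_delay_msec()` and `MAX_DELAY_MSEC` are hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Upper bound taken from the "10 * 1000" check in the quoted patch. */
#define MAX_DELAY_MSEC (10 * 1000)

/*
 * Parse a delay in milliseconds. Input with no leading number, a negative
 * value, or a value above MAX_DELAY_MSEC yields 0, i.e. no delay.
 */
static int parse_delay_msec(const char *str)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(str, &end, 10);
	if (end == str || errno == ERANGE)
		return 0;	/* no digits consumed, or out of range for long */
	if (val < 0 || val > MAX_DELAY_MSEC)
		return 0;	/* reject negative and oversized delays */
	return (int)val;
}
```

Checking the lower bound as well as the upper bound is the point here: a check of the form `val > MAX` alone lets negative values through.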
* Re: [PATCH RFC 1/4] printk: remove BOOT_PRINTK_DELAY config option
From: Andrew Murray @ 2026-06-14 11:41 UTC (permalink / raw)
To: Petr Mladek
Cc: Jonathan Corbet, Shuah Khan, Russell King, Florian Fainelli,
Broadcom internal kernel review list, Ray Jui, Scott Branden,
Steven Rostedt, John Ogness, Sergey Senozhatsky, Andrew Morton,
Sebastian Andrzej Siewior, Clark Williams, Randy Dunlap,
Linus Torvalds, linux-doc, linux-kernel, linux-arm-kernel,
linux-rpi-kernel, linux-rt-devel
In-Reply-To: <aibCBGjVk4yqtYyT@pathway.suse.cz>
On Mon, 8 Jun 2026 at 14:22, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2026-06-01 00:17:37, Andrew Murray wrote:
> > The boot_delay (BOOT_PRINTK_DELAY) kernel parameter and printk_delay sysctl
> > are two distinct mechanisms for providing similar functionality which add a
> > delay prior to each printed printk message.
> >
> > In preparation of combining them into a single configurable feature, let's
> > first remove the kconfig option BOOT_PRINTK_DELAY.
> >
> > Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
>
> The option made it possible to reduce the vmlinux size a bit when people
> were not interested in the functionality. I am not sure if it is worth
> it though. I am personally fine with this change.
I hadn't considered that need.
I'm happy to add this back in, but it would only make sense if this
option covered both boot_delay and printk_delay. That would change the
meaning of this existing Kconfig option and would also allow the
printk_delay sysctl to be removed. I'm not sure whether userspace assumes
the sysctl will always be there (probably not).
I'll leave this as is, unless there are objections.
Thanks,
Andrew Murray
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
>
> Best Regards,
> Petr
^ permalink raw reply
* Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
From: Will Deacon @ 2026-06-14 11:33 UTC (permalink / raw)
To: Linu Cherian
Cc: Catalin Marinas, Ryan Roberts, Kevin Brodsky, Anshuman Khandual,
Yang Shi, Mark Rutland, Huang Ying, linux-arm-kernel,
linux-kernel
In-Reply-To: <ai6KzFgfMAxqplcr@willie-the-truck>
On Sun, Jun 14, 2026 at 12:04:44PM +0100, Will Deacon wrote:
> Can you simplify the 'if' condition here?
>
> if (active == ACTIVE_CPU_NONE) {
> if (!try_cmpxchg_relaxed(...))
> WRITE_ONCE(...);
>
> dsb(ishst);
> }
>
> (as an aside, maybe we should implement arch_try_cmpxchg{,_relaxed} so
> we could drop the READ_ONCE() here as well?)
Mulling this over a little more, we probably can't drop the READ_ONCE()
even if we optimised our try_cmpxchg() implementation, as it would
prevent us from eliding the DSB on the fast path.
The rest of my comments (including the refactoring above) stand, however.
Will
^ permalink raw reply
* Re: [PATCH v2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
From: Will Deacon @ 2026-06-14 11:04 UTC (permalink / raw)
To: Linu Cherian
Cc: Catalin Marinas, Ryan Roberts, Kevin Brodsky, Anshuman Khandual,
Yang Shi, Mark Rutland, Huang Ying, linux-arm-kernel,
linux-kernel
In-Reply-To: <20260523134710.3827956-1-linu.cherian@arm.com>
On Sat, May 23, 2026 at 07:17:10PM +0530, Linu Cherian wrote:
> From: Ryan Roberts <ryan.roberts@arm.com>
>
> There are 3 variants of tlb flush that invalidate user mappings:
> flush_tlb_mm(), flush_tlb_page() and __flush_tlb_range(). All of these
> would previously unconditionally broadcast their tlbis to all cpus in
> the inner shareable domain.
>
> But this is a waste of effort if we can prove that the mm for which we
> are flushing the mappings has only ever been active on the local cpu. In
> that case, it is safe to avoid the broadcast and simply invalidate the
> current cpu.
>
> So let's track in mm_context_t::active_cpu whether the mm has never been
> active on any cpu, has been active on more than 1 cpu, or has been
> active on precisely 1 cpu - and in that case, which one. We update this
> when switching context, being careful to ensure that it gets updated
> *before* installing the mm's pgtables. On the reader side, we ensure we
> read *after* the previous write(s) to the pgtable(s) that necessitated
> the tlb flush have completed. This guarantees that if a cpu that is
> doing a tlb flush sees its own id in active_cpu, then the old pgtable
> entry cannot have been seen by any other cpu and we can flush only the
> local cpu.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Tested-by: Huang Ying <ying.huang@linux.alibaba.com>
> [linu.cherian@arm.com: Adapted for v7.1 flush tlb API changes]
> Signed-off-by: Linu Cherian <linu.cherian@arm.com>
> ---
> Changelog from RFC v1:
> - Adapted for v7.1 flush tlb API changes
> No changes in core logic
> - Collected Rb and Tb tags
> - lat_mmap benchmark showed dsb(ishst) performs better than dsb(ish),
> hence retained dsb(ishst) in flush_tlb_user_pre
>
>
> Testing with 7.1-rc4 :
> +-----------------------+---------------------------------------------------+-------------+
> | Benchmark | Result Class | Improvement|
> +=======================+===================================================+=============+
> | perf/syscall | fork (ops/sec) | (I) 3.25% |
> +-----------------------+---------------------------------------------------+-------------+
> | pts/memtier-benchmark | Protocol: Redis Clients: 100 Ratio: 1:5 (Ops/sec) | (I) 2.70% |
> | | Protocol: Redis Clients: 100 Ratio: 5:1 (Ops/sec) | (I) 2.13% |
> +-----------------------+---------------------------------------------------+-------------+
I think we need a much more comprehensive set of benchmarks before we can
begin to consider a change like this.
> arch/arm64/include/asm/mmu.h | 12 +++
> arch/arm64/include/asm/mmu_context.h | 2 +
> arch/arm64/include/asm/tlbflush.h | 127 +++++++++++++++++++++------
> arch/arm64/mm/context.c | 30 ++++++-
> 4 files changed, 141 insertions(+), 30 deletions(-)
Doesn't this break BTM/SVM with the SMMU? I think that's a non-starter
even if you can provide some more compelling numbers.
> +static inline bool flush_tlb_user_pre(struct mm_struct *mm, tlbf_t flags)
> +{
> + unsigned int self, active;
> + bool local;
> +
> + migrate_disable();
> +
> + if (flags & TLBF_NOBROADCAST) {
> + dsb(nshst);
> + return true;
> + }
Why does the NOBROADCAST case need migration disabled? It didn't before...
> +
> + self = smp_processor_id();
> +
> + /*
> + * The load of mm->context.active_cpu must not be reordered before the
> + * store to the pgtable that necessitated this flush. This ensures that
> + * if the value read is our cpu id, then no other cpu can have seen the
> + * old pgtable value and therefore does not need this old value to be
> + * flushed from its tlb. But we don't want to upgrade the dsb(ishst),
> + * needed to make the pgtable updates visible to the walker, to a
> + * dsb(ish) by default. So speculatively load without a barrier and if
> + * it indicates our cpu id, then upgrade the barrier and re-load.
> + */
> + active = READ_ONCE(mm->context.active_cpu);
> + if (active == self) {
> + dsb(ish);
> + active = READ_ONCE(mm->context.active_cpu);
> + } else {
> + dsb(ishst);
> + }
Why can't you just do:
dsb(ishst);
active = READ_ONCE(mm->context.active_cpu);
?
> +
> + local = active == self;
> + if (!local)
> + migrate_enable();
> +
> + return local;
> +}
> +
> +static inline void flush_tlb_user_post(bool local)
> +{
> + if (local)
> + migrate_enable();
> +}
I was under the impression that disabling/enabling migration was an
expensive thing to do, so I'd really want to see some more numbers to
justify this (including from inside a VM) and allow us to consider the
trade-offs properly. It's also not at all clear to me that it's safe
from such a low-level TLB invalidation helper.
> +
> /*
> * TLB Invalidation
> * ================
> @@ -408,12 +482,20 @@ static inline void flush_tlb_all(void)
> static inline void flush_tlb_mm(struct mm_struct *mm)
> {
> unsigned long asid;
> + bool local;
>
> - dsb(ishst);
> + local = flush_tlb_user_pre(mm, TLBF_NONE);
> asid = __TLBI_VADDR(0, ASID(mm));
> - __tlbi(aside1is, asid);
> - __tlbi_user(aside1is, asid);
> - __tlbi_sync_s1ish(mm);
> + if (local) {
> + __tlbi(aside1, asid);
> + __tlbi_user(aside1, asid);
> + dsb(nsh);
> + } else {
> + __tlbi(aside1is, asid);
> + __tlbi_user(aside1is, asid);
> + __tlbi_sync_s1ish(mm);
> + }
> + flush_tlb_user_post(local);
I think you've changed this since Ryan's original patch, but why are you
only calling __tlbi_sync_s1ish() for the !local case? Doesn't that break
the erratum workaround when running as a VM if the vCPU is migrated?
> diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
> index 0f4a28b87469..f34ed78393e0 100644
> --- a/arch/arm64/mm/context.c
> +++ b/arch/arm64/mm/context.c
> @@ -214,9 +214,10 @@ static u64 new_context(struct mm_struct *mm)
>
> void check_and_switch_context(struct mm_struct *mm)
> {
> - unsigned long flags;
> - unsigned int cpu;
> + unsigned int cpu = smp_processor_id();
> u64 asid, old_active_asid;
> + unsigned int active;
> + unsigned long flags;
>
> if (system_supports_cnp())
> cpu_set_reserved_ttbr0();
> @@ -251,7 +252,6 @@ void check_and_switch_context(struct mm_struct *mm)
> atomic64_set(&mm->context.id, asid);
> }
>
> - cpu = smp_processor_id();
> if (cpumask_test_and_clear_cpu(cpu, &tlb_flush_pending))
> local_flush_tlb_all();
>
> @@ -262,6 +262,30 @@ void check_and_switch_context(struct mm_struct *mm)
>
> arm64_apply_bp_hardening();
>
> + /*
> + * Update mm->context.active_cpu in such a manner that we avoid cmpxchg
> + * and dsb unless we definitely need it. If initially ACTIVE_CPU_NONE
> + * then we are the first cpu to run so set it to our id. If initially
> + * any id other than ours, we are the second cpu to run so set it to
> + * ACTIVE_CPU_MULTIPLE. If we update the value then we must issue
> + * dsb(ishst) to ensure stores to mm->context.active_cpu are ordered
> + * against the TTBR0 write in cpu_switch_mm()/uaccess_enable(); the
> + * store must be visible to another cpu before this cpu could have
> + * populated any TLB entries based on the pgtables that will be
> + * installed.
> + */
> + active = READ_ONCE(mm->context.active_cpu);
> + if (active != cpu && active != ACTIVE_CPU_MULTIPLE) {
> + if (active == ACTIVE_CPU_NONE)
> + active = cmpxchg_relaxed(&mm->context.active_cpu,
> + ACTIVE_CPU_NONE, cpu);
> +
> + if (active != ACTIVE_CPU_NONE)
> + WRITE_ONCE(mm->context.active_cpu, ACTIVE_CPU_MULTIPLE);
> +
> + dsb(ishst);
> + }
> +
Can you simplify the 'if' condition here?
if (active == ACTIVE_CPU_NONE) {
if (!try_cmpxchg_relaxed(...))
WRITE_ONCE(...);
dsb(ishst);
}
(as an aside, maybe we should implement arch_try_cmpxchg{,_relaxed} so
we could drop the READ_ONCE() here as well?)
Will
^ permalink raw reply
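The active_cpu state transitions discussed in this review (none, exactly one cpu, multiple cpus) can be modelled in userspace with C11 atomics. This is a rough sketch only: the sentinel values and the name `note_cpu_active()` are assumptions, and the dsb(ishst) that the patch issues after an update is deliberately not modelled.

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical sentinel values; the patch's real definitions are not shown. */
#define ACTIVE_CPU_NONE     UINT_MAX
#define ACTIVE_CPU_MULTIPLE (UINT_MAX - 1)

/*
 * Record that `cpu` is about to run the mm tracked by *active_cpu.
 * NONE -> cpu on first use, cpu -> MULTIPLE once a second cpu shows up.
 */
static void note_cpu_active(atomic_uint *active_cpu, unsigned int cpu)
{
	unsigned int active = atomic_load_explicit(active_cpu,
						   memory_order_relaxed);

	/* Fast path: nothing to update, so no atomic RMW is needed. */
	if (active == cpu || active == ACTIVE_CPU_MULTIPLE)
		return;

	if (active == ACTIVE_CPU_NONE) {
		unsigned int expected = ACTIVE_CPU_NONE;

		/* First cpu to run this mm claims it. */
		if (atomic_compare_exchange_strong_explicit(active_cpu,
				&expected, cpu,
				memory_order_relaxed, memory_order_relaxed))
			return;
	}

	/* Another cpu already owns the slot, or won the race for it. */
	atomic_store_explicit(active_cpu, ACTIVE_CPU_MULTIPLE,
			      memory_order_relaxed);
}
```

The try-style compare-and-exchange returns a success flag, which is what makes the "if it failed, mark as multiple" shape suggested in the review possible without inspecting the returned old value.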
* Re: [PATCH 1/8] mm: Add ptep_try_set() for lockless empty-slot installs
From: Will Deacon @ 2026-06-14 9:28 UTC (permalink / raw)
To: Tejun Heo
Cc: David Vernet, Andrea Righi, Changwoo Min, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Peter Zijlstra, Catalin Marinas,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Andrew Morton, David Hildenbrand, Mike Rapoport, Emil Tsalapatis,
sched-ext, bpf, x86, linux-arm-kernel, linux-mm, linux-kernel
In-Reply-To: <20260522172219.1423324-2-tj@kernel.org>
On Fri, May 22, 2026 at 07:22:12AM -1000, Tejun Heo wrote:
> Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is
> currently pte_none(). Returns true on success, false if the slot was already
> populated or the arch has no implementation.
>
> The intended caller is the upcoming bpf_arena kernel-side fault recovery
> path. The install runs from a page fault that can be nested under locks
> held by the faulting kernel caller (e.g. a BPF program holding
> raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry
> would A-A deadlock. Lock-free cmpxchg is the only viable option, which
> constrains this helper to special kernel page tables where concurrent
> writers cooperate via atomic accessors.
>
> The generic version in <linux/pgtable.h> returns false. x86 and arm64
> override with try_cmpxchg-based implementations on the underlying pteval.
> Other architectures get the false stub - the callers there already fall
> through to oops.
>
> v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei)
> v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea)
>
> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Andrea Righi <arighi@nvidia.com>
> Cc: David Hildenbrand <david@kernel.org>
> ---
> arch/arm64/include/asm/pgtable.h | 12 ++++++++++++
> arch/x86/include/asm/pgtable.h | 12 ++++++++++++
> include/linux/pgtable.h | 25 +++++++++++++++++++++++++
> 3 files changed, 49 insertions(+)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 9029b81ccbe8..28bada97d443 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1830,6 +1830,18 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> return __ptep_get_and_clear(mm, addr, ptep);
> }
>
> +/*
> + * Note: strictly-zero compare is narrower than pte_none(), but the gap is
> + * harmless: a fresh kernel PTE has no software bits set.
> + */
This comment really confused me :/
What is a "fresh" kernel PTE and why do you specifically call out "software
bits" if the CAS requires all 64 bits to be 0? Why is that narrower than
pte_none() given that pte_none() for arm64 is:
#define pte_none(pte) (!pte_val(pte))
Will
^ permalink raw reply
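The "set only if currently empty" semantics under discussion can be sketched in userspace with a C11 compare-and-exchange on a 64-bit slot. The name `slot_try_set()` is hypothetical and this is not the kernel's ptep_try_set(); it only shows that such a CAS succeeds solely when all 64 bits of the slot are zero.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Install new_val only if the slot is currently all-zero. Returns true if
 * this caller installed the value, false if the slot was already populated.
 */
static bool slot_try_set(_Atomic uint64_t *slot, uint64_t new_val)
{
	uint64_t expected = 0;

	/* Succeeds only when *slot == 0; otherwise the slot is left untouched. */
	return atomic_compare_exchange_strong(slot, &expected, new_val);
}
```

Because the comparison is against a literal zero, the helper matches a "none" test exactly on any architecture where an empty entry is defined as a zero value, which is the case Will points out for arm64.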
* Re: [PATCH 1/2] arm64: tlbflush: Don't broadcast if mm was only active on local cpu
From: Will Deacon @ 2026-06-14 9:44 UTC (permalink / raw)
To: Catalin Marinas
Cc: sk, linux-arm-kernel, linux-kernel, Ryan Roberts, Andrew Morton,
David Hildenbrand, Anshuman Khandual, Mike Rapoport, Dev Jain,
Kevin Brodsky, Marc Zyngier, Oliver Upton, cl, Huang Ying,
Linu Cherian
In-Reply-To: <airUWY4jFgxWvQ4s@arm.com>
On Thu, Jun 11, 2026 at 04:29:29PM +0100, Catalin Marinas wrote:
> On Tue, Jun 09, 2026 at 02:34:32PM -0700, sk@gentwo.org wrote:
> > From: Ryan Roberts <ryan.roberts@arm.com>
> >
> > There are 3 variants of tlb flush that invalidate user mappings:
> > flush_tlb_mm(), flush_tlb_page() and __flush_tlb_range(). All of these
> > would previously unconditionally broadcast their tlbis to all cpus in
> > the inner shareable domain.
> >
> > But this is a waste of effort if we can prove that the mm for which we
> > are flushing the mappings has only ever been active on the local cpu. In
> > that case, it is safe to avoid the broadcast and simply invalidate the
> > current cpu.
> >
> > So let's track in mm_context_t::active_cpu whether the mm has never been
> > active on any cpu, has been active on more than 1 cpu, or has been
> > active on precisely 1 cpu - and in that case, which one. We update this
> > when switching context, being careful to ensure that it gets updated
> > *before* installing the mm's pgtables. On the reader side, we ensure we
> > read *after* the previous write(s) to the pgtable(s) that necessitated
> > the tlb flush have completed. This guarantees that if a cpu that is
> > doing a tlb flush sees its own id in active_cpu, then the old pgtable
> > entry cannot have been seen by any other cpu and we can flush only the
> > local cpu.
> >
> > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Tested-by: Huang Ying <ying.huang@linux.alibaba.com>
> > [linu.cherian@arm.com: Adapted for v7.1 flush tlb API changes]
> > Signed-off-by: Linu Cherian <linu.cherian@arm.com>
>
> Nit: if you repost someone's patch, please add your signed-off-by.
I have a feeling this patch is horribly broken, so I'll reply on the
original.
Will
^ permalink raw reply
* Re: [PATCH v2 0/7] KVM: arm64: Forward FFA_NOTIFICATION* calls to TrustZone
From: Will Deacon @ 2026-06-14 9:29 UTC (permalink / raw)
To: Vincent Donnefort
Cc: Sebastian Ene, catalin.marinas, maz, oupton, joey.gouly, korneld,
kvmarm, linux-arm-kernel, linux-kernel, android-kvm,
mrigendra.chaubey, perlarsen, suzuki.poulose, yuzenghui
In-Reply-To: <ailtCAcEaJIgZ5Ap@google.com>
On Wed, Jun 10, 2026 at 02:56:24PM +0100, Vincent Donnefort wrote:
> On Wed, Jun 10, 2026 at 01:23:04PM +0100, Will Deacon wrote:
> > On Wed, Jun 10, 2026 at 01:15:44PM +0100, Vincent Donnefort wrote:
> > > On Wed, Jun 10, 2026 at 11:15:14AM +0100, Will Deacon wrote:
> > > > On Wed, Jun 10, 2026 at 10:26:59AM +0100, Vincent Donnefort wrote:
> > > > > On Mon, Jun 08, 2026 at 04:55:42PM +0000, Sebastian Ene wrote:
> > > > > > Remove the FFA_NOTIFICATION* calls from the blocklist used by the pKVM
> > > > > > FF-A proxy. This restriction was preventing the use of asynchronous
> > > > > > signaling mechanisms defined by the Arm FF-A specification to
> > > > > > communicate with the secure services.
> > > > > > While these calls are marked as optional, there is no reason why the
> > > > > > hypervisor proxy would block them because:
> > > > > >
> > > > > > 1. Host is the Sole Non-Secure Endpoint: The Host operates as the
> > > > > > only Non-Secure VM ID (VM ID 0) recognized by the Secure World.
> > > > > > Because all forwarded notifications are inherently attributed to
> > > > > > the Host by the SPMC, there is no risk of VM ID spoofing
> > > > > > originating from the Normal World.
> > > > > >
> > > > > > 2. No Memory Pointers or Addresses: The FFA_NOTIFICATION_* ABIs
> > > > > > operate strictly via register-based parameters, passing only
> > > > > > VM IDs, VCPU IDs, flags, and bitmaps. Because these calls do
> > > > > > not contain memory addresses, offsets, or pointers, forwarding
> > > > > > them doesn't pose a risk of memory-based confused deputy attack
> > > > > > (e.g., tricking the SPMC into overwriting protected memory).
> > > > > >
> > > > > > While the pKVM proxy behaves as a relayer, it doesn't currently have its
> > > > > > own FF-A ID (only the host has the ID 0). The behavior of the setup
> > > > > > flow is covered by the spec in section '10.9 Notification support
> > > > > > without a Hypervisor'.
> > > > >
> > > > > As it is only a relayer, is it really important to check SBZ arguments and
> > > > > fields on behalf of TrustZone? It doesn't feel like it brings any security. If the
> > > > > host passes broken arguments, I don't believe this puts pKVM at risk. Does it?
> > > >
> > > > I think the problem would be if an update to FF-A allocated some of the
> > > > currently SBZ bits to implement some functionality that we would want
> > > > to filter at EL2.
> > >
> > > I suppose that would bump the FF-A version and the proxy would reject it?
> >
> > Maybe? I don't think they'd _have_ to bump the version number.
> >
> > > If we really want to check for those arguments to be 0:
> > >
> > > * Shouldn't we extend this check to other FF-A invocations?
> >
> > yes, that's what the diff was doing in the reply here:
> >
> > https://lore.kernel.org/all/af3fW468-f1KXCrC@google.com/
> >
> > but, as I said here:
> >
> > https://lore.kernel.org/all/ahmxiFXXTupafbXw@willie-the-truck/
> >
> > I don't particularly like the table-driven indirection (the checks
> > should just be inlined).
>
> Ha, sorry I'm late to the party.
>
> Perhaps this series should start with adding ffa_check_unused_args_sbz() to the
> existing allowed FF-A invocations?
Yes, that part now seems to be missing.
Seb, please can you respin with that included?
Will
^ permalink raw reply
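A "should be zero" (SBZ) argument check of the kind discussed in this thread could look roughly like the following. This is a hedged sketch: `unused_args_are_zero()` is a hypothetical helper, not the ffa_check_unused_args_sbz() referred to above, whose actual signature is not shown in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if every register from index first_unused up to nregs - 1
 * is zero, i.e. all "should be zero" arguments really are zero.
 */
static bool unused_args_are_zero(const uint64_t *regs, size_t nregs,
				 size_t first_unused)
{
	for (size_t i = first_unused; i < nregs; i++) {
		if (regs[i] != 0)
			return false;	/* a reserved argument is in use */
	}
	return true;
}
```

Rejecting non-zero reserved arguments at the proxy means that a later revision of the interface which assigns meaning to those bits cannot pass through unfiltered by accident.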
* Re: [PATCH] net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
From: Lorenzo Bianconi @ 2026-06-14 8:09 UTC (permalink / raw)
To: Wayen.Yan
Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <6a2de8c5.2c570c9e.53b1a.0e1b@mx.google.com>
> In airoha_dev_select_queue(), the expression:
>
> queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;
>
> implicitly converts to unsigned arithmetic: when skb->priority is 0
> (the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
> and UINT_MAX % 8 = 7, routing default best-effort packets to the
> highest-priority QoS queue. This causes QoS inversion where the
> majority of traffic on a PON gateway starves actual high-priority
> flows (VoIP, gaming, etc.).
>
> Fix by guarding the subtraction: when priority is 0, map to queue 0
> (lowest priority), otherwise apply the original (priority - 1) % 8
> mapping.
>
> Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
> Signed-off-by: Wayen <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 31cdb11cd7..d476ef83c3 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -1933,7 +1933,7 @@ static u16 airoha_dev_select_queue(struct net_device *dev, struct sk_buff *skb,
> */
> channel = netdev_uses_dsa(dev) ? skb_get_queue_mapping(skb) : port->id;
> channel = channel % AIROHA_NUM_QOS_CHANNELS;
> - queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
> + queue = skb->priority ? (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES : 0;
> queue = channel * AIROHA_NUM_QOS_QUEUES + queue;
>
> return queue < dev->num_tx_queues ? queue : 0;
> --
> 2.51.0
>
>
^ permalink raw reply
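The wrap-around described in the commit message above is easy to reproduce in userspace. The sketch below assumes a 32-bit unsigned priority (as skb->priority is) and uses the hypothetical names `queue_unguarded()`, `queue_guarded()`, and `NUM_QOS_QUEUES` in place of the driver's own.

```c
#include <stdint.h>

/* Stand-in for AIROHA_NUM_QOS_QUEUES, assumed to be 8 as in the message. */
#define NUM_QOS_QUEUES 8u

/* Original mapping: priority 0 wraps to UINT32_MAX before the modulo. */
static uint32_t queue_unguarded(uint32_t priority)
{
	return (priority - 1) % NUM_QOS_QUEUES;
}

/* Guarded mapping from the patch: priority 0 maps to queue 0. */
static uint32_t queue_guarded(uint32_t priority)
{
	return priority ? (priority - 1) % NUM_QOS_QUEUES : 0;
}
```

For every non-zero priority the two functions agree; they differ only at priority 0, where the unguarded version selects queue 7 and the guarded one selects queue 0.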
* Re: [PATCH] net: airoha: Remove dead MT7996 NPU firmware declarations
From: Lorenzo Bianconi @ 2026-06-14 8:16 UTC (permalink / raw)
To: Wayen.Yan
Cc: netdev, horms, pabeni, kuba, edumazet, andrew+netdev,
angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
linux-mediatek
In-Reply-To: <6a2dea77.01c4f138.336eeb.a256@mx.google.com>
> Remove the NPU_EN7581_7996_FIRMWARE_DATA/RV32 #define macros and
> their corresponding MODULE_FIRMWARE() declarations. Neither the
> en7581_npu_soc_data nor the an7583_npu_soc_data references these
> firmware names, and no firmware loading path in the driver ever
> requests them. The only references are the #define lines themselves
> and the MODULE_FIRMWARE() declarations below.
>
> Keeping dead MODULE_FIRMWARE entries causes modprobe/udev to attempt
> pre-loading non-existent firmware files, generating kernel log noise
> and misleading distributors about which firmware files to package.
>
> Fixes: 23290c7bc190 ("net: airoha: Introduce Airoha NPU support")
> Signed-off-by: Wayen <win847@gmail.com>
Please drop this patch since EN7581_7996 firmware is defined via dts
for 7581:
commit 3847173525e307ebcd23bd4863da943ea78b0057
Author: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue Jan 20 11:17:18 2026 +0100
net: airoha: npu: Add the capability to read firmware names from dts
Introduce the capability to read the firmware binary names from device-tree
using the firmware-name property if available.
This patch is needed because NPU firmware binaries are board specific since
they depend on the MediaTek WiFi chip used on the board (e.g. MT7996 or
MT7992) and the WiFi chip version info is not available in the NPU driver.
This is a preliminary patch to enable MT76 NPU offloading if the Airoha SoC
is equipped with MT7996 (Eagle) WiFi chipset.
https://github.com/openwrt/openwrt/blob/main/target/linux/airoha/dts/an7581-npu-mt7996.dtsi
and here these macros are used to notify userspace for firmware loading.
Regards,
Lorenzo
> ---
> drivers/net/ethernet/airoha/airoha_npu.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_npu.c b/drivers/net/ethernet/airoha/airoha_npu.c
> index 17dbdc8325..93095f3894 100644
> --- a/drivers/net/ethernet/airoha/airoha_npu.c
> +++ b/drivers/net/ethernet/airoha/airoha_npu.c
> @@ -16,8 +16,6 @@
>
> #define NPU_EN7581_FIRMWARE_DATA "airoha/en7581_npu_data.bin"
> #define NPU_EN7581_FIRMWARE_RV32 "airoha/en7581_npu_rv32.bin"
> -#define NPU_EN7581_7996_FIRMWARE_DATA "airoha/en7581_MT7996_npu_data.bin"
> -#define NPU_EN7581_7996_FIRMWARE_RV32 "airoha/en7581_MT7996_npu_rv32.bin"
> #define NPU_AN7583_FIRMWARE_DATA "airoha/an7583_npu_data.bin"
> #define NPU_AN7583_FIRMWARE_RV32 "airoha/an7583_npu_rv32.bin"
> #define NPU_EN7581_FIRMWARE_RV32_MAX_SIZE 0x200000
> @@ -822,8 +820,6 @@ module_platform_driver(airoha_npu_driver);
>
> MODULE_FIRMWARE(NPU_EN7581_FIRMWARE_DATA);
> MODULE_FIRMWARE(NPU_EN7581_FIRMWARE_RV32);
> -MODULE_FIRMWARE(NPU_EN7581_7996_FIRMWARE_DATA);
> -MODULE_FIRMWARE(NPU_EN7581_7996_FIRMWARE_RV32);
> MODULE_FIRMWARE(NPU_AN7583_FIRMWARE_DATA);
> MODULE_FIRMWARE(NPU_AN7583_FIRMWARE_RV32);
> MODULE_LICENSE("GPL");
> --
> 2.51.0
>
>
^ permalink raw reply
* Re:Re: [PATCH v3] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
From: Ding Hui @ 2026-06-14 6:14 UTC (permalink / raw)
To: kuba
Cc: alexandre.torgue, andrew+netdev, davem, dinghui1111, dinghui,
edumazet, j.raczynski, linux-arm-kernel, linux-kernel,
linux-stm32, liuxuanjun, maxime.chevallier, mcoquelin.stm32,
netdev, pabeni, rmk+kernel, xiasanbo, yangchen11
In-Reply-To: <20260608193059.78e05dce@kernel.org>
At 2026-06-09 10:30:59, "Jakub Kicinski" <kuba@kernel.org> wrote:
>On Thu, 4 Jun 2026 22:45:54 +0800 Ding Hui wrote:
>> +/**
>> + * stmmac_reinit_rx_descriptors - re-program RX descriptor buffer addresses
>> + * after stmmac_clear_descriptors()
>> + * @priv: driver private structure
>> + * @dma_conf: structure holding the dma data
>> + * @queue: RX queue index
>
>nit:
>
>kernel-doc script says:
>
>Warning: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1733 No description found for return value of 'stmmac_reinit_rx_descriptors'
>
>You need a Returns: statement in this kdoc
>--
>pw-bot: cr
Sorry for the late reply. I will send a new version that adds the Returns: statement. Thanks.
^ permalink raw reply
* Re:Re: [PATCH v3] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
From: Ding Hui @ 2026-06-14 6:02 UTC (permalink / raw)
To: j.raczynski
Cc: alexandre.torgue, andrew+netdev, davem, dinghui1111, dinghui,
edumazet, kuba, linux-arm-kernel, linux-kernel, linux-stm32,
liuxuanjun, maxime.chevallier, mcoquelin.stm32, netdev, pabeni,
rmk+kernel, xiasanbo, yangchen11
In-Reply-To: <aiaORbb0lZVxDg8L@AMDC4622.eu.corp.samsungelectronics.net>
At 2026-06-08 17:41:25, "Jakub Raczynski" <j.raczynski@samsung.com> wrote:
>On Thu, Jun 04, 2026 at 10:45:54PM +0800, Ding Hui wrote:
>> From: Ding Hui <dinghui@lixiang.com>
>> + for (queue = 0; queue < priv->plat->rx_queues_to_use; queue++) {
>> + ret = stmmac_reinit_rx_descriptors(priv, &priv->dma_conf,
>> + queue);
>> + if (ret) {
>> + netdev_err(priv->dev,
>> + "%s: rx desc reinit failed on queue %u\n",
>> + __func__, queue);
>> + mutex_unlock(&priv->lock);
>> + rtnl_unlock();
>> + return ret;
>> + }
>> + }
>
>This is not directly related to the patch, but rather stmmac_resume() itself,
>but doesn't this return and hw_setup one leave bunch of descriptor memory
>hanging and effectively leaked?
>
>> +
>> ret = stmmac_hw_setup(ndev);
>> if (ret < 0) {
>> netdev_err(priv->dev, "%s: Hw setup failed\n", __func__);
>> --
>
You are right that both error paths leave the descriptor rings and RX
buffers allocated without an explicit cleanup. However, I would describe
that memory as "hanging" rather than "leaked", because it is not
permanently lost. All RX buffers allocated in the
error path are stored in dma_conf->rx_queue[q].buf_pool[].page (or
.xdp for XSK queues), and the DMA descriptor rings themselves remain
reachable via priv->dma_conf. When the user eventually brings the
interface down, stmmac_release() -> free_dma_desc_resources() will
free everything correctly.
If it would be welcome, I can submit a follow-up patch that adds proper
cleanup to stmmac_resume()'s error paths (calling
free_dma_desc_resources() and marking the device as not running). I'd
prefer to keep that separate from this fix so the scope stays clean.
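As a rough illustration of the cleanup described above, here is a minimal userspace sketch of a resume path whose error paths share one unwind label. Everything in it is hypothetical: demo_priv, demo_free_resources() and demo_resume() only model the idea of freeing the rings and marking the device as not running, and the fail_* flags stand in for the real reinit and hw_setup calls. It is not the stmmac implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the driver state touched by resume. */
struct demo_priv {
	void *rings;		/* stands in for descriptor rings + RX buffers */
	bool running;		/* stands in for the "device is up" state */
	bool fail_reinit;	/* simulate RX descriptor reinit failure */
	bool fail_hw_setup;	/* simulate hardware setup failure */
};

/* Release everything the resume path would otherwise leave hanging. */
static void demo_free_resources(struct demo_priv *priv)
{
	free(priv->rings);
	priv->rings = NULL;
}

static int demo_resume(struct demo_priv *priv)
{
	int ret;

	ret = priv->fail_reinit ? -ENOMEM : 0;
	if (ret)
		goto err_free;

	ret = priv->fail_hw_setup ? -EIO : 0;
	if (ret)
		goto err_free;

	priv->running = true;
	return 0;

err_free:
	/* Both failure points unwind through the same cleanup. */
	demo_free_resources(priv);
	priv->running = false;
	return ret;
}
```

A single unwind label keeps the two failure points symmetric, so a later change to the cleanup only has to be made in one place.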
>Other than that, I don't see any obvious issues.
>
Thanks for the review.
* [RFC PATCH net-next 7/7] net: airoha: add SOE XFRM packet offload support
From: Jihong Min @ 2026-06-14 4:00 UTC (permalink / raw)
To: netdev, Lorenzo Bianconi
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>
Add the EN7581 Secure Offload Engine provider. The provider programs ESP
SAs, exposes NETIF_F_HW_ESP through xfrmdev_ops, submits encrypt and
decrypt packets through the QDMA SOE path, and handles SOE completion
delivery.
Mirror the XFRM ops to DSA user devices whose CPU conduit is an Airoha
netdev so packet offload remains available through switch ports.
Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
drivers/net/ethernet/airoha/Kconfig | 13 +
drivers/net/ethernet/airoha/Makefile | 1 +
drivers/net/ethernet/airoha/airoha_soe.c | 1896 ++++++++++++++++++++++
3 files changed, 1910 insertions(+)
create mode 100644 drivers/net/ethernet/airoha/airoha_soe.c
diff --git a/drivers/net/ethernet/airoha/Kconfig b/drivers/net/ethernet/airoha/Kconfig
index ad3ce501e7a5..a20e9dd0bfde 100644
--- a/drivers/net/ethernet/airoha/Kconfig
+++ b/drivers/net/ethernet/airoha/Kconfig
@@ -31,4 +31,17 @@ config NET_AIROHA_FLOW_STATS
help
Enable Aiorha flowtable statistic counters.
+config NET_AIROHA_SOE
+ bool "Airoha SOE ESP offload support"
+ depends on NET_AIROHA
+ depends on INET
+ select XFRM
+ select XFRM_OFFLOAD
+ help
+ Enable support for the Airoha Secure Offload Engine used by
+ the Ethernet driver for ESP packet offload. This option only
+ adds the provider and netdev plumbing; ESP offload is still
+ advertised at runtime only when the SOE block and required
+ packet offload path are available.
+
endif #NET_VENDOR_AIROHA
diff --git a/drivers/net/ethernet/airoha/Makefile b/drivers/net/ethernet/airoha/Makefile
index 94468053e34b..b68b8f614b0e 100644
--- a/drivers/net/ethernet/airoha/Makefile
+++ b/drivers/net/ethernet/airoha/Makefile
@@ -6,4 +6,5 @@
obj-$(CONFIG_NET_AIROHA) += airoha-eth.o
airoha-eth-y := airoha_eth.o airoha_ppe.o
airoha-eth-$(CONFIG_DEBUG_FS) += airoha_ppe_debugfs.o
+airoha-eth-$(CONFIG_NET_AIROHA_SOE) += airoha_soe.o
obj-$(CONFIG_NET_AIROHA_NPU) += airoha_npu.o
diff --git a/drivers/net/ethernet/airoha/airoha_soe.c b/drivers/net/ethernet/airoha/airoha_soe.c
new file mode 100644
index 000000000000..3a240ed44d7f
--- /dev/null
+++ b/drivers/net/ethernet/airoha/airoha_soe.c
@@ -0,0 +1,1896 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Airoha Secure Offload Engine (SOE) provider for the Ethernet driver.
+ *
+ * This file owns the EN7581 SOE packet-offload glue used by airoha_eth:
+ * xfrm state programming, hop-descriptor TX metadata, SOE RX completion
+ * decoding, and DSA proxy netdev binding. The SOE block is reached through
+ * the FE/QDMA packet fabric and is initialized by the Ethernet driver rather
+ * than by a separate platform driver.
+ */
+
+#include <linux/atomic.h>
+#include <linux/bitfield.h>
+#include <linux/completion.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/iopoll.h>
+#include <linux/io.h>
+#include <linux/ipv6.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/moduleparam.h>
+#include <linux/netdevice.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/udp.h>
+#include <linux/unaligned.h>
+
+#include <net/dst.h>
+#include <net/esp.h>
+#include <net/gso.h>
+#include <net/ip.h>
+#include <net/net_namespace.h>
+#include <net/xfrm.h>
+
+#include "airoha_eth.h"
+#include "airoha_regs.h"
+#include "airoha_soe.h"
+
+#define AIROHA_SOE_NUM_SA 32
+#define AIROHA_SOE_QDMA_HOP_DESC_LEN 32
+#define AIROHA_SOE_KEY_WORDS 8
+#define AIROHA_SOE_ADDR_WORDS 4
+#define AIROHA_SOE_SA_TIMEOUT_US 1000
+#define AIROHA_SOE_SA_FREE_TIMEOUT HZ
+#define AIROHA_SOE_HOP_DESC0_ENCRYPT 0xffffff81ULL
+#define AIROHA_SOE_HOP_DESC0_DECRYPT 0xffffff82ULL
+#define AIROHA_SOE_HOP_DESC1 0x1ff00000000ULL
+#define AIROHA_SOE_QDMA_TX_RING 2
+#define AIROHA_SOE_TXMSG2_DEFAULT 0xff00ffff
+
+/* This is the packet/IPsec SOE window at 0x1fbfa000. EN7581 E2 exposes
+ * this register block for packet processing, not standalone crypto offload.
+ */
+#define AIROHA_SOE_GLB_CFG 0x000
+#define AIROHA_SOE_GLB_CFG_ENC_EN BIT(0)
+#define AIROHA_SOE_GLB_CFG_DEC_EN BIT(1)
+#define AIROHA_SOE_CONT_ICV_CTRL 0x004
+#define AIROHA_SOE_INT_EN 0x020
+#define AIROHA_SOE_INT_STS 0x024
+#define AIROHA_SOE_INT_ALL GENMASK(15, 0)
+#define AIROHA_SOE_CNT_CLR 0x04c
+#define AIROHA_SOE_CNT_CLR_ALL BIT(0)
+#define AIROHA_SOE_SA_CTRL 0x100
+#define AIROHA_SOE_SA_DONE 0x104
+#define AIROHA_SOE_SA_CMD 0x110
+#define AIROHA_SOE_BCNT_THSHD_32_SOFT 0x114
+#define AIROHA_SOE_BCNT_THSHD_64_SOFT 0x118
+#define AIROHA_SOE_SA_SPI 0x11c
+#define AIROHA_SOE_SA_UDP_PORT 0x120
+#define AIROHA_SOE_SA_ENC_KEY(n) (0x124 + (n) * 4)
+#define AIROHA_SOE_SA_HMAC_KEY(n) (0x144 + (n) * 4)
+#define AIROHA_SOE_SA_SRC_ADDR(n) (0x164 + (n) * 4)
+#define AIROHA_SOE_SA_DST_ADDR(n) (0x174 + (n) * 4)
+#define AIROHA_SOE_ICV_OK_LO_CNT 0x184
+#define AIROHA_SOE_ICV_OK_HI_CNT 0x188
+#define AIROHA_SOE_ICV_FAIL_LO_CNT 0x18c
+#define AIROHA_SOE_ICV_FAIL_HI_CNT 0x190
+#define AIROHA_SOE_CON_ICV_FAIL_CNT 0x194
+#define AIROHA_SOE_SEQ_NUM_LO 0x198
+#define AIROHA_SOE_SEQ_NUM_HI 0x19c
+#define AIROHA_SOE_BCNT_LO 0x1a0
+#define AIROHA_SOE_BCNT_HI 0x1a4
+#define AIROHA_SOE_FLOW_LAB_DSCP 0x1a8
+#define AIROHA_SOE_BCNT_80 0x1ac
+#define AIROHA_SOE_BCNT_THSHD_80 0x1b0
+#define AIROHA_SOE_BCNT_THSHD_32_HARD 0x1b4
+#define AIROHA_SOE_BCNT_THSHD_64_HARD 0x1b8
+#define AIROHA_SOE_SEQ_THSHD_32_SOFT 0x1bc
+#define AIROHA_SOE_SEQ_THSHD_64_SOFT 0x1c0
+#define AIROHA_SOE_SEQ_THSHD_32_HARD 0x1c4
+#define AIROHA_SOE_SEQ_THSHD_64_HARD 0x1c8
+#define AIROHA_SOE_SA_CTRL_WR BIT(0)
+#define AIROHA_SOE_SA_CTRL_IDX GENMASK(15, 8)
+#define AIROHA_SOE_SA_DONE_W1C BIT(0)
+
+#define AIROHA_SOE_SA_CMD_ENC BIT(0)
+#define AIROHA_SOE_SA_CMD_CIPHER GENMASK(3, 1)
+#define AIROHA_SOE_SA_CMD_HASH GENMASK(6, 4)
+#define AIROHA_SOE_SA_CMD_AES_KEY_LEN GENMASK(8, 7)
+#define AIROHA_SOE_SA_CMD_ESN_EN BIT(9)
+#define AIROHA_SOE_SA_CMD_OUT_IPV6 BIT(10)
+#define AIROHA_SOE_SA_CMD_ESP_MODE BIT(11) /* 0=tunnel, 1=transport */
+#define AIROHA_SOE_SA_CMD_NAT_EN BIT(12)
+#define AIROHA_SOE_SA_CMD_ANTI_RPLY_EN BIT(13)
+#define AIROHA_SOE_SA_CMD_ANTI_RPLY_WDW GENMASK(15, 14)
+#define AIROHA_SOE_SA_CMD_SN_ERR_DROP BIT(16)
+#define AIROHA_SOE_SA_CMD_PAD_ERR_DROP BIT(17)
+#define AIROHA_SOE_SA_CMD_ICV_ERR_DROP BIT(18)
+#define AIROHA_SOE_SA_CMD_GCM_ICV_LEN GENMASK(25, 24)
+#define AIROHA_SOE_SA_CMD_DEC_UDP_PARSER_EN BIT(29)
+#define AIROHA_SOE_SA_CMD_VLD BIT(31)
+
+#define AIROHA_SOE_CIPHER_AES_CBC 1
+#define AIROHA_SOE_CIPHER_AES_GCM 2
+#define AIROHA_SOE_HASH_HMAC_SHA1_96 1
+#define AIROHA_SOE_HASH_HMAC_SHA256_128 2
+#define AIROHA_SOE_AES_KEY_128 0
+#define AIROHA_SOE_AES_KEY_192 1
+#define AIROHA_SOE_AES_KEY_256 2
+#define AIROHA_SOE_QDMA_QUEUE_ENCRYPT 8
+#define AIROHA_SOE_QDMA_QUEUE_DECRYPT 9
+#define AIROHA_SOE_NATT_PORT 4500
+#define AIROHA_SOE_HOP_FLAG_ENCRYPTED 3
+#define AIROHA_SOE_HOP_FLAG_DECRYPTED 4
+#define AIROHA_SOE_HOP_FLAG_ERROR_BASE 5
+#define AIROHA_SOE_HOP_INFO_ENCRYPT 2
+#define AIROHA_SOE_HOP_INFO_DECRYPT 3
+
+static unsigned int airoha_soe_rx_trace_packets;
+module_param_named(soe_rx_trace_packets, airoha_soe_rx_trace_packets, uint,
+ 0600);
+MODULE_PARM_DESC(soe_rx_trace_packets,
+ "Number of SOE RX completion IPv4 headers to log");
+
+enum airoha_soe_ctx_dir {
+ AIROHA_SOE_CTX_OUT,
+ AIROHA_SOE_CTX_IN,
+};
+
+struct airoha_soe_ctx {
+ struct list_head list;
+ enum airoha_soe_ctx_dir dir;
+ union {
+ struct dst_entry *dst;
+ struct {
+ struct xfrm_state *x;
+ struct airoha_gdm_dev *gdm_dev;
+ struct net_device *dev;
+ __be32 saddr;
+ __be16 sport;
+ u16 foe_hash;
+ u32 foe_reason;
+ u8 sa_index;
+ bool foe_valid;
+ u32 mark;
+ } rx;
+ };
+};
+
+struct airoha_soe_sa {
+ struct airoha_soe *soe;
+ unsigned int index;
+ u32 cmd;
+ u32 spi;
+
+ spinlock_t lock; /* Protects in-flight context queues and dead. */
+ struct list_head tx_queue;
+ struct list_head rx_queue;
+ struct completion idle;
+ unsigned int inflight;
+ bool dead;
+};
+
+struct airoha_soe_xfrm_state {
+ struct airoha_gdm_dev *dev;
+ struct airoha_soe *soe;
+ struct airoha_soe_sa *sa;
+ bool counted;
+};
+
+struct airoha_soe_sa_cfg {
+ u32 cmd;
+ u32 spi;
+ u32 udp_port;
+ u32 enc_key[AIROHA_SOE_KEY_WORDS];
+ u32 hmac_key[AIROHA_SOE_KEY_WORDS];
+ u32 src_addr[AIROHA_SOE_ADDR_WORDS];
+ u32 dst_addr[AIROHA_SOE_ADDR_WORDS];
+ u64 soft_byte_limit;
+ u64 hard_byte_limit;
+ u64 soft_packet_limit;
+ u64 hard_packet_limit;
+};
+
+struct airoha_soe_rx_info {
+ int packet_len;
+ bool encap;
+ __be16 sport;
+ __be16 dport;
+ __be32 spi;
+};
+
+struct airoha_soe {
+ struct device *dev;
+ void __iomem *base;
+
+ /* Serialize SA table programming and software slot ownership. */
+ struct mutex sa_lock;
+ unsigned long sa_map;
+ struct airoha_soe_sa __rcu *sa[AIROHA_SOE_NUM_SA];
+ atomic_t pending_rx;
+
+ spinlock_t state_lock; /* Protects dead against concurrent users. */
+ refcount_t refcnt;
+ struct completion released;
+ bool dead;
+};
+
+static const struct xfrmdev_ops airoha_soe_xfrmdev_ops;
+static const struct xfrmdev_ops airoha_soe_dsa_xfrmdev_ops;
+
+static struct airoha_soe *airoha_soe_get_ref(struct airoha_soe *soe)
+{
+ unsigned long flags;
+ bool alive;
+
+ if (!soe)
+ return NULL;
+
+ spin_lock_irqsave(&soe->state_lock, flags);
+ alive = !soe->dead && refcount_inc_not_zero(&soe->refcnt);
+ spin_unlock_irqrestore(&soe->state_lock, flags);
+
+ return alive ? soe : NULL;
+}
+
+static void airoha_soe_put_ref(struct airoha_soe *soe)
+{
+ if (soe && refcount_dec_and_test(&soe->refcnt))
+ complete(&soe->released);
+}
+
+bool airoha_soe_available(struct airoha_soe *soe)
+{
+ unsigned long flags;
+ bool available;
+
+ if (!soe)
+ return false;
+
+ spin_lock_irqsave(&soe->state_lock, flags);
+ available = !soe->dead;
+ spin_unlock_irqrestore(&soe->state_lock, flags);
+
+ return available;
+}
+
+u32 airoha_soe_features(struct airoha_soe *soe)
+{
+ return airoha_soe_available(soe) ? AIROHA_SOE_FEATURE_ESP : 0;
+}
+
+static u64 airoha_soe_limit(u64 limit)
+{
+ return limit == XFRM_INF ? U64_MAX : limit;
+}
+
+static int airoha_soe_wait_sa_done(struct airoha_soe *soe)
+{
+ u32 done;
+ int err;
+
+ err = readl_poll_timeout(soe->base + AIROHA_SOE_SA_DONE, done,
+ done & AIROHA_SOE_SA_DONE_W1C, 1,
+ AIROHA_SOE_SA_TIMEOUT_US);
+ writel(0, soe->base + AIROHA_SOE_SA_CTRL);
+ writel(AIROHA_SOE_SA_DONE_W1C, soe->base + AIROHA_SOE_SA_DONE);
+
+ return err;
+}
+
+static int airoha_soe_commit_sa(struct airoha_soe *soe, unsigned int index)
+{
+ u32 ctrl;
+
+ /* SA registers are a single staging window committed by index. */
+ writel(AIROHA_SOE_SA_DONE_W1C, soe->base + AIROHA_SOE_SA_DONE);
+ ctrl = FIELD_PREP(AIROHA_SOE_SA_CTRL_IDX, index) |
+ AIROHA_SOE_SA_CTRL_WR;
+ writel(ctrl, soe->base + AIROHA_SOE_SA_CTRL);
+
+ return airoha_soe_wait_sa_done(soe);
+}
+
+static void airoha_soe_write_key(void __iomem *base, u32 reg, const u32 *key)
+{
+ unsigned int i;
+
+ for (i = 0; i < AIROHA_SOE_KEY_WORDS; i++)
+ writel(key[i], base + reg + i * sizeof(u32));
+}
+
+static void airoha_soe_write_addr(void __iomem *base, u32 reg, const u32 *addr)
+{
+ unsigned int i;
+
+ for (i = 0; i < AIROHA_SOE_ADDR_WORDS; i++)
+ writel(addr[i], base + reg + i * sizeof(u32));
+}
+
+static int airoha_soe_program_sa_locked(struct airoha_soe *soe,
+ unsigned int index,
+ const struct airoha_soe_sa_cfg *cfg)
+{
+ void __iomem *base = soe->base;
+
+ writel(cfg->cmd | AIROHA_SOE_SA_CMD_VLD, base + AIROHA_SOE_SA_CMD);
+ writel(lower_32_bits(cfg->soft_byte_limit),
+ base + AIROHA_SOE_BCNT_THSHD_32_SOFT);
+ writel(upper_32_bits(cfg->soft_byte_limit),
+ base + AIROHA_SOE_BCNT_THSHD_64_SOFT);
+ writel(cfg->spi, base + AIROHA_SOE_SA_SPI);
+ writel(cfg->udp_port, base + AIROHA_SOE_SA_UDP_PORT);
+ airoha_soe_write_key(base, AIROHA_SOE_SA_ENC_KEY(0), cfg->enc_key);
+ airoha_soe_write_key(base, AIROHA_SOE_SA_HMAC_KEY(0), cfg->hmac_key);
+ airoha_soe_write_addr(base, AIROHA_SOE_SA_SRC_ADDR(0), cfg->src_addr);
+ airoha_soe_write_addr(base, AIROHA_SOE_SA_DST_ADDR(0), cfg->dst_addr);
+
+ writel(0, base + AIROHA_SOE_ICV_OK_LO_CNT);
+ writel(0, base + AIROHA_SOE_ICV_OK_HI_CNT);
+ writel(0, base + AIROHA_SOE_ICV_FAIL_LO_CNT);
+ writel(0, base + AIROHA_SOE_ICV_FAIL_HI_CNT);
+ writel(0, base + AIROHA_SOE_CON_ICV_FAIL_CNT);
+ writel(0, base + AIROHA_SOE_SEQ_NUM_LO);
+ writel(0, base + AIROHA_SOE_SEQ_NUM_HI);
+ writel(0, base + AIROHA_SOE_BCNT_LO);
+ writel(0, base + AIROHA_SOE_BCNT_HI);
+ writel(0, base + AIROHA_SOE_FLOW_LAB_DSCP);
+ writel(0, base + AIROHA_SOE_BCNT_80);
+ writel(0xffffffff, base + AIROHA_SOE_BCNT_THSHD_80);
+ writel(lower_32_bits(cfg->hard_byte_limit),
+ base + AIROHA_SOE_BCNT_THSHD_32_HARD);
+ writel(upper_32_bits(cfg->hard_byte_limit),
+ base + AIROHA_SOE_BCNT_THSHD_64_HARD);
+ writel(lower_32_bits(cfg->soft_packet_limit),
+ base + AIROHA_SOE_SEQ_THSHD_32_SOFT);
+ writel(upper_32_bits(cfg->soft_packet_limit),
+ base + AIROHA_SOE_SEQ_THSHD_64_SOFT);
+ writel(lower_32_bits(cfg->hard_packet_limit),
+ base + AIROHA_SOE_SEQ_THSHD_32_HARD);
+ writel(upper_32_bits(cfg->hard_packet_limit),
+ base + AIROHA_SOE_SEQ_THSHD_64_HARD);
+
+ return airoha_soe_commit_sa(soe, index);
+}
+
+static int airoha_soe_clear_sa_locked(struct airoha_soe *soe,
+ unsigned int index)
+{
+ struct airoha_soe_sa_cfg cfg = {};
+
+ return airoha_soe_program_sa_locked(soe, index, &cfg);
+}
+
+static void airoha_soe_copy_words(u32 *dst, const u8 *src, unsigned int bits)
+{
+ unsigned int words = bits / (BITS_PER_BYTE * sizeof(u32));
+ unsigned int i;
+
+ for (i = 0; i < words && i < AIROHA_SOE_KEY_WORDS; i++)
+ dst[i] = get_unaligned_be32(src + i * sizeof(u32));
+}
+
+static int airoha_soe_aes_key_len(unsigned int bits,
+ struct netlink_ext_ack *extack, u32 *val)
+{
+ switch (bits) {
+ case 128:
+ *val = AIROHA_SOE_AES_KEY_128;
+ return 0;
+ case 192:
+ *val = AIROHA_SOE_AES_KEY_192;
+ return 0;
+ case 256:
+ *val = AIROHA_SOE_AES_KEY_256;
+ return 0;
+ default:
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports AES-128/192/256 keys only");
+ return -EOPNOTSUPP;
+ }
+}
+
+static int airoha_soe_build_algo(struct xfrm_state *x,
+ struct airoha_soe_sa_cfg *cfg,
+ struct netlink_ext_ack *extack)
+{
+ u32 key_len;
+ u32 field;
+ int err;
+
+ if (x->aead) {
+ if (strcmp(x->aead->alg_name, "rfc4106(gcm(aes))")) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports rfc4106(gcm(aes)) AEAD only");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->aead->alg_key_len < 32) {
+ NL_SET_ERR_MSG_MOD(extack, "invalid AEAD key length");
+ return -EINVAL;
+ }
+
+ key_len = x->aead->alg_key_len - 32;
+ err = airoha_soe_aes_key_len(key_len, extack, &field);
+ if (err)
+ return err;
+
+ cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_CIPHER,
+ AIROHA_SOE_CIPHER_AES_GCM);
+ cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_AES_KEY_LEN, field);
+ switch (x->aead->alg_icv_len) {
+ case 64:
+ field = 0;
+ break;
+ case 96:
+ field = 1;
+ break;
+ case 128:
+ field = 2;
+ break;
+ default:
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports 64/96/128-bit GCM ICV only");
+ return -EOPNOTSUPP;
+ }
+ cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_GCM_ICV_LEN, field);
+ airoha_soe_copy_words(cfg->enc_key, x->aead->alg_key, key_len);
+ cfg->hmac_key[0] =
+ get_unaligned_be32(x->aead->alg_key + key_len / 8);
+ return 0;
+ }
+
+ if (!x->ealg || strcmp(x->ealg->alg_name, "cbc(aes)")) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports cbc(aes) encryption only");
+ return -EOPNOTSUPP;
+ }
+
+ err = airoha_soe_aes_key_len(x->ealg->alg_key_len, extack, &field);
+ if (err)
+ return err;
+
+ cfg->cmd |=
+ FIELD_PREP(AIROHA_SOE_SA_CMD_CIPHER, AIROHA_SOE_CIPHER_AES_CBC);
+ cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_AES_KEY_LEN, field);
+ airoha_soe_copy_words(cfg->enc_key, x->ealg->alg_key,
+ x->ealg->alg_key_len);
+
+ if (!x->aalg) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE CBC mode requires HMAC authentication");
+ return -EOPNOTSUPP;
+ }
+
+ if (!strcmp(x->aalg->alg_name, "hmac(sha1)")) {
+ if (x->aalg->alg_key_len != 160 ||
+ x->aalg->alg_trunc_len != 96) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports HMAC-SHA1-96 only");
+ return -EOPNOTSUPP;
+ }
+ field = AIROHA_SOE_HASH_HMAC_SHA1_96;
+ } else if (!strcmp(x->aalg->alg_name, "hmac(sha256)")) {
+ if (x->aalg->alg_key_len != 256 ||
+ x->aalg->alg_trunc_len != 128) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports HMAC-SHA256-128 only");
+ return -EOPNOTSUPP;
+ }
+ field = AIROHA_SOE_HASH_HMAC_SHA256_128;
+ } else {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports HMAC-SHA1/SHA256 only");
+ return -EOPNOTSUPP;
+ }
+
+ cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_HASH, field);
+ airoha_soe_copy_words(cfg->hmac_key, x->aalg->alg_key,
+ x->aalg->alg_key_len);
+
+ return 0;
+}
+
+static int airoha_soe_build_replay(struct xfrm_state *x,
+ struct airoha_soe_sa_cfg *cfg,
+ struct netlink_ext_ack *extack)
+{
+ u32 window;
+
+ if ((x->props.flags & XFRM_STATE_ESN) ||
+ x->repl_mode == XFRM_REPLAY_MODE_ESN) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE ESN is not supported yet");
+ return -EOPNOTSUPP;
+ }
+
+ window = x->replay_esn ? x->replay_esn->replay_window :
+ x->props.replay_window;
+ if (!window)
+ return 0;
+
+ cfg->cmd |= AIROHA_SOE_SA_CMD_ANTI_RPLY_EN;
+ cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_ANTI_RPLY_WDW,
+ min_t(u32, (window - 1) / 64, 3));
+
+ return 0;
+}
+
+static int airoha_soe_build_sa(struct xfrm_state *x,
+ struct airoha_soe_sa_cfg *cfg,
+ struct netlink_ext_ack *extack)
+{
+ int err;
+
+ if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports XFRM packet offload only");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->xso.dir != XFRM_DEV_OFFLOAD_OUT &&
+ x->xso.dir != XFRM_DEV_OFFLOAD_IN) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE supports in/out SAs only");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->id.proto != IPPROTO_ESP) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE supports ESP only");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->props.family != AF_INET || x->outer_mode.family != AF_INET) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE bring-up supports IPv4 outer tunnel only");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->props.mode != XFRM_MODE_TUNNEL) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE supports tunnel mode only");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->encap && x->encap->encap_type != UDP_ENCAP_ESPINUDP) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports native ESP or UDP_ENCAP_ESPINUDP");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->tfcpad) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE does not support TFC padding");
+ return -EOPNOTSUPP;
+ }
+
+ cfg->cmd = AIROHA_SOE_SA_CMD_SN_ERR_DROP |
+ AIROHA_SOE_SA_CMD_PAD_ERR_DROP |
+ AIROHA_SOE_SA_CMD_ICV_ERR_DROP;
+ if (x->xso.dir == XFRM_DEV_OFFLOAD_OUT) {
+ cfg->cmd |= AIROHA_SOE_SA_CMD_ENC;
+ if (x->encap)
+ cfg->cmd |= AIROHA_SOE_SA_CMD_NAT_EN;
+ cfg->src_addr[0] = be32_to_cpu(x->props.saddr.a4);
+ cfg->dst_addr[0] = be32_to_cpu(x->id.daddr.a4);
+ } else if (x->encap) {
+ /* RX submit passes the full UDP/4500 packet to SOE. Ask the
+ * decrypt parser to consume the UDP header before ESP decap.
+ */
+ cfg->cmd |= AIROHA_SOE_SA_CMD_DEC_UDP_PARSER_EN;
+ }
+
+ err = airoha_soe_build_algo(x, cfg, extack);
+ if (err)
+ return err;
+
+ err = airoha_soe_build_replay(x, cfg, extack);
+ if (err)
+ return err;
+
+ cfg->spi = be32_to_cpu(x->id.spi);
+ if (x->encap) {
+ /* The NAT-T port word stores dport above sport. */
+ cfg->udp_port = (u32)ntohs(x->encap->encap_dport) << 16 |
+ ntohs(x->encap->encap_sport);
+ }
+ cfg->soft_byte_limit = airoha_soe_limit(x->lft.soft_byte_limit);
+ cfg->hard_byte_limit = airoha_soe_limit(x->lft.hard_byte_limit);
+ cfg->soft_packet_limit = airoha_soe_limit(x->lft.soft_packet_limit);
+ cfg->hard_packet_limit = airoha_soe_limit(x->lft.hard_packet_limit);
+
+ return 0;
+}
+
+static int airoha_soe_alloc_sa(struct airoha_soe *soe, struct xfrm_state *x,
+ struct netlink_ext_ack *extack,
+ struct airoha_soe_sa **sa)
+{
+ struct airoha_soe_sa_cfg cfg = {};
+ struct airoha_soe_sa *new_sa;
+ unsigned int i;
+ int err;
+
+ if (!soe || !sa || !airoha_soe_available(soe)) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE provider is unavailable");
+ return -ENODEV;
+ }
+
+ err = airoha_soe_build_sa(x, &cfg, extack);
+ if (err)
+ return err;
+
+ new_sa = kzalloc_obj(*new_sa, GFP_KERNEL);
+ if (!new_sa)
+ return -ENOMEM;
+
+ mutex_lock(&soe->sa_lock);
+ for (i = 0; i < AIROHA_SOE_NUM_SA; i++) {
+ if (!(soe->sa_map & BIT(i)))
+ break;
+ }
+ if (i == AIROHA_SOE_NUM_SA) {
+ mutex_unlock(&soe->sa_lock);
+ kfree(new_sa);
+ return -ENOSPC;
+ }
+
+ err = airoha_soe_program_sa_locked(soe, i, &cfg);
+ if (err) {
+ mutex_unlock(&soe->sa_lock);
+ kfree(new_sa);
+ return err;
+ }
+
+ new_sa->soe = soe;
+ new_sa->index = i;
+ new_sa->cmd = cfg.cmd;
+ new_sa->spi = cfg.spi;
+ spin_lock_init(&new_sa->lock);
+ INIT_LIST_HEAD(&new_sa->tx_queue);
+ INIT_LIST_HEAD(&new_sa->rx_queue);
+ init_completion(&new_sa->idle);
+ rcu_assign_pointer(soe->sa[i], new_sa);
+ soe->sa_map |= BIT(i);
+ mutex_unlock(&soe->sa_lock);
+
+ *sa = new_sa;
+ return 0;
+}
+
+static void airoha_soe_mark_sa_dead(struct airoha_soe_sa *sa)
+{
+ if (!sa)
+ return;
+
+ spin_lock_bh(&sa->lock);
+ sa->dead = true;
+ if (!sa->inflight)
+ complete(&sa->idle);
+ spin_unlock_bh(&sa->lock);
+}
+
+static void airoha_soe_free_ctx(struct airoha_soe_ctx *ctx)
+{
+ if (!ctx)
+ return;
+
+ if (ctx->dir == AIROHA_SOE_CTX_OUT)
+ dst_release(ctx->dst);
+ else
+ xfrm_state_put(ctx->rx.x);
+ kfree(ctx);
+}
+
+static void airoha_soe_purge_ctx_list(struct list_head *head)
+{
+ struct airoha_soe_ctx *ctx, *tmp;
+
+ list_for_each_entry_safe(ctx, tmp, head, list) {
+ list_del(&ctx->list);
+ airoha_soe_free_ctx(ctx);
+ }
+}
+
+static void airoha_soe_forget_rx_ctx_list(struct airoha_soe_sa *sa)
+{
+ if (!list_empty(&sa->rx_queue))
+ atomic_sub((int)list_count_nodes(&sa->rx_queue),
+ &sa->soe->pending_rx);
+}
+
+static void airoha_soe_abort_sa(struct airoha_soe_sa *sa)
+{
+ LIST_HEAD(rx_queue);
+ LIST_HEAD(tx_queue);
+
+ if (!sa)
+ return;
+
+ spin_lock_bh(&sa->lock);
+ sa->dead = true;
+ airoha_soe_forget_rx_ctx_list(sa);
+ list_splice_init(&sa->tx_queue, &tx_queue);
+ list_splice_init(&sa->rx_queue, &rx_queue);
+ sa->inflight = 0;
+ complete(&sa->idle);
+ spin_unlock_bh(&sa->lock);
+
+ airoha_soe_purge_ctx_list(&tx_queue);
+ airoha_soe_purge_ctx_list(&rx_queue);
+}
+
+static void airoha_soe_free_sa(struct airoha_soe_sa *sa)
+{
+ LIST_HEAD(rx_queue);
+ LIST_HEAD(tx_queue);
+ struct airoha_soe *soe;
+
+ if (!sa)
+ return;
+
+ soe = sa->soe;
+ airoha_soe_mark_sa_dead(sa);
+ if (!wait_for_completion_timeout(&sa->idle, AIROHA_SOE_SA_FREE_TIMEOUT))
+ dev_warn(soe->dev,
+ "timed out waiting for SOE SA%u in-flight packets\n",
+ sa->index);
+
+ mutex_lock(&soe->sa_lock);
+ if (sa->index < AIROHA_SOE_NUM_SA &&
+ rcu_access_pointer(soe->sa[sa->index]) == sa) {
+ airoha_soe_clear_sa_locked(soe, sa->index);
+ RCU_INIT_POINTER(soe->sa[sa->index], NULL);
+ soe->sa_map &= ~BIT(sa->index);
+ }
+ mutex_unlock(&soe->sa_lock);
+ synchronize_rcu();
+
+ spin_lock_bh(&sa->lock);
+ airoha_soe_forget_rx_ctx_list(sa);
+ list_splice_init(&sa->tx_queue, &tx_queue);
+ list_splice_init(&sa->rx_queue, &rx_queue);
+ spin_unlock_bh(&sa->lock);
+ airoha_soe_purge_ctx_list(&tx_queue);
+ airoha_soe_purge_ctx_list(&rx_queue);
+
+ kfree(sa);
+}
+
+static struct airoha_soe_ctx *airoha_soe_pop_ctx(struct airoha_soe_sa *sa,
+ enum airoha_soe_ctx_dir dir)
+{
+ struct list_head *head;
+ struct airoha_soe_ctx *ctx = NULL;
+
+ head = dir == AIROHA_SOE_CTX_OUT ? &sa->tx_queue : &sa->rx_queue;
+
+ spin_lock_bh(&sa->lock);
+ if (!list_empty(head)) {
+ ctx = list_first_entry(head, struct airoha_soe_ctx, list);
+ list_del(&ctx->list);
+ if (dir == AIROHA_SOE_CTX_IN)
+ atomic_dec(&sa->soe->pending_rx);
+ }
+
+ if (ctx && !WARN_ON_ONCE(!sa->inflight)) {
+ sa->inflight--;
+ if (sa->dead && !sa->inflight)
+ complete(&sa->idle);
+ }
+ spin_unlock_bh(&sa->lock);
+
+ return ctx;
+}
+
+static int airoha_soe_prepare_ip_headers(struct sk_buff *skb)
+{
+ unsigned int hdr_len;
+
+ if (!pskb_may_pull(skb, 1))
+ return -EINVAL;
+
+ switch (skb->data[0] & 0xf0) {
+ case 0x40:
+ hdr_len = sizeof(struct iphdr);
+ skb->protocol = htons(ETH_P_IP);
+ break;
+ case 0x60:
+ hdr_len = sizeof(struct ipv6hdr);
+ skb->protocol = htons(ETH_P_IPV6);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (!pskb_may_pull(skb, hdr_len))
+ return -EINVAL;
+
+ skb_reset_network_header(skb);
+ skb_set_transport_header(skb, hdr_len);
+
+ return 0;
+}
+
+static void airoha_soe_trace_rx_complete(struct sk_buff *skb,
+ const struct airoha_soe_ctx *ctx,
+ const struct xfrm_state *x)
+{
+ unsigned int trace = READ_ONCE(airoha_soe_rx_trace_packets);
+ const struct iphdr *iph;
+
+ if (!trace || skb->protocol != htons(ETH_P_IP))
+ return;
+
+ iph = ip_hdr(skb);
+ pr_info("airoha_eth: SOE RX complete dev=%s saddr=%pI4 daddr=%pI4 proto=%u len=%u mark=0x%x spi=0x%08x natt=%u foe=%u hash=0x%04x sa=%u\n",
+ ctx->rx.dev->name, &iph->saddr, &iph->daddr, iph->protocol,
+ ntohs(iph->tot_len), skb->mark, ntohl(x->id.spi),
+ x->encap ? 1 : 0, ctx->rx.foe_valid, ctx->rx.foe_hash,
+ ctx->rx.sa_index);
+ WRITE_ONCE(airoha_soe_rx_trace_packets, trace - 1);
+}
+
+static int airoha_soe_push_l2_header(struct sk_buff *skb)
+{
+ static const u8 ipv4_l2_header[ETH_HLEN] = {
+ 0x00, 0x0c, 0xe7, 0x20, 0x21, 0x12, 0x00,
+ 0x0c, 0xe7, 0x20, 0x22, 0x62, 0x08, 0x00,
+ };
+ static const u8 ipv6_l2_header[ETH_HLEN] = {
+ 0x00, 0x0c, 0xe7, 0x20, 0x21, 0x12, 0x00,
+ 0x0c, 0xe7, 0x20, 0x22, 0x62, 0x86, 0xdd,
+ };
+ const u8 *l2_header;
+ int err;
+
+ err = airoha_soe_prepare_ip_headers(skb);
+ if (err)
+ return err;
+
+ if (skb->protocol == htons(ETH_P_IP))
+ l2_header = ipv4_l2_header;
+ else
+ l2_header = ipv6_l2_header;
+
+ /* TDMA/SOE port 7 expects an Ethernet-looking frame before the SOE hop. */
+ memcpy(skb_push(skb, ETH_HLEN), l2_header, ETH_HLEN);
+
+ return 0;
+}
+
+static void airoha_soe_push_hop_desc(struct sk_buff *skb, unsigned int sa_index,
+ bool encrypt, int foe_idx)
+{
+ u32 hop_direction = encrypt ? AIROHA_SOE_HOP_INFO_ENCRYPT :
+ AIROHA_SOE_HOP_INFO_DECRYPT;
+ u64 desc3 = ((u64)(u16)((hop_direction << 4) | 0x80) << 48) |
+ ((u64)(sa_index & 0x3f) << 40) | 0x05dc0000ULL;
+ u64 desc2 = 0;
+ __le64 desc[4] = {};
+
+ if (foe_idx >= 0)
+ desc2 = (u64)(foe_idx & 0xffff) << 32;
+
+ desc[0] = cpu_to_le64(encrypt ? AIROHA_SOE_HOP_DESC0_ENCRYPT :
+ AIROHA_SOE_HOP_DESC0_DECRYPT);
+ desc[1] = cpu_to_le64(AIROHA_SOE_HOP_DESC1);
+ desc[2] = cpu_to_le64(desc2);
+ desc[3] = cpu_to_le64(desc3);
+ ((u8 *)desc)[28] = sa_index;
+
+ /* The FE/QDMA hop descriptor is consumed by PSE port 7 before SOE. */
+ memcpy(skb_push(skb, AIROHA_SOE_QDMA_HOP_DESC_LEN), desc, sizeof(desc));
+}
+
+static int airoha_soe_submit_skb(struct airoha_soe_sa *sa,
+ struct airoha_gdm_dev *dev,
+ struct sk_buff *skb,
+ struct airoha_soe_ctx *ctx)
+{
+ struct net_device *netdev = netdev_from_priv(dev);
+ u32 queue = ctx->dir == AIROHA_SOE_CTX_OUT ?
+ AIROHA_SOE_QDMA_QUEUE_ENCRYPT :
+ AIROHA_SOE_QDMA_QUEUE_DECRYPT;
+ bool encrypt = ctx->dir == AIROHA_SOE_CTX_OUT;
+ unsigned int headroom = AIROHA_SOE_QDMA_HOP_DESC_LEN + ETH_HLEN;
+ struct list_head *head;
+ u32 msg0, msg1;
+ int foe_idx = -1;
+ int err;
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ err = skb_checksum_help(skb);
+ if (err)
+ return err;
+ }
+
+ err = skb_cow_head(skb, headroom);
+ if (err)
+ return err;
+
+ err = airoha_soe_push_l2_header(skb);
+ if (err)
+ return err;
+
+ msg0 = FIELD_PREP(QDMA_ETH_TXMSG_SOE_SA_MASK, sa->index & 0x3f);
+ msg1 = FIELD_PREP(QDMA_ETH_TXMSG_METER_MASK, 0x7f) |
+ FIELD_PREP(QDMA_ETH_TXMSG_FPORT_MASK, 7) |
+ FIELD_PREP(QDMA_ETH_TXMSG_NBOQ_MASK, queue) |
+ QDMA_ETH_TXMSG_HOP_MASK |
+ FIELD_PREP(QDMA_ETH_TXMSG_ACNT_G1_MASK, 0x1f) |
+ FIELD_PREP(QDMA_ETH_TXMSG_ACNT_G0_MASK, 0x3f);
+
+ if (ctx->dir == AIROHA_SOE_CTX_IN && ctx->rx.foe_valid &&
+ ctx->rx.foe_hash != AIROHA_RXD4_FOE_ENTRY)
+ foe_idx = ctx->rx.foe_hash;
+
+ airoha_soe_push_hop_desc(skb, sa->index, encrypt, foe_idx);
+
+ skb->dev = netdev;
+ skb_set_queue_mapping(skb, AIROHA_SOE_QDMA_TX_RING);
+
+ if (!dev->soe_xmit_skb)
+ return -ENODEV;
+
+ head = ctx->dir == AIROHA_SOE_CTX_OUT ? &sa->tx_queue : &sa->rx_queue;
+ spin_lock_bh(&sa->lock);
+ if (sa->dead) {
+ spin_unlock_bh(&sa->lock);
+ return -ENOENT;
+ }
+
+ /* Completion descriptors carry only SA/hop flags, so keep skb context here. */
+ list_add_tail(&ctx->list, head);
+ sa->inflight++;
+ if (ctx->dir == AIROHA_SOE_CTX_IN)
+ atomic_inc(&sa->soe->pending_rx);
+ reinit_completion(&sa->idle);
+
+ err = dev->soe_xmit_skb(dev, skb, msg0, msg1,
+ AIROHA_SOE_TXMSG2_DEFAULT);
+ if (err) {
+ list_del(&ctx->list);
+ if (ctx->dir == AIROHA_SOE_CTX_IN)
+ atomic_dec(&sa->soe->pending_rx);
+ sa->inflight--;
+ if (sa->dead && !sa->inflight)
+ complete(&sa->idle);
+ }
+ spin_unlock_bh(&sa->lock);
+
+ return err;
+}
+
+int airoha_soe_xmit(struct airoha_soe_sa *sa, struct airoha_gdm_dev *dev,
+ struct sk_buff *skb, struct xfrm_state *x)
+{
+ struct airoha_soe_ctx *ctx;
+ struct dst_entry *path;
+ struct dst_entry *dst;
+ int err;
+
+ if (!sa || !dev || !skb || !x || x->xso.dir != XFRM_DEV_OFFLOAD_OUT)
+ return -EINVAL;
+
+ if (skb_is_gso(skb))
+ return -EOPNOTSUPP;
+
+ dst = skb_dst(skb);
+ if (!dst)
+ return -EHOSTUNREACH;
+
+ path = xfrm_dst_path(dst);
+ if (!path)
+ return -EHOSTUNREACH;
+
+ ctx = kzalloc_obj(*ctx, GFP_ATOMIC);
+ if (!ctx)
+ return -ENOMEM;
+
+ ctx->dir = AIROHA_SOE_CTX_OUT;
+ dst_hold(path);
+ ctx->dst = path;
+
+ err = airoha_soe_submit_skb(sa, dev, skb, ctx);
+ if (err) {
+ airoha_soe_free_ctx(ctx);
+ return err;
+ }
+
+ return 0;
+}
+
+static bool airoha_soe_rx_parse_ipv4(struct sk_buff *skb,
+ struct airoha_soe_rx_info *info)
+{
+ struct ip_esp_hdr *esph;
+ struct udphdr *uh;
+ struct iphdr *iph;
+ int iphlen;
+ int udp_len;
+ int packet_len;
+
+ if (skb->protocol != htons(ETH_P_IP)) {
+ if (!pskb_may_pull(skb, 1) || (skb->data[0] >> 4) != 4)
+ return false;
+
+ skb->protocol = htons(ETH_P_IP);
+ }
+
+ if (!pskb_may_pull(skb, sizeof(*iph)))
+ return false;
+
+ iph = ip_hdr(skb);
+ if (iph->version != 4 || ip_is_fragment(iph))
+ return false;
+
+ iphlen = iph->ihl * 4;
+ packet_len = ntohs(iph->tot_len);
+ if (iphlen < sizeof(*iph) || packet_len > skb->len)
+ return false;
+
+ if (iph->protocol == IPPROTO_ESP) {
+ if (packet_len <= iphlen + sizeof(*esph) ||
+ !pskb_may_pull(skb, iphlen + sizeof(*esph)))
+ return false;
+
+ esph = (struct ip_esp_hdr *)(skb->data + iphlen);
+ if (!esph->spi)
+ return false;
+
+ info->packet_len = packet_len;
+ info->encap = false;
+ info->sport = 0;
+ info->dport = 0;
+ info->spi = esph->spi;
+
+ return true;
+ }
+
+ if (iph->protocol != IPPROTO_UDP ||
+ !pskb_may_pull(skb, iphlen + sizeof(*uh) + sizeof(*esph)))
+ return false;
+
+ uh = (struct udphdr *)(skb->data + iphlen);
+ udp_len = ntohs(uh->len);
+ if (uh->dest != htons(AIROHA_SOE_NATT_PORT) ||
+ udp_len <= sizeof(*uh) + sizeof(*esph) ||
+ iphlen + udp_len != packet_len || packet_len > skb->len)
+ return false;
+
+ esph = (struct ip_esp_hdr *)(skb->data + iphlen + sizeof(*uh));
+ if (!esph->spi)
+ return false;
+
+ info->packet_len = packet_len;
+ info->encap = true;
+ info->sport = uh->source;
+ info->dport = uh->dest;
+ info->spi = esph->spi;
+
+ return true;
+}
+
+/* Plain ESP/NAT-T first arrives as normal RX, then is bounced to SOE decrypt. */
+bool airoha_soe_rx_plain_skb(struct airoha_gdm_dev *dev, struct sk_buff *skb,
+ struct net_device *rx_dev, u16 foe_hash,
+ u32 foe_reason, bool foe_valid)
+{
+ struct airoha_soe_xfrm_state *state;
+ struct airoha_soe_rx_info info = {};
+ struct airoha_soe_ctx *ctx;
+ xfrm_address_t daddr = {};
+ struct xfrm_state *x;
+ int err;
+
+ if (!dev || !skb || !rx_dev)
+ return false;
+
+ if (!dev->eth->soe || !(rx_dev->features & NETIF_F_HW_ESP))
+ return false;
+
+ if (!atomic_read(&dev->soe_xfrm_state_count))
+ return false;
+
+ /* The packet is still in the driver RX path after eth_type_trans(). */
+ skb_reset_network_header(skb);
+ if (!airoha_soe_rx_parse_ipv4(skb, &info))
+ return false;
+
+ if (skb->len != info.packet_len && pskb_trim(skb, info.packet_len))
+ return false;
+
+ daddr.a4 = ip_hdr(skb)->daddr;
+ x = xfrm_input_state_lookup(dev_net(rx_dev), skb->mark, &daddr,
+ info.spi, IPPROTO_ESP, AF_INET);
+ if (!x)
+ return false;
+
+ if (x->xso.dir != XFRM_DEV_OFFLOAD_IN)
+ goto put_state;
+ if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+ goto put_state;
+ if (x->xso.dev != rx_dev)
+ goto put_state;
+ if ((info.encap &&
+ (!x->encap || x->encap->encap_type != UDP_ENCAP_ESPINUDP)) ||
+ (!info.encap && x->encap))
+ goto put_state;
+
+ if (info.encap && info.dport != x->encap->encap_dport)
+ goto put_state;
+
+ state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+ if (!state || state->dev != dev || !state->sa)
+ goto put_state;
+
+ ctx = kzalloc_obj(*ctx, GFP_ATOMIC);
+ if (!ctx)
+ goto put_state;
+
+ ctx->dir = AIROHA_SOE_CTX_IN;
+ ctx->rx.x = x;
+ ctx->rx.gdm_dev = dev;
+ ctx->rx.dev = rx_dev;
+ ctx->rx.saddr = ip_hdr(skb)->saddr;
+ ctx->rx.sport = info.sport;
+ ctx->rx.foe_hash = foe_hash;
+ ctx->rx.foe_reason = foe_reason;
+ ctx->rx.sa_index = state->sa->index;
+ ctx->rx.foe_valid = foe_valid;
+ ctx->rx.mark = skb->mark;
+
+ err = airoha_soe_submit_skb(state->sa, dev, skb, ctx);
+ if (err) {
+ airoha_soe_free_ctx(ctx);
+ goto drop_state;
+ }
+
+ return true;
+
+drop_state:
+ kfree_skb(skb);
+ return true;
+put_state:
+ xfrm_state_put(x);
+ return false;
+}
+
+static bool airoha_soe_complete_out(struct sk_buff *skb,
+ struct airoha_soe_ctx *ctx)
+{
+ struct dst_entry *dst = ctx->dst;
+ struct net *net;
+ int err;
+
+ ctx->dst = NULL;
+ if (!pskb_may_pull(skb, ETH_HLEN + 1))
+ goto drop;
+ skb_pull(skb, ETH_HLEN);
+
+ err = airoha_soe_prepare_ip_headers(skb);
+ if (err)
+ goto drop;
+
+ /* Re-enter dst_output() with the original dst after hardware ESP encode. */
+ skb->protocol = htons(ETH_P_IP);
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+ skb->ignore_df = 1;
+ net = dev_net(dst->dev);
+ kfree(ctx);
+ dst_output(net, NULL, skb);
+
+ return true;
+
+drop:
+ dst_release(dst);
+ kfree(ctx);
+ kfree_skb(skb);
+ return true;
+}
+
+static bool airoha_soe_complete_in(struct sk_buff *skb,
+ struct airoha_soe_ctx *ctx)
+{
+ struct xfrm_state *x = ctx->rx.x;
+ struct net_device *rx_dev = ctx->rx.dev;
+ struct xfrm_offload *xo;
+ struct sec_path *sp;
+ int err;
+
+ if (!pskb_may_pull(skb, ETH_HLEN + 1))
+ goto drop;
+ skb_pull(skb, ETH_HLEN);
+
+ err = airoha_soe_prepare_ip_headers(skb);
+ if (err)
+ goto drop;
+
+ skb->dev = rx_dev;
+ skb->mark = ctx->rx.mark;
+ skb->ip_summed = CHECKSUM_NONE;
+ skb_reset_mac_header(skb);
+ skb_reset_mac_len(skb);
+ skb->pkt_type = PACKET_HOST;
+ skb->encapsulation = 0;
+ skb_dst_drop(skb);
+
+ if (x->encap && x->encap->encap_type == UDP_ENCAP_ESPINUDP &&
+ (ctx->rx.saddr != x->props.saddr.a4 ||
+ ctx->rx.sport != x->encap->encap_sport)) {
+ xfrm_address_t ipaddr = {
+ .a4 = ctx->rx.saddr,
+ };
+
+ km_new_mapping(x, &ipaddr, ctx->rx.sport);
+ }
+
+ /* Tell xfrm_input() equivalent consumers that hardware already decrypted. */
+ sp = secpath_set(skb);
+ if (!sp)
+ goto drop;
+
+ if (sp->len == XFRM_MAX_DEPTH) {
+ secpath_reset(skb);
+ goto drop;
+ }
+
+ sp->xvec[sp->len++] = x;
+ sp->olen++;
+ ctx->rx.x = NULL;
+ xo = xfrm_offload(skb);
+ if (!xo) {
+ secpath_reset(skb);
+ goto drop;
+ }
+
+ xo->flags = CRYPTO_DONE;
+ xo->status = CRYPTO_SUCCESS;
+
+ airoha_soe_trace_rx_complete(skb, ctx, x);
+
+ /* SOE decrypt completion reaches the CPU before the routed plaintext
+ * packet has selected its final egress port. Preserve the original FOE
+ * hash and SA hop until the Ethernet xmit path can bind that decrypt
+ * entry with the completed L2/PSE descriptor.
+ */
+ if (ctx->rx.foe_valid)
+ airoha_ppe_soe_mark_skb(&ctx->rx.gdm_dev->eth->ppe->dev, skb,
+ ctx->rx.foe_hash, ctx->rx.sa_index,
+ AIROHA_SOE_HOP_INFO_DECRYPT);
+
+ kfree(ctx);
+ netif_rx(skb);
+
+ return true;
+
+drop:
+ airoha_soe_free_ctx(ctx);
+ kfree_skb(skb);
+ return true;
+}
+
+bool airoha_soe_rx_skb(struct airoha_soe *soe, struct sk_buff *skb,
+ unsigned int sa_index, u32 hop_flags)
+{
+ struct airoha_soe_ctx *ctx;
+ struct airoha_soe_sa *sa;
+
+ if (!soe || !skb || sa_index >= AIROHA_SOE_NUM_SA)
+ return false;
+
+ rcu_read_lock();
+ sa = rcu_dereference(soe->sa[sa_index]);
+ if (!sa) {
+ rcu_read_unlock();
+ return false;
+ }
+
+ if (hop_flags >= AIROHA_SOE_HOP_FLAG_ERROR_BASE) {
+ ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_OUT);
+ if (!ctx)
+ ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_IN);
+ rcu_read_unlock();
+ airoha_soe_free_ctx(ctx);
+ kfree_skb(skb);
+ return true;
+ }
+
+ if (hop_flags == AIROHA_SOE_HOP_FLAG_ENCRYPTED) {
+ ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_OUT);
+ rcu_read_unlock();
+ if (!ctx) {
+ kfree_skb(skb);
+ return true;
+ }
+ return airoha_soe_complete_out(skb, ctx);
+ }
+
+ if (hop_flags == AIROHA_SOE_HOP_FLAG_DECRYPTED) {
+ ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_IN);
+ rcu_read_unlock();
+ if (!ctx) {
+ kfree_skb(skb);
+ return true;
+ }
+ return airoha_soe_complete_in(skb, ctx);
+ }
+
+ rcu_read_unlock();
+ return false;
+}
+
+bool airoha_soe_has_pending_rx(struct airoha_soe *soe)
+{
+ if (!soe)
+ return false;
+
+ return !!atomic_read(&soe->pending_rx);
+}
+
+int airoha_soe_xfrm_ppe_info(const struct dst_entry *dst, u8 *sa_index, u8 *hop)
+{
+ struct airoha_soe_xfrm_state *state;
+ struct net_device *netdev;
+ struct xfrm_state *x;
+
+ if (!dst || !sa_index || !hop)
+ return -EINVAL;
+
+ x = dst_xfrm(dst);
+ if (!x || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+ return -EOPNOTSUPP;
+
+ state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+ if (!state || !state->sa)
+ return -ENODEV;
+
+ if (!state->dev)
+ return -ENODEV;
+
+ netdev = netdev_from_priv(state->dev);
+ if (netdev != x->xso.dev || !(netdev->features & NETIF_F_HW_ESP))
+ return -ENODEV;
+
+ switch (x->xso.dir) {
+ case XFRM_DEV_OFFLOAD_OUT:
+ *hop = AIROHA_SOE_HOP_INFO_ENCRYPT;
+ break;
+ case XFRM_DEV_OFFLOAD_IN:
+ *hop = AIROHA_SOE_HOP_INFO_DECRYPT;
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ *sa_index = state->sa->index;
+
+ return 0;
+}
+
+static int airoha_soe_xfrm_state_add(struct net_device *dev,
+ struct xfrm_state *x,
+ struct netlink_ext_ack *extack)
+{
+ struct airoha_soe_xfrm_state *state;
+ struct airoha_gdm_dev *gdm_dev;
+ struct airoha_soe *soe;
+ gfp_t gfp;
+ int err;
+
+ if (dev->xfrmdev_ops != &airoha_soe_xfrmdev_ops ||
+ !(dev->features & NETIF_F_HW_ESP))
+ return -EOPNOTSUPP;
+
+ gdm_dev = netdev_priv(dev);
+ soe = airoha_soe_get_ref(gdm_dev->eth->soe);
+ if (!soe)
+ return -ENODEV;
+
+ gfp = (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ) ? GFP_ATOMIC :
+ GFP_KERNEL;
+ state = kzalloc_obj(*state, gfp);
+ if (!state) {
+ airoha_soe_put_ref(soe);
+ return -ENOMEM;
+ }
+
+ state->dev = gdm_dev;
+ state->soe = soe;
+
+ if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
+ goto out;
+
+ err = airoha_soe_alloc_sa(soe, x, extack, &state->sa);
+ if (err)
+ goto err_free;
+
+ atomic_inc(&gdm_dev->soe_xfrm_state_count);
+ state->counted = true;
+out:
+ x->xso.offload_handle = (unsigned long)state;
+ return 0;
+
+err_free:
+ kfree(state);
+ airoha_soe_put_ref(soe);
+ return err;
+}
+
+static void airoha_soe_xfrm_state_delete(struct net_device *dev,
+ struct xfrm_state *x)
+{
+ struct airoha_soe_xfrm_state *state;
+
+ state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+ if (state && state->sa) {
+ airoha_ppe_soe_flush_sa(state->dev->eth->ppe, state->sa->index);
+ airoha_soe_abort_sa(state->sa);
+ }
+}
+
+static void airoha_soe_xfrm_state_free(struct net_device *dev,
+ struct xfrm_state *x)
+{
+ struct airoha_soe_xfrm_state *state;
+
+ state = (struct airoha_soe_xfrm_state *)xchg(&x->xso.offload_handle, 0);
+ if (!state)
+ return;
+
+ if (state->sa) {
+ airoha_ppe_soe_flush_sa(state->dev->eth->ppe,
+ state->sa->index);
+ airoha_soe_free_sa(state->sa);
+ }
+ if (state->counted)
+ atomic_dec(&state->dev->soe_xfrm_state_count);
+ airoha_soe_put_ref(state->soe);
+ kfree(state);
+}
+
+static bool airoha_soe_xfrm_offload_ok(struct sk_buff *skb,
+ struct xfrm_state *x)
+{
+ struct airoha_soe_xfrm_state *state;
+ struct net_device *dev = x->xso.dev;
+
+ if (!dev || !(dev->features & NETIF_F_HW_ESP))
+ return false;
+
+ if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET ||
+ x->xso.dir != XFRM_DEV_OFFLOAD_OUT)
+ return false;
+
+ state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+
+ return state && state->sa;
+}
+
+static int airoha_soe_xfrm_packet_xmit_gso(struct sk_buff *skb,
+ struct xfrm_state *x,
+ struct airoha_soe_xfrm_state *state)
+{
+ struct sk_buff *segs, *nskb;
+ int err;
+
+ segs = skb_gso_segment(skb, 0);
+ if (IS_ERR(segs)) {
+ XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTERROR);
+ kfree_skb(skb);
+ return PTR_ERR(segs);
+ }
+
+ consume_skb(skb);
+
+ skb_list_walk_safe(segs, skb, nskb) {
+ skb_mark_not_on_list(skb);
+ err = airoha_soe_xmit(state->sa, state->dev, skb, x);
+ if (err) {
+ XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTERROR);
+ kfree_skb(skb);
+ kfree_skb_list(nskb);
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+static int airoha_soe_xfrm_packet_xmit(struct sk_buff *skb,
+ struct xfrm_state *x)
+{
+ struct airoha_soe_xfrm_state *state;
+ struct net_device *netdev;
+ int err = -EHOSTUNREACH;
+
+ state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+ if (!state || !state->sa || !state->dev)
+ goto drop;
+
+ netdev = netdev_from_priv(state->dev);
+ if (netdev->xfrmdev_ops != &airoha_soe_xfrmdev_ops ||
+ !(netdev->features & NETIF_F_HW_ESP))
+ goto drop;
+
+ if (skb_is_gso(skb))
+ return airoha_soe_xfrm_packet_xmit_gso(skb, x, state);
+
+ err = airoha_soe_xmit(state->sa, state->dev, skb, x);
+ if (err)
+ goto drop;
+
+ return 0;
+
+drop:
+ XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTERROR);
+ kfree_skb(skb);
+ return err;
+}
+
+static int airoha_soe_xfrm_policy_add(struct xfrm_policy *x,
+ struct netlink_ext_ack *extack)
+{
+ if (x->xdo.type != XFRM_DEV_OFFLOAD_PACKET) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE supports XFRM packet policies only");
+ return -EOPNOTSUPP;
+ }
+
+ if (xfrm_policy_id2dir(x->index) >= XFRM_POLICY_MAX) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE does not offload socket policies");
+ return -EOPNOTSUPP;
+ }
+
+ if (x->xfrm_nr != 1 ||
+ x->xfrm_vec[0].id.proto != IPPROTO_ESP) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "SOE offloads ESP policies only");
+ return -EOPNOTSUPP;
+ }
+
+ if (!x->xdo.dev || !(x->xdo.dev->features & NETIF_F_HW_ESP)) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE ESP offload is disabled");
+ return -EOPNOTSUPP;
+ }
+
+ switch (x->xdo.dir) {
+ case XFRM_DEV_OFFLOAD_IN:
+ case XFRM_DEV_OFFLOAD_OUT:
+ return 0;
+ default:
+ NL_SET_ERR_MSG_MOD(extack, "SOE supports in/out policies only");
+ return -EOPNOTSUPP;
+ }
+}
+
+static struct net_device *airoha_soe_dsa_conduit_get(struct net_device *dev)
+{
+ struct net_device *conduit;
+ struct dsa_port *dp;
+
+ if (!dsa_user_dev_check(dev))
+ return NULL;
+
+ /* DSA users expose XFRM, but SOE is attached to their CPU conduit. */
+ dp = dsa_port_from_netdev(dev);
+ if (IS_ERR(dp) || !dp->cpu_dp)
+ return NULL;
+
+ conduit = dsa_port_to_conduit(dp);
+ if (!conduit || conduit->xfrmdev_ops != &airoha_soe_xfrmdev_ops)
+ return NULL;
+
+ dev_hold(conduit);
+
+ return conduit;
+}
+
+static int airoha_soe_dsa_xfrm_state_add(struct net_device *dev,
+ struct xfrm_state *x,
+ struct netlink_ext_ack *extack)
+{
+ struct net_device *conduit;
+ int err;
+
+ conduit = airoha_soe_dsa_conduit_get(dev);
+ if (!conduit) {
+ NL_SET_ERR_MSG_MOD(extack, "SOE DSA conduit is unavailable");
+ return -EOPNOTSUPP;
+ }
+
+ err = airoha_soe_xfrm_state_add(conduit, x, extack);
+ dev_put(conduit);
+
+ return err;
+}
+
+static void airoha_soe_dsa_xfrm_state_delete(struct net_device *dev,
+ struct xfrm_state *x)
+{
+ airoha_soe_xfrm_state_delete(dev, x);
+}
+
+static void airoha_soe_dsa_xfrm_state_free(struct net_device *dev,
+ struct xfrm_state *x)
+{
+ airoha_soe_xfrm_state_free(dev, x);
+}
+
+static bool airoha_soe_dsa_xfrm_offload_ok(struct sk_buff *skb,
+ struct xfrm_state *x)
+{
+ return airoha_soe_xfrm_offload_ok(skb, x);
+}
+
+static int airoha_soe_dsa_xfrm_policy_add(struct xfrm_policy *x,
+ struct netlink_ext_ack *extack)
+{
+ return airoha_soe_xfrm_policy_add(x, extack);
+}
+
+static int airoha_soe_dsa_xfrm_packet_xmit(struct sk_buff *skb,
+ struct xfrm_state *x)
+{
+ return airoha_soe_xfrm_packet_xmit(skb, x);
+}
+
+static const struct xfrmdev_ops airoha_soe_xfrmdev_ops = {
+ .xdo_dev_state_add = airoha_soe_xfrm_state_add,
+ .xdo_dev_state_delete = airoha_soe_xfrm_state_delete,
+ .xdo_dev_state_free = airoha_soe_xfrm_state_free,
+ .xdo_dev_offload_ok = airoha_soe_xfrm_offload_ok,
+ .xdo_dev_policy_add = airoha_soe_xfrm_policy_add,
+ .xdo_dev_packet_xmit = airoha_soe_xfrm_packet_xmit,
+};
+
+static const struct xfrmdev_ops airoha_soe_dsa_xfrmdev_ops = {
+ .xdo_dev_state_add = airoha_soe_dsa_xfrm_state_add,
+ .xdo_dev_state_delete = airoha_soe_dsa_xfrm_state_delete,
+ .xdo_dev_state_free = airoha_soe_dsa_xfrm_state_free,
+ .xdo_dev_offload_ok = airoha_soe_dsa_xfrm_offload_ok,
+ .xdo_dev_policy_add = airoha_soe_dsa_xfrm_policy_add,
+ .xdo_dev_packet_xmit = airoha_soe_dsa_xfrm_packet_xmit,
+};
+
+static void airoha_soe_dsa_proxy_enable(struct net_device *dev)
+{
+ struct net_device *conduit;
+
+ conduit = airoha_soe_dsa_conduit_get(dev);
+ if (!conduit)
+ return;
+
+ if (dev->xfrmdev_ops && dev->xfrmdev_ops != &airoha_soe_dsa_xfrmdev_ops)
+ goto out;
+
+ /* Mirror ESP capability onto DSA users while programming SAs on the conduit. */
+ dev->xfrmdev_ops = &airoha_soe_dsa_xfrmdev_ops;
+ dev->hw_features |= NETIF_F_HW_ESP;
+ dev->hw_enc_features |= NETIF_F_HW_ESP;
+ dev->wanted_features |= NETIF_F_HW_ESP;
+
+ conduit->wanted_features |= NETIF_F_HW_ESP;
+ netdev_update_features(conduit);
+ netdev_update_features(dev);
+out:
+ dev_put(conduit);
+}
+
+static void airoha_soe_dsa_proxy_clear(struct net_device *dev)
+{
+ if (dev->xfrmdev_ops != &airoha_soe_dsa_xfrmdev_ops)
+ return;
+
+ dev->wanted_features &= ~NETIF_F_HW_ESP;
+ dev->hw_features &= ~NETIF_F_HW_ESP;
+ dev->hw_enc_features &= ~NETIF_F_HW_ESP;
+ netdev_update_features(dev);
+ dev->xfrmdev_ops = NULL;
+}
+
+static void airoha_soe_dsa_proxy_scan(bool enable)
+{
+ struct net_device *dev;
+
+ for_each_netdev(&init_net, dev) {
+ if (enable)
+ airoha_soe_dsa_proxy_enable(dev);
+ else
+ airoha_soe_dsa_proxy_clear(dev);
+ }
+}
+
+static int airoha_soe_netdev_event(struct notifier_block *nb,
+ unsigned long event, void *ptr)
+{
+ switch (event) {
+ case NETDEV_REGISTER:
+ case NETDEV_CHANGENAME:
+ airoha_soe_dsa_proxy_scan(true);
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block airoha_soe_netdev_notifier = {
+ .notifier_call = airoha_soe_netdev_event,
+};
+
+void airoha_soe_build_netdev(struct net_device *netdev,
+ airoha_soe_xmit_skb_t xmit_skb)
+{
+ struct airoha_gdm_dev *dev = netdev_priv(netdev);
+
+ atomic_set(&dev->soe_xfrm_state_count, 0);
+ dev->soe_xmit_skb = xmit_skb;
+
+ if (!xmit_skb ||
+ !(airoha_soe_features(dev->eth->soe) & AIROHA_SOE_FEATURE_ESP))
+ return;
+
+ netdev->xfrmdev_ops = &airoha_soe_xfrmdev_ops;
+ netdev->hw_features |= NETIF_F_HW_ESP;
+ netdev->hw_enc_features |= NETIF_F_HW_ESP;
+}
+
+void airoha_soe_teardown_netdev(struct net_device *netdev)
+{
+ struct airoha_gdm_dev *dev = netdev_priv(netdev);
+
+ if (netdev->xfrmdev_ops == &airoha_soe_xfrmdev_ops)
+ netdev->xfrmdev_ops = NULL;
+ dev->soe_xmit_skb = NULL;
+}
+
+int airoha_soe_set_features(struct net_device *netdev,
+ netdev_features_t features)
+{
+ netdev_features_t changed = (netdev->features ^ features) &
+ NETIF_F_HW_ESP;
+ struct airoha_gdm_dev *dev = netdev_priv(netdev);
+
+ if (!changed)
+ return 0;
+
+ if ((features & NETIF_F_HW_ESP) &&
+ !(airoha_soe_features(dev->eth->soe) & AIROHA_SOE_FEATURE_ESP))
+ return -EOPNOTSUPP;
+
+ if (atomic_read(&dev->soe_xfrm_state_count)) {
+ netdev_err(netdev,
+ "cannot change ESP features with active SAs\n");
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
+static struct device_node *airoha_soe_find_node(struct airoha_eth *eth)
+{
+ struct device_node *parent, *np;
+
+ if (!eth->dev->of_node)
+ return NULL;
+
+ parent = of_get_parent(eth->dev->of_node);
+ if (!parent)
+ return NULL;
+
+ /* SOE is a sibling DT node; Ethernet owns the provider lifetime. */
+ for_each_child_of_node(parent, np) {
+ if (!of_device_is_available(np) ||
+ !of_device_is_compatible(np, "airoha,en7581-soe"))
+ continue;
+
+ of_node_put(parent);
+ return np;
+ }
+
+ of_node_put(parent);
+
+ return NULL;
+}
+
+int airoha_soe_init(struct airoha_eth *eth)
+{
+ struct device *dev = eth->dev;
+ struct device_node *np;
+ struct resource res;
+ struct airoha_soe *soe;
+ void __iomem *base;
+ int err;
+
+ np = airoha_soe_find_node(eth);
+ if (!np)
+ return 0;
+
+ err = of_address_to_resource(np, 0, &res);
+ if (err)
+ goto put_node;
+
+ base = devm_ioremap_resource(dev, &res);
+ if (IS_ERR(base)) {
+ err = PTR_ERR(base);
+ goto put_node;
+ }
+
+ soe = devm_kzalloc(dev, sizeof(*soe), GFP_KERNEL);
+ if (!soe) {
+ err = -ENOMEM;
+ goto put_node;
+ }
+
+ soe->dev = dev;
+ soe->base = base;
+ mutex_init(&soe->sa_lock);
+ spin_lock_init(&soe->state_lock);
+ refcount_set(&soe->refcnt, 1);
+ init_completion(&soe->released);
+
+ /* Enable the packet engines; reset leaves SOE present but idle. */
+ writel(AIROHA_SOE_INT_ALL, base + AIROHA_SOE_INT_STS);
+ writel(AIROHA_SOE_CNT_CLR_ALL, base + AIROHA_SOE_CNT_CLR);
+ writel(AIROHA_SOE_INT_ALL, base + AIROHA_SOE_INT_EN);
+ writel(AIROHA_SOE_GLB_CFG_ENC_EN | AIROHA_SOE_GLB_CFG_DEC_EN,
+ base + AIROHA_SOE_GLB_CFG);
+
+ err = register_netdevice_notifier(&airoha_soe_netdev_notifier);
+ if (err)
+ goto disable_soe;
+
+ eth->soe = soe;
+
+ rtnl_lock();
+ airoha_soe_dsa_proxy_scan(true);
+ rtnl_unlock();
+
+ of_node_put(np);
+
+ return 0;
+
+disable_soe:
+ writel(0, base + AIROHA_SOE_GLB_CFG);
+ writel(0, base + AIROHA_SOE_INT_EN);
+ writel(0xffffffff, base + AIROHA_SOE_INT_STS);
+put_node:
+ of_node_put(np);
+
+ return err;
+}
+
+void airoha_soe_deinit(struct airoha_eth *eth)
+{
+ struct airoha_soe *soe = eth->soe;
+ unsigned long flags;
+
+ if (!soe)
+ return;
+
+ eth->soe = NULL;
+
+ spin_lock_irqsave(&soe->state_lock, flags);
+ soe->dead = true;
+ spin_unlock_irqrestore(&soe->state_lock, flags);
+
+ rtnl_lock();
+ airoha_soe_dsa_proxy_scan(false);
+ rtnl_unlock();
+ unregister_netdevice_notifier(&airoha_soe_netdev_notifier);
+
+ airoha_soe_put_ref(soe);
+ wait_for_completion(&soe->released);
+
+ writel(0, soe->base + AIROHA_SOE_GLB_CFG);
+ writel(0, soe->base + AIROHA_SOE_INT_EN);
+ writel(0xffffffff, soe->base + AIROHA_SOE_INT_STS);
+}
--
2.53.0
* [RFC PATCH net-next 6/7] net: airoha: add PPE support for SOE flows
From: Jihong Min @ 2026-06-14 4:00 UTC
To: netdev, Lorenzo Bianconi
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>
Add PPE metadata handling for SOE flows so decrypted packets can carry
their original FOE/SA context until the normal egress path is known, and
so XFRM flowtable entries can be programmed with SOE SA and hop
information.
Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
drivers/net/ethernet/airoha/airoha_ppe.c | 606 +++++++++++++++++++++-
include/linux/soc/airoha/airoha_offload.h | 5 +
2 files changed, 585 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 91bcc55a6ac6..faa5f04d4c7b 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -6,18 +6,77 @@
#include <linux/ip.h>
#include <linux/ipv6.h>
+#include <linux/kstrtox.h>
+#include <linux/moduleparam.h>
#include <linux/of_platform.h>
#include <linux/platform_device.h>
#include <linux/rhashtable.h>
+#include <linux/sysfs.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
#include <net/ipv6.h>
+#include <net/netfilter/nf_flow_table.h>
#include <net/pkt_cls.h>
+#include <net/route.h>
#include "airoha_regs.h"
#include "airoha_eth.h"
+#include "airoha_soe.h"
static DEFINE_MUTEX(flow_offload_mutex);
static DEFINE_SPINLOCK(ppe_lock);
+#define AIROHA_FOE_HOP0 GENMASK(31, 29)
+#define AIROHA_FOE_HOP1 GENMASK(28, 26)
+#define AIROHA_FOE_HOP2 GENMASK(25, 23)
+#define AIROHA_FOE_HOP3 GENMASK(22, 20)
+#define AIROHA_FOE_HOP_MASK \
+ (AIROHA_FOE_HOP0 | AIROHA_FOE_HOP1 | AIROHA_FOE_HOP2 | \
+ AIROHA_FOE_HOP3)
+#define AIROHA_PPE_SOE_DEFAULT_TUNNEL_MTU 1500
+#define AIROHA_PPE_SOE_MAGIC_IPSEC 0x72a1
+#define AIROHA_PPE_SOE_PORT_AG 0x3f
+#define AIROHA_PPE_SOE_CHANNEL 2
+#define AIROHA_PPE_SOE_META_TIMEOUT_MS 1000
+#define AIROHA_PPE_SOE_MAGIC_GDM4 0x729a
+#define AIROHA_PPE_SOE_MARK_MAGIC 0x5e000000
+#define AIROHA_PPE_SOE_MARK_MAGIC_MASK 0xff000000
+#define AIROHA_PPE_SOE_MARK_HASH_MASK 0x00ffff00
+#define AIROHA_PPE_DEFAULT_BIND_RATE 0x1e
+#define AIROHA_PPE_SOE_BIND_DELAY_PACKETS 2
+#define AIROHA_PPE_FORCE_COMMIT_PROBE_WINDOW 0
+
+struct airoha_ppe_soe_tuple_info {
+ unsigned int tunnel_mtu;
+ u8 sa_index;
+ u8 hop;
+};
+
+static unsigned int airoha_ppe_bind_rate = AIROHA_PPE_DEFAULT_BIND_RATE;
+static unsigned int airoha_ppe_soe_inline_bind_delay_packets =
+ AIROHA_PPE_SOE_BIND_DELAY_PACKETS;
+static unsigned int airoha_ppe_soe_inline_force_commit_probe_window =
+ AIROHA_PPE_FORCE_COMMIT_PROBE_WINDOW;
+static struct airoha_ppe *airoha_ppe_active;
+
+static int airoha_ppe_set_bind_rate(const char *val,
+ const struct kernel_param *kp);
+static int airoha_ppe_get_bind_rate(char *buf, const struct kernel_param *kp);
+
+module_param_named(soe_inline_force_commit_probe_window,
+ airoha_ppe_soe_inline_force_commit_probe_window, uint, 0600);
+MODULE_PARM_DESC(soe_inline_force_commit_probe_window,
+ "Adjacent FOE slots searched before force-commit");
+module_param_named(soe_inline_bind_delay_packets,
+ airoha_ppe_soe_inline_bind_delay_packets, uint, 0600);
+MODULE_PARM_DESC(soe_inline_bind_delay_packets,
+ "CPU-visible SOE decrypt packets before binding FOE entry");
+module_param_call(ppe_bind_rate, airoha_ppe_set_bind_rate,
+ airoha_ppe_get_bind_rate, NULL, 0600);
+__MODULE_PARM_TYPE(ppe_bind_rate, "uint");
+MODULE_PARM_DESC(ppe_bind_rate,
+ "PPE bind-rate threshold for L2B and bind fields");
+
static const struct rhashtable_params airoha_flow_table_params = {
.head_offset = offsetof(struct airoha_flow_table_entry, node),
.key_offset = offsetof(struct airoha_flow_table_entry, cookie),
@@ -78,6 +137,17 @@ bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index)
return airoha_fe_rr(eth, REG_PPE_GLO_CFG(index)) & PPE_GLO_CFG_EN_MASK;
}
+static void airoha_ppe_apply_bind_rate(struct airoha_eth *eth, int ppe_idx)
+{
+ u32 rate = min_t(u32, READ_ONCE(airoha_ppe_bind_rate),
+ FIELD_MAX(PPE_BIND_RATE_BIND_MASK));
+
+ airoha_fe_rmw(eth, REG_PPE_BIND_RATE(ppe_idx),
+ PPE_BIND_RATE_L2B_BIND_MASK | PPE_BIND_RATE_BIND_MASK,
+ FIELD_PREP(PPE_BIND_RATE_L2B_BIND_MASK, rate) |
+ FIELD_PREP(PPE_BIND_RATE_BIND_MASK, rate));
+}
+
static u32 airoha_ppe_get_timestamp(struct airoha_ppe *ppe)
{
return airoha_fe_get(ppe->eth, REG_FE_FOE_TS,
@@ -157,15 +227,14 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
FIELD_PREP(PPE_DRAM_TB_NUM_ENTRY_MASK,
dram_num_entries));
- airoha_fe_rmw(eth, REG_PPE_BIND_RATE(i),
- PPE_BIND_RATE_L2B_BIND_MASK |
- PPE_BIND_RATE_BIND_MASK,
- FIELD_PREP(PPE_BIND_RATE_L2B_BIND_MASK, 0x1e) |
- FIELD_PREP(PPE_BIND_RATE_BIND_MASK, 0x1e));
+ airoha_ppe_apply_bind_rate(eth, i);
airoha_fe_wr(eth, REG_PPE_HASH_SEED(i), PPE_HASH_SEED);
airoha_fe_clear(eth, REG_PPE_PPE_FLOW_CFG(i),
PPE_FLOW_CFG_IP6_6RD_MASK);
+ airoha_fe_set(eth, REG_PPE_PPE_FLOW_CFG(i),
+ PPE_FLOW_CFG_IP4_IPSEC_MASK |
+ PPE_FLOW_CFG_IP6_IPSEC_MASK);
for (p = 0; p < ARRAY_SIZE(eth->ports); p++)
airoha_fe_rmw(eth, REG_PPE_MTU(i, p),
@@ -509,6 +578,162 @@ static int airoha_ppe_foe_entry_set_ipv6_tuple(struct airoha_foe_entry *hwe,
return 0;
}
+static int airoha_ppe_soe_fill_inner_ipv4_data(struct sk_buff *skb,
+ struct airoha_flow_data *data,
+ int *type, int *l4proto)
+{
+ unsigned int ip_offset = ETH_HLEN;
+ union {
+ struct tcphdr tcp;
+ struct udphdr udp;
+ } ports;
+ struct iphdr iph_buf, *iph;
+ unsigned int l4_offset;
+ struct udphdr *udp;
+ struct tcphdr *tcp;
+
+ if (skb_headlen(skb) < ETH_HLEN)
+ return -EINVAL;
+
+ memcpy(&data->eth, skb->data, ETH_HLEN);
+ if (data->eth.h_proto != htons(ETH_P_IP))
+ return -EAFNOSUPPORT;
+
+ iph = skb_header_pointer(skb, ip_offset, sizeof(iph_buf), &iph_buf);
+ if (!iph || iph->ihl < 5 || iph->version != 4)
+ return -EINVAL;
+
+ l4_offset = ip_offset + iph->ihl * 4;
+ data->v4.src_addr = iph->saddr;
+ data->v4.dst_addr = iph->daddr;
+ *l4proto = iph->protocol;
+
+ switch (iph->protocol) {
+ case IPPROTO_TCP:
+ tcp = skb_header_pointer(skb, l4_offset, sizeof(ports.tcp),
+ &ports.tcp);
+ if (!tcp)
+ return -EINVAL;
+
+ data->src_port = tcp->source;
+ data->dst_port = tcp->dest;
+ *type = PPE_PKT_TYPE_IPV4_HNAPT;
+ break;
+ case IPPROTO_UDP:
+ udp = skb_header_pointer(skb, l4_offset, sizeof(ports.udp),
+ &ports.udp);
+ if (!udp)
+ return -EINVAL;
+
+ data->src_port = udp->source;
+ data->dst_port = udp->dest;
+ *type = PPE_PKT_TYPE_IPV4_HNAPT;
+ break;
+ default:
+ *type = PPE_PKT_TYPE_IPV4_ROUTE;
+ break;
+ }
+
+ return 0;
+}
+
+static int airoha_ppe_foe_entry_set_soe_fields(struct airoha_foe_entry *hwe,
+ u8 sa_index, u8 hop,
+ unsigned int tunnel_mtu)
+{
+ int type;
+
+ if (hop > FIELD_MAX(AIROHA_FOE_HOP0))
+ return -ERANGE;
+
+ type = FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, hwe->ib1);
+ switch (type) {
+ case PPE_PKT_TYPE_IPV4_HNAPT:
+ case PPE_PKT_TYPE_IPV4_ROUTE:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ tunnel_mtu = min_t(unsigned int, tunnel_mtu,
+ FIELD_MAX(AIROHA_FOE_TUNNEL_MTU));
+
+ /* SOE FOE entries store the hop selector and SA index here. */
+ hwe->ipv4.rsv[0] &= ~AIROHA_FOE_HOP_MASK;
+ hwe->ipv4.rsv[0] |= FIELD_PREP(AIROHA_FOE_HOP0, hop);
+
+ hwe->ipv4.data &= ~(AIROHA_FOE_ACTDP | AIROHA_FOE_TUNNEL_ID);
+ hwe->ipv4.data |= AIROHA_FOE_TUNNEL |
+ FIELD_PREP(AIROHA_FOE_ACTDP, sa_index);
+
+ hwe->ipv4.l2.meter &= ~AIROHA_FOE_TUNNEL_MTU;
+ hwe->ipv4.l2.meter |= FIELD_PREP(AIROHA_FOE_TUNNEL_MTU, tunnel_mtu);
+
+ hwe->ib1 &= ~AIROHA_FOE_IB1_BIND_TUNNEL_DECAP;
+
+ return 0;
+}
+
+static int
+airoha_ppe_soe_get_tuple_info(const struct flow_offload_tuple *tuple,
+ struct airoha_ppe_soe_tuple_info *info)
+{
+ int err;
+
+ if (!tuple || !info)
+ return -EINVAL;
+ if (tuple->xmit_type != FLOW_OFFLOAD_XMIT_XFRM)
+ return -EOPNOTSUPP;
+
+ err = airoha_soe_xfrm_ppe_info(tuple->dst_cache, &info->sa_index,
+ &info->hop);
+ if (err)
+ return err;
+
+ info->tunnel_mtu = tuple->mtu ? tuple->mtu :
+ AIROHA_PPE_SOE_DEFAULT_TUNNEL_MTU;
+
+ return 0;
+}
+
+static int
+airoha_ppe_foe_entry_set_soe_info(struct airoha_foe_entry *hwe,
+ const struct flow_offload_tuple *tuple)
+{
+ struct airoha_ppe_soe_tuple_info info;
+ int err;
+
+ if (!tuple)
+ return 0;
+ if (tuple->xmit_type != FLOW_OFFLOAD_XMIT_XFRM)
+ return 0;
+
+ err = airoha_ppe_soe_get_tuple_info(tuple, &info);
+ if (err)
+ return err;
+
+ err = airoha_ppe_foe_entry_set_soe_fields(hwe, info.sa_index,
+ info.hop, info.tunnel_mtu);
+ if (err)
+ return err;
+
+ /* XFRM packet-offload entries are not plain Ethernet/IP entries:
+ * the PPE must tag them as SOE/IPsec work and submit them through the
+ * SOE-facing channel/port aggregation path. Without these fields the
+ * entry can still become BND, but traffic falls back to the slow path
+ * instead of the inline encrypt/decrypt datapath.
+ */
+ hwe->ipv4.l2.common.etype = AIROHA_PPE_SOE_MAGIC_IPSEC;
+ hwe->ipv4.data &= ~AIROHA_FOE_CHANNEL;
+ hwe->ipv4.data |= FIELD_PREP(AIROHA_FOE_CHANNEL,
+ AIROHA_PPE_SOE_CHANNEL);
+ hwe->ipv4.ib2 &= ~AIROHA_FOE_IB2_PORT_AG;
+ hwe->ipv4.ib2 |= FIELD_PREP(AIROHA_FOE_IB2_PORT_AG,
+ AIROHA_PPE_SOE_PORT_AG);
+
+ return 0;
+}
+
static u32 airoha_ppe_foe_get_entry_hash(struct airoha_ppe *ppe,
struct airoha_foe_entry *hwe)
{
@@ -633,6 +858,9 @@ static void airoha_ppe_foe_flow_stats_update(struct airoha_ppe *ppe,
meter = &hwe->ipv4.l2.meter;
}
+ if (*data & AIROHA_FOE_TUNNEL)
+ return;
+
pse_port = FIELD_GET(AIROHA_FOE_IB2_PSE_PORT, *ib2);
if (pse_port == FE_PSE_PORT_CDM4)
return;
@@ -868,16 +1096,129 @@ airoha_ppe_foe_commit_subflow_entry(struct airoha_ppe *ppe,
return 0;
}
-static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
- struct sk_buff *skb,
- u32 hash, bool rx_wlan)
+static void airoha_ppe_soe_meta_store(struct airoha_ppe_soe_meta *meta,
+ u32 key_hash, u16 foe_hash, u8 sa_index,
+ u8 hop)
{
- struct airoha_flow_table_entry *e;
- struct airoha_foe_bridge br = {};
- struct airoha_foe_entry *hwe;
- bool commit_done = false;
- struct hlist_node *n;
- u32 index, state;
+ u8 seen = 1;
+
+ if (READ_ONCE(meta->valid) &&
+ READ_ONCE(meta->key_hash) == key_hash &&
+ READ_ONCE(meta->foe_hash) == foe_hash &&
+ READ_ONCE(meta->sa_index) == sa_index &&
+ READ_ONCE(meta->hop) == hop &&
+ time_before_eq(jiffies, READ_ONCE(meta->expires)))
+ seen = min_t(unsigned int, READ_ONCE(meta->seen) + 1, U8_MAX);
+
+ WRITE_ONCE(meta->key_hash, key_hash);
+ WRITE_ONCE(meta->foe_hash, foe_hash);
+ WRITE_ONCE(meta->sa_index, sa_index);
+ WRITE_ONCE(meta->hop, hop);
+ WRITE_ONCE(meta->seen, seen);
+ WRITE_ONCE(meta->expires,
+ jiffies + msecs_to_jiffies(AIROHA_PPE_SOE_META_TIMEOUT_MS));
+ WRITE_ONCE(meta->valid, 1);
+}
+
+void airoha_ppe_soe_mark_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
+ u16 hash, u8 sa_index, u8 hop)
+{
+ struct airoha_ppe *ppe;
+ u32 ppe_hash_mask;
+
+ if (!dev || !skb)
+ return;
+
+ ppe = dev->priv;
+ if (!ppe || !ppe->soe_meta)
+ return;
+
+ ppe_hash_mask = airoha_ppe_get_total_num_entries(ppe) - 1;
+ if (hash > ppe_hash_mask)
+ return;
+
+ /* SOE decrypt completion is CPU-visible before normal routing has
+ * selected the plaintext egress netdev. Keep the original encrypted FOE
+ * hash and SA hop briefly on the skb so airoha_dev_xmit() can finish
+ * the PPE entry once the final egress descriptor is known.
+ */
+ airoha_ppe_soe_meta_store(&ppe->soe_meta[hash], hash, hash, sa_index,
+ hop);
+ ppe->foe_check_time[hash] = 0;
+
+ skb->mark &= ~(AIROHA_PPE_SOE_MARK_MAGIC_MASK |
+ AIROHA_PPE_SOE_MARK_HASH_MASK);
+ skb->mark |= AIROHA_PPE_SOE_MARK_MAGIC |
+ FIELD_PREP(AIROHA_PPE_SOE_MARK_HASH_MASK, hash);
+}
+
+bool airoha_ppe_soe_skb_marked(struct sk_buff *skb)
+{
+ return skb && ((skb->mark & AIROHA_PPE_SOE_MARK_MAGIC_MASK) ==
+ AIROHA_PPE_SOE_MARK_MAGIC);
+}
+
+void airoha_ppe_soe_xmit_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
+ struct net_device *netdev)
+{
+ struct airoha_foe_entry entry, tmpl, *hwe;
+ struct airoha_flow_data data = {};
+ struct airoha_ppe_soe_meta *meta;
+ u32 ppe_hash_mask, key_hash;
+ struct airoha_gdm_dev *gdm;
+ struct airoha_ppe *ppe;
+ unsigned long expires;
+ u16 hash;
+ int err, l4proto, type;
+ u8 sa_index, hop;
+ u8 seen;
+
+ if (!dev || !skb || !netdev)
+ return;
+
+ if ((skb->mark & AIROHA_PPE_SOE_MARK_MAGIC_MASK) !=
+ AIROHA_PPE_SOE_MARK_MAGIC)
+ return;
+
+ ppe = dev->priv;
+ if (!ppe || !ppe->soe_meta)
+ goto clear_mark;
+
+ ppe_hash_mask = airoha_ppe_get_total_num_entries(ppe) - 1;
+ key_hash = FIELD_GET(AIROHA_PPE_SOE_MARK_HASH_MASK, skb->mark);
+ if (key_hash > ppe_hash_mask)
+ goto clear_mark;
+
+ meta = &ppe->soe_meta[key_hash];
+ if (!READ_ONCE(meta->valid))
+ goto clear_mark;
+
+ if (READ_ONCE(meta->key_hash) != key_hash)
+ goto clear_mark;
+
+ expires = READ_ONCE(meta->expires);
+ if (time_after(jiffies, expires)) {
+ WRITE_ONCE(meta->valid, 0);
+ goto clear_mark;
+ }
+
+ hash = READ_ONCE(meta->foe_hash);
+ if (hash > ppe_hash_mask) {
+ WRITE_ONCE(meta->valid, 0);
+ goto clear_mark;
+ }
+
+ seen = READ_ONCE(meta->seen);
+ if (seen <= READ_ONCE(airoha_ppe_soe_inline_bind_delay_packets))
+ goto clear_mark;
+
+ err = airoha_ppe_soe_fill_inner_ipv4_data(skb, &data, &type, &l4proto);
+ if (err)
+ goto clear_mark;
+
+ sa_index = READ_ONCE(meta->sa_index);
+ hop = READ_ONCE(meta->hop);
+ WRITE_ONCE(meta->valid, 0);
spin_lock_bh(&ppe_lock);
@@ -885,13 +1226,120 @@ static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
if (!hwe)
goto unlock;
- state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
- if (state == AIROHA_FOE_STATE_BIND)
+ switch (FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, hwe->ib1)) {
+ case PPE_PKT_TYPE_IPV4_HNAPT:
+ case PPE_PKT_TYPE_IPV4_ROUTE:
+ break;
+ default:
goto unlock;
+ }
- index = airoha_ppe_foe_get_entry_hash(ppe, hwe);
- hlist_for_each_entry_safe(e, n, &ppe->foe_flow[index], list) {
+ err = airoha_ppe_foe_entry_prepare(ppe->eth, &tmpl, netdev, type,
+ &data, l4proto);
+ if (err)
+ goto unlock;
+
+ memcpy(&entry, hwe, sizeof(entry));
+ entry.ib1 &= ~(AIROHA_FOE_IB1_BIND_STATE |
+ AIROHA_FOE_IB1_BIND_KEEPALIVE |
+ AIROHA_FOE_IB1_BIND_TIMESTAMP);
+ entry.ib1 |= FIELD_PREP(AIROHA_FOE_IB1_BIND_STATE,
+ AIROHA_FOE_STATE_BIND) |
+ AIROHA_FOE_IB1_BIND_TTL;
+ entry.ib1 = (entry.ib1 & (AIROHA_FOE_IB1_BIND_PACKET_TYPE |
+ AIROHA_FOE_IB1_BIND_UDP)) |
+ (tmpl.ib1 & ~(AIROHA_FOE_IB1_BIND_PACKET_TYPE |
+ AIROHA_FOE_IB1_BIND_UDP));
+ entry.ipv4.ib2 = tmpl.ipv4.ib2;
+ entry.ipv4.data = tmpl.ipv4.data;
+ memcpy(&entry.ipv4.l2, &tmpl.ipv4.l2, sizeof(entry.ipv4.l2));
+
+ gdm = netdev_priv(netdev);
+ if (gdm->port && gdm->port->id == AIROHA_GDM4_IDX)
+ entry.ipv4.l2.common.etype = AIROHA_PPE_SOE_MAGIC_GDM4;
+
+ if (FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, entry.ib1) ==
+ PPE_PKT_TYPE_IPV4_HNAPT)
+ memcpy(&entry.ipv4.new_tuple, &entry.ipv4.orig_tuple,
+ sizeof(entry.ipv4.new_tuple));
+
+ /* Commit the original decrypt entry only after the normal transmit path
+ * has provided the final plaintext egress descriptor. Binding it at SOE
+ * RX completion would miss this device-specific L2/PSE state.
+ */
+ err = airoha_ppe_foe_entry_set_soe_fields(&entry, sa_index, hop,
+ AIROHA_PPE_SOE_DEFAULT_TUNNEL_MTU);
+ if (!err)
+ airoha_ppe_foe_commit_entry(ppe, &entry, hash, false);
+
+unlock:
+ spin_unlock_bh(&ppe_lock);
+clear_mark:
+ skb->mark &= ~(AIROHA_PPE_SOE_MARK_MAGIC_MASK |
+ AIROHA_PPE_SOE_MARK_HASH_MASK);
+}
+
+void airoha_ppe_soe_flush_sa(struct airoha_ppe *ppe, u8 sa_index)
+{
+ u32 num_entries, hash;
+
+ if (!ppe)
+ return;
+
+ num_entries = airoha_ppe_get_total_num_entries(ppe);
+
+ spin_lock_bh(&ppe_lock);
+ for (hash = 0; hash < num_entries; hash++) {
+ struct airoha_foe_entry *hwe;
+ u32 state, type;
+
+ hwe = airoha_ppe_foe_get_entry_locked(ppe, hash);
+ if (!hwe)
+ continue;
+
+ state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
+ if (state != AIROHA_FOE_STATE_BIND)
+ continue;
+
+ type = FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, hwe->ib1);
+ if (type != PPE_PKT_TYPE_IPV4_HNAPT &&
+ type != PPE_PKT_TYPE_IPV4_ROUTE)
+ continue;
+
+ if (!(hwe->ipv4.data & AIROHA_FOE_TUNNEL))
+ continue;
+
+ if (FIELD_GET(AIROHA_FOE_ACTDP, hwe->ipv4.data) != sa_index)
+ continue;
+
+ /* NAT-T data and IKE control both use UDP/4500. A stale SOE
+ * bound entry can otherwise keep sending later IKE_AUTH packets
+ * to the SOE path after the SA has been deleted.
+ */
+ hwe->ib1 &= ~AIROHA_FOE_IB1_BIND_STATE;
+ hwe->ib1 |= FIELD_PREP(AIROHA_FOE_IB1_BIND_STATE,
+ AIROHA_FOE_STATE_INVALID);
+ airoha_ppe_foe_commit_entry(ppe, hwe, hash, false);
+ }
+ spin_unlock_bh(&ppe_lock);
+}
+
+static bool airoha_ppe_foe_try_flow_commit_bucket(struct airoha_ppe *ppe,
+ struct airoha_foe_entry *hwe,
+ u32 hash, u32 probe_index,
+ bool rx_wlan,
+ bool allow_l2_subflow)
+{
+ struct airoha_flow_table_entry *e;
+ struct hlist_node *n;
+ bool commit_done = false;
+ u32 state;
+
+ hlist_for_each_entry_safe(e, n, &ppe->foe_flow[probe_index], list) {
if (e->type == FLOW_TYPE_L2_SUBFLOW) {
+ if (!allow_l2_subflow)
+ continue;
+
state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
if (state != AIROHA_FOE_STATE_BIND) {
e->hash = 0xffff;
@@ -908,6 +1356,51 @@ static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
e->hash = hash;
}
+ return commit_done;
+}
+
+static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
+ struct sk_buff *skb,
+ u32 hash, bool rx_wlan)
+{
+ struct airoha_flow_table_entry *e;
+ struct airoha_foe_bridge br = {};
+ struct airoha_foe_entry *hwe;
+ bool commit_done = false;
+ u32 index, mask, state, window;
+ unsigned int i;
+
+ spin_lock_bh(&ppe_lock);
+
+ hwe = airoha_ppe_foe_get_entry_locked(ppe, hash);
+ if (!hwe)
+ goto unlock;
+
+ state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
+ if (state == AIROHA_FOE_STATE_BIND)
+ goto unlock;
+
+ index = airoha_ppe_foe_get_entry_hash(ppe, hwe);
+ commit_done =
+ airoha_ppe_foe_try_flow_commit_bucket(ppe, hwe, hash, index,
+ rx_wlan, true);
+
+ mask = airoha_ppe_get_total_num_entries(ppe) - 1;
+ window = min_t(u32,
+ READ_ONCE(airoha_ppe_soe_inline_force_commit_probe_window),
+ mask);
+ for (i = 1; !commit_done && i <= window; i++) {
+ u32 candidates[2] = { (index + i) & mask, (index - i) & mask };
+ unsigned int j;
+
+ for (j = 0; !commit_done && j < ARRAY_SIZE(candidates); j++) {
+ commit_done =
+ airoha_ppe_foe_try_flow_commit_bucket(ppe, hwe,
+ hash, candidates[j],
+ rx_wlan, false);
+ }
+ }
+
if (commit_done)
goto unlock;
@@ -940,8 +1433,9 @@ airoha_ppe_foe_l2_flow_commit_entry(struct airoha_ppe *ppe,
airoha_l2_flow_table_params);
}
-static int airoha_ppe_foe_flow_commit_entry(struct airoha_ppe *ppe,
- struct airoha_flow_table_entry *e)
+static int
+airoha_ppe_foe_flow_commit_entry(struct airoha_ppe *ppe,
+ struct airoha_flow_table_entry *e)
{
int type = FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, e->data.ib1);
u32 hash;
@@ -1057,6 +1551,7 @@ static int airoha_ppe_entry_idle_time(struct airoha_ppe *ppe,
static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
struct flow_cls_offload *f)
{
+ const struct flow_offload_tuple *tuple = (const void *)f->cookie;
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
struct airoha_flow_table_entry *e;
struct airoha_flow_data data = {};
@@ -1183,7 +1678,9 @@ static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
flow_rule_match_ipv4_addrs(rule, &addrs);
data.v4.src_addr = addrs.key->src;
data.v4.dst_addr = addrs.key->dst;
- airoha_ppe_foe_entry_set_ipv4_tuple(&hwe, &data, false);
+ err = airoha_ppe_foe_entry_set_ipv4_tuple(&hwe, &data, false);
+ if (err)
+ return err;
}
if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
@@ -1228,6 +1725,10 @@ static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
return err;
}
+ err = airoha_ppe_foe_entry_set_soe_info(&hwe, tuple);
+ if (err)
+ return err;
+
e = kzalloc_obj(*e);
if (!e)
return -ENOMEM;
@@ -1350,16 +1851,26 @@ static int airoha_ppe_flow_offload_cmd(struct airoha_eth *eth,
return -EOPNOTSUPP;
}
-static int airoha_ppe_flush_sram_entries(struct airoha_ppe *ppe)
+static int airoha_ppe_flush_entries(struct airoha_ppe *ppe)
{
+ u32 ppe_num_entries = airoha_ppe_get_total_num_entries(ppe);
u32 sram_num_entries = airoha_ppe_get_total_sram_num_entries(ppe);
struct airoha_foe_entry *hwe = ppe->foe;
int i, err = 0;
+ memset(hwe, 0, ppe_num_entries * sizeof(*hwe));
+ if (ppe->foe_stats) {
+ u32 ppe_num_stats_entries =
+ airoha_ppe_get_total_num_stats_entries(ppe);
+
+ memset(ppe->foe_stats, 0,
+ ppe_num_stats_entries * sizeof(*ppe->foe_stats));
+ }
+ dma_wmb();
+
for (i = 0; i < sram_num_entries; i++) {
int err;
- memset(&hwe[i], 0, sizeof(*hwe));
err = airoha_ppe_foe_commit_sram_entry(ppe, i);
if (err)
break;
@@ -1368,6 +1879,37 @@ static int airoha_ppe_flush_sram_entries(struct airoha_ppe *ppe)
return err;
}
+static int airoha_ppe_set_bind_rate(const char *val,
+ const struct kernel_param *kp)
+{
+ struct airoha_ppe *ppe;
+ unsigned long rate;
+ int err, i;
+
+ err = kstrtoul(val, 0, &rate);
+ if (err)
+ return err;
+ if (rate > FIELD_MAX(PPE_BIND_RATE_BIND_MASK))
+ return -ERANGE;
+
+ WRITE_ONCE(airoha_ppe_bind_rate, (unsigned int)rate);
+
+ mutex_lock(&flow_offload_mutex);
+ ppe = READ_ONCE(airoha_ppe_active);
+ if (ppe) {
+ for (i = 0; i < ppe->eth->soc->num_ppe; i++)
+ airoha_ppe_apply_bind_rate(ppe->eth, i);
+ }
+ mutex_unlock(&flow_offload_mutex);
+
+ return 0;
+}
+
+static int airoha_ppe_get_bind_rate(char *buf, const struct kernel_param *kp)
+{
+ return sysfs_emit(buf, "%u\n", READ_ONCE(airoha_ppe_bind_rate));
+}
+
static struct airoha_npu *airoha_ppe_npu_get(struct airoha_eth *eth)
{
struct airoha_npu *npu = airoha_npu_get(eth->dev);
@@ -1601,12 +2143,20 @@ int airoha_ppe_init(struct airoha_eth *eth)
return -ENOMEM;
}
- ppe->foe_check_time = devm_kzalloc(eth->dev, ppe_num_entries,
- GFP_KERNEL);
+ ppe->foe_check_time =
+ devm_kzalloc(eth->dev,
+ ppe_num_entries * sizeof(*ppe->foe_check_time),
+ GFP_KERNEL);
if (!ppe->foe_check_time)
return -ENOMEM;
- err = airoha_ppe_flush_sram_entries(ppe);
+ ppe->soe_meta = devm_kzalloc(eth->dev,
+ ppe_num_entries * sizeof(*ppe->soe_meta),
+ GFP_KERNEL);
+ if (!ppe->soe_meta)
+ return -ENOMEM;
+
+ err = airoha_ppe_flush_entries(ppe);
if (err)
return err;
@@ -1622,6 +2172,8 @@ int airoha_ppe_init(struct airoha_eth *eth)
if (err)
goto error_l2_flow_table_destroy;
+ WRITE_ONCE(airoha_ppe_active, ppe);
+
return 0;
error_l2_flow_table_destroy:
@@ -1636,6 +2188,8 @@ void airoha_ppe_deinit(struct airoha_eth *eth)
{
struct airoha_npu *npu;
+ WRITE_ONCE(airoha_ppe_active, NULL);
+
mutex_lock(&flow_offload_mutex);
npu = rcu_replace_pointer(eth->npu, NULL,
diff --git a/include/linux/soc/airoha/airoha_offload.h b/include/linux/soc/airoha/airoha_offload.h
index 7589fccfeef6..120dbd274c89 100644
--- a/include/linux/soc/airoha/airoha_offload.h
+++ b/include/linux/soc/airoha/airoha_offload.h
@@ -11,7 +11,12 @@
#include <linux/workqueue.h>
enum {
+ PPE_CPU_REASON_UN_HIT = 0x0d,
+ PPE_CPU_REASON_HIT_UNBIND = 0x0e,
PPE_CPU_REASON_HIT_UNBIND_RATE_REACHED = 0x0f,
+ PPE_CPU_REASON_HIT_BIND_FORCE_CPU = 0x16,
+ PPE_CPU_REASON_HIT_BIND_EXCEED_MTU = 0x1c,
+ PPE_CPU_REASON_NOT_THROUGH_PPE = 0x1e,
};
struct airoha_ppe_dev {
--
2.53.0
* [PULL] PCI: meson: Fix PERST# timing by asserting reset before LTSSM enable
From: gowtham @ 2026-06-14 4:26 UTC (permalink / raw)
To: yue.wang, lpieralisi, kwilczynski, mani
Cc: robh, bhelgaas, neil.armstrong, khilman, jbrunet,
martin.blumenstingl, linux-pci, linux-amlogic, linux-arm-kernel,
linux-kernel
The following changes since commit bb532bfaf7919c7c98caab81864e9ce2646e11e3:

  Linux 7.0.11 (2026-06-01 17:54:55 +0200)

are available in the Git repository at:

  https://github.com/GowthamKudupudi/linux.git tags/meson-pcie-warm-reset-linux-7.0.y

for you to fetch changes up to 852811b11795ee389ea6a953ed0db69b76722469:

  PCI: meson: Fix PERST# timing by asserting reset before LTSSM enable (2026-06-14 09:41:01 +0530)

----------------------------------------------------------------
PCI: meson: Fix PERST# timing by asserting reset before LTSSM enable

On warm reboot, the PCIe controller's LTSSM starts link training
immediately if PERST# is still deasserted from the previous boot.
The driver then pulses PERST# for only 500us, which is too short to
properly reset an endpoint that has already started training.

Fix this by moving the PERST# assert/deassert pulse before the LTSSM
enable, so that the endpoint gets a clean reset cycle before link
training begins.

This was found on Amlogic G12B (A311D) with an NVMe drive in an M.2 slot.
Cold boot worked because power-on reset (POR) held PERST# low; warm
reboot did not. The fix was confirmed on a Banana Pi CM4 with a
Waveshare IO base board.

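
For reference, the ordering argument above can be modelled in a few lines
of plain C. This is only a sketch under stated assumptions: struct sim_link
and the sim_*/bringup_* helpers are hypothetical names that do not come from
pci-meson.c, and the model ignores the pulse width entirely. It only flags a
PERST# assertion that lands after link training has already started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the link; not the pci-meson driver's data. */
struct sim_link {
	bool perst_deasserted;   /* PERST# high: endpoint is out of reset */
	bool ltssm_enabled;      /* controller is allowed to train the link */
	bool training;           /* link training has started */
	bool reset_hit_training; /* PERST# was asserted after training began */
};

/* Training starts once LTSSM is enabled and the endpoint is out of reset. */
static void sim_update(struct sim_link *l)
{
	if (l->ltssm_enabled && l->perst_deasserted)
		l->training = true;
}

static void sim_set_perst(struct sim_link *l, bool assert_reset)
{
	assert(l != NULL);
	if (assert_reset && l->training) {
		/* Simplification: treat a reset during training as unclean. */
		l->reset_hit_training = true;
		l->training = false;
	}
	l->perst_deasserted = !assert_reset;
	sim_update(l);
}

static void sim_enable_ltssm(struct sim_link *l)
{
	assert(l != NULL);
	l->ltssm_enabled = true;
	sim_update(l);
}

/* Ordering described as problematic: enable LTSSM, then pulse PERST#. */
bool bringup_ltssm_first(bool perst_deasserted_at_entry)
{
	struct sim_link l = { .perst_deasserted = perst_deasserted_at_entry };

	sim_enable_ltssm(&l);
	sim_set_perst(&l, true);
	sim_set_perst(&l, false);
	return l.training && !l.reset_hit_training;
}

/* Ordering proposed by the fix: pulse PERST#, then enable LTSSM. */
bool bringup_reset_first(bool perst_deasserted_at_entry)
{
	struct sim_link l = { .perst_deasserted = perst_deasserted_at_entry };

	sim_set_perst(&l, true);
	sim_set_perst(&l, false);
	sim_enable_ltssm(&l);
	return l.training && !l.reset_hit_training;
}
```

In this model, bringup_ltssm_first(false) (cold boot, PERST# held low on
entry) comes up cleanly, while bringup_ltssm_first(true) (warm reboot,
PERST# already deasserted) does not. bringup_reset_first() comes up cleanly
in both cases.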
----------------------------------------------------------------
Gowtham Kudupudi (1):
      PCI: meson: Fix PERST# timing by asserting reset before LTSSM enable

 drivers/pci/controller/dwc/pci-meson.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)