* [PATCH v2 3/4] dt-bindings: PCI: mediatek-gen3: Split Airoha schema and document 2-lanes
From: Christian Marangi @ 2026-06-26 9:20 UTC
To: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Ryder Lee, Michael Turquette, Stephen Boyd,
Brian Masney, Philipp Zabel, Matthias Brugger,
AngeloGioacchino Del Regno, Christian Marangi, Jianjun Wang,
linux-pci, devicetree, linux-kernel, linux-mediatek, linux-clk,
linux-arm-kernel
In-Reply-To: <20260626092029.3525264-1-ansuelsmth@gmail.com>
To permit proper documentation of the properties required to support
PCIe configured in 2-lanes mode, split the Airoha part of the
mediatek-gen3 schema out into a dedicated schema.
A PCIe controller configured in 2-lanes mode requires an additional reg
for the secondary PCIe to be configured and the airoha,scu phandle to
correctly configure the PCIe MUX.
Rework the mediatek-gen3 schema to drop any redundant constraints
previously introduced for Airoha PCIe properties.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
.../bindings/pci/airoha,en7581-pcie.yaml | 251 ++++++++++++++++++
.../bindings/pci/mediatek-pcie-gen3.yaml | 77 +-----
2 files changed, 256 insertions(+), 72 deletions(-)
create mode 100644 Documentation/devicetree/bindings/pci/airoha,en7581-pcie.yaml
diff --git a/Documentation/devicetree/bindings/pci/airoha,en7581-pcie.yaml b/Documentation/devicetree/bindings/pci/airoha,en7581-pcie.yaml
new file mode 100644
index 000000000000..c690ba7f207c
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/airoha,en7581-pcie.yaml
@@ -0,0 +1,251 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/pci/airoha,en7581-pcie.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Gen3 PCIe controller on Airoha SoCs
+
+maintainers:
+ - Christian Marangi <ansuelsmth@gmail.com>
+
+description: |+
+ PCIe Gen3 MAC controller for Airoha SoCs. It supports Gen3 speed
+ and is compatible with Gen2 and Gen1 speeds.
+
+ This PCIe controller supports up to 256 MSI vectors, the MSI hardware
+ block diagram is as follows:
+
+ +-----+
+ | GIC |
+ +-----+
+ ^
+ |
+ port->irq
+ |
+ +-+-+-+-+-+-+-+-+
+ |0|1|2|3|4|5|6|7| (PCIe intc)
+ +-+-+-+-+-+-+-+-+
+ ^ ^ ^
+ | | ... |
+ +-------+ +------+ +-----------+
+ | | |
+ +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+
+ |0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets)
+ +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+
+ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
+ | | | | | | | | | | | | (MSI vectors)
+ | | | | | | | | | | | |
+
+ (MSI SET0) (MSI SET1) ... (MSI SET7)
+
+ With 256 MSI vectors supported, the MSI vectors are composed of 8 sets,
+ each set has its own address for MSI message, and supports 32 MSI vectors
+ to generate interrupt.
+
+properties:
+ compatible:
+ const: airoha,en7581-pcie
+
+ reg:
+ minItems: 1
+ maxItems: 2
+
+ reg-names:
+ minItems: 1
+ maxItems: 2
+
+ interrupts:
+ maxItems: 1
+
+ ranges:
+ minItems: 1
+ maxItems: 8
+
+ iommu-map:
+ maxItems: 1
+
+ iommu-map-mask:
+ const: 0
+
+ resets:
+ minItems: 1
+ maxItems: 4
+
+ reset-names:
+ minItems: 1
+ maxItems: 4
+
+ clocks:
+ maxItems: 1
+
+ clock-names:
+ items:
+ - const: sys-ck
+
+ phys:
+ maxItems: 1
+
+ phy-names:
+ items:
+ - const: pcie-phy
+
+ num-lanes:
+ enum: [1, 2]
+
+ mediatek,pbus-csr:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ items:
+ - items:
+ - description: phandle to pbus-csr syscon
+ - description: offset of pbus-csr base address register
+ - description: offset of pbus-csr base address mask register
+ description:
+ Phandle with two arguments to the syscon node used to detect if
+ a given address is accessible on PCIe controller.
+
+ airoha,scu:
+ $ref: /schemas/types.yaml#/definitions/phandle-array
+ items:
+ - items:
+ - description: phandle to airoha SCU syscon
+ description:
+ Phandle to SCU syscon to configure PCIe MUX for 2-lanes support.
+
+ '#interrupt-cells':
+ const: 1
+
+ interrupt-controller:
+ description: Interrupt controller node for handling legacy PCI interrupts.
+ type: object
+ properties:
+ '#address-cells':
+ const: 0
+ '#interrupt-cells':
+ const: 1
+ interrupt-controller: true
+
+ required:
+ - '#address-cells'
+ - '#interrupt-cells'
+ - interrupt-controller
+
+ additionalProperties: false
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - interrupts
+ - ranges
+ - clocks
+ - clock-names
+ - '#interrupt-cells'
+ - interrupt-controller
+
+allOf:
+ - $ref: /schemas/pci/pci-host-bridge.yaml#
+ - if:
+ properties:
+ num-lanes:
+ const: 2
+ then:
+ properties:
+ reg:
+ minItems: 2
+
+ reg-names:
+ items:
+ - const: pcie-mac
+ - const: sec-pcie-mac
+
+ resets:
+ minItems: 4
+
+ reset-names:
+ items:
+ - const: phy-lane0
+ - const: phy-lane1
+ - const: perstout
+ - const: sec-perstout
+
+ required:
+ - airoha,scu
+
+ else:
+ properties:
+ reg:
+ maxItems: 1
+
+ reg-names:
+ items:
+ - const: pcie-mac
+
+ resets:
+ minItems: 2
+ maxItems: 3
+
+ reset-names:
+ minItems: 2
+ items:
+ - enum: [ phy-lane0, phy-lane1, phy-lane2 ]
+ - enum: [ phy-lane1, perstout ]
+ - const: phy-lane2
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ pcie@1fc00000 {
+ compatible = "airoha,en7581-pcie";
+ device_type = "pci";
+ #address-cells = <3>;
+ #size-cells = <2>;
+
+ reg = <0x0 0x1fc00000 0x0 0x1670>,
+ <0x0 0x1fc20000 0x0 0x1670>;
+ reg-names = "pcie-mac", "sec-pcie-mac";
+
+ clocks = <&scuclk 7>;
+ clock-names = "sys-ck";
+
+ phys = <&pciephy>;
+ phy-names = "pcie-phy";
+
+ ranges = <0x02000000 0 0x20000000 0x0 0x20000000 0 0x4000000>;
+
+ resets = <&scuclk 48>,
+ <&scuclk 49>,
+ <&scuclk 53>,
+ <&scuclk 54>;
+ reset-names = "phy-lane0", "phy-lane1",
+ "perstout", "sec-perstout";
+
+ num-lanes = <2>;
+
+ mediatek,pbus-csr = <&pbus_csr 0x0 0x4>;
+
+ airoha,scu = <&scuclk>;
+
+ interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
+ bus-range = <0x00 0xff>;
+ #interrupt-cells = <1>;
+ interrupt-map-mask = <0 0 0 0x7>;
+ interrupt-map = <0 0 0 1 &pcie_intc 0>,
+ <0 0 0 2 &pcie_intc 1>,
+ <0 0 0 3 &pcie_intc 2>,
+ <0 0 0 4 &pcie_intc 3>;
+ pcie_intc: interrupt-controller {
+ #address-cells = <0>;
+ #interrupt-cells = <1>;
+ interrupt-controller;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml b/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml
index 4db700fc36ba..510f1f2b1c5a 100644
--- a/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml
+++ b/Documentation/devicetree/bindings/pci/mediatek-pcie-gen3.yaml
@@ -59,7 +59,6 @@ properties:
- const: mediatek,mt8196-pcie
- const: mediatek,mt8192-pcie
- const: mediatek,mt8196-pcie
- - const: airoha,en7581-pcie
reg:
maxItems: 1
@@ -83,20 +82,20 @@ properties:
resets:
minItems: 1
- maxItems: 3
+ maxItems: 2
reset-names:
minItems: 1
- maxItems: 3
+ maxItems: 2
items:
- enum: [ phy, mac, phy-lane0, phy-lane1, phy-lane2 ]
+ enum: [ phy, mac ]
clocks:
- minItems: 1
+ minItems: 4
maxItems: 6
clock-names:
- minItems: 1
+ minItems: 4
maxItems: 6
assigned-clocks:
@@ -115,17 +114,6 @@ properties:
power-domains:
maxItems: 1
- mediatek,pbus-csr:
- $ref: /schemas/types.yaml#/definitions/phandle-array
- items:
- - items:
- - description: phandle to pbus-csr syscon
- - description: offset of pbus-csr base address register
- - description: offset of pbus-csr base address mask register
- description:
- Phandle with two arguments to the syscon node used to detect if
- a given address is accessible on PCIe controller.
-
'#interrupt-cells':
const: 1
@@ -177,16 +165,6 @@ allOf:
- const: peri_26m
- const: top_133m
- resets:
- minItems: 1
- maxItems: 2
-
- reset-names:
- minItems: 1
- maxItems: 2
-
- mediatek,pbus-csr: false
-
- if:
properties:
compatible:
@@ -208,16 +186,6 @@ allOf:
- const: peri_26m
- const: peri_mem
- resets:
- minItems: 1
- maxItems: 2
-
- reset-names:
- minItems: 1
- maxItems: 2
-
- mediatek,pbus-csr: false
-
- if:
properties:
compatible:
@@ -246,8 +214,6 @@ allOf:
- const: phy
- const: mac
- mediatek,pbus-csr: false
-
- if:
properties:
compatible:
@@ -257,7 +223,6 @@ allOf:
then:
properties:
clocks:
- minItems: 4
maxItems: 4
clock-names:
@@ -267,38 +232,6 @@ allOf:
- const: peri_26m
- const: top_133m
- resets:
- minItems: 1
- maxItems: 2
-
- reset-names:
- minItems: 1
- maxItems: 2
-
- mediatek,pbus-csr: false
-
- - if:
- properties:
- compatible:
- const: airoha,en7581-pcie
- then:
- properties:
- clocks:
- maxItems: 1
-
- clock-names:
- items:
- - const: sys-ck
-
- resets:
- minItems: 3
-
- reset-names:
- items:
- - const: phy-lane0
- - const: phy-lane1
- - const: phy-lane2
-
unevaluatedProperties: false
examples:
--
2.53.0
* [PATCH v2 1/4] dt-bindings: clock: airoha: Add additional reset for PCIe PERSTOUT
From: Christian Marangi @ 2026-06-26 9:20 UTC
To: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Ryder Lee, Michael Turquette, Stephen Boyd,
Brian Masney, Philipp Zabel, Matthias Brugger,
AngeloGioacchino Del Regno, Christian Marangi, Jianjun Wang,
linux-pci, devicetree, linux-kernel, linux-mediatek, linux-clk,
linux-arm-kernel
In-Reply-To: <20260626092029.3525264-1-ansuelsmth@gmail.com>
Add additional resets to control the PCIe PERSTOUT reset line for each
of the 3 PCIe lines.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
include/dt-bindings/reset/airoha,en7581-reset.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/dt-bindings/reset/airoha,en7581-reset.h b/include/dt-bindings/reset/airoha,en7581-reset.h
index 6544a1790b83..25e75534daa9 100644
--- a/include/dt-bindings/reset/airoha,en7581-reset.h
+++ b/include/dt-bindings/reset/airoha,en7581-reset.h
@@ -62,5 +62,9 @@
#define EN7581_CPU_TIMER_RST 50
#define EN7581_PCIE_HB_RST 51
#define EN7581_XPON_MAC_RST 52
+/* RST_PCIC */
+#define EN7581_PCIC_PERSTOUT0_RST 53
+#define EN7581_PCIC_PERSTOUT1_RST 54
+#define EN7581_PCIC_PERSTOUT2_RST 55
#endif /* __DT_BINDINGS_RESET_CONTROLLER_AIROHA_EN7581_H_ */
--
2.53.0
* [PATCH v2 0/4] PCI: mediatek-gen3: Add 2-lanes mode support + clock
From: Christian Marangi @ 2026-06-26 9:20 UTC
To: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Ryder Lee, Michael Turquette, Stephen Boyd,
Brian Masney, Philipp Zabel, Matthias Brugger,
AngeloGioacchino Del Regno, Christian Marangi, Jianjun Wang,
linux-pci, devicetree, linux-kernel, linux-mediatek, linux-clk,
linux-arm-kernel
This small series introduces support for 2-lanes mode on the Airoha
AN7581 SoC. This is needed for the correct functioning of the Eagle WiFi
card normally attached to this SoC, which requires 2-lane PCIe to
work correctly (and deliver the proper performance).
The first 2 patches address a limitation of the PCIe implementation
where the PERSTOUT resets were indirectly asserted and deasserted
all at the same time (for all 3 PCIe cards) with PCIe
enable and disable.
These 2 patches address this and introduce the correct resets to control
the reset line for the relevant PCIe line.
The last 2 patches add additional logic and support to assert
and deassert PERSTOUT and also apply the required configuration
for 2-lanes mode.
2-lanes mode is implemented in DT by adding the required properties
and by setting "num-lanes" to 2.
Changes v2:
- Address typo regs -> reg in Documentation
- Address typo lan -> lane in Documentation
- Apply a suggested fix from Airoha for PCIe MUX configuration
before PHY init
- Parse secondary reg in probe
- Add missing reset_status handling for inverted bits
- Move SCU to local handling in power_up
- Add check for max num-lanes for EN7581
Christian Marangi (4):
dt-bindings: clock: airoha: Add additional reset for PCIe PERSTOUT
clk: en7523: add support for dedicated PCIe PERSTOUT reset
dt-bindings: PCI: mediatek-gen3: Split Airoha schema and document
2-lanes
PCI: mediatek-gen3: Add 2-lanes mode support for Airoha AN7581
.../bindings/pci/airoha,en7581-pcie.yaml | 251 ++++++++++++++++++
.../bindings/pci/mediatek-pcie-gen3.yaml | 77 +-----
drivers/clk/clk-en7523.c | 39 ++-
drivers/pci/controller/pcie-mediatek-gen3.c | 101 +++++--
.../dt-bindings/reset/airoha,en7581-reset.h | 4 +
5 files changed, 370 insertions(+), 102 deletions(-)
create mode 100644 Documentation/devicetree/bindings/pci/airoha,en7581-pcie.yaml
--
2.53.0
* [PATCH] MAINTAINERS: Update maintainer and git tree for CIX SoC
From: Gary Yang @ 2026-06-26 9:20 UTC
To: arnd; +Cc: fugang.duan, linux-arm-kernel, cix-kernel-upstream, Gary Yang
Peter Chen has left CIX Technology. Take over maintenance of the CIX SoC
and update the git tree URL accordingly.
Signed-off-by: Gary Yang <gary.yang@cixtech.com>
Reviewed-by: Fugang Duan <fugang.duan@cixtech.com>
---
MAINTAINERS | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..17b3704bbcde 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2692,12 +2692,12 @@ F: arch/arm/mach-ep93xx/
F: drivers/iio/adc/ep93xx_adc.c
ARM/CIX SOC SUPPORT
-M: Peter Chen <peter.chen@cixtech.com>
+M: Gary Yang <gary.yang@cixtech.com>
M: Fugang Duan <fugang.duan@cixtech.com>
R: CIX Linux Kernel Upstream Group <cix-kernel-upstream@cixtech.com>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/cix.git
+T: git https://github.com/cixtech/linux-mainline.git
F: Documentation/devicetree/bindings/arm/cix.yaml
F: Documentation/devicetree/bindings/mailbox/cix,sky1-mbox.yaml
F: arch/arm64/boot/dts/cix/
--
2.50.1
* Re: [RFC PATCH 1/3] dt-bindings: pinctrl: mt8516/mt8167: Move compatibles from mt66xx to mt6795
From: Luca Leonardo Scorcia @ 2026-06-26 9:12 UTC
To: Conor Dooley
Cc: linux-mediatek, Sean Wang, Linus Walleij, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Matthias Brugger,
AngeloGioacchino Del Regno, linux-gpio, devicetree, linux-kernel,
linux-arm-kernel
In-Reply-To: <20260625-unearth-suffering-e2c59d39da0f@spud>
> Usually when making ABI changes because something was inaccurate (but
> not wrong to the point that it didn't work at all) it's possible to
> support both new and old ABIs at the same time because of new properties
> etc. This is a difficult one because it's using the same properties in
> different ways. A new compatible would definitely be required for a
> genuine fresh start while retaining kernel support for the old mechanism
> in this case.
All things considered, the cleanest solution seems to be adding a new
compatible, marking the old one as deprecated and also trying to fix the
old driver code. I'll try to do that before submitting again.
Thank you for your help!
--
Luca Leonardo Scorcia
l.scorcia@gmail.com
* Re: [PATCH v6 7/7] KVM: arm64: Enforce strict SBZ checks in the FF-A proxy
From: Will Deacon @ 2026-06-26 9:11 UTC
To: Sebastian Ene
Cc: catalin.marinas, maz, oupton, joey.gouly, korneld, kvmarm,
linux-arm-kernel, linux-kernel, android-kvm, mrigendra.chaubey,
perlarsen, suzuki.poulose, vdonnefort, yuzenghui
In-Reply-To: <20260626074545.433234-8-sebastianene@google.com>
On Fri, Jun 26, 2026 at 07:45:45AM +0000, Sebastian Ene wrote:
> Introduce a helper method ffa_check_unused_args_sbz to enforce strict
> arguments checking when the hypervisor acts as a relayer between the
> host and Trustzone.
>
> Signed-off-by: Sebastian Ene <sebastianene@google.com>
> Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
> ---
> arch/arm64/kvm/hyp/nvhe/ffa.c | 96 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
> index 712811e89435..bd50ddc5b61c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/ffa.c
> +++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
> @@ -74,6 +74,21 @@ static u32 hyp_ffa_version;
> static bool has_version_negotiated;
> static hyp_spinlock_t version_lock;
>
> +static bool ffa_check_unused_args_sbz(struct kvm_cpu_context *ctxt, int first_reg)
> +{
> + DECLARE_REG(u32, func_id, ctxt, 0);
> + int reg, end_reg = 7;
> +
> + if (FFA_MINOR_VERSION(hyp_ffa_version) >= 2)
> + end_reg = ARM_SMCCC_IS_64(func_id) ? 17 : 7;
This looks like an accident waiting to happen if we don't check the major
number as well.
I think you should just check:
if (hyp_ffa_version >= FFA_VERSION_1_2)
instead.
You should also add a comment.
Will
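A minimal sketch of why the minor-only test is fragile, assuming the
usual FF-A version encoding (major in bits [30:16], minor in bits
[15:0]). The macros and helper names below are local stand-ins written
for illustration and are not copied from the kernel headers; the
versions 2.0 and 0.3 are hypothetical inputs used only to show the
difference between the two checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: major in bits [30:16], minor in bits [15:0]. */
#define EX_FFA_MAJOR(v)        (((v) >> 16) & 0x7fffu)
#define EX_FFA_MINOR(v)        ((v) & 0xffffu)
#define EX_FFA_PACK(maj, min)  \
	((((uint32_t)(maj) & 0x7fffu) << 16) | ((uint32_t)(min) & 0xffffu))
#define EX_FFA_VERSION_1_2     EX_FFA_PACK(1, 2)

/* Minor-only test, in the shape of the quoted hunk. */
static bool ext_regs_minor_only(uint32_t version)
{
	return EX_FFA_MINOR(version) >= 2;
}

/* Whole-version comparison, in the shape of the suggested check. */
static bool ext_regs_full_version(uint32_t version)
{
	return version >= EX_FFA_VERSION_1_2;
}

/* Walks through the cases where the two checks agree and disagree. */
static void ext_regs_self_check(void)
{
	/* v1.1 and v1.2: both checks agree. */
	assert(!ext_regs_minor_only(EX_FFA_PACK(1, 1)));
	assert(!ext_regs_full_version(EX_FFA_PACK(1, 1)));
	assert(ext_regs_minor_only(EX_FFA_PACK(1, 2)));
	assert(ext_regs_full_version(EX_FFA_PACK(1, 2)));

	/* Hypothetical v2.0: the minor is 0, so the minor-only test says no. */
	assert(!ext_regs_minor_only(EX_FFA_PACK(2, 0)));
	assert(ext_regs_full_version(EX_FFA_PACK(2, 0)));

	/* Hypothetical v0.3: the minor-only test says yes for an older major. */
	assert(ext_regs_minor_only(EX_FFA_PACK(0, 3)));
	assert(!ext_regs_full_version(EX_FFA_PACK(0, 3)));
}
```

Because the major number sits in the more significant bits, comparing
the packed value orders versions by major first and minor second, which
is what the single comparison relies on.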
* Re: [PATCH v5 1/7] dt-bindings: display: verisilicon,dc: generalize for single-output variants
From: Icenowy Zheng @ 2026-06-26 9:09 UTC
To: Conor Dooley
Cc: Conor Dooley, Joey Lu, maarten.lankhorst, mripard, tzimmermann,
airlied, simona, robh, krzk+dt, conor+dt, ychuang3, schung, yclu4,
dri-devel, devicetree, linux-arm-kernel, linux-kernel
In-Reply-To: <20260626-agreement-express-b16c71315f7b@wendy>
On Fri, 2026-06-26 at 09:57 +0100, Conor Dooley wrote:
> On Fri, Jun 26, 2026 at 03:58:14PM +0800, Icenowy Zheng wrote:
> > On Fri, 2026-06-26 at 08:22 +0100, Conor Dooley wrote:
> > > On Thu, Jun 25, 2026 at 05:33:37PM +0100, Conor Dooley wrote:
> > > > On Thu, Jun 25, 2026 at 05:44:43PM +0800, Joey Lu wrote:
> > > > > +
> > > > > + - if:
> > > > > + properties:
> > > > > + compatible:
> > > > > + contains:
> > > > > + const: nuvoton,ma35d1-dcu
> > > > > + then:
> > > > > + properties:
> > > > > + clocks:
> > > > > + minItems: 2
> > > >
> > > > Anything that updates the minimum constraint should be done at
> > > > the
> > > > top
> > > > level of this schema. The conditional section should then
> > > > tighten
> > > > the
> > > > constraint, in this case that means only having maxItems.
> > > >
> > > > > + maxItems: 2
> > > > > +
> > > > > + clock-names:
> > > > > + items:
> > > > > + - const: core
> > > > > + - const: pix0
> > > >
> > > > Does this even work when the top level schema thinks clock 2
> > > > should
> > > > be
> > > > called axi?
> > >
> > > Additionally here, only have core and pix0 seems like it might be
> > > an
> > > oversimplification. I doubt removing the second output port means
> > > that
> > > the axi and ahb clocks are no longer needed.
> > > Is it the case that your device supplies the same clock to core,
> > > ahb
> > > and
> > > axi? If so, then you should fill those clocks in in your
> > > devicetree
> > > and
> > > this can just constrain the number of clocks/clock-names to 4.
> >
> > The clock controller of that SoC is quite weird -- it has only a
> > single
> > gate bit, but controlling 3 clock gates. All core, ahb and axi
> > clocks
> > have gates controlled by this single bit, so it's why currently
> > it's
> > modelled as only core clock supplied.
>
> Yeah, then what's in the binding is definitely wrong.
> Even if the same clock was provided to all clock inputs in the IP,
> all
> individual clock should be listed in the devicetree - although it
> will
> look a little silly to see clocks = <&foo 2>, <&foo 2>, <&foo 2>,
> <&foo 2>;
> In this case, 3 clocks controlled by 1 gate bit is an implementation
> detail
> of the SoC's clocking hardware, and not relevant to how the dc
> instance
> should be described.
>
> > Well it might be worthful to supply the bus clock before the gate
> > as
> > ahb/axi, especially axi, because both the AXI clock and the core
> > clock
> > constraints the maximum pixel clock.
>
> Right. And looking at patch 4/7, and the wording:
> > The Nuvoton MA35D1 SoC integrates a DCUltraLite display controller
> > whose
> > AXI and AHB bus clocks share a single gate enable bit with the
> > display
> > core clock, so the clock driver does not expose them separately.
> > This
> > patch makes the axi and ahb clocks optional in the probe.
>
> It sounds like there's probably some issues with how things are
> modelled
> clock wise in this device, unless this is not an accurate statement
> and
> there's actually one clock provided to all three inputs. If they're
> distinct clocks, with different rates, only having one exposed has a
> lot
> of potential to be problematic!
Yes, I agree with this; they're different clocks according to the
manual.
I added the clk people to the CC list in a reply to the previous
revision, but they haven't reacted yet. I don't know how to represent
multiple clock gates sharing a single control bit in the clock
framework...
Maybe just supply the ungated AXI/AHB clocks here, and let the core
clock manage the gate?
Thanks,
Icenowy
* Re: [PATCH v14 29/44] arm64: RMI: Runtime faulting of memory
From: Suzuki K Poulose @ 2026-06-26 9:04 UTC
To: Gavin Shan, Lorenzo Pieralisi
Cc: Steven Price, kvm, kvmarm, Catalin Marinas, Marc Zyngier,
Will Deacon, James Morse, Oliver Upton, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <9482dfbc-4d96-47ba-a615-f4ba0bda833f@arm.com>
On 26/06/2026 09:47, Suzuki K Poulose wrote:
> On 26/06/2026 08:43, Gavin Shan wrote:
>> On 6/26/26 1:58 AM, Suzuki K Poulose wrote:
>>> On 25/06/2026 14:53, Gavin Shan wrote:
>>>> On 6/6/26 12:35 AM, Lorenzo Pieralisi wrote:
>>>>> On Fri, Jun 05, 2026 at 06:11:11PM +1000, Gavin Shan wrote:
>>>>>> On 6/5/26 5:28 PM, Lorenzo Pieralisi wrote:
>>>>>>> On Fri, Jun 05, 2026 at 04:23:15PM +1000, Gavin Shan wrote:
>>
>> [...]
>>
>>>>>>
>>>>>> I tried to rebase Jean's latest QEMU series [1] to upstream QEMU,
>>>>>> and found
>>>>>> that memory slots backed by THP are broken. With THP disabled on
>>>>>> the host and
>>>>>> other fixes (mentioned in my previous replies) applied on the top
>>>>>> of this (v14)
>>>>>> series, I'm able to boot a realm guest with rebased QEMU series
>>>>>> [2], plus more
>>>>>> fixes on the top.
>>>>>>
>>>>>> [1] https://git.codelinaro.org/linaro/dcap/qemu.git (branch: cca/
>>>>>> latest)
>>>>>> [2] https://git.qemu.org/git/qemu.git (branch: cca/
>>>>>> gavin)
>>>>>>
>>>>>> Lorenzo, You may be saying there is someone making QEMU to support
>>>>>> ARM/CCA?
>>>>>
>>>>> Mathieu and I are working on that yes and with Steven/Suzuki to fix
>>>>> the THP
>>>>> issues you pointed out above.
>>>>>
>>>>>> If so, I'm not sure if there is a QEMU repository for me to try?
>>>>>
>>>>> We should be able to submit patches by end of June - we shall let
>>>>> you know
>>>>> whether we can make something available earlier.
>>>>>
>>>>
>>>> Not sure if there are other known issues in this series. It seems
>>>> the stage2
>>>> page fault handling on the shared space isn't working well. In my
>>>> test, the
>>>> vring (struct vring_desc) of virtio-net-pci is updated by the guest,
>>>> and the
>>>> data isn't seen by QEMU, I'm suspecting if the host-page-frame-
>>>> number is properly
>>>> resolved in the s2 page fault handler for shared (unprotected) space.
>>>>
>>>> - I rebased Jean's latest qemu branch to the upstream qemu;
>>>>
>>>> - On the host, which is emulated by qemu/tcg, the THP (transparent
>>>> huge page) is
>>>> disabled.
>>>>
>>>> - On the guest, I can see the virtio vring (struct vring_desc) is
>>>> updated. The
>>>> S1 page-table entry looks correct because the corresponding
>>>> physical address
>>>> 0x10046880000 is a sane shared (unprotected) space address.
>>>>
>>>> [ 52.094143] software IO TLB: Memory encryption is active and
>>>> system is using DMA bounce buffers
>>>> [ 52.289746] virtqueue_add_desc_split:
>>>> desc[0]@0xffff000006880000, [00000100b983f000 00000640 0002 0001]
>>>> [ 52.432150] PTE 0x00e8010046880707 at address 0xffff000006880000
>>>>
>>>> - On the host, the s2 page-table-entry is unmapped due to attribute
>>>> transition (private -> shared).
>>>> A subsequent S2 page fault is raised against the address and the
>>>> s2 page-table-entry is built.
>>>>
>>>> [ 109.259077] ====> realm_unmap_shared_range:
>>>> tracked_unprot_addr=0x10046880000
>>>> [ 109.260249] realm_unmap_shared_range: unmapped shared range at
>>>> 0x10046880000
>>>> [ 109.317786] realm_unmap_shared_range: unmapped shared range at
>>>> 0x10046880000
>>>> [ 109.629939] ====> kvm_handle_guest_abort:
>>>> fault_ipa=0x10046880000, esr=0x92000007
>>>> [ 109.630245] realm_map_non_secure: ipa=0x10046880000,
>>>> pfn=0xb8b59, size=0x1000, prot=0xf
>>>> [ 109.630331] realm_map_non_secure: ipa=0x10046880000,
>>>> ipa_top=0x10046881000, flags=0x1e0001, range_desc=0xb8b59004
>>>
>>> Are you able to correlate the order of the transitions and the Guest
>>> access with RMM log ? We haven't seen this from our end. We are aware
>>> of permission fault issues with Unprotected IPA when backing the memslot
>>> with MAP_PRIVATE areas. But this looks different.
>>>
>>> Lorenzo, have you run into this ?
>>>
>>
>> It's hard to correlate the order since the logs are collected from two
>> separate
>> consoles. For the write permission, I add code to the host where the
>> permission
>> is always added for all s2 page faults in the shared space. Otherwise,
>> qemu can
>> be killed by -EFAULT or similar error.
>
> This is the problem. We can't add WRITE permission by default. I believe
> you may have MAP_PRIVATE mapping and it has to be mapped as READ only
> and on a permission fault, we replace it with a writable page. By
> overriding the WRITE permission, you let the guest write to a page
> that may not be seen by the VMM.
>
> We identified this as a bug in the KVM driver in this series (reported
> by Lorenzo) and there is a corresponding tf-RMM change that is required
> to get this working. So, please could you wait until the next series
> when this will be addressed ? Or you could switch to using MAP_SHARED
> for the "shared" memory in the memslot.
For the record, you need something like this:
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -838,8 +838,17 @@ int realm_map_non_secure(struct realm *realm,
if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
/* Create missing RTTs and retry */
int level = RMI_RETURN_INDEX(ret);
+ int req_level = find_map_level(realm, ipa, ipa_top);
+
+ /*
+ * There already exists a mapping at the level. May be
+ * we are relaxing a permission for the given range ?
+ */
+ if (level >= req_level) {
+ realm_unmap_shared_range(kvm, ipa, ipa_top, false);
+ continue;
+ }
- WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
ret = realm_create_rtt_levels(realm, ipa, level,
KVM_PGTABLE_LAST_LEVEL,
memcache);
Thanks
Suzuki
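A userspace analogy for the MAP_PRIVATE versus MAP_SHARED point made in
the quoted text, as a minimal sketch only: it does not exercise KVM, the
RMM or a realm guest. It assumes a POSIX system, uses a temporary file
as the backing store, and the names `guest` and `vmm` are illustrative
labels for the writer's mapping and an independent observer's mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one byte through a mapping created with `flags` (MAP_SHARED or
 * MAP_PRIVATE) and report whether a second, MAP_SHARED view of the same
 * file observes it. Returns 1 if visible, 0 if not, -1 on setup error.
 */
static int write_visible_to_peer(int flags)
{
	unsigned char *guest, *vmm;
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	FILE *f;
	int fd, ret = -1;

	if (page <= 0)
		return -1;
	len = (size_t)page;

	f = tmpfile();			/* anonymous, read-write backing file */
	if (!f)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, (off_t)len) != 0)	/* zero-filled, one page long */
		goto out_close;

	guest = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
	if (guest == MAP_FAILED)
		goto out_close;
	vmm = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (vmm == MAP_FAILED)
		goto out_unmap_guest;

	/* With MAP_PRIVATE this write lands in a private copy-on-write page. */
	guest[0] = 0xab;
	ret = (vmm[0] == 0xab);

	munmap(vmm, len);
out_unmap_guest:
	munmap(guest, len);
out_close:
	fclose(f);
	return ret;
}

/* Checks both flavours: shared writes are seen, private writes are not. */
static void map_visibility_self_check(void)
{
	assert(write_visible_to_peer(MAP_SHARED) == 1);
	assert(write_visible_to_peer(MAP_PRIVATE) == 0);
}
```

The private case is the one that matters here: the writer sees its own
data, while the other view of the backing file still reads zeros.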
>
>
> Suzuki
>
>
>>
>> There are more findings after more experiments: this virtio-net-pci
>> device has 3
>> queues or vrings (Rx/Tx/Ctrl). The Rx/Tx/Ctrl queue are populated in
>> order one after
>> one. In the guest kernel, I intentionally write fixed data
>> (0x0123456789abcdef) to
>> the first 8 bytes of the queue when it gets populated, and stop the
>> guest at random
>> points to see if the data is gone. I found that the data written to
>> Rx/ Tx queue are
>> lost after Ctrl queue is allocated.
>>
>> The data written to Rx/Tx queue is lost if the guest stops (B). The
>> data written to
>> Rx/Tx queue isn't lost if the guest stops at (A). I can see the
>> pattern (0x0123...cdef)
>> by dumping the physical memory through 'pmemsave' command in qemu.
>>
>> DMA allocation
>> ==============
>> dma_alloc_coherent
>> dma_alloc_attrs
>> dma_direct_alloc
>> __dma_direct_alloc_pages
>> dma_set_decrypted // (A) No data lost if
>> being stopped here for the Ctrl queue
>> memset(ret, 0, size) // (B) Data lost after
>> being stopped after memset() for the Ctrl queue
>>
>> The memset() on the Ctrl queue should trigger a stage2 page fault. It
>> seems the page
>> fault enforces the shared pages for Rx/Tx queue to be dropped? I need
>> to add more
>> debugging code and track it down.
>>
>>> Suzuki
>>>
>>>
>>>>
>>>> - On QEMU, the updated vring (struct vring_desc) at GPA 0x46880000
>>>> isn't seen. All the
>>>> data in that address are zeros.
>>>>
>>>> ====> virtqueue_split_pop: vdev=<virtio-net>, sz=0x38,
>>>> queue_index=0x0, vq->vring.num=0x100
>>>> virtqueue_split_pop: last_avail_idx=0x0, head=0x0
>>>> address_space_read_cached_slow: cache@0xffff1c036440, addr=0x0,
>>>> buf=0xffffeee34880, len=0x10
>>>> address_space_read_cached_slow: cache: ptr=0x0,
>>>> xlat=0x10046880000, len=0x1000, mrs=<realm-dma-region>, is_write=no
>>>> address_space_read_cached_slow: translated to mr=<mach-virt.ram>,
>>>> mr_addr=0x6880000, l=0x10
>>>> flatview_read_continue_step: mr=<mach-virt.ram>,
>>>> host=0xffff23e00000, mr_addr=0x6880000, ram_ptr=0xffff2a680000
>>>> virtqueue_split_pop: desc: 0000000000000000 - 00000000 - 00000000
>>>> - 00000000
>>>> qemu-system-aarch64: virtio: zero sized buffers are not allowed
>>>>
>>>>
>> Thanks,
>> Gavin
>>
>
* Re: [PATCH v5 1/7] dt-bindings: display: verisilicon,dc: generalize for single-output variants
From: Icenowy Zheng @ 2026-06-26 9:00 UTC
To: Conor Dooley
Cc: Conor Dooley, Joey Lu, maarten.lankhorst, mripard, tzimmermann,
airlied, simona, robh, krzk+dt, conor+dt, ychuang3, schung, yclu4,
dri-devel, devicetree, linux-arm-kernel, linux-kernel
In-Reply-To: <20260626-astrology-mural-853d3860e048@wendy>
On Fri, 2026-06-26 at 08:19 +0100, Conor Dooley wrote:
> On Fri, Jun 26, 2026 at 01:27:21PM +0800, Icenowy Zheng wrote:
> > On Thu, 2026-06-25 at 17:33 +0100, Conor Dooley wrote:
> > > On Thu, Jun 25, 2026 at 05:44:43PM +0800, Joey Lu wrote:
> > > > +allOf:
> > > > + - if:
> > > > + properties:
> > > > + compatible:
> > > > + contains:
> > > > + const: thead,th1520-dc8200
> > > > + then:
> > > > + properties:
> > > > + clocks:
> > > > + minItems: 5
> > > > + maxItems: 5
> > > > +
> > > > + clock-names:
> > > > + minItems: 5
> > > > + maxItems: 5
> > >
> > > All the maxItems here repeat the maximum constraint and do
> > > nothing.
> > >
> > > Since you didn't change the minimum constraint at the top level,
> > > your
> > > minItems also do nothing.
> > >
> > > > +
> > > > + resets:
> > > > + minItems: 3
> > > > + maxItems: 3
> > > > +
> > > > + reset-names:
> > > > + minItems: 3
> > > > + maxItems: 3
> > > > +
> > > > + required:
> > > > + - resets
> > > > + - reset-names
> > >
> > > Both conditional sections have this, but the original binding
> > > doesn't
> > > require these for the thead device. This is a functional change
> > > therefore and shouldn't be in a patch calling itself "generalise
> > > for
> > > single ended variants".
> >
> > Well yes they're required.
> >
> > Should I send a patch adding the `thead,th1520-dc8200` part of the
> > schema?
>
> If you mean the code above, no. Adding a conditional section when
> there's only that compatible doesn't make sense.
>
> What you could do is just add it at the top level though, which would
> also benefit this patch since it'd not have to be conditionally added
> for the new nuvoton device.
> Just note in your commit message about what the ABI impact of the
> change
> to required properties is (effectively nothing because it's optional
> in
> the driver and the only user has the properties).
Okay, I will craft such a patch and send it.
>
> > > > +
> > > > + resets:
> > > > + minItems: 1
> > > > + maxItems: 1
> > > > +
> > > > + reset-names:
> > > > + items:
> > > > + - const: core
> > >
> > > This is just maxItems: 1.
> >
> > Well the implicit rules of DT binding schemas are quite weird...
>
> I don't think it is that strange, as the binding has
> reset-names:
> items:
> - const: core
> - const: axi
> - const: ahb
Ah, does the list constrain the order of items? If it constrains the
order, it partly defeats the purpose of having names; if it does not
constrain the order, it needs to be clarified that the one required reset
is core rather than the other two.
Thanks,
Icenowy
> so just constraining to one item is the simplest way to do this
> without
> duplication.
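
As a rough illustration of the ordering question above, and only a sketch:
an `items` list in a binding is matched position by position, so with the
top-level list quoted above a limit of one item leaves "core" as the only
valid entry. The helper name names_match_items() and its parameters are
hypothetical; this is not how dt-schema itself is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Ordered name list from the top-level binding quoted above. */
static const char *const reset_items[] = { "core", "axi", "ahb" };

/*
 * Hypothetical helper: returns true when names[0..count) matches
 * items[] position by position and count lies within [min, max].
 */
static bool names_match_items(const char *const *names, size_t count,
                              const char *const *items, size_t n_items,
                              size_t min, size_t max)
{
    assert(min <= max && max <= n_items);

    if (count < min || count > max)
        return false;

    for (size_t i = 0; i < count; i++) {
        /* Entry i must be exactly the i-th listed name. */
        if (strcmp(names[i], items[i]) != 0)
            return false;
    }
    return true;
}
```

With min = max = 1, { "core" } is accepted and { "axi" } is rejected, which
is why constraining only the item count is enough in the conditional section.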
^ permalink raw reply
* RE: [PATCH v3 01/10] mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx()
From: Peng Fan (OSS) @ 2026-06-26 9:00 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Peng Fan (OSS)
Cc: linux-remoteproc@vger.kernel.org, imx@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-rt-devel@lists.linux.dev, Bjorn Andersson, Clark Williams,
Fabio Estevam, Frank Li, Jassi Brar, Mathieu Poirier,
Pengutronix Kernel Team, Sascha Hauer, Steven Rostedt
In-Reply-To: <20260626083416.xbVlbOQJ@linutronix.de>
> Subject: Re: [PATCH v3 01/10] mailbox: imx: Forward the timeout/
> error in imx_mu_generic_tx()
>
> On 2026-06-26 16:23:49 [+0800], Peng Fan wrote:
> > On Wed, Jun 24, 2026 at 09:44:09AM +0200, Sebastian Andrzej
> Siewior wrote:
> > >On 2026-06-22 19:24:00 [+0800], Peng Fan wrote:
> > >> We may need to use atomic API for TXDB_V2. For the patchset
> itself,
> > >> it looks good to me.
> > >>
> > >> Reviewed-by: Peng Fan <peng.fan@nxp.com>
> > >
> > >Thank you. Is there anything you want me to do or is this series
> good
> > >as-is?
> >
> > If you would like to address the AI reported issue further, you may
> > update readl_poll_timeout to readl_poll_timeout_atomic.
>
> What about the timeout value? Keep it as-is or reduce to?
Let's keep it as-is.
Thanks,
Peng.
>
> > From the fix on error return, this patch is ok to me, and the series is
> good.
>
> Thanks.
>
> > Thanks,
> > Peng
> > >
>
> Sebastian
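
For readers unfamiliar with the two helpers mentioned above: in the kernel,
readl_poll_timeout() may sleep between reads, while
readl_poll_timeout_atomic() busy-waits between reads and so can be used
where sleeping is not allowed. The following is only a user-space sketch of
the busy-wait shape. struct fake_reg, fake_reg_read() and
poll_ready_atomic() are hypothetical names for illustration, not kernel
APIs, and the timeout values are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds. */
static uint64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Busy-wait for roughly `us` microseconds without ever sleeping. */
static void spin_delay_us(uint64_t us)
{
    uint64_t end = now_us() + us;

    while (now_us() < end)
        ;
}

/* Hypothetical stand-in for a status register: ready after N reads. */
struct fake_reg {
    unsigned int reads_left;
};

static uint32_t fake_reg_read(struct fake_reg *reg)
{
    if (reg->reads_left > 0) {
        reg->reads_left--;
        return 0x0;
    }
    return 0x1;
}

/*
 * Poll until (value & mask) is non-zero or timeout_us elapses.
 * Returns 0 on success and -ETIMEDOUT otherwise; *val holds the last read.
 */
static int poll_ready_atomic(struct fake_reg *reg, uint32_t mask,
                             uint64_t delay_us, uint64_t timeout_us,
                             uint32_t *val)
{
    uint64_t deadline = now_us() + timeout_us;

    assert(reg && val);
    for (;;) {
        *val = fake_reg_read(reg);
        if (*val & mask)
            return 0;
        if (now_us() > deadline) {
            /* One last read so a late success is not reported as a timeout. */
            *val = fake_reg_read(reg);
            return (*val & mask) ? 0 : -ETIMEDOUT;
        }
        spin_delay_us(delay_us);
    }
}
```

The caller's CPU spins for the whole wait, which is why the timeout value
matters more for the atomic variant than for the sleeping one.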
^ permalink raw reply
* Re: [PATCH v5 1/7] dt-bindings: display: verisilicon,dc: generalize for single-output variants
From: Conor Dooley @ 2026-06-26 8:57 UTC (permalink / raw)
To: Icenowy Zheng
Cc: Conor Dooley, Joey Lu, maarten.lankhorst, mripard, tzimmermann,
airlied, simona, robh, krzk+dt, conor+dt, ychuang3, schung, yclu4,
dri-devel, devicetree, linux-arm-kernel, linux-kernel
In-Reply-To: <84b93c496fabdeee05d2f962a1b764fdbfaacdb7.camel@iscas.ac.cn>
[-- Attachment #1: Type: text/plain, Size: 3174 bytes --]
On Fri, Jun 26, 2026 at 03:58:14PM +0800, Icenowy Zheng wrote:
> On Fri, 2026-06-26 at 08:22 +0100, Conor Dooley wrote:
> > On Thu, Jun 25, 2026 at 05:33:37PM +0100, Conor Dooley wrote:
> > > On Thu, Jun 25, 2026 at 05:44:43PM +0800, Joey Lu wrote:
> > > > +
> > > > + - if:
> > > > + properties:
> > > > + compatible:
> > > > + contains:
> > > > + const: nuvoton,ma35d1-dcu
> > > > + then:
> > > > + properties:
> > > > + clocks:
> > > > + minItems: 2
> > >
> > > Anything that updates the minimum constraint should be done at the
> > > top
> > > level of this schema. The conditional section should then tighten
> > > the
> > > constraint, in this case that means only having maxItems.
> > >
> > > > + maxItems: 2
> > > > +
> > > > + clock-names:
> > > > + items:
> > > > + - const: core
> > > > + - const: pix0
> > >
> > > Does this even work when the top level schema thinks clock 2 should
> > > be
> > > called axi?
> >
> > Additionally here, only having core and pix0 seems like it might be an
> > oversimplification. I doubt removing the second output port means
> > that
> > the axi and ahb clocks are no longer needed.
> > Is it the case that your device supplies the same clock to core, ahb
> > and
> > axi? If so, then you should fill those clocks in in your devicetree
> > and
> > this can just constrain the number of clocks/clock-names to 4.
>
> The clock controller of that SoC is quite weird -- it has only a single
> gate bit, but that bit controls 3 clock gates. The core, ahb and axi clocks
> all have gates controlled by this single bit, which is why it is currently
> modelled as only the core clock being supplied.
Yeah, then what's in the binding is definitely wrong.
Even if the same clock was provided to all clock inputs in the IP, each
individual clock should be listed in the devicetree - although it will
look a little silly to see clocks = <&foo 2>, <&foo 2>, <&foo 2>, <&foo 2>;
In this case, 3 clocks controlled by 1 gate bit is an implementation detail
of the SoC's clocking hardware, and not relevant to how the dc instance
should be described.
> Well it might be worthwhile to supply the bus clock before the gate as
> ahb/axi, especially axi, because both the AXI clock and the core clock
> constrain the maximum pixel clock.
Right. And looking at patch 4/7, and the wording:
| The Nuvoton MA35D1 SoC integrates a DCUltraLite display controller whose
| AXI and AHB bus clocks share a single gate enable bit with the display
| core clock, so the clock driver does not expose them separately. This
| patch makes the axi and ahb clocks optional in the probe.
It sounds like there's probably some issues with how things are modelled
clock wise in this device, unless this is not an accurate statement and
there's actually one clock provided to all three inputs. If they're
distinct clocks, with different rates, only having one exposed has a lot
of potential to be problematic!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [EXTERNAL] Re: [PATCH v4 1/3] perf: marvell: Add MPAM partid filtering to CN10K TAD PMU
From: Ben Horgan @ 2026-06-26 8:57 UTC (permalink / raw)
To: Geethasowjanya Akula, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, devicetree@vger.kernel.org
Cc: mark.rutland@arm.com, will@kernel.org, krzk+dt@kernel.org,
james.morse@arm.com, Sunil Kovvuri Goutham, Tanmay Jagdale
In-Reply-To: <CH0PR18MB43394668EAEBD367DB4D4990CDEB2@CH0PR18MB4339.namprd18.prod.outlook.com>
Hi Geetha,
On 6/26/26 07:21, Geethasowjanya Akula wrote:
>
>
>> -----Original Message-----
>> From: Ben Horgan <ben.horgan@arm.com>
>> Sent: Thursday, June 25, 2026 7:23 PM
>> To: Geethasowjanya Akula <gakula@marvell.com>; linux-perf-
>> users@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
>> kernel@lists.infradead.org; devicetree@vger.kernel.org
>> Cc: mark.rutland@arm.com; will@kernel.org; krzk+dt@kernel.org;
>> james.morse@arm.com
>> Subject: [EXTERNAL] Re: [PATCH v4 1/3] perf: marvell: Add MPAM partid
>> filtering to CN10K TAD PMU
>> Hi Geetha,
>>
>> +CC James
>>
>> On 6/18/26 16:36, Geetha sowjanya wrote:
>>> From: Tanmay Jagdale <tanmay@marvell.com>
>>>
>>> The TAD PMU exposes counters that can be filtered by MPAM partition id
>>> for a subset of allocation and hit events.
>>>
>>> Add a 9-bit partid format attribute (config1) and route counter
>>> programming through variant-specific ops so CN10K keeps MPAM-capable
>>> programming while Odyssey keeps the reduced event set without advertising
>> partid in sysfs.
>>>
>>> Probe no longer mutates the platform_device MMIO resource (walk a
>>> local map_start), rejects tad-cnt / page sizes of zero, validates the
>>> memory window against tad-cnt, and registers the perf PMU before
>>> hotplug with correct unwind.
>>>
>>> Example:
>>> perf stat -e tad/tad_alloc_any,partid=0x12,partid_en=1/ -- <program>
>>
>> Where is the user expected to get the PARTID from? The MPAM driver
>> considers the PARTID as an internal only value.
>>
>> resctrl does support a 'debug' mount option which will show the CLOSID
>> associated with a control group. Whilst the CLOSID is often the PARTID, it is
>> really a set of PARTIDs. When the cdp mount option is used, CLOSID maps to 2
>> PARTIDs and if we use PARTID narrowing to give us more monitors, as
>> proposed in [1], then the set of PARTIDs may be bigger.
>> Furthermore, if the PARTID narrowing scheme is made dynamic the size of the
>> PARTID set may change when control or monitoring groups are created or
>> deleted.
>>
>> It seems that a way to map from a resctrl control group to the set of PARTIDs is
>> required and a mechanism to tie this to lifetime of the resctrl mount.
>>
>> Perhaps some helpers along the lines of:
>>
>> int resctrl_mount_generation(void)
>> int mpam_rdtgrp_to_partid_is_static(int mount_gen)
>> int resctrl_rdtgrp_generation(char *name)
>> int mpam_rdtgrp_to_partid_count(char *name, int rdt_gen)
>> int mpam_rdtgrp_to_partid_array(char *name, int rdt_gen, int* partids)
>>
>> The rdtgrp generation is an attempt to avoid having to use a debug interface
>> in anger and to cope with renaming of control groups in resctrl.
>> This does seem a bit unwieldy so hopefully there is a better way to do this.
>>
>> Sorry to throw a spanner in the works.
> Hi Ben,
>
> Thank you for the detailed feedback — the concern you raise is valid, particularly when
> viewed from the perspective of resctrl-managed deployments.
>
> However, to clarify the intent of this patch: the exposure of partid in the TAD PMU is deliberately
> a low-level, hardware-facing interface, and is not intended to integrate with or mirror the
> abstractions provided by resctrl. It is mainly meant for platform bring-up and low-level
> performance/debug users, who already have explicit knowledge of the MPAM configuration,
> typically provisioned by firmware or other privileged software layers (e.g. EL3/EL2).
> In such environments, PARTIDs are known out-of-band, so the expectation is that the
> user supplying partid is already aware of the MPAM IDs programmed on the system.
When this was proposed before, [1], there was feedback asking to
document how to get the PARTID.
Thanks,
Ben
[1] https://lore.kernel.org/linux-arm-kernel/c981692b-af7b-453d-39af-402221e174f5@arm.com/
> A proper “profile this resctrl group” path would require MPAM–resctrl support (e.g. something along the lines of the helpers you suggest)
> to resolve a group to its PARTID set. This is indeed important, but it constitutes a separate design discussion that is outside the scope of this driver patch.
>
> We will clarify this in the commit message and avoid implying that users normally obtain PARTIDs from resctrl today.
>
>
> Thanks,
> Geetha
>>
>> Thanks,
>>
>> Ben
>>
>>>
>>> Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
>>> Signed-off-by: Geetha sowjanya <gakula@marvell.com>
>>> ---
>>>
>>> Changelog (since v3)
>>> --------------------
>>> - Restore cpuhp_state_add_instance_nocalls before perf_pmu_register in
>> probe
>>> so users cannot attach events before the hotplug instance exists; unwind
>>> removes the hotplug instance if perf registration fails.
>>> - Add perf_ready: tad_pmu_offline_cpu skips perf_pmu_migrate_context
>> until after
>>> successful perf_pmu_register, so a CPU offline between hotplug add and
>> perf
>>> register does not touch perf core state for an unregistered PMU.
>>>
>>> Changelog (since v2)
>>> --------------------
>>> - Validate the eventId using an appropriate mask to ensure
>>> it is restricted to 8 bits.
>>>
>>> Changelog (since v1)
>>> --------------------
>>> - Fix config1 filter enable to use bit 9 consistently with the PMU format
>>> string (partid_en) and reject reserved bits with GENMASK(9, 0).
>>> - Register perf_pmu_register before cpuhp_state_add_instance_nocalls and
>>> unregister on hotplug failure.
>>>
>>> drivers/perf/marvell_cn10k_tad_pmu.c | 220
>>> +++++++++++++++++++++------
>>> 1 file changed, 171 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c
>>> b/drivers/perf/marvell_cn10k_tad_pmu.c
>>> index 51ccb0befa05..340be3776fe7 100644
>>> --- a/drivers/perf/marvell_cn10k_tad_pmu.c
>>> +++ b/drivers/perf/marvell_cn10k_tad_pmu.c
>>> @@ -7,6 +7,8 @@
>>> #define pr_fmt(fmt) "tad_pmu: " fmt
>>>
>>> #include <linux/io.h>
>>> +#include <linux/bits.h>
>>> +#include <linux/compiler.h>
>>> #include <linux/module.h>
>>> #include <linux/of.h>
>>> #include <linux/cpuhotplug.h>
>>> @@ -14,12 +16,20 @@
>>> #include <linux/platform_device.h>
>>> #include <linux/acpi.h>
>>>
>>> -#define TAD_PFC_OFFSET 0x800
>>> -#define TAD_PFC(counter) (TAD_PFC_OFFSET | (counter << 3))
>>> #define TAD_PRF_OFFSET 0x900
>>> -#define TAD_PRF(counter) (TAD_PRF_OFFSET | (counter << 3))
>>> +#define TAD_PFC_OFFSET 0x800
>>> +#define TAD_PFC(base, counter) ((base) | ((u64)(counter) << 3))
>>> +#define TAD_PRF(base, counter) ((base) | ((u64)(counter) << 3))
>>> #define TAD_PRF_CNTSEL_MASK 0xFF
>>> +#define TAD_PRF_MATCH_PARTID BIT(8)
>>> +#define TAD_PRF_PARTID_NS BIT(10)
>>> +/*
>>> + * config1: bits 0..8 MPAM partition id (including 0); bit 9 requests
>>> + * filtering for MPAM-capable events. All-zero config1 means no filter.
>>> + */
>>> +#define TAD_PARTID_FILTER_EN BIT(9)
>>> #define TAD_MAX_COUNTERS 8
>>> +#define TAD_EVENT_SEL_MASK GENMASK(7, 0)
>>>
>>> #define to_tad_pmu(p) (container_of(p, struct tad_pmu, pmu))
>>>
>>> @@ -27,30 +37,94 @@ struct tad_region {
>>> void __iomem *base;
>>> };
>>>
>>> +enum mrvl_tad_pmu_version {
>>> + TAD_PMU_V1 = 1,
>>> + TAD_PMU_V2,
>>> +};
>>> +
>>> +struct tad_pmu_data {
>>> + int id;
>>> + u64 tad_prf_offset;
>>> + u64 tad_pfc_offset;
>>> +};
>>> +
>>> struct tad_pmu {
>>> struct pmu pmu;
>>> struct tad_region *regions;
>>> u32 region_cnt;
>>> unsigned int cpu;
>>> + /* Set after successful perf_pmu_register(); gates offline migration. */
>>> + bool perf_ready;
>>> + const struct tad_pmu_ops *ops;
>>> + const struct tad_pmu_data *pdata;
>>> struct hlist_node node;
>>> struct perf_event *events[TAD_MAX_COUNTERS];
>>> DECLARE_BITMAP(counters_map, TAD_MAX_COUNTERS);
>>> };
>>>
>>> -enum mrvl_tad_pmu_version {
>>> - TAD_PMU_V1 = 1,
>>> - TAD_PMU_V2,
>>> -};
>>> -
>>> -struct tad_pmu_data {
>>> - int id;
>>> +struct tad_pmu_ops {
>>> + void (*start_counter)(struct tad_pmu *pmu, struct perf_event *event);
>>> };
>>>
>>> static int tad_pmu_cpuhp_state;
>>>
>>> +static void tad_pmu_start_counter(struct tad_pmu *pmu,
>>> + struct perf_event *event)
>>> +{
>>> + const struct tad_pmu_data *pdata = pmu->pdata;
>>> + struct hw_perf_event *hwc = &event->hw;
>>> + u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
>>> + u32 counter_idx = hwc->idx;
>>> + u64 partid_filter = 0;
>>> + u64 reg_val;
>>> + u64 cfg1 = event->attr.config1;
>>> + bool use_mpam = cfg1 & TAD_PARTID_FILTER_EN;
>>> + u32 partid = (u32)(cfg1 & GENMASK(8, 0));
>>> + int i;
>>> +
>>> + for (i = 0; i < pmu->region_cnt; i++)
>>> + writeq_relaxed(0, pmu->regions[i].base +
>>> + TAD_PFC(pdata->tad_pfc_offset, counter_idx));
>>> +
>>> + if (use_mpam && event_idx > 0x19 && event_idx < 0x21) {
>>> + partid_filter = TAD_PRF_MATCH_PARTID |
>> TAD_PRF_PARTID_NS |
>>> + ((u64)partid << 11);
>>> + }
>>> +
>>> +
>>> + for (i = 0; i < pmu->region_cnt; i++) {
>>> + reg_val = event_idx & 0xFF;
>>> + reg_val |= partid_filter;
>>> + writeq_relaxed(reg_val, pmu->regions[i].base +
>>> + TAD_PRF(pdata->tad_prf_offset, counter_idx));
>>> + }
>>> +}
>>> +
>>> +static void tad_pmu_v2_start_counter(struct tad_pmu *pmu,
>>> + struct perf_event *event)
>>> +{
>>> + const struct tad_pmu_data *pdata = pmu->pdata;
>>> + struct hw_perf_event *hwc = &event->hw;
>>> + u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
>>> + u32 counter_idx = hwc->idx;
>>> + u64 reg_val;
>>> + int i;
>>> +
>>> + for (i = 0; i < pmu->region_cnt; i++)
>>> + writeq_relaxed(0, pmu->regions[i].base +
>>> + TAD_PFC(pdata->tad_pfc_offset, counter_idx));
>>> +
>>> + for (i = 0; i < pmu->region_cnt; i++) {
>>> + reg_val = event_idx & 0xFF;
>>> + writeq_relaxed(reg_val, pmu->regions[i].base +
>>> + TAD_PRF(pdata->tad_prf_offset, counter_idx));
>>> + }
>>> +}
>>> +
>>> static void tad_pmu_event_counter_read(struct perf_event *event) {
>>> struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
>>> + const struct tad_pmu_data *pdata = tad_pmu->pdata;
>>> struct hw_perf_event *hwc = &event->hw;
>>> u32 counter_idx = hwc->idx;
>>> u64 prev, new;
>>> @@ -60,7 +134,7 @@ static void tad_pmu_event_counter_read(struct
>> perf_event *event)
>>> prev = local64_read(&hwc->prev_count);
>>> for (i = 0, new = 0; i < tad_pmu->region_cnt; i++)
>>> new += readq(tad_pmu->regions[i].base +
>>> - TAD_PFC(counter_idx));
>>> + TAD_PFC(pdata->tad_pfc_offset,
>> counter_idx));
>>> } while (local64_cmpxchg(&hwc->prev_count, prev, new) != prev);
>>>
>>> local64_add(new - prev, &event->count); @@ -69,16 +143,14 @@
>> static
>>> void tad_pmu_event_counter_read(struct perf_event *event) static void
>>> tad_pmu_event_counter_stop(struct perf_event *event, int flags) {
>>> struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
>>> + const struct tad_pmu_data *pdata = tad_pmu->pdata;
>>> struct hw_perf_event *hwc = &event->hw;
>>> u32 counter_idx = hwc->idx;
>>> int i;
>>>
>>> - /* TAD()_PFC() stop counting on the write
>>> - * which sets TAD()_PRF()[CNTSEL] == 0
>>> - */
>>> for (i = 0; i < tad_pmu->region_cnt; i++) {
>>> writeq_relaxed(0, tad_pmu->regions[i].base +
>>> - TAD_PRF(counter_idx));
>>> + TAD_PRF(pdata->tad_prf_offset, counter_idx));
>>> }
>>>
>>> tad_pmu_event_counter_read(event);
>>> @@ -89,26 +161,10 @@ static void tad_pmu_event_counter_start(struct
>>> perf_event *event, int flags) {
>>> struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
>>> struct hw_perf_event *hwc = &event->hw;
>>> - u32 event_idx = event->attr.config;
>>> - u32 counter_idx = hwc->idx;
>>> - u64 reg_val;
>>> - int i;
>>>
>>> hwc->state = 0;
>>>
>>> - /* Typically TAD_PFC() are zeroed to start counting */
>>> - for (i = 0; i < tad_pmu->region_cnt; i++)
>>> - writeq_relaxed(0, tad_pmu->regions[i].base +
>>> - TAD_PFC(counter_idx));
>>> -
>>> - /* TAD()_PFC() start counting on the write
>>> - * which sets TAD()_PRF()[CNTSEL] != 0
>>> - */
>>> - for (i = 0; i < tad_pmu->region_cnt; i++) {
>>> - reg_val = event_idx & 0xFF;
>>> - writeq_relaxed(reg_val, tad_pmu->regions[i].base +
>>> - TAD_PRF(counter_idx));
>>> - }
>>> + tad_pmu->ops->start_counter(tad_pmu, event);
>>> }
>>>
>>> static void tad_pmu_event_counter_del(struct perf_event *event, int
>>> flags) @@ -128,7 +184,6 @@ static int tad_pmu_event_counter_add(struct
>> perf_event *event, int flags)
>>> struct hw_perf_event *hwc = &event->hw;
>>> int idx;
>>>
>>> - /* Get a free counter for this event */
>>> idx = find_first_zero_bit(tad_pmu->counters_map,
>> TAD_MAX_COUNTERS);
>>> if (idx == TAD_MAX_COUNTERS)
>>> return -EAGAIN;
>>> @@ -148,6 +203,9 @@ static int tad_pmu_event_counter_add(struct
>>> perf_event *event, int flags) static int tad_pmu_event_init(struct
>>> perf_event *event) {
>>> struct tad_pmu *tad_pmu = to_tad_pmu(event->pmu);
>>> + const struct tad_pmu_data *pdata = tad_pmu->pdata;
>>> + u32 event_idx = (u32)(event->attr.config & TAD_EVENT_SEL_MASK);
>>> + u64 cfg1 = event->attr.config1;
>>>
>>> if (event->attr.type != event->pmu->type)
>>> return -ENOENT;
>>> @@ -158,6 +216,23 @@ static int tad_pmu_event_init(struct perf_event
>> *event)
>>> if (event->state != PERF_EVENT_STATE_OFF)
>>> return -EINVAL;
>>>
>>> + if (event->attr.config & ~TAD_EVENT_SEL_MASK)
>>> + return -EINVAL;
>>> +
>>> + if (pdata->id == TAD_PMU_V2) {
>>> + if (cfg1)
>>> + return -EINVAL;
>>> + } else {
>>> + if ((cfg1 & GENMASK(8, 0)) && !(cfg1 &
>> TAD_PARTID_FILTER_EN))
>>> + return -EINVAL;
>>> + if (cfg1 & TAD_PARTID_FILTER_EN) {
>>> + if (event_idx <= 0x19 || event_idx >= 0x21)
>>> + return -EINVAL;
>>> + }
>>> + if (cfg1 & ~GENMASK(9, 0))
>>> + return -EINVAL;
>>> + }
>>> +
>>> event->cpu = tad_pmu->cpu;
>>> event->hw.idx = -1;
>>> event->hw.config_base = event->attr.config; @@ -232,7 +307,7 @@
>>> static struct attribute *ody_tad_pmu_event_attrs[] = {
>>> TAD_PMU_EVENT_ATTR(tad_hit_ltg, 0x1e),
>>> TAD_PMU_EVENT_ATTR(tad_hit_any, 0x1f),
>>> TAD_PMU_EVENT_ATTR(tad_tag_rd, 0x20),
>>> - TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xFF),
>>> + TAD_PMU_EVENT_ATTR(tad_tot_cycle, 0xff),
>>> NULL
>>> };
>>>
>>> @@ -242,9 +317,13 @@ static const struct attribute_group
>>> ody_tad_pmu_events_attr_group = { };
>>>
>>> PMU_FORMAT_ATTR(event, "config:0-7");
>>> +PMU_FORMAT_ATTR(partid, "config1:0-8");
>>> +PMU_FORMAT_ATTR(partid_en, "config1:9-9");
>>>
>>> static struct attribute *tad_pmu_format_attrs[] = {
>>> &format_attr_event.attr,
>>> + &format_attr_partid.attr,
>>> + &format_attr_partid_en.attr,
>>> NULL
>>> };
>>>
>>> @@ -253,6 +332,16 @@ static struct attribute_group
>> tad_pmu_format_attr_group = {
>>> .attrs = tad_pmu_format_attrs,
>>> };
>>>
>>> +static struct attribute *ody_tad_pmu_format_attrs[] = {
>>> + &format_attr_event.attr,
>>> + NULL
>>> +};
>>> +
>>> +static struct attribute_group ody_tad_pmu_format_attr_group = {
>>> + .name = "format",
>>> + .attrs = ody_tad_pmu_format_attrs,
>>> +};
>>> +
>>> static ssize_t tad_pmu_cpumask_show(struct device *dev,
>>> struct device_attribute *attr, char *buf) { @@
>> -281,16 +370,25
>>> @@ static const struct attribute_group *tad_pmu_attr_groups[] = {
>>>
>>> static const struct attribute_group *ody_tad_pmu_attr_groups[] = {
>>> &ody_tad_pmu_events_attr_group,
>>> - &tad_pmu_format_attr_group,
>>> + &ody_tad_pmu_format_attr_group,
>>> &tad_pmu_cpumask_attr_group,
>>> NULL
>>> };
>>>
>>> +static const struct tad_pmu_ops tad_pmu_ops = {
>>> + .start_counter = tad_pmu_start_counter,
>>> +};
>>> +
>>> +static const struct tad_pmu_ops tad_pmu_v2_ops = {
>>> + .start_counter = tad_pmu_v2_start_counter,
>>> +};
>>> +
>>> static int tad_pmu_probe(struct platform_device *pdev) {
>>> const struct tad_pmu_data *dev_data;
>>> struct device *dev = &pdev->dev;
>>> struct tad_region *regions;
>>> + resource_size_t map_start;
>>> struct tad_pmu *tad_pmu;
>>> struct resource *res;
>>> u32 tad_pmu_page_size;
>>> @@ -298,7 +396,6 @@ static int tad_pmu_probe(struct platform_device
>> *pdev)
>>> u32 tad_cnt;
>>> int version;
>>> int i, ret;
>>> - char *name;
>>>
>>> tad_pmu = devm_kzalloc(&pdev->dev, sizeof(*tad_pmu),
>> GFP_KERNEL);
>>> if (!tad_pmu)
>>> @@ -312,6 +409,7 @@ static int tad_pmu_probe(struct platform_device
>> *pdev)
>>> return -ENODEV;
>>> }
>>> version = dev_data->id;
>>> + tad_pmu->pdata = dev_data;
>>>
>>> res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>> if (!res) {
>>> @@ -338,22 +436,31 @@ static int tad_pmu_probe(struct platform_device
>> *pdev)
>>> dev_err(&pdev->dev, "Can't find tad-cnt property\n");
>>> return ret;
>>> }
>>> + if (!tad_cnt || !tad_page_size || !tad_pmu_page_size) {
>>> + dev_err(&pdev->dev, "Invalid tad-cnt or page size\n");
>>> + return -EINVAL;
>>> + }
>>>
>>> regions = devm_kcalloc(&pdev->dev, tad_cnt,
>>> sizeof(*regions), GFP_KERNEL);
>>> if (!regions)
>>> return -ENOMEM;
>>>
>>> - /* ioremap the distributed TAD pmu regions */
>>> - for (i = 0; i < tad_cnt && res->start < res->end; i++) {
>>> - regions[i].base = devm_ioremap(&pdev->dev,
>>> - res->start,
>>> + map_start = res->start;
>>> + for (i = 0; i < tad_cnt; i++) {
>>> + if (map_start > res->end ||
>>> + tad_pmu_page_size > (resource_size_t)(res->end -
>> map_start + 1)) {
>>> + dev_err(&pdev->dev, "TAD PMU mem window too
>> small for tad-cnt=%u\n",
>>> + tad_cnt);
>>> + return -EINVAL;
>>> + }
>>> + regions[i].base = devm_ioremap(&pdev->dev, map_start,
>>> tad_pmu_page_size);
>>> if (!regions[i].base) {
>>> dev_err(&pdev->dev, "TAD%d ioremap fail\n", i);
>>> return -ENOMEM;
>>> }
>>> - res->start += tad_page_size;
>>> + map_start += tad_page_size;
>>> }
>>>
>>> tad_pmu->regions = regions;
>>> @@ -374,14 +481,16 @@ static int tad_pmu_probe(struct platform_device
>> *pdev)
>>> .read = tad_pmu_event_counter_read,
>>> };
>>>
>>> - if (version == TAD_PMU_V1)
>>> + if (version == TAD_PMU_V1) {
>>> tad_pmu->pmu.attr_groups = tad_pmu_attr_groups;
>>> - else
>>> + tad_pmu->ops = &tad_pmu_ops;
>>> + } else {
>>> tad_pmu->pmu.attr_groups = ody_tad_pmu_attr_groups;
>>> + tad_pmu->ops = &tad_pmu_v2_ops;
>>> + }
>>>
>>> tad_pmu->cpu = raw_smp_processor_id();
>>>
>>> - /* Register pmu instance for cpu hotplug */
>>> ret = cpuhp_state_add_instance_nocalls(tad_pmu_cpuhp_state,
>>> &tad_pmu->node);
>>> if (ret) {
>>> @@ -389,19 +498,24 @@ static int tad_pmu_probe(struct platform_device
>> *pdev)
>>> return ret;
>>> }
>>>
>>> - name = "tad";
>>> - ret = perf_pmu_register(&tad_pmu->pmu, name, -1);
>>> - if (ret)
>>> + ret = perf_pmu_register(&tad_pmu->pmu, "tad", -1);
>>> + if (ret) {
>>> + dev_err(&pdev->dev, "Error %d registering perf PMU\n", ret);
>>> cpuhp_state_remove_instance_nocalls(tad_pmu_cpuhp_state,
>>> &tad_pmu->node);
>>> + return ret;
>>> + }
>>>
>>> - return ret;
>>> + WRITE_ONCE(tad_pmu->perf_ready, true);
>>> +
>>> + return 0;
>>> }
>>>
>>> static void tad_pmu_remove(struct platform_device *pdev) {
>>> struct tad_pmu *pmu = platform_get_drvdata(pdev);
>>>
>>> + WRITE_ONCE(pmu->perf_ready, false);
>>> cpuhp_state_remove_instance_nocalls(tad_pmu_cpuhp_state,
>>> &pmu->node);
>>> perf_pmu_unregister(&pmu->pmu);
>>> @@ -410,12 +524,17 @@ static void tad_pmu_remove(struct
>>> platform_device *pdev) #if defined(CONFIG_OF) || defined(CONFIG_ACPI)
>>> static const struct tad_pmu_data tad_pmu_data = {
>>> .id = TAD_PMU_V1,
>>> + .tad_prf_offset = TAD_PRF_OFFSET,
>>> + .tad_pfc_offset = TAD_PFC_OFFSET,
>>> };
>>> +
>>> #endif
>>>
>>> #ifdef CONFIG_ACPI
>>> static const struct tad_pmu_data tad_pmu_v2_data = {
>>> .id = TAD_PMU_V2,
>>> + .tad_prf_offset = TAD_PRF_OFFSET,
>>> + .tad_pfc_offset = TAD_PFC_OFFSET,
>>> };
>>> #endif
>>>
>>> @@ -451,6 +570,9 @@ static int tad_pmu_offline_cpu(unsigned int cpu,
>> struct hlist_node *node)
>>> struct tad_pmu *pmu = hlist_entry_safe(node, struct tad_pmu, node);
>>> unsigned int target;
>>>
>>> + if (!READ_ONCE(pmu->perf_ready))
>>> + return 0;
>>> +
>>> if (cpu != pmu->cpu)
>>> return 0;
>>>
>>> @@ -491,6 +613,6 @@ static void __exit tad_pmu_exit(void)
>>> module_init(tad_pmu_init); module_exit(tad_pmu_exit);
>>>
>>> -MODULE_DESCRIPTION("Marvell CN10K LLC-TAD Perf driver");
>>> +MODULE_DESCRIPTION("Marvell CN10K LLC-TAD perf driver");
>>> MODULE_AUTHOR("Bhaskara Budiredla <bbudiredla@marvell.com>");
>>> MODULE_LICENSE("GPL v2");
>
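
To make the config1 layout in the quoted patch easier to follow, here is a
small user-space sketch of the encoding and the checks the patch applies for
the CN10K (V1) variant: partid in bits 0..8, partid_en in bit 9, and the
filter only accepted for events 0x1a..0x20. The macro and function names
below are hypothetical and only mirror the logic shown in the diff; they are
not the driver's symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_U64(n)       (UINT64_C(1) << (n))
/* Contiguous mask covering bits l..h inclusive. */
#define MASK_U64(h, l)   ((~UINT64_C(0) >> (63 - (h))) & ~(BIT_U64(l) - 1))

#define EVENT_SEL_MASK   MASK_U64(7, 0)   /* config: event id */
#define PARTID_MASK      MASK_U64(8, 0)   /* config1: MPAM partition id */
#define PARTID_FILTER_EN BIT_U64(9)       /* config1: filter enable */
#define PRF_MATCH_PARTID BIT_U64(8)       /* register: match on PARTID */
#define PRF_PARTID_NS    BIT_U64(10)      /* register: PARTID_NS flag from the patch */
#define PRF_PARTID_SHIFT 11               /* register: PARTID field position */

/* Events for which the patch allows PARTID filtering. */
static bool event_takes_partid(uint64_t event)
{
    return event > 0x19 && event < 0x21;
}

/* Mirrors the V1 checks added to tad_pmu_event_init() in the patch. */
static bool cfg_valid_v1(uint64_t config, uint64_t config1)
{
    if (config & ~EVENT_SEL_MASK)
        return false;
    if (config1 & ~MASK_U64(9, 0))
        return false;
    if ((config1 & PARTID_MASK) && !(config1 & PARTID_FILTER_EN))
        return false;
    if ((config1 & PARTID_FILTER_EN) && !event_takes_partid(config))
        return false;
    return true;
}

/* Value the patch would program into the counter's PRF register. */
static uint64_t prf_value(uint64_t config, uint64_t config1)
{
    uint64_t val;

    assert(cfg_valid_v1(config, config1));

    val = config & EVENT_SEL_MASK;
    if (config1 & PARTID_FILTER_EN)
        val |= PRF_MATCH_PARTID | PRF_PARTID_NS |
               ((config1 & PARTID_MASK) << PRF_PARTID_SHIFT);
    return val;
}
```

For example, event 0x1f (tad_hit_any in the quoted event table) with
partid=0x12,partid_en=1 gives config1 = 0x212 and a register value of 0x951f
under these assumptions.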
^ permalink raw reply
* Re: [PATCH v4 2/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-26 8:51 UTC (permalink / raw)
To: Nathan Chancellor
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260625234158.GA261868@ax162>
On Thu, 25 Jun 2026 16:41:58 -0700
Nathan Chancellor <nathan@kernel.org> wrote:
> The following diff resolves it for me, should I send it as a separate
> patch or do you want to just fold it in with a note?
>
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 621566345406..2301a701ffbb 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -10,6 +10,7 @@
> #ifndef __LINUX_LOCKDEP_H
> #define __LINUX_LOCKDEP_H
>
> +#include <linux/instruction_pointer.h>
Ah, so the reason for this breakage is that lockdep was relying on
instruction_pointer.h, which just happened to be included in kernel.h
via trace_printk.h.
This is a separate issue, so it should be a separate patch. I'll add it
as patch 1 of this series.
Can you send me the config you used? This didn't trigger in my tests.
Thanks,
-- Steve
> #include <linux/lockdep_types.h>
> #include <linux/smp.h>
> #include <asm/percpu.h>
^ permalink raw reply
* Re: [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI
From: Breno Leitao @ 2026-06-26 8:48 UTC (permalink / raw)
To: Doug Anderson, kernel-team
Cc: Kiryl Shutsemau, Marc Zyngier, Catalin Marinas, Will Deacon,
James Morse, Mark Rutland, Petr Mladek, Thomas Gleixner,
Andrew Morton, Baoquan He, Puranjay Mohan, Usama Arif,
Julien Thierry, Lecopzer Chen, Sumit Garg, kernel-team, kexec,
linux-arm-kernel, linux-kernel, paulmck, rmikey
In-Reply-To: <CAD=FV=UTcL1NVkvR8Fw_BXvHHk-vBtLSGoJrU-RFSt0yvGUjxA@mail.gmail.com>
On Mon, Jun 22, 2026 at 09:52:40AM -0700, Doug Anderson wrote:
> Having them as a stop-gap until true NMI is available
> seems nice to me
I completely agree. Should these patches unfortunately not make it upstream,
my plan is to maintain them downstream in Meta's kernels until the Meta fleet
no longer contains any non-FEAT_NMI hosts.
The trade-offs under consideration are:
1) Accept the performance overhead from pseudo-NMI
2) Lose debuggability by disabling NMI entirely
3) Maintain downstream patches
Of these three options, carrying downstream patches appears to be the least
disruptive path for our business needs.
^ permalink raw reply
* Re: [PATCH v14 29/44] arm64: RMI: Runtime faulting of memory
From: Suzuki K Poulose @ 2026-06-26 8:47 UTC (permalink / raw)
To: Gavin Shan, Lorenzo Pieralisi
Cc: Steven Price, kvm, kvmarm, Catalin Marinas, Marc Zyngier,
Will Deacon, James Morse, Oliver Upton, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <8da87878-2a5d-478a-a280-60dbed7ad1b9@redhat.com>
On 26/06/2026 08:43, Gavin Shan wrote:
> On 6/26/26 1:58 AM, Suzuki K Poulose wrote:
>> On 25/06/2026 14:53, Gavin Shan wrote:
>>> On 6/6/26 12:35 AM, Lorenzo Pieralisi wrote:
>>>> On Fri, Jun 05, 2026 at 06:11:11PM +1000, Gavin Shan wrote:
>>>>> On 6/5/26 5:28 PM, Lorenzo Pieralisi wrote:
>>>>>> On Fri, Jun 05, 2026 at 04:23:15PM +1000, Gavin Shan wrote:
>
> [...]
>
>>>>>
>>>>> I tried to rebase Jean's latest QEMU series [1] to upstream QEMU,
>>>>> and found
>>>>> that memory slots backed by THP are broken. With THP disabled on
>>>>> the host and
>>>>> other fixes (mentioned in my previous replies) applied on the top of
>>>>> this (v14)
>>>>> series, I'm able to boot a realm guest with rebased QEMU series
>>>>> [2], plus more
>>>>> fixes on the top.
>>>>>
>>>>> [1] https://git.codelinaro.org/linaro/dcap/qemu.git (branch: cca/latest)
>>>>> [2] https://git.qemu.org/git/qemu.git (branch: cca/gavin)
>>>>>
>>>>> Lorenzo, You may be saying there is someone making QEMU to support
>>>>> ARM/CCA?
>>>>
>>>> Mathieu and I are working on that yes and with Steven/Suzuki to fix
>>>> the THP
>>>> issues you pointed out above.
>>>>
>>>>> If so, I'm not sure if there is a QEMU repository for me to try?
>>>>
>>>> We should be able to submit patches by end of June - we shall let
>>>> you know
>>>> whether we can make something available earlier.
>>>>
>>>
>>> Not sure if there are other known issues in this series. It seems the
>>> stage2
>>> page fault handling on the shared space isn't working well. In my
>>> test, the
>>> vring (struct vring_desc) of virtio-net-pci is updated by the guest,
>>> and the
>>> data isn't seen by QEMU, I'm suspecting if the host-page-frame-number
>>> is properly
>>> resolved in the s2 page fault handler for shared (unprotected) space.
>>>
>>> - I rebased Jean's latest qemu branch to the upstream qemu;
>>>
>>> - On the host, which is emulated by qemu/tcg, the THP (transparent
>>> huge page) is
>>> disabled.
>>>
>>> - On the guest, I can see the virtio vring (struct vring_desc) is
>>> updated. The
>>> S1 page-table entry looks correct because the corresponding
>>> physical address
>>> 0x10046880000 is a sane shared (unprotected) space address.
>>>
>>> [ 52.094143] software IO TLB: Memory encryption is active and
>>> system is using DMA bounce buffers
>>> [ 52.289746] virtqueue_add_desc_split:
>>> desc[0]@0xffff000006880000, [00000100b983f000 00000640 0002 0001]
>>> [ 52.432150] PTE 0x00e8010046880707 at address 0xffff000006880000
>>>
>>> - On the host, the s2 page-table-entry is unmapped due to attribute
>>> transition (private -> shared).
>>> A subsequent S2 page fault is raised against the address and the s2
>>> page-table-entry is built.
>>>
>>> [ 109.259077] ====> realm_unmap_shared_range:
>>> tracked_unprot_addr=0x10046880000
>>> [ 109.260249] realm_unmap_shared_range: unmapped shared range at
>>> 0x10046880000
>>> [ 109.317786] realm_unmap_shared_range: unmapped shared range at
>>> 0x10046880000
>>> [ 109.629939] ====> kvm_handle_guest_abort:
>>> fault_ipa=0x10046880000, esr=0x92000007
>>> [ 109.630245] realm_map_non_secure: ipa=0x10046880000,
>>> pfn=0xb8b59, size=0x1000, prot=0xf
>>> [ 109.630331] realm_map_non_secure: ipa=0x10046880000,
>>> ipa_top=0x10046881000, flags=0x1e0001, range_desc=0xb8b59004
>>
>> Are you able to correlate the order of the transitions and the Guest
>> access with RMM log ? We haven't seen this from our end. We are aware
>> of permission fault issues with Unprotected IPA when backing the memslot
>> with MAP_PRIVATE areas. But this looks different.
>>
>> Lorenzo, have you run into this ?
>>
>
> It's hard to correlate the order since the logs are collected from two
> separate
> consoles. For the write permission, I add code to the host where the
> permission
> is always added for all s2 page faults in the shared space. Otherwise,
> qemu can
> be killed by -EFAULT or similar error.
This is the problem. We can't add WRITE permission by default. I believe
you have a MAP_PRIVATE mapping, which has to be mapped READ only first;
on a permission fault, we replace it with a writable page. By
overriding the WRITE permission, you let the guest write to a page
that may not be seen by the VMM.
We identified this as a bug in the KVM driver in this series (reported
by Lorenzo) and there is a corresponding tf-RMM change that is required
to get this working. So, could you please wait for the next series,
where this will be addressed? Or you could switch to using MAP_SHARED
for the "shared" memory in the memslot.
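As a loose userspace analogy of the MAP_PRIVATE vs MAP_SHARED point (this
is not the KVM or RMM code path, and write_is_visible() is a made-up
helper for illustration only): a write through a MAP_PRIVATE mapping
lands in a private copy-on-write page, so another view of the same
backing file never observes it, while a write through MAP_SHARED does
reach the shared page. A minimal sketch, assuming Linux and a writable
temporary directory:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the series): write one byte through a
 * mapping created with `flags` (MAP_PRIVATE or MAP_SHARED) and check
 * whether a separate MAP_SHARED view of the same file observes it.
 * Returns 1 if the write is visible, 0 if not, -1 on setup failure.
 */
int write_is_visible(int flags)
{
	unsigned char *writer, *observer;
	long pagesz = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();	/* stand-in for the memslot backing file */
	int seen = -1;

	if (!f)
		return -1;
	if (pagesz <= 0 || ftruncate(fileno(f), pagesz) != 0)
		goto out;

	writer = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, flags,
		      fileno(f), 0);
	if (writer == MAP_FAILED)
		goto out;
	observer = mmap(NULL, pagesz, PROT_READ, MAP_SHARED, fileno(f), 0);
	if (observer == MAP_FAILED)
		goto unmap_writer;

	writer[0] = 0xab;		/* the write made on one side */
	seen = (observer[0] == 0xab);	/* what the other side reads back */

	munmap(observer, pagesz);
unmap_writer:
	munmap(writer, pagesz);
out:
	fclose(f);
	return seen;
}
```

Calling write_is_visible(MAP_SHARED) and write_is_visible(MAP_PRIVATE)
from a small main() shows the difference: only the MAP_SHARED write is
seen through the second view.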
Suzuki
>
> There are more findings after more experiments: this virtio-net-pci
> device has 3
> queues or vrings (Rx/Tx/Ctrl). The Rx/Tx/Ctrl queue are populated in
> order one after
> one. In the guest kernel, I intentionally write fixed data
> (0x0123456789abcdef) to
> the first 8 bytes of the queue when it gets populated, and stop the
> guest at random
> points to see if the data is gone. I found that the data written to Rx/
> Tx queue are
> lost after Ctrl queue is allocated.
>
> The data written to Rx/Tx queue is lost if the guest stops (B). The data
> written to
> Rx/Tx queue isn't lost if the guest stops at (A). I can see the pattern
> (0x0123...cdef)
> by dumping the physcial memory through 'pmemsave' command in qemu.
>
> DMA allocation
> ==============
> dma_alloc_coherent
> dma_alloc_attrs
> dma_direct_alloc
> __dma_direct_alloc_pages
> dma_set_decrypted // (A) No data lost if being
> stopped here for the Ctrl queue
> memset(ret, 0, size) // (B) Data lost after being
> stopped after memset() for the Ctrl queue
>
> The memset() on the Ctrl queue should trigger a stage2 page fault. It
> seems the page
> fault enforces the shared pages for Rx/Tx queue to be dropped? I need to
> add more
> debugging code and track it down.
>
>> Suzuki
>>
>>
>>>
>>> - On QEMU, the updated vring (struct vring_desc) at GPA 0x46880000
>>> isn't seen. All the
>>> data in that adress are zeros.
>>>
>>> ====> virtqueue_split_pop: vdev=<virtio-net>, sz=0x38,
>>> queue_index=0x0, vq->vring.num=0x100
>>> virtqueue_split_pop: last_avail_idx=0x0, head=0x0
>>> address_space_read_cached_slow: cache@0xffff1c036440, addr=0x0,
>>> buf=0xffffeee34880, len=0x10
>>> address_space_read_cached_slow: cache: ptr=0x0,
>>> xlat=0x10046880000, len=0x1000, mrs=<realm-dma-region>, is_write=no
>>> address_space_read_cached_slow: translated to mr=<mach-virt.ram>,
>>> mr_addr=0x6880000, l=0x10
>>> flatview_read_continue_step: mr=<mach-virt.ram>,
>>> host=0xffff23e00000, mr_addr=0x6880000, ram_ptr=0xffff2a680000
>>> virtqueue_split_pop: desc: 0000000000000000 - 00000000 - 00000000
>>> - 00000000
>>> qemu-system-aarch64: virtio: zero sized buffers are not allowed
>>>
>>>
> Thanks,
> Gavin
>
^ permalink raw reply
* Re: [PATCH v3 01/10] mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx()
From: Sebastian Andrzej Siewior @ 2026-06-26 8:34 UTC (permalink / raw)
To: Peng Fan
Cc: linux-remoteproc, imx, linux-arm-kernel, linux-rt-devel,
Bjorn Andersson, Clark Williams, Fabio Estevam, Frank Li,
Jassi Brar, Mathieu Poirier, Pengutronix Kernel Team,
Sascha Hauer, Steven Rostedt
In-Reply-To: <aj43FQ7f6juXQ/tP@shlinux89>
On 2026-06-26 16:23:49 [+0800], Peng Fan wrote:
> On Wed, Jun 24, 2026 at 09:44:09AM +0200, Sebastian Andrzej Siewior wrote:
> >On 2026-06-22 19:24:00 [+0800], Peng Fan wrote:
> >> We may need to use atomic API for TXDB_V2. For the patchset itself, it
> >> looks good to me.
> >>
> >> Reviewed-by: Peng Fan <peng.fan@nxp.com>
> >
> >Thank you. Is there anything you want me to do or is this series good
> >as-is?
>
> If you would like to address the AI reported issue further, you may
> update readl_poll_timeout to readl_poll_timeout_atomic.
What about the timeout value? Keep it as-is or reduce it?
> From the fix on error return, this patch is ok to me, and the series is good.
Thanks.
> Thanks,
> Peng
> >
Sebastian
^ permalink raw reply
* Re: [PATCH] ARM: enable interrupts when arm_notify_die() is handling user mode errors
From: Xie Yuanbin @ 2026-06-26 8:29 UTC (permalink / raw)
To: linux, bigeasy, rmk+kernel
Cc: clrkwllms, rostedt, linusw, arnd, linux-arm-kernel, linux-kernel,
linux-rt-devel, liaohua4, lilinjie8, Xie Yuanbin
In-Reply-To: <20260625152159.WtO_S3i7@linutronix.de>
On Thu, 25 Jun 2026 17:21:59 +0200, Sebastian Andrzej Siewior wrote:
> Why would the latter be not appropriate?.
> Anyway, in the kernel case you do die() which disables interrupts as of
> oops_begin(). It does later restore the state in oops_end() and invokes
> make_task_dead(). This one will complain if either preemption or
> interrupts are disabled and reset both.
>
> Should you get that far and not panic() earlier (due to in_interrupt()
> for instance) then interrupts will be later enabled before that kernel
> thread is killed. So it could be done earlier or not, at this point the
> system is pretty much done.
My personal view is as follows. First, an unhandled kernel fault indicates
that the kernel itself has encountered an error, which is different from an
unhandled user fault. An unhandled user fault can be deliberately
triggered by user programs, whereas a healthy kernel should, in theory,
never trigger an unhandled kernel fault.
Once the kernel has already faulted, I think that printing the fault
information is more important than improving the kernel's real-time
performance. Imagine that interrupts are enabled here, an interrupt
arrives at the same moment, and another kernel error is then triggered
in interrupt context. The log would be a disaster.
In any case, these are merely my personal views,
and I respect the maintainer's opinions.
^ permalink raw reply
* RE: [PATCH v3 1/2] i2c: imx: Clear slave pointer on registration error
From: liem @ 2026-06-26 8:30 UTC (permalink / raw)
To: carlos.song
Cc: andi.shyti, biwen.li, festevam, frank.li, frank.li, imx, kernel,
liem16213, linux-arm-kernel, linux-i2c, linux-kernel, o.rempel,
s.hauer, stable, wsa
In-Reply-To: <AM0PR04MB6802B863CD9B9AE1609C1785E8EB2@AM0PR04MB6802.eurprd04.prod.outlook.com>
Hi, carlos!
Thanks for the review.
This is a good idea; this is a better way to fix it.
I'll fix Patch 1 as suggested and send a v4.
Regards,
Liem
^ permalink raw reply
* Re: [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI
From: YinFengwei @ 2026-06-26 8:25 UTC (permalink / raw)
To: Kiryl Shutsemau
Cc: Marc Zyngier, Catalin Marinas, Will Deacon, James Morse,
Mark Rutland, Doug Anderson, Petr Mladek, Thomas Gleixner,
Andrew Morton, Baoquan He, Puranjay Mohan, Usama Arif,
Breno Leitao, Julien Thierry, Lecopzer Chen, Sumit Garg,
kernel-team, kexec, linux-arm-kernel, linux-kernel
In-Reply-To: <ajk-Vge2qhaY-TwJ@thinkstation>
Hi Kirill
On Mon, Jun 22, 2026 at 02:56:16PM +0100, Kiryl Shutsemau wrote:
> On Fri, Jun 19, 2026 at 03:26:21PM +0100, Marc Zyngier wrote:
> > > Does your firmware set ICC_CTLR_EL1.PMHE? I'd be curious to see the
> > > numbers if the DSB was omitted on the enable path.
> >
> > I certainly don't observe this sort of overhead on the HW I have
> > access to, and would like to understand where this is coming from with
> > actual profiling data.
>
> Full disclosure: the ~66% figures come from internal testing about a year ago.
> I no longer have the details of the machine it ran on and can't confirm whether
> ICC_CTLR_EL1.PMHE was set there -- it may well have been. I shouldn't have
> carried those numbers forward without being able to stand behind them, so
> please disregard them.
>
> Here are fresh numbers from NVIDIA Grace (Neoverse V2). Importantly, this
> box reports:
>
> GICv3: Pseudo-NMIs enabled using relaxed ICC_PMR_EL1 synchronisation
>
> i.e. PMHE == 0, so the synchronising DSB on the unmask path is already
> patched to a NOP (ARM64_HAS_GIC_PRIO_RELAXED_SYNC). What's left is the
> floor cost of PMR-based masking itself plus the PMR save/restore on
> exception entry/exit -- not the DSB. So this is the case Catalin asked
> about (DSB omitted), and there is still a measurable cost.
>
> A trivial single-threaded gettid() loop (1e6 calls, median of 5,
> performance governor, ASLR off):
>
> pseudo_nmi=0 (DAIF): 178.4 ns/call
> pseudo_nmi=1 (PMR): 252.5 ns/call
> delta: +74.1 ns/call (~230-250 cycles)
> +41.5% wall time / 0.706 throughput
I ran u-bench.c on a Neoverse N2 based arm64 server. The results
are as follows:
pseudo_nmi=0 (DAIF): 96.3 ns/call
pseudo_nmi=1 (PMR): 169.8 ns/call
delta: +73.5 ns/call
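Applying the same relative-overhead arithmetic used for the Grace
numbers to the N2 figures gives roughly +76% wall time per call
(73.5 / 96.3) and a throughput ratio of about 0.57 (96.3 / 169.8). A
small sketch of that calculation follows; the function names are mine
and only the ns/call inputs come from the measurements in this thread:

```c
#include <stdio.h>

/* Relative slowdown in percent: (with - base) / base * 100. */
double overhead_pct(double base_ns, double with_ns)
{
	return (with_ns - base_ns) / base_ns * 100.0;
}

/* Throughput ratio: call rate with pseudo-NMI over call rate without. */
double throughput_ratio(double base_ns, double with_ns)
{
	return base_ns / with_ns;
}

/* Print one comparison row for a machine (label is free-form). */
void report(const char *name, double base_ns, double with_ns)
{
	printf("%-10s delta %+.1f ns/call, %+.1f%% wall time, %.3f throughput\n",
	       name, with_ns - base_ns,
	       overhead_pct(base_ns, with_ns),
	       throughput_ratio(base_ns, with_ns));
}
```

For example, report("Grace", 178.4, 252.5) and report("N2", 96.3, 169.8)
put both machines side by side: the absolute delta is nearly the same,
while the relative cost is higher on the N2 because its baseline is
lower.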
>
> --- u-bench.c ---
> #include <unistd.h>
> #include <sys/syscall.h>
> #include <time.h>
> #include <stdio.h>
> int main(void) {
> struct timespec a, b;
> clock_gettime(CLOCK_MONOTONIC, &a);
> for (long i = 0; i < 1000000; i++)
> syscall(SYS_gettid);
> clock_gettime(CLOCK_MONOTONIC, &b);
> printf("%f ns\n", (b.tv_sec-a.tv_sec)*1e9 + (b.tv_nsec-a.tv_nsec));
> return 0;
> }
>
> will-it-scale agrees independently. sched_yield (ops/s, median of 5):
>
> 1 task 72 tasks
> pseudo_nmi=0 3,195,656 230,824,534
> pseudo_nmi=1 2,253,753 163,914,837
> ratio 0.705 0.710
>
> The ratio is flat across the whole 1-to-72 sweep, so -- relevant to the
> scalability question -- it's a constant per-syscall tax, not a contention
> effect. The impact tracks syscall/exception density: page_fault1, a more
> realistic workload, stays within ~5%.
>
> > The direction of travel is to deprecate SDEI. I wouldn't add more stuff
> > on top of this interface.
>
> I understand FEAT_NMI is the long-term answer, and I'm not arguing against
> deprecating SDEI. My concern is the gap in between. By our estimate it's
> 10+ years before the last non-FEAT_NMI machine retires from the fleet --
> for scale, we're still running Skylake today. So there's roughly a
> decade where a large installed base has neither FEAT_NMI nor affordable
> pseudo-NMI, and no way to reach a DAIF-masked CPU for an all-CPU
> backtrace or to capture a wedged CPU in a crash dump. That's the
> functional gap this series tries to cover.
>
> Given the deprecation direction, I deliberately kept the SDEI footprint as
> small as I could. The series adds no new firmware interface and no vendor
> SMC -- it uses only the standard software-signalled event (event 0) via
> SDEI_EVENT_SIGNAL, which is already present on these systems for
> firmware-first RAS (APEI/GHES). And SDEI is only ever invoked in a "bad
> state": to deliver a backtrace signal to a CPU that a normal IPI can't
> reach, or to stop a CPU that ignored the stop IPIs. Nothing on any hot or
> steady-state path touches it.
>
> If even that minimal use is unacceptable on a deprecated interface, I'd
> rather know now and redirect the effort -- but I'd appreciate a pointer to
> what should cover this gap for existing silicon in the meantime.
I couldn't agree more: We need a solution for existing system. And
like to see this patchset merged. Thanks.
Regards
Yin, Fengwei
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
>
^ permalink raw reply
* Re: [PATCH net] net: airoha: fix max receive size configuration
From: Lorenzo Bianconi @ 2026-06-26 8:25 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: linux-arm-kernel, linux-mediatek, netdev, Madhur Agrawal
In-Reply-To: <20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d@kernel.org>
> Set the GDM maximum receive size to AIROHA_MAX_RX_SIZE unconditionally
> during hardware initialization instead of updating it according to the
> configured MTU. This avoids dropping incoming frames that exceed the
> current MTU but could still be processed by the networking stack, which
> is able to fragment the reply on the TX side (e.g. ICMP echo requests).
> Move the per-port MTU configuration to the PPE egress path where it
> belongs, and set the tx frame size running airoha_ppe_set_xmit_frame_size()
> to dynamically track the maximum MTU across running interfaces sharing
> the same PPE instance.
> Fix the PPE MTU register addressing to pack two port entries per
> register word and add WAN_MTU0 configuration for non-LAN GDM devices.
>
> Fixes: 54d989d58d2a ("net: airoha: Move min/max packet len configuration in airoha_dev_open()")
> Tested-by: Madhur Agrawal <madhur.agrawal@airoha.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
commenting on sashiko's report:
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260625-airoha-fix-rx-max-len-v1-1-45b9b827358d%40kernel.org
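For readers following the register change described in the commit
message (two port entries packed per PPE MTU register word), here is a
userspace sketch of the addressing that the new REG_PPE_MTU() and
FP_EGRESS_MTU_MASK() macros express. It is only a model under
assumptions: the helper names are made up, offsets are relative to
REG_PPE_MTU_BASE (the PPE base addresses are left out on purpose), and
genmask32() is a local stand-in for the kernel's GENMASK():

```c
#include <stdint.h>

/* Local stand-in for GENMASK(hi, lo) on 32-bit values. */
uint32_t genmask32(unsigned int hi, unsigned int lo)
{
	return (~UINT32_C(0) << lo) & (~UINT32_C(0) >> (31 - hi));
}

/* Byte offset from REG_PPE_MTU_BASE: one 32-bit word per two entries. */
unsigned int ppe_mtu_reg_offset(unsigned int index)
{
	return (index / 2) << 2;
}

/* 14-bit field: bits 13..0 for even indexes, bits 29..16 for odd ones. */
uint32_t fp_egress_mtu_mask(unsigned int index)
{
	unsigned int shift = (index % 2) << 4;

	return genmask32(13 + shift, shift);
}

/* Read-modify-write of a single entry, leaving its neighbour untouched. */
uint32_t ppe_mtu_update(uint32_t word, unsigned int index, uint32_t len)
{
	unsigned int shift = (index % 2) << 4;
	uint32_t mask = fp_egress_mtu_mask(index);

	return (word & ~mask) | ((len << shift) & mask);
}
```

With this layout, indexes 0 and 1 share the word at offset 0, and index
7 (used for GDM4 in the patch) lands in the upper half of the word at
offset 12. The previous macro shifted the index by 2 directly, which
gave every index its own word.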
> ---
> drivers/net/ethernet/airoha/airoha_eth.c | 68 ++++++++++---------------------
> drivers/net/ethernet/airoha/airoha_eth.h | 2 +
> drivers/net/ethernet/airoha/airoha_ppe.c | 39 +++++++++++++-----
> drivers/net/ethernet/airoha/airoha_regs.h | 9 ++--
> 4 files changed, 58 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 932b3a3df2e5..3f451c2d4c24 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -178,10 +178,15 @@ static void airoha_fe_maccr_init(struct airoha_eth *eth)
> {
> int p;
>
> - for (p = 1; p <= ARRAY_SIZE(eth->ports); p++)
> + for (p = 1; p <= ARRAY_SIZE(eth->ports); p++) {
> airoha_fe_set(eth, REG_GDM_FWD_CFG(p),
> GDM_TCP_CKSUM_MASK | GDM_UDP_CKSUM_MASK |
> GDM_IP4_CKSUM_MASK | GDM_DROP_CRC_ERR_MASK);
> + airoha_fe_rmw(eth, REG_GDM_LEN_CFG(p),
> + GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> + FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> + FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_RX_SIZE));
> + }
>
> airoha_fe_rmw(eth, REG_CDM_VLAN_CTRL(1), CDM_VLAN_MASK,
> FIELD_PREP(CDM_VLAN_MASK, 0x8100));
> @@ -1831,13 +1836,24 @@ static void airoha_update_hw_stats(struct airoha_gdm_dev *dev)
> spin_unlock(&port->stats_lock);
> }
>
> +static void airoha_dev_set_xmit_frame_size(struct net_device *netdev)
> +{
> + struct airoha_gdm_dev *dev = netdev_priv(netdev);
> +
> + airoha_ppe_set_xmit_frame_size(dev);
> + if (!airoha_is_lan_gdm_dev(dev))
> + airoha_fe_rmw(dev->eth, REG_WAN_MTU0, WAN_MTU0_MASK,
> + FIELD_PREP(WAN_MTU0_MASK,
> + VLAN_ETH_HLEN + netdev->mtu));
> +}
- Could the WAN_MTU0 update here use the same max-across-siblings
aggregation as airoha_ppe_set_xmit_frame_size()?
- This is the same issue reported by sashiko-gemini. There is just one WAN
device in the system, so we do not need to calculate the max MTU here.
> +
> static int airoha_dev_open(struct net_device *netdev)
> {
> - int err, len = ETH_HLEN + netdev->mtu + ETH_FCS_LEN;
> struct airoha_gdm_dev *dev = netdev_priv(netdev);
> struct airoha_gdm_port *port = dev->port;
> - u32 cur_len, pse_port = FE_PSE_PORT_PPE1;
> struct airoha_qdma *qdma = dev->qdma;
> + u32 pse_port = FE_PSE_PORT_PPE1;
> + int err;
>
> netif_tx_start_all_queues(netdev);
> err = airoha_set_vip_for_gdm_port(dev, true);
> @@ -1851,19 +1867,7 @@ static int airoha_dev_open(struct net_device *netdev)
> airoha_fe_clear(qdma->eth, REG_GDM_INGRESS_CFG(port->id),
> GDM_STAG_EN_MASK);
>
> - cur_len = airoha_fe_get(qdma->eth, REG_GDM_LEN_CFG(port->id),
> - GDM_LONG_LEN_MASK);
> - if (!port->users || len > cur_len) {
> - /* Opening a sibling net_device with a larger MTU updates the
> - * MTU of already running devices. This is required to allow
> - * multiple net_devices with different MTUs to share the same
> - * GDM port.
> - */
> - airoha_fe_rmw(qdma->eth, REG_GDM_LEN_CFG(port->id),
> - GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> - FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> - FIELD_PREP(GDM_LONG_LEN_MASK, len));
> - }
> + airoha_dev_set_xmit_frame_size(netdev);
> port->users++;
>
> if (!airoha_is_lan_gdm_dev(dev) &&
> @@ -1875,30 +1879,6 @@ static int airoha_dev_open(struct net_device *netdev)
> return 0;
> }
>
> -static void airoha_set_port_mtu(struct airoha_eth *eth,
> - struct airoha_gdm_port *port)
> -{
> - u32 len = 0;
> - int i;
> -
> - for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> - struct airoha_gdm_dev *dev = port->devs[i];
> - struct net_device *netdev;
> -
> - if (!dev)
> - continue;
> -
> - netdev = netdev_from_priv(dev);
> - if (netif_running(netdev))
> - len = max_t(u32, len, netdev->mtu);
> - }
> - len += ETH_HLEN + ETH_FCS_LEN;
> -
> - airoha_fe_rmw(eth, REG_GDM_LEN_CFG(port->id),
> - GDM_LONG_LEN_MASK,
> - FIELD_PREP(GDM_LONG_LEN_MASK, len));
> -}
> -
> static int airoha_dev_stop(struct net_device *netdev)
> {
> struct airoha_gdm_dev *dev = netdev_priv(netdev);
> @@ -1909,7 +1889,7 @@ static int airoha_dev_stop(struct net_device *netdev)
> airoha_set_vip_for_gdm_port(dev, false);
>
> if (--port->users)
> - airoha_set_port_mtu(dev->eth, port);
> + airoha_ppe_set_xmit_frame_size(dev);
- On the close path, the call is to airoha_ppe_set_xmit_frame_size()
directly rather than the airoha_dev_set_xmit_frame_size() wrapper.
Does this mean WAN_MTU0 is never refreshed when a WAN dev is closed?
For example, if a small-MTU sibling is closed while a larger-MTU dev
remains running, the PPE MTU register gets recomputed to the larger
value but WAN_MTU0 retains the smaller value written at the last open
or change_mtu.
The commit message states:
set the tx frame size running airoha_ppe_set_xmit_frame_size()
to dynamically track the maximum MTU across running interfaces sharing
the same PPE instance.
Is the asymmetry between PPE MTU (max across siblings) and WAN_MTU0
(per-netdev write) intentional?
- This is the same issue reported by sashiko-gemini. There is just one WAN
device in the system, so there is no point in updating WAN_MTU0 when the
WAN device is stopped.
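To make the two computations being compared concrete, here is a
simplified userspace model (struct model_dev and both helpers are
hypothetical; the real code walks port->devs and checks
netif_running()). It shows the max-across-siblings value used for the
PPE MTU next to the per-netdev value written to WAN_MTU0; with a single
running device the two coincide:

```c
#include <stddef.h>

#define VLAN_ETH_HLEN 18	/* Ethernet header (14) + one VLAN tag (4) */

/* Hypothetical stand-in for a net_device sharing a GDM port. */
struct model_dev {
	int running;		/* models netif_running() */
	unsigned int mtu;
};

/* PPE egress size: largest MTU among running siblings plus header. */
unsigned int ppe_xmit_frame_size(const struct model_dev *devs, size_t n)
{
	unsigned int len = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].running && devs[i].mtu > len)
			len = devs[i].mtu;

	return len + VLAN_ETH_HLEN;
}

/* WAN_MTU0 value: derived from the one netdev being configured. */
unsigned int wan_mtu0_value(const struct model_dev *dev)
{
	return VLAN_ETH_HLEN + dev->mtu;
}
```

The values only diverge when several running devices with different
MTUs share the port, which is the case the report asks about.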
Regards,
Lorenzo
> else
> airoha_set_gdm_port_fwd_cfg(qdma->eth,
> REG_GDM_FWD_CFG(port->id),
> @@ -1962,10 +1942,6 @@ static int airoha_enable_gdm2_loopback(struct airoha_gdm_dev *dev)
> FIELD_PREP(LPBK_CHAN_MASK, chan) |
> LBK_GAP_MODE_MASK | LBK_LEN_MODE_MASK |
> LBK_CHAN_MODE_MASK | LPBK_EN_MASK);
> - airoha_fe_rmw(eth, REG_GDM_LEN_CFG(AIROHA_GDM2_IDX),
> - GDM_SHORT_LEN_MASK | GDM_LONG_LEN_MASK,
> - FIELD_PREP(GDM_SHORT_LEN_MASK, 60) |
> - FIELD_PREP(GDM_LONG_LEN_MASK, AIROHA_MAX_MTU));
> /* Forward the traffic to the proper GDM port */
> pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
> : FE_PSE_PORT_GDM4;
> @@ -2098,7 +2074,7 @@ static int airoha_dev_change_mtu(struct net_device *netdev, int mtu)
>
> WRITE_ONCE(netdev->mtu, mtu);
> if (port->users)
> - airoha_set_port_mtu(dev->eth, port);
> + airoha_dev_set_xmit_frame_size(netdev);
>
> return 0;
> }
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
> index d7ff8c5200e2..0c3fb6e5d7f1 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.h
> +++ b/drivers/net/ethernet/airoha/airoha_eth.h
> @@ -23,6 +23,7 @@
> #define AIROHA_MAX_DSA_PORTS 7
> #define AIROHA_MAX_NUM_RSTS 3
> #define AIROHA_MAX_MTU 9220
> +#define AIROHA_MAX_RX_SIZE 16128
> #define AIROHA_MAX_PACKET_SIZE 2048
> #define AIROHA_NUM_QOS_CHANNELS 4
> #define AIROHA_NUM_QOS_QUEUES 8
> @@ -676,6 +677,7 @@ int airoha_get_fe_port(struct airoha_gdm_dev *dev);
> bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
> struct airoha_gdm_dev *dev);
>
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev);
> void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport);
> bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
> void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 42f4b0f21d17..e7c78293002a 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -97,6 +97,33 @@ void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport)
> __field_prep(DFT_CPORT_MASK(fport), fe_cpu_port));
> }
>
> +void airoha_ppe_set_xmit_frame_size(struct airoha_gdm_dev *dev)
> +{
> + struct airoha_gdm_port *port = dev->port;
> + struct airoha_eth *eth = dev->eth;
> + int i, ppe_id, index;
> + u32 len = 0;
> +
> + for (i = 0; i < ARRAY_SIZE(port->devs); i++) {
> + struct airoha_gdm_dev *d = port->devs[i];
> + struct net_device *netdev;
> +
> + if (!d)
> + continue;
> +
> + netdev = netdev_from_priv(d);
> + if (netif_running(netdev))
> + len = max_t(u32, len, netdev->mtu);
> + }
> + len += VLAN_ETH_HLEN;
> +
> + ppe_id = !airoha_is_lan_gdm_dev(dev) && airoha_ppe_is_enabled(eth, 1);
> + index = port->id == AIROHA_GDM4_IDX ? 7 : port->id;
> + airoha_fe_rmw(eth, REG_PPE_MTU(ppe_id, index),
> + FP_EGRESS_MTU_MASK(index),
> + __field_prep(FP_EGRESS_MTU_MASK(index), len));
> +}
> +
> static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> {
> u32 sram_ppe_num_data_entries = PPE_SRAM_NUM_ENTRIES, sram_num_entries;
> @@ -115,8 +142,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> PPE_RAM_NUM_ENTRIES_SHIFT(sram_ppe_num_data_entries);
>
> for (i = 0; i < eth->soc->num_ppe; i++) {
> - int p;
> -
> airoha_fe_wr(eth, REG_PPE_TB_BASE(i),
> ppe->foe_dma + sram_tb_size);
>
> @@ -166,15 +191,6 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> airoha_fe_wr(eth, REG_PPE_HASH_SEED(i), PPE_HASH_SEED);
> airoha_fe_clear(eth, REG_PPE_PPE_FLOW_CFG(i),
> PPE_FLOW_CFG_IP6_6RD_MASK);
> -
> - for (p = 0; p < ARRAY_SIZE(eth->ports); p++)
> - airoha_fe_rmw(eth, REG_PPE_MTU(i, p),
> - FP0_EGRESS_MTU_MASK |
> - FP1_EGRESS_MTU_MASK,
> - FIELD_PREP(FP0_EGRESS_MTU_MASK,
> - AIROHA_MAX_MTU) |
> - FIELD_PREP(FP1_EGRESS_MTU_MASK,
> - AIROHA_MAX_MTU));
> }
>
> for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
> @@ -196,6 +212,7 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
> airoha_ppe_is_enabled(eth, 1);
> fport = airoha_get_fe_port(dev);
> airoha_ppe_set_cpu_port(dev, ppe_id, fport);
> + airoha_ppe_set_xmit_frame_size(dev);
> }
> }
> }
> diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
> index 436f3c8779c1..6fed63d013b4 100644
> --- a/drivers/net/ethernet/airoha/airoha_regs.h
> +++ b/drivers/net/ethernet/airoha/airoha_regs.h
> @@ -327,9 +327,8 @@
> #define PPE_SRAM_TABLE_EN_MASK BIT(0)
>
> #define REG_PPE_MTU_BASE(_n) (((_n) ? PPE2_BASE : PPE1_BASE) + 0x304)
> -#define REG_PPE_MTU(_m, _n) (REG_PPE_MTU_BASE(_m) + ((_n) << 2))
> -#define FP1_EGRESS_MTU_MASK GENMASK(29, 16)
> -#define FP0_EGRESS_MTU_MASK GENMASK(13, 0)
> +#define REG_PPE_MTU(_m, _n) (REG_PPE_MTU_BASE(_m) + (((_n) / 2) << 2))
> +#define FP_EGRESS_MTU_MASK(_n) GENMASK(13 + (((_n) % 2) << 4), ((_n) % 2) << 4)
>
> #define REG_PPE_RAM_CTRL(_n) (((_n) ? PPE2_BASE : PPE1_BASE) + 0x31c)
> #define PPE_SRAM_CTRL_ACK_MASK BIT(31)
> @@ -377,6 +376,10 @@
> #define REG_SRC_PORT_FC_MAP6 0x2298
> #define FC_ID_OF_SRC_PORT_MASK(_n) GENMASK(4 + ((_n) << 3), ((_n) << 3))
>
> +#define REG_WAN_MTU0 0x2300
> +#define WAN_MTU1_MASK GENMASK(29, 16)
> +#define WAN_MTU0_MASK GENMASK(13, 0)
> +
> #define REG_CDM5_RX_OQ1_DROP_CNT 0x29d4
>
> /* QDMA */
>
> ---
> base-commit: fd1269e454089abda0e4f9e5e25ecd02a90ab009
> change-id: 20260618-airoha-fix-rx-max-len-57654b661646
>
> Best regards,
> --
> Lorenzo Bianconi <lorenzo@kernel.org>
>
^ permalink raw reply
* Re: [PATCH 0/6] treewide: remove unnecessary invalid range checks in memblock iteration loops
From: Mike Rapoport @ 2026-06-26 8:23 UTC (permalink / raw)
To: Sang-Heon Jeon
Cc: Albert Ou, Andrew Morton, Andrey Ryabinin, Catalin Marinas,
Huacai Chen, Muchun Song, Oscar Salvador, Palmer Dabbelt,
Paul Walmsley, Will Deacon, Alexander Potapenko, Alexandre Ghiti,
Andrey Konovalov, David Hildenbrand, Dmitry Vyukov, kasan-dev,
linux-arm-kernel, linux-mm, linux-riscv, loongarch,
Vincenzo Frascino, WANG Xuerui
In-Reply-To: <20260621145919.1453-1-ekffu200098@gmail.com>
On Sun, Jun 21, 2026 at 11:59:10PM +0900, Sang-Heon Jeon wrote:
> The memblock API guarantees that for_each_mem_range() and
> for_each_mem_pfn_range() never return an invalid range, meaning start is
> always less than end.
>
> Several memblock callers still have unnecessary invalid range checks in
> their loop bodies, so remove them.
>
> Sang-Heon Jeon (6):
> arm64: mm: remove unreachable invalid range check in
> kasan_init_shadow()
> LoongArch: remove unreachable invalid range check in kasan_init()
> riscv: remove unreachable invalid range check in
> create_linear_mapping_page_table()
> riscv: remove unreachable invalid range check in kasan_init()
> mm: remove unnecessary empty range check in
> early_calculate_totalpages()
> mm/hugetlb: remove unnecessary empty range check in
> hugetlb_bootmem_set_nodes()
I queued this for inclusion in the memblock tree.
> arch/arm64/mm/kasan_init.c | 3 ---
> arch/loongarch/mm/kasan_init.c | 3 ---
> arch/riscv/mm/init.c | 2 --
> arch/riscv/mm/kasan_init.c | 3 ---
> mm/hugetlb.c | 3 +--
> mm/mm_init.c | 3 +--
> 6 files changed, 2 insertions(+), 15 deletions(-)
>
> --
> 2.43.0
>
--
Sincerely yours,
Mike.
^ permalink raw reply
* [PATCH 2/2] usb: mtu3: condition PHY wakeup for host mode and runtime suspend
From: Fei Shao @ 2026-06-26 8:21 UTC (permalink / raw)
To: Chunfeng Yun
Cc: Fei Shao, Greg Kroah-Hartman, linux-arm-kernel, linux-kernel,
linux-mediatek, linux-usb
In-Reply-To: <20260626082218.2750459-1-fshao@chromium.org>
Host bus activity during gadget mode system suspend can trigger
unexpected system wakeups via interrupts because PHY wakeup is enabled
unconditionally.
Improve the suspend flow by conditioning PHY wakeup setup on
device_may_wakeup() and restricting it to host mode or runtime suspend
to ensure proper wakeup handling in gadget mode.
Signed-off-by: Fei Shao <fshao@chromium.org>
---
drivers/usb/mtu3/mtu3_plat.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/mtu3/mtu3_plat.c b/drivers/usb/mtu3/mtu3_plat.c
index cc8a864dbd63..b4f4c776adb9 100644
--- a/drivers/usb/mtu3/mtu3_plat.c
+++ b/drivers/usb/mtu3/mtu3_plat.c
@@ -544,7 +544,8 @@ static int mtu3_suspend_common(struct device *dev, pm_message_t msg)
ssusb_phy_power_off(ssusb);
clk_bulk_disable_unprepare(BULK_CLKS_CNT, ssusb->clks);
- ssusb_wakeup_set(ssusb, true);
+ if (device_may_wakeup(dev) && (ssusb->is_host || PMSG_IS_AUTO(msg)))
+ ssusb_wakeup_set(ssusb, true);
return 0;
sleep_err:
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH 1/2] usb: mtu3: allow system suspend during active gadget connection
From: Fei Shao @ 2026-06-26 8:21 UTC (permalink / raw)
To: Chunfeng Yun
Cc: Fei Shao, Greg Kroah-Hartman, linux-arm-kernel, linux-kernel,
linux-mediatek, linux-usb
In-Reply-To: <20260626082218.2750459-1-fshao@chromium.org>
When operating in gadget mode connected to a USB host, system suspend
fails with -EBUSY because active peripheral connections block suspend
entry.
Fix this by restricting the -EBUSY check to runtime autosuspend
(PMSG_IS_AUTO). For system suspend (!PMSG_IS_AUTO), perform a soft
disconnect to detach from the bus and allow the MAC to sleep.
Fixes: 427c66422e14 ("usb: mtu3: support suspend/resume for device mode")
Signed-off-by: Fei Shao <fshao@chromium.org>
---
drivers/usb/mtu3/mtu3_core.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/mtu3/mtu3_core.c b/drivers/usb/mtu3/mtu3_core.c
index 66dbfe1705d5..a40bf5bad2d5 100644
--- a/drivers/usb/mtu3/mtu3_core.c
+++ b/drivers/usb/mtu3/mtu3_core.c
@@ -1037,9 +1037,14 @@ int ssusb_gadget_suspend(struct ssusb_mtk *ssusb, pm_message_t msg)
if (!mtu->gadget_driver)
return 0;
- if (mtu->connected)
+ /* Prevent runtime suspend when active connection exists */
+ if (mtu->connected && PMSG_IS_AUTO(msg))
return -EBUSY;
+ /* Perform soft disconnect for system suspend */
+ if (mtu->softconnect && !PMSG_IS_AUTO(msg))
+ mtu3_dev_on_off(mtu, 0);
+
mtu3_dev_suspend(mtu);
synchronize_irq(mtu->irq);
@@ -1055,5 +1060,9 @@ int ssusb_gadget_resume(struct ssusb_mtk *ssusb, pm_message_t msg)
mtu3_dev_resume(mtu);
+ /* Restore soft connect for system resume */
+ if (mtu->softconnect && !PMSG_IS_AUTO(msg))
+ mtu3_dev_on_off(mtu, 1);
+
return 0;
}
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply related
* [PATCH 0/2] usb: mtu3: fix system suspend failure and improve gadget PM
From: Fei Shao @ 2026-06-26 8:21 UTC (permalink / raw)
To: Chunfeng Yun
Cc: Fei Shao, Greg Kroah-Hartman, linux-arm-kernel, linux-kernel,
linux-mediatek, linux-usb
This series resolves system suspend failures and improves power
management in the MediaTek MTU3 USB driver when operating in gadget
mode:
Patch 1 fixes a system suspend failure when connected to a host by
restricting the active connection -EBUSY check to runtime autosuspend
and performing a soft disconnect.
Patch 2 improves PHY wakeup handling by conditioning setup on
device_may_wakeup() and restricting it to host mode or runtime suspend.
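The decision logic of the two patches can be modelled in plain C as
below. This is only an illustrative sketch: struct model_state, the enum
and both helpers are made up, is_auto stands in for PMSG_IS_AUTO(msg),
and may_wakeup stands in for device_may_wakeup(dev).

```c
#include <stdbool.h>

/* Hypothetical snapshot of the driver state the patches look at. */
struct model_state {
	bool gadget_driver;	/* a gadget driver is bound */
	bool connected;		/* active connection to a USB host */
	bool softconnect;	/* pull-up requested by the gadget */
	bool is_host;		/* controller currently in host mode */
	bool may_wakeup;	/* models device_may_wakeup() */
};

enum gadget_suspend_action {
	GADGET_NOTHING,			/* no gadget driver, return 0 */
	GADGET_REFUSE_EBUSY,		/* runtime suspend while connected */
	GADGET_SOFT_DISCONNECT,		/* system suspend: detach, then suspend */
	GADGET_SUSPEND,			/* suspend without touching the pull-up */
};

/* Patch 1: what ssusb_gadget_suspend() decides for a given state. */
enum gadget_suspend_action
model_gadget_suspend(const struct model_state *s, bool is_auto)
{
	if (!s->gadget_driver)
		return GADGET_NOTHING;
	if (s->connected && is_auto)
		return GADGET_REFUSE_EBUSY;
	if (s->softconnect && !is_auto)
		return GADGET_SOFT_DISCONNECT;
	return GADGET_SUSPEND;
}

/* Patch 2: whether PHY wakeup gets armed on suspend. */
bool model_arm_phy_wakeup(const struct model_state *s, bool is_auto)
{
	return s->may_wakeup && (s->is_host || is_auto);
}
```

In this model, a connected gadget with wakeup enabled goes through the
soft disconnect path on system suspend and does not arm PHY wakeup,
which matches the behaviour described above.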
Regards,
Fei
Fei Shao (2):
usb: mtu3: allow system suspend during active gadget connection
usb: mtu3: condition PHY wakeup for host mode and runtime suspend
drivers/usb/mtu3/mtu3_core.c | 11 ++++++++++-
drivers/usb/mtu3/mtu3_plat.c | 3 ++-
2 files changed, 12 insertions(+), 2 deletions(-)
--
2.55.0.rc0.799.gd6f94ed593-goog
^ permalink raw reply
* Re: [PATCH 0/4] module: force sh_addr=0 for arch-specific sections
From: patchwork-bot+linux-riscv @ 2026-06-26 8:21 UTC (permalink / raw)
To: Petr Pavlu
Cc: linux-riscv, linux, catalin.marinas, will, geert, pjw, palmer,
aou, samitolvanen, alex, mcgrof, da.gomez, atomlin, joe.lawrence,
linux-arm-kernel, linux-m68k, linux-modules, linux-kernel
In-Reply-To: <20260327080023.861105-1-petr.pavlu@suse.com>
Hello:
This series was applied to riscv/linux.git (fixes)
by Sami Tolvanen <samitolvanen@google.com>:
On Fri, 27 Mar 2026 08:58:59 +0100 you wrote:
> When linking modules with 'ld.bfd -r', sections defined without an address
> inherit the location counter, resulting in non-zero sh_addr values in the
> resulting .ko files. Relocatable objects are expected to have sh_addr=0 for
> all sections. Non-zero addresses are confusing in this context, typically
> worse compressible, and may cause tools to misbehave [1].
>
> Joe Lawrence previously addressed the same issue in the main
> scripts/module.lds.S file [2] and we discussed that the same fix should be
> also applied to architecture-specific module sections. This series
> implements these changes.
>
> [...]
Here is the summary with links:
- [1/4] module, arm: force sh_addr=0 for arch-specific sections
https://git.kernel.org/riscv/c/ffe1545ce8a0
- [2/4] module, arm64: force sh_addr=0 for arch-specific sections
https://git.kernel.org/riscv/c/c5553deb577f
- [3/4] module, m68k: force sh_addr=0 for arch-specific sections
https://git.kernel.org/riscv/c/9cb4d4dc8227
- [4/4] module, riscv: force sh_addr=0 for arch-specific sections
https://git.kernel.org/riscv/c/04e17ca3f77e
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply