* [PATCH v7 2/2] arm64: dts: cix: add sky1 DMA-350 node with channel IRQ entries
From: Jun Guo @ 2026-05-21 7:29 UTC (permalink / raw)
To: peter.chen, fugang.duan, robh, krzk+dt, conor+dt, vkoul, ychuang3,
schung, robin.murphy, Frank.Li
Cc: dmaengine, devicetree, linux-kernel, cix-kernel-upstream,
linux-arm-kernel, Jun Guo
In-Reply-To: <20260521072924.3000282-1-jun.guo@cixtech.com>
Describe the DMA-350 channel interrupt sources in DT using 8
interrupt entries, while all entries map to the same GIC SPI
as wired on this platform.
Signed-off-by: Jun Guo <jun.guo@cixtech.com>
---
arch/arm64/boot/dts/cix/sky1.dtsi | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/arm64/boot/dts/cix/sky1.dtsi b/arch/arm64/boot/dts/cix/sky1.dtsi
index bb5cfb1f2113..823adeef51f1 100644
--- a/arch/arm64/boot/dts/cix/sky1.dtsi
+++ b/arch/arm64/boot/dts/cix/sky1.dtsi
@@ -444,6 +444,20 @@ iomuxc: pinctrl@4170000 {
reg = <0x0 0x04170000 0x0 0x1000>;
};
+ fch_dmac: dma-controller@4190000 {
+ compatible = "arm,dma-350";
+ reg = <0x0 0x4190000 0x0 0x10000>;
+ interrupts = <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>,
+ <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>,
+ <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>,
+ <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>,
+ <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>,
+ <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>,
+ <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>,
+ <GIC_SPI 303 IRQ_TYPE_LEVEL_HIGH 0>;
+ #dma-cells = <1>;
+ };
+
mbox_ap2se: mailbox@5060000 {
compatible = "cix,sky1-mbox";
reg = <0x0 0x05060000 0x0 0x10000>;
--
2.34.1
* Re: [PATCH 7/8] sched_ext: Sub-allocator over kernel-claimed BPF arena pages
From: Peter Zijlstra @ 2026-05-21 7:27 UTC (permalink / raw)
To: Tejun Heo
Cc: Alexei Starovoitov, David Vernet, Andrea Righi, Changwoo Min,
Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Andrew Morton, David Hildenbrand, Mike Rapoport,
Emil Tsalapatis, sched-ext, bpf, x86, linux-arm-kernel, linux-mm,
linux-kernel
In-Reply-To: <eaeb316b7a45956ed5f335050e6e9538@kernel.org>
On Wed, May 20, 2026 at 01:47:32PM -1000, Tejun Heo wrote:
> Hello,
>
> On Mon, May 18, 2026 at 04:26:11PM -0700, Alexei Starovoitov wrote:
> > Well, this gen_pool based allocator of arena memory is a temporary hack.
> > It's ok for rare allocation like in this at scx init time, but not suitable
> > for active arena management. We don't need to expose it beyond scx.
>
> I see. Peter, as Alexei is already prototyping a slab-based arena
> allocator, how about keeping the gen_pool layer scx-local for now? Once
> the proper allocator lands, scx can switch to it and the custom piece
> goes away.
OK.
* Re: [PATCH v2 1/1] arm64: dts: ti: k3-j7: Reserve memory for LPM meta data
From: Richard GENOUD @ 2026-05-21 7:27 UTC (permalink / raw)
To: Nishanth Menon
Cc: Vignesh Raghavendra, Tero Kristo, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Udit Kumar, Abhash Kumar,
Beleswar Padhi, Thomas Richard, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, devicetree, linux-kernel
In-Reply-To: <20260505125945.mxucy4cfpno2x66z@slacking>
Hi Nishanth,
On 05/05/2026 at 14:59, Nishanth Menon wrote:
> On 18:03-20260427, Richard Genoud (TI) wrote:
>> From: Prasanth Babu Mantena <p-mantena@ti.com>
>>
>> For TI SOCs J7200, J784S4, J722S, J721s2 which support low power modes,
>> a chunk of memory is reserved for LPM meta data, which is needed for
>> saving ATF context and the certificate information of ATF and OPTEE and
>> DM image. This LPM metadata area is firewalled to be accessed only by
>> TIFS.
>>
>> U-Boot/TIFS will use this area to save and restore:
>> - ATF context
>> - ATF certificate information
>> - OPTEE certificate information
>> - DM image
>
> DM image is loaded from storage, correct?
Actually, after being loaded from storage at boot time by U-Boot R5 SPL,
the DM image is copied into this memory, so that it doesn't have to be
loaded from storage again at resume. (This shortens the resume time.)
For context:
At resume, U-Boot R5 SPL is executed and detects that the board is
resuming (via a flag set in the PMIC), then it:
- brings the DDR out of retention
- retrieves the LPM memory region from DTS
- authenticates certificates from LPM memory region and applies firewalls
- asks TIFS to restore TFA and its own minimal context
- starts TFA on remote proc
- loads back DM image from memory and jumps to DM
>>
>> https://software-dl.ti.com/tisci/esd/latest/2_tisci_msgs/pm/lpm.html#lpm-msg-lpm-save-addr
>>
>> U-Boot has to parse and retrieve this area from the device tree, thus
>> @lpm-memory nodes are used instead of the generic @memory.
>>
>> Signed-off-by: Prasanth Babu Mantena <p-mantena@ti.com>
>> Signed-off-by: Richard Genoud (TI) <richard.genoud@bootlin.com>
>> ---
>> arch/arm64/boot/dts/ti/k3-j7200-som-p0.dtsi | 6 ++++++
>> arch/arm64/boot/dts/ti/k3-j721s2-som-p0.dtsi | 6 ++++++
>> arch/arm64/boot/dts/ti/k3-j722s-evm.dts | 6 ++++++
>> arch/arm64/boot/dts/ti/k3-j742s2-evm.dts | 9 +++++++++
>> arch/arm64/boot/dts/ti/k3-j784s4-evm.dts | 9 ++++++---
>
> Split this up per platform. I don't understand why you'd not modify
> the ipc-firmware.dtsi and use the phandle similar to https://lore.kernel.org/all/20260318-topic-am62a-ioddr-dt-v6-19-v3-4-c41473cb23c3@baylibre.com/
The wkup_r5fss0_core0_memory_region can't be used in our case because
the DM memory isn't retained during suspend.
For Sitara, the LPM metadata is stored in the DM DDR, but here, as the
DM memory is not kept during suspend, the LPM metadata is stored in
another memory region, so I don't think I can use this phandle.
>
> Split the patches per ipc-firmware.dtsi as required.
>
>> 5 files changed, 33 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/ti/k3-j7200-som-p0.dtsi b/arch/arm64/boot/dts/ti/k3-j7200-som-p0.dtsi
>> index 5a8c2e707fde..756928a2d411 100644
>> --- a/arch/arm64/boot/dts/ti/k3-j7200-som-p0.dtsi
>> +++ b/arch/arm64/boot/dts/ti/k3-j7200-som-p0.dtsi
>> @@ -40,6 +40,12 @@ mcu_r5fss0_core0_memory_region: memory@a0100000 {
>> reg = <0x00 0xa0100000 0x00 0xf00000>;
>> no-map;
>> };
>> +
>> + lpm_memory_region: lpm-memory@a4800000 {
>
> vignesh already flagged this in previous revision - just use phandle
> reference in u-boot and make this memory@
I would happily use the phandle, but as this memory is not related to
the DM DDR, I don't think I can.
Thanks!
Regards,
Richard
>
>> + reg = <0x00 0xa4800000 0x00 0x00300000>;
>> + no-map;
>> + bootph-all;
>> + };
>> };
>>
>> mux0: mux-controller-0 {
>> diff --git a/arch/arm64/boot/dts/ti/k3-j721s2-som-p0.dtsi b/arch/arm64/boot/dts/ti/k3-j721s2-som-p0.dtsi
>> index 12a38dd1514b..ceab8f057640 100644
>> --- a/arch/arm64/boot/dts/ti/k3-j721s2-som-p0.dtsi
>> +++ b/arch/arm64/boot/dts/ti/k3-j721s2-som-p0.dtsi
>> @@ -42,6 +42,12 @@ mcu_r5fss0_core0_memory_region: memory@a0100000 {
>> reg = <0x00 0xa0100000 0x00 0xf00000>;
>> no-map;
>> };
>> +
>> + lpm_memory_region: lpm-memory@a9c00000 {
>> + reg = <0x00 0xa9c00000 0x00 0x00300000>;
>> + no-map;
>> + bootph-all;
>> + };
>> };
>>
>> mux0: mux-controller-0 {
>> diff --git a/arch/arm64/boot/dts/ti/k3-j722s-evm.dts b/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
>> index e66330c71593..eebc5cc7d4cd 100644
>> --- a/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
>> +++ b/arch/arm64/boot/dts/ti/k3-j722s-evm.dts
>> @@ -63,6 +63,12 @@ wkup_r5fss0_core0_memory_region: memory@a0100000 {
>> reg = <0x00 0xa0100000 0x00 0xf00000>;
>> no-map;
>> };
>> +
>> + lpm_memory_region: lpm-memory@a6c00000 {
>> + reg = <0x00 0xa6c00000 0x00 0x00300000>;
>> + no-map;
>> + bootph-all;
>> + };
>> };
>>
>> vmain_pd: regulator-0 {
>> diff --git a/arch/arm64/boot/dts/ti/k3-j742s2-evm.dts b/arch/arm64/boot/dts/ti/k3-j742s2-evm.dts
>> index fcb7f05d7faf..d0752c8a6b37 100644
>> --- a/arch/arm64/boot/dts/ti/k3-j742s2-evm.dts
>> +++ b/arch/arm64/boot/dts/ti/k3-j742s2-evm.dts
>> @@ -23,4 +23,13 @@ memory@80000000 {
>> device_type = "memory";
>> bootph-all;
>> };
>> +
>> +};
>> +
>> +&reserved_memory {
>> + lpm_memory_region: lpm-memory@ab000000 {
>> + reg = <0x00 0xab000000 0x00 0x00300000>;
>> + no-map;
>> + bootph-all;
>> + };
>> };
>> diff --git a/arch/arm64/boot/dts/ti/k3-j784s4-evm.dts b/arch/arm64/boot/dts/ti/k3-j784s4-evm.dts
>> index 6c7458c76f53..114594f37f0b 100644
>> --- a/arch/arm64/boot/dts/ti/k3-j784s4-evm.dts
>> +++ b/arch/arm64/boot/dts/ti/k3-j784s4-evm.dts
>> @@ -23,10 +23,13 @@ memory@80000000 {
>> device_type = "memory";
>> bootph-all;
>> };
>> +};
>>
>> - reserved_memory: reserved-memory {
>> - #address-cells = <2>;
>> - #size-cells = <2>;
>> +&reserved_memory {
>> + lpm_memory_region: lpm-memory@ac000000 {
>> + reg = <0x00 0xac000000 0x00 0x00300000>;
>> + no-map;
>> + bootph-all;
>> };
>> };
>>
>
* [RESEND v4] ASoC: dt-bindings: imx-card: Complete the full list of supported DAI formats
From: Chancel Liu @ 2026-05-21 7:22 UTC (permalink / raw)
To: lgirdwood, broonie, robh, krzk+dt, conor+dt, Frank.Li, s.hauer
Cc: kernel, festevam, shengjiu.wang, linux-sound, devicetree, imx,
linux-arm-kernel, linux-kernel
Currently this binding only lists i2s and dsp_b formats that are used
by existing sound cards. However, DT bindings should describe the full
hardware capabilities rather than only the formats of current usage.
The SAI audio controller of i.MX audio sound card supports multiple DAI
formats, including:
- i2s
- left_j
- right_j
- dsp_a
- dsp_b
- pdm
- msb
- lsb
Complete the full list of formats supported by i.MX audio sound card to
ensure the binding correctly describes hardware.
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
---
Changes in v4:
- Completed the full list of DAI formats (i2s, left_j, right_j, dsp_a,
dsp_b, pdm, msb, lsb) supported by i.MX sound card.
- Rewrote commit message to focus on describing hardware capability
rather than current usage.
Changes in v3:
- Rewrote commit message completely to describe hardware requirements.
Explicitly documented why only dsp_a is added and why other formats
are not included.
- Rebased on latest code base. No functional changes.
Changes in v2:
- Updated commit message to explain current support for i2s and dsp_b
formats and new support for dsp_a. No code changes.
Documentation/devicetree/bindings/sound/imx-audio-card.yaml | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/sound/imx-audio-card.yaml b/Documentation/devicetree/bindings/sound/imx-audio-card.yaml
index 5424d4f16f52..950e3eab2942 100644
--- a/Documentation/devicetree/bindings/sound/imx-audio-card.yaml
+++ b/Documentation/devicetree/bindings/sound/imx-audio-card.yaml
@@ -37,7 +37,13 @@ patternProperties:
items:
enum:
- i2s
+ - left_j
+ - right_j
+ - dsp_a
- dsp_b
+ - pdm
+ - msb
+ - lsb
dai-tdm-slot-num: true
--
2.50.1
* Re: [PATCH v3 0/6] KVM: arm64: Don't perform vgic-v2 lazy init on timer injection
From: Marc Zyngier @ 2026-05-21 7:23 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel, Marc Zyngier
Cc: Deepanshu Kartikey, Steffen Eiden, Joey Gouly, Suzuki K Poulose,
Oliver Upton, Zenghui Yu
In-Reply-To: <20260520100200.543845-1-maz@kernel.org>
On Wed, 20 May 2026 11:01:54 +0100, Marc Zyngier wrote:
> This is the third version of this series aiming at fixing issues with
> vgic-v2 being initialised from non-preemptible context.
>
> * From v2 [2]:
>
> - Remove the PMU's irq level cache which was hiding in plain sight
>
> [...]
Applied to next, thanks!
[1/6] KVM: arm64: timer: Repaint kvm_timer_{should,irq_can}_fire() to kvm_timer_{pending,enabled}()
commit: 68a612d4dbc7f2b9dac731c79676a21fce573d29
[2/6] KVM: arm64: Simplify userspace notification of interrupt state
commit: 0d27b4b351493cb2fe1f87cd152856704d4e141d
[3/6] KVM: arm64: timer: Kill the per-timer irq level cache
commit: ac7002031852ab8f75b3debb1a4c4b2d1ff5a26c
[4/6] KVM: arm64: pmu: Kill the PMU interrupt level cache
commit: 2772383afc5c65d6242f62947b5c184ffb049359
[5/6] KVM: arm64: vgic-v2: Force vgic init on injection outside the run loop
commit: 1a8685ed8cd1ded20d0c81070a49b1cddf70481d
[6/6] KVM: arm64: vgic-v2: Don't init the vgic on in-kernel interrupt injection
commit: 958023d269e0312d10da85a6a49438d2e107dead
Cheers,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v4 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices
From: Yi Liu @ 2026-05-21 7:31 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Nicolin Chen, will, robin.murphy, bhelgaas, joro, praan, baolu.lu,
kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
linux-cxl, nirmoyd
In-Reply-To: <20260520143410.GV3602937@nvidia.com>
On 5/20/26 22:34, Jason Gunthorpe wrote:
> On Wed, May 20, 2026 at 09:12:31PM +0800, Yi Liu wrote:
>> On 4/27/26 13:54, Nicolin Chen wrote:
>>> Controlled by the IOMMU driver, ATS is usually enabled "on demand" when a
>>> given PASID on a device is attached to an I/O page table. This is working
>>> even when a device has no translation on its RID (i.e., the RID is IOMMU
>>> bypassed).
>>
>> nit: this description doesn't seem accurate. The Intel IOMMU driver enables
>> ATS in the probe_device() phase. Mind tweaking it a bit to avoid a
>> misleading message? :)
>
> It probably shouldn't do this, it should follow ARM and have it
> dynamic during domain attach.
Agreed that making it dynamic during domain attach is a better
direction. However, even framing it that way, the description tying ATS
enablement to PASID attachment is still architecturally specific to ARM
SMMUv3, and doesn't hold as a general statement. :)
> For security we need ATS disabled for blocking domains at a minimum.
Agreed on the security model.
One more data point worth discussing: today Intel's IOMMU driver enables
ATS at probe time, which has two effects — enabling the PCI ATS
capability on the device, and setting the DTE bit in the scalable-mode
PASID-table entry. When a RID or PASID is subsequently attached to a
blocking domain, the corresponding PASID-table entry has its Present (P)
bit cleared.
Per the VT-d spec (condition SPT.2), with P=0:
- Translation Requests (with or without PASID) complete successfully,
but return R=W=U=S=0 to the device — effectively a no-access result.
- Untranslated Requests receive UR.
- Translated Requests are N/A.
So while neither the PCI ATS capability nor the DTE bit is explicitly
cleared when a blocking domain is attached, ATS-related transactions
don't produce any usable result from the device's perspective.
Does this hardware behavior satisfy the security expectation you have in
mind? Or do you still require that both the DTE bit and the PCI ATS
capability be explicitly disabled when a blocking domain is in effect?
Regards,
Yi Liu
* Re: [PATCH v5 7/8] dt-bindings: raspberrypi,bcm2835-firmware: Drop unnecessary select
From: Krzysztof Kozlowski @ 2026-05-21 7:11 UTC (permalink / raw)
To: Gregor Herburger, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Florian Fainelli, Ray Jui, Scott Branden,
Broadcom internal kernel review list, Eric Anholt, Stefan Wahren,
Srinivas Kandagatla, Kees Cook, Gustavo A. R. Silva,
Thomas Weißschuh
Cc: devicetree, linux-rpi-kernel, linux-arm-kernel, linux-kernel,
linux-hardening, Conor Dooley
In-Reply-To: <20260520-rpi-otp-driver-v5-7-b26e5908eeac@linutronix.de>
On 20/05/2026 16:27, Gregor Herburger wrote:
> The select schema is not necessary because the
> raspberrypi,bcm2835-firmware compatible is already matched by the
> compatible string values.
This is wrong. The select was not because of that. Select was needed
because of simple-mfd, but dtschema was changed, so please rephrase:
The "select" in schema is not necessary anymore since dtschema drops
simple-mfd when constructing the select/filter query for schemas with
compatibles.
Best regards,
Krzysztof
* Re: [PATCH v5 8/8] arm64: defconfig: Enable the raspberrypi otp driver as module
From: Krzysztof Kozlowski @ 2026-05-21 7:09 UTC (permalink / raw)
To: Gregor Herburger, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Florian Fainelli, Ray Jui, Scott Branden,
Broadcom internal kernel review list, Eric Anholt, Stefan Wahren,
Srinivas Kandagatla, Kees Cook, Gustavo A. R. Silva,
Thomas Weißschuh
Cc: devicetree, linux-rpi-kernel, linux-arm-kernel, linux-kernel,
linux-hardening
In-Reply-To: <20260520-rpi-otp-driver-v5-8-b26e5908eeac@linutronix.de>
On 20/05/2026 16:28, Gregor Herburger wrote:
> Enable the newly added Raspberry Pi OTP driver as module to allow access
> to the otp registers.
... on foo bar board?
Otherwise, why do we want it in upstream?
Best regards,
Krzysztof
* Re: [PATCH v2 0/2] KVM: arm64: nv: Reduce FP/SVE overhead on exception/exception return
From: Marc Zyngier @ 2026-05-21 7:07 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel, kvm, Marc Zyngier
Cc: Steffen Eiden, Joey Gouly, Suzuki K Poulose, Oliver Upton,
Zenghui Yu, Mark Rutland, Will Deacon, Fuad Tabba
In-Reply-To: <20260520085036.541666-1-maz@kernel.org>
On Wed, 20 May 2026 09:50:34 +0100, Marc Zyngier wrote:
> This is the second version of this short series optimising away a lot
> of unnecessary FPSIMD/SVE context switch with NV.
>
> * From v1 [1]:
>
> - New commit message on patch #2 (Mark)
>
> [...]
Applied to next, thanks!
[1/2] KVM: arm64: nv: Track L2 to L1 exception emulation
commit: 27ae400e6e888153ded1ad807a94a94e506dd2df
[2/2] KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception
commit: 435c466196148ae116f616e6cda97c33281defc2
Cheers,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 1/8] mm: Add ptep_try_set() for lockless empty-slot installs
From: Andrea Righi @ 2026-05-21 7:00 UTC (permalink / raw)
To: Tejun Heo
Cc: David Vernet, Changwoo Min, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
Peter Zijlstra, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Andrew Morton,
David Hildenbrand, Mike Rapoport, Emil Tsalapatis, sched-ext, bpf,
x86, linux-arm-kernel, linux-mm, linux-kernel
In-Reply-To: <20260520235052.4180316-2-tj@kernel.org>
Hi Tejun,
On Wed, May 20, 2026 at 01:50:45PM -1000, Tejun Heo wrote:
> Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is
> currently pte_none(). Returns true on success, false if the slot was already
> populated or the arch has no implementation.
>
> The intended caller is the upcoming bpf_arena kernel-side fault recovery
> path. The install runs from a page fault that can be nested under locks
> held by the faulting kernel caller (e.g. a BPF program holding
> raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry
> would A-A deadlock. Lock-free cmpxchg is the only viable option, which
> constrains this helper to special kernel page tables where concurrent
> writers cooperate via atomic accessors.
>
> The generic version in <linux/pgtable.h> returns false. x86 and arm64
> override with try_cmpxchg-based implementations on the underlying pteval.
> Other architectures get the false stub - the callers there already fall
> through to oops.
>
> v2: Rename to ptep_try_set(). Tighten kerneldoc for kernel-PTE use.
> (David, Alexei)
>
> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: David Hildenbrand <david@kernel.org>
> ---
> arch/arm64/include/asm/pgtable.h | 8 ++++++++
> arch/x86/include/asm/pgtable.h | 8 ++++++++
> include/linux/pgtable.h | 26 ++++++++++++++++++++++++++
> 3 files changed, 42 insertions(+)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 9029b81ccbe8..a129be91ef2c 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1830,6 +1830,14 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> return __ptep_get_and_clear(mm, addr, ptep);
> }
>
> +static inline bool ptep_try_set(pte_t *ptep, pte_t new_pte)
> +{
> + pteval_t old = 0;
> +
> + return try_cmpxchg(&pte_val(*ptep), &old, pte_val(new_pte));
> +}
> +#define ptep_try_set ptep_try_set
> +
> #define test_and_clear_young_ptes test_and_clear_young_ptes
> static inline bool test_and_clear_young_ptes(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep, unsigned int nr)
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 13e3e9a054cb..047e273a4eab 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1284,6 +1284,14 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
> } while (!try_cmpxchg((long *)&ptep->pte, (long *)&old_pte, *(long *)&new_pte));
> }
>
> +static inline bool ptep_try_set(pte_t *ptep, pte_t new_pte)
> +{
> + pte_t old_pte = __pte(0);
> +
> + return try_cmpxchg((long *)&ptep->pte, (long *)&old_pte, *(long *)&new_pte);
> +}
Minor nit (feel free to ignore), on x86 pte_none() is defined as:
static inline int pte_none(pte_t pte)
{
return !(pte.pte & ~(_PAGE_KNL_ERRATUM_MASK));
}
With:
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
#else
#define _PAGE_KNL_ERRATUM_MASK 0
#endif
If that mask has the D/A bits set, try_cmpxchg(..., &old=0, ...) will reject a
PTE that has only those bits set, even though pte_none() would return true. I
think this is fine for the bpf_arena use case, since hardware shouldn't set A/D
for fresh pages that the BPF prog hasn't touched.
Maybe it's worth adding a comment (something along these lines)?
/*
* Note: strictly-zero compare is narrower than pte_none() (see pte_none() and
* _PAGE_KNL_ERRATUM_MASK), but the gap is harmless in practice: HW shouldn't
* set _PAGE_DIRTY | _PAGE_ACCESSED bits on entries the caller never touched.
*/
Other than that, looks good to me.
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Thanks,
-Andrea
> +#define ptep_try_set ptep_try_set
> +
> #define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
>
> #define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index cdd68ed3ae1a..d68374f404c1 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1036,6 +1036,32 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
> }
> #endif
>
> +#ifndef ptep_try_set
> +/**
> + * ptep_try_set - atomically set an empty kernel PTE
> + * @ptep: page table entry
> + * @new_pte: value to install
> + *
> + * Atomically set *@ptep to @new_pte iff *@ptep is pte_none(). Return
> + * true on success, false if the slot was already populated or the
> + * arch has no implementation.
> + *
> + * For special kernel page tables only - never user page tables. The
> + * caller must prevent concurrent teardown of @ptep and must accept
> + * that other writers may race. Concurrent clearers must use
> + * ptep_get_and_clear() so racing accesses agree on the outcome.
> + *
> + * Architectures opt in by providing a cmpxchg-based override and
> + * defining ptep_try_set as an identity macro. The generic stub
> + * returns false, which is correct for callers that fall through to
> + * oops on failure.
> + */
> +static inline bool ptep_try_set(pte_t *ptep, pte_t new_pte)
> +{
> + return false;
> +}
> +#endif
> +
> #ifndef wrprotect_ptes
> /**
> * wrprotect_ptes - Write-protect PTEs that map consecutive pages of the same
> --
> 2.54.0
>
* [PATCH v3] usb: gadget: aspeed_udc: avoid past-the-end iterator in dequeue
From: Maoyi Xie @ 2026-05-21 6:54 UTC (permalink / raw)
To: Andrew Jeffery, Neal Liu
Cc: Greg Kroah-Hartman, Benjamin Herrenschmidt, Joel Stanley,
Andrew Lunn, Alan Stern, linux-aspeed, linux-arm-kernel,
linux-usb, linux-kernel
In-Reply-To: <20260519080213.1932516-1-maoyixie.tju@gmail.com>
ast_udc_ep_dequeue() declares the loop cursor `req` outside the
list_for_each_entry(). After the loop it tests `&req->req != _req`
to decide whether the request was found. If the queue holds no
match, `req` is past-the-end. It then aliases
container_of(&ep->queue, struct ast_udc_request, queue) via offset
cancellation. Whether that synthetic address equals `_req` depends
on heap layout. The function can return 0 without dequeueing
anything.
Default `rc` to -EINVAL and set it to 0 only inside the match
branch. `req` is no longer read after the loop, so the past-the-end
dereference goes away. No extra cursor variable or post-loop test
is needed.
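The pattern described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the demo_* list helpers and struct demo_request are hypothetical stand-ins for the kernel's list API and struct ast_udc_request, not the driver's code. The point is that the result defaults to -EINVAL and becomes 0 only inside the match branch, so the cursor is never examined after the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's intrusive circular list. */
struct list_node { struct list_node *next, *prev; };

struct demo_request {
	int id;
	struct list_node queue;
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void demo_list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void demo_list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void demo_list_del(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node;
	node->prev = node;
}

/* Returns 0 if target was queued and has been removed, -EINVAL otherwise. */
static int demo_dequeue(struct list_node *head, struct demo_request *target)
{
	struct list_node *pos;
	int rc = -EINVAL;	/* "not found" unless the loop proves otherwise */

	assert(head != NULL && target != NULL);

	for (pos = head->next; pos != head; pos = pos->next) {
		struct demo_request *req =
			demo_container_of(pos, struct demo_request, queue);

		if (req == target) {
			demo_list_del(&req->queue);
			rc = 0;	/* success is recorded only in the match branch */
			break;
		}
	}

	/* The cursor is not read here, so an exhausted loop cannot alias the head. */
	return rc;
}
```

Dequeueing a request that was never queued, or one that has already been removed, returns -EINVAL regardless of where the objects happen to sit in memory.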
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
v3: Switch to Andrew Jeffery's shape: default rc to -EINVAL, set
rc=0 inside the match branch, drop the post-loop check. Smaller
diff, no extra cursor variable, no goto. Same semantic fix as v2.
v2: https://lore.kernel.org/linux-usb/20260519080213.1932516-1-maoyixie.tju@gmail.com/
v1: https://lore.kernel.org/linux-usb/20260518073403.1285339-1-maoyi.xie@ntu.edu.sg/
drivers/usb/gadget/udc/aspeed_udc.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/gadget/udc/aspeed_udc.c b/drivers/usb/gadget/udc/aspeed_udc.c
index 7fc6696b7694..75f9c831b21a 100644
--- a/drivers/usb/gadget/udc/aspeed_udc.c
+++ b/drivers/usb/gadget/udc/aspeed_udc.c
@@ -694,7 +694,7 @@ static int ast_udc_ep_dequeue(struct usb_ep *_ep, struct usb_request *_req)
struct ast_udc_dev *udc = ep->udc;
struct ast_udc_request *req;
unsigned long flags;
- int rc = 0;
+ int rc = -EINVAL;
spin_lock_irqsave(&udc->lock, flags);
@@ -704,14 +704,11 @@ static int ast_udc_ep_dequeue(struct usb_ep *_ep, struct usb_request *_req)
list_del_init(&req->queue);
ast_udc_done(ep, req, -ESHUTDOWN);
_req->status = -ECONNRESET;
+ rc = 0;
break;
}
}
- /* dequeue request not found */
- if (&req->req != _req)
- rc = -EINVAL;
-
spin_unlock_irqrestore(&udc->lock, flags);
return rc;
--
2.34.1
* Re: [PATCH 04/10] [v2] sh: select legacy gpiolib interface
From: John Paul Adrian Glaubitz @ 2026-05-21 6:49 UTC (permalink / raw)
To: Arnd Bergmann, linux-gpio
Cc: linux-kernel, Arnd Bergmann, Christian Lamparter, Johannes Berg,
Aaro Koskinen, Andreas Kemnade, Kevin Hilman, Roger Quadros,
Tony Lindgren, Thomas Bogendoerfer, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Linus Walleij,
Bartosz Golaszewski, Dmitry Torokhov, Lee Jones, Pavel Machek,
Matti Vaittinen, Florian Fainelli, Jonas Gorski, Andrew Lunn,
Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, linux-wireless, linux-omap, linux-arm-kernel,
linux-mips, linux-sh, linux-input, linux-leds, netdev
In-Reply-To: <20260520183815.2510387-5-arnd@kernel.org>
Hi Arnd,
On Wed, 2026-05-20 at 20:38 +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> Many board files on sh reference the legacy gpiolib interfaces that
> are becoming optional. To ensure the boards can keep building, select
> CONFIG_GPIOLIB_LEGACY on each of the boards that have one of the
> hardcoded calls.
>
> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: no changes. Adrian said he'll pick it up for 7.2, but so
> far the patch is not in linux-next yet, so I'm including it
> for completeness here.
Sorry, I haven't gotten around to picking up the changes for v7.2 yet. I
can pick this one up this weekend, as I was planning to review and merge
some patches then.
I have received quite a lot of patches for SH recently, so it will take
some time to work through the queue.
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
* [PATCH v2] i2c: imx: fix clock and pinctrl state inconsistency in runtime PM
From: Carlos Song (OSS) @ 2026-05-21 6:50 UTC (permalink / raw)
To: o.rempel, kernel, andi.shyti, Frank.Li, s.hauer, festevam,
carlos.song
Cc: linux-i2c, linux-arm-kernel, linux-kernel, stable
From: Carlos Song <carlos.song@nxp.com>
In i2c_imx_runtime_suspend(), the clock is disabled before switching
the pinctrl state to sleep. If pinctrl_pm_select_sleep_state() fails,
the runtime suspend is aborted but the clock remains disabled, causing
a system crash when the hardware is subsequently accessed.
Fix this by switching the pinctrl state before disabling the clock so
that a pinctrl failure leaves the clock enabled and the hardware
accessible.
In i2c_imx_runtime_resume(), restore the pinctrl state back to sleep
if clk_enable() fails, to keep the clock and pinctrl state consistent.
Fixes: 576eba03c994 ("i2c: imx: switch different pinctrl state in different system power status")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
Change for v2:
- Fix commit log to "keep the consistent" according to Frank's
suggestion.
---
drivers/i2c/busses/i2c-imx.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index a208fefd3c3b..28313d0fad37 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -1892,9 +1892,15 @@ static void i2c_imx_remove(struct platform_device *pdev)
static int i2c_imx_runtime_suspend(struct device *dev)
{
struct imx_i2c_struct *i2c_imx = dev_get_drvdata(dev);
+ int ret;
+
+ ret = pinctrl_pm_select_sleep_state(dev);
+ if (ret)
+ return ret;
clk_disable(i2c_imx->clk);
- return pinctrl_pm_select_sleep_state(dev);
+
+ return 0;
}
static int i2c_imx_runtime_resume(struct device *dev)
@@ -1907,10 +1913,13 @@ static int i2c_imx_runtime_resume(struct device *dev)
return ret;
ret = clk_enable(i2c_imx->clk);
- if (ret)
+ if (ret) {
dev_err(dev, "can't enable I2C clock, ret=%d\n", ret);
+ pinctrl_pm_select_sleep_state(dev);
+ return ret;
+ }
- return ret;
+ return 0;
}
static int i2c_imx_suspend(struct device *dev)
--
2.43.0
* Re: [PATCH v2 2/2] KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception
From: Marc Zyngier @ 2026-05-21 6:35 UTC (permalink / raw)
To: Mark Rutland
Cc: kvmarm, linux-arm-kernel, kvm, Steffen Eiden, Joey Gouly,
Suzuki K Poulose, Oliver Upton, Zenghui Yu, Will Deacon,
Fuad Tabba
In-Reply-To: <ag2w0G34NycT2456@J2N7QTR9R3.cambridge.arm.com>
On Wed, 20 May 2026 14:02:08 +0100,
Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Wed, May 20, 2026 at 09:50:36AM +0100, Marc Zyngier wrote:
> > When switching between L1 and L2, we save the old state using
> > kvm_arch_vcpu_put(), mutate the state in memory, then load the new
> > state using kvm_arch_vcpu_load(). Any live FPSIMD/SVE state is saved
> > and unbound, such that it can be lazily restored on a subsequent trap.
> >
> > The FPSIMD/SVE state is shared by exception levels, and only a handful
> > of related control registers need to be changed when transitioning
> > between L1 and L2. The save/restore of the common state is needless
> > overhead, especially as trapping becomes exponentially more expensive
> > with nesting.
> >
> > Avoid this overhead by leaving the common FPSIMD/SVE state live on the
> > CPU, and only switching the state that is distinct for L1 and L2:
> >
> > - the trap controls: the effective values are recomputed on each entry
> > into the guest to take the EL into account and merge the L0 and L1
> > configuration if in a nested context, or directly use the L0 configuration
> > in non-nested context (see __activate_traps()).
> >
> > - the VL settings: the effective values are also recomputed on each
> > entry into the guest (see fpsimd_lazy_switch_to_guest()).
> >
> > Since we appear to cover all bases, use the vcpu flags indicating the
> > handling of a nested ERET or exception delivery to avoid the whole FP
> > save/restore shenanigans. SME will have to be similarly dealt with when
> > it eventually gets supported.
> >
> > For an EL1 L3 guest where L1 and L2 have this optimisation, this
> > results in at least a 10% wall clock reduction when running an I/O
> > heavy workload, generating a high rate of nested exceptions.
>
> There's one additional thing that's important, but I forgot to mention
> last time: in the window between kvm_arch_vcpu_put() and
> kvm_arch_vcpu_load(), it's possible to take an interrupt, and for a
> softirq handler to try to use kernel mode NEON.
>
> Due to that, kvm_arch_vcpu_put() must leave the L1 guest's maximum VL
> configured in the host's ZCR_ELx, such that the guest's state can be
> saved.
>
> That value is configured by fpsimd_lazy_switch_to_host(), so we just
> need to make sure that kvm_arch_vcpu_put() doesn't clobber it. I *think*
> that's fine today, but maybe that warrants a comment somewhere.
I have slapped this onto this patch:
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index aca98752a6e42..3f6b1e29cd6b9 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -117,7 +117,10 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
unsigned long flags;
/*
- * See comment in kvm_arch_vcpu_load_fp().
+ * See comment in kvm_arch_vcpu_load_fp(). Note that we also rely on
+ * the guest's max VL to have been set by fpsimd_lazy_switch_to_host()
+ * so that any intervening kernel-mode SIMD (NEON or otherwise)
+ * operation sees the full guest state that needs saving.
*/
if (vcpu_get_flag(vcpu, IN_NESTED_ERET) ||
vcpu_get_flag(vcpu, IN_NESTED_EXCEPTION)) {
> Other than that, this all looks good to me:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply related
* Re: [PATCH v2 2/2] KVM: arm64: nv: Don't save/restore FP register during a nested ERET or exception
From: Marc Zyngier @ 2026-05-21 6:21 UTC (permalink / raw)
To: Joey Gouly
Cc: kvmarm, linux-arm-kernel, kvm, Steffen Eiden, Suzuki K Poulose,
Oliver Upton, Zenghui Yu, Mark Rutland, Will Deacon, Fuad Tabba
In-Reply-To: <20260520110231.GA4005903@e124191.cambridge.arm.com>
On Wed, 20 May 2026 12:02:31 +0100,
Joey Gouly <joey.gouly@arm.com> wrote:
>
> Hi Marc,
>
> On Wed, May 20, 2026 at 09:50:36AM +0100, Marc Zyngier wrote:
> > When switching between L1 and L2, we save the old state using
> > kvm_arch_vcpu_put(), mutate the state in memory, then load the new
> > state using kvm_arch_vcpu_load(). Any live FPSIMD/SVE state is saved
> > and unbound, such that it can be lazily restored on a subsequent trap.
> >
> > The FPSIMD/SVE state is shared by exception levels, and only a handful
> > of related control registers need to be changed when transitioning
> > between L1 and L2. The save/restore of the common state is needless
> > overhead, especially as trapping becomes exponentially more expensive
> > with nesting.
> >
> > Avoid this overhead by leaving the common FPSIMD/SVE state live on the
> > CPU, and only switching the state that is distinct for L1 and L2:
>
> To make sure I understand this part:
>
> L1 sets up L2's FP state live on the CPU
> L1 erets
> eret traps to L0/host
> preemption disabled
> kvm_arch_vcpu_put()
> kvm_arch_vcpu_put_fp() <-- actually saves the state of the live registers
> .. set elr etc ..
> kvm_arch_vcpu_load()
> kvm_arch_vcpu_load_fp() <-- doesn't actually restore state, but ensures
> the CPTR trap will be set
> .. returns to L2 (traps on first use of FP and state will be restored)
>
> So this patch is (effectively) removing the put_fp()/load_fp(), because the FP
> state is common/shared between L1 and L2, so whatever L1 put into that state
> before the eret, L2 was going to see.
Yes, you got it right. The other path is on L1 to L2 exception, which
also requires L0 mediation and has a similar shape.
The most horrible thing is that because all these traps can happen at
an arbitrary depth, each individual trap usually results in the
combination of all of the above.
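To make the shape of the optimisation concrete, here is a minimal userspace sketch of the put/load flow Joey describes above. It is a toy model, not KVM code: every `toy_*` name is hypothetical, and only the idea of skipping the FP save while a nested ERET or exception flag is set comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vcpu flags named in the patch. */
enum toy_flag {
	TOY_IN_NESTED_ERET      = 1u << 0,
	TOY_IN_NESTED_EXCEPTION = 1u << 1,
};

struct toy_vcpu {
	unsigned int flags;
	bool fp_live;          /* FP/SVE registers currently hold guest state */
	unsigned int fp_saves; /* register-file saves to memory */
	unsigned int fp_traps; /* lazy-restore traps taken by the guest */
};

static bool toy_in_nested_transition(const struct toy_vcpu *v)
{
	return (v->flags & (TOY_IN_NESTED_ERET | TOY_IN_NESTED_EXCEPTION)) != 0;
}

/* Put side: save and unbind unless an L1<->L2 switch is in flight. */
static void toy_put_fp(struct toy_vcpu *v)
{
	if (toy_in_nested_transition(v))
		return; /* the shared state stays live on the CPU */

	if (v->fp_live) {
		v->fp_saves++;
		v->fp_live = false;
	}
}

/* First FP use after the load: traps only if the state was unbound. */
static void toy_guest_uses_fp(struct toy_vcpu *v)
{
	if (!v->fp_live) {
		v->fp_traps++;
		v->fp_live = true;
	}
}

/* One emulated ERET from L1 to L2, with or without the optimisation. */
static void toy_nested_eret(struct toy_vcpu *v, bool optimised)
{
	if (optimised)
		v->flags |= TOY_IN_NESTED_ERET;

	toy_put_fp(v);
	/* ... set ELR etc. ... */
	v->flags &= ~TOY_IN_NESTED_ERET;

	toy_guest_uses_fp(v);
}
```

In this model the unoptimised path costs one save plus one trap per emulated ERET, while the optimised path costs neither. It deliberately ignores the trap controls and VL settings, which the patch still recomputes on each guest entry.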
> If my understanding is correct:
> Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Thanks!
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply
* Re: [PATCH v3] dt-bindings: mfd: st,stmpe: fix PWM schema and drop legacy binding
From: Manish Baing @ 2026-05-21 5:47 UTC (permalink / raw)
To: Uwe Kleine-König
Cc: lee, linusw, robh, krzk+dt, conor+dt, mcoquelin.stm32,
alexandre.torgue, devicetree, linux-stm32, linux-arm-kernel,
linux-kernel, linux-pwm
In-Reply-To: <agnY16I4sYAdRd9T@monoceros>
Hi Uwe,
> If the patch was split into two, each touching just one of the files,
> there would be no need for merge coordination. Also logically it's two
> patches. Would you mind splitting?
That makes perfect sense. I will split this into a two-patch series
(one for the MFD YAML fix and one for the PWM TXT deletion) and submit
it shortly as v4.
Thanks for the feedback!
Thanks and Regards,
Manish
On Sun, May 17, 2026 at 8:35 PM Uwe Kleine-König <ukleinek@kernel.org> wrote:
>
> Hello,
>
> On Sat, May 09, 2026 at 07:39:28PM +0000, Manish Baing wrote:
> > The st,stmpe-pwm binding is already covered by the MFD schema in
> > Documentation/devicetree/bindings/mfd/st,stmpe.yaml. However, the
> > PWM subnode was missing a 'required' properties block. This allowed
> > Device Tree nodes to pass validation even if the 'compatible'
> > string was omitted. This omission could lead to probe failures
> > at runtime.
> >
> > Fix the schema by adding the missing 'required' block and
> > remove the obsolete and redundant text binding file.
> >
> > Signed-off-by: Manish Baing <manishbaing2789@gmail.com>
> > ---
> > Changes in v3:
> > - Added 'required' properties to the pwm subnode in st,stmpe.yaml
> > to close a validation gap identified by the Sashiko.
> > - Updated commit message and description to reflect MFD subsystem changes.
> >
> > Changes in v2:
> > - Dropped the TXT file instead of converting to YAML, as the
> > functionality is already covered by st,stmpe.yaml.
> >
> > .../devicetree/bindings/mfd/st,stmpe.yaml | 4 ++++
> > .../devicetree/bindings/pwm/st,stmpe-pwm.txt | 18 ------------------
>
> If the patch was split into two, each touching just one of the files,
> there would be no need for merge coordination. Also logically it's two
> patches. Would you mind splitting?
>
> Best regards
> Uwe
^ permalink raw reply
* Re: [PATCH v2 1/4] dt-bindings: display: verisilicon, dc: generalize for single-output variants
From: Joey Lu @ 2026-05-21 5:41 UTC (permalink / raw)
To: Icenowy Zheng, Conor Dooley
Cc: maarten.lankhorst, mripard, tzimmermann, airlied, simona, robh,
krzk+dt, conor+dt, ychuang3, schung, yclu4, dri-devel, devicetree,
linux-arm-kernel, linux-kernel
In-Reply-To: <47a06094541da642cabcb6b7d2f92d5125d365ea.camel@iscas.ac.cn>
On 5/20/2026 12:07 PM, Icenowy Zheng wrote:
> On Wed, 2026-05-20 at 11:06 +0800, Joey Lu wrote:
>> On 5/20/2026 12:47 AM, Conor Dooley wrote:
>>> On Tue, May 19, 2026 at 03:26:58PM +0800, Icenowy Zheng wrote:
>>>> On Tue, 2026-05-19 at 13:51 +0800, Joey Lu wrote:
>>>>> The existing schema assumes a fixed clock/reset topology and
>>>>> dual-
>>>>> output
>>>>> port structure matching the DC8200 IP block. This prevents
>>>>> reuse for
>>>>> single-output variants such as the Verisilicon DCU Lite used in
>>>>> the
>>>>> Nuvoton MA35D1 SoC.
>>>>>
>>>>> Rework the schema so that variant-specific constraints are
>>>>> expressed
>>>>> via allOf/if-then-else:
>>>>>
>>>>> - The thead,th1520-dc8200 compatible keeps its existing five-
>>>>> clock,
>>>>> three-reset, dual-port requirements.
>>>>>
>>>>> - A standalone verisilicon,dc compatible covers IPs whose
>>>>> identity is
>>>>> discovered entirely through hardware registers; these have
>>>>> flexible
>>>>> clock and reset counts, a single 'port' property, and no
>>>>> 'ports'
>>>>> requirement.
>>>>>
>>>>> Changes to the base schema:
>>>>> - Replace the fixed clock/reset items lists with
>>>>> minItems/maxItems
>>>>> ranges; variant sub-schemas tighten the constraints via if-
>>>>> then-
>>>>> else.
>>>>> - Add a 'port' property (graph.yaml single-port alias)
>>>>> alongside the
>>>>> existing 'ports', for single-output variants.
>>>>> - Drop the unconditional 'ports' requirement; each if-branch
>>>>> enforces
>>>>> its own port topology.
>>>>> - Tighten additionalProperties to unevaluatedProperties to
>>>>> allow
>>>>> per-variant schemas to add their own constraints cleanly.
>>>>> - Fix a stray space in the port@0 description.
>>>>> - Add a DT example for the generic verisilicon,dc compatible
>>>>> (Nuvoton MA35D1 DCU Lite).
>>>>>
>>>>> Signed-off-by: Joey Lu <a0987203069@gmail.com>
>>>>> ---
>>>>> .../bindings/display/verisilicon,dc.yaml | 135
>>>>> ++++++++++++++--
>>>>> --
>>>>> 1 file changed, 108 insertions(+), 27 deletions(-)
>>>>>
>>>>> diff --git
>>>>> a/Documentation/devicetree/bindings/display/verisilicon,dc.yaml
>>>>> b/Documentation/devicetree/bindings/display/verisilicon,dc.yaml
>>>>> index 9dc35ab973f2..3a814c2e083e 100644
>>>>> ---
>>>>> a/Documentation/devicetree/bindings/display/verisilicon,dc.yaml
>>>>> +++
>>>>> b/Documentation/devicetree/bindings/display/verisilicon,dc.yaml
>>>>> @@ -14,10 +14,12 @@ properties:
>>>>> pattern: "^display@[0-9a-f]+$"
>>>>>
>>>>> compatible:
>>>>> - items:
>>>>> - - enum:
>>>>> - - thead,th1520-dc8200
>>>> You should add a fallback compatible here for your SoC, in case
>>>> its
>>>> integration gets something quirky; this compatible is usually not
>>>> consumed by the driver (see how thead,th1520-dc8200 exists in the
>>>> binding but not the driver).
>>> s/fallback compatible/soc-specific compatible/, but yes.
>>> NAK to what's been done here, especially after the discussions on
>>> earlier versions of this verisilicon binding.
>>> pw-bot: changes-requested
>> Understood. I will add `nuvoton,ma35d1-dcu` as the SoC-specific
>> compatible string paired with `verisilicon,dc` as the generic
>> fallback,
>> matching the pattern used for `thead,th1520-dc8200`. The standalone
>> `verisilicon,dc` compatible will be removed from the binding. The
>> driver
> No, please don't remove compatible strings from existing binding, and
> the generic compatible is still used for driver binding.
>
> The SoC-specific compatible is informative here, it needs to exist, but
> it doesn't supersede "verisilicon,dc" .
>
> In addition, the SoC-specific compatible is also used for verification
> of the SoC device tree, which is the reason if clauses exist with
> compatible match and additional constraints (e.g. for the nuvoton DCU
> it's invalid to have a 2nd output port).
Sorry for the misunderstanding. I now see that a standalone generic
fallback compatible is not preferred here, and that the SoC-specific
compatible is strictly required for DT validation. I will add
`nuvoton,ma35d1-dcu` as the SoC-specific compatible string in the
existing compatible items list, without adding or removing anything else.
>> match table is not changed since hardware detection is done via ID
>> registers.
>>>>> - - const: verisilicon,dc # DC IPs have discoverable
>>>>> ID/revision
>>>>> registers
>>>>> + oneOf:
>>>>> + - items:
>>>>> + - enum:
>>>>> + - thead,th1520-dc8200
>>>>> + - const: verisilicon,dc
>>>>> + - const: verisilicon,dc # DC IPs have discoverable
>>>>> ID/revision registers
>>>>>
>>>>> reg:
>>>>> maxItems: 1
>>>>> @@ -26,32 +28,24 @@ properties:
>>>>> maxItems: 1
>>>>>
>>>>> clocks:
>>>>> - items:
>>>>> - - description: DC Core clock
>>>>> - - description: DMA AXI bus clock
>>>>> - - description: Configuration AHB bus clock
>>>>> - - description: Pixel clock of output 0
>>>>> - - description: Pixel clock of output 1
>>>>> + minItems: 2
>>>>> + maxItems: 5
>>>>>
>>>>> clock-names:
>>>>> - items:
>>>>> - - const: core
>>>>> - - const: axi
>>>>> - - const: ahb
>>>>> - - const: pix0
>>>>> - - const: pix1
>>>>> + minItems: 2
>>>>> + maxItems: 5
>>>>>
>>>>> resets:
>>>>> - items:
>>>>> - - description: DC Core reset
>>>>> - - description: DMA AXI bus reset
>>>>> - - description: Configuration AHB bus reset
>>>>> + minItems: 1
>>>>> + maxItems: 3
>>>>>
>>>>> reset-names:
>>>>> - items:
>>>>> - - const: core
>>>>> - - const: axi
>>>>> - - const: ahb
>>>>> + minItems: 1
>>>>> + maxItems: 3
>>>>> +
>>>>> + port:
>>>>> + $ref: /schemas/graph.yaml#/properties/port
>>>>> + description: Single video output port for single-output
>>>>> variants.
>>>> Maybe the endpoint numbering rule needs a move to here? (I am not
>>>> very
>>>> sure).
>> I will add a description to the `port` property noting that endpoint
>> 0
>> is used for DPI output, which is the only output type for
>> DCUltraLite.
> Please note that DC8000 exists, which is single-port but supports both
> DPI and DP.
To make it simple, the `port` property will not be added. `ports`
remains the sole port property and is kept in the global `required:`
list as in the original. The MA35D1 example will use `ports { port@0 {
... } }`, consistent with how other single-output DT nodes are written
in the kernel.
>>>>>
>>>>> ports:
>>>>> $ref: /schemas/graph.yaml#/properties/ports
>>>>> @@ -59,7 +53,7 @@ properties:
>>>>> properties:
>>>>> port@0:
>>>>> $ref: /schemas/graph.yaml#/properties/port
>>>>> - description: The first output channel , endpoint 0
>>>>> should be
>>>>> + description: The first output channel, endpoint 0
>>>>> should be
>>>>> used for DPI format output and endpoint 1 should be
>>>>> used
>>>>> for DP format output.
>>>>>
>>>>> @@ -75,9 +69,75 @@ required:
>>>>> - interrupts
>>>>> - clocks
>>>>> - clock-names
>>>>> - - ports
>>>>>
>>>>> -additionalProperties: false
>>>>> +allOf:
>>>>> + - if:
>>>>> + properties:
>>>>> + compatible:
>>>>> + contains:
>>>>> + const: thead,th1520-dc8200
>>>>> + then:
>>>>> + properties:
>>>>> + clocks:
>>>>> + items:
>>>>> + - description: DC Core clock
>>>>> + - description: DMA AXI bus clock
>>>>> + - description: Configuration AHB bus clock
>>>>> + - description: Pixel clock of output 0
>>>>> + - description: Pixel clock of output 1
>>>>> +
>>>>> + clock-names:
>>>>> + items:
>>>>> + - const: core
>>>>> + - const: axi
>>>>> + - const: ahb
>>>>> + - const: pix0
>>>>> + - const: pix1
>>>>> +
>>>>> + resets:
>>>>> + items:
>>>>> + - description: DC Core reset
>>>>> + - description: DMA AXI bus reset
>>>>> + - description: Configuration AHB bus reset
>>>>> +
>>>>> + reset-names:
>>>>> + items:
>>>>> + - const: core
>>>>> + - const: axi
>>>>> + - const: ahb
>>>>> +
>>>>> + required:
>>>>> + - ports
>>>>> +
>>>>> + else:
>>>>> + properties:
>>>>> + clocks:
>>>>> + items:
>>>>> + - description: Bus clock that gates register
>>>>> access
>>>>> + - description: Pixel clock divider for display
>>>>> timing
>>>> Please don't make compatible-specific description strings for
>>>> individual compatibles, and keep these descriptions outside of
>>>> the if.
>>>> The compatible-specific part should be used to specify what's
>>>> required
>>>> for the specific SoC, for dt validation purpose.
>>>>
>>>> BTW if the clock is both the working clock and bus clock for the
>>>> controller, I suggest listing it twice, except if the IP core is
>>>> provided without a dedicated core clock (in the case I suggest to
>>>> use
>>>> "bus" only).
>>> I agree. If the same clock is provided to two+ ports on the IP,
>>> that
>>> should still be two+ clocks in the devicetree.
>>>
>>>> Here's an example for "listing it twice":
>>>> ```
>>>> clocks = <&clk DCU_GATE>, <&clk DCU_GATE>, <&clk DCUP_DIV>;
>>>> clock-names = "core", "bus", "pix0";
>>>> ```
>>>>
>>>> Well nonetheless the name "core" does not match the description
>>>> "Bus
>>>> clock that gates register access".
>>>>
>>>> Thanks,
>>>> Icenowy
>> Understood. I will remove all description strings from the if/else
>> branches; the if/then clauses will only constrain clock-names and
>> reset-names items (name values only, no descriptions). Regarding
>> clock
> Well I think a required properties list is also needed in the if/then
> clause, to prevent DT's from lacking properties.
Since `ports` is kept in the global `required:` list, neither if/then
block needs a `required:` entry for port topology. Each if/then only
constrains clock-names and reset-names for DT validation. The `else`
branch has been eliminated; each variant has its own independent
`if/then` in the `allOf` array.
>> naming: DCU_GATE on MA35D1 is a peripheral gate clock without a
>> separate
>> dedicated core working clock, so I will keep "core" as the name and
> Do you mean there's no separate dedicated bus clock? I find that in the
> clock driver dcu_gate has no parent as bus clocks -- its parent is
> dcu_mux, and dcu_mux's 2 parents are both pll ("epll_div2" and
> "syspll").
>
> Thanks,
> Icenowy
You are right — DCU_GATE has no parent as a bus clock. For this case, I
prefer to keep "core" as the sole gate clock name alongside "pix0".
Thanks.
Here is what the v3 yaml would look like:
```yaml
compatible:
  items:
    - enum: [nuvoton,ma35d1-dcu, thead,th1520-dc8200]
    - const: verisilicon,dc
properties:
  clocks: minItems: 2, items with descriptions
  resets: minItems: 1, items with descriptions
required:
  [compatible, reg, interrupts, clocks, clock-names, ports]
allOf:
  - if: compatible contains thead,th1520-dc8200
    then:
      clock-names: [core, axi, ahb, pix0, pix1]
      reset-names: [core, axi, ahb]
  - if: compatible contains nuvoton,ma35d1-dcu
    then:
      clock-names: [core, pix0]
      reset-names: [core]
```
>> drop
>> the misleading description "Bus clock that gates register access".
>> The
>> description mismatch was entirely in the if/else strings which are
>> now
>> removed.
>>
>> Thanks.
>>
>>>>> +
>>>>> + clock-names:
>>>>> + items:
>>>>> + - const: core
>>>>> + - const: pix0
>>>>> +
>>>>> + resets:
>>>>> + maxItems: 1
>>>>> + description:
>>>>> + Reset line for the display controller.
>>>>> +
>>>>> + reset-names:
>>>>> + items:
>>>>> + - const: core
>>>>> +
>>>>> + required:
>>>>> + - port
>>>>> +
>>>>> + not:
>>>>> + required:
>>>>> + - ports
>>>>> +
>>>>> +unevaluatedProperties: false
>>>>>
>>>>> examples:
>>>>> - |
>>>>> @@ -120,3 +180,24 @@ examples:
>>>>> };
>>>>> };
>>>>> };
>>>>> +
>>>>> + - |
>>>>> + #include <dt-bindings/interrupt-controller/arm-gic.h>
>>>>> + #include <dt-bindings/clock/nuvoton,ma35d1-clk.h>
>>>>> + #include <dt-bindings/reset/nuvoton,ma35d1-reset.h>
>>>>> +
>>>>> + display@40260000 {
>>>>> + compatible = "verisilicon,dc";
>>>>> + reg = <0x40260000 0x20000>;
>>>>> + interrupts = <GIC_SPI 20 IRQ_TYPE_LEVEL_HIGH>;
>>>>> + clocks = <&clk DCU_GATE>, <&clk DCUP_DIV>;
>>>>> + clock-names = "core", "pix0";
>>>>> + resets = <&sys MA35D1_RESET_DISP>;
>>>>> + reset-names = "core";
>>>>> +
>>>>> + port {
>>>>> + dpi_out: endpoint {
>>>>> + remote-endpoint = <&panel_in>;
>>>>> + };
>>>>> + };
>>>>> + };
^ permalink raw reply
* RE: [PATCH V3 0/8] PCI: imx6: Integrate pwrctrl API and update device trees
From: Sherry Sun @ 2026-05-21 4:40 UTC (permalink / raw)
To: Hongxing Zhu (OSS), Sherry Sun (OSS), robh@kernel.org,
krzk+dt@kernel.org, conor+dt@kernel.org, Frank Li,
s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
lpieralisi@kernel.org, kwilczynski@kernel.org, mani@kernel.org,
bhelgaas@google.com, l.stach@pengutronix.de
Cc: imx@lists.linux.dev, linux-pci@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <GV2PR04MB120194C8BDA9B49C79DCE72168C0E2@GV2PR04MB12019.eurprd04.prod.outlook.com>
> > -----Original Message-----
> > From: Sherry Sun (OSS) <sherry.sun@oss.nxp.com>
> > Sent: Wednesday, May 20, 2026 4:49 PM
> > To: robh@kernel.org; krzk+dt@kernel.org; conor+dt@kernel.org; Frank Li
> > <frank.li@nxp.com>; s.hauer@pengutronix.de; kernel@pengutronix.de;
> > festevam@gmail.com; lpieralisi@kernel.org; kwilczynski@kernel.org;
> > mani@kernel.org; bhelgaas@google.com; Hongxing Zhu
> > <hongxing.zhu@nxp.com>; l.stach@pengutronix.de
> > Cc: imx@lists.linux.dev; linux-pci@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org; devicetree@vger.kernel.org; linux-
> > kernel@vger.kernel.org; Sherry Sun <sherry.sun@nxp.com>
> > Subject: [PATCH V3 0/8] PCI: imx6: Integrate pwrctrl API and update
> > device trees
> >
> > From: Sherry Sun <sherry.sun@nxp.com>
> >
> > This series integrates the PCI pwrctrl framework into the pci-imx6
> > driver and updates i.MX EVK board device trees to support it.
> >
> > Patches 2-8 update device trees for i.MX EVK boards which are maintained
> > by NXP to move power supply properties from the PCIe controller node
> > to the Root Port child node, which is required for pwrctrl framework.
> > Affected boards:
> > - i.MX6Q/DL SABRESD
> > - i.MX6SX SDB
> > - i.MX8MM EVK
> > - i.MX8MP EVK
> > - i.MX8MQ EVK
> > - i.MX8DXL/QM/QXP EVK
> > - i.MX95 15x15/19x19 EVK
> >
> > The driver maintains legacy regulator handling for device trees that
> > haven't been updated yet. Both old and new device tree structures are
> supported.
> >
> > Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
> Hi Sherry:
> Since the vpcie3v3aux is used to power up the WAKE#, it is always on in this
> pwrctrl framework whether the system is in suspend or not, right?
>
Hi Richard,
Currently the new pwrctrl framework doesn't support vpcie3v3aux; it handles all
regulators with of_regulator_bulk_get_all() and regulator_bulk_enable/disable().
For now, vpcie3v3aux is only handled by the pci-imx6 driver.
Best Regards
Sherry
> Best Regards
> Richard Zhu
> > ---
> > Changes in V3:
> > 1. Rebased on top of latest 7.1.0-rc4
> >
> > Changes in V2:
> > 1. After commit 2d8c5098b847 ("PCI/pwrctrl: Do not power off on pwrctrl
> > device removal"), the pwrctrl drivers no longer power off devices
> > during removal. Update pci-imx6 driver's shutdown callback in patch#1
> > to explicitly call pci_pwrctrl_power_off_devices() before
> > pci_pwrctrl_destroy_devices() to ensure devices are properly powered
> > off.
> > ---
> >
> > Sherry Sun (8):
> > PCI: imx6: Integrate new pwrctrl API for pci-imx6
> > arm: dts: imx6qdl-sabresd: Move power supply property to Root Port
> > node
> > arm: dts: imx6sx-sdb: Move power supply property to Root Port node
> > arm64: dts: imx8mm-evk: Move power supply property to Root Port node
> > arm64: dts: imx8mp-evk: Move power supply properties to Root Port node
> > arm64: dts: imx8mq-evk: Move power supply properties to Root Port node
> > arm64: dts: imx8dxl/qm/qxp: Move power supply properties to Root Port
> > node
> > arm64: dts: imx95: Move power supply properties to Root Port node
> >
> > .../arm/boot/dts/nxp/imx/imx6qdl-sabresd.dtsi | 2 +-
> > arch/arm/boot/dts/nxp/imx/imx6sx-sdb.dtsi | 2 +-
> > arch/arm64/boot/dts/freescale/imx8dxl-evk.dts | 4 ++--
> > arch/arm64/boot/dts/freescale/imx8mm-evk.dtsi | 2 +-
> > arch/arm64/boot/dts/freescale/imx8mp-evk.dts | 4 ++--
> > arch/arm64/boot/dts/freescale/imx8mq-evk.dts | 4 ++--
> > arch/arm64/boot/dts/freescale/imx8qm-mek.dts | 4 ++--
> > arch/arm64/boot/dts/freescale/imx8qxp-mek.dts | 4 ++--
> > .../boot/dts/freescale/imx95-15x15-evk.dts | 4 ++--
> > .../boot/dts/freescale/imx95-19x19-evk.dts | 8 +++----
> > drivers/pci/controller/dwc/Kconfig | 1 +
> > drivers/pci/controller/dwc/pci-imx6.c | 24 ++++++++++++++++++-
> > 12 files changed, 43 insertions(+), 20 deletions(-)
> >
> > --
> > 2.37.1
^ permalink raw reply
* Re: [PATCH v14 10/44] arm64: RMI: Add support for SRO
From: Gavin Shan @ 2026-05-21 4:38 UTC (permalink / raw)
To: Steven Price, kvm, kvmarm
Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Shanker Donthineni,
Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-11-steven.price@arm.com>
Hi Steven,
On 5/13/26 11:17 PM, Steven Price wrote:
> RMM v2.0 introduces the concept of "Stateful RMI Operations" (SRO). This
> means that an SMC can return with an operation still in progress. The
> host is expected to continue the operation until it reaches a conclusion
> (either success or failure). During this process the RMM can request
> additional memory ('donate') or hand memory back to the host
> ('reclaim'). The host can request an in progress operation is cancelled,
> but still continue the operation until it has completed (otherwise the
> incomplete operation may cause future RMM operations to fail).
>
> The SRO is tracked using a struct rmi_sro_state object which keeps track
> of any memory which has been allocated but not yet consumed by the RMM
> or reclaimed from the RMM. This allows the memory to be reused in a
> future request within the same operation. It will also permit an
> operation to be done in a context where memory allocation may be
> difficult (e.g. atomic context) with the option to abort the operation
> and retry the memory allocation outside of the atomic context. The
> memory stored in the struct rmi_sro_state object can then be reused on
> the subsequent attempt.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> v14:
> * SRO support has improved although is still not fully complete. The
> infrastructure has been moved out of KVM.
> ---
> arch/arm64/include/asm/rmi_cmds.h | 1 +
> arch/arm64/kernel/rmi.c | 359 ++++++++++++++++++++++++++++++
> 2 files changed, 360 insertions(+)
>
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> index eb213c8e6f26..1a7b0c8f1e38 100644
> --- a/arch/arm64/include/asm/rmi_cmds.h
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -35,6 +35,7 @@ struct rmi_sro_state {
>
> int rmi_delegate_range(phys_addr_t phys, unsigned long size);
> int rmi_undelegate_range(phys_addr_t phys, unsigned long size);
> +int free_delegated_page(phys_addr_t phys);
>
> static inline int rmi_delegate_page(phys_addr_t phys)
> {
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index 08cef54acadb..a8107ca9bb6d 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -48,6 +48,365 @@ int rmi_undelegate_range(phys_addr_t phys, unsigned long size)
> return ret;
> }
>
> +static unsigned long donate_req_to_size(unsigned long donatereq)
> +{
> + unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> +
> + switch (unit_size) {
> + case 0:
> + return PAGE_SIZE;
> + case 1:
> + return PMD_SIZE;
> + case 2:
> + return PUD_SIZE;
> + case 3:
> + return P4D_SIZE;
> + }
> + unreachable();
> +}
> +
It's worth marking this 'inline'. {P4D, PUD, PMD}_SIZE can be equal if there are
no P4D and PUD, depending on CONFIG_PGTABLE_LEVELS. In that case, can
'unit_size' be translated to the wrong value?
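To illustrate the concern, here is a small standalone sketch that derives the byte size from the page shift alone, so the result does not depend on how many page-table levels the kernel happens to be built with. It assumes the RMM's size codes 0..3 correspond to successive translation-table levels for the granule size, which the quoted switch statement implies but this thread does not confirm. `toy_unit_size_bytes()` is a hypothetical helper, not something from the patch.

```c
#include <stdint.h>

/*
 * Bytes covered by one translation-table entry, counted up from the leaf:
 * unit 0 is a page, unit 1 is one level above it, and so on. Each arm64
 * table level resolves (page_shift - 3) address bits, because a table is
 * one page of 8-byte descriptors.
 */
static uint64_t toy_unit_size_bytes(unsigned int unit, unsigned int page_shift)
{
	unsigned int bits_per_level = page_shift - 3;

	return UINT64_C(1) << (page_shift + unit * bits_per_level);
}
```

With 4K pages (page_shift = 12) this yields 4K, 2M, 1G and 512G for units 0 to 3. A kernel built with folded levels would instead see PUD_SIZE and P4D_SIZE collapse to the same value, which is the case being asked about.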
> +static void rmi_smccc_invoke(struct arm_smccc_1_2_regs *regs_in,
> + struct arm_smccc_1_2_regs *regs_out)
> +{
> + struct arm_smccc_1_2_regs regs = *regs_in;
> + unsigned long status;
> +
> + do {
> + arm_smccc_1_2_invoke(&regs, regs_out);
> + status = RMI_RETURN_STATUS(regs_out->a0);
> + } while (status == RMI_BUSY || status == RMI_BLOCKED);
> +}
> +
> +int free_delegated_page(phys_addr_t phys)
> +{
> + if (WARN_ON(rmi_undelegate_page(phys))) {
> + /* Undelegate failed: leak the page */
> + return -EBUSY;
> + }
> +
> + free_page((unsigned long)phys_to_virt(phys));
> +
> + return 0;
> +}
> +
> +static int rmi_sro_ensure_capacity(struct rmi_sro_state *sro,
> + unsigned long count)
> +{
> + if (WARN_ON_ONCE(sro->addr_count > RMI_MAX_ADDR_LIST))
> + return -EOVERFLOW;
> +
> + if (count > RMI_MAX_ADDR_LIST - sro->addr_count)
> + return -ENOSPC;
> +
> + return 0;
> +}
> +
> +static int rmi_sro_donate_contig(struct rmi_sro_state *sro,
> + unsigned long sro_handle,
> + unsigned long donatereq,
> + struct arm_smccc_1_2_regs *out_regs,
> + gfp_t gfp)
> +{
> + unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> + unsigned long unit_size_bytes = donate_req_to_size(donatereq);
> + unsigned long count = RMI_DONATE_COUNT(donatereq);
> + unsigned long state = RMI_DONATE_STATE(donatereq);
> + unsigned long size = unit_size_bytes * count;
> + unsigned long addr_range;
> + int ret;
> + void *virt;
> + phys_addr_t phys;
> + struct arm_smccc_1_2_regs regs = {
> + SMC_RMI_OP_MEM_DONATE,
> + sro_handle
> + };
> +
> + for (int i = 0; i < sro->addr_count; i++) {
> + unsigned long entry = sro->addr_list[i];
> +
> + if (RMI_ADDR_RANGE_SIZE(entry) == unit_size &&
> + RMI_ADDR_RANGE_COUNT(entry) == count &&
> + RMI_ADDR_RANGE_STATE(entry) == state) {
> + sro->addr_count--;
> + swap(sro->addr_list[sro->addr_count],
> + sro->addr_list[i]);
> +
> + goto out;
> + }
> + }
> +
> + ret = rmi_sro_ensure_capacity(sro, 1);
> + if (ret)
> + return ret;
> +
> + virt = alloc_pages_exact(size, gfp);
> + if (!virt)
> + return -ENOMEM;
> + phys = virt_to_phys(virt);
> +
alloc_pages_exact() will fail if the requested size exceeds the maximum allowed
size (1 << MAX_PAGE_ORDER pages). That maximum is usually smaller than PUD_SIZE,
but the RMM is allowed to request PUD_SIZE.
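A quick back-of-the-envelope check of the sizes involved, as a standalone sketch. The `toy_*` helpers are hypothetical. The values used in the note below (4K pages and a maximum order of 10) are assumptions about a common arm64 configuration, not something taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest physically contiguous allocation the buddy allocator can serve. */
static uint64_t toy_max_alloc_bytes(unsigned int page_shift,
				    unsigned int max_order)
{
	return UINT64_C(1) << (page_shift + max_order);
}

/* True if a request of 'size' bytes is within the buddy allocator's limit. */
static bool toy_fits_buddy(uint64_t size, unsigned int page_shift,
			   unsigned int max_order)
{
	return size <= toy_max_alloc_bytes(page_shift, max_order);
}
```

Under those assumptions the limit is 4M, so a 2M (PMD-sized) request fits while a 1G (PUD-sized) request does not, and would have to come from some other allocation path.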
> + if (state == RMI_OP_MEM_DELEGATED) {
> + if (rmi_delegate_range(phys, size)) {
> + free_pages_exact(virt, size);
> + return -ENXIO;
> + }
> + }
> +
> + addr_range = phys & RMI_ADDR_RANGE_ADDR_MASK;
> + FIELD_MODIFY(RMI_ADDR_RANGE_SIZE_MASK, &addr_range, unit_size);
> + FIELD_MODIFY(RMI_ADDR_RANGE_COUNT_MASK, &addr_range, count);
> + FIELD_MODIFY(RMI_ADDR_RANGE_STATE_MASK, &addr_range, state);
> +
> + sro->addr_list[sro->addr_count] = addr_range;
> +
> +out:
> + regs.a2 = virt_to_phys(&sro->addr_list[sro->addr_count]);
> + regs.a3 = 1;
> + rmi_smccc_invoke(&regs, out_regs);
> +
> + unsigned long donated_granules = out_regs->a1;
> + unsigned long donated_size = donated_granules << PAGE_SHIFT;
> +
> + if (donated_granules == 0) {
> + /* No pages used by the RMM */
> + sro->addr_count++;
> + } else if (donated_size < size) {
> + phys = sro->addr_list[sro->addr_count] & RMI_ADDR_RANGE_ADDR_MASK;
> +
> + /* Not all granules used by the RMM, free the remaining pages */
> + for (long i = donated_size; i < size; i += PAGE_SIZE) {
> + if (state == RMI_OP_MEM_DELEGATED)
> + free_delegated_page(phys + i);
> + else
> + __free_page(phys_to_page(phys + i));
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int rmi_sro_donate_noncontig(struct rmi_sro_state *sro,
> + unsigned long sro_handle,
> + unsigned long donatereq,
> + struct arm_smccc_1_2_regs *out_regs,
> + gfp_t gfp)
> +{
> + unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> + unsigned long unit_size_bytes = donate_req_to_size(donatereq);
> + unsigned long count = RMI_DONATE_COUNT(donatereq);
> + unsigned long state = RMI_DONATE_STATE(donatereq);
> + unsigned long found = 0;
> + unsigned long addr_list_start = sro->addr_count;
> + int ret;
> + struct arm_smccc_1_2_regs regs = {
> + SMC_RMI_OP_MEM_DONATE,
> + sro_handle
> + };
> +
> + for (int i = 0; i < addr_list_start && found < count; i++) {
> + unsigned long entry = sro->addr_list[i];
> +
> + if (RMI_ADDR_RANGE_SIZE(entry) == unit_size &&
> + RMI_ADDR_RANGE_COUNT(entry) == 1 &&
> + RMI_ADDR_RANGE_STATE(entry) == state) {
> + addr_list_start--;
> + swap(sro->addr_list[addr_list_start],
> + sro->addr_list[i]);
> + found++;
> + i--;
> + }
> + }
> +
> + ret = rmi_sro_ensure_capacity(sro, count - found);
> + if (ret)
> + return ret;
> +
> + while (found < count) {
> + unsigned long addr_range;
> + void *virt = alloc_pages_exact(unit_size_bytes, gfp);
> + phys_addr_t phys;
> +
> + if (!virt)
> + return -ENOMEM;
> +
> + phys = virt_to_phys(virt);
> +
> + if (state == RMI_OP_MEM_DELEGATED) {
> + if (rmi_delegate_range(phys, unit_size_bytes)) {
> + free_pages_exact(virt, unit_size_bytes);
> + return -ENXIO;
> + }
> + }
> +
> + addr_range = phys & RMI_ADDR_RANGE_ADDR_MASK;
> + FIELD_MODIFY(RMI_ADDR_RANGE_SIZE_MASK, &addr_range, unit_size);
> + FIELD_MODIFY(RMI_ADDR_RANGE_COUNT_MASK, &addr_range, 1);
> + FIELD_MODIFY(RMI_ADDR_RANGE_STATE_MASK, &addr_range, state);
> +
> + sro->addr_list[sro->addr_count++] = addr_range;
> + found++;
> + }
> +
> + regs.a2 = virt_to_phys(&sro->addr_list[addr_list_start]);
> + regs.a3 = found;
> + rmi_smccc_invoke(&regs, out_regs);
> +
> + unsigned long donated_granules = out_regs->a1;
> +
> + if (WARN_ON(donated_granules & ((unit_size_bytes >> PAGE_SHIFT) - 1))) {
> + /*
> + * FIXME: RMM has only consumed part of a huge page, this leaks
> + * the rest of the huge page
> + */
> + donated_granules = ALIGN(donated_granules,
> + (unit_size_bytes >> PAGE_SHIFT));
> + }
> + unsigned long donated_blocks = donated_granules / (unit_size_bytes >> PAGE_SHIFT);
> +
> + if (WARN_ON(donated_blocks > found))
> + donated_blocks = found;
> +
> + unsigned long undonated_blocks = found - donated_blocks;
> +
> + while (donated_blocks && undonated_blocks) {
> + sro->addr_count--;
> + swap(sro->addr_list[addr_list_start],
> + sro->addr_list[sro->addr_count]);
> + addr_list_start++;
> +
> + donated_blocks--;
> + undonated_blocks--;
> + }
> + sro->addr_count -= donated_blocks;
> +
> + return 0;
> +}
> +
> +static int rmi_sro_donate(struct rmi_sro_state *sro,
> + unsigned long sro_handle,
> + unsigned long donatereq,
> + struct arm_smccc_1_2_regs *regs,
> + gfp_t gfp)
> +{
> + unsigned long count = RMI_DONATE_COUNT(donatereq);
> +
> + if (WARN_ON(!count))
> + return 0;
> +
> + if (RMI_DONATE_CONTIG(donatereq)) {
> + return rmi_sro_donate_contig(sro, sro_handle, donatereq,
> + regs, gfp);
> + } else {
> + return rmi_sro_donate_noncontig(sro, sro_handle, donatereq,
> + regs, gfp);
> + }
> +}
> +
> +static int rmi_sro_reclaim(struct rmi_sro_state *sro,
> + unsigned long sro_handle,
> + struct arm_smccc_1_2_regs *out_regs)
> +{
> + unsigned long capacity;
> + struct arm_smccc_1_2_regs regs;
> + int ret;
> +
> + ret = rmi_sro_ensure_capacity(sro, 1);
> + if (ret)
> + rmi_sro_free(sro);
> +
> + capacity = RMI_MAX_ADDR_LIST - sro->addr_count;
> +
> + regs = (struct arm_smccc_1_2_regs){
> + SMC_RMI_OP_MEM_RECLAIM,
> + sro_handle,
> + virt_to_phys(&sro->addr_list[sro->addr_count]),
> + capacity
> + };
> + rmi_smccc_invoke(&regs, out_regs);
> +
> + if (WARN_ON_ONCE(out_regs->a1 > capacity))
> + out_regs->a1 = capacity;
> +
> + sro->addr_count += out_regs->a1;
> +
> + return 0;
> +}
> +
> +void rmi_sro_free(struct rmi_sro_state *sro)
> +{
> + for (int i = 0; i < sro->addr_count; i++) {
> + unsigned long entry = sro->addr_list[i];
> + unsigned long addr = RMI_ADDR_RANGE_ADDR(entry);
> + unsigned long unit_size = RMI_ADDR_RANGE_SIZE(entry);
> + unsigned long count = RMI_ADDR_RANGE_COUNT(entry);
> + unsigned long state = RMI_ADDR_RANGE_STATE(entry);
> + unsigned long size = donate_req_to_size(unit_size) * count;
> +
> + if (state == RMI_OP_MEM_DELEGATED) {
> + if (WARN_ON(rmi_undelegate_range(addr, size))) {
> + /* Leak the pages */
> + continue;
> + }
> + }
> + free_pages_exact(phys_to_virt(addr), size);
> + }
> +
> + sro->addr_count = 0;
> +}
> +
> +unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp)
> +{
> + unsigned long sro_handle;
> + struct arm_smccc_1_2_regs regs;
> + struct arm_smccc_1_2_regs *regs_in = &sro->regs;
> +
> + rmi_smccc_invoke(regs_in, &regs);
> +
> + sro_handle = regs.a1;
> +
> + while (RMI_RETURN_STATUS(regs.a0) == RMI_INCOMPLETE) {
> + bool can_cancel = RMI_RETURN_CAN_CANCEL(regs.a0);
> + int ret;
> +
> + switch (RMI_RETURN_MEMREQ(regs.a0)) {
> + case RMI_OP_MEM_REQ_NONE:
> + regs = (struct arm_smccc_1_2_regs){
> + SMC_RMI_OP_CONTINUE, sro_handle, 0
> + };
> + rmi_smccc_invoke(&regs, &regs);
> + break;
'ret' isn't initialized in the RMI_OP_MEM_REQ_NONE case, so the
'if (ret)' check after the switch reads an uninitialized value.
> + case RMI_OP_MEM_REQ_DONATE:
> + ret = rmi_sro_donate(sro, sro_handle, regs.a2, &regs,
> + gfp);
> + break;
> + case RMI_OP_MEM_REQ_RECLAIM:
> + ret = rmi_sro_reclaim(sro, sro_handle, &regs);
> + break;
> + default:
> + ret = WARN_ON(1);
> + break;
> + }
> +
> + if (ret) {
> + if (can_cancel) {
> + /*
> + * FIXME: Handle cancelling properly!
> + *
> + * If the operation has failed due to memory
> + * allocation failure then the information on
> + * the memory allocation should be saved, so
> + * that the allocation can be repeated outside
> + * of any context which prevented the
> + * allocation.
> + */
> + }
> + if (WARN_ON(ret))
> + return ret;
> + }
> + }
> +
> + return regs.a0;
> +}
> +
> static int rmi_check_version(void)
> {
> struct arm_smccc_res res;
Thanks,
Gavin
* [PATCH] [RFC] arm64: mmu: use range based TLB flushing when hot unplugging memory
From: Alistair Popple @ 2026-05-21 4:24 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, linux-mm, catalin.marinas, will, david,
anshuman.khandual, ryan.roberts, dev.jain, balbirs, jhubbard,
Alistair Popple
Hot unplugging memory on ARM64 requires a TLB invalidate after unmapping
the page to be hot unplugged from the direct map. Currently that happens
one page at a time, meaning range based invalidates cannot be used. The
result of this is that removing large amounts of memory takes a long
time and in some cases can trigger an RCU stall warning.
For example on one system hot unplugging 480GB of memory takes ~1
minute. With this change the same operation took ~1 second, a 60x
improvement.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
This is an RFC because I'm not sure the change is correct: it frees the
PTE page before flushing the TLB. I'm not familiar enough with the ARM64
architecture to be sure this is safe. For example, I don't know whether
HW can update PTE bits such as access/dirty in the page through a stale
TLB entry.
If so, this would open a window during which the page is free but could
still be written to. The safe option would likely be to collect all the
pages to be freed on a list and free them after doing the range based
TLB flush, but I wanted to get feedback on the approach before
implementing it, which is the goal of this RFC.
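The "collect, flush once, then free" ordering described above can be
modelled in user space. The following is only a sketch under stated
assumptions: every name in it (sim_table, sim_unmap_range,
sim_flush_range, and so on) is hypothetical and stands in for kernel
state rather than reproducing any kernel API. It clears all entries
first, issues a single range flush over the saved start address (the
loop variable has advanced to the end by then), and only afterwards
frees the collected pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096UL
#define SIM_MAX_PAGES 64

/* Hypothetical stand-in for a PTE table plus the pages it maps. */
struct sim_table {
	bool present[SIM_MAX_PAGES];
	bool freed[SIM_MAX_PAGES];
};

/* Bookkeeping so the ordering of flush and free can be inspected. */
struct sim_unmap_stats {
	unsigned long flush_calls;
	unsigned long flushed_start;
	unsigned long flushed_end;
	unsigned long freed;
	bool freed_before_flush;
};

/* Models one range-based TLB invalidate. */
static void sim_flush_range(struct sim_unmap_stats *st,
			    unsigned long start, unsigned long end)
{
	st->flush_calls++;
	st->flushed_start = start;
	st->flushed_end = end;
}

static int sim_unmap_range(struct sim_table *t, unsigned long addr,
			   unsigned long end, struct sim_unmap_stats *st)
{
	unsigned long start = addr;	/* saved: addr is advanced below */
	size_t pending[SIM_MAX_PAGES];
	size_t npending = 0;

	assert(addr <= end);
	if ((end - addr) / SIM_PAGE_SIZE > SIM_MAX_PAGES)
		return -1;

	/* Step 1: clear every entry, remembering which pages to free. */
	for (; addr < end; addr += SIM_PAGE_SIZE) {
		size_t idx = (addr - start) / SIM_PAGE_SIZE;

		t->present[idx] = false;
		pending[npending++] = idx;
	}

	/* Step 2: one flush covering the whole range. */
	sim_flush_range(st, start, end);

	/* Step 3: free only after the flush has been issued. */
	for (size_t i = 0; i < npending; i++) {
		if (st->flush_calls == 0)
			st->freed_before_flush = true;
		t->freed[pending[i]] = true;
		st->freed++;
	}
	return 0;
}
```

In this model the number of flushes is independent of the number of
pages, which is where the speedup would come from, while no page is
released while a stale translation could still exist.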
---
arch/arm64/mm/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0c24fe650e95..75c773232c14 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1459,11 +1459,12 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
WARN_ON(!pte_present(pte));
__pte_clear(&init_mm, addr, ptep);
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
if (free_mapped)
free_hotplug_page_range(pte_page(pte),
PAGE_SIZE, altmap);
} while (addr += PAGE_SIZE, addr < end);
+
+ flush_tlb_kernel_range(addr, end);
}
static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
--
2.54.0
* [PATCH] clocksource/drivers/owl: fix refcount leak
From: Alexander A. Klimov @ 2026-05-21 4:19 UTC (permalink / raw)
To: Daniel Lezcano, Thomas Gleixner, Andreas Färber,
Manivannan Sadhasivam, open list:CLOCKSOURCE, CLOCKEVENT DRIVERS,
moderated list:ARM/ACTIONS SEMI ARCHITECTURE,
moderated list:ARM/ACTIONS SEMI ARCHITECTURE
Cc: Alexander A. Klimov
Every value returned from of_clk_get() is supposed to be cleaned up
via clk_put() once not needed anymore.
Fixes: 4be78a86c506 ("clocksource: Add Owl timer")
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
---
drivers/clocksource/timer-owl.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clocksource/timer-owl.c b/drivers/clocksource/timer-owl.c
index ac97420bfa7c..fa347f430563 100644
--- a/drivers/clocksource/timer-owl.c
+++ b/drivers/clocksource/timer-owl.c
@@ -142,6 +142,7 @@ static int __init owl_timer_init(struct device_node *node)
}
rate = clk_get_rate(clk);
+ clk_put(clk);
owl_timer_reset(owl_clksrc_base);
owl_timer_set_enabled(owl_clksrc_base, true);
--
2.54.0
* Re: [PATCH 8/8] sched_ext: Convert ops.set_cmask() to arena-resident cmask
From: Emil Tsalapatis @ 2026-05-21 4:19 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Andrew Morton,
David Hildenbrand, Mike Rapoport, Emil Tsalapatis, sched-ext, bpf,
x86, linux-arm-kernel, linux-mm, linux-kernel
In-Reply-To: <20260520235052.4180316-9-tj@kernel.org>
On Wed May 20, 2026 at 7:50 PM EDT, Tejun Heo wrote:
> ops_cid.set_cmask() expects a cmask. The kernel couldn't write into the
> arena, so it translated cpumask -> cmask in kernel memory and passed the
> result as a trusted pointer. The BPF cmask helpers all operate on arena
> cmasks though, so the BPF side had to word-by-word probe-read the kernel
> cmask into an arena cmask via cmask_copy_from_kernel() before any helper
> could touch it. It works, but is clumsy.
>
> With direct kernel-side arena access now in place, build the cmask in the
> arena. The kernel writes to it through the kern_va side of the dual mapping;
> BPF directly dereferences it via an __arena pointer like any other arena
> struct.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
> kernel/sched/ext.c | 68 +++++++++++++++++++++++++--
> kernel/sched/ext_cid.c | 20 +-------
> kernel/sched/ext_internal.h | 10 +++-
> tools/sched_ext/include/scx/cid.bpf.h | 52 --------------------
> tools/sched_ext/scx_qmap.bpf.c | 5 +-
> 5 files changed, 75 insertions(+), 80 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index fb91079c1244..94562e3350c6 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -621,11 +621,16 @@ static inline void scx_call_op_set_cpumask(struct scx_sched *sch, struct rq *rq,
> update_locked_rq(rq);
>
> if (scx_is_cid_type()) {
> - struct scx_cmask *cmask = this_cpu_ptr(scx_set_cmask_scratch);
> -
> - lockdep_assert_irqs_disabled();
> - scx_cpumask_to_cmask(cpumask, cmask);
> - sch->ops_cid.set_cmask(task, cmask);
> + struct scx_cmask *kern_va = *this_cpu_ptr(sch->set_cmask_scratch);
> + unsigned long uaddr = (unsigned long)kern_va -
> + bpf_arena_map_kern_vm_start(sch->arena_map);
> + /*
> + * Build the per-CPU arena cmask and hand BPF the uaddr. Caller
> + * holds the rq lock with IRQs disabled, which makes us the sole
> + * user of the scratch area.
> + */
> + scx_cpumask_to_cmask(cpumask, kern_va);
> + sch->ops_cid.set_cmask(task, (struct scx_cmask *)uaddr);
> } else {
> sch->ops.set_cpumask(task, cpumask);
> }
> @@ -4949,6 +4954,48 @@ static const struct attribute_group scx_global_attr_group = {
> static void free_pnode(struct scx_sched_pnode *pnode);
> static void free_exit_info(struct scx_exit_info *ei);
>
> +static s32 scx_set_cmask_scratch_alloc(struct scx_sched *sch)
> +{
> + size_t size = struct_size_t(struct scx_cmask, bits,
> + SCX_CMASK_NR_WORDS(num_possible_cpus()));
> + int cpu;
> +
> + if (!sch->is_cid_type || !sch->arena_pool)
> + return 0;
> +
> + sch->set_cmask_scratch = alloc_percpu(struct scx_cmask *);
> + if (!sch->set_cmask_scratch)
> + return -ENOMEM;
> +
> + for_each_possible_cpu(cpu) {
> + struct scx_cmask **slot = per_cpu_ptr(sch->set_cmask_scratch, cpu);
> +
> + *slot = scx_arena_alloc(sch, size);
> + if (!*slot)
> + return -ENOMEM;
> + scx_cmask_init(*slot, 0, num_possible_cpus());
> + }
> + return 0;
> +}
> +
> +static void scx_set_cmask_scratch_free(struct scx_sched *sch)
> +{
> + size_t size = struct_size_t(struct scx_cmask, bits,
> + SCX_CMASK_NR_WORDS(num_possible_cpus()));
> + int cpu;
> +
> + if (!sch->set_cmask_scratch)
> + return;
> +
> + for_each_possible_cpu(cpu) {
> + struct scx_cmask **slot = per_cpu_ptr(sch->set_cmask_scratch, cpu);
> +
> + scx_arena_free(sch, *slot, size);
> + }
> + free_percpu(sch->set_cmask_scratch);
> + sch->set_cmask_scratch = NULL;
> +}
> +
> static void scx_sched_free_rcu_work(struct work_struct *work)
> {
> struct rcu_work *rcu_work = to_rcu_work(work);
> @@ -5003,6 +5050,7 @@ static void scx_sched_free_rcu_work(struct work_struct *work)
>
> rhashtable_free_and_destroy(&sch->dsq_hash, NULL, NULL);
> free_exit_info(sch->exit_info);
> + scx_set_cmask_scratch_free(sch);
> scx_arena_pool_destroy(sch);
> if (sch->arena_map)
> bpf_map_put(sch->arena_map);
> @@ -7162,6 +7210,12 @@ static void scx_root_enable_workfn(struct kthread_work *work)
> goto err_disable;
> }
>
> + ret = scx_set_cmask_scratch_alloc(sch);
> + if (ret) {
> + cpus_read_unlock();
> + goto err_disable;
> + }
> +
> for (i = SCX_OPI_CPU_HOTPLUG_BEGIN; i < SCX_OPI_CPU_HOTPLUG_END; i++)
> if (((void (**)(void))ops)[i])
> set_bit(i, sch->has_op);
> @@ -7484,6 +7538,10 @@ static void scx_sub_enable_workfn(struct kthread_work *work)
> if (ret)
> goto err_disable;
>
> + ret = scx_set_cmask_scratch_alloc(sch);
> + if (ret)
> + goto err_disable;
> +
> if (validate_ops(sch, ops))
> goto err_disable;
>
> diff --git a/kernel/sched/ext_cid.c b/kernel/sched/ext_cid.c
> index 0c91b951fd33..808c6390da5a 100644
> --- a/kernel/sched/ext_cid.c
> +++ b/kernel/sched/ext_cid.c
> @@ -7,14 +7,6 @@
> */
> #include <linux/cacheinfo.h>
>
> -/*
> - * Per-cpu scratch cmask used by scx_call_op_set_cpumask() to synthesize a
> - * cmask from a cpumask. Allocated alongside the cid arrays on first enable
> - * and never freed. Sized to the full cid space. Caller holds rq lock so
> - * this_cpu_ptr is safe.
> - */
> -struct scx_cmask __percpu *scx_set_cmask_scratch;
> -
> /*
> * cid tables.
> *
> @@ -54,8 +46,6 @@ static s32 scx_cid_arrays_alloc(void)
> u32 npossible = num_possible_cpus();
> s16 *cid_to_cpu, *cpu_to_cid;
> struct scx_cid_topo *cid_topo;
> - struct scx_cmask __percpu *set_cmask_scratch;
> - s32 cpu;
>
> if (scx_cid_to_cpu_tbl)
> return 0;
> @@ -63,25 +53,17 @@ static s32 scx_cid_arrays_alloc(void)
> cid_to_cpu = kzalloc_objs(*scx_cid_to_cpu_tbl, npossible, GFP_KERNEL);
> cpu_to_cid = kzalloc_objs(*scx_cpu_to_cid_tbl, nr_cpu_ids, GFP_KERNEL);
> cid_topo = kmalloc_objs(*scx_cid_topo, npossible, GFP_KERNEL);
> - set_cmask_scratch = __alloc_percpu(struct_size(set_cmask_scratch, bits,
> - SCX_CMASK_NR_WORDS(npossible)),
> - sizeof(u64));
>
> - if (!cid_to_cpu || !cpu_to_cid || !cid_topo || !set_cmask_scratch) {
> + if (!cid_to_cpu || !cpu_to_cid || !cid_topo) {
> kfree(cid_to_cpu);
> kfree(cpu_to_cid);
> kfree(cid_topo);
> - free_percpu(set_cmask_scratch);
> return -ENOMEM;
> }
>
> WRITE_ONCE(scx_cid_to_cpu_tbl, cid_to_cpu);
> WRITE_ONCE(scx_cpu_to_cid_tbl, cpu_to_cid);
> WRITE_ONCE(scx_cid_topo, cid_topo);
> - for_each_possible_cpu(cpu)
> - scx_cmask_init(per_cpu_ptr(set_cmask_scratch, cpu),
> - 0, npossible);
> - WRITE_ONCE(scx_set_cmask_scratch, set_cmask_scratch);
> return 0;
> }
>
> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index ff7e882bd67a..9bb65367f510 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -1124,6 +1124,14 @@ struct scx_sched {
> struct bpf_map *arena_map;
> struct gen_pool *arena_pool;
>
> + /*
> + * Per-CPU arena cmask used by scx_call_op_set_cpumask() to hand a cmask
> + * to ops_cid.set_cmask(). The kernel writes through the stored kern_va;
> + * the BPF-arena uaddr handed to BPF is recovered by subtracting the
> + * arena's kern_vm_start.
> + */
> + struct scx_cmask * __percpu *set_cmask_scratch;
> +
> DECLARE_BITMAP(has_op, SCX_OPI_END);
>
> /*
> @@ -1480,8 +1488,6 @@ enum scx_ops_state {
> extern struct scx_sched __rcu *scx_root;
> DECLARE_PER_CPU(struct rq *, scx_locked_rq_state);
>
> -extern struct scx_cmask __percpu *scx_set_cmask_scratch;
> -
> /*
> * True when the currently loaded scheduler hierarchy is cid-form. All scheds
> * in a hierarchy share one form, so this single key tells callsites which
> diff --git a/tools/sched_ext/include/scx/cid.bpf.h b/tools/sched_ext/include/scx/cid.bpf.h
> index e281c88fa824..70f2a3829af4 100644
> --- a/tools/sched_ext/include/scx/cid.bpf.h
> +++ b/tools/sched_ext/include/scx/cid.bpf.h
> @@ -675,56 +675,4 @@ static __always_inline void cmask_from_cpumask(struct scx_cmask __arena *m,
> }
> }
>
> -/**
> - * cmask_copy_from_kernel - probe-read a kernel cmask into an arena cmask
> - * @dst: arena cmask to fill; must have @dst->base == 0 and be sized for @src.
> - * @src: kernel-memory cmask (e.g. ops.set_cmask() arg); @src->base must be 0.
> - *
> - * Word-for-word copy; @src and @dst must share base 0 alignment. Triggers
> - * scx_bpf_error() on probe failure or precondition violation.
> - */
> -static __always_inline void cmask_copy_from_kernel(struct scx_cmask __arena *dst,
> - const struct scx_cmask *src)
> -{
> - u32 base = 0, nr_cids = 0, nr_words, wi;
> -
> - if (dst->base != 0) {
> - scx_bpf_error("cmask_copy_from_kernel requires dst->base == 0");
> - return;
> - }
> -
> - if (bpf_probe_read_kernel(&base, sizeof(base), &src->base)) {
> - scx_bpf_error("probe-read cmask->base failed");
> - return;
> - }
> - if (base != 0) {
> - scx_bpf_error("cmask_copy_from_kernel requires src->base == 0");
> - return;
> - }
> -
> - if (bpf_probe_read_kernel(&nr_cids, sizeof(nr_cids), &src->nr_cids)) {
> - scx_bpf_error("probe-read cmask->nr_cids failed");
> - return;
> - }
> -
> - if (nr_cids > dst->nr_cids) {
> - scx_bpf_error("src cmask nr_cids=%u exceeds dst nr_cids=%u",
> - nr_cids, dst->nr_cids);
> - return;
> - }
> -
> - nr_words = CMASK_NR_WORDS(nr_cids);
> - cmask_zero(dst);
> - bpf_for(wi, 0, CMASK_MAX_WORDS) {
> - u64 word = 0;
> - if (wi >= nr_words)
> - break;
> - if (bpf_probe_read_kernel(&word, sizeof(u64), &src->bits[wi])) {
> - scx_bpf_error("probe-read cmask->bits[%u] failed", wi);
> - return;
> - }
> - dst->bits[wi] = word;
> - }
> -}
> -
> #endif /* __SCX_CID_BPF_H */
> diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
> index 7e77f22674ea..8a2d6a8ebd8e 100644
> --- a/tools/sched_ext/scx_qmap.bpf.c
> +++ b/tools/sched_ext/scx_qmap.bpf.c
> @@ -919,14 +919,15 @@ void BPF_STRUCT_OPS(qmap_update_idle, s32 cid, bool idle)
> }
>
> void BPF_STRUCT_OPS(qmap_set_cmask, struct task_struct *p,
> - const struct scx_cmask *cmask)
> + const struct scx_cmask *cmask_in)
> {
> + struct scx_cmask __arena *cmask = (struct scx_cmask __arena *)(long)cmask_in;
> task_ctx_t *taskc;
>
> taskc = lookup_task_ctx(p);
> if (!taskc)
> return;
> - cmask_copy_from_kernel(&taskc->cpus_allowed, cmask);
> + cmask_copy(&taskc->cpus_allowed, cmask);
> }
>
> struct monitor_timer {
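The kern_va to uaddr conversion used in scx_call_op_set_cpumask() is a
plain base subtraction. A minimal user-space sketch of that arithmetic
follows, assuming the arena's kernel mapping is described by a single
base address. The struct and function names here are hypothetical and
are not the kernel's struct bpf_arena or its helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an arena's kernel-side mapping. */
struct sim_arena {
	uint64_t kern_vm_start;	/* base of the kernel-side mapping */
};

/* Address handed to BPF: kernel VA minus the kernel mapping base. */
static uint64_t sim_kern_va_to_uaddr(const struct sim_arena *a,
				     uint64_t kern_va)
{
	assert(kern_va >= a->kern_vm_start);
	return kern_va - a->kern_vm_start;
}

/* Inverse direction: rebuild the kernel VA from the BPF-side address. */
static uint64_t sim_uaddr_to_kern_va(const struct sim_arena *a,
				     uint64_t uaddr)
{
	return a->kern_vm_start + uaddr;
}
```

Storing only the kernel address per CPU and deriving the BPF-side value
on demand keeps a single source of truth for each scratch cmask.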
* Re: [PATCH] clk: moxart: fix refcount leak
From: Alexander A. Klimov @ 2026-05-21 4:16 UTC (permalink / raw)
To: Brian Masney
Cc: Krzysztof Kozlowski, Michael Turquette, Stephen Boyd,
Jonas Jensen, Mike Turquette, moderated list:ARM/MOXA ART SOC,
open list:COMMON CLK FRAMEWORK, open list
In-Reply-To: <ag41wJBhdK7-Zynb@redhat.com>
On 5/21/26 00:29, Brian Masney wrote:
> Hi Alexander,
>
> On Wed, May 20, 2026 at 07:55:50PM +0200, Alexander A. Klimov wrote:
>> Every value returned from of_clk_get() is supposed to be cleaned up
>> via clk_put() once not needed anymore.
>> The values here are used only for error checking,
>> but weren't cleaned up until now.
>>
>> Fixes: c7bb4fc16ead ("clk: add MOXA ART SoCs clock driver")
>> Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
>> ---
>> drivers/clk/clk-moxart.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/clk/clk-moxart.c b/drivers/clk/clk-moxart.c
>> index 3786a0153ad1..7e191b1481bb 100644
>> --- a/drivers/clk/clk-moxart.c
>> +++ b/drivers/clk/clk-moxart.c
>> @@ -39,6 +39,7 @@ static void __init moxart_of_pll_clk_init(struct device_node *node)
>> pr_err("%pOF: of_clk_get failed\n", node);
>> return;
>> }
>> + clk_put(ref_clk);
>>
>> hw = clk_hw_register_fixed_factor(NULL, name, parent_name, 0, mul, 1);
>> if (IS_ERR(hw)) {
>> @@ -83,6 +84,7 @@ static void __init moxart_of_apb_clk_init(struct device_node *node)
>> pr_err("%pOF: of_clk_get failed\n", node);
>> return;
>> }
>> + clk_put(pll_clk);
>
> So this immediately drops the reference to the clk after of_clk_get() is
> called. Can we just remove these two of_clk_get() calls since they don't
> appear to be used?
Not if their purpose is to... idk...
check whether device_node is a clock at all, maybe?
* Re: [PATCH 6/8] sched_ext: Require an arena for cid-form schedulers
From: Emil Tsalapatis @ 2026-05-21 4:15 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Andrew Morton,
David Hildenbrand, Mike Rapoport, Emil Tsalapatis, sched-ext, bpf,
x86, linux-arm-kernel, linux-mm, linux-kernel
In-Reply-To: <20260520235052.4180316-7-tj@kernel.org>
On Wed May 20, 2026 at 7:50 PM EDT, Tejun Heo wrote:
> Upcoming patches will let the kernel place arena-resident scratch shared
> with the BPF program (e.g. per-CPU set_cmask cmask) so the BPF side can
> dereference it directly via __arena pointers, replacing the current
> cmask_copy_from_kernel() probe-read loop. That requires each cid-form
> scheduler to expose its arena to the kernel. Kernel-side accesses are
> recovered by the per-arena scratch-page mechanism.
>
> bpf_scx_reg_cid() walks the struct_ops member progs via
> bpf_struct_ops_for_each_prog() and reads each prog's arena via
> bpf_prog_arena(). The verifier enforces one arena per program, so each
> member prog contributes at most one arena. All non-NULL contributions must
> match and at least one member prog must use an arena. The map ref is held on
> scx_sched and dropped on sched destroy. cpu-form schedulers (bpf_scx_reg)
> are unchanged - no arena requirement.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/sched/ext.c | 56 ++++++++++++++++++++++++++++++++++++-
> kernel/sched/ext_internal.h | 8 ++++++
> 2 files changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 9c458552d14f..56f94ac32ba0 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5003,6 +5003,8 @@ static void scx_sched_free_rcu_work(struct work_struct *work)
>
> rhashtable_free_and_destroy(&sch->dsq_hash, NULL, NULL);
> free_exit_info(sch->exit_info);
> + if (sch->arena_map)
> + bpf_map_put(sch->arena_map);
> kfree(sch);
> }
>
> @@ -6746,6 +6748,7 @@ struct scx_enable_cmd {
> struct sched_ext_ops_cid *ops_cid;
> };
> bool is_cid_type;
> + struct bpf_map *arena_map; /* arena ref to transfer to sch */
> int ret;
> };
>
> @@ -6913,6 +6916,15 @@ static struct scx_sched *scx_alloc_and_add_sched(struct scx_enable_cmd *cmd,
> return ERR_PTR(ret);
> }
> #endif /* CONFIG_EXT_SUB_SCHED */
> +
> + /*
> + * Consume the arena_map ref bpf_scx_reg_cid() took. Defer to here so
> + * earlier failure paths leave cmd->arena_map set and bpf_scx_reg_cid
> + * drops the ref. After this point, sch owns the ref and any cleanup
> + * runs through scx_sched_free_rcu_work() which puts it.
> + */
> + sch->arena_map = cmd->arena_map;
> + cmd->arena_map = NULL;
> return sch;
>
> #ifdef CONFIG_EXT_SUB_SCHED
> @@ -7898,11 +7910,53 @@ static int bpf_scx_reg(void *kdata, struct bpf_link *link)
> return scx_enable(&cmd, link);
> }
>
> +struct scx_arena_scan {
> + struct bpf_map *arena;
> + int err;
Can we skip the int err here...
> +};
> +
> +/*
> + * The verifier enforces one arena per BPF program, so each struct_ops
> + * member prog contributes at most one arena via bpf_prog_arena().
> + * Require all non-NULL contributions to match.
> + */
> +static int scx_arena_scan_prog(struct bpf_prog *prog, void *data)
> +{
> + struct scx_arena_scan *s = data;
> + struct bpf_map *arena = bpf_prog_arena(prog);
> +
> + if (!arena)
> + return 0;
> + if (s->arena && s->arena != arena) {
> + s->err = -EINVAL;
...and just directly return -EINVAL here? bpf_struct_ops_for_each_prog
breaks when we return non-zero so do we need the extra scx_arena_scan
struct?
> + return 1;
> + }
> + s->arena = arena;
> + return 0;
> +}
> +
> static int bpf_scx_reg_cid(void *kdata, struct bpf_link *link)
> {
> struct scx_enable_cmd cmd = { .ops_cid = kdata, .is_cid_type = true };
> + struct scx_arena_scan scan = {};
> + int ret;
>
> - return scx_enable(&cmd, link);
> + bpf_struct_ops_for_each_prog(kdata, scx_arena_scan_prog, &scan);
> + if (scan.err) {
> + pr_err("sched_ext: cid-form scheduler uses multiple arena maps\n");
> + return scan.err;
> + }
> + if (!scan.arena) {
> + pr_err("sched_ext: cid-form scheduler must use a BPF arena map\n");
> + return -EINVAL;
> + }
> +
> + bpf_map_inc(scan.arena);
> + cmd.arena_map = scan.arena;
> + ret = scx_enable(&cmd, link);
> + if (cmd.arena_map) /* not consumed by scx_alloc_and_add_sched() */
> + bpf_map_put(cmd.arena_map);
> + return ret;
> }
>
> static void bpf_scx_unreg(void *kdata, struct bpf_link *link)
> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index 7258aea94b9f..d40cfd29ddaa 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -1111,6 +1111,14 @@ struct scx_sched {
> struct sched_ext_ops_cid ops_cid;
> };
> bool is_cid_type; /* true if registered via bpf_sched_ext_ops_cid */
> +
> + /*
> + * Arena map auto-discovered from member progs at struct_ops attach.
> + * cid-form schedulers must use exactly one arena across all member
> + * progs. NULL on cpu-form.
> + */
> + struct bpf_map *arena_map;
> +
> DECLARE_BITMAP(has_op, SCX_OPI_END);
>
> /*
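The suggestion above (return -EINVAL from the callback instead of
carrying an err field) can be sketched in user space as follows. This
assumes the iterator both stops on and propagates the first non-zero
callback result, which the thread does not confirm for
bpf_struct_ops_for_each_prog(). All sim_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a member prog and the arena it references. */
struct sim_prog {
	const void *arena;	/* NULL if the prog uses no arena */
};

typedef int (*sim_prog_cb)(const struct sim_prog *prog, void *data);

/* Assumed behaviour: stop at the first non-zero result and return it. */
static int sim_for_each_prog(const struct sim_prog *progs, size_t n,
			     sim_prog_cb cb, void *data)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&progs[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* The only state left is the arena found so far; errors are returned. */
static int sim_arena_scan_prog(const struct sim_prog *prog, void *data)
{
	const void **found = data;

	if (!prog->arena)
		return 0;
	if (*found && *found != prog->arena)
		return -EINVAL;
	*found = prog->arena;
	return 0;
}
```

With this shape the caller checks the iterator's return value for the
mismatch case and the found pointer for the "no arena at all" case.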
* Re: [PATCH 5/8] bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena()
From: Emil Tsalapatis @ 2026-05-21 4:08 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, Andrew Morton,
David Hildenbrand, Mike Rapoport, Emil Tsalapatis, sched-ext, bpf,
x86, linux-arm-kernel, linux-mm, linux-kernel
In-Reply-To: <20260520235052.4180316-6-tj@kernel.org>
On Wed May 20, 2026 at 7:50 PM EDT, Tejun Heo wrote:
> struct bpf_arena is opaque to callers outside arena.c. Add two helpers
> for struct_ops subsystems that need to reach into an arena:
>
> bpf_arena_map_kern_vm_start(struct bpf_map *map)
> returns @map's kern_vm_start. A sched_ext follow-up needs this
> to translate kern_va <-> uaddr.
>
> bpf_prog_arena(struct bpf_prog *prog)
> returns the bpf_map of the arena referenced by @prog (NULL if
> @prog references no arena). The verifier enforces at most one
> arena per program. Used by struct_ops callers that auto-discover
> an arena from a member prog and need to take a map reference.
>
> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> ---
> include/linux/bpf.h | 2 ++
> kernel/bpf/arena.c | 26 ++++++++++++++++++++++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5b99d786e98c..e1ba57c10aaa 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -618,6 +618,8 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
> struct bpf_spin_lock *spin_lock);
> u64 bpf_arena_get_kern_vm_start(struct bpf_arena *arena);
> u64 bpf_arena_get_user_vm_start(struct bpf_arena *arena);
> +u64 bpf_arena_map_kern_vm_start(struct bpf_map *map);
> +struct bpf_map *bpf_prog_arena(struct bpf_prog *prog);
> int bpf_obj_name_cpy(char *dst, const char *src, unsigned int size);
>
> struct bpf_offload_dev;
> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> index a811cf6170fa..51b9ae36feb6 100644
> --- a/kernel/bpf/arena.c
> +++ b/kernel/bpf/arena.c
> @@ -84,6 +84,32 @@ u64 bpf_arena_get_user_vm_start(struct bpf_arena *arena)
> return arena ? arena->user_vm_start : 0;
> }
>
> +/**
> + * bpf_arena_map_kern_vm_start - kern_vm_start lookup by struct bpf_map *
> + * @map: a BPF_MAP_TYPE_ARENA map
> + *
> + * Return @map's kern_vm_start.
> + */
> +u64 bpf_arena_map_kern_vm_start(struct bpf_map *map)
> +{
> + return bpf_arena_get_kern_vm_start(container_of(map, struct bpf_arena, map));
> +}
> +
> +/**
> + * bpf_prog_arena - return the bpf_map of the arena referenced by @prog
> + * @prog: a loaded BPF program
> + *
> + * The verifier enforces at most one arena per program and stores it in
> + * prog->aux->arena. Return that arena's underlying bpf_map, or NULL if
> + * @prog does not reference an arena.
> + */
> +struct bpf_map *bpf_prog_arena(struct bpf_prog *prog)
> +{
> + struct bpf_arena *arena = prog->aux->arena;
> +
> + return arena ? &arena->map : NULL;
> +}
> +
> static long arena_map_peek_elem(struct bpf_map *map, void *value)
> {
> return -EOPNOTSUPP;