* Re: [PATCH RFC] ACPI: processor: idle: Do not propagate acpi_processor_ffh_lpi_probe() -ENODEV
From: lihuisong (C) @ 2026-04-20 6:38 UTC
To: Rafael J. Wysocki, Breno Leitao
Cc: Sudeep Holla, Len Brown, lpieralisi, catalin.marinas, will,
Rafael J. Wysocki, linux-acpi, linux-kernel, pjaroszynski,
guohanjun, linux-arm-kernel, rmikey, kernel-team, lihuisong
In-Reply-To: <CAJZ5v0hftnagiLTPCEjqviyug5g5NQCztX1o0w5NOY8d=5cz+g@mail.gmail.com>
On 4/15/2026 10:03 PM, Rafael J. Wysocki wrote:
> On Wed, Apr 15, 2026 at 3:32 AM lihuisong (C) <lihuisong@huawei.com> wrote:
>>
>> On 4/14/2026 8:25 PM, Sudeep Holla wrote:
>>> On Tue, Apr 14, 2026 at 07:31:29PM +0800, lihuisong (C) wrote:
>>>> On 4/14/2026 6:21 PM, Breno Leitao wrote:
>>>>> Hello Huisong,
>>>>>
>>>>> On Tue, Apr 14, 2026 at 05:43:51PM +0800, lihuisong (C) wrote:
>>>>>> But it is a real issue. Thanks for your report.
>>>>>> I think the best way to fix your issue is to remove this verification in
>>>>>> psci_acpi_cpu_init_idle(),
>>>>>> because it is legal for a platform to report only one LPI state.
>>>>>> This function just needs to verify the LPI states which are FFH.
>>>>> Thank you for the prompt feedback.
>>>>>
>>>>> Would this approach work?
>>>>>
>>>>> commit 6c9d52840a4f778cc989838ba76ee51416e85de3
>>>>> Author: Breno Leitao <leitao@debian.org>
>>>>> Date: Tue Apr 14 03:16:08 2026 -0700
>>>>>
>>>>> ACPI: processor: idle: Allow platforms with only one LPI state
>>>>> psci_acpi_cpu_init_idle() rejects platforms where power.count - 1 <= 0
>>>>> by returning -ENODEV. However, having a single LPI state (WFI) is a
>>>>> valid configuration. The function's purpose is to verify FFH idle states,
>>>>> and when count is zero, there are simply no FFH states to validate —
>>>>> this is not an error.
>>>>> On NVIDIA Grace (aarch64) systems with PSCIv1.1, power.count is 1 for
>>>>> all 72 CPUs, so the probe fails with -ENODEV. After commit cac173bea57d
>>>>> ("ACPI: processor: idle: Rework the handling of
>>>>> acpi_processor_ffh_lpi_probe()"), this failure propagates up and prevents
>>>>> cpuidle registration entirely.
>>>>> Change the check from (count <= 0) to (count < 0) so that platforms
>>>>> with only WFI are accepted. The for loop naturally handles count == 0
>>>>> by not iterating.
>>>>> Fixes: cac173bea57d ("ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()")
>>>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>>>>
>>>>> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
>>>>> index 801f9c4501425..7791b751042ce 100644
>>>>> --- a/drivers/acpi/arm64/cpuidle.c
>>>>> +++ b/drivers/acpi/arm64/cpuidle.c
>>>>> @@ -31,7 +31,7 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
>>>>> return -EOPNOTSUPP;
>>>>> count = pr->power.count - 1;
>>>>> - if (count <= 0)
>>>>> + if (count < 0)
>>>>> return -ENODEV;
>>>>> for (i = 0; i < count; i++) {
>>>> This count is already verified in acpi_processor_get_lpi_info().
>>>>
>>>> I suggest modifying it as below:
>>>>
>>>> -->
>>>>
>>>> git diff
>>>> diff --git a/drivers/acpi/arm64/cpuidle.c b/drivers/acpi/arm64/cpuidle.c
>>>> index 801f9c450142..c68a5db8ebba 100644
>>>> --- a/drivers/acpi/arm64/cpuidle.c
>>>> +++ b/drivers/acpi/arm64/cpuidle.c
>>>> @@ -16,7 +16,7 @@
>>>>
>>>> static int psci_acpi_cpu_init_idle(unsigned int cpu)
>>>> {
>>>> - int i, count;
>>>> + int i;
>>>> struct acpi_lpi_state *lpi;
>>>> struct acpi_processor *pr = per_cpu(processors, cpu);
>>>>
>>>> @@ -30,14 +30,10 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
>>>> if (!psci_ops.cpu_suspend)
>>>> return -EOPNOTSUPP;
>>>>
>>>> - count = pr->power.count - 1;
>>>> - if (count <= 0)
>>>> - return -ENODEV;
>>>> -
>>> It was intentionally designed this way, as there is little value in defining
>>> only WFI in the _LPI tables. In the absence of a cpuidle driver/LPI entry,
>>> arch_cpu_idle() is invoked, which is sufficient and avoids unnecessary
>>> complexity, only to ultimately execute wfi() anyway.
>> Yeah, that's correct. The code flow would be simpler and more efficient.
>> This looks good to me.
>>
>>
>> But before my commit cac173bea57d ("ACPI: processor: idle: Rework the
>> handling of acpi_processor_ffh_lpi_probe()"), the per-CPU cpuidle sysfs
>> entries were created even when firmware reported only the WFI state.
>> On these platforms they are no longer created now, and the statistics
>> for state0 are missing as well.
>> This change in behavior is visible to user space. I'm not sure if it is
>> acceptable.
>> What do you think, Rafael?
> I think that it would be good to restore the previous behavior,
> especially if it has been changed inadvertently.
Agreed.
Can you send it again using my proposal, @breno?
We can send another patch to discuss the optimization Sudeep mentioned,
if needed.
>
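The count arithmetic discussed in this thread can be modelled in user space. This sketch assumes index 0 of the _LPI states is WFI and indices 1..power.count-1 are the FFH states to validate; `state_is_valid()` is a hypothetical stand-in for the real PSCI power-state check, and neither function is the actual drivers/acpi/arm64/cpuidle.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the PSCI power_state check: treat a state
 * as valid only if bits [63:32] of its address are zero.
 */
static int state_is_valid(uint64_t address)
{
	return (address >> 32) == 0;
}

/* Model of the original check: a WFI-only table (power_count == 1) fails. */
static int init_idle_old(const uint64_t *lpi_address, int power_count)
{
	int i, count = power_count - 1;

	if (count <= 0)
		return -ENODEV;
	for (i = 0; i < count; i++)
		if (!state_is_valid(lpi_address[i + 1]))
			return -EINVAL;
	return 0;
}

/*
 * Model without the early return: with only WFI present the loop body
 * never runs, so there is nothing to validate and 0 is returned.
 */
static int init_idle_new(const uint64_t *lpi_address, int power_count)
{
	int i;

	/* Empty tables are assumed to be rejected earlier, during _LPI parsing. */
	assert(power_count >= 1);
	for (i = 1; i < power_count; i++)
		if (!state_is_valid(lpi_address[i]))
			return -EINVAL;
	return 0;
}
```

On a WFI-only platform the old model returns -ENODEV while the new one returns 0; both still reject an invalid FFH state.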
* RE: [PATCH v1] clk: imx95-blk-ctl: Fix REFCLK rise-fall mismatch on i.MX95
From: Hongxing Zhu @ 2026-04-20 6:44 UTC
To: Peng Fan, abelvesa@kernel.org, mturquette@baylibre.com,
sboyd@kernel.org, Frank Li, s.hauer@pengutronix.de,
festevam@gmail.com
Cc: linux-clk@vger.kernel.org, imx@nxp.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, kernel@pengutronix.de
In-Reply-To: <PAXPR04MB845979CEDF7426A07BCB6FA388202@PAXPR04MB8459.eurprd04.prod.outlook.com>
> -----Original Message-----
> From: Peng Fan <peng.fan@nxp.com>
> Sent: Friday, April 17, 2026 5:22 PM
> To: Hongxing Zhu <hongxing.zhu@nxp.com>; abelvesa@kernel.org;
> mturquette@baylibre.com; sboyd@kernel.org; Frank Li <frank.li@nxp.com>;
> s.hauer@pengutronix.de; festevam@gmail.com
> Cc: linux-clk@vger.kernel.org; imx@nxp.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; kernel@pengutronix.de
> Subject: RE: [PATCH v1] clk: imx95-blk-ctl: Fix REFCLK rise-fall mismatch on i.MX95
>
> Hi Richard,
>
> > Subject: [PATCH v1] clk: imx95-blk-ctl: Fix REFCLK rise-fall mismatch on i.MX95
> >
> > When the internal PLL is used as the PCIe reference clock source on
> > i.MX95, a REFCLK rise-fall time mismatch is observed during PCIe Gen1
> > compliance testing with the Lfast IO analyzer.
> >
> > Fix this issue by configuring the IREF_TX field to 0xF (15), which
> > adjusts the transmitter current reference to meet the PCIe
> > specification timing requirements.
>
> The BLK CTRL in HSIOMIX should save/restore the settings you configured in
> the probe phase.
Hi Peng:
The register containing the pre-configured settings is the same as the
gate-clock register. Therefore, its value will be saved and restored during
the suspend/resume procedures.
Thanks.
Best Regards
Richard Zhu
>
> Regards
> Peng.
>
> >
> > Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
> > ---
> > drivers/clk/imx/clk-imx95-blk-ctl.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/clk/imx/clk-imx95-blk-ctl.c b/drivers/clk/imx/clk-imx95-blk-ctl.c
> > index 1f9259f45607..bc6957299cec 100644
> > --- a/drivers/clk/imx/clk-imx95-blk-ctl.c
> > +++ b/drivers/clk/imx/clk-imx95-blk-ctl.c
> > @@ -44,6 +44,8 @@ struct imx95_blk_ctl_clk_dev_data {
> > const char * const *parent_names;
> > u32 num_parents;
> > u32 reg;
> > + u32 reg_init_msk;
> > + u32 reg_init_val;
> > u32 bit_idx;
> > u32 bit_width;
> > u32 clk_type;
> > @@ -289,6 +291,8 @@ static const struct imx95_blk_ctl_clk_dev_data hsio_blk_ctl_clk_dev_data[] = {
> > .parent_names = (const char *[]){ "func_out_en", },
> > .num_parents = 1,
> > .reg = 0,
> > + .reg_init_msk = GENMASK(10, 7),
> > + .reg_init_val = GENMASK(10, 7),
> > .bit_idx = 6,
> > .bit_width = 1,
> > .type = CLK_GATE,
> > @@ -410,6 +414,9 @@ static int imx95_bc_probe(struct platform_device *pdev)
> > const struct imx95_blk_ctl_clk_dev_data *data = &bc->pdata->clk_dev_data[i];
> > void __iomem *reg = base + data->reg;
> >
> > + if (data->reg_init_msk)
> > + writel((readl(reg) & ~data->reg_init_msk) |
> > + data->reg_init_val, reg);
> > +
> > if (data->type == CLK_MUX) {
> > hws[i] = clk_hw_register_mux(dev, data->name, data->parent_names,
> > data->num_parents, data->flags, reg,
> > --
> > 2.37.1
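A user-space sketch of the read-modify-write that the probe hunk performs. It shows that GENMASK(10, 7) covers a 4-bit field whose all-ones value is 0xF (the IREF_TX value from the commit message), and that bit 6, the gate bit, is left untouched. `GENMASK32()` and `apply_reg_init()` are hypothetical helpers written for this sketch; the kernel's own GENMASK() lives in <linux/bits.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit user-space equivalent of the kernel's GENMASK(h, l). */
#define GENMASK32(h, l) ((~0u << (l)) & (~0u >> (31 - (h))))

/* Mirrors (readl(reg) & ~msk) | val from the patch, on a plain value. */
static uint32_t apply_reg_init(uint32_t old, uint32_t msk, uint32_t val)
{
	/* The init value must not set bits outside its mask. */
	assert((val & ~msk) == 0);
	return (old & ~msk) | val;
}
```

Because only bits 10:7 change, an already-set gate bit (0x40) survives the write: 0x40 becomes 0x7C0.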
* [PATCH v2] drm/mediatek: hdmi: Convert DRM_ERROR() to drm_err()
From: sai madhu @ 2026-04-20 6:45 UTC
To: Chun-Kuang Hu
Cc: Philipp Zabel, dri-devel, linux-mediatek, linux-kernel,
linux-arm-kernel, sai madhu
The DRM_ERROR() macro is deprecated in favor of drm_err() which
provides device-specific logging.
Replace DRM_ERROR() with drm_err() in the Mediatek HDMI bridge
driver and pass the drm_device pointer via bridge->dev.
No functional change intended.
Signed-off-by: sai madhu <suryasaimadhu369@gmail.com>
---
drivers/gpu/drm/mediatek/mtk_hdmi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/mediatek/mtk_hdmi.c b/drivers/gpu/drm/mediatek/mtk_hdmi.c
index 1ea259854780..4ddcdbf7bc8c 100644
--- a/drivers/gpu/drm/mediatek/mtk_hdmi.c
+++ b/drivers/gpu/drm/mediatek/mtk_hdmi.c
@@ -981,8 +981,8 @@ static int mtk_hdmi_bridge_attach(struct drm_bridge *bridge,
int ret;
if (!(flags & DRM_BRIDGE_ATTACH_NO_CONNECTOR)) {
- DRM_ERROR("%s: The flag DRM_BRIDGE_ATTACH_NO_CONNECTOR must be supplied\n",
- __func__);
+ drm_err(bridge->dev,
+ "DRM_BRIDGE_ATTACH_NO_CONNECTOR must be supplied\n");
return -EINVAL;
}
--
2.34.1
* Re: [PATCH v6 01/30] mm: Introduce kpkeys
From: Kevin Brodsky @ 2026-04-20 6:46 UTC
To: David Hildenbrand (Arm), linux-hardening
Cc: linux-kernel, Andrew Morton, Andy Lutomirski, Catalin Marinas,
Dave Hansen, Ira Weiny, Jann Horn, Jeff Xu, Joey Gouly, Kees Cook,
Linus Walleij, Lorenzo Stoakes, Marc Zyngier, Mark Brown,
Matthew Wilcox, Maxwell Bland, Mike Rapoport (IBM),
Peter Zijlstra, Pierre Langlois, Quentin Perret, Rick Edgecombe,
Ryan Roberts, Thomas Gleixner, Vlastimil Babka, Will Deacon,
Yang Shi, Yeoreum Yun, linux-arm-kernel, linux-mm, x86
In-Reply-To: <cd2bcc09-2507-4ed4-bb92-2d53baedaf04@kernel.org>
On 17/04/2026 19:38, David Hildenbrand (Arm) wrote:
> On 4/17/26 17:59, Kevin Brodsky wrote:
>> On 17/04/2026 16:37, David Hildenbrand (Arm) wrote:
>>> On 2/27/26 18:54, Kevin Brodsky wrote:
>>>> kpkeys is a simple framework to enable the use of protection keys
>>>> (pkeys) to harden the kernel itself. This patch introduces the basic
>>>> API in <linux/kpkeys.h>: a couple of functions to set and restore
>>>> the pkey register and macros to define guard objects.
>>>>
>>>> kpkeys introduces a new concept on top of pkeys: the kpkeys level.
>>>> Each level is associated with a set of permissions for the pkeys
>>>> managed by the kpkeys framework. kpkeys_set_level(lvl) sets those
>>>> permissions according to lvl, and returns the original pkey
>>>> register, to be later restored by kpkeys_restore_pkey_reg(). To
>>>> start with, only KPKEYS_LVL_DEFAULT is available, which is meant
>>>> to grant RW access to KPKEYS_PKEY_DEFAULT (i.e. all memory since
>>>> this is the only available pkey for now).
>>>>
>>>> Because each architecture implementing pkeys uses a different
>>>> representation for the pkey register, and may reserve certain pkeys
>>>> for specific uses, support for kpkeys must be explicitly indicated
>>>> by selecting ARCH_HAS_KPKEYS and defining the following functions in
>>>> <asm/kpkeys.h>, in addition to the macros provided in
>>>> <asm-generic/kpkeys.h>:
>>>>
>>>> - arch_kpkeys_set_level()
>>>> - arch_kpkeys_restore_pkey_reg()
>>>> - arch_kpkeys_enabled()
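A minimal user-space model of the set/restore pattern described above, assuming the pkey register can be represented as a plain 64-bit variable and that KPKEYS_LVL_DEFAULT maps to a made-up value `KPKEYS_REG_DEFAULT`. The guard uses the GCC/Clang `cleanup` attribute, which the kernel's scope-based guard helpers also build on; `KPKEYS_GUARD()`, `kpkeys_guard_exit()` and `reg_inside_guard()` are hypothetical names, not the macros from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum { KPKEYS_LVL_DEFAULT = 0 };

/* Hypothetical register encoding: RW for the default pkey. */
#define KPKEYS_REG_DEFAULT 0x3u

/* Stand-in for the architectural pkey register. */
static uint64_t pkey_reg;

/* Sets the permissions for lvl and returns the previous register value. */
static uint64_t kpkeys_set_level(int lvl)
{
	uint64_t prev = pkey_reg;

	assert(lvl == KPKEYS_LVL_DEFAULT); /* the only level in this model */
	pkey_reg = KPKEYS_REG_DEFAULT;
	return prev;
}

static void kpkeys_restore_pkey_reg(uint64_t reg)
{
	pkey_reg = reg;
}

struct kpkeys_guard {
	uint64_t saved;
};

static void kpkeys_guard_exit(struct kpkeys_guard *g)
{
	kpkeys_restore_pkey_reg(g->saved);
}

/* Saves the register on entry; restores it when the guard leaves scope. */
#define KPKEYS_GUARD(name, lvl)						\
	struct kpkeys_guard name					\
		__attribute__((cleanup(kpkeys_guard_exit), unused)) =	\
		{ .saved = kpkeys_set_level(lvl) }

/* Returns the register value observed inside a guarded scope. */
static uint64_t reg_inside_guard(void)
{
	KPKEYS_GUARD(g, KPKEYS_LVL_DEFAULT);
	return pkey_reg; /* evaluated before the cleanup runs */
}
```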
>>> Another thing: why not simply drop the "arch_" stuff from these helpers?
>> The first two are not meant to be directly called, they're the
>> arch-specific implementation of kpkeys_set_level() and
>> kpkeys_restore_pkey_reg(), and those generic functions handle some
>> generic logic.
>>
>> arch_kpkeys_enabled() is directly used in generic code, so I suppose it
>> could be renamed to kpkeys_enabled()? It's actually implemented in an
>> arch header so I wasn't too sure about it.
> I was skimming over patch #13 and spotted:
>
> +void __init kpkeys_hardened_pgtables_init(void)
> +{
> +	if (!arch_kpkeys_enabled())
> +		return;
> +
> +	static_branch_enable(&kpkeys_hardened_pgtables_key);
> +}
>
> The arch_* there can just go IMHO.
>
> I'd also do it for the two ones used by the GUARD macros. If we don't
> expect common code wrappers (arch_kpkeys_enabled() vs. kpkeys_enabled),
> then the arch_ is unnecessary information -- IMHO
Makes sense. I could just rename arch_kpkeys_enabled() to
kpkeys_enabled(), but I'm thinking having an arch abstraction could be
clearer, after looking into protecting sparse-vmemmap page tables. The
new version would look like this:
* <asm/kpkeys.h>:
  - arch_supports_kpkeys()
  - arch_supports_kpkeys_early() [can be called before features have
    been detected]
* <linux/kpkeys.h> defines:
  - kpkeys_enabled() -> arch_supports_kpkeys()
  - kpkeys_hardened_pgtables_enabled() -> static key
  - kpkeys_hardened_pgtables_early_enabled() ->
    arch_supports_kpkeys_early() [called when setting up sparse-vmemmap,
    linear map, etc.]
There is extra #ifdef'ing going on in <linux/kpkeys.h>, but
<asm/kpkeys.h> doesn't need to worry about it. I think this might be
easier to follow; I don't much like having an interface function
like kpkeys_enabled() defined in an arch header (not great for
kernel-doc comments either). Any thoughts?
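The proposed layering, modelled in user space with plain booleans standing in for CPU feature detection and for the static key. Everything here is a hypothetical sketch: `boot_cpu_has_pkeys` and `features_detected` are made-up inputs, and the assumption that arch_supports_kpkeys() is only valid after feature detection is taken from the "[can be called before features have been detected]" note on the early variant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for CPU feature detection. */
static bool boot_cpu_has_pkeys; /* readable very early in this model */
static bool features_detected;  /* set once system-wide detection is done */

/* <asm/kpkeys.h> side: arch-only queries. */
static bool arch_supports_kpkeys(void)
{
	assert(features_detected); /* only valid after feature detection */
	return boot_cpu_has_pkeys;
}

static bool arch_supports_kpkeys_early(void)
{
	return boot_cpu_has_pkeys;
}

/* <linux/kpkeys.h> side: generic interface built on the arch hooks. */
static bool kpkeys_hardened_pgtables_key; /* stand-in for a static key */

static bool kpkeys_enabled(void)
{
	return arch_supports_kpkeys();
}

static bool kpkeys_hardened_pgtables_enabled(void)
{
	return kpkeys_hardened_pgtables_key;
}

static bool kpkeys_hardened_pgtables_early_enabled(void)
{
	return arch_supports_kpkeys_early();
}

static void kpkeys_hardened_pgtables_init(void)
{
	if (!kpkeys_enabled())
		return;
	kpkeys_hardened_pgtables_key = true; /* static_branch_enable() */
}
```

Generic code then only ever calls kpkeys_*() helpers, and kernel-doc can live next to them in <linux/kpkeys.h>.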
- Kevin
* Re: [PATCH v2 1/3] MAINTAINERS: Move Peter De Schrijver to CREDITS
From: Geert Uytterhoeven @ 2026-04-20 6:50 UTC
To: Thierry Reding
Cc: Aaro Koskinen, linux-tegra, linux-arm-kernel, linux-pm,
linux-omap, linux-m68k, devicetree, linux-kernel, Paul Walmsley
In-Reply-To: <20260417131549.3154534-1-thierry.reding@kernel.org>
Hi Thierry,
On Fri, 17 Apr 2026 at 15:15, Thierry Reding <thierry.reding@kernel.org> wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> Peter sadly passed away a while back. Paul did a much better job at
> finding the right words to mourn this loss than I ever could, so I will
> leave this link here:
>
> https://lore.kernel.org/lkml/alpine.DEB.2.21.999.2407240345480.11116@utopia.booyaka.com/T/#u
>
> Co-developed-by: Paul Walmsley <pjw@kernel.org>
> Co-developed-by: Aaro Koskinen <aaro.koskinen@iki.fi>
> Co-developed-by: Geert Uytterhoeven <geert@linux-m68k.org>
"every Co-developed-by: must be immediately
followed by a Signed-off-by: of the associated co-author."
https://elixir.bootlin.com/linux/v7.0/source/Documentation/process/submitting-patches.rst#L506
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Changes in v2:
> - add more missing entries
Thanks!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [PATCH] drm/bridge: imx8qxp-pxl2dpi: avoid of_node_put() on ERR_PTR()
From: Liu Ying @ 2026-04-20 6:53 UTC
To: Guangshuo Li, Frank Li
Cc: Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Luca Ceresoli, dri-devel,
imx, linux-arm-kernel, linux-kernel, stable
In-Reply-To: <CANUHTR8FaXLX+Nbeb7+sWRF9jQ5SoBgWc2y_LVD38KE7TqsxeQ@mail.gmail.com>
On Mon, Apr 20, 2026 at 10:19:35AM +0800, Guangshuo Li wrote:
>
> Hi Frank,
>
> Thanks for the review.
>
> On Mon, 20 Apr 2026 at 09:56, Frank Li <Frank.li@nxp.com> wrote:
>>
>>
>> Please fix
>> DEFINE_FREE(device_node, struct device_node *, if (_T) of_node_put(_T))
>>
>> If (!IS_ERR(_T))
>>
>
> You're right, fixing DEFINE_FREE(device_node, ...) is the proper way
> to handle this:
> if (_T && !IS_ERR(_T)) of_node_put(_T)
This would be intrusive because it effectively changes the cleanup action.
A similar case[1] was handled by ensuring that only a NULL pointer is
returned on error, which is what i2c_of_probe_get_i2c_node()[2] does
now.
[1] https://lore.kernel.org/all/Zw-VkQ3di5nFHiXB@smile.fi.intel.com/
[2] https://elixir.bootlin.com/linux/v7.0/source/drivers/i2c/i2c-core-of-prober.c#L38-L58
BTW, even if the cleanup action needs to be changed, the 'if' condition
should be '!IS_ERR_OR_NULL(_T)'.
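To make the condition concrete: with a cleanup test of just `if (_T)`, an ERR_PTR() value would reach of_node_put(). The user-space model below re-implements ERR_PTR()/IS_ERR_OR_NULL() and replaces struct device_node and of_node_put() with hypothetical stand-ins; `free_device_node()` and `use_node()` are illustrative names. It only shows what `!IS_ERR_OR_NULL(_T)` filters out and takes no side on whether DEFINE_FREE() should change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space re-implementations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error)
{
	return (void *)error;
}

static bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for struct device_node and of_node_put(). */
struct device_node {
	int refcount;
};

static int nr_puts;

static void of_node_put(struct device_node *np)
{
	assert(np->refcount > 0);
	np->refcount--;
	nr_puts++;
}

/* Cleanup hook using the '!IS_ERR_OR_NULL(_T)' condition. */
static void free_device_node(struct device_node **p)
{
	if (!IS_ERR_OR_NULL(*p))
		of_node_put(*p);
}

/* Holds np in a scoped variable; the reference is dropped on scope exit. */
static void use_node(struct device_node *np)
{
	struct device_node *node
		__attribute__((cleanup(free_device_node), unused)) = np;
}
```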
>
> This is a better fix than handling it only in this driver.
>
> I'll rework the patch based on your suggestion and send v2 later.
>
> Thanks,
> Guangshuo
--
Regards,
Liu Ying
* Re: [PATCH 2/2] arm64: dts: rockchip: Replace deprecated snps,* props for NanoPi R5S
From: Tianling Shen @ 2026-04-20 6:58 UTC
To: Diederik de Haas, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner
Cc: Arnd Bergmann, devicetree, linux-arm-kernel, linux-rockchip,
linux-kernel, Quentin Schulz, Jonas Karlman
In-Reply-To: <DHTSOV43O2EX.38TGASN7SQEZL@cknow-tech.com>
On 2026/4/15 22:23, Diederik de Haas wrote:
> On Wed Apr 1, 2026 at 3:11 PM CEST, Diederik de Haas wrote:
>> The various snps,reset-* properties are deprecated, so convert them into
>> their replacements.
>>
>> Signed-off-by: Diederik de Haas <diederik@cknow-tech.com>
>> ---
>> arch/arm64/boot/dts/rockchip/rk3568-nanopi-r5s.dts | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3568-nanopi-r5s.dts b/arch/arm64/boot/dts/rockchip/rk3568-nanopi-r5s.dts
>> index 90ce6f0e1dcf..92d044ec696b 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3568-nanopi-r5s.dts
>> +++ b/arch/arm64/boot/dts/rockchip/rk3568-nanopi-r5s.dts
>> @@ -85,10 +85,6 @@ &gmac0_tx_bus2
>> &gmac0_rx_bus2
>> &gmac0_rgmii_clk
>> &gmac0_rgmii_bus>;
>> - snps,reset-gpio = <&gpio0 RK_PC5 GPIO_ACTIVE_LOW>;
>> - snps,reset-active-low;
>> - /* Reset time is 15ms, 50ms for rtl8211f */
>> - snps,reset-delays-us = <0 15000 50000>;
>> tx_delay = <0x3c>;
>> rx_delay = <0x2f>;
>> status = "okay";
>> @@ -100,6 +96,9 @@ rgmii_phy0: ethernet-phy@1 {
>> reg = <1>;
>> pinctrl-0 = <&gmac0_rstn_gpio0_c5_pin>;
>> pinctrl-names = "default";
>> + reset-assert-us = <15000>;
>> + reset-deassert-us = <50000>;
>> + reset-gpios = <&gpio0 RK_PC5 GPIO_ACTIVE_LOW>;
>> };
>> };
>>
>
> Please disregard/drop this patch.
>
> I was recently made aware of 'sashiko.dev' and checked whether it had
> also checked my patch, which it did:
> https://sashiko.dev/#/patchset/20260401131551.734456-1-diederik%40cknow-tech.com
>
> And it turns out that the concern raised is valid (thanks Quentin!), so
> this patch could introduce a regression.
> So it looks like staying with the deprecated properties is actually
> better (in this case?).
Well, actually we more or less rely on U-Boot to reset the PHY first now.
Many Rockchip boards in the tree require a reset before the PHY can be
recognized, yet they just use the generic "ethernet-phy-ieee802.3-c22"
compatible.
Another option, though, is to move the reset properties to the mdio node
instead of the PHY node.
Thanks,
Tianling.
>
> Cheers,
> Diederik
* Re: [PATCH v7 2/4] KVM: arm64: PMU: Protect the list of PMUs with RCU
From: Marc Zyngier @ 2026-04-20 7:01 UTC
To: Akihiko Odaki
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
In-Reply-To: <483e5cf2-a54c-4781-ac6d-49f5bc7128ba@rsg.ci.i.u-tokyo.ac.jp>
On Mon, 20 Apr 2026 07:21:45 +0100,
Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
>
> On 2026/04/19 23:34, Marc Zyngier wrote:
> > On Sat, 18 Apr 2026 09:14:24 +0100,
> > Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
> >>
> >> Convert the list of PMUs to an RCU-protected list that has primitives to
> >> avoid read-side contention.
> >>
> >> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> >> ---
> >> arch/arm64/kvm/pmu-emul.c | 14 ++++++--------
> >> 1 file changed, 6 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> >> index 59ec96e09321..ef5140bbfe28 100644
> >> --- a/arch/arm64/kvm/pmu-emul.c
> >> +++ b/arch/arm64/kvm/pmu-emul.c
> >> @@ -7,9 +7,9 @@
> >> #include <linux/cpu.h>
> >> #include <linux/kvm.h>
> >> #include <linux/kvm_host.h>
> >> -#include <linux/list.h>
> >> #include <linux/perf_event.h>
> >> #include <linux/perf/arm_pmu.h>
> >> +#include <linux/rculist.h>
> >> #include <linux/uaccess.h>
> >> #include <asm/kvm_emulate.h>
> >> #include <kvm/arm_pmu.h>
> >> @@ -26,7 +26,6 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc);
> >> bool kvm_supports_guest_pmuv3(void)
> >> {
> >> - guard(mutex)(&arm_pmus_lock);
> >> return !list_empty(&arm_pmus);
> >
> > Please read include/linux/rculist.h and the discussion about the
> > interaction of list_empty() with RCU-protected lists. How about using
> > list_first_or_null_rcu() for peace of mind?
>
> list_first_or_null_rcu() is useful to replace a sequence of
> list_empty() and list_first_entry() that is protected by a lock, but
> this function instead requires the invariant that nobody deletes an
> element from the list, and list_first_or_null_rcu() does not allow
> removing the requirement.
>
> The header file says:
> > Where are list_empty_rcu() and list_first_entry_rcu()?
> >
> > They do not exist because they would lead to subtle race conditions:
> >
> > if (!list_empty_rcu(mylist)) {
> > struct foo *bar = list_first_entry_rcu(mylist, struct foo,
> > list_member);
> > do_something(bar);
> > }
> >
> > The list might be non-empty when list_empty_rcu() checks it, but it
> > might have become empty by the time that list_first_entry_rcu()
> > rereads the ->next pointer, which would result in a SEGV.
> >
> > When not using RCU, it is OK for list_first_entry() to re-read that
> > pointer because both functions should be protected by some lock that
> > blocks writers.
> >
> > When using RCU, list_empty() uses READ_ONCE() to fetch the
> > RCU-protected ->next pointer and then compares it to the address of
> > the list head. However, it neither dereferences this pointer nor
> > provides this pointer to its caller. Thus, READ_ONCE() suffices
> > (that is, rcu_dereference() is not needed), which means that
> > list_empty() can be used anywhere you would want to use
> > list_empty_rcu(). Just don't expect anything useful to happen if you
> > do a subsequent lockless call to list_first_entry_rcu()!!!
> >
> > See list_first_or_null_rcu for an alternative.
>
> However, kvm_supports_guest_pmuv3() locked a mutex when calling
> list_empty() and unlocked it immediately after that, instead of
> re-reading list_first_entry(). This construct inherently had a race
> condition with code that deletes an element; when the caller of
> kvm_supports_guest_pmuv3() decides to enable guest PMUv3, the host PMU
> may have gone away. But it was still safe because no one deletes an
> element.
>
> The same logic also applies when using RCU. As the comment says, we
> can use list_empty() instead of the hypothetical list_empty_rcu()
> macro because we don't expect it to magically enable something like
> list_first_entry_rcu(). This function instead keeps relying on the fact
> that no one deletes an element of the list.
And that's exactly the sort of thing I am trying to plan for. *Should*
we introduce a way to remove PMUs from the list, this predicate
becomes unsafe.
So I want at least a comment explaining this to the unsuspecting
reader, as this is rather subtle.
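A user-space model of the distinction discussed above, using C11 atomics as an approximation: a relaxed load stands in for READ_ONCE() and an acquire load for rcu_dereference(). This is not RCU (there are no grace periods), and `struct node` plus both helpers are hypothetical. The point is that a list_first_or_null_rcu()-style helper reads ->next once and returns that same pointer, while a list_empty() result says nothing about what a later re-read would see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Circular list: an empty list's head points to itself, as with list_head. */
struct node {
	_Atomic(struct node *) next;
	int id;
};

/* Like list_empty(): one load, compared with the head, never dereferenced. */
static bool model_list_empty(struct node *head)
{
	return atomic_load_explicit(&head->next, memory_order_relaxed) == head;
}

/*
 * Like list_first_or_null_rcu(): ->next is read exactly once, and both the
 * emptiness test and the returned pointer come from that single read.
 */
static struct node *model_first_or_null(struct node *head)
{
	struct node *first;

	assert(head != NULL);
	first = atomic_load_explicit(&head->next, memory_order_acquire);
	return first == head ? NULL : first;
}
```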
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v3 1/1] kernel: kprobes: fix cur_kprobe corruption during re-entrant kprobe_busy_begin() calls
From: Khaja Hussain Shaik Khaji @ 2026-04-20 7:05 UTC
To: mark.rutland
Cc: catalin.marinas, dev.jain, linux-kernel, mhiramat, linux-arm-msm,
will, linux-arm-kernel, yang
In-Reply-To: <aaWS20g-jGu8mCKH@J2N7QTR9R3>
On Mon, Mar 02, 2026 at 01:38:35PM +0000, Mark Rutland wrote:
> That suggests that something is going wrong *within* your entry handler
> that causes IRQs to be unmasked unexpectedly.
>
> Please can we find out *exactly* where IRQs get unmasked for the first
> time?
Thanks for the pointer -- that was the right direction to look.
You are correct. I confirmed that arm64_enter_el1_dbg() does NOT re-enable
IRQs; it only manages lockdep and context-tracking state. The IRQ unmask
originates entirely within our kretprobe entry_handler itself.
The exact call chain is:
  pre_handler_kretprobe()
    entry_dwc3_gadget_pullup()   <- kretprobe entry_handler
      dwc3_msm_notify_event()
        _raw_spin_unlock_irq()   <- first IRQ unmask (spin_unlock_irq)
dwc3_msm_notify_event() is called from within the entry_handler while
holding a spinlock acquired with spin_lock_irq() (i.e. IRQs were disabled
on lock, and re-enabled unconditionally on unlock via spin_unlock_irq /
_raw_spin_unlock_irq). This is the first point at which IRQs become
unmasked.
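A simplified model of why that unlock unmasks IRQs even though the handler was entered with them masked: the _irq lock variants disable and then unconditionally re-enable, while the irqsave/irqrestore pair puts back whatever state the caller had. The bool standing in for the CPU's IRQ mask and the `model_*` helpers are hypothetical, and this does not claim which change is right for dwc3_msm_notify_event().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's IRQ mask state. */
static bool irqs_enabled = true;

static void model_spin_lock_irq(void)
{
	irqs_enabled = false;
}

static void model_spin_unlock_irq(void)
{
	assert(!irqs_enabled);	/* lock is held with IRQs off */
	irqs_enabled = true;	/* unconditional, whatever the caller had */
}

static unsigned long model_spin_lock_irqsave(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void model_spin_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = flags;	/* restore the caller's state */
}

/* A handler entered with IRQs masked, as a kprobe handler may be. */
static bool after_unlock_irq(void)
{
	irqs_enabled = false;
	model_spin_lock_irq();
	model_spin_unlock_irq();
	return irqs_enabled;
}

static bool after_unlock_irqrestore(void)
{
	unsigned long flags;

	irqs_enabled = false;
	flags = model_spin_lock_irqsave();
	model_spin_unlock_irqrestore(flags);
	return irqs_enabled;
}
```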
From that point, a hardware IRQ fires, softirq processing runs, and
kprobe_flush_task() -> kprobe_busy_begin()/end() is invoked while the
kretprobe entry_handler is still on the stack -- triggering the cur_kprobe
corruption described in the patch.
Regarding documentation: the kprobes documentation in
Documentation/trace/kprobes.rst (section "Kretprobe entry-handler") does
not mention any restriction on enabling IRQs within an entry_handler. The
only constraint documented is:
"Probe handlers are run with preemption disabled or interrupt disabled,
which depends on the architecture and optimization state."
This is stated for kprobe/kretprobe handlers in general, but there is no
explicit warning that an entry_handler must not re-enable IRQs on arm64.
Given that entry_handlers are user-supplied callbacks, a note
there would help future users avoid this class of bug.
As for the fix itself: we plan to carry this as a downstream patch for our
platform. We are not planning to push it upstream at this time.
Thanks again for the detailed review.
Khaja
* Re: [PATCH v7 2/4] KVM: arm64: PMU: Protect the list of PMUs with RCU
From: Akihiko Odaki @ 2026-04-20 7:17 UTC
To: Marc Zyngier
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
In-Reply-To: <86se8q15eo.wl-maz@kernel.org>
On 2026/04/20 16:01, Marc Zyngier wrote:
> On Mon, 20 Apr 2026 07:21:45 +0100,
> Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
>>
>> On 2026/04/19 23:34, Marc Zyngier wrote:
>>> On Sat, 18 Apr 2026 09:14:24 +0100,
>>> Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
>>>>
>>>> Convert the list of PMUs to an RCU-protected list that has primitives to
>>>> avoid read-side contention.
>>>>
>>>> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>>> ---
>>>> arch/arm64/kvm/pmu-emul.c | 14 ++++++--------
>>>> 1 file changed, 6 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>>>> index 59ec96e09321..ef5140bbfe28 100644
>>>> --- a/arch/arm64/kvm/pmu-emul.c
>>>> +++ b/arch/arm64/kvm/pmu-emul.c
>>>> @@ -7,9 +7,9 @@
>>>> #include <linux/cpu.h>
>>>> #include <linux/kvm.h>
>>>> #include <linux/kvm_host.h>
>>>> -#include <linux/list.h>
>>>> #include <linux/perf_event.h>
>>>> #include <linux/perf/arm_pmu.h>
>>>> +#include <linux/rculist.h>
>>>> #include <linux/uaccess.h>
>>>> #include <asm/kvm_emulate.h>
>>>> #include <kvm/arm_pmu.h>
>>>> @@ -26,7 +26,6 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc);
>>>> bool kvm_supports_guest_pmuv3(void)
>>>> {
>>>> - guard(mutex)(&arm_pmus_lock);
>>>> return !list_empty(&arm_pmus);
>>>
>>> Please read include/linux/rculist.h and the discussion about the
>>> interaction of list_empty() with RCU-protected lists. How about using
>>> list_first_or_null_rcu() for peace of mind?
>>
>> list_first_or_null_rcu() is useful to replace a sequence of
>> list_empty() and list_first_entry() that is protected by a lock, but
>> this function instead requires the invariant that nobody deletes an
>> element from the list, and list_first_or_null_rcu() does not allow
>> removing the requirement.
>>
>> The header file says:
>>> Where are list_empty_rcu() and list_first_entry_rcu()?
>>>
>>> They do not exist because they would lead to subtle race conditions:
>>>
>>> if (!list_empty_rcu(mylist)) {
>>> struct foo *bar = list_first_entry_rcu(mylist, struct foo,
>>> list_member);
>>> do_something(bar);
>>> }
>>>
>>> The list might be non-empty when list_empty_rcu() checks it, but it
>>> might have become empty by the time that list_first_entry_rcu()
>>> rereads the ->next pointer, which would result in a SEGV.
>>>
>>> When not using RCU, it is OK for list_first_entry() to re-read that
>>> pointer because both functions should be protected by some lock that
>>> blocks writers.
>>>
>>> When using RCU, list_empty() uses READ_ONCE() to fetch the
>>> RCU-protected ->next pointer and then compares it to the address of
>>> the list head. However, it neither dereferences this pointer nor
>>> provides this pointer to its caller. Thus, READ_ONCE() suffices
>>> (that is, rcu_dereference() is not needed), which means that
>>> list_empty() can be used anywhere you would want to use
>>> list_empty_rcu(). Just don't expect anything useful to happen if you
>>> do a subsequent lockless call to list_first_entry_rcu()!!!
>>>
>>> See list_first_or_null_rcu for an alternative.
>>
>> However, kvm_supports_guest_pmuv3() locked a mutex when calling
>> list_empty() and unlocked it immediately after that, instead of
>> re-reading list_first_entry(). This construct inherently had a race
>> condition with code that deletes an element; when the caller of
>> kvm_supports_guest_pmuv3() decides to enable guest PMUv3, the host PMU
>> may have gone away. But it was still safe because no one deletes an
>> element.
>>
>> The same logic also applies when using RCU. As the comment says, we
>> can use list_empty() instead of the hypothetical list_empty_rcu()
>> macro because we don't expect it to magically enable something like
>> list_first_entry_rcu(). This function instead keeps relying on the fact
>> that no one deletes an element of the list.
>
> And that's exactly the sort of thing I am trying to plan for. *Should*
> we introduce a way to remove PMUs from the list, this predicate
> becomes unsafe.
Perhaps so. As far as this series is concerned, I'd rather keep it out of
scope, as the requirement is not new.
>
> So I want at least a comment explaining this to the unsuspecting
> reader, as this is rather subtle.
I agree. It took me some effort to understand the previous
mutex-protected implementation and to design the new RCU-protected one.
I'll add one in the next version.
Regards,
Akihiko Odaki
* Re: [PATCH v2 3/4] gpio: realtek: Add driver for Realtek DHC RTD1625 SoC
From: Michael Walle @ 2026-04-20 7:22 UTC
To: Linus Walleij, Yu-Chun Lin [林祐君]
Cc: Bartosz Golaszewski, linux-gpio@vger.kernel.org,
devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-realtek-soc@lists.infradead.org,
CY_Huang[黃鉦晏],
Stanley Chang[昌育德],
James Tai [戴志峰], robh@kernel.org,
krzk+dt@kernel.org, conor+dt@kernel.org, afaerber@suse.com,
TY_Chang[張子逸]
In-Reply-To: <CAD++jLkpS-T9yK=ctSwpLvXkj7s7ivmwu1KKwzy4KS40LVYeyA@mail.gmail.com>
Hi,
On Sun Apr 19, 2026 at 11:19 PM CEST, Linus Walleij wrote:
> Hi Yu-Chun,
>
> On Fri, Apr 10, 2026 at 11:39 AM Yu-Chun Lin [林祐君]
> <eleanor.lin@realtek.com> wrote:
>
>> We did look into gpio-mmio and gpio-regmap, but they are not quite suitable for
>> our platform due to the specific hardware design:
>>
>> 1. Per-GPIO Dedicated Registers: Unlike typical GPIO controllers that pack 32 pins
>> into a single 32-bit register (1 bit per pin), our hardware uses a dedicated 32-bit
>> register for each individual GPIO. This single register controls the
>> input/output state, direction, and interrupt trigger type for that specific pin.
>
> Isn't that attainable by:
>
> - setting .ngpio_per_reg to 1 in struct gpio_regmap_config
That is only used by gpio_regmap_simple_xlate() anyway, so it
doesn't really matter. But yes, 1 would be the correct value here,
assuming that the registers are consecutive.
> - extend .reg_mask_xlate callback with an enum for each operation
> (need to change all users of the .reg_mask_xlate callback but
> who cares, they are not many):
>
> e.g.
>
> enum gpio_regmap_operation {
> GPIO_REGMAP_GET_OP,
> GPIO_REGMAP_SET_OP,
> GPIO_REGMAP_SET_WITH_CLEAR_OP,
> GPIO_REGMAP_GET_DIR_OP,
> GPIO_REGMAP_SET_DIR_OP,
> };
>
> int (*reg_mask_xlate)(struct gpio_regmap *gpio,
> enum gpio_regmap_operation op,
> unsigned int base,
> unsigned int offset, unsigned int *reg,
> unsigned int *mask);
>
> This way .reg_mask_xlate() can hit different bits in the returned
> *mask depending on the operation, and it will be fine to pack all of
> the bits into one 32-bit register.
>
> Added Michael Walle to the thread, he will know if this is a
> good idea.
Nice idea, though the information is then redundant in the usual
case, i.e. drivers which need to translate specific registers
will do a "switch (base)" at the moment. These should be converted
to "switch (op)" just to keep all the drivers aligned and prevent
new drivers from using the old method. You'd need to touch them
anyway.
I was briefly thinking about making it somewhat possible to embed
the op into the base, if it would otherwise be all the same. That
way, you could use gpio-regmap as is. A special case like
GPIO_REGMAP_ADDR_ZERO could be used by these kinds of drivers,
but that is probably too hacky.
I'm fine with either way.
>> 2. Write-Enable (WREN) Mask Mechanism: Our hardware requires a specific Write-Enable
>> mask to be written simultaneously when updating the register values.
>
> Which is to just set bit 31.
>
> With the above scheme your .reg_mask_xlate callback can just set bit 31
> no matter what operation you're doing. Piece of cake.
Keep in mind that this will make reading and writing somewhat
different. Reading assumes there is only one bit set in mask,
because of the "!!(val & mask)" op, which is hardcoded. I'm not
against using the write like that, though.
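For illustration, here is a minimal userspace sketch of an op-aware
.reg_mask_xlate() along the lines Linus proposes. It assumes one 32-bit
register per GPIO at a 4-byte stride from base and a WREN bit at bit 31,
as described above. The enum is the one proposed in this thread (it is
not in mainline), while rtd_gpio_reg_mask_xlate(), the data/direction
bit positions and the opaque struct gpio_regmap stand-in are
hypothetical, not the real RTD1625 layout. Read ops return a single-bit
mask so the hardcoded !!(val & mask) keeps working; write ops add WREN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BIT(n) (1u << (n))

/* Operation enum as proposed in this thread (not in mainline gpio-regmap). */
enum gpio_regmap_operation {
	GPIO_REGMAP_GET_OP,
	GPIO_REGMAP_SET_OP,
	GPIO_REGMAP_SET_WITH_CLEAR_OP,
	GPIO_REGMAP_GET_DIR_OP,
	GPIO_REGMAP_SET_DIR_OP,
};

/* Hypothetical bit layout of the per-GPIO register. */
#define RTD_GPIO_IN_BIT		BIT(0)
#define RTD_GPIO_OUT_BIT	BIT(1)
#define RTD_GPIO_DIR_BIT	BIT(2)
#define RTD_GPIO_WREN		BIT(31)	/* write-enable bit, per the thread */
#define RTD_GPIO_REG_STRIDE	4	/* assumed: one 32-bit register per GPIO */

struct gpio_regmap;	/* opaque stand-in, unused by this sketch */

static int rtd_gpio_reg_mask_xlate(struct gpio_regmap *gpio,
				   enum gpio_regmap_operation op,
				   unsigned int base, unsigned int offset,
				   unsigned int *reg, unsigned int *mask)
{
	(void)gpio;
	*reg = base + offset * RTD_GPIO_REG_STRIDE;

	switch (op) {
	case GPIO_REGMAP_GET_OP:
		/* Reads: exactly one bit, so !!(val & mask) stays meaningful. */
		*mask = RTD_GPIO_IN_BIT;
		return 0;
	case GPIO_REGMAP_GET_DIR_OP:
		*mask = RTD_GPIO_DIR_BIT;
		return 0;
	case GPIO_REGMAP_SET_OP:
	case GPIO_REGMAP_SET_WITH_CLEAR_OP:
		/* Writes: the data bit plus WREN. */
		*mask = RTD_GPIO_OUT_BIT | RTD_GPIO_WREN;
		return 0;
	case GPIO_REGMAP_SET_DIR_OP:
		*mask = RTD_GPIO_DIR_BIT | RTD_GPIO_WREN;
		return 0;
	}
	return -EINVAL;
}
```

How the core composes the written value with such a mask (for example,
whether WREN stays set when a line is driven low) is not modeled here.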
-michael
>> 3. Hardware Debounce: We also need to support hardware debounce settings per pin,
>> which requires custom configuration via set_config mapped to these specific per-pin
>> registers.
>
> Just add a version of an optional .set_config() call to gpio-regmap.c
> to handle this using .reg_mask_xlate() per above and add a new
> GPIO_REGMAP_CONFIG_OP to the above enum, problem solved.
>
> If it seems too hard I can write patch 1 & 2 adding this infrastructure
> but I bet you can easily see what can be done with gpio-regmap.c
> here provided Michael W approves the idea.
>
> Yours,
> Linus Walleij
* Re: [PATCH v7 1/3] dt-bindings: pinctrl: Add aspeed,ast2700-soc0-pinctrl
From: Billy Tsai @ 2026-04-20 7:22 UTC
To: Conor Dooley
Cc: Lee Jones, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Joel Stanley, Andrew Jeffery, Linus Walleij, Bartosz Golaszewski,
Ryan Chen, Andrew Jeffery, devicetree@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-aspeed@lists.ozlabs.org, linux-kernel@vger.kernel.org,
openbmc@lists.ozlabs.org, linux-gpio@vger.kernel.org,
linux-clk@vger.kernel.org
In-Reply-To: <20260417-anemia-borrower-fb90ac02b417@spud>
> > > > + properties:
> > > > + function:
> > > > + enum:
> > > > + - EMMC
> > > > + - JTAGDDR
> > > > + - JTAGM0
> > > > + - JTAGPCIEA
> > > > + - JTAGPCIEB
> > > > + - JTAGPSP
> > > > + - JTAGSSP
> > > > + - JTAGTSP
> > > > + - JTAGUSB3A
> > > > + - JTAGUSB3B
> > > > + - PCIERC0PERST
> > > > + - PCIERC1PERST
> > > > + - TSPRSTN
> > > > + - UFSCLKI
> > > > + - USB2AD0
> > > > + - USB2AD1
> > > > + - USB2AH
> > > > + - USB2AHP
> > > > + - USB2AHPD0
> > > > + - USB2AXH
> > > > + - USB2AXH2B
> > > > + - USB2AXHD1
> > > > + - USB2AXHP
> > > > + - USB2AXHP2B
> > > > + - USB2AXHPD1
> > > > + - USB2BD0
> > > > + - USB2BD1
> > > > + - USB2BH
> > > > + - USB2BHP
> > > > + - USB2BHPD0
> > > > + - USB2BXH
> > > > + - USB2BXH2A
> > > > + - USB2BXHD1
> > > > + - USB2BXHP
> > > > + - USB2BXHP2A
> > > > + - USB2BXHPD1
> > > > + - USB3AXH
> > > > + - USB3AXH2B
> > > > + - USB3AXHD
> > > > + - USB3AXHP
> > > > + - USB3AXHP2B
> > > > + - USB3AXHPD
> > > > + - USB3BXH
> > > > + - USB3BXH2A
> > > > + - USB3BXHD
> > > > + - USB3BXHP
> > > > + - USB3BXHP2A
> > > > + - USB3BXHPD
> > > > + - VB
> > > > + - VGADDC
> > > > +
> > > > + groups:
> > > > + enum:
> > > > + - EMMCCDN
> > > > + - EMMCG1
> > > > + - EMMCG4
> > > > + - EMMCG8
> > > > + - EMMCWPN
> > > > + - JTAG0
> > > > + - PCIERC0PERST
> > > > + - PCIERC1PERST
> > > > + - TSPRSTN
> > > > + - UFSCLKI
> > > > + - USB2A
> > > > + - USB2AAP
> > > > + - USB2ABP
> > > > + - USB2ADAP
> > > > + - USB2AH
> > > > + - USB2AHAP
> > > > + - USB2B
> > > > + - USB2BAP
> > > > + - USB2BBP
> > > > + - USB2BDBP
> > > > + - USB2BH
> > > > + - USB2BHBP
> > > > + - USB3A
> > > > + - USB3AAP
> > > > + - USB3ABP
> > > > + - USB3B
> > > > + - USB3BAP
> > > > + - USB3BBP
> > > > + - VB0
> > > > + - VB1
> > > > + - VGADDC
> > > > + pins:
> > > > + enum:
> > > > + - AB13
> > > > + - AB14
> > > > + - AC13
> > > > + - AC14
> > > > + - AD13
> > > > + - AD14
> > > > + - AE13
> > > > + - AE14
> > > > + - AE15
> > > > + - AF13
> > > > + - AF14
> > > > + - AF15
> > > Why do you have groups and pins?
> > > Is it valid in your device to have groups and pins in the same node?
> > The intent is to support both group-based mux selection and
> > configuration, as well as per-pin configuration.
> > In our hardware:
> > - `function` + `groups` are used for pinmux selection.
> > - `pins` is used for per-pin configuration (e.g. drive strength,
> > bias settings).
> > - `groups` may also be used for group-level configuration.
> > As a result, both `groups` and `pins` may appear in the same node,
> > but they serve different purposes and do not conflict:
> > - `groups` selects the mux function and may apply configuration to
> > the entire group.
> > - `pins` allows overriding or specifying configuration for individual
> > pins.
> > In most cases, only one of them is needed, but both are allowed when
> > both group-level and per-pin configuration are required.
> To be honest, that sounds like your groups are not sufficiently
> granular and should be reduced such that you can use them for pin
> settings.
The intent was to keep the binding flexible, but in practice the mixed
use of `groups` and `pins` in the same node is not needed.
Given that, I agree this flexibility is unnecessary and makes the
binding semantics less clear. I'll rework the binding to make the
expected usage explicit rather than allowing combinations that do not
correspond to a real use case.
In particular, I'll split the constraints as follows:
- For pinmux, the presence of `function` will require `groups`, and
`pins` will not be allowed. This reflects the hardware design, where
the groups are defined by the pins affected by a given mux expression.
- For pin configuration, exactly one of `groups` or `pins` will be
required (using oneOf), so that configuration is applied either at
group level or per-pin, but not both.
  - if:
      required:
        - function
    then:
      required:
        - groups
      not:
        required:
          - pins
    else:
      oneOf:
        - required:
            - groups
          not:
            required:
              - pins
        - required:
            - pins
          not:
            required:
              - groups
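One way to sanity-check the if/then/else above is to write its truth
table down as a predicate. This is only a hypothetical C model of the
intended constraint (pin_node_valid() and struct pin_node_props are
illustrative names; it is not dt-schema and does not parse YAML). With
`function`, `groups` is required and `pins` forbidden; otherwise the two
oneOf branches are mutually exclusive, so they reduce to "exactly one of
`groups` or `pins`".

```c
#include <assert.h>
#include <stdbool.h>

/* Which properties a pinctrl subnode carries (presence only). */
struct pin_node_props {
	bool function;
	bool groups;
	bool pins;
};

/*
 * Mirrors the proposed schema:
 *   if function: groups required, pins forbidden
 *   else:        exactly one of groups / pins (oneOf)
 */
static bool pin_node_valid(const struct pin_node_props *p)
{
	if (p->function)
		return p->groups && !p->pins;
	return p->groups != p->pins;	/* XOR: exactly one is present */
}
```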
Does this match what you had in mind?
Thanks
Billy Tsai
* Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Johan Hovold @ 2026-04-20 7:28 UTC
To: Mark Rutland
Cc: Greg Kroah-Hartman, Guangshuo Li, Will Deacon, Anshuman Khandual,
linux-arm-kernel, linux-perf-users, linux-kernel, stable
In-Reply-To: <aeCsLy-45QyeCwGA@J2N7QTR9R3>
On Thu, Apr 16, 2026 at 10:30:23AM +0100, Mark Rutland wrote:
> On Thu, Apr 16, 2026 at 09:23:33AM +0200, Johan Hovold wrote:
> > It's not just the platform code as this directly reflects the behaviour
> > of device_register() as Mark pointed out.
> >
> > It is indeed an unfortunate quirk of the driver model, but one can argue
> > that having a registration function that frees its argument on errors
> > would be even worse. And even more so when many (or most) users get this
> > right.
>
> Ah, sorry; I had missed that the _put() step would actually free the
> object (and as you explain below, how that won't work for many callers).
>
> > So if we want to change this, I think we would need to deprecate
> > device_register() in favour of explicit device_initialize() and
> > device_add().
>
> Is it possible to have {platform_,}device_uninitialize() functions that
> do everything except the ->release() call? If we had that, then we'd
> be able to have a flow along the lines of:
>
> int some_init_function(void)
> {
> 	int err;
>
> 	platform_device_init(&static_pdev);
>
> 	err = platform_device_add(&static_pdev);
> 	if (err)
> 		goto out_uninit;
>
> 	return 0;
>
> out_uninit:
> 	platform_device_uninit(&static_pdev);
> 	return err;
> }
>
> ... which I think would align with what people generally expect to have
> to do.
The issue here is that platform_device_add() allocates a device name and
such resources are not released until the last reference is dropped.
It's been this way since 2008, but some of the static platform devices
predate that, and they both lack a release callback (explicitly required
since 2003) and are not cleaned up on registration failure.
Since registration would essentially only fail during development (e.g.
due to name collision or fault injection), this is hardly something to
worry about, but we could consider moving towards dynamic objects to
address both issues.
We have a few functions for allocating *and* registering platform
devices that could be used in many of these cases (and they already
clean up after themselves on errors):
platform_device_register_simple()
platform_device_register_data()
platform_device_register_resndata()
platform_device_register_full()
and where those do not fit (and cannot be extended) we have the
underlying:
platform_device_alloc()
platform_device_add_resources()
platform_device_add_data()
platform_device_add()
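The lifetime rule behind the alloc/add/put sequence above can be shown
with a small userspace model. Everything named toy_* is hypothetical and
only mimics the behaviour described here: the add step allocates a name
that is freed only when the last reference is dropped, so the error path
after a failed add drops the reference instead of freeing the object
directly (the role platform_device_put() plays after a failed
platform_device_add()).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct device: a refcount plus a name set at add time. */
struct toy_device {
	int refcount;
	char *name;
	void (*release)(struct toy_device *dev);
};

static int toy_release_calls;	/* lets the demo observe release() */

static void toy_release(struct toy_device *dev)
{
	toy_release_calls++;
	free(dev);
}

/* Like platform_device_alloc(): one reference, release callback set. */
static struct toy_device *toy_device_alloc(void)
{
	struct toy_device *dev = calloc(1, sizeof(*dev));

	if (dev) {
		dev->refcount = 1;
		dev->release = toy_release;
	}
	return dev;
}

/* Like platform_device_add(): the name is allocated before a later step fails. */
static int toy_device_add(struct toy_device *dev, const char *name, bool collide)
{
	size_t len = strlen(name) + 1;

	dev->name = malloc(len);
	if (!dev->name)
		return -ENOMEM;
	memcpy(dev->name, name, len);
	if (collide)
		return -EEXIST;	/* name kept; freed with the last reference */
	return 0;
}

/* Like platform_device_put(): everything goes away with the last reference. */
static void toy_device_put(struct toy_device *dev)
{
	if (--dev->refcount == 0) {
		free(dev->name);
		dev->release(dev);
	}
}
```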
But there are some 800 static platform devices left, mostly in legacy
platform code and board files that I assume few people care about.
Johan
* Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
From: Zeng Heng @ 2026-04-20 7:31 UTC
To: tan.shaopeng, ben.horgan, Dave.Martin, james.morse,
reinette.chatre, fenghuay, tglx, will, hpa, bp, babu.moger,
dave.hansen, mingo, tony.luck, gshan, catalin.marinas
Cc: linux-arm-kernel, x86, linux-kernel, wangkefeng.wang
In-Reply-To: <20260413085405.1166412-1-zengheng4@huawei.com>
Hi Shaopeng,
> Hello Zeng Heng,
>
> Could you tell me which branch this patch series based on?
>
> Best regards,
> Shaopent TAN
As indicated in the patch series tags, this patch set applies to the
linux-next repository, specifically the master branch at:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
Please keep me on the mailing list for follow-up responses if you want
my feedback in time. I was accidentally dropped from the recipient list
in a previous thread (see
https://lore.kernel.org/all/TY4PR01MB16930EB1ACB3A3356A92169BC8B232@TY4PR01MB16930.jpnprd01.prod.outlook.com/).
Kind regards,
Zeng Heng
* Re: [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM
From: Pavan Kondeti @ 2026-04-20 8:02 UTC
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Fuad Tabba, Vincent Donnefort, Mostafa Saleh
In-Reply-To: <20260105154939.11041-1-will@kernel.org>
Hi Will,
On Mon, Jan 05, 2026 at 03:49:08PM +0000, Will Deacon wrote:
> Hi folks,
>
> Although pKVM has been shipping in Android kernels for a while now,
> protected guest (pVM) support has been somewhat languishing upstream.
> This has partly been because we've been waiting for guest_memfd() but
> also because it hasn't been clear how to expose pVMs to userspace (which
> is necessary for testing) without getting everything in place beforehand.
> This has led to frustration on both sides of the fence [1] and so this
> patch series attempts to get things moving again by exposing pVM
> features in an incremental fashion based on top of anonymous memory,
> which is what we have been using in Android. The big difference between
> this series and the Android implementation is the graceful handling of
> host stage-2 faults arising from accesses made using kernel mappings.
> The hope is that this will unblock pKVM upstreaming efforts while the
> guest_memfd() work continues to evolve.
>
> Specifically, this patch series implements support for protected guest
> memory with pKVM, where pages are unmapped from the host as they are
> faulted into the guest and can be shared back from the guest using pKVM
> hypercalls. Protected guests are created using a new machine type
> identifier and can be booted to a shell using the kvmtool patches
> available at [2], which finally means that we are able to test the pVM
> logic in pKVM. Since this is an incremental step towards full isolation
> from the host (for example, the CPU register state and DMA accesses are
> not yet isolated), creating a pVM requires a developer Kconfig option to
> be enabled in addition to booting with 'kvm-arm.mode=protected' and
> results in a kernel taint.
>
Good to see Protected VM support in upstream w/ pKVM.
We (Qualcomm) have been trying to resume the Gunyah upstreaming [1] effort
for some time, but the path to reusing guest_memfd is not straightforward,
as guest_memfd is tightly coupled with KVM. While the effort to use it for
pKVM is pending, and refactoring it for use outside KVM is not
happening anytime soon, we plan to send a Gunyah series that handles
pages lent/donated to the guest similarly to how this series does. Please
let us know if you have any suggestions/comments for us.
[1]
https://lore.kernel.org/all/20240222-gunyah-v17-0-1e9da6763d38@quicinc.com/
Thanks,
Pavan
* Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Greg Kroah-Hartman @ 2026-04-20 8:05 UTC
To: Johan Hovold
Cc: Mark Rutland, Guangshuo Li, Will Deacon, Anshuman Khandual,
linux-arm-kernel, linux-perf-users, linux-kernel, stable
In-Reply-To: <aeXVr5enpjb3rfq7@hovoldconsulting.com>
On Mon, Apr 20, 2026 at 09:28:47AM +0200, Johan Hovold wrote:
> On Thu, Apr 16, 2026 at 10:30:23AM +0100, Mark Rutland wrote:
> > On Thu, Apr 16, 2026 at 09:23:33AM +0200, Johan Hovold wrote:
>
> > > It's not just the platform code as this directly reflects the behaviour
> > > of device_register() as Mark pointed out.
> > >
> > > It is indeed an unfortunate quirk of the driver model, but one can argue
> > > that having a registration function that frees its argument on errors
> > > would be even worse. And even more so when many (or most) users get this
> > > right.
> >
> > Ah, sorry; I had missed that the _put() step would actually free the
> > object (and as you explain below, how that won't work for many callers).
> >
> > > So if we want to change this, I think we would need to deprecate
> > > device_register() in favour of explicit device_initialize() and
> > > device_add().
> >
> > Is it possible to have {platform_,}device_uninitialize() functions that
> > do everything except the ->release() call? If we had that, then we'd
> > be able to have a flow along the lines of:
> >
> > int some_init_function(void)
> > {
> > 	int err;
> >
> > 	platform_device_init(&static_pdev);
> >
> > 	err = platform_device_add(&static_pdev);
> > 	if (err)
> > 		goto out_uninit;
> >
> > 	return 0;
> >
> > out_uninit:
> > 	platform_device_uninit(&static_pdev);
> > 	return err;
> > }
> >
> > ... which I think would align with what people generally expect to have
> > to do.
>
> The issue here is that platform_device_add() allocates a device name and
> such resources are not released until the last reference is dropped.
>
> It's been this way since 2008, but some of the static platform devices
> predate that, and they both lack a release callback (explicitly required
> since 2003) and are not cleaned up on registration failure.
>
> Since registration would essentially only fail during development (e.g.
> due to name collision or fault injection), this is hardly something to
> worry about, but we could consider moving towards dynamic objects to
> address both issues.
Agreed, this whole thing, including the error handling, is all just
theoretical as no real user ever hits this, which is why it has been
_way_ down my priority list.
> We have a few functions for allocating *and* registering platform
> devices that could be used in many of these cases (and they already
> clean up after themselves on errors):
>
> platform_device_register_simple()
> platform_device_register_data()
> platform_device_register_resndata()
> platform_device_register_full()
>
> and where those do not fit (and cannot be extended) we have the
> underlying:
>
> platform_device_alloc()
> platform_device_add_resources()
> platform_device_add_data()
> platform_device_add()
>
> But there are some 800 static platform devices left, mostly in legacy
> platform code and board files that I assume few people care about.
Yes, I agree that we do have all of the needed apis here already, we
should just work at converting existing drivers to the new apis OR just
not caring at all as again, no one will ever hit these code paths :)
thanks,
greg k-h
* [PATCH net v2 0/2] net: airoha: Fix NULL pointer dereferences in airoha_qdma_cleanup()
From: Lorenzo Bianconi @ 2026-04-20 8:07 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lorenzo Bianconi
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
Fix two possible NULL pointer dereferences in the airoha_qdma_cleanup
routine if airoha_qdma_init() fails.
---
Changes in v2:
- Move page_pool allocation after desc list allocation in
airoha_qdma_init_rx_queue()
- Move netif_napi_add_tx() after irq desc queue allocation in
airoha_qdma_tx_irq_init()
- Link to v1: https://lore.kernel.org/r/20260417-airoha_qdma_init_rx_queue-fix-v1-0-db9fa5e468e5@kernel.org
---
Lorenzo Bianconi (2):
net: airoha: Move ndesc initialization at end of airoha_qdma_init_rx_queue()
net: airoha: Add size check for TX NAPIs in airoha_qdma_cleanup()
drivers/net/ethernet/airoha/airoha_eth.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
---
base-commit: 0cf004ffb61cd32d140531c3a84afe975f9fc7ea
change-id: 20260417-airoha_qdma_init_rx_queue-fix-b9bfada51671
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
* [PATCH net v2 1/2] net: airoha: Move ndesc initialization at end of airoha_qdma_init_rx_queue()
From: Lorenzo Bianconi @ 2026-04-20 8:07 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lorenzo Bianconi
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260420-airoha_qdma_init_rx_queue-fix-v2-0-d99347e5c18d@kernel.org>
If queue entry or DMA descriptor list allocation fails in the
airoha_qdma_init_rx_queue routine, airoha_qdma_cleanup() will trigger a
NULL pointer dereference running netif_napi_del() for RX queue NAPIs,
since netif_napi_add() has never been executed for this particular RX NAPI.
The issue is due to the early ndesc initialization in
airoha_qdma_init_rx_queue(), since airoha_qdma_cleanup() relies on the
ndesc value to check if the queue is properly initialized. Fix the issue
by moving the ndesc initialization to the end of the
airoha_qdma_init_rx_queue routine.
Move page_pool allocation after descriptor list allocation in order to
avoid memory leaks if desc allocation fails.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/airoha/airoha_eth.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index e1ab15f1ee7d..fc79c456743c 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -745,14 +745,18 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
dma_addr_t dma_addr;
q->buf_size = PAGE_SIZE / 2;
- q->ndesc = ndesc;
q->qdma = qdma;
- q->entry = devm_kzalloc(eth->dev, q->ndesc * sizeof(*q->entry),
+ q->entry = devm_kzalloc(eth->dev, ndesc * sizeof(*q->entry),
GFP_KERNEL);
if (!q->entry)
return -ENOMEM;
+ q->desc = dmam_alloc_coherent(eth->dev, ndesc * sizeof(*q->desc),
+ &dma_addr, GFP_KERNEL);
+ if (!q->desc)
+ return -ENOMEM;
+
q->page_pool = page_pool_create(&pp_params);
if (IS_ERR(q->page_pool)) {
int err = PTR_ERR(q->page_pool);
@@ -761,11 +765,7 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
return err;
}
- q->desc = dmam_alloc_coherent(eth->dev, q->ndesc * sizeof(*q->desc),
- &dma_addr, GFP_KERNEL);
- if (!q->desc)
- return -ENOMEM;
-
+ q->ndesc = ndesc;
netif_napi_add(eth->napi_dev, &q->napi, airoha_qdma_rx_napi_poll);
airoha_qdma_wr(qdma, REG_RX_RING_BASE(qid), dma_addr);
--
2.53.0
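The invariant this patch relies on (ndesc is nonzero only once the RX
queue is fully set up and its NAPI registered) can be modeled in
userspace. This is a hypothetical sketch: the toy_* names, the fail_step
switch and the calloc()/free() calls stand in for the driver's
devm-managed allocations and netif_napi_add()/netif_napi_del(), and the
page_pool step is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy RX queue; ndesc doubles as the "fully initialized" marker. */
struct toy_rx_queue {
	int ndesc;
	int *entry;
	long *desc;
	bool napi_added;	/* stands in for netif_napi_add() state */
};

static int toy_napi_del_calls;

/* fail_step: 0 = no failure, 1 = entry alloc fails, 2 = desc alloc fails */
static int toy_init_rx_queue(struct toy_rx_queue *q, int ndesc, int fail_step)
{
	q->entry = fail_step == 1 ? NULL : calloc(ndesc, sizeof(*q->entry));
	if (!q->entry)
		return -ENOMEM;

	q->desc = fail_step == 2 ? NULL : calloc(ndesc, sizeof(*q->desc));
	if (!q->desc)
		return -ENOMEM;

	/* Publish ndesc only after every allocation has succeeded. */
	q->ndesc = ndesc;
	q->napi_added = true;
	return 0;
}

static void toy_cleanup_rx_queue(struct toy_rx_queue *q)
{
	/* Only delete the NAPI of queues that completed initialization. */
	if (q->ndesc) {
		assert(q->napi_added);
		q->napi_added = false;
		toy_napi_del_calls++;
	}
	/* These frees stand in for the devm-managed releases in the driver. */
	free(q->entry);
	free(q->desc);
	q->entry = NULL;
	q->desc = NULL;
	q->ndesc = 0;
}
```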
* [PATCH net v2 2/2] net: airoha: Add size check for TX NAPIs in airoha_qdma_cleanup()
From: Lorenzo Bianconi @ 2026-04-20 8:07 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lorenzo Bianconi
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260420-airoha_qdma_init_rx_queue-fix-v2-0-d99347e5c18d@kernel.org>
If the airoha_qdma_init routine fails before airoha_qdma_tx_irq_init() runs
successfully for all TX NAPIs, airoha_qdma_cleanup() will
unconditionally run netif_napi_del() on TX NAPIs, triggering a NULL
pointer dereference. Fix the issue by relying on the q_tx_irq size value
to check whether each TX NAPI is properly initialized in
airoha_qdma_cleanup(). Moreover, run netif_napi_add_tx() only if the
irq_q queue is properly allocated.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/airoha/airoha_eth.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index fc79c456743c..fd8c4f817d85 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -996,8 +996,6 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
struct airoha_eth *eth = qdma->eth;
dma_addr_t dma_addr;
- netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
- airoha_qdma_tx_napi_poll);
irq_q->q = dmam_alloc_coherent(eth->dev, size * sizeof(u32),
&dma_addr, GFP_KERNEL);
if (!irq_q->q)
@@ -1007,6 +1005,9 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
irq_q->size = size;
irq_q->qdma = qdma;
+ netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
+ airoha_qdma_tx_napi_poll);
+
airoha_qdma_wr(qdma, REG_TX_IRQ_BASE(id), dma_addr);
airoha_qdma_rmw(qdma, REG_TX_IRQ_CFG(id), TX_IRQ_DEPTH_MASK,
FIELD_PREP(TX_IRQ_DEPTH_MASK, size));
@@ -1398,8 +1399,12 @@ static void airoha_qdma_cleanup(struct airoha_qdma *qdma)
}
}
- for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++)
+ for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++) {
+ if (!qdma->q_tx_irq[i].size)
+ continue;
+
netif_napi_del(&qdma->q_tx_irq[i].napi);
+ }
for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
if (!qdma->q_tx[i].ndesc)
--
2.53.0
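The TX-side pattern in this patch (publish size only after the IRQ ring
is allocated, register the NAPI after that, and skip entries with a zero
size in cleanup) can be sketched as a hypothetical userspace model. The
toy_* names and TOY_NUM_TX_IRQ are illustrative only; calloc()/free()
and the napi_added flag stand in for dmam_alloc_coherent() and
netif_napi_add_tx()/netif_napi_del().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_NUM_TX_IRQ 2

struct toy_tx_irq_queue {
	int size;		/* nonzero only after a successful init */
	unsigned int *q;
	bool napi_added;	/* stands in for netif_napi_add_tx() state */
};

static int toy_tx_napi_dels;

static int toy_tx_irq_init(struct toy_tx_irq_queue *irq_q, int size, bool fail)
{
	irq_q->q = fail ? NULL : calloc(size, sizeof(*irq_q->q));
	if (!irq_q->q)
		return -ENOMEM;

	irq_q->size = size;
	irq_q->napi_added = true;	/* registered only after allocation */
	return 0;
}

static void toy_tx_cleanup(struct toy_tx_irq_queue *qs, int n)
{
	for (int i = 0; i < n; i++) {
		if (!qs[i].size)	/* never initialized: nothing to delete */
			continue;
		assert(qs[i].napi_added);
		qs[i].napi_added = false;
		toy_tx_napi_dels++;
		free(qs[i].q);
		qs[i].q = NULL;
	}
}
```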
* RE: [PATCH V13 02/12] PCI: host-generic: Add common helpers for parsing Root Port properties
From: Sherry Sun @ 2026-04-20 8:24 UTC
To: mani@kernel.org, Bjorn Helgaas
Cc: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
Frank Li, s.hauer@pengutronix.de, kernel@pengutronix.de,
festevam@gmail.com, lpieralisi@kernel.org, kwilczynski@kernel.org,
bhelgaas@google.com, Hongxing Zhu, l.stach@pengutronix.de,
imx@lists.linux.dev, linux-pci@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <viggqsxczf5d5hok4qpqhknalwb46xapsgdxbbgbqhruhyn2hn@wtck4yajmuw7>
> On Fri, Apr 17, 2026 at 02:55:33PM -0500, Bjorn Helgaas wrote:
> > On Fri, Apr 17, 2026 at 03:17:16AM +0000, Sherry Sun wrote:
> > > > On Thu, Apr 16, 2026 at 07:14:12PM +0800, Sherry Sun wrote:
> > > > > Introduce generic helper functions to parse Root Port device
> > > > > tree nodes and extract common properties like reset GPIOs. This
> > > > > allows multiple PCI host controller drivers to share the same
> > > > > parsing logic.
> > > > >
> > > > > Define struct pci_host_port to hold common Root Port properties
> > > > > (currently only reset GPIO descriptor) and add
> > > > > pci_host_common_parse_ports() to parse Root Port nodes from
> > > > > device tree.
> > > >
> > > > Are the Root Port and the RC the only possible places for 'reset'
> > > > GPIO descriptions in DT? I think PERST# routing is outside the
> > > > PCIe spec, so it seems like a system could provide a PERST# GPIO
> > > > routed to any Switch Upstream Port or Endpoint (I assume a PERST#
> > > > connected to a switch would apply to both the upstream port and
> > > > the downstream ports).
> > >
> > > Thanks for the feedback. You're right that PERST# routing could
> > > theoretically be connected to any device in the hierarchy. However,
> > > for this patch series, I've focused on the most common use case in
> > > practice: use Root Port level PERST# instead of the legacy Root
> > > Complex level PERST#.
> > >
> > > Root Port level PERST# - This is the primary target, where each Root
> > > Port has individual control over devices connected to it. RC level
> > > PERST# - Legacy binding support, where a single GPIO controls all
> > > ports.
> > >
> > > We can extend this framework later if real hardware emerges that
> > > needs Switch or EP-level PERST# control. I can add a comment
> > > documenting this limitation if needed.
> > >
> > > BTW, Mani and Rob had some great discussions in dt-schema about
> > > PERST# and WAKE# sideband signals settings.
> >
> > > You can check here:
> > > https://github.com/devicetree-org/dt-schema/issues/168
> > > https://github.com/devicetree-org/dt-schema/pull/126
> > > https://github.com/devicetree-org/dt-schema/pull/170
> >
> > The upshot of all those conversations is that WAKE# and PERST# can be
> > routed to arbitrary devices independent of the PCI topology.
> >
> > I think extending host-generic to look for 'reset' in Root Port nodes
> > is the right thing. My concern is more about where we store it. This
> > patch saves it in a new "pci_host_port" struct, but someday we'll want
> > a place to save the PERST# GPIOs for several slots behind a switch.
> > Then we'll have two different ways to save the same information.
> >
>
> Even if there are PERST# GPIOs from the host, connected to downstream
> ports of a PCIe switch, they could be stored in the Root Port's (pci_host_port)
> struct as a list of PERST#. This is what pcie-qcom driver does.
>
> It is too clumsy to handle PERST# individually for each device. We tried it
> before with pwrctrl, but it always ended up biting us on who gets to control
> the PERST#. We can't let pwrctrl handle PERST# for a switch port and host
> controller driver handle it for RP. And we cannot let pwrctrl handle PERST# for
> all ports, because, host controller drivers also need to control them for RC
> initialization.
>
> That's why it was decided to handle PERST# for all ports in the host controller
> drivers. So following that pattern, this helper could also be extended to parse
> the PERST# from all ports defined in DT and store them in the same Root Port
> struct.
>
> It should be trivial to implement this logic in the current helper. @Sherry:
> Could you please implement this logic?
Hi Mani, do you mean the similar logic in this patch?
https://lore.kernel.org/all/20251216-pci-pwrctrl-rework-v2-1-745a563b9be6@oss.qualcomm.com/
If yes, of course I can do this for the current helper functions in pci-host-common.c.
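As a rough picture of the shape Mani describes (one Root Port struct
owning every PERST# found on it or on ports below it, driven together by
the host controller driver), here is a hypothetical userspace sketch.
The toy_* names are illustrative only; this is not the pcie-qcom code
and not the pci_host_port definition from this series, and a bool
stands in for a GPIO descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for a GPIO descriptor; only tracks the driven level. */
struct toy_gpio {
	bool asserted;
};

/* One PERST# line, found on the Root Port or on a port below it. */
struct toy_perst {
	struct toy_gpio *gpio;
	struct toy_perst *next;
};

/* Hypothetical per-Root-Port state owning every PERST# beneath it. */
struct toy_host_port {
	struct toy_perst *perst_list;
};

static int toy_port_add_perst(struct toy_host_port *port, struct toy_gpio *gpio)
{
	struct toy_perst *p = malloc(sizeof(*p));

	if (!p)
		return -ENOMEM;
	p->gpio = gpio;
	p->next = port->perst_list;
	port->perst_list = p;
	return 0;
}

/* The host controller driver drives all of the port's PERST# lines. */
static void toy_port_set_perst(struct toy_host_port *port, bool assert_reset)
{
	for (struct toy_perst *p = port->perst_list; p; p = p->next)
		p->gpio->asserted = assert_reset;
}

static void toy_port_free_perst(struct toy_host_port *port)
{
	while (port->perst_list) {
		struct toy_perst *p = port->perst_list;

		port->perst_list = p->next;
		free(p);
	}
}
```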
Best Regards
Sherry
* Re: [PATCH v7 3/4] KVM: arm64: PMU: Introduce FIXED_COUNTERS_ONLY
From: Akihiko Odaki @ 2026-04-20 8:36 UTC
To: Marc Zyngier
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Kees Cook, Gustavo A. R. Silva,
Paolo Bonzini, Jonathan Corbet, Shuah Khan, linux-arm-kernel,
kvmarm, linux-kernel, linux-hardening, devel, kvm, linux-doc,
linux-kselftest
In-Reply-To: <87ldeic1gk.wl-maz@kernel.org>
On 2026/04/20 2:19, Marc Zyngier wrote:
> On Sat, 18 Apr 2026 09:14:25 +0100,
> Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
>>
>> On a heterogeneous arm64 system, KVM's PMU emulation is based on the
>> features of a single host PMU instance. When a vCPU is migrated to a
>> pCPU with an incompatible PMU, counters such as PMCCNTR_EL0 stop
>> incrementing.
>>
>> Although this behavior is permitted by the architecture, Windows does
>> not handle it gracefully and may crash with a division-by-zero error.
>>
>> The current workaround requires VMMs to pin vCPUs to a set of pCPUs
>> that share a compatible PMU. This is difficult to implement correctly in
>> QEMU/libvirt, where pinning occurs after vCPU initialization, and it
>> also restricts the guest to a subset of available pCPUs.
>>
>> Introduce the KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY attribute to
>> create a "fixed-counters-only" PMU. When set, KVM exposes a PMU that is
>> compatible with all pCPUs but that does not support programmable
>> event counters which may have different feature sets on different PMUs.
>>
>> This allows Windows guests to run reliably on heterogeneous systems
>> without crashing, even without vCPU pinning, and enables VMMs to
>> schedule vCPUs across all available pCPUs, making full use of the host
>> hardware.
>>
>> Much like KVM_ARM_VCPU_PMU_V3_IRQ and other read-write attributes, this
>> attribute provides a getter that facilitates kernel and userspace
>> debugging/testing.
>
> OK, so that's the sales pitch. But how is it implemented? I would like
> to be able to read a high-level description of the implementation
> trade-offs.
Implementation-wise, it is trivial. Essentially, the following
addition in kvm_arm_pmu_v3_get_attr() is the entire implementation:
+ case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
+ if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
+ return 0;
Both its functionality and its code complexity are trivial, so we can argue either that:
- the functionality is too trivial to be useful, or
- the interface/implementation complexity is so trivial that it does not
incur a maintenance burden.
In this case, the selftest uses the getter, so I was more inclined to
have it. However, adding one just for the selftest sounded too ad hoc, so I
looked into other attributes to make sure it would not introduce an
inconsistency with existing interfaces.
As a result, I found that there are other read-write attributes; in fact,
there are more read-write attributes than write-only ones.
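For illustration, here is a minimal userspace sketch of how a VMM might read the attribute back through the generic KVM_GET_DEVICE_ATTR ioctl on a vCPU fd. It assumes the getter semantics proposed in this patch: 0 when the flag is set and ENXIO otherwise. The fallback value 5 mirrors the proposed (not yet upstream) uapi define, and the helper name query_fixed_counters_only() is hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* arm64-only uapi constants; fall back to the values from this series
 * so the sketch also builds against non-arm64 headers. */
#ifndef KVM_ARM_VCPU_PMU_V3_CTRL
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
#endif
#ifndef KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
#define KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY 5 /* proposed, not upstream */
#endif

/* Hypothetical helper: returns 1 if the flag is set, 0 if it is not,
 * or a negative errno for any other failure (e.g. a bad fd). */
int query_fixed_counters_only(int vcpu_fd)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_PMU_V3_CTRL,
		.attr  = KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY,
		.addr  = 0, /* this attribute carries no payload */
	};

	if (ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr) == 0)
		return 1;
	if (errno == ENXIO)
		return 0; /* the getter reports "not set" as -ENXIO */
	return -errno;
}
```

Because the attribute has no payload, the getter's return code is the whole answer. This is what makes it usable from a selftest without any extra plumbing.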
>
>>
>> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>> ---
>> Documentation/virt/kvm/devices/vcpu.rst | 29 ++++++
>> arch/arm64/include/asm/kvm_host.h | 2 +
>> arch/arm64/include/uapi/asm/kvm.h | 1 +
>> arch/arm64/kvm/arm.c | 1 +
>> arch/arm64/kvm/pmu-emul.c | 155 +++++++++++++++++++++++---------
>> include/kvm/arm_pmu.h | 2 +
>> 6 files changed, 147 insertions(+), 43 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
>> index 60bf205cb373..e0aeb1897d77 100644
>> --- a/Documentation/virt/kvm/devices/vcpu.rst
>> +++ b/Documentation/virt/kvm/devices/vcpu.rst
>> @@ -161,6 +161,35 @@ explicitly selected, or the number of counters is out of range for the
>> selected PMU. Selecting a new PMU cancels the effect of setting this
>> attribute.
>>
>> +1.6 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
>> +------------------------------------------------------
>> +
>> +:Parameters: no additional parameter in kvm_device_attr.addr
>> +
>> +:Returns:
>> +
>> + ======= =====================================================
>> + -EBUSY Attempted to set after initializing PMUv3 or running
>> + VCPU, or attempted to set for the first time after
>> + setting an event filter
>> + -ENXIO Attempted to get before setting
>> + -ENODEV Attempted to set while PMUv3 not supported
>> + ======= =====================================================
>> +
>> +If set, PMUv3 will be emulated without programmable event counters. The VCPU
>> +will use any compatible hardware PMU. This attribute is particularly useful on
>
> Not quite "any PMU". It will use *the* PMU of the physical CPU,
> irrespective of the implementation.
I think:
- this comment
- one on the KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED note
- one on kvm_pmu_create_perf_event()
- and one on kvm_arm_pmu_v3_set_pmu_fixed_counters_only()
all boil down to one question: will it support all possible CPUs, or
only a subset? Let me answer it here:
This patch is written to support a subset rather than all possible CPUs.
If a pCPU does not have a compatible PMU, that pCPU is not supported,
and running the vCPU on it results in KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED.
This patch does not enforce that all possible CPUs are covered by the
compatible PMUs. In theory, kvm_arm_pmu_get_pmuver_limit()
enables PMU emulation when real PMUv3 hardware covers all possible
CPUs *or* the relevant registers can be trapped with IMPDEF, so some
pCPUs may lack a compatible PMU and only provide the IMPDEF trapping.
In practice, I don't think any sane configuration will ever have such
subset support, so we can explicitly enforce that all possible CPUs are
covered by the compatible PMUs if desired.
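The policy above can be modelled in userspace. This is a minimal sketch that assumes CPUs fit in a 64-bit mask and reduces each host PMU to its supported-CPU mask; all names are hypothetical. The VM's supported set is the union of every registered PMU's mask, as the cpumask_or() loop in kvm_arm_pmu_v3_set_pmu_fixed_counters_only() computes. Entering on a CPU outside that set corresponds to the KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED case. The optional stricter policy would require the union to cover every possible CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Union of the supported-CPU masks of every host PMU. */
uint64_t fixed_only_supported_cpus(const uint64_t *pmu_masks, size_t nr_pmus)
{
	uint64_t supported = 0; /* start empty, like cpumask_clear() */

	for (size_t i = 0; i < nr_pmus; i++)
		supported |= pmu_masks[i];
	return supported;
}

/* Would entering on @cpu succeed, or fail as CPU_UNSUPPORTED? */
bool cpu_can_enter(uint64_t supported, unsigned int cpu)
{
	return cpu < 64 && ((supported >> cpu) & 1);
}

/* Optional stricter policy: the compatible PMUs must cover every
 * possible CPU before the attribute is accepted. */
bool covers_all_possible(uint64_t supported, uint64_t possible)
{
	return (possible & ~supported) == 0;
}
```

With PMUs covering CPUs 0-3 and 4-5 on an 8-CPU system, CPUs 6 and 7 would be rejected at entry, and the stricter check would refuse the configuration up front.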
>
>> +heterogeneous systems where different hardware PMUs cover different physical
>> +CPUs. The compatibility of hardware PMUs can be checked with
>> +KVM_ARM_VCPU_PMU_V3_SET_PMU. All VCPUs in a VM share this attribute. It isn't
>> +possible to set it for the first time if a PMU event filter is already present.
>
> "for the first time" gives the impression that it will work if you try
> again. I'd rather we say that "This feature is incompatible with the
> existence of a PMU event filter".
The following sequence will work:
1. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
2. Set KVM_ARM_VCPU_PMU_V3_FILTER
3. Set KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
This is to make the behavior consistent with KVM_ARM_VCPU_PMU_V3_SET_PMU.
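The ordering rule can be captured in a small model. This sketch assumes only the three pieces of state that the check in kvm_arm_pmu_v3_set_pmu_fixed_counters_only() looks at; the names are hypothetical, and installing a filter is simplified to always succeed. Setting the attribute fails with -EBUSY once the VM has run, or when a filter exists and the flag is not already set. This is why step 3 in the sequence succeeds, while setting a filter first and then the attribute does not.

```c
#include <errno.h>
#include <stdbool.h>

struct vm_model {
	bool has_run;     /* kvm_vm_has_ran_once() */
	bool has_filter;  /* kvm->arch.pmu_filter != NULL */
	bool fixed_only;  /* KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY */
};

int set_fixed_counters_only(struct vm_model *vm)
{
	/* Same condition as the patch's -EBUSY check. */
	if (vm->has_run || (vm->has_filter && !vm->fixed_only))
		return -EBUSY;
	vm->fixed_only = true;
	return 0;
}

/* Filter installation, simplified to always succeed here. */
void set_filter(struct vm_model *vm)
{
	vm->has_filter = true;
}
```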
>
> Furthermore, the architecture currently describes *two* fixed-function
> counters (cycles and instructions), while KVM only expose the cycle
> counter. I'm all for the extra abstraction, but what does it mean for
> migration if we enable FEAT_PMUv3_ICNTR?
I'll answer this at the end of this email.
>
>> +
>> +Note that KVM will not make any attempts to run the VCPU on the physical CPUs
>> +with compatible hardware PMUs. This is entirely left to userspace. However,
>> +attempting to run the VCPU on an unsupported CPU will fail and KVM_RUN will
>> +return with exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct
>> +by setting hardware_entry_failure_reason field to
>> +KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and the cpu field to the processor id.
>> +
>
> This is mostly a copy-paste of the previous section. How relevant is
> this to the fixed-counters-only feature? If the whole point of this
> stuff is to ensure compatibility across CPUs with different PMU
> implementations, surely what you describe here is the opposite of what
> you want.
Please see the earlier discussion of supported pCPUs.
>
> My preference would be to move this to a separate patch in any case,
> more on that below.
I will do so with the next version.
>
>> 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
>> =================================
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 59f25b85be2b..b59e0182472c 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -353,6 +353,8 @@ struct kvm_arch {
>> #define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10
>> /* Unhandled SEAs are taken to userspace */
>> #define KVM_ARCH_FLAG_EXIT_SEA 11
>> + /* PMUv3 is emulated without progammable event counters */
>> +#define KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY 12
>> unsigned long flags;
>>
>> /* VM-wide vCPU feature set */
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index a792a599b9d6..474c84fa757f 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -436,6 +436,7 @@ enum {
>> #define KVM_ARM_VCPU_PMU_V3_FILTER 2
>> #define KVM_ARM_VCPU_PMU_V3_SET_PMU 3
>> #define KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS 4
>> +#define KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY 5
>> #define KVM_ARM_VCPU_TIMER_CTRL 1
>> #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
>> #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 620a465248d1..dca16ca26d32 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -634,6 +634,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> if (has_vhe())
>> kvm_vcpu_load_vhe(vcpu);
>> kvm_arch_vcpu_load_fp(vcpu);
>> + kvm_vcpu_load_pmu(vcpu);
>> kvm_vcpu_pmu_restore_guest(vcpu);
>> if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
>> kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
>> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
>> index ef5140bbfe28..d1009c144581 100644
>> --- a/arch/arm64/kvm/pmu-emul.c
>> +++ b/arch/arm64/kvm/pmu-emul.c
>> @@ -326,7 +326,10 @@ u64 kvm_pmu_implemented_counter_mask(struct kvm_vcpu *vcpu)
>>
>> static void kvm_pmc_enable_perf_event(struct kvm_pmc *pmc)
>> {
>> - if (!pmc->perf_event) {
>> + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>> +
>> + if (!pmc->perf_event ||
>> + !cpumask_test_cpu(vcpu->cpu, &to_arm_pmu(pmc->perf_event->pmu)->supported_cpus)) {
>> kvm_pmu_create_perf_event(pmc);
>> return;
>> }
>> @@ -667,10 +670,8 @@ static bool kvm_pmc_counts_at_el2(struct kvm_pmc *pmc)
>> return kvm_pmc_read_evtreg(pmc) & ARMV8_PMU_INCLUDE_EL2;
>> }
>>
>> -static int kvm_map_pmu_event(struct kvm *kvm, unsigned int eventsel)
>> +static int kvm_map_pmu_event(struct arm_pmu *pmu, unsigned int eventsel)
>> {
>> - struct arm_pmu *pmu = kvm->arch.arm_pmu;
>> -
>> /*
>> * The CPU PMU likely isn't PMUv3; let the driver provide a mapping
>> * for the guest's PMUv3 event ID.
>
> This refactor should be in its own patch. This sort of minor change is
> adding noise to the mean of the patch, for no good reason.
I'll make that change with the next version too.
>
>> @@ -681,6 +682,23 @@ static int kvm_map_pmu_event(struct kvm *kvm, unsigned int eventsel)
>> return eventsel;
>> }
>>
>> +static struct arm_pmu *kvm_pmu_probe_armpmu(int cpu)
>> +{
>> + struct arm_pmu_entry *entry;
>> + struct arm_pmu *pmu;
>> +
>> + guard(rcu)();
>> +
>> + list_for_each_entry_rcu(entry, &arm_pmus, entry) {
>> + pmu = entry->arm_pmu;
>> +
>> + if (cpumask_test_cpu(cpu, &pmu->supported_cpus))
>> + return pmu;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> /**
>> * kvm_pmu_create_perf_event - create a perf event for a counter
>> * @pmc: Counter context
>> @@ -694,6 +712,12 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
>> int eventsel;
>> u64 evtreg;
>>
>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags)) {
>> + arm_pmu = kvm_pmu_probe_armpmu(vcpu->cpu);
>> + if (!arm_pmu)
>> + return;
>
> How is it possible to not get a PMU here? We don't expose the PMU to a
> guest at all if there are CPUs without PMUs, see the comment in
> kvm_host_pmu_init(). So I'd expect this to never fail.
Please see the earlier comment.
>
>> + }
>> +
>> evtreg = kvm_pmc_read_evtreg(pmc);
>>
>> kvm_pmu_stop_counter(pmc);
>> @@ -722,7 +746,7 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
>> * Don't create an event if we're running on hardware that requires
>> * PMUv3 event translation and we couldn't find a valid mapping.
>> */
>> - eventsel = kvm_map_pmu_event(vcpu->kvm, eventsel);
>> + eventsel = kvm_map_pmu_event(arm_pmu, eventsel);
>> if (eventsel < 0)
>> return;
>>
>> @@ -810,42 +834,6 @@ void kvm_host_pmu_init(struct arm_pmu *pmu)
>> list_add_tail_rcu(&entry->entry, &arm_pmus);
>> }
>>
>> -static struct arm_pmu *kvm_pmu_probe_armpmu(void)
>> -{
>> - struct arm_pmu_entry *entry;
>> - struct arm_pmu *pmu;
>> - int cpu;
>> -
>> - guard(rcu)();
>> -
>> - /*
>> - * It is safe to use a stale cpu to iterate the list of PMUs so long as
>> - * the same value is used for the entirety of the loop. Given this, and
>> - * the fact that no percpu data is used for the lookup there is no need
>> - * to disable preemption.
>> - *
>> - * It is still necessary to get a valid cpu, though, to probe for the
>> - * default PMU instance as userspace is not required to specify a PMU
>> - * type. In order to uphold the preexisting behavior KVM selects the
>> - * PMU instance for the core during vcpu init. A dependent use
>> - * case would be a user with disdain of all things big.LITTLE that
>> - * affines the VMM to a particular cluster of cores.
>> - *
>> - * In any case, userspace should just do the sane thing and use the UAPI
>> - * to select a PMU type directly. But, be wary of the baggage being
>> - * carried here.
>> - */
>> - cpu = raw_smp_processor_id();
>> - list_for_each_entry_rcu(entry, &arm_pmus, entry) {
>> - pmu = entry->arm_pmu;
>> -
>> - if (cpumask_test_cpu(cpu, &pmu->supported_cpus))
>> - return pmu;
>> - }
>> -
>> - return NULL;
>> -}
>> -
>
> Same thing for the refactoring of this function. Moving it, changing
> the signature and moving the comment somewhere else would be better
> placed on its own.
This will be in a separate patch with the next version.
>
>> static u64 __compute_pmceid(struct arm_pmu *pmu, bool pmceid1)
>> {
>> u32 hi[2], lo[2];
>> @@ -888,6 +876,9 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
>> u64 val, mask = 0;
>> int base, i, nr_events;
>>
>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>> + return 0;
>> +
>> if (!pmceid1) {
>> val = compute_pmceid0(cpu_pmu);
>> base = 0;
>> @@ -915,6 +906,26 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
>> return val & mask;
>> }
>>
>> +void kvm_vcpu_load_pmu(struct kvm_vcpu *vcpu)
>> +{
>> + unsigned long mask = kvm_pmu_enabled_counter_mask(vcpu);
>> + struct kvm_pmc *pmc;
>> + struct arm_pmu *cpu_pmu;
>
> Move these to be inside the loop.
I followed the pattern of other functions, but I agree this new code can
follow a more modern style. It will be done with the next version.
>
>> + int i;
>> +
>> + for_each_set_bit(i, &mask, 32) {
>> + pmc = kvm_vcpu_idx_to_pmc(vcpu, i);
>> + if (!pmc->perf_event)
>> + continue;
>> +
>> + cpu_pmu = to_arm_pmu(pmc->perf_event->pmu);
>> + if (!cpumask_test_cpu(vcpu->cpu, &cpu_pmu->supported_cpus)) {
>> + kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu);
>> + break;
>> + }
>> + }
>> +}
>> +
>
> Why do we need to inflict this on VMs that do not have the fixed
> counter restriction?
This function is there to re-create the perf_event in case the current
perf_event does not support the pCPU, e.g., because the pCPU is an E-core
while the perf_event only covers the P-cores.
>
> And even then, all you have to reconfigure is the cycle counter. So
> why the loop? All we want to find out is whether the cycle counter is
> instantiated on the PMU that matches the current CPU.
I just wanted to avoid hardcoding assumptions about the fixed counter(s).
FEAT_PMUv3_ICNTR, for example, will be handled naturally by a loop.
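The check that kvm_vcpu_load_pmu() performs can be modelled in userspace. This sketch assumes each enabled counter's perf event is reduced to the CPU mask of the PMU it was created on; the names are hypothetical. Walking every enabled counter, rather than only the cycle counter, means an additional fixed counter would be picked up without changing the function. Note that the patch iterates over 32 bits, while this sketch uses a 64-bit mask so that indices above 31 are also visited.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_COUNTERS 64

struct counter_model {
	bool has_event;       /* pmc->perf_event != NULL */
	uint64_t event_cpus;  /* supported_cpus of the event's arm_pmu */
};

/* True when a PMU reload should be requested because some enabled
 * counter's event lives on a PMU that does not cover @cpu. */
bool needs_pmu_reload(const struct counter_model *ctrs, uint64_t enabled,
		      unsigned int cpu)
{
	for (unsigned int i = 0; i < NR_COUNTERS; i++) {
		if (!((enabled >> i) & 1) || !ctrs[i].has_event)
			continue;
		if (!((ctrs[i].event_cpus >> cpu) & 1))
			return true; /* first mismatch is enough */
	}
	return false;
}
```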
>
>> void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu)
>> {
>> u64 mask = kvm_pmu_implemented_counter_mask(vcpu);
>> @@ -1016,6 +1027,9 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
>> {
>> struct arm_pmu *arm_pmu = kvm->arch.arm_pmu;
>>
>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
>> + return 0;
>> +
>> /*
>> * PMUv3 requires that all event counters are capable of counting any
>> * event, though the same may not be true of non-PMUv3 hardware.
>> @@ -1070,7 +1084,24 @@ static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu)
>> */
>> int kvm_arm_set_default_pmu(struct kvm *kvm)
>> {
>> - struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu();
>> + /*
>> + * It is safe to use a stale cpu to iterate the list of PMUs so long as
>> + * the same value is used for the entirety of the loop. Given this, and
>> + * the fact that no percpu data is used for the lookup there is no need
>> + * to disable preemption.
>> + *
>> + * It is still necessary to get a valid cpu, though, to probe for the
>> + * default PMU instance as userspace is not required to specify a PMU
>> + * type. In order to uphold the preexisting behavior KVM selects the
>> + * PMU instance for the core during vcpu init. A dependent use
>> + * case would be a user with disdain of all things big.LITTLE that
>> + * affines the VMM to a particular cluster of cores.
>> + *
>> + * In any case, userspace should just do the sane thing and use the UAPI
>> + * to select a PMU type directly. But, be wary of the baggage being
>> + * carried here.
>> + */
>> + struct arm_pmu *arm_pmu = kvm_pmu_probe_armpmu(raw_smp_processor_id());
>>
>> if (!arm_pmu)
>> return -ENODEV;
>> @@ -1098,6 +1129,7 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
>> break;
>> }
>>
>> + clear_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags);
>
> Why does this need to be cleared? I'd rather we make sure it is never
> set the first place.
KVM_ARM_VCPU_PMU_V3_SET_PMU and KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY
can both be set on the same VCPU; whichever of the two is set last takes
effect.
A VMM may try setting these attributes to check whether a setting is
supported. For example, the RFC QEMU patch first uses
KVM_ARM_VCPU_PMU_V3_SET_PMU to find a compatible PMU that covers all
pCPUs, and then falls back to KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY.
The order of such probing is up to the VMM.
This rationale also applies to the next comment.
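The probing order described above can be sketched in userspace, with each host PMU reduced to its CPU mask. The names are hypothetical, and this is not the RFC QEMU code. The VMM prefers a single PMU that covers every possible CPU via SET_PMU, and falls back to FIXED_COUNTERS_ONLY only when no such PMU exists.

```c
#include <stddef.h>
#include <stdint.h>

enum pmu_choice { CHOOSE_SET_PMU, CHOOSE_FIXED_COUNTERS_ONLY };

/* Returns the strategy; for CHOOSE_SET_PMU, *pmu_idx receives the
 * index of the first PMU whose mask covers all possible CPUs. */
enum pmu_choice choose_pmu(const uint64_t *pmu_masks, size_t nr_pmus,
			   uint64_t possible, size_t *pmu_idx)
{
	for (size_t i = 0; i < nr_pmus; i++) {
		if ((possible & ~pmu_masks[i]) == 0) {
			*pmu_idx = i;
			return CHOOSE_SET_PMU;
		}
	}
	/* Heterogeneous host: no single PMU covers every CPU. */
	return CHOOSE_FIXED_COUNTERS_ONLY;
}
```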
>
>> kvm_arm_set_pmu(kvm, arm_pmu);
>> cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus);
>> ret = 0;
>> @@ -1108,11 +1140,42 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
>> return ret;
>> }
>>
>> +static int kvm_arm_pmu_v3_set_pmu_fixed_counters_only(struct kvm_vcpu *vcpu)
>> +{
>> + struct kvm *kvm = vcpu->kvm;
>> + struct arm_pmu_entry *entry;
>> + struct arm_pmu *arm_pmu;
>> + struct cpumask *supported_cpus = kvm->arch.supported_cpus;
>> +
>> + lockdep_assert_held(&kvm->arch.config_lock);
>> +
>> + if (kvm_vm_has_ran_once(kvm) ||
>> + (kvm->arch.pmu_filter &&
>> + !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags)))
>> + return -EBUSY;
>> +
>> + set_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags);
>> + kvm_arm_set_nr_counters(kvm, 0);
>> + cpumask_clear(supported_cpus);
>
> What is the purpose of this cpumask_clear()? Under what conditions can
> you have something else?
>
>> +
>> + guard(rcu)();
>> +
>> + list_for_each_entry_rcu(entry, &arm_pmus, entry) {
>> + arm_pmu = entry->arm_pmu;
>> + cpumask_or(supported_cpus, supported_cpus, &arm_pmu->supported_cpus);
>
> Why isn't supported_cpus directly set to possible_cpus? Isn't that the
> base requirement that you can run on any CPU at all?
Please see the earlier discussion of supported pCPUs.
>
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int kvm_arm_pmu_v3_set_nr_counters(struct kvm_vcpu *vcpu, unsigned int n)
>> {
>> struct kvm *kvm = vcpu->kvm;
>>
>> - if (!kvm->arch.arm_pmu)
>> + lockdep_assert_held(&kvm->arch.config_lock);
>> +
>> + if (!kvm->arch.arm_pmu &&
>> + !test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &kvm->arch.flags))
>> return -EINVAL;
>>
>> if (n > kvm_arm_pmu_get_max_counters(kvm))
>> @@ -1227,6 +1290,8 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>>
>> return kvm_arm_pmu_v3_set_nr_counters(vcpu, n);
>> }
>> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>> + return kvm_arm_pmu_v3_set_pmu_fixed_counters_only(vcpu);
>> case KVM_ARM_VCPU_PMU_V3_INIT:
>> return kvm_arm_pmu_v3_init(vcpu);
>> }
>> @@ -1253,6 +1318,9 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>> irq = vcpu->arch.pmu.irq_num;
>> return put_user(irq, uaddr);
>> }
>> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>> + if (test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY, &vcpu->kvm->arch.flags))
>
> With 6 occurrences of this test_bit(), it feels like it'd be valuable
> to have a dedicate predicate to help with readability.
I'll add one with the next version.
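Here is one possible shape for that predicate. It is shown against minimal userspace stand-ins for struct kvm and test_bit() so that it compiles on its own. The name kvm_pmu_fixed_counters_only() is hypothetical, and the flag value 12 is taken from this patch.

```c
#include <stdbool.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY 12

/* Userspace stand-ins for the kernel types and helper. */
struct kvm_arch { unsigned long flags; };
struct kvm { struct kvm_arch arch; };

static bool test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* The predicate replaces the open-coded test_bit() at each call site. */
static inline bool kvm_pmu_fixed_counters_only(const struct kvm *kvm)
{
	return test_bit(KVM_ARCH_FLAG_PMU_V3_FIXED_COUNTERS_ONLY,
			&kvm->arch.flags);
}
```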
>
>> + return 0;
>> }
>>
>> return -ENXIO;
>> @@ -1266,6 +1334,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>> case KVM_ARM_VCPU_PMU_V3_FILTER:
>> case KVM_ARM_VCPU_PMU_V3_SET_PMU:
>> case KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS:
>> + case KVM_ARM_VCPU_PMU_V3_FIXED_COUNTERS_ONLY:
>> if (kvm_vcpu_has_pmu(vcpu))
>> return 0;
>> }
>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>> index 96754b51b411..1375cbaf97b2 100644
>> --- a/include/kvm/arm_pmu.h
>> +++ b/include/kvm/arm_pmu.h
>> @@ -56,6 +56,7 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val);
>> void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val);
>> void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
>> u64 select_idx);
>> +void kvm_vcpu_load_pmu(struct kvm_vcpu *vcpu);
>> void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu);
>> int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu,
>> struct kvm_device_attr *attr);
>> @@ -161,6 +162,7 @@ static inline u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
>> static inline void kvm_pmu_update_vcpu_events(struct kvm_vcpu *vcpu) {}
>> static inline void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) {}
>> static inline void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) {}
>> +static inline void kvm_vcpu_load_pmu(struct kvm_vcpu *vcpu) {}
>> static inline void kvm_vcpu_reload_pmu(struct kvm_vcpu *vcpu) {}
>> static inline u8 kvm_arm_pmu_get_pmuver_limit(void)
>> {
>>
>
> In conclusion, I find this patch to be rather messy. For a start, it
> needs to be split in at least 5 patches:
>
> - at least two for the refactoring
> - one for the PMU core changes
> - one for the UAPI
> - one for documentation
That clarifies the expected granularity of patches. The next version
will follow that layout, perhaps with more patches if additional
changes are needed. Thanks for the guidance.
>
> I'd also like some clarification on how this is intended to work if we
> enable FEAT_PMUv3_ICNTR, because the definition seems to be designed
> to encompass all fixed-function counters, and I expect this to grow
> over time.
Indeed, the UAPI was designed to encompass all fixed-function counters,
as suggested by Oliver.
To support the UAPI, the implementation avoids hardcoding assumptions
about the fixed counter(s). FEAT_PMUv3_ICNTR will be naturally supported
once the common code is properly updated (i.e., the event counter
bitmask is grown and the corresponding registers are wired up with a
proper check for the feature).
I expect migration to be handled with the conventional register
getters and setters, but please let me know if you have a concern.
>
> I'm also not planning to look at the selftest at this stage.
That is completely understandable; I'll focus on refining the design and
implementation for the next version first.
Regards,
Akihiko Odaki
* Re: [PATCH] iommu/arm-smmu-qcom: Fix fastrpc compatible string in ACTLR client match table
From: Bibek Kumar Patro @ 2026-04-20 8:38 UTC
To: Shawn Guo
Cc: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel,
Dmitry Baryshkov, iommu, linux-arm-msm, linux-arm-kernel,
linux-kernel, srinivas.kandagatla
In-Reply-To: <aeSly0N7IkXHYExB@QCOM-aGQu4IUr3Y>
On 4/19/2026 3:22 PM, Shawn Guo wrote:
> On Wed, Apr 08, 2026 at 06:38:25PM +0530, bibek.patro@oss.qualcomm.com wrote:
>> From: Bibek Kumar Patro <bibek.patro@oss.qualcomm.com>
> ...
>> Assisted-by: Anthropic:claude-4-6-sonnet
>
> Nit - coding-assistants.rst suggests format:
>
> Assisted-by: AGENT_NAME:MODEL_VERSION
>
> So I guess this might be better?
>
> Assisted-by: Claude:claude-4-6-sonnet
>
Agreed. Thanks for pointing this out, Shawn.
I will update the Assisted-by tag to follow the recommendation in
coding-assistants.rst.
Thanks & regards,
Bibek
> Shawn
>
>> Fixes: 3e35c3e725de ("iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500")
>> Signed-off-by: Bibek Kumar Patro <bibek.patro@oss.qualcomm.com>
* Re: [PATCH v3 8/8] unwind: arm64: Use sframe to unwind interrupt frames.
From: Jens Remus @ 2026-04-20 8:42 UTC
To: Dylan Hatch
Cc: Roman Gushchin, Weinan Liu, Will Deacon, Josh Poimboeuf,
Indu Bhagat, Peter Zijlstra, Steven Rostedt, Catalin Marinas,
Jiri Kosina, Mark Rutland, Prasanna Kumar T S M, Puranjay Mohan,
Song Liu, joe.lawrence, linux-toolchains, linux-kernel,
live-patching, linux-arm-kernel, Heiko Carstens
In-Reply-To: <CADBMgpwjDf44p0ApR1=XVStCyN-0Q6tuywJ4ixLcbaLZOSjjBg@mail.gmail.com>
On 4/20/2026 7:56 AM, Dylan Hatch wrote:
> On Fri, Apr 17, 2026 at 8:45 AM Jens Remus <jremus@linux.ibm.com> wrote:
>>> + case UNWIND_CFA_RULE_REG_OFFSET:
>>> + case UNWIND_CFA_RULE_REG_OFFSET_DEREF:
>>> + if (!regs)
>>
>> if (!regs || frame.cfa.regnum > 30)
>>
>>> + return -EINVAL;
>>> + cfa = regs->regs[frame.cfa.regnum];
>>
>> In unwind user this is guarded by a topmost frame check, as arbitrary
>> registers are otherwise not available. Isn't this necessary in the
>> kernel case?
>
> It is necessary, though as you point out the way I wrote the check is
> not as obvious as it probably should be.
>
> The saved state->regs is set when the current frame is recovered from
> the saved PC of a struct pt_regs, and then immediately set back to
> NULL after the next frame has been recovered. In other words, the
> state->regs is only ever set when it is relevant to the current frame,
> which occurs when state->source == KUNWIND_SOURCE_REGS_PC. This only
> happens when the topmost frame is recovered from a pt_regs, or when a
> pt_regs is recovered from the stack due to an interrupt.
>
> I can make this more readable by adding an explicit check for
> KUNWIND_SOURCE_REGS_PC in addition to state->regs != NULL.
Thanks for the explanation! Maybe just add an explanation to the commit
message and a short comment above the (!regs) test?
/* regs only available in topmost frame */
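The guarded lookup under discussion can be modelled in userspace. This is a simplified sketch; the types and names are hypothetical stand-ins, not the series' code. The CFA is computed from a saved register plus an offset only when a pt_regs is available, and the register number is bounds-checked against the 31 general-purpose registers x0-x30, as suggested earlier in this thread. The _DEREF rule, which additionally loads from the computed address, is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct regs_model { uint64_t regs[31]; }; /* x0..x30, as in arm64 pt_regs */

int cfa_from_reg(const struct regs_model *regs, unsigned int regnum,
		 int64_t offset, uint64_t *cfa)
{
	/* regs only available in topmost frame */
	if (!regs || regnum > 30)
		return -EINVAL;
	*cfa = regs->regs[regnum] + (uint64_t)offset;
	return 0;
}
```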
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
* Re: [PATCH] crypto: tstmgr - guard xxhash tests
From: Herbert Xu @ 2026-04-20 8:45 UTC
To: Hamza Mahfooz
Cc: linux-crypto, David S. Miller, Maxime Coquelin, Alexandre Torgue,
linux-stm32, linux-arm-kernel, linux-kernel, Jeff Barnes,
Paul Monson
In-Reply-To: <aeJw9I38heQRbbe6@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
On Fri, Apr 17, 2026 at 10:42:12AM -0700, Hamza Mahfooz wrote:
>
> It appears that commit 6318fbe26e67 ("crypto: testmgr - Hide ENOENT
> errors better"), already does exactly that and it appears to resolve the
> issue that I'm seeing. So, is there any reason it can't be backported to
> stable?
Sure, I don't see anything wrong with that.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH] gpio: rockchip: Fix GPIO after convert to dynamic base allocation
From: Bartosz Golaszewski @ 2026-04-20 8:46 UTC
To: Linus Walleij, Bartosz Golaszewski, Heiko Stuebner, Shawn Lin,
Jonas Karlman
Cc: Bartosz Golaszewski, linux-gpio, linux-arm-kernel, linux-rockchip,
linux-kernel
In-Reply-To: <20260416154928.2103388-1-jonas@kwiboo.se>
On Thu, 16 Apr 2026 15:49:28 +0000, Jonas Karlman wrote:
> The commit c8079f83e0bf ("gpio: rockchip: convert to dynamic GPIO base
> allocation") broke GPIO on devices using device trees which don't set
> the gpio-ranges property, something only Rockchip RK35xx SoC DTs do.
>
> On a Rockchip RK3399 device something like following is now observed:
>
> [ 0.082771] rockchip-gpio ff720000.gpio: probed /pinctrl/gpio@ff720000
> [ 0.083531] rockchip-gpio ff730000.gpio: probed /pinctrl/gpio@ff730000
> [ 0.084110] rockchip-gpio ff780000.gpio: probed /pinctrl/gpio@ff780000
> [ 0.084746] rockchip-gpio ff788000.gpio: probed /pinctrl/gpio@ff788000
> [ 0.085389] rockchip-gpio ff790000.gpio: probed /pinctrl/gpio@ff790000
> --
> [ 0.212208] rockchip-pinctrl pinctrl: pin 637 is not registered so it cannot be requested
> [ 0.212271] rockchip-pinctrl pinctrl: error -EINVAL: pin-637 (gpio3:637)
> [ 0.212344] leds-gpio leds: error -EINVAL: Failed to get GPIO '/leds/led-0'
> [ 0.212389] leds-gpio leds: probe with driver leds-gpio failed with error -22
> --
> [ 0.607545] rockchip-pinctrl pinctrl: pin 519 is not registered so it cannot be requested
> [ 0.608775] rockchip-pinctrl pinctrl: error -EINVAL: pin-519 (gpio0:519)
> [ 0.610003] dwmmc_rockchip fe320000.mmc: probe with driver dwmmc_rockchip failed with error -22
> --
> [ 0.805882] rockchip-pinctrl pinctrl: pin 547 is not registered so it cannot be requested
> [ 0.806672] rockchip-pinctrl pinctrl: error -EINVAL: pin-547 (gpio1:547)
> [ 0.807301] reg-fixed-voltage regulator-vbus-typec: error -EINVAL: can't get GPIO
> [ 0.807307] rockchip-pinctrl pinctrl: pin 602 is not registered so it cannot be requested
> [ 0.807970] reg-fixed-voltage regulator-vbus-typec: probe with driver reg-fixed-voltage failed with error -22
> [ 0.808692] rockchip-pinctrl pinctrl: error -EINVAL: pin-602 (gpio2:602)
> [ 0.810279] reg-fixed-voltage regulator-vcc3v3-pcie: error -EINVAL: can't get GPIO
> [ 0.810284] rockchip-pinctrl pinctrl: pin 665 is not registered so it cannot be requested
> [ 0.810299] rockchip-pinctrl pinctrl: error -EINVAL: pin-665 (gpio4:665)
> [ 0.810960] reg-fixed-voltage regulator-vcc3v3-pcie: probe with driver reg-fixed-voltage failed with error -22
> [ 0.811679] reg-fixed-voltage regulator-vcc5v0-host: error -EINVAL: can't get GPIO
> [ 0.813943] reg-fixed-voltage regulator-vcc5v0-host: probe with driver reg-fixed-voltage failed with error -22
> --
> [ 0.867788] rockchip-pinctrl pinctrl: pin 522 is not registered so it cannot be requested
> [ 0.868537] rockchip-pinctrl pinctrl: error -EINVAL: pin-522 (gpio0:522)
> [ 0.869166] pwrseq_simple sdio-pwrseq: error -EINVAL: reset GPIOs not ready
> [ 0.869798] pwrseq_simple sdio-pwrseq: probe with driver pwrseq_simple failed with error -22
> --
> [ 0.940365] rockchip-pinctrl pinctrl: pin 623 is not registered so it cannot be requested
> [ 0.941084] rockchip-pinctrl pinctrl: error -EINVAL: pin-623 (gpio3:623)
> [ 0.941823] rk_gmac-dwmac fe300000.ethernet: error -EINVAL: Cannot register the MDIO bus
> [ 0.942542] rk_gmac-dwmac fe300000.ethernet: error -EINVAL: MDIO bus (id: 0) registration failed
> [ 0.943772] rk_gmac-dwmac fe300000.ethernet: probe with driver rk_gmac-dwmac failed with error -22
>
> [...]
Applied, thanks!
[1/1] gpio: rockchip: Fix GPIO after convert to dynamic base allocation
https://git.kernel.org/brgl/c/5cd9c6d332f46d1de8b68117fe2a3f1b08ee80ff
Best regards,
--
Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>