* [PATCH v2 0/3] arm64: dts: amlogic: t7: Add UART support and enable Bluetooth on VIM4
From: Ronald Claveau @ 2026-04-16 8:54 UTC (permalink / raw)
To: Neil Armstrong, Kevin Hilman, Jerome Brunet, Martin Blumenstingl,
Rob Herring, Krzysztof Kozlowski, Conor Dooley
Cc: linux-arm-kernel, linux-amlogic, devicetree, linux-kernel,
Ronald Claveau
This series adds all UART controllers for the Amlogic T7 SoC and enables
the Bluetooth controller on the Khadas VIM4 board.
The T7 SoC ships with six UART controllers (A through F), but only
uart_a was previously described in the device tree.
- Patch 1 adds the pinctrl group for UART C, which is needed to route
its four signals (TX, RX, CTS, RTS) through the correct pads.
- Patch 2 completes the uart_a node (peripheral clock) and
repositions it to respect the ascending reg address order required
by the DT specification. It then adds nodes for UART B through F,
each with their respective peripheral clock.
- Patch 3 enables UART C on the Khadas VIM4 board and attaches the
on-board BCM43438 Bluetooth controller to it, with hardware flow
control, wakeup GPIOs, LPO clock and power supplies.
Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
---
Changes in v2:
- PATCH 1: change underscore to dash in pin node name,
according to Xianwei's feedback.
- PATCH 3: remove clocks and clock-names as already defined in DTSI,
according to Xianwei's feedback.
- Link to v1: https://lore.kernel.org/r/20260415-add-bluetooth-t7-vim4-v1-0-0ba0746cc1d6@aliel.fr
---
Ronald Claveau (3):
arm64: dts: amlogic: t7: Add uart_c pinctrl pins group
arm64: dts: amlogic: t7: Add UART controllers nodes
arm64: dts: amlogic: t7: khadas-vim4: Enable Bluetooth
.../dts/amlogic/amlogic-t7-a311d2-khadas-vim4.dts | 21 ++++++-
arch/arm64/boot/dts/amlogic/amlogic-t7.dtsi | 73 +++++++++++++++++++---
2 files changed, 85 insertions(+), 9 deletions(-)
---
base-commit: 6aa9edb4f8266cfb913ee74f5e55116550b5574d
change-id: 20260414-add-bluetooth-t7-vim4-f01e03c4ec2a
Best regards,
--
Ronald Claveau <linux-kernel-dev@aliel.fr>
* Re: [PATCH v3 4/6] dt-bindings: soc: mediatek: devapc: Add bindings for MT8189
From: Krzysztof Kozlowski @ 2026-04-16 8:56 UTC (permalink / raw)
To: Xiaoshun Xu
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Matthias Brugger,
AngeloGioacchino Del Regno, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek, Sirius Wang, Vince-wl Liu,
Project_Global_Chrome_Upstream_Group
In-Reply-To: <20260416031231.2932493-5-xiaoshun.xu@mediatek.com>
On Thu, Apr 16, 2026 at 11:12:07AM +0800, Xiaoshun Xu wrote:
> Extend the devapc device tree bindings to support the MediaTek MT8189
> SoC. This includes:
>
> - Adding "mediatek,mt8189-devapc" to the list of compatible strings.
> - Introducing the "vio-idx-num" property to specify the number of bus
> slaves managed by devapc.
>
> These changes enable proper configuration and integration of devapc on
> MT8189 platforms, ensuring accurate device matching and resource
> allocation in the device tree.
Pointless paragraph. Would you write a commit which does not enable
proper configuration?
>
> Signed-off-by: Xiaoshun Xu <xiaoshun.xu@mediatek.com>
> ---
> .../devicetree/bindings/soc/mediatek/devapc.yaml | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/soc/mediatek/devapc.yaml b/Documentation/devicetree/bindings/soc/mediatek/devapc.yaml
> index 99e2caafeadf..06a096440331 100644
> --- a/Documentation/devicetree/bindings/soc/mediatek/devapc.yaml
> +++ b/Documentation/devicetree/bindings/soc/mediatek/devapc.yaml
> @@ -14,13 +14,14 @@ description: |
> analysis and countermeasures.
>
> maintainers:
> - - Neal Liu <neal.liu@mediatek.com>
Your commit said what the change is doing. It's pointless because we see
it in the diff. Except that we don't...
> + - Xiaoshun Xu <xiaoshun.xu@mediatek.com>
>
> properties:
> compatible:
> enum:
> - mediatek,mt6779-devapc
> - mediatek,mt8186-devapc
> + - mediatek,mt8189-devapc
>
> reg:
> description: The base address of devapc register bank
> @@ -30,6 +31,10 @@ properties:
> description: A single interrupt specifier
> maxItems: 1
>
> + vio-idx-num:
Nah, compatible defines it. Please follow standard rules for bindings,
see writing-bindings doc.
> + description: Describe the number of bus slaves controlled by devapc
> + $ref: /schemas/types.yaml#/definitions/uint32
> +
> clocks:
> description: Contains module clock source and clock names
> maxItems: 1
> @@ -42,8 +47,6 @@ required:
> - compatible
> - reg
> - interrupts
> - - clocks
> - - clock-names
Why?
This commit explains nothing and makes some random-looking code changes.
Best regards,
Krzysztof
* Re: [PATCH v3 6/6] dt-bindings: soc: mediatek: devapc: Add bindings for MT8196
From: Krzysztof Kozlowski @ 2026-04-16 8:58 UTC (permalink / raw)
To: Xiaoshun Xu
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Matthias Brugger,
AngeloGioacchino Del Regno, devicetree, linux-kernel,
linux-arm-kernel, linux-mediatek, Sirius Wang, Vince-wl Liu,
Project_Global_Chrome_Upstream_Group
In-Reply-To: <20260416031231.2932493-7-xiaoshun.xu@mediatek.com>
On Thu, Apr 16, 2026 at 11:12:09AM +0800, Xiaoshun Xu wrote:
> Extend the devapc device tree bindings to support the MediaTek MT8196
> SoC. This includes:
>
> - Adding "mediatek,mt8196-devapc" to the list of compatible strings.
>
> These changes enable proper configuration and integration of devapc on
> MT8196 platforms, ensuring accurate device matching and resource
> allocation in the device tree.
Same comments. It's really poor commit msg.
Also, subject wrong. Drop second/last, redundant "bindings". The
"dt-bindings" prefix is already stating that these are bindings.
See also:
https://elixir.bootlin.com/linux/v6.17-rc3/source/Documentation/devicetree/bindings/submitting-patches.rst#L18
Best regards,
Krzysztof
* Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Guangshuo Li @ 2026-04-16 8:59 UTC (permalink / raw)
To: Johan Hovold
Cc: Greg Kroah-Hartman, Mark Rutland, Will Deacon, Anshuman Khandual,
linux-arm-kernel, linux-perf-users, linux-kernel, stable
In-Reply-To: <aeCOdWLaVpH-5w8s@hovoldconsulting.com>
Hi Greg, Mark, Johan,
Thanks for the further comments.
On Thu, 16 Apr 2026 at 15:23, Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Apr 16, 2026 at 06:40:55AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2026 at 07:19:06PM +0100, Mark Rutland wrote:
>
> > > AFAICT you're saying that the reference was taken *within*
> > > platform_device_register(), and then platform_device_register() itself
> > > has failed. I think it's surprising that platform_device_register()
> > > doesn't clean that up itself in the case of an error.
> > >
> > > There are *tonnes* of calls to platform_device_register() throughout the
> > > kernel that don't even bother to check the return value, and many that
> > > just pass the return onto a caller that can't possibly know to call
> > > platform_device_put().
> > >
> > > Code in the same file as platform_device_register() expects it to clean up
> > > after itself, e.g.
> > >
> > > | int platform_add_devices(struct platform_device **devs, int num)
> > > | {
> > > | int i, ret = 0;
> > > |
> > > | for (i = 0; i < num; i++) {
> > > | ret = platform_device_register(devs[i]);
> > > | if (ret) {
> > > | while (--i >= 0)
> > > | platform_device_unregister(devs[i]);
> > > | break;
> > > | }
> > > | }
> > > |
> > > | return ret;
> > > | }
> > >
> > > That's been there since the initial git commit, and back then,
> > > platform_device_register() didn't mention that callers needed to perform
> > > any cleanup.
> > >
> > > I see a comment was added to platform_device_register() in commit:
> > >
> > > 67e532a42cf4 ("driver core: platform: document registration-failure requirement")
> > >
> > > ... and that copied the comment added for device_register() in commit:
> > >
> > > 5739411acbaa ("Driver core: Clarify device cleanup.")
> > >
> > > ... but the potential brokenness is so widespread, and the behaviour is
> > > so surprising, that I'd argue the real bug is that device_register()
> > > doesn't clean up in case of error. I don't think it's worth changing
> > > this single instance given the prevalence and churn fixing all of that
> > > would involve.
> > >
> > > I think it would be far better to fix the core driver API such that when
> > > those functions return an error, they've already cleaned up for
> > > themselves.
> > >
> > > Greg, am I missing some functional reason why we can't rework
> > > device_register() and friends to handle cleanup themselves? I appreciate
> > > that'll involve churn for some callers, but AFAICT the majority of
> > > callers don't have the required cleanup.
> >
> > Yes, we should fix the platform core code here, this should not be
> > required to do everywhere as obviously we all got it wrong.
>
> It's not just the platform code as this directly reflects the behaviour
> of device_register() as Mark pointed out.
>
> It is indeed an unfortunate quirk of the driver model, but one can argue
> that having a registration function that frees its argument on errors
> would be even worse. And even more so when many (or most) users get this
> right.
>
> So if we want to change this, I think we would need to deprecate
> device_register() in favour of explicit device_initialize() and
> device_add().
>
> That said, most users of platform_device_register() appear to operate
> on static platform devices which don't even have a release function and
> would trigger a WARN() if we ever drop the reference (which is arguably
> worse than leaking a tiny bit of memory).
>
> So leaving things as-is is also an option.
>
> Johan
I did some more investigation, and it looks like directly changing the
semantics of the existing API would break code that is already correct
today.
In particular, there seem to be at least two different kinds of callers:
1. Callers that already handle the failure path explicitly after
   platform_device_register() fails. For these users, changing
   platform_device_register() itself to drop the reference internally
   would lead to double put / use-after-free issues.
2. Callers that operate on static struct platform_device objects. Many of
   these do not have a release callback, so blindly dropping the
   reference on failure would trigger a WARN.
Because of this, changing platform_device_register() itself to always
clean up on failure does not look safe.
One possible direction may be to leave platform_device_register()
unchanged, and instead add new helper APIs for the different cases.
For case (1), I was thinking of a helper like:
platform_device_register_and_put()
The implementation would simply call platform_device_register(), and if
that fails, call platform_device_put(). Callers converted to this helper
would then no longer perform their own put on the failure path.
For case (2), I was thinking of a helper like:
platform_device_register_static()
The implementation would first install a no-op release callback when
pdev->dev.release is not set, and then call
platform_device_register_and_put(). This would make the failure path
well-defined for static platform_device users, avoiding the reference
leak without triggering a WARN.
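The two helpers described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct mock_pdev` and every `mock_*` function are hypothetical stand-ins rather than kernel APIs, and the reference count is a plain integer instead of a kobject reference.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct platform_device. */
struct mock_pdev {
	int refcount;                        /* models the device reference count */
	bool fail_add;                       /* forces registration to fail */
	bool released;                       /* set once the release callback ran */
	void (*release)(struct mock_pdev *); /* models pdev->dev.release */
};

/* Models platform_device_put(): drop a reference, release on the last one. */
static void mock_put(struct mock_pdev *pdev)
{
	if (--pdev->refcount == 0) {
		if (pdev->release)
			pdev->release(pdev);
		else
			fprintf(stderr, "WARN: no release callback\n");
	}
}

/* Models today's platform_device_register(): the reference taken
 * during registration is still held when an error is returned. */
static int mock_register(struct mock_pdev *pdev)
{
	pdev->refcount = 1;
	return pdev->fail_add ? -ENODEV : 0;
}

/* Hypothetical helper for case (1): put the reference only on failure. */
static int mock_register_and_put(struct mock_pdev *pdev)
{
	int ret = mock_register(pdev);

	if (ret)
		mock_put(pdev);
	return ret;
}

/* Release callback for static devices: frees nothing, only records the call. */
static void mock_noop_release(struct mock_pdev *pdev)
{
	pdev->released = true;
}

/* Hypothetical helper for case (2): install a release callback if
 * none is set, then reuse the case (1) helper. */
static int mock_register_static(struct mock_pdev *pdev)
{
	if (!pdev->release)
		pdev->release = mock_noop_release;
	return mock_register_and_put(pdev);
}
```

In this model, a failed mock_register_static() leaves the reference count at zero and never reaches the WARN branch, while a successful call keeps the single registration reference.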
If this direction sounds reasonable, I would be happy to work on it and
send a patch, and I would also be very willing to help with the related
API conversion work for existing callers.
Thanks,
Guangshuo
* Re: [PATCH v2 2/3] dt-bindings: gpio: Add EIO GPIO compatible to gpio-zynq
From: Krzysztof Kozlowski @ 2026-04-16 9:06 UTC (permalink / raw)
To: Michal Simek
Cc: Conor Dooley, Shubhrajyoti Datta, linux-kernel, git,
shubhrajyoti.datta, Srinivas Neeli, Linus Walleij,
Bartosz Golaszewski, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, linux-gpio, devicetree, linux-arm-kernel
In-Reply-To: <c973f9d4-9bb5-40f4-8f09-72e23f92cd2d@amd.com>
On Thu, Apr 16, 2026 at 07:58:27AM +0200, Michal Simek wrote:
>
>
> On 4/15/26 17:01, Conor Dooley wrote:
> > On Wed, Apr 15, 2026 at 04:26:27PM +0530, Shubhrajyoti Datta wrote:
> > > EIO (Extended IO) is a GPIO block found on xa2ve3288 silicon.
> >
> >
> > Why does the compatible have a "1.0" when it is in silicon?
>
> Sorry not following what the problem is. Yes this is hard block in silicon
> and it is silicon v1.
Writing bindings: compatibles should be specific to device, not some
arbitrary versioning.
OR explain in commit msg. That commit msg clearly suggests code is wrong.
>
> > Why doesn't the compatible contain "xa2ve3288"?
>
> This unit can be used on different silicons too.
That's not what the commit said.
>
> > Why is this device not compatible with existing ones, since
> > gpio-lines-names appears to be the sole difference?
>
> There is no way how to detect gpio width.
Where in the commit msg are the differences explained?
Best regards,
Krzysztof
* Re: [PATCH] crypto: tstmgr - guard xxhash tests
From: Herbert Xu @ 2026-04-16 9:06 UTC (permalink / raw)
To: Hamza Mahfooz
Cc: linux-crypto, David S. Miller, Maxime Coquelin, Alexandre Torgue,
linux-stm32, linux-arm-kernel, linux-kernel, Jeff Barnes,
Paul Monson
In-Reply-To: <adffSYxKIuaDLZit@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
On Thu, Apr 09, 2026 at 10:18:01AM -0700, Hamza Mahfooz wrote:
>
> alg: hash: failed to allocate transform for xxhash64: -2
> Kernel panic - not syncing: alg: self-tests for xxhash64 (xxhash64) failed in fips mode!
> CPU: 0 PID: 425 Comm: modprobe Not tainted 6.6.130.2-2.azl3 #1
> Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 01/08/2026
> Call Trace:
> <TASK>
> dump_stack_lvl+0x4c/0x70
> dump_stack+0x14/0x20
> panic+0x179/0x330
> alg_test+0x678/0x680
> ? __alloc_pages+0x1e2/0x340
> do_test+0x26f8/0x7670 [tcrypt]
So the error is coming from tcrypt. I think that's where the ifdef
should be added.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH] crypto: tstmgr - guard xxhash tests
From: Herbert Xu @ 2026-04-16 9:09 UTC (permalink / raw)
To: Hamza Mahfooz
Cc: linux-crypto, David S. Miller, Maxime Coquelin, Alexandre Torgue,
linux-stm32, linux-arm-kernel, linux-kernel, Jeff Barnes,
Paul Monson
In-Reply-To: <aeCmk6LbLFT4Keo2@gondor.apana.org.au>
On Thu, Apr 16, 2026 at 05:06:27PM +0800, Herbert Xu wrote:
>
> So the error is coming from tcrypt. I think that's where the ifdef
> should be added.
On second thought, fips_allowed should not mean that an algorithm
must be present.
So we should change it such that an -ENOENT is not fatal, or at least
when it's called from tcrypt.
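A minimal sketch of that policy, assuming a hypothetical helper named selftest_failure_is_fatal() (this is not the actual testmgr code, and the real panic condition may involve further checks): a missing algorithm, reported as -ENOENT, is treated as "nothing to test" rather than as a failed self-test.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical policy helper: should a self-test result panic the kernel?
 * 'err' is the self-test return value, 'fips_enabled' models FIPS mode. */
static bool selftest_failure_is_fatal(int err, bool fips_enabled)
{
	if (!err)
		return false;	/* test passed */
	if (!fips_enabled)
		return false;	/* failures are only fatal in FIPS mode here */
	if (err == -ENOENT)
		return false;	/* algorithm not present: nothing failed verification */
	return true;		/* a present algorithm really failed its test */
}
```

With this shape, the xxhash64 case from the report (transform allocation failing with -2, i.e. -ENOENT) would no longer be fatal, while a genuine mismatch in a present algorithm would still be.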
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH v2 0/9] driver core / pmdomain: Add support for fined grained sync_state
From: Geert Uytterhoeven @ 2026-04-16 9:15 UTC (permalink / raw)
To: Ulf Hansson
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Dmitry Baryshkov, linux-arm-kernel, linux-kernel
In-Reply-To: <20260410104058.83748-1-ulf.hansson@linaro.org>
Hi Ulf,
On Fri, 10 Apr 2026 at 12:41, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> Since the introduction [1] of the common sync_state support for pmdomains
> (genpd), we have encountered a lot of various interesting problems. In most
> cases the new behaviour of genpd triggered some weird platform specific bugs.
>
> That said, in LPC in Tokyo me and Saravana hosted a session to walk through the
> remaining limitations that we have found for genpd's sync state support. In
> particular, we discussed the problems we have for the so-called onecell power
> domain providers, where a single provider typically provides multiple
> independent power domains, all with their own set of consumers.
>
> Note that, onecell power domain providers are very common. It's being used by
> many SoCs/platforms/technologies. To name a few:
> SCMI, Qualcomm, NXP, Mediatek, Renesas, TI, etc.
>
> Anyway, in these cases, the generic sync_state mechanism with fw_devlink isn't
> fine grained enough, as we end up waiting for all consumers for all power
> domains before the ->sync_callback gets called for the supplier/provider. In
> other words, we may end up keeping unused power domains powered-on, for no good
> reasons.
>
> The series intends to fix this problem. Please have a look at the commit
> messages for more details and help review/test!
Thanks for the update!
At first glance, the only real change compared to v1 seems to be
the removal of printing
pr_info("%s:%s con=%s\n", __func__, dev_name(dev),
dev_name(consumer));
Right?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Mark Rutland @ 2026-04-16 9:30 UTC (permalink / raw)
To: Johan Hovold
Cc: Greg Kroah-Hartman, Guangshuo Li, Will Deacon, Anshuman Khandual,
linux-arm-kernel, linux-perf-users, linux-kernel, stable
In-Reply-To: <aeCOdWLaVpH-5w8s@hovoldconsulting.com>
On Thu, Apr 16, 2026 at 09:23:33AM +0200, Johan Hovold wrote:
> On Thu, Apr 16, 2026 at 06:40:55AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2026 at 07:19:06PM +0100, Mark Rutland wrote:
>
> > > AFAICT you're saying that the reference was taken *within*
> > > platform_device_register(), and then platform_device_register() itself
> > > has failed. I think it's surprising that platform_device_register()
> > > doesn't clean that up itself in the case of an error.
> > >
> > > There are *tonnes* of calls to platform_device_register() throughout the
> > > kernel that don't even bother to check the return value, and many that
> > > just pass the return onto a caller that can't possibly know to call
> > > platform_device_put().
> > >
> > > Code in the same file as platform_device_register() expects it to clean up
> > > after itself, e.g.
> > >
> > > | int platform_add_devices(struct platform_device **devs, int num)
> > > | {
> > > | int i, ret = 0;
> > > |
> > > | for (i = 0; i < num; i++) {
> > > | ret = platform_device_register(devs[i]);
> > > | if (ret) {
> > > | while (--i >= 0)
> > > | platform_device_unregister(devs[i]);
> > > | break;
> > > | }
> > > | }
> > > |
> > > | return ret;
> > > | }
> > >
> > > That's been there since the initial git commit, and back then,
> > > platform_device_register() didn't mention that callers needed to perform
> > > any cleanup.
> > >
> > > I see a comment was added to platform_device_register() in commit:
> > >
> > > 67e532a42cf4 ("driver core: platform: document registration-failure requirement")
> > >
> > > ... and that copied the comment added for device_register() in commit:
> > >
> > > 5739411acbaa ("Driver core: Clarify device cleanup.")
> > >
> > > ... but the potential brokenness is so widespread, and the behaviour is
> > > so surprising, that I'd argue the real bug is that device_register()
> > > doesn't clean up in case of error. I don't think it's worth changing
> > > this single instance given the prevalence and churn fixing all of that
> > > would involve.
> > >
> > > I think it would be far better to fix the core driver API such that when
> > > those functions return an error, they've already cleaned up for
> > > themselves.
> > >
> > > Greg, am I missing some functional reason why we can't rework
> > > device_register() and friends to handle cleanup themselves? I appreciate
> > > that'll involve churn for some callers, but AFAICT the majority of
> > > callers don't have the required cleanup.
> >
> > Yes, we should fix the platform core code here, this should not be
> > required to do everywhere as obviously we all got it wrong.
>
> It's not just the platform code as this directly reflects the behaviour
> of device_register() as Mark pointed out.
>
> It is indeed an unfortunate quirk of the driver model, but one can argue
> that having a registration function that frees its argument on errors
> would be even worse. And even more so when many (or most) users get this
> right.
Ah, sorry; I had missed that the _put() step would actually free the
object (and as you explain below, how that won't work for many callers).
> So if we want to change this, I think we would need to deprecate
> device_register() in favour of explicit device_initialize() and
> device_add().
Is it possible to have {platform_,}device_uninitialize() functions that
do everything except the ->release() call? If we had that, then we'd
be able to have a flow along the lines of:
        int some_init_function(void)
        {
                int err;
                platform_device_init(&static_pdev);
                err = platform_device_add(&static_pdev);
                if (err)
                        goto out_uninit;
                return 0;
        out_uninit:
                platform_device_uninit(&static_pdev);
                return err;
        }
... which I think would align with what people generally expect to have
to do.
Those would have to check that only a single reference was held (from
the corresponding _initialize()), and could WARN/fail if more were held.
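The single-reference check described above could look roughly like the following userspace model. All names here (`struct mock_dev`, mock_dev_init(), mock_dev_get(), mock_dev_uninit()) are hypothetical and only illustrate the intended semantics; no such uninitialize function exists in the driver core today.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a statically allocated device. */
struct mock_dev {
	int refcount;      /* models the device reference count */
	bool initialized;  /* true between init and a successful uninit */
};

/* Models the *_initialize() step: takes the initial reference. */
static void mock_dev_init(struct mock_dev *dev)
{
	dev->refcount = 1;
	dev->initialized = true;
}

/* Models another user taking a reference. */
static void mock_dev_get(struct mock_dev *dev)
{
	dev->refcount++;
}

/* Hypothetical uninitialize: undo init without calling ->release().
 * Fails if anything beyond the initial reference is still held. */
static int mock_dev_uninit(struct mock_dev *dev)
{
	if (!dev->initialized || dev->refcount != 1)
		return -EBUSY;
	dev->refcount = 0;
	dev->initialized = false;
	return 0;
}
```

Returning -EBUSY stands in for the WARN/fail behaviour mentioned above; a kernel version would presumably WARN as well, which this model leaves out.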
> That said, most users of platform_device_register() appear to operate
> on static platform devices which don't even have a release function and
> would trigger a WARN() if we ever drop the reference (which is arguably
> worse than leaking a tiny bit of memory).
>
> So leaving things as-is is also an option.
I suspect that might be the best option for now.
Mark.
* Re: [PATCH 1/9] dt-bindings: sound: mt2701-afe-pcm: add HDMI audio path clocks
From: Krzysztof Kozlowski @ 2026-04-16 9:38 UTC (permalink / raw)
To: Daniel Golle
Cc: Liam Girdwood, Mark Brown, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Matthias Brugger, AngeloGioacchino Del Regno,
Jaroslav Kysela, Takashi Iwai, Cyril Chao, Arnd Bergmann,
Kuninori Morimoto, Nícolas F. R. A. Prado, Eugen Hristev,
linux-sound, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek
In-Reply-To: <50afd83a314cd20c715fb9b0d3bc85fb00f9a6eb.1776265610.git.daniel@makrotopia.org>
On Wed, Apr 15, 2026 at 04:23:27PM +0100, Daniel Golle wrote:
> Document four additional optional clocks feeding the HDMI audio
> output path on MT2701 and MT7623N: the HADDS2 PLL (root of the
There is no MT7623N compatible in this file, so that's confusing. Does
mt7622 have it? If not, then it should be restricted per variant. If
yes, the model name is confusing.
Best regards,
Krzysztof
* ACPI dump of HP OmniBook 5 14-he0xxx
From: gaming stream @ 2026-04-16 9:42 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
I’ve extracted a full ACPI dump from an HP OmniBook 5 14-he0xxx
running Snapdragon X (X126100, Oryon CPU).
This includes DSDT, all SSDTs, and full raw ACPI tables.
Repository: https://github.com/TheXterminator/hp-omnibook-5-snapdragon-x-acpi.git
Sharing in case it helps with Linux bring-up or device tree work
for Snapdragon X platforms.
* Re: [PATCH v2 0/9] driver core / pmdomain: Add support for fined grained sync_state
From: Ulf Hansson @ 2026-04-16 9:42 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
Jonathan Hunter, Dmitry Baryshkov, linux-arm-kernel, linux-kernel
In-Reply-To: <CAMuHMdWHnANr4R+AW5-xHrm=D4SJLuKVF5mq3PFkbevcTz5qWw@mail.gmail.com>
On Thu, 16 Apr 2026 at 11:15, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Ulf,
>
> On Fri, 10 Apr 2026 at 12:41, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > Since the introduction [1] of the common sync_state support for pmdomains
> > (genpd), we have encountered a lot of various interesting problems. In most
> > cases the new behaviour of genpd triggered some weird platform specific bugs.
> >
> > That said, in LPC in Tokyo me and Saravana hosted a session to walk through the
> > remaining limitations that we have found for genpd's sync state support. In
> > particular, we discussed the problems we have for the so-called onecell power
> > domain providers, where a single provider typically provides multiple
> > independent power domains, all with their own set of consumers.
> >
> > Note that, onecell power domain providers are very common. It's being used by
> > many SoCs/platforms/technologies. To name a few:
> > SCMI, Qualcomm, NXP, Mediatek, Renesas, TI, etc.
> >
> > Anyway, in these cases, the generic sync_state mechanism with fw_devlink isn't
> > fine grained enough, as we end up waiting for all consumers for all power
> > domains before the ->sync_callback gets called for the supplier/provider. In
> > other words, we may end up keeping unused power domains powered-on, for no good
> > reasons.
> >
> > The series intends to fix this problem. Please have a look at the commit
> > messages for more details and help review/test!
>
> Thanks for the update!
>
> At first glance, the only real change compared to v1 seems to be
> the removal of printing
>
> pr_info("%s:%s con=%s\n", __func__, dev_name(dev),
> dev_name(consumer));
>
> Right?
Correct! I forgot to include a version history, sorry.
Besides the removed print, I have just added your tags and updated the
commit message in patch9.
FYI, my plan is to queue this as soon as v7.1-rc1 is available.
Kind regards
Uffe
* Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Mark Rutland @ 2026-04-16 9:50 UTC (permalink / raw)
To: Guangshuo Li
Cc: Johan Hovold, Greg Kroah-Hartman, Will Deacon, Anshuman Khandual,
linux-arm-kernel, linux-perf-users, linux-kernel, stable
In-Reply-To: <CANUHTR9+Z9s3thfKMC5qiLMdYJAo-1sX1g9QiU65OVCbb+mAMQ@mail.gmail.com>
On Thu, Apr 16, 2026 at 04:59:01PM +0800, Guangshuo Li wrote:
> On Thu, 16 Apr 2026 at 15:23, Johan Hovold <johan@kernel.org> wrote:
> > On Thu, Apr 16, 2026 at 06:40:55AM +0200, Greg Kroah-Hartman wrote:
> > > On Wed, Apr 15, 2026 at 07:19:06PM +0100, Mark Rutland wrote:
> > > > Greg, am I missing some functional reason why we can't rework
> > > > device_register() and friends to handle cleanup themselves? I appreciate
> > > > that'll involve churn for some callers, but AFAICT the majority of
> > > > callers don't have the required cleanup.
> > >
> > > Yes, we should fix the platform core code here, this should not be
> > > required to do everywhere as obviously we all got it wrong.
> >
> > It's not just the platform code as this directly reflects the behaviour
> > of device_register() as Mark pointed out.
> >
> > It is indeed an unfortunate quirk of the driver model, but one can argue
> > that having a registration function that frees its argument on errors
> > would be even worse. And even more so when many (or most) users get this
> > right.
> >
> > So if we want to change this, I think we would need to deprecate
> > device_register() in favour of explicit device_initialize() and
> > device_add().
> >
> > That said, most users of platform_device_register() appear to operate
> > on static platform devices which don't even have a release function and
> > would trigger a WARN() if we ever drop the reference (which is arguably
> > worse than leaking a tiny bit of memory).
> >
> > So leaving things as-is is also an option.
> >
> > Johan
>
> I did some more investigation, and it looks like directly changing the
> semantics of the existing API would break code that is already correct
> today.
Evidently this wasn't entirely clear, but when I suggested changing the
semantic, I had implicitly meant that we'd also go and fix up callers to
handle the new semantic.
I agree that whatever we do, we'll have to change some callers, given
that existing callers have inconsistent expectations.
> In particular, there seem to be at least two different kinds of callers:
>
> Callers that already handle the failure path explicitly after
> platform_device_register() fails. For these users, changing
> platform_device_register() itself to drop the reference internally
> would lead to double put / use-after-free issues.
Yes; for those we could drop the explicit cleanup.
As an alternative (as Johan mentioned above), if we deprecated
*_register() in favour of separate *_initialize() and *_add() calls,
then we could require that callers had explicit cleanup. As that cleanup
would more obviously pair with the *_initialize() step, it would be less
surprising than cleaning up for a function that returned an error.
As I mentioned in my other reply to Johan, that might also give options
for how to handle the static platform_device case, e.g. with an
*_uninitialize() function.
> Callers that operate on static struct platform_device objects. Many of
> these do not have a release callback, so blindly dropping the
> reference on failure would trigger a WARN.
>
> Because of this, changing platform_device_register() itself to always
> clean up on failure does not look safe.
I agree that we probably can't have *_register() do all the necessary
cleanup, since callers want different things.
As per Johan's suggestion, and my reply, I suspect the best option
for a consistent API would be to deprecate *_register() in favour of
separate *_initialize() and *_add() calls.
> One possible direction may be to leave platform_device_register()
> unchanged, and instead add new helper APIs for the different cases.
>
> For case (1), I was thinking of a helper like:
>
> platform_device_register_and_put()
>
> The implementation would simply call platform_device_register(), and if
> that fails, call platform_device_put(). Callers converted to this helper
> would then no longer perform their own put on the failure path.
I think that's going to be a source of confusion, because there's no
clear way to name that function. A '_and_put' suffix makes it sound like
it does a put unconditionally, rather than when the *_add() step fails.
Otherwise, I agree that would work for those callers.
> For case (2), I was thinking of a helper like:
>
> platform_device_register_static()
>
> The implementation would first install a no-op release callback when
> pdev->dev.release is not set, and then call
> platform_device_register_and_put(). This would make the failure path
> well-defined for static platform_device users, avoiding the reference
> leak without triggering a WARN.
Something like that might work.
As above, I think my preference would be to have separate
init/add/uninit calls, as that way each of the functions succeeds or
fails atomically, which is more aligned with general conventions.
> If this direction sounds reasonable, I would be happy to work on it and
> send a patch, and I would also be very willing to help with the related
> API conversion work for existing callers.
Fantastic!
I think we should hear what Greg thinks of the options before we start
on that, but it's great to hear that you're willing!
Mark.
* [PATCH] MAINTAINERS: Update HiSilicon PMU driver maintainer to Yushan Wang
From: Jonathan Cameron @ 2026-04-16 9:51 UTC (permalink / raw)
To: Yushan Wang, Jie Zhan, Will Deacon, Mark Rutland,
linux-arm-kernel, linux-perf-users
Cc: linuxarm, jic23
Replace myself with Yushan Wang who is very familiar with the HiSilicon PMU
drivers.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index d1cc0e12fe1f..8b95a43527fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11563,7 +11563,7 @@ F: Documentation/devicetree/bindings/net/hisilicon*.txt
F: drivers/net/ethernet/hisilicon/
HISILICON PMU DRIVER
-M: Jonathan Cameron <jonathan.cameron@huawei.com>
+M: Yushan Wang <wangyushan12@huawei.com>
S: Supported
W: http://www.hisilicon.com
F: Documentation/admin-guide/perf/hisi-pcie-pmu.rst
--
2.51.0
^ permalink raw reply related
* Re: [PATCH] MAINTAINERS: Update HiSilicon PMU driver maintainer to Yushan Wang
From: Yushan Wang @ 2026-04-16 10:10 UTC (permalink / raw)
To: Jonathan Cameron, Yushan Wang, Jie Zhan, Will Deacon,
Mark Rutland, linux-arm-kernel, linux-perf-users
Cc: jic23, Linuxarm
In-Reply-To: <20260416095110.25612-1-Jonathan.Cameron@huawei.com>
On 4/16/2026 5:51 PM, Jonathan Cameron wrote:
> Replace myself with Yushan Wang who is very familiar with the HiSilicon PMU
> drivers.
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> MAINTAINERS | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
It has been a privilege to have your guidance, thanks a lot!
Acked-by: Yushan Wang <wangyushan12@huawei.com>
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d1cc0e12fe1f..8b95a43527fa 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11563,7 +11563,7 @@ F: Documentation/devicetree/bindings/net/hisilicon*.txt
> F: drivers/net/ethernet/hisilicon/
>
> HISILICON PMU DRIVER
> -M: Jonathan Cameron <jonathan.cameron@huawei.com>
> +M: Yushan Wang <wangyushan12@huawei.com>
> S: Supported
> W: http://www.hisilicon.com
> F: Documentation/admin-guide/perf/hisi-pcie-pmu.rst
^ permalink raw reply
* Re: [PATCH v2 2/3] pwm: rp1: Add RP1 PWM controller driver
From: Andrea della Porta @ 2026-04-16 10:30 UTC (permalink / raw)
To: Uwe Kleine-König
Cc: Andrea della Porta, linux-pwm, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Florian Fainelli,
Broadcom internal kernel review list, devicetree,
linux-rpi-kernel, linux-arm-kernel, linux-kernel, Naushir Patuck,
Stanimir Varbanov, mbrugger
In-Reply-To: <adkrHkANCzxO8KUP@monoceros>
Hi Uwe,
On 19:31 Fri 10 Apr , Uwe Kleine-König wrote:
> Hello Andrea,
>
> nice work for a v2!
Thanks!
>
> On Fri, Apr 10, 2026 at 04:09:58PM +0200, Andrea della Porta wrote:
<...snip...>
> > +#define RP1_PWM_GLOBAL_CTRL 0x000
> > +#define RP1_PWM_CHANNEL_CTRL(x) (0x014 + ((x) * 0x10))
> > +#define RP1_PWM_RANGE(x) (0x018 + ((x) * 0x10))
> > +#define RP1_PWM_PHASE(x) (0x01C + ((x) * 0x10))
> > +#define RP1_PWM_DUTY(x) (0x020 + ((x) * 0x10))
> > +
> > +/* 8:FIFO_POP_MASK + 0:Trailing edge M/S modulation */
> > +#define RP1_PWM_CHANNEL_DEFAULT (BIT(8) + BIT(0))
>
> Please add a #define for BIT(8) and then use that and
> FIELD_PREP(RP1_PWM_MODE, RP1_PWM_MODE_SOMENICENAME) to define the
> constant. Also I would define it below the register defines.
Ack.
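
For illustration only, the requested layout could look roughly like the
userspace sketch below. The bitfield names are hypothetical (the review only
asks that they start with the register name), and BIT(), GENMASK() and
FIELD_PREP() are simplified stand-ins for the kernel macros, relying on the
GCC/Clang builtin __builtin_ctz().

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel helpers (32-bit only). */
#define BIT(n)            (UINT32_C(1) << (n))
#define GENMASK(h, l)     ((UINT32_MAX >> (31 - (h))) & ~(BIT(l) - 1))
#define FIELD_PREP(m, v)  ((((uint32_t)(v)) << __builtin_ctz(m)) & (m))

/* Register offset, as in the quoted patch. */
#define RP1_PWM_CHANNEL_CTRL(x)  (0x014 + ((x) * 0x10))

/* Hypothetical bitfield names, prefixed with the register name. */
#define RP1_PWM_CHANNEL_CTRL_MODE                   GENMASK(1, 0)
#define RP1_PWM_CHANNEL_CTRL_MODE_TRAILING_EDGE_MS  1
#define RP1_PWM_CHANNEL_CTRL_FIFO_POP_MASK          BIT(8)

/* Defined below the register and bitfield defines, as requested. */
#define RP1_PWM_CHANNEL_CTRL_DEFAULT \
	(RP1_PWM_CHANNEL_CTRL_FIFO_POP_MASK | \
	 FIELD_PREP(RP1_PWM_CHANNEL_CTRL_MODE, \
		    RP1_PWM_CHANNEL_CTRL_MODE_TRAILING_EDGE_MS))

static inline uint32_t rp1_pwm_channel_default(void)
{
	uint32_t v = RP1_PWM_CHANNEL_CTRL_DEFAULT;

	/* Must match the value the v2 patch wrote: BIT(8) + BIT(0). */
	assert(v == (BIT(8) + BIT(0)));
	return v;
}
```

The resulting constant is unchanged from v2; only the way it is spelled
changes.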
>
> > +#define RP1_PWM_CHANNEL_ENABLE(x) BIT(x)
> > +#define RP1_PWM_POLARITY BIT(3)
> > +#define RP1_PWM_SET_UPDATE BIT(31)
> > +#define RP1_PWM_MODE_MASK GENMASK(1, 0)
>
> s/_MASK// please
>
> It would be great if the bitfield's names started with the register
> name.
Ack.
>
> > +
> > +#define RP1_PWM_NUM_PWMS 4
> > +
> > +struct rp1_pwm {
> > + struct regmap *regmap;
> > + struct clk *clk;
> > + unsigned long clk_rate;
> > + bool clk_enabled;
> > +};
> > +
> > +struct rp1_pwm_waveform {
> > + u32 period_ticks;
> > + u32 duty_ticks;
> > + bool enabled;
> > + bool inverted_polarity;
> > +};
> > +
> > +static const struct regmap_config rp1_pwm_regmap_config = {
> > + .reg_bits = 32,
> > + .val_bits = 32,
> > + .reg_stride = 4,
> > + .max_register = 0x60,
>
> I'm not a fan of aligning the = in a struct, still more if it fails like
> here. Please consistently align all =s, or even better, use a single
> space before each =. (Same for the struct definitions above, but I won't
> insist.)
Let's use the single space.
>
> > +};
> > +
> > +static void rp1_pwm_apply_config(struct pwm_chip *chip, struct pwm_device *pwm)
> > +{
> > + struct rp1_pwm *rp1 = pwmchip_get_drvdata(chip);
> > + u32 value;
> > +
> > + /* update the changed registers on the next strobe to avoid glitches */
> > + regmap_read(rp1->regmap, RP1_PWM_GLOBAL_CTRL, &value);
> > + value |= RP1_PWM_SET_UPDATE;
> > + regmap_write(rp1->regmap, RP1_PWM_GLOBAL_CTRL, value);
>
> I assume there is a glitch if I update two channels and the old
> configuration of the first channel ends while I'm in the middle of
> configuring the second?
The configuration registers are per-channel but the update flag is global.
I don't have details of the hw internals; my best guess is that anything that
you set in the registers before updating the flag will take effect, so there
should be no glitches.
>
> > +}
> > +
> > +static int rp1_pwm_request(struct pwm_chip *chip, struct pwm_device *pwm)
> > +{
> > + struct rp1_pwm *rp1 = pwmchip_get_drvdata(chip);
> > +
> > + /* init channel to reset defaults */
> > + regmap_write(rp1->regmap, RP1_PWM_CHANNEL_CTRL(pwm->hwpwm), RP1_PWM_CHANNEL_DEFAULT);
> > + return 0;
> > +}
> > +
> > +static int rp1_pwm_round_waveform_tohw(struct pwm_chip *chip,
> > + struct pwm_device *pwm,
> > + const struct pwm_waveform *wf,
> > + void *_wfhw)
> > +{
> > + struct rp1_pwm *rp1 = pwmchip_get_drvdata(chip);
> > + struct rp1_pwm_waveform *wfhw = _wfhw;
> > + u64 clk_rate = rp1->clk_rate;
> > + u64 ticks;
>
> 	if (!wf->period_length_ns) {
> 		wfhw->enabled = false;
> 		return 0;
> 	}
>
> > + ticks = mul_u64_u64_div_u64(wf->period_length_ns, clk_rate, NSEC_PER_SEC);
>
> To ensure this doesn't overflow please fail to probe the driver if
> clk_rate > 1 GHz with an explaining comment. (Or alternatively calculate
> the length of period_ticks = U32_MAX and skip the calculation if
> wf->period_length_ns is bigger.)
Ack.
>
> > + if (ticks > U32_MAX)
> > + ticks = U32_MAX;
> > + wfhw->period_ticks = ticks;
>
> What happens if wf->period_length_ns > 0 but ticks == 0?
I've added a check, returning 1 to signal the round-up, and a minimum tick of 1
in this case.
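
A minimal userspace sketch of the period conversion discussed above, assuming
the clock rate is capped at 1 GHz (the limit proposed for probe time). The
function name rp1_period_to_ticks() is hypothetical, and
mul_u64_u64_div_u64() is emulated with the GCC/Clang unsigned __int128
extension rather than the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Userspace stand-in for the kernel helper: (a * b) / c without overflow. */
static uint64_t mul_u64_u64_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	return (uint64_t)(((unsigned __int128)a * b) / c);
}

/*
 * Convert a period in nanoseconds to ticks, rounding down.
 * Returns 1 if the result had to be rounded up to the 1-tick minimum,
 * 0 otherwise.
 */
static int rp1_period_to_ticks(uint64_t period_ns, uint64_t clk_rate,
			       uint32_t *ticks)
{
	uint64_t t;

	/* With clk_rate <= 1 GHz the quotient never exceeds period_ns. */
	assert(clk_rate > 0 && clk_rate <= NSEC_PER_SEC);

	t = mul_u64_u64_div_u64(period_ns, clk_rate, NSEC_PER_SEC);
	if (t > UINT32_MAX)
		t = UINT32_MAX;

	if (period_ns && !t) {
		/* A non-zero request shorter than one tick: round up. */
		*ticks = 1;
		return 1;
	}

	*ticks = (uint32_t)t;
	return 0;
}
```

With an assumed 50 MHz clock (20 ns per tick), a 1000 ns request yields 50
ticks and returns 0, while a 10 ns request yields 1 tick and returns 1.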
>
> > + if (wf->duty_offset_ns + wf->duty_length_ns >= wf->period_length_ns) {
>
> The maybe surprising effect here is that in the two cases
>
> wf->duty_offset_ns == wf->period_length_ns and wf->duty_length_ns == 0
>
> and
>
> wf->duty_length_ns == wf->period_length_ns and wf->duty_offset_ns == 0
>
> you're configuring inverted polarity. It doesn't matter technically
> because the result is the same, but for consumers still using pwm_state
> this is irritating. That's why pwm-stm32 uses inverted polarity only if
> also wf->duty_length_ns and wf->duty_offset_ns are non-zero.
Ack.
>
> > + ticks = mul_u64_u64_div_u64(wf->period_length_ns - wf->duty_length_ns,
> > + clk_rate, NSEC_PER_SEC);
>
> The rounding is wrong here. You should pick the biggest duty_length not
> bigger than wf->duty_length_ns, so you have to use
>
> ticks = wfhw->period_ticks - mul_u64_u64_div_u64(wf->duty_length_ns, clk_rate, NSEC_PER_SEC);
>
> . I see this is a hole in the pwmtestperf coverage.
Ack.
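
To make the rounding difference concrete, here is a small userspace sketch
comparing the two computations for the inverted case. It assumes, as the
quoted code implies, that the register holds the inactive part of the period
when polarity is inverted. The function names are hypothetical, and the
128-bit multiply stands in for mul_u64_u64_div_u64().

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Round-down ns -> ticks conversion (stand-in for mul_u64_u64_div_u64()). */
static uint64_t ns_to_ticks(uint64_t ns, uint64_t clk_rate)
{
	return (uint64_t)(((unsigned __int128)ns * clk_rate) / NSEC_PER_SEC);
}

/* v2 rounding: converts the inactive time, so the duty can round UP. */
static uint32_t inverted_reg_v2(uint64_t period_ns, uint64_t duty_ns,
				uint64_t clk_rate)
{
	assert(duty_ns <= period_ns);
	return (uint32_t)ns_to_ticks(period_ns - duty_ns, clk_rate);
}

/* Suggested rounding: round the duty DOWN, then take the complement. */
static uint32_t inverted_reg_fixed(uint32_t period_ticks, uint64_t duty_ns,
				   uint64_t clk_rate)
{
	uint64_t duty_ticks = ns_to_ticks(duty_ns, clk_rate);

	/* Only reachable if period_ticks was clamped to U32_MAX earlier. */
	if (duty_ticks > period_ticks)
		duty_ticks = period_ticks;

	return period_ticks - (uint32_t)duty_ticks;
}
```

With an assumed 50 MHz clock, a 1000 ns period (50 ticks) and a requested
duty of 990 ns, the v2 rounding writes 0, giving an effective duty of 50
ticks (1000 ns, more than requested). The suggested rounding writes 1, giving
49 ticks (980 ns), the largest duty not exceeding the request.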
>
> > + wfhw->inverted_polarity = true;
> > + } else {
> > + ticks = mul_u64_u64_div_u64(wf->duty_length_ns, clk_rate, NSEC_PER_SEC);
> > + wfhw->inverted_polarity = false;
> > + }
> > +
> > + if (ticks > wfhw->period_ticks)
> > + ticks = wfhw->period_ticks;
>
> You can and should assume that wf->duty_length_ns <=
> wf->period_length_ns. Then the if condition can never become true.
Ack.
>
> > + wfhw->duty_ticks = ticks;
> > +
> > + wfhw->enabled = !!wfhw->duty_ticks;
> > +
> > + return 0;
> > +}
> > +
> > +static int rp1_pwm_round_waveform_fromhw(struct pwm_chip *chip,
> > + struct pwm_device *pwm,
> > + const void *_wfhw,
> > + struct pwm_waveform *wf)
> > +{
> > + struct rp1_pwm *rp1 = pwmchip_get_drvdata(chip);
> > + const struct rp1_pwm_waveform *wfhw = _wfhw;
> > + u64 clk_rate = rp1->clk_rate;
> > + u32 ticks;
> > +
> > + memset(wf, 0, sizeof(*wf));
>
> *wf = (struct pwm_waveform){ };
>
> is usually more efficient.
Ack.
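
A tiny userspace sketch of the compound-literal form, for reference. Since wf
is a pointer, the assignment goes through *wf. The struct here is a local
stand-in mirroring the three u64 fields of the kernel's struct pwm_waveform,
and the empty initializer { } is the GNU C (and C23) spelling the kernel
uses. The helper name waveform_clear() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's struct pwm_waveform. */
struct pwm_waveform {
	uint64_t period_length_ns;
	uint64_t duty_length_ns;
	uint64_t duty_offset_ns;
};

/* Zero all members by assigning an empty compound literal. */
static void waveform_clear(struct pwm_waveform *wf)
{
	assert(wf);
	*wf = (struct pwm_waveform){ };
}
```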
>
> > + if (!wfhw->enabled)
> > + return 0;
> > +
> > + wf->period_length_ns = DIV_ROUND_UP_ULL((u64)wfhw->period_ticks * NSEC_PER_SEC, clk_rate);
> > +
> > + if (wfhw->inverted_polarity) {
> > + wf->duty_length_ns = DIV_ROUND_UP_ULL((u64)wfhw->duty_ticks * NSEC_PER_SEC,
> > + clk_rate);
> > + } else {
> > + wf->duty_offset_ns = DIV_ROUND_UP_ULL((u64)wfhw->duty_ticks * NSEC_PER_SEC,
> > + clk_rate);
> > + ticks = wfhw->period_ticks - wfhw->duty_ticks;
> > + wf->duty_length_ns = DIV_ROUND_UP_ULL((u64)ticks * NSEC_PER_SEC, clk_rate);
> > + }
>
> This needs adaption after the rounding issue in tohw is fixed.
Ack.
>
> > + return 0;
> > +}
> > +
> > +static int rp1_pwm_write_waveform(struct pwm_chip *chip,
> > + struct pwm_device *pwm,
> > + const void *_wfhw)
> > +{
> > + struct rp1_pwm *rp1 = pwmchip_get_drvdata(chip);
> > + const struct rp1_pwm_waveform *wfhw = _wfhw;
> > + u32 value;
> > +
> > + /* set period and duty cycle */
> > + regmap_write(rp1->regmap,
> > + RP1_PWM_RANGE(pwm->hwpwm), wfhw->period_ticks);
> > + regmap_write(rp1->regmap,
> > + RP1_PWM_DUTY(pwm->hwpwm), wfhw->duty_ticks);
> > +
> > + /* set polarity */
> > + regmap_read(rp1->regmap, RP1_PWM_CHANNEL_CTRL(pwm->hwpwm), &value);
> > + if (!wfhw->inverted_polarity)
> > + value &= ~RP1_PWM_POLARITY;
> > + else
> > + value |= RP1_PWM_POLARITY;
> > + regmap_write(rp1->regmap, RP1_PWM_CHANNEL_CTRL(pwm->hwpwm), value);
> > +
> > + /* enable/disable */
> > + regmap_read(rp1->regmap, RP1_PWM_GLOBAL_CTRL, &value);
> > + if (wfhw->enabled)
> > + value |= RP1_PWM_CHANNEL_ENABLE(pwm->hwpwm);
> > + else
> > + value &= ~RP1_PWM_CHANNEL_ENABLE(pwm->hwpwm);
> > + regmap_write(rp1->regmap, RP1_PWM_GLOBAL_CTRL, value);
>
> You can exit early if wfhw->enabled is false after clearing the channel
> enable bit.
Ack.
>
> > + rp1_pwm_apply_config(chip, pwm);
> > +
> > + return 0;
> > +}
> > +
<...snip...>
> > + }
> > +
> > + return 0;
> > +
> > +err_disable_clk:
> > + clk_disable_unprepare(rp1->clk);
> > +
> > + return ret;
> > +}
>
> On remove you fail to balance the call to clk_prepare_enable() (if no
> failed call to clk_prepare_enable() in rp1_pwm_resume() happened).
Since this driver now exports a syscon, it's only builtin (=Y) so
it cannot be unloaded.
I've also avoided the .remove callback via .suppress_bind_attrs.
>
> > +
> > +static int rp1_pwm_suspend(struct device *dev)
> > +{
> > + struct rp1_pwm *rp1 = dev_get_drvdata(dev);
> > +
> > + if (rp1->clk_enabled) {
> > + clk_disable_unprepare(rp1->clk);
> > + rp1->clk_enabled = false;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int rp1_pwm_resume(struct device *dev)
> > +{
> > + struct rp1_pwm *rp1 = dev_get_drvdata(dev);
> > + int ret;
> > +
> > + ret = clk_prepare_enable(rp1->clk);
> > + if (ret) {
> > + dev_err(dev, "Failed to enable clock on resume: %d\n", ret);
>
> Please use %pe for error codes.
Ack.
Best regards,
Andrea
>
> > + return ret;
> > + }
> > +
> > + rp1->clk_enabled = true;
> > +
> > + return 0;
> > +}
>
> Best regards
> Uwe
^ permalink raw reply
* Re: [PATCH 0/3] mm: split the file's i_mmap tree for NUMA
From: Mateusz Guzik @ 2026-04-16 10:29 UTC (permalink / raw)
To: Huang Shijie
Cc: akpm, viro, brauner, linux-mm, linux-kernel, linux-arm-kernel,
linux-fsdevel, muchun.song, osalvador, linux-trace-kernel,
linux-perf-users, linux-parisc, nvdimm, zhongyuan, fangbaoshun,
yingzhiwei
In-Reply-To: <ad4EvoDcAKE2Sl4+@hsj-2U-Workstation>
On Tue, Apr 14, 2026 at 11:11 AM Huang Shijie <huangsj@hygon.cn> wrote:
>
> On Mon, Apr 13, 2026 at 05:33:21PM +0200, Mateusz Guzik wrote:
> > On Mon, Apr 13, 2026 at 02:20:39PM +0800, Huang Shijie wrote:
> > > In NUMA, there may be many NUMA nodes and many CPUs.
> > > For example, a Hygon's server has 12 NUMA nodes, and 384 CPUs.
> > > In the UnixBench tests, there is a test "execl" which tests
> > > the execve system call.
> > >
> > > When we test our server with "./Run -c 384 execl",
> > > the test result is not good enough. The i_mmap locks contended heavily on
> > > "libc.so" and "ld.so". For example, the i_mmap tree for "libc.so" can have
> > > over 6000 VMAs, all the VMAs can be in different NUMA nodes.
> > > The insert/remove operations do not run quickly enough.
> > >
> > > patch 1 & patch 2 try to hide the direct access of i_mmap.
> > > patch 3 splits the i_mmap into sibling trees, and we can get better
> > > performance with this patch set:
> > > we can get 77% performance improvement(10 times average)
> > >
> >
> > To my reading you kept the lock as-is and only distributed the protected
> > state.
> >
> > While I don't doubt the improvement, I'm confident should you take a
> > look at the profile you are going to find this still does not scale with
> > rwsem being one of the problems (there are other global locks, some of
> > which have experimental patches for).
> IMHO, when the number of VMAs in the i_mmap is very large, only optimising the rwsem
> lock does not help much for our NUMA case.
>
> In our NUMA server, the remote access could be the major issue.
>
I'm confused how this is not supposed to help. You moved your data to
be stored per-domain. With my proposal the lock itself will also get
that treatment.
Modulo the issue of what to do with code wanting to iterate the entire
thing, this is blatantly faster.
>
> >
> > Apart from that this does nothing to help high core systems which are
> > all one node, which imo puts another question mark on this specific
> > proposal.
> Yes, this patch set only focuses on the NUMA case.
> The one-node case should use the original i_mmap.
>
> Maybe I can add a new config, CONFIG_SPILT_I_MMAP. The config is disabled
> by default, and enabled when the NUMA node is not one.
>
> >
> > Of course one may question whether a RB tree is the right choice here,
> > it may be the lock-protected cost can go way down with merely a better
> > data structure.
> >
> > Regardless of that, for actual scalability, there will be no way around
> > decentralizing locking around this and partitioning per some core count
> > (not just by numa awareness).
> >
> > Decentralizing locking is definitely possible, but I have not looked
> > into specifics of how problematic it is. Best case scenario it will
> > merely work with separate locks. Worst case scenario something needs a fully
> > stabilized state for traversal, in that case another rw lock can be
> Yes.
>
> The traversal may need to hold many locks.
>
The very paragraph you partially quoted answers what to do in that
case: wrap everything with a new rwsem taken for reading when
adding/removing entries and taken for writing when iterating the
entire thing. Then the iteration sticks to one lock.
The new rw lock puts an upper ceiling on scalability of the thing, but
it is way higher than the current state.
Given the extra overhead associated with it one could consider
sticking to one centralized state by default and switching to
distributed state if there is enough contention.
> > slapped around this, creating locking order read lock -> per-subset
> > write lock -- this will suffer scalability due to the read locking, but
> > it will still scale drastically better as apart from that there will be
> > no serialization. In this setting the problematic consumer will write
> > lock the new thing to stabilize the state.
> >
> > So my non-maintainer opinion is that the patchset is not worth it as it
> > fails to address anything for significantly more common and already
> > affected setups.
> This patch set is to reduce the remote access latency for insert/remove VMA
> in NUMA.
>
And I am saying the mmap semaphore is a significant problem already on
high-core no-numa setups. Addressing scalability in that case would
sort out the problem in your setup and to a significantly higher
extent.
> >
> > Have you looked into splitting the lock?
> >
> I have tried before.
>
> But there are two disadvantages:
> 1.) The traversal may need to hold many locks which makes the
> code very horrible.
>
I already noted above that this is avoidable.
> 2.) Even if we split the locks, each lock protects a tree; when the tree becomes
> big enough, the VMA insert/remove will also become slow in NUMA.
> The reason is that the tree has VMAs in different NUMA nodes.
>
This is orthogonal to my proposal. In fact, if one is to pretend this
is never a factor with your patch, I would like to point out it will
remain not a factor if the per-numa struct gets its own lock.
^ permalink raw reply
* [PATCH net v2] net: airoha: Fix possible TX queue stall in airoha_qdma_tx_napi_poll()
From: Lorenzo Bianconi @ 2026-04-16 10:30 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Lorenzo Bianconi
Cc: linux-arm-kernel, linux-mediatek, netdev
Since multiple net_device TX queues can share the same hw QDMA TX queue,
there is no guarantee we have inflight packets queued in hw belonging to a
net_device TX queue stopped in the xmit path because hw QDMA TX queue
can be full. In this corner case the net_device TX queue will never be
re-activated. In order to avoid any potential net_device TX queue stall,
we need to wake all the net_device TX queues feeding the same hw QDMA TX
queue in airoha_qdma_tx_napi_poll routine.
Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes in v2:
- Add txq_stopped parameter to avoid any possible corner cases where the
netdev queue stalls.
- Link to v1: https://lore.kernel.org/r/20260413-airoha-txq-potential-stall-v1-1-7830363b1543@kernel.org
---
drivers/net/ethernet/airoha/airoha_eth.c | 37 +++++++++++++++++++++++++++-----
drivers/net/ethernet/airoha/airoha_eth.h | 1 +
2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index e1ab15f1ee7d..19f67c7dd8e1 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -843,6 +843,21 @@ static int airoha_qdma_init_rx(struct airoha_qdma *qdma)
return 0;
}
+static void airoha_qdma_wake_netdev_txqs(struct airoha_queue *q)
+{
+ struct airoha_qdma *qdma = q->qdma;
+ struct airoha_eth *eth = qdma->eth;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
+ struct airoha_gdm_port *port = eth->ports[i];
+
+ if (port && port->qdma == qdma)
+ netif_tx_wake_all_queues(port->dev);
+ }
+ q->txq_stopped = false;
+}
+
static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
{
struct airoha_tx_irq_queue *irq_q;
@@ -919,12 +934,21 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
txq = netdev_get_tx_queue(skb->dev, queue);
netdev_tx_completed_queue(txq, 1, skb->len);
- if (netif_tx_queue_stopped(txq) &&
- q->ndesc - q->queued >= q->free_thr)
- netif_tx_wake_queue(txq);
-
dev_kfree_skb_any(skb);
}
+
+ if (q->txq_stopped && q->ndesc - q->queued >= q->free_thr) {
+ /* Since multiple net_device TX queues can share the
+ * same hw QDMA TX queue, there is no guarantee we have
+ * inflight packets queued in hw belonging to a
+ * net_device TX queue stopped in the xmit path.
+ * In order to avoid any potential net_device TX queue
+ * stall, we need to wake all the net_device TX queues
+ * feeding the same hw QDMA TX queue.
+ */
+ airoha_qdma_wake_netdev_txqs(q);
+ }
+
unlock:
spin_unlock_bh(&q->lock);
}
@@ -1984,6 +2008,7 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
if (q->queued + nr_frags >= q->ndesc) {
/* not enough space in the queue */
netif_tx_stop_queue(txq);
+ q->txq_stopped = true;
spin_unlock_bh(&q->lock);
return NETDEV_TX_BUSY;
}
@@ -2039,8 +2064,10 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
TX_RING_CPU_IDX_MASK,
FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
- if (q->ndesc - q->queued < q->free_thr)
+ if (q->ndesc - q->queued < q->free_thr) {
netif_tx_stop_queue(txq);
+ q->txq_stopped = true;
+ }
spin_unlock_bh(&q->lock);
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 95e557638617..87b328cfefb0 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -193,6 +193,7 @@ struct airoha_queue {
int ndesc;
int free_thr;
int buf_size;
+ bool txq_stopped;
struct napi_struct napi;
struct page_pool *page_pool;
---
base-commit: 1f5ffc672165ff851063a5fd044b727ab2517ae3
change-id: 20260407-airoha-txq-potential-stall-ad52c53094e8
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
^ permalink raw reply related
* Re: [patch 18/38] lib/tests: Replace get_cycles() with ktime_get()
From: Geert Uytterhoeven @ 2026-04-16 10:24 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Andrew Morton, Uladzislau Rezki, linux-mm, Arnd Bergmann,
x86, Lu Baolu, iommu, Michael Grzeschik, netdev, linux-wireless,
Herbert Xu, linux-crypto, Vlastimil Babka, David Woodhouse,
Bernie Thompson, linux-fbdev, Theodore Tso, linux-ext4,
Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
Huacai Chen, loongarch, linux-m68k, Dinh Nguyen, Jonas Bonn,
linux-openrisc, Helge Deller, linux-parisc, Michael Ellerman,
linuxppc-dev, Paul Walmsley, linux-riscv, Heiko Carstens,
linux-s390, David S. Miller, sparclinux
In-Reply-To: <20260410120318.794680738@kernel.org>
Hi Thomas,
On Fri, 10 Apr 2026 at 14:20, Thomas Gleixner <tglx@kernel.org> wrote:
> get_cycles() is the historical access to a fine grained time source, but it
> is a suboptimal choice for two reasons:
>
> - get_cycles() is not guaranteed to be supported and functional on all
> systems/platforms. If not supported or not functional it returns 0,
> which makes benchmarking moot.
>
> - get_cycles() returns the raw counter value of whatever the
> architecture platform provides. The original x86 Time Stamp Counter
> (TSC) was despite its name tied to the actual CPU core frequency.
> That's no longer the case. So the counter value is only meaningful
> when the CPU operates at the same frequency as the TSC or the value is
> adjusted to the actual CPU frequency. Other architectures and
> platforms provide similar disjunct counters via get_cycles(), so the
> result is operations per BOGO-cycles, which is not really meaningful.
>
> Use ktime_get() instead which provides nanosecond timestamps with the
> granularity of the underlying hardware counter, which is not different to
> the variety of get_cycles() implementations.
>
> This provides at least understandable metrics, i.e. operations/nanoseconds,
> and is available on all platforms. As with get_cycles() the result might
> have to be put into relation with the CPU operating frequency, but that's
> not any different.
>
> This is part of a larger effort to remove get_cycles() usage from
> non-architecture code.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Thanks for your patch!
> --- a/lib/interval_tree_test.c
> +++ b/lib/interval_tree_test.c
> @@ -65,13 +65,13 @@ static void init(void)
> static int basic_check(void)
> {
> int i, j;
> - cycles_t time1, time2, time;
> + ktime_t time1, time2, time;
>
> printk(KERN_ALERT "interval tree insert/remove");
>
> init();
>
> - time1 = get_cycles();
> + time1 = ktime_get();
>
> for (i = 0; i < perf_loops; i++) {
> for (j = 0; j < nnodes; j++)
> @@ -80,11 +80,11 @@ static int basic_check(void)
> interval_tree_remove(nodes + j, &root);
> }
>
> - time2 = get_cycles();
> + time2 = ktime_get();
> time = time2 - time1;
>
> time = div_u64(time, perf_loops);
> - printk(" -> %llu cycles\n", (unsigned long long)time);
> + printk(" -> %llu nsecs\n", (unsigned long long)time);
While cycles_t was unsigned long or long long, ktime_t is always s64,
so "%lld", and the cast can be dropped (everywhere).
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [PATCH 2/9] dt-bindings: sound: add mediatek,mt2701-hdmi-audio machine binding
From: Krzysztof Kozlowski @ 2026-04-16 10:47 UTC (permalink / raw)
To: Daniel Golle
Cc: Liam Girdwood, Mark Brown, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Matthias Brugger, AngeloGioacchino Del Regno,
Jaroslav Kysela, Takashi Iwai, Cyril Chao, Arnd Bergmann,
Kuninori Morimoto, Nícolas F. R. A. Prado, Eugen Hristev,
linux-sound, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek
In-Reply-To: <1fe31edbdf045f87f4cfa7ae6fa53196e8b67b96.1776265610.git.daniel@makrotopia.org>
On Wed, Apr 15, 2026 at 04:23:35PM +0100, Daniel Golle wrote:
> Describe the ASoC machine compatible used to wire the MT2701/MT7623N
> AFE HDMI playback path to the on-chip HDMI transmitter acting as the
> generic HDMI audio codec. MT7623N boards carry the same IP and use
> the mt7623n- compatible as a fallback to mt2701-.
subject: sound:
Please use subject prefixes matching the subsystem. You can get them for
example with 'git log --oneline -- DIRECTORY_OR_FILE' on the directory
your patch is touching. For bindings, the preferred subjects are
explained here:
https://www.kernel.org/doc/html/latest/devicetree/bindings/submitting-patches.html#i-for-patch-submitters
>
> Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> ---
> .../sound/mediatek,mt2701-hdmi-audio.yaml | 47 +++++++++++++++++++
> 1 file changed, 47 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/sound/mediatek,mt2701-hdmi-audio.yaml
>
> diff --git a/Documentation/devicetree/bindings/sound/mediatek,mt2701-hdmi-audio.yaml b/Documentation/devicetree/bindings/sound/mediatek,mt2701-hdmi-audio.yaml
> new file mode 100644
> index 0000000000000..d08aee447b471
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/sound/mediatek,mt2701-hdmi-audio.yaml
> @@ -0,0 +1,47 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/sound/mediatek,mt2701-hdmi-audio.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: MediaTek MT2701 HDMI audio machine driver
1. Don't describe drivers. Describe the hardware.
2. There is already audio for mt2701: mediatek,mt2701-audio. Why is HDMI
not part of the existing audio machine bindings? Or maybe this is not a sound
card driver?
> +
> +maintainers:
> + - Daniel Golle <daniel@makrotopia.org>
> +
> +description:
> + ASoC machine driver binding the MT2701 AFE HDMI playback path to
> + the on-chip HDMI transmitter via the generic HDMI audio codec.
> + The same HDMI audio IP is present on MT7623N.
> +
> +properties:
> + compatible:
> + oneOf:
> + - const: mediatek,mt2701-hdmi-audio
> + - items:
> + - const: mediatek,mt7623n-hdmi-audio
> + - const: mediatek,mt2701-hdmi-audio
> +
> + mediatek,platform:
> + $ref: /schemas/types.yaml#/definitions/phandle
> + description: Phandle of the MT2701/MT7623N AFE platform node.
> +
> + mediatek,audio-codec:
> + $ref: /schemas/types.yaml#/definitions/phandle
> + description: Phandle of the HDMI transmitter acting as audio codec.
But these suggest it is a sound card driver...
Best regards,
Krzysztof
^ permalink raw reply
* [PATCH net,v3 1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
From: KhaiWenTan @ 2026-04-16 10:26 UTC (permalink / raw)
To: andrew+netdev, davem, edumazet, kuba, pabeni, mcoquelin.stm32,
alexandre.torgue, rmk+kernel, maxime.chevallier
Cc: netdev, linux-stm32, linux-arm-kernel, linux-kernel,
yoong.siang.song, hong.aun.looi, khai.wen.tan, KhaiWenTan
From: KhaiWenTan <khai.wen.tan@linux.intel.com>
get_interfaces() will update both the plat->phy_interfaces and
mdio_bus_data->default_an_inband based on reading a SERDES register. As
get_interfaces() will be called after default_an_inband had already been
read, dwmac-intel regressed as a result with incorrect default_an_inband
value in phylink_config.
Therefore, we moved the priv->plat->get_interfaces() to be executed first
before assigning priv->plat->default_an_inband to config->default_an_inband
to ensure default_an_inband is in correct value.
Fixes: d3836052fe09 ("net: stmmac: intel: convert speed_mode_2500() to get_interfaces()")
Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
v3:
- rebase on the latest net tree (Paolo Abeni)
v2: https://patchwork.kernel.org/project/netdevbpf/patch/20260413020339.68426-1-khai.wen.tan@linux.intel.com/
- update commit message for better understanding (Russell King)
- corrected the blamed commit (Russell King)
v1: https://patchwork.kernel.org/project/netdevbpf/patch/20260410020735.327590-1-khai.wen.tan@linux.intel.com/
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 01a983001ab4..ca68248dbc78 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1410,8 +1410,6 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
priv->tx_lpi_clk_stop = priv->plat->flags &
STMMAC_FLAG_EN_TX_LPI_CLOCKGATING;
- config->default_an_inband = priv->plat->default_an_inband;
-
/* Get the PHY interface modes (at the PHY end of the link) that
* are supported by the platform.
*/
@@ -1419,6 +1417,8 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
priv->plat->get_interfaces(priv, priv->plat->bsp_priv,
config->supported_interfaces);
+ config->default_an_inband = priv->plat->default_an_inband;
+
/* Set the platform/firmware specified interface mode if the
* supported interfaces have not already been provided using
* phy_interface as a last resort.
--
2.43.0
^ permalink raw reply related
* Re: [PATCH net-next 5/6] net: stmmac: move PHY handling out of __stmmac_open()/release()
From: Russell King (Oracle) @ 2026-04-16 10:49 UTC (permalink / raw)
To: Alexander Stein
Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel,
linux-stm32, Maxime Coquelin, netdev, Paolo Abeni
In-Reply-To: <5987484.DvuYhMxLoT@steina-w>
On Thu, Apr 16, 2026 at 08:20:13AM +0200, Alexander Stein wrote:
> Am Mittwoch, 15. April 2026, 14:59:32 CEST schrieb Russell King (Oracle):
> > On Wed, Apr 15, 2026 at 08:08:40AM +0200, Alexander Stein wrote:
> > > Hi,
> > >
> > > Am Dienstag, 23. September 2025, 13:26:19 CEST schrieb Russell King (Oracle):
> > > > Move the PHY attachment/detachment from the network driver out of
> > > > __stmmac_open() and __stmmac_release() into stmmac_open() and
> > > > stmmac_release() where these actions will only happen when the
> > > > interface is administratively brought up or down. It does not make
> > > > sense to detach and re-attach the PHY during a change of MTU.
> > >
> > > Sorry for coming up now. But I recently noticed this commit breaks changing
> > > the MTU on i.MX8MP. Once I simply change the MTU I run into some DMA error:
> > > $ ip link set dev end1 mtu 1400
> > > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
> > > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-1
> > > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-2
> > > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-3
> > > imx-dwmac 30bf0000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-4
> > > imx-dwmac 30bf0000.ethernet end1: Link is Down
> > > imx-dwmac 30bf0000.ethernet end1: Failed to reset the dma
> > > imx-dwmac 30bf0000.ethernet end1: stmmac_hw_setup: DMA engine initialization failed
> >
> > This basically means that a clock is missing. Please provide more
> > information:
> >
> > - what kernel version are you using?
>
> Currently I am using v6.18.22.
> $ ethtool -i end1
> driver: st_gmac
> version: 6.18.22
> firmware-version:
> expansion-rom-version:
> bus-info: 30bf0000.ethernet
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: yes
> supports-priv-flags: no
>
> > - has EEE been negotiated?
>
> No. It is marked as not supported
>
> $ ethtool --show-eee end1
> EEE settings for end1:
> EEE status: not supported
>
> > - does the problem persist when EEE is disabled?
>
> As EEE is not supported the problem occurs even with EEE disabled.
>
> > - which PHY is attached to stmmac?
>
> It is a TI DP83867.
>
> imx-dwmac 30bf0000.ethernet eth1: PHY [stmmac-1:03] driver [TI DP83867] (irq=136)
>
> > - which PHY interface mode is being used to connect the PHY to stmmac?
>
> For this interface
> > phy-mode = "rgmii-id";
> is set.
>
> In case it is helpful, my platform is arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts
> Thanks for assisting. If there are further questions, don't hesitate to ask.
Thanks.
So, as best I can determine at the moment, we end up with the following
sequence:
stmmac_change_mtu()
__stmmac_release()
phylink_stop()
phy_stop()
phy->state = PHY_HALTED
_phy_state_machine() returns PHY_STATE_WORK_SUSPEND
_phy_state_machine_post_work()
phy_suspend()
genphy_suspend()
phy_set_bits(phydev, MII_BMCR, BMCR_PDOWN)
With the DP83867, this causes most of the PHY to be powered down, thus
stopping the clocks, and this causes the stmmac reset to time out.
Prior to this commit, we would have called phylink_disconnect_phy()
immediately after phylink_stop(), but I can see nothing that would
be affected by this change there (since that also calls
phy_suspend(), but as the PHY is already suspended, this becomes a
no-op.)
However, __stmmac_open() would have called stmmac_init_phy(), which
would reattach the PHY. This would have called phy_init_hw(),
resetting the PHY, and phy_resume() which would ensure that the
PDOWN bit is clear - thus clocks would be running.
As a hack, please can you try calling phylink_prepare_resume()
between the __stmmac_release() and __stmmac_open() in
stmmac_change_mtu(). This should resume the PHY, thus restoring the
clocks necessary for stmmac to reset.
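To make the reasoning above concrete, here is a minimal user-space sketch of the sequence. It is not driver code: every toy_* name is made up for illustration, and the model assumes (as the analysis above suggests) that the MAC's DMA reset only completes while the PHY is out of power-down. The only value taken from the kernel is BMCR_PDOWN (0x0800 in linux/mii.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BMCR_PDOWN 0x0800 /* same bit value as in linux/mii.h */

/* Toy PHY: only the BMCR register is modelled (hypothetical type). */
struct toy_phy {
	uint16_t bmcr;
};

/* Stand-in for genphy_suspend(): set the power-down bit. */
void toy_phy_suspend(struct toy_phy *phy)
{
	phy->bmcr |= BMCR_PDOWN;
}

/* Stand-in for phy_resume(): clear the power-down bit. */
void toy_phy_resume(struct toy_phy *phy)
{
	phy->bmcr &= (uint16_t)~BMCR_PDOWN;
}

/* Assumption: the PHY only supplies a clock to the MAC when not powered down. */
bool toy_phy_clock_running(const struct toy_phy *phy)
{
	return !(phy->bmcr & BMCR_PDOWN);
}

/* Stand-in for the stmmac DMA reset: fails (-1) when the clock is stopped. */
int toy_mac_dma_reset(const struct toy_phy *phy)
{
	return toy_phy_clock_running(phy) ? 0 : -1;
}

/*
 * Models an MTU change as release followed by open.
 * resume_first == true models resuming the PHY between the two steps.
 */
int toy_change_mtu(struct toy_phy *phy, bool resume_first)
{
	toy_phy_suspend(phy);		/* release path ends up suspending the PHY */
	if (resume_first)
		toy_phy_resume(phy);	/* the proposed hack */
	return toy_mac_dma_reset(phy);	/* open path resets the DMA */
}

/* Walks through both cases; the asserts document the expected outcome. */
void toy_demo(void)
{
	struct toy_phy phy = { .bmcr = 0 };

	assert(toy_change_mtu(&phy, false) == -1);	/* reset fails */
	phy.bmcr = 0;
	assert(toy_change_mtu(&phy, true) == 0);	/* reset succeeds */
}
```

In this model the reset only succeeds when the PHY is resumed between release and open, which is what the hack is meant to test on real hardware.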
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH 1/1] KVM: arm64: nv: Avoid full shadow s2 unmap
From: Marc Zyngier @ 2026-04-16 10:50 UTC (permalink / raw)
To: Wei-Lin Chang
Cc: linux-arm-kernel, kvmarm, linux-kernel, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon
In-Reply-To: <tozhyzmwfddgx5fgfejracv6tskgs7xhs4fs6yvmvff74m7gwy@eq3oh35tokj4>
On Thu, 16 Apr 2026 00:05:40 +0100,
Wei-Lin Chang <weilin.chang@arm.com> wrote:
>
> On Wed, Apr 15, 2026 at 09:38:55AM +0100, Marc Zyngier wrote:
[...]
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 851f6171751c..a97bd461c1e1 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -217,6 +217,10 @@ struct kvm_s2_mmu {
> > > */
> > > bool nested_stage2_enabled;
> > >
> > > + /* canonical IPA to nested IPA range lookup */
> > > + struct maple_tree nested_revmap_mt;
> > > + bool nested_revmap_broken;
> > > +
> >
> > Consider moving this boolean next to the other ones so that you don't
> > create too many holes in the kvm_s2_mmu structure (use pahole to find out).
> >
> > But I have some misgivings about the way things are structured
> > here. Only NV needs a revmap, yet this is present irrespective of the
> > nature of the VM and bloats the data structure a bit.
> >
> > My naive approach would have been to only keep a pointer to the
> > revmap, and make that pointer NULL when the tree is "broken", and
> > freed under RCU if the context isn't the correct one.
>
> Can you explain what you mean by "if the context isn't the correct one"?
> If this refers to when selecting a specific kvm_s2_mmu instance for
> another context, then IIUC refcnt would already be 0 and there would be
> no other user of the tree.
Sorry, "context" is an overloaded word. I meant a situation in which
you couldn't immediately free the maple tree because you're holding
locks and freeing (hypothetically) requires a sleepable context. In
this case, freeing under RCU, purely as a deferring mechanism, might
be useful.
[...]
> > > +/*
> > > + * Per shadow S2 reverse map (IPA -> nested IPA range) maple tree payload
> > > + * layout:
> > > + *
> > > + * bit 63: valid, 1 for non-polluted entries, prevents the case where the
> > > + * nested IPA is 0 and turns the whole value to 0
> > > + * bits 55-12: nested IPA bits 55-12
> > > + * bit 0: polluted, 1 for polluted, 0 for not
> > > + */
> > > +#define VALID_ENTRY BIT(63)
> > > +#define NESTED_IPA_MASK GENMASK_ULL(55, 12)
> > > +#define UNKNOWN_IPA BIT(0)
> > > +
> >
> > This only works because you are using the "advanced" API, right?
> > Otherwise, you'd be losing the high bit. It'd be good to add a comment
> > so that people keep that in mind.
>
> Sorry, I can't find any relationship between the advanced API and the
> top most bit of the maple tree value, what am I missing?
From Documentation/core-api/maple_tree.rst:
<quote>
The Maple Tree can store values between ``0`` and ``ULONG_MAX``. The Maple
Tree reserves values with the bottom two bits set to '10' which are below 4096
(ie 2, 6, 10 .. 4094) for internal use. If the entries may use reserved
entries then the users can convert the entries using xa_mk_value() and convert
them back by calling xa_to_value(). If the user needs to use a reserved
value, then the user can convert the value when using the
:ref:`maple-tree-advanced-api`, but are blocked by the normal API.
</quote>
So depending on how you read this, you can conclude that the bit patterns
you encode in the MT may be considered invalid. xa_mk_value() would
make things always work, but that shifts the value left by one bit,
hence you'd lose bit 63 (see how we use trap_config in
emulate-nested.c to deal with this).
I think you are lucky that bits [11:1] are always 0 here, so you never
hit the [1:0]==10 condition, but that looks extremely fragile to me.
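To illustrate the two points above, here is a small user-space sketch. The toy_* helpers are re-implementations written for this example, based on the documentation quoted above (reserved values are below 4096 with the bottom two bits set to '10') and on the xa_mk_value()/xa_to_value() encoding (shift left by one and set bit 0, shift right by one); they are not the kernel functions themselves. The payload macros mirror the layout proposed in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Payload layout from the patch, spelled out without kernel helpers. */
#define VALID_ENTRY	(UINT64_C(1) << 63)
#define NESTED_IPA_MASK	(((UINT64_C(1) << 44) - 1) << 12)	/* bits 55:12 */
#define UNKNOWN_IPA	(UINT64_C(1) << 0)

/* Per the documentation: values below 4096 whose bottom two bits are '10'. */
bool toy_mt_is_reserved(uint64_t v)
{
	return v < 4096 && (v & 3) == 2;
}

/* User-space equivalents of xa_mk_value() / xa_to_value(). */
uint64_t toy_xa_mk_value(uint64_t v)
{
	return (v << 1) | 1;
}

uint64_t toy_xa_to_value(uint64_t entry)
{
	return entry >> 1;
}

/* Builds a non-polluted entry the way the patch does. */
uint64_t toy_encode(uint64_t nested_ipa)
{
	return VALID_ENTRY | (nested_ipa & NESTED_IPA_MASK);
}

void toy_demo(void)
{
	/* The patch's encodings never have [1:0] == 10, so they are safe today. */
	assert(!toy_mt_is_reserved(toy_encode(0)));
	assert(!toy_mt_is_reserved(UNKNOWN_IPA));

	/* Any future flag landing in bits [11:1] could hit the reserved set. */
	assert(toy_mt_is_reserved(UINT64_C(2)));

	/* Wrapping with xa_mk_value() silently drops bit 63. */
	assert(toy_xa_to_value(toy_xa_mk_value(toy_encode(0x1000))) == 0x1000);
}
```

The last assertion is the reason xa_mk_value() is not a drop-in fix for this layout: the valid bit would have to move below bit 63 first.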
>
> >
> > > void kvm_init_nested(struct kvm *kvm)
> > > {
> > > kvm->arch.nested_mmus = NULL;
> > > @@ -769,12 +783,57 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> > > return s2_mmu;
> > > }
> > >
> > > +void kvm_record_nested_revmap(gpa_t ipa, struct kvm_s2_mmu *mmu,
> > > + gpa_t fault_ipa, size_t map_size)
> > > +{
> > > + struct maple_tree *mt = &mmu->nested_revmap_mt;
> > > + gpa_t start = ipa;
> > > + gpa_t end = ipa + map_size - 1;
> > > + u64 entry, new_entry = 0;
> > > + MA_STATE(mas, mt, start, end);
> > > +
> > > + if (mmu->nested_revmap_broken)
> > > + return;
> > > +
> > > + mtree_lock(mt);
> > > + entry = (u64)mas_find_range(&mas, end);
> > > +
> > > + if (entry) {
> > > + /* maybe just a perm update... */
> > > + if (!(entry & UNKNOWN_IPA) && mas.index == start &&
> > > + mas.last == end &&
> > > + fault_ipa == (entry & NESTED_IPA_MASK))
> > > + goto unlock;
> > > + /*
> > > + * Create a "polluted" range that spans all the overlapping
> > > + * ranges and store it.
> > > + */
> > > + while (entry && mas.index <= end) {
> > > + start = min(mas.index, start);
> > > + end = max(mas.last, end);
> > > + entry = (u64)mas_find_range(&mas, end);
> > > + }
> > > + new_entry |= UNKNOWN_IPA;
> > > + } else {
> > > + new_entry |= fault_ipa;
> > > + new_entry |= VALID_ENTRY;
> > > + }
> > > +
> > > + mas_set_range(&mas, start, end);
> > > + if (mas_store_gfp(&mas, (void *)new_entry, GFP_NOWAIT | __GFP_ACCOUNT))
> > > + mmu->nested_revmap_broken = true;
> >
> > Can we try and minimise the risk of allocation failure here?
> >
> > user_mem_abort() tries very hard to pre-allocate pages for page
> > > tables by maintaining a memcache. Can we have a similar approach for
> > the revmap?
>
> Unfortunately, as I understand the maple tree can only pre-allocate for
> a store when the range and the entry to be stored is given, but in this
> case we must inspect the tree to get that information after we hold the
> mmu and maple tree locks. It is possible to do a two pass approach:
>
> pre-allocate -> take MMU lock -> take maple tree lock -> revalidate what
> we pre-allocated is still usable (nobody changed the tree before we took
> the maple tree lock)
>
> But I am not fond of this extra complexity..
Fair enough. It would at least be interesting to get a feel for how
often this happens, because if we fail often, it won't help much.
[...]
> > My other concern here is related to TLB invalidation. As the guest
> > performs TLB invalidations that remove entries from the shadow S2,
> > there is no way to update the revmap to account for this.
> >
> > This obviously means that the revmap becomes more and more inaccurate
> > over time, and that is likely to accumulate conflicting entries.
> >
> > What is the plan to improve the situation on this front?
>
> Right now I think using a direct map which goes from nested IPA to
> canonical IPA could work while not generating too much complexity, if we
> keep the reverse map and direct map in lockstep (direct map keeping the
> same mappings as the reverse map but just in reverse).
Right, so that'd effectively be a mirror of the guest's page tables at
the point of taking the fault.
> I'll try to do that and include it in the next iteration.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* [PATCH] ARM: dts: stm32: add board pin documentation stm32mp135f-dk
From: Uwe Kleine-König @ 2026-04-16 11:02 UTC (permalink / raw)
To: Maxime Coquelin, Alexandre Torgue; +Cc: linux-stm32, linux-arm-kernel
Relate the devices defined in the device tree to the SoC ports and pins
and labels available on the board.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
---
Hello,
I always end up looking up the same things in the various documents, so
put this information in the dts to make that easier in the future.
Best regards
Uwe
arch/arm/boot/dts/st/stm32mp135f-dk.dts | 28 +++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/arch/arm/boot/dts/st/stm32mp135f-dk.dts b/arch/arm/boot/dts/st/stm32mp135f-dk.dts
index 4d4cec8b86ac..d70fc0b5362d 100644
--- a/arch/arm/boot/dts/st/stm32mp135f-dk.dts
+++ b/arch/arm/boot/dts/st/stm32mp135f-dk.dts
@@ -64,6 +64,7 @@ gpio-keys {
compatible = "gpio-keys";
button-user {
+ /* GPIO on PA13 "User button 2 (B2)" */
label = "User-PA13";
linux,code = <BTN_1>;
gpios = <&gpioa 13 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>;
@@ -74,6 +75,7 @@ leds {
compatible = "gpio-leds";
led_blue: led-blue {
+ /* GPIO on PA14 "User LED (LD3)" */
function = LED_FUNCTION_HEARTBEAT;
color = <LED_COLOR_ID_BLUE>;
gpios = <&gpioa 14 GPIO_ACTIVE_LOW>;
@@ -82,6 +84,7 @@ led_blue: led-blue {
};
led-red {
+ /* GPIO on PA13 "User LED (LD4)" */
function = LED_FUNCTION_STATUS;
color = <LED_COLOR_ID_RED>;
gpios = <&gpioa 13 GPIO_ACTIVE_LOW>;
@@ -252,6 +255,7 @@ phy0_eth1: ethernet-phy@0 {
&i2c1 {
pinctrl-names = "default", "sleep";
+ /* SDA on PE8 = CN8.27, SCL on PD12 = CN8.28 */
pinctrl-0 = <&i2c1_pins_a>;
pinctrl-1 = <&i2c1_sleep_pins_a>;
i2c-scl-rising-time-ns = <96>;
@@ -486,7 +490,10 @@ counter {
status = "okay";
};
pwm {
- /* PWM output on pin 7 of the expansion connector (CN8.7) using TIM3_CH4 func */
+ /*
+ * CH4 on PB1 = CN8.7;
+ * conflicting with &usart1 CH3 on PB0 = CN8.10 is possible
+ */
pinctrl-0 = <&pwm3_pins_a>;
pinctrl-1 = <&pwm3_sleep_pins_a>;
pinctrl-names = "default", "sleep";
@@ -505,7 +512,10 @@ counter {
status = "okay";
};
pwm {
- /* PWM output on pin 31 of the expansion connector (CN8.31) using TIM4_CH2 func */
+ /*
+ * CH2 on PD13 = CN8.31;
+ * conflicting with &i2c1 CH1 on PD12 = CN8.28 is possible
+ */
pinctrl-0 = <&pwm4_pins_a>;
pinctrl-1 = <&pwm4_sleep_pins_a>;
pinctrl-names = "default", "sleep";
@@ -524,7 +534,12 @@ counter {
status = "okay";
};
pwm {
- /* PWM output on pin 32 of the expansion connector (CN8.32) using TIM8_CH3 func */
+ /*
+ * CH3 on PE5 = CN8.32
+ * conflicting with &usart1 CH1N on PA7 = CN8.36 is possible
+ * conflicting with &usart1 CH2N on PB0 = CN8.10 is possible
+ * conflicting with &usart1 CH3N on PB1 = CN8.7 is possible
+ */
pinctrl-0 = <&pwm8_pins_a>;
pinctrl-1 = <&pwm8_sleep_pins_a>;
pinctrl-names = "default", "sleep";
@@ -541,7 +556,7 @@ counter {
status = "okay";
};
pwm {
- /* PWM output on pin 33 of the expansion connector (CN8.33) using TIM14_CH1 func */
+ /* CH1 on PF9 = CN8.33 (alternatively on PA7 = CN8.36, conflicting with &usart1) */
pinctrl-0 = <&pwm14_pins_a>;
pinctrl-1 = <&pwm14_sleep_pins_a>;
pinctrl-names = "default", "sleep";
@@ -553,6 +568,7 @@ timer@13 {
};
&uart4 {
+ /* Accessible via micro USB ST-LINK USB (CN10) */
pinctrl-names = "default", "sleep", "idle";
pinctrl-0 = <&uart4_pins_a>;
pinctrl-1 = <&uart4_sleep_pins_a>;
@@ -564,6 +580,7 @@ &uart4 {
&uart8 {
pinctrl-names = "default", "sleep", "idle";
+ /* TX on PE1 = CN8.37, RX on PF9 = CN8.33 */
pinctrl-0 = <&uart8_pins_a>;
pinctrl-1 = <&uart8_sleep_pins_a>;
pinctrl-2 = <&uart8_idle_pins_a>;
@@ -574,6 +591,7 @@ &uart8 {
&usart1 {
pinctrl-names = "default", "sleep", "idle";
+ /* TX on PC0 = CN8.8, RX on PB0 = CN8.10, RTS on PC2 = CN8.11, CTS on PA7 = CN8.36 */
pinctrl-0 = <&usart1_pins_a>;
pinctrl-1 = <&usart1_sleep_pins_a>;
pinctrl-2 = <&usart1_idle_pins_a>;
@@ -584,6 +602,7 @@ &usart1 {
/* Bluetooth */
&usart2 {
pinctrl-names = "default", "sleep", "idle";
+ /* TX on PH12, RX on PD15, RTS on PD4, CTS on PE11 */
pinctrl-0 = <&usart2_pins_a>;
pinctrl-1 = <&usart2_sleep_pins_a>;
pinctrl-2 = <&usart2_idle_pins_a>;
@@ -613,6 +632,7 @@ hub@1 {
};
&usbotg_hs {
+ /* USB Type-C DRP (CN7) */
phys = <&usbphyc_port1 0>;
phy-names = "usb2-phy";
usb-role-switch;
base-commit: 936c21068d7ade00325e40d82bfd2f3f29d9f659
--
2.47.3
* Re: [PATCH v13 00/48] arm64: Support for Arm CCA in KVM
From: Suzuki K Poulose @ 2026-04-16 11:04 UTC (permalink / raw)
To: Alper Gun, Steven Price
Cc: kvm, kvmarm, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
Shanker Donthineni, Aneesh Kumar K . V, Emi Kisanuki,
Vishal Annapurve
In-Reply-To: <CABpDEumAVf02uOS5Bj07EDyuU=z9FV-iocQU1j7gFM5z0BeV_w@mail.gmail.com>
On 16/04/2026 00:27, Alper Gun wrote:
> On Wed, Apr 15, 2026 at 4:01 AM Steven Price <steven.price@arm.com> wrote:
>>
>> On 14/04/2026 22:40, Alper Gun wrote:
>>> On Wed, Mar 18, 2026 at 8:54 AM Steven Price <steven.price@arm.com> wrote:
>>>>
>>>> This series adds support for running protected VMs using KVM under the
>>>> Arm Confidential Compute Architecture (CCA).
>>>>
>>>> New major version number! This now targets RMM v2.0-bet0[1]. And unlike
>>>> for Linux this represents a significant change.
>>>>
>>>> RMM v2.0 brings with it the ability to configure the RMM to have the
>>>> same page size as the host (so no more RMM_PAGE_SIZE and dealing with
>>>> granules being different from host pages). It also introduces range
>>>> based APIs for many operations which should be more efficient and
>>>> simplifies the code in places.
>>>>
>>>> The handling of the GIC has changed, so the system registers are used to
>>>> pass the GIC state rather than memory. This means fewer changes to the
>>>> KVM code as it looks much like a normal VM in this respect.
>>>>
>>>> And of course the new uAPI introduced in the previous v12 posting is
>>>> retained so that also remains simplified compared to earlier postings.
>>>>
>>>> The RMM support for v2.0 is still early and so this series includes a
>>>> few hacks to ease the integration. Of note are that there are some RMM
>>>> v1.0 SMCs added to paper over areas where the RMM implementation isn't
>>>> quite ready for v2.0, and "SROs" (see below) are deferred to the final
>>>> patch in the series.
>>>>
>>>> The PMU in RMM v2.0 requires more handling on the RMM-side (and
>>>> therefore simplifies the implementation on Linux), but this isn't quite
>>>> ready yet. The Linux side is implemented (but untested).
>>>>
>>>> PSCI still requires the VMM to provide the "target" REC for operations
>>>> that affect another vCPU. This is likely to change in a future version
>>>> of the specification. There's also a desire to force PSCI to be handled
>>>> in the VMM for realm guests - this isn't implemented yet as I'm waiting
>>>> for the dust to settle on the RMM interface first.
>>>>
>>>> Stateful RMI Operations
>>>> -----------------------
>>>>
>>>> The RMM v2.0 spec brings a new concept of Stateful RMI Operations (SROs)
>>>> which allow the RMM to complete an operation over several SMC calls and
>>>> requesting/returning memory to the host. This has the benefit of
>>>> allowing interrupts to be handled in the middle of an operation (by
>>>> returning to the host to handle the interrupt without completing the
>>>> operation) and enables the RMM to dynamically allocate memory for
>>>> internal tracking purposes. One example of this is RMI_REC_CREATE no
>>>> longer needs "auxiliary granules" provided upfront but can request the
>>>> memory needed during the RMI_REC_CREATE operation.
>>>>
>>>> There are a fairly large number of operations that are defined as SROs
>>>> in the specification, but currently both Linux and RMM only have support
>>>> for RMI_REC_CREATE and RMI_REC_DESTROY. There are a number of TODOs/FIXMEs
>>>> in the code where support is missing.
>>>>
>>>> Given the early stage support for this, the SRO handling is all confined
>>>> to the final patch. This patch can be dropped to return to a pre-SRO
>>>> state (albeit a mixture of RMM v1.0 and v2.0 APIs) for testing purposes.
>>>>
>>>> A future posting will reorder the series to move the generic SRO support
>>>> to an early patch and will implement the proper support for this in all
>>>> RMI SMCs.
>>>>
>>>> One aspect of SROs which is not yet well captured is that in some
>>>> circumstances the Linux kernel will need to call an SRO call in a
>>>> context where memory allocation is restricted (e.g. because a spinlock
>>>> is held). In this case the intention is that the SRO will be cancelled,
>>>> the spinlock dropped so the memory allocation can be completed, and then
>>>> the SRO restarted (obviously after rechecking the state that the
>>>> spinlock was protecting). For this reason the code stores the memory
>>>> allocations within a struct rmi_sro_state object - see the final patch
>>>> for more details.
>>>>
>>>> This series is based on v7.0-rc1. It is also available as a git
>>>> repository:
>>>>
>>>> https://gitlab.arm.com/linux-arm/linux-cca cca-host/v13
>>>>
>>>>
>>>
>>> Hi Steven,
>>>
>>> I have a question regarding host kexec and kdump scenarios, and
>>> whether there is any plan to make them work in this initial series.
>>>
>>> Intel TDX and AMD SEV-SNP both have a firmware shutdown command that
>>> is invoked during the kexec or panic code paths to safely bypass
>>> hardware memory protections and boot into the new kernel. As far as
>>> I know, there is no similar global teardown command available for
>>> the RMM.
>>
>> Correct, the RMM specification as it stands doesn't provide a mechanism
>> for the host to do this. The host would have to identify all the realm
>> guests in the system: specifically the address of the RDs (Realm
>> Descriptors) and RECs (Realm Execution Contexts). It needs this to tear
>> down the guests and be able to undelegate the memory.
>>
>> It's an interesting point and I'll raise the idea of a "firmware
>> shutdown command" to make this more possible.
>>
>>> What is the roadmap for supporting both general kexec and
>>> more specifically kdump (panic) scenarios with CCA?
>>
>> I don't have a roadmap I'm afraid for these. kexec in theory would be
>> possible with KVM gracefully terminating all realms. For kdump/panic
>> that sort of graceful shutdown isn't really appropriate (or likely to
>> succeed).
>>
>
> Thanks Steven for the clarification.
>
> For us, kdump is highly critical as it is our primary diagnostic tool
> for host crashes. Without it, monitoring and debugging at fleet scale
> would become unmanageable.
>
> To confirm my understanding of the current architecture: if a host
> panics while no Realms are actively running (and therefore no pages
> are currently in the delegated state), the standard kdump extraction
> should work perfectly fine without any modifications, correct?
This may not be true. We could have pages donated to the RMM for GPTs,
tracking, etc. So unless Linux keeps track of them, it may be
unsafe for a crash kernel to access them.
>
> Regarding the KVM tracking structures (RDs, RECs, RTTs, etc.) when VMs
> are running, perhaps we could use `vmcoreinfo` to export the physical
> addresses of these delegated pages. This would allow tools like
Thinking about this, do we really need to? The "vmcore" read path could
access the pages, handle the GPFs taken on such accesses, and give out 0s
for those Granules. In any case, we can't get access to the data on
pages that are still in the Realm PAS.
> `makedumpfile` to explicitly filter them out. I assume these pages must
> remain hardware-locked while the VMs are active.
>
> Long-term, having an architectural shutdown command - similar to the
> TDH.SYS.DISABLE command in Intel TDX - would be incredibly useful. It
> would allow the kdump kernel to safely bypass these hardware security
> checks, especially when extracting host-side KVM state.
For kexec, maybe we could do this. Alternatively, we could try to
reclaim everything (GPTs, tracking) before the kexec reboot.
>
> As for the protected realm memory, I assume that is an easier problem.
> We naturally want to exclude guest pages from a host dump regardless
> of whether they are Realm pages or not. However, accidental touches
> are still fatal.
>
>> There is also some RMM configuration which cannot be repeated (see
>> RMI_RMM_CONFIG_SET) - which implies that the kexec kernel must be
>> similar to the first kernel (i.e. same page size).
That is true, the page sizes must match. The RMM spec is updated so that
the host can probe the state of the RMM and detect whether it can do the
CONFIG_SET.
Suzuki
>>
>> Thanks,
>> Steve