* RE: [PATCH V11 02/12] PCI: host-generic: Add common helpers for parsing Root Port properties
From: Sherry Sun @ 2026-04-08 6:34 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: robh@kernel.org, krzk+dt@kernel.org, conor+dt@kernel.org,
Frank Li, s.hauer@pengutronix.de, kernel@pengutronix.de,
festevam@gmail.com, lpieralisi@kernel.org, kwilczynski@kernel.org,
bhelgaas@google.com, Hongxing Zhu, l.stach@pengutronix.de,
imx@lists.linux.dev, linux-pci@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <lnzprzrdwra7pn7d6m3sbj5pvjy64blwpjl6i3lmlnfbyho63b@czpyhpgz5vum>
> On Tue, Apr 07, 2026 at 06:41:44PM +0800, Sherry Sun wrote:
> > Introduce generic helper functions to parse Root Port device tree
> > nodes and extract common properties like reset GPIOs. This allows
> > multiple PCI host controller drivers to share the same parsing logic.
> >
> > Define struct pci_host_port to hold common Root Port properties
> > (currently only reset GPIO descriptor) and add
> > pci_host_common_parse_ports() to parse Root Port nodes from device
> tree.
> >
> > Also add the 'ports' list to struct pci_host_bridge for better
> > maintain parsed Root Port information.
> >
> > Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
> > ---
> > drivers/pci/controller/pci-host-common.c | 77
> > ++++++++++++++++++++++++ drivers/pci/controller/pci-host-common.h |
> 16 +++++
> > drivers/pci/probe.c | 1 +
> > include/linux/pci.h | 1 +
> > 4 files changed, 95 insertions(+)
> >
> > diff --git a/drivers/pci/controller/pci-host-common.c
> > b/drivers/pci/controller/pci-host-common.c
> > index d6258c1cffe5..0fb6991dde7b 100644
> > --- a/drivers/pci/controller/pci-host-common.c
> > +++ b/drivers/pci/controller/pci-host-common.c
> > @@ -9,6 +9,7 @@
> >
> > #include <linux/kernel.h>
> > #include <linux/module.h>
> > +#include <linux/gpio/consumer.h>
> > #include <linux/of.h>
> > #include <linux/of_address.h>
> > #include <linux/of_pci.h>
> > @@ -17,6 +18,82 @@
> >
> > #include "pci-host-common.h"
> >
> > +/**
> > + * pci_host_common_delete_ports - Cleanup function for port list
> > + * @data: Pointer to the port list head */ void
> > +pci_host_common_delete_ports(void *data) {
> > + struct list_head *ports = data;
> > + struct pci_host_port *port, *tmp;
> > +
> > + list_for_each_entry_safe(port, tmp, ports, list)
> > + list_del(&port->list);
> > +}
> > +EXPORT_SYMBOL_GPL(pci_host_common_delete_ports);
> > +
> > +/**
> > + * pci_host_common_parse_port - Parse a single Root Port node
> > + * @dev: Device pointer
> > + * @bridge: PCI host bridge
> > + * @node: Device tree node of the Root Port
> > + *
> > + * Returns: 0 on success, negative error code on failure */ static
> > +int pci_host_common_parse_port(struct device *dev,
> > + struct pci_host_bridge *bridge,
> > + struct device_node *node)
> > +{
> > + struct pci_host_port *port;
> > + struct gpio_desc *reset;
> > +
> > + reset = devm_fwnode_gpiod_get(dev, of_fwnode_handle(node),
> > + "reset", GPIOD_ASIS, "PERST#");
>
> Sorry, I missed this earlier.
>
> Since PERST# is optional, you cannot reliably detect whether the Root Port
> binding intentionally skipped the PERST# GPIO or legacy binding is used, just
> by checking for PERST# in Root Port node.
>
> So this helper should do 3 things:
>
> 1. If PERST# is found in Root Port node, use it.
> 2. If not, check the RC node and if present, return -ENOENT to fallback to the
> legacy binding.
> 3. If not found in both nodes, assume that the PERST# is not present in the
> design, and proceed with parsing Root Port binding further.
Hi Mani, understood. Does the following code look OK for the three cases above?
	/* Check if PERST# is present in Root Port node */
	reset = devm_fwnode_gpiod_get(dev, of_fwnode_handle(node),
				      "reset", GPIOD_ASIS, "PERST#");
	if (IS_ERR(reset)) {
		/* If error is not -ENOENT, it's a real error */
		if (PTR_ERR(reset) != -ENOENT)
			return PTR_ERR(reset);

		/* PERST# not found in Root Port node, check RC node */
		rc_has_reset = of_property_read_bool(dev->of_node, "reset-gpios") ||
			       of_property_read_bool(dev->of_node, "reset-gpio");
		if (rc_has_reset)
			return -ENOENT;

		/* No PERST# in either node, assume not present in design */
		reset = NULL;
	}

	port = devm_kzalloc(dev, sizeof(*port), GFP_KERNEL);
	if (!port)
		return -ENOMEM;
	...
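
For reference, the three cases can also be written as a small decision table. This is only a userspace sketch, not kernel code: enum perst_source and classify_perst() are made-up names, port_err stands in for the result of the GPIO lookup on the Root Port node (0, -ENOENT, or another negative errno), and rc_has_reset stands in for the property check on the RC node.

```c
#include <errno.h>
#include <stdbool.h>

/* Where PERST# was found for one Root Port (hypothetical names). */
enum perst_source {
	PERST_FROM_ROOT_PORT,	/* case 1: GPIO found in the Root Port node */
	PERST_LEGACY_FALLBACK,	/* case 2: GPIO only present in the RC node */
	PERST_ABSENT,		/* case 3: no GPIO in either node */
};

/*
 * port_err:     0 if the Root Port node provided the GPIO, -ENOENT if the
 *               property is missing, any other negative errno on failure.
 * rc_has_reset: true if the RC node carries a reset GPIO property.
 *
 * Returns 0 to continue parsing the Root Port binding, -ENOENT to ask the
 * caller to fall back to the legacy binding, or the real error.
 */
static int classify_perst(int port_err, bool rc_has_reset,
			  enum perst_source *src)
{
	if (port_err == 0) {
		*src = PERST_FROM_ROOT_PORT;
		return 0;
	}

	/* Anything other than "property missing" is a real error. */
	if (port_err != -ENOENT)
		return port_err;

	if (rc_has_reset) {
		*src = PERST_LEGACY_FALLBACK;
		return -ENOENT;
	}

	*src = PERST_ABSENT;
	return 0;
}
```

Only the second case returns -ENOENT, so a caller can tell "fall back to the legacy binding" apart from "this design has no PERST#".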
>
> But there is one more important limitation here. Right now, this API only
> handles PERST#. But if another vendor tries to use it and if they need other
> properties such as PHY, clocks etc... those resources should be fetched
> optionally only by this helper. But if the controller has a hard dependency on
> those resources, the driver will fail to operate.
>
> I don't think we can fix this limitation though and those platforms should
> ensure that the resource dependency is correctly modeled in DT binding and
> the DTS is validated properly. It'd be good to mention this in the kernel doc of
> this API.
Ok, I will add a NOTE for this in this API description.
>
> > + if (IS_ERR(reset))
> > + return PTR_ERR(reset);
> > +
> > + port = devm_kzalloc(dev, sizeof(*port), GFP_KERNEL);
> > + if (!port)
> > + return -ENOMEM;
> > +
> > + port->reset = reset;
> > + INIT_LIST_HEAD(&port->list);
> > + list_add_tail(&port->list, &bridge->ports);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * pci_host_common_parse_ports - Parse Root Port nodes from device
> > +tree
> > + * @dev: Device pointer
> > + * @bridge: PCI host bridge
> > + *
> > + * This function iterates through child nodes of the host bridge and
> > +parses
> > + * Root Port properties (currently only reset GPIO).
> > + *
> > + * Returns: 0 on success, -ENOENT if no ports found, other negative
> > +error codes
> > + * on failure
> > + */
> > +int pci_host_common_parse_ports(struct device *dev, struct
> > +pci_host_bridge *bridge) {
> > + int ret = -ENOENT;
> > +
> > + for_each_available_child_of_node_scoped(dev->of_node, of_port) {
> > + if (!of_node_is_type(of_port, "pci"))
> > + continue;
> > + ret = pci_host_common_parse_port(dev, bridge, of_port);
> > + if (ret)
> > + return ret;
>
> As Sashiko flagged, you need to make sure that devm_add_action_or_reset()
> is added even during the error path:
Yes, it needs to be fixed. I can see two ways to handle it, and I am not sure which one is preferable:
#1: Register the cleanup action after the first successful port parse, and use a cleanup_registered flag to avoid registering it twice.
	int ret = -ENOENT;
	bool cleanup_registered = false;

	for_each_available_child_of_node_scoped(dev->of_node, of_port) {
		if (!of_node_is_type(of_port, "pci"))
			continue;

		ret = pci_host_common_parse_port(dev, bridge, of_port);
		if (ret)
			return ret;

		/* Register cleanup action after first successful port parse */
		if (!cleanup_registered) {
			ret = devm_add_action_or_reset(dev,
						       pci_host_common_delete_ports,
						       &bridge->ports);
			if (ret)
				return ret;
			cleanup_registered = true;
		}
	}
#2: Call devm_add_action() to register the cleanup action before the loop begins.
	ret = devm_add_action(dev, pci_host_common_delete_ports,
			      &bridge->ports);
	if (ret)
		return ret;

	ret = -ENOENT;
	for_each_available_child_of_node_scoped(dev->of_node, of_port) {
		if (!of_node_is_type(of_port, "pci"))
			continue;

		ret = pci_host_common_parse_port(dev, bridge, of_port);
		if (ret)
			return ret;
	}
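
To make the difference in ordering concrete, here is a small userspace model. It is a sketch only: struct fake_dev, struct port_list, add_action() and release_all() are stand-ins I made up for the device, bridge->ports, devm_add_action() and the devres release that runs when probe fails. It compares the V11 ordering (register after the loop) with method #2 (register before the loop) when a later port fails to parse.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for bridge->ports: only tracks how many entries are linked. */
struct port_list {
	int linked;
};

/* Stand-in for a device that can hold one managed cleanup action. */
struct fake_dev {
	void (*action)(void *data);
	void *data;
};

/* Plays the role of pci_host_common_delete_ports(): unlink everything. */
static void delete_ports(void *data)
{
	struct port_list *ports = data;

	ports->linked = 0;
}

/* Plays the role of devm_add_action(): remember the callback. */
static int add_action(struct fake_dev *dev, void (*fn)(void *), void *data)
{
	dev->action = fn;
	dev->data = data;
	return 0;
}

/* Models what happens when probe fails or the device is unbound. */
static void release_all(struct fake_dev *dev)
{
	if (dev->action)
		dev->action(dev->data);
	dev->action = NULL;
}

/*
 * Parse nports ports; the port at index fail_at fails (-1: none fails).
 * register_first selects method #2 (true) or the V11 ordering (false).
 */
static int parse_ports(struct fake_dev *dev, struct port_list *ports,
		       int nports, int fail_at, bool register_first)
{
	int ret, i;

	if (register_first) {
		ret = add_action(dev, delete_ports, ports);
		if (ret)
			return ret;
	}

	ret = -ENOENT;
	for (i = 0; i < nports; i++) {
		if (i == fail_at)
			return -EINVAL;	/* earlier ports stay linked */
		ports->linked++;
		ret = 0;
	}
	if (ret)
		return ret;

	return register_first ? 0 : add_action(dev, delete_ports, ports);
}
```

With three ports and a failure on the third, the V11 ordering leaves two entries linked after release, while method #2 leaves none. Method #1 should give the same end result as method #2 in this model, at the cost of the extra flag.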
Best Regards
Sherry
> https://sashiko.dev/#/patchset/20260407104154.2842132-1-sherry.sun%40nxp.com?part=2
>
> - Mani
>
> > + }
> > +
> > + if (ret)
> > + return ret;
> > +
> > + return devm_add_action_or_reset(dev,
> pci_host_common_delete_ports,
> > + &bridge->ports);
> > +}
> > +EXPORT_SYMBOL_GPL(pci_host_common_parse_ports);
> > +
> > static void gen_pci_unmap_cfg(void *ptr) {
> > pci_ecam_free((struct pci_config_window *)ptr); diff --git
> > a/drivers/pci/controller/pci-host-common.h
> > b/drivers/pci/controller/pci-host-common.h
> > index b5075d4bd7eb..37714bedb625 100644
> > --- a/drivers/pci/controller/pci-host-common.h
> > +++ b/drivers/pci/controller/pci-host-common.h
> > @@ -12,6 +12,22 @@
> >
> > struct pci_ecam_ops;
> >
> > +/**
> > + * struct pci_host_port - Generic Root Port properties
> > + * @list: List node for linking multiple ports
> > + * @reset: GPIO descriptor for PERST# signal
> > + *
> > + * This structure contains common properties that can be parsed from
> > + * Root Port device tree nodes.
> > + */
> > +struct pci_host_port {
> > + struct list_head list;
> > + struct gpio_desc *reset;
> > +};
> > +
> > +void pci_host_common_delete_ports(void *data); int
> > +pci_host_common_parse_ports(struct device *dev, struct
> > +pci_host_bridge *bridge);
> > +
> > int pci_host_common_probe(struct platform_device *pdev); int
> > pci_host_common_init(struct platform_device *pdev,
> > struct pci_host_bridge *bridge,
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index
> > eaa4a3d662e8..629ae08b7d35 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -677,6 +677,7 @@ static void pci_init_host_bridge(struct
> > pci_host_bridge *bridge) {
> > INIT_LIST_HEAD(&bridge->windows);
> > INIT_LIST_HEAD(&bridge->dma_ranges);
> > + INIT_LIST_HEAD(&bridge->ports);
> >
> > /*
> > * We assume we can manage these PCIe features. Some systems
> may
> > diff --git a/include/linux/pci.h b/include/linux/pci.h index
> > 8f63de38f2d2..a73ea81ce88f 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -636,6 +636,7 @@ struct pci_host_bridge {
> > int domain_nr;
> > struct list_head windows; /* resource_entry */
> > struct list_head dma_ranges; /* dma ranges resource list */
> > + struct list_head ports; /* Root Port list (pci_host_port) */
> > #ifdef CONFIG_PCI_IDE
> > u16 nr_ide_streams; /* Max streams possibly active in
> @ide_stream_ida */
> > struct ida ide_stream_ida;
> > --
> > 2.37.1
> >
>
> --
> மணிவண்ணன் சதாசிவம்
^ permalink raw reply
* Re: [PATCH] arm64: dts: imx93-9x9-qsb: Add tianma,tm050rdh03 panel
From: Liu Ying @ 2026-04-08 6:02 UTC (permalink / raw)
To: Frank Li
Cc: Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, imx, linux-arm-kernel,
devicetree, linux-kernel
In-Reply-To: <adTUkWvqVUhLiw_J@lizhi-Precision-Tower-5810>
Hi Frank,
On Tue, Apr 07, 2026 at 05:55:29AM -0400, Frank Li wrote:
> On Tue, Apr 07, 2026 at 05:15:31PM +0800, Liu Ying wrote:
>> Support tianma,tm050rdh03 DPI panel on i.MX93 9x9 QSB.
>>
>> The panel connects with the QSB board through an adapter board[1]
>> designed by NXP.
>>
>> Link: https://www.nxp.com/design/design-center/development-boards-and-designs/parallel-lcd-display:TM050RDH03-41 [1]
>> Signed-off-by: Liu Ying <victor.liu@nxp.com>
>> ---
>> arch/arm64/boot/dts/freescale/Makefile | 2 +
>> .../imx93-9x9-qsb-ontat-kd50g21-40nt-a1.dtsi | 110 +++++++++++++++++++++
>> .../imx93-9x9-qsb-ontat-kd50g21-40nt-a1.dtso | 106 +-------------------
>
> Can you add some description about the rename in the commit message?
I'll add some description about the file copy in the commit message.
> Use -C option to create patch.
Will do.
>
> ...
>> diff --git a/arch/arm64/boot/dts/freescale/imx93-9x9-qsb-tianma-tm050rdh03.dtso b/arch/arm64/boot/dts/freescale/imx93-9x9-qsb-tianma-tm050rdh03.dtso
>> new file mode 100644
>> index 000000000000..c233797ec28c
>> --- /dev/null
>> +++ b/arch/arm64/boot/dts/freescale/imx93-9x9-qsb-tianma-tm050rdh03.dtso
>> @@ -0,0 +1,14 @@
>> +// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
>> +/*
>> + * Copyright 2026 NXP
>> + */
>> +
>> +#include <dt-bindings/gpio/gpio.h>
>> +#include "imx93-9x9-qsb-ontat-kd50g21-40nt-a1.dtsi"
>> +
>> +&{/} {
>> + panel {
>> + compatible = "tianma,tm050rdh03";
>> + enable-gpios = <&pcal6524 8 GPIO_ACTIVE_HIGH>;
>> + };
>> +};
>
> Is it possible to apply this overlay file and the kd50g21-40nt-a1 overlay file
> to imx93-9x9-qsb.dtb, so that there is no need to create a dtsi?
I'm sorry, I don't get your question here.
Anyway, the DT overlays are needed, because the 40-pin EXP/PRI interface on
the i.MX93 9x9 QSB board can not only connect to a DPI panel adapter board
but also to an audio hat[2], and maybe more. The newly introduced .dtsi
file just aims to avoid duplicated code.
[2] https://www.nxp.com/design/design-center/development-boards-and-designs/mx93aud-hat-audio-board:MX93AUD-HAT
>
> Frank
>>
>> ---
>> base-commit: 816f193dd0d95246f208590924dd962b192def78
>> change-id: 20260407-tianma-tm050rdh03-imx93-9x9-qsb-6e4bbbde3d08
>>
>> Best regards,
>> --
>> Liu Ying <victor.liu@nxp.com>
>>
--
Regards,
Liu Ying
^ permalink raw reply
* Re: [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible
From: Barry Song @ 2026-04-08 5:12 UTC (permalink / raw)
To: Dev Jain
Cc: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki,
linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21
In-Reply-To: <b297383a-731c-4efa-95b7-0f198b098d11@arm.com>
On Wed, Apr 8, 2026 at 12:20 PM Dev Jain <dev.jain@arm.com> wrote:
>
>
>
> On 08/04/26 8:21 am, Barry Song (Xiaomi) wrote:
> > In many cases, the pages passed to vmap() may include high-order
> > pages allocated with __GFP_COMP flags. For example, the systemheap
> > often allocates pages in descending order: order 8, then 4, then 0.
> > Currently, vmap() iterates over every page individually—even pages
> > inside a high-order block are handled one by one.
> >
> > This patch detects high-order pages and maps them as a single
> > contiguous block whenever possible.
> >
> > An alternative would be to implement a new API, vmap_sg(), but that
> > change seems to be large in scope.
> >
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > ---
>
> Coincidentally, I was working on the same thing :)
Interesting, thanks — at least I’ve got one good reviewer :-)
>
> We have a usecase regarding Arm TRBE and SPE aux buffers.
>
> I'll take a look at your patches later, but my implementation is the
Yes. Please.
> following, if you have any comments. I have squashed the patches into
> a single diff.
Thanks very much, Dev. What you’ve done is quite similar to
patches 5/8 and 6/8, although the code differs somewhat.
>
>
>
> From ccb9670a52b7f50b1f1e07b579a1316f76b84811 Mon Sep 17 00:00:00 2001
> From: Dev Jain <dev.jain@arm.com>
> Date: Thu, 26 Feb 2026 16:21:29 +0530
> Subject: [PATCH] arm64/perf: map AUX buffer with large pages
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
> .../hwtracing/coresight/coresight-etm-perf.c | 3 +-
> drivers/hwtracing/coresight/coresight-trbe.c | 3 +-
> drivers/perf/arm_spe_pmu.c | 5 +-
> mm/vmalloc.c | 86 ++++++++++++++++---
> 4 files changed, 79 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 72017dcc3b7f1..e90a430af86bb 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -984,7 +984,8 @@ int __init etm_perf_init(void)
>
> etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
> PERF_PMU_CAP_ITRACE |
> - PERF_PMU_CAP_AUX_PAUSE);
> + PERF_PMU_CAP_AUX_PAUSE |
> + PERF_PMU_CAP_AUX_PREFER_LARGE);
>
> etm_pmu.attr_groups = etm_pmu_attr_groups;
> etm_pmu.task_ctx_nr = perf_sw_context;
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 1511f8eb95afb..74e6ad891e236 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -760,7 +760,8 @@ static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
> for (i = 0; i < nr_pages; i++)
> pglist[i] = virt_to_page(pages[i]);
>
> - buf->trbe_base = (unsigned long)vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> + buf->trbe_base = (unsigned long)vmap(pglist, nr_pages,
> + VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
> if (!buf->trbe_base) {
> kfree(pglist);
> kfree(buf);
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index dbd0da1116390..90c349fd66b2c 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -1027,7 +1027,7 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
> for (i = 0; i < nr_pages; ++i)
> pglist[i] = virt_to_page(pages[i]);
>
> - buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
> + buf->base = vmap(pglist, nr_pages, VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
> if (!buf->base)
> goto out_free_pglist;
>
> @@ -1064,7 +1064,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
> spe_pmu->pmu = (struct pmu) {
> .module = THIS_MODULE,
> .parent = &spe_pmu->pdev->dev,
> - .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
> + .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE |
> + PERF_PMU_CAP_AUX_PREFER_LARGE,
> .attr_groups = arm_spe_pmu_attr_groups,
> /*
> * We hitch a ride on the software context here, so that
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 61caa55a44027..8482463d41203 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -660,14 +660,14 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> pgprot_t prot, struct page **pages, unsigned int page_shift)
> {
> unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> -
> + unsigned long step = 1UL << (page_shift - PAGE_SHIFT);
> WARN_ON(page_shift < PAGE_SHIFT);
>
> if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> page_shift == PAGE_SHIFT)
> return vmap_small_pages_range_noflush(addr, end, prot, pages);
>
> - for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> + for (i = 0; i < ALIGN_DOWN(nr, step); i += step) {
> int err;
>
> err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> @@ -678,8 +678,9 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>
> addr += 1UL << page_shift;
> }
> -
> - return 0;
> + if (IS_ALIGNED(nr, step))
> + return 0;
> + return vmap_small_pages_range_noflush(addr, end, prot, pages + i);
> }
>
> int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> @@ -3514,6 +3515,50 @@ void vunmap(const void *addr)
> }
> EXPORT_SYMBOL(vunmap);
>
> +static inline unsigned int vm_shift(pgprot_t prot, unsigned long size)
> +{
> + if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
> + return PMD_SHIFT;
> +
> + return arch_vmap_pte_supported_shift(size);
> +}
> +
> +static inline int __vmap_huge(struct page **pages, pgprot_t prot,
> + unsigned long addr, unsigned int count)
> +{
> + unsigned int i = 0;
> + unsigned int shift;
> + unsigned long nr;
> +
> + while (i < count) {
> + nr = num_pages_contiguous(pages + i, count - i);
> + shift = vm_shift(prot, nr << PAGE_SHIFT);
> + if (vmap_pages_range(addr, addr + (nr << PAGE_SHIFT),
> + pgprot_nx(prot), pages + i, shift) < 0) {
> + return 1;
> + }
One observation on my side is that the performance gain is somewhat
offset by page table zigzagging caused by what you are doing here:
iterating over each memory segment with vmap_pages_range().
In patch 3/8, I enhanced vmap_small_pages_range_noflush() to
avoid repeated pgd → p4d → pud → pmd → pte traversals for page
shifts other than PAGE_SHIFT. This improves performance for
vmalloc as well as vmap(). Then, in patch 7/8, I adopt the new
vmap_small_pages_range_noflush() and eliminate the iteration.
> + i += nr;
> + addr += (nr << PAGE_SHIFT);
> + }
> + return 0;
> +}
> +
> +static unsigned long max_contiguous_stride_order(struct page **pages,
> + pgprot_t prot, unsigned int count)
> +{
> + unsigned long max_shift = PAGE_SHIFT;
> + unsigned int i = 0;
> +
> + while (i < count) {
> + unsigned long nr = num_pages_contiguous(pages + i, count - i);
> + unsigned long shift = vm_shift(prot, nr << PAGE_SHIFT);
> +
> + max_shift = max(max_shift, shift);
> + i += nr;
> + }
> + return max_shift;
> +}
> +
> /**
> * vmap - map an array of pages into virtually contiguous space
> * @pages: array of page pointers
> @@ -3552,15 +3597,32 @@ void *vmap(struct page **pages, unsigned int count,
> return NULL;
>
> size = (unsigned long)count << PAGE_SHIFT;
> - area = get_vm_area_caller(size, flags, __builtin_return_address(0));
> + if (flags & VM_ALLOW_HUGE_VMAP) {
> + /* determine from page array, the max alignment */
> + unsigned long max_shift = max_contiguous_stride_order(pages, prot, count);
> +
> + area = __get_vm_area_node(size, 1 << max_shift, max_shift, flags,
> + VMALLOC_START, VMALLOC_END, NUMA_NO_NODE,
> + GFP_KERNEL, __builtin_return_address(0));
> + } else {
> + area = get_vm_area_caller(size, flags, __builtin_return_address(0));
> + }
> if (!area)
> return NULL;
>
> addr = (unsigned long)area->addr;
> - if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> - pages, PAGE_SHIFT) < 0) {
> - vunmap(area->addr);
> - return NULL;
> +
> + if (flags & VM_ALLOW_HUGE_VMAP) {
> + if (__vmap_huge(pages, prot, addr, count)) {
> + vunmap(area->addr);
> + return NULL;
> + }
> + } else {
> + if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> + pages, PAGE_SHIFT) < 0) {
> + vunmap(area->addr);
> + return NULL;
> + }
> }
>
> if (flags & VM_MAP_PUT_PAGES) {
> @@ -4011,11 +4073,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
> * their allocations due to apply_to_page_range not
> * supporting them.
> */
> -
> - if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
> - shift = PMD_SHIFT;
> - else
> - shift = arch_vmap_pte_supported_shift(size);
> + shift = vm_shift(prot, size);
What I actually did is different. In patches 1/8 and 2/8, I
extended the arm64 levels to support N * CONT_PTE, and let the
final PTE mapping use the maximum possible batch after avoiding
zigzag. This further improves all orders greater than CONT_PTE.
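
To illustrate the zigzag point with something runnable, here is a userspace sketch that models pages as plain PFNs. run_length() and count_runs() are names I made up for this example; run_length() only mimics the idea of counting physically contiguous pages at the start of an array, and is not the kernel helper. Each run corresponds to one separate vmap_pages_range() call in the per-segment approach, and therefore to one more walk starting from the top of the page tables.

```c
#include <stddef.h>

/* Number of leading entries whose PFNs are consecutive (0 if n == 0). */
static size_t run_length(const unsigned long *pfns, size_t n)
{
	size_t i;

	if (n == 0)
		return 0;

	for (i = 1; i < n; i++) {
		if (pfns[i] != pfns[0] + i)
			break;
	}
	return i;
}

/*
 * Number of contiguous runs in the array. In a per-segment mapping loop
 * this is also the number of top-down page table walks.
 */
static size_t count_runs(const unsigned long *pfns, size_t n)
{
	size_t i = 0, runs = 0;

	while (i < n) {
		i += run_length(pfns + i, n - i);
		runs++;
	}
	return runs;
}
```

For a buffer built from many small high-order blocks, count_runs() grows with the number of blocks, which is where the repeated traversals come from.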
Thanks
Barry
^ permalink raw reply
* Re: [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible
From: Dev Jain @ 2026-04-08 4:19 UTC (permalink / raw)
To: Barry Song (Xiaomi), linux-mm, linux-arm-kernel, catalin.marinas,
will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21
In-Reply-To: <20260408025115.27368-6-baohua@kernel.org>
On 08/04/26 8:21 am, Barry Song (Xiaomi) wrote:
> In many cases, the pages passed to vmap() may include high-order
> pages allocated with __GFP_COMP flags. For example, the systemheap
> often allocates pages in descending order: order 8, then 4, then 0.
> Currently, vmap() iterates over every page individually—even pages
> inside a high-order block are handled one by one.
>
> This patch detects high-order pages and maps them as a single
> contiguous block whenever possible.
>
> An alternative would be to implement a new API, vmap_sg(), but that
> change seems to be large in scope.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
Coincidentally, I was working on the same thing :)
We have a usecase regarding Arm TRBE and SPE aux buffers.
I'll take a look at your patches later, but my implementation is the
following, if you have any comments. I have squashed the patches into
a single diff.
From ccb9670a52b7f50b1f1e07b579a1316f76b84811 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain@arm.com>
Date: Thu, 26 Feb 2026 16:21:29 +0530
Subject: [PATCH] arm64/perf: map AUX buffer with large pages
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
.../hwtracing/coresight/coresight-etm-perf.c | 3 +-
drivers/hwtracing/coresight/coresight-trbe.c | 3 +-
drivers/perf/arm_spe_pmu.c | 5 +-
mm/vmalloc.c | 86 ++++++++++++++++---
4 files changed, 79 insertions(+), 18 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 72017dcc3b7f1..e90a430af86bb 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -984,7 +984,8 @@ int __init etm_perf_init(void)
etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
PERF_PMU_CAP_ITRACE |
- PERF_PMU_CAP_AUX_PAUSE);
+ PERF_PMU_CAP_AUX_PAUSE |
+ PERF_PMU_CAP_AUX_PREFER_LARGE);
etm_pmu.attr_groups = etm_pmu_attr_groups;
etm_pmu.task_ctx_nr = perf_sw_context;
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 1511f8eb95afb..74e6ad891e236 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -760,7 +760,8 @@ static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
for (i = 0; i < nr_pages; i++)
pglist[i] = virt_to_page(pages[i]);
- buf->trbe_base = (unsigned long)vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+ buf->trbe_base = (unsigned long)vmap(pglist, nr_pages,
+ VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
if (!buf->trbe_base) {
kfree(pglist);
kfree(buf);
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index dbd0da1116390..90c349fd66b2c 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -1027,7 +1027,7 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
for (i = 0; i < nr_pages; ++i)
pglist[i] = virt_to_page(pages[i]);
- buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
+ buf->base = vmap(pglist, nr_pages, VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
if (!buf->base)
goto out_free_pglist;
@@ -1064,7 +1064,8 @@ static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
spe_pmu->pmu = (struct pmu) {
.module = THIS_MODULE,
.parent = &spe_pmu->pdev->dev,
- .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE,
+ .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE |
+ PERF_PMU_CAP_AUX_PREFER_LARGE,
.attr_groups = arm_spe_pmu_attr_groups,
/*
* We hitch a ride on the software context here, so that
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 61caa55a44027..8482463d41203 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -660,14 +660,14 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
-
+ unsigned long step = 1UL << (page_shift - PAGE_SHIFT);
WARN_ON(page_shift < PAGE_SHIFT);
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
page_shift == PAGE_SHIFT)
return vmap_small_pages_range_noflush(addr, end, prot, pages);
- for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
+ for (i = 0; i < ALIGN_DOWN(nr, step); i += step) {
int err;
err = vmap_range_noflush(addr, addr + (1UL << page_shift),
@@ -678,8 +678,9 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
addr += 1UL << page_shift;
}
-
- return 0;
+ if (IS_ALIGNED(nr, step))
+ return 0;
+ return vmap_small_pages_range_noflush(addr, end, prot, pages + i);
}
int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
@@ -3514,6 +3515,50 @@ void vunmap(const void *addr)
}
EXPORT_SYMBOL(vunmap);
+static inline unsigned int vm_shift(pgprot_t prot, unsigned long size)
+{
+ if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
+ return PMD_SHIFT;
+
+ return arch_vmap_pte_supported_shift(size);
+}
+
+static inline int __vmap_huge(struct page **pages, pgprot_t prot,
+ unsigned long addr, unsigned int count)
+{
+ unsigned int i = 0;
+ unsigned int shift;
+ unsigned long nr;
+
+ while (i < count) {
+ nr = num_pages_contiguous(pages + i, count - i);
+ shift = vm_shift(prot, nr << PAGE_SHIFT);
+ if (vmap_pages_range(addr, addr + (nr << PAGE_SHIFT),
+ pgprot_nx(prot), pages + i, shift) < 0) {
+ return 1;
+ }
+ i += nr;
+ addr += (nr << PAGE_SHIFT);
+ }
+ return 0;
+}
+
+static unsigned long max_contiguous_stride_order(struct page **pages,
+ pgprot_t prot, unsigned int count)
+{
+ unsigned long max_shift = PAGE_SHIFT;
+ unsigned int i = 0;
+
+ while (i < count) {
+ unsigned long nr = num_pages_contiguous(pages + i, count - i);
+ unsigned long shift = vm_shift(prot, nr << PAGE_SHIFT);
+
+ max_shift = max(max_shift, shift);
+ i += nr;
+ }
+ return max_shift;
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3552,15 +3597,32 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
size = (unsigned long)count << PAGE_SHIFT;
- area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ if (flags & VM_ALLOW_HUGE_VMAP) {
+ /* determine from page array, the max alignment */
+ unsigned long max_shift = max_contiguous_stride_order(pages, prot, count);
+
+ area = __get_vm_area_node(size, 1 << max_shift, max_shift, flags,
+ VMALLOC_START, VMALLOC_END, NUMA_NO_NODE,
+ GFP_KERNEL, __builtin_return_address(0));
+ } else {
+ area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ }
if (!area)
return NULL;
addr = (unsigned long)area->addr;
- if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
- pages, PAGE_SHIFT) < 0) {
- vunmap(area->addr);
- return NULL;
+
+ if (flags & VM_ALLOW_HUGE_VMAP) {
+ if (__vmap_huge(pages, prot, addr, count)) {
+ vunmap(area->addr);
+ return NULL;
+ }
+ } else {
+ if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
+ pages, PAGE_SHIFT) < 0) {
+ vunmap(area->addr);
+ return NULL;
+ }
}
if (flags & VM_MAP_PUT_PAGES) {
@@ -4011,11 +4073,7 @@ void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
* their allocations due to apply_to_page_range not
* supporting them.
*/
-
- if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
- shift = PMD_SHIFT;
- else
- shift = arch_vmap_pte_supported_shift(size);
+ shift = vm_shift(prot, size);
align = max(original_align, 1UL << shift);
}
--
2.34.1
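
The __vmap_pages_range_noflush() change in the diff above splits a range into an aligned body, mapped in steps of 1 << page_shift, and a tail mapped with base pages. As I read it, the old loop assumed the page count was a multiple of the step. Below is a userspace sketch of just that arithmetic; struct range_split and split_range() are hypothetical names, and base_shift plays the role of PAGE_SHIFT.

```c
/* Result of splitting a page count into huge chunks plus a tail. */
struct range_split {
	unsigned long huge_chunks;	/* mappings of size 1 << page_shift */
	unsigned long tail_pages;	/* leftover base pages */
};

/*
 * nr_pages:   number of base pages in the range
 * page_shift: shift of the large mapping size (e.g. 21 for 2 MiB)
 * base_shift: shift of the base page size (e.g. 12 for 4 KiB)
 */
static struct range_split split_range(unsigned long nr_pages,
				      unsigned int page_shift,
				      unsigned int base_shift)
{
	unsigned long step = 1UL << (page_shift - base_shift);
	/* Same effect as ALIGN_DOWN(nr_pages, step) for a power-of-two step. */
	unsigned long body = nr_pages & ~(step - 1);
	struct range_split s = { body / step, nr_pages - body };

	return s;
}
```

For example, 1030 base pages with a 2 MiB step on 4 KiB pages gives two huge chunks and a six-page tail; a multiple of 512 leaves no tail, which matches the early return 0 in the diff.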
> mm/vmalloc.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index eba436386929..e8dbfada42bc 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3529,6 +3529,53 @@ void vunmap(const void *addr)
> }
> EXPORT_SYMBOL(vunmap);
>
> +static inline int get_vmap_batch_order(struct page **pages,
> + unsigned int max_steps, unsigned int idx)
> +{
> + unsigned int nr_pages;
> +
> + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
> + ioremap_max_page_shift == PAGE_SHIFT)
> + return 0;
> +
> + nr_pages = compound_nr(pages[idx]);
> + if (nr_pages == 1 || max_steps < nr_pages)
> + return 0;
> +
> + if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
> + return compound_order(pages[idx]);
> + return 0;
> +}
> +
> +static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages)
> +{
> + unsigned int count = (end - addr) >> PAGE_SHIFT;
> + int err;
> +
> + err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
> + PAGE_SHIFT, GFP_KERNEL);
> + if (err)
> + goto out;
> +
> + for (unsigned int i = 0; i < count; ) {
> + unsigned int shift = PAGE_SHIFT +
> + get_vmap_batch_order(pages, count - i, i);
> +
> + err = vmap_range_noflush(addr, addr + (1UL << shift),
> + page_to_phys(pages[i]), prot, shift);
> + if (err)
> + goto out;
> +
> + addr += 1UL << shift;
> + i += 1U << (shift - PAGE_SHIFT);
> + }
> +
> +out:
> + flush_cache_vmap(addr, end);
> + return err;
> +}
> +
> /**
> * vmap - map an array of pages into virtually contiguous space
> * @pages: array of page pointers
> @@ -3572,8 +3619,8 @@ void *vmap(struct page **pages, unsigned int count,
> return NULL;
>
> addr = (unsigned long)area->addr;
> - if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
> - pages, PAGE_SHIFT) < 0) {
> + if (vmap_contig_pages_range(addr, addr + size, pgprot_nx(prot),
> + pages) < 0) {
> vunmap(area->addr);
> return NULL;
> }
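
For comparison, the stepping logic of the quoted get_vmap_batch_order() / vmap_contig_pages_range() pair can be modelled in userspace as follows. This is a sketch under assumptions: struct fake_page is a made-up stand-in that carries only a PFN and the compound order of the block it heads, batch_order() and count_map_calls() are hypothetical names, and the config and ioremap_max_page_shift checks are left out.

```c
#include <stddef.h>

/* Stand-in for struct page: PFN plus the order of the block it heads. */
struct fake_page {
	unsigned long pfn;
	unsigned int order;
};

/*
 * Order to map in one go starting at pages[0]: the block's order if the
 * whole block fits in the remaining range and is physically contiguous,
 * otherwise 0 (map a single base page).
 */
static unsigned int batch_order(const struct fake_page *pages,
				size_t remaining)
{
	size_t nr = (size_t)1 << pages[0].order;
	size_t i;

	if (nr == 1 || remaining < nr)
		return 0;

	for (i = 1; i < nr; i++) {
		if (pages[i].pfn != pages[0].pfn + i)
			return 0;
	}
	return pages[0].order;
}

/* Number of mapping calls the batching loop would issue for the array. */
static size_t count_map_calls(const struct fake_page *pages, size_t count)
{
	size_t i = 0, calls = 0;

	while (i < count) {
		i += (size_t)1 << batch_order(pages + i, count - i);
		calls++;
	}
	return calls;
}
```

An order-2 block followed by two order-0 pages takes three calls instead of six. If the array is cut in the middle of the block, the sketch falls back to one call per page, mirroring the max_steps < nr_pages check.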
^ permalink raw reply related
* Re: [PATCH v12 3/3] of: Respect #{iommu,msi}-cells in maps
From: Vijayanand Jitta @ 2026-04-08 3:46 UTC (permalink / raw)
To: Nipun Gupta, Nikhil Agarwal, Joerg Roedel, Will Deacon,
Robin Murphy, Marc Zyngier, Lorenzo Pieralisi, Thomas Gleixner,
Saravana Kannan, Richard Zhu, Lucas Stach,
Krzysztof Wilczyński, Manivannan Sadhasivam, Bjorn Helgaas,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
Dmitry Baryshkov, Konrad Dybcio, Bjorn Andersson, Rob Herring,
Conor Dooley, Krzysztof Kozlowski, Prakash Gupta, Vikash Garodia
Cc: linux-kernel, iommu, linux-arm-kernel, devicetree, linux-pci, imx,
xen-devel, linux-arm-msm, Charan Teja Kalla
In-Reply-To: <20260331-parse_iommu_cells-v12-3-decfd305eea9@oss.qualcomm.com>
On 3/31/2026 7:34 PM, Vijayanand Jitta wrote:
> From: Robin Murphy <robin.murphy@arm.com>
>
> So far our parsing of {iommu,msi}-map properties has always blindly
> assumed that the output specifiers will always have exactly 1 cell.
> This typically does happen to be the case, but is not actually enforced
> (and the PCI msi-map binding even explicitly states support for 0 or 1
> cells) - as a result we've now ended up with dodgy DTs out in the field
> which depend on this behaviour to map a 1-cell specifier for a 2-cell
> provider, despite that being bogus per the bindings themselves.
>
> Since there is some potential use in being able to map at least single
> input IDs to multi-cell output specifiers (and properly support 0-cell
> outputs as well), add support for properly parsing and using the target
> nodes' #cells values, albeit with the unfortunate complication of still
> having to work around expectations of the old behaviour too.
>
> Since there are multi-cell output specifiers, the callers of of_map_id()
> may need to get the exact cell output value for further processing.
> Update of_map_id() to set args_count in the output to reflect the actual
> number of output specifier cells.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Charan Teja Kalla <charan.kalla@oss.qualcomm.com>
> Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
> ---
> drivers/of/base.c | 155 ++++++++++++++++++++++++++++++++++++++++-------------
> include/linux/of.h | 6 ++-
> 2 files changed, 123 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index b3d002015192..7b22e2484e1c 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -2096,18 +2096,48 @@ int of_find_last_cache_level(unsigned int cpu)
> return cache_level;
> }
>
> +/*
> + * Some DTs have an iommu-map targeting a 2-cell IOMMU node while
> + * specifying only 1 cell. Fortunately they all consist of value '1'
> + * as the 2nd cell entry with the same target, so check for that pattern.
> + *
> + * Example:
> + * IOMMU node:
> + * #iommu-cells = <2>;
> + *
> + * Device node:
> + * iommu-map = <0x0000 &smmu 0x0000 0x1>,
> + * <0x0100 &smmu 0x0100 0x1>;
> + */
> +static bool of_check_bad_map(const __be32 *map, int len)
> +{
> + __be32 phandle = map[1];
> +
> + if (len % 4)
> + return false;
> + for (int i = 0; i < len; i += 4) {
> + if (map[i + 1] != phandle || map[i + 3] != cpu_to_be32(1))
> + return false;
> + }
> + return true;
> +}
> +
> /**
> * of_map_id - Translate an ID through a downstream mapping.
> * @np: root complex device node.
> * @id: device ID to map.
> * @map_name: property name of the map to use.
> + * @cells_name: property name of target specifier cells.
> * @map_mask_name: optional property name of the mask to use.
> * @filter_np: optional device node to filter matches by, or NULL to match any.
> * If non-NULL, only map entries targeting this node will be matched.
> * @arg: pointer to a &struct of_phandle_args for the result. On success,
> - * @arg->args[0] will contain the translated ID. If a map entry was
> - * matched, @arg->np will be set to the target node with a reference
> - * held that the caller must release with of_node_put().
> + * @arg->args_count will be set to the number of output specifier cells
> + * as defined by @cells_name in the target node, and
> + * @arg->args[0..args_count-1] will contain the translated output
> + * specifier values. If a map entry was matched, @arg->np will be set
> + * to the target node with a reference held that the caller must release
> + * with of_node_put().
> *
> * Given a device ID, look up the appropriate implementation-defined
> * platform ID and/or the target device which receives transactions on that
> @@ -2116,17 +2146,19 @@ int of_find_last_cache_level(unsigned int cpu)
> * Return: 0 on success or a standard error code on failure.
> */
> int of_map_id(const struct device_node *np, u32 id,
> - const char *map_name, const char *map_mask_name,
> + const char *map_name, const char *cells_name,
> + const char *map_mask_name,
> const struct device_node *filter_np, struct of_phandle_args *arg)
> {
> u32 map_mask, masked_id;
> - int map_len;
> + int map_bytes, map_len, offset = 0;
> + bool bad_map = false;
> const __be32 *map = NULL;
>
> if (!np || !map_name || !arg)
> return -EINVAL;
>
> - map = of_get_property(np, map_name, &map_len);
> + map = of_get_property(np, map_name, &map_bytes);
> if (!map) {
> if (filter_np)
> return -ENODEV;
> @@ -2136,11 +2168,9 @@ int of_map_id(const struct device_node *np, u32 id,
> return 0;
> }
>
> - if (!map_len || map_len % (4 * sizeof(*map))) {
> - pr_err("%pOF: Error: Bad %s length: %d\n", np,
> - map_name, map_len);
> - return -EINVAL;
> - }
> + if (map_bytes % sizeof(*map))
> + goto err_map_len;
> + map_len = map_bytes / sizeof(*map);
>
> /* The default is to select all bits. */
> map_mask = 0xffffffff;
> @@ -2153,39 +2183,82 @@ int of_map_id(const struct device_node *np, u32 id,
> of_property_read_u32(np, map_mask_name, &map_mask);
>
> masked_id = map_mask & id;
> - for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) {
> +
> + while (offset < map_len) {
> struct device_node *phandle_node;
> - u32 id_base = be32_to_cpup(map + 0);
> - u32 phandle = be32_to_cpup(map + 1);
> - u32 out_base = be32_to_cpup(map + 2);
> - u32 id_len = be32_to_cpup(map + 3);
> + u32 id_base, phandle, id_len, id_off, cells = 0;
> + const __be32 *out_base;
> +
> + if (map_len - offset < 2)
> + goto err_map_len;
> +
> + id_base = be32_to_cpup(map + offset);
>
> if (id_base & ~map_mask) {
> - pr_err("%pOF: Invalid %s translation - %s-mask (0x%x) ignores id-base (0x%x)\n",
> - np, map_name, map_name,
> - map_mask, id_base);
> + pr_err("%pOF: Invalid %s translation - %s (0x%x) ignores id-base (0x%x)\n",
> + np, map_name, map_mask_name, map_mask, id_base);
> return -EFAULT;
> }
>
> - if (masked_id < id_base || masked_id >= id_base + id_len)
> - continue;
> -
> + phandle = be32_to_cpup(map + offset + 1);
> phandle_node = of_find_node_by_phandle(phandle);
> if (!phandle_node)
> return -ENODEV;
>
> + if (!bad_map && of_property_read_u32(phandle_node, cells_name, &cells)) {
> + pr_err("%pOF: missing %s property\n", phandle_node, cells_name);
> + of_node_put(phandle_node);
> + return -EINVAL;
> + }
> +
> + if (map_len - offset < 3 + cells) {
> + of_node_put(phandle_node);
> + goto err_map_len;
> + }
> +
> + if (offset == 0 && cells == 2) {
> + bad_map = of_check_bad_map(map, map_len);
> + if (bad_map) {
> + pr_warn_once("%pOF: %s mismatches target %s, assuming extra cell of 0\n",
> + np, map_name, cells_name);
> + cells = 1;
> + }
> + }
> +
> + out_base = map + offset + 2;
> + offset += 3 + cells;
> +
> + id_len = be32_to_cpup(map + offset - 1);
> + if (id_len > 1 && cells > 1) {
> + /*
> + * With 1 output cell we reasonably assume its value
> + * has a linear relationship to the input; with more,
> + * we'd need help from the provider to know what to do.
> + */
> + pr_err("%pOF: Unsupported %s - cannot handle %d-ID range with %d-cell output specifier\n",
> + np, map_name, id_len, cells);
> + of_node_put(phandle_node);
> + return -EINVAL;
> + }
> + id_off = masked_id - id_base;
> + if (masked_id < id_base || id_off >= id_len) {
> + of_node_put(phandle_node);
> + continue;
> + }
> +
> if (filter_np && filter_np != phandle_node) {
> of_node_put(phandle_node);
> continue;
> }
>
> arg->np = phandle_node;
> - arg->args[0] = masked_id - id_base + out_base;
> - arg->args_count = 1;
> + for (int i = 0; i < cells; i++)
> + arg->args[i] = id_off + be32_to_cpu(out_base[i]);
> + arg->args_count = cells;
>
> pr_debug("%pOF: %s, using mask %08x, id-base: %08x, out-base: %08x, length: %08x, id: %08x -> %08x\n",
> - np, map_name, map_mask, id_base, out_base,
> - id_len, id, masked_id - id_base + out_base);
> + np, map_name, map_mask, id_base, be32_to_cpup(out_base),
> + id_len, id, id_off + be32_to_cpup(out_base));
> return 0;
> }
>
> @@ -2196,6 +2269,10 @@ int of_map_id(const struct device_node *np, u32 id,
> arg->args[0] = id;
> arg->args_count = 1;
> return 0;
> +
> +err_map_len:
> + pr_err("%pOF: Error: Bad %s length: %d\n", np, map_name, map_bytes);
> + return -EINVAL;
> }
> EXPORT_SYMBOL_GPL(of_map_id);
>
> @@ -2205,18 +2282,21 @@ EXPORT_SYMBOL_GPL(of_map_id);
> * @id: Requester ID of the device (e.g. PCI RID/BDF or a platform
> * stream/device ID) used as the lookup key in the iommu-map table.
> * @arg: pointer to a &struct of_phandle_args for the result. On success,
> - * @arg->args[0] contains the translated ID. If a map entry was matched,
> - * @arg->np holds a reference to the target node that the caller must
> - * release with of_node_put().
> + * @arg->args_count will be set to the number of output specifier cells
> + * and @arg->args[0..args_count-1] will contain the translated output
> + * specifier values. If a map entry was matched, @arg->np holds a
> + * reference to the target node that the caller must release with
> + * of_node_put().
> *
> - * Convenience wrapper around of_map_id() using "iommu-map" and "iommu-map-mask".
> + * Convenience wrapper around of_map_id() using "iommu-map", "#iommu-cells",
> + * and "iommu-map-mask".
> *
> * Return: 0 on success or a standard error code on failure.
> */
> int of_map_iommu_id(const struct device_node *np, u32 id,
> struct of_phandle_args *arg)
> {
> - return of_map_id(np, id, "iommu-map", "iommu-map-mask", NULL, arg);
> + return of_map_id(np, id, "iommu-map", "#iommu-cells", "iommu-map-mask", NULL, arg);
> }
> EXPORT_SYMBOL_GPL(of_map_iommu_id);
>
> @@ -2229,17 +2309,20 @@ EXPORT_SYMBOL_GPL(of_map_iommu_id);
> * to match any. If non-NULL, only map entries targeting this node will
> * be matched.
> * @arg: pointer to a &struct of_phandle_args for the result. On success,
> - * @arg->args[0] contains the translated ID. If a map entry was matched,
> - * @arg->np holds a reference to the target node that the caller must
> - * release with of_node_put().
> + * @arg->args_count will be set to the number of output specifier cells
> + * and @arg->args[0..args_count-1] will contain the translated output
> + * specifier values. If a map entry was matched, @arg->np holds a
> + * reference to the target node that the caller must release with
> + * of_node_put().
> *
> - * Convenience wrapper around of_map_id() using "msi-map" and "msi-map-mask".
> + * Convenience wrapper around of_map_id() using "msi-map", "#msi-cells",
> + * and "msi-map-mask".
> *
> * Return: 0 on success or a standard error code on failure.
> */
> int of_map_msi_id(const struct device_node *np, u32 id,
> const struct device_node *filter_np, struct of_phandle_args *arg)
> {
> - return of_map_id(np, id, "msi-map", "msi-map-mask", filter_np, arg);
> + return of_map_id(np, id, "msi-map", "#msi-cells", "msi-map-mask", filter_np, arg);
> }
> EXPORT_SYMBOL_GPL(of_map_msi_id);
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 8548cd9eb4f1..51ac8539f2c3 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -462,7 +462,8 @@ const char *of_prop_next_string(const struct property *prop, const char *cur);
> bool of_console_check(const struct device_node *dn, char *name, int index);
>
> int of_map_id(const struct device_node *np, u32 id,
> - const char *map_name, const char *map_mask_name,
> + const char *map_name, const char *cells_name,
> + const char *map_mask_name,
> const struct device_node *filter_np, struct of_phandle_args *arg);
>
> int of_map_iommu_id(const struct device_node *np, u32 id,
> @@ -934,7 +935,8 @@ static inline void of_property_clear_flag(struct property *p, unsigned long flag
> }
>
> static inline int of_map_id(const struct device_node *np, u32 id,
> - const char *map_name, const char *map_mask_name,
> + const char *map_name, const char *cells_name,
> + const char *map_mask_name,
> const struct device_node *filter_np,
> struct of_phandle_args *arg)
> {
>
Gentle ping.
Thanks,
Vijay
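To illustrate the variable-width walk the quoted patch introduces, a minimal userspace sketch follows. It is not the kernel implementation: model_map_id(), model_cells_of() and struct model_result are hypothetical names, the map is kept in host byte order, and the map-mask handling, the filter node and the legacy "bad map" workaround are all left out. The cell counts returned by model_cells_of() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ARGS 4

struct model_result {
	uint32_t phandle;		/* matched target, 0 if none */
	uint32_t args[MODEL_MAX_ARGS];	/* translated output specifier */
	unsigned int args_count;
};

/* Hypothetical lookup of the target's #cells value, keyed by phandle. */
static int model_cells_of(uint32_t phandle)
{
	switch (phandle) {
	case 1: return 1;
	case 2: return 2;
	default: return -1;
	}
}

/* Each entry is: id-base, phandle, <cells> output cells, length. */
static int model_map_id(const uint32_t *map, size_t map_len, uint32_t id,
			struct model_result *out)
{
	size_t offset = 0;

	while (offset < map_len) {
		uint32_t id_base, phandle, id_len, id_off;
		const uint32_t *out_base;
		int cells;

		if (map_len - offset < 2)
			return -1;
		id_base = map[offset];
		phandle = map[offset + 1];

		cells = model_cells_of(phandle);
		if (cells < 0 || cells > MODEL_MAX_ARGS)
			return -1;
		if (map_len - offset < 3 + (size_t)cells)
			return -1;

		out_base = map + offset + 2;
		offset += 3 + (size_t)cells;	/* entry width depends on cells */
		id_len = map[offset - 1];

		/* A range can only be translated linearly with one cell. */
		if (id_len > 1 && cells > 1)
			return -1;

		id_off = id - id_base;
		if (id < id_base || id_off >= id_len)
			continue;

		out->phandle = phandle;
		for (int i = 0; i < cells; i++)
			out->args[i] = id_off + out_base[i];
		out->args_count = (unsigned int)cells;
		return 0;
	}

	/* No entry matched: pass the ID through unchanged. */
	out->phandle = 0;
	out->args[0] = id;
	out->args_count = 1;
	return 0;
}
```

A caller would then read args_count before touching args[], rather than assuming a single translated ID in args[0].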
* [PATCH] interconnect: imx: fix use-after-free in imx_icc_node_init_qos()
From: Wentao Liang @ 2026-04-08 3:10 UTC (permalink / raw)
To: Georgi Djakov, Shawn Guo, Sascha Hauer
Cc: Pengutronix Kernel Team, Fabio Estevam, linux-pm, imx,
linux-arm-kernel, linux-kernel, Wentao Liang, stable
Move of_node_put(dn) after the last use of dn, and add a missing put
in the error path to avoid both use-after-free and reference leak.
Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
Cc: stable@vger.kernel.org
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
---
drivers/interconnect/imx/imx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/interconnect/imx/imx.c b/drivers/interconnect/imx/imx.c
index 9511f80cf041..75431b5ccef8 100644
--- a/drivers/interconnect/imx/imx.c
+++ b/drivers/interconnect/imx/imx.c
@@ -143,15 +143,16 @@ static int imx_icc_node_init_qos(struct icc_provider *provider,
}
pdev = of_find_device_by_node(dn);
- of_node_put(dn);
if (!pdev) {
dev_warn(dev, "node %s[%d] missing device for %pOF\n",
node->name, node->id, dn);
+ of_node_put(dn);
return -EPROBE_DEFER;
}
node_data->qos_dev = &pdev->dev;
dev_dbg(dev, "node %s[%d] has device node %pOF\n",
node->name, node->id, dn);
+ of_node_put(dn);
}
return dev_pm_qos_add_request(node_data->qos_dev,
--
2.34.1
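The ordering issue this patch addresses can be modelled in plain userspace C. The sketch below is an illustration only: struct model_node and the model_* functions are hypothetical, the "free" is just a flag so the model stays safe to run, and stale_uses counts accesses made after the last reference was dropped.

```c
#include <assert.h>

/* Hypothetical refcounted node; freeing is modelled as a flag. */
struct model_node {
	int refcount;
	int freed;
	int stale_uses;	/* accesses made after the last put */
};

static void model_node_put(struct model_node *n)
{
	if (--n->refcount == 0)
		n->freed = 1;
}

/* Stands in for printing the node with %pOF. */
static void model_node_log(struct model_node *n)
{
	if (n->freed)
		n->stale_uses++;
}

/* Old ordering: the reference is dropped before the node is logged. */
static int model_init_qos_buggy(struct model_node *dn, int have_pdev)
{
	model_node_put(dn);
	model_node_log(dn);
	return have_pdev ? 0 : -1;
}

/* Patched ordering: every path logs first and drops the reference last. */
static int model_init_qos_fixed(struct model_node *dn, int have_pdev)
{
	if (!have_pdev) {
		model_node_log(dn);
		model_node_put(dn);
		return -1;	/* stand-in for -EPROBE_DEFER */
	}
	model_node_log(dn);
	model_node_put(dn);
	return 0;
}
```

In the fixed variant both the error path and the success path release the reference exactly once, and only after the last use of the node.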
* [PATCH v2 3/4] gpio: realtek: Add driver for Realtek DHC RTD1625 SoC
From: Yu-Chun Lin @ 2026-04-08 2:52 UTC (permalink / raw)
To: linusw, brgl, robh, krzk+dt, conor+dt, afaerber, tychang
Cc: linux-gpio, devicetree, linux-kernel, linux-arm-kernel,
linux-realtek-soc, cy.huang, stanley_chang, eleanor.lin,
james.tai
In-Reply-To: <20260408025243.1155482-1-eleanor.lin@realtek.com>
From: Tzuyi Chang <tychang@realtek.com>
Add support for the GPIO controller found on Realtek DHC RTD1625 SoCs.
Unlike the existing Realtek GPIO driver (drivers/gpio/gpio-rtd.c),
which manages pins via shared bank registers, the RTD1625 introduces
a per-pin register architecture. Each GPIO line now has its own
dedicated 32-bit control register to manage configuration independently,
including direction, output value, input value, interrupt enable, and
debounce. Therefore, this distinct hardware design requires a separate
driver.
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Tzuyi Chang <tychang@realtek.com>
Signed-off-by: Yu-Chun Lin <eleanor.lin@realtek.com>
---
Changes in v2:
- Remove "default y".
- Add base_offset member to struct rtd1625_gpio_info to handle merged regions.
---
drivers/gpio/Kconfig | 11 +
drivers/gpio/Makefile | 1 +
drivers/gpio/gpio-rtd1625.c | 584 ++++++++++++++++++++++++++++++++++++
3 files changed, 596 insertions(+)
create mode 100644 drivers/gpio/gpio-rtd1625.c
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 5ee11a889867..281549ad72ac 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -638,6 +638,17 @@ config GPIO_RTD
Say yes here to support GPIO functionality and GPIO interrupt on
Realtek DHC SoCs.
+config GPIO_RTD1625
+ tristate "Realtek DHC RTD1625 GPIO support"
+ depends on ARCH_REALTEK || COMPILE_TEST
+ select GPIOLIB_IRQCHIP
+ help
+ This option enables support for the GPIO controller on Realtek
+ DHC (Digital Home Center) RTD1625 SoC.
+
+ Say yes here to support both basic GPIO line functionality
+ and GPIO interrupt handling capabilities for this platform.
+
config GPIO_SAMA5D2_PIOBU
tristate "SAMA5D2 PIOBU GPIO support"
depends on MFD_SYSCON
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c05f7d795c43..c95ba218d53a 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -159,6 +159,7 @@ obj-$(CONFIG_GPIO_REALTEK_OTTO) += gpio-realtek-otto.o
obj-$(CONFIG_GPIO_REG) += gpio-reg.o
obj-$(CONFIG_GPIO_ROCKCHIP) += gpio-rockchip.o
obj-$(CONFIG_GPIO_RTD) += gpio-rtd.o
+obj-$(CONFIG_GPIO_RTD1625) += gpio-rtd1625.o
obj-$(CONFIG_ARCH_SA1100) += gpio-sa1100.o
obj-$(CONFIG_GPIO_SAMA5D2_PIOBU) += gpio-sama5d2-piobu.o
obj-$(CONFIG_GPIO_SCH311X) += gpio-sch311x.o
diff --git a/drivers/gpio/gpio-rtd1625.c b/drivers/gpio/gpio-rtd1625.c
new file mode 100644
index 000000000000..bcc1bbb115fa
--- /dev/null
+++ b/drivers/gpio/gpio-rtd1625.c
@@ -0,0 +1,584 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Realtek DHC RTD1625 gpio driver
+ *
+ * Copyright (c) 2023 Realtek Semiconductor Corp.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/gpio/driver.h>
+#include <linux/interrupt.h>
+#include <linux/irqchip.h>
+#include <linux/irqchip/chained_irq.h>
+#include <linux/irqdomain.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/property.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#define RTD1625_GPIO_DIR BIT(0)
+#define RTD1625_GPIO_OUT BIT(2)
+#define RTD1625_GPIO_IN BIT(4)
+#define RTD1625_GPIO_EDGE_INT_DP BIT(6)
+#define RTD1625_GPIO_EDGE_INT_EN BIT(8)
+#define RTD1625_GPIO_LEVEL_INT_EN BIT(16)
+#define RTD1625_GPIO_LEVEL_INT_DP BIT(18)
+#define RTD1625_GPIO_DEBOUNCE GENMASK(30, 28)
+#define RTD1625_GPIO_DEBOUNCE_WREN BIT(31)
+
+#define RTD1625_GPIO_WREN(x) ((x) << 1)
+
+/* Write-enable masks for all GPIO configs and reserved hardware bits */
+#define RTD1625_ISO_GPIO_WREN_ALL 0x8000aa8a
+#define RTD1625_ISOM_GPIO_WREN_ALL 0x800aaa8a
+
+#define RTD1625_GPIO_DEBOUNCE_1US 0
+#define RTD1625_GPIO_DEBOUNCE_10US 1
+#define RTD1625_GPIO_DEBOUNCE_100US 2
+#define RTD1625_GPIO_DEBOUNCE_1MS 3
+#define RTD1625_GPIO_DEBOUNCE_10MS 4
+#define RTD1625_GPIO_DEBOUNCE_20MS 5
+#define RTD1625_GPIO_DEBOUNCE_30MS 6
+#define RTD1625_GPIO_DEBOUNCE_50MS 7
+
+#define GPIO_CONTROL(gpio) ((gpio) * 4)
+
+/**
+ * struct rtd1625_gpio_info - Specific GPIO register information
+ * @num_gpios: The number of GPIOs
+ * @irq_type_support: Supported IRQ types
+ * @gpa_offset: Offset for GPIO assert interrupt status registers
+ * @gpda_offset: Offset for GPIO deassert interrupt status registers
+ * @level_offset: Offset of level interrupt status register
+ * @write_en_all: Write-enable mask for all configurable bits
+ */
+struct rtd1625_gpio_info {
+ unsigned int num_gpios;
+ unsigned int irq_type_support;
+ unsigned int base_offset;
+ unsigned int gpa_offset;
+ unsigned int gpda_offset;
+ unsigned int level_offset;
+ unsigned int write_en_all;
+};
+
+struct rtd1625_gpio {
+ struct gpio_chip gpio_chip;
+ const struct rtd1625_gpio_info *info;
+ void __iomem *base;
+ void __iomem *irq_base;
+ unsigned int irqs[3];
+ raw_spinlock_t lock;
+ unsigned int *save_regs;
+};
+
+static unsigned int rtd1625_gpio_gpa_offset(struct rtd1625_gpio *data, unsigned int offset)
+{
+ return data->info->gpa_offset + ((offset / 32) * 4);
+}
+
+static unsigned int rtd1625_gpio_gpda_offset(struct rtd1625_gpio *data, unsigned int offset)
+{
+ return data->info->gpda_offset + ((offset / 32) * 4);
+}
+
+static unsigned int rtd1625_gpio_level_offset(struct rtd1625_gpio *data, unsigned int offset)
+{
+ return data->info->level_offset + ((offset / 32) * 4);
+}
+
+static int rtd1625_gpio_set_debounce(struct gpio_chip *chip, unsigned int offset,
+ unsigned int debounce)
+{
+ struct rtd1625_gpio *data = gpiochip_get_data(chip);
+ u8 deb_val;
+ u32 val;
+
+ switch (debounce) {
+ case 1:
+ deb_val = RTD1625_GPIO_DEBOUNCE_1US;
+ break;
+ case 10:
+ deb_val = RTD1625_GPIO_DEBOUNCE_10US;
+ break;
+ case 100:
+ deb_val = RTD1625_GPIO_DEBOUNCE_100US;
+ break;
+ case 1000:
+ deb_val = RTD1625_GPIO_DEBOUNCE_1MS;
+ break;
+ case 10000:
+ deb_val = RTD1625_GPIO_DEBOUNCE_10MS;
+ break;
+ case 20000:
+ deb_val = RTD1625_GPIO_DEBOUNCE_20MS;
+ break;
+ case 30000:
+ deb_val = RTD1625_GPIO_DEBOUNCE_30MS;
+ break;
+ case 50000:
+ deb_val = RTD1625_GPIO_DEBOUNCE_50MS;
+ break;
+ default:
+ return -ENOTSUPP;
+ }
+
+ val = FIELD_PREP(RTD1625_GPIO_DEBOUNCE, deb_val) | RTD1625_GPIO_DEBOUNCE_WREN;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ writel_relaxed(val, data->base + GPIO_CONTROL(offset));
+
+ return 0;
+}
+
+static int rtd1625_gpio_set_config(struct gpio_chip *chip, unsigned int offset,
+ unsigned long config)
+{
+ int debounce;
+
+ if (pinconf_to_config_param(config) == PIN_CONFIG_INPUT_DEBOUNCE) {
+ debounce = pinconf_to_config_argument(config);
+ return rtd1625_gpio_set_debounce(chip, offset, debounce);
+ }
+
+ return gpiochip_generic_config(chip, offset, config);
+}
+
+static int rtd1625_gpio_set(struct gpio_chip *chip, unsigned int offset, int value)
+{
+ struct rtd1625_gpio *data = gpiochip_get_data(chip);
+ u32 val = RTD1625_GPIO_WREN(RTD1625_GPIO_OUT);
+
+ if (value)
+ val |= RTD1625_GPIO_OUT;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ writel_relaxed(val, data->base + GPIO_CONTROL(offset));
+
+ return 0;
+}
+
+static int rtd1625_gpio_get(struct gpio_chip *chip, unsigned int offset)
+{
+ struct rtd1625_gpio *data = gpiochip_get_data(chip);
+ u32 val;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ val = readl_relaxed(data->base + GPIO_CONTROL(offset));
+
+ if (val & RTD1625_GPIO_DIR)
+ return !!(val & RTD1625_GPIO_OUT);
+ else
+ return !!(val & RTD1625_GPIO_IN);
+}
+
+static int rtd1625_gpio_get_direction(struct gpio_chip *chip, unsigned int offset)
+{
+ struct rtd1625_gpio *data = gpiochip_get_data(chip);
+ u32 val;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ val = readl_relaxed(data->base + GPIO_CONTROL(offset));
+
+ if (val & RTD1625_GPIO_DIR)
+ return GPIO_LINE_DIRECTION_OUT;
+
+ return GPIO_LINE_DIRECTION_IN;
+}
+
+static int rtd1625_gpio_set_direction(struct gpio_chip *chip, unsigned int offset, bool out)
+{
+ struct rtd1625_gpio *data = gpiochip_get_data(chip);
+ u32 val = RTD1625_GPIO_WREN(RTD1625_GPIO_DIR);
+
+ if (out)
+ val |= RTD1625_GPIO_DIR;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ writel_relaxed(val, data->base + GPIO_CONTROL(offset));
+
+ return 0;
+}
+
+static int rtd1625_gpio_direction_input(struct gpio_chip *chip, unsigned int offset)
+{
+ return rtd1625_gpio_set_direction(chip, offset, false);
+}
+
+static int rtd1625_gpio_direction_output(struct gpio_chip *chip, unsigned int offset, int value)
+{
+ rtd1625_gpio_set(chip, offset, value);
+
+ return rtd1625_gpio_set_direction(chip, offset, true);
+}
+
+static void rtd1625_gpio_irq_handle(struct irq_desc *desc)
+{
+ unsigned int (*get_reg_offset)(struct rtd1625_gpio *gpio, unsigned int offset);
+ struct rtd1625_gpio *data = irq_desc_get_handler_data(desc);
+ struct irq_domain *domain = data->gpio_chip.irq.domain;
+ struct irq_chip *chip = irq_desc_get_chip(desc);
+ unsigned int irq = irq_desc_get_irq(desc);
+ unsigned long status;
+ unsigned int reg_offset, i, j;
+ unsigned int girq;
+ irq_hw_number_t hwirq;
+ u32 irq_type;
+
+ if (irq == data->irqs[0])
+ get_reg_offset = &rtd1625_gpio_gpa_offset;
+ else if (irq == data->irqs[1])
+ get_reg_offset = &rtd1625_gpio_gpda_offset;
+ else if (irq == data->irqs[2])
+ get_reg_offset = &rtd1625_gpio_level_offset;
+ else
+ return;
+
+ chained_irq_enter(chip, desc);
+
+ for (i = 0; i < data->info->num_gpios; i += 32) {
+ reg_offset = get_reg_offset(data, i);
+ status = readl_relaxed(data->irq_base + reg_offset);
+
+ /* Clear edge interrupts; level interrupts are cleared in ->irq_ack() */
+ if (irq != data->irqs[2])
+ writel_relaxed(status, data->irq_base + reg_offset);
+
+ for_each_set_bit(j, &status, 32) {
+ hwirq = i + j;
+ girq = irq_find_mapping(domain, hwirq);
+ irq_type = irq_get_trigger_type(girq);
+
+ if (irq == data->irqs[1] && irq_type != IRQ_TYPE_EDGE_BOTH)
+ continue;
+
+ generic_handle_domain_irq(domain, hwirq);
+ }
+ }
+
+ chained_irq_exit(chip, desc);
+}
+
+static void rtd1625_gpio_ack_irq(struct irq_data *d)
+{
+ struct rtd1625_gpio *data = gpiochip_get_data(irq_data_get_irq_chip_data(d));
+ irq_hw_number_t hwirq = irqd_to_hwirq(d);
+ u32 irq_type = irqd_get_trigger_type(d);
+ u32 bit_mask = BIT(hwirq % 32);
+ int reg_offset;
+
+ if (irq_type & IRQ_TYPE_LEVEL_MASK) {
+ reg_offset = rtd1625_gpio_level_offset(data, hwirq);
+ writel_relaxed(bit_mask, data->irq_base + reg_offset);
+ }
+}
+
+static void rtd1625_gpio_enable_edge_irq(struct rtd1625_gpio *data, irq_hw_number_t hwirq)
+{
+ int gpda_reg_offset = rtd1625_gpio_gpda_offset(data, hwirq);
+ int gpa_reg_offset = rtd1625_gpio_gpa_offset(data, hwirq);
+ u32 clr_mask = BIT(hwirq % 32);
+ u32 val;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ writel_relaxed(clr_mask, data->irq_base + gpa_reg_offset);
+ writel_relaxed(clr_mask, data->irq_base + gpda_reg_offset);
+ val = RTD1625_GPIO_EDGE_INT_EN | RTD1625_GPIO_WREN(RTD1625_GPIO_EDGE_INT_EN);
+ writel_relaxed(val, data->base + GPIO_CONTROL(hwirq));
+}
+
+static void rtd1625_gpio_disable_edge_irq(struct rtd1625_gpio *data, irq_hw_number_t hwirq)
+{
+ u32 val;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ val = RTD1625_GPIO_WREN(RTD1625_GPIO_EDGE_INT_EN);
+ writel_relaxed(val, data->base + GPIO_CONTROL(hwirq));
+}
+
+static void rtd1625_gpio_enable_level_irq(struct rtd1625_gpio *data, irq_hw_number_t hwirq)
+{
+ int level_reg_offset = rtd1625_gpio_level_offset(data, hwirq);
+ u32 clr_mask = BIT(hwirq % 32);
+ u32 val;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ writel_relaxed(clr_mask, data->irq_base + level_reg_offset);
+ val = RTD1625_GPIO_LEVEL_INT_EN | RTD1625_GPIO_WREN(RTD1625_GPIO_LEVEL_INT_EN);
+ writel_relaxed(val, data->base + GPIO_CONTROL(hwirq));
+}
+
+static void rtd1625_gpio_disable_level_irq(struct rtd1625_gpio *data, irq_hw_number_t hwirq)
+{
+ u32 val;
+
+ guard(raw_spinlock_irqsave)(&data->lock);
+ val = RTD1625_GPIO_WREN(RTD1625_GPIO_LEVEL_INT_EN);
+ writel_relaxed(val, data->base + GPIO_CONTROL(hwirq));
+}
+
+static void rtd1625_gpio_enable_irq(struct irq_data *d)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct rtd1625_gpio *data = gpiochip_get_data(gc);
+ irq_hw_number_t hwirq = irqd_to_hwirq(d);
+ u32 irq_type = irqd_get_trigger_type(d);
+
+ gpiochip_enable_irq(gc, hwirq);
+
+ if (irq_type & IRQ_TYPE_EDGE_BOTH)
+ rtd1625_gpio_enable_edge_irq(data, hwirq);
+ else if (irq_type & IRQ_TYPE_LEVEL_MASK)
+ rtd1625_gpio_enable_level_irq(data, hwirq);
+}
+
+static void rtd1625_gpio_disable_irq(struct irq_data *d)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct rtd1625_gpio *data = gpiochip_get_data(gc);
+ irq_hw_number_t hwirq = irqd_to_hwirq(d);
+ u32 irq_type = irqd_get_trigger_type(d);
+
+ if (irq_type & IRQ_TYPE_EDGE_BOTH)
+ rtd1625_gpio_disable_edge_irq(data, hwirq);
+ else if (irq_type & IRQ_TYPE_LEVEL_MASK)
+ rtd1625_gpio_disable_level_irq(data, hwirq);
+
+ gpiochip_disable_irq(gc, hwirq);
+}
+
+static int rtd1625_gpio_irq_set_level_type(struct irq_data *d, bool level)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct rtd1625_gpio *data = gpiochip_get_data(gc);
+ irq_hw_number_t hwirq = irqd_to_hwirq(d);
+ u32 val = RTD1625_GPIO_WREN(RTD1625_GPIO_LEVEL_INT_DP);
+
+ if (!(data->info->irq_type_support & IRQ_TYPE_LEVEL_MASK))
+ return -EINVAL;
+
+ scoped_guard(raw_spinlock_irqsave, &data->lock) {
+ if (level)
+ val |= RTD1625_GPIO_LEVEL_INT_DP;
+ writel_relaxed(val, data->base + GPIO_CONTROL(hwirq));
+ }
+
+ irq_set_handler_locked(d, handle_level_irq);
+
+ return 0;
+}
+
+static int rtd1625_gpio_irq_set_edge_type(struct irq_data *d, bool polarity)
+{
+ struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+ struct rtd1625_gpio *data = gpiochip_get_data(gc);
+ irq_hw_number_t hwirq = irqd_to_hwirq(d);
+ u32 val = RTD1625_GPIO_WREN(RTD1625_GPIO_EDGE_INT_DP);
+
+ if (!(data->info->irq_type_support & IRQ_TYPE_EDGE_BOTH))
+ return -EINVAL;
+
+ scoped_guard(raw_spinlock_irqsave, &data->lock) {
+ if (polarity)
+ val |= RTD1625_GPIO_EDGE_INT_DP;
+ writel_relaxed(val, data->base + GPIO_CONTROL(hwirq));
+ }
+
+ irq_set_handler_locked(d, handle_edge_irq);
+
+ return 0;
+}
+
+static int rtd1625_gpio_irq_set_type(struct irq_data *d, unsigned int type)
+{
+ int ret;
+
+ switch (type & IRQ_TYPE_SENSE_MASK) {
+ case IRQ_TYPE_EDGE_RISING:
+ ret = rtd1625_gpio_irq_set_edge_type(d, 1);
+ break;
+ case IRQ_TYPE_EDGE_FALLING:
+ ret = rtd1625_gpio_irq_set_edge_type(d, 0);
+ break;
+ case IRQ_TYPE_EDGE_BOTH:
+ ret = rtd1625_gpio_irq_set_edge_type(d, 1);
+ break;
+ case IRQ_TYPE_LEVEL_HIGH:
+ ret = rtd1625_gpio_irq_set_level_type(d, 0);
+ break;
+ case IRQ_TYPE_LEVEL_LOW:
+ ret = rtd1625_gpio_irq_set_level_type(d, 1);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+static struct irq_chip rtd1625_iso_gpio_irq_chip = {
+ .name = "rtd1625-gpio",
+ .irq_ack = rtd1625_gpio_ack_irq,
+ .irq_mask = rtd1625_gpio_disable_irq,
+ .irq_unmask = rtd1625_gpio_enable_irq,
+ .irq_set_type = rtd1625_gpio_irq_set_type,
+ .flags = IRQCHIP_IMMUTABLE | IRQCHIP_SKIP_SET_WAKE,
+ GPIOCHIP_IRQ_RESOURCE_HELPERS,
+};
+
+static int rtd1625_gpio_setup_irq(struct platform_device *pdev, struct rtd1625_gpio *data)
+{
+ struct gpio_irq_chip *irq_chip;
+ int num_irqs;
+ int irq;
+ int i;
+
+ irq = platform_get_irq_optional(pdev, 0);
+ if (irq == -ENXIO)
+ return 0;
+ if (irq < 0)
+ return irq;
+
+ num_irqs = (data->info->irq_type_support & IRQ_TYPE_LEVEL_MASK) ? 3 : 2;
+ data->irqs[0] = irq;
+
+ for (i = 1; i < num_irqs; i++) {
+ irq = platform_get_irq(pdev, i);
+ if (irq < 0)
+ return irq;
+ data->irqs[i] = irq;
+ }
+
+ irq_chip = &data->gpio_chip.irq;
+ irq_chip->handler = handle_bad_irq;
+ irq_chip->default_type = IRQ_TYPE_NONE;
+ irq_chip->parent_handler = rtd1625_gpio_irq_handle;
+ irq_chip->parent_handler_data = data;
+ irq_chip->num_parents = num_irqs;
+ irq_chip->parents = data->irqs;
+
+ gpio_irq_chip_set_chip(irq_chip, &rtd1625_iso_gpio_irq_chip);
+
+ return 0;
+}
+
+static int rtd1625_gpio_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct rtd1625_gpio *data;
+ void __iomem *irq_base;
+ int ret;
+
+ data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ data->info = device_get_match_data(dev);
+ if (!data->info)
+ return -EINVAL;
+
+ raw_spin_lock_init(&data->lock);
+
+ irq_base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(irq_base))
+ return PTR_ERR(irq_base);
+
+ data->irq_base = irq_base;
+ data->base = irq_base + data->info->base_offset;
+
+ data->save_regs = devm_kzalloc(dev, data->info->num_gpios *
+ sizeof(*data->save_regs), GFP_KERNEL);
+ if (!data->save_regs)
+ return -ENOMEM;
+
+ data->gpio_chip.label = dev_name(dev);
+ data->gpio_chip.base = -1;
+ data->gpio_chip.ngpio = data->info->num_gpios;
+ data->gpio_chip.request = gpiochip_generic_request;
+ data->gpio_chip.free = gpiochip_generic_free;
+ data->gpio_chip.get_direction = rtd1625_gpio_get_direction;
+ data->gpio_chip.direction_input = rtd1625_gpio_direction_input;
+ data->gpio_chip.direction_output = rtd1625_gpio_direction_output;
+ data->gpio_chip.set = rtd1625_gpio_set;
+ data->gpio_chip.get = rtd1625_gpio_get;
+ data->gpio_chip.set_config = rtd1625_gpio_set_config;
+ data->gpio_chip.parent = dev;
+
+ ret = rtd1625_gpio_setup_irq(pdev, data);
+ if (ret)
+ return ret;
+
+ platform_set_drvdata(pdev, data);
+
+ return devm_gpiochip_add_data(dev, &data->gpio_chip, data);
+}
+
+static const struct rtd1625_gpio_info rtd1625_iso_gpio_info = {
+ .num_gpios = 166,
+ .irq_type_support = IRQ_TYPE_EDGE_BOTH,
+ .base_offset = 0x100,
+ .gpa_offset = 0x0,
+ .gpda_offset = 0x20,
+ .write_en_all = RTD1625_ISO_GPIO_WREN_ALL,
+};
+
+static const struct rtd1625_gpio_info rtd1625_isom_gpio_info = {
+ .num_gpios = 4,
+ .irq_type_support = IRQ_TYPE_EDGE_BOTH | IRQ_TYPE_LEVEL_LOW |
+ IRQ_TYPE_LEVEL_HIGH,
+ .base_offset = 0x20,
+ .gpa_offset = 0x0,
+ .gpda_offset = 0x4,
+ .level_offset = 0x18,
+ .write_en_all = RTD1625_ISOM_GPIO_WREN_ALL,
+};
+
+static const struct of_device_id rtd1625_gpio_of_matches[] = {
+ { .compatible = "realtek,rtd1625-iso-gpio", .data = &rtd1625_iso_gpio_info },
+ { .compatible = "realtek,rtd1625-isom-gpio", .data = &rtd1625_isom_gpio_info },
+ { }
+};
+MODULE_DEVICE_TABLE(of, rtd1625_gpio_of_matches);
+
+static int rtd1625_gpio_suspend(struct device *dev)
+{
+ struct rtd1625_gpio *data = dev_get_drvdata(dev);
+ const struct rtd1625_gpio_info *info = data->info;
+ int i;
+
+ for (i = 0; i < info->num_gpios; i++)
+ data->save_regs[i] = readl_relaxed(data->base + GPIO_CONTROL(i));
+
+ return 0;
+}
+
+static int rtd1625_gpio_resume(struct device *dev)
+{
+ struct rtd1625_gpio *data = dev_get_drvdata(dev);
+ const struct rtd1625_gpio_info *info = data->info;
+ int i;
+
+ for (i = 0; i < info->num_gpios; i++)
+ writel_relaxed(data->save_regs[i] | info->write_en_all,
+ data->base + GPIO_CONTROL(i));
+
+ return 0;
+}
+
+DEFINE_NOIRQ_DEV_PM_OPS(rtd1625_gpio_pm_ops, rtd1625_gpio_suspend, rtd1625_gpio_resume);
+
+static struct platform_driver rtd1625_gpio_platform_driver = {
+ .driver = {
+ .name = "gpio-rtd1625",
+ .of_match_table = rtd1625_gpio_of_matches,
+ .pm = pm_sleep_ptr(&rtd1625_gpio_pm_ops),
+ },
+ .probe = rtd1625_gpio_probe,
+};
+module_platform_driver(rtd1625_gpio_platform_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Realtek Semiconductor Corporation");
+MODULE_DESCRIPTION("Realtek DHC SoC RTD1625 gpio driver");
--
2.34.1
^ permalink raw reply related
* [PATCH v2 4/4] arm64: dts: realtek: Add GPIO support for RTD1625
From: Yu-Chun Lin @ 2026-04-08 2:52 UTC (permalink / raw)
To: linusw, brgl, robh, krzk+dt, conor+dt, afaerber, tychang
Cc: linux-gpio, devicetree, linux-kernel, linux-arm-kernel,
linux-realtek-soc, cy.huang, stanley_chang, eleanor.lin,
james.tai
In-Reply-To: <20260408025243.1155482-1-eleanor.lin@realtek.com>
Add the GPIO node for the Realtek RTD1625 SoC.
Signed-off-by: Yu-Chun Lin <eleanor.lin@realtek.com>
---
Changes in v2:
- Merge two reg memory regions.
- Remove redundant status setting.
---
arch/arm64/boot/dts/realtek/kent.dtsi | 39 +++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/arch/arm64/boot/dts/realtek/kent.dtsi b/arch/arm64/boot/dts/realtek/kent.dtsi
index 8d4293cd4c03..dafe56ce7d71 100644
--- a/arch/arm64/boot/dts/realtek/kent.dtsi
+++ b/arch/arm64/boot/dts/realtek/kent.dtsi
@@ -151,6 +151,37 @@ uart0: serial@7800 {
status = "disabled";
};
+ gpio: gpio@31000 {
+ compatible = "realtek,rtd1625-iso-gpio";
+ reg = <0x31000 0x398>;
+ gpio-controller;
+ gpio-ranges = <&isom_pinctrl 0 0 2>,
+ <&ve4_pinctrl 2 0 6>,
+ <&iso_pinctrl 8 0 4>,
+ <&ve4_pinctrl 12 6 2>,
+ <&main2_pinctrl 14 0 2>,
+ <&ve4_pinctrl 16 8 4>,
+ <&main2_pinctrl 20 2 3>,
+ <&ve4_pinctrl 23 12 3>,
+ <&iso_pinctrl 26 4 2>,
+ <&isom_pinctrl 28 2 2>,
+ <&ve4_pinctrl 30 15 6>,
+ <&main2_pinctrl 36 5 6>,
+ <&ve4_pinctrl 42 21 3>,
+ <&iso_pinctrl 45 6 6>,
+ <&ve4_pinctrl 51 24 1>,
+ <&iso_pinctrl 52 12 1>,
+ <&ve4_pinctrl 53 25 11>,
+ <&main2_pinctrl 64 11 28>,
+ <&ve4_pinctrl 92 36 2>,
+ <&iso_pinctrl 94 13 19>,
+ <&iso_pinctrl 128 32 4>,
+ <&ve4_pinctrl 132 38 13>,
+ <&iso_pinctrl 145 36 19>,
+ <&ve4_pinctrl 164 51 2>;
+ #gpio-cells = <2>;
+ };
+
iso_pinctrl: pinctrl@4e000 {
compatible = "realtek,rtd1625-iso-pinctrl";
reg = <0x4e000 0x1a4>;
@@ -161,6 +192,14 @@ main2_pinctrl: pinctrl@4f200 {
reg = <0x4f200 0x50>;
};
+ iso_m_gpio: gpio@89100 {
+ compatible = "realtek,rtd1625-isom-gpio";
+ reg = <0x89100 0x30>;
+ gpio-controller;
+ gpio-ranges = <&isom_pinctrl 0 0 4>;
+ #gpio-cells = <2>;
+ };
+
isom_pinctrl: pinctrl@146200 {
compatible = "realtek,rtd1625-isom-pinctrl";
reg = <0x146200 0x34>;
--
2.34.1
^ permalink raw reply related
* [PATCH v2 2/4] dt-bindings: gpio: realtek: Add realtek,rtd1625-gpio
From: Yu-Chun Lin @ 2026-04-08 2:52 UTC (permalink / raw)
To: linusw, brgl, robh, krzk+dt, conor+dt, afaerber, tychang
Cc: linux-gpio, devicetree, linux-kernel, linux-arm-kernel,
linux-realtek-soc, cy.huang, stanley_chang, eleanor.lin,
james.tai
In-Reply-To: <20260408025243.1155482-1-eleanor.lin@realtek.com>
From: Tzuyi Chang <tychang@realtek.com>
Add the device tree bindings for the Realtek DHC (Digital Home Center)
RTD1625 GPIO controllers.
The RTD1625 GPIO controller features a per-pin register architecture
that differs significantly from previous generations. It utilizes
separate register blocks for GPIO configuration and interrupt control.
Signed-off-by: Tzuyi Chang <tychang@realtek.com>
Signed-off-by: Yu-Chun Lin <eleanor.lin@realtek.com>
---
Changes in v2:
- Merge two memory regions into one.
- Add a description for the reg region.
---
.../bindings/gpio/realtek,rtd1625-gpio.yaml | 82 +++++++++++++++++++
1 file changed, 82 insertions(+)
create mode 100644 Documentation/devicetree/bindings/gpio/realtek,rtd1625-gpio.yaml
diff --git a/Documentation/devicetree/bindings/gpio/realtek,rtd1625-gpio.yaml b/Documentation/devicetree/bindings/gpio/realtek,rtd1625-gpio.yaml
new file mode 100644
index 000000000000..de873876b8c6
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/realtek,rtd1625-gpio.yaml
@@ -0,0 +1,82 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+# Copyright 2023 Realtek Semiconductor Corporation
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/realtek,rtd1625-gpio.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Realtek DHC RTD1625 GPIO controller
+
+maintainers:
+ - Tzuyi Chang <tychang@realtek.com>
+
+description: |
+ GPIO controller for the Realtek RTD1625 SoC, featuring a per-pin register
+ architecture that differs significantly from earlier RTD series controllers.
+ Each GPIO has dedicated registers for configuration (direction, input/output
+ values, debounce), and interrupt control supporting edge and level detection
+ modes.
+
+properties:
+ compatible:
+ enum:
+ - realtek,rtd1625-iso-gpio
+ - realtek,rtd1625-isom-gpio
+
+ reg:
+ maxItems: 1
+ description: |
+ Memory region containing both interrupt control and GPIO
+ configuration registers in a contiguous address space.
+
+ For realtek,rtd1625-iso-gpio:
+ - Base + 0x0 ~ 0xff: Interrupt control registers
+ - Base + 0x100 ~ 0x397: GPIO configuration registers
+
+ For realtek,rtd1625-isom-gpio:
+ - Base + 0x0 ~ 0x1f: Interrupt control registers
+ - Base + 0x20 ~ 0x2f: GPIO configuration registers
+
+ interrupts:
+ items:
+ - description: Interrupt number of the assert GPIO interrupt, which is
+ triggered when there is a rising edge.
+ - description: Interrupt number of the deassert GPIO interrupt, which is
+ triggered when there is a falling edge.
+ - description: Interrupt number of the level-sensitive GPIO interrupt,
+ triggered by a configured logic level.
+
+ interrupt-controller: true
+
+ "#interrupt-cells":
+ const: 2
+
+ gpio-ranges: true
+
+ gpio-controller: true
+
+ "#gpio-cells":
+ const: 2
+
+required:
+ - compatible
+ - reg
+ - gpio-ranges
+ - gpio-controller
+ - "#gpio-cells"
+
+additionalProperties: false
+
+examples:
+ - |
+ gpio@89100 {
+ compatible = "realtek,rtd1625-isom-gpio";
+ reg = <0x89100 0x30>;
+ interrupt-parent = <&iso_m_irq_mux>;
+ interrupts = <0>, <1>, <2>;
+ interrupt-controller;
+ #interrupt-cells = <2>;
+ gpio-ranges = <&isom_pinctrl 0 0 4>;
+ gpio-controller;
+ #gpio-cells = <2>;
+ };
--
2.34.1
^ permalink raw reply related
* [PATCH v2 0/4] gpio: realtek: Add support for Realtek DHC RTD1625
From: Yu-Chun Lin @ 2026-04-08 2:52 UTC (permalink / raw)
To: linusw, brgl, robh, krzk+dt, conor+dt, afaerber, tychang
Cc: linux-gpio, devicetree, linux-kernel, linux-arm-kernel,
linux-realtek-soc, cy.huang, stanley_chang, eleanor.lin,
james.tai
This series adds GPIO support for the Realtek DHC RTD1625 SoC.
Unlike the existing driver (gpio-rtd.c) which uses shared bank registers,
the RTD1625 features a per-pin register architecture where each GPIO line
is managed by its own dedicated 32-bit control register. This distinct
hardware design requires a new, separate driver.
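As a rough illustration of the per-pin layout (not code from the driver), the sketch below computes where each line's control register would sit, assuming one 32-bit register per line packed back to back after base_offset. The struct and function names are hypothetical. The 4-byte stride is an assumption that happens to agree with the reg sizes used elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one 32-bit control register per GPIO line, packed contiguously. */
#define PIN_REG_STRIDE 4u

/* Hypothetical mirror of two fields from the driver's per-SoC info table. */
struct pin_bank_layout {
	uint32_t base_offset;	/* start of the per-pin control block */
	uint32_t num_gpios;	/* number of lines in this controller */
};

/* Byte offset of one line's control register, relative to the reg base. */
static uint32_t pin_ctrl_offset(const struct pin_bank_layout *l, uint32_t pin)
{
	assert(pin < l->num_gpios);
	return l->base_offset + pin * PIN_REG_STRIDE;
}

/* First byte past the last control register, i.e. the minimum reg size. */
static uint32_t pin_ctrl_end(const struct pin_bank_layout *l)
{
	return l->base_offset + l->num_gpios * PIN_REG_STRIDE;
}
```

With base_offset 0x100 and 166 lines (the iso controller), pin_ctrl_end() yields 0x398; with 0x20 and 4 lines (the isom controller) it yields 0x30. Both match the reg sizes in the DTS patch of this series.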
The device tree changes in this series (Patch 4) depend on the RTD1625 pinctrl
driver, which is currently under review and has not been merged yet.
The dependent pinctrl patch can be found here:
https://lore.kernel.org/lkml/20260317115411.2154365-9-eleanor.lin@realtek.com/
Best Regards,
Yu-Chun Lin
---
Changes in v2:
- Move DTS patch to the end of the series.
Patch 1 (gpio: Remove "default y" in Kconfig):
- New patch.
Patch 2 (dt-bindings: gpio: realtek: Add realtek,rtd1625-gpio):
- Merge two memory regions into one.
- Add a description for the reg region.
Patch 3 (gpio: realtek: Add driver for Realtek DHC RTD1625 SoC):
- Remove "default y".
- Add base_offset member to struct rtd1625_gpio_info to handle merged regions.
Patch 4 (arm64: dts: realtek: Add GPIO support for RTD1625):
- Merge two reg memory regions.
- Remove redundant status setting.
v1: https://lore.kernel.org/lkml/20260331113835.3510341-1-eleanor.lin@realtek.com/
Tzuyi Chang (2):
dt-bindings: gpio: realtek: Add realtek,rtd1625-gpio
gpio: realtek: Add driver for Realtek DHC RTD1625 SoC
Yu-Chun Lin (2):
gpio: Remove "default y" in Kconfig
arm64: dts: realtek: Add GPIO support for RTD1625
.../bindings/gpio/realtek,rtd1625-gpio.yaml | 82 +++
arch/arm64/boot/dts/realtek/kent.dtsi | 39 ++
drivers/gpio/Kconfig | 12 +-
drivers/gpio/Makefile | 1 +
drivers/gpio/gpio-rtd1625.c | 584 ++++++++++++++++++
5 files changed, 717 insertions(+), 1 deletion(-)
create mode 100644 Documentation/devicetree/bindings/gpio/realtek,rtd1625-gpio.yaml
create mode 100644 drivers/gpio/gpio-rtd1625.c
--
2.34.1
^ permalink raw reply
* [PATCH v2 1/4] gpio: Remove "default y" in Kconfig
From: Yu-Chun Lin @ 2026-04-08 2:52 UTC (permalink / raw)
To: linusw, brgl, robh, krzk+dt, conor+dt, afaerber, tychang
Cc: linux-gpio, devicetree, linux-kernel, linux-arm-kernel,
linux-realtek-soc, cy.huang, stanley_chang, eleanor.lin,
james.tai
In-Reply-To: <20260408025243.1155482-1-eleanor.lin@realtek.com>
Remove "default y" so that the driver is not built by default on
non-Realtek platforms when COMPILE_TEST is enabled.
Signed-off-by: Yu-Chun Lin <eleanor.lin@realtek.com>
---
Changes in v2:
- New patch.
---
drivers/gpio/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index b45fb799e36c..5ee11a889867 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -629,7 +629,6 @@ config GPIO_ROCKCHIP
config GPIO_RTD
tristate "Realtek DHC GPIO support"
depends on ARCH_REALTEK || COMPILE_TEST
- default y
select GPIOLIB_IRQCHIP
help
This option enables support for GPIOs found on Realtek DHC(Digital
--
2.34.1
^ permalink raw reply related
* [RFC PATCH 8/8] mm/vmalloc: Stop scanning for compound pages after encountering small pages in vmap
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
Users typically allocate memory in descending order, e.g.
8 → 4 → 0. Once an order-0 page is encountered, subsequent
pages are likely to also be order-0, so we stop scanning
for compound pages at that point.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3c3b7217693a..242f4bc1379c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3577,6 +3577,12 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
map_addr = addr;
idx = i;
}
+ /*
+ * Once small pages are encountered, the remaining pages
+ * are likely small as well
+ */
+ if (shift == PAGE_SHIFT)
+ break;
addr += 1UL << shift;
i += 1U << (shift - PAGE_SHIFT);
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 7/8] mm/vmalloc: Coalesce same page_shift mappings in vmap to avoid pgtable zigzag
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
For vmap(), detect pages with the same page_shift and map them in
batches, avoiding the pgtable zigzag caused by per-page mapping.
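The coalescing idea can be sketched in userspace as follows. This is a simplified model, not the patch itself: it only counts how many range-mapping calls result when consecutive scan steps with the same shift are merged into one run. The function name and the step_shift array are hypothetical.

```c
#include <assert.h>

/*
 * step_shift[i] is the mapping shift detected at scan step i.
 * Consecutive steps with an equal shift are merged into a single run,
 * and each run corresponds to one range-mapping call.
 */
static unsigned int count_coalesced_runs(const unsigned int *step_shift,
					 unsigned int nsteps)
{
	unsigned int runs = 0;

	assert(step_shift || nsteps == 0);

	for (unsigned int i = 0; i < nsteps; i++) {
		/* A new run starts at the first step or when the shift changes. */
		if (i == 0 || step_shift[i] != step_shift[i - 1])
			runs++;
	}
	return runs;
}
```

For a step sequence such as 20, 20, 16, 16, 16, 12, 12 the model reports three runs, where per-step mapping would have issued seven calls.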
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6643ec0288cd..3c3b7217693a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3551,6 +3551,8 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages)
{
unsigned int count = (end - addr) >> PAGE_SHIFT;
+ unsigned int prev_shift = 0, idx = 0;
+ unsigned long map_addr = addr;
int err;
err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
@@ -3562,15 +3564,29 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
unsigned int shift = PAGE_SHIFT +
get_vmap_batch_order(pages, count - i, i);
- err = vmap_range_noflush(addr, addr + (1UL << shift),
- page_to_phys(pages[i]), prot, shift);
- if (err)
- goto out;
+ if (!i)
+ prev_shift = shift;
+
+ if (shift != prev_shift) {
+ err = vmap_small_pages_range_noflush(map_addr, addr,
+ prot, pages + idx,
+ min(prev_shift, PMD_SHIFT));
+ if (err)
+ goto out;
+ prev_shift = shift;
+ map_addr = addr;
+ idx = i;
+ }
addr += 1UL << shift;
i += 1U << (shift - PAGE_SHIFT);
}
+ /* Remaining */
+ if (map_addr < end)
+ err = vmap_small_pages_range_noflush(map_addr, end,
+ prot, pages + idx, min(prev_shift, PMD_SHIFT));
+
out:
flush_cache_vmap(addr, end);
return err;
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 6/8] mm/vmalloc: align vm_area so vmap() can batch mappings
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
Try to align the vmap virtual address to PMD_SHIFT or a
larger PTE mapping size hinted by the architecture, so
contiguous pages can be batch-mapped when setting PMD or
PTE entries.
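The fallback order described above can be modelled as a small userspace sketch. It assumes a 4K granule (PAGE_SHIFT 12, PMD_SHIFT 21) and replaces the real area allocator with a hypothetical callback; none of these names exist in the kernel.

```c
#include <assert.h>

/* Assumed 4K-granule values; other configurations differ. */
#define EX_PAGE_SHIFT	12u
#define EX_PMD_SHIFT	21u

/* Hypothetical stand-in for the area allocator: non-zero means success. */
typedef int (*try_alloc_fn)(unsigned int align_shift, void *ctx);

/* Toy allocator: succeeds only when the alignment is at most *ctx. */
static int alloc_if_shift_at_most(unsigned int align_shift, void *ctx)
{
	return align_shift <= *(unsigned int *)ctx;
}

/*
 * Try the largest useful alignment first, then fall back:
 * PMD alignment, then the arch-hinted PTE batch size, then a plain page.
 * Returns the shift that succeeded, or -1 if every attempt failed.
 */
static int pick_align_shift(unsigned long size, unsigned int arch_pte_shift,
			    try_alloc_fn try_alloc, void *ctx)
{
	unsigned int shift = size >= (1UL << EX_PMD_SHIFT) ?
			     EX_PMD_SHIFT : arch_pte_shift;

	assert(arch_pte_shift >= EX_PAGE_SHIFT);

	for (;;) {
		if (try_alloc(shift, ctx))
			return (int)shift;
		if (shift <= EX_PAGE_SHIFT)
			return -1;
		shift = (shift == EX_PMD_SHIFT) ? arch_pte_shift : EX_PAGE_SHIFT;
	}
}
```

A 4 MiB request with a 64 KiB arch hint would try shifts 21, 16 and 12 in that order, stopping at the first one the allocator accepts.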
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e8dbfada42bc..6643ec0288cd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3576,6 +3576,35 @@ static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
return err;
}
+static struct vm_struct *get_aligned_vm_area(unsigned long size, unsigned long flags)
+{
+ unsigned int shift = (size >= PMD_SIZE) ? PMD_SHIFT :
+ arch_vmap_pte_supported_shift(size);
+ struct vm_struct *vm_area = NULL;
+
+ /*
+ * Try to allocate an aligned vm_area so contiguous pages can be
+ * mapped in batches.
+ */
+ while (1) {
+ unsigned long align = 1UL << shift;
+
+ vm_area = __get_vm_area_node(size, align, PAGE_SHIFT, flags,
+ VMALLOC_START, VMALLOC_END,
+ NUMA_NO_NODE, GFP_KERNEL,
+ __builtin_return_address(0));
+ if (vm_area || shift <= PAGE_SHIFT)
+ goto out;
+ if (shift == PMD_SHIFT)
+ shift = arch_vmap_pte_supported_shift(size);
+ else if (shift > PAGE_SHIFT)
+ shift = PAGE_SHIFT;
+ }
+
+out:
+ return vm_area;
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3614,7 +3643,7 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
size = (unsigned long)count << PAGE_SHIFT;
- area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ area = get_aligned_vm_area(size, flags);
if (!area)
return NULL;
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 5/8] mm/vmalloc: map contiguous pages in batches for vmap() if possible
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
In many cases, the pages passed to vmap() may include high-order
pages allocated with the __GFP_COMP flag. For example, the system heap
often allocates pages in descending order: order 8, then 4, then 0.
Currently, vmap() iterates over every page individually; even pages
inside a high-order block are handled one by one.
This patch detects high-order pages and maps them as a single
contiguous block whenever possible.
An alternative would be to implement a new API, vmap_sg(), but that
change seems to be large in scope.
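The detection step can be approximated in userspace as shown below. This is a simplified model under stated assumptions: pages are represented by a PFN array plus a parallel array holding the compound order at each head index (0 elsewhere), and all names are hypothetical rather than kernel APIs.

```c
#include <assert.h>

/*
 * Return the order to batch at index idx, or 0 to map a single page.
 * A block is batched only if it fits in the remaining range and its
 * PFNs are physically contiguous.
 */
static unsigned int batch_order(const unsigned long *pfns,
				const unsigned int *head_order,
				unsigned int remaining, unsigned int idx)
{
	unsigned int nr = 1u << head_order[idx];

	if (nr == 1 || remaining < nr)
		return 0;

	for (unsigned int k = 1; k < nr; k++)
		if (pfns[idx + k] != pfns[idx] + k)
			return 0;

	return head_order[idx];
}

/* Count mapping calls when each detected block is mapped in one call. */
static unsigned int count_map_calls(const unsigned long *pfns,
				    const unsigned int *head_order,
				    unsigned int count)
{
	unsigned int calls = 0;

	assert(pfns && head_order);

	for (unsigned int i = 0; i < count; ) {
		unsigned int order = batch_order(pfns, head_order, count - i, i);

		calls++;
		i += 1u << order;	/* skip the whole block */
	}
	return calls;
}
```

For one order-2 block followed by two unrelated order-0 pages (six entries in total), the model issues three mapping calls instead of six.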
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 49 insertions(+), 2 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index eba436386929..e8dbfada42bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3529,6 +3529,53 @@ void vunmap(const void *addr)
}
EXPORT_SYMBOL(vunmap);
+static inline int get_vmap_batch_order(struct page **pages,
+ unsigned int max_steps, unsigned int idx)
+{
+ unsigned int nr_pages;
+
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
+ ioremap_max_page_shift == PAGE_SHIFT)
+ return 0;
+
+ nr_pages = compound_nr(pages[idx]);
+ if (nr_pages == 1 || max_steps < nr_pages)
+ return 0;
+
+ if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
+ return compound_order(pages[idx]);
+ return 0;
+}
+
+static int vmap_contig_pages_range(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages)
+{
+ unsigned int count = (end - addr) >> PAGE_SHIFT;
+ int err;
+
+ err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
+ PAGE_SHIFT, GFP_KERNEL);
+ if (err)
+ goto out;
+
+ for (unsigned int i = 0; i < count; ) {
+ unsigned int shift = PAGE_SHIFT +
+ get_vmap_batch_order(pages, count - i, i);
+
+ err = vmap_range_noflush(addr, addr + (1UL << shift),
+ page_to_phys(pages[i]), prot, shift);
+ if (err)
+ goto out;
+
+ addr += 1UL << shift;
+ i += 1U << (shift - PAGE_SHIFT);
+ }
+
+out:
+ flush_cache_vmap(addr, end);
+ return err;
+}
+
/**
* vmap - map an array of pages into virtually contiguous space
* @pages: array of page pointers
@@ -3572,8 +3619,8 @@ void *vmap(struct page **pages, unsigned int count,
return NULL;
addr = (unsigned long)area->addr;
- if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
- pages, PAGE_SHIFT) < 0) {
+ if (vmap_contig_pages_range(addr, addr + size, pgprot_nx(prot),
+ pages) < 0) {
vunmap(area->addr);
return NULL;
}
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 4/8] mm/vmalloc: Eliminate page table zigzag for huge vmalloc mappings
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
For vmalloc() allocations with VM_ALLOW_HUGE_VMAP, we no longer
need to iterate over pages one by one, which would otherwise lead to
zigzag page table mappings.
The code is now unified with the PAGE_SHIFT case by simply
calling vmap_small_pages_range_noflush().
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 22 ++++------------------
1 file changed, 4 insertions(+), 18 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5bf072297536..eba436386929 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -689,27 +689,13 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
- unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
-
WARN_ON(page_shift < PAGE_SHIFT);
- if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
- page_shift == PAGE_SHIFT)
- return vmap_small_pages_range_noflush(addr, end, prot, pages, PAGE_SHIFT);
-
- for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
- int err;
-
- err = vmap_range_noflush(addr, addr + (1UL << page_shift),
- page_to_phys(pages[i]), prot,
- page_shift);
- if (err)
- return err;
+ if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
+ page_shift = PAGE_SHIFT;
- addr += 1UL << page_shift;
- }
-
- return 0;
+ return vmap_small_pages_range_noflush(addr, end, prot, pages,
+ min(page_shift, PMD_SHIFT));
}
int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 3/8] mm/vmalloc: Extend vmap_small_pages_range_noflush() to support larger page_shift sizes
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
vmap_small_pages_range_noflush() provides a clean interface by taking
struct page **pages and mapping them via direct PTE iteration. This
avoids the page table zigzag seen when using
vmap_range_noflush() for page_shift values other than PAGE_SHIFT.
Extend it to support larger page_shift values, and add PMD- and
contiguous-PTE mappings as well.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmalloc.c | 54 ++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 42 insertions(+), 12 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 57eae99d9909..5bf072297536 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -524,8 +524,9 @@ void vunmap_range(unsigned long addr, unsigned long end)
static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
+ unsigned int steps = 1;
int err = 0;
pte_t *pte;
@@ -543,6 +544,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
do {
struct page *page = pages[*nr];
+ steps = 1;
if (WARN_ON(!pte_none(ptep_get(pte)))) {
err = -EBUSY;
break;
@@ -556,9 +558,24 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
break;
}
+#ifdef CONFIG_HUGETLB_PAGE
+ if (shift != PAGE_SHIFT) {
+ unsigned long pfn = page_to_pfn(page), size;
+
+ size = arch_vmap_pte_range_map_size(addr, end, pfn, shift);
+ if (size != PAGE_SIZE) {
+ steps = size >> PAGE_SHIFT;
+ pte_t entry = pfn_pte(pfn, prot);
+
+ entry = arch_make_huge_pte(entry, ilog2(size), 0);
+ set_huge_pte_at(&init_mm, addr, pte, entry, size);
+ continue;
+ }
+ }
+#endif
+
set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
- (*nr)++;
- } while (pte++, addr += PAGE_SIZE, addr != end);
+ } while (pte += steps, *nr += steps, addr += PAGE_SIZE * steps, addr != end);
lazy_mmu_mode_disable();
*mask |= PGTBL_PTE_MODIFIED;
@@ -568,7 +585,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
pmd_t *pmd;
unsigned long next;
@@ -578,7 +595,20 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
- if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask))
+
+ if (shift == PMD_SHIFT) {
+ struct page *page = pages[*nr];
+ phys_addr_t phys_addr = page_to_phys(page);
+
+ if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot,
+ shift)) {
+ *mask |= PGTBL_PMD_MODIFIED;
+ *nr += 1 << (shift - PAGE_SHIFT);
+ continue;
+ }
+ }
+
+ if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
return 0;
@@ -586,7 +616,7 @@ static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
pud_t *pud;
unsigned long next;
@@ -596,7 +626,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
- if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask))
+ if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (pud++, addr = next, addr != end);
return 0;
@@ -604,7 +634,7 @@ static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
- pgtbl_mod_mask *mask)
+ pgtbl_mod_mask *mask, unsigned int shift)
{
p4d_t *p4d;
unsigned long next;
@@ -614,14 +644,14 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = p4d_addr_end(addr, end);
- if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask))
+ if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, mask, shift))
return -ENOMEM;
} while (p4d++, addr = next, addr != end);
return 0;
}
static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
- pgprot_t prot, struct page **pages)
+ pgprot_t prot, struct page **pages, unsigned int shift)
{
unsigned long start = addr;
pgd_t *pgd;
@@ -636,7 +666,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
next = pgd_addr_end(addr, end);
if (pgd_bad(*pgd))
mask |= PGTBL_PGD_MODIFIED;
- err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
+ err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask, shift);
if (err)
break;
} while (pgd++, addr = next, addr != end);
@@ -665,7 +695,7 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
page_shift == PAGE_SHIFT)
- return vmap_small_pages_range_noflush(addr, end, prot, pages);
+ return vmap_small_pages_range_noflush(addr, end, prot, pages, PAGE_SHIFT);
for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
int err;
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 2/8] arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
Allow arch_vmap_pte_range_map_size to batch multiple CONT_PTE hugepages,
reducing both PTE setup and TLB flush iterations.
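The size computation can be illustrated with a userspace sketch. It models only the final step (clamp, then round down to a power of two) and assumes the earlier alignment checks have already passed. The constants assume a 4K granule with a 2 MiB PMD, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed 4K-granule value; other granules differ. */
#define EX_PMD_SIZE	(1UL << 21)	/* 2 MiB */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Largest power of two that is <= x; x must be non-zero. */
static unsigned long round_down_pow2(unsigned long x)
{
	unsigned long p = 1;

	assert(x != 0);
	while (x >>= 1)
		p <<= 1;
	return p;
}

/*
 * Clamp the batch to the remaining range, the caller's maximum shift,
 * and half a PMD, then round down to a power of two.
 */
static unsigned long batched_cont_pte_size(unsigned long addr,
					   unsigned long end,
					   unsigned int max_page_shift)
{
	unsigned long size = min_ul(end - addr, 1UL << max_page_shift);

	size = min_ul(size, EX_PMD_SIZE >> 1);
	return round_down_pow2(size);
}
```

Under these assumptions a 192 KiB remainder is batched as 128 KiB, and any remainder of 1 MiB or more is capped at 1 MiB so that the result stays below PMD size.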
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
arch/arm64/include/asm/vmalloc.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 4ec1acd3c1b3..9eea06d0f75d 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -23,6 +23,8 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
unsigned long end, u64 pfn,
unsigned int max_page_shift)
{
+ unsigned long size;
+
/*
* If the block is at least CONT_PTE_SIZE in size, and is naturally
* aligned in both virtual and physical space, then we can pte-map the
@@ -40,7 +42,9 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
if (!IS_ALIGNED(PFN_PHYS(pfn), CONT_PTE_SIZE))
return PAGE_SIZE;
- return CONT_PTE_SIZE;
+ size = min3(end - addr, 1UL << max_page_shift, PMD_SIZE >> 1);
+ size = 1UL << (fls(size) - 1);
+ return size;
}
#define arch_vmap_pte_range_unmap_size arch_vmap_pte_range_unmap_size
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 1/8] arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE setup
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
In-Reply-To: <20260408025115.27368-1-baohua@kernel.org>
For sizes aligned to CONT_PTE_SIZE and smaller than PMD_SIZE,
we can batch CONT_PTE settings instead of handling them individually.
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
arch/arm64/mm/hugetlbpage.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index a42c05cf5640..bf31c11ebd3b 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -110,6 +110,12 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
contig_ptes = CONT_PTES;
break;
default:
+ if (size < CONT_PMD_SIZE && size > 0 &&
+ IS_ALIGNED(size, CONT_PTE_SIZE)) {
+ contig_ptes = size >> PAGE_SHIFT;
+ *pgsize = PAGE_SIZE;
+ break;
+ }
WARN_ON(!__hugetlb_valid_size(size));
}
@@ -359,6 +365,10 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
case CONT_PTE_SIZE:
return pte_mkcont(entry);
default:
+ if (pagesize < CONT_PMD_SIZE && pagesize > 0 &&
+ IS_ALIGNED(pagesize, CONT_PTE_SIZE))
+ return pte_mkcont(entry);
+
break;
}
pr_warn("%s: unrecognized huge page size 0x%lx\n",
--
2.39.3 (Apple Git-146)
^ permalink raw reply related
* [RFC PATCH 0/8] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory
From: Barry Song (Xiaomi) @ 2026-04-08 2:51 UTC (permalink / raw)
To: linux-mm, linux-arm-kernel, catalin.marinas, will, akpm, urezki
Cc: linux-kernel, anshuman.khandual, ryan.roberts, ajd, rppt, david,
Xueyuan.chen21, Barry Song (Xiaomi)
This patchset accelerates ioremap, vmalloc, and vmap when the memory
is physically fully or partially contiguous. Two techniques are used:
1. Avoid page table zigzag when setting PTEs/PMDs for multiple memory
segments
2. Use batched mappings wherever possible in both vmalloc and ARM64
layers
Patches 1–2 extend ARM64 vmalloc CONT-PTE mapping to support multiple
CONT-PTE regions instead of just one.
Patches 3–4 extend vmap_small_pages_range_noflush() to support page
shifts other than PAGE_SHIFT. This allows mapping multiple memory
segments for vmalloc() without zigzagging page tables.
Patches 5–8 add huge vmap support for contiguous pages. This not only
improves performance but also enables PMD or CONT-PTE mapping for the
vmapped area, reducing TLB pressure.
Many thanks to Xueyuan Chen for his substantial testing efforts
on RK3588 boards.
On the RK3588 8-core ARM64 SoC, with tasks pinned to CPU2 and
the performance CPUfreq policy enabled, Xueyuan’s tests report:
* ioremap(1 MB): 1.2× faster
* vmalloc(1 MB) mapping time (excluding allocation) with
VM_ALLOW_HUGE_VMAP: 1.5× faster
* vmap(): 5.6× faster when memory includes some order-8 pages,
with no regression observed for order-0 pages
Barry Song (Xiaomi) (8):
arm64/hugetlb: Extend batching of multiple CONT_PTE in a single PTE
setup
arm64/vmalloc: Allow arch_vmap_pte_range_map_size to batch multiple
CONT_PTE
mm/vmalloc: Extend vmap_small_pages_range_noflush() to support larger
page_shift sizes
mm/vmalloc: Eliminate page table zigzag for huge vmalloc mappings
mm/vmalloc: map contiguous pages in batches for vmap() if possible
mm/vmalloc: align vm_area so vmap() can batch mappings
mm/vmalloc: Coalesce same page_shift mappings in vmap to avoid pgtable
zigzag
mm/vmalloc: Stop scanning for compound pages after encountering small
pages in vmap
arch/arm64/include/asm/vmalloc.h | 6 +-
arch/arm64/mm/hugetlbpage.c | 10 ++
mm/vmalloc.c | 178 +++++++++++++++++++++++++------
3 files changed, 161 insertions(+), 33 deletions(-)
--
2.39.3 (Apple Git-146)
^ permalink raw reply
* RE: [PATCH v1] PCI: imx6: Add force_suspend flag to override L1SS suspend skip
From: Hongxing Zhu @ 2026-04-08 2:38 UTC (permalink / raw)
To: mani@kernel.org
Cc: Bjorn Helgaas, Frank Li, jingoohan1@gmail.com,
l.stach@pengutronix.de, lpieralisi@kernel.org,
kwilczynski@kernel.org, robh@kernel.org, bhelgaas@google.com,
s.hauer@pengutronix.de, kernel@pengutronix.de, festevam@gmail.com,
linux-pci@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
imx@lists.linux.dev, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <ihoprlijtwgihkbmszm53iftvpyg7ljvubs3bv2lt22uma74ul@zqgulwmj4jpb>
> -----Original Message-----
> From: mani@kernel.org <mani@kernel.org>
> Sent: April 7, 2026 15:24
> To: Hongxing Zhu <hongxing.zhu@nxp.com>
> Cc: Bjorn Helgaas <helgaas@kernel.org>; Frank Li <frank.li@nxp.com>;
> jingoohan1@gmail.com; l.stach@pengutronix.de; lpieralisi@kernel.org;
> kwilczynski@kernel.org; robh@kernel.org; bhelgaas@google.com;
> s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> imx@lists.linux.dev; linux-kernel@vger.kernel.org; stable@vger.kernel.org
> Subject: Re: [PATCH v1] PCI: imx6: Add force_suspend flag to override L1SS
> suspend skip
>
> On Tue, Apr 07, 2026 at 03:31:57AM +0000, Hongxing Zhu wrote:
> > > -----Original Message-----
> > > From: mani@kernel.org <mani@kernel.org>
> > > Sent: April 4, 2026 1:03
> > > To: Hongxing Zhu <hongxing.zhu@nxp.com>
> > > Cc: Bjorn Helgaas <helgaas@kernel.org>; Frank Li <frank.li@nxp.com>;
> > > jingoohan1@gmail.com; l.stach@pengutronix.de; lpieralisi@kernel.org;
> > > kwilczynski@kernel.org; robh@kernel.org; bhelgaas@google.com;
> > > s.hauer@pengutronix.de; kernel@pengutronix.de; festevam@gmail.com;
> > > linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> > > imx@lists.linux.dev; linux-kernel@vger.kernel.org;
> > > stable@vger.kernel.org
> > > Subject: Re: [PATCH v1] PCI: imx6: Add force_suspend flag to
> > > override L1SS suspend skip
> > >
> > > On Tue, Mar 24, 2026 at 02:01:58AM +0000, Hongxing Zhu wrote:
> > > > > -----Original Message-----
> > > > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > > > Sent: March 24, 2026 6:09
> > > > > To: Hongxing Zhu <hongxing.zhu@nxp.com>
> > > > > Cc: Frank Li <frank.li@nxp.com>; jingoohan1@gmail.com;
> > > > > l.stach@pengutronix.de; lpieralisi@kernel.org;
> > > > > kwilczynski@kernel.org; mani@kernel.org; robh@kernel.org;
> > > > > bhelgaas@google.com; s.hauer@pengutronix.de;
> > > > > kernel@pengutronix.de; festevam@gmail.com;
> > > > > linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> > > > > imx@lists.linux.dev; linux-kernel@vger.kernel.org;
> > > > > stable@vger.kernel.org
> > > > > Subject: Re: [PATCH v1] PCI: imx6: Add force_suspend flag to
> > > > > override L1SS suspend skip
> > > > >
> > > > > On Wed, Mar 18, 2026 at 02:55:45AM +0000, Hongxing Zhu wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > > > > ... [messed up quoting]
> > > > >
> > > > > > > On Tue, Mar 17, 2026 at 02:12:56PM +0800, Richard Zhu wrote:
> > > > > > > > > Add a force_suspend flag to allow platform drivers to
> > > > > > > > > force the PCIe link into L2 state during suspend, even
> > > > > > > > > when L1SS (ASPM L1 Sub-States) is enabled.
> > > > > > > > >
> > > > > > > > > By default, the DesignWare PCIe host controller skips L2
> > > > > > > > > suspend when L1SS is supported to meet low resume latency
> > > > > > > > > requirements for devices like NVMe. However, some
> > > > > > > > > platforms like i.MX PCIe need to enter L2 state for proper
> > > > > > > > > power management regardless of L1SS support.
> > > > > > > > >
> > > > > > > > > Enable force_suspend for i.MX PCIe to ensure the link
> > > > > > > > > enters L2 during system suspend.
> > > > > > >
> > > > > > > I'm a little bit skeptical about this.
> > > > > > >
> > > > > > > What exactly does a "low resume latency requirement" mean?
> > > > > > > Is this an actual functional requirement that's special to
> > > > > > > NVMe, or is it just the desire for low resume latency that
> > > > > > > everybody has for all devices?
> > > > > >
> > > > > > From my understanding, L1SS mode is characterized by lower
> > > > > > latency when compared to L2 or L3 modes.
> > > > > >
> > > > > > It can be used on all devices, avoiding frequent power on/off cycles.
> > > > > > For NVMe, this can also extend the service life of the device.
> > > > >
> > > > > All the above applies to all platforms, so it's not an argument
> > > > > for i.MX-specific code here.
> > > > >
> > > > Hi Bjorn:
> > > > Thanks for your kind review.
> > > > Yes, it is.
> > > > > > > Is there something special about i.MX here? Why do we want
> > > > > > > i.MX to be different from other host controllers?
> > > > > >
> > > > > > i.MX PCIe loses power supply during Deep Sleep Mode (DSM),
> > > > > > requiring full reinitialization after system wake-up.
> > > > >
> > > > > I don't know what DSM means in PCIe or how it would help justify
> > > > > this change.
> > > > >
> > > > i.MX PCIe power is gated off during suspend, requiring full
> > > > reinitialization on resume.
> > > >
> > >
> > > Is this an unconditional behavior? What if the PCIe device is
> > > configured as a wakeup source like WOL, WOW? And if you connect
> > > NVMe, this behavior will result in resume failure as NVMe driver
> > > expects the power to be retained if ASPM is supported.
> >
> > Yes, this is unconditional behavior. The i.MX PCIe controller
> > exclusively supports sideband wakeup mechanisms, which operate
> > independently of the PCIe link state and device power configuration.
> >
>
> I believe you are referring to WAKE# as the sideband wakeup mechanism. If so,
> both host and device have to support WAKE#.
>
Exactly.
> > For devices configured as wakeup sources (WOL, WOW, etc.): The
> > sideband wakeup path bypasses the standard PCIe power management, so
> > these configurations do not impact the i.MX PCIe RC controller's
> > suspend/resume behavior.
> >
>
> Once the user enables wakeup for a device, the PCI core will configure PME_EN
> only if the device supports toggling WAKE# from D3Cold. So the wakeup
> functionality depends on the device too, not just the RC.
>
Yes, you're right.
> > For NVMe devices with ASPM: While NVMe drivers typically expect power
> > retention when ASPM is enabled, the i.MX implementation's sideband
> > wakeup mechanism operates through a separate signaling path. The
> > wakeup functionality does not depend on maintaining PCIe link power,
> > thus avoiding conflicts with NVMe power state expectations.
> >
>
> There is no relation between WAKE# and NVMe. NVMe is a passive device, so
> it doesn't support WAKE#. With this patch alone, the NVMe driver won't
> resume (if ASPM is enabled). You need to tell the NVMe driver to prepare for
> power loss too. Maybe this patch can help you:
> https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com/
>
> But that patch will only help if your platform supports S2RAM through PSCI.
Thanks a lot, this patch is helpful, since i.MX platforms support S2RAM
through PSCI.
One additional note regarding NVMe: ASPM (Active State Power Management) is
disabled locally on i.MX platforms for NVMe devices. This decision was made
after encountering a system hang issue similar to the one reported by Hans a
few months ago in his patch listed below.
https://lore.kernel.org/linux-nvme/20250502032051.920990-1-hans.zhang@cixtech.com/
Best Regards
Richard Zhu
>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
* Re: [PATCH 10/10] arm64: Check DAIF (and PMR) at task-switch time
From: Jinjie Ruan @ 2026-04-08 2:17 UTC (permalink / raw)
To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
Cc: vladimir.murzin, peterz, linux-kernel, tglx, luto
In-Reply-To: <20260407131650.3813777-11-mark.rutland@arm.com>
On 2026/4/7 21:16, Mark Rutland wrote:
> When __switch_to() switches from a 'prev' task to a 'next' task, various
> pieces of CPU state are expected to have specific values, such that
> these do not need to be saved/restored. If any of these hold an
> unexpected value when switching away from the prev task, they could lead
> to surprising behaviour in the context of the next task, and it would be
> difficult to determine where they were configured to their unexpected
> value.
>
> Add some checks for DAIF and PMR at task-switch time so that we can
> detect such issues.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/kernel/process.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 489554931231e..ba9038434d2fb 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -699,6 +699,29 @@ void update_sctlr_el1(u64 sctlr)
> isb();
> }
>
> +static inline void debug_switch_state(void)
> +{
> + if (system_uses_irq_prio_masking()) {
> + unsigned long daif_expected = 0;
> + unsigned long daif_actual = read_sysreg(daif);
> + unsigned long pmr_expected = GIC_PRIO_IRQOFF;
> + unsigned long pmr_actual = read_sysreg_s(SYS_ICC_PMR_EL1);
> +
> + WARN_ONCE(daif_actual != daif_expected ||
> + pmr_actual != pmr_expected,
> + "Unexpected DAIF + PMR: 0x%lx + 0x%lx (expected 0x%lx + 0x%lx)\n",
> + daif_actual, pmr_actual,
> + daif_expected, pmr_expected);
> + } else {
> + unsigned long daif_expected = DAIF_PROCCTX_NOIRQ;
> + unsigned long daif_actual = read_sysreg(daif);
> +
> + WARN_ONCE(daif_actual != daif_expected,
> + "Unexpected DAIF value: 0x%lx (expected 0x%lx)\n",
> + daif_actual, daif_expected);
> + }
This logic seems consistent with arm64's local_irq_disable()
implementation. Do we need to wrap these debug checks in a config option
(e.g., CONFIG_ARM64_DEBUG_PRIORITY_MASKING) to avoid unnecessary overhead?
__schedule()
-> local_irq_disable()
-> arch_local_irq_disable()
static __always_inline void __daif_local_irq_disable(void)
{
	barrier();
	asm volatile("msr daifset, #3");
	barrier();
}

static __always_inline void __pmr_local_irq_disable(void)
{
	if (IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING)) {
		u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
		WARN_ON_ONCE(pmr != GIC_PRIO_IRQON && pmr != GIC_PRIO_IRQOFF);
	}

	barrier();
	write_sysreg_s(GIC_PRIO_IRQOFF, SYS_ICC_PMR_EL1);
	barrier();
}

static inline void arch_local_irq_disable(void)
{
	if (system_uses_irq_prio_masking()) {
		__pmr_local_irq_disable();
	} else {
		__daif_local_irq_disable();
	}
}
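To make the question concrete, one possible shape for such gating is sketched below as a small user-space model, not as kernel code. SKETCH_DEBUG_SWITCH_STATE is a hypothetical stand-in for a Kconfig option, and SKETCH_DAIF_NOIRQ and SKETCH_PMR_IRQOFF are illustrative placeholders for DAIF_PROCCTX_NOIRQ and GIC_PRIO_IRQOFF, not values taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time switch standing in for a Kconfig option. */
#ifndef SKETCH_DEBUG_SWITCH_STATE
#define SKETCH_DEBUG_SWITCH_STATE 1
#endif

/* Illustrative placeholder values, assumed for this sketch only. */
#define SKETCH_DAIF_NOIRQ 0xc0UL
#define SKETCH_PMR_IRQOFF 0x60UL

/*
 * Return true when the observed DAIF/PMR state differs from what is
 * expected at task-switch time. With the switch disabled the function
 * folds to a constant false, so the check costs nothing.
 */
static bool switch_state_unexpected(bool prio_masking, unsigned long daif,
				    unsigned long pmr)
{
	if (!SKETCH_DEBUG_SWITCH_STATE)
		return false;

	if (prio_masking)
		return daif != 0 || pmr != SKETCH_PMR_IRQOFF;

	return daif != SKETCH_DAIF_NOIRQ;
}

/* Usage: with priority masking, DAIF clear and PMR masked is expected. */
static void switch_state_self_check(void)
{
	assert(!switch_state_unexpected(true, 0, SKETCH_PMR_IRQOFF));
}
```

In the kernel the equivalent would presumably be an IS_ENABLED() test around the register reads, so that the reads themselves are compiled out when the option is off.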
> +}
> +
> /*
> * Thread switching.
> */
> @@ -708,6 +731,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
> {
> struct task_struct *last;
>
> + debug_switch_state();
> +
> fpsimd_thread_switch(next);
> tls_thread_switch(next);
> hw_breakpoint_thread_switch(next);
* Re: [PATCH 09/10] arm64: entry: Use split preemption logic
From: Jinjie Ruan @ 2026-04-08 1:52 UTC (permalink / raw)
To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
Cc: vladimir.murzin, peterz, linux-kernel, tglx, luto
In-Reply-To: <20260407131650.3813777-10-mark.rutland@arm.com>
On 2026/4/7 21:16, Mark Rutland wrote:
> The generic irqentry code now provides
> irqentry_exit_to_kernel_mode_preempt() and
> irqentry_exit_to_kernel_mode_after_preempt(), which can be used
> where architectures have different state requirements for involuntary
> preemption and exception return, as is the case on arm64.
>
> Use the new functions on arm64, aligning our exit to kernel mode logic
> with the style of our exit to user mode logic. This removes the need for
> the recently-added bodge in arch_irqentry_exit_need_resched(), and
> allows preemption to occur when returning from any exception taken from
> kernel mode, which is nicer for RT.
>
> In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
> fold the conditionality directly into the architecture-specific entry
> code. That way all the logic necessary to avoid preempting from a
> pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
> avoiding redundant work for other exceptions, and making the flow a bit
> clearer. At present it looks like that would require a larger
> refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
> as-is for now.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/include/asm/entry-common.h | 21 ++++++++-------------
> arch/arm64/kernel/entry-common.c | 12 ++++--------
> 2 files changed, 12 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
> index 20f0a7c7bde15..cab8cd78f6938 100644
> --- a/arch/arm64/include/asm/entry-common.h
> +++ b/arch/arm64/include/asm/entry-common.h
> @@ -29,19 +29,14 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
>
> static inline bool arch_irqentry_exit_need_resched(void)
> {
> - if (system_uses_irq_prio_masking()) {
> - /*
> - * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> - * priority masking is used the GIC irqchip driver will clear DAIF.IF
> - * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> - * DAIF we must have handled an NMI, so skip preemption.
> - */
> - if (read_sysreg(daif))
> - return false;
> - } else {
> - if (read_sysreg(daif) & (PSR_D_BIT | PSR_A_BIT))
> - return false;
> - }
> + /*
> + * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> + * priority masking is used the GIC irqchip driver will clear DAIF.IF
> + * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> + * DAIF we must have handled an NMI, so skip preemption.
> + */
> + if (system_uses_irq_prio_masking() && read_sysreg(daif))
> + return false;
>
> /*
> * Preempting a task from an IRQ means we leave copies of PSTATE
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 16a65987a6a9b..f42ce7b5c67f3 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -54,8 +54,11 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
> static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
> irqentry_state_t state)
> {
> + local_irq_disable();
> + irqentry_exit_to_kernel_mode_preempt(regs, state);
> + local_daif_mask();
> mte_check_tfsr_exit();
> - irqentry_exit_to_kernel_mode(regs, state);
> + irqentry_exit_to_kernel_mode_after_preempt(regs, state);
> }
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
>
> /*
> @@ -301,7 +304,6 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
> state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_mem_abort(far, esr, regs);
> - local_daif_mask();
> arm64_exit_to_kernel_mode(regs, state);
> }
>
> @@ -313,7 +315,6 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
> state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_sp_pc_abort(far, esr, regs);
> - local_daif_mask();
> arm64_exit_to_kernel_mode(regs, state);
> }
>
> @@ -324,7 +325,6 @@ static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
> state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_undef(regs, esr);
> - local_daif_mask();
> arm64_exit_to_kernel_mode(regs, state);
> }
>
> @@ -335,7 +335,6 @@ static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
> state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_bti(regs, esr);
> - local_daif_mask();
> arm64_exit_to_kernel_mode(regs, state);
> }
>
> @@ -346,7 +345,6 @@ static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
> state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_gcs(regs, esr);
> - local_daif_mask();
> arm64_exit_to_kernel_mode(regs, state);
> }
>
> @@ -357,7 +355,6 @@ static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
> state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_mops(regs, esr);
> - local_daif_mask();
> arm64_exit_to_kernel_mode(regs, state);
> }
>
> @@ -423,7 +420,6 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
> state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_fpac(regs, esr);
> - local_daif_mask();
> arm64_exit_to_kernel_mode(regs, state);
> }
>
* Re: [PATCH 08/10] arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
From: Jinjie Ruan @ 2026-04-08 1:50 UTC (permalink / raw)
To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
Cc: vladimir.murzin, peterz, linux-kernel, tglx, luto
In-Reply-To: <20260407131650.3813777-9-mark.rutland@arm.com>
On 2026/4/7 21:16, Mark Rutland wrote:
> The generic irqentry code now provides irqentry_enter_from_kernel_mode()
> and irqentry_exit_to_kernel_mode(), which can be used when an exception
> is known to be taken from kernel mode. These can be inlined into
> architecture-specific entry code, and avoid redundant work to test
> whether the exception was taken from user mode.
>
> Use these in arm64_enter_from_kernel_mode() and
> arm64_exit_to_kernel_mode(), which are only used for exceptions known to
> be taken from kernel mode. This will remove a small amount of redundant
> work, and will permit further changes to arm64_exit_to_kernel_mode() in
> subsequent patches.
>
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/kernel/entry-common.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 3d01cdacdc7a2..16a65987a6a9b 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -39,7 +39,7 @@ static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *reg
> {
> irqentry_state_t state;
>
> - state = irqentry_enter(regs);
> + state = irqentry_enter_from_kernel_mode(regs);
> mte_check_tfsr_entry();
> mte_disable_tco_entry(current);
>
> @@ -55,7 +55,7 @@ static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
> irqentry_state_t state)
> {
> mte_check_tfsr_exit();
> - irqentry_exit(regs, state);
> + irqentry_exit_to_kernel_mode(regs, state);
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
> }
>
> /*
* Re: [PATCH 07/10] arm64: entry: Consistently prefix arm64-specific wrappers
From: Jinjie Ruan @ 2026-04-08 1:49 UTC (permalink / raw)
To: Mark Rutland, linux-arm-kernel, Catalin Marinas, Will Deacon
Cc: vladimir.murzin, peterz, linux-kernel, tglx, luto
In-Reply-To: <20260407131650.3813777-8-mark.rutland@arm.com>
On 2026/4/7 21:16, Mark Rutland wrote:
> For historical reasons, arm64's entry code has arm64-specific functions
> named enter_from_kernel_mode() and exit_to_kernel_mode(), which are
> wrappers for similarly-named functions from the generic irqentry code.
> Other arm64-specific wrappers have an 'arm64_' prefix to clearly
> distinguish them from their generic counterparts, e.g.
> arm64_enter_from_user_mode() and arm64_exit_to_user_mode().
>
> For consistency and clarity, add an 'arm64_' prefix to these functions.
>
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jinjie Ruan <ruanjinjie@huawei.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Vladimir Murzin <vladimir.murzin@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
> arch/arm64/kernel/entry-common.c | 38 ++++++++++++++++----------------
> 1 file changed, 19 insertions(+), 19 deletions(-)
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
>
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 3625797e9ee8f..3d01cdacdc7a2 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -35,7 +35,7 @@
> * Before this function is called it is not safe to call regular kernel code,
> * instrumentable code, or any code which may trigger an exception.
> */
> -static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> +static noinstr irqentry_state_t arm64_enter_from_kernel_mode(struct pt_regs *regs)
> {
> irqentry_state_t state;
>
> @@ -51,8 +51,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> * After this function returns it is not safe to call regular kernel code,
> * instrumentable code, or any code which may trigger an exception.
> */
> -static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
> - irqentry_state_t state)
> +static void noinstr arm64_exit_to_kernel_mode(struct pt_regs *regs,
> + irqentry_state_t state)
> {
> mte_check_tfsr_exit();
> irqentry_exit(regs, state);
> @@ -298,11 +298,11 @@ static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
> unsigned long far = read_sysreg(far_el1);
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_mem_abort(far, esr, regs);
> local_daif_mask();
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
>
> static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
> @@ -310,55 +310,55 @@ static void noinstr el1_pc(struct pt_regs *regs, unsigned long esr)
> unsigned long far = read_sysreg(far_el1);
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_sp_pc_abort(far, esr, regs);
> local_daif_mask();
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
>
> static void noinstr el1_undef(struct pt_regs *regs, unsigned long esr)
> {
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_undef(regs, esr);
> local_daif_mask();
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
>
> static void noinstr el1_bti(struct pt_regs *regs, unsigned long esr)
> {
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_bti(regs, esr);
> local_daif_mask();
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
>
> static void noinstr el1_gcs(struct pt_regs *regs, unsigned long esr)
> {
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_gcs(regs, esr);
> local_daif_mask();
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
>
> static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
> {
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_mops(regs, esr);
> local_daif_mask();
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
>
> static void noinstr el1_breakpt(struct pt_regs *regs, unsigned long esr)
> @@ -420,11 +420,11 @@ static void noinstr el1_fpac(struct pt_regs *regs, unsigned long esr)
> {
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
> local_daif_inherit(regs);
> do_el1_fpac(regs, esr);
> local_daif_mask();
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
>
> asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
> @@ -491,13 +491,13 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
> {
> irqentry_state_t state;
>
> - state = enter_from_kernel_mode(regs);
> + state = arm64_enter_from_kernel_mode(regs);
>
> irq_enter_rcu();
> do_interrupt_handler(regs, handler);
> irq_exit_rcu();
>
> - exit_to_kernel_mode(regs, state);
> + arm64_exit_to_kernel_mode(regs, state);
> }
> static void noinstr el1_interrupt(struct pt_regs *regs,
> void (*handler)(struct pt_regs *))