Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] drm/bridge: imx8qxp-pxl2dpi: avoid of_node_put() on ERR_PTR()
From: Guangshuo Li @ 2026-04-19 12:21 UTC (permalink / raw)
  To: Liu Ying, Andrzej Hajda, Neil Armstrong, Robert Foss,
	Laurent Pinchart, Jonas Karlman, Jernej Skrabec,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Frank Li, Sascha Hauer, Pengutronix Kernel Team,
	Fabio Estevam, Luca Ceresoli, dri-devel, imx, linux-arm-kernel,
	linux-kernel
  Cc: Guangshuo Li, stable

imx8qxp_pxl2dpi_get_available_ep_from_port() may return ERR_PTR(-ENODEV)
or ERR_PTR(-EINVAL). imx8qxp_pxl2dpi_find_next_bridge() stores that
value in a __free(device_node) variable and then immediately checks
IS_ERR(ep).

On the error path, returning from the function triggers the cleanup
handler for __free(device_node). Since the device_node cleanup helper
only checks for NULL before calling of_node_put(), this results in
of_node_put(ERR_PTR(...)), which may lead to an invalid kobject_put()
dereference and crash the kernel.

Fix it by avoiding __free(device_node) for the endpoint pointer and
releasing it explicitly after obtaining the remote port parent.

This issue was found by a custom static analysis tool.

Fixes: ceea3f7806a10 ("drm/bridge: imx8qxp-pxl2dpi: simplify put of device_node pointers")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
 drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c b/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c
index 441fd32dc91c..3610ca94a8e6 100644
--- a/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c
+++ b/drivers/gpu/drm/bridge/imx/imx8qxp-pxl2dpi.c
@@ -264,12 +264,15 @@ imx8qxp_pxl2dpi_get_available_ep_from_port(struct imx8qxp_pxl2dpi *p2d,
 
 static int imx8qxp_pxl2dpi_find_next_bridge(struct imx8qxp_pxl2dpi *p2d)
 {
-	struct device_node *ep __free(device_node) =
-		imx8qxp_pxl2dpi_get_available_ep_from_port(p2d, 1);
+	struct device_node *ep;
+
+	ep = imx8qxp_pxl2dpi_get_available_ep_from_port(p2d, 1);
 	if (IS_ERR(ep))
 		return PTR_ERR(ep);
 
 	struct device_node *remote __free(device_node) = of_graph_get_remote_port_parent(ep);
+	of_node_put(ep);
+
 	if (!remote || !of_device_is_available(remote)) {
 		DRM_DEV_ERROR(p2d->dev, "no available remote\n");
 		return -ENODEV;
-- 
2.43.0



^ permalink raw reply related

* [PATCH v3 2/2] hwrng: mtk - add support for hw access via SMCC
From: Daniel Golle @ 2026-04-19 12:05 UTC (permalink / raw)
  To: Olivia Mackall, Herbert Xu, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Matthias Brugger, AngeloGioacchino Del Regno,
	Sean Wang, Daniel Golle, linux-crypto, devicetree, linux-kernel,
	linux-arm-kernel, linux-mediatek
In-Reply-To: <585fc832e4e5d3656bd25ecee6bafb636993104a.1776600269.git.daniel@makrotopia.org>

Newer versions of ARM TrustedFirmware-A on MediaTek's ARMv8 SoCs no longer
allow accessing the TRNG from outside of the trusted firmware.
Instead, a vendor-defined custom Secure Monitor Call can be used to
acquire random bytes.

Add support for newer SoCs (MT7981, MT7987, MT7988).

As TF-A for the MT7986 may either follow the old or the new
convention, the best bet is to test if firmware blocks direct access
to the hwrng and if so, expect the SMCC interface to be usable.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
v3: unchanged
v2: unchanged

 drivers/char/hw_random/mtk-rng.c | 127 ++++++++++++++++++++++++++-----
 1 file changed, 106 insertions(+), 21 deletions(-)

diff --git a/drivers/char/hw_random/mtk-rng.c b/drivers/char/hw_random/mtk-rng.c
index 5808d09d12c45..8f5856b59ad66 100644
--- a/drivers/char/hw_random/mtk-rng.c
+++ b/drivers/char/hw_random/mtk-rng.c
@@ -3,6 +3,7 @@
  * Driver for Mediatek Hardware Random Number Generator
  *
  * Copyright (C) 2017 Sean Wang <sean.wang@mediatek.com>
+ * Copyright (C) 2026 Daniel Golle <daniel@makrotopia.org>
  */
 #define MTK_RNG_DEV KBUILD_MODNAME
 
@@ -17,6 +18,8 @@
 #include <linux/of.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/arm-smccc.h>
+#include <linux/soc/mediatek/mtk_sip_svc.h>
 
 /* Runtime PM autosuspend timeout: */
 #define RNG_AUTOSUSPEND_TIMEOUT		100
@@ -30,6 +33,11 @@
 
 #define RNG_DATA			0x08
 
+/* Driver feature flags */
+#define MTK_RNG_SMC			BIT(0)
+
+#define MTK_SIP_KERNEL_GET_RND		MTK_SIP_SMC_CMD(0x550)
+
 #define to_mtk_rng(p)	container_of(p, struct mtk_rng, rng)
 
 struct mtk_rng {
@@ -37,6 +45,7 @@ struct mtk_rng {
 	struct clk *clk;
 	struct hwrng rng;
 	struct device *dev;
+	unsigned long flags;
 };
 
 static int mtk_rng_init(struct hwrng *rng)
@@ -103,6 +112,56 @@ static int mtk_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
 	return retval || !wait ? retval : -EIO;
 }
 
+static int mtk_rng_read_smc(struct hwrng *rng, void *buf, size_t max,
+			    bool wait)
+{
+	struct arm_smccc_res res;
+	int retval = 0;
+
+	while (max >= sizeof(u32)) {
+		arm_smccc_smc(MTK_SIP_KERNEL_GET_RND, 0, 0, 0, 0, 0, 0, 0,
+			      &res);
+		if (res.a0)
+			break;
+
+		*(u32 *)buf = res.a1;
+		retval += sizeof(u32);
+		buf += sizeof(u32);
+		max -= sizeof(u32);
+	}
+
+	return retval || !wait ? retval : -EIO;
+}
+
+static bool mtk_rng_hw_accessible(struct mtk_rng *priv)
+{
+	u32 val;
+	int err;
+
+	err = clk_prepare_enable(priv->clk);
+	if (err)
+		return false;
+
+	val = readl(priv->base + RNG_CTRL);
+	val |= RNG_EN;
+	writel(val, priv->base + RNG_CTRL);
+
+	val = readl(priv->base + RNG_CTRL);
+
+	if (val & RNG_EN) {
+		/* HW is accessible, clean up: disable RNG and clock */
+		writel(val & ~RNG_EN, priv->base + RNG_CTRL);
+		clk_disable_unprepare(priv->clk);
+		return true;
+	}
+
+	/*
+	 * If TF-A blocks direct access, the register reads back as 0.
+	 * Leave the clock enabled as TF-A needs it.
+	 */
+	return false;
+}
+
 static int mtk_rng_probe(struct platform_device *pdev)
 {
 	int ret;
@@ -114,23 +173,42 @@ static int mtk_rng_probe(struct platform_device *pdev)
 
 	priv->dev = &pdev->dev;
 	priv->rng.name = pdev->name;
-#ifndef CONFIG_PM
-	priv->rng.init = mtk_rng_init;
-	priv->rng.cleanup = mtk_rng_cleanup;
-#endif
-	priv->rng.read = mtk_rng_read;
 	priv->rng.quality = 900;
-
-	priv->clk = devm_clk_get(&pdev->dev, "rng");
-	if (IS_ERR(priv->clk)) {
-		ret = PTR_ERR(priv->clk);
-		dev_err(&pdev->dev, "no clock for device: %d\n", ret);
-		return ret;
+	priv->flags = (unsigned long)device_get_match_data(&pdev->dev);
+
+	if (!(priv->flags & MTK_RNG_SMC)) {
+		priv->clk = devm_clk_get(&pdev->dev, "rng");
+		if (IS_ERR(priv->clk)) {
+			ret = PTR_ERR(priv->clk);
+			dev_err(&pdev->dev, "no clock for device: %d\n", ret);
+			return ret;
+		}
+
+		priv->base = devm_platform_ioremap_resource(pdev, 0);
+		if (IS_ERR(priv->base))
+			return PTR_ERR(priv->base);
+
+		if (IS_ENABLED(CONFIG_HAVE_ARM_SMCCC) &&
+		    of_device_is_compatible(pdev->dev.of_node,
+					    "mediatek,mt7986-rng") &&
+		    !mtk_rng_hw_accessible(priv)) {
+			priv->flags |= MTK_RNG_SMC;
+			dev_info(&pdev->dev,
+				 "HW RNG not MMIO accessible, using SMC\n");
+		}
 	}
 
-	priv->base = devm_platform_ioremap_resource(pdev, 0);
-	if (IS_ERR(priv->base))
-		return PTR_ERR(priv->base);
+	if (priv->flags & MTK_RNG_SMC) {
+		if (!IS_ENABLED(CONFIG_HAVE_ARM_SMCCC))
+			return -ENODEV;
+		priv->rng.read = mtk_rng_read_smc;
+	} else {
+#ifndef CONFIG_PM
+		priv->rng.init = mtk_rng_init;
+		priv->rng.cleanup = mtk_rng_cleanup;
+#endif
+		priv->rng.read = mtk_rng_read;
+	}
 
 	ret = devm_hwrng_register(&pdev->dev, &priv->rng);
 	if (ret) {
@@ -139,12 +217,15 @@ static int mtk_rng_probe(struct platform_device *pdev)
 		return ret;
 	}
 
-	dev_set_drvdata(&pdev->dev, priv);
-	pm_runtime_set_autosuspend_delay(&pdev->dev, RNG_AUTOSUSPEND_TIMEOUT);
-	pm_runtime_use_autosuspend(&pdev->dev);
-	ret = devm_pm_runtime_enable(&pdev->dev);
-	if (ret)
-		return ret;
+	if (!(priv->flags & MTK_RNG_SMC)) {
+		dev_set_drvdata(&pdev->dev, priv);
+		pm_runtime_set_autosuspend_delay(&pdev->dev,
+						 RNG_AUTOSUSPEND_TIMEOUT);
+		pm_runtime_use_autosuspend(&pdev->dev);
+		ret = devm_pm_runtime_enable(&pdev->dev);
+		if (ret)
+			return ret;
+	}
 
 	dev_info(&pdev->dev, "registered RNG driver\n");
 
@@ -181,8 +262,11 @@ static const struct dev_pm_ops mtk_rng_pm_ops = {
 #endif	/* CONFIG_PM */
 
 static const struct of_device_id mtk_rng_match[] = {
-	{ .compatible = "mediatek,mt7986-rng" },
 	{ .compatible = "mediatek,mt7623-rng" },
+	{ .compatible = "mediatek,mt7981-rng", .data = (void *)MTK_RNG_SMC },
+	{ .compatible = "mediatek,mt7986-rng" },
+	{ .compatible = "mediatek,mt7987-rng", .data = (void *)MTK_RNG_SMC },
+	{ .compatible = "mediatek,mt7988-rng", .data = (void *)MTK_RNG_SMC },
 	{},
 };
 MODULE_DEVICE_TABLE(of, mtk_rng_match);
@@ -200,4 +284,5 @@ module_platform_driver(mtk_rng_driver);
 
 MODULE_DESCRIPTION("Mediatek Random Number Generator Driver");
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
+MODULE_AUTHOR("Daniel Golle <daniel@makrotopia.org>");
 MODULE_LICENSE("GPL");
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 1/2] dt-bindings: rng: mtk-rng: add SMC-based TRNG variants
From: Daniel Golle @ 2026-04-19 12:05 UTC (permalink / raw)
  To: Olivia Mackall, Herbert Xu, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Matthias Brugger, AngeloGioacchino Del Regno,
	Sean Wang, Daniel Golle, linux-crypto, devicetree, linux-kernel,
	linux-arm-kernel, linux-mediatek

Add compatible strings for MediaTek SoCs where the hardware random number
generator is accessed via a vendor-defined Secure Monitor Call (SMC)
rather than direct MMIO register access:

  - mediatek,mt7981-rng
  - mediatek,mt7987-rng
  - mediatek,mt7988-rng

These variants require no reg, clocks, or clock-names properties since
the RNG hardware is managed by ARM Trusted Firmware-A.

Relax the $nodename pattern to also allow 'rng' in addition to the
existing 'rng@...' pattern.

Add a second example showing the minimal SMC variant binding.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
v3:
 * drop not: in compatible conditional
 * add reg/clocks/clock-names: false for mt7981-rng
 * add else: requiring reg/clocks/clock-names for others

v2: express compatibilities with fallback

 .../devicetree/bindings/rng/mtk-rng.yaml      | 32 ++++++++++++++++---
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/rng/mtk-rng.yaml b/Documentation/devicetree/bindings/rng/mtk-rng.yaml
index 7e8dc62e5d3a6..8fe6c209ab1e5 100644
--- a/Documentation/devicetree/bindings/rng/mtk-rng.yaml
+++ b/Documentation/devicetree/bindings/rng/mtk-rng.yaml
@@ -11,12 +11,13 @@ maintainers:
 
 properties:
   $nodename:
-    pattern: "^rng@[0-9a-f]+$"
+    pattern: "^rng(@[0-9a-f]+)?$"
 
   compatible:
     oneOf:
       - enum:
           - mediatek,mt7623-rng
+          - mediatek,mt7981-rng
       - items:
           - enum:
               - mediatek,mt7622-rng
@@ -25,6 +26,11 @@ properties:
               - mediatek,mt8365-rng
               - mediatek,mt8516-rng
           - const: mediatek,mt7623-rng
+      - items:
+          - enum:
+              - mediatek,mt7987-rng
+              - mediatek,mt7988-rng
+          - const: mediatek,mt7981-rng
 
   reg:
     maxItems: 1
@@ -38,9 +44,23 @@ properties:
 
 required:
   - compatible
-  - reg
-  - clocks
-  - clock-names
+
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            const: mediatek,mt7981-rng
+    then:
+      properties:
+        reg: false
+        clocks: false
+        clock-names: false
+    else:
+      required:
+        - reg
+        - clocks
+        - clock-names
 
 additionalProperties: false
 
@@ -53,3 +73,7 @@ examples:
             clocks = <&infracfg CLK_INFRA_TRNG>;
             clock-names = "rng";
     };
+  - |
+    rng {
+            compatible = "mediatek,mt7981-rng";
+    };
-- 
2.53.0


^ permalink raw reply related

* [RESEND] Re: [PATCH RFC v3 4/4] mm: add PMD-level huge page support for remap_pfn_range()
From: Yin Tirui @ 2026-04-19 11:41 UTC (permalink / raw)
  To: David Hildenbrand (Arm), lorenzo.stoakes
  Cc: linux-kernel, linux-mm, x86, linux-arm-kernel, willy, jgross,
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, hpa, luto,
	peterz, akpm, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, vbabka, rppt, surenb,
	mhocko, anshuman.khandual, rmclure, kevin.brodsky, apopple, ajd,
	pasha.tatashin, bhe, thuth, coxu, dan.j.williams, yu-cheng.yu,
	baolu.lu, conor.dooley, Jonathan.Cameron, riel, wangkefeng.wang,
	chenjun102
In-Reply-To: <9886b2c6-5516-4be8-ac31-db3133455af2@kernel.org>

(Resending to keep the thread intact, sorry for the noise)

Hi David,

Thanks a lot for the thorough review!

On 4/14/26 04:02, David Hildenbrand (Arm) wrote:
> On 2/28/26 08:09, Yin Tirui wrote:
>> Add PMD-level huge page support to remap_pfn_range(), automatically
>> creating huge mappings when prerequisites are satisfied (size, alignment,
>> architecture support, etc.) and falling back to normal page mappings
>> otherwise.
>>
>> Implement special huge PMD splitting by utilizing the pgtable deposit/
>> withdraw mechanism. When splitting is needed, the deposited pgtable is
>> withdrawn and populated with individual PTEs created from the original
>> huge mapping.
>>
>> Signed-off-by: Yin Tirui <yintirui@huawei.com>
>> ---
>
> [...]
>
>>
>>  	if (!vma_is_anonymous(vma)) {
>>  		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
>> +
>> +		if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
>
> These magical vma checks are really bad. This all needs a cleanup
> (Lorenzo is doing some, hoping it will look better on top of that).
>
Agreed. I am following Lorenzo's recent cleanups closely.

>> +			pte_t entry;
>> +
>> +			if (!pmd_special(old_pmd)) {
>
> If you are using pmd_special(), you are doing something wrong.
>
> Hint: vm_normal_page_pmd() is usually what you want.

Spot on.

While looking into applying vm_normal_folio_pmd() here to avoid the
magical VMA checks, I realized that both __split_huge_pmd_locked() and
copy_huge_pmd() currently suffer from the same !vma_is_anonymous(vma)
top-level entanglement.I think these functions could benefit from a
structural refactoring similar to what Lorenzo is currently doing in
zap_huge_pmd().

My idea is to flatten both functions into a pmd_present()-driven
decision tree:
1. Branch strictly on pmd_present().
2. For present PMDs, use vm_normal_folio_pmd() as the single source of
truth.
3. If !folio (and not a huge zero page), it cleanly identifies special
mappings (like PFNMAPs) without relying on vma_is_special_huge(). We can
handle the split/copy directly and return early.
4. Otherwise, proceed with the normal Anon/File THP logic, or handle
non-present migration entries in the !pmd_present() branch.

I have drafted two preparation patches demonstrating this approach and
appended the diffs at the end of this email. Does this direction look
reasonable to you? If so, I will iron out the implementation details and
include these refactoring patches in my upcoming v4 series.

>
>> +				zap_deposited_table(mm, pmd);
>> +				return;
>> +			}
>> +			pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>> +			if (unlikely(!pgtable))
>> +				return;
>> +			pmd_populate(mm, &_pmd, pgtable);
>> +			pte = pte_offset_map(&_pmd, haddr);
>> +			entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
>> +			set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
>> +			pte_unmap(pte);
>> +
>> +			smp_wmb(); /* make pte visible before pmd */
>> +			pmd_populate(mm, pmd, pgtable);
>> +			return;
>> +		}
>> +
>>  		/*
>>  		 * We are going to unmap this huge page. So
>>  		 * just go ahead and zap it
>>  		 */
>>  		if (arch_needs_pgtable_deposit())
>>  			zap_deposited_table(mm, pmd);
>> -		if (!vma_is_dax(vma) && vma_is_special_huge(vma))
>> -			return;
>> +
>>  		if (unlikely(pmd_is_migration_entry(old_pmd))) {
>>  			const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 07778814b4a8..affccf38cbcf 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2890,6 +2890,40 @@ static int remap_pte_range(struct mm_struct
*mm, pmd_t *pmd,
>>  	return err;
>>  }
>>
>> +#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
>
> Why exactly do we need arch support for that in form of a Kconfig.
>
> Usually, we guard pmd support by CONFIG_TRANSPARENT_HUGEPAGE.
>
> And then, we must check at runtime if PMD leaves are actually supported.
>
> Luiz is working on a cleanup series:
>
> https://lore.kernel.org/r/cover.1775679721.git.luizcap@redhat.com
>
> pgtable_has_pmd_leaves() is what you would want to check.

Makes sense. This Kconfig was inherited from Peter Xu's earlier
proposal, but depending on CONFIG_TRANSPARENT_HUGEPAGE and
pgtable_has_pmd_leaves() is indeed the correct standard. I will rebase
on Luiz's series.

>
>
>> +static int remap_try_huge_pmd(struct mm_struct *mm, pmd_t *pmd,
>> +			unsigned long addr, unsigned long end,
>> +			unsigned long pfn, pgprot_t prot)
>
> Use two-tab indent. (currently 3? 🙂 )
>
> Also, we tend to call these things now "pmd leaves". Call it
>  "remap_try_pmd_leaf" or something even more expressive like
>
> "remap_try_install_pmd_leaf()"
>

Noted. Will fix the indentation and rename it.

>> +{
>> +	pgtable_t pgtable;
>> +	spinlock_t *ptl;
>> +
>> +	if ((end - addr) != PMD_SIZE)
>
> 	if (end - addr != PMD_SIZE)
>
> Should work

Noted.

>
>> +		return 0;
>> +
>> +	if (!IS_ALIGNED(addr, PMD_SIZE))
>> +		return 0;
>> +
>
> You could likely combine both things into a
>
> 	if (!IS_ALIGNED(addr | end, PMD_SIZE))
>
>> +	if (!IS_ALIGNED(pfn, HPAGE_PMD_NR))
>
> Another sign that you piggy-back on THP support 😉

Indeed! 🙂

>
>> +		return 0;
>> +
>> +	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
>> +		return 0;
>
> Ripping out a page table?! That doesn't sound right 🙂
>
> Why is that required? We shouldn't be doing that here. Gah.
>
> Especially, without any pmd locks etc.

...oops. That is indeed a silly one. Thanks for catching it.
I will fix this to:

	if (!pmd_none(*pmd))
		return 0;

>
>> +
>> +	pgtable = pte_alloc_one(mm);
>> +	if (unlikely(!pgtable))
>> +		return 0;
>> +
>> +	mm_inc_nr_ptes(mm);
>> +	ptl = pmd_lock(mm, pmd);
>> +	set_pmd_at(mm, addr, pmd, pmd_mkspecial(pmd_mkhuge(pfn_pmd(pfn,
prot))));
>> +	pgtable_trans_huge_deposit(mm, pmd, pgtable);
>> +	spin_unlock(ptl);
>> +
>> +	return 1;
>> +}
>> +#endif
>> +
>>  static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
>>  			unsigned long addr, unsigned long end,
>>  			unsigned long pfn, pgprot_t prot)
>> @@ -2905,6 +2939,12 @@ static inline int remap_pmd_range(struct
mm_struct *mm, pud_t *pud,
>>  	VM_BUG_ON(pmd_trans_huge(*pmd));
>>  	do {
>>  		next = pmd_addr_end(addr, end);
>> +#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
>> +		if (remap_try_huge_pmd(mm, pmd, addr, next,
>> +				pfn + (addr >> PAGE_SHIFT), prot)) {
>
> Please provide a stub instead so we don't end up with ifdef in this code.

Will do.

>

Appendix:

1. copy_huge_pmd()

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 42c983821c03..3f8b3f15c6ba 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1912,35 +1912,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm,
struct mm_struct *src_mm,
 		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	spinlock_t *dst_ptl, *src_ptl;
-	struct page *src_page;
 	struct folio *src_folio;
 	pmd_t pmd;
 	pgtable_t pgtable = NULL;
 	int ret = -ENOMEM;

-	pmd = pmdp_get_lockless(src_pmd);
-	if (unlikely(pmd_present(pmd) && pmd_special(pmd) &&
-		     !is_huge_zero_pmd(pmd))) {
-		dst_ptl = pmd_lock(dst_mm, dst_pmd);
-		src_ptl = pmd_lockptr(src_mm, src_pmd);
-		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
-		/*
-		 * No need to recheck the pmd, it can't change with write
-		 * mmap lock held here.
-		 *
-		 * Meanwhile, making sure it's not a CoW VMA with writable
-		 * mapping, otherwise it means either the anon page wrongly
-		 * applied special bit, or we made the PRIVATE mapping be
-		 * able to wrongly write to the backend MMIO.
-		 */
-		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
-		goto set_pmd;
-	}
-
-	/* Skip if can be re-fill on fault */
-	if (!vma_is_anonymous(dst_vma))
-		return 0;
-
 	pgtable = pte_alloc_one(dst_mm);
 	if (unlikely(!pgtable))
 		goto out;
@@ -1952,48 +1928,69 @@ int copy_huge_pmd(struct mm_struct *dst_mm,
struct mm_struct *src_mm,
 	ret = -EAGAIN;
 	pmd = *src_pmd;

-	if (unlikely(thp_migration_supported() &&
-		     pmd_is_valid_softleaf(pmd))) {
+	if (likely(pmd_present(pmd))) {
+		src_folio = vm_normal_folio_pmd(src_vma, addr, pmd);
+		if (unlikely(!src_folio)) {
+			/*
+			 * When page table lock is held, the huge zero pmd should not be
+			 * under splitting since we don't split the page itself, only pmd to
+			 * a page table.
+			 */
+			if (is_huge_zero_pmd(pmd)) {
+				/*
+				 * mm_get_huge_zero_folio() will never allocate a new
+				 * folio here, since we already have a zero page to
+				 * copy. It just takes a reference.
+				 */
+				mm_get_huge_zero_folio(dst_mm);
+				goto out_zero_page;
+			}
+
+			/*
+			 * Making sure it's not a CoW VMA with writable
+			 * mapping, otherwise it means either the anon page wrongly
+			 * applied special bit, or we made the PRIVATE mapping be
+			 * able to wrongly write to the backend MMIO.
+			 */
+			VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
+			pte_free(dst_mm, pgtable);
+			goto set_pmd;
+		}
+
+		if (!folio_test_anon(src_folio)) {
+			pte_free(dst_mm, pgtable);
+			ret = 0;
+			goto out_unlock;
+		}
+
+		folio_get(src_folio);
+		if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, &src_folio->page,
dst_vma, src_vma))) {
+			/* Page maybe pinned: split and retry the fault on PTEs. */
+			folio_put(src_folio);
+			pte_free(dst_mm, pgtable);
+			spin_unlock(src_ptl);
+			spin_unlock(dst_ptl);
+			__split_huge_pmd(src_vma, src_pmd, addr, false);
+			return -EAGAIN;
+		}
+		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+
+	} else if (unlikely(thp_migration_supported() &&
pmd_is_valid_softleaf(pmd))) {
+		if (unlikely(!vma_is_anonymous(dst_vma))) {
+			pte_free(dst_mm, pgtable);
+			ret = 0;
+			goto out_unlock;
+		}
 		copy_huge_non_present_pmd(dst_mm, src_mm, dst_pmd, src_pmd, addr,
 					  dst_vma, src_vma, pmd, pgtable);
 		ret = 0;
 		goto out_unlock;
-	}

-	if (unlikely(!pmd_trans_huge(pmd))) {
+	} else {
 		pte_free(dst_mm, pgtable);
 		goto out_unlock;
 	}
-	/*
-	 * When page table lock is held, the huge zero pmd should not be
-	 * under splitting since we don't split the page itself, only pmd to
-	 * a page table.
-	 */
-	if (is_huge_zero_pmd(pmd)) {
-		/*
-		 * mm_get_huge_zero_folio() will never allocate a new
-		 * folio here, since we already have a zero page to
-		 * copy. It just takes a reference.
-		 */
-		mm_get_huge_zero_folio(dst_mm);
-		goto out_zero_page;
-	}

-	src_page = pmd_page(pmd);
-	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
-	src_folio = page_folio(src_page);
-
-	folio_get(src_folio);
-	if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, dst_vma,
src_vma))) {
-		/* Page maybe pinned: split and retry the fault on PTEs. */
-		folio_put(src_folio);
-		pte_free(dst_mm, pgtable);
-		spin_unlock(src_ptl);
-		spin_unlock(dst_ptl);
-		__split_huge_pmd(src_vma, src_pmd, addr, false);
-		return -EAGAIN;
-	}
-	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-- 
2.43.0



2. __split_huge_pmd_locked()

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3f8b3f15c6ba..c02c2843520f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3090,98 +3090,50 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,

 	count_vm_event(THP_SPLIT_PMD);

-	if (!vma_is_anonymous(vma)) {
-		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
-		/*
-		 * We are going to unmap this huge page. So
-		 * just go ahead and zap it
-		 */
-		if (arch_needs_pgtable_deposit())
-			zap_deposited_table(mm, pmd);
-		if (vma_is_special_huge(vma))
-			return;
-		if (unlikely(pmd_is_migration_entry(old_pmd))) {
-			const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
+	if (pmd_present(*pmd)) {
+		folio = vm_normal_folio_pmd(vma, haddr, *pmd);

-			folio = softleaf_to_folio(old_entry);
-		} else if (is_huge_zero_pmd(old_pmd)) {
+		if (unlikely(!folio)) {
+			/* Huge Zero Page */
+			if (is_huge_zero_pmd(*pmd))
+				/*
+				 * FIXME: Do we want to invalidate secondary mmu by calling
+				 * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below
+				 * inside __split_huge_pmd() ?
+				 *
+				 * We are going from a zero huge page write protected to zero
+				 * small page also write protected so it does not seems useful
+				 * to invalidate secondary mmu at this time.
+				 */
+				return __split_huge_zero_page_pmd(vma, haddr, pmd);
+
+			/* Huge PFNMAP */
+			old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+			if (arch_needs_pgtable_deposit())
+				zap_deposited_table(mm, pmd);
 			return;
-		} else {
+		}
+
+		/* File/Shmem THP */
+		if (unlikely(!folio_test_anon(folio))) {
+			old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+			if (arch_needs_pgtable_deposit())
+				zap_deposited_table(mm, pmd);
+			if (vma_is_special_huge(vma))
+				return;
+
 			page = pmd_page(old_pmd);
-			folio = page_folio(page);
 			if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
 				folio_mark_dirty(folio);
 			if (!folio_test_referenced(folio) && pmd_young(old_pmd))
 				folio_set_referenced(folio);
 			folio_remove_rmap_pmd(folio, page, vma);
 			folio_put(folio);
+			add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
+			return;
 		}
-		add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
-		return;
-	}
-
-	if (is_huge_zero_pmd(*pmd)) {
-		/*
-		 * FIXME: Do we want to invalidate secondary mmu by calling
-		 * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below
-		 * inside __split_huge_pmd() ?
-		 *
-		 * We are going from a zero huge page write protected to zero
-		 * small page also write protected so it does not seems useful
-		 * to invalidate secondary mmu at this time.
-		 */
-		return __split_huge_zero_page_pmd(vma, haddr, pmd);
-	}
-
-	if (pmd_is_migration_entry(*pmd)) {
-		softleaf_t entry;
-
-		old_pmd = *pmd;
-		entry = softleaf_from_pmd(old_pmd);
-		page = softleaf_to_page(entry);
-		folio = page_folio(page);
-
-		soft_dirty = pmd_swp_soft_dirty(old_pmd);
-		uffd_wp = pmd_swp_uffd_wp(old_pmd);
-
-		write = softleaf_is_migration_write(entry);
-		if (PageAnon(page))
-			anon_exclusive = softleaf_is_migration_read_exclusive(entry);
-		young = softleaf_is_migration_young(entry);
-		dirty = softleaf_is_migration_dirty(entry);
-	} else if (pmd_is_device_private_entry(*pmd)) {
-		softleaf_t entry;
-
-		old_pmd = *pmd;
-		entry = softleaf_from_pmd(old_pmd);
-		page = softleaf_to_page(entry);
-		folio = page_folio(page);
-
-		soft_dirty = pmd_swp_soft_dirty(old_pmd);
-		uffd_wp = pmd_swp_uffd_wp(old_pmd);
-
-		write = softleaf_is_device_private_write(entry);
-		anon_exclusive = PageAnonExclusive(page);

-		/*
-		 * Device private THP should be treated the same as regular
-		 * folios w.r.t anon exclusive handling. See the comments for
-		 * folio handling and anon_exclusive below.
-		 */
-		if (freeze && anon_exclusive &&
-		    folio_try_share_anon_rmap_pmd(folio, page))
-			freeze = false;
-		if (!freeze) {
-			rmap_t rmap_flags = RMAP_NONE;
-
-			folio_ref_add(folio, HPAGE_PMD_NR - 1);
-			if (anon_exclusive)
-				rmap_flags |= RMAP_EXCLUSIVE;
-
-			folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
-						 vma, haddr, rmap_flags);
-		}
-	} else {
+		/* Anon THP */
 		/*
 		 * Up to this point the pmd is present and huge and userland has
 		 * the whole access to the hugepage during the split (which
@@ -3207,7 +3159,6 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,
 		 */
 		old_pmd = pmdp_invalidate(vma, haddr, pmd);
 		page = pmd_page(old_pmd);
-		folio = page_folio(page);
 		if (pmd_dirty(old_pmd)) {
 			dirty = true;
 			folio_set_dirty(folio);
@@ -3218,8 +3169,6 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,
 		uffd_wp = pmd_uffd_wp(old_pmd);

 		VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
-		VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
-
 		/*
 		 * Without "freeze", we'll simply split the PMD, propagating the
 		 * PageAnonExclusive() flag for each PTE by setting it for
@@ -3236,17 +3185,82 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,
 		 * See folio_try_share_anon_rmap_pmd(): invalidate PMD first.
 		 */
 		anon_exclusive = PageAnonExclusive(page);
-		if (freeze && anon_exclusive &&
-		    folio_try_share_anon_rmap_pmd(folio, page))
+		if (freeze && anon_exclusive && folio_try_share_anon_rmap_pmd(folio,
page))
 			freeze = false;
 		if (!freeze) {
 			rmap_t rmap_flags = RMAP_NONE;
-
 			folio_ref_add(folio, HPAGE_PMD_NR - 1);
 			if (anon_exclusive)
 				rmap_flags |= RMAP_EXCLUSIVE;
-			folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
-						 vma, haddr, rmap_flags);
+			folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr,
rmap_flags);
+		}
+	} else { /* pmd not present */
+		folio = pmd_to_softleaf_folio(*pmd);
+		if (unlikely(!folio))
+			return;
+
+		/* Migration of File/Shmem THP */
+		if (unlikely(!folio_test_anon(folio))) {
+			old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+			if (arch_needs_pgtable_deposit())
+				zap_deposited_table(mm, pmd);
+			if (vma_is_special_huge(vma))
+				return;
+			add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
+			return;
+		}
+
+		/* Migration of Anon THP or Device Private*/
+		if (pmd_is_migration_entry(*pmd)) {
+			softleaf_t entry;
+
+			old_pmd = *pmd;
+			entry = softleaf_from_pmd(old_pmd);
+			page = softleaf_to_page(entry);
+			folio = page_folio(page);
+
+			soft_dirty = pmd_swp_soft_dirty(old_pmd);
+			uffd_wp = pmd_swp_uffd_wp(old_pmd);
+
+			write = softleaf_is_migration_write(entry);
+			if (PageAnon(page))
+				anon_exclusive = softleaf_is_migration_read_exclusive(entry);
+			young = softleaf_is_migration_young(entry);
+			dirty = softleaf_is_migration_dirty(entry);
+		} else if (pmd_is_device_private_entry(*pmd)) {
+			softleaf_t entry;
+
+			old_pmd = *pmd;
+			entry = softleaf_from_pmd(old_pmd);
+			page = softleaf_to_page(entry);
+
+			soft_dirty = pmd_swp_soft_dirty(old_pmd);
+			uffd_wp = pmd_swp_uffd_wp(old_pmd);
+
+			write = softleaf_is_device_private_write(entry);
+			anon_exclusive = PageAnonExclusive(page);
+
+			/*
+			* Device private THP should be treated the same as regular
+			* folios w.r.t anon exclusive handling. See the comments for
+			* folio handling and anon_exclusive below.
+			*/
+			if (freeze && anon_exclusive &&
+				folio_try_share_anon_rmap_pmd(folio, page))
+				freeze = false;
+			if (!freeze) {
+				rmap_t rmap_flags = RMAP_NONE;
+
+				folio_ref_add(folio, HPAGE_PMD_NR - 1);
+				if (anon_exclusive)
+					rmap_flags |= RMAP_EXCLUSIVE;
+
+				folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
+							vma, haddr, rmap_flags);
+			}
+		} else {
+			VM_WARN_ONCE(1, "unknown situation.");
+			return;
 		}
 	}

-- 
2.43.0


-- 
Yin Tirui



^ permalink raw reply related

* Re: [PATCH RFC v3 4/4] mm: add PMD-level huge page support for remap_pfn_range()
From: Yin Tirui @ 2026-04-19 11:24 UTC (permalink / raw)
  To: David Hildenbrand (Arm), lorenzo.stoakes
  Cc: linux-kernel, linux-mm, x86, linux-arm-kernel, willy, jgross,
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, hpa, luto,
	peterz, akpm, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, vbabka, rppt, surenb,
	mhocko, anshuman.khandual, rmclure, kevin.brodsky, apopple, ajd,
	pasha.tatashin, bhe, thuth, coxu, dan.j.williams, yu-cheng.yu,
	yangyicong, baolu.lu, conor.dooley, Jonathan.Cameron, riel,
	wangkefeng.wang, chenjun102
In-Reply-To: <5d04929b-576f-4926-9f3b-be9a41a3e010@gmail.com>

Hi David,

Thanks a lot for the thorough review!

On 4/14/26 04:02, David Hildenbrand (Arm) wrote:
> On 2/28/26 08:09, Yin Tirui wrote:
>> Add PMD-level huge page support to remap_pfn_range(), automatically
>> creating huge mappings when prerequisites are satisfied (size, alignment,
>> architecture support, etc.) and falling back to normal page mappings
>> otherwise.
>>
>> Implement special huge PMD splitting by utilizing the pgtable deposit/
>> withdraw mechanism. When splitting is needed, the deposited pgtable is
>> withdrawn and populated with individual PTEs created from the original
>> huge mapping.
>>
>> Signed-off-by: Yin Tirui <yintirui@huawei.com>
>> ---
> 
> [...]
> 
>>  
>>  	if (!vma_is_anonymous(vma)) {
>>  		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
>> +
>> +		if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
> 
> These magical vma checks are really bad. This all needs a cleanup
> (Lorenzo is doing some, hoping it will look better on top of that).
> 
Agreed. I am following Lorenzo's recent cleanups closely.

>> +			pte_t entry;
>> +
>> +			if (!pmd_special(old_pmd)) {
> 
> If you are using pmd_special(), you are doing something wrong.
> 
> Hint: vm_normal_page_pmd() is usually what you want.

Spot on.

While looking into applying vm_normal_folio_pmd() here to avoid the
magical VMA checks, I realized that both __split_huge_pmd_locked() and
copy_huge_pmd() currently suffer from the same !vma_is_anonymous(vma)
top-level entanglement.I think these functions could benefit from a
structural refactoring similar to what Lorenzo is currently doing in
zap_huge_pmd().

My idea is to flatten both functions into a pmd_present()-driven
decision tree:
1. Branch strictly on pmd_present().
2. For present PMDs, rely exclusively on vm_normal_folio_pmd() to
determine the underlying memory type, rather than guessing from VMA flags.
3. If !folio (and not a huge zero page), it cleanly identifies special
mappings (like PFNMAPs) without relying on vma_is_special_huge(). We can
handle the split/copy directly and return early.
4. Otherwise, proceed with the normal Anon/File THP logic, or handle
non-present migration entries in the !pmd_present() branch.

I have drafted two preparation patches demonstrating this approach and
appended the diffs at the end of this email. Does this direction look
reasonable to you? If so, I will iron out the implementation details and
include these refactoring patches in my upcoming v4 series.

> 
>> +				zap_deposited_table(mm, pmd);
>> +				return;
>> +			}
>> +			pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>> +			if (unlikely(!pgtable))
>> +				return;
>> +			pmd_populate(mm, &_pmd, pgtable);
>> +			pte = pte_offset_map(&_pmd, haddr);
>> +			entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
>> +			set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
>> +			pte_unmap(pte);
>> +
>> +			smp_wmb(); /* make pte visible before pmd */
>> +			pmd_populate(mm, pmd, pgtable);
>> +			return;
>> +		}
>> +
>>  		/*
>>  		 * We are going to unmap this huge page. So
>>  		 * just go ahead and zap it
>>  		 */
>>  		if (arch_needs_pgtable_deposit())
>>  			zap_deposited_table(mm, pmd);
>> -		if (!vma_is_dax(vma) && vma_is_special_huge(vma))
>> -			return;
>> +
>>  		if (unlikely(pmd_is_migration_entry(old_pmd))) {
>>  			const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
>>  
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 07778814b4a8..affccf38cbcf 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2890,6 +2890,40 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
>>  	return err;
>>  }
>>  
>> +#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
> 
> Why exactly do we need arch support for that in form of a Kconfig.
> 
> Usually, we guard pmd support by CONFIG_TRANSPARENT_HUGEPAGE.
> 
> And then, we must check at runtime if PMD leaves are actually supported.
> 
> Luiz is working on a cleanup series:
> 
> https://lore.kernel.org/r/cover.1775679721.git.luizcap@redhat.com
> 
> pgtable_has_pmd_leaves() is what you would want to check.

Makes sense. This Kconfig was inherited from Peter Xu's earlier
proposal, but depending on CONFIG_TRANSPARENT_HUGEPAGE and
pgtable_has_pmd_leaves() is indeed the correct standard. I will rebase
on Luiz's series.

> 
> 
>> +static int remap_try_huge_pmd(struct mm_struct *mm, pmd_t *pmd,
>> +			unsigned long addr, unsigned long end,
>> +			unsigned long pfn, pgprot_t prot)
> 
> Use two-tab indent. (currently 3? :) )
> 
> Also, we tend to call these things now "pmd leaves". Call it
>  "remap_try_pmd_leaf" or something even more expressive like
> 
> "remap_try_install_pmd_leaf()"
> 

Noted. Will fix the indentation and rename it.

>> +{
>> +	pgtable_t pgtable;
>> +	spinlock_t *ptl;
>> +
>> +	if ((end - addr) != PMD_SIZE)
> 
> 	if (end - addr != PMD_SIZE)
> 
> Should work

Noted.

> 
>> +		return 0;
>> +
>> +	if (!IS_ALIGNED(addr, PMD_SIZE))
>> +		return 0;
>> +
> 
> You could likely combine both things into a
> 
> 	if (!IS_ALIGNED(addr | end, PMD_SIZE))
> 
>> +	if (!IS_ALIGNED(pfn, HPAGE_PMD_NR))
> 
> Another sign that you piggy-back on THP support ;)

Indeed! :)

> 
>> +		return 0;
>> +
>> +	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
>> +		return 0;
> 
> Ripping out a page table?! That doesn't sound right :)
> 
> Why is that required? We shouldn't be doing that here. Gah.
> 
> Especially, without any pmd locks etc.

...oops. That is indeed a silly one. Thanks for catching it.
I will fix this to:

	if (!pmd_none(*pmd))
		return 0;

> 
>> +
>> +	pgtable = pte_alloc_one(mm);
>> +	if (unlikely(!pgtable))
>> +		return 0;
>> +
>> +	mm_inc_nr_ptes(mm);
>> +	ptl = pmd_lock(mm, pmd);
>> +	set_pmd_at(mm, addr, pmd, pmd_mkspecial(pmd_mkhuge(pfn_pmd(pfn, prot))));
>> +	pgtable_trans_huge_deposit(mm, pmd, pgtable);
>> +	spin_unlock(ptl);
>> +
>> +	return 1;
>> +}
>> +#endif
>> +
>>  static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
>>  			unsigned long addr, unsigned long end,
>>  			unsigned long pfn, pgprot_t prot)
>> @@ -2905,6 +2939,12 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
>>  	VM_BUG_ON(pmd_trans_huge(*pmd));
>>  	do {
>>  		next = pmd_addr_end(addr, end);
>> +#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
>> +		if (remap_try_huge_pmd(mm, pmd, addr, next,
>> +				pfn + (addr >> PAGE_SHIFT), prot)) {
> 
> Please provide a stub instead so we don't end up with ifdef in this code.

Will do.

> 

Appendix:

Based on the mm-stable branch.

1. copy_huge_pmd()

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 42c983821c03..3f8b3f15c6ba 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1912,35 +1912,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm,
struct mm_struct *src_mm,
 		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	spinlock_t *dst_ptl, *src_ptl;
-	struct page *src_page;
 	struct folio *src_folio;
 	pmd_t pmd;
 	pgtable_t pgtable = NULL;
 	int ret = -ENOMEM;

-	pmd = pmdp_get_lockless(src_pmd);
-	if (unlikely(pmd_present(pmd) && pmd_special(pmd) &&
-		     !is_huge_zero_pmd(pmd))) {
-		dst_ptl = pmd_lock(dst_mm, dst_pmd);
-		src_ptl = pmd_lockptr(src_mm, src_pmd);
-		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
-		/*
-		 * No need to recheck the pmd, it can't change with write
-		 * mmap lock held here.
-		 *
-		 * Meanwhile, making sure it's not a CoW VMA with writable
-		 * mapping, otherwise it means either the anon page wrongly
-		 * applied special bit, or we made the PRIVATE mapping be
-		 * able to wrongly write to the backend MMIO.
-		 */
-		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
-		goto set_pmd;
-	}
-
-	/* Skip if can be re-fill on fault */
-	if (!vma_is_anonymous(dst_vma))
-		return 0;
-
 	pgtable = pte_alloc_one(dst_mm);
 	if (unlikely(!pgtable))
 		goto out;
@@ -1952,48 +1928,69 @@ int copy_huge_pmd(struct mm_struct *dst_mm,
struct mm_struct *src_mm,
 	ret = -EAGAIN;
 	pmd = *src_pmd;

-	if (unlikely(thp_migration_supported() &&
-		     pmd_is_valid_softleaf(pmd))) {
+	if (likely(pmd_present(pmd))) {
+		src_folio = vm_normal_folio_pmd(src_vma, addr, pmd);
+		if (unlikely(!src_folio)) {
+			/*
+			 * When page table lock is held, the huge zero pmd should not be
+			 * under splitting since we don't split the page itself, only pmd to
+			 * a page table.
+			 */
+			if (is_huge_zero_pmd(pmd)) {
+				/*
+				 * mm_get_huge_zero_folio() will never allocate a new
+				 * folio here, since we already have a zero page to
+				 * copy. It just takes a reference.
+				 */
+				mm_get_huge_zero_folio(dst_mm);
+				goto out_zero_page;
+			}
+
+			/*
+			 * Making sure it's not a CoW VMA with writable
+			 * mapping, otherwise it means either the anon page wrongly
+			 * applied special bit, or we made the PRIVATE mapping be
+			 * able to wrongly write to the backend MMIO.
+			 */
+			VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
+			pte_free(dst_mm, pgtable);
+			goto set_pmd;
+		}
+
+		if (!folio_test_anon(src_folio)) {
+			pte_free(dst_mm, pgtable);
+			ret = 0;
+			goto out_unlock;
+		}
+
+		folio_get(src_folio);
+		if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, &src_folio->page,
dst_vma, src_vma))) {
+			/* Page maybe pinned: split and retry the fault on PTEs. */
+			folio_put(src_folio);
+			pte_free(dst_mm, pgtable);
+			spin_unlock(src_ptl);
+			spin_unlock(dst_ptl);
+			__split_huge_pmd(src_vma, src_pmd, addr, false);
+			return -EAGAIN;
+		}
+		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+
+	} else if (unlikely(thp_migration_supported() &&
pmd_is_valid_softleaf(pmd))) {
+		if (unlikely(!vma_is_anonymous(dst_vma))) {
+			pte_free(dst_mm, pgtable);
+			ret = 0;
+			goto out_unlock;
+		}
 		copy_huge_non_present_pmd(dst_mm, src_mm, dst_pmd, src_pmd, addr,
 					  dst_vma, src_vma, pmd, pgtable);
 		ret = 0;
 		goto out_unlock;
-	}

-	if (unlikely(!pmd_trans_huge(pmd))) {
+	} else {
 		pte_free(dst_mm, pgtable);
 		goto out_unlock;
 	}
-	/*
-	 * When page table lock is held, the huge zero pmd should not be
-	 * under splitting since we don't split the page itself, only pmd to
-	 * a page table.
-	 */
-	if (is_huge_zero_pmd(pmd)) {
-		/*
-		 * mm_get_huge_zero_folio() will never allocate a new
-		 * folio here, since we already have a zero page to
-		 * copy. It just takes a reference.
-		 */
-		mm_get_huge_zero_folio(dst_mm);
-		goto out_zero_page;
-	}

-	src_page = pmd_page(pmd);
-	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
-	src_folio = page_folio(src_page);
-
-	folio_get(src_folio);
-	if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, dst_vma,
src_vma))) {
-		/* Page maybe pinned: split and retry the fault on PTEs. */
-		folio_put(src_folio);
-		pte_free(dst_mm, pgtable);
-		spin_unlock(src_ptl);
-		spin_unlock(dst_ptl);
-		__split_huge_pmd(src_vma, src_pmd, addr, false);
-		return -EAGAIN;
-	}
-	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);


2. __split_huge_pmd_locked()

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3f8b3f15c6ba..c02c2843520f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3090,98 +3090,50 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,

 	count_vm_event(THP_SPLIT_PMD);

-	if (!vma_is_anonymous(vma)) {
-		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
-		/*
-		 * We are going to unmap this huge page. So
-		 * just go ahead and zap it
-		 */
-		if (arch_needs_pgtable_deposit())
-			zap_deposited_table(mm, pmd);
-		if (vma_is_special_huge(vma))
-			return;
-		if (unlikely(pmd_is_migration_entry(old_pmd))) {
-			const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
+	if (pmd_present(*pmd)) {
+		folio = vm_normal_folio_pmd(vma, haddr, *pmd);

-			folio = softleaf_to_folio(old_entry);
-		} else if (is_huge_zero_pmd(old_pmd)) {
+		if (unlikely(!folio)) {
+			/* Huge Zero Page */
+			if (is_huge_zero_pmd(*pmd))
+				/*
+				 * FIXME: Do we want to invalidate secondary mmu by calling
+				 * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below
+				 * inside __split_huge_pmd() ?
+				 *
+				 * We are going from a zero huge page write protected to zero
+				 * small page also write protected so it does not seems useful
+				 * to invalidate secondary mmu at this time.
+				 */
+				return __split_huge_zero_page_pmd(vma, haddr, pmd);
+
+			/* Huge PFNMAP */
+			old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+			if (arch_needs_pgtable_deposit())
+				zap_deposited_table(mm, pmd);
 			return;
-		} else {
+		}
+
+		/* File/Shmem THP */
+		if (unlikely(!folio_test_anon(folio))) {
+			old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+			if (arch_needs_pgtable_deposit())
+				zap_deposited_table(mm, pmd);
+			if (vma_is_special_huge(vma))
+				return;
+
 			page = pmd_page(old_pmd);
-			folio = page_folio(page);
 			if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
 				folio_mark_dirty(folio);
 			if (!folio_test_referenced(folio) && pmd_young(old_pmd))
 				folio_set_referenced(folio);
 			folio_remove_rmap_pmd(folio, page, vma);
 			folio_put(folio);
+			add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
+			return;
 		}
-		add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
-		return;
-	}
-
-	if (is_huge_zero_pmd(*pmd)) {
-		/*
-		 * FIXME: Do we want to invalidate secondary mmu by calling
-		 * mmu_notifier_arch_invalidate_secondary_tlbs() see comments below
-		 * inside __split_huge_pmd() ?
-		 *
-		 * We are going from a zero huge page write protected to zero
-		 * small page also write protected so it does not seems useful
-		 * to invalidate secondary mmu at this time.
-		 */
-		return __split_huge_zero_page_pmd(vma, haddr, pmd);
-	}
-
-	if (pmd_is_migration_entry(*pmd)) {
-		softleaf_t entry;
-
-		old_pmd = *pmd;
-		entry = softleaf_from_pmd(old_pmd);
-		page = softleaf_to_page(entry);
-		folio = page_folio(page);
-
-		soft_dirty = pmd_swp_soft_dirty(old_pmd);
-		uffd_wp = pmd_swp_uffd_wp(old_pmd);
-
-		write = softleaf_is_migration_write(entry);
-		if (PageAnon(page))
-			anon_exclusive = softleaf_is_migration_read_exclusive(entry);
-		young = softleaf_is_migration_young(entry);
-		dirty = softleaf_is_migration_dirty(entry);
-	} else if (pmd_is_device_private_entry(*pmd)) {
-		softleaf_t entry;
-
-		old_pmd = *pmd;
-		entry = softleaf_from_pmd(old_pmd);
-		page = softleaf_to_page(entry);
-		folio = page_folio(page);
-
-		soft_dirty = pmd_swp_soft_dirty(old_pmd);
-		uffd_wp = pmd_swp_uffd_wp(old_pmd);
-
-		write = softleaf_is_device_private_write(entry);
-		anon_exclusive = PageAnonExclusive(page);

-		/*
-		 * Device private THP should be treated the same as regular
-		 * folios w.r.t anon exclusive handling. See the comments for
-		 * folio handling and anon_exclusive below.
-		 */
-		if (freeze && anon_exclusive &&
-		    folio_try_share_anon_rmap_pmd(folio, page))
-			freeze = false;
-		if (!freeze) {
-			rmap_t rmap_flags = RMAP_NONE;
-
-			folio_ref_add(folio, HPAGE_PMD_NR - 1);
-			if (anon_exclusive)
-				rmap_flags |= RMAP_EXCLUSIVE;
-
-			folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
-						 vma, haddr, rmap_flags);
-		}
-	} else {
+		/* Anon THP */
 		/*
 		 * Up to this point the pmd is present and huge and userland has
 		 * the whole access to the hugepage during the split (which
@@ -3207,7 +3159,6 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,
 		 */
 		old_pmd = pmdp_invalidate(vma, haddr, pmd);
 		page = pmd_page(old_pmd);
-		folio = page_folio(page);
 		if (pmd_dirty(old_pmd)) {
 			dirty = true;
 			folio_set_dirty(folio);
@@ -3218,8 +3169,6 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,
 		uffd_wp = pmd_uffd_wp(old_pmd);

 		VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
-		VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
-
 		/*
 		 * Without "freeze", we'll simply split the PMD, propagating the
 		 * PageAnonExclusive() flag for each PTE by setting it for
@@ -3236,17 +3185,82 @@ static void __split_huge_pmd_locked(struct
vm_area_struct *vma, pmd_t *pmd,
 		 * See folio_try_share_anon_rmap_pmd(): invalidate PMD first.
 		 */
 		anon_exclusive = PageAnonExclusive(page);
-		if (freeze && anon_exclusive &&
-		    folio_try_share_anon_rmap_pmd(folio, page))
+		if (freeze && anon_exclusive && folio_try_share_anon_rmap_pmd(folio,
page))
 			freeze = false;
 		if (!freeze) {
 			rmap_t rmap_flags = RMAP_NONE;
-
 			folio_ref_add(folio, HPAGE_PMD_NR - 1);
 			if (anon_exclusive)
 				rmap_flags |= RMAP_EXCLUSIVE;
-			folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
-						 vma, haddr, rmap_flags);
+			folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr,
rmap_flags);
+		}
+	} else { /* pmd not present */
+		folio = pmd_to_softleaf_folio(*pmd);
+		if (unlikely(!folio))
+			return;
+
+		/* Migration of File/Shmem THP */
+		if (unlikely(!folio_test_anon(folio))) {
+			old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
+			if (arch_needs_pgtable_deposit())
+				zap_deposited_table(mm, pmd);
+			if (vma_is_special_huge(vma))
+				return;
+			add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
+			return;
+		}
+
+		/* Migration of Anon THP or Device Private*/
+		if (pmd_is_migration_entry(*pmd)) {
+			softleaf_t entry;
+
+			old_pmd = *pmd;
+			entry = softleaf_from_pmd(old_pmd);
+			page = softleaf_to_page(entry);
+			folio = page_folio(page);
+
+			soft_dirty = pmd_swp_soft_dirty(old_pmd);
+			uffd_wp = pmd_swp_uffd_wp(old_pmd);
+
+			write = softleaf_is_migration_write(entry);
+			if (PageAnon(page))
+				anon_exclusive = softleaf_is_migration_read_exclusive(entry);
+			young = softleaf_is_migration_young(entry);
+			dirty = softleaf_is_migration_dirty(entry);
+		} else if (pmd_is_device_private_entry(*pmd)) {
+			softleaf_t entry;
+
+			old_pmd = *pmd;
+			entry = softleaf_from_pmd(old_pmd);
+			page = softleaf_to_page(entry);
+
+			soft_dirty = pmd_swp_soft_dirty(old_pmd);
+			uffd_wp = pmd_swp_uffd_wp(old_pmd);
+
+			write = softleaf_is_device_private_write(entry);
+			anon_exclusive = PageAnonExclusive(page);
+
+			/*
+			* Device private THP should be treated the same as regular
+			* folios w.r.t anon exclusive handling. See the comments for
+			* folio handling and anon_exclusive below.
+			*/
+			if (freeze && anon_exclusive &&
+				folio_try_share_anon_rmap_pmd(folio, page))
+				freeze = false;
+			if (!freeze) {
+				rmap_t rmap_flags = RMAP_NONE;
+
+				folio_ref_add(folio, HPAGE_PMD_NR - 1);
+				if (anon_exclusive)
+					rmap_flags |= RMAP_EXCLUSIVE;
+
+				folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
+							vma, haddr, rmap_flags);
+			}
+		} else {
+			VM_WARN_ONCE(1, "unknown situation.");
+			return;
 		}
 	}

-- 
2.43.0


-- 
Yin Tirui



^ permalink raw reply related

* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Yeoreum Yun @ 2026-04-19 11:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe, jarkko,
	jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
In-Reply-To: <87pl3vb5bm.wl-maz@kernel.org>

Hi Marc,

> On Sat, 18 Apr 2026 11:34:30 +0100,
> Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> >
> > > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > > >  	u32 buf_sz;
> > > >  	size_t rxtx_bufsz = SZ_4K;
> > > >
> > > > +	/*
> > > > +	 * When pKVM is enabled, the FF-A driver must be initialized
> > > > +	 * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > > +	 * the FF-A version or obtain RX/TX buffer information,
> > > > +	 * which leads to failures in FF-A calls.
> > > > +	 */
> > > > +	if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > > +	    !is_kvm_arm_initialised())
> > > > +		return -EPROBE_DEFER;
> > > > +
> > >
> > > That's still fundamentally wrong: pkvm is not ready until
> > > finalize_pkvm() has finished, and that's not indicated by
> > > is_kvm_arm_initialised().
> >
> > Thanks. I miss the TSC bit set in here.
>
> That's the least of the problems. None of the infrastructure is in
> place at this stage...
>
> > IMHO, I'd like to make an new state check function --
> > is_pkvm_arm_initialised() so that ff-a driver to know whether
> > pkvm is initialised.
>
> Doesn't sound great, TBH.
>
> > or any other suggestion?
>
> Instead of adding more esoteric predicates, I'd rather you build on an
> existing infrastructure. You have a dependency on KVM, use something
> that is designed to enforce dependencies. Device links spring to mind
> as something designed for that.
>
> Can you look into enabling this for KVM? If that's possible, then it
> should be easy enough to delay the actual KVM registration after pKVM
> is finalised.

or what about some event notifier? Just like:

----------&<-----------

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b51ab6840f9c..ad038a3b8727 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -68,6 +68,8 @@
 #include <asm/sysreg.h>
 #include <asm/cpufeature.h>

+struct notifier_block;
+
 /*
  * __boot_cpu_mode records what mode CPUs were booted in.
  * A correctly-implemented bootloader must start all CPUs in the same mode:
@@ -166,6 +168,15 @@ static inline bool is_hyp_nvhe(void)
 	return is_hyp_mode_available() && !is_kernel_in_hyp_mode();
 }

+enum kvm_arm_event {
+	PKVM_INITIALISED,
+	KVM_ARM_EVENT_MAX,
+};
+
+extern int kvm_arm_event_notifier_call_chain(enum kvm_arm_event event, void *data);
+extern int kvm_arm_event_notifier_register(struct notifier_block *nb);
+extern int kvm_arm_event_notifier_unregister(struct notifier_block *nb);
+
 #endif /* __ASSEMBLER__ */

 #endif /* ! __ASM__VIRT_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73..8da10049ab65 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -14,6 +14,7 @@
 #include <linux/vmalloc.h>
 #include <linux/fs.h>
 #include <linux/mman.h>
+#include <linux/notifier.h>
 #include <linux/sched.h>
 #include <linux/kvm.h>
 #include <linux/kvm_irqfd.h>
@@ -111,6 +112,8 @@ DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);

 DECLARE_KVM_NVHE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);

+BLOCKING_NOTIFIER_HEAD(kvm_arm_event_notifier_head);
+
 static bool vgic_present, kvm_arm_initialised;

 static DEFINE_PER_CPU(unsigned char, kvm_hyp_initialized);
@@ -3064,4 +3067,22 @@ enum kvm_mode kvm_get_mode(void)
 	return kvm_mode;
 }

+int kvm_arm_event_notifier_call_chain(enum kvm_arm_event event, void *data)
+{
+	return blocking_notifier_call_chain(&kvm_arm_event_notifier_head,
+					    event, data);
+}
+
+int kvm_arm_event_notifier_register(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&kvm_arm_event_notifier_head, nb);
+}
+EXPORT_SYMBOL_GPL(kvm_arm_event_notifier_register);
+
+int kvm_arm_event_notifier_unregister(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&kvm_arm_event_notifier_head, nb);
+}
+EXPORT_SYMBOL_GPL(kvm_arm_event_notifier_unregister);
+
 module_init(kvm_arm_init);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d7a0f69a9982..e76562b0a45a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -280,6 +280,8 @@ static int __init finalize_pkvm(void)
 	ret = pkvm_drop_host_privileges();
 	if (ret)
 		pr_err("Failed to finalize Hyp protection: %d\n", ret);
+	else
+		kvm_arm_event_notifier_call_chain(PKVM_INITIALISED, NULL);

 	return ret;
 }
diff --git a/drivers/firmware/arm_ffa/common.h b/drivers/firmware/arm_ffa/common.h
index 9c6425a81d0d..5cdf4bd222c6 100644
--- a/drivers/firmware/arm_ffa/common.h
+++ b/drivers/firmware/arm_ffa/common.h
@@ -18,9 +18,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
 void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid);

 #ifdef CONFIG_ARM_FFA_SMCCC
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn);
+int ffa_transport_init(ffa_fn **invoke_ffa_fn);
 #else
-static inline int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+static inline int ffa_transport_init(ffa_fn **invoke_ffa_fn)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 02c76ac1570b..67df053e65b8 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -35,6 +35,7 @@
 #include <linux/module.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/notifier.h>
 #include <linux/of_irq.h>
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
@@ -42,6 +43,8 @@
 #include <linux/uuid.h>
 #include <linux/xarray.h>

+#include <asm/virt.h>
+
 #include "common.h"

 #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
@@ -2029,7 +2032,7 @@ static void ffa_notifications_setup(void)
 	ffa_notifications_cleanup();
 }

-static int __init ffa_init(void)
+static int __ffa_init(void)
 {
 	int ret;
 	u32 buf_sz;
@@ -2105,11 +2108,42 @@ static int __init ffa_init(void)
 free_drv_info:
 	kfree(drv_info);
 	return ret;
+
+}
+
+static int ffa_kvm_arm_event_handler(struct notifier_block *nb,
+				     unsigned long event, void *unused)
+{
+	if (event == PKVM_INITIALISED)
+		__ffa_init();
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block ffa_kvm_arm_event_notifier = {
+	.notifier_call = ffa_kvm_arm_event_handler,
+};
+
+static int __init ffa_init(void)
+{
+	/*
+	 * When pKVM is enabled, the FF-A driver must be initialized
+	 * after pKVM initialization. Otherwise, pKVM cannot negotiate
+	 * the FF-A version or obtain RX/TX buffer information,
+	 * which leads to failures in FF-A calls.
+	 */
+	if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
+	    !is_pkvm_initialized())
+		return kvm_arm_event_notifier_register(&ffa_kvm_arm_event_notifier);
+
+	return __ffa_init();
 }
 device_initcall(ffa_init);

 static void __exit ffa_exit(void)
 {
+	if (IS_ENABLED(CONFIG_KVM))
+		kvm_arm_event_notifier_unregister(&ffa_kvm_arm_event_notifier);
 	ffa_notifications_cleanup();
 	ffa_partitions_cleanup();
 	ffa_rxtx_unmap();
diff --git a/drivers/firmware/arm_ffa/smccc.c b/drivers/firmware/arm_ffa/smccc.c
index 4d85bfff0a4e..e6125dd9f58f 100644
--- a/drivers/firmware/arm_ffa/smccc.c
+++ b/drivers/firmware/arm_ffa/smccc.c
@@ -17,7 +17,7 @@ static void __arm_ffa_fn_hvc(ffa_value_t args, ffa_value_t *res)
 	arm_smccc_1_2_hvc(&args, res);
 }

-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+int ffa_transport_init(ffa_fn **invoke_ffa_fn)
 {
 	enum arm_smccc_conduit conduit;


> --
> Jazz isn't dead. It just smells funny.

--
Sincerely,
Yeoreum Yun


^ permalink raw reply related

* Re: [PATCH v3 4/4] clk: rockchip: rk3588: add GATE_GRF clocks for I2S MCLK output to IO
From: Heiko Stuebner @ 2026-04-19 10:56 UTC (permalink / raw)
  To: Michael Turquette, Stephen Boyd, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Daniele Briguglio
  Cc: Nicolas Frattaroli, linux-clk, devicetree, linux-arm-kernel,
	linux-rockchip, linux-kernel, Daniele Briguglio, Ricardo Pardini
In-Reply-To: <20260320-rk3588-mclk-gate-grf-v3-4-980338eacd2c@superkali.me>

Hi Daniele,

Am Freitag, 20. März 2026, 11:34:16 Mitteleuropäische Sommerzeit schrieb Daniele Briguglio:
> The I2S MCLK outputs on RK3588 are gated by bits in the SYS_GRF
> register SOC_CON6 (offset 0x318). These gates control whether the
> internal CRU MCLK signals reach the external IO pins connected to
> audio codecs.
> 
> The kernel should explicitly manage these gates so that audio
> functionality does not depend on bootloader register state. This is
> analogous to what was done for RK3576 SAI MCLK outputs [1].
> 
> Register the SYS_GRF as an auxiliary GRF with grf_type_sys in the
> early clock init, and add GATE_GRF entries for all four I2S MCLK
> output gates:
> 
>   - I2S0_8CH_MCLKOUT_TO_IO (bit 0)
>   - I2S1_8CH_MCLKOUT_TO_IO (bit 1)
>   - I2S2_2CH_MCLKOUT_TO_IO (bit 2)
>   - I2S3_2CH_MCLKOUT_TO_IO (bit 7)
> 
> Board DTS files that need MCLK on an IO pin can reference these
> clocks, e.g.:
> 
>     clocks = <&cru I2S0_8CH_MCLKOUT_TO_IO>;
> 
> Tested on the Youyeetoo YY3588 (RK3588) with an ES8388 codec on I2S0.
> 
> [1] https://lore.kernel.org/r/20250305-rk3576-sai-v1-2-64e6cf863e9a@collabora.com/
> 
> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> Tested-by: Ricardo Pardini <ricardo@pardini.net>
> Signed-off-by: Daniele Briguglio <hello@superkali.me>
> ---
>  drivers/clk/rockchip/clk-rk3588.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/clk/rockchip/clk-rk3588.c b/drivers/clk/rockchip/clk-rk3588.c
> index 1694223f4f84..2cc85fb5b2cc 100644
> --- a/drivers/clk/rockchip/clk-rk3588.c
> +++ b/drivers/clk/rockchip/clk-rk3588.c
> @@ -5,11 +5,14 @@
>   */
>  
>  #include <linux/clk-provider.h>
> +#include <linux/mfd/syscon.h>
>  #include <linux/of.h>
> +#include <linux/slab.h>
>  #include <linux/of_address.h>
>  #include <linux/platform_device.h>
>  #include <linux/syscore_ops.h>
>  #include <dt-bindings/clock/rockchip,rk3588-cru.h>
> +#include <soc/rockchip/rk3588_grf.h>
>  #include "clk.h"
>  
>  #define RK3588_GRF_SOC_STATUS0		0x600
> @@ -892,6 +895,8 @@ static struct rockchip_clk_branch rk3588_early_clk_branches[] __initdata = {
>  			RK3588_CLKGATE_CON(8), 0, GFLAGS),
>  	MUX(I2S2_2CH_MCLKOUT, "i2s2_2ch_mclkout", i2s2_2ch_mclkout_p, CLK_SET_RATE_PARENT,
>  			RK3588_CLKSEL_CON(30), 2, 1, MFLAGS),
> +	GATE_GRF(I2S2_2CH_MCLKOUT_TO_IO, "i2s2_2ch_mclkout_to_io", "i2s2_2ch_mclkout",
> +			0, RK3588_SYSGRF_SOC_CON6, 2, GFLAGS, grf_type_sys),
>  
>  	COMPOSITE(CLK_I2S3_2CH_SRC, "clk_i2s3_2ch_src", gpll_aupll_p, 0,
>  			RK3588_CLKSEL_CON(30), 8, 1, MFLAGS, 3, 5, DFLAGS,
> @@ -907,6 +912,8 @@ static struct rockchip_clk_branch rk3588_early_clk_branches[] __initdata = {
>  			RK3588_CLKGATE_CON(8), 4, GFLAGS),
>  	MUX(I2S3_2CH_MCLKOUT, "i2s3_2ch_mclkout", i2s3_2ch_mclkout_p, CLK_SET_RATE_PARENT,
>  			RK3588_CLKSEL_CON(32), 2, 1, MFLAGS),
> +	GATE_GRF(I2S3_2CH_MCLKOUT_TO_IO, "i2s3_2ch_mclkout_to_io", "i2s3_2ch_mclkout",
> +			0, RK3588_SYSGRF_SOC_CON6, 7, GFLAGS, grf_type_sys),
>  	GATE(PCLK_ACDCDIG, "pclk_acdcdig", "pclk_audio_root", 0,
>  			RK3588_CLKGATE_CON(7), 11, GFLAGS),
>  	GATE(HCLK_I2S0_8CH, "hclk_i2s0_8ch", "hclk_audio_root", 0,
> @@ -935,6 +942,8 @@ static struct rockchip_clk_branch rk3588_early_clk_branches[] __initdata = {
>  			RK3588_CLKGATE_CON(7), 10, GFLAGS),
>  	MUX(I2S0_8CH_MCLKOUT, "i2s0_8ch_mclkout", i2s0_8ch_mclkout_p, CLK_SET_RATE_PARENT,
>  			RK3588_CLKSEL_CON(28), 2, 2, MFLAGS),
> +	GATE_GRF(I2S0_8CH_MCLKOUT_TO_IO, "i2s0_8ch_mclkout_to_io", "i2s0_8ch_mclkout",
> +			0, RK3588_SYSGRF_SOC_CON6, 0, GFLAGS, grf_type_sys),
>  
>  	GATE(HCLK_PDM1, "hclk_pdm1", "hclk_audio_root", 0,
>  			RK3588_CLKGATE_CON(9), 6, GFLAGS),
> @@ -2220,6 +2229,8 @@ static struct rockchip_clk_branch rk3588_early_clk_branches[] __initdata = {
>  			RK3588_PMU_CLKGATE_CON(2), 13, GFLAGS),
>  	MUX(I2S1_8CH_MCLKOUT, "i2s1_8ch_mclkout", i2s1_8ch_mclkout_p, CLK_SET_RATE_PARENT,
>  			RK3588_PMU_CLKSEL_CON(9), 2, 2, MFLAGS),
> +	GATE_GRF(I2S1_8CH_MCLKOUT_TO_IO, "i2s1_8ch_mclkout_to_io", "i2s1_8ch_mclkout",
> +			0, RK3588_SYSGRF_SOC_CON6, 1, GFLAGS, grf_type_sys),
>  	GATE(PCLK_PMU1, "pclk_pmu1", "pclk_pmu0_root", CLK_IS_CRITICAL,
>  			RK3588_PMU_CLKGATE_CON(1), 0, GFLAGS),
>  	GATE(CLK_DDR_FAIL_SAFE, "clk_ddr_fail_safe", "clk_pmu0", CLK_IGNORE_UNUSED,
> @@ -2439,6 +2450,8 @@ static struct rockchip_clk_branch rk3588_clk_branches[] = {
>  static void __init rk3588_clk_early_init(struct device_node *np)
>  {
>  	struct rockchip_clk_provider *ctx;
> +	struct rockchip_aux_grf *sys_grf_e;
> +	struct regmap *sys_grf;
>  	unsigned long clk_nr_clks, max_clk_id1, max_clk_id2;
>  	void __iomem *reg_base;
>  
> @@ -2479,6 +2492,17 @@ static void __init rk3588_clk_early_init(struct device_node *np)
>  			&rk3588_cpub1clk_data, rk3588_cpub1clk_rates,
>  			ARRAY_SIZE(rk3588_cpub1clk_rates));
>  
> +	/* Register SYS_GRF for I2S MCLK output to IO gate clocks */
> +	sys_grf = syscon_regmap_lookup_by_compatible("rockchip,rk3588-sys-grf");
> +	if (!IS_ERR(sys_grf)) {
> +		sys_grf_e = kzalloc_obj(*sys_grf_e);
> +		if (sys_grf_e) {
> +			sys_grf_e->grf = sys_grf;
> +			sys_grf_e->type = grf_type_sys;
> +			hash_add(ctx->aux_grf_table, &sys_grf_e->node, grf_type_sys);
> +		}
> +	}
> +

sorry, took me a bit to articulate, what "issue" I have with this, which
is only that it open-codes adding GRFs. I.e. over time this likely won't
be the only place this might happen, so I envision a more central
function in the rockchip clock code, aka something like:

(1)
rockchip_clk_add_grf(struct rockchip_clk_provider *ctx,
		struct regmap *grf, enum rockchip_grf_type type)


I'm still unsure, if we want the sycon lookup also in there, like:

(2)
rockchip_clk_add_grf(struct rockchip_clk_provider *ctx,
		const char *compat, enum rockchip_grf_type type)

but then we would end up having to also define if it's optional, so I
guess variant (1) is the nicer one, as it at least abstracts away all
the struct rockchip_aux_grf handling from the clock driver itself.


Heiko


>  	rockchip_clk_register_branches(ctx, rk3588_early_clk_branches,
>  				       ARRAY_SIZE(rk3588_early_clk_branches));
>  
> 
> 






^ permalink raw reply

* [PATCH net v2] net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
From: Daniel Golle @ 2026-04-19 10:43 UTC (permalink / raw)
  To: Chester A. Unal, Daniel Golle, Andrew Lunn, Vladimir Oltean,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Russell King,
	Christian Marangi, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek
  Cc: Frank Wunderlich, Christian Marangi (Ansuel), John Crispin

The .get_stats64 callback runs in atomic context, but on
MDIO-connected switches every register read acquires the MDIO bus
mutex, which can sleep:
[   12.645973] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:609
[   12.654442] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 759, name: grep
[   12.663377] preempt_count: 0, expected: 0
[   12.667410] RCU nest depth: 1, expected: 0
[   12.671511] INFO: lockdep is turned off.
[   12.675441] CPU: 0 UID: 0 PID: 759 Comm: grep Tainted: G S      W           7.0.0+ #0 PREEMPT
[   12.675453] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[   12.675456] Hardware name: Bananapi BPI-R64 (DT)
[   12.675459] Call trace:
[   12.675462]  show_stack+0x14/0x1c (C)
[   12.675477]  dump_stack_lvl+0x68/0x8c
[   12.675487]  dump_stack+0x14/0x1c
[   12.675495]  __might_resched+0x14c/0x220
[   12.675504]  __might_sleep+0x44/0x80
[   12.675511]  __mutex_lock+0x50/0xb10
[   12.675523]  mutex_lock_nested+0x20/0x30
[   12.675532]  mt7530_get_stats64+0x40/0x2ac
[   12.675542]  dsa_user_get_stats64+0x2c/0x40
[   12.675553]  dev_get_stats+0x44/0x1e0
[   12.675564]  dev_seq_printf_stats+0x24/0xe0
[   12.675575]  dev_seq_show+0x14/0x3c
[   12.675583]  seq_read_iter+0x37c/0x480
[   12.675595]  seq_read+0xd0/0xec
[   12.675605]  proc_reg_read+0x94/0xe4
[   12.675615]  vfs_read+0x98/0x29c
[   12.675625]  ksys_read+0x54/0xdc
[   12.675633]  __arm64_sys_read+0x18/0x20
[   12.675642]  invoke_syscall.constprop.0+0x54/0xec
[   12.675653]  do_el0_svc+0x3c/0xb4
[   12.675662]  el0_svc+0x38/0x200
[   12.675670]  el0t_64_sync_handler+0x98/0xdc
[   12.675679]  el0t_64_sync+0x158/0x15c

For MDIO-connected switches, poll MIB counters asynchronously using a
delayed workqueue every second and let .get_stats64 return the cached
values under a spinlock. A mod_delayed_work() call on each read
triggers an immediate refresh so counters stay responsive when queried
more frequently.

MMIO-connected switches (MT7988, EN7581, AN7583) are not affected
because their regmap does not sleep, so they continue to read MIB
counters directly in .get_stats64.

Fixes: 88c810f35ed5 ("net: dsa: mt7530: implement .get_stats64")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Chester A. Unal <chester.a.unal@arinc9.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
v2:
 * use spin_lock_bh()/spin_unlock_bh() to prevent potential deadlock
 * rate-limit mod_delayed_work() refresh to at most once per 100ms
 * move cancel_delayed_work_sync() after dsa_unregister_switch()
 * add mt753x_teardown() callback to cancel the stats work
 * fix commit message

 drivers/net/dsa/mt7530.c | 66 ++++++++++++++++++++++++++++++++++++++--
 drivers/net/dsa/mt7530.h |  8 +++++
 2 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index b9423389c2ef0..8c1186ba2279b 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -25,6 +25,9 @@
 
 #include "mt7530.h"
 
+#define MT7530_STATS_POLL_INTERVAL	(1 * HZ)
+#define MT7530_STATS_RATE_LIMIT		(HZ / 10)
+
 static struct mt753x_pcs *pcs_to_mt753x_pcs(struct phylink_pcs *pcs)
 {
 	return container_of(pcs, struct mt753x_pcs, pcs);
@@ -906,10 +909,9 @@ static void mt7530_get_rmon_stats(struct dsa_switch *ds, int port,
 	*ranges = mt7530_rmon_ranges;
 }
 
-static void mt7530_get_stats64(struct dsa_switch *ds, int port,
-			       struct rtnl_link_stats64 *storage)
+static void mt7530_read_port_stats64(struct mt7530_priv *priv, int port,
+				     struct rtnl_link_stats64 *storage)
 {
-	struct mt7530_priv *priv = ds->priv;
 	uint64_t data;
 
 	/* MIB counter doesn't provide a FramesTransmittedOK but instead
@@ -951,6 +953,45 @@ static void mt7530_get_stats64(struct dsa_switch *ds, int port,
 			       &storage->rx_crc_errors);
 }
 
+static void mt7530_stats_poll(struct work_struct *work)
+{
+	struct mt7530_priv *priv = container_of(work, struct mt7530_priv,
+						stats_work.work);
+	struct rtnl_link_stats64 stats = {};
+	struct dsa_port *dp;
+	int port;
+
+	dsa_switch_for_each_user_port(dp, priv->ds) {
+		port = dp->index;
+
+		mt7530_read_port_stats64(priv, port, &stats);
+
+		spin_lock_bh(&priv->stats_lock);
+		priv->ports[port].stats = stats;
+		spin_unlock_bh(&priv->stats_lock);
+	}
+
+	priv->stats_last = jiffies;
+	schedule_delayed_work(&priv->stats_work,
+			      MT7530_STATS_POLL_INTERVAL);
+}
+
+static void mt7530_get_stats64(struct dsa_switch *ds, int port,
+			       struct rtnl_link_stats64 *storage)
+{
+	struct mt7530_priv *priv = ds->priv;
+
+	if (priv->bus) {
+		spin_lock_bh(&priv->stats_lock);
+		*storage = priv->ports[port].stats;
+		spin_unlock_bh(&priv->stats_lock);
+		if (time_after(jiffies, priv->stats_last + MT7530_STATS_RATE_LIMIT))
+			mod_delayed_work(system_wq, &priv->stats_work, 0);
+	} else {
+		mt7530_read_port_stats64(priv, port, storage);
+	}
+}
+
 static void mt7530_get_eth_ctrl_stats(struct dsa_switch *ds, int port,
 				      struct ethtool_eth_ctrl_stats *ctrl_stats)
 {
@@ -3137,9 +3178,24 @@ mt753x_setup(struct dsa_switch *ds)
 	if (ret && priv->irq_domain)
 		mt7530_free_mdio_irq(priv);
 
+	if (!ret && priv->bus) {
+		spin_lock_init(&priv->stats_lock);
+		INIT_DELAYED_WORK(&priv->stats_work, mt7530_stats_poll);
+		schedule_delayed_work(&priv->stats_work,
+				      MT7530_STATS_POLL_INTERVAL);
+	}
+
 	return ret;
 }
 
+static void mt753x_teardown(struct dsa_switch *ds)
+{
+	struct mt7530_priv *priv = ds->priv;
+
+	if (priv->bus)
+		cancel_delayed_work_sync(&priv->stats_work);
+}
+
 static int mt753x_set_mac_eee(struct dsa_switch *ds, int port,
 			      struct ethtool_keee *e)
 {
@@ -3257,6 +3313,7 @@ static int mt7988_setup(struct dsa_switch *ds)
 static const struct dsa_switch_ops mt7530_switch_ops = {
 	.get_tag_protocol	= mtk_get_tag_protocol,
 	.setup			= mt753x_setup,
+	.teardown		= mt753x_teardown,
 	.preferred_default_local_cpu_port = mt753x_preferred_default_local_cpu_port,
 	.get_strings		= mt7530_get_strings,
 	.get_ethtool_stats	= mt7530_get_ethtool_stats,
@@ -3409,6 +3466,9 @@ mt7530_remove_common(struct mt7530_priv *priv)
 
 	dsa_unregister_switch(priv->ds);
 
+	if (priv->bus)
+		cancel_delayed_work_sync(&priv->stats_work);
+
 	mutex_destroy(&priv->reg_mutex);
 }
 EXPORT_SYMBOL_GPL(mt7530_remove_common);
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index 3e0090bed298d..dd33b0df3419e 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -796,6 +796,7 @@ struct mt7530_fdb {
  * @pvid:	The VLAN specified is to be considered a PVID at ingress.  Any
  *		untagged frames will be assigned to the related VLAN.
  * @sgmii_pcs:	Pointer to PCS instance for SerDes ports
+ * @stats:	Cached port statistics for MDIO-connected switches
  */
 struct mt7530_port {
 	bool enable;
@@ -803,6 +804,7 @@ struct mt7530_port {
 	u32 pm;
 	u16 pvid;
 	struct phylink_pcs *sgmii_pcs;
+	struct rtnl_link_stats64 stats;
 };
 
 /* Port 5 mode definitions of the MT7530 switch */
@@ -875,6 +877,9 @@ struct mt753x_info {
  * @create_sgmii:	Pointer to function creating SGMII PCS instance(s)
  * @active_cpu_ports:	Holding the active CPU ports
  * @mdiodev:		The pointer to the MDIO device structure
+ * @stats_lock:		Protects cached per-port stats from concurrent access
+ * @stats_work:		Delayed work for polling MIB counters on MDIO switches
+ * @stats_last:		Jiffies timestamp of last MIB counter poll
  */
 struct mt7530_priv {
 	struct device		*dev;
@@ -900,6 +905,9 @@ struct mt7530_priv {
 	int (*create_sgmii)(struct mt7530_priv *priv);
 	u8 active_cpu_ports;
 	struct mdio_device *mdiodev;
+	spinlock_t stats_lock; /* protects cached stats counters */
+	struct delayed_work stats_work;
+	unsigned long stats_last;
 };
 
 struct mt7530_hw_vlan_entry {
-- 
2.53.0


^ permalink raw reply related

* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Marc Zyngier @ 2026-04-19 10:41 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe, jarkko,
	jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
In-Reply-To: <aeNeNjfO7i128TIP@e129823.arm.com>

On Sat, 18 Apr 2026 11:34:30 +0100,
Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> 
> > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > >  	u32 buf_sz;
> > >  	size_t rxtx_bufsz = SZ_4K;
> > >
> > > +	/*
> > > +	 * When pKVM is enabled, the FF-A driver must be initialized
> > > +	 * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > +	 * the FF-A version or obtain RX/TX buffer information,
> > > +	 * which leads to failures in FF-A calls.
> > > +	 */
> > > +	if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > +	    !is_kvm_arm_initialised())
> > > +		return -EPROBE_DEFER;
> > > +
> >
> > That's still fundamentally wrong: pkvm is not ready until
> > finalize_pkvm() has finished, and that's not indicated by
> > is_kvm_arm_initialised().
> 
> Thanks. I miss the TSC bit set in here.

That's the least of the problems. None of the infrastructure is in
place at this stage...

> IMHO, I'd like to make an new state check function --
> is_pkvm_arm_initialised() so that ff-a driver to know whether
> pkvm is initialised.

Doesn't sound great, TBH.

> or any other suggestion?

Instead of adding more esoteric predicates, I'd rather you build on an
existing infrastructure. You have a dependency on KVM, use something
that is designed to enforce dependencies. Device links spring to mind
as something designed for that.

Can you look into enabling this for KVM? If that's possible, then it
should be easy enough to delay the actual KVM registration after pKVM
is finalised.

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply

* Re: [PATCH] arm: dts: allwinner: t113s mangopi: enable watchdog for reboot
From: Jernej Škrabec @ 2026-04-19 10:38 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chen-Yu Tsai,
	Samuel Holland, Michal Piekos, devicetree, linux-arm-kernel,
	linux-sunxi, linux-kernel
In-Reply-To: <20260418135519.16e41490@ryzen.lan>

Dne sobota, 18. april 2026 ob 13:55:19 Srednjeevropski poletni čas je Andre Przywara napisal(a):
> On Fri, 17 Apr 2026 20:19:20 +0200
> Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> 
> > Hi,
> > 
> > Dne nedelja, 12. april 2026 ob 19:42:10 Srednjeevropski poletni čas je Michal Piekos napisal(a):
> > > Reboot hangs on MangoPi MQ-R T113s because no restart handler is
> > > available.
> > > 
> > > Enable the SoC watchdog whose driver registers a restart handler.
> > > 
> > > Tested on MangoPi MQ-R T113s.
> > > 
> > > Signed-off-by: Michal Piekos <michal.piekos@mmpsystems.pl>
> > > ---
> > >  arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts b/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts
> > > index 8b3a75383816..f0232a5e903b 100644
> > > --- a/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts
> > > +++ b/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts
> > > @@ -33,3 +33,7 @@ rtl8189ftv: wifi@1 {
> > >  		interrupt-names = "host-wake";
> > >  	};
> > >  };
> > > +
> > > +&wdt {
> > > +	status = "okay";
> > > +};  
> > 
> > Move this to sun8i-t113s.dtsi. All t113 boards have the same issue.
> > Watchdog should be always enabled on ARM.
> 
> We actually have that line in U-Boot:
> https://github.com/u-boot/u-boot/blob/master/arch/arm/dts/sunxi-u-boot.dtsi#L22-L27
> 
> IIRC, the idea was that it is *firmware* that chooses the watchdog, so
> the generic DT should not be the place to set this.

Why would firmware need to select watchdog? We only had issue on H6
with it, so I kind of get it for that, but in general, all ARM based SoCs
have it enabled by default which is IMO how it should be.

Best regards,
Jernej

> 
> If people use $fdtcontroladdr as the DT source, everything falls in
> place neatly, no need for changes or runtime patching.
> 
> Cheers,
> Andre
> 






^ permalink raw reply

* Re: [PATCH] gpio: rockchip: Fix GPIO after convert to dynamic base allocation
From: Heiko Stuebner @ 2026-04-19 10:15 UTC (permalink / raw)
  To: Linus Walleij, Bartosz Golaszewski, Shawn Lin, Jonas Karlman
  Cc: Jonas Karlman, Bartosz Golaszewski, linux-gpio, linux-arm-kernel,
	linux-rockchip, linux-kernel
In-Reply-To: <20260416154928.2103388-1-jonas@kwiboo.se>

Am Donnerstag, 16. April 2026, 17:49:28 Mitteleuropäische Sommerzeit schrieb Jonas Karlman:
> The commit c8079f83e0bf ("gpio: rockchip: convert to dynamic GPIO base
> allocation") broke GPIO on devices using device trees which don't set
> the gpio-ranges property, something only Rockchip RK35xx SoC DTs do.
> 
> On a Rockchip RK3399 device something like following is now observed:
> 
> [    0.082771] rockchip-gpio ff720000.gpio: probed /pinctrl/gpio@ff720000
> [    0.083531] rockchip-gpio ff730000.gpio: probed /pinctrl/gpio@ff730000
> [    0.084110] rockchip-gpio ff780000.gpio: probed /pinctrl/gpio@ff780000
> [    0.084746] rockchip-gpio ff788000.gpio: probed /pinctrl/gpio@ff788000
> [    0.085389] rockchip-gpio ff790000.gpio: probed /pinctrl/gpio@ff790000
> --
> [    0.212208] rockchip-pinctrl pinctrl: pin 637 is not registered so it cannot be requested
> [    0.212271] rockchip-pinctrl pinctrl: error -EINVAL: pin-637 (gpio3:637)
> [    0.212344] leds-gpio leds: error -EINVAL: Failed to get GPIO '/leds/led-0'
> [    0.212389] leds-gpio leds: probe with driver leds-gpio failed with error -22
> --
> [    0.607545] rockchip-pinctrl pinctrl: pin 519 is not registered so it cannot be requested
> [    0.608775] rockchip-pinctrl pinctrl: error -EINVAL: pin-519 (gpio0:519)
> [    0.610003] dwmmc_rockchip fe320000.mmc: probe with driver dwmmc_rockchip failed with error -22
> --
> [    0.805882] rockchip-pinctrl pinctrl: pin 547 is not registered so it cannot be requested
> [    0.806672] rockchip-pinctrl pinctrl: error -EINVAL: pin-547 (gpio1:547)
> [    0.807301] reg-fixed-voltage regulator-vbus-typec: error -EINVAL: can't get GPIO
> [    0.807307] rockchip-pinctrl pinctrl: pin 602 is not registered so it cannot be requested
> [    0.807970] reg-fixed-voltage regulator-vbus-typec: probe with driver reg-fixed-voltage failed with error -22
> [    0.808692] rockchip-pinctrl pinctrl: error -EINVAL: pin-602 (gpio2:602)
> [    0.810279] reg-fixed-voltage regulator-vcc3v3-pcie: error -EINVAL: can't get GPIO
> [    0.810284] rockchip-pinctrl pinctrl: pin 665 is not registered so it cannot be requested
> [    0.810299] rockchip-pinctrl pinctrl: error -EINVAL: pin-665 (gpio4:665)
> [    0.810960] reg-fixed-voltage regulator-vcc3v3-pcie: probe with driver reg-fixed-voltage failed with error -22
> [    0.811679] reg-fixed-voltage regulator-vcc5v0-host: error -EINVAL: can't get GPIO
> [    0.813943] reg-fixed-voltage regulator-vcc5v0-host: probe with driver reg-fixed-voltage failed with error -22
> --
> [    0.867788] rockchip-pinctrl pinctrl: pin 522 is not registered so it cannot be requested
> [    0.868537] rockchip-pinctrl pinctrl: error -EINVAL: pin-522 (gpio0:522)
> [    0.869166] pwrseq_simple sdio-pwrseq: error -EINVAL: reset GPIOs not ready
> [    0.869798] pwrseq_simple sdio-pwrseq: probe with driver pwrseq_simple failed with error -22
> --
> [    0.940365] rockchip-pinctrl pinctrl: pin 623 is not registered so it cannot be requested
> [    0.941084] rockchip-pinctrl pinctrl: error -EINVAL: pin-623 (gpio3:623)
> [    0.941823] rk_gmac-dwmac fe300000.ethernet: error -EINVAL: Cannot register the MDIO bus
> [    0.942542] rk_gmac-dwmac fe300000.ethernet: error -EINVAL: MDIO bus (id: 0) registration failed
> [    0.943772] rk_gmac-dwmac fe300000.ethernet: probe with driver rk_gmac-dwmac failed with error -22
> 
> Restore GPIO to a working state on devices using older Rockchip SoCs
> and/or DTs not having the gpio-ranges property set by restoring prior
> use of bank->pin_base as the pin_offset value.
> 
> Also change to use bank->nr_pins as the npins value to align and prevent
> a possible future breakage if gc->ngpio is ever changed to match the 32
> GPIOs each controller theoretically can handle.
> 
> Fixes: c8079f83e0bf ("gpio: rockchip: convert to dynamic GPIO base allocation")
> Signed-off-by: Jonas Karlman <jonas@kwiboo.se>

Acked-by: Heiko Stuebner <heiko@sntech.de>

> ---
>  drivers/gpio/gpio-rockchip.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpio/gpio-rockchip.c b/drivers/gpio/gpio-rockchip.c
> index 08ea64434f8f..44d7ebd12724 100644
> --- a/drivers/gpio/gpio-rockchip.c
> +++ b/drivers/gpio/gpio-rockchip.c
> @@ -617,7 +617,7 @@ static int rockchip_gpiolib_register(struct rockchip_pin_bank *bank)
>  			return -ENODEV;
>  
>  		ret = gpiochip_add_pin_range(gc, dev_name(pctldev->dev), 0,
> -					     gc->base, gc->ngpio);
> +					     bank->pin_base, bank->nr_pins);
>  		if (ret) {
>  			dev_err(bank->dev, "Failed to add pin range\n");
>  			goto fail;
> 






^ permalink raw reply

* Re: [PATCH] iommu/arm-smmu-qcom: Fix fastrpc compatible string in ACTLR client match table
From: Shawn Guo @ 2026-04-19  9:52 UTC (permalink / raw)
  To: bibek.patro
  Cc: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel,
	Dmitry Baryshkov, iommu, linux-arm-msm, linux-arm-kernel,
	linux-kernel, srinivas.kandagatla
In-Reply-To: <20260408130825.3268733-1-bibek.patro@oss.qualcomm.com>

On Wed, Apr 08, 2026 at 06:38:25PM +0530, bibek.patro@oss.qualcomm.com wrote:
> From: Bibek Kumar Patro <bibek.patro@oss.qualcomm.com>
...
> Assisted-by: Anthropic:claude-4-6-sonnet

Nit - coding-assistants.rst suggests format:

  Assisted-by: AGENT_NAME:MODEL_VERSION

So I guess this might be better?

  Assisted-by: Claude:claude-4-6-sonnet

Shawn

> Fixes: 3e35c3e725de ("iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500")
> Signed-off-by: Bibek Kumar Patro <bibek.patro@oss.qualcomm.com>


^ permalink raw reply

* Re: [PATCH v4 2/8] dt-bindings: arm: Add zx297520v3 board binding
From: Stefan Dösinger @ 2026-04-19  8:30 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: linux-kernel, Conor Dooley, Jonathan Corbet, Alexandre Belloni,
	Greg Kroah-Hartman, linux-doc, devicetree, Drew Fustini,
	Linus Walleij, Jiri Slaby, Russell King, soc, Arnd Bergmann,
	Krzysztof Kozlowski, Krzysztof Kozlowski, linux-arm-kernel,
	linux-serial, Shuah Khan
In-Reply-To: <177646012448.2165534.5760108355183774935.robh@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 710 bytes --]

Hi Rob,

Am Samstag, 18. April 2026, 00:08:44 Ostafrikanische Zeit schrieben Sie:

> If you already ran 'make dt_binding_check' and didn't see the above
> error(s), then make sure 'yamllint' is installed and dt-schema is up to
> date:

Here is a new PEBKAC issue for your mail template: I ran dt_binding_check, it 
wrote the warning you pointed out, but I only checked the return value - which 
indicated success. Which I guess makes sense for a warning, since there seem 
to be a few preexisting ones. The warning itself was somewhere in the 
scrollback because I let dt_binding_check check all the files.

So I learned I have to actually look at the output to see if there are any 
warnings.

Cheers,
Stefan

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 870 bytes --]

^ permalink raw reply

* [PATCH v2] pwm: atmel-tcb: Cache clock rates and mark chip as atomic
From: Sangyun Kim @ 2026-04-19  8:08 UTC (permalink / raw)
  To: ukleinek
  Cc: nicolas.ferre, alexandre.belloni, claudiu.beznea, linux-pwm,
	linux-arm-kernel, linux-kernel
In-Reply-To: <20260415093433.2359955-1-sangyun.kim@snu.ac.kr>

atmel_tcb_pwm_apply() holds tcbpwmc->lock as a spinlock via
guard(spinlock)() and then calls atmel_tcb_pwm_config(), which calls
clk_get_rate() twice. clk_get_rate() acquires clk_prepare_lock (a
mutex), so this is a sleep-in-atomic-context violation.

On CONFIG_DEBUG_ATOMIC_SLEEP kernels every pwm_apply_state() that
enables or reconfigures the PWM triggers a "BUG: sleeping function
called from invalid context" warning.

Acquire exclusive control over the clock rates with
clk_rate_exclusive_get() at probe time and cache the rates in struct
atmel_tcb_pwm_chip, then read the cached rates from
atmel_tcb_pwm_config(). This keeps the spinlock-based mutual exclusion
introduced in commit 37f7707077f5 ("pwm: atmel-tcb: Fix race condition
and convert to guards") and removes the sleeping calls from the atomic
section.

With no sleeping calls left in .apply() and the regmap-mmio bus already
running with fast_io=true, also mark the chip as atomic so consumers
can use pwm_apply_atomic() from atomic context.

Fixes: 37f7707077f5 ("pwm: atmel-tcb: Fix race condition and convert to guards")
Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr>
---
Hi Uwe,

Thanks for the review! "Sangyun" is the right form to address me, no
worries.

Changes in v2:
 - Keep the spinlock instead of converting tcbpwmc->lock to a mutex.
 - Cache clk and slow_clk rates at probe via clk_rate_exclusive_get()
   so the .apply() path no longer calls clk_get_rate() under the
   spinlock.
 - Mark the chip as atomic now that .apply() has no sleeping calls.

Thanks,
Sangyun

 drivers/pwm/pwm-atmel-tcb.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/pwm/pwm-atmel-tcb.c b/drivers/pwm/pwm-atmel-tcb.c
index f9a9c12cbcdd..8d46ce28f736 100644
--- a/drivers/pwm/pwm-atmel-tcb.c
+++ b/drivers/pwm/pwm-atmel-tcb.c
@@ -50,6 +50,8 @@ struct atmel_tcb_pwm_chip {
 	spinlock_t lock;
 	u8 channel;
 	u8 width;
+	unsigned long rate;
+	unsigned long slow_rate;
 	struct regmap *regmap;
 	struct clk *clk;
 	struct clk *gclk;
@@ -266,7 +268,7 @@ static int atmel_tcb_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
 	int slowclk = 0;
 	unsigned period;
 	unsigned duty;
-	unsigned rate = clk_get_rate(tcbpwmc->clk);
+	unsigned long rate = tcbpwmc->rate;
 	unsigned long long min;
 	unsigned long long max;
 
@@ -294,7 +296,7 @@ static int atmel_tcb_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
 	 */
 	if (i == ARRAY_SIZE(atmel_tcb_divisors)) {
 		i = slowclk;
-		rate = clk_get_rate(tcbpwmc->slow_clk);
+		rate = tcbpwmc->slow_rate;
 		min = div_u64(NSEC_PER_SEC, rate);
 		max = min << tcbpwmc->width;
 
@@ -431,6 +433,7 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev)
 	}
 
 	chip->ops = &atmel_tcb_pwm_ops;
+	chip->atomic = true;
 	tcbpwmc->channel = channel;
 	tcbpwmc->width = config->counter_width;
 
@@ -438,16 +441,33 @@ static int atmel_tcb_pwm_probe(struct platform_device *pdev)
 	if (err)
 		goto err_gclk;
 
+	err = clk_rate_exclusive_get(tcbpwmc->clk);
+	if (err)
+		goto err_disable_clk;
+
+	err = clk_rate_exclusive_get(tcbpwmc->slow_clk);
+	if (err)
+		goto err_clk_unlock;
+
+	tcbpwmc->rate = clk_get_rate(tcbpwmc->clk);
+	tcbpwmc->slow_rate = clk_get_rate(tcbpwmc->slow_clk);
+
 	spin_lock_init(&tcbpwmc->lock);
 
 	err = pwmchip_add(chip);
 	if (err < 0)
-		goto err_disable_clk;
+		goto err_slow_clk_unlock;
 
 	platform_set_drvdata(pdev, chip);
 
 	return 0;
 
+err_slow_clk_unlock:
+	clk_rate_exclusive_put(tcbpwmc->slow_clk);
+
+err_clk_unlock:
+	clk_rate_exclusive_put(tcbpwmc->clk);
+
 err_disable_clk:
 	clk_disable_unprepare(tcbpwmc->slow_clk);
 
@@ -470,6 +490,8 @@ static void atmel_tcb_pwm_remove(struct platform_device *pdev)
 
 	pwmchip_remove(chip);
 
+	clk_rate_exclusive_put(tcbpwmc->slow_clk);
+	clk_rate_exclusive_put(tcbpwmc->clk);
 	clk_disable_unprepare(tcbpwmc->slow_clk);
 	clk_put(tcbpwmc->gclk);
 	clk_put(tcbpwmc->clk);
-- 
2.34.1



^ permalink raw reply related

* [PATCH] Input: imx_keypad - Fix spelling mistake "Colums" -> "Columns"
From: Ethan Carter Edwards @ 2026-04-19  0:58 UTC (permalink / raw)
  To: Dmitry Torokhov, Frank Li, Sascha Hauer, Pengutronix Kernel Team,
	Fabio Estevam
  Cc: linux-input, imx, linux-arm-kernel, linux-kernel, kernel-janitors,
	Ethan Carter Edwards

There is a spelling mistake in a two comments. Fix them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
---
 drivers/input/keyboard/imx_keypad.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/keyboard/imx_keypad.c b/drivers/input/keyboard/imx_keypad.c
index 069c1d6376e1..ccde60cd6bb3 100644
--- a/drivers/input/keyboard/imx_keypad.c
+++ b/drivers/input/keyboard/imx_keypad.c
@@ -324,7 +324,7 @@ static void imx_keypad_config(struct imx_keypad *keypad)
 	reg_val |= (keypad->cols_en_mask & 0xff) << 8;	/* cols */
 	writew(reg_val, keypad->mmio_base + KPCR);
 
-	/* Write 0's to KPDR[15:8] (Colums) */
+	/* Write 0's to KPDR[15:8] (Columns) */
 	reg_val = readw(keypad->mmio_base + KPDR);
 	reg_val &= 0x00ff;
 	writew(reg_val, keypad->mmio_base + KPDR);
@@ -357,7 +357,7 @@ static void imx_keypad_inhibit(struct imx_keypad *keypad)
 	reg_val |= KBD_STAT_KPKR | KBD_STAT_KPKD;
 	writew(reg_val, keypad->mmio_base + KPSR);
 
-	/* Colums as open drain and disable all rows */
+	/* Columns as open drain and disable all rows */
 	reg_val = (keypad->cols_en_mask & 0xff) << 8;
 	writew(reg_val, keypad->mmio_base + KPCR);
 }

---
base-commit: c7275b05bc428c7373d97aa2da02d3a7fa6b9f66
change-id: 20260418-imx-typo-14370bd2ce47

Best regards,
-- 
Ethan Carter Edwards <ethan@ethancedwards.com>



^ permalink raw reply related

* Re: [PATCH v2 1/3] MAINTAINERS: Move Peter De Schrijver to CREDITS
From: Aaro Koskinen @ 2026-04-18 20:07 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Geert Uytterhoeven, linux-tegra, linux-arm-kernel, linux-pm,
	linux-omap, linux-m68k, devicetree, linux-kernel, Paul Walmsley
In-Reply-To: <20260417131549.3154534-1-thierry.reding@kernel.org>

Hi,

On Fri, Apr 17, 2026 at 03:15:46PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Peter sadly passed away a while back. Paul did a much better job at
> finding the right words to mourn this loss than I ever could, so I will
> leave this link here:
> 
>   https://lore.kernel.org/lkml/alpine.DEB.2.21.999.2407240345480.11116@utopia.booyaka.com/T/#u
> 
> Co-developed-by: Paul Walmsley <pjw@kernel.org>
> Co-developed-by: Aaro Koskinen <aaro.koskinen@iki.fi>
> Co-developed-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Thierry Reding <treding@nvidia.com>

Reviewed-by: Aaro Koskinen <aaro.koskinen@iki.fi>

A.

> ---
> Changes in v2:
> - add more missing entries
> 
>  CREDITS     | 10 ++++++++++
>  MAINTAINERS |  1 -
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/CREDITS b/CREDITS
> index 885fb05d8816..afd1f70b41cf 100644
> --- a/CREDITS
> +++ b/CREDITS
> @@ -3645,7 +3645,17 @@ D: Macintosh IDE Driver
>  
>  N: Peter De Schrijver
>  E: stud11@cc4.kuleuven.ac.be
> +E: p2@mind.be
> +E: peter.de-schrijver@nokia.com
> +E: pdeschrijver@nvidia.com
> +E: p2@psychaos.be
> +D: Apollo Domain workstations
> +D: Ariadne and Hydra Amiga Ethernet drivers
> +D: IBM PS/2, Microchannel, and Token Ring support
>  D: Mitsumi CD-ROM driver patches March version
> +D: TWL4030 power management and audio codec driver
> +D: OMAP power management
> +D: NVIDIA Tegra clock and BPMP drivers, among many other things
>  S: Molenbaan 29
>  S: B2240 Zandhoven
>  S: Belgium
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ef978bfca514..ffe20d770249 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -26145,7 +26145,6 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git
>  N:	[^a-z]tegra
>  
>  TEGRA CLOCK DRIVER
> -M:	Peter De Schrijver <pdeschrijver@nvidia.com>
>  M:	Prashant Gaikwad <pgaikwad@nvidia.com>
>  S:	Supported
>  F:	drivers/clk/tegra/
> -- 
> 2.52.0
> 
> 


^ permalink raw reply

* Re: [PATCH net,v3 1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
From: patchwork-bot+netdevbpf @ 2026-04-18 19:20 UTC (permalink / raw)
  To: KhaiWenTan
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, mcoquelin.stm32,
	alexandre.torgue, rmk+kernel, maxime.chevallier, netdev,
	linux-stm32, linux-arm-kernel, linux-kernel, yoong.siang.song,
	hong.aun.looi, khai.wen.tan
In-Reply-To: <20260416102609.7953-1-khai.wen.tan@intel.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 16 Apr 2026 18:26:09 +0800 you wrote:
> From: KhaiWenTan <khai.wen.tan@linux.intel.com>
> 
> get_interfaces() will update both the plat->phy_interfaces and
> mdio_bus_data->default_an_inband based on reading a SERDES register. As
> get_interfaces() will be called after default_an_inband had already been
> read, dwmac-intel regressed as a result with incorrect default_an_inband
> value in phylink_config.
> 
> [...]

Here is the summary with links:
  - [net,v3,1/1] net: stmmac: Update default_an_inband before passing value to phylink_config
    https://git.kernel.org/netdev/net/c/8cff9dbe89d8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




^ permalink raw reply

* [PATCH wireless] wifi: mt76: mt7996: Fix NULL pointer dereference in mt7996_init_tx_queues()
From: Lorenzo Bianconi @ 2026-04-18 18:04 UTC (permalink / raw)
  To: Felix Fietkau, Ryder Lee, Shayne Chen, Sean Wang, Johannes Berg,
	Matthias Brugger, AngeloGioacchino Del Regno, Lorenzo Bianconi
  Cc: linux-wireless, linux-arm-kernel, linux-mediatek

When MT76_NPU and CONFIG_NET_MEDIATEK_SOC_WED are enabled and
mt76 detects properly the Airoha NPU SoC, mt7996_init_tx_queues() will
dereference a NULL WED pointer.
Fix the issue by always passing the WED pointer from mt7996_dma_init().

Fixes: cd7951f242a7 ("wifi: mt76: mt7996: Integrate MT7990 dma configuration for NPU")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/mt7996/dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/dma.c b/drivers/net/wireless/mediatek/mt76/mt7996/dma.c
index 8f5d297dafce..3d9353811a02 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/dma.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/dma.c
@@ -683,7 +683,7 @@ int mt7996_dma_init(struct mt7996_dev *dev)
 		ret = mt7996_init_tx_queues(&dev->phy, MT_TXQ_ID(0),
 					    MT7996_NPU_TX_RING_SIZE,
 					    MT_TXQ_RING_BASE(0) + hif1_ofs,
-					    NULL);
+					    wed);
 	else
 		ret = mt7996_init_tx_queues(&dev->phy,
 					    MT_TXQ_ID(dev->mphy.band_idx),

---
base-commit: 1f5ffc672165ff851063a5fd044b727ab2517ae3
change-id: 20260418-mt7996-dma-init-npu-fix-7f2840c49208

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>



^ permalink raw reply related

* Re: [PATCH net] net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
From: Jakub Kicinski @ 2026-04-18 18:01 UTC (permalink / raw)
  To: daniel
  Cc: Jakub Kicinski, chester.a.unal, andrew, olteanv, davem, edumazet,
	pabeni, matthias.bgg, angelogioacchino.delregno, linux,
	ansuelsmth, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, frank-w, john
In-Reply-To: <79dc0ec5b6be698b14cb66339d6f63033ca2934a.1776397542.git.daniel@makrotopia.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net: dsa: mt7530: fix .get_stats64 sleeping in atomic context

This commit addresses a sleep-in-atomic bug when reading MDIO-connected
switch port statistics by delegating the register reads to a delayed
workqueue. The get_stats64 callback now returns cached values protected
by a spinlock instead of synchronously reading from the MDIO bus.
MMIO-connected switches are unaffected and continue to read counters
directly.

> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index b9423389c2ef0..786d3a8492bcb 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c

[ ... ]

> @@ -951,6 +952,43 @@ static void mt7530_get_stats64(struct dsa_switch *ds, int port,
>  			       &storage->rx_crc_errors);
>  }
>  
> +static void mt7530_stats_poll(struct work_struct *work)
> +{
> +	struct mt7530_priv *priv = container_of(work, struct mt7530_priv,
> +						stats_work.work);
> +	struct rtnl_link_stats64 stats = {};
> +	struct dsa_port *dp;
> +	int port;
> +
> +	dsa_switch_for_each_user_port(dp, priv->ds) {
> +		port = dp->index;
> +
> +		mt7530_read_port_stats64(priv, port, &stats);
> +
> +		spin_lock(&priv->stats_lock);

Are we at risk of a deadlock here?

The get_stats64 callback can be called by dev_get_stats(), which runs in
softirq context.

If this delayed work runs in process context and is preempted by a softirq
that attempts to acquire the lock via dev_get_stats(), a deadlock could
occur. Both paths should likely use spin_lock_bh() and spin_unlock_bh().

> +		priv->ports[port].stats = stats;
> +		spin_unlock(&priv->stats_lock);
> +	}
> +
> +	schedule_delayed_work(&priv->stats_work,
> +			      MT7530_STATS_POLL_INTERVAL);
> +}
> +
> +static void mt7530_get_stats64(struct dsa_switch *ds, int port,
> +			       struct rtnl_link_stats64 *storage)
> +{
> +	struct mt7530_priv *priv = ds->priv;
> +
> +	if (priv->bus) {
> +		spin_lock(&priv->stats_lock);
> +		*storage = priv->ports[port].stats;
> +		spin_unlock(&priv->stats_lock);
> +		mod_delayed_work(system_wq, &priv->stats_work, 0);

Could this unconditional background refresh cause a denial of service on
the MDIO bus?

If an unprivileged user continually reads network statistics in a tight
loop, mt7530_stats_poll() will execute continuously, potentially starving
other MDIO tasks like PHY link-state monitoring.

Additionally, is it possible for this call to re-arm the workqueue during
teardown?

In mt7530_remove_common(), cancel_delayed_work_sync() is called before
dsa_unregister_switch(). If get_stats64() is called during or after
dsa_unregister_switch(), it would re-arm the workqueue after it was
explicitly cancelled, potentially causing a use-after-free when priv is
freed.

> +	} else {
> +		mt7530_read_port_stats64(priv, port, storage);
> +	}
> +}

[ ... ]

> @@ -3137,6 +3175,13 @@ mt753x_setup(struct dsa_switch *ds)
>  	if (ret && priv->irq_domain)
>  		mt7530_free_mdio_irq(priv);
>  
> +	if (!ret && priv->bus) {
> +		spin_lock_init(&priv->stats_lock);
> +		INIT_DELAYED_WORK(&priv->stats_work, mt7530_stats_poll);
> +		schedule_delayed_work(&priv->stats_work,
> +				      MT7530_STATS_POLL_INTERVAL);

If dsa_register_switch() encounters an error after this setup callback
completes successfully, the DSA core will attempt to clean up.

Since mt7530_switch_ops does not implement a teardown callback to cancel
this delayed work, will the work remain scheduled and eventually
dereference freed priv memory once the driver core frees it via devres?

> +	}
> +
>  	return ret;
>  }
-- 
pw-bot: cr

^ permalink raw reply

* Re: [PATCH] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable()
From: Catalin Marinas @ 2026-04-18 15:05 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: tglx, peterz, sudeep.holla, yangyicong, dietmar.eggemann,
	Jonathan.Cameron, linux-kernel, James Morse, linux-arm-kernel
In-Reply-To: <aeNxKpHzTQX4_kId@arm.com>

On Sat, Apr 18, 2026 at 12:55:22PM +0100, Catalin Marinas wrote:
> On Fri, Apr 17, 2026 at 03:55:34PM +0800, Jinjie Ruan wrote:
> > When booting with `maxcpus` greater than the number of present CPUs (e.g.,
> > QEMU -smp cpus=4,maxcpus=8), some CPUs are marked as 'present' but have not
> > yet been registered via register_cpu(). Consequently, the per-cpu device
> > objects for these CPUs are not yet initialized.
[...]
> Another option would have been to avoid marking such CPUs present but I
> think this will break other things. Yet another option is to register
> all CPU devices even if they never come up (like maxcpus greater than
> actual CPUs).

Something like below, untested (and I don't claim I properly understand
this code; just lots of tokens used trying to make sense of it ;))

------------------------8<-------------------------
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index a9d884fd1d00..4c0a5ed906ea 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -448,12 +448,14 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 apci_id,
 		return *pcpu;
 	}
 
+	set_cpu_present(*pcpu, true);
 	return 0;
 }
 EXPORT_SYMBOL(acpi_map_cpu);
 
 int acpi_unmap_cpu(int cpu)
 {
+	set_cpu_present(cpu, false);
 	return 0;
 }
 EXPORT_SYMBOL(acpi_unmap_cpu);
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 1aa324104afb..751a74d997e1 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -510,8 +510,10 @@ int arch_register_cpu(int cpu)
 	struct cpu *c = &per_cpu(cpu_devices, cpu);
 
 	if (!acpi_disabled && !acpi_handle &&
-	    IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU))
+	    IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU)) {
+		set_cpu_present(cpu, false);
 		return -EPROBE_DEFER;
+	}
 
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 	/* For now block anything that looks like physical CPU Hotplug */


^ permalink raw reply related

* Re: [PATCH] arm: dts: allwinner: t113s mangopi: enable watchdog for reboot
From: Andre Przywara @ 2026-04-18 11:55 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chen-Yu Tsai,
	Samuel Holland, Michal Piekos, devicetree, linux-arm-kernel,
	linux-sunxi, linux-kernel
In-Reply-To: <2825865.mvXUDI8C0e@jernej-laptop>

On Fri, 17 Apr 2026 20:19:20 +0200
Jernej Škrabec <jernej.skrabec@gmail.com> wrote:

> Hi,
> 
> Dne nedelja, 12. april 2026 ob 19:42:10 Srednjeevropski poletni čas je Michal Piekos napisal(a):
> > Reboot hangs on MangoPi MQ-R T113s because no restart handler is
> > available.
> > 
> > Enable the SoC watchdog whose driver registers a restart handler.
> > 
> > Tested on MangoPi MQ-R T113s.
> > 
> > Signed-off-by: Michal Piekos <michal.piekos@mmpsystems.pl>
> > ---
> >  arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts b/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts
> > index 8b3a75383816..f0232a5e903b 100644
> > --- a/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts
> > +++ b/arch/arm/boot/dts/allwinner/sun8i-t113s-mangopi-mq-r-t113.dts
> > @@ -33,3 +33,7 @@ rtl8189ftv: wifi@1 {
> >  		interrupt-names = "host-wake";
> >  	};
> >  };
> > +
> > +&wdt {
> > +	status = "okay";
> > +};  
> 
> Move this to sun8i-t113s.dtsi. All t113 boards have the same issue.
> Watchdog should be always enabled on ARM.

We actually have that line in U-Boot:
https://github.com/u-boot/u-boot/blob/master/arch/arm/dts/sunxi-u-boot.dtsi#L22-L27

IIRC, the idea was that it is *firmware* that chooses the watchdog, so
the generic DT should not be the place to set this.

If people use $fdtcontroladdr as the DT source, everything falls in
place neatly, no need for changes or runtime patching.

Cheers,
Andre


^ permalink raw reply

* Re: [PATCH] cpu/hotplug: Fix NULL kobject warning in cpuhp_smt_enable()
From: Catalin Marinas @ 2026-04-18 11:55 UTC (permalink / raw)
  To: Jinjie Ruan
  Cc: tglx, peterz, sudeep.holla, yangyicong, dietmar.eggemann,
	Jonathan.Cameron, linux-kernel, James Morse, linux-arm-kernel
In-Reply-To: <20260417075534.3745793-1-ruanjinjie@huawei.com>

+ James Morse, linux-arm-kernel

On Fri, Apr 17, 2026 at 03:55:34PM +0800, Jinjie Ruan wrote:
> When booting with `maxcpus` greater than the number of present CPUs (e.g.,
> QEMU -smp cpus=4,maxcpus=8), some CPUs are marked as 'present' but have not
> yet been registered via register_cpu(). Consequently, the per-cpu device
> objects for these CPUs are not yet initialized.
> 
> In cpuhp_smt_enable(), the code iterates over all present CPUs. Calling
> _cpu_up() for these unregistered CPUs eventually leads to
> sysfs_create_group() being called with a NULL kobject (or a kobject
> without a directory), triggering the following warning in
> fs/sysfs/group.c:
> 
> 	if (WARN_ON(!kobj || (!update && !kobj->sd)))
> 		return -EINVAL;
> 
> Fix this by adding a check for get_cpu_device(cpu). This ensures that
> SMT is only enabled for CPUs that have successfully completed their
> base device registration in sysfs.
> 
> How to reproduce:
> 
> 	1. echo off > /sys/devices/system/cpu/smt/control
> 		psci: CPU1 killed (polled 0 ms)
> 		psci: CPU3 killed (polled 0 ms)
> 
> 	2. echo 2 > /sys/devices/system/cpu/smt/control
> 
> 	Detected PIPT I-cache on CPU1
> 	GICv3: CPU1: found redistributor 1 region 0:0x00000000080c0000
> 	CPU1: Booted secondary processor 0x0000000001 [0x410fd082]
> 	Detected PIPT I-cache on CPU3
> 	GICv3: CPU3: found redistributor 3 region 0:0x0000000008100000
> 	CPU3: Booted secondary processor 0x0000000003 [0x410fd082]
> 	------------[ cut here ]------------
> 	WARNING: fs/sysfs/group.c:137 at internal_create_group+0x41c/0x4bc, CPU#2: sh/181
> 	Modules linked in:
> 	CPU: 2 UID: 0 PID: 181 Comm: sh Not tainted 7.0.0-rc1-00010-g8d13386c7624 #142 PREEMPT
> 	Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> 	pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> 	pc : internal_create_group+0x41c/0x4bc
> 	lr : sysfs_create_group+0x18/0x24
> 	sp : ffff80008078ba40
> 	x29: ffff80008078ba40 x28: ffff296c980ad000 x27: ffff00007fb94128
> 	x26: 0000000000000054 x25: ffffd693e845f3f0 x24: 0000000000000001
> 	x23: 0000000000000001 x22: 0000000000000004 x21: 0000000000000000
> 	x20: ffffd693e845fc10 x19: 0000000000000004 x18: 00000000ffffffff
> 	x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> 	x14: 0000000000000358 x13: 0000000000000007 x12: 0000000000000350
> 	x11: 0000000000000008 x10: 0000000000000407 x9 : 0000000000000400
> 	x8 : ffff00007fbf3b60 x7 : 0000000000000000 x6 : ffffd693e845f3f0
> 	x5 : ffff00007fb94128 x4 : 0000000000000000 x3 : ffff000000f4eac0
> 	x2 : ffffd693e7095a08 x1 : 0000000000000000 x0 : 0000000000000000
> 	Call trace:
> 	 internal_create_group+0x41c/0x4bc (P)
> 	 sysfs_create_group+0x18/0x24
> 	 topology_add_dev+0x1c/0x28
> 	 cpuhp_invoke_callback+0x104/0x20c
> 	 __cpuhp_invoke_callback_range+0x94/0x11c
> 	 _cpu_up+0x200/0x37c
> 	 cpuhp_smt_enable+0xbc/0x114
> 	 control_store+0xe8/0x1d4
> 	 dev_attr_store+0x18/0x2c
> 	 sysfs_kf_write+0x7c/0x94
> 	 kernfs_fop_write_iter+0x128/0x1b8
> 	 vfs_write+0x2b0/0x354
> 	 ksys_write+0x68/0xfc
> 	 __arm64_sys_write+0x1c/0x28
> 	 invoke_syscall+0x48/0x10c
> 	 el0_svc_common.constprop.0+0x40/0xe8
> 	 do_el0_svc+0x20/0x2c
> 	 el0_svc+0x34/0x124
> 	 el0t_64_sync_handler+0xa0/0xe4
> 	 el0t_64_sync+0x198/0x19c
> 	---[ end trace 0000000000000000 ]---
> 
> Cc: stable@vger.kernel.org
> Fixes: eed4583bcf9a6 ("arm64: Kconfig: Enable HOTPLUG_SMT")
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
> ---
>  kernel/cpu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index bc4f7a9ba64e..df725d92ad4f 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2706,6 +2706,13 @@ int cpuhp_smt_enable(void)
>  	cpu_maps_update_begin();
>  	cpu_smt_control = CPU_SMT_ENABLED;
>  	for_each_present_cpu(cpu) {
> +		/*
> +		 * Skip CPUs that have not been registered in sysfs yet.
> +		 * This avoids triggering NULL kobject warnings for maxcpus.
> +		 */
> +		if (!get_cpu_device(cpu))
> +			continue;
> +
>  		/* Skip online CPUs and CPUs on offline nodes */
>  		if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
>  			continue;

I spent some time trying to understand how we can get into this
situation. It seems to be an ACPI only thing.

Since commit eba4675008a6 ("arm64: arch_register_cpu() variant to check
if an ACPI handle is now available.") as part of the virtual CPU hotplug
series, arch_register_cpu() can defer registering the CPU devices. IIUC
this is done later via acpi_processor_hotadd_init(). smp_prepare_cpus(),
however, still marks them as present.

cpuhp_smt_enable(), if triggered later, walks the present cpus and
attempts _cpu_up(). cpuhp_up_callbacks() will call topology_add_dev()
(registered as a CPUHP_TOPOLOGY_PREPARE callback) and warn since
get_cpu_device() returns NULL. I think this can't happen during early
_cpu_up() calls since smp_init() is called before the topology callback
has been registered (slightly later during do_basic_setup()).

I'm not sure what the best fix should be. If we go with something along
the lines of the above, I wonder whether we should instead return an
error in the topology_add_dev() callback if get_cpu_device() returns
NULL. It feels a bit wrong to add a check for sysfs in
cpuhp_smt_enable() just because of some callbacks attempted by
_cpu_up(). There's precedent in cpu_capacity_sysctl_add() returning
-ENOENT. However, this messes up the smt control enabling since
cpuhp_smt_enable() will bail out early and return an error.

Another option would have been to avoid marking such CPUs present but I
think this will break other things. Yet another option is to register
all CPU devices even if they never come up (like maxcpus greater than
actual CPUs).

Opinions? It might be an arm64+ACPI-only thing.

-- 
Catalin


^ permalink raw reply

* Re: [PATCH v2 2/9] driver core: Add dev_set_drv_queue_sync_state()
From: Danilo Krummrich @ 2026-04-18 11:23 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
	Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
	Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
	Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
	Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
	linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
In-Reply-To: <20260410104058.83748-3-ulf.hansson@linaro.org>

On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> diff --git a/include/linux/device.h b/include/linux/device.h
> index e65d564f01cd..f812e70bdf22 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -994,6 +994,18 @@ static inline int dev_set_drv_sync_state(struct device *dev,
>  	return 0;
>  }
>  
> +static inline int dev_set_drv_queue_sync_state(struct device *dev,
> +					       void (*fn)(struct device *dev))

As this is a public function, please add some documentation.

> +{
> +	if (!dev || !dev->driver)
> +		return 0;
> +	if (dev->driver->queue_sync_state && dev->driver->queue_sync_state != fn)
> +		return -EBUSY;
> +	if (!dev->driver->queue_sync_state)
> +		dev->driver->queue_sync_state = fn;

I think this follows dev_set_drv_sync_state(), but I think it is worth pointing
out that it is yet another blocker for moving towards
const struct device_driver.

> +	return 0;
> +}
> +
>  static inline void dev_set_removable(struct device *dev,
>  				     enum device_removable removable)
>  {
> -- 
> 2.43.0



^ permalink raw reply

* Re: [PATCH v2 1/9] driver core: Enable suppliers to implement fine grained sync_state support
From: Danilo Krummrich @ 2026-04-18 11:23 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
	Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
	Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
	Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
	Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
	linux-arm-kernel, linux-kernel, Geert Uytterhoeven, driver-core
In-Reply-To: <20260410104058.83748-2-ulf.hansson@linaro.org>

On Fri Apr 10, 2026 at 12:40 PM CEST, Ulf Hansson wrote:
> The common sync_state support isn't fine grained enough for some types of
> suppliers, like power domains for example. Especially when a supplier
> provides multiple independent power domains, each with their own set of
> consumers. In these cases we need to wait for all consumers for all the
> provided power domains before invoking the supplier's ->sync_state().
>
> To allow a more fine grained sync_state support to be implemented on per
> supplier's driver basis, let's add a new optional callback. As soon as
> there is an update worth to consider in regards to managing sync_state for
> a supplier device, __device_links_queue_sync_state() invokes the callback.
>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/base/core.c           | 7 ++++++-
>  include/linux/device/driver.h | 7 +++++++
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 09b98f02f559..4085a011d8ca 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1106,7 +1106,9 @@ int device_links_check_suppliers(struct device *dev)
>   * Queues a device for a sync_state() callback when the device links write lock
>   * isn't held. This allows the sync_state() execution flow to use device links
>   * APIs.  The caller must ensure this function is called with
> - * device_links_write_lock() held.
> + * device_links_write_lock() held.  Note, if the optional queue_sync_state()
> + * callback has been assigned too, it gets called for every update to allowing a

s/allowing/allow/

> + * more fine grained support to be implemented on per supplier basis.
>   *
>   * This function does a get_device() to make sure the device is not freed while
>   * on this list.
> @@ -1126,6 +1128,9 @@ static void __device_links_queue_sync_state(struct device *dev,
>  	if (dev->state_synced)
>  		return;
>  
> +	if (dev->driver && dev->driver->queue_sync_state)
> +		dev->driver->queue_sync_state(dev);

This seems to be called without the device lock being held, which seems to allow
the queue_sync_state() callback to execute concurrently with remove(). This
opens the door for all kinds of UAF conditions in drivers.

This also made me aware that the above dev_has_sync_state() is probably broken,
as it also performs the following check without the device lock being held.

	dev->driver && dev->driver->sync_state

I think nothing prevents dev->driver to become NULL concurrently; in this case
READ_ONCE() should be sufficient though as it doesn't execute the callback.

I will send a patch for this.

> +
>  	list_for_each_entry(link, &dev->links.consumers, s_node) {
>  		if (!device_link_test(link, DL_FLAG_MANAGED))
>  			continue;
> diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> index bbc67ec513ed..bc9ae1cbe03c 100644
> --- a/include/linux/device/driver.h
> +++ b/include/linux/device/driver.h
> @@ -68,6 +68,12 @@ enum probe_type {
>   *		be called at late_initcall_sync level. If the device has
>   *		consumers that are never bound to a driver, this function
>   *		will never get called until they do.
> + * @queue_sync_state: Similar to the ->sync_state() callback, but called to
> + *		allow syncing device state to software state in a more fine
> + *		grained way. It is called when there is an updated state that
> + *		may be worth to consider for any of the consumers linked to
> + *		this device. If implemented, the ->sync_state() callback is
> + *		required too.

What happens if this is not the case? Maybe worth to check and warn about this
in driver_register().

>   * @remove:	Called when the device is removed from the system to
>   *		unbind a device from this driver.
>   * @shutdown:	Called at shut-down time to quiesce the device.
> @@ -110,6 +116,7 @@ struct device_driver {
>  
>  	int (*probe) (struct device *dev);
>  	void (*sync_state)(struct device *dev);
> +	void (*queue_sync_state)(struct device *dev);
>  	int (*remove) (struct device *dev);
>  	void (*shutdown) (struct device *dev);
>  	int (*suspend) (struct device *dev, pm_message_t state);
> -- 
> 2.43.0



^ permalink raw reply

* Re: [PATCH v2 0/9] driver core / pmdomain: Add support for fined grained sync_state
From: Danilo Krummrich @ 2026-04-18 11:23 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Saravana Kannan, Rafael J . Wysocki, Greg Kroah-Hartman, linux-pm,
	Sudeep Holla, Cristian Marussi, Kevin Hilman, Stephen Boyd,
	Marek Szyprowski, Bjorn Andersson, Abel Vesa, Peng Fan,
	Tomi Valkeinen, Maulik Shah, Konrad Dybcio, Thierry Reding,
	Jonathan Hunter, Geert Uytterhoeven, Dmitry Baryshkov,
	linux-arm-kernel, linux-kernel, driver-core
In-Reply-To: <CAPDyKFrPz9gaBBp6xV1=KkoemEfapc0p3POZxuBTvDw7Vamxtg@mail.gmail.com>

On Fri Apr 17, 2026 at 1:27 PM CEST, Ulf Hansson wrote:
> + Danilo (for the driver core changes)

Thanks -- please also remember to Cc: driver-core@lists.linux.dev.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox