Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2 1/8] dt-bindings: mfd: khadas: Add new compatible for Khadas VIM4 MCU
From: Rob Herring @ 2026-04-15 21:48 UTC (permalink / raw)
  To: Ronald Claveau
  Cc: Neil Armstrong, Lee Jones, Krzysztof Kozlowski, Conor Dooley,
	Andi Shyti, Kevin Hilman, Jerome Brunet, Martin Blumenstingl,
	Beniamino Galvani, Rafael J. Wysocki, Daniel Lezcano, Zhang Rui,
	Lukasz Luba, Liam Girdwood, Mark Brown, linux-amlogic, devicetree,
	linux-kernel, linux-i2c, linux-arm-kernel, linux-pm
In-Reply-To: <20260403-add-mcu-fan-khadas-vim4-v2-1-70536b22439a@aliel.fr>

On Fri, Apr 03, 2026 at 06:08:34PM +0200, Ronald Claveau wrote:
> The Khadas VIM4 MCU register is slightly different
> from previous boards' MCU.
> This board also features a switchable power source for its fan.
> 
> Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
> ---
>  Documentation/devicetree/bindings/mfd/khadas,mcu.yaml | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml b/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
> index 084960fd5a1fd..67769ef5d58b1 100644
> --- a/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
> +++ b/Documentation/devicetree/bindings/mfd/khadas,mcu.yaml
> @@ -18,6 +18,7 @@ properties:
>    compatible:
>      enum:
>        - khadas,mcu # MCU revision is discoverable

The revision is no longer discoverable as was claimed?

> +      - khadas,vim4-mcu
>  
>    "#cooling-cells": # Only needed for boards having FAN control feature
>      const: 2
> @@ -25,6 +26,10 @@ properties:
>    reg:
>      maxItems: 1
>  
> +  fan-supply:
> +    description: Phandle to the regulator that powers the fan.
> +    $ref: /schemas/types.yaml#/definitions/phandle
> +
>  required:
>    - compatible
>    - reg
> 
> -- 
> 2.49.0
> 


^ permalink raw reply

* Re: [PATCH v1 1/1] i2c: mediatek: add bus regulator control for power saving
From: Andi Shyti @ 2026-04-15 21:45 UTC (permalink / raw)
  To: adlavinitha reddy
  Cc: Qii Wang, Matthias Brugger, AngeloGioacchino Del Regno, linux-i2c,
	linux-kernel, linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group
In-Reply-To: <20260415125833.2579133-1-adlavinitha.reddy@mediatek.com>

> Thank you for your guidance on formatting and submission rules. I will study
> the documentation carefully and ensure I properly send patches in the future.
> 
> Regarding the multiple submissions, I sincerely apologize for the confusion.
> As an intern here, I encountered some unexpected internal email permission
> issues when trying to send the patch. In an attempt to double-check whether
> the email successfully went through the server, I mistakenly re-sent it a
> couple of times. I will be much more careful next time.

No worries Adlavinitha your patch got in anyway.

Have fun in the kernel community :-)

Andi


^ permalink raw reply

* [PATCH rc v2 2/5] iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
From: Nicolin Chen @ 2026-04-15 21:17 UTC (permalink / raw)
  To: will, robin.murphy, jgg, kevin.tian
  Cc: joro, praan, baolu.lu, miko.lenczewski, smostafa,
	linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1776286352.git.nicolinc@nvidia.com>

Though the kdump kernel adopts the crashed kernel's stream table, the iommu
core will still try to attach each probed device to a default domain, which
overwrites the adopted STE and breaks in-flight DMA from that device.

Implement an is_attach_deferred() callback to prevent this. For each device
that has STE.V=1 in the adopted table, defer the default domain attachment,
until the device driver explicitly requests it.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 28 +++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9a45f17200a21..d9d543eb8cecf 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4212,6 +4212,33 @@ static void arm_smmu_remove_master(struct arm_smmu_master *master)
 	kfree(master->build_invs);
 }
 
+static bool arm_smmu_is_attach_deferred(struct device *dev)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_device *smmu = master->smmu;
+	int i;
+
+	if (!(smmu->options & ARM_SMMU_OPT_KDUMP))
+		return false;
+
+	for (i = 0; i < master->num_streams; i++) {
+		u32 sid = master->streams[i].id;
+		struct arm_smmu_ste *step;
+
+		/* Guard against unpopulated L2 entries in the adopted table */
+		if ((smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) &&
+		    !smmu->strtab_cfg.l2.l2ptrs[arm_smmu_strtab_l1_idx(sid)])
+			continue;
+
+		step = arm_smmu_get_step_for_sid(smmu, sid);
+		/* If the STE has the Valid bit set, defer the attach */
+		if (le64_to_cpu(step->data[0]) & STRTAB_STE_0_V)
+			return true;
+	}
+
+	return false;
+}
+
 static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 {
 	int ret;
@@ -4374,6 +4401,7 @@ static const struct iommu_ops arm_smmu_ops = {
 	.hw_info		= arm_smmu_hw_info,
 	.domain_alloc_sva       = arm_smmu_sva_domain_alloc,
 	.domain_alloc_paging_flags = arm_smmu_domain_alloc_paging_flags,
+	.is_attach_deferred	= arm_smmu_is_attach_deferred,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
 	.device_group		= arm_smmu_device_group,
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v2 5/5] iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP in arm_smmu_device_hw_probe()
From: Nicolin Chen @ 2026-04-15 21:17 UTC (permalink / raw)
  To: will, robin.murphy, jgg, kevin.tian
  Cc: joro, praan, baolu.lu, miko.lenczewski, smostafa,
	linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1776286352.git.nicolinc@nvidia.com>

arm_smmu_device_hw_probe() runs before arm_smmu_init_structures(), so it's
natural to decide whether the kdump kernel must adopt the crashed kernel's
stream table.

Given that memremap is used to adopt the old stream table, set this option
only on a coherent SMMU.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 12cd148a99dc6..5a5e0f80bbfb3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5388,6 +5388,25 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	dev_info(smmu->dev, "oas %lu-bit (features 0x%08x)\n",
 		 smmu->oas, smmu->features);
+
+	/*
+	 * If SMMU is already active in kdump case, there could be in-flight DMA
+	 * from devices initiated by the crashed kernel. Mark ARM_SMMU_OPT_KDUMP
+	 * to let the init functions adopt the crashed kernel's stream table.
+	 *
+	 * Note that arm_smmu_adopt_strtab() uses memremap that can only work on
+	 * a coherent SMMU. A non-coherent SMMU has no choice but to continue to
+	 * abort any in-flight DMA.
+	 */
+	if (is_kdump_kernel() &&
+	    (readl_relaxed(smmu->base + ARM_SMMU_CR0) & CR0_SMMUEN)) {
+		if (coherent)
+			smmu->options |= ARM_SMMU_OPT_KDUMP;
+		else
+			dev_warn(smmu->dev,
+				 "kdump: in-flight DMA would be rejected\n");
+	}
+
 	return 0;
 }
 
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v2 1/5] iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump
From: Nicolin Chen @ 2026-04-15 21:17 UTC (permalink / raw)
  To: will, robin.murphy, jgg, kevin.tian
  Cc: joro, praan, baolu.lu, miko.lenczewski, smostafa,
	linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1776286352.git.nicolinc@nvidia.com>

When transitioning to a kdump kernel, the primary kernel might have crashed
while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
and setting the Global Bypass Attribute (GBPA) to ABORT.

In a kdump scenario, this aggressive reset is highly destructive:
a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
   PCIe AER or SErrors that may panic the kdump kernel
b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
   the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.

To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
flight DMA using the crashed kernel's page tables until the endpoint device
drivers probe and quiesce their respective hardware.

However, the ARM SMMUv3 architecture specification states that updating the
SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.

This leaves a kdump kernel no choice but to adopt the stream table from the
crashed kernel.

Introduce ARM_SMMU_OPT_KDUMP and arm_smmu_adopt_strtab() that does memremap
on all the stream tables extracted from STRTAB_BASE and STRTAB_BASE_CFG.

The option will be set in arm_smmu_device_hw_probe().

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 106 +++++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ef42df4753ec4..74950d98ba09f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -861,6 +861,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_MSIPOLL		(1 << 2)
 #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 #define ARM_SMMU_OPT_TEGRA241_CMDQV	(1 << 4)
+#define ARM_SMMU_OPT_KDUMP		(1 << 5)
 	u32				options;
 
 	struct arm_smmu_cmdq		cmdq;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f6901c5437edc..9a45f17200a21 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4553,11 +4553,115 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+static int arm_smmu_adopt_strtab_2lvl(struct arm_smmu_device *smmu, u32 cfg_reg,
+				      dma_addr_t dma)
+{
+	u32 log2size = FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, cfg_reg);
+	u32 split = FIELD_GET(STRTAB_BASE_CFG_SPLIT, cfg_reg);
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	u32 num_l1_ents;
+	int i;
+
+	if (log2size < split) {
+		dev_err(smmu->dev, "kdump: invalid log2size %u < split %u\n",
+			log2size, split);
+		return -EINVAL;
+	}
+
+	if (split != STRTAB_SPLIT) {
+		dev_err(smmu->dev,
+			"kdump: unsupported STRTAB_SPLIT %u (expected %u)\n",
+			split, STRTAB_SPLIT);
+		return -EINVAL;
+	}
+
+	num_l1_ents = 1 << (log2size - split);
+	cfg->l2.l1_dma = dma;
+	cfg->l2.num_l1_ents = num_l1_ents;
+	cfg->l2.l1tab = devm_memremap(
+		smmu->dev, dma, num_l1_ents * sizeof(struct arm_smmu_strtab_l1),
+		MEMREMAP_WB);
+	if (!cfg->l2.l1tab)
+		return -ENOMEM;
+
+	cfg->l2.l2ptrs = devm_kcalloc(smmu->dev, num_l1_ents,
+				      sizeof(*cfg->l2.l2ptrs), GFP_KERNEL);
+	if (!cfg->l2.l2ptrs)
+		return -ENOMEM;
+
+	for (i = 0; i < num_l1_ents; i++) {
+		u64 l2ptr = le64_to_cpu(cfg->l2.l1tab[i].l2ptr);
+		u32 span = FIELD_GET(STRTAB_L1_DESC_SPAN, l2ptr);
+		dma_addr_t l2_dma = l2ptr & STRTAB_L1_DESC_L2PTR_MASK;
+
+		if (span && l2_dma) {
+			cfg->l2.l2ptrs[i] = devm_memremap(
+				smmu->dev, l2_dma,
+				sizeof(struct arm_smmu_strtab_l2), MEMREMAP_WB);
+			if (!cfg->l2.l2ptrs[i])
+				return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static int arm_smmu_adopt_strtab_linear(struct arm_smmu_device *smmu,
+					u32 cfg_reg, dma_addr_t dma)
+{
+	u32 log2size = FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, cfg_reg);
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+
+	cfg->linear.ste_dma = dma;
+	cfg->linear.num_ents = 1 << log2size;
+	cfg->linear.table = devm_memremap(smmu->dev, dma,
+					  cfg->linear.num_ents *
+						  sizeof(struct arm_smmu_ste),
+					  MEMREMAP_WB);
+	if (!cfg->linear.table)
+		return -ENOMEM;
+	return 0;
+}
+
+static int arm_smmu_adopt_strtab(struct arm_smmu_device *smmu)
+{
+	u32 cfg_reg = readl_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+	u64 base_reg = readq_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE);
+	u32 fmt = FIELD_GET(STRTAB_BASE_CFG_FMT, cfg_reg);
+	dma_addr_t dma = base_reg & STRTAB_BASE_ADDR_MASK;
+	int ret;
+
+	dev_info(smmu->dev, "kdump: adopting crashed kernel's stream table\n");
+
+	if (fmt == STRTAB_BASE_CFG_FMT_2LVL) {
+		/*
+		 * Both kernels run on the same hardware, so it's impossible for
+		 * kdump kernel to see the support for linear stream table only.
+		 */
+		if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)))
+			return -EINVAL;
+		ret = arm_smmu_adopt_strtab_2lvl(smmu, cfg_reg, dma);
+	} else if (fmt == STRTAB_BASE_CFG_FMT_LINEAR) {
+		/*
+		 * In case that the old kernel for some reason used the linear
+		 * format, enforce the same format to match the adopted table.
+		 */
+		smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
+		ret = arm_smmu_adopt_strtab_linear(smmu, cfg_reg, dma);
+	} else {
+		dev_err(smmu->dev, "kdump: invalid STRTAB format %u\n", fmt);
+		ret = -EINVAL;
+	}
+	return ret;
+}
+
 static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 {
 	int ret;
 
-	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
+	if (smmu->options & ARM_SMMU_OPT_KDUMP)
+		ret = arm_smmu_adopt_strtab(smmu);
+	else if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
 		ret = arm_smmu_init_strtab_2lvl(smmu);
 	else
 		ret = arm_smmu_init_strtab_linear(smmu);
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v2 4/5] iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
From: Nicolin Chen @ 2026-04-15 21:17 UTC (permalink / raw)
  To: will, robin.murphy, jgg, kevin.tian
  Cc: joro, praan, baolu.lu, miko.lenczewski, smostafa,
	linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1776286352.git.nicolinc@nvidia.com>

In kdump cases, the crashed kernel's CDs and page tables can be corrupted,
which could trigger event spamming. Also, we cannot serve page requests.

Skip the EVTQ/PRIQ setup entirely rather than enabling then disabling them.

Also add some inline comments explaining that.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b2c34713bf9f2..12cd148a99dc6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5023,21 +5023,35 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	cmd.opcode = CMDQ_OP_TLBI_NSNH_ALL;
 	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 
-	/* Event queue */
-	writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE);
-	writel_relaxed(smmu->evtq.q.llq.prod, smmu->page1 + ARM_SMMU_EVTQ_PROD);
-	writel_relaxed(smmu->evtq.q.llq.cons, smmu->page1 + ARM_SMMU_EVTQ_CONS);
-
-	enables |= CR0_EVTQEN;
-	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
-				      ARM_SMMU_CR0ACK);
-	if (ret) {
-		dev_err(smmu->dev, "failed to enable event queue\n");
-		return ret;
+	/*
+	 * Event queue
+	 *
+	 * Do not enable in a kdump case, as the crashed kernel's CDs and page
+	 * tables might be corrupted, triggering event spamming.
+	 */
+	if (!is_kdump_kernel()) {
+		writeq_relaxed(smmu->evtq.q.q_base,
+			       smmu->base + ARM_SMMU_EVTQ_BASE);
+		writel_relaxed(smmu->evtq.q.llq.prod,
+			       smmu->page1 + ARM_SMMU_EVTQ_PROD);
+		writel_relaxed(smmu->evtq.q.llq.cons,
+			       smmu->page1 + ARM_SMMU_EVTQ_CONS);
+
+		enables |= CR0_EVTQEN;
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to enable event queue\n");
+			return ret;
+		}
 	}
 
-	/* PRI queue */
-	if (smmu->features & ARM_SMMU_FEAT_PRI) {
+	/*
+	 * PRI queue
+	 *
+	 * Do not enable in a kdump case, as we cannot serve page requests.
+	 */
+	if (!is_kdump_kernel() && (smmu->features & ARM_SMMU_FEAT_PRI)) {
 		writeq_relaxed(smmu->priq.q.q_base,
 			       smmu->base + ARM_SMMU_PRIQ_BASE);
 		writel_relaxed(smmu->priq.q.llq.prod,
@@ -5070,9 +5084,6 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 		return ret;
 	}
 
-	if (is_kdump_kernel())
-		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
-
 	/* Enable the SMMU interface */
 	enables |= CR0_SMMUEN;
 	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v2 3/5] iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
From: Nicolin Chen @ 2026-04-15 21:17 UTC (permalink / raw)
  To: will, robin.murphy, jgg, kevin.tian
  Cc: joro, praan, baolu.lu, miko.lenczewski, smostafa,
	linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1776286352.git.nicolinc@nvidia.com>

When ARM_SMMU_OPT_KDUMP is set, skip the GBPA/disable/CR1/CR2/STRTAB_BASE
update sequence in arm_smmu_device_reset(). Those register writes are all
CONSTRAINED UNPREDICTABLE while CR0_SMMUEN==1, so leaving them untouched
lets in-flight DMA continue to be translated by the adopted stream table.

Initialize 'enables' to 0 so it can carry CR0_SMMUEN in kdump case. Then,
preserve that when enabling the command queue.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d9d543eb8cecf..b2c34713bf9f2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4938,9 +4938,23 @@ static void arm_smmu_write_strtab(struct arm_smmu_device *smmu)
 static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 {
 	int ret;
-	u32 reg, enables;
+	u32 reg, enables = 0;
 	struct arm_smmu_cmdq_ent cmd;
 
+	/*
+	 * In a kdump case, retain CR0_SMMUEN to avoid transiently aborting in-
+	 * flight DMA. According to spec, updating STRTAB_BASE, CR1, or CR2 when
+	 * CR0_SMMUEN=1 is CONSTRAINED UNPREDICTABLE. Thus, skip those register
+	 * updates and rely on the adopted stream table from the crashed kernel.
+	 */
+	if (smmu->options & ARM_SMMU_OPT_KDUMP) {
+		dev_info(smmu->dev,
+			 "kdump: retaining SMMUEN for in-flight DMA\n");
+		/* ARM_SMMU_OPT_KDUMP is only set when CR0_SMMUEN=1 */
+		enables = CR0_SMMUEN;
+		goto reset_queues;
+	}
+
 	/* Clear CR0 and sync (disables SMMU and queue processing) */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
 	if (reg & CR0_SMMUEN) {
@@ -4972,12 +4986,23 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	/* Stream table */
 	arm_smmu_write_strtab(smmu);
 
+reset_queues:
+	if (smmu->options & ARM_SMMU_OPT_KDUMP) {
+		/* Disable queues since arm_smmu_device_disable() was skipped */
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to disable queues\n");
+			return ret;
+		}
+	}
+
 	/* Command queue */
 	writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
 	writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
 	writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS);
 
-	enables = CR0_CMDQEN;
+	enables |= CR0_CMDQEN;
 	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
 				      ARM_SMMU_CR0ACK);
 	if (ret) {
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v2 0/5] iommu/arm-smmu-v3: Fix device crash on kdump kernel
From: Nicolin Chen @ 2026-04-15 21:17 UTC (permalink / raw)
  To: will, robin.murphy, jgg, kevin.tian
  Cc: joro, praan, baolu.lu, miko.lenczewski, smostafa,
	linux-arm-kernel, iommu, linux-kernel, stable, jamien

When transitioning to a kdump kernel, the primary kernel might have crashed
while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
and setting the Global Bypass Attribute (GBPA) to ABORT.

In a kdump scenario, this aggressive reset is highly destructive:
a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
   PCIe AER or SErrors that may panic the kdump kernel
b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
   the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.

To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
flight DMA using the crashed kernel's page tables until the endpoint device
drivers probe and quiesce their respective hardware.

However, the ARM SMMUv3 architecture specification states that updating the
SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.

This leaves a kdump kernel no choice but to adopt the stream table from the
crashed kernel.

In this series:
 - Introduce an ARM_SMMU_OPT_KDUMP
 - Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
 - Map the crashed kernel's stream tables into the kdump kernel [*]
 - Defer any default domain attachment to retain STEs until device drivers
   explicitly request it.

[*] This is implemented via memremap, which only works on a coherent SMMU.

Note that the entire series requires Jason's work that was merged in v6.12:
85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
I have a backported version that is verified with a v6.8 kernel. I can send
if we see a strong need after this version is accepted.

This is on Github:
https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v2

Changelog
v2
 * Add warning in non-coherent SMMU cases
 * Keep eventq/priq disabled v.s. enabling-and-disabling-later
 * Check KDUMP option in the beginning of arm_smmu_device_reset()
 * Validate STRTAB format matches HW capability instead of forcing flags
v1:
 https://lore.kernel.org/all/cover.1775763475.git.nicolinc@nvidia.com/

Nicolin Chen (5):
  iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump
  iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
  iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
  iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
  iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP in
    arm_smmu_device_hw_probe()

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 225 ++++++++++++++++++--
 2 files changed, 207 insertions(+), 19 deletions(-)

-- 
2.43.0

^ permalink raw reply

* Re: [PATCH rc v1 1/4] iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump
From: Nicolin Chen @ 2026-04-15 20:57 UTC (permalink / raw)
  To: jgg, will, robin.murphy
  Cc: jamien, joro, praan, baolu.lu, kevin.tian, smostafa,
	miko.lenczewski, linux-arm-kernel, iommu, linux-kernel, stable
In-Reply-To: <30c7c51c8771722813a9cf54dae7a1b5d0aeb65d.1775763475.git.nicolinc@nvidia.com>

On Thu, Apr 09, 2026 at 12:46:50PM -0700, Nicolin Chen wrote:
> +	if (fmt == STRTAB_BASE_CFG_FMT_2LVL) {
> +		/* Enforce 2-level feature flag to match the adopted table */
> +		smmu->features |= ARM_SMMU_FEAT_2_LVL_STRTAB;
> +		ret = arm_smmu_adopt_strtab_2lvl(smmu, cfg_reg, dma);
> +	} else if (fmt == STRTAB_BASE_CFG_FMT_LINEAR) {
> +		/* Force linear feature flag to match the adopted table */
> +		smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
> +		ret = arm_smmu_adopt_strtab_linear(smmu, cfg_reg, dma);

Made a small fix here. Including it in v2.

@@ -4662,11 +4662,18 @@ static int arm_smmu_adopt_strtab(struct arm_smmu_device *smmu)
        dev_info(smmu->dev, "kdump: adopting crashed kernel's stream table\n");

        if (fmt == STRTAB_BASE_CFG_FMT_2LVL) {
-               /* Enforce 2-level feature flag to match the adopted table */
-               smmu->features |= ARM_SMMU_FEAT_2_LVL_STRTAB;
+               /*
+                * Both kernels run on the same hardware, so it's impossible for
+                * kdump kernel to see the support for linear stream table only.
+                */
+               if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)))
+                       return -EINVAL;
                ret = arm_smmu_adopt_strtab_2lvl(smmu, cfg_reg, dma);
        } else if (fmt == STRTAB_BASE_CFG_FMT_LINEAR) {
-               /* Force linear feature flag to match the adopted table */
+               /*
+                * In case that the old kernel for some reason used the linear
+                * format, enforce the same format to match the adopted table.
+                */
                smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
                ret = arm_smmu_adopt_strtab_linear(smmu, cfg_reg, dma);
        } else {

Nicolin


^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Sam Edwards @ 2026-04-15 20:50 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <ad_o4aDP0UBY_8i4@shell.armlinux.org.uk>

On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> It's not a question about how I define RBU - this is defined by Synopsys
> and I'm using it *exactly* that way as stated in the documentation.
>
> "This bit indicates that the host owns the Next Descriptor in the
> Receive List and the DMA cannot acquire it. The Receive Process is
> suspended. ... This bit is set only when the previous Receive
> Descriptor is owned by the DMA."
>
> In other words, DMA has processed the previous receive descriptor which
> _was_ owned by the hardware, written back to clear the OWN bit, and
> then fetches the next descriptor and finds that the OWN bit is also
> clear.

I'm only trying to leave open the possibility that the Synopsys
technical writer and the hardware implementation team weren't
communicating clearly. We already have a situation where RPS isn't
behaving as documented (even if that's likely just hardware
misconfiguration), so while I'm currently pretty sure RBU carries no
other (actual) meaning than "DMA caught up to OWN=0," I'm only about
75% confident.

> > It would seem* that the kernel isn't really failing to keep up with
> > the packet rate. If RBU is firing with a ring that's not even close to
> > empty, that tells me there's another way for it to fire. So I suspect
> > the hardware designers implemented it to mean:
> > "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> >
> > (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> > is actually completely depleted, not full? If count==budget, wouldn't
> > that mean the whole ring hasn't been visited, so we only refilled 64
> > entries and not necessarily the entire ring? Maybe the kernel isn't
> > keeping up after all.)
>
> Ah, I think that's where our terminology differs.
>
> You seem to define full as "populated with empty buffers". I define
> full to mean "the hardware has filled every buffer with a packet that
> it has received and handed it over to software to process." Note even
> the terminology there - filling buffers with data. That ultimately
> ends up filling the ring, and when completely filled, it is full.
>
> I think of buffers like buckets. If a buffer contains no data, it
> is empty. If a buffer contains data, it has been filled or is full.
> Apply that to a list of buffers and you get the same thing. Many
> ethernet driver documentation uses this same terminology, so I
> thought it would be widely understood.

Ah okay, I was beginning to suspect the same. In my defense: though I
also think of buffers in the same way, this driver calls the process
of supplying empty buffers "refilling," which is also the terminology
we've both been using throughout this exchange, and when something is
"completely refilled" I generally call it "full." But I'm realizing
now that the bidirectional (submissions+completions) nature of this
ring means that "full" and "empty" aren't really well-defined
concepts. I'll try to read more carefully (and switch to saying
"completely dirty" and "completely clean") going forward.

So the kernel is able to supply clean buffers without issue, but it
somehow falls behind the incoming packet rate and the DMA is left with
a completely dirty ring. I agree that stmmac_rx() is therefore just
not running fast enough: either it's got really bad scheduler jitter
for the ~6.3ms minimum it takes for 512x full-sized Ethernet frames to
arrive from the PHY (your scenario 1), or -- more likely -- the NAPI
budgets gradually fall behind the hardware (your scenario 2).

> Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
> trying to do anything.
>
> However, I've tested with 0x7f in both fields, and it still falls flat
> on its face. I've also tried other values, but because I had to unplug
> the laptop from the nvidia board to use the laptop portably due to the
> medical emergency situation, that caused screen to quit, so I've lost
> all that. Chaos reigns supreme here :/

I'm sorry to hear about that, please prioritize you/yours and don't
feel like you owe me speedy replies.

> So, I'm not sure we understand what's going on - I don't think it's that
> the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
> results in some kind of throttling that prevents the condition which
> hangs the hardware.

I'll try playing with the FIFO configuration on my end to learn:
a) If a suitably-configured FIFO size makes the RPS status arrive as documented
b) If I can safely fill the FIFO slowly (by manually stalling the
driver and adding frames one at a time) and have it drain on resume
c) Whether the TQS value can be adjusted independently of this
problem's prevalence
d) The maximum RQS value that allows the problem to happen

> I'm not getting as much time as I'd like to really test out scenarios
> due to everything that is going on, and honestly I feel like just
> writing this week off now and giving up.

I have the same hardware, observe the same issue, and find this
interesting enough to keep plugging away at it. I would have no hard
feelings if you left me alone with this problem for a bit. :)

Be well,
Sam

^ permalink raw reply

* Re: [RFC PATCH] mmc: host: sdhci-iproc: implement the .hw_reset callback
From: Florian Fainelli @ 2026-04-15 20:44 UTC (permalink / raw)
  To: Meagan Lloyd, Scott Branden
  Cc: rjui, sbranden, linux-arm-kernel, tgopinath, adrian.hunter,
	linux-mmc, kernel-list
In-Reply-To: <58294850-4ed4-4fb4-8f46-186063b76a2f@linux.microsoft.com>

On 4/15/26 13:43, Meagan Lloyd wrote:
> 
> On 4/15/2026 1:23 PM, Scott Branden wrote:
>> Hi Meagan,
>>
>> On Wed, Apr 15, 2026 at 11:08 AM Meagan Lloyd <
>> meaganlloyd@linux.microsoft.com> wrote:
>>
>>> On 4/13/2026 10:43 AM, Florian Fainelli wrote:
>>>> On 4/13/26 10:38, Meagan Lloyd wrote:
>>>>> On 3/27/2026 3:21 PM, Meagan Lloyd wrote:
>>>>>> Implement the .hw_reset callback so that the eMMC can be reset as
>>>>>> needed
>>>>>> given cap-mmc-hw-reset is set in the devicetree and the
>>>>>> functionality is
>>>>>> enabled on the eMMC.
>>>>>>
>>>>>> Signed-off-by: Meagan Lloyd <meaganlloyd@linux.microsoft.com>
>>>>>> ---
>>>>>>
>>>>>> SDHCI_POWER_CONTROL[4] (SD Host Controller Standard) has been
>>>>>> repurposed
>>>>>> on my Broadcomm processor to be eMMC hardware reset
>>>>>> (SDIO*_eMMCSDXC_CTRL[12], HRESET).
>>>>>>
>>>>>> Can you confirm this repurposed bit is consistent across the Broadcomm
>>>>>> iProc processors and thus the .hw_reset callback can be uniformly
>>>>>> applied in this driver?
>>>>> Hi Ray & Scott,
>>>>>
>>>>> I hope you're doing well. This bit looks to have been repurposed from
>>>>> the SD Host Controller Standard's VDD2 Power Control to being used for
>>>>> toggling the hardware reset signal to eMMCs. Can you verify that it
>>>>> applies across the iProc processors so that I may finalize this patch?
>>>> Which iProc process are you using? If you are not sure this applies
>>>> broadly, can you at least make it specific to the SoC you are using?
>>> Yes, if it comes to that I can. I think it's overkill to roll a new
>>> compat string/associated structures over this small change, hence
>>> checking with Broadcomm iProc maintainers on this thread.
>>>
>> Which iProc processor are you using?  You will have to check with
>> RaspberryPI as I think they use this driver as well.
>> If that family also supports it then you probably don't need a
>> compatibility string.
> 
> The processor I am using is the BCM58732. Can you help direct me to
> someone who could comment from the RaspberryPi side?
> 

I will take care of that.
-- 
Florian


^ permalink raw reply

* Re: [RFC PATCH] mmc: host: sdhci-iproc: implement the .hw_reset callback
From: Meagan Lloyd @ 2026-04-15 20:43 UTC (permalink / raw)
  To: Scott Branden
  Cc: Florian Fainelli, rjui, sbranden, linux-arm-kernel, tgopinath,
	adrian.hunter, linux-mmc, kernel-list
In-Reply-To: <CA+Jzhd+_CKn1X0YDsoh9-OFeCG9sc2Q0OFfphkRaXpEA647xyQ@mail.gmail.com>


On 4/15/2026 1:23 PM, Scott Branden wrote:
> Hi Meagan,
>
> On Wed, Apr 15, 2026 at 11:08 AM Meagan Lloyd <
> meaganlloyd@linux.microsoft.com> wrote:
>
>> On 4/13/2026 10:43 AM, Florian Fainelli wrote:
>>> On 4/13/26 10:38, Meagan Lloyd wrote:
>>>> On 3/27/2026 3:21 PM, Meagan Lloyd wrote:
>>>>> Implement the .hw_reset callback so that the eMMC can be reset as
>>>>> needed
>>>>> given cap-mmc-hw-reset is set in the devicetree and the
>>>>> functionality is
>>>>> enabled on the eMMC.
>>>>>
>>>>> Signed-off-by: Meagan Lloyd <meaganlloyd@linux.microsoft.com>
>>>>> ---
>>>>>
>>>>> SDHCI_POWER_CONTROL[4] (SD Host Controller Standard) has been
>>>>> repurposed
>>>>> on my Broadcomm processor to be eMMC hardware reset
>>>>> (SDIO*_eMMCSDXC_CTRL[12], HRESET).
>>>>>
>>>>> Can you confirm this repurposed bit is consistent across the Broadcomm
>>>>> iProc processors and thus the .hw_reset callback can be uniformly
>>>>> applied in this driver?
>>>> Hi Ray & Scott,
>>>>
>>>> I hope you're doing well. This bit looks to have been repurposed from
>>>> the SD Host Controller Standard's VDD2 Power Control to being used for
>>>> toggling the hardware reset signal to eMMCs. Can you verify that it
>>>> applies across the iProc processors so that I may finalize this patch?
>>> Which iProc process are you using? If you are not sure this applies
>>> broadly, can you at least make it specific to the SoC you are using?
>> Yes, if it comes to that I can. I think it's overkill to roll a new
>> compat string/associated structures over this small change, hence
>> checking with Broadcomm iProc maintainers on this thread.
>>
> Which iProc processor are you using?  You will have to check with
> RaspberryPI as I think they use this driver as well.
> If that family also supports it then you probably don't need a
> compatibility string.

The processor I am using is the BCM58732. Can you help direct me to
someone who could comment from the RaspberryPi side?



^ permalink raw reply

* [PATCH] arm64: cpufeature: Fix GCIE field ordering in ftr_id_aa64pfr2
From: Mukesh Ojha @ 2026-04-15 20:00 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Marc Zyngier
  Cc: linux-arm-kernel, linux-kernel, Mukesh Ojha

The ftr_id_aa64pfr2[] array must be sorted in descending order of
shift value so that the overlap validation in init_cpu_features()
works correctly. The GCIE field (bits 15:12, shift=12) was placed
last in the array, after MTEFAR (bits 11:8, shift=8) and
MTESTOREONLY (bits 7:4, shift=4), causing a spurious warning at
boot:

[    0.000000] SYS_ID_AA64PFR2_EL1 has feature overlap at shift 12
[    0.000000] WARNING: arch/arm64/kernel/cpufeature.c:989 at init_cpu_features+0x144/0x3d0, CPU#0:
swapper/0
..

[    0.000000] pc : init_cpu_features+0x144/0x3d0
[    0.000000] lr : init_cpu_features+0x144/0x3d0
[    0.000000] sp : ffffc08678f03dc0

...
    0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffc08678f14000
[    0.000000] Call trace:
[    0.000000]  init_cpu_features+0x144/0x3d0 (P)
[    0.000000]  cpuinfo_store_boot_cpu+0x4c/0x5c
[    0.000000]  smp_prepare_boot_cpu+0x28/0x38
[    0.000000]  start_kernel+0x1d4/0x848
[    0.000000]  __primary_switched+0x88/0x90

This is because the overlap check computes (shift + width) > prev_shift,
i.e. (12 + 4) > 8, which triggers since GCIE occupies bits above MTEFAR
but was listed after it.

Fix the ordering to match the register layout: FPMR(35:32), GCIE(15:12),
MTEFAR(11:8), MTESTOREONLY(7:4).

Fixes: 899ff451fcee ("KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE")
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
---
 arch/arm64/kernel/cpufeature.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 48f2d894101d..6d53bb15cf7b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -328,9 +328,9 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr2[] = {
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR2_EL1_FPMR_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR2_EL1_GCIE_SHIFT, 4, ID_AA64PFR2_EL1_GCIE_NI),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR2_EL1_MTEFAR_SHIFT, 4, ID_AA64PFR2_EL1_MTEFAR_NI),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR2_EL1_MTESTOREONLY_SHIFT, 4, ID_AA64PFR2_EL1_MTESTOREONLY_NI),
-	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR2_EL1_GCIE_SHIFT, 4, ID_AA64PFR2_EL1_GCIE_NI),
 	ARM64_FTR_END,
 };
 
-- 
2.53.0



^ permalink raw reply related

* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Russell King (Oracle) @ 2026-04-15 19:58 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <CAH5Ym4gy6g8d88-vGhe1zxoV7jNH_fXHsDSdDWC4x00H7s-3=w@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:53:15AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 9:28 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Apr 15, 2026 at 01:56:32PM +0100, Russell King (Oracle) wrote:
> > > Locally, while debugging my issues, I used this to prevent cur_rx
> > > catching up with dirty_rx:
> > >
> > >                 status = stmmac_rx_status(priv, &priv->xstats, p);
> > >                 /* check if managed by the DMA otherwise go ahead */
> > >                 if (unlikely(status & dma_own))
> > >                         break;
> > >
> > >                 next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
> > >                                                priv->dma_conf.dma_rx_size);
> > >                 if (unlikely(next_entry == rx_q->dirty_rx))
> > >                         break;
> > >
> > >                 rx_q->cur_rx = next_entry;
> > >
> > > If we care about the cost of reloading rx_q->dirty_rx on every
> > > iteration, then I'd suggest that the cost we already incur reading and
> > > writing rx_q->cur_rx is something that should be addressed, and
> > > eliminating that would counter the cost of reading rx_q->dirty_rx. I
> > > suspect, however, that the cost is minimal, as cur_tx and dirty_rx are
> > > likely in the same cache line.
> 
> No, no, I like your approach better. :) It also removes the need for
> the `limit` clamp at the top of the function, so later code can assume
> limit==budget.
> 
> > > It looks like any fix to stmmac_rx() will also need a corresponding
> > > fix for stmmac_rx_zc().
> 
> I agree that stmmac_rx_zc() is likely also broken (in a similar way,
> but not similar enough to permit a "corresponding" fix), but I don't
> agree that there's a dependency relationship here. This patch is
> addressing #221010, which affects the generic/non-ZC codepath; I'm
> afraid the ZC codepath warrants its own investigation.

The code structure is identical. The only difference is what happens
to the packets.

Both paths take the NAPI limit. Both paths process up to that limit of
descriptors. The state saving / restoring is similar. The read_again
label is the same, the condition after is the same.

The ZC path differs at this point in that it will attempt to refill
every 16 descriptors that have been processed.

Both paths then read the descriptor and check the ownership.
Both paths then increment cur_rx to point to the next entry around
the ring.
Both paths then get the following descriptor pointer and prefetch
it.
Both paths then get the extended status if we're using extended
descriptors.
Both paths then handle frame discard.
Both paths then jump back to read_again if this isn't the last
segment and we have an error.
Both paths then check for error.
... and so it goes on.

The ZC path to me looks like a copy-paste-and-tweak approach to
adding support. The difference seems to be centered only around
the handling of the data buffers in the descriptors. The overall
mechanism of processing the descriptors follows the same layout
in both functions.

> > I have some further information, but a new curveball has just been
> > chucked... and I've no idea what this will mean at this stage. Just
> > take it that I won't be responding for a while.
> 
> I think I follow your meaning. Good luck getting it straightened out!

It looks like further curveballs have been thrown as a result,
destroying all "plans" for the next days/week. I have aboslutely
no ideas how much time or when I'll be able to look at anything
at the moment, so don't assume that because I find an opportunity
to send an email, everthing is back to normal.

I'll also note that over the last two days I've written several
emails on this, spent many hours on them, only to discard them
as other ideas/research and maybe even the passage of time means
they're no longer appropriate to send.

Jakub: sorry, I just *can't* review stuff on netdev with everything
that is going on, not when .... cna't complete this.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-15 19:37 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <CAH5Ym4jKdzDeYwCfkMLmUz0FsiD2vFwfuAvqFE=uvMtPmakeMQ@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:38:29AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> > > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> > > <linux@armlinux.org.uk> wrote:
> > > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > > > survives iperf3 -c -R to the imx6.
> > >
> > > Hi Russell,
> > >
> > > Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> > > unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> > >
> > > > Dumping the registers and comparing, and then forcing the RQS and TQS
> > > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > > > *256 = 36864 ytes) respectively seems to solve the problem. Under
> > > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > > > all four of the MTL receive operation mode registers, but only the
> > > > first MTL transmit operation mode register. However, DMA channels 1-3
> > > > aren't initialised.
> > >
> > > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> > > than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> > > allowed to overflow, putting the hardware in a bad state.
> > >
> > > Though I suspect this is only half of the problem: do you still see
> > > RBUs? Everything you've shared so far suggests the DMA failures are
> > > _not_ because the rx ring is drying up.
> >
> > Yes. Note that RBUs will happen not because of DMA failures, but if
> > the kernel fails to keep up with the packet rate. RBU means "we read
> > the next descriptor, and it wasn't owned by hardware".
> 
> Are you speaking from observation, documentation, or understanding?

Observation.

> I'd define RBU the same way, but you reported:

It's not a question about how I define RBU - this is defined by Synopsys
and I'm using it *exactly* that way as stated in the documentation.

"This bit indicates that the host owns the Next Descriptor in the
Receive List and the DMA cannot acquire it. The Receive Process is
suspended. ... This bit is set only when the previous Receive
Descriptor is owned by the DMA."

In other words, DMA has processed the previous receive descriptor which
_was_ owned by the hardware, written back to clear the OWN bit, and
then fetches the next descriptor and finds that the OWN bit is also
clear.

> 
> ```
> [   55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer
> unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245
> last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64
> 
> cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...]
> [...]
> Every ring entry contains the same RDES3 value, so it really is
> completely full at the point RBU fires (bit 31 clear means software
> owns the descriptor, and it's basically saying first/last segment,
> RDES1 valid, buffer 1 length of 1518.
> ```

Right, because the _last_ time stmmac_rx() was called, the ring was
completely refilled (as it always is for me).

There are two scenarios that what I'm seeing may happen.

1) The ring was fully refilled, but before stmmac_rx() is next
   executed, all descriptors end up being consumed due to the rate
   at which packets are being received. Thus, the hardware encounters
   a descriptor that has OWN=0

2) The kernel has been slow to respond to packets that have been
   received, and because of the NAPI throttling stmmac_rx() to only
   process 64 descriptors at a time, we are falling way behind the
   hardware position. Eventually, the hardware catches up with
   the point at which stmmac_rx_refill() is repopulating the receive
   descriptors, and encounters a descriptor that has OWN=0.

For (2), for example, let's take the example which you've quoted from
me.

stmmac_rx() gets called, and cur_rx = dirty_rx = 245. We're limited to
a count of 64 meaning we're not going to process more than 64 entries
no matter how far ahead the hardware is. Let's say the hardware is at
e.g. descriptor 400 at this point.

stmmac_rx() runs, processing descriptors. It works its way up to entry
309, at which point count == limit, so it stops, and we now have
cur_rx = 309, dirty_rx = 245.

The next thing stmmac_rx() does is call stmmac_rx_refill(). This looks
at the difference, and calculates how many entries need to be
repopulated. stmmac_rx_dirty() returns 64, as that's the number of
entries between dirty_rx and the updated cur_rx. It populates those
entries.

At this point, dirty_rx = 309. All well and good. However, during that
process, packet reception hasn't stopped, and let's say it's now at
descriptor 500.

In that scenario, we're consuming 100 descriptors, but only repopulating
64 descriptors. As this continues, the hardware is slowly catching up
with point in the ring that stmmac_rx_refill() is repopulating the
descriptors.

When it does catch up, it will encounter a descriptor with OWN=0, which
will fire the RBU interrupt.

At this point, my debug dumps the state of the ring. If the RBU was
raised when stmmac_rx()/stmmac_rx_refill() was not running, _and_ we
are always successfully refilling all the entries that stmmac_rx()
processed, then cur_rx will equal dirty_rx, even when the hardware
could be way ahead of cur_rx. Neither of these indexes have any
relevance to where the hardware actually is in the ring.

The dump of the ring state *clearly* shows that all descriptors have
a RDES3 value which indicates that every single descriptor is not
hardware owned at this point (since RBU has been raised, the receive
process is suspended, so hardware is no longer changing the ring.)

> It would seem* that the kernel isn't really failing to keep up with
> the packet rate. If RBU is firing with a ring that's not even close to
> empty, that tells me there's another way for it to fire. So I suspect
> the hardware designers implemented it to mean:
> "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> 
> (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> is actually completely depleted, not full? If count==budget, wouldn't
> that mean the whole ring hasn't been visited, so we only refilled 64
> entries and not necessarily the entire ring? Maybe the kernel isn't
> keeping up after all.)

Ah, I think that's where our terminology differs.

You seem to define full as "populated with empty buffers". I define
full to mean "the hardware has filled every buffer with a packet that
it has received and handed it over to software to process." Note even
the terminology there - filling buffers with data. That ultimately
ends up filling the ring, and when completely filled, it is full.

I think of buffers like buckets. If a buffer contains no data, it
is empty. If a buffer contains data, it has been filled or is full.
Apply that to a list of buffers and you get the same thing. Many
ethernet driver documentation uses this same terminology, so I
thought it would be widely understood.

> > That has:
> >
> >         const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
> >                   FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
> >         };
> >         const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
> >                   FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
> >         };
> >
> > where each of those values is the RQS/TQS value to use in KiB:
> >
> > #define FIFO_SZ(x)              ((((x) * 1024U) / 256U) - 1U)
> >
> > This doesn't correspond with the values I'm seeing programmed into
> > the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
> > (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
> > above from a quick look, but they're not in the right place!
> 
> True, but:
> a) I doubt 5.10.216-tegra includes exactly the same version of the
> driver found in this random GitHub mirror. (My intent was only to
> point out that they don't use 5.10's stmmac; I should have been more
> clear that I wasn't trying to link the same version, sorry!)
> b) This is vendor code; I don't know how good their testing/review
> process is. It might not run the way it looks. The intent seems to be
> for RQS > TQS (which makes intuitive sense), but as you're seeing the
> registers programmed the other way 'round, they might have gotten them
> subtly mixed up.
> 
> > Now, as for FIFO sizes, if we sum up all the entries, then we
> > get:
> >
> > SUM(rx_fifo_size[0][]) = 60KiB
> > SUM(rx_fifo_size[1][]) = 64KiB
> > SUM(tx_fifo_size[0][]) = 60KiB
> > SUM(tx_fifo_size[1][]) = 64KiB
> 
> I follow the math with 64KiB, but surely the 60KiB should be
> 9+9+9+9+1+1+1+1=40KiB? This seems to me that the "legacy EQOS" simply
> shifts with smaller FIFOs. Since dwmac is licensed as a soft IP core,
> perhaps the FIFO size is an elaboration parameter? That would mean
> this isn't an issue with dwmac 5.0 broadly, but with Nvidia's specific
> instantiation of it.

Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
trying to do anything.

However, I've tested with 0x7f in both fields, and it still falls flat
on its face. I've also tried other values, but because I had to unplug
the laptop from the nvidia board to use the laptop portably due to the
medical emergency situation, that caused screen to quit, so I've lost
all that. Chaos reigns supreme here :/

So, I'm not sure we understand what's going on - I don't think it's that
the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
results in some kind of throttling that prevents the condition which
hangs the hardware.

I'm not getting as much time as I'd like to really test out scenarios
due to everything that is going on, and honestly I feel like just
writing this week off now and giving up.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
From: Andrei Vagin @ 2026-04-15 19:27 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrei Vagin, Will Deacon, Kees Cook, Andrew Morton,
	Marek Szyprowski, Cyrill Gorcunov, Mike Rapoport,
	Alexander Mikhalitsyn, linux-kernel, linux-fsdevel, linux-mm,
	criu, Catalin Marinas, linux-arm-kernel, Chen Ridong,
	Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny, Alexander Mikhalitsyn, Linux API
In-Reply-To: <adUhbk0sKT0ucWhJ@J2N7QTR9R3>

Hi Mark,

Thanks for the feedback and sorry for the delay, was on vacation.
Please see my comments inline.

On Tue, Apr 7, 2026 at 8:29 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Fri, Mar 27, 2026 at 05:21:26PM -0700, Andrei Vagin wrote:
> > Hi Mark,
> >
> > I understand all these points and they are valid. However, as I
> > mentioned, we are not trying to introduce a mechanism that will strictly
> > enforce feature sets for every container. While we would like to have
> > that functionality, as you and will mentioned, it would require
> > substantially more complexity to address, and maintainers would unlikely
> > to pick up that complexity.
>
> The crux of my complaint here is that unless you do that (to some
> degree), this is not going to work reliably, even with the constraints
> you outline.
>
> Further, I disagree with your proposed solution of pushing more
> constraints onto userspace (to also consider HWCAPs as overriding other
> mechainsms, etc).
>
> I think that as-is, the approach is flawed.

I would really appreciate it if we could move this conversation toward
how we can make it work.

>
> > Even masking ID registers on a per-container basis would introduce
> > extra complexity that could make architecture maintainers unhappy.
> > There were a few attempts to introduce container CPUID masking on
> > x86_64 in the past.
>
> > In CRIU, we are not aiming to handle every possible workload. Our goal
> > is to target workloads where developers are ready to cooperate and
> > willing to make adjustments to be C/R compatible. The goal here is to
> > provide developers with clear instructions on what they can do to ensure
> > their applications are C/R compatible. When I say "workloads", I mean
> > this in a broad sense. A container might pack a set of tools with
> > different runtimes (Go, Java, libc-based). All these runtimes should
> > detect only allowed features.
>
> I do not think that arbitrary applications (and libraries!) should have
> to pick up additional constraints that are unnecessary without CRIU,
> especially where that goes against deliberate design decisions (e.g.
> features in arm64's HINT instruction space, which are designed to be
> usable in fast paths WITHOUT needing explicit checks of things like
> HWCAPs). Note that those typically *do* have kernel controls.
>
> I think there's a much larger problem space than you anticipate, and
> adding an incomplete solution now is just going to introduce a
> maintenance burden.

I am not adding arbitrary constraints for standard non-CRIU use cases.
Previously, I suggested that standard libraries would need to call prctl
to determine if hwcaps should be used for feature detection.  However,
we can avoid this extra syscall by adding the new HWCAP2_CR bit. Then
libraries will simply check this bit in auxv[AT_HWCAP2], meaning the
overhead for "non-criu" cases is just a single bit check.

As for HINT instructions, there are two class of instructions.

The first one doesn't change a process state and they are not required
any special handling in term of checkpoint/restore. If a process is
checkpointed on a newer cpu, and restore it on an older cpu, the older
hardware will simply skip over that instructions.  The architectural
state (registers, memory) should remain consistent.

The second class such as PAC are instructions that actually change a
process state. These instructions require kernel/userspace coordination.
For example, usage of PAC keys can be controlled from userspace via prctl.
I mean when support for new instructions is implemented in the kernel,
we will need to consider that userspace should be able to control them.

>
> > Returning to the subject of this patchset: this series extends the role
> > of hwcaps. With this change, we would establish that hwcaps is the
> > "source of truth" for which features an application can safely use. Any
> > other features available on the current CPU would not be guaranteed to
> > remain available after migration to another machine.
> >
> > After this discussion, I found that the current version missed one major
> > thing: there should be a signal indicating that hwcaps must be used for
> > feature detection. Since we will need to integrate this interface into
> > libc, Go, and other runtimes, they definitely should not rely just on
> > hwcaps by default, especially in the early stages. This can be solved
> > via the prctl command.  Libraries like libc would call
> > prctl(PR_USER_HWCAP_ENABLED). If this returns true, the runtime knows
> > that only the features explicitly listed in hwcaps should be used.
>
> I do not think we should be pushing that shape of constraint onto
> userspace.

Look at the previous command.

>
> > You are right, the controlled feature set will be limited to features
> > the kernel knows about. And yes, we would need to report CPU features in
> > hwcaps even if the kernel isn't directly involved in handling them.
>
> To be clear, that is not what I am arguing.
>
> As I mentioned before, the way this works on arm64 is that the kernel
> only exposes what it is aware of, even in the ID regs accessible to
> userspace. We usually *can* hide features, and do that for cases of
> mismatched big.LITTLE, virtual machines, etc.

I understand that. My point was that the kernel would need to report
features in hwcaps even if they don't require specific kernel-side
handling.

>
> > Honestly, I am not certain if this is the "right" interface for that,
> > and I would be happy to consider other ideas. I understand that these
> > hwcaps will not work right out of the box, but we need a way to solve
> > this problem. Having a centralized API for CPU/kernel feature detection
> > seems like the right direction.
>
> I think that for better or worse the approach you are tkaing here simply
> does not solve enough of the problem to actually be worthwhile.

This approach mimics solutions that some CRIU users are already
implementing in userspace, but those only work when the user controls/
recompiles all their libraries. I am open to other ideas, but we need a
path forward.

>
> > As for signal frame size and extended states like SVE/SME, we aware
> > about this problem.  However, it is partly mitigated by the fact that if
> > an application does not use some features, those states are not placed
> > in the signal frame.
>
> That is not true. The kernel can and will create signal frames for
> architectural state that a task might never have touched.
>
> Generally arm64 creates signal frames for features when the feature
> *exists*, regardless of whether the task has actively manipulated the
> relevant state. For example, on systems with SVE a trivial SVE signal
> frame gets created even if a task only uses the FPSIMD registers, and on
> systms with SME a TPIDR2 signal frame gets created even if the task has
> never read/written TPIDR2.
>
> When restoring, an unrecognised signal frame is treated as invalid, and
> we can require that certain signal frames are present.

You are right; that was my mistake. My only explanation for why we don't
see this failure often is that C/R is rarely triggered while a process
is actually
inside a signal handler. This is definitely a problem that still needs
to be solved.

>
> > In the future, when we construct/reload a signal frame, we could look
> > at a process feature set for a process and generate a frame according
> > to those features...
>
> When you say 'we' here, are you talking about within the kernel, or
> within the userspace C/R mechanism?

... within the kernel.

Thanks,
Andrei


^ permalink raw reply

* Re: [PATCH bpf-next v14 0/5] emit ENDBR/BTI instructions for indirect
From: Alexei Starovoitov @ 2026-04-15 19:22 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, LKML, linux-arm-kernel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Yonghong Song, Puranjay Mohan,
	Anton Protopopov, Alexis Lothoré, Shahab Vahedi,
	Russell King, Tiezhu Yang, Hengqi Chen, Johan Almbladh,
	Paul Burton, Hari Bathini, Christophe Leroy, Naveen N Rao,
	Luke Nelson, Xi Wang, Björn Töpel, Pu Lehui,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, David S . Miller,
	Wang YanQing
In-Reply-To: <cover.1776062885.git.xukuohai@hotmail.com>

On Mon, Apr 13, 2026 at 6:06 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
>
> From: Xu Kuohai <xukuohai@hotmail.com>
>
> On architectures with CFI protection enabled that require landing pad
> instructions at indirect jump targets, such as x86 with CET/IBT enabled
> and arm64 with BTI enabled, kernel panics when an indirect jump lands on
> a target without landing pad. Therefore, the JIT must emit landing pad
> instructions for indirect jump targets.
>
> The verifier already recognizes which instructions are indirect jump
> targets during the verification phase. So we can store this information
> in env->insn_aux_data and pass it to the JIT as new parameter, allowing
> the JIT to consult env->insn_aux_data to determine which instructions are
> indirect jump targets.
>
> During JIT, constants blinding is performed. It rewrites the private copy
> of instructions for the JITed program, but it does not adjust the global
> env->insn_aux_data array. As a result, after constants blinding, the
> instruction indexes used by JIT may no longer match the indexes in
> env->insn_aux_data, so the JIT can not use env->insn_aux_data directly.
>
> To avoid this mismatch, and given that all existing arch-specific JITs
> already implement constants blinding with largely duplicated code, move
> constants blinding from JIT to generic code.
>
> v14:
> - Rebase

Pls do one more rebase and target bpf tree.


^ permalink raw reply

* Re: [PATCH bpf-next v2 0/2] bpf, arm64/riscv: Remove redundant icache flush after pack allocator finalize
From: patchwork-bot+netdevbpf @ 2026-04-15 19:20 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: bpf, ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
	yonghong.song, jolsa, xukuohai, catalin.marinas, will,
	luke.r.nels, xi.wang, bjorn, pulehui, pjw, palmer, aou, alex,
	linux-arm-kernel, linux-riscv, linux-kernel
In-Reply-To: <20260413191111.3426023-1-puranjay@kernel.org>

Hello:

This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Mon, 13 Apr 2026 12:11:07 -0700 you wrote:
> Changelog:
> v1: https://lore.kernel.org/all/20260413123256.3296452-1-puranjay@kernel.org/
> Changes in v2:
> - Remove "#include <asm/cacheflush.h>" as it is not needed now.
> - Add Acked-by: Song Liu <song@kernel.org>
> 
> When the BPF prog pack allocator was added for arm64 and riscv, the
> existing bpf_flush_icache() calls were retained after
> bpf_jit_binary_pack_finalize(). However, the finalize path copies the
> JITed code via architecture text patching routines (__text_poke on arm64,
> patch_text_nosync on riscv) that already perform a full
> flush_icache_range() internally. The subsequent bpf_flush_icache()
> repeats the same cache maintenance on the same range.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v2,1/2] bpf, arm64: Remove redundant bpf_flush_icache() after pack allocator finalize
    https://git.kernel.org/bpf/bpf/c/42f18ae53011
  - [bpf-next,v2,2/2] bpf, riscv: Remove redundant bpf_flush_icache() after pack allocator finalize
    https://git.kernel.org/bpf/bpf/c/46ee1342b887

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




^ permalink raw reply

* Re: [PATCH v3 3/3] dt-bindings: i3c: Add AST2600 I3C global registers
From: Dawid Glazik @ 2026-04-15 18:21 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Alexandre Belloni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Joel Stanley, Andrew Jeffery, linux-aspeed, linux-i3c, devicetree,
	linux-arm-kernel, Frank Li, Maciej Lawniczak
In-Reply-To: <d74e7aa8-1110-469a-ac7e-3829c2458852@kernel.org>

On 4/9/2026 9:30 AM, Krzysztof Kozlowski wrote:
> On 09/04/2026 09:28, Krzysztof Kozlowski wrote:
>> On Wed, Apr 08, 2026 at 10:34:35PM +0200, Dawid Glazik wrote:
>>> Introduce the device-tree bindings for I3C global registers found on
>>> AST2600 SoCs.
>>>
>>> Signed-off-by: Dawid Glazik <dawid.glazik@linux.intel.com>
>>> ---
>>> I wasn't sure if I should add newline at the end of the
>>> file or not so I took
>>> https://github.com/torvalds/linux/tree/master/Documentation/devicetree/bindings/i3c
>>> as an example.
>>
>> Answer is: you cannot have patch warnings.
>>
>> Documentation/devicetree/bindings/i3c does not have patch warning, does
>> it?
> 
> And if you tested this code with standard tools, you would see that...
> 
> Best regards,
> Krzysztof

Thank you for the review and feedback. This is my first contribution to 
Linux kernel so I'm still learning the process and toolchain. I 
apologize for the rookie mistakes. I will address all the issues you've 
pointed out and resubmit the series.

Best regards,
Dawid


^ permalink raw reply

* Re: [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Mark Rutland @ 2026-04-15 18:19 UTC (permalink / raw)
  To: Guangshuo Li, Greg Kroah-Hartman
  Cc: Will Deacon, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, stable
In-Reply-To: <20260415174159.3625777-1-lgs201920130244@gmail.com>

Hi,

Thanks for the patch, but from a quick skim, I don't think this is the right
fix.

Greg, I think we might want to rework the core API here; question for
you at the end.

On Thu, Apr 16, 2026 at 01:41:59AM +0800, Guangshuo Li wrote:
> When platform_device_register() fails in arm_acpi_register_pmu_device(),
> the embedded struct device in pdev has already been initialized by
> device_initialize(), but the failure path only unregisters the GSI and
> does not drop the device reference for the current platform device:
> 
>   arm_acpi_register_pmu_device()
>     -> platform_device_register(pdev)
>        -> device_initialize(&pdev->dev)
>        -> setup_pdev_dma_masks(pdev)
>        -> platform_device_add(pdev)
> 
> This leads to a reference leak when platform_device_register() fails.

AFAICT you're saying that the reference was taken *within*
platform_device_register(), and then platform_device_register() itself
has failed. I think it's surprising that platform_device_register()
doesn't clean that up itself in the case of an error.

There are *tonnes* of calls to platform_device_register() throughout the
kernel that don't even bother to check the return value, and many that
just pass the return onto a caller that can't possibly know to call
platform_device_put().

Code in the same file as platform_device_register() expects it to clean up
after itself, e.g.

| int platform_add_devices(struct platform_device **devs, int num) 
| {
|         int i, ret = 0; 
| 
|         for (i = 0; i < num; i++) {
|                 ret = platform_device_register(devs[i]);
|                 if (ret) {
|                         while (--i >= 0)
|                                 platform_device_unregister(devs[i]);
|                         break;
|                 }    
|         }    
| 
|         return ret; 
| }

That's been there since the initial git commit, and back then,
platform_device_register() didn't mention that callers needed to perform
any cleanup.

I see a comment was added to platform_device_register() in commit:

  67e532a42cf4 ("driver core: platform: document registration-failure requirement")

... and that copied the commend added for device_register() in commit:

  5739411acbaa ("Driver core: Clarify device cleanup.")

... but the potential brokenness is so widespread, and the behaviour is
so surprising, that I'd argue the real but is that device_register()
doesn't clean up in case of error. I don't think it's worth changing
this single instance given the prevalance and churn fixing all of that
would involve.

I think it would be far better to fix the core driver API such that when
those functions return an error, they've already cleaned up for
themselves.

Greg, am I missing some functional reason why we can't rework
device_register() and friends to handle cleanup themselves? I appreciate
that'll involve churn for some callers, but AFAICT the majority of
callers don't have the required cleanup.

Mark.

> Fix this by calling platform_device_put() after unregistering the GSI.
> 
> The issue was identified by a static analysis tool I developed and
> confirmed by manual review.
> 
> Fixes: 81e5ee4716098 ("arm_pmu: acpi: Refactor arm_spe_acpi_register_device()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
> ---
>  drivers/perf/arm_pmu_acpi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c
> index e80f76d95e68..5ce382661e34 100644
> --- a/drivers/perf/arm_pmu_acpi.c
> +++ b/drivers/perf/arm_pmu_acpi.c
> @@ -119,8 +119,10 @@ arm_acpi_register_pmu_device(struct platform_device *pdev, u8 len,
>  
>  	pdev->resource[0].start = irq;
>  	ret = platform_device_register(pdev);
> -	if (ret)
> +	if (ret) {
>  		acpi_unregister_gsi(gsi);
> +		platform_device_put(pdev);
> +	}
>  
>  	return ret;
>  }
> -- 
> 2.43.0
> 

^ permalink raw reply

* Re: [RFC PATCH] mmc: host: sdhci-iproc: implement the .hw_reset callback
From: Meagan Lloyd @ 2026-04-15 18:07 UTC (permalink / raw)
  To: Florian Fainelli, rjui
  Cc: sbranden, linux-arm-kernel, tgopinath, adrian.hunter, linux-mmc
In-Reply-To: <702c52e4-b0b3-4e1e-a40d-29e136e46d7d@broadcom.com>


On 4/13/2026 10:43 AM, Florian Fainelli wrote:
> On 4/13/26 10:38, Meagan Lloyd wrote:
>>
>> On 3/27/2026 3:21 PM, Meagan Lloyd wrote:
>>> Implement the .hw_reset callback so that the eMMC can be reset as
>>> needed
>>> given cap-mmc-hw-reset is set in the devicetree and the
>>> functionality is
>>> enabled on the eMMC.
>>>
>>> Signed-off-by: Meagan Lloyd <meaganlloyd@linux.microsoft.com>
>>> ---
>>>
>>> SDHCI_POWER_CONTROL[4] (SD Host Controller Standard) has been
>>> repurposed
>>> on my Broadcomm processor to be eMMC hardware reset
>>> (SDIO*_eMMCSDXC_CTRL[12], HRESET).
>>>
>>> Can you confirm this repurposed bit is consistent across the Broadcomm
>>> iProc processors and thus the .hw_reset callback can be uniformly
>>> applied in this driver?
>>
>> Hi Ray & Scott,
>>
>> I hope you're doing well. This bit looks to have been repurposed from
>> the SD Host Controller Standard's VDD2 Power Control to being used for
>> toggling the hardware reset signal to eMMCs. Can you verify that it
>> applies across the iProc processors so that I may finalize this patch?
>
> Which iProc process are you using? If you are not sure this applies
> broadly, can you at least make it specific to the SoC you are using? 
Yes, if it comes to that I can. I think it's overkill to roll a new
compat string/associated structures over this small change, hence
checking with Broadcomm iProc maintainers on this thread.


^ permalink raw reply

* Re: [PATCH net v2] net: airoha: Add missing bits in airoha_qdma_cleanup_tx_queue()
From: Lorenzo Bianconi @ 2026-04-15 17:58 UTC (permalink / raw)
  To: Simon Horman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260415164643.GQ772670@horms.kernel.org>

[-- Attachment #1: Type: text/plain, Size: 5012 bytes --]

> On Tue, Apr 14, 2026 at 08:50:52AM +0200, Lorenzo Bianconi wrote:
> > Similar to airoha_qdma_cleanup_rx_queue(), reset DMA TX descriptors in
> > airoha_qdma_cleanup_tx_queue routine. Moreover, reset TX_DMA_IDX to
> > TX_CPU_IDX to notify the NIC the QDMA TX ring is empty.
> > 
> > Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC")
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> > Changes in v2:
> > - Move q->ndesc initialization at end of airoha_qdma_init_tx routine in
> >   order to avoid any possible NULL pointer dereference in
> >   airoha_qdma_cleanup_tx_queue()
> 
> This seems to be a separate issue.
> If so, I think it should be split out into a separate patch.

Sorry, I missed this comment.
Ack. I will repost splitting this in two separated patches.

Regards,
Lorenzo

> 
> > - Check if q->tx_list is empty in airoha_qdma_cleanup_tx_queue()
> > - Link to v1: https://lore.kernel.org/r/20260410-airoha_qdma_cleanup_tx_queue-fix-net-v1-1-b7171c8f1e78@kernel.org
> 
> I think it was covered in the review Jakub forwarded for v1.  But FTR,
> Sashiko has some feedback on this patch in the form of an existing bug
> (that should almost certainly be handled separately from this patch).
> 
> > ---
> >  drivers/net/ethernet/airoha/airoha_eth.c | 41 ++++++++++++++++++++++++++------
> >  1 file changed, 34 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> > index 9e995094c32a..3c1a2bc68c42 100644
> > --- a/drivers/net/ethernet/airoha/airoha_eth.c
> > +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> > @@ -966,27 +966,27 @@ static int airoha_qdma_init_tx_queue(struct airoha_queue *q,
> >  	dma_addr_t dma_addr;
> >  
> >  	spin_lock_init(&q->lock);
> > -	q->ndesc = size;
> >  	q->qdma = qdma;
> >  	q->free_thr = 1 + MAX_SKB_FRAGS;
> >  	INIT_LIST_HEAD(&q->tx_list);
> >  
> > -	q->entry = devm_kzalloc(eth->dev, q->ndesc * sizeof(*q->entry),
> > +	q->entry = devm_kzalloc(eth->dev, size * sizeof(*q->entry),
> >  				GFP_KERNEL);
> >  	if (!q->entry)
> >  		return -ENOMEM;
> >  
> > -	q->desc = dmam_alloc_coherent(eth->dev, q->ndesc * sizeof(*q->desc),
> > +	q->desc = dmam_alloc_coherent(eth->dev, size * sizeof(*q->desc),
> >  				      &dma_addr, GFP_KERNEL);
> >  	if (!q->desc)
> >  		return -ENOMEM;
> >  
> > -	for (i = 0; i < q->ndesc; i++) {
> > +	for (i = 0; i < size; i++) {
> >  		u32 val = FIELD_PREP(QDMA_DESC_DONE_MASK, 1);
> >  
> >  		list_add_tail(&q->entry[i].list, &q->tx_list);
> >  		WRITE_ONCE(q->desc[i].ctrl, cpu_to_le32(val));
> >  	}
> > +	q->ndesc = size;
> >  
> >  	/* xmit ring drop default setting */
> >  	airoha_qdma_set(qdma, REG_TX_RING_BLOCKING(qid),
> > @@ -1051,13 +1051,17 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma)
> >  
> >  static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
> >  {
> > -	struct airoha_eth *eth = q->qdma->eth;
> > -	int i;
> > +	struct airoha_qdma *qdma = q->qdma;
> > +	struct airoha_eth *eth = qdma->eth;
> > +	int i, qid = q - &qdma->q_tx[0];
> > +	struct airoha_queue_entry *e;
> > +	u16 index = 0;
> >  
> >  	spin_lock_bh(&q->lock);
> >  	for (i = 0; i < q->ndesc; i++) {
> > -		struct airoha_queue_entry *e = &q->entry[i];
> 
> super nit: In v2 e is always used within a block (here and in the hunk below).
>            So I would lean towards declaring e in the blocks where it is
> 	   used.
> 
> 	   No need to repost just for this!
> 
> > +		struct airoha_qdma_desc *desc = &q->desc[i];
> >  
> > +		e = &q->entry[i];
> >  		if (!e->dma_addr)
> >  			continue;
> >  
> > @@ -1067,8 +1071,31 @@ static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
> >  		e->dma_addr = 0;
> >  		e->skb = NULL;
> >  		list_add_tail(&e->list, &q->tx_list);
> > +
> > +		/* Reset DMA descriptor */
> > +		WRITE_ONCE(desc->ctrl, 0);
> > +		WRITE_ONCE(desc->addr, 0);
> > +		WRITE_ONCE(desc->data, 0);
> > +		WRITE_ONCE(desc->msg0, 0);
> > +		WRITE_ONCE(desc->msg1, 0);
> > +		WRITE_ONCE(desc->msg2, 0);
> > +
> >  		q->queued--;
> >  	}
> > +
> > +	if (!list_empty(&q->tx_list)) {
> > +		e = list_first_entry(&q->tx_list, struct airoha_queue_entry,
> > +				     list);
> > +		index = e - q->entry;
> > +	}
> > +	/* Set TX_DMA_IDX to TX_CPU_IDX to notify the hw the QDMA TX ring is
> > +	 * empty.
> > +	 */
> > +	airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK,
> > +			FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
> > +	airoha_qdma_rmw(qdma, REG_TX_DMA_IDX(qid), TX_RING_DMA_IDX_MASK,
> > +			FIELD_PREP(TX_RING_DMA_IDX_MASK, index));
> > +
> >  	spin_unlock_bh(&q->lock);
> >  }
> >  
> > 
> > ---
> > base-commit: 2cd7e6971fc2787408ceef17906ea152791448cf
> > change-id: 20260410-airoha_qdma_cleanup_tx_queue-fix-net-93375f5ee80f
> > 
> > Best regards,
> > -- 
> > Lorenzo Bianconi <lorenzo@kernel.org>
> > 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Sam Edwards @ 2026-04-15 17:53 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <ad-8q4OrOm-VtGrO@shell.armlinux.org.uk>

On Wed, Apr 15, 2026 at 9:28 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, Apr 15, 2026 at 01:56:32PM +0100, Russell King (Oracle) wrote:
> > Locally, while debugging my issues, I used this to prevent cur_rx
> > catching up with dirty_rx:
> >
> >                 status = stmmac_rx_status(priv, &priv->xstats, p);
> >                 /* check if managed by the DMA otherwise go ahead */
> >                 if (unlikely(status & dma_own))
> >                         break;
> >
> >                 next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
> >                                                priv->dma_conf.dma_rx_size);
> >                 if (unlikely(next_entry == rx_q->dirty_rx))
> >                         break;
> >
> >                 rx_q->cur_rx = next_entry;
> >
> > If we care about the cost of reloading rx_q->dirty_rx on every
> > iteration, then I'd suggest that the cost we already incur reading and
> > writing rx_q->cur_rx is something that should be addressed, and
> > eliminating that would counter the cost of reading rx_q->dirty_rx. I
> > suspect, however, that the cost is minimal, as cur_tx and dirty_rx are
> > likely in the same cache line.

No, no, I like your approach better. :) It also removes the need for
the `limit` clamp at the top of the function, so later code can assume
limit==budget.

> > It looks like any fix to stmmac_rx() will also need a corresponding
> > fix for stmmac_rx_zc().

I agree that stmmac_rx_zc() is likely also broken (in a similar way,
but not similar enough to permit a "corresponding" fix), but I don't
agree that there's a dependency relationship here. This patch is
addressing #221010, which affects the generic/non-ZC codepath; I'm
afraid the ZC codepath warrants its own investigation.

> I have some further information, but a new curveball has just been
> chucked... and I've no idea what this will mean at this stage. Just
> take it that I won't be responding for a while.

I think I follow your meaning. Good luck getting it straightened out!


^ permalink raw reply

* [PATCH] arm_pmu: acpi: fix reference leak on failed device registration
From: Guangshuo Li @ 2026-04-15 17:41 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel
  Cc: Guangshuo Li, stable

When platform_device_register() fails in arm_acpi_register_pmu_device(),
the embedded struct device in pdev has already been initialized by
device_initialize(), but the failure path only unregisters the GSI and
does not drop the device reference for the current platform device:

  arm_acpi_register_pmu_device()
    -> platform_device_register(pdev)
       -> device_initialize(&pdev->dev)
       -> setup_pdev_dma_masks(pdev)
       -> platform_device_add(pdev)

This leads to a reference leak when platform_device_register() fails.
Fix this by calling platform_device_put() after unregistering the GSI.

The issue was identified by a static analysis tool I developed and
confirmed by manual review.

Fixes: 81e5ee4716098 ("arm_pmu: acpi: Refactor arm_spe_acpi_register_device()")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
 drivers/perf/arm_pmu_acpi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c
index e80f76d95e68..5ce382661e34 100644
--- a/drivers/perf/arm_pmu_acpi.c
+++ b/drivers/perf/arm_pmu_acpi.c
@@ -119,8 +119,10 @@ arm_acpi_register_pmu_device(struct platform_device *pdev, u8 len,
 
 	pdev->resource[0].start = irq;
 	ret = platform_device_register(pdev);
-	if (ret)
+	if (ret) {
 		acpi_unregister_gsi(gsi);
+		platform_device_put(pdev);
+	}
 
 	return ret;
 }
-- 
2.43.0



^ permalink raw reply related

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Sam Edwards @ 2026-04-15 17:38 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <ad-ID2WaPgPJqdsa@shell.armlinux.org.uk>

On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > > survives iperf3 -c -R to the imx6.
> >
> > Hi Russell,
> >
> > Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> > unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> >
> > > Dumping the registers and comparing, and then forcing the RQS and TQS
> > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > > *256 = 36864 ytes) respectively seems to solve the problem. Under
> > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > > all four of the MTL receive operation mode registers, but only the
> > > first MTL transmit operation mode register. However, DMA channels 1-3
> > > aren't initialised.
> >
> > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> > than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> > allowed to overflow, putting the hardware in a bad state.
> >
> > Though I suspect this is only half of the problem: do you still see
> > RBUs? Everything you've shared so far suggests the DMA failures are
> > _not_ because the rx ring is drying up.
>
> Yes. Note that RBUs will happen not because of DMA failures, but if
> the kernel fails to keep up with the packet rate. RBU means "we read
> the next descriptor, and it wasn't owned by hardware".

Are you speaking from observation, documentation, or understanding?
I'd define RBU the same way, but you reported:

```
[   55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer
unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245
last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64

cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...]
[...]
Every ring entry contains the same RDES3 value, so it really is
completely full at the point RBU fires (bit 31 clear means software
owns the descriptor, and it's basically saying first/last segment,
RDES1 valid, buffer 1 length of 1518.
```

It would seem* that the kernel isn't really failing to keep up with
the packet rate. If RBU is firing with a ring that's not even close to
empty, that tells me there's another way for it to fire. So I suspect
the hardware designers implemented it to mean:
"We couldn't read the next descriptor, _or_ it wasn't owned by hardware."

(* However, if bit 31 is clear everywhere, wouldn't that mean the ring
is actually completely depleted, not full? If count==budget, wouldn't
that mean the whole ring hasn't been visited, so we only refilled 64
entries and not necessarily the entire ring? Maybe the kernel isn't
keeping up after all.)

> That has:
>
>         const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
>                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
>                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
>                 { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
>                   FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
>         };
>         const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
>                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
>                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
>                 { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
>                   FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
>         };
>
> where each of those values is the RQS/TQS value to use in KiB:
>
> #define FIFO_SZ(x)              ((((x) * 1024U) / 256U) - 1U)
>
> This doesn't correspond with the values I'm seeing programmed into
> the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
> (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
> above from a quick look, but they're not in the right place!

True, but:
a) I doubt 5.10.216-tegra includes exactly the same version of the
driver found in this random GitHub mirror. (My intent was only to
point out that they don't use 5.10's stmmac; I should have been more
clear that I wasn't trying to link the same version, sorry!)
b) This is vendor code; I don't know how good their testing/review
process is. It might not run the way it looks. The intent seems to be
for RQS > TQS (which makes intuitive sense), but as you're seeing the
registers programmed the other way 'round, they might have gotten them
subtly mixed up.

> Now, as for FIFO sizes, if we sum up all the entries, then we
> get:
>
> SUM(rx_fifo_size[0][]) = 60KiB
> SUM(rx_fifo_size[1][]) = 64KiB
> SUM(tx_fifo_size[0][]) = 60KiB
> SUM(tx_fifo_size[1][]) = 64KiB

I follow the math with 64KiB, but surely the 60KiB should be
9+9+9+9+1+1+1+1=40KiB? This seems to me that the "legacy EQOS" simply
shifts with smaller FIFOs. Since dwmac is licensed as a soft IP core,
perhaps the FIFO size is an elaboration parameter? That would mean
this isn't an issue with dwmac 5.0 broadly, but with Nvidia's specific
instantiation of it.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox