Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2] iommu: Always fill in gather when unmapping
From: Jason Gunthorpe @ 2026-04-02 16:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
	Jon Hunter, patches, Pranjal Shrivastava, Samiullah Khawaja,
	stable, Vasant Hegde
In-Reply-To: <70a128f9-d6f0-41b6-8fef-e249c0507149@arm.com>

On Thu, Apr 02, 2026 at 04:59:29PM +0100, Robin Murphy wrote:
> > @@ -666,9 +666,22 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
> >   		/* Clear the remaining entries */
> >   		__arm_lpae_clear_pte(ptep, &iop->cfg, i);
> > -		if (gather && !iommu_iotlb_gather_queued(gather))
> > -			for (int j = 0; j < i; j++)
> > -				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
> > +		if (gather && !iommu_iotlb_gather_queued(gather)) {
> > +			if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_range) {
> > +				iop->cfg.tlb->tlb_add_range(gather, iova,
> > +							    i * size, size,
> > +							    iop->cookie);
> > +
> > +			} else {
> > +				iommu_iotlb_gather_add_range(gather, iova,
> > +							     i * size);
> > +
> > +				for (int j = 0; j < i; j++)
> > +					io_pgtable_tlb_add_page(iop, gather,
> > +								iova + j * size,
> > +								size);
> > +			}
> > +		}
> 
> NAK, this is insane.

This quite an optimization for SMMUv3 so it doesn't have to fit into
the ill fitting add_page api. What is "insane" here?

> If you'd rather make gathers mandatory for all drivers than fix it in the
> core code, then for goodness' sake just add the trivial one-liner to the
> handful of .unamp_pages implementations which need it, 

Do I understand this right, you want to not touch io-pgtable and
instead the unmap trampolines will fix the gather like mkt is doing?

Jason


^ permalink raw reply

* Re: [PATCH net v4 0/2] stmmac crash/stall fixes when under memory pressure
From: Sam Edwards @ 2026-04-02 16:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Maxime Coquelin, Alexandre Torgue, Russell King (Oracle),
	Maxime Chevallier, Ovidiu Panait, Vladimir Oltean, Baruch Siach,
	Serge Semin, Giuseppe Cavallaro, netdev, linux-stm32,
	linux-arm-kernel, linux-kernel
In-Reply-To: <20260402080542.293e8729@kernel.org>

On Thu, Apr 2, 2026 at 8:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> I meant we need both a threshold, and a delay :(

Hi Jakub - got it: when the critical threshold is reached, allow the
NAPI instance to sleep and start a timer instead.

1) We'd either have to leave interrupts masked or let them race
against the timer. Either one is manageable, but I feel like those
interactions carry *just* enough regression risk to bump that patch to
-next.

2) Could you point out which NAPI driver best handles this situation?
I'd like to replicate its approach.

Thanks,
Sam


^ permalink raw reply

* [PATCH RESEND] thermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp
From: Thorsten Blum @ 2026-04-02 16:56 UTC (permalink / raw)
  To: Markus Mayer, Broadcom internal kernel review list,
	Rafael J. Wysocki, Daniel Lezcano, Zhang Rui, Lukasz Luba,
	Florian Fainelli
  Cc: Thorsten Blum, linux-pm, linux-arm-kernel, linux-kernel

Use max() to simplify brcmstb_get_temp() and improve its readability.
Since avs_tmon_code_to_temp() returns an int, change the data type of
the local variable 't' from long to int.  No functional changes.

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
---
 drivers/thermal/broadcom/brcmstb_thermal.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/thermal/broadcom/brcmstb_thermal.c b/drivers/thermal/broadcom/brcmstb_thermal.c
index f46f2ddc174e..a9ffa596f7c0 100644
--- a/drivers/thermal/broadcom/brcmstb_thermal.c
+++ b/drivers/thermal/broadcom/brcmstb_thermal.c
@@ -16,6 +16,7 @@
 #include <linux/irqreturn.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
+#include <linux/minmax.h>
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
@@ -154,7 +155,7 @@ static int brcmstb_get_temp(struct thermal_zone_device *tz, int *temp)
 {
 	struct brcmstb_thermal_priv *priv = thermal_zone_device_priv(tz);
 	u32 val;
-	long t;
+	int t;
 
 	val = __raw_readl(priv->tmon_base + AVS_TMON_STATUS);
 
@@ -164,10 +165,7 @@ static int brcmstb_get_temp(struct thermal_zone_device *tz, int *temp)
 	val = (val & AVS_TMON_STATUS_data_msk) >> AVS_TMON_STATUS_data_shift;
 
 	t = avs_tmon_code_to_temp(priv, val);
-	if (t < 0)
-		*temp = 0;
-	else
-		*temp = t;
+	*temp = max(0, t);
 
 	return 0;
 }


^ permalink raw reply related

* Re: [PATCH v3] PCI: imx6: Don't remove MSI capability for i.MX7D/i.MX8M
From: Manivannan Sadhasivam @ 2026-04-02 16:57 UTC (permalink / raw)
  To: frank.li, l.stach, lpieralisi, kwilczynski, robh, bhelgaas,
	s.hauer, kernel, festevam, Richard Zhu
  Cc: linux-pci, linux-arm-kernel, imx, linux-kernel, stable
In-Reply-To: <20260331085252.1243108-1-hongxing.zhu@nxp.com>


On Tue, 31 Mar 2026 16:52:52 +0800, Richard Zhu wrote:
> The MSI trigger mechanism for endpoint devices connected to i.MX7D,
> i.MX8MM, and i.MX8MQ PCIe root complex ports depends on the MSI
> capability register settings in the root complex. Removing the MSI
> capability breaks MSI functionality for these endpoints.
> 
> Add keep_rp_msi_en flag to indicate platforms (i.MX7D, i.MX8MM, i.MX8MQ)
> that should preserve the MSI capability during initialization.
> 
> [...]

Applied, thanks!

[1/1] PCI: imx6: Don't remove MSI capability for i.MX7D/i.MX8M
      commit: a9de12c04779729f6d404192a5320c2e4dac0968

Best regards,
-- 
Manivannan Sadhasivam <mani@kernel.org>



^ permalink raw reply

* [PATCH RESEND 1/2] stm class: Replace kmalloc + copy_from_user with memdup_user
From: Thorsten Blum @ 2026-04-02 16:59 UTC (permalink / raw)
  To: Alexander Shishkin, Maxime Coquelin, Alexandre Torgue
  Cc: Thorsten Blum, linux-stm32, linux-arm-kernel, linux-kernel

Replace kmalloc() followed by copy_from_user() with memdup_user() to
simplify and improve stm_char_write().

Allocate and copy only 'count' bytes instead of 'count + 1' since the
extra byte is unused.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
---
 drivers/hwtracing/stm/core.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index 37584e786bb5..49791024bb86 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -645,15 +645,9 @@ static ssize_t stm_char_write(struct file *file, const char __user *buf,
 			return err;
 	}
 
-	kbuf = kmalloc(count + 1, GFP_KERNEL);
-	if (!kbuf)
-		return -ENOMEM;
-
-	err = copy_from_user(kbuf, buf, count);
-	if (err) {
-		kfree(kbuf);
-		return -EFAULT;
-	}
+	kbuf = memdup_user(buf, count);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
 
 	pm_runtime_get_sync(&stm->dev);
 


^ permalink raw reply related

* [PATCH RESEND 2/2] stm class: Replace kzalloc + copy_from_user with memdup_user_nul
From: Thorsten Blum @ 2026-04-02 16:59 UTC (permalink / raw)
  To: Alexander Shishkin, Maxime Coquelin, Alexandre Torgue
  Cc: Thorsten Blum, linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <20260402165933.895706-4-thorsten.blum@linux.dev>

Replace kzalloc() followed by copy_from_user() with memdup_user_nul() to
simplify and improve stm_char_policy_set_ioctl(). Remove the obsolete
comment.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
---
 drivers/hwtracing/stm/core.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index 49791024bb86..f2c0f2b8f7b1 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -733,18 +733,9 @@ static int stm_char_policy_set_ioctl(struct stm_file *stmf, void __user *arg)
 	if (size < sizeof(*id) || size >= PATH_MAX + sizeof(*id))
 		return -EINVAL;
 
-	/*
-	 * size + 1 to make sure the .id string at the bottom is terminated,
-	 * which is also why memdup_user() is not useful here
-	 */
-	id = kzalloc(size + 1, GFP_KERNEL);
-	if (!id)
-		return -ENOMEM;
-
-	if (copy_from_user(id, arg, size)) {
-		ret = -EFAULT;
-		goto err_free;
-	}
+	id = memdup_user_nul(arg, size);
+	if (IS_ERR(id))
+		return PTR_ERR(id);
 
 	if (id->__reserved_0 || id->__reserved_1)
 		goto err_free;


^ permalink raw reply related

* Re: [PATCH 5/8] thermal: khadas-mcu-fan: Add fan config from platform data Add regulator support
From: Ronald Claveau @ 2026-04-02 17:00 UTC (permalink / raw)
  To: Neil Armstrong, Lee Jones, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Andi Shyti, Kevin Hilman, Jerome Brunet,
	Martin Blumenstingl, Beniamino Galvani, Rafael J. Wysocki,
	Daniel Lezcano, Zhang Rui, Lukasz Luba, Liam Girdwood, Mark Brown
  Cc: linux-amlogic, devicetree, linux-kernel, linux-i2c,
	linux-arm-kernel, linux-pm
In-Reply-To: <5e3e8684-f893-4fb0-879e-9661820f72dd@linaro.org>

On 4/2/26 5:39 PM, Neil Armstrong wrote:
> On 4/2/26 16:27, Ronald Claveau wrote:
>> Replace the hardcoded MAX_LEVEL constant and fan register
>> with values read from platform_data (fan_reg, max_level),
>> as new MCUs need different values.
>>
>> Optionally acquire and enable a "fan" regulator supply
>> at probe time and on resume,
>> so boards that gate fan power through a regulator are handled.
>>
>> Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
>> ---
>>   drivers/thermal/khadas_mcu_fan.c | 43 ++++++++++++++++++++++++++++++
>> ++++------
>>   1 file changed, 37 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/thermal/khadas_mcu_fan.c b/drivers/thermal/
>> khadas_mcu_fan.c
>> index d35e5313bea41..55b496625e3bd 100644
>> --- a/drivers/thermal/khadas_mcu_fan.c
>> +++ b/drivers/thermal/khadas_mcu_fan.c
>> @@ -13,13 +13,15 @@
>>   #include <linux/regmap.h>
>>   #include <linux/sysfs.h>
>>   #include <linux/thermal.h>
>> -
>> -#define MAX_LEVEL 3
>> +#include <linux/regulator/consumer.h>
>>     struct khadas_mcu_fan_ctx {
>>       struct khadas_mcu *mcu;
>> +    unsigned int fan_reg;
>>       unsigned int level;
>> +    unsigned int max_level;
>>       struct thermal_cooling_device *cdev;
>> +    struct regulator *power;
>>   };
>>     static int khadas_mcu_fan_set_level(struct khadas_mcu_fan_ctx *ctx,
>> @@ -27,8 +29,7 @@ static int khadas_mcu_fan_set_level(struct
>> khadas_mcu_fan_ctx *ctx,
>>   {
>>       int ret;
>>   -    ret = regmap_write(ctx->mcu->regmap,
>> KHADAS_MCU_CMD_FAN_STATUS_CTRL_REG,
>> -               level);
>> +    ret = regmap_write(ctx->mcu->regmap, ctx->fan_reg, level);
>>       if (ret)
>>           return ret;
>>   @@ -40,7 +41,9 @@ static int khadas_mcu_fan_set_level(struct
>> khadas_mcu_fan_ctx *ctx,
>>   static int khadas_mcu_fan_get_max_state(struct
>> thermal_cooling_device *cdev,
>>                       unsigned long *state)
>>   {
>> -    *state = MAX_LEVEL;
>> +    struct khadas_mcu_fan_ctx *ctx = cdev->devdata;
>> +
>> +    *state = ctx->max_level;
>>         return 0;
>>   }
>> @@ -61,7 +64,7 @@ khadas_mcu_fan_set_cur_state(struct
>> thermal_cooling_device *cdev,
>>   {
>>       struct khadas_mcu_fan_ctx *ctx = cdev->devdata;
>>   -    if (state > MAX_LEVEL)
>> +    if (state > ctx->max_level)
>>           return -EINVAL;
>>         if (state == ctx->level)
>> @@ -83,11 +86,32 @@ static int khadas_mcu_fan_probe(struct
>> platform_device *pdev)
>>       struct device *dev = &pdev->dev;
>>       struct khadas_mcu_fan_ctx *ctx;
>>       int ret;
>> +    const struct khadas_mcu_fan_pdata *pdata =
>> dev_get_platdata(&pdev->dev);
>>         ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
>>       if (!ctx)
>>           return -ENOMEM;
>> +
>>       ctx->mcu = mcu;
>> +    ctx->fan_reg   = pdata->fan_reg;
>> +    ctx->max_level = pdata->max_level;
>> +
>> +    ctx->power = devm_regulator_get_optional(dev->parent, "fan");
>> +    if (IS_ERR(ctx->power)) {
>> +        if (PTR_ERR(ctx->power) == -ENODEV)
>> +            ctx->power = NULL;
>> +        else
>> +            return PTR_ERR(ctx->power);
>> +    }
>> +
>> +    if (ctx->power) {
>> +        ret = regulator_enable(ctx->power);
>> +        if (ret) {
>> +            dev_err(dev, "Failed to enable fan power supply: %d\n",
>> ret);
>> +            return ret;
>> +        }
>> +    }
>> +
>>       platform_set_drvdata(pdev, ctx);
>>         cdev = devm_thermal_of_cooling_device_register(dev->parent,
>> @@ -130,6 +154,13 @@ static int khadas_mcu_fan_suspend(struct device
>> *dev)
>>   static int khadas_mcu_fan_resume(struct device *dev)
>>   {
>>       struct khadas_mcu_fan_ctx *ctx = dev_get_drvdata(dev);
>> +    int ret;
>> +
>> +    if (ctx->power) {
>> +        ret = regulator_enable(ctx->power);
> 
> Seems you're missing a regulator_disable() on suspend.
> 
> Neil
> 

You right, I will add the regulator_disable on suspend for next version.
Thanks for your feedback.

>> +        if (ret)
>> +            return ret;
>> +    }
>>         return khadas_mcu_fan_set_level(ctx, ctx->level);
>>   }
>>
> 


-- 
Best regards,
Ronald


^ permalink raw reply

* [GIT PULL] Reset controller updates for v7.1
From: Philipp Zabel @ 2026-04-02 17:05 UTC (permalink / raw)
  To: soc; +Cc: linux-arm-kernel, kernel, Philipp Zabel

Dear arm-soc maintainers,

The following changes since commit 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f:

  Linux 7.0-rc1 (2026-02-22 13:18:59 -0800)

are available in the Git repository at:

  https://git.pengutronix.de/git/pza/linux.git tags/reset-for-v7.1

for you to fetch changes up to d373605cd514837d8a6de3d00c786d4bae6dbaf8:

  Merge tag 'reset-fixes-for-v7.0-2' into reset/next (2026-04-02 18:32:53 +0200)

----------------------------------------------------------------
Reset controller updates for v7.1

* Rework the reset core to support firmware nodes, add more fine
  grained locking, and use guard() helpers.
* Change the reset-gpio driver to use firmware nodes.
* Add support for the Cix Sky1 SoC reset controller.
* Add support for the RZ/G3E SoC to the reset-rzv2h-usb2phy driver and
  convert it to regmap. Prepare registering a VBUS mux controller.
* Replace use of the deprecated register_restart_handler() function in
  the ath79, intel-gw, lpc18xx, ma35d1, npcm, and sunplus reset drivers.
* Combine two allocations into one in the sti/reset-syscfg driver.
* Fix the reset-rzg2l-usbphy-ctrl MODULE_AUTHOR email.
* Fix the reset_control_rearm() kerneldoc comment.

The last commit is a merge of reset-fixes-for-v7.0-2 into reset/next,
to solve a merge conflict between commits a9b95ce36de4 ("reset: gpio: add a
devlink between reset-gpio and its consumer") and fbffb8c7c7bb ("reset: gpio:
fix double free in reset_add_gpio_aux_device() error path").

----------------------------------------------------------------
Andrew Davis (6):
      reset: ath79: Use devm_register_restart_handler()
      reset: intel: Use devm_register_restart_handler()
      reset: lpc18xx: Use devm_register_sys_off_handler()
      reset: ma35d1: Use devm_register_sys_off_handler()
      reset: npcm: Use devm_register_sys_off_handler()
      reset: sunplus: Use devm_register_sys_off_handler()

Bartosz Golaszewski (15):
      reset: gpio: remove unneeded OF-node put
      reset: gpio: add a devlink between reset-gpio and its consumer
      reset: gpio: simplify fallback device matching
      reset: gpio: remove unneeded auxiliary_set_drvdata()
      reset: warn on reset-gpio release
      reset: fold ida_alloc() into reset_create_gpio_aux_device()
      reset: use lock guards in reset core
      reset: handle removing supplier before consumers
      reset: protect struct reset_controller_dev with its own mutex
      reset: protect struct reset_control with its own mutex
      reset: convert of_reset_control_get_count() to using firmware nodes
      reset: convert the core API to using firmware nodes
      reset: convert reset core to using firmware nodes
      reset: gpio: make the driver fwnode-agnostic
      reset: don't overwrite fwnode_reset_n_cells

Biju Das (1):
      reset: rzg2l-usbphy-ctrl: Fix malformed MODULE_AUTHOR string

Claudiu Beznea (2):
      reset: rzg2l-usbphy-ctrl: Check pwrrdy is valid before using it
      reset: core: Drop unnecessary double quote

Gary Yang (2):
      dt-bindings: soc: cix: document the syscon on Sky1 SoC
      reset: add Sky1 soc reset support

Guangshuo Li (1):
      reset: gpio: fix double free in reset_add_gpio_aux_device() error path

Philipp Zabel (2):
      reset: core: Fix indentation
      Merge tag 'reset-fixes-for-v7.0-2' into reset/next

Rosen Penev (1):
      reset: sti: kzalloc + kcalloc to kzalloc

Tommaso Merciai (5):
      reset: rzv2h-usb2phy: Keep PHY clock enabled for entire device lifetime
      dt-bindings: reset: renesas,rzv2h-usb2phy: Add '#mux-state-cells' property
      dt-bindings: reset: renesas,rzv2h-usb2phy: Document RZ/G3E USB2PHY reset
      reset: rzv2h-usb2phy: Convert to regmap API
      reset: rzv2h-usb2phy: Add support for VBUS mux controller registration

Yixun Lan (1):
      reset: spacemit: k3: Decouple composite reset lines

 .../reset/renesas,rzv2h-usb2phy-reset.yaml         |   9 +-
 .../bindings/soc/cix/cix,sky1-system-control.yaml  |  42 ++
 Documentation/driver-api/reset.rst                 |   1 -
 drivers/reset/Kconfig                              |   9 +
 drivers/reset/Makefile                             |   1 +
 drivers/reset/core.c                               | 518 ++++++++++++++-------
 drivers/reset/reset-ath79.c                        |  12 +-
 drivers/reset/reset-gpio.c                         |  27 +-
 drivers/reset/reset-intel-gw.c                     |  11 +-
 drivers/reset/reset-lpc18xx.c                      |  12 +-
 drivers/reset/reset-ma35d1.c                       |  11 +-
 drivers/reset/reset-npcm.c                         |  12 +-
 drivers/reset/reset-rzg2l-usbphy-ctrl.c            |   5 +-
 drivers/reset/reset-rzv2h-usb2phy.c                | 197 ++++----
 drivers/reset/reset-sky1.c                         | 367 +++++++++++++++
 drivers/reset/reset-sunplus.c                      |  12 +-
 drivers/reset/spacemit/reset-spacemit-k3.c         |  60 ++-
 drivers/reset/sti/reset-syscfg.c                   |   9 +-
 .../dt-bindings/reset/cix,sky1-s5-system-control.h | 163 +++++++
 .../dt-bindings/reset/cix,sky1-system-control.h    |  41 ++
 include/dt-bindings/reset/spacemit,k3-resets.h     |  48 +-
 include/linux/reset-controller.h                   |  21 +-
 include/linux/reset.h                              |  43 +-
 23 files changed, 1223 insertions(+), 408 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/soc/cix/cix,sky1-system-control.yaml
 create mode 100644 drivers/reset/reset-sky1.c
 create mode 100644 include/dt-bindings/reset/cix,sky1-s5-system-control.h
 create mode 100644 include/dt-bindings/reset/cix,sky1-system-control.h


^ permalink raw reply

* [PATCH v3 0/2] mailbox: Fix wrong completion order and improper send result in the blocking mode send API
From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw)
  To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno,
	thierry.reding, jonathanh
  Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra,
	Joonwon Kang

Hi team,

This patch series fixes the two major issues in blocking mode.

1) Wrong completion order in the send API as described in [1]:

        Thread#1(T1)               Thread#2(T2)
     mbox_send_message           mbox_send_message
            |                           |
            V                           |
        add_to_rbuf(M1)                 V
            |                     add_to_rbuf(M2)
            |                           |
            |                           V
            V                      msg_submit(picks M1)
        msg_submit                      |
            |                           V
            V                   wait_for_completion(on M2)
     wait_for_completion(on M1)         |  (1st in waitQ)
            |   (2nd in waitQ)          V
            V                   wake_up(on completion of M1)<--incorrect

2) Send API does not return the actual send result.

This patch series contains two patches for each issue:
  0001-mailbox-Use-per-thread-completion-to-fix-wrong-co.patch
  0002-mailbox-Make-mbox_send_message-return-error-code-.patch

The first issue has to do with multi-threads support. Given the
discussion in [1] with the mailbox framework maintainer, it has been
long thought that the mailbox framework is designed to support
multi-threads although it missed the completion order issue at its
introduction. The first patch of this series is to fix it.

Alternatively, we could instead declare that the mailbox API does not
support multi-threads [2]. However, it would be a sudden big change to
the mailbox users after the long standing implication of supporting
multi-threads. Plus, it would have disparity with the non-blocking mode
which supports multi-threads already, which could also lead to confusion
to the users by saying "non-blocking mode supports multi-threads whereas
blocking mode doesn't". For this reason, the first patch in this series
does not choose this alternative.

The patch series rules out the case where tx_tick() is called twice or
more for a sent message on the same channel. In theory, it could happen
when timeout occurs. For example, one tx_tick() by the mailbox core due
to timeout and another tx_tick() by the mailbox controller or client by
accident or for any other reason. If it happens, the internal mailbox
state could become inconsistent even on a single thread. Thus, this
issue should be handled in an orthogonal effort later on.

The second issue forces users to register tx done callback to get the
actual send result although they are using the blocking mode send API.
This behavior is different from typical blocking send APIs, which just
return the actual send result directly, and so confusing to the users.
Without knowing this additional requirement of the API, it would be
prone to miss the send result check entirely. The second patch is to fix
it by making the blocking mode send API return the actual send result.

Change log of the first patch:
 - v3: Rebase on the latest for-next branch.
 - v2: Consider the case where timeout occurs and so tx_tick() is called
   for a channel by one thread while another thread is having an active
   request on the same channel. In that case, we mark the inactive
   request as canceled and do not send it to the controller.
 - v1: The previous solution in v0 tries to have per-message completion:
   `tx_cmpl[MBOX_TX_QUEUE_LEN]`; each completion belongs to each slot of
   the message queue: `msg_data[i]`. Those completions take up additional
   memory even when they are not used. Instead, this patch tries to have
   per-"thread" completion; each completion belongs to each sender thread
   and each slot of the message queue has a pointer to that completion;
   `struct mbox_message` has the "pointer" field
   `struct completion *tx_complete` which points to the completion which
   is created on the stack of the sender, instead of owning the
   completion by `struct completion tx_complete`. This way, we could
   avoid additional memory use since a completion will be allocated only
   when necessary. Plus, more importantly, we could avoid the window
   where the same completion is reused by different sender threads, which
   the previous solution still has.
 - v0: This first attempt tries to have per-message completion: [1].

Change log of the second patch:
 - No major change from v1.

References:
 - [1]: https://lore.kernel.org/all/1490809381-28869-1-git-send-email-jaswinder.singh@linaro.org
 - [2]: https://lore.kernel.org/all/CABb+yY39rhTZbtA21MecYk-R9fh7VQQr5kZUgCw4z92mWhZ1Rg@mail.gmail.com/

Joonwon Kang (2):
  mailbox: Use per-thread completion to fix wrong completion order
  mailbox: Make mbox_send_message() return error code when tx fails

 drivers/mailbox/mailbox.c          | 98 ++++++++++++++++++++----------
 drivers/mailbox/mtk-vcp-mailbox.c  |  2 +-
 drivers/mailbox/tegra-hsp.c        |  2 +-
 include/linux/mailbox_controller.h | 22 +++++--
 4 files changed, 85 insertions(+), 39 deletions(-)

Thanks,
Joonwon Kang
-- 
2.53.0.1185.g05d4b7b318-goog

^ permalink raw reply

* [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw)
  To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno,
	thierry.reding, jonathanh
  Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra,
	Joonwon Kang, stable
In-Reply-To: <20260402170641.2082547-1-joonwonkang@google.com>

Previously, a sender thread in mbox_send_message() could be woken up at
a wrong time in blocking mode. It is because there was only a single
completion for a channel whereas messages from multiple threads could be
sent in any order; since the shared completion could be signalled in any
order, it could wake up a wrong sender thread.

This commit resolves the false wake-up issue with the following changes:
- Completions are created just as many as the number of concurrent sender
  threads
- A completion is created on a sender thread's stack
- Each slot of the message queue, i.e. `msg_data`, contains a pointer to
  its target completion
- tx_tick() signals the completion of the currently active slot of the
  message queue

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/1490809381-28869-1-git-send-email-jaswinder.singh@linaro.org
Signed-off-by: Joonwon Kang <joonwonkang@google.com>
---
 drivers/mailbox/mailbox.c          | 86 +++++++++++++++++++-----------
 drivers/mailbox/mtk-vcp-mailbox.c  |  2 +-
 drivers/mailbox/tegra-hsp.c        |  2 +-
 include/linux/mailbox_controller.h | 20 ++++---
 4 files changed, 72 insertions(+), 38 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 138ffbcd4fde..d63386468982 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -21,7 +21,7 @@
 static LIST_HEAD(mbox_cons);
 static DEFINE_MUTEX(con_mutex);
 
-static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
+static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
 {
 	int idx;
 
@@ -32,7 +32,8 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
 		return -ENOBUFS;
 
 	idx = chan->msg_free;
-	chan->msg_data[idx] = mssg;
+	chan->msg_data[idx].data = mssg;
+	chan->msg_data[idx].tx_complete = tx_complete;
 	chan->msg_count++;
 
 	if (idx == MBOX_TX_QUEUE_LEN - 1)
@@ -50,24 +51,33 @@ static void msg_submit(struct mbox_chan *chan)
 	int err = -EBUSY;
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
-		if (!chan->msg_count || chan->active_req != MBOX_NO_MSG)
+		if (chan->active_req >= 0)
 			break;
 
-		count = chan->msg_count;
-		idx = chan->msg_free;
-		if (idx >= count)
-			idx -= count;
-		else
-			idx += MBOX_TX_QUEUE_LEN - count;
+		while (chan->msg_count > 0) {
+			count = chan->msg_count;
+			idx = chan->msg_free;
+			if (idx >= count)
+				idx -= count;
+			else
+				idx += MBOX_TX_QUEUE_LEN - count;
 
-		data = chan->msg_data[idx];
+			data = chan->msg_data[idx].data;
+			if (data != MBOX_NO_MSG)
+				break;
+
+			chan->msg_count--;
+		}
+
+		if (!chan->msg_count)
+			break;
 
 		if (chan->cl->tx_prepare)
 			chan->cl->tx_prepare(chan->cl, data);
 		/* Try to submit a message to the MBOX controller */
 		err = chan->mbox->ops->send_data(chan, data);
 		if (!err) {
-			chan->active_req = data;
+			chan->active_req = idx;
 			chan->msg_count--;
 		}
 	}
@@ -79,27 +89,35 @@ static void msg_submit(struct mbox_chan *chan)
 	}
 }
 
-static void tx_tick(struct mbox_chan *chan, int r)
+static void tx_tick(struct mbox_chan *chan, int r, int idx)
 {
-	void *mssg;
+	struct mbox_message mssg = {MBOX_NO_MSG, NULL};
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
-		mssg = chan->active_req;
-		chan->active_req = MBOX_NO_MSG;
+		if (idx >= 0 && idx != chan->active_req) {
+			chan->msg_data[idx].data = MBOX_NO_MSG;
+			chan->msg_data[idx].tx_complete = NULL;
+			return;
+		}
+
+		if (chan->active_req >= 0) {
+			mssg = chan->msg_data[chan->active_req];
+			chan->active_req = -1;
+		}
 	}
 
 	/* Submit next message */
 	msg_submit(chan);
 
-	if (mssg == MBOX_NO_MSG)
+	if (mssg.data == MBOX_NO_MSG)
 		return;
 
 	/* Notify the client */
 	if (chan->cl->tx_done)
-		chan->cl->tx_done(chan->cl, mssg, r);
+		chan->cl->tx_done(chan->cl, mssg.data, r);
 
 	if (r != -ETIME && chan->cl->tx_block)
-		complete(&chan->tx_complete);
+		complete(mssg.tx_complete);
 }
 
 static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
@@ -112,10 +130,10 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
 	for (i = 0; i < mbox->num_chans; i++) {
 		struct mbox_chan *chan = &mbox->chans[i];
 
-		if (chan->active_req != MBOX_NO_MSG && chan->cl) {
+		if (chan->active_req >= 0 && chan->cl) {
 			txdone = chan->mbox->ops->last_tx_done(chan);
 			if (txdone)
-				tx_tick(chan, 0);
+				tx_tick(chan, 0, -1);
 			else
 				resched = true;
 		}
@@ -168,7 +186,7 @@ void mbox_chan_txdone(struct mbox_chan *chan, int r)
 		return;
 	}
 
-	tx_tick(chan, r);
+	tx_tick(chan, r, -1);
 }
 EXPORT_SYMBOL_GPL(mbox_chan_txdone);
 
@@ -188,7 +206,7 @@ void mbox_client_txdone(struct mbox_chan *chan, int r)
 		return;
 	}
 
-	tx_tick(chan, r);
+	tx_tick(chan, r, -1);
 }
 EXPORT_SYMBOL_GPL(mbox_client_txdone);
 
@@ -266,11 +284,19 @@ EXPORT_SYMBOL_GPL(mbox_chan_tx_slots_available);
 int mbox_send_message(struct mbox_chan *chan, void *mssg)
 {
 	int t;
+	int idx;
+	struct completion tx_complete;
 
 	if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
 		return -EINVAL;
 
-	t = add_to_rbuf(chan, mssg);
+	if (chan->cl->tx_block) {
+		init_completion(&tx_complete);
+		t = add_to_rbuf(chan, mssg, &tx_complete);
+	} else {
+		t = add_to_rbuf(chan, mssg, NULL);
+	}
+
 	if (t < 0) {
 		dev_err(chan->mbox->dev, "Try increasing MBOX_TX_QUEUE_LEN\n");
 		return t;
@@ -287,10 +313,11 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 		else
 			wait = msecs_to_jiffies(chan->cl->tx_tout);
 
-		ret = wait_for_completion_timeout(&chan->tx_complete, wait);
+		ret = wait_for_completion_timeout(&tx_complete, wait);
 		if (ret == 0) {
+			idx = t;
 			t = -ETIME;
-			tx_tick(chan, t);
+			tx_tick(chan, t, idx);
 		}
 	}
 
@@ -321,7 +348,7 @@ int mbox_flush(struct mbox_chan *chan, unsigned long timeout)
 
 	ret = chan->mbox->ops->flush(chan, timeout);
 	if (ret < 0)
-		tx_tick(chan, ret);
+		tx_tick(chan, ret, -1);
 
 	return ret;
 }
@@ -340,9 +367,8 @@ static int __mbox_bind_client(struct mbox_chan *chan, struct mbox_client *cl)
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		chan->msg_free = 0;
 		chan->msg_count = 0;
-		chan->active_req = MBOX_NO_MSG;
+		chan->active_req = -1;
 		chan->cl = cl;
-		init_completion(&chan->tx_complete);
 
 		if (chan->txdone_method	== TXDONE_BY_POLL && cl->knows_txdone)
 			chan->txdone_method = TXDONE_BY_ACK;
@@ -498,7 +524,7 @@ void mbox_free_channel(struct mbox_chan *chan)
 	/* The queued TX requests are simply aborted, no callbacks are made */
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		chan->cl = NULL;
-		chan->active_req = MBOX_NO_MSG;
+		chan->active_req = -1;
 		if (chan->txdone_method == TXDONE_BY_ACK)
 			chan->txdone_method = TXDONE_BY_POLL;
 	}
@@ -553,7 +579,7 @@ int mbox_controller_register(struct mbox_controller *mbox)
 
 		chan->cl = NULL;
 		chan->mbox = mbox;
-		chan->active_req = MBOX_NO_MSG;
+		chan->active_req = -1;
 		chan->txdone_method = txdone;
 		spin_lock_init(&chan->lock);
 	}
diff --git a/drivers/mailbox/mtk-vcp-mailbox.c b/drivers/mailbox/mtk-vcp-mailbox.c
index 1b291b8ea15a..a7bab06ac686 100644
--- a/drivers/mailbox/mtk-vcp-mailbox.c
+++ b/drivers/mailbox/mtk-vcp-mailbox.c
@@ -84,7 +84,7 @@ static int mtk_vcp_mbox_send_data(struct mbox_chan *chan, void *data)
 
 static bool mtk_vcp_mbox_last_tx_done(struct mbox_chan *chan)
 {
-	struct mtk_ipi_info *ipi_info = chan->active_req;
+	struct mtk_ipi_info *ipi_info = chan->msg_data[chan->active_req].data;
 	struct mtk_vcp_mbox *priv = chan->con_priv;
 
 	return !(readl(priv->base + priv->cfg->set_in) & BIT(ipi_info->index));
diff --git a/drivers/mailbox/tegra-hsp.c b/drivers/mailbox/tegra-hsp.c
index 7b1e1b83ea29..efe0033cb5c5 100644
--- a/drivers/mailbox/tegra-hsp.c
+++ b/drivers/mailbox/tegra-hsp.c
@@ -495,7 +495,7 @@ static int tegra_hsp_mailbox_flush(struct mbox_chan *chan,
 			mbox_chan_txdone(chan, 0);
 
 			/* Wait until channel is empty */
-			if (chan->active_req != MBOX_NO_MSG)
+			if (chan->active_req >= 0)
 				continue;
 
 			return 0;
diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
index e3896b08f22e..912499ad08ed 100644
--- a/include/linux/mailbox_controller.h
+++ b/include/linux/mailbox_controller.h
@@ -113,16 +113,25 @@ struct mbox_controller {
  */
 #define MBOX_TX_QUEUE_LEN	20
 
+/**
+ * struct mbox_message - Internal representation of a mailbox message
+ * @data:		Data packet
+ * @tx_complete:	Pointer to the transmission completion
+ */
+struct mbox_message {
+	void *data;
+	struct completion *tx_complete;
+};
+
 /**
  * struct mbox_chan - s/w representation of a communication chan
  * @mbox:		Pointer to the parent/provider of this channel
  * @txdone_method:	Way to detect TXDone chosen by the API
  * @cl:			Pointer to the current owner of this channel
- * @tx_complete:	Transmission completion
- * @active_req:		Currently active request hook
+ * @active_req:		Index of the currently active slot in the queue
  * @msg_count:		No. of mssg currently queued
  * @msg_free:		Index of next available mssg slot
- * @msg_data:		Hook for data packet
+ * @msg_data:		Queue of data packets
  * @lock:		Serialise access to the channel
  * @con_priv:		Hook for controller driver to attach private data
  */
@@ -130,10 +139,9 @@ struct mbox_chan {
 	struct mbox_controller *mbox;
 	unsigned txdone_method;
 	struct mbox_client *cl;
-	struct completion tx_complete;
-	void *active_req;
+	int active_req;
 	unsigned msg_count, msg_free;
-	void *msg_data[MBOX_TX_QUEUE_LEN];
+	struct mbox_message msg_data[MBOX_TX_QUEUE_LEN];
 	spinlock_t lock; /* Serialise access to the channel */
 	void *con_priv;
 };
-- 
2.53.0.1185.g05d4b7b318-goog



^ permalink raw reply related

* [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
From: Joonwon Kang @ 2026-04-02 17:06 UTC (permalink / raw)
  To: jassisinghbrar, matthias.bgg, angelogioacchino.delregno,
	thierry.reding, jonathanh
  Cc: linux-kernel, linux-arm-kernel, linux-mediatek, linux-tegra,
	Joonwon Kang, stable
In-Reply-To: <20260402170641.2082547-1-joonwonkang@google.com>

When the mailbox controller failed transmitting message, the error code
was only passed to the client's tx done handler and not to
mbox_send_message(). For this reason, the function could return a false
success. This commit resolves the issue by introducing the tx status and
checking it before mbox_send_message() returns.

Cc: stable@vger.kernel.org
Signed-off-by: Joonwon Kang <joonwonkang@google.com>
---
 drivers/mailbox/mailbox.c          | 20 +++++++++++++++-----
 include/linux/mailbox_controller.h |  2 ++
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index d63386468982..ea9aec9dc947 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -21,7 +21,10 @@
 static LIST_HEAD(mbox_cons);
 static DEFINE_MUTEX(con_mutex);
 
-static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
+static int add_to_rbuf(struct mbox_chan *chan,
+		       void *mssg,
+		       struct completion *tx_complete,
+		       int *tx_status)
 {
 	int idx;
 
@@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx
 	idx = chan->msg_free;
 	chan->msg_data[idx].data = mssg;
 	chan->msg_data[idx].tx_complete = tx_complete;
+	chan->msg_data[idx].tx_status = tx_status;
 	chan->msg_count++;
 
 	if (idx == MBOX_TX_QUEUE_LEN - 1)
@@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan)
 
 static void tx_tick(struct mbox_chan *chan, int r, int idx)
 {
-	struct mbox_message mssg = {MBOX_NO_MSG, NULL};
+	struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL};
 
 	scoped_guard(spinlock_irqsave, &chan->lock) {
 		if (idx >= 0 && idx != chan->active_req) {
 			chan->msg_data[idx].data = MBOX_NO_MSG;
 			chan->msg_data[idx].tx_complete = NULL;
+			chan->msg_data[idx].tx_status = NULL;
 			return;
 		}
 
@@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx)
 	if (chan->cl->tx_done)
 		chan->cl->tx_done(chan->cl, mssg.data, r);
 
-	if (r != -ETIME && chan->cl->tx_block)
+	if (r != -ETIME && chan->cl->tx_block) {
+		*mssg.tx_status = r;
 		complete(mssg.tx_complete);
+	}
 }
 
 static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
@@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 	int t;
 	int idx;
 	struct completion tx_complete;
+	int tx_status = 0;
 
 	if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
 		return -EINVAL;
 
 	if (chan->cl->tx_block) {
 		init_completion(&tx_complete);
-		t = add_to_rbuf(chan, mssg, &tx_complete);
+		t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status);
 	} else {
-		t = add_to_rbuf(chan, mssg, NULL);
+		t = add_to_rbuf(chan, mssg, NULL, NULL);
 	}
 
 	if (t < 0) {
@@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 			idx = t;
 			t = -ETIME;
 			tx_tick(chan, t, idx);
+		} else if (tx_status < 0) {
+			t = tx_status;
 		}
 	}
 
diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
index 912499ad08ed..890da97bcb50 100644
--- a/include/linux/mailbox_controller.h
+++ b/include/linux/mailbox_controller.h
@@ -117,10 +117,12 @@ struct mbox_controller {
  * struct mbox_message - Internal representation of a mailbox message
  * @data:		Data packet
  * @tx_complete:	Pointer to the transmission completion
+ * @tx_status:		Pointer to the transmission status
  */
 struct mbox_message {
 	void *data;
 	struct completion *tx_complete;
+	int *tx_status;
 };
 
 /**
-- 
2.53.0.1185.g05d4b7b318-goog



^ permalink raw reply related

* Re: [PATCH net v4 0/2] stmmac crash/stall fixes when under memory pressure
From: Russell King (Oracle) @ 2026-04-02 17:16 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20260401041929.12392-1-CFSworks@gmail.com>

On Tue, Mar 31, 2026 at 09:19:27PM -0700, Sam Edwards wrote:
> Hi netdev,
> 
> This is v4 of my series containing a pair of bugfixes for the stmmac driver's
> receive pipeline. These issues occur when stmmac_rx_refill() does not (fully)
> succeed, which happens more frequently when free memory is low.
> 
> The first patch closes Bugzilla bug #221010 [1], where stmmac_rx() can circle
> around to a still-dirty descriptor (with a NULL buffer pointer), mistake it for
> a filled descriptor (due to OWN=0), and attempt to dereference the buffer.
> 
> In testing that patch, I discovered a second issue: starvation of available RX
> buffers causes the NIC to stop sending interrupts; if the driver stops polling,
> it will wait indefinitely for an interrupt that will never come. (Note: the
> first patch makes this issue more prominent -- mostly because it lets the
> system survive long enough to exhibit it -- but doesn't *cause* it.) The second
> patch addresses that problem as well.
> 
> Both patches are minimal, appropriate for stable, and designated to `net`. My
> focus is on small, obviously-correct, easy-to-explain changes: I'll follow up
> with another patch/series (something like [2]) for `net-next` that fixes the
> ring in a more robust way.
> 
> The tx and zc paths seem to have similar low-memory bugs, to be addressed in
> separate series.

I've tested this on my Jetson Xavier platform. One of the issues I've
had is that running iperf3 results in the receive side stalling because
it runs out of descriptors. However, despite the receive ring
eventually being re-filled and the hardware appropriately prodded, it
steadfastly refuses to restart, despite the descriptors having been
updated.

What I can see is there's 40 packets in the internal FIFOs via the
PRXQ[13:0] field of the ETH_MTLRXQxDR register.

With your patches applied:

root@tegra-ubuntu:~# iperf3 -c 192.168.248.1 -R
Connecting to host 192.168.248.1, port 5201
Reverse mode, remote host 192.168.248.1 is sending
[  5] local 192.168.248.174 port 43728 connected to 192.168.248.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  30.3 MBytes   254 Mbits/sec
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec
...

The remote system says:

Accepted connection from 192.168.248.174, port 43720
[  5] local 192.168.248.1 port 5201 connected to 192.168.248.174 port 43728
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  31.8 MBytes   266 Mbits/sec    1   1.41 KBytes
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes

This is a dwmac v5.0. I think the relevant registers are:

ETH_MTLQxICSR: Value at address 0x02490d2c: 0x00000002

ETH_MTLRXQxDR: Value at address 0x02490d38: 0x00280020

 PRXQ[13:0]: Number of Packets in Receive Queue = 40
 RXQSTS[1:0]: MTL Rx Queue Fill-Level status = 2 (Rx queue fill-level
 above flow-control activate threshold)
 RRXSTS[1:0]: MTL Rx Queue Read Controller State = 0 (Idle)
 RWCSTS: MTL Rx Queue Write Controller Active Status = 0

ETH_DMADS1R: Value at address 0x0249100c: 0x00006300

 RPS0[3:0]: DMA Channel Receive Process State = Running (Waiting for Rx
 packet)

ETH_DMACxRXDTPR: Value at address 0x02491128: 0xffffe7d0

 RDT[31:0]: Receive Descriptor Tail Pointer (writes should apparently
 trigger DMA to start, but doesn't seem to)

ETH_DMACxCARXDR: Value at address 0x0249114c: 0xffffe7d0

 CURRDESAPTR[31:0]: Application receive descriptor address pointer

ETH_DMACxCARXBR: Value at address 0x0249115c: 0xffb55040

 CURRBUFAPTR[31:0]: Application receive buffer address pointer

 This makes it look like it's read the descriptor at 0x7fffffe7d0.

ETH_DMAC0SR: Value at address 0x02491160: 0x00000484

 ETI (early transmit interrupt)=1 RBU (receive buffer unavailable)=1
 TBU (transmit buffer unavailable)=1

 Clearing RBU doesn't seem to help.

Descriptors:

123 [0x0000007fffffe7b0]: 0xffbba040 0x7f 0x0 0x81000000
124 [0x0000007fffffe7c0]: 0xffbb9040 0x7f 0x0 0x81000000
125 [0x0000007fffffe7d0]: 0xffb55040 0x7f 0x0 0x81000000 <---
126 [0x0000007fffffe7e0]: 0xffc26040 0x7f 0x0 0x81000000
127 [0x0000007fffffe7f0]: 0xfff95040 0x7f 0x0 0x81000000

So they're all refilled.

Any ideas?


Full register dump via devmem2 (ethtool -d doesn't dump all registers):

Value at address 0x02490000: 0x08072203
Value at address 0x02490004: 0x00200000
Value at address 0x02490008: 0x00010404
Value at address 0x0249000c: 0x00000000
Value at address 0x02490010: 0x00040008
Value at address 0x02490014: 0x20000008
Value at address 0x02490018: 0x00000001
Value at address 0x0249001c: 0x00000001
Value at address 0x02490020: 0x00000000
Value at address 0x02490024: 0x00000000
Value at address 0x02490028: 0x00000000
Value at address 0x0249002c: 0x00000000
Value at address 0x02490030: 0x00000000
Value at address 0x02490034: 0x00000000
Value at address 0x02490038: 0x00000000
Value at address 0x0249003c: 0x00000000
Value at address 0x02490040: 0x00000000
Value at address 0x02490044: 0x00000000
Value at address 0x02490048: 0x00000000
Value at address 0x0249004c: 0x00000000
Value at address 0x02490050: 0x01600000
Value at address 0x02490054: 0x00000000
Value at address 0x02490058: 0x00000000
Value at address 0x0249005c: 0x00000000
Value at address 0x02490060: 0x00120000
Value at address 0x02490064: 0x00000000
Value at address 0x02490068: 0x00000000
Value at address 0x0249006c: 0x00000000
Value at address 0x02490070: 0xffff0002
Value at address 0x02490074: 0x00000000
Value at address 0x02490078: 0x00000000
Value at address 0x0249007c: 0x00000000
Value at address 0x02490080: 0x00000000
Value at address 0x02490084: 0x00000000
Value at address 0x02490088: 0x00000000
Value at address 0x0249008c: 0x00000000
Value at address 0x02490090: 0x00000001
Value at address 0x02490094: 0x00000000
Value at address 0x02490098: 0x00000000
Value at address 0x0249009c: 0x00000000
Value at address 0x024900a0: 0x00000002
Value at address 0x024900a4: 0x00000000
Value at address 0x024900a8: 0x00000000
Value at address 0x024900ac: 0x00000000
Value at address 0x024900b0: 0x00000001
Value at address 0x024900b4: 0x00001030
Value at address 0x024900b8: 0x00000000
Value at address 0x024900bc: 0x00000000
Value at address 0x024900c0: 0x00000000
Value at address 0x024900c4: 0x00000000
Value at address 0x024900c8: 0x00000000
Value at address 0x024900cc: 0x00000000
Value at address 0x024900d0: 0x003b0200
Value at address 0x024900d4: 0x03e8001e
Value at address 0x024900d8: 0x000f4240
Value at address 0x024900dc: 0x0000007c
Value at address 0x024900e0: 0x00000000
Value at address 0x024900e4: 0x00000000
Value at address 0x024900e8: 0x00000000
Value at address 0x024900ec: 0x00000000
Value at address 0x024900f0: 0x00000000
Value at address 0x024900f4: 0x00000000
Value at address 0x024900f8: 0x000d0000
Value at address 0x024900fc: 0x00000000
Value at address 0x02490100: 0x00000000
Value at address 0x02490104: 0x00000000
Value at address 0x02490108: 0x00000000
Value at address 0x0249010c: 0x00000000
Value at address 0x02490110: 0x00001050
Value at address 0x02490114: 0x00000000
Value at address 0x02490118: 0x00000000
Value at address 0x0249011c: 0x1bfd73f7
Value at address 0x02490120: 0x421e7a49
Value at address 0x02490124: 0x100c30c3
Value at address 0x02490128: 0x00320220
Value at address 0x02490200: 0x000e010c
Value at address 0x02490204: 0x00000006
Value at address 0x02490208: 0x00000000
Value at address 0x0249020c: 0x00000000
Value at address 0x02490210: 0x00000000
Value at address 0x02490214: 0x00000000
Value at address 0x02490218: 0x00000000
Value at address 0x0249021c: 0x00000000
Value at address 0x02490220: 0x00000000
Value at address 0x02490224: 0x00000000
Value at address 0x02490228: 0x00000000
Value at address 0x0249022c: 0x00000000
Value at address 0x02490230: 0x00000000
Value at address 0x02490234: 0x00000000
Value at address 0x02490238: 0x00000102
Value at address 0x02490d00: 0x00ff000a
Value at address 0x02490d04: 0x00000000
Value at address 0x02490d08: 0x00000000
Value at address 0x02490d0c: 0x00000000
Value at address 0x02490d10: 0x00000000
Value at address 0x02490d14: 0x00000030
Value at address 0x02490d18: 0x00000000
Value at address 0x02490d1c: 0x00000000
Value at address 0x02490d20: 0x00000000
Value at address 0x02490d24: 0x00000000
Value at address 0x02490d28: 0x00000000
Value at address 0x02490d2c: 0x00000002
Value at address 0x02490d30: 0x0ff1c4e0
Value at address 0x02490d34: 0x00000000
Value at address 0x02490d38: 0x00280020
Value at address 0x02490d3c: 0x00000000
Value at address 0x02491000: 0x00000000
Value at address 0x02491004: 0x0002180e
Value at address 0x02491008: 0x00000000
Value at address 0x0249100c: 0x00006300
Value at address 0x02491010: 0x00000000
Value at address 0x02491014: 0x00000000
Value at address 0x02491018: 0x00000000
Value at address 0x0249101c: 0x00000000
Value at address 0x02491020: 0x00000000
Value at address 0x02491024: 0x00000000
Value at address 0x02491028: 0x00000000
Value at address 0x0249102c: 0x00000000
Value at address 0x02491030: 0x00000000
Value at address 0x02491034: 0x00000000
Value at address 0x02491038: 0x00000000
Value at address 0x0249103c: 0x00000000
Value at address 0x02491040: 0x00000000
Value at address 0x02491044: 0x00000000
Value at address 0x02491048: 0x00000000
Value at address 0x0249104c: 0x00000000
Value at address 0x02491050: 0x00000000
Value at address 0x02491054: 0x00000000
Value at address 0x02491100: 0x00010000
Value at address 0x02491104: 0x00101011
Value at address 0x02491108: 0x00080c01
Value at address 0x0249110c: 0x00000000
Value at address 0x02491110: 0x0000007f
Value at address 0x02491114: 0xffffc000
Value at address 0x02491118: 0x0000007f
Value at address 0x0249111c: 0xffffe000
Value at address 0x02491120: 0xffffc500
Value at address 0x02491124: 0x00000000
Value at address 0x02491128: 0xffffe7d0
Value at address 0x0249112c: 0x000001ff
Value at address 0x02491130: 0x000001ff
Value at address 0x02491134: 0x0000d041
Value at address 0x02491138: 0x000000a0
Value at address 0x0249113c: 0x000d07c0
Value at address 0x02491140: 0x00000000
Value at address 0x02491144: 0xffffc500
Value at address 0x02491148: 0x00000000
Value at address 0x0249114c: 0xffffe7d0
Value at address 0x02491150: 0x0000007f
Value at address 0x02491154: 0xffc45b02
Value at address 0x02491158: 0x0000007f
Value at address 0x0249115c: 0xffb55040
Value at address 0x02491160: 0x00000484
Value at address 0x02491164: 0x00000000
Value at address 0x02491168: 0x00000000
Value at address 0x0249116c: 0x00000000
Value at address 0x02491170: 0x00000000
Value at address 0x02491174: 0x00000000
Value at address 0x02491178: 0x00000000
Value at address 0x0249117c: 0x00000000

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply

* Re: [PATCH RESEND] thermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp
From: Daniel Lezcano @ 2026-04-02 17:24 UTC (permalink / raw)
  To: Thorsten Blum, Markus Mayer, Broadcom internal kernel review list,
	Rafael J. Wysocki, Daniel Lezcano, Zhang Rui, Lukasz Luba,
	Florian Fainelli
  Cc: linux-pm, linux-arm-kernel, linux-kernel
In-Reply-To: <20260402165616.895305-3-thorsten.blum@linux.dev>

On 4/2/26 18:56, Thorsten Blum wrote:
> Use max() to simplify brcmstb_get_temp() and improve its readability.
> Since avs_tmon_code_to_temp() returns an int, change the data type of
> the local variable 't' from long to int.  No functional changes.
> 
> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
> ---

Applied, thanks



^ permalink raw reply

* Re: [PATCH net v4 0/2] stmmac crash/stall fixes when under memory pressure
From: Russell King (Oracle) @ 2026-04-02 17:26 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <ac6kfQ98Xjt3dCGj@shell.armlinux.org.uk>

On Thu, Apr 02, 2026 at 06:16:45PM +0100, Russell King (Oracle) wrote:
> On Tue, Mar 31, 2026 at 09:19:27PM -0700, Sam Edwards wrote:
> > Hi netdev,
> > 
> > This is v4 of my series containing a pair of bugfixes for the stmmac driver's
> > receive pipeline. These issues occur when stmmac_rx_refill() does not (fully)
> > succeed, which happens more frequently when free memory is low.
> > 
> > The first patch closes Bugzilla bug #221010 [1], where stmmac_rx() can circle
> > around to a still-dirty descriptor (with a NULL buffer pointer), mistake it for
> > a filled descriptor (due to OWN=0), and attempt to dereference the buffer.
> > 
> > In testing that patch, I discovered a second issue: starvation of available RX
> > buffers causes the NIC to stop sending interrupts; if the driver stops polling,
> > it will wait indefinitely for an interrupt that will never come. (Note: the
> > first patch makes this issue more prominent -- mostly because it lets the
> > system survive long enough to exhibit it -- but doesn't *cause* it.) The second
> > patch addresses that problem as well.
> > 
> > Both patches are minimal, appropriate for stable, and designated to `net`. My
> > focus is on small, obviously-correct, easy-to-explain changes: I'll follow up
> > with another patch/series (something like [2]) for `net-next` that fixes the
> > ring in a more robust way.
> > 
> > The tx and zc paths seem to have similar low-memory bugs, to be addressed in
> > separate series.
> 
> I've tested this on my Jetson Xavier platform. One of the issues I've
> had is that running iperf3 results in the receive side stalling because
> it runs out of descriptors. However, despite the receive ring
> eventually being re-filled and the hardware appropriately prodded, it
> steadfastly refuses to restart, despite the descriptors having been
> updated.

I'll make it clear: this problem exists without your patches, so it
is not a regression.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply

* Re: [PATCH v4 3/4] PCI: tegra: Add Tegra264 support
From: Manivannan Sadhasivam @ 2026-04-02 17:32 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Thierry Reding,
	Jonathan Hunter, Karthikeyan Mitran, Hou Zhiqiang,
	Thomas Petazzoni, Pali Rohár, Michal Simek, Kevin Xie,
	linux-pci, devicetree, linux-tegra, linux-kernel,
	linux-arm-kernel, Thierry Reding, Manikanta Maddireddy
In-Reply-To: <20260402-tegra264-pcie-v4-3-21e2e19987e8@nvidia.com>

On Thu, Apr 02, 2026 at 04:27:37PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Add a driver for the PCIe controller found on NVIDIA Tegra264 SoCs. The
> driver is very small, with its main purpose being to set up the address
> translation registers and then creating a standard PCI host using ECAM.
> 
> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
> Signed-off-by: Thierry Reding <treding@nvidia.com>

What is the rationale for adding a new driver? Can't you reuse the existing one?
If so, that should be mentioned in the description.

> ---
> Changes in v2:
> - specify generations applicable for PCI_TEGRA driver to avoid confusion
> - drop SPDX-FileCopyrightText tag
> - rename link_state to link_up to clarify meaning
> - replace memset() by an empty initializer
> - sanity-check only enable BAR regions
> - bring PCI link out of reset in case firmware didn't
> - use common wait times instead of defining our own
> - use core helpers to parse and print PCI link speed
> - fix multi-line comment
> - use dev_err_probe() more ubiquitously
> - fix probe sequence and error cleanup
> - use DEFINE_NOIRQ_DEV_PM_OPS() to avoid warnings for !PM_SUSPEND
> - reuse more standard registers and remove unused register definitions
> - use %pe and ERR_PTR() to print symbolic errors
> - add signed-off-by from Manikanta as the original author
> - add myself as author after significantly modifying the driver
> ---
>  drivers/pci/controller/Kconfig         |  10 +-
>  drivers/pci/controller/Makefile        |   1 +
>  drivers/pci/controller/pcie-tegra264.c | 527 +++++++++++++++++++++++++++++++++
>  3 files changed, 537 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index 5aaed8ac6e44..6ead04f7bd6e 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -254,7 +254,15 @@ config PCI_TEGRA
>  	select IRQ_MSI_LIB
>  	help
>  	  Say Y here if you want support for the PCIe host controller found
> -	  on NVIDIA Tegra SoCs.
> +	  on NVIDIA Tegra SoCs (Tegra20 through Tegra186).
> +
> +config PCIE_TEGRA264
> +	bool "NVIDIA Tegra264 PCIe controller"

This driver seems to be using external MSI controller. So it can be built as a
module. Also, you have the remove() callback for some reason.

> +	depends on ARCH_TEGRA || COMPILE_TEST
> +	depends on PCI_MSI

Why?

> +	help
> +	  Say Y here if you want support for the PCIe host controller found
> +	  on NVIDIA Tegra264 SoCs.
>  
>  config PCIE_RCAR_HOST
>  	bool "Renesas R-Car PCIe controller (host mode)"
> diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> index ac8db283f0fe..d478743b5142 100644
> --- a/drivers/pci/controller/Makefile
> +++ b/drivers/pci/controller/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_PCI_HYPERV_INTERFACE) += pci-hyperv-intf.o
>  obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
>  obj-$(CONFIG_PCI_AARDVARK) += pci-aardvark.o
>  obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
> +obj-$(CONFIG_PCIE_TEGRA264) += pcie-tegra264.o
>  obj-$(CONFIG_PCI_RCAR_GEN2) += pci-rcar-gen2.o
>  obj-$(CONFIG_PCIE_RCAR_HOST) += pcie-rcar.o pcie-rcar-host.o
>  obj-$(CONFIG_PCIE_RCAR_EP) += pcie-rcar.o pcie-rcar-ep.o
> diff --git a/drivers/pci/controller/pcie-tegra264.c b/drivers/pci/controller/pcie-tegra264.c
> new file mode 100644
> index 000000000000..3ce1ad971bdb
> --- /dev/null
> +++ b/drivers/pci/controller/pcie-tegra264.c
> @@ -0,0 +1,527 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * PCIe host controller driver for Tegra264 SoC
> + *
> + * Copyright (c) 2022-2026, NVIDIA CORPORATION. All rights reserved.
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/gpio/consumer.h>
> +#include <linux/init.h>
> +#include <linux/interconnect.h>
> +#include <linux/interrupt.h>
> +#include <linux/iopoll.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/of_address.h>
> +#include <linux/of_device.h>
> +#include <linux/of.h>
> +#include <linux/of_pci.h>
> +#include <linux/of_platform.h>
> +#include <linux/pci-ecam.h>
> +#include <linux/pci.h>
> +#include <linux/pinctrl/consumer.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +
> +#include <soc/tegra/bpmp.h>
> +#include <soc/tegra/bpmp-abi.h>
> +#include <soc/tegra/fuse.h>
> +
> +#include "../pci.h"
> +
> +/* XAL registers */
> +#define XAL_RC_ECAM_BASE_HI			0x00
> +#define XAL_RC_ECAM_BASE_LO			0x04
> +#define XAL_RC_ECAM_BUSMASK			0x08
> +#define XAL_RC_IO_BASE_HI			0x0c
> +#define XAL_RC_IO_BASE_LO			0x10
> +#define XAL_RC_IO_LIMIT_HI			0x14
> +#define XAL_RC_IO_LIMIT_LO			0x18
> +#define XAL_RC_MEM_32BIT_BASE_HI		0x1c
> +#define XAL_RC_MEM_32BIT_BASE_LO		0x20
> +#define XAL_RC_MEM_32BIT_LIMIT_HI		0x24
> +#define XAL_RC_MEM_32BIT_LIMIT_LO		0x28
> +#define XAL_RC_MEM_64BIT_BASE_HI		0x2c
> +#define XAL_RC_MEM_64BIT_BASE_LO		0x30
> +#define XAL_RC_MEM_64BIT_LIMIT_HI		0x34
> +#define XAL_RC_MEM_64BIT_LIMIT_LO		0x38
> +#define XAL_RC_BAR_CNTL_STANDARD		0x40
> +#define XAL_RC_BAR_CNTL_STANDARD_IOBAR_EN	BIT(0)
> +#define XAL_RC_BAR_CNTL_STANDARD_32B_BAR_EN	BIT(1)
> +#define XAL_RC_BAR_CNTL_STANDARD_64B_BAR_EN	BIT(2)
> +
> +/* XTL registers */
> +#define XTL_RC_PCIE_CFG_LINK_STATUS		0x5a
> +
> +#define XTL_RC_MGMT_PERST_CONTROL		0x218
> +#define XTL_RC_MGMT_PERST_CONTROL_PERST_O_N	BIT(0)
> +
> +#define XTL_RC_MGMT_CLOCK_CONTROL		0x47c
> +#define XTL_RC_MGMT_CLOCK_CONTROL_PEX_CLKREQ_I_N_PIN_USE_CONV_TO_PRSNT	BIT(9)
> +
> +struct tegra264_pcie {
> +	struct device *dev;
> +	bool link_up;

Keep bool types at the end to avoid holes.

> +
> +	/* I/O memory */
> +	void __iomem *xal;
> +	void __iomem *xtl;
> +	void __iomem *ecam;
> +
> +	/* bridge configuration */
> +	struct pci_config_window *cfg;
> +	struct pci_host_bridge *bridge;
> +
> +	/* wake IRQ */
> +	struct gpio_desc *wake_gpio;
> +	unsigned int wake_irq;
> +
> +	/* BPMP and bandwidth management */
> +	struct icc_path *icc_path;
> +	struct tegra_bpmp *bpmp;
> +	u32 ctl_id;
> +};
> +
> +static int tegra264_pcie_parse_dt(struct tegra264_pcie *pcie)
> +{
> +	int err;
> +
> +	pcie->wake_gpio = devm_gpiod_get_optional(pcie->dev, "nvidia,pex-wake",

You should switch to standard 'wake-gpios' property.

> +						  GPIOD_IN);
> +	if (IS_ERR(pcie->wake_gpio))
> +		return PTR_ERR(pcie->wake_gpio);
> +
> +	if (pcie->wake_gpio) {

Since you are bailing out above, you don't need this check.

> +		device_init_wakeup(pcie->dev, true);
> +
> +		err = gpiod_to_irq(pcie->wake_gpio);
> +		if (err < 0) {
> +			dev_err(pcie->dev, "failed to get wake IRQ: %pe\n",
> +				ERR_PTR(err));
> +			return err;
> +		}
> +
> +		pcie->wake_irq = (unsigned int)err;
> +	}
> +
> +	return 0;
> +}
> +
> +static void tegra264_pcie_bpmp_set_rp_state(struct tegra264_pcie *pcie)

I don't think this function name is self explanatory. Looks like it is turning
off the PCIe controller, so how about tegra264_pcie_power_off()?

> +{
> +	struct tegra_bpmp_message msg = {};
> +	struct mrq_pcie_request req = {};
> +	int err;
> +
> +	req.cmd = CMD_PCIE_RP_CONTROLLER_OFF;
> +	req.rp_ctrlr_off.rp_controller = pcie->ctl_id;
> +
> +	msg.mrq = MRQ_PCIE;
> +	msg.tx.data = &req;
> +	msg.tx.size = sizeof(req);
> +
> +	err = tegra_bpmp_transfer(pcie->bpmp, &msg);
> +	if (err)
> +		dev_info(pcie->dev, "failed to turn off PCIe #%u: %pe\n",

Why not dev_err()?

> +			 pcie->ctl_id, ERR_PTR(err));
> +
> +	if (msg.rx.ret)
> +		dev_info(pcie->dev, "failed to turn off PCIe #%u: %d\n",

Same here.

> +			 pcie->ctl_id, msg.rx.ret);
> +}
> +
> +static void tegra264_pcie_icc_set(struct tegra264_pcie *pcie)
> +{
> +	u32 value, speed, width, bw;
> +	int err;
> +
> +	value = readw(pcie->ecam + XTL_RC_PCIE_CFG_LINK_STATUS);
> +	speed = FIELD_GET(PCI_EXP_LNKSTA_CLS, value);
> +	width = FIELD_GET(PCI_EXP_LNKSTA_NLW, value);
> +
> +	bw = width * (PCIE_SPEED2MBS_ENC(speed) / BITS_PER_BYTE);
> +	value = MBps_to_icc(bw);

So this becomes, 'width * (PCIE_SPEED2MBS_ENC(speed) / 8) * 1000 / 8'. But don't
you want, 'width * (PCIE_SPEED2MBS_ENC(speed)) * 1000 / 8'?

> +
> +	err = icc_set_bw(pcie->icc_path, bw, bw);
> +	if (err < 0)
> +		dev_err(pcie->dev,
> +			"failed to request bandwidth (%u MBps): %pe\n",
> +			bw, ERR_PTR(err));

So you don't want to error out if this fails?

> +}
> +
> +/*
> + * The various memory regions used by the controller (I/O, memory, ECAM) are
> + * set up during early boot and have hardware-level protections in place. If
> + * the DT ranges don't match what's been setup, the controller won't be able
> + * to write the address endpoints properly, so make sure to validate that DT
> + * and firmware programming agree on these ranges.
> + */
> +static bool tegra264_pcie_check_ranges(struct platform_device *pdev)
> +{
> +	struct tegra264_pcie *pcie = platform_get_drvdata(pdev);
> +	struct device_node *np = pcie->dev->of_node;
> +	struct of_pci_range_parser parser;
> +	phys_addr_t phys, limit, hi, lo;
> +	struct of_pci_range range;
> +	struct resource *res;
> +	bool status = true;
> +	u32 value;
> +	int err;
> +
> +	err = of_pci_range_parser_init(&parser, np);
> +	if (err < 0)
> +		return false;
> +
> +	for_each_of_pci_range(&parser, &range) {
> +		unsigned int addr_hi, addr_lo, limit_hi, limit_lo, enable;
> +		unsigned long type = range.flags & IORESOURCE_TYPE_BITS;
> +		phys_addr_t start, end, mask;
> +		const char *region = NULL;
> +
> +		end = range.cpu_addr + range.size - 1;
> +		start = range.cpu_addr;
> +
> +		switch (type) {
> +		case IORESOURCE_IO:
> +			addr_hi = XAL_RC_IO_BASE_HI;
> +			addr_lo = XAL_RC_IO_BASE_LO;
> +			limit_hi = XAL_RC_IO_LIMIT_HI;
> +			limit_lo = XAL_RC_IO_LIMIT_LO;
> +			enable = XAL_RC_BAR_CNTL_STANDARD_IOBAR_EN;
> +			mask = SZ_64K - 1;
> +			region = "I/O";
> +			break;
> +
> +		case IORESOURCE_MEM:
> +			if (range.flags & IORESOURCE_PREFETCH) {
> +				addr_hi = XAL_RC_MEM_64BIT_BASE_HI;
> +				addr_lo = XAL_RC_MEM_64BIT_BASE_LO;
> +				limit_hi = XAL_RC_MEM_64BIT_LIMIT_HI;
> +				limit_lo = XAL_RC_MEM_64BIT_LIMIT_LO;
> +				enable = XAL_RC_BAR_CNTL_STANDARD_64B_BAR_EN;
> +				region = "prefetchable memory";
> +			} else {
> +				addr_hi = XAL_RC_MEM_32BIT_BASE_HI;
> +				addr_lo = XAL_RC_MEM_32BIT_BASE_LO;
> +				limit_hi = XAL_RC_MEM_32BIT_LIMIT_HI;
> +				limit_lo = XAL_RC_MEM_32BIT_LIMIT_LO;
> +				enable = XAL_RC_BAR_CNTL_STANDARD_32B_BAR_EN;
> +				region = "memory";
> +			}
> +
> +			mask = SZ_1M - 1;
> +			break;
> +		}
> +
> +		/* not interested in anything that's not I/O or memory */
> +		if (!region)
> +			continue;
> +
> +		/* don't check regions that haven't been enabled */
> +		value = readl(pcie->xal + XAL_RC_BAR_CNTL_STANDARD);
> +		if ((value & enable) == 0)
> +			continue;
> +
> +		hi = readl(pcie->xal + addr_hi);
> +		lo = readl(pcie->xal + addr_lo);
> +		phys = hi << 32 | lo;
> +
> +		hi = readl(pcie->xal + limit_hi);
> +		lo = readl(pcie->xal + limit_lo);
> +		limit = hi << 32 | lo | mask;
> +
> +		if (phys != start || limit != end) {
> +			dev_err(pcie->dev,
> +				"%s region mismatch: %pap-%pap -> %pap-%pap\n",
> +				region, &phys, &limit, &start, &end);
> +			status = false;
> +		}
> +	}
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ecam");
> +	if (!res)
> +		return false;
> +
> +	hi = readl(pcie->xal + XAL_RC_ECAM_BASE_HI);
> +	lo = readl(pcie->xal + XAL_RC_ECAM_BASE_LO);
> +	phys = hi << 32 | lo;
> +
> +	value = readl(pcie->xal + XAL_RC_ECAM_BUSMASK);
> +	limit = phys + ((value + 1) << 20) - 1;
> +
> +	if (phys != res->start || limit != res->end) {
> +		dev_err(pcie->dev,
> +			"ECAM region mismatch: %pap-%pap -> %pap-%pap\n",
> +			&phys, &limit, &res->start, &res->end);
> +		status = false;
> +	}
> +
> +	return status;
> +}
> +
> +static bool tegra264_pcie_link_up(struct tegra264_pcie *pcie,
> +				  enum pci_bus_speed *speed)
> +{
> +	u16 value = readw(pcie->ecam + XTL_RC_PCIE_CFG_LINK_STATUS);
> +
> +	if (value & PCI_EXP_LNKSTA_DLLLA) {
> +		if (speed)
> +			*speed = pcie_link_speed[FIELD_GET(PCI_EXP_LNKSTA_CLS,
> +							   value)];
> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static void tegra264_pcie_init(struct tegra264_pcie *pcie)
> +{
> +	enum pci_bus_speed speed;
> +	unsigned int i;
> +	u32 value;
> +
> +	/* bring the link out of reset */

s/link/controller or endpoint?

> +	value = readl(pcie->xtl + XTL_RC_MGMT_PERST_CONTROL);
> +	value |= XTL_RC_MGMT_PERST_CONTROL_PERST_O_N;
> +	writel(value, pcie->xtl + XTL_RC_MGMT_PERST_CONTROL);
> +
> +	if (!tegra_is_silicon()) {

This looks like some pre-silicon validation thing. Do you really want it to be
present in the upstream driver?

> +		dev_info(pcie->dev,
> +			 "skipping link state for PCIe #%u in simulation\n",
> +			 pcie->ctl_id);
> +		pcie->link_up = true;
> +		return;
> +	}
> +
> +	for (i = 0; i < PCIE_LINK_WAIT_MAX_RETRIES; i++) {
> +		if (tegra264_pcie_link_up(pcie, NULL))
> +			break;
> +
> +		usleep_range(PCIE_LINK_WAIT_US_MIN, PCIE_LINK_WAIT_US_MAX);
> +	}
> +
> +	if (tegra264_pcie_link_up(pcie, &speed)) {

Why are you doing it for the second time?

> +		/* Per PCIe r5.0, 6.6.1 wait for 100ms after DLL up */

No need of this comment.

> +		msleep(PCIE_RESET_CONFIG_WAIT_MS);
> +
> +		dev_info(pcie->dev, "PCIe #%u link is up (speed: %s)\n",
> +			 pcie->ctl_id, pci_speed_string(speed));
> +		tegra264_pcie_icc_set(pcie);
> +		pcie->link_up = true;
> +	} else {
> +		dev_info(pcie->dev, "PCIe #%u link is down\n", pcie->ctl_id);
> +
> +		value = readl(pcie->xtl + XTL_RC_MGMT_CLOCK_CONTROL);
> +
> +		/*
> +		 * Set link state only when link fails and no hot-plug feature
> +		 * is present.
> +		 */
> +		if ((value & XTL_RC_MGMT_CLOCK_CONTROL_PEX_CLKREQ_I_N_PIN_USE_CONV_TO_PRSNT) == 0) {
> +			dev_info(pcie->dev,
> +				 "PCIe #%u link is down and not hotplug-capable, turning off\n",
> +				 pcie->ctl_id);
> +			tegra264_pcie_bpmp_set_rp_state(pcie);
> +			pcie->link_up = false;
> +		} else {
> +			pcie->link_up = true;
> +		}
> +	}
> +}
> +
> +static int tegra264_pcie_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct pci_host_bridge *bridge;
> +	struct tegra264_pcie *pcie;
> +	struct resource_entry *bus;
> +	struct resource *res;
> +	int err;
> +
> +	bridge = devm_pci_alloc_host_bridge(dev, sizeof(struct tegra264_pcie));
> +	if (!bridge)
> +		return dev_err_probe(dev, -ENOMEM,
> +				     "failed to allocate host bridge\n");
> +
> +	pcie = pci_host_bridge_priv(bridge);
> +	platform_set_drvdata(pdev, pcie);
> +	pcie->bridge = bridge;
> +	pcie->dev = dev;
> +
> +	err = pinctrl_pm_select_default_state(dev);

I questioned this before:
https://lore.kernel.org/linux-pci/o5sxxdikdjwd76zsedvkpsl54nw6wrhopwsflt43y5st67mrub@uuw3yfjfqthd/

> +	if (err < 0)
> +		return dev_err_probe(dev, err,
> +				     "failed to configure sideband pins\n");
> +
> +	err = tegra264_pcie_parse_dt(pcie);
> +	if (err < 0)
> +		return dev_err_probe(dev, err, "failed to parse device tree");
> +
> +	pcie->xal = devm_platform_ioremap_resource_byname(pdev, "xal");
> +	if (IS_ERR(pcie->xal))
> +		return dev_err_probe(dev, PTR_ERR(pcie->xal),
> +				     "failed to map XAL memory\n");
> +
> +	pcie->xtl = devm_platform_ioremap_resource_byname(pdev, "xtl-pri");
> +	if (IS_ERR(pcie->xtl))
> +		return dev_err_probe(dev, PTR_ERR(pcie->xtl),
> +				     "failed to map XTL-PRI memory\n");
> +
> +	bus = resource_list_first_type(&bridge->windows, IORESOURCE_BUS);
> +	if (!bus)
> +		return dev_err_probe(dev, -ENODEV,
> +				     "failed to get bus resources\n");
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ecam");
> +	if (!res)
> +		return dev_err_probe(dev, -ENXIO,
> +				     "failed to get ECAM resource\n");
> +
> +	pcie->icc_path = devm_of_icc_get(&pdev->dev, "write");
> +	if (IS_ERR(pcie->icc_path))
> +		return dev_err_probe(&pdev->dev, PTR_ERR(pcie->icc_path),
> +				     "failed to get ICC");
> +
> +	/*
> +	 * Parse BPMP property only for silicon, as interaction with BPMP is
> +	 * not needed for other platforms.
> +	 */
> +	if (tegra_is_silicon()) {
> +		pcie->bpmp = tegra_bpmp_get_with_id(dev, &pcie->ctl_id);
> +		if (IS_ERR(pcie->bpmp))
> +			return dev_err_probe(dev, PTR_ERR(pcie->bpmp),
> +					     "failed to get BPMP\n");
> +	}
> +

pm_runtime_set_active()

> +	pm_runtime_enable(dev);

devm_pm_runtime_enable()?

> +	pm_runtime_get_sync(dev);
> +
> +	/* sanity check that programmed ranges match what's in DT */
> +	if (!tegra264_pcie_check_ranges(pdev)) {
> +		err = -EINVAL;
> +		goto put_pm;
> +	}
> +
> +	pcie->cfg = pci_ecam_create(dev, res, bus->res, &pci_generic_ecam_ops);
> +	if (IS_ERR(pcie->cfg)) {
> +		err = dev_err_probe(dev, PTR_ERR(pcie->cfg),
> +				    "failed to create ECAM\n");
> +		goto put_pm;
> +	}
> +
> +	bridge->ops = (struct pci_ops *)&pci_generic_ecam_ops.pci_ops;
> +	bridge->sysdata = pcie->cfg;
> +	pcie->ecam = pcie->cfg->win;
> +
> +	tegra264_pcie_init(pcie);
> +
> +	if (!pcie->link_up)
> +		goto free;

goto free_ecam;

> +
> +	err = pci_host_probe(bridge);
> +	if (err < 0) {
> +		dev_err(dev, "failed to register host: %pe\n", ERR_PTR(err));

dev_err_probe()

> +		goto free;
> +	}
> +
> +	return err;

return 0;

> +
> +free:
> +	pci_ecam_free(pcie->cfg);
> +put_pm:
> +	pm_runtime_put_sync(dev);
> +	pm_runtime_disable(dev);
> +
> +	if (tegra_is_silicon())
> +		tegra_bpmp_put(pcie->bpmp);
> +
> +	return err;
> +}
> +
> +static void tegra264_pcie_remove(struct platform_device *pdev)
> +{
> +	struct tegra264_pcie *pcie = platform_get_drvdata(pdev);
> +
> +	/*
> +	 * If we undo tegra264_pcie_init() then link goes down and need
> +	 * controller reset to bring up the link again. Remove intention is
> +	 * to clean up the root bridge and re-enumerate during bind.
> +	 */
> +	pci_lock_rescan_remove();
> +	pci_stop_root_bus(pcie->bridge->bus);
> +	pci_remove_root_bus(pcie->bridge->bus);
> +	pci_unlock_rescan_remove();
> +
> +	pm_runtime_put_sync(&pdev->dev);
> +	pm_runtime_disable(&pdev->dev);
> +
> +	if (tegra_is_silicon())
> +		tegra_bpmp_put(pcie->bpmp);
> +
> +	pci_ecam_free(pcie->cfg);
> +}
> +
> +static int tegra264_pcie_suspend_noirq(struct device *dev)
> +{
> +	struct tegra264_pcie *pcie = dev_get_drvdata(dev);
> +	int err;
> +
> +	if (pcie->wake_gpio && device_may_wakeup(dev)) {
> +		err = enable_irq_wake(pcie->wake_irq);
> +		if (err < 0)
> +			dev_err(dev, "failed to enable wake IRQ: %pe\n",
> +				ERR_PTR(err));
> +	}
> +
> +	return 0;
> +}
> +
> +static int tegra264_pcie_resume_noirq(struct device *dev)
> +{
> +	struct tegra264_pcie *pcie = dev_get_drvdata(dev);
> +	int err;
> +
> +	if (pcie->wake_gpio && device_may_wakeup(dev)) {
> +		err = disable_irq_wake(pcie->wake_irq);
> +		if (err < 0)
> +			dev_err(dev, "failed to disable wake IRQ: %pe\n",
> +				ERR_PTR(err));
> +	}
> +
> +	if (pcie->link_up == false)
> +		return 0;
> +
> +	tegra264_pcie_init(pcie);
> +

Why do you need init() here without deinit() in tegra264_pcie_suspend_noirq()?

- Mani

-- 
மணிவண்ணன் சதாசிவம்


^ permalink raw reply

* Re: [PATCH v5 4/4] ARM: omap1: enable real software node lookup of GPIOs on Nokia 770
From: Dmitry Torokhov @ 2026-04-02 17:32 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich,
	Andy Shevchenko, Daniel Scally, Heikki Krogerus, Sakari Ailus,
	Aaro Koskinen, Janusz Krzysztofik, Tony Lindgren, Russell King,
	Kevin Hilman, Arnd Bergmann, brgl, driver-core, linux-kernel,
	linux-acpi, linux-arm-kernel, linux-omap
In-Reply-To: <20260402-nokia770-gpio-swnodes-v5-4-d730db3dd299@oss.qualcomm.com>

On Thu, Apr 02, 2026 at 04:15:05PM +0200, Bartosz Golaszewski wrote:
> @@ -244,6 +263,13 @@ static int __init omap16xx_gpio_init(void)
>  		iounmap(base);
>  
>  		platform_device_register(omap16xx_gpio_dev[i]);
> +
> +		ret = device_add_software_node(&omap16xx_gpio_dev[i]->dev,
> +					       omap16xx_gpio_swnodes[i]);
> +		if (ret) {
> +			dev_err(&pdev->dev, "Failed to add software node.\n");
> +			return ret;
> +		}

I think the best and safest way is to convert to using
platform_driver_register_full() and set swnode in the relevant "info"
instance.

Thanks.

-- 
Dmitry


^ permalink raw reply

* Re: [PATCH net v4 0/2] stmmac crash/stall fixes when under memory pressure
From: Sam Edwards @ 2026-04-02 17:39 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <ac6kfQ98Xjt3dCGj@shell.armlinux.org.uk>

On Thu, Apr 2, 2026 at 10:16 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> I've tested this on my Jetson Xavier platform. One of the issues I've
> had is that running iperf3 results in the receive side stalling because
> it runs out of descriptors. However, despite the receive ring
> eventually being re-filled and the hardware appropriately prodded, it
> steadfastly refuses to restart, despite the descriptors having been
> updated.

Hi Russell,

Just to make sure I understand correctly: before my patches, you've
been observing this problem on Xavier for a while (no interrupts, ring
goes dry); with my patches, the ring is refilled, but the dwmac5
doesn't resume DMA. (Ah, just saw your follow-up email.)

> Any ideas?

Off the top of my head, my hypothesis is that dwmac5 has an additional
tripwire when the receive DMA is exhausted, and the
stmmac_set_rx_tail_ptr()/stmmac_enable_dma_reception() at the end of
stmmac_rx_refill() aren't sufficient to wake it back up.

I think this is new to dwmac5, because my RK3588 (dwmac4.20 iirc)
happily resumes after the same condition.

You gave a lot of info; thanks! I'll try to scrape up some
documentation on dwmac5 to see if there's something more
stmmac_rx_refill() ought to be doing. I think I have a Xavier NX
around here somewhere, I'll see if I can repro the problem.

Cheers,
Sam

^ permalink raw reply

* Re: [PATCH] Bluetooth: btmtk: hide unused  btmtk_mt6639_devs[] array
From: patchwork-bot+bluetooth @ 2026-04-02 17:40 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: marcel, luiz.dentz, matthias.bgg, angelogioacchino.delregno,
	floss, arnd, chris.lu, kees, johan, sean.wang, jiande.lu,
	linux-bluetooth, linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <20260402141119.2732591-1-arnd@kernel.org>

Hello:

This patch was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <luiz.von.dentz@intel.com>:

On Thu,  2 Apr 2026 16:11:15 +0200 you wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> When USB support is disabled, the array is not referenced anywhere,
> causing a warning:
> 
> drivers/bluetooth/btmtk.c:35:3: error: 'btmtk_mt6639_devs' defined but not used [-Werror=unused-const-variable=]
>    35 | } btmtk_mt6639_devs[] = {
>       |   ^~~~~~~~~~~~~~~~~
> 
> [...]

Here is the summary with links:
  - Bluetooth: btmtk: hide unused btmtk_mt6639_devs[] array
    https://git.kernel.org/bluetooth/bluetooth-next/c/a6e00a811c87

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




^ permalink raw reply

* Re: [PATCH 1/3] selftests/resctrl: Introduced linked list management for IMC counters
From: Reinette Chatre @ 2026-04-02 17:42 UTC (permalink / raw)
  To: Yifan Wu, tony.luck, Dave.Martin, james.morse, babu.moger, shuah,
	tan.shaopeng, fenghuay, ben.horgan, jonathan.cameron, zengheng4,
	linux-kernel, linux-arm-kernel, linux-kselftest, linuxarm
  Cc: xiaqinxin, prime.zeng, wangyushan12, xuwei5, fanghao11, wangzhou1
In-Reply-To: <20260324125034.1509177-2-wuyifan50@huawei.com>

Hi Yifan,

On 3/24/26 5:50 AM, Yifan Wu wrote:
> Added linked list based management for IMC counter configurations,
> allowing the system to dynamically allocate and clean up resources based on
> actual hardware capabilities.

Thank you very much for taking on this change, this is a good addition.

One note on the customs, even though this is user space code it is the Linux kernel
developers that work with it mostly and to support this the style is expected to
(as much as possible) match kernel coding style. Specifically related to this, please
follow coding and changelog guidance in Documentation/process/coding-style.rst and 
Documentation/process/maintainer-tip.rst.

> 
> Signed-off-by: Yifan Wu <wuyifan50@huawei.com>
> ---
>  tools/testing/selftests/resctrl/resctrl.h     |  1 +
>  tools/testing/selftests/resctrl/resctrl_val.c | 25 +++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index 175101022bf3..29c9f76132f0 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -24,6 +24,7 @@
>  #include <linux/perf_event.h>
>  #include <linux/compiler.h>
>  #include <linux/bits.h>
> +#include  <linux/list.h>

(extra space)

>  #include "kselftest.h"
>  
>  #define MB			(1024 * 1024)
> diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
> index f20d2194c35f..ac58d3862281 100644
> --- a/tools/testing/selftests/resctrl/resctrl_val.c
> +++ b/tools/testing/selftests/resctrl/resctrl_val.c
> @@ -28,6 +28,7 @@ struct membw_read_format {
>  };
>  
>  struct imc_counter_config {
> +	struct list_head imc_list;

To make it obvious that this is an entry/node of a list as opposed to a list
header, please use a name like "entry" or "node" or similar.

>  	__u32 type;
>  	__u64 event;
>  	__u64 umask;
> @@ -38,6 +39,7 @@ struct imc_counter_config {
>  static char mbm_total_path[1024];
>  static int imcs;
>  static struct imc_counter_config imc_counters_config[MAX_IMCS];
> +LIST_HEAD(imc_counters_configs);
>  static const struct resctrl_test *current_test;
>  
>  static void read_mem_bw_initialize_perf_event_attr(int i)
> @@ -235,6 +237,7 @@ static int read_from_imc_dir(char *imc_dir, unsigned int *count)
>   */
>  static int num_of_imcs(void)
>  {
> +	struct imc_counter_config *imc_counters_config;

Please avoid variable shadowing.

>  	char imc_dir[512], *temp;
>  	unsigned int count = 0;
>  	struct dirent *ep;
> @@ -263,14 +266,23 @@ static int num_of_imcs(void)
>  			 * first character is a numerical digit or not.
>  			 */
>  			if (temp[0] >= '0' && temp[0] <= '9') {
> +				imc_counters_config = malloc(sizeof(struct imc_counter_config));

One example of the kernel coding style applied here (see "Allocating memory" in
Documentation/process/coding-style.rst) is to use pointer variable and
not typing out the struct.

Even so, this does not look to be the right place to do this allocation ... (more below)

> +				if (!imc_counters_config) {
> +					ksft_print_msg("Unable to allocate memory for iMC counters\n");
> +
> +					return -1;
> +				}
> +				memset(imc_counters_config, 0, sizeof(struct imc_counter_config));

Please use calloc() instead of the malloc() followed by memset().

>  				sprintf(imc_dir, "%s/%s/", DYN_PMU_PATH,
>  					ep->d_name);
>  				ret = read_from_imc_dir(imc_dir, &count);

read_from_imc_dir() obtains "count", which is used as index to imc_counters_config[],
as reference because it increments this counter. Doing so enables read_from_imc_dir()
to initialize more than one array element.

Only allocating one element for the list thus does not look right and I would actually
expect this allocation to be closer to the initialization where it is known whether
a new element is needed or not. Having new list element allocation within
parse_imc_read_bw_events() looks more appropriate?

>  				if (ret) {
> +					free(imc_counters_config);
>  					closedir(dp);
>  
>  					return ret;
>  				}
> +				list_add(&imc_counters_config->imc_list, &imc_counters_configs);
>  			}
>  		}
>  		closedir(dp);
> @@ -303,6 +315,19 @@ int initialize_read_mem_bw_imc(void)
>  	return 0;
>  }
>  
> +void cleanup_read_mem_bw_imc(void)
> +{
> +	struct imc_counter_config *next_imc_counters_config;

This could just be "*tmp" to make its usage clear.

> +	struct imc_counter_config *imc_counters_config;

I find the "imc_counters_configs" and "imc_counters_config" names too similar
for comfort and makes the code harder to follow. A short name that is obviously
different from the global list name will make the code much easier to read.
How about "imc_counter"?

> +
> +	list_for_each_entry_safe(imc_counters_config, next_imc_counters_config,
> +				 &imc_counters_configs, imc_list) {
> +		list_del(&imc_counters_config->imc_list);
> +		free(imc_counters_config);
> +	}
> +	INIT_LIST_HEAD(&imc_counters_configs);

Why is INIT_LIST_HEAD() needed?

> +}
> +
>  static void perf_close_imc_read_mem_bw(void)
>  {
>  	int mc;

This patch allocates memory but never frees is. This is done later in patch #3
as a fix but that change would be better as part of this to make it easier to
consider the memory management.

Reinette

^ permalink raw reply

* Re: [PATCH 2/3] selftests/resctrl: Replace array-based IMC counter management with linked lists
From: Reinette Chatre @ 2026-04-02 17:44 UTC (permalink / raw)
  To: Yifan Wu, tony.luck, Dave.Martin, james.morse, babu.moger, shuah,
	tan.shaopeng, fenghuay, ben.horgan, jonathan.cameron, zengheng4,
	linux-kernel, linux-arm-kernel, linux-kselftest, linuxarm
  Cc: xiaqinxin, prime.zeng, wangyushan12, xuwei5, fanghao11, wangzhou1
In-Reply-To: <20260324125034.1509177-3-wuyifan50@huawei.com>

Hi Yifan,

On 3/24/26 5:50 AM, Yifan Wu wrote:
> Convert IMC counter management from static array to dynamic
> linked list allocation.

Could you please split this patch into two? One patch where utilities
receive pointer to array element instead of index as parameter and
another patch that switches the code to use a list?

> 
> Signed-off-by: Yifan Wu <wuyifan50@huawei.com>
> ---
>  tools/testing/selftests/resctrl/resctrl_val.c | 134 +++++++++---------
>  1 file changed, 66 insertions(+), 68 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
> index ac58d3862281..417d87ba368a 100644
> --- a/tools/testing/selftests/resctrl/resctrl_val.c
> +++ b/tools/testing/selftests/resctrl/resctrl_val.c
> @@ -14,7 +14,6 @@
>  #define READ_FILE_NAME		"cas_count_read"
>  #define DYN_PMU_PATH		"/sys/bus/event_source/devices"
>  #define SCALE			0.00006103515625
> -#define MAX_IMCS		40
>  #define MAX_TOKENS		5
>  
>  #define CON_MBM_LOCAL_BYTES_PATH		\
> @@ -38,36 +37,37 @@ struct imc_counter_config {
>  
>  static char mbm_total_path[1024];
>  static int imcs;
> -static struct imc_counter_config imc_counters_config[MAX_IMCS];
>  LIST_HEAD(imc_counters_configs);
>  static const struct resctrl_test *current_test;
>  
> -static void read_mem_bw_initialize_perf_event_attr(int i)
> +static void read_mem_bw_initialize_perf_event_attr(struct imc_counter_config *imc_counters_config)

In parameters also, please use a variable name used for element that is further
away from list header name. How about just "imc_counter" for the function parameter? 

...

> @@ -112,10 +113,10 @@ static int open_perf_read_event(int i, int cpu_no)
>  }
>  
>  static int parse_imc_read_bw_events(char *imc_dir, unsigned int type,
> -				    unsigned int *count)
> +				    struct imc_counter_config *imc_counters_config)
>  {
>  	char imc_events_dir[PATH_MAX], imc_counter_cfg[PATH_MAX];
> -	unsigned int orig_count = *count;
> +	unsigned int orig_count = imcs;

Why is global imcs used/needed here? The intention behind orig_count is just to
check if any iMC counters were added by this function. Original code checked 
by comparing the "before" and "after" array index but with a switch to a list this
can just be done locally, for example, with a boolean.

>  	char cas_count_cfg[1024];
>  	struct dirent *ep;
>  	int path_len;
> @@ -165,17 +166,13 @@ static int parse_imc_read_bw_events(char *imc_dir, unsigned int type,
>  			ksft_perror("Could not get iMC cas count read");
>  			goto out_close;
>  		}
> -		if (*count >= MAX_IMCS) {
> -			ksft_print_msg("Maximum iMC count exceeded\n");
> -			goto out_close;
> -		}
>  
> -		imc_counters_config[*count].type = type;
> -		get_read_event_and_umask(cas_count_cfg, *count);
> -		/* Do not fail after incrementing *count. */
> -		*count += 1;
> +		imc_counters_config->type = type;
> +		get_read_event_and_umask(cas_count_cfg, imc_counters_config);
> +		/* Do not fail after incrementing count. */
> +		imcs++;

Note that this is a loop that may initialize more than one counter and since it
uses the single element provided as function parameter each new counter will just
overwrite the previous's settings.

As mentioned in patch #1 it looks more appropriate to allocate and initialize
new list entry here within parse_imc_read_bw_events().

> @@ -239,7 +236,7 @@ static int num_of_imcs(void)
>  {
>  	struct imc_counter_config *imc_counters_config;
>  	char imc_dir[512], *temp;
> -	unsigned int count = 0;
> +	imcs = 0;
>  	struct dirent *ep;
>  	int ret;
>  	DIR *dp;
> @@ -275,7 +272,7 @@ static int num_of_imcs(void)
>  				memset(imc_counters_config, 0, sizeof(struct imc_counter_config));
>  				sprintf(imc_dir, "%s/%s/", DYN_PMU_PATH,
>  					ep->d_name);
> -				ret = read_from_imc_dir(imc_dir, &count);
> +				ret = read_from_imc_dir(imc_dir, imc_counters_config);
>  				if (ret) {
>  					free(imc_counters_config);
>  					closedir(dp);
> @@ -286,7 +283,7 @@ static int num_of_imcs(void)
>  			}
>  		}
>  		closedir(dp);
> -		if (count == 0) {
> +		if (imcs == 0) {

Is this global necessary? How about list_empty() instead?

>  			ksft_print_msg("Unable to find iMC counters\n");
>  
>  			return -1;
> @@ -297,20 +294,22 @@ static int num_of_imcs(void)
>  		return -1;
>  	}
>  
> -	return count;
> +	return imcs;

Looking at how the caller, initialize_read_mem_bw_imc() below, uses the return
value it does not seem necessary to track the number of entries anymore. Could the
global imcs just be dropped?

>  }
>  
>  int initialize_read_mem_bw_imc(void)
>  {
> -	int imc;
> +	int ret;
> +	struct imc_counter_config *imc_counters_config;
>  
> -	imcs = num_of_imcs();
> -	if (imcs <= 0)
> -		return imcs;
> +	ret = num_of_imcs();
> +	if (ret <= 0)
> +		return ret;
>  
>  	/* Initialize perf_event_attr structures for all iMC's */
> -	for (imc = 0; imc < imcs; imc++)
> -		read_mem_bw_initialize_perf_event_attr(imc);
> +	list_for_each_entry(imc_counters_config, &imc_counters_configs, imc_list) {
> +		read_mem_bw_initialize_perf_event_attr(imc_counters_config);
> +	}
>  
>  	return 0;
>  }
Reinette


^ permalink raw reply

* Re: [PATCH V3 7/7] arm64/hw_breakpoint: Enable FEAT_Debugv8p9
From: Rob Herring @ 2026-04-02 17:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Anshuman Khandual, linux-arm-kernel, linux-kernel,
	Jonathan Corbet, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Catalin Marinas, Will Deacon, Mark Brown,
	kvmarm
In-Reply-To: <ac5G8e4zWTdicDBs@J2N7QTR9R3>

On Thu, Apr 2, 2026 at 5:37 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Mar 31, 2026 at 05:58:00PM -0500, Rob Herring wrote:
> > On Mon, Dec 16, 2024 at 10:58:29AM +0000, Mark Rutland wrote:
> > > On Mon, Dec 16, 2024 at 09:38:31AM +0530, Anshuman Khandual wrote:
> > > > Currently there can be maximum 16 breakpoints, and 16 watchpoints available
> > > > on a given platform - as detected from ID_AA64DFR0_EL1.[BRPs|WRPs] register
> > > > fields. But these breakpoint, and watchpoints can be extended further up to
> > > > 64 via a new arch feature FEAT_Debugv8p9.
> > > >
> > > > This first enables banked access for the breakpoint and watchpoint register
> > > > set via MDSELR_EL1, extended exceptions via MDSCR_EL1.EMBWE and determining
> > > > available breakpoints and watchpoints in the platform from ID_AA64DFR1_EL1,
> > > > when FEAT_Debugv8p9 is enabled.
> > >
> > > [...]
> >
> > Well, this series has landed on my plate...
>
> Sorry about that; thanks for taking a look!
>
> > > > +static u64 read_wb_reg(int reg, int n)
> > > > +{
> > > > + unsigned long flags;
> > > > + u64 val;
> > > > +
> > > > + if (!is_debug_v8p9_enabled())
> > > > +         return __read_wb_reg(reg, n);
> > > > +
> > > > + /*
> > > > +  * Bank selection in MDSELR_EL1, followed by an indexed read from
> > > > +  * breakpoint (or watchpoint) registers cannot be interrupted, as
> > > > +  * that might cause misread from the wrong targets instead. Hence
> > > > +  * this requires mutual exclusion.
> > > > +  */
> > > > + local_irq_save(flags);
> > > > + write_sysreg_s(SYS_FIELD_PREP(MDSELR_EL1, BANK, n / MAX_PER_BANK), SYS_MDSELR_EL1);
> > > > + isb();
> > > > + val = __read_wb_reg(reg, n % MAX_PER_BANK);
> > > > + local_irq_restore(flags);
> > > > + return val;
> > > > +}
> > > >  NOKPROBE_SYMBOL(read_wb_reg);
> > >
> > > I don't believe that disabling interrupts here is sufficient. On the
> > > last version I asked about the case of racing with a watchpoint handler:
> > >
> > > | For example, what prevents watchpoint_handler() from firing in the
> > > | middle of arch_install_hw_breakpoint() or
> > > | arch_uninstall_hw_breakpoint()?
> > >
> > > ... and disabling interrupts cannot prevent that, because
> > > local_irq_{save,restore}() do not affect the behaviour of watchpoints or
> > > breakpoints.
> >
> > I think the answer is we just need NOKPROBE_SYMBOL() annotation on
> > hw_breakpoint_control() (what arch_install_hw_breakpoint() and
> > arch_uninstall_hw_breakpoint() wrap).
>
> Ok. I couldn'y spot where we prevent placing HW breakpoints on
> NOKPROBE_SYMBOL() functions, but if we do enforce that, something like
> that sounds ok.

Uh, actually you are right. I think we need the equivalent to this in
arch_build_bp_info():

    case HW_BREAKPOINT_X:
        /*
         * We don't allow kernel breakpoints in places that are not
         * acceptable for kprobes.  On non-kprobes kernels, we don't
         * allow kernel breakpoints at all.
         */
        if (attr->bp_addr >= TASK_SIZE_MAX) {
            if (within_kprobe_blacklist(attr->bp_addr))
                return -EINVAL;
        }

The 2nd sentence is enforced by within_kprobe_blacklist() returning
true for !CONFIG_KPROBES. Seems like a reasonable restriction, but it
would be a change. Unfortunately, we can't move this to non-arch code
as some arches support h/w BP without supporting kprobes.

>
> I suspect we'd need to make that noinstr to also prevent ftrace
> instrumentation, unless ftrace also inhibits itself for
> NOKPROBE_SYMBOL() functions.

If we use NOKPROBE_SYMBOL() along with nokprobe_inline (which just
forces inlining functions), then ftrace could only be used on the
entry or exit of
arch_install_hw_breakpoint()/arch_uninstall_hw_breakpoint() which I
think would be fine.

>
> > We also need that on __read_wb_reg
> > and __read_wb_reg though I would think those are folded into the calling
> > functions by the compiler. Interestly, the x86 code doesn't use the
> > annotation at all.
>
> IIUC, it looks like they *can* take debug NMIs during
> arch_install_hw_breakpoint() and arch_uninstall_hw_breakpoint(), which
> is why they have ordering constraints for modifying the percpu 'cpu_dr7'
> variable, and their actual DR7 register (which IIUC has the enable
> controls for each HW breakpoint).
>
> That said, the use of 'bp_per_reg' looks suspect given their
> arch_install_hw_breakpoint() and arch_uninstall_hw_breakpoint() modify
> that non-atomically.

You don't believe the comment saying counter->ctx->lock is held?

> We could consider allowing breakpoints on those functions, but I'm not
> sure whether that's possible for us, and (as noted below) it might be
> better to transiently disable breakpoints/watchpoints.

Even if possible, is it all that useful? Easier to just disable than
reason whether it would work or not IMO.


> IIRC on x86, breakpoint exceptions are taken *after* execution of the
> instruction that triggered them, so the handler doesn't have to
> manipulate single-step, and can safely ignore a breakpoint exception
> without the risk of getting stuck taking the breakpoint repeatedly.
>
> > I initially thought the IRQ disabling is also still needed as IRQ
> > handlers can trigger breakpoints. However, the x86 version of
> > arch_install_hw_breakpoint() contains a lockdep_assert_irqs_disabled(),
> > so it seems for that case interrupts are already disabled. And in debug
> > exceptions, we disable interrupts. So I think the interrupt disabling
> > can be dropped.
>
> I'd expect that the core perf code disables interrupts before calling
> arch_install_hw_breakpoint() or arch_uninstall_hw_breakpoint(), and this
> would be necessary for perf to serialise against IPIs that manipulate
> the perf_event_context.
>
> I agree that when we actually take the breakpoint, we'll mask all
> exceptions, and so it's not necessary to mask IRQs there.
>
> So a first step is probably to add that lockdep assert.
>
> > > Please can you try to answer the questions I asked last time, i.e.
> > >
> > > | What prevents a race with an exception handler? e.g.
> > > |
> > > | * Does the structure of the code prevent that somehow?
> >
> > If you can't set a breakpoint/watchpoint in NOKPROBE_SYMBOL() annotated
> > code, you can't race.
>
> As above, I agree (with caveats), but I couldn't spot where this is
> enforced.
>
> > However, there's no such annotation for data. It looks like the kernel
> > policy is "don't do that" or disable all breakpoints/watchpoints.
>
> If we have to transiently disable watchpoints/breakpoints when
> manipulating the relevant HW registers, that sounds fine to me.

For wp/bp_on_reg, the ordering is 'data access, h/w accesses'. I think
we just need a barrier to enforce that ordering so the data access
(and then watchpoint) don't trigger in the middle of the h/w accesses.
Any guidance on the flavor of dsb here? (And is there any guarantee
that the access is visible to the watchpoint h/w after a dsb
completes?)

Rob


^ permalink raw reply

* Re: [PATCH v3 1/2] mailbox: Use per-thread completion to fix wrong completion order
From: Jassi Brar @ 2026-04-02 17:59 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: matthias.bgg, angelogioacchino.delregno, thierry.reding,
	jonathanh, linux-kernel, linux-arm-kernel, linux-mediatek,
	linux-tegra, stable
In-Reply-To: <20260402170641.2082547-2-joonwonkang@google.com>

On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
>
> Previously, a sender thread in mbox_send_message() could be woken up at
> a wrong time in blocking mode. It is because there was only a single
> completion for a channel whereas messages from multiple threads could be
> sent in any order; since the shared completion could be signalled in any
> order, it could wake up a wrong sender thread.
>
> This commit resolves the false wake-up issue with the following changes:
> - Completions are created just as many as the number of concurrent sender
>   threads
> - A completion is created on a sender thread's stack
> - Each slot of the message queue, i.e. `msg_data`, contains a pointer to
>   its target completion
> - tx_tick() signals the completion of the currently active slot of the
>   message queue
>
I think I reviewed it already or is this happening on
one-channel-one-client usage? Because mailbox api does not support
channels shared among multiple clients.

Thanks
Jassi


^ permalink raw reply

* Re: [PATCH v3 2/2] mailbox: Make mbox_send_message() return error code when tx fails
From: Jassi Brar @ 2026-04-02 18:03 UTC (permalink / raw)
  To: Joonwon Kang
  Cc: matthias.bgg, angelogioacchino.delregno, thierry.reding,
	jonathanh, linux-kernel, linux-arm-kernel, linux-mediatek,
	linux-tegra, stable
In-Reply-To: <20260402170641.2082547-3-joonwonkang@google.com>

On Thu, Apr 2, 2026 at 12:07 PM Joonwon Kang <joonwonkang@google.com> wrote:
>
> When the mailbox controller failed transmitting message, the error code
> was only passed to the client's tx done handler and not to
> mbox_send_message(). For this reason, the function could return a false
> success. This commit resolves the issue by introducing the tx status and
> checking it before mbox_send_message() returns.
>
Can you please share the scenario when this becomes necessary? This
can potentially change the ground underneath some clients, so we have
to be sure this is really useful.

Thanks
Jassi


> Cc: stable@vger.kernel.org
> Signed-off-by: Joonwon Kang <joonwonkang@google.com>
> ---
>  drivers/mailbox/mailbox.c          | 20 +++++++++++++++-----
>  include/linux/mailbox_controller.h |  2 ++
>  2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index d63386468982..ea9aec9dc947 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -21,7 +21,10 @@
>  static LIST_HEAD(mbox_cons);
>  static DEFINE_MUTEX(con_mutex);
>
> -static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx_complete)
> +static int add_to_rbuf(struct mbox_chan *chan,
> +                      void *mssg,
> +                      struct completion *tx_complete,
> +                      int *tx_status)
>  {
>         int idx;
>
> @@ -34,6 +37,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg, struct completion *tx
>         idx = chan->msg_free;
>         chan->msg_data[idx].data = mssg;
>         chan->msg_data[idx].tx_complete = tx_complete;
> +       chan->msg_data[idx].tx_status = tx_status;
>         chan->msg_count++;
>
>         if (idx == MBOX_TX_QUEUE_LEN - 1)
> @@ -91,12 +95,13 @@ static void msg_submit(struct mbox_chan *chan)
>
>  static void tx_tick(struct mbox_chan *chan, int r, int idx)
>  {
> -       struct mbox_message mssg = {MBOX_NO_MSG, NULL};
> +       struct mbox_message mssg = {MBOX_NO_MSG, NULL, NULL};
>
>         scoped_guard(spinlock_irqsave, &chan->lock) {
>                 if (idx >= 0 && idx != chan->active_req) {
>                         chan->msg_data[idx].data = MBOX_NO_MSG;
>                         chan->msg_data[idx].tx_complete = NULL;
> +                       chan->msg_data[idx].tx_status = NULL;
>                         return;
>                 }
>
> @@ -116,8 +121,10 @@ static void tx_tick(struct mbox_chan *chan, int r, int idx)
>         if (chan->cl->tx_done)
>                 chan->cl->tx_done(chan->cl, mssg.data, r);
>
> -       if (r != -ETIME && chan->cl->tx_block)
> +       if (r != -ETIME && chan->cl->tx_block) {
> +               *mssg.tx_status = r;
>                 complete(mssg.tx_complete);
> +       }
>  }
>
>  static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
> @@ -286,15 +293,16 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
>         int t;
>         int idx;
>         struct completion tx_complete;
> +       int tx_status = 0;
>
>         if (!chan || !chan->cl || mssg == MBOX_NO_MSG)
>                 return -EINVAL;
>
>         if (chan->cl->tx_block) {
>                 init_completion(&tx_complete);
> -               t = add_to_rbuf(chan, mssg, &tx_complete);
> +               t = add_to_rbuf(chan, mssg, &tx_complete, &tx_status);
>         } else {
> -               t = add_to_rbuf(chan, mssg, NULL);
> +               t = add_to_rbuf(chan, mssg, NULL, NULL);
>         }
>
>         if (t < 0) {
> @@ -318,6 +326,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
>                         idx = t;
>                         t = -ETIME;
>                         tx_tick(chan, t, idx);
> +               } else if (tx_status < 0) {
> +                       t = tx_status;
>                 }
>         }
>
> diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
> index 912499ad08ed..890da97bcb50 100644
> --- a/include/linux/mailbox_controller.h
> +++ b/include/linux/mailbox_controller.h
> @@ -117,10 +117,12 @@ struct mbox_controller {
>   * struct mbox_message - Internal representation of a mailbox message
>   * @data:              Data packet
>   * @tx_complete:       Pointer to the transmission completion
> + * @tx_status:         Pointer to the transmission status
>   */
>  struct mbox_message {
>         void *data;
>         struct completion *tx_complete;
> +       int *tx_status;
>  };
>
>  /**
> --
> 2.53.0.1185.g05d4b7b318-goog
>


^ permalink raw reply

* Re: [PATCH] iommu: Always fill in gather when unmapping
From: Robin Murphy @ 2026-04-02 18:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alexandre Ghiti, AngeloGioacchino Del Regno, Albert Ou, asahi,
	Baolin Wang, iommu, Janne Grunau, Jernej Skrabec, Joerg Roedel,
	Jean-Philippe Brucker, linux-arm-kernel, linux-mediatek,
	linux-riscv, linux-sunxi, Matthias Brugger, Neal Gompa,
	Orson Zhai, Palmer Dabbelt, Paul Walmsley, Samuel Holland,
	Sven Peter, virtualization, Chen-Yu Tsai, Will Deacon, Yong Wu,
	Chunyan Zhang, Lu Baolu, Janusz Krzysztofik, Joerg Roedel,
	Jon Hunter, patches, Samiullah Khawaja, stable, Vasant Hegde
In-Reply-To: <20260401173650.GD310919@nvidia.com>

On 2026-04-01 6:36 pm, Jason Gunthorpe wrote:
> On Wed, Apr 01, 2026 at 05:33:28PM +0100, Robin Murphy wrote:
>>> io-pgtable might have intended to allow the driver to choose between
>>> gather or immediate flush because it passed gather to
>>> ops->tlb_add_page(), however no driver does anything with it.
>>
>> Apart from arm-smmu-v3...
> 
> Bah, I did my research on the wrong tree and missed this.
> 
>>> mtk uses io-pgtable-arm-v7s but added the range to the gather in the
>>> unmap callback. Move this into the io-pgtable-arm unmap itself. That
>>> will fix all the armv7 using drivers (arm-smmu, qcom_iommu,
>>> ipmmu-vmsa).
>>
>> io-pgtable-arm-v7s != io-pgtable-arm. You're *breaking* MTK (and failing
>> to fix the other v7s user, which is MSM).
> 
> I was very confused what you were talking about, but I see now that
> the hunk adding iommu_iotlb_gather_add_range() to v7 got lost somehow!
> 
> @@ -596,6 +596,9 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>   
>                  __arm_v7s_set_pte(ptep, 0, num_entries, &iop->cfg);
>   
> +               if (!iommu_iotlb_gather_queued(gather))
> +                       iommu_iotlb_gather_add_range(gather, iova, size);
> +
>                  for (i = 0; i < num_entries; i++) {
>                          if (ARM_V7S_PTE_IS_TABLE(pte[i], lvl)) {
>                                  /* Also flush any partial walks */
> 
>>> arm-smmu uses both ARM_V7S and ARM LPAE formats. The LPAE formats
>>> already have the gather population because SMMUv3 requires it, so it
>>> becomes consistent.
>>
>> Huh? arm-smmu-v3 invokes iommu_iotlb_gather_add_page() itself, because
>> arm-smmu-v3 uses gathers
> 
> Yeah, I missed this whole bit, it needs some changes.
> 
>> Invoking add range before add_page will end up defeating the
>> iommu_iotlb_gather_is_disjoint() check and making SMMUv3
>> overinvalidate between disjoint ranges.
> 
> Right, that flow needs fixing.
> 
>> I guess now I remember why we weren't validating gathers in core code
>> before :(
> 
> My point is not filling the gather is a micro-optimization that
> benefits a few drivers. I think it is so small compared to an IOTLB
> flush that it isn't worth worrying about.

It's hardly a "micro-optimisation" for drivers to just not touch an 
optional mechanism which offers no benefit to them, especially when in 
many cases said mechanism is newer than the code that isn't using it 
anyway. The only required semantic of .iotlb_sync is to ensure that any 
previous .unmap_pages calls are complete and their associated 
translations invalidated. The entire concept of gathering and deferred 
invalidation is irrelevant to many IOMMU designs where it would only 
stand to make overall invalidation performance worse.

I'm starting to wish I'd been able to page all this context back in 
before reviewing the first patch, as I too only really had Intel and 
SMMUv3 in mind at the time... :(

> So, I'd like to make everything the same and populate the gather
> correctly in all flows. I'll fix the SMMUv3 thing and lets look again,
> this patch is not so scary to make me think we shouldn't do that.
> 
>> @@ -2714,6 +2714,10 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
>>   		pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
>>   			 iova, unmapped_page);
>> +		/* If the driver itself isn't using the gather, mark it used */
>> +		if (iotlb_gather->end <= iotlb_gather->start)
>> +			iommu_iotlb_gather_add_range(&iotlb_gather, iova, unmapped_page);
> 
> The gathers can be joined across unmaps and now we are inviting subtly
> ill-formed gathers as only the first unmap will get included.

Ill-formed? It's a perfectly valid range for the purposes of any 
subsequent generic check - which couldn't realistically be anything 
beyond empty vs. non-empty anyway - and it's only being set at all in 
the case where we know the driver doesn't care, because if the driver 
*was* going to look at gather->start or gather->end in its .iotlb_sync 
then it must have already set them to meaningful values in the prior 
successful .unmap_pages call. I think we can safely consider it invalid 
for a driver to suddenly decide to start using a gather mid-way through 
an unmap (or indeed to use start/end in any intentionally non-obvious 
manner either).

> We do have error cases where the gather is legitimately empty, and
> this would squash that, it probably needs to check unmapped_page for 0
> too, at least.

Maybe try looking at the rest of the code around these lines...

Thanks,
Robin.


^ permalink raw reply

* Re: [PATCH 3/3] drm: lcdif: Wait for vblank before disabling DMA
From: Paul Kocialkowski @ 2026-04-02 18:29 UTC (permalink / raw)
  To: Lucas Stach
  Cc: dri-devel, imx, linux-arm-kernel, linux-kernel, Marek Vasut,
	Stefan Agner, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Frank Li, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Krzysztof Hałasa,
	Marco Felsch, Liu Ying
In-Reply-To: <3305c7ec621987a30043f274b2705f739bddd497.camel@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 3411 bytes --]

Hi Lucas,

On Tue 31 Mar 26, 11:14, Lucas Stach wrote:
> Am Dienstag, dem 31.03.2026 um 00:46 +0200 schrieb Paul Kocialkowski:
> > It is necessary to wait for the full frame to finish streaming
> > through the DMA engine before we can safely disable it by removing
> > the DISP_PARA_DISP_ON bit. Disabling it in-flight can leave the
> > hardware confused and unable to resume streaming for the next frame.
> > 
> > This causes the FIFO underrun and empty status bits to be set and
> > a single solid color to be shown on the display, coming from one of
> > the pixels of the previous frame. The issue occurs sporadically when
> > a new mode is set, which triggers the crtc disable and enable paths.
> > 
> > Setting the shadow load bit and waiting for it to be cleared by the
> > DMA engine allows waiting for completion.
> > 
> > The NXP BSP driver addresses this issue with a hardcoded 25 ms sleep.
> > 
> > Fixes: 9db35bb349a0 ("drm: lcdif: Add support for i.MX8MP LCDIF variant")
> > Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
> > Co-developed-by: Lucas Stach <l.stach@pengutronix.de>
> > ---
> >  drivers/gpu/drm/mxsfb/lcdif_kms.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/mxsfb/lcdif_kms.c b/drivers/gpu/drm/mxsfb/lcdif_kms.c
> > index 1aac354041c7..7dce7f48d938 100644
> > --- a/drivers/gpu/drm/mxsfb/lcdif_kms.c
> > +++ b/drivers/gpu/drm/mxsfb/lcdif_kms.c
> > @@ -393,6 +393,22 @@ static void lcdif_disable_controller(struct lcdif_drm_private *lcdif)
> >  	if (ret)
> >  		drm_err(lcdif->drm, "Failed to disable controller!\n");
> >  
> You can drop this no-op poll above...

You're right, it looks a bit weird to keep it as it's not needed.

> > +	/*
> > +	 * It is necessary to wait for the full frame to finish streaming
> > +	 * through the DMA engine before we can safely disable it by removing
> > +	 * the DISP_PARA_DISP_ON bit. Disabling it in-flight can leave the
> > +	 * hardware confused and unable to resume streaming for the next frame.
> > +	 */
> > +	reg = readl(lcdif->base + LCDC_V8_CTRLDESCL0_5);
> > +	reg |= CTRLDESCL0_5_SHADOW_LOAD_EN;
> > +	writel(reg, lcdif->base + LCDC_V8_CTRLDESCL0_5);
> > +
> .. then setting the shadow load enable bit can be merged with the
> access clearing the DMA enable bit.

I've just tried it out and it works just as well!

> > +	ret = readl_poll_timeout(lcdif->base + LCDC_V8_CTRLDESCL0_5,
> > +				 reg, !(reg & CTRLDESCL0_5_SHADOW_LOAD_EN),
> > +				 0, 36000);	/* Wait ~2 frame times max */
> 
> I know this is just a copy from the existing poll, but I don't think
> the busy looping makes a lot of sense. I guess relaxing the poll by a
> 100us or even 200us wait between checks wouldn't hurt.

Yeah good point too. I think 200 us is definitely fine since we're looking
at an average wait of a few ms.

Will respin the series then!

Thanks for the review,

Paul

> > +	if (ret)
> > +		drm_err(lcdif->drm, "Failed to disable controller!\n");
> > +
> >  	reg = readl(lcdif->base + LCDC_V8_DISP_PARA);
> >  	reg &= ~DISP_PARA_DISP_ON;
> >  	writel(reg, lcdif->base + LCDC_V8_DISP_PARA);
> 

-- 
Paul Kocialkowski,

Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/

Expert in multimedia, graphics and embedded hardware support with Linux.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox