Linux Tegra architecture development
* Re: [PATCH 1/2] pinctrl: tegra: PINCTRL_TEGRA238 should depend on ARCH_TEGRA
From: Krzysztof Kozlowski @ 2026-06-09 15:38 UTC (permalink / raw)
  To: Geert Uytterhoeven, Prathamesh Shete, Thierry Reding,
	Jonathan Hunter, Linus Walleij
  Cc: linux-gpio, linux-tegra
In-Reply-To: <0643b689f0f4a453d31183d9f598a6f53574ecbc.1781017599.git.geert+renesas@glider.be>

On 09/06/2026 17:08, Geert Uytterhoeven wrote:
> The NVIDIA Tegra238 MAIN and AON pin controllers are only present on
> NVIDIA Tegra238 SoCs.  Hence add a dependency on ARCH_TEGRA, to prevent
> asking the user about this driver when configuring a kernel without
> NVIDIA Tegra SoC support.
> 
> Fixes: 25cac7292d49f4fc ("pinctrl: tegra: Add Tegra238 pinmux driver")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
>  drivers/pinctrl/tegra/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pinctrl/tegra/Kconfig b/drivers/pinctrl/tegra/Kconfig
> index c7507193044f4af3..eea7ec9688b6460b 100644
> --- a/drivers/pinctrl/tegra/Kconfig
> +++ b/drivers/pinctrl/tegra/Kconfig
> @@ -39,6 +39,7 @@ config PINCTRL_TEGRA234
>  
>  config PINCTRL_TEGRA238
>  	tristate "NVIDIA Tegra238 pinctrl driver"

It's the only user-selectable driver now, so this could be unified as well.

Anyway, the change is correct:

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>


Best regards,
Krzysztof


* [PATCH 2/2] pinctrl: tegra: PINCTRL_TEGRA264 should depend on ARCH_TEGRA
From: Geert Uytterhoeven @ 2026-06-09 15:08 UTC (permalink / raw)
  To: Prathamesh Shete, Thierry Reding, Jonathan Hunter, Linus Walleij
  Cc: Krzysztof Kozlowski, linux-gpio, linux-tegra, Geert Uytterhoeven
In-Reply-To: <0643b689f0f4a453d31183d9f598a6f53574ecbc.1781017599.git.geert+renesas@glider.be>

The NVIDIA Tegra264 MAIN, AON, and UPHY pin controllers are only present
on NVIDIA Tegra264 SoCs.  Hence add a dependency on ARCH_TEGRA, to
prevent asking the user about this driver when configuring a kernel
without NVIDIA Tegra SoC support.

Fixes: c98506206912dd0d ("pinctrl: tegra: Add Tegra264 pinmux driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/pinctrl/tegra/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pinctrl/tegra/Kconfig b/drivers/pinctrl/tegra/Kconfig
index eea7ec9688b6460b..39a93733efa2f0a9 100644
--- a/drivers/pinctrl/tegra/Kconfig
+++ b/drivers/pinctrl/tegra/Kconfig
@@ -50,6 +50,7 @@ config PINCTRL_TEGRA238
 
 config PINCTRL_TEGRA264
 	tristate "NVIDIA Tegra264 pinctrl driver"
+	depends on ARCH_TEGRA || COMPILE_TEST
 	default m if ARCH_TEGRA_264_SOC
 	select PINCTRL_TEGRA
 	help
-- 
2.43.0



* [PATCH 1/2] pinctrl: tegra: PINCTRL_TEGRA238 should depend on ARCH_TEGRA
From: Geert Uytterhoeven @ 2026-06-09 15:08 UTC (permalink / raw)
  To: Prathamesh Shete, Thierry Reding, Jonathan Hunter, Linus Walleij
  Cc: Krzysztof Kozlowski, linux-gpio, linux-tegra, Geert Uytterhoeven

The NVIDIA Tegra238 MAIN and AON pin controllers are only present on
NVIDIA Tegra238 SoCs.  Hence add a dependency on ARCH_TEGRA, to prevent
asking the user about this driver when configuring a kernel without
NVIDIA Tegra SoC support.

Fixes: 25cac7292d49f4fc ("pinctrl: tegra: Add Tegra238 pinmux driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 drivers/pinctrl/tegra/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pinctrl/tegra/Kconfig b/drivers/pinctrl/tegra/Kconfig
index c7507193044f4af3..eea7ec9688b6460b 100644
--- a/drivers/pinctrl/tegra/Kconfig
+++ b/drivers/pinctrl/tegra/Kconfig
@@ -39,6 +39,7 @@ config PINCTRL_TEGRA234
 
 config PINCTRL_TEGRA238
 	tristate "NVIDIA Tegra238 pinctrl driver"
+	depends on ARCH_TEGRA || COMPILE_TEST
 	default m if ARCH_TEGRA_238_SOC
 	select PINCTRL_TEGRA
 	help
-- 
2.43.0



* Re: [PATCH 1/3] memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
From: Krzysztof Kozlowski @ 2026-06-09 15:08 UTC (permalink / raw)
  To: Jon Hunter, Sumit Gupta, treding, linux-kernel, linux-tegra; +Cc: bbasu
In-Reply-To: <0d8f1f3b-ed7b-4066-af6e-08c006788535@nvidia.com>

On 09/06/2026 17:07, Jon Hunter wrote:
> 
> On 09/06/2026 16:01, Krzysztof Kozlowski wrote:
>> On 27/05/2026 16:01, Sumit Gupta wrote:
>>> tegra186_emc_interconnect_init() copies the MC's ICC aggregate hook
>>> into the EMC provider.  That hook (tegra234_mc_icc_aggregate /
>>> tegra264_mc_icc_aggregate) uses container_of() to recover 'mc',
>>> which is only valid when the icc_provider is embedded in struct
>>> tegra_mc.  For an EMC node the provider is embedded in struct
>>> tegra186_emc, so 'mc' points into unrelated memory.
>>>
>>> This stayed harmless until commit faafd6ca7e6e ("memory: tegra:
>>> make icc_set_bw return zero if BWMGR not supported") added an
>>> unconditional read of mc->bwmgr_mrq_supported at the top of the
>>> hook.  UBSAN catches the stray load on every EMC aggregation:
>>>
>>>    UBSAN: invalid-load in drivers/memory/tegra/tegra234.c:1104:9
>>>    load of value 112 is not a valid value for type '_Bool'
>>>
>>> No functional impact in practice, since the hook's only other mc
>>> dereference (mc->num_channels) sits inside a
>>> TEGRA_ICC_MC_CPU_CLUSTER* branch that EMC nodes never enter.
>>>
>>> Fix this by setting the EMC provider's aggregate hook to
>>> icc_std_aggregate, instead of borrowing the MC's hook.  The MC
>>> providers continue using their own aggregate hooks, where
>>> container_of() correctly resolves to struct tegra_mc.
>>>
>>> Reported-by: Jon Hunter <jonathanh@nvidia.com>
>>
>> I assume these reports were offlist. Otherwise this has a valid
>> checkpatch warning.
> 
> 
> Yes, some of our internal testing flagged this, and I had asked Sumit to
> take a look.
> 

Ack, thanks!

Best regards,
Krzysztof


* Re: [PATCH 1/3] memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
From: Jon Hunter @ 2026-06-09 15:07 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Sumit Gupta, treding, linux-kernel,
	linux-tegra; +Cc: bbasu
In-Reply-To: <31cba498-24c9-4d73-bb17-bba9cf730b55@kernel.org>


On 09/06/2026 16:01, Krzysztof Kozlowski wrote:
> On 27/05/2026 16:01, Sumit Gupta wrote:
>> tegra186_emc_interconnect_init() copies the MC's ICC aggregate hook
>> into the EMC provider.  That hook (tegra234_mc_icc_aggregate /
>> tegra264_mc_icc_aggregate) uses container_of() to recover 'mc',
>> which is only valid when the icc_provider is embedded in struct
>> tegra_mc.  For an EMC node the provider is embedded in struct
>> tegra186_emc, so 'mc' points into unrelated memory.
>>
>> This stayed harmless until commit faafd6ca7e6e ("memory: tegra:
>> make icc_set_bw return zero if BWMGR not supported") added an
>> unconditional read of mc->bwmgr_mrq_supported at the top of the
>> hook.  UBSAN catches the stray load on every EMC aggregation:
>>
>>    UBSAN: invalid-load in drivers/memory/tegra/tegra234.c:1104:9
>>    load of value 112 is not a valid value for type '_Bool'
>>
>> No functional impact in practice, since the hook's only other mc
>> dereference (mc->num_channels) sits inside a
>> TEGRA_ICC_MC_CPU_CLUSTER* branch that EMC nodes never enter.
>>
>> Fix this by setting the EMC provider's aggregate hook to
>> icc_std_aggregate, instead of borrowing the MC's hook.  The MC
>> providers continue using their own aggregate hooks, where
>> container_of() correctly resolves to struct tegra_mc.
>>
>> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> 
> I assume these reports were offlist. Otherwise this has a valid
> checkpatch warning.


Yes, some of our internal testing flagged this, and I had asked Sumit to
take a look.

Cheers
Jon

-- 
nvpublic



* Re: [PATCH 0/3] memory: tegra: UBSAN fix and cleanups
From: Krzysztof Kozlowski @ 2026-06-09 15:02 UTC (permalink / raw)
  To: treding, jonathanh, linux-kernel, linux-tegra, Sumit Gupta; +Cc: bbasu
In-Reply-To: <20260527140127.49172-1-sumitg@nvidia.com>


On Wed, 27 May 2026 19:31:24 +0530, Sumit Gupta wrote:
> This series fixes a UBSAN warning in the Tegra MC ICC aggregate
> path and removes two pieces of related dead code.
> 
> - Patch 1: Sets the EMC provider's aggregate hook to
>   icc_std_aggregate, instead of borrowing the MC's aggregate hook.
> - Patch 2: Drops tegra264_mc_icc_aggregate() as its only check
>   duplicates the one in tegra264_mc_icc_set().
> - Patch 3: Drops a dead 'if (mc)' check inside the CPU-cluster
>   branch of tegra234_mc_icc_aggregate().
> 
> [...]

Applied, thanks!

[1/3] memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
      https://git.kernel.org/krzk/linux-mem-ctrl/c/2e05f3d6005d9aa3e2e423d2471f290d9ccbe3d2
[2/3] memory: tegra264: drop redundant tegra264_mc_icc_aggregate()
      https://git.kernel.org/krzk/linux-mem-ctrl/c/e23d87a69e827b60fb985236a0984bacb3b68a19
[3/3] memory: tegra234: drop dead NULL check in tegra234_mc_icc_aggregate()
      https://git.kernel.org/krzk/linux-mem-ctrl/c/b97f7dceb8adb2b05d556469afc6fb54947ef61c

Best regards,
-- 
Krzysztof Kozlowski <krzk@kernel.org>



* Re: [PATCH 1/3] memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
From: Krzysztof Kozlowski @ 2026-06-09 15:01 UTC (permalink / raw)
  To: Sumit Gupta, treding, jonathanh, linux-kernel, linux-tegra; +Cc: bbasu
In-Reply-To: <20260527140127.49172-2-sumitg@nvidia.com>

On 27/05/2026 16:01, Sumit Gupta wrote:
> tegra186_emc_interconnect_init() copies the MC's ICC aggregate hook
> into the EMC provider.  That hook (tegra234_mc_icc_aggregate /
> tegra264_mc_icc_aggregate) uses container_of() to recover 'mc',
> which is only valid when the icc_provider is embedded in struct
> tegra_mc.  For an EMC node the provider is embedded in struct
> tegra186_emc, so 'mc' points into unrelated memory.
> 
> This stayed harmless until commit faafd6ca7e6e ("memory: tegra:
> make icc_set_bw return zero if BWMGR not supported") added an
> unconditional read of mc->bwmgr_mrq_supported at the top of the
> hook.  UBSAN catches the stray load on every EMC aggregation:
> 
>   UBSAN: invalid-load in drivers/memory/tegra/tegra234.c:1104:9
>   load of value 112 is not a valid value for type '_Bool'
> 
> No functional impact in practice, since the hook's only other mc
> dereference (mc->num_channels) sits inside a
> TEGRA_ICC_MC_CPU_CLUSTER* branch that EMC nodes never enter.
> 
> Fix this by setting the EMC provider's aggregate hook to
> icc_std_aggregate, instead of borrowing the MC's hook.  The MC
> providers continue using their own aggregate hooks, where
> container_of() correctly resolves to struct tegra_mc.
> 
> Reported-by: Jon Hunter <jonathanh@nvidia.com>

I assume these reports were offlist. Otherwise this has a valid
checkpatch warning.

Best regards,
Krzysztof

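The container_of() problem described in the quoted commit message can be sketched in userspace C. This is only an illustration under stated assumptions: struct fake_mc, struct fake_emc, and both helper functions are hypothetical stand-ins, not the real tegra_mc or tegra186_emc layouts, and the macro is a simplified version of the kernel's container_of() without its type checking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-ins for the driver structures. */
struct icc_provider {
	int id;
};

struct fake_mc {
	bool bwmgr_mrq_supported;
	struct icc_provider provider;
};

struct fake_emc {
	unsigned long rate;
	unsigned long min_rate;
	struct icc_provider provider;
};

/* Only valid when 'p' is really embedded in a struct fake_mc. */
static struct fake_mc *provider_to_mc(struct icc_provider *p)
{
	return container_of(p, struct fake_mc, provider);
}

/* Only valid when 'p' is really embedded in a struct fake_emc. */
static struct fake_emc *provider_to_emc(struct icc_provider *p)
{
	return container_of(p, struct fake_emc, provider);
}
```

Passing &emc.provider to provider_to_mc() compiles without complaint, but the result points into the middle of the EMC object, because the macro only subtracts the member offset of the type it was told about. Reading a bool through such a pointer is the kind of stray load the UBSAN report above describes.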

* Re: [PATCH 0/3] memory: tegra: UBSAN fix and cleanups
From: Jon Hunter @ 2026-06-09 14:51 UTC (permalink / raw)
  To: Sumit Gupta, krzk, treding, linux-kernel, linux-tegra; +Cc: bbasu
In-Reply-To: <20260527140127.49172-1-sumitg@nvidia.com>


On 27/05/2026 15:01, Sumit Gupta wrote:
> This series fixes a UBSAN warning in the Tegra MC ICC aggregate
> path and removes two pieces of related dead code.
> 
> - Patch 1: Sets the EMC provider's aggregate hook to
>    icc_std_aggregate, instead of borrowing the MC's aggregate hook.
> - Patch 2: Drops tegra264_mc_icc_aggregate() as its only check
>    duplicates the one in tegra264_mc_icc_set().
> - Patch 3: Drops a dead 'if (mc)' check inside the CPU-cluster
>    branch of tegra234_mc_icc_aggregate().
> 
> 
> Sumit Gupta (3):
>    memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
>    memory: tegra264: drop redundant tegra264_mc_icc_aggregate()
>    memory: tegra234: drop dead NULL check in tegra234_mc_icc_aggregate()
> 
>   drivers/memory/tegra/tegra186-emc.c |  4 +---
>   drivers/memory/tegra/tegra234.c     |  6 ++----
>   drivers/memory/tegra/tegra264.c     | 17 +----------------
>   3 files changed, 4 insertions(+), 23 deletions(-)
> 

For the series ...

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>

Thanks!
Jon

-- 
nvpublic



* Re: [GIT PULL 4/6] firmware: tegra: Changes for v7.2-rc1
From: Arnd Bergmann @ 2026-06-09 14:38 UTC (permalink / raw)
  To: Thierry Reding, arm, soc
  Cc: Jon Hunter, linux-tegra, linux-arm-kernel, Sasha Levin
In-Reply-To: <20260531060825.1855391-4-thierry.reding@kernel.org>

On Sun, May 31, 2026, at 08:08, Thierry Reding wrote:
> From: Thierry Reding <thierry.reding@gmail.com>
> ----------------------------------------------------------------
> Jon Hunter (2):
>       firmware: tegra: bpmp: Propagate debugfs errors
>       firmware: tegra: bpmp: Add support for multi-socket platforms
>
> Sasha Levin (1):
>       firmware: tegra: Make TEGRA_IVC a hidden Kconfig symbol

I'm merging this, but I would like to point out that the third
patch does not actually solve a real problem and the patch
description is complete nonsense.

Looking through linux-next, I see two more of Sasha's patches
addressing kconfiglint 'K002' warnings. I have not used that
tool and the warning sounds useful in general, but all three
of these patches look like false positives.

Sasha, please try to understand better what the tool is
trying to warn about.

      Arnd


* [PATCH 4/4] ASoC: tegra: tegra210_ahub: Validate written enum value
From: HyeongJun An @ 2026-06-09 12:43 UTC (permalink / raw)
  To: Mark Brown, Liam Girdwood
  Cc: Jaroslav Kysela, Takashi Iwai, linux-sound, linux-kernel,
	HyeongJun An, Thierry Reding, Jonathan Hunter, Sameer Pujar,
	linux-tegra
In-Reply-To: <20260609124317.38046-1-sammiee5311@gmail.com>

tegra_ahub_put_value_enum() reads e->values[item[0]] before
checking whether item[0] is within the enum item range. The existing
check therefore happens too late to prevent an out-of-range read of the
values array.

Move the check before the array access.

Fixes: 16e1bcc2caf4 ("ASoC: tegra: Add Tegra210 based AHUB driver")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: HyeongJun An <sammiee5311@gmail.com>
---
 sound/soc/tegra/tegra210_ahub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sound/soc/tegra/tegra210_ahub.c b/sound/soc/tegra/tegra210_ahub.c
index ece33b7ff190..efc8f3388668 100644
--- a/sound/soc/tegra/tegra210_ahub.c
+++ b/sound/soc/tegra/tegra210_ahub.c
@@ -62,13 +62,15 @@ static int tegra_ahub_put_value_enum(struct snd_kcontrol *kctl,
 	struct snd_soc_dapm_update update[TEGRA_XBAR_UPDATE_MAX_REG] = { };
 	int val_bytes = snd_soc_component_regmap_val_bytes(cmpnt);
 	unsigned int *item = uctl->value.enumerated.item;
-	unsigned int value = e->values[item[0]];
+	unsigned int value;
 	unsigned int i, bit_pos, reg_idx = 0, reg_val = 0;
 	int change = 0;
 
 	if (item[0] >= e->items)
 		return -EINVAL;
 
+	value = e->values[item[0]];
+
 	if (value) {
 		/* Get the register index and value to set */
 		reg_idx = (value - 1) / (8 * val_bytes);
-- 
2.43.0


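The check-before-use ordering that the patch above restores can be sketched in isolation. This is a minimal userspace illustration, not the driver code: struct enum_table and enum_lookup() are hypothetical names standing in for the soc_enum lookup.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the enum description. */
struct enum_table {
	unsigned int items;          /* number of valid entries in values[] */
	const unsigned int *values;  /* one value per enum item */
};

/*
 * Look up the value for a caller-supplied enum index.  The index is
 * validated before values[] is touched, so an out-of-range index never
 * results in a read.
 */
static int enum_lookup(const struct enum_table *e, unsigned int item,
		       unsigned int *value)
{
	if (item >= e->items)
		return -EINVAL;

	*value = e->values[item];
	return 0;
}
```

Initializing the local at its declaration, as the old code did, performs the array read before the range check can reject the index, which is why the patch splits the declaration from the assignment.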

* Re: [PATCH v2 00/14] list: Prepare entry iterators to cache cursor state
From: Christian König @ 2026-06-09 10:33 UTC (permalink / raw)
  To: Kaitao Cheng, Andy Shevchenko, Muchun Song, Philipp Reisner,
	Lars Ellenberg, Christoph Böhmwalder, Jens Axboe,
	Takashi Sakamoto, Andrzej Hajda, Neil Armstrong, Robert Foss,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaito Cheng
In-Reply-To: <20260609061347.93688-1-kaitao.cheng@linux.dev>

On 6/9/26 08:13, Kaitao Cheng wrote:
> From: Kaito Cheng <chengkaitao@kylinos.cn>
> 
> This series prepares for, and then updates, the list_for_each_entry()
> family so the common entry iterators cache their next or previous cursor
> before the loop body runs.

Why in the world would we want to do that?

The safe and non-safe variants have very distinct use cases and that is completely intentional.

What we could maybe improve is the documentation; in my experience, an astonishingly large number of people have misconceptions about the safe variants.

> The first 13 patches open-code loops that intentionally depend on the
> old "derive the next entry from the current cursor at the end of the
> iteration" behaviour.  These loops append work to the list being walked,
> restart traversal after dropping a lock, skip an entry consumed by the
> current iteration, or otherwise adjust the cursor in the loop body.

Well, I have to clearly reject the changes for the subsystems/components I'm maintaining. That just looks horrible to me, and I don't see a good reason for it.

Regards,
Christian.

> 
> The final patch changes include/linux/list.h to keep a private cursor in
> the common entry iterators while preserving the public macro interface.
> The safe variants remain available when callers need the temporary
> cursor explicitly or have stronger mutation requirements.
> 
> Changes in v2 (Muchun Song, Andy Shevchenko):
>  - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
>    cursor change directly in the existing list_for_each_entry*() helpers.
>  - Open-code special list walks that rely on updating the loop cursor in
>    the body, preserving their existing traversal semantics.
> 
> Link to v1:
> https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
> 
> Kaitao Cheng (14):
>   drbd: Open-code transfer log list walk
>   firewire: core: Open-code topology list walk
>   drm/bridge: Open-code bridge chain list walks
>   drm/i915/gt: Open-code active timeline walk
>   drm/i915: Open-code DFS dependency list walk
>   drm/ttm: Open-code reservation list walk
>   spi: fsi: Open-code message transfer walk
>   spi: stm32-ospi: Open-code message transfer walk
>   spi: stm32-qspi: Open-code message transfer walk
>   spi: tegra210-quad: Open-code message transfer walk
>   locking/locktorture: Open-code ww mutex list walk
>   locking/ww_mutex: Open-code stress reorder list walk
>   ASoC: dapm: Open-code widget invalidation walk
>   list: Cache cursors in entry iterators
> 
>  drivers/block/drbd/drbd_debugfs.c      |  4 ++-
>  drivers/firewire/core-topology.c       |  4 ++-
>  drivers/gpu/drm/drm_bridge.c           |  7 ++--
>  drivers/gpu/drm/i915/gt/intel_reset.c  |  4 ++-
>  drivers/gpu/drm/i915/i915_scheduler.c  |  4 ++-
>  drivers/gpu/drm/ttm/ttm_execbuf_util.c |  4 ++-
>  drivers/spi/spi-fsi.c                  |  5 ++-
>  drivers/spi/spi-stm32-ospi.c           |  4 ++-
>  drivers/spi/spi-stm32-qspi.c           |  5 ++-
>  drivers/spi/spi-tegra210-quad.c        |  4 ++-
>  include/linux/list.h                   | 46 ++++++++++++++++++++------
>  kernel/locking/locktorture.c           |  4 ++-
>  kernel/locking/test-ww_mutex.c         |  4 ++-
>  sound/soc/soc-dapm.c                   |  4 ++-
>  14 files changed, 78 insertions(+), 25 deletions(-)
> 
> --
> 2.43.0
> 


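The difference between the two iteration patterns discussed in this thread can be sketched with a minimal userspace list. Everything here is an assumption for illustration: struct node and the helpers are hypothetical stand-ins for struct list_head and the list.h macros, written out as explicit for loops.

```c
#include <stdlib.h>

/* Minimal userspace stand-in for a circular doubly linked list. */
struct node {
	struct node *prev, *next;
	int value;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/*
 * "Safe" pattern: 'next' is cached before the body runs, so the body may
 * unlink and free 'pos' itself.  Nothing else is protected: removing
 * 'next' in the body, or modifying the list concurrently, still breaks
 * the walk.
 */
static int remove_odd(struct node *head)
{
	struct node *pos, *next;
	int removed = 0;

	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		if (pos->value % 2) {
			list_del(pos);
			free(pos);
			removed++;
		}
	}
	return removed;
}

/* Plain pattern: the next entry is derived from 'pos' after the body. */
static int sum(const struct node *head)
{
	const struct node *pos;
	int total = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		total += pos->value;
	return total;
}
```

The plain pattern picks up entries that the body appends after the current one, which is the behaviour the open-coded loops in the series try to preserve. The cached-cursor pattern tolerates removal of the current entry and nothing more, which may be one of the misconceptions about the safe variants mentioned above.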

* Re: [PATCH v5 0/4] Enable sysfs module symlink for more built-in drivers
From: Danilo Krummrich @ 2026-06-09 10:29 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Shashank Balaji, James Clark, Alexander Shishkin,
	Greg Kroah-Hartman, Rafael J . Wysocki, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Jonathan Corbet, Shuah Khan,
	Luis Chamberlain, Petr Pavlu, Daniel Gomez, Sami Tolvanen,
	Aaron Tomlin, Mike Leach, Leo Yan, Thierry Reding,
	Jonathan Hunter, Rahul Bukte, linux-kernel, coresight,
	linux-arm-kernel, driver-core, rust-for-linux, linux-doc,
	Daniel Palmer, Tim Bird, linux-modules, linux-tegra, Sumit Gupta
In-Reply-To: <1c8e441a-6b33-465a-88f9-9552f346ae18@arm.com>

On Tue Jun 9, 2026 at 11:08 AM CEST, Suzuki K Poulose wrote:
> On 08/06/2026 23:24, Danilo Krummrich wrote:
>> On Mon, 18 May 2026 19:19:56 +0900, Shashank Balaji wrote:
>>> [PATCH v5 0/4] Enable sysfs module symlink for more built-in drivers
>> 
>> Applied, thanks!
>> 
>>    Branch: driver-core-testing
>>    Tree:   git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git
>> 
>> [1/4] soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
>>        commit: cd6e95e7ab29
>> [2/4] kernel: param: initialize module_kset in a pure_initcall
>>        commit: c82dfce47833
>> [3/4] coresight: pass THIS_MODULE implicitly through a macro
>>        commit: efc22b3f89a3
>> [4/4] driver core: platform: set mod_name in driver registration
>>        commit: a7a7dc5c46a0
>> 
>> The patches will appear in the next linux-next integration (typically within 24
>> hours on weekdays).
>> 
>> The patches are in the driver-core-testing branch and will be promoted to
>> driver-core-next after validation.
>
> Apologies, I missed your emails. I am fine with those, happy to fix up
> anything if linux-next screams.

Thanks for confirming! I did a test merge with linux-next and an allmodconfig
arm64 build before picking it up, so it should be fine.

Thanks,
Danilo


* Re: [PATCH v5 0/4] Enable sysfs module symlink for more built-in drivers
From: Suzuki K Poulose @ 2026-06-09  9:08 UTC (permalink / raw)
  To: Danilo Krummrich, Shashank Balaji
  Cc: James Clark, Alexander Shishkin, Greg Kroah-Hartman,
	Rafael J . Wysocki, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Jonathan Corbet, Shuah Khan, Luis Chamberlain,
	Petr Pavlu, Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Mike Leach,
	Leo Yan, Thierry Reding, Jonathan Hunter, Rahul Bukte,
	linux-kernel, coresight, linux-arm-kernel, driver-core,
	rust-for-linux, linux-doc, Daniel Palmer, Tim Bird, linux-modules,
	linux-tegra, Sumit Gupta
In-Reply-To: <20260608222448.1353773-1-dakr@kernel.org>

On 08/06/2026 23:24, Danilo Krummrich wrote:
> On Mon, 18 May 2026 19:19:56 +0900, Shashank Balaji wrote:
>> [PATCH v5 0/4] Enable sysfs module symlink for more built-in drivers
> 
> Applied, thanks!
> 
>    Branch: driver-core-testing
>    Tree:   git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git
> 
> [1/4] soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
>        commit: cd6e95e7ab29
> [2/4] kernel: param: initialize module_kset in a pure_initcall
>        commit: c82dfce47833
> [3/4] coresight: pass THIS_MODULE implicitly through a macro
>        commit: efc22b3f89a3
> [4/4] driver core: platform: set mod_name in driver registration
>        commit: a7a7dc5c46a0
> 
> The patches will appear in the next linux-next integration (typically within 24
> hours on weekdays).
> 
> The patches are in the driver-core-testing branch and will be promoted to
> driver-core-next after validation.

Apologies, I missed your emails. I am fine with those, happy to fix up
anything if linux-next screams.

Cheers
Suzuki


* [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen
In-Reply-To: <20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com>

Host1x syncpoints are 32-bit counters that roll over by design.
To make that explicit in the code, use wrapping_* functions whenever
arithmetic is done on syncpoint values.

Atomic operations cannot be converted to the wrapping_* helpers, so a comment is added there instead.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/cdma.c          |  3 ++-
 drivers/gpu/host1x/hw/channel_hw.c | 10 +++++++---
 drivers/gpu/host1x/intr.c          |  5 +++--
 drivers/gpu/host1x/syncpt.c        |  7 ++++++-
 drivers/gpu/host1x/syncpt.h        |  3 ++-
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index ba2e572567c0..f6d3db2c8c39 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -13,6 +13,7 @@
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/kfifo.h>
+#include <linux/overflow.h>
 #include <linux/slab.h>
 #include <trace/events/host1x.h>
 
@@ -419,7 +420,7 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 		/* won't need a timeout when replayed */
 		job->timeout = 0;
 
-		syncpt_incrs = job->syncpt_end - syncpt_val;
+		syncpt_incrs = wrapping_sub(u32, job->syncpt_end, syncpt_val);
 		dev_dbg(dev, "%s: CPU incr (%d)\n", __func__, syncpt_incrs);
 
 		host1x_job_dump(dev, job);
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 9dda73199889..a8251ec0810c 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -7,6 +7,7 @@
 
 #include <linux/host1x.h>
 #include <linux/iommu.h>
+#include <linux/overflow.h>
 #include <linux/slab.h>
 
 #include <trace/events/host1x.h>
@@ -120,7 +121,8 @@ static void submit_gathers(struct host1x_job *job, struct host1x_job_cmd *cmds,
 
 		if (cmd->is_wait) {
 			if (cmd->wait.relative)
-				threshold = job_syncpt_base + cmd->wait.threshold;
+				threshold = wrapping_add(u32, job_syncpt_base,
+							 cmd->wait.threshold);
 			else
 				threshold = cmd->wait.threshold;
 
@@ -259,7 +261,8 @@ static void channel_program_cdma(struct host1x_job *job)
 
 	/* Submit work. */
 	job->syncpt_end = host1x_syncpt_incr_max(sp, job->syncpt_incrs);
-	submit_gathers(job, job->cmds + i, job->num_cmds - i, job->syncpt_end - job->syncpt_incrs);
+	submit_gathers(job, job->cmds + i, job->num_cmds - i,
+		       wrapping_sub(u32, job->syncpt_end, job->syncpt_incrs));
 
 	/* Before releasing MLOCK, ensure engine is idle again. */
 	fence = host1x_syncpt_incr_max(sp, 1);
@@ -297,7 +300,8 @@ static void channel_program_cdma(struct host1x_job *job)
 
 	job->syncpt_end = host1x_syncpt_incr_max(sp, job->syncpt_incrs);
 
-	submit_gathers(job, job->cmds, job->num_cmds, job->syncpt_end - job->syncpt_incrs);
+	submit_gathers(job, job->cmds, job->num_cmds,
+		       wrapping_sub(u32, job->syncpt_end, job->syncpt_incrs));
 #endif
 }
 
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index f77a678949e9..f9fd8a471e60 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -7,6 +7,7 @@
 
 #include <linux/clk.h>
 #include <linux/interrupt.h>
+#include <linux/overflow.h>
 #include "dev.h"
 #include "fence.h"
 #include "intr.h"
@@ -17,7 +18,7 @@ static void host1x_intr_add_fence_to_list(struct host1x_fence_list *list,
 	struct host1x_syncpt_fence *fence_in_list;
 
 	list_for_each_entry_reverse(fence_in_list, &list->list, list) {
-		if ((s32)(fence_in_list->threshold - fence->threshold) <= 0) {
+		if ((s32)wrapping_sub(u32, fence_in_list->threshold, fence->threshold) <= 0) {
 			/* Fence in list is before us, we can insert here */
 			list_add(&fence->list, &fence_in_list->list);
 			return;
@@ -83,7 +84,7 @@ void host1x_intr_handle_interrupt(struct host1x *host, unsigned int id)
 	spin_lock(&sp->fences.lock);
 
 	list_for_each_entry_safe(fence, tmp, &sp->fences.list, list) {
-		if (((value - fence->threshold) & 0x80000000U) != 0U) {
+		if ((wrapping_sub(u32, value, fence->threshold) & 0x80000000U) != 0U) {
 			/* Fence is not yet expired, we are done */
 			break;
 		}
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index acc7d82e0585..9ac4f0c80728 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/dma-fence.h>
+#include <linux/overflow.h>
 #include <linux/slab.h>
 
 #include <trace/events/host1x.h>
@@ -126,6 +127,10 @@ EXPORT_SYMBOL(host1x_syncpt_id);
  */
 u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs)
 {
+	/*
+	 * Syncpoint values are intended to be modulo 2^32, so overflow
+	 * here is intended.
+	 */
 	return (u32)atomic_add_return(incrs, &sp->max_val);
 }
 EXPORT_SYMBOL(host1x_syncpt_incr_max);
@@ -274,7 +279,7 @@ bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh)
 
 	current_val = (u32)atomic_read(&sp->min_val);
 
-	return ((current_val - thresh) & 0x80000000U) == 0U;
+	return (wrapping_sub(u32, current_val, thresh) & 0x80000000U) == 0U;
 }
 
 int host1x_syncpt_init(struct host1x *host)
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 4c3f3b2f0e9c..9eff42efc445 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -12,6 +12,7 @@
 #include <linux/host1x.h>
 #include <linux/kernel.h>
 #include <linux/kref.h>
+#include <linux/overflow.h>
 #include <linux/sched.h>
 
 #include "fence.h"
@@ -77,7 +78,7 @@ static inline bool host1x_syncpt_check_max(struct host1x_syncpt *sp, u32 real)
 	if (sp->client_managed)
 		return true;
 	max = host1x_syncpt_read_max(sp);
-	return (s32)(max - real) >= 0;
+	return (s32)wrapping_sub(u32, max, real) >= 0;
 }
 
 /* Return true if sync point is client managed. */

-- 
2.53.0


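The wrap-around comparisons that the patch above annotates rely on modulo 2^32 arithmetic. A minimal userspace sketch, assuming plain uint32_t arithmetic in place of the kernel's wrapping_sub() helper; the function names are hypothetical and chosen only for this illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe "has the counter reached thresh?" test for a 32-bit counter.
 * Unsigned subtraction is defined modulo 2^32 in C, so the difference
 * stays meaningful across a roll-over as long as the two values are
 * less than 2^31 apart.  A clear top bit means 'value' is at or past
 * 'thresh'.
 */
static bool counter_is_expired(uint32_t value, uint32_t thresh)
{
	return ((value - thresh) & 0x80000000U) == 0U;
}

/* Number of increments still needed to get from 'value' to 'end'. */
static uint32_t counter_remaining(uint32_t value, uint32_t end)
{
	return end - value;
}
```

With value = 3 and thresh = 0xfffffff0, the counter has rolled over and passed the threshold: the difference is 19, the top bit is clear, and the threshold counts as expired. A direct value >= thresh comparison would get that case wrong.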

* [PATCH 5/6] gpu: host1x: Change pin_job() return type to int
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen
In-Reply-To: <20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com>

pin_job() returns negative errno values on error paths (-EINVAL,
-ENOMEM, PTR_ERR() of mapping) but was declared as unsigned int.
The caller would immediately cast back to int, so there was no
functional issue, but it still warrants fixing.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/job.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 3ed49e1fd933..165979319599 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -138,7 +138,7 @@ void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh,
 }
 EXPORT_SYMBOL(host1x_job_add_wait);
 
-static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
+static int pin_job(struct host1x *host, struct host1x_job *job)
 {
 	unsigned long mask = HOST1X_RELOC_READ | HOST1X_RELOC_WRITE;
 	struct host1x_client *client = job->client;

-- 
2.53.0


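Why the old unsigned return type happened to work can be sketched as follows. The three functions are hypothetical and unrelated to the real pin_job() body, and the result of converting the out-of-range unsigned value back to int assumes the usual two's-complement targets.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper declared with the wrong (unsigned) return type. */
static unsigned int pin_unsigned(bool fail)
{
	return fail ? (unsigned int)-EINVAL : 0U;
}

/* Same helper with a return type that matches what it returns. */
static int pin_signed(bool fail)
{
	return fail ? -EINVAL : 0;
}

/* Caller that converts the unsigned result straight back to int. */
static int call_and_store(bool fail)
{
	int err = (int)pin_unsigned(fail);

	return err;
}
```

The negative errno survives the round trip through unsigned int, so a caller that stores the result in an int sees -EINVAL again. A caller that tested the unsigned value directly would see a large positive number, and a "less than zero" check on it could never fire.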

* [PATCH 4/6] gpu: host1x: Avoid stack over-read in debug output helpers
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen
In-Reply-To: <20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com>

host1x_debug_output() and host1x_debug_cont() used vsnprintf(), which
returns the length the formatted string would have reached with an
unbounded buffer. That return value was passed straight to o->fn as
the number of bytes to emit.

This could cause a read past the end of the output buffer if a call to
host1x_debug_* produced a string longer than 256 bytes. This only
affected the debugfs files as the printk debug sink ignores the
number of bytes. In practice, this is very unlikely to occur.

Fix by switching to vscnprintf(), which returns the number of bytes
actually written.

Fixes: 6236451d83a7 ("gpu: host1x: Add debug support")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
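Illustrative note, not part of the patch: a small user-space sketch of the
two return conventions. vscnprintf() is kernel-only, so fmt_written_len()
below is a hypothetical stand-in that clamps the standard vsnprintf()
result in the same spirit; fmt_would_be_len() is likewise a made-up name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Returns what vsnprintf() reports: the length the full string would
 * have had with an unbounded buffer. */
static int fmt_would_be_len(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf, size, fmt, args);
	va_end(args);

	return len;
}

/* Hypothetical stand-in for vscnprintf(): returns the number of
 * characters actually stored in buf, excluding the trailing NUL. */
static int fmt_written_len(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(size > 0);

	va_start(args, fmt);
	len = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	if ((size_t)len >= size)
		return (int)(size - 1);
	return len;
}
```

With an 8-byte buffer and the 11-character string "hello world", the
first helper reports 11 while only 7 characters are stored; passing 11 on
as a byte count is the over-read the patch avoids.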
 drivers/gpu/host1x/debug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
index 6433c00d5d7e..b828f773fc06 100644
--- a/drivers/gpu/host1x/debug.c
+++ b/drivers/gpu/host1x/debug.c
@@ -31,7 +31,7 @@ void host1x_debug_output(struct output *o, const char *fmt, ...)
 	int len;
 
 	va_start(args, fmt);
-	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+	len = vscnprintf(o->buf, sizeof(o->buf), fmt, args);
 	va_end(args);
 
 	o->fn(o->ctx, o->buf, len, false);
@@ -43,7 +43,7 @@ void host1x_debug_cont(struct output *o, const char *fmt, ...)
 	int len;
 
 	va_start(args, fmt);
-	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+	len = vscnprintf(o->buf, sizeof(o->buf), fmt, args);
 	va_end(args);
 
 	o->fn(o->ctx, o->buf, len, true);

-- 
2.53.0


^ permalink raw reply related

* [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen
In-Reply-To: <20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com>

When a gather longer than 2*TRACE_MAX_LENGTH (256) words is traced
through host1x_cdma_push_gather, the reported BO offset drifts from
the third iteration onward.

Fix this by computing the offset from the base on each iteration
rather than accumulating it.

In reality, gathers tend to be pretty short so this is unlikely to
ever have been observed.

Fixes: b40d02bf96e0 ("gpu: host1x: Use struct host1x_bo pointers in traces")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
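Illustrative note, not part of the patch: a user-space sketch of the
drift. CHUNK_WORDS = 128 is an assumption inferred from the
"2*TRACE_MAX_LENGTH (256)" wording above, and both function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed chunk size, inferred from "2*TRACE_MAX_LENGTH (256)". */
#define CHUNK_WORDS 128u

/* Old behaviour: the running offset is bumped by i words on every
 * iteration, so the increments pile up. Returns the offset reported
 * for the given chunk index. */
static uint32_t offset_accumulated(uint32_t base, uint32_t words,
				   uint32_t chunk)
{
	uint32_t offset = base;
	uint32_t i, n = 0;

	assert(chunk * CHUNK_WORDS < words);

	for (i = 0; i < words; i += CHUNK_WORDS, n++) {
		offset += i * (uint32_t)sizeof(uint32_t);
		if (n == chunk)
			break;
	}

	return offset;
}

/* Fixed behaviour: the offset is derived from the base each time. */
static uint32_t offset_recomputed(uint32_t base, uint32_t words,
				  uint32_t chunk)
{
	assert(chunk * CHUNK_WORDS < words);

	return base + chunk * CHUNK_WORDS * (uint32_t)sizeof(uint32_t);
}
```

For a 384-word gather the first two chunks agree in both variants; on the
third chunk the accumulated form reports base + 1536 bytes where
base + 1024 is expected.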
 drivers/gpu/host1x/hw/channel_hw.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 2df6a16d484e..9dda73199889 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -36,10 +36,9 @@ static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,
 		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
 			u32 num_words = min(words - i, TRACE_MAX_LENGTH);
 
-			offset += i * sizeof(u32);
-
 			trace_host1x_cdma_push_gather(dev_name(dev), bo,
-						      num_words, offset,
+						      num_words,
+						      offset + i * sizeof(u32),
 						      mem);
 		}
 

-- 
2.53.0


^ permalink raw reply related

* [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen
In-Reply-To: <20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com>

host1x_device_add looks through the idle clients list to populate
subdevs, and any matching entries are moved from the subdevs list
to the active list. If all subdevs are populated, device_add is
called on the device. The secondary "subdevs list empty" check
then incorrectly calls device_add a second time.

However, this would require a convoluted scenario since clients don't
typically end up on the idle clients list.

Fix by checking whether the device was already added before adding
again.

Fixes: fab823d82ee5 ("gpu: host1x: Allow loading tegra-drm without enabled engines")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
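Illustrative note, not part of the patch: a toy user-space model of the
control flow described above. struct model_device and the model_*()
helpers are hypothetical and only count how often the "add" step runs;
they are not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>

struct model_device {
	int pending_subdevs;	/* entries still on the "subdevs" list */
	bool registered;	/* set once the device has been added */
	int add_calls;		/* how many times the add step ran */
};

static void model_register(struct model_device *d)
{
	d->add_calls++;
	d->registered = true;
}

/* Step 1: a matching idle client populates one subdev; once the last
 * one is populated the device is added. */
static void model_match_client(struct model_device *d)
{
	assert(d->pending_subdevs > 0);

	if (--d->pending_subdevs == 0)
		model_register(d);
}

/* Step 2: the "no subdevs left" fallback, with or without the
 * already-registered guard. */
static void model_fallback(struct model_device *d, bool guarded)
{
	if (d->pending_subdevs == 0 && !(guarded && d->registered))
		model_register(d);
}
```

Running step 1 and then the unguarded step 2 on a device with one pending
subdev adds it twice; with the guard it is added once, and a device that
never had subdevs is still added by the fallback.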
 drivers/gpu/host1x/bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index f814eb4941c0..fe4af98e8fc6 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -508,7 +508,7 @@ static int host1x_device_add(struct host1x *host1x,
 	 * Add device even if there are no subdevs to ensure syncpoint functionality
 	 * is available regardless of whether any engine subdevices are present
 	 */
-	if (list_empty(&device->subdevs)) {
+	if (list_empty(&device->subdevs) && !device->registered) {
 		err = device_add(&device->dev);
 		if (err < 0)
 			dev_err(&device->dev, "failed to add device: %d\n", err);

-- 
2.53.0


^ permalink raw reply related

* [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen
In-Reply-To: <20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com>

cdma_timeout_destroy() used cancel_delayed_work() to cancel pending
timeout work when destroying the CDMA. Usually this is fine, but
there is a narrow race: the timeout handler has started executing but
has not yet taken cdma->lock; the channel is freed, causing cdma_stop
to take cdma->lock and flush the channel; host1x_cdma_deinit then
proceeds with deinitializing cdma while the handler is still waiting
to take cdma->lock.

Therefore change cdma_timeout_destroy to use cancel_delayed_work_sync
instead to ensure any pending timeout work completes before proceeding.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/cdma_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 3f3f0018eee0..ab714d221120 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -355,7 +355,7 @@ static int cdma_timeout_init(struct host1x_cdma *cdma)
 static void cdma_timeout_destroy(struct host1x_cdma *cdma)
 {
 	if (cdma->timeout.initialized)
-		cancel_delayed_work(&cdma->timeout.wq);
+		cancel_delayed_work_sync(&cdma->timeout.wq);
 
 	cdma->timeout.initialized = false;
 }

-- 
2.53.0


^ permalink raw reply related

* [PATCH 0/6] Miscellaneous fixes for the Host1x driver
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

This series has a number of small miscellaneous fixes to the host1x 
driver.

Patches 1 to 4 fix various logic issues that are unlikely to happen
but technically possible.

Patch 5 fixes a return type from unsigned int to int -- no functional
difference.

Patch 6 makes syncpoint value arithmetic explicitly wrapping,
mostly to help static/dynamic analysis tools.

---
Mikko Perttunen (6):
      gpu: host1x: Wait for timeout worker completion on channel free
      gpu: host1x: Avoid double device_add when clients already present
      gpu: host1x: Fix offset calculation in trace_write_gather
      gpu: host1x: Avoid stack over-read in debug output helpers
      gpu: host1x: Change pin_job() return type to int
      gpu: host1x: Annotate intentional syncpoint wrap-around

 drivers/gpu/host1x/bus.c           |  2 +-
 drivers/gpu/host1x/cdma.c          |  3 ++-
 drivers/gpu/host1x/debug.c         |  4 ++--
 drivers/gpu/host1x/hw/cdma_hw.c    |  2 +-
 drivers/gpu/host1x/hw/channel_hw.c | 15 +++++++++------
 drivers/gpu/host1x/intr.c          |  5 +++--
 drivers/gpu/host1x/job.c           |  2 +-
 drivers/gpu/host1x/syncpt.c        |  7 ++++++-
 drivers/gpu/host1x/syncpt.h        |  3 ++-
 9 files changed, 27 insertions(+), 16 deletions(-)
---
base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
change-id: 20260608-b4-host1x-small-fixes-a-081cfea2c073


^ permalink raw reply

* [PATCH v4 3/3] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
From: Ashish Mhetre @ 2026-06-09  7:32 UTC (permalink / raw)
  To: will, robin.murphy, joro, jgg, nicolinc
  Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
In-Reply-To: <20260609073204.1760077-1-amhetre@nvidia.com>

Apply the workaround for the Tegra264 erratum (enabled through
ARM_SMMU_OPT_REPEAT_TLBI_CFGI) by issuing every CFGI/TLBI cmdlist
twice on affected SMMU instances, with CMD_SYNC after each. The
erratum requires this exact sequencing:

    TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC

Rename the existing arm_smmu_cmdq_issue_cmdlist() to
__arm_smmu_cmdq_issue_cmdlist() and add a thin wrapper that re-issues
the same cmdlist a second time when @sync is true, @n > 0, the first
issue succeeded, and arm_smmu_erratum_cmd_needs_repeating() is true
for the first command. The @n > 0 gate is needed because
arm_smmu_cmdq_batch_add_cmd_p() can call arm_smmu_cmdq_issue_cmdlist()
with @n == 0 and @sync == true to flush a bare CMD_SYNC when the next
command is not supported by the batch's pre-selected cmdq; the repeat
path must not inspect cmds[0] in that case. The static-key gate inside
the predicate means the wrapper compiles to a single tested branch on
unaffected kernels.

For the in-tree batching path, register the new condition with
arm_smmu_cmdq_batch_force_sync() so that a full batch carrying
CFGI/TLBI commands flushes with sync=true.

For the iommufd VSMMU path add an arm_vsmmu_can_batch_cmd() predicate
that splits the iommufd batch at every "needs repeating" transition,
so the wrapper's per-batch decision based on the first command stays
correct even when userspace mixes opcode classes.

Also document the erratum in Documentation/arch/arm64/silicon-errata.rst.

Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
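Illustrative note, not part of the patch: a user-space model of the
wrapper's gating. All model_*() names are hypothetical, the opcode
constants are assumed values used only for this sketch, and the
per-instance and static-key checks are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed opcode values, for illustration only. */
enum { OP_CFGI_STE = 0x03, OP_TLBI_NH_VA = 0x12, OP_ATC_INV = 0x40 };

struct model_cmd {
	unsigned char opcode;
};

static int model_submissions;	/* counts cmdlist submissions */

/* Stand-in for the renamed inner function: records one submission. */
static int model_issue_once(const struct model_cmd *cmds, int n, bool sync)
{
	(void)cmds;
	(void)sync;
	assert(n >= 0);

	model_submissions++;
	return 0;
}

/* Range check in the spirit of the shared predicate. */
static bool model_needs_repeating(const struct model_cmd *cmd)
{
	return cmd->opcode >= OP_CFGI_STE && cmd->opcode < OP_ATC_INV;
}

/* Wrapper: repeat only for synced, non-empty, successful lists whose
 * first command is in the affected range. */
static int model_issue(const struct model_cmd *cmds, int n, bool sync)
{
	int ret = model_issue_once(cmds, n, sync);

	/* n == 0 is the bare-SYNC flush: cmds[0] must not be read. */
	if (!n || ret || !sync)
		return ret;

	if (model_needs_repeating(&cmds[0]))
		ret = model_issue_once(cmds, n, sync);

	return ret;
}
```

In this model a synced TLBI list is submitted twice, while an ATC_INV
list, an unsynced list and the n == 0 bare-SYNC flush are each submitted
once.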
 Documentation/arch/arm64/silicon-errata.rst   |  2 ++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 15 +++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 35 ++++++++++++++++---
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index 046a7fa47063..96050886a7d6 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -268,6 +268,8 @@ stable kernels.
 |                |                 | T241-MPAM-4,    |                             |
 |                |                 | T241-MPAM-6     |                             |
 +----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA         | T264 SMMU       | T264-SMMU-3     | N/A                         |
++----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
 | Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 1e9f7d2de344..11d22acae613 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -350,6 +350,18 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
 	return 0;
 }
 
+static bool arm_vsmmu_can_batch_cmd(struct arm_smmu_device *smmu,
+				    struct arm_vsmmu_invalidation_cmd *last,
+				    struct arm_vsmmu_invalidation_cmd *next)
+{
+	struct arm_smmu_cmd next_cmd = {
+		.data[0] = le64_to_cpu(next->ucmd.cmd[0]),
+	};
+
+	return arm_smmu_erratum_cmd_needs_repeating(smmu, &last->cmd) ==
+	       arm_smmu_erratum_cmd_needs_repeating(smmu, &next_cmd);
+}
+
 int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
 			       struct iommu_user_data_array *array)
 {
@@ -382,7 +394,8 @@ int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
 
 		/* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
 		cur++;
-		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1 &&
+		    arm_vsmmu_can_batch_cmd(smmu, last, cur))
 			continue;
 
 		/* FIXME always uses the main cmdq rather than trying to group by type */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 599c835c50d8..041e188b3b30 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -700,10 +700,10 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq,
  *   insert their own list of commands then all of the commands from one
  *   CPU will appear before any of the commands from the other CPU.
  */
-int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
-				struct arm_smmu_cmdq *cmdq,
-				struct arm_smmu_cmd *cmds, int n,
-				bool sync)
+static int __arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+					 struct arm_smmu_cmdq *cmdq,
+					 struct arm_smmu_cmd *cmds, int n,
+					 bool sync)
 {
 	struct arm_smmu_cmd cmd_sync;
 	u32 prod;
@@ -822,6 +822,28 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	return ret;
 }
 
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+				struct arm_smmu_cmdq *cmdq,
+				struct arm_smmu_cmd *cmds, int n,
+				bool sync)
+{
+	int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	/*
+	 * arm_smmu_cmdq_batch_add_cmd_p() can flush its current batch with
+	 * sync=true and n=0 (bare SYNC) when the next command is not
+	 * supported by the batch's pre-selected cmdq, so the repeat path
+	 * must not inspect cmds[0].
+	 */
+	if (!n || ret || !sync)
+		return ret;
+
+	if (arm_smmu_erratum_cmd_needs_repeating(smmu, &cmds[0]))
+		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	return ret;
+}
+
 static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
 				     struct arm_smmu_cmd *cmd, bool sync)
 {
@@ -862,6 +884,11 @@ static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
 	    (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
 		return true;
 
+	/* See ARM_SMMU_OPT_REPEAT_TLBI_CFGI */
+	if (cmds->num == CMDQ_BATCH_ENTRIES &&
+	    arm_smmu_erratum_cmd_needs_repeating(smmu, &cmds->cmds[0]))
+		return true;
+
 	return false;
 }
 
-- 
2.50.1


^ permalink raw reply related

* [PATCH v4 1/3] iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
From: Ashish Mhetre @ 2026-06-09  7:32 UTC (permalink / raw)
  To: will, robin.murphy, joro, jgg, nicolinc
  Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
In-Reply-To: <20260609073204.1760077-1-amhetre@nvidia.com>

From: Nicolin Chen <nicolinc@nvidia.com>

arm_smmu_cmdq_batch_add_cmd_p() carries two distinct reasons for
flushing the current batch with a CMD_SYNC before appending the
new command:

  - The batch's pre-assigned cmdq does not support the new command.
  - The Arm erratum 2812531 workaround (ARM_SMMU_OPT_CMDQ_FORCE_SYNC)
    forces a SYNC at one entry before the batch is full.

Lift those checks into a new arm_smmu_cmdq_batch_force_sync() helper
so that adding another force-sync condition becomes a one-line
addition. No functional change.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a10affb483a4..76efe479e80f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -847,16 +847,27 @@ static void arm_smmu_cmdq_batch_init_cmd(struct arm_smmu_device *smmu,
 	cmds->cmdq = arm_smmu_get_cmdq(smmu, cmd);
 }
 
+static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
+					   struct arm_smmu_cmdq_batch *cmds,
+					   struct arm_smmu_cmd *cmd)
+{
+	/* The batch's pre-assigned cmdq doesn't support the new command */
+	if (!arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd))
+		return true;
+
+	/* Arm erratum 2812531 */
+	if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
+	    (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+		return true;
+
+	return false;
+}
+
 static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
 					  struct arm_smmu_cmdq_batch *cmds,
 					  struct arm_smmu_cmd *cmd)
 {
-	bool force_sync = (cmds->num == CMDQ_BATCH_ENTRIES - 1) &&
-			  (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC);
-	bool unsupported_cmd;
-
-	unsupported_cmd = !arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd);
-	if (force_sync || unsupported_cmd) {
+	if (arm_smmu_cmdq_batch_force_sync(smmu, cmds, cmd)) {
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
 					    cmds->num, true);
 		arm_smmu_cmdq_batch_init_cmd(smmu, cmds, cmd);
-- 
2.50.1


^ permalink raw reply related

* [PATCH v4 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
From: Ashish Mhetre @ 2026-06-09  7:32 UTC (permalink / raw)
  To: will, robin.murphy, joro, jgg, nicolinc
  Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre
In-Reply-To: <20260609073204.1760077-1-amhetre@nvidia.com>

Tegra264 SMMU is affected by an erratum where a TLB entry can survive
an invalidation that races with concurrent traffic targeting the same
entry. The hardware-recommended software workaround is to issue every
CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue
is guaranteed to evict the entry. ATC_INV is not affected and must
not be doubled.

The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
cannot be detected from hardware registers. Tegra264 boots from device
tree only and has no ACPI/IORT support, so detection is through device
tree only.

Add the ARM_SMMU_OPT_REPEAT_TLBI_CFGI option and set it on instances
matching the existing "nvidia,tegra264-smmu" compatible. Also add a
matching arm_smmu_erratum_repeat_tlbi_cfgi_key static key that DT
probe enables, so the inline classifier compiles down to a single
test+branch on unaffected kernels. Add an
arm_smmu_erratum_cmd_needs_repeating() helper in arm-smmu-v3.h that
gates on the static key first and then range-checks the opcode
(CFGI_STE .. ATC_INV), so subsequent changes wiring the workaround
into the CMDQ submission and iommufd batching paths can share a
single predicate.

No callers consume the option yet. A subsequent change wires the
workaround into the CMDQ issue paths.

Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  7 +++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 +++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 76efe479e80f..599c835c50d8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -42,6 +42,8 @@ MODULE_PARM_DESC(disable_msipolling,
 static const struct iommu_ops arm_smmu_ops;
 static struct iommu_dirty_ops arm_smmu_dirty_ops;
 
+DEFINE_STATIC_KEY_FALSE(arm_smmu_erratum_repeat_tlbi_cfgi_key);
+
 enum arm_smmu_msi_index {
 	EVTQ_MSI_INDEX,
 	GERROR_MSI_INDEX,
@@ -5303,8 +5305,11 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
 	if (of_dma_is_coherent(dev->of_node))
 		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
-	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu"))
+	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu")) {
 		tegra_cmdqv_dt_probe(dev->of_node, smmu);
+		smmu->options |= ARM_SMMU_OPT_REPEAT_TLBI_CFGI;
+		static_branch_enable(&arm_smmu_erratum_repeat_tlbi_cfgi_key);
+	}
 
 	return ret;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index c909c9a88538..c6ea3b8dc761 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -11,6 +11,7 @@
 #include <linux/bitfield.h>
 #include <linux/iommu.h>
 #include <linux/iommufd.h>
+#include <linux/jump_label.h>
 #include <linux/kernel.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
@@ -928,6 +929,12 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_MSIPOLL		(1 << 2)
 #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 #define ARM_SMMU_OPT_TEGRA241_CMDQV	(1 << 4)
+/*
+ * Repeat every {CFGI,TLBI};CMD_SYNC command sequence so that the second
+ * issue executes only after the first issue's CMD_SYNC has completed.
+ * Does not apply to ATC_INV.
+ */
+#define ARM_SMMU_OPT_REPEAT_TLBI_CFGI	(1 << 5)
 	u32				options;
 
 	struct arm_smmu_cmdq		cmdq;
@@ -1212,6 +1219,23 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 				struct arm_smmu_cmd *cmds, int n,
 				bool sync);
 
+DECLARE_STATIC_KEY_FALSE(arm_smmu_erratum_repeat_tlbi_cfgi_key);
+
+static inline bool
+arm_smmu_erratum_cmd_needs_repeating(struct arm_smmu_device *smmu,
+				     struct arm_smmu_cmd *cmd)
+{
+	u8 opcode;
+
+	if (!static_branch_unlikely(&arm_smmu_erratum_repeat_tlbi_cfgi_key))
+		return false;
+	if (!(smmu->options & ARM_SMMU_OPT_REPEAT_TLBI_CFGI))
+		return false;
+
+	opcode = FIELD_GET(CMDQ_0_OP, cmd->data[0]);
+	return opcode >= CMDQ_OP_CFGI_STE && opcode < CMDQ_OP_ATC_INV;
+}
+
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
 void arm_smmu_sva_notifier_synchronize(void);
-- 
2.50.1


^ permalink raw reply related

* [PATCH v4 0/3] iommu/arm-smmu-v3: Tegra264 invalidation workaround
From: Ashish Mhetre @ 2026-06-09  7:32 UTC (permalink / raw)
  To: will, robin.murphy, joro, jgg, nicolinc
  Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

NVIDIA Tegra264 SMMUs are affected by an erratum where a TLB entry can
survive an invalidation that races with concurrent traffic targeting
the same entry. The hardware-recommended software workaround is to
issue every CFGI/TLBI command (each followed by CMD_SYNC) twice.
The second issue must execute only after the first issue's CMD_SYNC
has completed, giving the sequence:

    TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC

ATC_INV is not affected and must not be doubled.

The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it
cannot be detected from hardware ID. Tegra264 is device-tree-only
(no ACPI/IORT support), so detection is purely by compatible string.

This series is structured as a small refactor + detect + apply
sequence so that each step is reviewable in isolation:

 1/3 Pure refactor (no functional change): lift the existing
     force-sync conditions out of arm_smmu_cmdq_batch_add_cmd_p()
     into a new arm_smmu_cmdq_batch_force_sync() helper, so that
     adding another condition (in patch 3) is a one-liner.
     Authored by Nicolin Chen.

 2/3 Detect the erratum and provide the classifier. Adds the
     ARM_SMMU_OPT_REPEAT_TLBI_CFGI per-instance option, a global
     arm_smmu_erratum_repeat_tlbi_cfgi_key static key, and the
     arm_smmu_erratum_cmd_needs_repeating() predicate. The static
     key means the wrapper compiles to a single tested branch on
     unaffected kernels.

 3/3 Apply the workaround: factor arm_smmu_cmdq_issue_cmdlist()
     into a thin wrapper around __arm_smmu_cmdq_issue_cmdlist()
     that re-issues the cmdlist a second time when the predicate
     fires; register the same condition with the batch helper so
     full batches of CFGI/TLBI flush with sync=true; and add
     arm_vsmmu_can_batch_cmd() so iommufd does not mix command
     classes inside a single batch. Also documents the erratum
     in silicon-errata.rst.
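
As a rough illustration of the iommufd batching rule in 3/3, the sketch
below splits a command stream wherever the "needs repeating" class
changes, comparing each new command against the first command of the
current batch. It is a user-space model only: split_batches(), the bool
array and max_batch are hypothetical simplifications, not the driver's
interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * repeat[i] says whether command i needs repeating. The size of each
 * resulting batch is written to sizes[], which must have room for n
 * entries. Returns the number of batches.
 */
static int split_batches(const bool *repeat, int n, int max_batch,
			 int *sizes)
{
	int nbatches = 0;
	int last = 0;	/* first command of the current batch */
	int cur = 0;

	assert(max_batch > 0);

	while (cur < n) {
		cur++;
		/* Keep batching while there is input left, the batch is
		 * not full, and the class matches the batch's first
		 * command. */
		if (cur != n && (cur - last) != max_batch &&
		    repeat[last] == repeat[cur])
			continue;

		sizes[nbatches++] = cur - last;
		last = cur;
	}

	return nbatches;
}
```

For the class sequence {repeat, repeat, no, no, repeat} this yields
batches of 2, 2 and 1 commands, so a decision taken from a batch's first
command holds for every command in that batch.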

The series applies cleanly on linux-next/master (base-commit below).

Changes since v3:
 - Drop the cmds->num == 0 early-return so the refactor is
   truly "no functional change".
 - Rename ARM_SMMU_OPT_TLBI_TWICE -> ARM_SMMU_OPT_REPEAT_TLBI_CFGI
   and rephrase its kdoc to be hardware-agnostic.
 - Rename arm_smmu_cmd_needs_tlbi_twice() ->
   arm_smmu_erratum_cmd_needs_repeating() and drop the kdoc
   above it.
 - Replace the explicit opcode switch with a single range check
   opcode >= CMDQ_OP_CFGI_STE && opcode < CMDQ_OP_ATC_INV.
 - Introduce arm_smmu_erratum_repeat_tlbi_cfgi_key static key:
   the predicate gates on it first so unaffected kernels pay
   only a single static_branch_unlikely() check.
 - Drop the verbose Tegra264-specific comments above
   arm_vsmmu_can_batch_cmd() and inside the batch helper.
 - Document the erratum in
   Documentation/arch/arm64/silicon-errata.rst.
 - Guard the repeat path in arm_smmu_cmdq_issue_cmdlist() with
   an n > 0 check so we never inspect cmds[0] on the bare-SYNC
   flush emitted by arm_smmu_cmdq_batch_add_cmd_p() when the
   next command is unsupported by the batch's pre-selected
   cmdq.
 - Drop the carried Reviewed-by tags now that the patch
   shape has changed; re-review appreciated.

Changes since v2:
 - Split into a 3-patch series (refactor / detect / apply) to keep
   each step small and bisectable.
 - Move the classifier to arm-smmu-v3.h as static inline so the
   iommufd file can share it.
 - Add arm_vsmmu_can_batch_cmd() to split iommufd batches at
   "needs repeating" transitions so the per-batch decision based
   on the first command stays correct under mixed user input.
 - Spell out in the commit message why detection is via DT and
   not via IIDR/ACPI.

Changes since v1:
 - Detect the erratum from the existing "nvidia,tegra264-smmu"
   compatible instead of adding a new property.
 - Centralise the doubling at the CMDQ submission layer and only
   apply it to CFGI/TLBI (not ATC_INV).
 - Drop the binding/dtsi patches accordingly.

Ashish Mhetre (2):
  iommu/arm-smmu-v3: Detect Tegra264 erratum
  iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264

Nicolin Chen (1):
  iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions

 Documentation/arch/arm64/silicon-errata.rst   |  2 +
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 15 ++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 65 +++++++++++++++----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 24 +++++++
 4 files changed, 94 insertions(+), 12 deletions(-)


base-commit: 7da7f07112610a520567421dd2ffcb51beaefbcc
-- 
2.50.1


^ permalink raw reply

* Re: [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
From: Ashish Mhetre @ 2026-06-09  7:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Will Deacon, robin.murphy, joro, nicolinc, linux-arm-kernel,
	iommu, linux-kernel, linux-tegra
In-Reply-To: <20260605141053.GF2487554@ziepe.ca>



On 6/5/2026 7:40 PM, Jason Gunthorpe wrote:
> On Fri, Jun 05, 2026 at 07:35:35PM +0530, Ashish Mhetre wrote:
>>>> +{
>>>> +     if (!(smmu->options & ARM_SMMU_OPT_TLBI_TWICE))
>>>> +             return false;
>>> Maybe we should make this a static key?
>> Okay. Shall I add just static key and remove option bit, or
>> have static key alongside existing option bit such that
>> static_branch_unlikely will precede the option bit check?
> You'd have the static key and the options. Keep it simple, enable the
> static key once if any driver probes to set TWICE. Check the key
> before options to get the best code gen

Okay, I'll incorporate this in V4 and send.

> But IDK if it is really worth it, there are already lots of branches
> on the performance tlbi flow, and we didn't do this for other tlbi
> affecting errata..
>
> IDK if we really care about branches we should also be doing things
> like disabling the range/non-range paths and ATC based on what is
> actually in use..
>
> Jason

^ permalink raw reply

