Linux Tegra architecture development
 help / color / mirror / Atom feed
* [REGRESSION] Invalid gather when using Tegra210 media engines
@ 2025-02-03 14:55 Diogo Ivo
  2025-02-03 17:06 ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Diogo Ivo @ 2025-02-03 14:55 UTC (permalink / raw)
  To: thierry.reding, vdumpa, joro, will, robin.murphy, jonathanh, jgg,
	baolu.lu, jsnitsel, jroedel
  Cc: regressions, linux-tegra, iommu, diogo.ivo

Hello,

Commit c8cc2655cc6c introduced a regression when trying to use the media
accelerators present on the Tegra X1 SoC.

I came across this regression when testing the branch [1] that leverages
the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
c8cc2655cc6c we see the following error messages after submitting a job
through the TEGRA_CHANNEL_SUBMIT IOCTL:

[   46.879757] tegra-nvjpg 54380000.nvjpg: invalid gather for push 
buffer 0x0000000108f08000
[   46.888018] tegra-mc 70019000.memory-controller: host1xdmar: read 
@0x00000000021f7000: EMEM address decode error (EMEM decode error)
[   57.052256] ---- mlocks ----
[   57.055156] 0: unlocked
[   57.057620] 1: unlocked
[   57.060069] 2: unlocked
[   57.062533] 3: unlocked
[   57.064993] 4: unlocked
[   57.067439] 5: unlocked
[   57.069896] 6: unlocked
[   57.072351] 7: unlocked
[   57.074798] 8: unlocked
[   57.077254] 9: unlocked
[   57.079702] 10: unlocked
[   57.082251] 11: unlocked
[   57.084796] 12: unlocked
[   57.087329] 13: unlocked
[   57.089871] 14: unlocked
[   57.092411] 15: unlocked
[   57.094945]
[   57.096449] ---- syncpts ----
[   57.099423] id 0 (0-reserved-nop) min 0 max 0 (0 waiters)
[   57.104836] id 1 (1-54200000.dc) min 3261 max 0 (0 waiters)
[   57.110420] id 2 (2-54240000.dc) min 0 max 0 (0 waiters)
[   57.115745] id 3 (3-54340000.vic) min 0 max 0 (0 waiters)
[   57.121155] id 4 (4-ffmpeg) min 0 max 1 (1 waiters)
[   57.126275]
[   57.127766] ---- channels ----
[   57.130834] 0: fifo:
[   57.133029] FIFOSTAT 80100840
[   57.135997] [empty]
[   57.138109] 0-54340000.vic:
[   57.138113] inactive
[   57.138113]
[   57.144669] 1: fifo:
[   57.146856] FIFOSTAT 80100840
[   57.149833] [empty]
[   57.151936] 1-54340000.vic:
[   57.151940] inactive
[   57.151940]
[   57.158499] 2: fifo:
[   57.160694] FIFOSTAT 80100840
[   57.163661] [empty]
[   57.165775] 2-54380000.nvjpg:
[   57.165780] active class 00, offset 0fff, val ffffffff
[   57.173987] DMASTART 0x00000000021f7000, DMAEND 0x0000000000000ffc
[   57.180183] DMAPUT 00000010 DMAGET 00000010 DMACTL 00000000
[   57.185766] CBREAD ffffffff CBSTAT 00000fff
[   57.189960] JOB, syncpt 4: 1 timeout: 10000 num_slots: 2 num_handles: 1
[   57.196587]     0x00000001021f7000: 00080041: SETCL(class=001, 
offset=008, mask=01, [04000000])
[   57.205314]     0x00000001021f7008: 00003000: SETCL(class=0c0)
[   57.211161]     0x00000001021f700c: 20000000: NONINCR(offset=000, [])
[   57.217622]   GATHER at 0x0000000108f08000+0x0, 32 words
[   57.222944]     0x0000000108f08000: 10100002: INCR(offset=010, 
[00000080, 00000001])
[   57.230708]     0x0000000108f0800c: 10100002: INCR(offset=010, 
[000001c0, 00000000])
[   57.238472]     0x0000000108f08018: 10100002: INCR(offset=010, 
[000001c1, 00000000])
[   57.246235]     0x0000000108f08024: 10100002: INCR(offset=010, 
[000001c2, 0005ae60])
[   57.253998]     0x0000000108f08030: 10100002: INCR(offset=010, 
[000001c3, 0001fa40])
[   57.261761]     0x0000000108f0803c: 10100002: INCR(offset=010, 
[000001c4, 000516a0])
[   57.269524]     0x0000000108f08048: 10100002: INCR(offset=010, 
[000001c5, 0001fa60])
[   57.277287]     0x0000000108f08054: 10100002: INCR(offset=010, 
[000001c6, 00000000])
[   57.285050]     0x0000000108f08060: 10100002: INCR(offset=010, 
[000001c7, 00000000])
[   57.292814]     0x0000000108f0806c: 10100002: INCR(offset=010, 
[000000c0, 00000100])
[   57.300577]     0x0000000108f08078: 20000001: NONINCR(offset=000, 
[00000104])
[   57.307729]
[   57.309236] tegra-host1x 50000000.host1x: cdma_timeout_handler: 
timeout: 4 (4-ffmpeg), HW thresh 0, done 1

These error messages are printed with commit fae6e669cdc5 applied. If
this commit is not applied not even the Display Controllers work, as 
mentioned in
the regression report [2], and there is no display output, which we at
least get with the fix from fae6e669cdc5.

Something interesting here is that prior to c8cc2655cc6c we see the
following messages in dmesg:

[    0.044463] platform 50000000.host1x: Adding to iommu group 0
[    0.044547] platform 57000000.gpu: Adding to iommu group 1
[    1.385557] Failed to set up IOMMU for device 50000000.host1x; 
retaining platform DMA ops
[    1.395120] platform 54200000.dc: Adding to iommu group 2
[    1.400834] platform 54240000.dc: Adding to iommu group 2
[    1.406930] platform 54340000.vic: Adding to iommu group 3
[    1.412762] platform 54380000.nvjpg: Adding to iommu group 4
[    1.420438] Failed to set up IOMMU for device 54200000.dc; retaining 
platform DMA ops
[    1.436616] Failed to set up IOMMU for device 54240000.dc; retaining 
platform DMA ops
[    1.456545] Failed to set up IOMMU for device 54340000.vic; retaining 
platform DMA ops
[    1.465057] Failed to set up IOMMU for device 54380000.nvjpg; 
retaining platform DMA ops

whereas after c8cc2655cc6c + fae6e669cdc5 we only get:

[    1.818480] tegra-host1x 50000000.host1x: Adding to iommu group 0
[    1.829288] tegra-dc 54200000.dc: Adding to iommu group 1
[    1.839705] tegra-dc 54240000.dc: Adding to iommu group 1
[    1.855328] tegra-vic 54340000.vic: Adding to iommu group 2
[    1.859287] tegra-nvjpg 54380000.nvjpg: Adding to iommu group 3

Here GPU does not show up because I disabled the compilation of Nouveau but
the results are the same.

Please let me know if you need more information on my side and I'll be
happy to provide it.

Best regards,
Diogo

#regzbot introduced: c8cc2655cc6c

[1]: https://gitlab.freedesktop.org/d.ivo/mesa/-/tree/diogo/vaapi_remove_gpu
[2]: 
https://lore.kernel.org/linux-tegra/bbmhcoghrprmbdibnjum6lefix2eoquxrde7wyqeulm4xabmlm@b6jy32saugqh/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-03 14:55 [REGRESSION] Invalid gather when using Tegra210 media engines Diogo Ivo
@ 2025-02-03 17:06 ` Jason Gunthorpe
  2025-02-03 17:35   ` Diogo Ivo
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2025-02-03 17:06 UTC (permalink / raw)
  To: Diogo Ivo
  Cc: thierry.reding, vdumpa, joro, will, robin.murphy, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu

On Mon, Feb 03, 2025 at 02:55:12PM +0000, Diogo Ivo wrote:
> Hello,
> 
> Commit c8cc2655cc6c introduced a regression when trying to use the media
> accelerators present on the Tegra X1 SoC.
> 
> I came across this regression when testing the branch [1] that leverages
> the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
> c8cc2655cc6c we see the following error messages after submitting a job
> through the TEGRA_CHANNEL_SUBMIT IOCTL:
> 
> [   46.879757] tegra-nvjpg 54380000.nvjpg: invalid gather for push buffer
> 0x0000000108f08000

What driver is this? The message comes from 
   drivers/gpu/host1x/hw/channel_hw.c

But what driver is 'tegra-nvjpg' that is bound to 54380000.nvjpg ?

Is it the stuff in 
 drivers/gpu/drm/nouveau/nvkm/engine/nvjpg/

I don't see "tegra-nvjpg" in the kernel?

Can you share where the failing command was sent to the device?

> Please let me know if you need more information on my side and I'll be
> happy to provide it.

It is still ARM64 & CONFIG_ARM_DMA_USE_IOMMU=n

?

I'm guessing it is the same basic issue as fae6e669cdc5 ("drm/tegra:
Do not assume that a NULL domain means no DMA IOMMU"), except in the
host1x not DRM code. It looks to me like the same pattern was copied
there.

How about this:

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index be2ad7203d7b96..090b1fc97a7309 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -361,6 +361,10 @@ static bool host1x_wants_iommu(struct host1x *host1x)
 	return true;
 }
 
+/*
+ * Returns ERR_PTR on failure, NULL if the translation is IDENTITY, otherwise a
+ * valid paging domain.
+ */
 static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
 {
 	struct iommu_domain *domain = iommu_get_domain_for_dev(host->dev);
@@ -385,6 +389,8 @@ static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
 	 * Similarly, if host1x is already attached to an IOMMU (via the DMA
 	 * API), don't try to attach again.
 	 */
+	if (domain && domain->type == IOMMU_DOMAIN_IDENTITY)
+		domain = NULL;
 	if (!host1x_wants_iommu(host) || domain)
 		return domain;
 

(if not can you investigate this function's flow compared to a good
kernel?)

Jason

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-03 17:06 ` Jason Gunthorpe
@ 2025-02-03 17:35   ` Diogo Ivo
  2025-02-03 17:43     ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Diogo Ivo @ 2025-02-03 17:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: thierry.reding, vdumpa, joro, will, robin.murphy, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu

[-- Attachment #1: Type: text/plain, Size: 3201 bytes --]

Hi Jason, thanks for the quick reply!

On 2/3/25 5:06 PM, Jason Gunthorpe wrote:
> On Mon, Feb 03, 2025 at 02:55:12PM +0000, Diogo Ivo wrote:
>> Hello,
>>
>> Commit c8cc2655cc6c introduced a regression when trying to use the media
>> accelerators present on the Tegra X1 SoC.
>>
>> I came across this regression when testing the branch [1] that leverages
>> the NVJPG engine in the Tegra X1 for decoding a JPEG file. After commit
>> c8cc2655cc6c we see the following error messages after submitting a job
>> through the TEGRA_CHANNEL_SUBMIT IOCTL:
>>
>> [   46.879757] tegra-nvjpg 54380000.nvjpg: invalid gather for push buffer
>> 0x0000000108f08000
> 
> What driver is this? The message comes from
>     drivers/gpu/host1x/hw/channel_hw.c
> 
> But what driver is 'tegra-nvjpg' that is bound to 54380000.nvjpg ?
> 
> Is it the stuff in
>   drivers/gpu/drm/nouveau/nvkm/engine/nvjpg/
> 
> I don't see "tegra-nvjpg" in the kernel?

The driver for NVJPG is not upstreamed yet, I am using a driver that I
wrote that is pretty much a copy of the driver for NVDEC. I have
attached it to this e-mail.

> Can you share where the failing command was sent to the device?

The command submission happens in tegra_task_submit() found in [1].

>> Please let me know if you need more information on my side and I'll be
>> happy to provide it.
> 
> It is still ARM64 & CONFIG_ARM_DMA_USE_IOMMU=n?

Yes it is.

> I'm guessing it is the same basic issue as fae6e669cdc5 ("drm/tegra:
> Do not assume that a NULL domain means no DMA IOMMU"), except in the
> host1x not DRM code. It looks to me like the same pattern was copied
> there.
> 
> How about this:
> 
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index be2ad7203d7b96..090b1fc97a7309 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -361,6 +361,10 @@ static bool host1x_wants_iommu(struct host1x *host1x)
>   	return true;
>   }
>   
> +/*
> + * Returns ERR_PTR on failure, NULL if the translation is IDENTITY, otherwise a
> + * valid paging domain.
> + */
>   static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
>   {
>   	struct iommu_domain *domain = iommu_get_domain_for_dev(host->dev);
> @@ -385,6 +389,8 @@ static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
>   	 * Similarly, if host1x is already attached to an IOMMU (via the DMA
>   	 * API), don't try to attach again.
>   	 */
> +	if (domain && domain->type == IOMMU_DOMAIN_IDENTITY)
> +		domain = NULL;
>   	if (!host1x_wants_iommu(host) || domain)
>   		return domain;
>   
> 
> (if not can you investigate this function's flow compared to a good
> kernel?)

Yes, this worked! Does this mean that with this change we go through the
path of using the shared Tegra domain (for example in the driver I
attached client->group == true), and if that is the case would it be
beneficial for us to try and change tegra_smmu_def_domain_type() from
returning IOMMU_DOMAIN_IDENTITY into IOMMU_DOMAIN_DMA so that the
dma_alloc_* functions are called directly?

Thank you for your time!

Best regards,
Diogo

[1]: 
https://gitlab.freedesktop.org/d.ivo/mesa/-/blob/diogo/vaapi_remove_gpu/src/gallium/drivers/tegra/tegra_task.c

[-- Attachment #2: 0001-drm-tegra-Add-NVJPG-driver.patch --]
[-- Type: text/x-patch, Size: 10798 bytes --]

From 6aea2ca071bb39c4bd5fd7b730e9743aeef3ead5 Mon Sep 17 00:00:00 2001
From: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Date: Thu, 16 Nov 2023 17:29:23 +0000
Subject: [PATCH 1/3] drm/tegra: Add NVJPG driver

Add support for booting and using NVJPG on Tegra210 to the Host1x
and TegraDRM drivers. This driver only supports the new TegraDRM uAPI.

Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
---
 drivers/gpu/drm/tegra/Makefile |   1 +
 drivers/gpu/drm/tegra/drm.c    |   2 +
 drivers/gpu/drm/tegra/drm.h    |   1 +
 drivers/gpu/drm/tegra/nvjpg.c  | 331 +++++++++++++++++++++++++++++++++
 include/linux/host1x.h         |   1 +
 5 files changed, 336 insertions(+)
 create mode 100644 drivers/gpu/drm/tegra/nvjpg.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 6fc4b504e786..e399b40d64a1 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -25,6 +25,7 @@ tegra-drm-y := \
 	falcon.o \
 	vic.o \
 	nvdec.o \
+	nvjpg.o \
 	riscv.o
 
 tegra-drm-y += trace.o
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 35ff303c6674..1252ad834b9b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1359,6 +1359,7 @@ static const struct of_device_id host1x_drm_subdevs[] = {
 	{ .compatible = "nvidia,tegra210-sor1", },
 	{ .compatible = "nvidia,tegra210-vic", },
 	{ .compatible = "nvidia,tegra210-nvdec", },
+	{ .compatible = "nvidia,tegra210-nvjpg", },
 	{ .compatible = "nvidia,tegra186-display", },
 	{ .compatible = "nvidia,tegra186-dc", },
 	{ .compatible = "nvidia,tegra186-sor", },
@@ -1396,6 +1397,7 @@ static struct platform_driver * const drivers[] = {
 	&tegra_gr3d_driver,
 	&tegra_vic_driver,
 	&tegra_nvdec_driver,
+	&tegra_nvjpg_driver,
 };
 
 static int __init host1x_drm_init(void)
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index f9d18e8cf6ab..c210b0423f4c 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -209,5 +209,6 @@ extern struct platform_driver tegra_gr2d_driver;
 extern struct platform_driver tegra_gr3d_driver;
 extern struct platform_driver tegra_vic_driver;
 extern struct platform_driver tegra_nvdec_driver;
+extern struct platform_driver tegra_nvjpg_driver;
 
 #endif /* HOST1X_DRM_H */
diff --git a/drivers/gpu/drm/tegra/nvjpg.c b/drivers/gpu/drm/tegra/nvjpg.c
new file mode 100644
index 000000000000..8eae654bac78
--- /dev/null
+++ b/drivers/gpu/drm/tegra/nvjpg.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "drm.h"
+#include "falcon.h"
+
+struct nvjpg_config {
+	const char *firmware;
+	unsigned int version;
+};
+
+struct nvjpg {
+	struct falcon falcon;
+
+	void __iomem *regs;
+	struct tegra_drm_client client;
+	struct device *dev;
+	struct clk *clk;
+
+	/* Platform configuration */
+	const struct nvjpg_config *config;
+};
+
+static inline struct nvjpg *to_nvjpg(struct tegra_drm_client *client)
+{
+	return container_of(client, struct nvjpg, client);
+}
+
+static inline void nvjpg_writel(struct nvjpg *nvjpg, u32 value,
+				unsigned int offset)
+{
+	writel(value, nvjpg->regs + offset);
+}
+
+static int nvjpg_init(struct host1x_client *client)
+{
+	struct tegra_drm_client *drm = host1x_to_drm_client(client);
+	struct drm_device *dev = dev_get_drvdata(client->host);
+	struct tegra_drm *tegra = dev->dev_private;
+	struct nvjpg *nvjpg = to_nvjpg(drm);
+	int err;
+
+	err = host1x_client_iommu_attach(client);
+	if (err < 0 && err != -ENODEV) {
+		dev_err(nvjpg->dev, "failed to attach to domain: %d\n", err);
+		return err;
+	}
+
+	err = tegra_drm_register_client(tegra, drm);
+	if (err < 0)
+		goto detach;
+
+	/*
+	 * Inherit the DMA parameters (such as maximum segment size) from the
+	 * parent host1x device.
+	 */
+	client->dev->dma_parms = client->host->dma_parms;
+
+	return 0;
+
+detach:
+	host1x_client_iommu_detach(client);
+
+	return err;
+}
+
+static int nvjpg_exit(struct host1x_client *client)
+{
+	struct tegra_drm_client *drm = host1x_to_drm_client(client);
+	struct drm_device *dev = dev_get_drvdata(client->host);
+	struct tegra_drm *tegra = dev->dev_private;
+	struct nvjpg *nvjpg = to_nvjpg(drm);
+	int err;
+
+	/* avoid a dangling pointer just in case this disappears */
+	client->dev->dma_parms = NULL;
+
+	err = tegra_drm_unregister_client(tegra, drm);
+	if (err < 0)
+		return err;
+
+	pm_runtime_dont_use_autosuspend(client->dev);
+	pm_runtime_force_suspend(client->dev);
+
+	host1x_client_iommu_detach(client);
+
+	if (client->group) {
+		dma_unmap_single(nvjpg->dev, nvjpg->falcon.firmware.phys,
+				 nvjpg->falcon.firmware.size, DMA_TO_DEVICE);
+		tegra_drm_free(tegra, nvjpg->falcon.firmware.size,
+			       nvjpg->falcon.firmware.virt,
+			       nvjpg->falcon.firmware.iova);
+	} else {
+		dma_free_coherent(nvjpg->dev, nvjpg->falcon.firmware.size,
+				  nvjpg->falcon.firmware.virt,
+				  nvjpg->falcon.firmware.iova);
+	}
+
+	return 0;
+}
+
+static const struct host1x_client_ops nvjpg_client_ops = {
+	.init = nvjpg_init,
+	.exit = nvjpg_exit,
+};
+
+static int nvjpg_load_falcon_firmware(struct nvjpg *nvjpg)
+{
+	struct host1x_client *client = &nvjpg->client.base;
+	struct tegra_drm *tegra = nvjpg->client.drm;
+	dma_addr_t iova;
+	size_t size;
+	void *virt;
+	int err;
+
+	if (nvjpg->falcon.firmware.virt)
+		return 0;
+
+	err = falcon_read_firmware(&nvjpg->falcon, nvjpg->config->firmware);
+	if (err < 0)
+		return err;
+
+	size = nvjpg->falcon.firmware.size;
+
+	if (!client->group) {
+		virt = dma_alloc_coherent(nvjpg->dev, size, &iova, GFP_KERNEL);
+
+		err = dma_mapping_error(nvjpg->dev, iova);
+		if (err < 0)
+			return err;
+	} else {
+		virt = tegra_drm_alloc(tegra, size, &iova);
+		if (IS_ERR(virt))
+			return PTR_ERR(virt);
+	}
+
+	nvjpg->falcon.firmware.virt = virt;
+	nvjpg->falcon.firmware.iova = iova;
+
+	err = falcon_load_firmware(&nvjpg->falcon);
+	if (err < 0)
+		goto cleanup;
+
+	/*
+	 * In this case we have received an IOVA from the shared domain, so we
+	 * need to make sure to get the physical address so that the DMA API
+	 * knows what memory pages to flush the cache for.
+	 */
+	if (client->group) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(nvjpg->dev, virt, size, DMA_TO_DEVICE);
+
+		err = dma_mapping_error(nvjpg->dev, phys);
+		if (err < 0)
+			goto cleanup;
+
+		nvjpg->falcon.firmware.phys = phys;
+	}
+
+	return 0;
+
+cleanup:
+	if (!client->group)
+		dma_free_coherent(nvjpg->dev, size, virt, iova);
+	else
+		tegra_drm_free(tegra, size, virt, iova);
+
+	return err;
+}
+
+static __maybe_unused int nvjpg_runtime_resume(struct device *dev)
+{
+	struct nvjpg *nvjpg = dev_get_drvdata(dev);
+	int err;
+
+	err = clk_prepare_enable(nvjpg->clk);
+	if (err < 0)
+		return err;
+
+	err = nvjpg_load_falcon_firmware(nvjpg);
+	if (err < 0)
+		goto disable;
+
+	err = falcon_boot(&nvjpg->falcon);
+	if (err < 0)
+		goto disable;
+
+	return 0;
+
+disable:
+	clk_disable_unprepare(nvjpg->clk);
+	return err;
+}
+
+static __maybe_unused int nvjpg_runtime_suspend(struct device *dev)
+{
+	struct nvjpg *nvjpg = dev_get_drvdata(dev);
+
+	clk_disable_unprepare(nvjpg->clk);
+
+	return 0;
+}
+
+static int nvjpg_can_use_memory_ctx(struct tegra_drm_client *client, bool *supported)
+{
+	*supported = false;
+
+	return 0;
+}
+
+static const struct tegra_drm_client_ops nvjpg_ops = {
+	.get_streamid_offset = NULL,
+	.can_use_memory_ctx = nvjpg_can_use_memory_ctx,
+};
+#define NVIDIA_TEGRA_210_NVJPG_FIRMWARE "nvidia/tegra210/nvjpg.bin"
+
+static const struct nvjpg_config nvjpg_t210_config = {
+	.firmware = NVIDIA_TEGRA_210_NVJPG_FIRMWARE,
+	.version = 0x21,
+};
+
+static const struct of_device_id tegra_nvjpg_of_match[] = {
+	{ .compatible = "nvidia,tegra210-nvjpg", .data = &nvjpg_t210_config },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, tegra_nvjpg_of_match);
+
+static int nvjpg_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct nvjpg *nvjpg;
+	int err;
+
+	/* inherit DMA mask from host1x parent */
+	err = dma_coerce_mask_and_coherent(dev, *dev->parent->dma_mask);
+	if (err < 0) {
+		dev_err(&pdev->dev, "failed to set DMA mask: %d\n", err);
+		return err;
+	}
+
+	nvjpg = devm_kzalloc(dev, sizeof(*nvjpg), GFP_KERNEL);
+	if (!nvjpg)
+		return -ENOMEM;
+
+	nvjpg->config = of_device_get_match_data(dev);
+
+	nvjpg->regs = devm_platform_get_and_ioremap_resource(pdev, 0, NULL);
+	if (IS_ERR(nvjpg->regs))
+		return PTR_ERR(nvjpg->regs);
+
+	nvjpg->clk = devm_clk_get(dev, "nvjpg");
+	if (IS_ERR(nvjpg->clk)) {
+		dev_err(&pdev->dev, "failed to get clock\n");
+		return PTR_ERR(nvjpg->clk);
+	}
+
+	nvjpg->falcon.dev = dev;
+	nvjpg->falcon.regs = nvjpg->regs;
+
+	err = falcon_init(&nvjpg->falcon);
+	if (err < 0)
+		return err;
+
+	platform_set_drvdata(pdev, nvjpg);
+
+	INIT_LIST_HEAD(&nvjpg->client.base.list);
+	nvjpg->client.base.ops = &nvjpg_client_ops;
+	nvjpg->client.base.dev = dev;
+	nvjpg->client.base.class = HOST1X_CLASS_NVJPG;
+	nvjpg->dev = dev;
+
+	INIT_LIST_HEAD(&nvjpg->client.list);
+	nvjpg->client.version = nvjpg->config->version;
+	nvjpg->client.ops = &nvjpg_ops;
+
+	err = host1x_client_register(&nvjpg->client.base);
+	if (err < 0) {
+		dev_err(dev, "failed to register host1x client: %d\n", err);
+		goto exit_falcon;
+	}
+
+	pm_runtime_enable(dev);
+	pm_runtime_use_autosuspend(dev);
+	pm_runtime_set_autosuspend_delay(dev, 500);
+
+	return 0;
+
+exit_falcon:
+	falcon_exit(&nvjpg->falcon);
+
+	return err;
+}
+
+static void nvjpg_remove(struct platform_device *pdev)
+{
+	struct nvjpg *nvjpg = platform_get_drvdata(pdev);
+
+	pm_runtime_disable(&pdev->dev);
+	host1x_client_unregister(&nvjpg->client.base);
+	falcon_exit(&nvjpg->falcon);
+}
+
+static const struct dev_pm_ops nvjpg_pm_ops = {
+	SET_RUNTIME_PM_OPS(nvjpg_runtime_suspend, nvjpg_runtime_resume, NULL)
+	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+				pm_runtime_force_resume)
+};
+
+struct platform_driver tegra_nvjpg_driver = {
+	.driver = {
+		.name = "tegra-nvjpg",
+		.of_match_table = tegra_nvjpg_of_match,
+		.pm = &nvjpg_pm_ops
+	},
+	.probe = nvjpg_probe,
+	.remove_new = nvjpg_remove,
+};
+
+#if IS_ENABLED(CONFIG_ARCH_TEGRA_210_SOC)
+MODULE_FIRMWARE(NVIDIA_TEGRA_210_NVJPG_FIRMWARE);
+#endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 9c8119ed13a4..922867359b0e 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -18,6 +18,7 @@ enum host1x_class {
 	HOST1X_CLASS_GR2D_SB = 0x52,
 	HOST1X_CLASS_VIC = 0x5D,
 	HOST1X_CLASS_GR3D = 0x60,
+	HOST1X_CLASS_NVJPG = 0xC0,
 	HOST1X_CLASS_NVDEC = 0xF0,
 	HOST1X_CLASS_NVDEC1 = 0xF5,
 };
-- 
2.48.1


[-- Attachment #3: 0002-arm64-tegra-Add-NVJPG-power-domain-node.patch --]
[-- Type: text/x-patch, Size: 973 bytes --]

From b5c50cdf63f24d4979c5ea88481610575b503b61 Mon Sep 17 00:00:00 2001
From: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Date: Thu, 16 Nov 2023 17:29:24 +0000
Subject: [PATCH 2/3] arm64: tegra: Add NVJPG power-domain node

Add the NVJPG power-domain node in order to support the NVJPG
accelerator.

Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
---
 arch/arm64/boot/dts/nvidia/tegra210.dtsi | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
index 842669dac094..d96651d09d90 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
@@ -939,6 +939,12 @@ pd_xusbhost: xusbc {
 				resets = <&tegra_car TEGRA210_CLK_XUSB_HOST>;
 				#power-domain-cells = <0>;
 			};
+
+			pd_nvjpg: nvjpg {
+				clocks = <&tegra_car TEGRA210_CLK_NVJPG>;
+				resets = <&tegra_car 195>;
+				#power-domain-cells = <0>;
+			};
 		};
 	};
 
-- 
2.48.1


[-- Attachment #4: 0003-arm64-tegra-Add-NVJPG-node.patch --]
[-- Type: text/x-patch, Size: 1139 bytes --]

From d63e3a36f0c0e570e1bfa4467efe54da164a062d Mon Sep 17 00:00:00 2001
From: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Date: Thu, 16 Nov 2023 17:29:25 +0000
Subject: [PATCH 3/3] arm64: tegra: Add NVJPG node

The Tegra X1 chip contains a NVJPG accelerator capable of
encoding/decoding JPEG files in hardware, so add its DT node.

Signed-off-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
---
 arch/arm64/boot/dts/nvidia/tegra210.dtsi | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
index d96651d09d90..f957eefeae9f 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
@@ -253,7 +253,13 @@ vic@54340000 {
 		nvjpg@54380000 {
 			compatible = "nvidia,tegra210-nvjpg";
 			reg = <0x0 0x54380000 0x0 0x00040000>;
-			status = "disabled";
+			clocks = <&tegra_car TEGRA210_CLK_NVJPG>;
+			clock-names = "nvjpg";
+			resets = <&tegra_car 195>;
+			reset-names = "nvjpg";
+
+			iommus = <&mc TEGRA_SWGROUP_NVJPG>;
+			power-domains = <&pd_nvjpg>;
 		};
 
 		dsib: dsi@54400000 {
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-03 17:35   ` Diogo Ivo
@ 2025-02-03 17:43     ` Jason Gunthorpe
  2025-02-03 18:49       ` Robin Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2025-02-03 17:43 UTC (permalink / raw)
  To: Diogo Ivo
  Cc: thierry.reding, vdumpa, joro, will, robin.murphy, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu

> > How about this:
> > 
> > diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> > index be2ad7203d7b96..090b1fc97a7309 100644
> > --- a/drivers/gpu/host1x/dev.c
> > +++ b/drivers/gpu/host1x/dev.c
> > @@ -361,6 +361,10 @@ static bool host1x_wants_iommu(struct host1x *host1x)
> >   	return true;
> >   }
> > +/*
> > + * Returns ERR_PTR on failure, NULL if the translation is IDENTITY, otherwise a
> > + * valid paging domain.
> > + */
> >   static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
> >   {
> >   	struct iommu_domain *domain = iommu_get_domain_for_dev(host->dev);
> > @@ -385,6 +389,8 @@ static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
> >   	 * Similarly, if host1x is already attached to an IOMMU (via the DMA
> >   	 * API), don't try to attach again.
> >   	 */
> > +	if (domain && domain->type == IOMMU_DOMAIN_IDENTITY)
> > +		domain = NULL;
> >   	if (!host1x_wants_iommu(host) || domain)
> >   		return domain;
> > 
> > (if not can you investigate this function's flow compared to a good
> > kernel?)
> 
> Yes, this worked! Does this mean that with this change we go through the
> path of using the shared Tegra domain (for example in the driver I
> attached client->group == true), and if that is the case would it be
> beneficial for us to try and change tegra_smmu_def_domain_type() from
> returning IOMMU_DOMAIN_IDENTITY into IOMMU_DOMAIN_DMA so that the
> dma_alloc_* functions are called directly?

I do not know the answer those questions.. The whole rational around
this host 1x domain stuff is mysterious to me.

It does sound quite appealing for the implementation to use the dma
api instead of attaching its own special domain.

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-03 17:43     ` Jason Gunthorpe
@ 2025-02-03 18:49       ` Robin Murphy
  2025-02-03 19:13         ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2025-02-03 18:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Diogo Ivo
  Cc: thierry.reding, vdumpa, joro, will, jonathanh, baolu.lu, jsnitsel,
	jroedel, regressions, linux-tegra, iommu

On 2025-02-03 5:43 pm, Jason Gunthorpe wrote:
>>> How about this:
>>>
>>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
>>> index be2ad7203d7b96..090b1fc97a7309 100644
>>> --- a/drivers/gpu/host1x/dev.c
>>> +++ b/drivers/gpu/host1x/dev.c
>>> @@ -361,6 +361,10 @@ static bool host1x_wants_iommu(struct host1x *host1x)
>>>    	return true;
>>>    }
>>> +/*
>>> + * Returns ERR_PTR on failure, NULL if the translation is IDENTITY, otherwise a
>>> + * valid paging domain.
>>> + */
>>>    static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
>>>    {
>>>    	struct iommu_domain *domain = iommu_get_domain_for_dev(host->dev);
>>> @@ -385,6 +389,8 @@ static struct iommu_domain *host1x_iommu_attach(struct host1x *host)
>>>    	 * Similarly, if host1x is already attached to an IOMMU (via the DMA
>>>    	 * API), don't try to attach again.
>>>    	 */
>>> +	if (domain && domain->type == IOMMU_DOMAIN_IDENTITY)
>>> +		domain = NULL;
>>>    	if (!host1x_wants_iommu(host) || domain)
>>>    		return domain;
>>>
>>> (if not can you investigate this function's flow compared to a good
>>> kernel?)
>>
>> Yes, this worked! Does this mean that with this change we go through the
>> path of using the shared Tegra domain (for example in the driver I
>> attached client->group == true), and if that is the case would it be
>> beneficial for us to try and change tegra_smmu_def_domain_type() from
>> returning IOMMU_DOMAIN_IDENTITY into IOMMU_DOMAIN_DMA so that the
>> dma_alloc_* functions are called directly?

FWIW that would be better thought of as simply making the entire 
existence of tegra_smmu_def_domain_type() conditional on CONFIG_ARM. 
Definitely worth trying, at least.

> I do not know the answer those questions.. The whole rational around
> this host 1x domain stuff is mysterious to me.

This change (i.e. ignoring identity domains) should indeed lead to 
host1x allocating and using its own paging domain, which is indeed what 
you want in that case because otherwise the non-IOMMU DMA ops aren't 
going to be much use on their own for allocating/importing great big 
media buffers.

> It does sound quite appealing for the implementation to use the dma
> api instead of attaching its own special domain.

I'd hope the historical reasons for not supporting IOMMU_DOMAIN_DMA in 
tegra-smmu no longer apply, given that all the default domain stuff has 
now been integrated into host1x for the newer arm-smmu based Tegras. And 
I guess it also shows that nobody's been running those newer SoCs with 
IOMMU_DEFAULT_PASSTHROUGH either, or they presumably would have run in 
to this same issue...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-03 18:49       ` Robin Murphy
@ 2025-02-03 19:13         ` Jason Gunthorpe
  2025-02-04  3:26           ` Mikko Perttunen
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2025-02-03 19:13 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Diogo Ivo, thierry.reding, vdumpa, joro, will, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu

On Mon, Feb 03, 2025 at 06:49:07PM +0000, Robin Murphy wrote:
> I'd hope the historical reasons for not supporting IOMMU_DOMAIN_DMA in
> tegra-smmu no longer apply, given that all the default domain stuff has now
> been integrated into host1x for the newer arm-smmu based Tegras.

Indeed I do see appropriate looking calls to the normal DMA API, and
the other mapping path is conditionalized by !host->domain.

So, why didn't it work for Diogo? Even in identity mode the DMA API
will return correct DMA addresses and the !host->domain path will skip
mapping them.

Poking around I wonder if there is some assumption that if other parts
of the stack, maybe the DRM driver, are using the special domain than
everyone is? It seems to blindly pass around IOVA without really
checking who is consuming it.

Christian was telling me DMABUF had some drivers that made the
(incorrect!) assumption they were all sharing translation.

It does seem like a nice project for someone who has the hardware to
rip out all of this custom domain stuff and just have the iommu layer
setup a shared dma-iommu domain.

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-03 19:13         ` Jason Gunthorpe
@ 2025-02-04  3:26           ` Mikko Perttunen
  2025-02-04 13:21             ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Mikko Perttunen @ 2025-02-04  3:26 UTC (permalink / raw)
  To: Jason Gunthorpe, Robin Murphy
  Cc: Diogo Ivo, thierry.reding, vdumpa, joro, will, jonathanh,
	baolu.lu, jsnitsel, jroedel, regressions, linux-tegra, iommu

On 2/4/25 4:13 AM, Jason Gunthorpe wrote:
> On Mon, Feb 03, 2025 at 06:49:07PM +0000, Robin Murphy wrote:
>> I'd hope the historical reasons for not supporting IOMMU_DOMAIN_DMA in
>> tegra-smmu no longer apply, given that all the default domain stuff has now
>> been integrated into host1x for the newer arm-smmu based Tegras.
> 
> Indeed I do see appropriate looking calls to the normal DMA API, and
> the other mapping path is conditionalized by !host->domain.
> 
> So, why didn't it work for Diogo? Even in identity mode the DMA API
> will return correct DMA addresses and the !host->domain path will skip
> mapping them.
> 
> Poking around I wonder if there is some assumption that if other parts
> of the stack, maybe the DRM driver, are using the special domain than
> everyone is? It seems to blindly pass around IOVA without really
> checking who is consuming it.

I'm not sure where that would be, but it's certainly possible given that 
this combination of code paths wouldn't have been tested.

> 
> Christian was telling me DMABUF had some drivers that made the
> (incorrect!) assumption they were all sharing translation.
> 
> It does seem like a nice project for someone who has the hardware to
> rip out all of this custom domain stuff and just have the iommu layer
> setup a shared dma-iommu domain.
> 
> Jason
> 

This has been a long-standing project. The issue is that some boot 
chains set up the display expecting identity mappings, but if dma-iommu 
domains were enabled, they would enable the IOMMU without mappings 
before the display driver gets to run. Perhaps Thierry knows what the 
missing pieces are.

See 
https://lore.kernel.org/linux-iommu/20220512190052.1152377-5-thierry.reding@gmail.com/

Cheers,
Mikko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-04  3:26           ` Mikko Perttunen
@ 2025-02-04 13:21             ` Jason Gunthorpe
  2025-02-04 14:36               ` Thierry Reding
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2025-02-04 13:21 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Robin Murphy, Diogo Ivo, thierry.reding, vdumpa, joro, will,
	jonathanh, baolu.lu, jsnitsel, jroedel, regressions, linux-tegra,
	iommu

On Tue, Feb 04, 2025 at 12:26:46PM +0900, Mikko Perttunen wrote:
> On 2/4/25 4:13 AM, Jason Gunthorpe wrote:
> > On Mon, Feb 03, 2025 at 06:49:07PM +0000, Robin Murphy wrote:
> > > I'd hope the historical reasons for not supporting IOMMU_DOMAIN_DMA in
> > > tegra-smmu no longer apply, given that all the default domain stuff has now
> > > been integrated into host1x for the newer arm-smmu based Tegras.
> > 
> > Indeed I do see appropriate looking calls to the normal DMA API, and
> > the other mapping path is conditionalized by !host->domain.
> > 
> > So, why didn't it work for Diogo? Even in identity mode the DMA API
> > will return correct DMA addresses and the !host->domain path will skip
> > mapping them.
> > 
> > Poking around I wonder if there is some assumption that if other parts
> > of the stack, maybe the DRM driver, are using the special domain than
> > everyone is? It seems to blindly pass around IOVA without really
> > checking who is consuming it.
> 
> I'm not sure where that would be, but it's certainly possible given that
> this combination of code paths wouldn't have been tested.

I saw some weird stuff.. Like tegra_bo_pin() does:

	/*
	 * If we've manually mapped the buffer object through the IOMMU, make sure to return the
	 * existing IOVA address of our mapping.
	 */
	if (!obj->mm) {
	} else {
		map->phys = obj->iova;
		map->chunks = 1;

And obj->iova isn't obviously linked to a dma map on dev.. The comment
reads like it is making the assumption that there is only one iommu
domain shared by everyone (without checking dev is part of that
scheme)

> > Christian was telling me DMABUF had some drivers that made the
> > (incorrect!) assumption they were all sharing translation.
> > 
> > It does seem like a nice project for someone who has the hardware to
> > rip out all of this custom domain stuff and just have the iommu layer
> > setup a shared dma-iommu domain.
> 
> This has been a long-standing project. The issue is that some boot chains
> set up the display expecting identity mappings,

The smmu driver can match on compatible strings and leave some devices
in identity mode, if that helps. Other drivers are doing this to work
around various issues.

> See https://lore.kernel.org/linux-iommu/20220512190052.1152377-5-thierry.reding@gmail.com/

But using RMR seems like a better solution?

We could perhaps also figure out some way to leave the translation in
identity until the DRM driver completes the reset, then the DRM driver
could activate the DMA API translation manually.

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [REGRESSION] Invalid gather when using Tegra210 media engines
  2025-02-04 13:21             ` Jason Gunthorpe
@ 2025-02-04 14:36               ` Thierry Reding
  0 siblings, 0 replies; 9+ messages in thread
From: Thierry Reding @ 2025-02-04 14:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mikko Perttunen, Robin Murphy, Diogo Ivo, vdumpa, joro, will,
	jonathanh, baolu.lu, jsnitsel, jroedel, regressions, linux-tegra,
	iommu

[-- Attachment #1: Type: text/plain, Size: 6185 bytes --]

On Tue, Feb 04, 2025 at 09:21:12AM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2025 at 12:26:46PM +0900, Mikko Perttunen wrote:
> > On 2/4/25 4:13 AM, Jason Gunthorpe wrote:
> > > On Mon, Feb 03, 2025 at 06:49:07PM +0000, Robin Murphy wrote:
> > > > I'd hope the historical reasons for not supporting IOMMU_DOMAIN_DMA in
> > > > tegra-smmu no longer apply, given that all the default domain stuff has now
> > > > been integrated into host1x for the newer arm-smmu based Tegras.
> > > 
> > > Indeed I do see appropriate looking calls to the normal DMA API, and
> > > the other mapping path is conditionalized by !host->domain.
> > > 
> > > So, why didn't it work for Diogo? Even in identity mode the DMA API
> > > will return correct DMA addresses and the !host->domain path will skip
> > > mapping them.
> > > 
> > > Poking around I wonder if there is some assumption that if other parts
> > > of the stack, maybe the DRM driver, are using the special domain than
> > > everyone is? It seems to blindly pass around IOVA without really
> > > checking who is consuming it.
> > 
> > I'm not sure where that would be, but it's certainly possible given that
> > this combination of code paths wouldn't have been tested.
> 
> I saw some weird stuff.. Like tegra_bo_pin() does:
> 
> 	/*
> 	 * If we've manually mapped the buffer object through the IOMMU, make sure to return the
> 	 * existing IOVA address of our mapping.
> 	 */
> 	if (!obj->mm) {
> 	} else {
> 		map->phys = obj->iova;
> 		map->chunks = 1;
> 
> And obj->iova isn't obviously linked to a dma map on dev.. The comment
> reads like it is making the assumption that there is only one iommu
> domain shared by everyone (without checking dev is part of that
> scheme)

Yes, that's correct. We used to have a bit more code around to deal with
single domain (Tegra SMMU is supported on some platforms where we have a
maximum of 4 address spaces, hence we have to be very careful about what
devices share which address space).

In fact the host1x and DRM drivers also support Tegra20, which didn't
have an IOMMU at all, but one of those really old GARTs. On top of that
it only supported a 32 *M*iB window. I think others have concluded that
today we can't practically use GART and it's better to use something
like CMA for those cases. So maybe that's not an argument anymore.

Most of these restrictions no longer apply as of Tegra124 (I think) and
starting with Tegra186 the IOMMU is an ARM SMMU, so none of these work-
arounds should be needed there. Most of this code exists for very legacy
reasons, but there's still people with Tegra30, Tegra114, Tegra124 and
Tegra210, all of which are susceptible to this (at least partially).

> > > Christian was telling me DMABUF had some drivers that made the
> > > (incorrect!) assumption they were all sharing translation.
> > > 
> > > It does seem like a nice project for someone who has the hardware to
> > > rip out all of this custom domain stuff and just have the iommu layer
> > > setup a shared dma-iommu domain.
> > 
> > This has been a long-standing project. The issue is that some boot chains
> > set up the display expecting identity mappings,

I should revive an old patch series that attempted to get rid of most of
this. The plan was indeed to remove all of the direct IOMMU API code and
use DMA API exclusively. There were a couple of issues with it, such as
performance regressions due to the extra mappings required, but I think
I got pretty far in mitigating most of those.

In the end I got frustrated with it a few years back because there are
some corner cases that I couldn't make work and testing of this becomes
increasingly complicated. And yes, display was specifically an issue on
Tegra210 platforms (earlier platforms didn't usually set up display on
boot).

One additional problem with supporting the reserved-memory identity
mappings is that existing firmware doesn't set things up correctly and
these devices are software EOL'ed, so no new releases are planned that
could contain a fix for this.

> The smmu driver can match on compatible strings and leave some devices
> in identity mode, if that helps. Other drivers are doing this to work
> around various issues.

This might indeed be a viable path. Again, we need to be careful about
the older devices, but maybe a shared dma-iommu domain would work these
days.

We definitely want to transition to a DMA IOMMU domain later on, but it
would need to be after display has had a chance to set things up. And it
would still need a firmware that actually passes the right information
to DT because otherwise we'd still get SMMU faults from the display
hardware trying to scan out from memory that it doesn't own (or I guess
scanning out garbage from memory that's now being used for other
purposes).

> > See https://lore.kernel.org/linux-iommu/20220512190052.1152377-5-thierry.reding@gmail.com/
> 
> But using RMR seems like a better solution?

reserved-memory is the DT equivalent of ACPI's RMR, as far as I
understand. Or at least, it's the closest thing to it. We don't have
ACPI on these devices, hence why we do this via reserved-memory DT
nodes. But again, the problem is that firmware doesn't currently pass
this correctly to the kernel, so we'd need a new release for that. I'd
need to find out what can be done to achieve this. One big issue might
be that it'd be a DT ABI break, technically. So upstream Linux would
only work reliably using that new version. That might be fine if we can
make some sort of official release.

> We could perhaps also figure out some way to leave the translation in
> identity until the DRM driver completes the reset, then the DRM driver
> could activate the DMA API translation manually.

Yeah. We do something similar for Tegra186 and later where we have the
memory controller redirect accesses to the SMMU only after the device
has been attached. I'm not sure if we can do that for the old Tegra
SMMU, though. It would probably still require a firmware update to make
sure this can be properly handed off to the kernel.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-02-04 14:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-03 14:55 [REGRESSION] Invalid gather when using Tegra210 media engines Diogo Ivo
2025-02-03 17:06 ` Jason Gunthorpe
2025-02-03 17:35   ` Diogo Ivo
2025-02-03 17:43     ` Jason Gunthorpe
2025-02-03 18:49       ` Robin Murphy
2025-02-03 19:13         ` Jason Gunthorpe
2025-02-04  3:26           ` Mikko Perttunen
2025-02-04 13:21             ` Jason Gunthorpe
2025-02-04 14:36               ` Thierry Reding

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox