* [PATCH] dmaengine: idxd: fix double free in idxd_alloc() error path
From: Guangshuo Li @ 2026-04-01 9:40 UTC (permalink / raw)
To: Vinicius Costa Gomes, Dave Jiang, Vinod Koul, Shuai Xue,
Fenghua Yu, dmaengine, linux-kernel
Cc: Guangshuo Li, stable
When dev_set_name() fails after device_initialize(), idxd_alloc()
calls put_device(conf_dev).
For these devices, conf_dev->type is set from idxd->data->dev_type,
which resolves to dsa_device_type or iax_device_type, and both use
idxd_conf_device_release() as their release callback.
That release callback frees idxd, idxd->opcap_bmap, and releases
idxd->id, but the current error path then frees those resources again
directly, causing a double free.
Keep the cleanup in idxd_conf_device_release() after put_device() and
avoid freeing idxd-managed resources again in idxd_alloc().
Fixes: 46a5cca76c76 ("dmaengine: idxd: fix memory leak in error handling path of idxd_alloc")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
drivers/dma/idxd/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 4eff74182225..94ce52565e7a 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -635,7 +635,7 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
err_name:
put_device(conf_dev);
- bitmap_free(idxd->opcap_bmap);
+ return NULL;
err_opcap:
ida_free(&idxd_ida, idxd->id);
err_ida:
--
2.43.0
^ permalink raw reply related
* Re: [PATCH] dmaengine: xilinx_dma: Fix CPU stall in xilinx_dma_poll_timeout
From: Geert Uytterhoeven @ 2026-04-01 8:40 UTC (permalink / raw)
To: Alex Bereza
Cc: Gupta, Suraj, Vinod Koul, Frank Li, Michal Simek, Ulf Hansson,
Arnd Bergmann, Tony Lindgren, dmaengine, linux-arm-kernel,
linux-kernel
In-Reply-To: <DHHOCNHDN27K.RIE745OFAACD@bereza.email>
Hi Alex,
On Wed, 1 Apr 2026 at 10:27, Alex Bereza <alex@bereza.email> wrote:
> On Wed Apr 1, 2026 at 7:23 AM CEST, Suraj Gupta wrote:
> >> Rename XILINX_DMA_LOOP_COUNT to XILINX_DMA_POLL_TIMEOUT_US because the
> >> former is incorrect. It is a timeout value for polling various register
> >> bits in microseconds. It is not a loop count. Add a constant
> >> XILINX_DMA_POLL_DELAY_US for delay_us value.
> >
> > Please split this change in a new patch.
>
> Ok, will send a v2.
>
> >> Fixes: 7349a69cf312 ("iopoll: Do not use timekeeping in read_poll_timeout_atomic()")
> >
> > This patch doesn't fixes anything in iopoll, please use correct fixes tag.
Fixes-tag are also used as guidelines, to indicate which patches
are also needed when backporting something. I.e. if 7349a69cf312 is
ever backported, any other commits that contain "Fixes: 7349a69cf312"
should be backported, too. So having this Fixes-tag, in addition to
another xilinx_dma-specific one, sounds fine to me.
> Ok, but I'm not sure what would be the correct fixes tag then? I though I need to reference
> 7349a69cf312 in fixes tag because this is the actual change that surfaced the CPU stall issue that I
> want to fix in this driver. I'm fixing the call sites of xilinx_dma_poll_timeout but they were added
> in different commits. Should I add all of them? That would be the following then:
>
> Fixes: 9495f2648287 ("dmaengine: xilinx_vdma: Use readl_poll_timeout instead of do while loop's")
> Fixes: 676f9c26c330 ("dmaengine: xilinx: fix device_terminate_all() callback for AXI CDMA")
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [PATCH] dmaengine: xilinx_dma: Fix CPU stall in xilinx_dma_poll_timeout
From: Alex Bereza @ 2026-04-01 8:27 UTC (permalink / raw)
To: Gupta, Suraj, Alex Bereza, Vinod Koul, Frank Li, Michal Simek,
Geert Uytterhoeven, Ulf Hansson, Arnd Bergmann, Tony Lindgren
Cc: dmaengine, linux-arm-kernel, linux-kernel
In-Reply-To: <833bb42a-65b8-4c93-8109-d2959f8b807f@amd.com>
On Wed Apr 1, 2026 at 7:23 AM CEST, Suraj Gupta wrote:
>> Rename XILINX_DMA_LOOP_COUNT to XILINX_DMA_POLL_TIMEOUT_US because the
>> former is incorrect. It is a timeout value for polling various register
>> bits in microseconds. It is not a loop count. Add a constant
>> XILINX_DMA_POLL_DELAY_US for delay_us value.
>
> Please split this change in a new patch.
Ok, will send a v2.
>> Fixes: 7349a69cf312 ("iopoll: Do not use timekeeping in read_poll_timeout_atomic()")
>
> This patch doesn't fixes anything in iopoll, please use correct fixes tag.
Ok, but I'm not sure what would be the correct fixes tag then? I though I need to reference
7349a69cf312 in fixes tag because this is the actual change that surfaced the CPU stall issue that I
want to fix in this driver. I'm fixing the call sites of xilinx_dma_poll_timeout but they were added
in different commits. Should I add all of them? That would be the following then:
Fixes: 9495f2648287 ("dmaengine: xilinx_vdma: Use readl_poll_timeout instead of do while loop's")
Fixes: 676f9c26c330 ("dmaengine: xilinx: fix device_terminate_all() callback for AXI CDMA")
Three call sites with delay_us=0 were first introduced by 9495f2648287, then 676f9c26c330 added the
fourth call site when introducing xilinx_cdma_stop_transfer (probably copy paste from
xilinx_dma_stop_transfer). Would adding these two fixes tags be correct?
>> Hi, in addition to this patch I also have a question: what is the point
>> of atomically polling for the HALTED or IDLE bit in the stop_transfer
>> functions? Does device_terminate_all really need to be callable from
>> atomic context? If not, one could switch to polling non-atomically and
>> avoid burning CPU cycles.
>>
>
> dmaengine_terminate_async(), which directly calls device_terminate_all
> can be called from atomic context.
Right, thanks! Just for my understanding: I still think there is potential for improvement, because
from my understanding it would be beneficial to do the waiting for the bits in the status register
and the freeing of descriptors in xilinx_dma_synchronize. Do I understand correctly that this is
currently not possible due to how the DMA engine API is structured? To make this possible I think
the deprecated dmaengine_terminate_all would have to be removed and all users of this API would have
to be adapted accordingly, correct? So this would be a patch of much larger scope than xilinx_dma
driver alone.
^ permalink raw reply
* Re: [PATCH v3 1/5] dt-bindings: dmaengine: Add SpacemiT K3 DMA compatible string
From: Troy Mitchell @ 2026-04-01 6:44 UTC (permalink / raw)
To: Krzysztof Kozlowski, Troy Mitchell
Cc: Vinod Koul, Frank Li, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Yixun Lan, Guodong Xu, Michael Turquette,
Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, dmaengine, devicetree, linux-riscv, spacemit,
linux-kernel, linux-clk
In-Reply-To: <20260401-divergent-magenta-dalmatian-3c6c3e@quoll>
On Wed Apr 1, 2026 at 2:42 PM CST, Krzysztof Kozlowski wrote:
> On Tue, Mar 31, 2026 at 04:27:04PM +0800, Troy Mitchell wrote:
>> From: Guodong Xu <guodong@riscstar.com>
>> - Memory addressing capabilities: Unlike the K1 SoC, which had memory addressing
>> limitations (e.g., restricted to the 0-4GB space) and required a dedicated
>> dma-bus with dma-ranges to restrict memory allocations, the K3 DMA masters
>> possess full memory addressing capabilities.
>
> Programming interface is still compatible, regardless of memory
> addressing limitations, so that is rather incorrect reason.
I'll remove this item. Thanks.
- Troy
>
> Best regards,
> Krzysztof
^ permalink raw reply
* Re: [PATCH v3 1/5] dt-bindings: dmaengine: Add SpacemiT K3 DMA compatible string
From: Krzysztof Kozlowski @ 2026-04-01 6:42 UTC (permalink / raw)
To: Troy Mitchell
Cc: Vinod Koul, Frank Li, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Yixun Lan, Guodong Xu, Michael Turquette,
Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, dmaengine, devicetree, linux-riscv, spacemit,
linux-kernel, linux-clk
In-Reply-To: <20260331-k3-pdma-v3-1-a4e60dd8b4b3@linux.spacemit.com>
On Tue, Mar 31, 2026 at 04:27:04PM +0800, Troy Mitchell wrote:
> From: Guodong Xu <guodong@riscstar.com>
>
> Add the "spacemit,k3-pdma" compatible string for the SpacemiT K3 SoC.
>
> While the K3 PDMA IP reuses most of the design found on the earlier K1 SoC,
> a new compatible string is required due to the following hardware differences:
Please wrap commit message according to Linux coding style / submission
process (neither too early nor over the limit):
https://elixir.bootlin.com/linux/v6.4-rc1/source/Documentation/process/submitting-patches.rst#L597
>
> - Variable extended DRCMR base: The DRCMR (DMA Request/Command Register) base
> address for extended DMA request numbers (>= 64) differs from the K1
> implementation, requiring different driver ops.
Please do not mention drivers.
> - Memory addressing capabilities: Unlike the K1 SoC, which had memory addressing
> limitations (e.g., restricted to the 0-4GB space) and required a dedicated
> dma-bus with dma-ranges to restrict memory allocations, the K3 DMA masters
> possess full memory addressing capabilities.
Programming interface is still compatible, regardless of memory
addressing limitations, so that is rather incorrect reason.
Best regards,
Krzysztof
^ permalink raw reply
* Re: [PATCH] dmaengine: xilinx_dma: Fix CPU stall in xilinx_dma_poll_timeout
From: Gupta, Suraj @ 2026-04-01 5:23 UTC (permalink / raw)
To: Alex Bereza, Vinod Koul, Frank Li, Michal Simek,
Geert Uytterhoeven, Ulf Hansson, Arnd Bergmann, Tony Lindgren
Cc: dmaengine, linux-arm-kernel, linux-kernel
In-Reply-To: <20260331-fix-atomic-poll-timeout-regression-v1-1-5b7bd96eaca0@bereza.email>
On 3/31/2026 10:14 PM, Alex Bereza wrote:
> [You don't often get email from alex@bereza.email. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> Currently when calling xilinx_dma_poll_timeout with delay_us=0 and a
> condition that is never fulfilled, the CPU busy-waits for prolonged time
> and the timeout triggers only with a massive delay causing a CPU stall.
>
> This happens due to a huge underestimation of wall clock time in
> poll_timeout_us_atomic. Commit 7349a69cf312 ("iopoll: Do not use
> timekeeping in read_poll_timeout_atomic()") changed the behavior to no
> longer use ktime_get at the expense of underestimation of wall clock
> time which appears to be very large for delay_us=0. Instead of timing
> out after approximately XILINX_DMA_LOOP_COUNT microseconds, the timeout
> takes XILINX_DMA_LOOP_COUNT * 1000 * (time that the overhead of the for
> loop in poll_timeout_us_atomic takes) which is in the range of several
> minutes for XILINX_DMA_LOOP_COUNT=1000000. Fix this by using a non-zero
> value for delay_us. Use delay_us=10 to keep the delay in the hot path of
> starting DMA transfers minimal but still avoid CPU stalls in case of
> unexpected hardware failures.
>
> One-off measurement with delay_us=0 causes the cpu to busy wait around 7
> minutes in the timeout case. After applying this patch with delay_us=10
> the measured timeout was 1053428 microseconds which is roughly
> equivalent to the expected 1000000 microseconds specified in
> XILINX_DMA_POLL_TIMEOUT_US.
>
> Rename XILINX_DMA_LOOP_COUNT to XILINX_DMA_POLL_TIMEOUT_US because the
> former is incorrect. It is a timeout value for polling various register
> bits in microseconds. It is not a loop count. Add a constant
> XILINX_DMA_POLL_DELAY_US for delay_us value.
Please split this change in a new patch.
>
> Fixes: 7349a69cf312 ("iopoll: Do not use timekeeping in read_poll_timeout_atomic()")
This patch doesn't fixes anything in iopoll, please use correct fixes tag.
> Signed-off-by: Alex Bereza <alex@bereza.email>
> ---
> Hi, in addition to this patch I also have a question: what is the point
> of atomically polling for the HALTED or IDLE bit in the stop_transfer
> functions? Does device_terminate_all really need to be callable from
> atomic context? If not, one could switch to polling non-atomically and
> avoid burning CPU cycles.
>
dmaengine_terminate_async(), which directly calls device_terminate_all
can be called from atomic context.
Regards,
Suraj
> As this is my first patch, please feel free to point me in the right
> direction if I am missing anything.
> ---
> drivers/dma/xilinx/xilinx_dma.c | 26 ++++++++++++++++----------
> 1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
> index 02a05f215614..8556c357b665 100644
> --- a/drivers/dma/xilinx/xilinx_dma.c
> +++ b/drivers/dma/xilinx/xilinx_dma.c
> @@ -165,8 +165,10 @@
> #define XILINX_DMA_FLUSH_MM2S 2
> #define XILINX_DMA_FLUSH_BOTH 1
>
> -/* Delay loop counter to prevent hardware failure */
> -#define XILINX_DMA_LOOP_COUNT 1000000
> +/* Timeout for polling various registers */
> +#define XILINX_DMA_POLL_TIMEOUT_US 1000000
> +/* Delay between polls (avoid a delay of 0 to prevent CPU stalls) */
> +#define XILINX_DMA_POLL_DELAY_US 10
>
> /* AXI DMA Specific Registers/Offsets */
> #define XILINX_DMA_REG_SRCDSTADDR 0x18
> @@ -1332,8 +1334,9 @@ static int xilinx_dma_stop_transfer(struct xilinx_dma_chan *chan)
>
> /* Wait for the hardware to halt */
> return xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMASR, val,
> - val & XILINX_DMA_DMASR_HALTED, 0,
> - XILINX_DMA_LOOP_COUNT);
> + val & XILINX_DMA_DMASR_HALTED,
> + XILINX_DMA_POLL_DELAY_US,
> + XILINX_DMA_POLL_TIMEOUT_US);
> }
>
> /**
> @@ -1347,8 +1350,9 @@ static int xilinx_cdma_stop_transfer(struct xilinx_dma_chan *chan)
> u32 val;
>
> return xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMASR, val,
> - val & XILINX_DMA_DMASR_IDLE, 0,
> - XILINX_DMA_LOOP_COUNT);
> + val & XILINX_DMA_DMASR_IDLE,
> + XILINX_DMA_POLL_DELAY_US,
> + XILINX_DMA_POLL_TIMEOUT_US);
> }
>
> /**
> @@ -1364,8 +1368,9 @@ static void xilinx_dma_start(struct xilinx_dma_chan *chan)
>
> /* Wait for the hardware to start */
> err = xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMASR, val,
> - !(val & XILINX_DMA_DMASR_HALTED), 0,
> - XILINX_DMA_LOOP_COUNT);
> + !(val & XILINX_DMA_DMASR_HALTED),
> + XILINX_DMA_POLL_DELAY_US,
> + XILINX_DMA_POLL_TIMEOUT_US);
>
> if (err) {
> dev_err(chan->dev, "Cannot start channel %p: %x\n",
> @@ -1780,8 +1785,9 @@ static int xilinx_dma_reset(struct xilinx_dma_chan *chan)
>
> /* Wait for the hardware to finish reset */
> err = xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMACR, tmp,
> - !(tmp & XILINX_DMA_DMACR_RESET), 0,
> - XILINX_DMA_LOOP_COUNT);
> + !(tmp & XILINX_DMA_DMACR_RESET),
> + XILINX_DMA_POLL_DELAY_US,
> + XILINX_DMA_POLL_TIMEOUT_US);
>
> if (err) {
> dev_err(chan->dev, "reset timeout, cr %x, sr %x\n",
>
> ---
> base-commit: b7560798466a07d9c3fb011698e92c335ab28baf
> change-id: 20260330-fix-atomic-poll-timeout-regression-4f4e3baf3fd7
>
> Best regards,
> --
> Alex Bereza <alex@bereza.email>
>
>
^ permalink raw reply
* [PATCH] dmaengine: idxd: fix double free in idxd_setup_engines() error path
From: Guangshuo Li @ 2026-04-01 3:40 UTC (permalink / raw)
To: Vinicius Costa Gomes, Dave Jiang, Vinod Koul, Fenghua Yu,
Shuai Xue, dmaengine, linux-kernel
Cc: Guangshuo Li, stable
When an error happens after device_initialize(), idxd_setup_engines()
calls put_device(conf_dev).
The device release callback idxd_conf_engine_release() frees engine,
but the current error paths then call kfree(engine) again, causing a
double free.
Keep the cleanup in idxd_conf_engine_release() after put_device() and
avoid freeing engine again in idxd_setup_engines().
Fixes: 817bced19d1d ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_engines")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
drivers/dma/idxd/init.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index d9a9d56dd277..4eff74182225 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -310,7 +310,7 @@ static int idxd_setup_engines(struct idxd_device *idxd)
rc = dev_set_name(conf_dev, "engine%d.%d", idxd->id, engine->id);
if (rc < 0) {
put_device(conf_dev);
- kfree(engine);
+
goto err;
}
@@ -324,7 +324,7 @@ static int idxd_setup_engines(struct idxd_device *idxd)
engine = idxd->engines[i];
conf_dev = engine_confdev(engine);
put_device(conf_dev);
- kfree(engine);
+
}
kfree(idxd->engines);
@@ -374,7 +374,6 @@ static int idxd_setup_groups(struct idxd_device *idxd)
rc = dev_set_name(conf_dev, "group%d.%d", idxd->id, group->id);
if (rc < 0) {
put_device(conf_dev);
-
goto err;
}
@@ -399,7 +398,6 @@ static int idxd_setup_groups(struct idxd_device *idxd)
while (--i >= 0) {
group = idxd->groups[i];
put_device(group_confdev(group));
-
}
kfree(idxd->groups);
--
2.43.0
^ permalink raw reply related
* [PATCH] dmaengine: idxd: fix double free in idxd_setup_groups() error path
From: Guangshuo Li @ 2026-04-01 3:36 UTC (permalink / raw)
To: Vinicius Costa Gomes, Dave Jiang, Vinod Koul, Fenghua Yu,
Shuai Xue, dmaengine, linux-kernel
Cc: Guangshuo Li, stable
When an error happens after device_initialize(), idxd_setup_groups()
calls put_device(conf_dev).
The device release callback idxd_conf_group_release() frees group, but
the current error paths then call kfree(group) again, causing a double
free.
Keep the cleanup in idxd_conf_group_release() after put_device() and
avoid freeing group again in idxd_setup_groups().
Fixes: aa6f4f945b10 ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_groups")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
drivers/dma/idxd/init.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index b782eb3c191d..d9a9d56dd277 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -374,7 +374,7 @@ static int idxd_setup_groups(struct idxd_device *idxd)
rc = dev_set_name(conf_dev, "group%d.%d", idxd->id, group->id);
if (rc < 0) {
put_device(conf_dev);
- kfree(group);
+
goto err;
}
@@ -399,7 +399,7 @@ static int idxd_setup_groups(struct idxd_device *idxd)
while (--i >= 0) {
group = idxd->groups[i];
put_device(group_confdev(group));
- kfree(group);
+
}
kfree(idxd->groups);
--
2.43.0
^ permalink raw reply related
* [PATCH] dmaengine: idxd: fix double free in idxd_setup_wqs() error path
From: Guangshuo Li @ 2026-04-01 3:30 UTC (permalink / raw)
To: Vinicius Costa Gomes, Dave Jiang, Vinod Koul, Dan Carpenter,
dmaengine, linux-kernel
Cc: Guangshuo Li, stable
When an error happens after device_initialize(), idxd_setup_wqs()
calls put_device(conf_dev).
The device release callback idxd_conf_wq_release() frees wq,
wq->wqcfg, and wq->opcap_bmap, but the current error paths then free
them again directly, causing a double free.
Keep the cleanup in idxd_conf_wq_release() after put_device() and
avoid freeing those objects again in idxd_setup_wqs().
Fixes: 39aaa337449e7 ("dmaengine: idxd: Fix double free in idxd_setup_wqs()")
Cc: stable@vger.kernel.org
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
drivers/dma/idxd/init.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 2acc34b3daff..b782eb3c191d 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -212,7 +212,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
rc = dev_set_name(conf_dev, "wq%d.%d", idxd->id, wq->id);
if (rc < 0) {
put_device(conf_dev);
- kfree(wq);
+
goto err_unwind;
}
@@ -226,7 +226,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
wq->wqcfg = kzalloc_node(idxd->wqcfg_size, GFP_KERNEL, dev_to_node(dev));
if (!wq->wqcfg) {
put_device(conf_dev);
- kfree(wq);
+
rc = -ENOMEM;
goto err_unwind;
}
@@ -234,9 +234,9 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
if (idxd->hw.wq_cap.op_config) {
wq->opcap_bmap = bitmap_zalloc(IDXD_MAX_OPCAP_BITS, GFP_KERNEL);
if (!wq->opcap_bmap) {
- kfree(wq->wqcfg);
+
put_device(conf_dev);
- kfree(wq);
+
rc = -ENOMEM;
goto err_unwind;
}
@@ -252,12 +252,10 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
err_unwind:
while (--i >= 0) {
wq = idxd->wqs[i];
- if (idxd->hw.wq_cap.op_config)
- bitmap_free(wq->opcap_bmap);
- kfree(wq->wqcfg);
+
conf_dev = wq_confdev(wq);
put_device(conf_dev);
- kfree(wq);
+
}
bitmap_free(idxd->wq_enable_map);
--
2.43.0
^ permalink raw reply related
* Re: [PATCHv2] dmaengine: hsu: use kzalloc_flex()
From: Rosen Penev @ 2026-03-31 21:30 UTC (permalink / raw)
To: Andy Shevchenko
Cc: dmaengine, Andy Shevchenko, Vinod Koul, Frank Li, Kees Cook,
Gustavo A. R. Silva,
open list:INTEL MID (Mobile Internet Device) PLATFORM,
open list:KERNEL HARDENING (not covered by other areas):Keyword:b__counted_by(_le|_be)?b
In-Reply-To: <CAHp75Vdvn9n_qgBsXTBw8mRxdJcrmCi01JfAGz7oTkKQ1uXBmw@mail.gmail.com>
On Mon, Mar 30, 2026 at 9:29 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
> On Mon, Mar 30, 2026 at 11:41 PM Rosen Penev <rosenp@gmail.com> wrote:
> > On Mon, Mar 30, 2026 at 1:46 AM Andy Shevchenko
> > <andy.shevchenko@gmail.com> wrote:
> > > On Sat, Mar 28, 2026 at 9:17 PM Rosen Penev <rosenp@gmail.com> wrote:
>
> ...
>
> > > > - hsu = devm_kzalloc(chip->dev, sizeof(*hsu), GFP_KERNEL);
> > > > + /* Calculate nr_channels from the IO space length */
> > > > + nr_channels = (chip->length - chip->offset) / HSU_DMA_CHAN_LENGTH;
> > > > + hsu = devm_kzalloc(chip->dev, struct_size(hsu, chan, nr_channels), GFP_KERNEL);
> > > > if (!hsu)
> > > > return -ENOMEM;
> > > >
> > > > - chip->hsu = hsu;
> > > > -
> > > > - /* Calculate nr_channels from the IO space length */
> > > > - hsu->nr_channels = (chip->length - chip->offset) / HSU_DMA_CHAN_LENGTH;
> > > > + hsu->nr_channels = nr_channels;
> > > >
> > > > - hsu->chan = devm_kcalloc(chip->dev, hsu->nr_channels,
> > > > - sizeof(*hsu->chan), GFP_KERNEL);
> > > > - if (!hsu->chan)
> > > > - return -ENOMEM;
> > > > + chip->hsu = hsu;
> > >
> > > Don't know these _flex() APIs enough, but can we leave the chip->hsu =
> > > hsu; in the same place as it's now?
> > __counted_by requires the first assignment after allocation to be the
> > counting variable. The _flex macros do this automatically for GCC15
> > and above.
>
> Why? The hsu member has nothing to do with VLA, where is this
> requirement coming from? My understanding is that the check should
> imply the minimum sizeof of the data structure and the compiler should
> know that way before doing any allocations.
Not sure I follow. This patch changes hsu's chan member to a FAM.
Where is VLA coming from?
The current code is devm_kzalloc(x, struct_size()). When it gets
introduced, I'm sure there will be a treewide conversion to
devm_kzalloc_flex, which would automatically set the counting variable
for >=GCC15.
It's best practice to assign right after since kzalloc_flex does it anyways.
>
> My understanding seems in align with what Gustavo blogged:
> https://people.kernel.org/gustavoars/how-to-use-the-new-counted_by-attribute-in-c-and-linux
>
> The same is written in the GCC patch description
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653123.html
>
> --
> With Best Regards,
> Andy Shevchenko
^ permalink raw reply
* Re: [PATCH v6 00/10] Add GPCDMA support in Tegra264
From: Jon Hunter @ 2026-03-31 18:06 UTC (permalink / raw)
To: Akhil R, Vinod Koul, Frank Li, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Thierry Reding, Laxman Dewangan, Philipp Zabel,
dmaengine, devicetree, linux-tegra, linux-kernel
In-Reply-To: <20260331102303.33181-1-akhilrajeev@nvidia.com>
On 31/03/2026 11:22, Akhil R wrote:
> This series adds support for GPCDMA in Tegra264 with additional
> support for separate stream ID for each channel. Tegra264 GPCDMA
> controller has changes in the register offsets and uses 41-bit
> addressing for memory. Add changes in the tegra186-gpc-dma driver
> to support these.
>
> v5->v6:
> - Replace dev_err() with dev_err_probe() in the probe function for fixed
> return values also.
> v4->v5:
> - Use dev_err_probe() when returning error from the probe function.
> - Remove tegra194 and tegra234 compatible from the reset 'if' condition
> in the bindings as suggested in v2 (which I missed).
> v3->v4:
> - Split device tree changes to two patches.
> - Reordered patches to have fixes first.
> - Added fixes tag to dt-bindings and device tree changes.
> v2->v3:
> - Add description for iommu-map property and update commit descriptions.
> - Use enum for compatible string instead of const.
> - Remove unused registers from struct tegra_dma_channel_regs.
> - Use devm_of_dma_controller_register() to register the DMA controller.
> - Remove return value check for mask setting in the driver as the bitmask
> value is always greater than 32.
> v1->v2:
> - Fix dt_bindings_check warnings
> - Drop fallback compatible "nvidia,tegra186-gpcdma" from Tegra264 DT
> - Use dma_addr_t for sg_req src/dst fields and drop separate high_add
> variable and check for the addr_bits only when programming the
> registers.
> - Update address width to 39 bits for Tegra234 and before since the SMMU
> supports only up to 39 bits till Tegra234.
> - Add a patch to do managed DMA controller registration.
> - Describe the second iteration in the probe.
> - Update commit descriptions.
>
> Akhil R (10):
> dt-bindings: dma: nvidia,tegra186-gpc-dma: Make reset optional
> arm64: tegra: Remove fallback compatible for GPCDMA
> dt-bindings: dma: nvidia,tegra186-gpc-dma: Add iommu-map property
> dmaengine: tegra: Make reset control optional
> dmaengine: tegra: Use struct for register offsets
> dmaengine: tegra: Support address width > 39 bits
> dmaengine: tegra: Use managed DMA controller registration
> dmaengine: tegra: Use iommu-map for stream ID
> dmaengine: tegra: Add Tegra264 support
> arm64: tegra: Enable GPCDMA in Tegra264 and add iommu-map
>
> .../bindings/dma/nvidia,tegra186-gpc-dma.yaml | 32 +-
> .../arm64/boot/dts/nvidia/tegra264-p3834.dtsi | 4 +
> arch/arm64/boot/dts/nvidia/tegra264.dtsi | 3 +-
> drivers/dma/tegra186-gpc-dma.c | 429 +++++++++++-------
> 4 files changed, 284 insertions(+), 184 deletions(-)
>
For the series ...
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Thanks!
Jon
--
nvpublic
^ permalink raw reply
* [PATCH] dmaengine: xilinx_dma: Fix CPU stall in xilinx_dma_poll_timeout
From: Alex Bereza @ 2026-03-31 16:44 UTC (permalink / raw)
To: Vinod Koul, Frank Li, Michal Simek, Geert Uytterhoeven,
Ulf Hansson, Arnd Bergmann, Tony Lindgren
Cc: dmaengine, linux-arm-kernel, linux-kernel, Alex Bereza
Currently when calling xilinx_dma_poll_timeout with delay_us=0 and a
condition that is never fulfilled, the CPU busy-waits for prolonged time
and the timeout triggers only with a massive delay causing a CPU stall.
This happens due to a huge underestimation of wall clock time in
poll_timeout_us_atomic. Commit 7349a69cf312 ("iopoll: Do not use
timekeeping in read_poll_timeout_atomic()") changed the behavior to no
longer use ktime_get at the expense of underestimation of wall clock
time which appears to be very large for delay_us=0. Instead of timing
out after approximately XILINX_DMA_LOOP_COUNT microseconds, the timeout
takes XILINX_DMA_LOOP_COUNT * 1000 * (time that the overhead of the for
loop in poll_timeout_us_atomic takes) which is in the range of several
minutes for XILINX_DMA_LOOP_COUNT=1000000. Fix this by using a non-zero
value for delay_us. Use delay_us=10 to keep the delay in the hot path of
starting DMA transfers minimal but still avoid CPU stalls in case of
unexpected hardware failures.
One-off measurement with delay_us=0 causes the cpu to busy wait around 7
minutes in the timeout case. After applying this patch with delay_us=10
the measured timeout was 1053428 microseconds which is roughly
equivalent to the expected 1000000 microseconds specified in
XILINX_DMA_POLL_TIMEOUT_US.
Rename XILINX_DMA_LOOP_COUNT to XILINX_DMA_POLL_TIMEOUT_US because the
former is incorrect. It is a timeout value for polling various register
bits in microseconds. It is not a loop count. Add a constant
XILINX_DMA_POLL_DELAY_US for delay_us value.
Fixes: 7349a69cf312 ("iopoll: Do not use timekeeping in read_poll_timeout_atomic()")
Signed-off-by: Alex Bereza <alex@bereza.email>
---
Hi, in addition to this patch I also have a question: what is the point
of atomically polling for the HALTED or IDLE bit in the stop_transfer
functions? Does device_terminate_all really need to be callable from
atomic context? If not, one could switch to polling non-atomically and
avoid burning CPU cycles.
As this is my first patch, please feel free to point me in the right
direction if I am missing anything.
---
drivers/dma/xilinx/xilinx_dma.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 02a05f215614..8556c357b665 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -165,8 +165,10 @@
#define XILINX_DMA_FLUSH_MM2S 2
#define XILINX_DMA_FLUSH_BOTH 1
-/* Delay loop counter to prevent hardware failure */
-#define XILINX_DMA_LOOP_COUNT 1000000
+/* Timeout for polling various registers */
+#define XILINX_DMA_POLL_TIMEOUT_US 1000000
+/* Delay between polls (avoid a delay of 0 to prevent CPU stalls) */
+#define XILINX_DMA_POLL_DELAY_US 10
/* AXI DMA Specific Registers/Offsets */
#define XILINX_DMA_REG_SRCDSTADDR 0x18
@@ -1332,8 +1334,9 @@ static int xilinx_dma_stop_transfer(struct xilinx_dma_chan *chan)
/* Wait for the hardware to halt */
return xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMASR, val,
- val & XILINX_DMA_DMASR_HALTED, 0,
- XILINX_DMA_LOOP_COUNT);
+ val & XILINX_DMA_DMASR_HALTED,
+ XILINX_DMA_POLL_DELAY_US,
+ XILINX_DMA_POLL_TIMEOUT_US);
}
/**
@@ -1347,8 +1350,9 @@ static int xilinx_cdma_stop_transfer(struct xilinx_dma_chan *chan)
u32 val;
return xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMASR, val,
- val & XILINX_DMA_DMASR_IDLE, 0,
- XILINX_DMA_LOOP_COUNT);
+ val & XILINX_DMA_DMASR_IDLE,
+ XILINX_DMA_POLL_DELAY_US,
+ XILINX_DMA_POLL_TIMEOUT_US);
}
/**
@@ -1364,8 +1368,9 @@ static void xilinx_dma_start(struct xilinx_dma_chan *chan)
/* Wait for the hardware to start */
err = xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMASR, val,
- !(val & XILINX_DMA_DMASR_HALTED), 0,
- XILINX_DMA_LOOP_COUNT);
+ !(val & XILINX_DMA_DMASR_HALTED),
+ XILINX_DMA_POLL_DELAY_US,
+ XILINX_DMA_POLL_TIMEOUT_US);
if (err) {
dev_err(chan->dev, "Cannot start channel %p: %x\n",
@@ -1780,8 +1785,9 @@ static int xilinx_dma_reset(struct xilinx_dma_chan *chan)
/* Wait for the hardware to finish reset */
err = xilinx_dma_poll_timeout(chan, XILINX_DMA_REG_DMACR, tmp,
- !(tmp & XILINX_DMA_DMACR_RESET), 0,
- XILINX_DMA_LOOP_COUNT);
+ !(tmp & XILINX_DMA_DMACR_RESET),
+ XILINX_DMA_POLL_DELAY_US,
+ XILINX_DMA_POLL_TIMEOUT_US);
if (err) {
dev_err(chan->dev, "reset timeout, cr %x, sr %x\n",
---
base-commit: b7560798466a07d9c3fb011698e92c335ab28baf
change-id: 20260330-fix-atomic-poll-timeout-regression-4f4e3baf3fd7
Best regards,
--
Alex Bereza <alex@bereza.email>
^ permalink raw reply related
* Re: [PATCH] dmaengine: st_fdma: simplify allocation
From: Gustavo A. R. Silva @ 2026-03-31 15:33 UTC (permalink / raw)
To: Rosen Penev, dmaengine
Cc: Patrice Chotard, Vinod Koul, Frank Li, Kees Cook,
Gustavo A. R. Silva, moderated list:ARM/STI ARCHITECTURE,
open list,
open list:KERNEL HARDENING (not covered by other areas):Keyword:b__counted_by(_le|_be)?b
In-Reply-To: <20260330211555.13974-1-rosenp@gmail.com>
On 3/30/26 15:15, Rosen Penev wrote:
> Use a flexible array member to combine kzalloc and kcalloc to a single
> allocation.
>
> Add __counted_by for extra runtime analysis.
Assign counting variable
> after allocation as required by __counted_by.
This is misinformation and should be phrased differently[1]
-Gustavo
[1] https://lore.kernel.org/linux-hardening/37378f49-437f-438b-ad6c-d60480feb306@embeddedor.com/
>
> Signed-off-by: Rosen Penev <rosenp@gmail.com>
> ---
> drivers/dma/st_fdma.c | 27 ++++++++-------------------
> drivers/dma/st_fdma.h | 4 ++--
> 2 files changed, 10 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/dma/st_fdma.c b/drivers/dma/st_fdma.c
> index d9547017f3bd..3ec0d6731b8d 100644
> --- a/drivers/dma/st_fdma.c
> +++ b/drivers/dma/st_fdma.c
> @@ -710,16 +710,6 @@ static const struct of_device_id st_fdma_match[] = {
> };
> MODULE_DEVICE_TABLE(of, st_fdma_match);
>
> -static int st_fdma_parse_dt(struct platform_device *pdev,
> - const struct st_fdma_driverdata *drvdata,
> - struct st_fdma_dev *fdev)
> -{
> - snprintf(fdev->fw_name, FW_NAME_SIZE, "fdma_%s_%d.elf",
> - drvdata->name, drvdata->id);
> -
> - return of_property_read_u32(pdev->dev.of_node, "dma-channels",
> - &fdev->nr_channels);
> -}
> #define FDMA_DMA_BUSWIDTHS (BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) | \
> BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) | \
> BIT(DMA_SLAVE_BUSWIDTH_3_BYTES) | \
> @@ -742,27 +732,26 @@ static int st_fdma_probe(struct platform_device *pdev)
> struct st_fdma_dev *fdev;
> struct device_node *np = pdev->dev.of_node;
> const struct st_fdma_driverdata *drvdata;
> + u32 nr_channels;
> int ret, i;
>
> drvdata = device_get_match_data(&pdev->dev);
>
> - fdev = devm_kzalloc(&pdev->dev, sizeof(*fdev), GFP_KERNEL);
> - if (!fdev)
> - return -ENOMEM;
> -
> - ret = st_fdma_parse_dt(pdev, drvdata, fdev);
> + ret = of_property_read_u32(pdev->dev.of_node, "dma-channels", &nr_channels);
> if (ret) {
> dev_err(&pdev->dev, "unable to find platform data\n");
> - goto err;
> + return ret;
> }
>
> - fdev->chans = devm_kcalloc(&pdev->dev, fdev->nr_channels,
> - sizeof(struct st_fdma_chan), GFP_KERNEL);
> - if (!fdev->chans)
> + fdev = devm_kzalloc(&pdev->dev, struct_size(fdev, chans, nr_channels), GFP_KERNEL);
> + if (!fdev)
> return -ENOMEM;
>
> + fdev->nr_channels = nr_channels;
> fdev->dev = &pdev->dev;
> fdev->drvdata = drvdata;
> + snprintf(fdev->fw_name, FW_NAME_SIZE, "fdma_%s_%d.elf", drvdata->name, drvdata->id);
> +
> platform_set_drvdata(pdev, fdev);
>
> fdev->irq = platform_get_irq(pdev, 0);
> diff --git a/drivers/dma/st_fdma.h b/drivers/dma/st_fdma.h
> index f1e746f7bc7d..27ded555879f 100644
> --- a/drivers/dma/st_fdma.h
> +++ b/drivers/dma/st_fdma.h
> @@ -136,13 +136,13 @@ struct st_fdma_dev {
>
> int irq;
>
> - struct st_fdma_chan *chans;
> -
> spinlock_t dreq_lock;
> unsigned long dreq_mask;
>
> u32 nr_channels;
> char fw_name[FW_NAME_SIZE];
> +
> + struct st_fdma_chan chans[] __counted_by(nr_channels);
> };
>
> /* Peripheral Registers*/
^ permalink raw reply
* Re: [PATCH v2 4/4] dmaengine: dma-axi-dmac: Defer freeing DMA descriptors
From: Nuno Sá @ 2026-03-31 15:21 UTC (permalink / raw)
To: Frank Li
Cc: Nuno Sá, dmaengine, linux-kernel, Lars-Peter Clausen,
Vinod Koul, Frank Li, Eliza Balas
In-Reply-To: <acvXKYJkXID9qiqM@lizhi-Precision-Tower-5810>
On Tue, Mar 31, 2026 at 10:16:09AM -0400, Frank Li wrote:
> On Tue, Mar 31, 2026 at 09:53:45AM +0100, Nuno Sá wrote:
> > On Mon, Mar 30, 2026 at 11:24:34AM -0400, Frank Li wrote:
> > > On Fri, Mar 27, 2026 at 04:58:41PM +0000, Nuno Sá wrote:
> > > > From: Eliza Balas <eliza.balas@analog.com>
> > > >
> > > > This IP core can be used in architectures (like Microblaze) where DMA
> > > > descriptors are allocated with vmalloc().
> > >
> > > strage, why use vmalloc()?
> >
> > It's just one of the paths in dma_alloc_coherent(). It should be
> > architecture dependant.
>
> Which architectures, this may common problem, dma_alloc/free_coherent() is
> quite common at other dma-engine driver.
I'll double check this but I believe this was triggered on microblaze
where we also use this IP. Will come back with confirmation!
- Nuno Sá
>
> Frank
>
> >
> > - Nuno Sá
> >
> > >
> > > Frank
> > >
> > > > Hence, given that freeing the
> > > > descriptors happen in softirq context, vunmpap() will BUG().
> > > >
> > > > To solve the above, we setup a work item during allocation of the
> > > > descriptors and schedule in softirq context. Hence, the actual freeing
> > > > happens in threaded context.
> > > >
> > > > Also note that to account for the possible race where the struct axi_dmac
> > > > object is gone between scheduling the work and actually running it, we
> > > > now save and get a reference of struct device when allocating the
> > > > descriptor (given that's all we need in axi_dmac_free_desc()) and
> > > > release it in axi_dmac_free_desc().
> > > >
> > > > Signed-off-by: Eliza Balas <eliza.balas@analog.com>
> > > > Co-developed-by: Nuno Sá <nuno.sa@analog.com>
> > > > Signed-off-by: Nuno Sá <nuno.sa@analog.com>
> > > > ---
> > > > drivers/dma/dma-axi-dmac.c | 50 ++++++++++++++++++++++++++++++++++------------
> > > > 1 file changed, 37 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c
> > > > index 70d3ad7e7d37..46f1ead0c7d7 100644
> > > > --- a/drivers/dma/dma-axi-dmac.c
> > > > +++ b/drivers/dma/dma-axi-dmac.c
> > > > @@ -25,6 +25,7 @@
> > > > #include <linux/regmap.h>
> > > > #include <linux/slab.h>
> > > > #include <linux/spinlock.h>
> > > > +#include <linux/workqueue.h>
> > > >
> > > > #include <dt-bindings/dma/axi-dmac.h>
> > > >
> > > > @@ -133,6 +134,9 @@ struct axi_dmac_sg {
> > > > struct axi_dmac_desc {
> > > > struct virt_dma_desc vdesc;
> > > > struct axi_dmac_chan *chan;
> > > > + struct device *dev;
> > > > +
> > > > + struct work_struct sched_work;
> > > >
> > > > bool cyclic;
> > > > bool cyclic_eot;
> > > > @@ -666,6 +670,25 @@ static void axi_dmac_issue_pending(struct dma_chan *c)
> > > > spin_unlock_irqrestore(&chan->vchan.lock, flags);
> > > > }
> > > >
> > > > +static void axi_dmac_free_desc(struct axi_dmac_desc *desc)
> > > > +{
> > > > + struct axi_dmac_hw_desc *hw = desc->sg[0].hw;
> > > > + dma_addr_t hw_phys = desc->sg[0].hw_phys;
> > > > +
> > > > + dma_free_coherent(desc->dev, PAGE_ALIGN(desc->num_sgs * sizeof(*hw)),
> > > > + hw, hw_phys);
> > > > + put_device(desc->dev);
> > > > + kfree(desc);
> > > > +}
> > > > +
> > > > +static void axi_dmac_free_desc_schedule_work(struct work_struct *work)
> > > > +{
> > > > + struct axi_dmac_desc *desc = container_of(work, struct axi_dmac_desc,
> > > > + sched_work);
> > > > +
> > > > + axi_dmac_free_desc(desc);
> > > > +}
> > > > +
> > > > static struct axi_dmac_desc *
> > > > axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > > {
> > > > @@ -681,6 +704,7 @@ axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > > return NULL;
> > > > desc->num_sgs = num_sgs;
> > > > desc->chan = chan;
> > > > + desc->dev = get_device(dmac->dma_dev.dev);
> > > >
> > > > hws = dma_alloc_coherent(dev, PAGE_ALIGN(num_sgs * sizeof(*hws)),
> > > > &hw_phys, GFP_ATOMIC);
> > > > @@ -703,21 +727,18 @@ axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > > /* The last hardware descriptor will trigger an interrupt */
> > > > desc->sg[num_sgs - 1].hw->flags = AXI_DMAC_HW_FLAG_LAST | AXI_DMAC_HW_FLAG_IRQ;
> > > >
> > > > + /*
> > > > + * We need to setup a work item because this IP can be used on archs
> > > > + * that rely on vmalloced memory for descriptors. And given that freeing
> > > > + * the descriptors happens in softirq context, vunmpap() will BUG().
> > > > + * Hence, setup the worker so that we can queue it and free the
> > > > + * descriptor in threaded context.
> > > > + */
> > > > + INIT_WORK(&desc->sched_work, axi_dmac_free_desc_schedule_work);
> > > > +
> > > > return desc;
> > > > }
> > > >
> > > > -static void axi_dmac_free_desc(struct axi_dmac_desc *desc)
> > > > -{
> > > > - struct axi_dmac *dmac = chan_to_axi_dmac(desc->chan);
> > > > - struct device *dev = dmac->dma_dev.dev;
> > > > - struct axi_dmac_hw_desc *hw = desc->sg[0].hw;
> > > > - dma_addr_t hw_phys = desc->sg[0].hw_phys;
> > > > -
> > > > - dma_free_coherent(dev, PAGE_ALIGN(desc->num_sgs * sizeof(*hw)),
> > > > - hw, hw_phys);
> > > > - kfree(desc);
> > > > -}
> > > > -
> > > > static struct axi_dmac_sg *axi_dmac_fill_linear_sg(struct axi_dmac_chan *chan,
> > > > enum dma_transfer_direction direction, dma_addr_t addr,
> > > > unsigned int num_periods, unsigned int period_len,
> > > > @@ -958,7 +979,10 @@ static void axi_dmac_free_chan_resources(struct dma_chan *c)
> > > >
> > > > static void axi_dmac_desc_free(struct virt_dma_desc *vdesc)
> > > > {
> > > > - axi_dmac_free_desc(to_axi_dmac_desc(vdesc));
> > > > + struct axi_dmac_desc *desc = to_axi_dmac_desc(vdesc);
> > > > +
> > > > + /* See the comment in axi_dmac_alloc_desc() for the why! */
> > > > + schedule_work(&desc->sched_work);
> > > > }
> > > >
> > > > static bool axi_dmac_regmap_rdwr(struct device *dev, unsigned int reg)
> > > >
> > > > --
> > > > 2.53.0
> > > >
^ permalink raw reply
* [PATCH v2] dmaengine: loongson: loongson2-apb: fix broken bus width validation in ls2x_dmac_detect_burst()
From: David Carlier @ 2026-03-31 15:01 UTC (permalink / raw)
To: Frank Li, Binbin Zhou, Vinod Koul, Yingkun Meng; +Cc: dmaengine, David Carlier
In-Reply-To: <acqkrL7CYbr0WmHf@lizhi-Precision-Tower-5810>
The bus width validation check in ls2x_dmac_detect_burst() compares raw
enum dma_slave_buswidth values (e.g. 4, 8) directly against
LDMA_SLAVE_BUSWIDTHS, which is a BIT()-encoded bitmask
(BIT(4) | BIT(8) = 0x110). Since 4 & 0x110 == 0 and 8 & 0x110 == 0,
the condition is always false for valid bus widths, making the
validation dead code.
Additionally, the logic was inverted: it rejected configurations where
both widths matched valid values, rather than rejecting when neither
width is supported.
Fix by comparing directly against the supported enum values, avoiding
BIT() which would overflow for large dma_slave_buswidth values like
DMA_SLAVE_BUSWIDTH_128_BYTES. Invert the logic to reject when neither
width is supported by the hardware.
Fixes: 71e7d3cb6e55 ("dmaengine: ls2x-apb: New driver for the Loongson LS2X APB DMA controller")
Signed-off-by: David Carlier <devnexen@gmail.com>
---
drivers/dma/loongson/loongson2-apb-dma.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/loongson/loongson2-apb-dma.c b/drivers/dma/loongson/loongson2-apb-dma.c
index aceb069e71fc..d219f40cb0ae 100644
--- a/drivers/dma/loongson/loongson2-apb-dma.c
+++ b/drivers/dma/loongson/loongson2-apb-dma.c
@@ -220,8 +220,10 @@ static size_t ls2x_dmac_detect_burst(struct ls2x_dma_chan *lchan)
u32 maxburst, buswidth;
/* Reject definitely invalid configurations */
- if ((lchan->sconfig.src_addr_width & LDMA_SLAVE_BUSWIDTHS) &&
- (lchan->sconfig.dst_addr_width & LDMA_SLAVE_BUSWIDTHS))
+ if (lchan->sconfig.src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES &&
+ lchan->sconfig.src_addr_width != DMA_SLAVE_BUSWIDTH_8_BYTES &&
+ lchan->sconfig.dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES &&
+ lchan->sconfig.dst_addr_width != DMA_SLAVE_BUSWIDTH_8_BYTES)
return 0;
if (lchan->sconfig.direction == DMA_MEM_TO_DEV) {
--
2.53.0
^ permalink raw reply related
* Re: [PATCH] dmaengine: loongson: loongson2-apb: fix broken bus width validation in ls2x_dmac_detect_burst()
From: Frank Li @ 2026-03-31 14:23 UTC (permalink / raw)
To: David CARLIER; +Cc: Binbin Zhou, Vinod Koul, Frank Li, Yingkun Meng, dmaengine
In-Reply-To: <CA+XhMqyn_jTgQMpMT9n958qm=1m1bFG+hNjCeqm6eaGqRwU7+A@mail.gmail.com>
On Mon, Mar 30, 2026 at 05:57:44PM +0100, David CARLIER wrote:
> Hi,
>
> On Mon, 30 Mar 2026 at 17:28, Frank Li <Frank.li@nxp.com> wrote:
> >
> > On Wed, Mar 18, 2026 at 04:48:03PM +0000, David Carlier wrote:
> > > The bus width validation check in ls2x_dmac_detect_burst() compares raw
> > > enum dma_slave_buswidth values (e.g. 4, 8) directly against
> > > LDMA_SLAVE_BUSWIDTHS, which is a BIT()-encoded bitmask
> > > (BIT(4) | BIT(8) = 0x110). Since 4 & 0x110 == 0 and 8 & 0x110 == 0,
> > > the condition is always false for valid bus widths, making the
> > > validation dead code.
> > >
> > > Additionally, the logic was inverted: it rejected configurations where
> > > both widths matched valid values, rather than rejecting when neither
> > > width is supported.
> > >
> > > Fix by wrapping the enum values with BIT() before masking (matching the
> > > pattern used in sun6i-dma.c) and inverting the logic to reject when
> > > neither width is supported by the hardware.
> > >
> > > Fixes: 71e7d3cb6e55 ("dmaengine: ls2x-apb: New driver for the Loongson LS2X APB DMA controller")
> > > Signed-off-by: David Carlier <devnexen@gmail.com>
> > > ---
> > > drivers/dma/loongson/loongson2-apb-dma.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/dma/loongson/loongson2-apb-dma.c b/drivers/dma/loongson/loongson2-apb-dma.c
> > > index aceb069e71fc..102c01f993ef 100644
> > > --- a/drivers/dma/loongson/loongson2-apb-dma.c
> > > +++ b/drivers/dma/loongson/loongson2-apb-dma.c
> > > @@ -220,8 +220,8 @@ static size_t ls2x_dmac_detect_burst(struct ls2x_dma_chan *lchan)
> > > u32 maxburst, buswidth;
> > >
> > > /* Reject definitely invalid configurations */
> > > - if ((lchan->sconfig.src_addr_width & LDMA_SLAVE_BUSWIDTHS) &&
> > > - (lchan->sconfig.dst_addr_width & LDMA_SLAVE_BUSWIDTHS))
> > > + if (!(BIT(lchan->sconfig.src_addr_width) & LDMA_SLAVE_BUSWIDTHS) &&
> > > + !(BIT(lchan->sconfig.dst_addr_width) & LDMA_SLAVE_BUSWIDTHS))
> >
> > src_addr_width is enum dma_slave_buswidth, which allow
> > DMA_SLAVE_BUSWIDTH_128_BYTES = 128,
> >
> > BIT(128) will overflow.
>
> Thanks for the review Frank. You're right that BIT() would overflow
> for DMA_SLAVE_BUSWIDTH_128_BYTES. While this driver only supports
> 4-byte and 8-byte widths
> today, relying on BIT() for buswidth validation is fragile. I'll
> send a v2 that avoids BIT() altogether — would a direct comparison
> against the supported widths work
> for you, or do you have a preferred pattern in mind?
I think LDMA_SLAVE_BUSWIDTHS use enum dma_slave_buswidth OR together.
Frank
>
> >
> > Frank
> >
> > > return 0;
> > >
> > > if (lchan->sconfig.direction == DMA_MEM_TO_DEV) {
> > > --
> > > 2.53.0
> > >
^ permalink raw reply
* Re: [PATCH v2 3/4] dmaengine: dma-axi-dmac: fix use-after-free on unbind
From: Frank Li @ 2026-03-31 14:20 UTC (permalink / raw)
To: Nuno Sá
Cc: Nuno Sá, dmaengine, linux-kernel, Lars-Peter Clausen,
Vinod Koul, Frank Li
In-Reply-To: <acuI7MOEMmAgGwve@nsa>
On Tue, Mar 31, 2026 at 09:46:35AM +0100, Nuno Sá wrote:
> On Mon, Mar 30, 2026 at 11:22:10AM -0400, Frank Li wrote:
> > On Fri, Mar 27, 2026 at 04:58:40PM +0000, Nuno Sá wrote:
> > > The DMA device lifetime can extend beyond the platform driver unbind if
> > > DMA channels are still referenced by client drivers. This leads to
> > > use-after-free when the devm-managed memory is freed on unbind but the
> > > DMA device callbacks still access it.
> > >
> > > Fix this by:
> > > - Allocating axi_dmac with kzalloc_obj() instead of devm_kzalloc() so
> > > its lifetime is not tied to the platform device.
> > > - Implementing the device_release callback that so that we can free
> > > the object when reference count gets to 0 (no users).
> > > - Adding an 'unbound' flag protected by the vchan lock that is set
> > > during driver removal, preventing MMIO accesses after the device has been
> > > unbound.
> > >
> > > While at it, explicitly include spinlock.h given it was missing.
> > >
> > > Signed-off-by: Nuno Sá <nuno.sa@analog.com>
> > > ---
> >
> > Not sure if it similar with
> > https://lore.kernel.org/dmaengine/20250903-v6-16-topic-sdma-v1-9-ac7bab629e8b@pengutronix.de/
> >
> > It looks like miss device link between comsumer and provider.
>
> Well, it surely it's related. I mean, if we ensure the consumers are
> gone through devlinks and nothing is left behind, then this patch is basically unneeded.
>
> But, FWIW, my 2cents would also go into questioning if AUTOREMOVE is
> really want we want in every situation? Might be to harsh to assume that
> a DMA channel consumer is useless even if DMA is gone. Anyways, is there
> a v2 already? I would be interested in following this one...
This patch may be missed by vnod, I have asked vnod to check again. The
open question link to channel device or dma enginee device. I prefer link
to channel devices, so it support more advance's runtime pm management.
Frank
>
> - Nuno Sá
>
> >
> > Frank
> >
> > > drivers/dma/dma-axi-dmac.c | 70 +++++++++++++++++++++++++++++++++++++++-------
> > > 1 file changed, 60 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c
> > > index 127c3cf80a0e..70d3ad7e7d37 100644
> > > --- a/drivers/dma/dma-axi-dmac.c
> > > +++ b/drivers/dma/dma-axi-dmac.c
> > > @@ -24,6 +24,7 @@
> > > #include <linux/platform_device.h>
> > > #include <linux/regmap.h>
> > > #include <linux/slab.h>
> > > +#include <linux/spinlock.h>
> > >
> > > #include <dt-bindings/dma/axi-dmac.h>
> > >
> > > @@ -174,6 +175,8 @@ struct axi_dmac {
> > >
> > > struct dma_device dma_dev;
> > > struct axi_dmac_chan chan;
> > > +
> > > + bool unbound;
> > > };
> > >
> > > static struct axi_dmac *chan_to_axi_dmac(struct axi_dmac_chan *chan)
> > > @@ -182,6 +185,11 @@ static struct axi_dmac *chan_to_axi_dmac(struct axi_dmac_chan *chan)
> > > dma_dev);
> > > }
> > >
> > > +static struct axi_dmac *dev_to_axi_dmac(struct dma_device *dev)
> > > +{
> > > + return container_of(dev, struct axi_dmac, dma_dev);
> > > +}
> > > +
> > > static struct axi_dmac_chan *to_axi_dmac_chan(struct dma_chan *c)
> > > {
> > > return container_of(c, struct axi_dmac_chan, vchan.chan);
> > > @@ -614,7 +622,12 @@ static int axi_dmac_terminate_all(struct dma_chan *c)
> > > LIST_HEAD(head);
> > >
> > > spin_lock_irqsave(&chan->vchan.lock, flags);
> > > - axi_dmac_write(dmac, AXI_DMAC_REG_CTRL, 0);
> > > + /*
> > > + * Only allow the MMIO access if the device is live. Otherwise still
> > > + * go for freeing the descriptors.
> > > + */
> > > + if (!dmac->unbound)
> > > + axi_dmac_write(dmac, AXI_DMAC_REG_CTRL, 0);
> > > chan->next_desc = NULL;
> > > vchan_get_all_descriptors(&chan->vchan, &head);
> > > list_splice_tail_init(&chan->active_descs, &head);
> > > @@ -642,9 +655,12 @@ static void axi_dmac_issue_pending(struct dma_chan *c)
> > > if (chan->hw_sg)
> > > ctrl |= AXI_DMAC_CTRL_ENABLE_SG;
> > >
> > > - axi_dmac_write(dmac, AXI_DMAC_REG_CTRL, ctrl);
> > > -
> > > spin_lock_irqsave(&chan->vchan.lock, flags);
> > > + if (dmac->unbound) {
> > > + spin_unlock_irqrestore(&chan->vchan.lock, flags);
> > > + return;
> > > + }
> > > + axi_dmac_write(dmac, AXI_DMAC_REG_CTRL, ctrl);
> > > if (vchan_issue_pending(&chan->vchan))
> > > axi_dmac_start_transfer(chan);
> > > spin_unlock_irqrestore(&chan->vchan.lock, flags);
> > > @@ -1184,6 +1200,14 @@ static int axi_dmac_detect_caps(struct axi_dmac *dmac, unsigned int version)
> > > return 0;
> > > }
> > >
> > > +static void axi_dmac_release(struct dma_device *dma_dev)
> > > +{
> > > + struct axi_dmac *dmac = dev_to_axi_dmac(dma_dev);
> > > +
> > > + put_device(dma_dev->dev);
> > > + kfree(dmac);
> > > +}
> > > +
> > > static void axi_dmac_tasklet_kill(void *task)
> > > {
> > > tasklet_kill(task);
> > > @@ -1194,16 +1218,27 @@ static void axi_dmac_free_dma_controller(void *of_node)
> > > of_dma_controller_free(of_node);
> > > }
> > >
> > > +static void axi_dmac_disable(void *__dmac)
> > > +{
> > > + struct axi_dmac *dmac = __dmac;
> > > + unsigned long flags;
> > > +
> > > + spin_lock_irqsave(&dmac->chan.vchan.lock, flags);
> > > + dmac->unbound = true;
> > > + spin_unlock_irqrestore(&dmac->chan.vchan.lock, flags);
> > > + axi_dmac_write(dmac, AXI_DMAC_REG_CTRL, 0);
> > > +}
> > > +
> > > static int axi_dmac_probe(struct platform_device *pdev)
> > > {
> > > struct dma_device *dma_dev;
> > > - struct axi_dmac *dmac;
> > > + struct axi_dmac *__dmac;
> > > struct regmap *regmap;
> > > unsigned int version;
> > > u32 irq_mask = 0;
> > > int ret;
> > >
> > > - dmac = devm_kzalloc(&pdev->dev, sizeof(*dmac), GFP_KERNEL);
> > > + struct axi_dmac *dmac __free(kfree) = kzalloc_obj(struct axi_dmac);
> > > if (!dmac)
> > > return -ENOMEM;
> > >
> > > @@ -1251,6 +1286,7 @@ static int axi_dmac_probe(struct platform_device *pdev)
> > > dma_dev->dev = &pdev->dev;
> > > dma_dev->src_addr_widths = BIT(dmac->chan.src_width);
> > > dma_dev->dst_addr_widths = BIT(dmac->chan.dest_width);
> > > + dma_dev->device_release = axi_dmac_release;
> > > dma_dev->directions = BIT(dmac->chan.direction);
> > > dma_dev->residue_granularity = DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
> > > dma_dev->max_sg_burst = 31; /* 31 SGs maximum in one burst */
> > > @@ -1285,12 +1321,21 @@ static int axi_dmac_probe(struct platform_device *pdev)
> > > if (ret)
> > > return ret;
> > >
> > > + /*
> > > + * From this point on, our dmac object has it's lifetime bounded with
> > > + * dma_dev. Will be freed when dma_dev refcount goes to 0. That means,
> > > + * no more automatic kfree(). Also note that dmac is now NULL so we
> > > + * need __dmac.
> > > + */
> > > + __dmac = no_free_ptr(dmac);
> > > + get_device(&pdev->dev);
> > > +
> > > /*
> > > * Put the action in here so it get's done before unregistering the DMA
> > > * device.
> > > */
> > > ret = devm_add_action_or_reset(&pdev->dev, axi_dmac_tasklet_kill,
> > > - &dmac->chan.vchan.task);
> > > + &__dmac->chan.vchan.task);
> > > if (ret)
> > > return ret;
> > >
> > > @@ -1304,13 +1349,18 @@ static int axi_dmac_probe(struct platform_device *pdev)
> > > if (ret)
> > > return ret;
> > >
> > > - ret = devm_request_irq(&pdev->dev, dmac->irq, axi_dmac_interrupt_handler,
> > > - IRQF_SHARED, dev_name(&pdev->dev), dmac);
> > > + /* So that we can mark the device as unbound and disable it */
> > > + ret = devm_add_action_or_reset(&pdev->dev, axi_dmac_disable, __dmac);
> > > if (ret)
> > > return ret;
> > >
> > > - regmap = devm_regmap_init_mmio(&pdev->dev, dmac->base,
> > > - &axi_dmac_regmap_config);
> > > + ret = devm_request_irq(&pdev->dev, __dmac->irq, axi_dmac_interrupt_handler,
> > > + IRQF_SHARED, dev_name(&pdev->dev), __dmac);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + regmap = devm_regmap_init_mmio(&pdev->dev, __dmac->base,
> > > + &axi_dmac_regmap_config);
> > >
> > > return PTR_ERR_OR_ZERO(regmap);
> > > }
> > >
> > > --
> > > 2.53.0
> > >
^ permalink raw reply
* Re: [PATCH v3 4/5] clk: spacemit: k3: mark top_dclk as CLK_IS_CRITICAL
From: Brian Masney @ 2026-03-31 14:16 UTC (permalink / raw)
To: Troy Mitchell
Cc: Vinod Koul, Frank Li, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Yixun Lan, Guodong Xu, Michael Turquette,
Stephen Boyd, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, dmaengine, devicetree, linux-riscv, spacemit,
linux-kernel, linux-clk
In-Reply-To: <20260331-k3-pdma-v3-4-a4e60dd8b4b3@linux.spacemit.com>
On Tue, Mar 31, 2026 at 04:27:07PM +0800, Troy Mitchell wrote:
> top_dclk is the DDR bus clock. If it is gated by clk_disable_unused,
> all memory-mapped bus transactions cease to function, causing DMA
> engines to hang and general system instability.
>
> Mark it CLK_IS_CRITICAL so the CCF never gates it during the
> unused clock sweep.
>
> Fixes: e371a77255b8 ("clk: spacemit: k3: add the clock tree")
> Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Reviewed-by: Brian Masney <bmasney@redhat.com>
^ permalink raw reply
* Re: [PATCH v2 4/4] dmaengine: dma-axi-dmac: Defer freeing DMA descriptors
From: Frank Li @ 2026-03-31 14:16 UTC (permalink / raw)
To: Nuno Sá
Cc: Nuno Sá, dmaengine, linux-kernel, Lars-Peter Clausen,
Vinod Koul, Frank Li, Eliza Balas
In-Reply-To: <acuJ-Girr0ozQHh2@nsa>
On Tue, Mar 31, 2026 at 09:53:45AM +0100, Nuno Sá wrote:
> On Mon, Mar 30, 2026 at 11:24:34AM -0400, Frank Li wrote:
> > On Fri, Mar 27, 2026 at 04:58:41PM +0000, Nuno Sá wrote:
> > > From: Eliza Balas <eliza.balas@analog.com>
> > >
> > > This IP core can be used in architectures (like Microblaze) where DMA
> > > descriptors are allocated with vmalloc().
> >
> > strage, why use vmalloc()?
>
> It's just one of the paths in dma_alloc_coherent(). It should be
> architecture dependant.
Which architectures, this may common problem, dma_alloc/free_coherent() is
quite common at other dma-engine driver.
Frank
>
> - Nuno Sá
>
> >
> > Frank
> >
> > > Hence, given that freeing the
> > > descriptors happen in softirq context, vunmpap() will BUG().
> > >
> > > To solve the above, we setup a work item during allocation of the
> > > descriptors and schedule in softirq context. Hence, the actual freeing
> > > happens in threaded context.
> > >
> > > Also note that to account for the possible race where the struct axi_dmac
> > > object is gone between scheduling the work and actually running it, we
> > > now save and get a reference of struct device when allocating the
> > > descriptor (given that's all we need in axi_dmac_free_desc()) and
> > > release it in axi_dmac_free_desc().
> > >
> > > Signed-off-by: Eliza Balas <eliza.balas@analog.com>
> > > Co-developed-by: Nuno Sá <nuno.sa@analog.com>
> > > Signed-off-by: Nuno Sá <nuno.sa@analog.com>
> > > ---
> > > drivers/dma/dma-axi-dmac.c | 50 ++++++++++++++++++++++++++++++++++------------
> > > 1 file changed, 37 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c
> > > index 70d3ad7e7d37..46f1ead0c7d7 100644
> > > --- a/drivers/dma/dma-axi-dmac.c
> > > +++ b/drivers/dma/dma-axi-dmac.c
> > > @@ -25,6 +25,7 @@
> > > #include <linux/regmap.h>
> > > #include <linux/slab.h>
> > > #include <linux/spinlock.h>
> > > +#include <linux/workqueue.h>
> > >
> > > #include <dt-bindings/dma/axi-dmac.h>
> > >
> > > @@ -133,6 +134,9 @@ struct axi_dmac_sg {
> > > struct axi_dmac_desc {
> > > struct virt_dma_desc vdesc;
> > > struct axi_dmac_chan *chan;
> > > + struct device *dev;
> > > +
> > > + struct work_struct sched_work;
> > >
> > > bool cyclic;
> > > bool cyclic_eot;
> > > @@ -666,6 +670,25 @@ static void axi_dmac_issue_pending(struct dma_chan *c)
> > > spin_unlock_irqrestore(&chan->vchan.lock, flags);
> > > }
> > >
> > > +static void axi_dmac_free_desc(struct axi_dmac_desc *desc)
> > > +{
> > > + struct axi_dmac_hw_desc *hw = desc->sg[0].hw;
> > > + dma_addr_t hw_phys = desc->sg[0].hw_phys;
> > > +
> > > + dma_free_coherent(desc->dev, PAGE_ALIGN(desc->num_sgs * sizeof(*hw)),
> > > + hw, hw_phys);
> > > + put_device(desc->dev);
> > > + kfree(desc);
> > > +}
> > > +
> > > +static void axi_dmac_free_desc_schedule_work(struct work_struct *work)
> > > +{
> > > + struct axi_dmac_desc *desc = container_of(work, struct axi_dmac_desc,
> > > + sched_work);
> > > +
> > > + axi_dmac_free_desc(desc);
> > > +}
> > > +
> > > static struct axi_dmac_desc *
> > > axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > {
> > > @@ -681,6 +704,7 @@ axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > return NULL;
> > > desc->num_sgs = num_sgs;
> > > desc->chan = chan;
> > > + desc->dev = get_device(dmac->dma_dev.dev);
> > >
> > > hws = dma_alloc_coherent(dev, PAGE_ALIGN(num_sgs * sizeof(*hws)),
> > > &hw_phys, GFP_ATOMIC);
> > > @@ -703,21 +727,18 @@ axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > /* The last hardware descriptor will trigger an interrupt */
> > > desc->sg[num_sgs - 1].hw->flags = AXI_DMAC_HW_FLAG_LAST | AXI_DMAC_HW_FLAG_IRQ;
> > >
> > > + /*
> > > + * We need to setup a work item because this IP can be used on archs
> > > + * that rely on vmalloced memory for descriptors. And given that freeing
> > > + * the descriptors happens in softirq context, vunmpap() will BUG().
> > > + * Hence, setup the worker so that we can queue it and free the
> > > + * descriptor in threaded context.
> > > + */
> > > + INIT_WORK(&desc->sched_work, axi_dmac_free_desc_schedule_work);
> > > +
> > > return desc;
> > > }
> > >
> > > -static void axi_dmac_free_desc(struct axi_dmac_desc *desc)
> > > -{
> > > - struct axi_dmac *dmac = chan_to_axi_dmac(desc->chan);
> > > - struct device *dev = dmac->dma_dev.dev;
> > > - struct axi_dmac_hw_desc *hw = desc->sg[0].hw;
> > > - dma_addr_t hw_phys = desc->sg[0].hw_phys;
> > > -
> > > - dma_free_coherent(dev, PAGE_ALIGN(desc->num_sgs * sizeof(*hw)),
> > > - hw, hw_phys);
> > > - kfree(desc);
> > > -}
> > > -
> > > static struct axi_dmac_sg *axi_dmac_fill_linear_sg(struct axi_dmac_chan *chan,
> > > enum dma_transfer_direction direction, dma_addr_t addr,
> > > unsigned int num_periods, unsigned int period_len,
> > > @@ -958,7 +979,10 @@ static void axi_dmac_free_chan_resources(struct dma_chan *c)
> > >
> > > static void axi_dmac_desc_free(struct virt_dma_desc *vdesc)
> > > {
> > > - axi_dmac_free_desc(to_axi_dmac_desc(vdesc));
> > > + struct axi_dmac_desc *desc = to_axi_dmac_desc(vdesc);
> > > +
> > > + /* See the comment in axi_dmac_alloc_desc() for the why! */
> > > + schedule_work(&desc->sched_work);
> > > }
> > >
> > > static bool axi_dmac_regmap_rdwr(struct device *dev, unsigned int reg)
> > >
> > > --
> > > 2.53.0
> > >
^ permalink raw reply
* Re: [PATCH v6 08/10] dmaengine: tegra: Use iommu-map for stream ID
From: Frank Li @ 2026-03-31 14:12 UTC (permalink / raw)
To: Akhil R
Cc: Vinod Koul, Frank Li, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Thierry Reding, Jonathan Hunter, Laxman Dewangan,
Philipp Zabel, dmaengine, devicetree, linux-tegra, linux-kernel
In-Reply-To: <20260331102303.33181-9-akhilrajeev@nvidia.com>
On Tue, Mar 31, 2026 at 03:53:01PM +0530, Akhil R wrote:
> Use 'iommu-map', when provided, to get the stream ID to be programmed
> for each channel. Iterate over the channels registered and configure
> each channel device separately using of_dma_configure_id() to allow
> it to use a separate IOMMU domain for the transfer. However, do this
> in a second loop since the first loop populates the DMA device channels
> list and async_device_register() registers the channels. Both are
> prerequisites for using the channel device in the next loop.
>
> Channels will continue to use the same global stream ID if the
> 'iommu-map' property is not present in the device tree.
>
> Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
> ---
Reviewed-by: Frank Li <Frank.Li@nxp.com>
> drivers/dma/tegra186-gpc-dma.c | 53 ++++++++++++++++++++++++++++------
> 1 file changed, 44 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/dma/tegra186-gpc-dma.c b/drivers/dma/tegra186-gpc-dma.c
> index 9bea2ffb3b9e..cd480d047204 100644
> --- a/drivers/dma/tegra186-gpc-dma.c
> +++ b/drivers/dma/tegra186-gpc-dma.c
> @@ -15,6 +15,7 @@
> #include <linux/module.h>
> #include <linux/of.h>
> #include <linux/of_dma.h>
> +#include <linux/of_device.h>
> #include <linux/platform_device.h>
> #include <linux/reset.h>
> #include <linux/slab.h>
> @@ -1380,9 +1381,13 @@ static int tegra_dma_program_sid(struct tegra_dma_channel *tdc, int stream_id)
> static int tegra_dma_probe(struct platform_device *pdev)
> {
> const struct tegra_dma_chip_data *cdata = NULL;
> + struct tegra_dma_channel *tdc;
> + struct tegra_dma *tdma;
> + struct dma_chan *chan;
> + struct device *chdev;
> + bool use_iommu_map = false;
> unsigned int i;
> u32 stream_id;
> - struct tegra_dma *tdma;
> int ret;
>
> cdata = of_device_get_match_data(&pdev->dev);
> @@ -1410,9 +1415,10 @@ static int tegra_dma_probe(struct platform_device *pdev)
>
> tdma->dma_dev.dev = &pdev->dev;
>
> - if (!tegra_dev_iommu_get_stream_id(&pdev->dev, &stream_id)) {
> - dev_err(&pdev->dev, "Missing iommu stream-id\n");
> - return -EINVAL;
> + use_iommu_map = of_property_present(pdev->dev.of_node, "iommu-map");
> + if (!use_iommu_map) {
> + if (!tegra_dev_iommu_get_stream_id(&pdev->dev, &stream_id))
> + return dev_err_probe(&pdev->dev, -EINVAL, "Missing iommu stream-id\n");
> }
>
> ret = device_property_read_u32(&pdev->dev, "dma-channel-mask",
> @@ -1424,9 +1430,10 @@ static int tegra_dma_probe(struct platform_device *pdev)
> tdma->chan_mask = TEGRA_GPCDMA_DEFAULT_CHANNEL_MASK;
> }
>
> + /* Initialize vchan for each channel and populate the channels list */
> INIT_LIST_HEAD(&tdma->dma_dev.channels);
> for (i = 0; i < cdata->nr_channels; i++) {
> - struct tegra_dma_channel *tdc = &tdma->channels[i];
> + tdc = &tdma->channels[i];
>
> /* Check for channel mask */
> if (!(tdma->chan_mask & BIT(i)))
> @@ -1446,10 +1453,6 @@ static int tegra_dma_probe(struct platform_device *pdev)
>
> vchan_init(&tdc->vc, &tdma->dma_dev);
> tdc->vc.desc_free = tegra_dma_desc_free;
> -
> - /* program stream-id for this channel */
> - tegra_dma_program_sid(tdc, stream_id);
> - tdc->stream_id = stream_id;
> }
>
> dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(cdata->addr_bits));
> @@ -1483,6 +1486,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
> tdma->dma_dev.device_synchronize = tegra_dma_chan_synchronize;
> tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
>
> + /* Register the DMA device and the channels */
> ret = dmaenginem_async_device_register(&tdma->dma_dev);
> if (ret < 0) {
> dev_err_probe(&pdev->dev, ret,
> @@ -1490,6 +1494,37 @@ static int tegra_dma_probe(struct platform_device *pdev)
> return ret;
> }
>
> + /*
> + * Configure stream ID for each channel from the channels registered
> + * above. This is done in a separate iteration to ensure that only
> + * the channels available and registered for the DMA device are used.
> + */
> + list_for_each_entry(chan, &tdma->dma_dev.channels, device_node) {
> + chdev = &chan->dev->device;
> + tdc = to_tegra_dma_chan(chan);
> +
> + if (use_iommu_map) {
> + chdev->bus = pdev->dev.bus;
> + dma_coerce_mask_and_coherent(chdev, DMA_BIT_MASK(cdata->addr_bits));
> +
> + ret = of_dma_configure_id(chdev, pdev->dev.of_node,
> + true, &tdc->id);
> + if (ret)
> + return dev_err_probe(chdev, ret,
> + "Failed to configure IOMMU for channel %d\n", tdc->id);
> +
> + if (!tegra_dev_iommu_get_stream_id(chdev, &stream_id))
> + return dev_err_probe(chdev, -EINVAL,
> + "Failed to get stream ID for channel %d\n", tdc->id);
> +
> + chan->dev->chan_dma_dev = true;
> + }
> +
> + /* program stream-id for this channel */
> + tegra_dma_program_sid(tdc, stream_id);
> + tdc->stream_id = stream_id;
> + }
> +
> ret = devm_of_dma_controller_register(&pdev->dev, pdev->dev.of_node,
> tegra_dma_of_xlate, tdma);
> if (ret < 0) {
> --
> 2.50.1
>
^ permalink raw reply
* [PATCH v6 4/4] i2c: qcom-geni: Support multi-owner controllers in GPI mode
From: Mukesh Kumar Savaliya @ 2026-03-31 11:47 UTC (permalink / raw)
To: viken.dadhaniya, andi.shyti, robh, krzk+dt, conor+dt, vkoul,
Frank.Li, andersson, konradybcio, dmitry.baryshkov, linmq006,
quic_jseerapu, agross, linux-arm-msm, linux-i2c, devicetree,
linux-kernel, dmaengine
Cc: krzysztof.kozlowski, bartosz.golaszewski, bjorn.andersson,
konrad.dybcio, Mukesh Kumar Savaliya
In-Reply-To: <20260331114742.2896317-1-mukesh.savaliya@oss.qualcomm.com>
Some platforms use a QUP-based I2C controller in a configuration where the
controller is shared with another system processor. In this setup the
operating system must not assume exclusive ownership of the controller or
its associated pins.
Add support for enabling multi-owner operation when DeviceTree specifies
qcom,qup-multi-owner. When enabled, mark the underlying serial engine as
shared so the common GENI resource handling avoids selecting the "sleep"
pinctrl state, which could disrupt transfers initiated by the other
processor.
For GPI mode transfers, request lock/unlock TRE sequencing from the GPI
driver by setting a single lock_action selector per message, emitting lock
before the first message and unlock after the last message (handling the
single-message case as well). This serializes access to the shared
controller without requiring message-position flags to be passed into the
DMA engine layer.
Signed-off-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
---
drivers/i2c/busses/i2c-qcom-geni.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index ae609bdd2ec4..1925e4ec9842 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -815,6 +815,14 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
if (i < num - 1)
peripheral.stretch = 1;
+ peripheral.lock_action = GPI_LOCK_NONE;
+ if (gi2c->se.multi_owner) {
+ if (i == 0)
+ peripheral.lock_action = GPI_LOCK_ACQUIRE;
+ else if (i == num - 1)
+ peripheral.lock_action = GPI_LOCK_RELEASE;
+ }
+
peripheral.addr = msgs[i].addr;
if (i > 0 && (!(msgs[i].flags & I2C_M_RD)))
peripheral.multi_msg = false;
@@ -1014,6 +1022,17 @@ static int geni_i2c_probe(struct platform_device *pdev)
gi2c->clk_freq_out = I2C_MAX_STANDARD_MODE_FREQ;
}
+ if (of_property_read_bool(pdev->dev.of_node, "qcom,qup-multi-owner")) {
+ /*
+ * Multi-owner controller configuration: the controller may be
+ * used by another system processor. Mark the SE as shared so
+ * common GENI resource handling can avoid pin state changes
+ * that would disrupt the other user.
+ */
+ gi2c->se.multi_owner = true;
+ dev_dbg(&pdev->dev, "I2C controller is shared with another system processor\n");
+ }
+
if (has_acpi_companion(dev))
ACPI_COMPANION_SET(&gi2c->adap.dev, ACPI_COMPANION(dev));
@@ -1089,7 +1108,9 @@ static int geni_i2c_probe(struct platform_device *pdev)
}
if (fifo_disable) {
- /* FIFO is disabled, so we can only use GPI DMA */
+ /* FIFO is disabled, so we can only use GPI DMA.
+ * SE can be shared in GSI mode between subsystems, each SS owns a GPII.
+ */
gi2c->gpi_mode = true;
ret = setup_gpi_dma(gi2c);
if (ret)
@@ -1098,6 +1119,10 @@ static int geni_i2c_probe(struct platform_device *pdev)
dev_dbg(dev, "Using GPI DMA mode for I2C\n");
} else {
gi2c->gpi_mode = false;
+
+ if (gi2c->se.multi_owner)
+ dev_err_probe(dev, -EINVAL, "I2C sharing not supported in non GSI mode\n");
+
tx_depth = geni_se_get_tx_fifo_depth(&gi2c->se);
/* I2C Master Hub Serial Elements doesn't have the HW_PARAM_0 register */
--
2.25.1
^ permalink raw reply related
* [PATCH v6 3/4] soc: qcom: geni-se: Keep pinctrl active for multi-owner controllers
From: Mukesh Kumar Savaliya @ 2026-03-31 11:47 UTC (permalink / raw)
To: viken.dadhaniya, andi.shyti, robh, krzk+dt, conor+dt, vkoul,
Frank.Li, andersson, konradybcio, dmitry.baryshkov, linmq006,
quic_jseerapu, agross, linux-arm-msm, linux-i2c, devicetree,
linux-kernel, dmaengine
Cc: krzysztof.kozlowski, bartosz.golaszewski, bjorn.andersson,
konrad.dybcio, Mukesh Kumar Savaliya
In-Reply-To: <20260331114742.2896317-1-mukesh.savaliya@oss.qualcomm.com>
On platforms where a GENI Serial Engine is shared with another system
processor, selecting the "sleep" pinctrl state can disrupt ongoing
transfers initiated by the other processor.
Teach geni_se_resources_off() to skip selecting the pinctrl sleep state
when the Serial Engine is marked as shared, while still allowing the
rest of the resource shutdown sequence to proceed.
This is required for multi-owner configurations (described via DeviceTree
with qcom,qup-multi-owner on the protocol controller node).
Signed-off-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
---
drivers/soc/qcom/qcom-geni-se.c | 15 +++++++++++----
include/linux/soc/qcom/geni-se.h | 2 ++
2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
index cd1779b6a91a..1a60832ace16 100644
--- a/drivers/soc/qcom/qcom-geni-se.c
+++ b/drivers/soc/qcom/qcom-geni-se.c
@@ -597,10 +597,17 @@ int geni_se_resources_off(struct geni_se *se)
if (has_acpi_companion(se->dev))
return 0;
-
- ret = pinctrl_pm_select_sleep_state(se->dev);
- if (ret)
- return ret;
+ /*
+ * Select the "sleep" pinctrl state only when the serial engine is
+ * exclusively owned by this system processor. For shared controller
+ * configurations, another system processor may still be using the pins,
+ * and switching them to "sleep" can disrupt ongoing transfers.
+ */
+ if (!se->multi_owner) {
+ ret = pinctrl_pm_select_sleep_state(se->dev);
+ if (ret)
+ return ret;
+ }
geni_se_clks_off(se);
return 0;
diff --git a/include/linux/soc/qcom/geni-se.h b/include/linux/soc/qcom/geni-se.h
index 0a984e2579fe..326744e311ce 100644
--- a/include/linux/soc/qcom/geni-se.h
+++ b/include/linux/soc/qcom/geni-se.h
@@ -63,6 +63,7 @@ struct geni_icc_path {
* @num_clk_levels: Number of valid clock levels in clk_perf_tbl
* @clk_perf_tbl: Table of clock frequency input to serial engine clock
* @icc_paths: Array of ICC paths for SE
+ * @multi_owner: True if SE is shared between multiprocessors.
*/
struct geni_se {
void __iomem *base;
@@ -72,6 +73,7 @@ struct geni_se {
unsigned int num_clk_levels;
unsigned long *clk_perf_tbl;
struct geni_icc_path icc_paths[3];
+ bool multi_owner;
};
/* Common SE registers */
--
2.25.1
^ permalink raw reply related
* [PATCH v6 2/4] dmaengine: qcom: gpi: Add lock/unlock TREs for multi-owner I2C transfers
From: Mukesh Kumar Savaliya @ 2026-03-31 11:47 UTC (permalink / raw)
To: viken.dadhaniya, andi.shyti, robh, krzk+dt, conor+dt, vkoul,
Frank.Li, andersson, konradybcio, dmitry.baryshkov, linmq006,
quic_jseerapu, agross, linux-arm-msm, linux-i2c, devicetree,
linux-kernel, dmaengine
Cc: krzysztof.kozlowski, bartosz.golaszewski, bjorn.andersson,
konrad.dybcio, Mukesh Kumar Savaliya
In-Reply-To: <20260331114742.2896317-1-mukesh.savaliya@oss.qualcomm.com>
Some platforms use a QUP-based I2C controller in a configuration where the
controller is shared with another system processor (described in DT using
qcom,qup-multi-owner). In such setups, GPI hardware lock/unlock TREs can be
used to serialize access to the controller.
Add support to emit lock and unlock TREs around I2C transfers and increase
the maximum TRE count to account for the additional elements.
Also simplify the client interface by replacing multiple boolean fields
(shared flag and message position tracking) with a single lock_action
selector (acquire/release/none), as the GPI driver only needs to know
whether to emit lock/unlock TREs for a given transfer.
Signed-off-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
---
drivers/dma/qcom/gpi.c | 44 +++++++++++++++++++++++++++++++-
include/linux/dma/qcom-gpi-dma.h | 18 +++++++++++++
2 files changed, 61 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/qcom/gpi.c b/drivers/dma/qcom/gpi.c
index 6e30f3aa401e..a1f391dd1747 100644
--- a/drivers/dma/qcom/gpi.c
+++ b/drivers/dma/qcom/gpi.c
@@ -2,6 +2,7 @@
/*
* Copyright (c) 2017-2020, The Linux Foundation. All rights reserved.
* Copyright (c) 2020, Linaro Limited
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
*/
#include <dt-bindings/dma/qcom-gpi.h>
@@ -67,6 +68,14 @@
#define TRE_DMA_LEN GENMASK(23, 0)
#define TRE_DMA_IMMEDIATE_LEN GENMASK(3, 0)
+/* Lock TRE */
+#define TRE_LOCK BIT(0)
+#define TRE_MINOR_TYPE GENMASK(19, 16)
+#define TRE_MAJOR_TYPE GENMASK(23, 20)
+
+/* Unlock TRE */
+#define TRE_UNLOCK BIT(8)
+
/* Register offsets from gpi-top */
#define GPII_n_CH_k_CNTXT_0_OFFS(n, k) (0x20000 + (0x4000 * (n)) + (0x80 * (k)))
#define GPII_n_CH_k_CNTXT_0_EL_SIZE GENMASK(31, 24)
@@ -518,7 +527,7 @@ struct gpii {
bool ieob_set;
};
-#define MAX_TRE 3
+#define MAX_TRE 5
struct gpi_desc {
struct virt_dma_desc vd;
@@ -1625,12 +1634,27 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc,
unsigned long flags)
{
struct gpi_i2c_config *i2c = chan->config;
+ enum gpi_lock_action lock_action = i2c->lock_action;
struct device *dev = chan->gpii->gpi_dev->dev;
unsigned int tre_idx = 0;
dma_addr_t address;
struct gpi_tre *tre;
unsigned int i;
+ /* Optional lock TRE before transfer */
+ if (lock_action == GPI_LOCK_ACQUIRE) {
+ tre = &desc->tre[tre_idx];
+ tre_idx++;
+
+ tre->dword[0] = 0;
+ tre->dword[1] = 0;
+ tre->dword[2] = 0;
+ tre->dword[3] = u32_encode_bits(1, TRE_LOCK);
+ tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOB);
+ tre->dword[3] |= u32_encode_bits(0, TRE_MINOR_TYPE);
+ tre->dword[3] |= u32_encode_bits(3, TRE_MAJOR_TYPE);
+ }
+
/* first create config tre if applicable */
if (i2c->set_config) {
tre = &desc->tre[tre_idx];
@@ -1690,6 +1714,24 @@ static int gpi_create_i2c_tre(struct gchan *chan, struct gpi_desc *desc,
if (!(flags & DMA_PREP_INTERRUPT))
tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_BEI);
+
+ /* If multi-owner and this is the release boundary, chain it */
+ if (i2c->lock_action == GPI_LOCK_RELEASE)
+ tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_CHAIN);
+ }
+
+ /* Optional unlock TRE after transfer */
+ if (lock_action == GPI_LOCK_RELEASE && i2c->op != I2C_READ) {
+ tre = &desc->tre[tre_idx];
+ tre_idx++;
+
+ tre->dword[0] = 0;
+ tre->dword[1] = 0;
+ tre->dword[2] = 0;
+ tre->dword[3] = u32_encode_bits(1, TRE_UNLOCK);
+ tre->dword[3] |= u32_encode_bits(1, TRE_FLAGS_IEOB);
+ tre->dword[3] |= u32_encode_bits(1, TRE_MINOR_TYPE);
+ tre->dword[3] |= u32_encode_bits(3, TRE_MAJOR_TYPE);
}
for (i = 0; i < tre_idx; i++)
diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h
index 6680dd1a43c6..36cbb85499b4 100644
--- a/include/linux/dma/qcom-gpi-dma.h
+++ b/include/linux/dma/qcom-gpi-dma.h
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (c) 2020, Linaro Limited
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
*/
#ifndef QCOM_GPI_DMA_H
@@ -51,6 +52,21 @@ enum i2c_op {
I2C_READ,
};
+/**
+ * enum gpi_lock_action - request lock/unlock TRE sequencing
+ * @GPI_LOCK_NONE: No lock/unlock TRE requested for this transfer
+ * @GPI_LOCK_ACQUIRE: Emit a lock TRE before the transfer
+ * @GPI_LOCK_RELEASE: Emit an unlock TRE after the transfer
+ *
+ * Used by protocol drivers for multi-owner controller setups (e.g. when
+ * DeviceTree indicates the controller is shared via qcom,qup-multi-owner).
+ */
+enum gpi_lock_action {
+ GPI_LOCK_NONE = 0,
+ GPI_LOCK_ACQUIRE,
+ GPI_LOCK_RELEASE,
+};
+
/**
* struct gpi_i2c_config - i2c config for peripheral
*
@@ -65,6 +81,7 @@ enum i2c_op {
* @rx_len: receive length for buffer
* @op: i2c cmd
* @muli-msg: is part of multi i2c r-w msgs
+ * @lock_action: request lock/unlock TRE sequencing for this transfer
*/
struct gpi_i2c_config {
u8 set_config;
@@ -78,6 +95,7 @@ struct gpi_i2c_config {
u32 rx_len;
enum i2c_op op;
bool multi_msg;
+ enum gpi_lock_action lock_action;
};
#endif /* QCOM_GPI_DMA_H */
--
2.25.1
^ permalink raw reply related
* [PATCH v6 1/4] dt-bindings: i2c: qcom,i2c-geni: Document multi-owner controller support
From: Mukesh Kumar Savaliya @ 2026-03-31 11:47 UTC (permalink / raw)
To: viken.dadhaniya, andi.shyti, robh, krzk+dt, conor+dt, vkoul,
Frank.Li, andersson, konradybcio, dmitry.baryshkov, linmq006,
quic_jseerapu, agross, linux-arm-msm, linux-i2c, devicetree,
linux-kernel, dmaengine
Cc: krzysztof.kozlowski, bartosz.golaszewski, bjorn.andersson,
konrad.dybcio, Mukesh Kumar Savaliya
In-Reply-To: <20260331114742.2896317-1-mukesh.savaliya@oss.qualcomm.com>
Document a DeviceTree property to describe QUP-based I2C controllers that
are shared with one or more other system processors.
On some Qualcomm platforms, a QUP-based I2C controller may be accessed by
multiple system processors (for example, APPS and DSP). In such
configurations, the operating system must not assume exclusive ownership
of the controller or its associated hardware resources.
The new qcom,qup-multi-owner property indicates that the controller is
externally shared and that the operating system must avoid operations
which rely on sole control of the hardware.
Signed-off-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
---
.../devicetree/bindings/i2c/qcom,i2c-geni-qcom.yaml | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/Documentation/devicetree/bindings/i2c/qcom,i2c-geni-qcom.yaml b/Documentation/devicetree/bindings/i2c/qcom,i2c-geni-qcom.yaml
index 51534953a69c..9401dc2d5052 100644
--- a/Documentation/devicetree/bindings/i2c/qcom,i2c-geni-qcom.yaml
+++ b/Documentation/devicetree/bindings/i2c/qcom,i2c-geni-qcom.yaml
@@ -60,6 +60,13 @@ properties:
power-domains:
maxItems: 1
+ qcom,qup-multi-owner:
+ type: boolean
+ description:
+ Indicates that the QUP-based controller is shared with one or more
+ other system processors and must not be assumed to have exclusive
+ ownership by the operating system.
+
reg:
maxItems: 1
--
2.25.1
^ permalink raw reply related
* [PATCH v6 0/4] Enable multi-owner I2C support for QCOM GENI controllers
From: Mukesh Kumar Savaliya @ 2026-03-31 11:47 UTC (permalink / raw)
To: viken.dadhaniya, andi.shyti, robh, krzk+dt, conor+dt, vkoul,
Frank.Li, andersson, konradybcio, dmitry.baryshkov, linmq006,
quic_jseerapu, agross, linux-arm-msm, linux-i2c, devicetree,
linux-kernel, dmaengine
Cc: krzysztof.kozlowski, bartosz.golaszewski, bjorn.andersson,
konrad.dybcio, Mukesh Kumar Savaliya
The QUP-based GENI I2C controller driver currently assumes exclusive
ownership of the controller by a single system processor. This prevents
safe use of a single I2C controller by multiple system processors
(e.g. APPS and a DSP) running the same or different operating systems.
One practical example is an EEPROM connected to an I2C controller that
needs to be accessed independently by firmware running on a DSP and by
Linux running on the application processor, without causing bus-level
interference during transfers.
This series adds support for operating a QUP GENI I2C Serial Engine in a
multi-owner configuration. Each system processor uses its own dedicated
GPI instance (GPII) as the data path between the Serial Engine and the
GSI DMA engine. As a result, controller sharing is supported only when
the I2C controller operates in GPI mode; FIFO/CPU DMA modes are not
supported for this configuration.
To serialize access at the hardware level, the GPI DMA engine is used to
emit lock and unlock Transfer Ring Elements (TREs) around I2C transfers.
The lock is acquired before the first transfer and released after the
last transfer, ensuring uninterrupted access to the controller while a
processor owns it.
In addition, when a controller is shared, the GENI common layer avoids
placing the associated GPIOs into the pinctrl "sleep" state during
runtime suspend. This prevents disruption of transfers that may still
be in progress on another system processor using the same controller
pins.
The multi-owner behavior is enabled via a DeviceTree property,
`qcom,qup-multi-owner`, on the I2C controller node. This property must be
used only when the hardware configuration requires controller sharing
and when GPI mode is enabled.
Patch overview:
1. Document the `qcom,qup-multi-owner` DeviceTree property for GENI I2C.
2. Extend the QCOM GPI DMA driver to support lock and unlock TREs with a
simplified single-field API.
3. Update the GENI common layer to keep pinctrl active for shared
controllers during runtime suspend.
4. Enable multi-owner operation in the GENI I2C driver using the new
DeviceTree property and GPI lock/unlock support.
Signed-off-by: Mukesh Kumar Savaliya <mukesh.savaliya@oss.qualcomm.com>
---
Link to V5 : https://lore.kernel.org/lkml/20241129144357.2008465-2-mukesh.savaliya@oss.qualcomm.com/
Changes in V6:
- Addressed review feedback from Krzysztof Kozlowski and other reviewers, primarily
around clarifying the feature semantics and improving the DeviceTree flag naming.
- Renamed the DeviceTree property from qcom,shared-se to qcom,qup-multi-owner to
better describe the multi-owner controller use case.
- Updated the cover letter to clearly describe the multi-owner I2C design, the
GPI-only limitation, and the role of the new qcom,qup-multi-owner flag.
- Updated the DeviceTree binding documentation to reflect the new qcom,qup-multi-owner
property and refined its description for clarity and correctness.
- [Patch 2/4] Simplify the GPI I2C interface by replacing multiple shared SE related
state flags with a single internal lock/unlock control managed entirely in the GPI
driver - Suggested by Vinod Koul.
- [Patch 3/4] Updated the GENI common layer to avoid selecting the pinctrl “sleep”
state for multi-owner controllers, preventing disruption of transfers initiated by
another system processor during runtime suspend.
- [Patch 4/4] Updated the GENI I2C driver to:
- Detect the qcom,qup-multi-owner DeviceTree property.
- Mark the underlying serial engine as shared.
- Request GPI lock and unlock TRE sequencing around I2C transfers using the
simplified single field API.
- Clarified commit messages across all patches to avoid ambiguous terminology
(such as “subsystem”), expand abbreviations, and better explain functional
requirements rather than optimizations.
- Updated copyright headers across all files wherever applicable.
- Renamed variable shared_geni_se to multi_owner to match the DT property naming.
- Changed dev_err(print_log) during probe() to dev_err_probe().
---
Mukesh Kumar Savaliya (4):
dt-bindings: i2c: qcom,i2c-geni: Document multi-owner controller
support
dmaengine: qcom: gpi: Add lock/unlock TREs for multi-owner I2C
transfers
soc: qcom: geni-se: Keep pinctrl active for multi-owner controllers
i2c: qcom-geni: Support multi-owner controllers in GPI mode
.../bindings/i2c/qcom,i2c-geni-qcom.yaml | 7 +++
drivers/dma/qcom/gpi.c | 44 ++++++++++++++++++-
drivers/i2c/busses/i2c-qcom-geni.c | 27 +++++++++++-
drivers/soc/qcom/qcom-geni-se.c | 15 +++++--
include/linux/dma/qcom-gpi-dma.h | 18 ++++++++
include/linux/soc/qcom/geni-se.h | 2 +
6 files changed, 107 insertions(+), 6 deletions(-)
--
2.25.1
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox