linux-media.vger.kernel.org archive mirror
* [PATCH 0/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks
@ 2025-08-14  8:16 Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 1/4] dma-buf/fence-chain: Report time spent in wait_* test cases Janusz Krzysztofik
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-08-14  8:16 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe,
	Janusz Krzysztofik

CI results indicate excessive time spent processing the wait_backward
selftest.  The test case enables signaling on each link of a 4096-link
chain, then signals those links one after another in reverse order.  That
scenario corresponds to user space potentially building a dma_fence chain
of arbitrary length step by step, e.g. via drm_syncobj IOCTLs, and at each
step waiting on the chain link it has added.

When the first user starts waiting on a not yet signaled fence of a chain
link, or signaling is explicitly enabled on it, a dma_fence_chain callback
is added to the user fence of that link.  When the user fence of that
chain link is then signaled, the chain is traversed in search of the first
not yet signaled link and the callback is rearmed on the user fence of
that link.  Each time an arbitrary user fence is signaled, all
dma_fence_chain callbacks added to it so far must be rearmed to another
user fence of the chain.  In extreme scenarios, when all N links of a
chain are awaited and then signaled in reverse order, the dma_fence_chain
callback may be called up to N * (N + 1) / 2 times (an arithmetic series).

To avoid that potential excessive accumulation of dma_fence_chain
callbacks, rearm a trimmed-down, signal-only callback version on the base
fence of the previous link, if not yet signaled; otherwise just signal the
base fence of the current link.  This replaces traversing the chain in
search of the first not yet signaled link and moving all callbacks
collected so far to the user fence of that link.

For clearer correspondence to a potential userspace scenario, update the
wait_* selftests so they actually wait on each link fence of the chain
instead of just enabling signaling.  For easy evaluation of the change's
impact, report the processing time of the signaling loop of each wait_*
test case.


Janusz Krzysztofik (4):
  dma-buf/fence-chain: Report time spent in wait_* test cases
  dma-buf/fence-chain: Let test cases decide which fence to wait on
  dma-buf/fence-chain: Wait on each tested chain link
  dma-buf/fence-chain: Speed up processing of rearmed callbacks

 drivers/dma-buf/dma-fence-chain.c    | 101 ++++++++++++++++----
 drivers/dma-buf/st-dma-fence-chain.c | 133 +++++++++++++++++++++------
 2 files changed, 189 insertions(+), 45 deletions(-)

-- 
2.50.1



* [PATCH 1/4] dma-buf/fence-chain: Report time spent in wait_* test cases
  2025-08-14  8:16 [PATCH 0/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
@ 2025-08-14  8:16 ` Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 2/4] dma-buf/fence-chain: Let test cases decide which fence to wait on Janusz Krzysztofik
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-08-14  8:16 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe,
	Janusz Krzysztofik

CI results indicate excessive time spent processing the wait_backward
selftest.  For easy comparison, report the processing time of each wait_*
test case.

Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/dma-buf/st-dma-fence-chain.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index ed4b323886e43..80598da9237af 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -572,6 +572,7 @@ static int wait_forward(void *arg)
 {
 	struct fence_chains fc;
 	struct task_struct *tsk;
+	ktime_t dt;
 	int err;
 	int i;
 
@@ -587,8 +588,12 @@ static int wait_forward(void *arg)
 	get_task_struct(tsk);
 	yield_to(tsk, true);
 
+	dt = -ktime_get();
 	for (i = 0; i < fc.chain_length; i++)
 		dma_fence_signal(fc.fences[i]);
+	dt += ktime_get();
+
+	pr_info("%s: %d signals in %llu ns\n", __func__, fc.chain_length, ktime_to_ns(dt));
 
 	err = kthread_stop_put(tsk);
 
@@ -601,6 +606,7 @@ static int wait_backward(void *arg)
 {
 	struct fence_chains fc;
 	struct task_struct *tsk;
+	ktime_t dt;
 	int err;
 	int i;
 
@@ -616,8 +622,12 @@ static int wait_backward(void *arg)
 	get_task_struct(tsk);
 	yield_to(tsk, true);
 
+	dt = -ktime_get();
 	for (i = fc.chain_length; i--; )
 		dma_fence_signal(fc.fences[i]);
+	dt += ktime_get();
+
+	pr_info("%s: %d signals in %llu ns\n", __func__, fc.chain_length, ktime_to_ns(dt));
 
 	err = kthread_stop_put(tsk);
 
@@ -646,6 +656,7 @@ static int wait_random(void *arg)
 {
 	struct fence_chains fc;
 	struct task_struct *tsk;
+	ktime_t dt;
 	int err;
 	int i;
 
@@ -663,8 +674,12 @@ static int wait_random(void *arg)
 	get_task_struct(tsk);
 	yield_to(tsk, true);
 
+	dt = -ktime_get();
 	for (i = 0; i < fc.chain_length; i++)
 		dma_fence_signal(fc.fences[i]);
+	dt += ktime_get();
+
+	pr_info("%s: %d signals in %llu ns\n", __func__, fc.chain_length, ktime_to_ns(dt));
 
 	err = kthread_stop_put(tsk);
 
-- 
2.50.1



* [PATCH 2/4] dma-buf/fence-chain: Let test cases decide which fence to wait on
  2025-08-14  8:16 [PATCH 0/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 1/4] dma-buf/fence-chain: Report time spent in wait_* test cases Janusz Krzysztofik
@ 2025-08-14  8:16 ` Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 3/4] dma-buf/fence-chain: Wait on each tested chain link Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
  3 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-08-14  8:16 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe,
	Janusz Krzysztofik

Test cases that create threads around the __wait_fence_chains() function
currently pass info about the whole chain to those threads as an argument,
with no hint as to which fence of the chain to wait on.  That decision is
hard-coded into the __wait_fence_chains() function, which always selects
the tail of the chain.

Since future test cases may decide to spawn threads that wait on arbitrary
links of the chain, pass a single fence as the thread argument instead.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/dma-buf/st-dma-fence-chain.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 80598da9237af..bff4192420d8b 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -560,9 +560,9 @@ static int signal_backward(void *arg)
 
 static int __wait_fence_chains(void *arg)
 {
-	struct fence_chains *fc = arg;
+	struct dma_fence *f = arg;
 
-	if (dma_fence_wait(fc->tail, false))
+	if (dma_fence_wait(f, false))
 		return -EIO;
 
 	return 0;
@@ -580,7 +580,7 @@ static int wait_forward(void *arg)
 	if (err)
 		return err;
 
-	tsk = kthread_run(__wait_fence_chains, &fc, "dmabuf/wait");
+	tsk = kthread_run(__wait_fence_chains, fc.tail, "dmabuf/wait");
 	if (IS_ERR(tsk)) {
 		err = PTR_ERR(tsk);
 		goto err;
@@ -614,7 +614,7 @@ static int wait_backward(void *arg)
 	if (err)
 		return err;
 
-	tsk = kthread_run(__wait_fence_chains, &fc, "dmabuf/wait");
+	tsk = kthread_run(__wait_fence_chains, fc.tail, "dmabuf/wait");
 	if (IS_ERR(tsk)) {
 		err = PTR_ERR(tsk);
 		goto err;
@@ -666,7 +666,7 @@ static int wait_random(void *arg)
 
 	randomise_fences(&fc);
 
-	tsk = kthread_run(__wait_fence_chains, &fc, "dmabuf/wait");
+	tsk = kthread_run(__wait_fence_chains, fc.tail, "dmabuf/wait");
 	if (IS_ERR(tsk)) {
 		err = PTR_ERR(tsk);
 		goto err;
-- 
2.50.1



* [PATCH 3/4] dma-buf/fence-chain: Wait on each tested chain link
  2025-08-14  8:16 [PATCH 0/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 1/4] dma-buf/fence-chain: Report time spent in wait_* test cases Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 2/4] dma-buf/fence-chain: Let test cases decide which fence to wait on Janusz Krzysztofik
@ 2025-08-14  8:16 ` Janusz Krzysztofik
  2025-08-14  8:16 ` [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
  3 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-08-14  8:16 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe,
	Janusz Krzysztofik

Userspace may build dma_fence chains of arbitrary length step by step,
e.g. via drm_syncobj IOCTLs, and at each step may start waiting on the
chain link it has added.  Adjust the wait_* selftests to cover such
extreme use cases.

With that done, don't enable signaling on each chain link while building
the chain.  There is no point in doing that as long as no user is waiting
on the link, and even then, signaling is enabled automatically as soon as
a user starts waiting on the fence.  Let individual test cases decide
which links of the chain should be waited on / need signaling enabled.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/dma-buf/st-dma-fence-chain.c | 120 ++++++++++++++++++++-------
 1 file changed, 91 insertions(+), 29 deletions(-)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index bff4192420d8b..0e7a24ed7caeb 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -145,8 +145,6 @@ static int fence_chains_init(struct fence_chains *fc, unsigned int count,
 		}
 
 		fc->tail = fc->chains[i];
-
-		dma_fence_enable_sw_signaling(fc->chains[i]);
 	}
 
 	fc->chain_length = i;
@@ -570,23 +568,34 @@ static int __wait_fence_chains(void *arg)
 
 static int wait_forward(void *arg)
 {
+	struct task_struct **tsk;
 	struct fence_chains fc;
-	struct task_struct *tsk;
 	ktime_t dt;
+	int i = 0;
 	int err;
-	int i;
 
 	err = fence_chains_init(&fc, CHAIN_SZ, seqno_inc);
 	if (err)
 		return err;
 
-	tsk = kthread_run(__wait_fence_chains, fc.tail, "dmabuf/wait");
-	if (IS_ERR(tsk)) {
-		err = PTR_ERR(tsk);
+	tsk = kmalloc_array(fc.chain_length, sizeof(*tsk), GFP_KERNEL);
+	if (!tsk) {
+		err = -ENOMEM;
 		goto err;
 	}
-	get_task_struct(tsk);
-	yield_to(tsk, true);
+
+	for (i = 0; i < fc.chain_length; i++) {
+		tsk[i] = kthread_run(__wait_fence_chains, fc.chains[i],
+				     "dmabuf/wait-%llu", fc.fences[i]->seqno);
+		if (IS_ERR(tsk[i])) {
+			err = PTR_ERR(tsk[i]);
+			pr_err("Reported %d for kthread_run(%llu)!\n",
+			       err, fc.fences[i]->seqno);
+			goto err;
+		}
+		get_task_struct(tsk[i]);
+		yield_to(tsk[i], true);
+	}
 
 	dt = -ktime_get();
 	for (i = 0; i < fc.chain_length; i++)
@@ -595,32 +604,53 @@ static int wait_forward(void *arg)
 
 	pr_info("%s: %d signals in %llu ns\n", __func__, fc.chain_length, ktime_to_ns(dt));
 
-	err = kthread_stop_put(tsk);
-
 err:
+	while (i--) {
+		int tsk_err = kthread_stop_put(tsk[i]);
+
+		if (tsk_err)
+			pr_err("Reported %d for kthread_stop_put(%llu)!\n",
+			       tsk_err, fc.fences[i]->seqno);
+
+		if (!err)
+			err = tsk_err;
+	}
+	kfree(tsk);
+
 	fence_chains_fini(&fc);
 	return err;
 }
 
 static int wait_backward(void *arg)
 {
+	struct task_struct **tsk;
 	struct fence_chains fc;
-	struct task_struct *tsk;
 	ktime_t dt;
+	int i = 0;
 	int err;
-	int i;
 
 	err = fence_chains_init(&fc, CHAIN_SZ, seqno_inc);
 	if (err)
 		return err;
 
-	tsk = kthread_run(__wait_fence_chains, fc.tail, "dmabuf/wait");
-	if (IS_ERR(tsk)) {
-		err = PTR_ERR(tsk);
+	tsk = kmalloc_array(fc.chain_length, sizeof(*tsk), GFP_KERNEL);
+	if (!tsk) {
+		err = -ENOMEM;
 		goto err;
 	}
-	get_task_struct(tsk);
-	yield_to(tsk, true);
+
+	for (i = 0; i < fc.chain_length; i++) {
+		tsk[i] = kthread_run(__wait_fence_chains, fc.chains[i],
+				     "dmabuf/wait-%llu", fc.fences[i]->seqno);
+		if (IS_ERR(tsk[i])) {
+			err = PTR_ERR(tsk[i]);
+			pr_err("Reported %d for kthread_run(%llu)!\n",
+			       err, fc.fences[i]->seqno);
+			goto err;
+		}
+		get_task_struct(tsk[i]);
+		yield_to(tsk[i], true);
+	}
 
 	dt = -ktime_get();
 	for (i = fc.chain_length; i--; )
@@ -629,9 +659,20 @@ static int wait_backward(void *arg)
 
 	pr_info("%s: %d signals in %llu ns\n", __func__, fc.chain_length, ktime_to_ns(dt));
 
-	err = kthread_stop_put(tsk);
-
+	i = fc.chain_length;
 err:
+	while (i--) {
+		int tsk_err = kthread_stop_put(tsk[i]);
+
+		if (tsk_err)
+			pr_err("Reported %d for kthread_stop_put(%llu)!\n",
+			       tsk_err, fc.fences[i]->seqno);
+
+		if (!err)
+			err = tsk_err;
+	}
+	kfree(tsk);
+
 	fence_chains_fini(&fc);
 	return err;
 }
@@ -654,11 +695,11 @@ static void randomise_fences(struct fence_chains *fc)
 
 static int wait_random(void *arg)
 {
+	struct task_struct **tsk;
 	struct fence_chains fc;
-	struct task_struct *tsk;
 	ktime_t dt;
+	int i = 0;
 	int err;
-	int i;
 
 	err = fence_chains_init(&fc, CHAIN_SZ, seqno_inc);
 	if (err)
@@ -666,13 +707,24 @@ static int wait_random(void *arg)
 
 	randomise_fences(&fc);
 
-	tsk = kthread_run(__wait_fence_chains, fc.tail, "dmabuf/wait");
-	if (IS_ERR(tsk)) {
-		err = PTR_ERR(tsk);
+	tsk = kmalloc_array(fc.chain_length, sizeof(*tsk), GFP_KERNEL);
+	if (!tsk) {
+		err = -ENOMEM;
 		goto err;
 	}
-	get_task_struct(tsk);
-	yield_to(tsk, true);
+
+	for (i = 0; i < fc.chain_length; i++) {
+		tsk[i] = kthread_run(__wait_fence_chains, fc.chains[i],
+				     "dmabuf/wait-%llu", fc.fences[i]->seqno);
+		if (IS_ERR(tsk[i])) {
+			err = PTR_ERR(tsk[i]);
+			pr_err("Reported %d for kthread_run(%llu)!\n",
+			       err, fc.fences[i]->seqno);
+			goto err;
+		}
+		get_task_struct(tsk[i]);
+		yield_to(tsk[i], true);
+	}
 
 	dt = -ktime_get();
 	for (i = 0; i < fc.chain_length; i++)
@@ -681,9 +733,19 @@ static int wait_random(void *arg)
 
 	pr_info("%s: %d signals in %llu ns\n", __func__, fc.chain_length, ktime_to_ns(dt));
 
-	err = kthread_stop_put(tsk);
-
 err:
+	while (i--) {
+		int tsk_err = kthread_stop_put(tsk[i]);
+
+		if (tsk_err)
+			pr_err("Reported %d for kthread_stop_put(%llu)!\n",
+			       tsk_err, fc.fences[i]->seqno);
+
+		if (!err)
+			err = tsk_err;
+	}
+	kfree(tsk);
+
 	fence_chains_fini(&fc);
 	return err;
 }
-- 
2.50.1



* [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks
  2025-08-14  8:16 [PATCH 0/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
                   ` (2 preceding siblings ...)
  2025-08-14  8:16 ` [PATCH 3/4] dma-buf/fence-chain: Wait on each tested chain link Janusz Krzysztofik
@ 2025-08-14  8:16 ` Janusz Krzysztofik
  2025-08-14 12:24   ` Christian König
  3 siblings, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-08-14  8:16 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe,
	Janusz Krzysztofik

When the first user starts waiting on a not yet signaled fence of a chain
link, a dma_fence_chain callback is added to the user fence of that link.
When the user fence of that chain link is then signaled, the chain is
traversed in search of the first not yet signaled link and the callback
is rearmed on the user fence of that link.

Since chain fences may be exposed to user space, e.g. over drm_syncobj
IOCTLs, users may start waiting on any link of the chain; then many links
of a chain may have signaling enabled and their callbacks added to their
user fences.  Once an arbitrary user fence is signaled, all
dma_fence_chain callbacks added to it so far must be rearmed to another
user fence of the chain.  In extreme scenarios, when all N links of a
chain are awaited and then signaled in reverse order, the dma_fence_chain
callback may be called up to N * (N + 1) / 2 times (an arithmetic series).

To avoid that potential excessive accumulation of dma_fence_chain
callbacks, rearm a trimmed-down, signal-only callback version on the base
fence of the previous link, if not yet signaled; otherwise just signal the
base fence of the current link.  This replaces traversing the chain in
search of the first not yet signaled link and moving all callbacks
collected so far to the user fence of that link.

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++-----
 1 file changed, 84 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index a8a90acf4f34d..90eff264ee05c 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence)
         return "unbound";
 }
 
-static void dma_fence_chain_irq_work(struct irq_work *work)
+static void signal_irq_work(struct irq_work *work)
 {
 	struct dma_fence_chain *chain;
 
 	chain = container_of(work, typeof(*chain), work);
 
-	/* Try to rearm the callback */
-	if (!dma_fence_chain_enable_signaling(&chain->base))
-		/* Ok, we are done. No more unsignaled fences left */
-		dma_fence_signal(&chain->base);
+	dma_fence_signal(&chain->base);
 	dma_fence_put(&chain->base);
 }
 
-static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
+static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+	struct dma_fence_chain *chain;
+
+	chain = container_of(cb, typeof(*chain), cb);
+	init_irq_work(&chain->work, signal_irq_work);
+	irq_work_queue(&chain->work);
+}
+
+static void rearm_irq_work(struct irq_work *work)
+{
+	struct dma_fence_chain *chain;
+	struct dma_fence *prev;
+
+	chain = container_of(work, typeof(*chain), work);
+
+	rcu_read_lock();
+	prev = rcu_dereference(chain->prev);
+	if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb))
+		prev = NULL;
+	rcu_read_unlock();
+	if (prev)
+		return;
+
+	/* Ok, we are done. No more unsignaled fences left */
+	signal_irq_work(work);
+}
+
+static inline bool fence_is_signaled__nested(struct dma_fence *fence)
+{
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return true;
+
+	if (fence->ops->signaled && fence->ops->signaled(fence)) {
+		unsigned long flags;
+
+		spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING);
+		dma_fence_signal_locked(fence);
+		spin_unlock_irqrestore(fence->lock, flags);
+
+		return true;
+	}
+
+	return false;
+}
+
+static bool prev_is_signaled(struct dma_fence_chain *chain)
+{
+	struct dma_fence *prev;
+	bool result;
+
+	rcu_read_lock();
+	prev = rcu_dereference(chain->prev);
+	result = !prev || fence_is_signaled__nested(prev);
+	rcu_read_unlock();
+
+	return result;
+}
+
+static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
 {
 	struct dma_fence_chain *chain;
 
 	chain = container_of(cb, typeof(*chain), cb);
-	init_irq_work(&chain->work, dma_fence_chain_irq_work);
+	if (prev_is_signaled(chain)) {
+		/* Ok, we are done. No more unsignaled fences left */
+		init_irq_work(&chain->work, signal_irq_work);
+	} else {
+		/* Try to rearm the callback */
+		init_irq_work(&chain->work, rearm_irq_work);
+	}
+
 	irq_work_queue(&chain->work);
-	dma_fence_put(f);
 }
 
 static bool dma_fence_chain_enable_signaling(struct dma_fence *fence)
 {
 	struct dma_fence_chain *head = to_dma_fence_chain(fence);
+	int err = -ENOENT;
 
-	dma_fence_get(&head->base);
-	dma_fence_chain_for_each(fence, &head->base) {
-		struct dma_fence *f = dma_fence_chain_contained(fence);
+	if (WARN_ON(!head))
+		return false;
 
-		dma_fence_get(f);
-		if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) {
+	dma_fence_get(fence);
+	if (head->fence)
+		err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb);
+	if (err) {
+		if (prev_is_signaled(head)) {
 			dma_fence_put(fence);
-			return true;
+		} else {
+			init_irq_work(&head->work, rearm_irq_work);
+			irq_work_queue(&head->work);
+			err = 0;
 		}
-		dma_fence_put(f);
 	}
-	dma_fence_put(&head->base);
-	return false;
+
+	return !err;
 }
 
 static bool dma_fence_chain_signaled(struct dma_fence *fence)
-- 
2.50.1



* Re: [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks
  2025-08-14  8:16 ` [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
@ 2025-08-14 12:24   ` Christian König
  2025-08-18 14:30     ` Janusz Krzysztofik
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2025-08-14 12:24 UTC (permalink / raw)
  To: Janusz Krzysztofik
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe



On 14.08.25 10:16, Janusz Krzysztofik wrote:
> When first user starts waiting on a not yet signaled fence of a chain
> link, a dma_fence_chain callback is added to a user fence of that link.
> When the user fence of that chain link is then signaled, the chain is
> traversed in search for a first not signaled link and the callback is
> rearmed on a user fence of that link.
> 
> Since chain fences may be exposed to user space, e.g. over drm_syncobj
> IOCTLs, users may start waiting on any link of the chain, then many links
> of a chain may have signaling enabled and their callbacks added to their
> user fences.  Once an arbitrary user fence is signaled, all
> dma_fence_chain callbacks added to it so far must be rearmed to another
> user fence of the chain.  In extreme scenarios, when all N links of a
> chain are awaited and then signaled in reverse order, the dma_fence_chain
> callback may be called up to N * (N + 1) / 2 times (an arithmetic series).
> 
> To avoid that potential excessive accumulation of dma_fence_chain
> callbacks, rearm a trimmed-down, signal only callback version to the base
> fence of a previous link, if not yet signaled, otherwise just signal the
> base fence of the current link instead of traversing the chain in search
> for a first not signaled link and moving all callbacks collected so far to
> a user fence of that link.

Well, a clear NAK to that! You can easily overflow the kernel stack with that!

In addition, messing with the fence ops outside of the dma_fence code is an absolute no-go.

Regards,
Christian.

> 
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
> Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++-----
>  1 file changed, 84 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> index a8a90acf4f34d..90eff264ee05c 100644
> --- a/drivers/dma-buf/dma-fence-chain.c
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence)
>          return "unbound";
>  }
>  
> -static void dma_fence_chain_irq_work(struct irq_work *work)
> +static void signal_irq_work(struct irq_work *work)
>  {
>  	struct dma_fence_chain *chain;
>  
>  	chain = container_of(work, typeof(*chain), work);
>  
> -	/* Try to rearm the callback */
> -	if (!dma_fence_chain_enable_signaling(&chain->base))
> -		/* Ok, we are done. No more unsignaled fences left */
> -		dma_fence_signal(&chain->base);
> +	dma_fence_signal(&chain->base);
>  	dma_fence_put(&chain->base);
>  }
>  
> -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> +{
> +	struct dma_fence_chain *chain;
> +
> +	chain = container_of(cb, typeof(*chain), cb);
> +	init_irq_work(&chain->work, signal_irq_work);
> +	irq_work_queue(&chain->work);
> +}
> +
> +static void rearm_irq_work(struct irq_work *work)
> +{
> +	struct dma_fence_chain *chain;
> +	struct dma_fence *prev;
> +
> +	chain = container_of(work, typeof(*chain), work);
> +
> +	rcu_read_lock();
> +	prev = rcu_dereference(chain->prev);
> +	if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb))
> +		prev = NULL;
> +	rcu_read_unlock();
> +	if (prev)
> +		return;
> +
> +	/* Ok, we are done. No more unsignaled fences left */
> +	signal_irq_work(work);
> +}
> +
> +static inline bool fence_is_signaled__nested(struct dma_fence *fence)
> +{
> +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> +		return true;
> +
> +	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING);
> +		dma_fence_signal_locked(fence);
> +		spin_unlock_irqrestore(fence->lock, flags);
> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static bool prev_is_signaled(struct dma_fence_chain *chain)
> +{
> +	struct dma_fence *prev;
> +	bool result;
> +
> +	rcu_read_lock();
> +	prev = rcu_dereference(chain->prev);
> +	result = !prev || fence_is_signaled__nested(prev);
> +	rcu_read_unlock();
> +
> +	return result;
> +}
> +
> +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>  {
>  	struct dma_fence_chain *chain;
>  
>  	chain = container_of(cb, typeof(*chain), cb);
> -	init_irq_work(&chain->work, dma_fence_chain_irq_work);
> +	if (prev_is_signaled(chain)) {
> +		/* Ok, we are done. No more unsignaled fences left */
> +		init_irq_work(&chain->work, signal_irq_work);
> +	} else {
> +		/* Try to rearm the callback */
> +		init_irq_work(&chain->work, rearm_irq_work);
> +	}
> +
>  	irq_work_queue(&chain->work);
> -	dma_fence_put(f);
>  }
>  
>  static bool dma_fence_chain_enable_signaling(struct dma_fence *fence)
>  {
>  	struct dma_fence_chain *head = to_dma_fence_chain(fence);
> +	int err = -ENOENT;
>  
> -	dma_fence_get(&head->base);
> -	dma_fence_chain_for_each(fence, &head->base) {
> -		struct dma_fence *f = dma_fence_chain_contained(fence);
> +	if (WARN_ON(!head))
> +		return false;
>  
> -		dma_fence_get(f);
> -		if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) {
> +	dma_fence_get(fence);
> +	if (head->fence)
> +		err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb);
> +	if (err) {
> +		if (prev_is_signaled(head)) {
>  			dma_fence_put(fence);
> -			return true;
> +		} else {
> +			init_irq_work(&head->work, rearm_irq_work);
> +			irq_work_queue(&head->work);
> +			err = 0;
>  		}
> -		dma_fence_put(f);
>  	}
> -	dma_fence_put(&head->base);
> -	return false;
> +
> +	return !err;
>  }
>  
>  static bool dma_fence_chain_signaled(struct dma_fence *fence)



* Re: [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks
  2025-08-14 12:24   ` Christian König
@ 2025-08-18 14:30     ` Janusz Krzysztofik
  2025-08-18 14:42       ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-08-18 14:30 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe

Hi Christian,

On Thursday, 14 August 2025 14:24:29 CEST Christian König wrote:
> 
> On 14.08.25 10:16, Janusz Krzysztofik wrote:
> > When first user starts waiting on a not yet signaled fence of a chain
> > link, a dma_fence_chain callback is added to a user fence of that link.
> > When the user fence of that chain link is then signaled, the chain is
> > traversed in search for a first not signaled link and the callback is
> > rearmed on a user fence of that link.
> > 
> > Since chain fences may be exposed to user space, e.g. over drm_syncobj
> > IOCTLs, users may start waiting on any link of the chain, then many links
> > of a chain may have signaling enabled and their callbacks added to their
> > user fences.  Once an arbitrary user fence is signaled, all
> > dma_fence_chain callbacks added to it so far must be rearmed to another
> > user fence of the chain.  In extreme scenarios, when all N links of a
> > chain are awaited and then signaled in reverse order, the dma_fence_chain
> > callback may be called up to N * (N + 1) / 2 times (an arithmetic series).
> > 
> > To avoid that potential excessive accumulation of dma_fence_chain
> > callbacks, rearm a trimmed-down, signal only callback version to the base
> > fence of a previous link, if not yet signaled, otherwise just signal the
> > base fence of the current link instead of traversing the chain in search
> > for a first not signaled link and moving all callbacks collected so far to
> > a user fence of that link.
> 
> Well clear NAK to that! You can easily overflow the kernel stack with that!

I'll be happy to propose a better solution, but for that I need to
understand your message better.  Could you please point out the exact
piece of the proposed code and/or describe a scenario where you see the
risk of stack overflow?

> 
> Additional to this messing with the fence ops outside of the dma_fence code is an absolute no-go.

Could you please explain which piece of code you are referring to when you
say "messing with the fence ops outside the dma_fence code"?  If not this
patch, then which particular one of this series did you mean?  I'm
assuming you didn't mean the drm_syncobj code that I mentioned in my
commit descriptions.

Thanks,
Janusz

> 
> Regards,
> Christian.
> 
> > 
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
> > Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >  drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++-----
> >  1 file changed, 84 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> > index a8a90acf4f34d..90eff264ee05c 100644
> > --- a/drivers/dma-buf/dma-fence-chain.c
> > +++ b/drivers/dma-buf/dma-fence-chain.c
> > @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence)
> >          return "unbound";
> >  }
> >  
> > -static void dma_fence_chain_irq_work(struct irq_work *work)
> > +static void signal_irq_work(struct irq_work *work)
> >  {
> >  	struct dma_fence_chain *chain;
> >  
> >  	chain = container_of(work, typeof(*chain), work);
> >  
> > -	/* Try to rearm the callback */
> > -	if (!dma_fence_chain_enable_signaling(&chain->base))
> > -		/* Ok, we are done. No more unsignaled fences left */
> > -		dma_fence_signal(&chain->base);
> > +	dma_fence_signal(&chain->base);
> >  	dma_fence_put(&chain->base);
> >  }
> >  
> > -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> > +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> > +{
> > +	struct dma_fence_chain *chain;
> > +
> > +	chain = container_of(cb, typeof(*chain), cb);
> > +	init_irq_work(&chain->work, signal_irq_work);
> > +	irq_work_queue(&chain->work);
> > +}
> > +
> > +static void rearm_irq_work(struct irq_work *work)
> > +{
> > +	struct dma_fence_chain *chain;
> > +	struct dma_fence *prev;
> > +
> > +	chain = container_of(work, typeof(*chain), work);
> > +
> > +	rcu_read_lock();
> > +	prev = rcu_dereference(chain->prev);
> > +	if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb))
> > +		prev = NULL;
> > +	rcu_read_unlock();
> > +	if (prev)
> > +		return;
> > +
> > +	/* Ok, we are done. No more unsignaled fences left */
> > +	signal_irq_work(work);
> > +}
> > +
> > +static inline bool fence_is_signaled__nested(struct dma_fence *fence)
> > +{
> > +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> > +		return true;
> > +
> > +	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> > +		unsigned long flags;
> > +
> > +		spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING);
> > +		dma_fence_signal_locked(fence);
> > +		spin_unlock_irqrestore(fence->lock, flags);
> > +
> > +		return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +static bool prev_is_signaled(struct dma_fence_chain *chain)
> > +{
> > +	struct dma_fence *prev;
> > +	bool result;
> > +
> > +	rcu_read_lock();
> > +	prev = rcu_dereference(chain->prev);
> > +	result = !prev || fence_is_signaled__nested(prev);
> > +	rcu_read_unlock();
> > +
> > +	return result;
> > +}
> > +
> > +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> >  {
> >  	struct dma_fence_chain *chain;
> >  
> >  	chain = container_of(cb, typeof(*chain), cb);
> > -	init_irq_work(&chain->work, dma_fence_chain_irq_work);
> > +	if (prev_is_signaled(chain)) {
> > +		/* Ok, we are done. No more unsignaled fences left */
> > +		init_irq_work(&chain->work, signal_irq_work);
> > +	} else {
> > +		/* Try to rearm the callback */
> > +		init_irq_work(&chain->work, rearm_irq_work);
> > +	}
> > +
> >  	irq_work_queue(&chain->work);
> > -	dma_fence_put(f);
> >  }
> >  
> >  static bool dma_fence_chain_enable_signaling(struct dma_fence *fence)
> >  {
> >  	struct dma_fence_chain *head = to_dma_fence_chain(fence);
> > +	int err = -ENOENT;
> >  
> > -	dma_fence_get(&head->base);
> > -	dma_fence_chain_for_each(fence, &head->base) {
> > -		struct dma_fence *f = dma_fence_chain_contained(fence);
> > +	if (WARN_ON(!head))
> > +		return false;
> >  
> > -		dma_fence_get(f);
> > -		if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) {
> > +	dma_fence_get(fence);
> > +	if (head->fence)
> > +		err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb);
> > +	if (err) {
> > +		if (prev_is_signaled(head)) {
> >  			dma_fence_put(fence);
> > -			return true;
> > +		} else {
> > +			init_irq_work(&head->work, rearm_irq_work);
> > +			irq_work_queue(&head->work);
> > +			err = 0;
> >  		}
> > -		dma_fence_put(f);
> >  	}
> > -	dma_fence_put(&head->base);
> > -	return false;
> > +
> > +	return !err;
> >  }
> >  
> >  static bool dma_fence_chain_signaled(struct dma_fence *fence)
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks
  2025-08-18 14:30     ` Janusz Krzysztofik
@ 2025-08-18 14:42       ` Christian König
  2025-08-19  9:04         ` Janusz Krzysztofik
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2025-08-18 14:42 UTC (permalink / raw)
  To: Janusz Krzysztofik
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe

On 18.08.25 16:30, Janusz Krzysztofik wrote:
> Hi Christian,
> 
> On Thursday, 14 August 2025 14:24:29 CEST Christian König wrote:
>>
>> On 14.08.25 10:16, Janusz Krzysztofik wrote:
>>> When first user starts waiting on a not yet signaled fence of a chain
>>> link, a dma_fence_chain callback is added to a user fence of that link.
>>> When the user fence of that chain link is then signaled, the chain is
>>> traversed in search for a first not signaled link and the callback is
>>> rearmed on a user fence of that link.
>>>
>>> Since chain fences may be exposed to user space, e.g. over drm_syncobj
>>> IOCTLs, users may start waiting on any link of the chain, then many links
>>> of a chain may have signaling enabled and their callbacks added to their
>>> user fences.  Once an arbitrary user fence is signaled, all
>>> dma_fence_chain callbacks added to it so far must be rearmed to another
>>> user fence of the chain.  In extreme scenarios, when all N links of a
>>> chain are awaited and then signaled in reverse order, the dma_fence_chain
>>> callback may be called up to N * (N + 1) / 2 times (an arithmetic series).
>>>
>>> To avoid that potential excessive accumulation of dma_fence_chain
>>> callbacks, rearm a trimmed-down, signal only callback version to the base
>>> fence of a previous link, if not yet signaled, otherwise just signal the
>>> base fence of the current link instead of traversing the chain in search
>>> for a first not signaled link and moving all callbacks collected so far to
>>> a user fence of that link.
>>
>> Well clear NAK to that! You can easily overflow the kernel stack with that!
> 
> I'll be happy to propose a better solution, but for that I need to understand 
> your message better.  Could you please point out the exact piece of the 
> proposed code and/or describe a scenario where you see a risk of stack 
> overflow?

The sentence "rearm .. to the base fence of a previous link" sounds like you are trying to install a callback on the signaling of the previous chain element.

That is exactly what I pointed out previously: you need to be super careful here, because when this chain signals, the callbacks will execute recursively, which means you can trivially overflow the kernel stack if you have more than a handful of chain elements.

In other words, A waits for B, B waits for C, C waits for D, and so on. When D finally signals, it will call C, which in turn calls B, which in turn calls A.

Even if the chain is a recursive data structure you absolutely can't use recursion for the handling of it.

Maybe I misunderstood your textual description but reading a sentence like this rings all alarm bells here. Otherwise I can't see what the patch is supposed to be optimizing.

>>
>> Additional to this messing with the fence ops outside of the dma_fence code is an absolute no-go.
> 
> Could you please explain what piece of code you are referring to when you say 
> "messing with the fence ops outside the dma_fence code"?  If not this patch 
> then which particular one of this series did you mean?  I'm assuming you 
> didn't mean drm_syncobj code that I mentioned in my commit descriptions.

See below.

> 
> Thanks,
> Janusz
> 
>>
>> Regards,
>> Christian.
>>
>>>
>>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
>>> Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>>> ---
>>>  drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++-----
>>>  1 file changed, 84 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
>>> index a8a90acf4f34d..90eff264ee05c 100644
>>> --- a/drivers/dma-buf/dma-fence-chain.c
>>> +++ b/drivers/dma-buf/dma-fence-chain.c
>>> @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence)
>>>          return "unbound";
>>>  }
>>>  
>>> -static void dma_fence_chain_irq_work(struct irq_work *work)
>>> +static void signal_irq_work(struct irq_work *work)
>>>  {
>>>  	struct dma_fence_chain *chain;
>>>  
>>>  	chain = container_of(work, typeof(*chain), work);
>>>  
>>> -	/* Try to rearm the callback */
>>> -	if (!dma_fence_chain_enable_signaling(&chain->base))
>>> -		/* Ok, we are done. No more unsignaled fences left */
>>> -		dma_fence_signal(&chain->base);
>>> +	dma_fence_signal(&chain->base);
>>>  	dma_fence_put(&chain->base);
>>>  }
>>>  
>>> -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>> +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>> +{
>>> +	struct dma_fence_chain *chain;
>>> +
>>> +	chain = container_of(cb, typeof(*chain), cb);
>>> +	init_irq_work(&chain->work, signal_irq_work);
>>> +	irq_work_queue(&chain->work);
>>> +}
>>> +
>>> +static void rearm_irq_work(struct irq_work *work)
>>> +{
>>> +	struct dma_fence_chain *chain;
>>> +	struct dma_fence *prev;
>>> +
>>> +	chain = container_of(work, typeof(*chain), work);
>>> +
>>> +	rcu_read_lock();
>>> +	prev = rcu_dereference(chain->prev);
>>> +	if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb))
>>> +		prev = NULL;
>>> +	rcu_read_unlock();
>>> +	if (prev)
>>> +		return;
>>> +
>>> +	/* Ok, we are done. No more unsignaled fences left */
>>> +	signal_irq_work(work);
>>> +}
>>> +
>>> +static inline bool fence_is_signaled__nested(struct dma_fence *fence)
>>> +{
>>> +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> +		return true;
>>> +

>>> +	if (fence->ops->signaled && fence->ops->signaled(fence)) {

Calling this outside of dma-fence.[ch] is a clear no-go.

Regards,
Christian.

>>> +		unsigned long flags;
>>> +
>>> +		spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING);
>>> +		dma_fence_signal_locked(fence);
>>> +		spin_unlock_irqrestore(fence->lock, flags);
>>> +
>>> +		return true;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>>> +static bool prev_is_signaled(struct dma_fence_chain *chain)
>>> +{
>>> +	struct dma_fence *prev;
>>> +	bool result;
>>> +
>>> +	rcu_read_lock();
>>> +	prev = rcu_dereference(chain->prev);
>>> +	result = !prev || fence_is_signaled__nested(prev);
>>> +	rcu_read_unlock();
>>> +
>>> +	return result;
>>> +}
>>> +
>>> +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>>  {
>>>  	struct dma_fence_chain *chain;
>>>  
>>>  	chain = container_of(cb, typeof(*chain), cb);
>>> -	init_irq_work(&chain->work, dma_fence_chain_irq_work);
>>> +	if (prev_is_signaled(chain)) {
>>> +		/* Ok, we are done. No more unsignaled fences left */
>>> +		init_irq_work(&chain->work, signal_irq_work);
>>> +	} else {
>>> +		/* Try to rearm the callback */
>>> +		init_irq_work(&chain->work, rearm_irq_work);
>>> +	}
>>> +
>>>  	irq_work_queue(&chain->work);
>>> -	dma_fence_put(f);
>>>  }
>>>  
>>>  static bool dma_fence_chain_enable_signaling(struct dma_fence *fence)
>>>  {
>>>  	struct dma_fence_chain *head = to_dma_fence_chain(fence);
>>> +	int err = -ENOENT;
>>>  
>>> -	dma_fence_get(&head->base);
>>> -	dma_fence_chain_for_each(fence, &head->base) {
>>> -		struct dma_fence *f = dma_fence_chain_contained(fence);
>>> +	if (WARN_ON(!head))
>>> +		return false;
>>>  
>>> -		dma_fence_get(f);
>>> -		if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) {
>>> +	dma_fence_get(fence);
>>> +	if (head->fence)
>>> +		err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb);
>>> +	if (err) {
>>> +		if (prev_is_signaled(head)) {
>>>  			dma_fence_put(fence);
>>> -			return true;
>>> +		} else {
>>> +			init_irq_work(&head->work, rearm_irq_work);
>>> +			irq_work_queue(&head->work);
>>> +			err = 0;
>>>  		}
>>> -		dma_fence_put(f);
>>>  	}
>>> -	dma_fence_put(&head->base);
>>> -	return false;
>>> +
>>> +	return !err;
>>>  }
>>>  
>>>  static bool dma_fence_chain_signaled(struct dma_fence *fence)
>>
>>
> 
> 
> 
> 



* Re: [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks
  2025-08-18 14:42       ` Christian König
@ 2025-08-19  9:04         ` Janusz Krzysztofik
  2025-08-19  9:15           ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-08-19  9:04 UTC (permalink / raw)
  To: Christian König
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe,
	Janusz Krzysztofik

On Monday, 18 August 2025 16:42:56 CEST Christian König wrote:
> On 18.08.25 16:30, Janusz Krzysztofik wrote:
> > Hi Christian,
> > 
> > On Thursday, 14 August 2025 14:24:29 CEST Christian König wrote:
> >>
> >> On 14.08.25 10:16, Janusz Krzysztofik wrote:
> >>> When first user starts waiting on a not yet signaled fence of a chain
> >>> link, a dma_fence_chain callback is added to a user fence of that link.
> >>> When the user fence of that chain link is then signaled, the chain is
> >>> traversed in search for a first not signaled link and the callback is
> >>> rearmed on a user fence of that link.
> >>>
> >>> Since chain fences may be exposed to user space, e.g. over drm_syncobj
> >>> IOCTLs, users may start waiting on any link of the chain, then many links
> >>> of a chain may have signaling enabled and their callbacks added to their
> >>> user fences.  Once an arbitrary user fence is signaled, all
> >>> dma_fence_chain callbacks added to it so far must be rearmed to another
> >>> user fence of the chain.  In extreme scenarios, when all N links of a
> >>> chain are awaited and then signaled in reverse order, the dma_fence_chain
> >>> callback may be called up to N * (N + 1) / 2 times (an arithmetic series).
> >>>
> >>> To avoid that potential excessive accumulation of dma_fence_chain
> >>> callbacks, rearm a trimmed-down, signal only callback version to the base
> >>> fence of a previous link, if not yet signaled, otherwise just signal the
> >>> base fence of the current link instead of traversing the chain in search
> >>> for a first not signaled link and moving all callbacks collected so far to
> >>> a user fence of that link.
> >>
> >> Well clear NAK to that! You can easily overflow the kernel stack with that!
> > 
> > I'll be happy to propose a better solution, but for that I need to understand 
> > better your message.  Could you please point out an exact piece of the 
> > proposed code and/or describe a scenario where you can see the risk of stack 
> > overflow?
> 
> The sentence "rearm .. to the base fence of a previous link" sounds like you are trying to install a callback on the signaling of the previous chain element.
> 
> That is exactly what I pointed out previously: you need to be super careful here, because when this chain signals, the callbacks will execute recursively, which means you can trivially overflow the kernel stack if you have more than a handful of chain elements.
> 
> In other words, A waits for B, B waits for C, C waits for D, and so on. When D finally signals, it will call C, which in turn calls B, which in turn calls A.

OK, maybe my commit description was not precise enough; I intentionally 
didn't describe the implementation details (the how).
When D signals, it doesn't call C directly; it only submits an irq work 
that calls C.  Then C doesn't call B directly either; it only submits another 
irq work that calls B, and so on.
Doesn't that code pattern effectively break the recursion loop into separate 
work items, each with its own separate stack?

> 
> Even if the chain is a recursive data structure you absolutely can't use recursion for the handling of it.
> 
> Maybe I misunderstood your textual description but reading a sentence like this rings all alarm bells here. Otherwise I can't see what the patch is supposed to be optimizing.

OK, maybe I should start my commit description of this patch with a copy of 
the first sentence from the cover letter and from the patch 1/4 description, 
which describes the problem as reported by CI.  Maybe I should also provide a 
comparison of measured signaling times from trybot executions [1][2][3].  
Here are example numbers from CI machine fi-bsw-n3050:

With signaling time reports only added to selftests (patch 1 of 4):
<6> [777.914451] dma-buf: Running dma_fence_chain/wait_forward
<6> [778.123516] wait_forward: 4096 signals in 21373487 ns
<6> [778.335709] dma-buf: Running dma_fence_chain/wait_backward
<6> [795.791546] wait_backward: 4096 signals in 17249051192 ns
<6> [795.859699] dma-buf: Running dma_fence_chain/wait_random
<6> [796.161375] wait_random: 4096 signals in 97386256 ns

With dma_fence_enable_signaling() replaced in selftests with dma_fence_wait() 
(patches 1-3 of 4):
<6> [782.505692] dma-buf: Running dma_fence_chain/wait_forward
<6> [784.609213] wait_forward: 4096 signals in 36513103 ns
<3> [784.837226] Reported -4 for kthread_stop_put(0)!
<6> [785.147643] dma-buf: Running dma_fence_chain/wait_backward
<6> [806.367763] wait_backward: 4096 signals in 18428009499 ns
<6> [807.175325] dma-buf: Running dma_fence_chain/wait_random
<6> [809.453942] wait_random: 4096 signals in 119761950 ns

With the fix (patches 1-4 of 4):
<6> [731.519020] dma-buf: Running dma_fence_chain/wait_forward
<6> [733.623375] wait_forward: 4096 signals in 31890220 ns
<6> [734.258972] dma-buf: Running dma_fence_chain/wait_backward
<6> [736.267325] wait_backward: 4096 signals in 39007955 ns
<6> [736.700221] dma-buf: Running dma_fence_chain/wait_random
<6> [739.346706] wait_random: 4096 signals in 48384865 ns

Signaling time in the wait_backward selftest has been reduced from 17s to 39ms, a roughly 440x improvement.

[1] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152785v1/index.html?
[2] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152828v2/index.html?
[3] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152830v2/index.html?

> 
> >>
> >> Additional to this messing with the fence ops outside of the dma_fence code is an absolute no-go.
> > 
> > Could you please explain what piece of code you are referring to when you say 
> > "messing with the fence ops outside the dma_fence code"?  If not this patch 
> > then which particular one of this series did you mean?  I'm assuming you 
> > didn't mean drm_syncobj code that I mentioned in my commit descriptions.
> 
> See below.
> 
> > 
> > Thanks,
> > Janusz
> > 
> >>
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
> >>> Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
> >>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >>> ---
> >>>  drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++-----
> >>>  1 file changed, 84 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> >>> index a8a90acf4f34d..90eff264ee05c 100644
> >>> --- a/drivers/dma-buf/dma-fence-chain.c
> >>> +++ b/drivers/dma-buf/dma-fence-chain.c
> >>> @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence)
> >>>          return "unbound";
> >>>  }
> >>>  
> >>> -static void dma_fence_chain_irq_work(struct irq_work *work)
> >>> +static void signal_irq_work(struct irq_work *work)
> >>>  {
> >>>  	struct dma_fence_chain *chain;
> >>>  
> >>>  	chain = container_of(work, typeof(*chain), work);
> >>>  
> >>> -	/* Try to rearm the callback */
> >>> -	if (!dma_fence_chain_enable_signaling(&chain->base))
> >>> -		/* Ok, we are done. No more unsignaled fences left */
> >>> -		dma_fence_signal(&chain->base);
> >>> +	dma_fence_signal(&chain->base);
> >>>  	dma_fence_put(&chain->base);
> >>>  }
> >>>  
> >>> -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> >>> +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> >>> +{
> >>> +	struct dma_fence_chain *chain;
> >>> +
> >>> +	chain = container_of(cb, typeof(*chain), cb);
> >>> +	init_irq_work(&chain->work, signal_irq_work);
> >>> +	irq_work_queue(&chain->work);
> >>> +}
> >>> +
> >>> +static void rearm_irq_work(struct irq_work *work)
> >>> +{
> >>> +	struct dma_fence_chain *chain;
> >>> +	struct dma_fence *prev;
> >>> +
> >>> +	chain = container_of(work, typeof(*chain), work);
> >>> +
> >>> +	rcu_read_lock();
> >>> +	prev = rcu_dereference(chain->prev);
> >>> +	if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb))
> >>> +		prev = NULL;
> >>> +	rcu_read_unlock();
> >>> +	if (prev)
> >>> +		return;
> >>> +
> >>> +	/* Ok, we are done. No more unsignaled fences left */
> >>> +	signal_irq_work(work);
> >>> +}
> >>> +
> >>> +static inline bool fence_is_signaled__nested(struct dma_fence *fence)
> >>> +{
> >>> +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> >>> +		return true;
> >>> +
> 
> >>> +	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> 
> Calling this outside of dma-fence.[ch] is a clear no-go.

But this patch only touches drivers/dma-buf/dma-fence-chain.c, not 
anything outside of it.

Thanks,
Janusz

> 
> Regards,
> Christian.
> 
> >>> +		unsigned long flags;
> >>> +
> >>> +		spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING);
> >>> +		dma_fence_signal_locked(fence);
> >>> +		spin_unlock_irqrestore(fence->lock, flags);
> >>> +
> >>> +		return true;
> >>> +	}
> >>> +
> >>> +	return false;
> >>> +}
> >>> +
> >>> +static bool prev_is_signaled(struct dma_fence_chain *chain)
> >>> +{
> >>> +	struct dma_fence *prev;
> >>> +	bool result;
> >>> +
> >>> +	rcu_read_lock();
> >>> +	prev = rcu_dereference(chain->prev);
> >>> +	result = !prev || fence_is_signaled__nested(prev);
> >>> +	rcu_read_unlock();
> >>> +
> >>> +	return result;
> >>> +}
> >>> +
> >>> +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
> >>>  {
> >>>  	struct dma_fence_chain *chain;
> >>>  
> >>>  	chain = container_of(cb, typeof(*chain), cb);
> >>> -	init_irq_work(&chain->work, dma_fence_chain_irq_work);
> >>> +	if (prev_is_signaled(chain)) {
> >>> +		/* Ok, we are done. No more unsignaled fences left */
> >>> +		init_irq_work(&chain->work, signal_irq_work);
> >>> +	} else {
> >>> +		/* Try to rearm the callback */
> >>> +		init_irq_work(&chain->work, rearm_irq_work);
> >>> +	}
> >>> +
> >>>  	irq_work_queue(&chain->work);
> >>> -	dma_fence_put(f);
> >>>  }
> >>>  
> >>>  static bool dma_fence_chain_enable_signaling(struct dma_fence *fence)
> >>>  {
> >>>  	struct dma_fence_chain *head = to_dma_fence_chain(fence);
> >>> +	int err = -ENOENT;
> >>>  
> >>> -	dma_fence_get(&head->base);
> >>> -	dma_fence_chain_for_each(fence, &head->base) {
> >>> -		struct dma_fence *f = dma_fence_chain_contained(fence);
> >>> +	if (WARN_ON(!head))
> >>> +		return false;
> >>>  
> >>> -		dma_fence_get(f);
> >>> -		if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) {
> >>> +	dma_fence_get(fence);
> >>> +	if (head->fence)
> >>> +		err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb);
> >>> +	if (err) {
> >>> +		if (prev_is_signaled(head)) {
> >>>  			dma_fence_put(fence);
> >>> -			return true;
> >>> +		} else {
> >>> +			init_irq_work(&head->work, rearm_irq_work);
> >>> +			irq_work_queue(&head->work);
> >>> +			err = 0;
> >>>  		}
> >>> -		dma_fence_put(f);
> >>>  	}
> >>> -	dma_fence_put(&head->base);
> >>> -	return false;
> >>> +
> >>> +	return !err;
> >>>  }
> >>>  
> >>>  static bool dma_fence_chain_signaled(struct dma_fence *fence)
> >>
> >>
> > 
> > 
> > 
> > 
> 
> 



* Re: [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks
  2025-08-19  9:04         ` Janusz Krzysztofik
@ 2025-08-19  9:15           ` Christian König
  0 siblings, 0 replies; 10+ messages in thread
From: Christian König @ 2025-08-19  9:15 UTC (permalink / raw)
  To: Janusz Krzysztofik
  Cc: Sumit Semwal, Gustavo Padovan, Chris Wilson, linux-media,
	dri-devel, linaro-mm-sig, linux-kernel, intel-gfx, intel-xe

On 19.08.25 11:04, Janusz Krzysztofik wrote:
> On Monday, 18 August 2025 16:42:56 CEST Christian König wrote:
>> On 18.08.25 16:30, Janusz Krzysztofik wrote:
>>> Hi Christian,
>>>
>>> On Thursday, 14 August 2025 14:24:29 CEST Christian König wrote:
>>>>
>>>> On 14.08.25 10:16, Janusz Krzysztofik wrote:
>>>>> When first user starts waiting on a not yet signaled fence of a chain
>>>>> link, a dma_fence_chain callback is added to a user fence of that link.
>>>>> When the user fence of that chain link is then signaled, the chain is
>>>>> traversed in search for a first not signaled link and the callback is
>>>>> rearmed on a user fence of that link.
>>>>>
>>>>> Since chain fences may be exposed to user space, e.g. over drm_syncobj
>>>>> IOCTLs, users may start waiting on any link of the chain, then many links
>>>>> of a chain may have signaling enabled and their callbacks added to their
>>>>> user fences.  Once an arbitrary user fence is signaled, all
>>>>> dma_fence_chain callbacks added to it so far must be rearmed to another
>>>>> user fence of the chain.  In extreme scenarios, when all N links of a
>>>>> chain are awaited and then signaled in reverse order, the dma_fence_chain
>>>>> callback may be called up to N * (N + 1) / 2 times (an arithmetic series).
>>>>>
>>>>> To avoid that potential excessive accumulation of dma_fence_chain
>>>>> callbacks, rearm a trimmed-down, signal only callback version to the base
>>>>> fence of a previous link, if not yet signaled, otherwise just signal the
>>>>> base fence of the current link instead of traversing the chain in search
>>>>> for a first not signaled link and moving all callbacks collected so far to
>>>>> a user fence of that link.
>>>>
>>>> Well clear NAK to that! You can easily overflow the kernel stack with that!
>>>
>>> I'll be happy to propose a better solution, but for that I need to understand 
>>> better your message.  Could you please point out an exact piece of the 
>>> proposed code and/or describe a scenario where you can see the risk of stack 
>>> overflow?
>>
>> The sentence "rearm .. to the base fence of a previous link" sounds like you are trying to install a callback on the signaling to the previous chain element.
>>
>> That is exactly what I pointed out previously where you need to be super careful because when this chain signals the callbacks will execute recursively which means that you can trivially overflow the kernel stack if you have more than a handful of chain elements.
>>
>> In other words A waits for B, B waits for C, C waits for D etc.... when D finally signals it will call C which in turn calls B which in turn calls A.
> 
> OK, maybe my commit description was not precise enough; I intentionally 
> didn't describe the implementation details (the how).
> When D signals, it doesn't call C directly; it only submits an irq work 
> that calls C.  Then C doesn't call B directly either; it only submits 
> another irq work that calls B, and so on.
> Doesn't that code pattern effectively break the recursion loop into separate 
> work items, each with its own separate stack?

No, it's architecture dependent whether the irq_work executes on a separate stack or not.

You would need a work_struct to really separate the two, and I would reject that because it adds additional latency to the signaling path.

>>
>> Even if the chain is a recursive data structure you absolutely can't use recursion for the handling of it.
>>
>> Maybe I misunderstood your textual description but reading a sentence like this rings all alarm bells here. Otherwise I can't see what the patch is supposed to be optimizing.
> 
> OK, maybe I should start my commit description of this patch with a copy of 
> the first sentence from the cover letter and from the patch 1/4 description, 
> which describes the problem as reported by CI.  Maybe I should also provide a 
> comparison of measured signaling times from trybot executions [1][2][3].  
> Here are example numbers from CI machine fi-bsw-n3050:

Yeah and I've pointed out before that this is irrelevant.

The problem is *not* the dma_fence_chain implementation, that one is doing exactly what is expected to do.

The problem is that the test case is nonsense. I've pointed that out numerous times now!

Regards,
Christian.

> 
> With signaling time reports only added to selftests (patch 1 of 4):
> <6> [777.914451] dma-buf: Running dma_fence_chain/wait_forward
> <6> [778.123516] wait_forward: 4096 signals in 21373487 ns
> <6> [778.335709] dma-buf: Running dma_fence_chain/wait_backward
> <6> [795.791546] wait_backward: 4096 signals in 17249051192 ns
> <6> [795.859699] dma-buf: Running dma_fence_chain/wait_random
> <6> [796.161375] wait_random: 4096 signals in 97386256 ns
> 
> With dma_fence_enable_signaling() replaced in selftests with dma_fence_wait() 
> (patches 1-3 of 4):
> <6> [782.505692] dma-buf: Running dma_fence_chain/wait_forward
> <6> [784.609213] wait_forward: 4096 signals in 36513103 ns
> <3> [784.837226] Reported -4 for kthread_stop_put(0)!
> <6> [785.147643] dma-buf: Running dma_fence_chain/wait_backward
> <6> [806.367763] wait_backward: 4096 signals in 18428009499 ns
> <6> [807.175325] dma-buf: Running dma_fence_chain/wait_random
> <6> [809.453942] wait_random: 4096 signals in 119761950 ns
> 
> With the fix (patches 1-4 of 4):
> <6> [731.519020] dma-buf: Running dma_fence_chain/wait_forward
> <6> [733.623375] wait_forward: 4096 signals in 31890220 ns
> <6> [734.258972] dma-buf: Running dma_fence_chain/wait_backward
> <6> [736.267325] wait_backward: 4096 signals in 39007955 ns
> <6> [736.700221] dma-buf: Running dma_fence_chain/wait_random
> <6> [739.346706] wait_random: 4096 signals in 48384865 ns
> 
> Signaling time in wait_backward selftest has been reduced from 17s to 39ms.
> 
> [1] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152785v1/index.html?
> [2] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152828v2/index.html?
> [3] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152830v2/index.html?
> 
>>
>>>>
>>>> Additional to this messing with the fence ops outside of the dma_fence code is an absolute no-go.
>>>
>>> Could you please explain what piece of code you are referring to when you say 
>>> "messing with the fence ops outside the dma_fence code"?  If not this patch 
>>> then which particular one of this series did you mean?  I'm assuming you 
>>> didn't mean drm_syncobj code that I mentioned in my commit descriptions.
>>
>> See below.
>>
>>>
>>> Thanks,
>>> Janusz
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
>>>>> Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
>>>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>>>>> ---
>>>>>  drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++-----
>>>>>  1 file changed, 84 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
>>>>> index a8a90acf4f34d..90eff264ee05c 100644
>>>>> --- a/drivers/dma-buf/dma-fence-chain.c
>>>>> +++ b/drivers/dma-buf/dma-fence-chain.c
>>>>> @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence)
>>>>>          return "unbound";
>>>>>  }
>>>>>  
>>>>> -static void dma_fence_chain_irq_work(struct irq_work *work)
>>>>> +static void signal_irq_work(struct irq_work *work)
>>>>>  {
>>>>>  	struct dma_fence_chain *chain;
>>>>>  
>>>>>  	chain = container_of(work, typeof(*chain), work);
>>>>>  
>>>>> -	/* Try to rearm the callback */
>>>>> -	if (!dma_fence_chain_enable_signaling(&chain->base))
>>>>> -		/* Ok, we are done. No more unsignaled fences left */
>>>>> -		dma_fence_signal(&chain->base);
>>>>> +	dma_fence_signal(&chain->base);
>>>>>  	dma_fence_put(&chain->base);
>>>>>  }
>>>>>  
>>>>> -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>> +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>> +{
>>>>> +	struct dma_fence_chain *chain;
>>>>> +
>>>>> +	chain = container_of(cb, typeof(*chain), cb);
>>>>> +	init_irq_work(&chain->work, signal_irq_work);
>>>>> +	irq_work_queue(&chain->work);
>>>>> +}
>>>>> +
>>>>> +static void rearm_irq_work(struct irq_work *work)
>>>>> +{
>>>>> +	struct dma_fence_chain *chain;
>>>>> +	struct dma_fence *prev;
>>>>> +
>>>>> +	chain = container_of(work, typeof(*chain), work);
>>>>> +
>>>>> +	rcu_read_lock();
>>>>> +	prev = rcu_dereference(chain->prev);
>>>>> +	if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb))
>>>>> +		prev = NULL;
>>>>> +	rcu_read_unlock();
>>>>> +	if (prev)
>>>>> +		return;
>>>>> +
>>>>> +	/* Ok, we are done. No more unsignaled fences left */
>>>>> +	signal_irq_work(work);
>>>>> +}
>>>>> +
>>>>> +static inline bool fence_is_signaled__nested(struct dma_fence *fence)
>>>>> +{
>>>>> +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>>> +		return true;
>>>>> +
>>
>>>>> +	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>
>> Calling this outside of dma-fence.[ch] is a clear no-go.
> 
> But this patch applies only to drivers/dma-buf/dma-fence-chain.c, not 
> outside of it.
> 
> Thanks,
> Janusz
> 
>>
>> Regards,
>> Christian.
>>
>>>>> +		unsigned long flags;
>>>>> +
>>>>> +		spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING);
>>>>> +		dma_fence_signal_locked(fence);
>>>>> +		spin_unlock_irqrestore(fence->lock, flags);
>>>>> +
>>>>> +		return true;
>>>>> +	}
>>>>> +
>>>>> +	return false;
>>>>> +}
>>>>> +
>>>>> +static bool prev_is_signaled(struct dma_fence_chain *chain)
>>>>> +{
>>>>> +	struct dma_fence *prev;
>>>>> +	bool result;
>>>>> +
>>>>> +	rcu_read_lock();
>>>>> +	prev = rcu_dereference(chain->prev);
>>>>> +	result = !prev || fence_is_signaled__nested(prev);
>>>>> +	rcu_read_unlock();
>>>>> +
>>>>> +	return result;
>>>>> +}
>>>>> +
>>>>> +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>>  {
>>>>>  	struct dma_fence_chain *chain;
>>>>>  
>>>>>  	chain = container_of(cb, typeof(*chain), cb);
>>>>> -	init_irq_work(&chain->work, dma_fence_chain_irq_work);
>>>>> +	if (prev_is_signaled(chain)) {
>>>>> +		/* Ok, we are done. No more unsignaled fences left */
>>>>> +		init_irq_work(&chain->work, signal_irq_work);
>>>>> +	} else {
>>>>> +		/* Try to rearm the callback */
>>>>> +		init_irq_work(&chain->work, rearm_irq_work);
>>>>> +	}
>>>>> +
>>>>>  	irq_work_queue(&chain->work);
>>>>> -	dma_fence_put(f);
>>>>>  }
>>>>>  
>>>>>  static bool dma_fence_chain_enable_signaling(struct dma_fence *fence)
>>>>>  {
>>>>>  	struct dma_fence_chain *head = to_dma_fence_chain(fence);
>>>>> +	int err = -ENOENT;
>>>>>  
>>>>> -	dma_fence_get(&head->base);
>>>>> -	dma_fence_chain_for_each(fence, &head->base) {
>>>>> -		struct dma_fence *f = dma_fence_chain_contained(fence);
>>>>> +	if (WARN_ON(!head))
>>>>> +		return false;
>>>>>  
>>>>> -		dma_fence_get(f);
>>>>> -		if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) {
>>>>> +	dma_fence_get(fence);
>>>>> +	if (head->fence)
>>>>> +		err = dma_fence_add_callback(head->fence, &head->cb, rearm_or_signal_cb);
>>>>> +	if (err) {
>>>>> +		if (prev_is_signaled(head)) {
>>>>>  			dma_fence_put(fence);
>>>>> -			return true;
>>>>> +		} else {
>>>>> +			init_irq_work(&head->work, rearm_irq_work);
>>>>> +			irq_work_queue(&head->work);
>>>>> +			err = 0;
>>>>>  		}
>>>>> -		dma_fence_put(f);
>>>>>  	}
>>>>> -	dma_fence_put(&head->base);
>>>>> -	return false;
>>>>> +
>>>>> +	return !err;
>>>>>  }
>>>>>  
>>>>>  static bool dma_fence_chain_signaled(struct dma_fence *fence)
Thread overview: 10+ messages
2025-08-14  8:16 [PATCH 0/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
2025-08-14  8:16 ` [PATCH 1/4] dma-buf/fence-chain: Report time spent in wait_* test cases Janusz Krzysztofik
2025-08-14  8:16 ` [PATCH 2/4] dma-buf/fence-chain: Let test cases decide which fence to wait on Janusz Krzysztofik
2025-08-14  8:16 ` [PATCH 3/4] dma-buf/fence-chain: Wait on each tested chain link Janusz Krzysztofik
2025-08-14  8:16 ` [PATCH 4/4] dma-buf/fence-chain: Speed up processing of rearmed callbacks Janusz Krzysztofik
2025-08-14 12:24   ` Christian König
2025-08-18 14:30     ` Janusz Krzysztofik
2025-08-18 14:42       ` Christian König
2025-08-19  9:04         ` Janusz Krzysztofik
2025-08-19  9:15           ` Christian König
