[PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path
@ 2026-02-26  1:59 Yihan Ding
  2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Yihan Ding @ 2026-02-26  1:59 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack
  Cc: Paul Moore, Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817

Hello,

This patch series fixes a deadlock in the Landlock TSYNC multithreading 
support, originally reported by syzbot, and cleans up the associated 
interrupt recovery path.

The deadlock occurs when multiple threads concurrently call 
landlock_restrict_self() with sibling thread restriction enabled, 
causing them to mutually queue task_works on each other and block 
indefinitely.

* Patch 1 fixes the root cause by serializing the TSYNC operations 
  within the same process using the exec_update_lock.
* Patch 2 cleans up the interrupt recovery path by replacing an 
  unnecessary wait_for_completion() with a straightforward loop break, 
  avoiding Use-After-Free while unblocking remaining task_works.

Changes in v3:
- Patch 1: Changed down_write_killable() to down_write_trylock() and
  return -ERESTARTNOINTR on failure. This avoids a secondary deadlock 
  where a blocking wait prevents a sibling thread from waking up to 
  execute the requested TSYNC task_work. (Noted by Günther Noack. 
  down_write_interruptible() was also suggested but is not implemented 
  for rw_semaphores in the kernel).
- Patch 2: No changes.

Changes in v2:
- Split the changes into a 2-patch series.
- Patch 1: Adopted down_write_killable() instead of down_write().
- Patch 2: Removed wait_for_completion(&shared_ctx.all_prepared) and 
  replaced it with a `break` to prevent UAF.

Link to v2: https://lore.kernel.org/all/20260225024734.3024732-1-dingyihan@uniontech.com/
Link to v1: https://lore.kernel.org/all/20260224062729.2908692-1-dingyihan@uniontech.com/

Yihan Ding (2):
  landlock: Serialize TSYNC thread restriction
  landlock: Clean up interrupted thread logic in TSYNC

 security/landlock/tsync.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
-- 
2.51.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-02-26  1:59 [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Yihan Ding
@ 2026-02-26  1:59 ` Yihan Ding
  2026-02-26  7:23   ` Günther Noack
                     ` (2 more replies)
  2026-02-26  1:59 ` [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC Yihan Ding
  2026-03-03 17:31 ` [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Mickaël Salaün
  2 siblings, 3 replies; 17+ messages in thread
From: Yihan Ding @ 2026-02-26  1:59 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack
  Cc: Paul Moore, Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817, Yihan Ding

syzbot found a deadlock in landlock_restrict_sibling_threads().
When multiple threads concurrently call landlock_restrict_self() with
sibling thread restriction enabled, they can deadlock by mutually
queueing task_works on each other and then blocking in kernel space
(waiting for the other to finish).

Fix this by serializing the TSYNC operations within the same process
using the exec_update_lock. This prevents concurrent invocations
from deadlocking. 

We use down_write_trylock() and return -ERESTARTNOINTR if the lock
cannot be acquired immediately. This ensures that if a thread fails
to get the lock, it will return to userspace, allowing it to process
any pending TSYNC task_works from the lock holder, and then
transparently restart the syscall.

Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
Suggested-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
---
Changes in v3:
- Replaced down_write_killable() with down_write_trylock() and 
  returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
  blocking the execution of task_works. (Caught by Günther Noack).

Changes in v2:
- Use down_write_killable() instead of down_write().
- Split the interrupt path cleanup into a separate patch.
---
 security/landlock/tsync.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
index de01aa899751..xxxxxxxxxxxx 100644
--- a/security/landlock/tsync.c
+++ b/security/landlock/tsync.c
@@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
 	shared_ctx.new_cred = new_cred;
 	shared_ctx.set_no_new_privs = task_no_new_privs(current);
 
+	/*
+	 * Serialize concurrent TSYNC operations to prevent deadlocks
+	 * when multiple threads call landlock_restrict_self() simultaneously.
+	 */
+	if (!down_write_trylock(&current->signal->exec_update_lock))
+		return -ERESTARTNOINTR;
+
 	/*
 	 * We schedule a pseudo-signal task_work for each of the calling task's
 	 * sibling threads.  In the task work, each thread:
@@ -556,6 +563,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
 		wait_for_completion(&shared_ctx.all_finished);
 
 	tsync_works_release(&works);
+	up_write(&current->signal->exec_update_lock);
 
 	return atomic_read(&shared_ctx.preparation_error);
 }
-- 
2.51.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
@ 2026-02-26  7:23   ` Günther Noack
  2026-03-03 16:20   ` Justin Suess
  2026-03-03 17:51   ` Mickaël Salaün
  2 siblings, 0 replies; 17+ messages in thread
From: Günther Noack @ 2026-02-26  7:23 UTC (permalink / raw)
  To: Yihan Ding
  Cc: Mickaël Salaün, Paul Moore, Jann Horn,
	linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817

On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> syzbot found a deadlock in landlock_restrict_sibling_threads().
> When multiple threads concurrently call landlock_restrict_self() with
> sibling thread restriction enabled, they can deadlock by mutually
> queueing task_works on each other and then blocking in kernel space
> (waiting for the other to finish).
> 
> Fix this by serializing the TSYNC operations within the same process
> using the exec_update_lock. This prevents concurrent invocations
> from deadlocking. 
> 
> We use down_write_trylock() and return -ERESTARTNOINTR if the lock
> cannot be acquired immediately. This ensures that if a thread fails
> to get the lock, it will return to userspace, allowing it to process
> any pending TSYNC task_works from the lock holder, and then
> transparently restart the syscall.
> 
> Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> Suggested-by: Günther Noack <gnoack3000@gmail.com>
> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> ---
> Changes in v3:
> - Replaced down_write_killable() with down_write_trylock() and 
>   returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
>   blocking the execution of task_works. (Caught by Günther Noack).
> 
> Changes in v2:
> - Use down_write_killable() instead of down_write().
> - Split the interrupt path cleanup into a separate patch.
> ---
>  security/landlock/tsync.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index de01aa899751..xxxxxxxxxxxx 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  	shared_ctx.new_cred = new_cred;
>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
>  
> +	/*
> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> +	 * when multiple threads call landlock_restrict_self() simultaneously.
> +	 */
> +	if (!down_write_trylock(&current->signal->exec_update_lock))
> +		return -ERESTARTNOINTR;
> +
>  	/*
>  	 * We schedule a pseudo-signal task_work for each of the calling task's
>  	 * sibling threads.  In the task work, each thread:
> @@ -556,6 +563,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  		wait_for_completion(&shared_ctx.all_finished);
>  
>  	tsync_works_release(&works);
> +	up_write(&current->signal->exec_update_lock);
>  
>  	return atomic_read(&shared_ctx.preparation_error);
>  }
> -- 
> 2.51.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
  2026-02-26  7:23   ` Günther Noack
@ 2026-03-03 16:20   ` Justin Suess
  2026-03-03 17:47     ` Mickaël Salaün
  2026-03-03 19:50     ` Günther Noack
  2026-03-03 17:51   ` Mickaël Salaün
  2 siblings, 2 replies; 17+ messages in thread
From: Justin Suess @ 2026-03-03 16:20 UTC (permalink / raw)
  To: Yihan Ding
  Cc: Mickaël Salaün, Günther Noack, Paul Moore,
	Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817

On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> syzbot found a deadlock in landlock_restrict_sibling_threads().
> When multiple threads concurrently call landlock_restrict_self() with
> sibling thread restriction enabled, they can deadlock by mutually
> queueing task_works on each other and then blocking in kernel space
> (waiting for the other to finish).
> 
> Fix this by serializing the TSYNC operations within the same process
> using the exec_update_lock. This prevents concurrent invocations
> from deadlocking. 
> 
> We use down_write_trylock() and return -ERESTARTNOINTR if the lock
> cannot be acquired immediately. This ensures that if a thread fails
> to get the lock, it will return to userspace, allowing it to process
> any pending TSYNC task_works from the lock holder, and then
> transparently restart the syscall.
> 
> Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> Suggested-by: Günther Noack <gnoack3000@gmail.com>
> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> ---
> Changes in v3:
> - Replaced down_write_killable() with down_write_trylock() and 
>   returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
>   blocking the execution of task_works. (Caught by Günther Noack).
> 
> Changes in v2:
> - Use down_write_killable() instead of down_write().
> - Split the interrupt path cleanup into a separate patch.
> ---
>  security/landlock/tsync.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index de01aa899751..xxxxxxxxxxxx 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  	shared_ctx.new_cred = new_cred;
>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
>  
> +	/*
> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> +	 * when multiple threads call landlock_restrict_self() simultaneously.
> +	 */
> +	if (!down_write_trylock(&current->signal->exec_update_lock))
> +		return -ERESTARTNOINTR;
These two lines above introduced a test failure in tsync_test
completing_enablement.

The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
(this patch) and is currently in the mic/next branch.

I noticed the test failure while testing an unrelated patch.

The bug is because this code never actually yields or restarts the syscall.

This is the test output I observed:

  [+] Running tsync_test:
  TAP version 13
  1..4
  # Starting 4 tests from 1 test cases.
  #  RUN           global.single_threaded_success ...
  #            OK  global.single_threaded_success
  ok 1 global.single_threaded_success
  #  RUN           global.multi_threaded_success ...
  #            OK  global.multi_threaded_success
  ok 2 global.multi_threaded_success
  #  RUN           global.multi_threaded_success_despite_diverging_domains ...
  #            OK  global.multi_threaded_success_despite_diverging_domains
  ok 3 global.multi_threaded_success_despite_diverging_domains
  #  RUN           global.competing_enablement ...
  # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
  # competing_enablement: Test failed
  #          FAIL  global.competing_enablement
  not ok 4 global.competing_enablement
  # FAILED: 3 / 4 tests passed.


Brief investigation and the additions of these pr_warn lines:


diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 0d66a68677b7..84909232b220 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -574,6 +574,9 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
 		const int err = landlock_restrict_sibling_threads(
 			current_cred(), new_cred);
 		if (err) {
+			pr_warn("landlock: restrict_self tsync err pid=%d tgid=%d err=%d flags=0x%x ruleset_fd=%d\n",
+				task_pid_nr(current), task_tgid_nr(current), err,
+				flags, ruleset_fd);
 			abort_creds(new_cred);
 			return err;
 		}
diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
index 5afc5d639b8f..deb0f0b1f081 100644
--- a/security/landlock/tsync.c
+++ b/security/landlock/tsync.c
@@ -489,8 +489,11 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
 	 * Serialize concurrent TSYNC operations to prevent deadlocks when multiple
 	 * threads call landlock_restrict_self() simultaneously.
 	 */
-	if (!down_write_trylock(&current->signal->exec_update_lock))
+	if (!down_write_trylock(&current->signal->exec_update_lock)) {
+		pr_warn("landlock: tsync trylock busy pid=%d tgid=%d\n",
+			task_pid_nr(current), task_tgid_nr(current));
 		return -ERESTARTNOINTR;
+	}
 
 	/*
 	 * We schedule a pseudo-signal task_work for each of the calling task's
@@ -602,6 +605,10 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
 
 	tsync_works_release(&works);
 	up_write(&current->signal->exec_update_lock);
+	if (atomic_read(&shared_ctx.preparation_error))
+		pr_warn("landlock: tsync preparation_error pid=%d tgid=%d err=%d\n",
+			task_pid_nr(current), task_tgid_nr(current),
+			atomic_read(&shared_ctx.preparation_error));
 
 	return atomic_read(&shared_ctx.preparation_error);
 }

Resulted in the following output:

  landlock: tsync trylock busy pid=1263 tgid=1261
  landlock: landlock: restrict_self tsync err pid=1263 tgid=1261 err=-513 flags=0x8 ruleset_fd=6
  # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
  # competing_enablement: Test failed
  #          FAIL  global.competing_enablement
  not ok 4 global.competing_enablement

I have a fix that I will send as a patch.

Kind Regards,
Justin Suess

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-03 16:20   ` Justin Suess
@ 2026-03-03 17:47     ` Mickaël Salaün
  2026-03-03 18:13       ` Justin Suess
  2026-03-03 19:50     ` Günther Noack
  1 sibling, 1 reply; 17+ messages in thread
From: Mickaël Salaün @ 2026-03-03 17:47 UTC (permalink / raw)
  To: Justin Suess
  Cc: Yihan Ding, Günther Noack, Paul Moore, Jann Horn,
	linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817

On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> > syzbot found a deadlock in landlock_restrict_sibling_threads().
> > When multiple threads concurrently call landlock_restrict_self() with
> > sibling thread restriction enabled, they can deadlock by mutually
> > queueing task_works on each other and then blocking in kernel space
> > (waiting for the other to finish).
> > 
> > Fix this by serializing the TSYNC operations within the same process
> > using the exec_update_lock. This prevents concurrent invocations
> > from deadlocking. 
> > 
> > We use down_write_trylock() and return -ERESTARTNOINTR if the lock
> > cannot be acquired immediately. This ensures that if a thread fails
> > to get the lock, it will return to userspace, allowing it to process
> > any pending TSYNC task_works from the lock holder, and then
> > transparently restart the syscall.
> > 
> > Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> > Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> > Suggested-by: Günther Noack <gnoack3000@gmail.com>
> > Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> > ---
> > Changes in v3:
> > - Replaced down_write_killable() with down_write_trylock() and 
> >   returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
> >   blocking the execution of task_works. (Caught by Günther Noack).
> > 
> > Changes in v2:
> > - Use down_write_killable() instead of down_write().
> > - Split the interrupt path cleanup into a separate patch.
> > ---
> >  security/landlock/tsync.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> > index de01aa899751..xxxxxxxxxxxx 100644
> > --- a/security/landlock/tsync.c
> > +++ b/security/landlock/tsync.c
> > @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >  	shared_ctx.new_cred = new_cred;
> >  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
> >  
> > +	/*
> > +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> > +	 * when multiple threads call landlock_restrict_self() simultaneously.
> > +	 */
> > +	if (!down_write_trylock(&current->signal->exec_update_lock))
> > +		return -ERESTARTNOINTR;
> These two lines above introduced a test failure in tsync_test
> completing_enablement.
> 
> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> (this patch) and is currently in the mic/next branch.
> 
> I noticed the test failure while testing an unrelated patch.
> 
> The bug is because this code never actually yields or restarts the syscall.
> 
> This is the test output I observed:
> 
>   [+] Running tsync_test:
>   TAP version 13
>   1..4
>   # Starting 4 tests from 1 test cases.
>   #  RUN           global.single_threaded_success ...
>   #            OK  global.single_threaded_success
>   ok 1 global.single_threaded_success
>   #  RUN           global.multi_threaded_success ...
>   #            OK  global.multi_threaded_success
>   ok 2 global.multi_threaded_success
>   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
>   #            OK  global.multi_threaded_success_despite_diverging_domains
>   ok 3 global.multi_threaded_success_despite_diverging_domains
>   #  RUN           global.competing_enablement ...
>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
>   # competing_enablement: Test failed
>   #          FAIL  global.competing_enablement
>   not ok 4 global.competing_enablement
>   # FAILED: 3 / 4 tests passed.
> 
> 
> Brief investigation and the additions of these pr_warn lines:
> 
> 
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 0d66a68677b7..84909232b220 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -574,6 +574,9 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
>  		const int err = landlock_restrict_sibling_threads(
>  			current_cred(), new_cred);
>  		if (err) {
> +			pr_warn("landlock: restrict_self tsync err pid=%d tgid=%d err=%d flags=0x%x ruleset_fd=%d\n",
> +				task_pid_nr(current), task_tgid_nr(current), err,
> +				flags, ruleset_fd);
>  			abort_creds(new_cred);
>  			return err;
>  		}
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index 5afc5d639b8f..deb0f0b1f081 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -489,8 +489,11 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  	 * Serialize concurrent TSYNC operations to prevent deadlocks when multiple
>  	 * threads call landlock_restrict_self() simultaneously.
>  	 */
> -	if (!down_write_trylock(&current->signal->exec_update_lock))
> +	if (!down_write_trylock(&current->signal->exec_update_lock)) {
> +		pr_warn("landlock: tsync trylock busy pid=%d tgid=%d\n",
> +			task_pid_nr(current), task_tgid_nr(current));
>  		return -ERESTARTNOINTR;
> +	}
>  
>  	/*
>  	 * We schedule a pseudo-signal task_work for each of the calling task's
> @@ -602,6 +605,10 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  
>  	tsync_works_release(&works);
>  	up_write(&current->signal->exec_update_lock);
> +	if (atomic_read(&shared_ctx.preparation_error))
> +		pr_warn("landlock: tsync preparation_error pid=%d tgid=%d err=%d\n",
> +			task_pid_nr(current), task_tgid_nr(current),
> +			atomic_read(&shared_ctx.preparation_error));
>  
>  	return atomic_read(&shared_ctx.preparation_error);
>  }
> 
> Resulted in the following output:
> 
>   landlock: tsync trylock busy pid=1263 tgid=1261
>   landlock: landlock: restrict_self tsync err pid=1263 tgid=1261 err=-513 flags=0x8 ruleset_fd=6
>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
>   # competing_enablement: Test failed
>   #          FAIL  global.competing_enablement
>   not ok 4 global.competing_enablement

You're right, I have the same issue, not sure how I missed it last time.

> 
> I have a fix that I will send as a patch.

I'll need to squash your fix to this fix to only have one non-buggy
patch.  So, either you send a new patch and I'll squash it with
Co-developed-by, or Yihan takes your patch and send a new version with
your contribution (I'll prefer the later to make it easier to follow
this series).

> 
> Kind Regards,
> Justin Suess
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-03 17:47     ` Mickaël Salaün
@ 2026-03-03 18:13       ` Justin Suess
  0 siblings, 0 replies; 17+ messages in thread
From: Justin Suess @ 2026-03-03 18:13 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Yihan Ding, Günther Noack, Paul Moore, Jann Horn,
	linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817

On Tue, Mar 03, 2026 at 06:47:30PM +0100, Mickaël Salaün wrote:
> On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> > On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> > > syzbot found a deadlock in landlock_restrict_sibling_threads().
> > > When multiple threads concurrently call landlock_restrict_self() with
> > > sibling thread restriction enabled, they can deadlock by mutually
> > > queueing task_works on each other and then blocking in kernel space
> > > (waiting for the other to finish).
> > > 
> > > Fix this by serializing the TSYNC operations within the same process
> > > using the exec_update_lock. This prevents concurrent invocations
> > > from deadlocking. 
> > > 
> > > We use down_write_trylock() and return -ERESTARTNOINTR if the lock
> > > cannot be acquired immediately. This ensures that if a thread fails
> > > to get the lock, it will return to userspace, allowing it to process
> > > any pending TSYNC task_works from the lock holder, and then
> > > transparently restart the syscall.
> > > 
> > > Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> > > Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> > > Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> > > Suggested-by: Günther Noack <gnoack3000@gmail.com>
> > > Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> > > ---
> > > Changes in v3:
> > > - Replaced down_write_killable() with down_write_trylock() and 
> > >   returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
> > >   blocking the execution of task_works. (Caught by Günther Noack).
> > > 
> > > Changes in v2:
> > > - Use down_write_killable() instead of down_write().
> > > - Split the interrupt path cleanup into a separate patch.
> > > ---
> > >  security/landlock/tsync.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> > > index de01aa899751..xxxxxxxxxxxx 100644
> > > --- a/security/landlock/tsync.c
> > > +++ b/security/landlock/tsync.c
> > > @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> > >  	shared_ctx.new_cred = new_cred;
> > >  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
> > >  
> > > +	/*
> > > +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> > > +	 * when multiple threads call landlock_restrict_self() simultaneously.
> > > +	 */
> > > +	if (!down_write_trylock(&current->signal->exec_update_lock))
> > > +		return -ERESTARTNOINTR;
> > These two lines above introduced a test failure in tsync_test
> > completing_enablement.
> > 
> > The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> > (this patch) and is currently in the mic/next branch.
> > 
> > I noticed the test failure while testing an unrelated patch.
> > 
> > The bug is because this code never actually yields or restarts the syscall.
> > 
> > This is the test output I observed:
> > 
> >   [+] Running tsync_test:
> >   TAP version 13
> >   1..4
> >   # Starting 4 tests from 1 test cases.
> >   #  RUN           global.single_threaded_success ...
> >   #            OK  global.single_threaded_success
> >   ok 1 global.single_threaded_success
> >   #  RUN           global.multi_threaded_success ...
> >   #            OK  global.multi_threaded_success
> >   ok 2 global.multi_threaded_success
> >   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
> >   #            OK  global.multi_threaded_success_despite_diverging_domains
> >   ok 3 global.multi_threaded_success_despite_diverging_domains
> >   #  RUN           global.competing_enablement ...
> >   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> >   # competing_enablement: Test failed
> >   #          FAIL  global.competing_enablement
> >   not ok 4 global.competing_enablement
> >   # FAILED: 3 / 4 tests passed.
> > 
> > 
> > Brief investigation and the additions of these pr_warn lines:
> > 
> > 
> > diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> > index 0d66a68677b7..84909232b220 100644
> > --- a/security/landlock/syscalls.c
> > +++ b/security/landlock/syscalls.c
> > @@ -574,6 +574,9 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
> >  		const int err = landlock_restrict_sibling_threads(
> >  			current_cred(), new_cred);
> >  		if (err) {
> > +			pr_warn("landlock: restrict_self tsync err pid=%d tgid=%d err=%d flags=0x%x ruleset_fd=%d\n",
> > +				task_pid_nr(current), task_tgid_nr(current), err,
> > +				flags, ruleset_fd);
> >  			abort_creds(new_cred);
> >  			return err;
> >  		}
> > diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> > index 5afc5d639b8f..deb0f0b1f081 100644
> > --- a/security/landlock/tsync.c
> > +++ b/security/landlock/tsync.c
> > @@ -489,8 +489,11 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >  	 * Serialize concurrent TSYNC operations to prevent deadlocks when multiple
> >  	 * threads call landlock_restrict_self() simultaneously.
> >  	 */
> > -	if (!down_write_trylock(&current->signal->exec_update_lock))
> > +	if (!down_write_trylock(&current->signal->exec_update_lock)) {
> > +		pr_warn("landlock: tsync trylock busy pid=%d tgid=%d\n",
> > +			task_pid_nr(current), task_tgid_nr(current));
> >  		return -ERESTARTNOINTR;
> > +	}
> >  
> >  	/*
> >  	 * We schedule a pseudo-signal task_work for each of the calling task's
> > @@ -602,6 +605,10 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >  
> >  	tsync_works_release(&works);
> >  	up_write(&current->signal->exec_update_lock);
> > +	if (atomic_read(&shared_ctx.preparation_error))
> > +		pr_warn("landlock: tsync preparation_error pid=%d tgid=%d err=%d\n",
> > +			task_pid_nr(current), task_tgid_nr(current),
> > +			atomic_read(&shared_ctx.preparation_error));
> >  
> >  	return atomic_read(&shared_ctx.preparation_error);
> >  }
> > 
> > Resulted in the following output:
> > 
> >   landlock: tsync trylock busy pid=1263 tgid=1261
> >   landlock: landlock: restrict_self tsync err pid=1263 tgid=1261 err=-513 flags=0x8 ruleset_fd=6
> >   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> >   # competing_enablement: Test failed
> >   #          FAIL  global.competing_enablement
> >   not ok 4 global.competing_enablement
> 
> You're right, I have the same issue, not sure how I missed it last time.
> 
> > 
> > I have a fix that I will send as a patch.
> 
> I'll need to squash your fix to this fix to only have one non-buggy
> patch.  So, either you send a new patch and I'll squash it with
> Co-developed-by, or Yihan takes your patch and send a new version with
> your contribution (I'll prefer the later to make it easier to follow
> this series).
>
Agreed.

The latter option is probably better. Yihan, are you ok to squash / apply
my patch [1] yourself to your next version of this series?

Feel free to do whatever you think is best. (or submit your own fix if
you think mine isn't a good fit)

[1]: https://lore.kernel.org/linux-security-module/20260303174354.1839461-1-utilityemal77@gmail.com/
> > 
> > Kind Regards,
> > Justin Suess
> > 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-03 16:20   ` Justin Suess
  2026-03-03 17:47     ` Mickaël Salaün
@ 2026-03-03 19:50     ` Günther Noack
  2026-03-03 20:38       ` Tingmao Wang
  2026-03-03 21:08       ` Justin Suess
  1 sibling, 2 replies; 17+ messages in thread
From: Günther Noack @ 2026-03-03 19:50 UTC (permalink / raw)
  To: Justin Suess
  Cc: Yihan Ding, Mickaël Salaün, Paul Moore, Jann Horn,
	linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817

Oof, thanks for catching this, Justin! 🏆

I was convinced I had tried the selftests for that variant,
but apparently I had not. Mea culpa.

On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> > syzbot found a deadlock in landlock_restrict_sibling_threads().
> > When multiple threads concurrently call landlock_restrict_self() with
> > sibling thread restriction enabled, they can deadlock by mutually
> > queueing task_works on each other and then blocking in kernel space
> > (waiting for the other to finish).
> > 
> > Fix this by serializing the TSYNC operations within the same process
> > using the exec_update_lock. This prevents concurrent invocations
> > from deadlocking. 
> > 
> > We use down_write_trylock() and return -ERESTARTNOINTR if the lock
> > cannot be acquired immediately. This ensures that if a thread fails
> > to get the lock, it will return to userspace, allowing it to process
> > any pending TSYNC task_works from the lock holder, and then
> > transparently restart the syscall.
> > 
> > Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> > Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> > Suggested-by: Günther Noack <gnoack3000@gmail.com>
> > Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> > ---
> > Changes in v3:
> > - Replaced down_write_killable() with down_write_trylock() and 
> >   returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
> >   blocking the execution of task_works. (Caught by Günther Noack).
> > 
> > Changes in v2:
> > - Use down_write_killable() instead of down_write().
> > - Split the interrupt path cleanup into a separate patch.
> > ---
> >  security/landlock/tsync.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> > index de01aa899751..xxxxxxxxxxxx 100644
> > --- a/security/landlock/tsync.c
> > +++ b/security/landlock/tsync.c
> > @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >  	shared_ctx.new_cred = new_cred;
> >  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
> >  
> > +	/*
> > +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> > +	 * when multiple threads call landlock_restrict_self() simultaneously.
> > +	 */
> > +	if (!down_write_trylock(&current->signal->exec_update_lock))
> > +		return -ERESTARTNOINTR;
> These two lines above introduced a test failure in tsync_test
> completing_enablement.
> 
> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> (this patch) and is currently in the mic/next branch.
> 
> I noticed the test failure while testing an unrelated patch.
> 
> The bug is because this code never actually yields or restarts the syscall.
> 
> This is the test output I observed:
> 
>   [+] Running tsync_test:
>   TAP version 13
>   1..4
>   # Starting 4 tests from 1 test cases.
>   #  RUN           global.single_threaded_success ...
>   #            OK  global.single_threaded_success
>   ok 1 global.single_threaded_success
>   #  RUN           global.multi_threaded_success ...
>   #            OK  global.multi_threaded_success
>   ok 2 global.multi_threaded_success
>   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
>   #            OK  global.multi_threaded_success_despite_diverging_domains
>   ok 3 global.multi_threaded_success_despite_diverging_domains
>   #  RUN           global.competing_enablement ...
>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)

The interesting part here is when you print out the errno that is
returned from the syscall -- it is 513, the value of ERESTARTNOINTR!

My understanding so far: Poking around in kernel/entry/common.c, it
seems that __exit_to_user_mode_loop() calls
arch_do_signal_or_restart() only when there is a pending signal
(_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL).  So it was possible that the
system call returns with the (normally internal) error code
ERESTARTNOINTR, in the case where the trylock fails, but where current
has not received a signal from the other competing TSYNC thread yet.

So with that in mind, would it work to do this?

  while (try-to-acquire-the-lock) {
    if (current-has-task-works-pending)
      return -ERESTARTNOINTR;

    cond_resched();
  }

Then we could avoid calling task_work_run() directly; (I find it
difficult to reason about the implications of calling taks_work_run()
directly, because these task works may make assumptions about the
context in which they are running.)

(Apologies for the somewhat draft-state source code; I have not had
the time to test my theories yet; I'll fully accept it if I am wrong
about it.)

–Günther


>   # competing_enablement: Test failed
>   #          FAIL  global.competing_enablement
>   not ok 4 global.competing_enablement
>   # FAILED: 3 / 4 tests passed.
> 
> 
> Brief investigation and the additions of these pr_warn lines:
> 
> 
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 0d66a68677b7..84909232b220 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -574,6 +574,9 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
>  		const int err = landlock_restrict_sibling_threads(
>  			current_cred(), new_cred);
>  		if (err) {
> +			pr_warn("landlock: restrict_self tsync err pid=%d tgid=%d err=%d flags=0x%x ruleset_fd=%d\n",
> +				task_pid_nr(current), task_tgid_nr(current), err,
> +				flags, ruleset_fd);
>  			abort_creds(new_cred);
>  			return err;
>  		}
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index 5afc5d639b8f..deb0f0b1f081 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -489,8 +489,11 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  	 * Serialize concurrent TSYNC operations to prevent deadlocks when multiple
>  	 * threads call landlock_restrict_self() simultaneously.
>  	 */
> -	if (!down_write_trylock(&current->signal->exec_update_lock))
> +	if (!down_write_trylock(&current->signal->exec_update_lock)) {
> +		pr_warn("landlock: tsync trylock busy pid=%d tgid=%d\n",
> +			task_pid_nr(current), task_tgid_nr(current));
>  		return -ERESTARTNOINTR;
> +	}
>  
>  	/*
>  	 * We schedule a pseudo-signal task_work for each of the calling task's
> @@ -602,6 +605,10 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  
>  	tsync_works_release(&works);
>  	up_write(&current->signal->exec_update_lock);
> +	if (atomic_read(&shared_ctx.preparation_error))
> +		pr_warn("landlock: tsync preparation_error pid=%d tgid=%d err=%d\n",
> +			task_pid_nr(current), task_tgid_nr(current),
> +			atomic_read(&shared_ctx.preparation_error));
>  
>  	return atomic_read(&shared_ctx.preparation_error);
>  }
> 
> Resulted in the following output:
> 
>   landlock: tsync trylock busy pid=1263 tgid=1261
>   landlock: landlock: restrict_self tsync err pid=1263 tgid=1261 err=-513 flags=0x8 ruleset_fd=6
>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
>   # competing_enablement: Test failed
>   #          FAIL  global.competing_enablement
>   not ok 4 global.competing_enablement
> 
> I have a fix that I will send as a patch.
> 
> Kind Regards,
> Justin Suess

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-03 19:50     ` Günther Noack
@ 2026-03-03 20:38       ` Tingmao Wang
  2026-03-03 21:19         ` Günther Noack
  2026-03-03 21:08       ` Justin Suess
  1 sibling, 1 reply; 17+ messages in thread
From: Tingmao Wang @ 2026-03-03 20:38 UTC (permalink / raw)
  To: Yihan Ding, Günther Noack, Justin Suess,
	Mickaël Salaün
  Cc: Paul Moore, Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817

On 3/3/26 19:50, Günther Noack wrote:
> [...]
> On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
>> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
>>> [...]
>>> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
>>> index de01aa899751..xxxxxxxxxxxx 100644
>>> --- a/security/landlock/tsync.c
>>> +++ b/security/landlock/tsync.c
>>> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>>>  	shared_ctx.new_cred = new_cred;
>>>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
>>>  
>>> +	/*
>>> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
>>> +	 * when multiple threads call landlock_restrict_self() simultaneously.
>>> +	 */
>>> +	if (!down_write_trylock(&current->signal->exec_update_lock))
>>> +		return -ERESTARTNOINTR;
>> These two lines above introduced a test failure in tsync_test
>> completing_enablement.
>>
>> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
>> (this patch) and is currently in the mic/next branch.
>>
>> I noticed the test failure while testing an unrelated patch.
>>
>> The bug is because this code never actually yields or restarts the syscall.
>>
>> This is the test output I observed:
>>
>>   [+] Running tsync_test:
>>   TAP version 13
>>   1..4
>>   # Starting 4 tests from 1 test cases.
>>   #  RUN           global.single_threaded_success ...
>>   #            OK  global.single_threaded_success
>>   ok 1 global.single_threaded_success
>>   #  RUN           global.multi_threaded_success ...
>>   #            OK  global.multi_threaded_success
>>   ok 2 global.multi_threaded_success
>>   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
>>   #            OK  global.multi_threaded_success_despite_diverging_domains
>>   ok 3 global.multi_threaded_success_despite_diverging_domains
>>   #  RUN           global.competing_enablement ...
>>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> 
> The interesting part here is when you print out the errno that is
> returned from the syscall -- it is 513, the value of ERESTARTNOINTR!
> 
> My understanding so far: Poking around in kernel/entry/common.c, it
> seems that __exit_to_user_mode_loop() calls
> arch_do_signal_or_restart() only when there is a pending signal
> (_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL).  So it was possible that the
> system call returns with the (normally internal) error code
> ERESTARTNOINTR, in the case where the trylock fails, but where current
> has not received a signal from the other competing TSYNC thread yet.
> 
> So with that in mind, would it work to do this?
> 
>   while (try-to-acquire-the-lock) {
>     if (current-has-task-works-pending)
>       return -ERESTARTNOINTR;
> 
>     cond_resched();
>   }
> 
> Then we could avoid calling task_work_run() directly; (I find it
> difficult to reason about the implications of calling taks_work_run()
> directly, because these task works may make assumptions about the
> context in which they are running.)

I've not caught up with the full discussion so might be missing some context on why RESTARTNOINTR was used here,
but wouldn't

diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
index 950b63d23729..f695fe44e2f1 100644
--- a/security/landlock/tsync.c
+++ b/security/landlock/tsync.c
@@ -490,7 +490,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
 	 * when multiple threads call landlock_restrict_self() simultaneously.
 	 */
 	if (!down_write_trylock(&current->signal->exec_update_lock))
-		return -ERESTARTNOINTR;
+		return restart_syscall();
 
 	/*
 	 * We schedule a pseudo-signal task_work for each of the calling task's

achieve what the original patch intended?

Kind regards,
Tingmao

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-03 20:38       ` Tingmao Wang
@ 2026-03-03 21:19         ` Günther Noack
  2026-03-04  2:46           ` Ding Yihan
  0 siblings, 1 reply; 17+ messages in thread
From: Günther Noack @ 2026-03-03 21:19 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Yihan Ding, Justin Suess, Mickaël Salaün, Paul Moore,
	Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817

On Tue, Mar 03, 2026 at 08:38:13PM +0000, Tingmao Wang wrote:
> On 3/3/26 19:50, Günther Noack wrote:
> > [...]
> > On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> >> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> >>> [...]
> >>> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> >>> index de01aa899751..xxxxxxxxxxxx 100644
> >>> --- a/security/landlock/tsync.c
> >>> +++ b/security/landlock/tsync.c
> >>> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >>>  	shared_ctx.new_cred = new_cred;
> >>>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
> >>>  
> >>> +	/*
> >>> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> >>> +	 * when multiple threads call landlock_restrict_self() simultaneously.
> >>> +	 */
> >>> +	if (!down_write_trylock(&current->signal->exec_update_lock))
> >>> +		return -ERESTARTNOINTR;
> >> These two lines above introduced a test failure in tsync_test
> >> completing_enablement.
> >>
> >> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> >> (this patch) and is currently in the mic/next branch.
> >>
> >> I noticed the test failure while testing an unrelated patch.
> >>
> >> The bug is because this code never actually yields or restarts the syscall.
> >>
> >> This is the test output I observed:
> >>
> >>   [+] Running tsync_test:
> >>   TAP version 13
> >>   1..4
> >>   # Starting 4 tests from 1 test cases.
> >>   #  RUN           global.single_threaded_success ...
> >>   #            OK  global.single_threaded_success
> >>   ok 1 global.single_threaded_success
> >>   #  RUN           global.multi_threaded_success ...
> >>   #            OK  global.multi_threaded_success
> >>   ok 2 global.multi_threaded_success
> >>   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
> >>   #            OK  global.multi_threaded_success_despite_diverging_domains
> >>   ok 3 global.multi_threaded_success_despite_diverging_domains
> >>   #  RUN           global.competing_enablement ...
> >>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> > 
> > The interesting part here is when you print out the errno that is
> > returned from the syscall -- it is 513, the value of ERESTARTNOINTR!
> > 
> > My understanding so far: Poking around in kernel/entry/common.c, it
> > seems that __exit_to_user_mode_loop() calls
> > arch_do_signal_or_restart() only when there is a pending signal
> > (_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL).  So it was possible that the
> > system call returns with the (normally internal) error code
> > ERESTARTNOINTR, in the case where the trylock fails, but where current
> > has not received a signal from the other competing TSYNC thread yet.
> > 
> > So with that in mind, would it work to do this?
> > 
> >   while (try-to-acquire-the-lock) {
> >     if (current-has-task-works-pending)
> >       return -ERESTARTNOINTR;
> > 
> >     cond_resched();
> >   }
> > 
> > Then we could avoid calling task_work_run() directly; (I find it
> > difficult to reason about the implications of calling taks_work_run()
> > directly, because these task works may make assumptions about the
> > context in which they are running.)
> 
> I've not caught up with the full discussion so might be missing some context on why RESTARTNOINTR was used here,
> but wouldn't
> 
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index 950b63d23729..f695fe44e2f1 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -490,7 +490,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  	 * when multiple threads call landlock_restrict_self() simultaneously.
>  	 */
>  	if (!down_write_trylock(&current->signal->exec_update_lock))
> -		return -ERESTARTNOINTR;
> +		return restart_syscall();
>  
>  	/*
>  	 * We schedule a pseudo-signal task_work for each of the calling task's
> 
> achieve what the original patch intended?

Thanks, that's an excellent point!

restart_syscall() (a) sets TIF_SIGPENDING and then (b) returns
-ERESTARTNOINTR.  (a) was the part that we have been missing for the
restart to work (see discussion above).  Together, (a) and (b) cause
__exit_to_user_mode_loop() to restart the syscall.  Given that this is
offered in signal.h, this seems like a clean and more "official" way
to do this than using the task works APIs.

It also fixes the previously failing selftest (I tried).

Yihan, Justin: Does that seem reasonable to you as well?

–Günther

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-03 21:19         ` Günther Noack
@ 2026-03-04  2:46           ` Ding Yihan
  2026-03-04  7:44             ` Günther Noack
  2026-03-04 14:08             ` Justin Suess
  0 siblings, 2 replies; 17+ messages in thread
From: Ding Yihan @ 2026-03-04  2:46 UTC (permalink / raw)
  To: Günther Noack, Tingmao Wang
  Cc: Justin Suess, Mickaël Salaün, Paul Moore, Jann Horn,
	linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817

Hi all,

Thank you Justin for catching the test failure and the thorough
investigation! And thanks Günther and Tingmao for diving into the
syscall restart mechanics.

I've evaluated both the `while` loop approach with `task_work_run()`
and the `restart_syscall()` approach. I strongly lean towards using
`restart_syscall()` as suggested by Tingmao. 

As Günther pointed out earlier, executing `task_work_run()` directly
deep inside the syscall context can be risky. Task works often assume
they are running at the kernel-user boundary with a specific state.
Using `restart_syscall()` safely bounces us to that boundary, processes
the works cleanly, and restarts the syscall via standard mechanisms.

After some selftests,I will prepare the v4 patch series using `restart_syscall()`.
I will also ensure all comments are properly wrapped to 80 columns as requested
by Mickaël, and make sure to include the proper Reported-by and
Suggested-by tags for everyone's excellent input here.

Expect the v4 series shortly. Thanks again for the great collaboration!


Best regards,
Yihan Ding

在 2026/3/4 05:19, Günther Noack 写道:
> On Tue, Mar 03, 2026 at 08:38:13PM +0000, Tingmao Wang wrote:
>> On 3/3/26 19:50, Günther Noack wrote:
>>> [...]
>>> On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
>>>> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
>>>>> [...]
>>>>> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
>>>>> index de01aa899751..xxxxxxxxxxxx 100644
>>>>> --- a/security/landlock/tsync.c
>>>>> +++ b/security/landlock/tsync.c
>>>>> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>>>>>  	shared_ctx.new_cred = new_cred;
>>>>>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
>>>>>  
>>>>> +	/*
>>>>> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
>>>>> +	 * when multiple threads call landlock_restrict_self() simultaneously.
>>>>> +	 */
>>>>> +	if (!down_write_trylock(&current->signal->exec_update_lock))
>>>>> +		return -ERESTARTNOINTR;
>>>> These two lines above introduced a test failure in tsync_test
>>>> completing_enablement.
>>>>
>>>> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
>>>> (this patch) and is currently in the mic/next branch.
>>>>
>>>> I noticed the test failure while testing an unrelated patch.
>>>>
>>>> The bug is because this code never actually yields or restarts the syscall.
>>>>
>>>> This is the test output I observed:
>>>>
>>>>   [+] Running tsync_test:
>>>>   TAP version 13
>>>>   1..4
>>>>   # Starting 4 tests from 1 test cases.
>>>>   #  RUN           global.single_threaded_success ...
>>>>   #            OK  global.single_threaded_success
>>>>   ok 1 global.single_threaded_success
>>>>   #  RUN           global.multi_threaded_success ...
>>>>   #            OK  global.multi_threaded_success
>>>>   ok 2 global.multi_threaded_success
>>>>   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
>>>>   #            OK  global.multi_threaded_success_despite_diverging_domains
>>>>   ok 3 global.multi_threaded_success_despite_diverging_domains
>>>>   #  RUN           global.competing_enablement ...
>>>>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
>>>
>>> The interesting part here is when you print out the errno that is
>>> returned from the syscall -- it is 513, the value of ERESTARTNOINTR!
>>>
>>> My understanding so far: Poking around in kernel/entry/common.c, it
>>> seems that __exit_to_user_mode_loop() calls
>>> arch_do_signal_or_restart() only when there is a pending signal
>>> (_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL).  So it was possible that the
>>> system call returns with the (normally internal) error code
>>> ERESTARTNOINTR, in the case where the trylock fails, but where current
>>> has not received a signal from the other competing TSYNC thread yet.
>>>
>>> So with that in mind, would it work to do this?
>>>
>>>   while (try-to-acquire-the-lock) {
>>>     if (current-has-task-works-pending)
>>>       return -ERESTARTNOINTR;
>>>
>>>     cond_resched();
>>>   }
>>>
>>> Then we could avoid calling task_work_run() directly; (I find it
>>> difficult to reason about the implications of calling taks_work_run()
>>> directly, because these task works may make assumptions about the
>>> context in which they are running.)
>>
>> I've not caught up with the full discussion so might be missing some context on why RESTARTNOINTR was used here,
>> but wouldn't
>>
>> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
>> index 950b63d23729..f695fe44e2f1 100644
>> --- a/security/landlock/tsync.c
>> +++ b/security/landlock/tsync.c
>> @@ -490,7 +490,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>>  	 * when multiple threads call landlock_restrict_self() simultaneously.
>>  	 */
>>  	if (!down_write_trylock(&current->signal->exec_update_lock))
>> -		return -ERESTARTNOINTR;
>> +		return restart_syscall();
>>  
>>  	/*
>>  	 * We schedule a pseudo-signal task_work for each of the calling task's
>>
>> achieve what the original patch intended?
> 
> Thanks, that's an excellent point!
> 
> restart_syscall() (a) sets TIF_SIGPENDING and then (b) returns
> -ERESTARTNOINTR.  (a) was the part that we have been missing for the
> restart to work (see discussion above).  Together, (a) and (b) cause
> __exit_to_user_mode_loop() to restart the syscall.  Given that this is
> offered in signal.h, this seems like a clean and more "official" way
> to do this than using the task works APIs.
> 
> It also fixes the previously failing selftest (I tried).
> 
> Yihan, Justin: Does that seem reasonable to you as well?
> 
> –Günther
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-04  2:46           ` Ding Yihan
@ 2026-03-04  7:44             ` Günther Noack
  2026-03-04 14:08             ` Justin Suess
  1 sibling, 0 replies; 17+ messages in thread
From: Günther Noack @ 2026-03-04  7:44 UTC (permalink / raw)
  To: Ding Yihan
  Cc: Tingmao Wang, Justin Suess, Mickaël Salaün, Paul Moore,
	Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817

On Wed, Mar 04, 2026 at 10:46:39AM +0800, Ding Yihan wrote:
> Hi all,
> 
> Thank you Justin for catching the test failure and the thorough
> investigation! And thanks Günther and Tingmao for diving into the
> syscall restart mechanics.
> 
> I've evaluated both the `while` loop approach with `task_work_run()`
> and the `restart_syscall()` approach. I strongly lean towards using
> `restart_syscall()` as suggested by Tingmao. 
> 
> As Günther pointed out earlier, executing `task_work_run()` directly
> deep inside the syscall context can be risky. Task works often assume
> they are running at the kernel-user boundary with a specific state.
> Using `restart_syscall()` safely bounces us to that boundary, processes
> the works cleanly, and restarts the syscall via standard mechanisms.

Agreed.  I also like the restart_syscall() solution for its simplicity
and use of a standard mechanism.

(This code path is very unlikely (and probably unintended by the
userspace programmer), so we need to protect against deadlock, but
it's not a performance critical path by far.  By using the more
standard restart_syscall(), we have to worry about fewer corner cases
(e.g. what assumptions are made by task_works about the context they
get executed in).  I think this robustness trumps performance tuning
in this case.)


> After some selftests,I will prepare the v4 patch series using `restart_syscall()`.
> I will also ensure all comments are properly wrapped to 80 columns as requested
> by Mickaël, and make sure to include the proper Reported-by and
> Suggested-by tags for everyone's excellent input here.
> 
> Expect the v4 series shortly. Thanks again for the great collaboration!

Thanks, I'm looking forward to the revised patch. I agree with this plan. :)

–Günther

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-04  2:46           ` Ding Yihan
  2026-03-04  7:44             ` Günther Noack
@ 2026-03-04 14:08             ` Justin Suess
  1 sibling, 0 replies; 17+ messages in thread
From: Justin Suess @ 2026-03-04 14:08 UTC (permalink / raw)
  To: Ding Yihan
  Cc: Günther Noack, Tingmao Wang, Mickaël Salaün,
	Paul Moore, Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817

On Wed, Mar 04, 2026 at 10:46:39AM +0800, Ding Yihan wrote:
> Hi all,
> 
> Thank you Justin for catching the test failure and the thorough
> investigation! And thanks Günther and Tingmao for diving into the
> syscall restart mechanics.
> 
> I've evaluated both the `while` loop approach with `task_work_run()`
> and the `restart_syscall()` approach. I strongly lean towards using
> `restart_syscall()` as suggested by Tingmao. 
> 
> As Günther pointed out earlier, executing `task_work_run()` directly
> deep inside the syscall context can be risky. Task works often assume
> they are running at the kernel-user boundary with a specific state.
> Using `restart_syscall()` safely bounces us to that boundary, processes
> the works cleanly, and restarts the syscall via standard mechanisms.
> 
> After some selftests,I will prepare the v4 patch series using `restart_syscall()`.
> I will also ensure all comments are properly wrapped to 80 columns as requested
> by Mickaël, and make sure to include the proper Reported-by and
> Suggested-by tags for everyone's excellent input here.
> 
> Expect the v4 series shortly. Thanks again for the great collaboration!
> 
> 
> Best regards,
> Yihan Ding
>
After review, I agree Tingmao's solution is better.

Coming from a userspace background, I didn't think of that as a solution
for a lock contention, but kernel space has different needs/conventions.

I agree this is probably the right way to go. The simplest approach is
probably best here, and the restart_syscall seems better here, seeing as
task_work_run is rarely called in kernel code outside core paths.

I've learned a lot about kernel task workers and how locking is handled
as a result.

Thank you for your work with this series, this fix is useful!
> 在 2026/3/4 05:19, Günther Noack 写道:
> > On Tue, Mar 03, 2026 at 08:38:13PM +0000, Tingmao Wang wrote:
> >> On 3/3/26 19:50, Günther Noack wrote:
> >>> [...]
> >>> On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> >>>> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> >>>>> [...]
> >>>>> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> >>>>> index de01aa899751..xxxxxxxxxxxx 100644
> >>>>> --- a/security/landlock/tsync.c
> >>>>> +++ b/security/landlock/tsync.c
> >>>>> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >>>>>  	shared_ctx.new_cred = new_cred;
> >>>>>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> >>>>> +	 * when multiple threads call landlock_restrict_self() simultaneously.
> >>>>> +	 */
> >>>>> +	if (!down_write_trylock(&current->signal->exec_update_lock))
> >>>>> +		return -ERESTARTNOINTR;
> >>>> These two lines above introduced a test failure in tsync_test
> >>>> completing_enablement.
> >>>>
> >>>> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> >>>> (this patch) and is currently in the mic/next branch.
> >>>>
> >>>> I noticed the test failure while testing an unrelated patch.
> >>>>
> >>>> The bug is because this code never actually yields or restarts the syscall.
> >>>>
> >>>> This is the test output I observed:
> >>>>
> >>>>   [+] Running tsync_test:
> >>>>   TAP version 13
> >>>>   1..4
> >>>>   # Starting 4 tests from 1 test cases.
> >>>>   #  RUN           global.single_threaded_success ...
> >>>>   #            OK  global.single_threaded_success
> >>>>   ok 1 global.single_threaded_success
> >>>>   #  RUN           global.multi_threaded_success ...
> >>>>   #            OK  global.multi_threaded_success
> >>>>   ok 2 global.multi_threaded_success
> >>>>   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
> >>>>   #            OK  global.multi_threaded_success_despite_diverging_domains
> >>>>   ok 3 global.multi_threaded_success_despite_diverging_domains
> >>>>   #  RUN           global.competing_enablement ...
> >>>>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> >>>
> >>> The interesting part here is when you print out the errno that is
> >>> returned from the syscall -- it is 513, the value of ERESTARTNOINTR!
> >>>
> >>> My understanding so far: Poking around in kernel/entry/common.c, it
> >>> seems that __exit_to_user_mode_loop() calls
> >>> arch_do_signal_or_restart() only when there is a pending signal
> >>> (_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL).  So it was possible that the
> >>> system call returns with the (normally internal) error code
> >>> ERESTARTNOINTR, in the case where the trylock fails, but where current
> >>> has not received a signal from the other competing TSYNC thread yet.
> >>>
> >>> So with that in mind, would it work to do this?
> >>>
> >>>   while (try-to-acquire-the-lock) {
> >>>     if (current-has-task-works-pending)
> >>>       return -ERESTARTNOINTR;
> >>>
> >>>     cond_resched();
> >>>   }
> >>>
> >>> Then we could avoid calling task_work_run() directly; (I find it
> >>> difficult to reason about the implications of calling taks_work_run()
> >>> directly, because these task works may make assumptions about the
> >>> context in which they are running.)
> >>
> >> I've not caught up with the full discussion so might be missing some context on why RESTARTNOINTR was used here,
> >> but wouldn't
> >>
> >> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> >> index 950b63d23729..f695fe44e2f1 100644
> >> --- a/security/landlock/tsync.c
> >> +++ b/security/landlock/tsync.c
> >> @@ -490,7 +490,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >>  	 * when multiple threads call landlock_restrict_self() simultaneously.
> >>  	 */
> >>  	if (!down_write_trylock(&current->signal->exec_update_lock))
> >> -		return -ERESTARTNOINTR;
> >> +		return restart_syscall();
> >>  
> >>  	/*
> >>  	 * We schedule a pseudo-signal task_work for each of the calling task's
> >>
> >> achieve what the original patch intended?
> > 
> > Thanks, that's an excellent point!
> > 
> > restart_syscall() (a) sets TIF_SIGPENDING and then (b) returns
> > -ERESTARTNOINTR.  (a) was the part that we have been missing for the
> > restart to work (see discussion above).  Together, (a) and (b) cause
> > __exit_to_user_mode_loop() to restart the syscall.  Given that this is
> > offered in signal.h, this seems like a clean and more "official" way
> > to do this than using the task works APIs.
> > 
> > It also fixes the previously failing selftest (I tried).
> > 
> > Yihan, Justin: Does that seem reasonable to you as well?
> > 
> > –Günther
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-03-03 19:50     ` Günther Noack
  2026-03-03 20:38       ` Tingmao Wang
@ 2026-03-03 21:08       ` Justin Suess
  1 sibling, 0 replies; 17+ messages in thread
From: Justin Suess @ 2026-03-03 21:08 UTC (permalink / raw)
  To: Günther Noack
  Cc: Yihan Ding, Mickaël Salaün, Paul Moore, Jann Horn,
	linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817

On Tue, Mar 03, 2026 at 08:50:50PM +0100, Günther Noack wrote:
> Oof, thanks for catching this, Justin! 🏆
> 
> I was convinced I had tried the selftests for that variant,
> but apparently I had not. Mea culpa.
> 
> On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> > On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> > > syzbot found a deadlock in landlock_restrict_sibling_threads().
> > > When multiple threads concurrently call landlock_restrict_self() with
> > > sibling thread restriction enabled, they can deadlock by mutually
> > > queueing task_works on each other and then blocking in kernel space
> > > (waiting for the other to finish).
> > > 
> > > Fix this by serializing the TSYNC operations within the same process
> > > using the exec_update_lock. This prevents concurrent invocations
> > > from deadlocking. 
> > > 
> > > We use down_write_trylock() and return -ERESTARTNOINTR if the lock
> > > cannot be acquired immediately. This ensures that if a thread fails
> > > to get the lock, it will return to userspace, allowing it to process
> > > any pending TSYNC task_works from the lock holder, and then
> > > transparently restart the syscall.
> > > 
> > > Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> > > Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> > > Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> > > Suggested-by: Günther Noack <gnoack3000@gmail.com>
> > > Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> > > ---
> > > Changes in v3:
> > > - Replaced down_write_killable() with down_write_trylock() and 
> > >   returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
> > >   blocking the execution of task_works. (Caught by Günther Noack).
> > > 
> > > Changes in v2:
> > > - Use down_write_killable() instead of down_write().
> > > - Split the interrupt path cleanup into a separate patch.
> > > ---
> > >  security/landlock/tsync.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> > > index de01aa899751..xxxxxxxxxxxx 100644
> > > --- a/security/landlock/tsync.c
> > > +++ b/security/landlock/tsync.c
> > > @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> > >  	shared_ctx.new_cred = new_cred;
> > >  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
> > >  
> > > +	/*
> > > +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> > > +	 * when multiple threads call landlock_restrict_self() simultaneously.
> > > +	 */
> > > +	if (!down_write_trylock(&current->signal->exec_update_lock))
> > > +		return -ERESTARTNOINTR;
> > These two lines above introduced a test failure in tsync_test
> > completing_enablement.
> > 
> > The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> > (this patch) and is currently in the mic/next branch.
> > 
> > I noticed the test failure while testing an unrelated patch.
> > 
> > The bug is because this code never actually yields or restarts the syscall.
> > 
> > This is the test output I observed:
> > 
> >   [+] Running tsync_test:
> >   TAP version 13
> >   1..4
> >   # Starting 4 tests from 1 test cases.
> >   #  RUN           global.single_threaded_success ...
> >   #            OK  global.single_threaded_success
> >   ok 1 global.single_threaded_success
> >   #  RUN           global.multi_threaded_success ...
> >   #            OK  global.multi_threaded_success
> >   ok 2 global.multi_threaded_success
> >   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
> >   #            OK  global.multi_threaded_success_despite_diverging_domains
> >   ok 3 global.multi_threaded_success_despite_diverging_domains
> >   #  RUN           global.competing_enablement ...
> >   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> 
> The interesting part here is when you print out the errno that is
> returned from the syscall -- it is 513, the value of ERESTARTNOINTR!
> 
> My understanding so far: Poking around in kernel/entry/common.c, it
> seems that __exit_to_user_mode_loop() calls
> arch_do_signal_or_restart() only when there is a pending signal
> (_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL).  So it was possible that the
> system call returns with the (normally internal) error code
> ERESTARTNOINTR, in the case where the trylock fails, but where current
> has not received a signal from the other competing TSYNC thread yet.
> 
> So with that in mind, would it work to do this?
> 
>   while (try-to-acquire-the-lock) {
>     if (current-has-task-works-pending)
>       return -ERESTARTNOINTR;
> 
>     cond_resched();
>   }
Returning -ERESTARTNOINTR does *work* in my testing.

But really anything that would yield to the scheduler technically would!

But it smells funny to me to restart the whole syscall to wait for a
lock to become free.

Also the documentation for that code ERESTARTNOINTR is:
"System call was interrupted by a signal and will be restarted".

That clearly isn't happening here, we're not being interrupted by a
(literal) signal, we're really waiting on a semaphore.

And this lock is held by a task in this code, so why not run the pending
tasks with task_work_run so you can get out of the lock ASAP instead of
jumping through hoops back down to userspace and back?

> 
> Then we could avoid calling task_work_run() directly; (I find it
> difficult to reason about the implications of calling taks_work_run()
> directly, because these task works may make assumptions about the
> context in which they are running.)
Hmm possibly? I really don't know because there aren't many examples or
docs to look at.

My thought process for picking task_work_run was like this:
1. We have to wait for a lock.
2. So how do we do something useful while waiting to avoid deadlock?
3. We can simply help the other task get through doing whatever needs to
be done to get done the lock (doing the jobs we made with task_work_add)

And the way I decided to do that was task_work_run after grepping around
for some examples in io_uring code.
> 
> (Apologies for the somewhat draft-state source code; I have not had
> the time to test my theories yet; I'll fully accept it if I am wrong
> about it.)
> 
> –Günther
> 
> 
> >   # competing_enablement: Test failed
> >   #          FAIL  global.competing_enablement
> >   not ok 4 global.competing_enablement
> >   # FAILED: 3 / 4 tests passed.
> > 
> > 
> > Brief investigation and the additions of these pr_warn lines:
> > 
> > 
> > diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> > index 0d66a68677b7..84909232b220 100644
> > --- a/security/landlock/syscalls.c
> > +++ b/security/landlock/syscalls.c
> > @@ -574,6 +574,9 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
> >  		const int err = landlock_restrict_sibling_threads(
> >  			current_cred(), new_cred);
> >  		if (err) {
> > +			pr_warn("landlock: restrict_self tsync err pid=%d tgid=%d err=%d flags=0x%x ruleset_fd=%d\n",
> > +				task_pid_nr(current), task_tgid_nr(current), err,
> > +				flags, ruleset_fd);
> >  			abort_creds(new_cred);
> >  			return err;
> >  		}
> > diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> > index 5afc5d639b8f..deb0f0b1f081 100644
> > --- a/security/landlock/tsync.c
> > +++ b/security/landlock/tsync.c
> > @@ -489,8 +489,11 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >  	 * Serialize concurrent TSYNC operations to prevent deadlocks when multiple
> >  	 * threads call landlock_restrict_self() simultaneously.
> >  	 */
> > -	if (!down_write_trylock(&current->signal->exec_update_lock))
> > +	if (!down_write_trylock(&current->signal->exec_update_lock)) {
> > +		pr_warn("landlock: tsync trylock busy pid=%d tgid=%d\n",
> > +			task_pid_nr(current), task_tgid_nr(current));
> >  		return -ERESTARTNOINTR;
> > +	}
> >  
> >  	/*
> >  	 * We schedule a pseudo-signal task_work for each of the calling task's
> > @@ -602,6 +605,10 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >  
> >  	tsync_works_release(&works);
> >  	up_write(&current->signal->exec_update_lock);
> > +	if (atomic_read(&shared_ctx.preparation_error))
> > +		pr_warn("landlock: tsync preparation_error pid=%d tgid=%d err=%d\n",
> > +			task_pid_nr(current), task_tgid_nr(current),
> > +			atomic_read(&shared_ctx.preparation_error));
> >  
> >  	return atomic_read(&shared_ctx.preparation_error);
> >  }
> > 
> > Resulted in the following output:
> > 
> >   landlock: tsync trylock busy pid=1263 tgid=1261
> >   landlock: landlock: restrict_self tsync err pid=1263 tgid=1261 err=-513 flags=0x8 ruleset_fd=6
> >   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> >   # competing_enablement: Test failed
> >   #          FAIL  global.competing_enablement
> >   not ok 4 global.competing_enablement
> > 
> > I have a fix that I will send as a patch.
> > 
> > Kind Regards,
> > Justin Suess

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
  2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
  2026-02-26  7:23   ` Günther Noack
  2026-03-03 16:20   ` Justin Suess
@ 2026-03-03 17:51   ` Mickaël Salaün
  2 siblings, 0 replies; 17+ messages in thread
From: Mickaël Salaün @ 2026-03-03 17:51 UTC (permalink / raw)
  To: Yihan Ding
  Cc: Günther Noack, Paul Moore, Jann Horn, linux-security-module,
	linux-kernel, syzbot+7ea2f5e9dfd468201817

On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> syzbot found a deadlock in landlock_restrict_sibling_threads().
> When multiple threads concurrently call landlock_restrict_self() with
> sibling thread restriction enabled, they can deadlock by mutually
> queueing task_works on each other and then blocking in kernel space
> (waiting for the other to finish).
> 
> Fix this by serializing the TSYNC operations within the same process
> using the exec_update_lock. This prevents concurrent invocations
> from deadlocking. 
> 
> We use down_write_trylock() and return -ERESTARTNOINTR if the lock
> cannot be acquired immediately. This ensures that if a thread fails
> to get the lock, it will return to userspace, allowing it to process
> any pending TSYNC task_works from the lock holder, and then
> transparently restart the syscall.
> 
> Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> Suggested-by: Günther Noack <gnoack3000@gmail.com>
> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> ---
> Changes in v3:
> - Replaced down_write_killable() with down_write_trylock() and 
>   returned -ERESTARTNOINTR to avoid a secondary deadlock caused by 
>   blocking the execution of task_works. (Caught by Günther Noack).
> 
> Changes in v2:
> - Use down_write_killable() instead of down_write().
> - Split the interrupt path cleanup into a separate patch.
> ---
>  security/landlock/tsync.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index de01aa899751..xxxxxxxxxxxx 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  	shared_ctx.new_cred = new_cred;
>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
>  
> +	/*
> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> +	 * when multiple threads call landlock_restrict_self() simultaneously.

Please format this comment and the next patch ones to fit (and fill) in
80 columns.

> +	 */
> +	if (!down_write_trylock(&current->signal->exec_update_lock))
> +		return -ERESTARTNOINTR;
> +
>  	/*
>  	 * We schedule a pseudo-signal task_work for each of the calling task's
>  	 * sibling threads.  In the task work, each thread:
> @@ -556,6 +563,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  		wait_for_completion(&shared_ctx.all_finished);
>  
>  	tsync_works_release(&works);
> +	up_write(&current->signal->exec_update_lock);
>  
>  	return atomic_read(&shared_ctx.preparation_error);
>  }
> -- 
> 2.51.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC
  2026-02-26  1:59 [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Yihan Ding
  2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
@ 2026-02-26  1:59 ` Yihan Ding
  2026-02-26  7:23   ` Günther Noack
  2026-03-03 17:31 ` [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Mickaël Salaün
  2 siblings, 1 reply; 17+ messages in thread
From: Yihan Ding @ 2026-02-26  1:59 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack
  Cc: Paul Moore, Jann Horn, linux-security-module, linux-kernel,
	syzbot+7ea2f5e9dfd468201817, Yihan Ding

In landlock_restrict_sibling_threads(), when the calling thread is
interrupted while waiting for sibling threads to prepare, it executes
a recovery path.

Previously, this path included a wait_for_completion() call on
all_prepared to prevent a Use-After-Free of the local shared_ctx.
However, this wait is redundant. Exiting the main do-while loop
already leads to a bottom cleanup section that unconditionally waits
for all_finished. Therefore, replacing the wait with a simple break
is safe, prevents UAF, and correctly unblocks the remaining task_works.

Clean up the error path by breaking the loop and updating the
surrounding comments to accurately reflect the state machine.

Suggested-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
---
Change in v3:
-No change in v3

Changes in v2:
- Replaced wait_for_completion(&shared_ctx.all_prepared) with a break
  statement based on the realization that the bottom wait for 'all_finished'
  already guards against UAF.
- Updated comments for clarity.
---
 security/landlock/tsync.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
index 420fcfc2fe9a..9731ec7f329a 100644
--- a/security/landlock/tsync.c
+++ b/security/landlock/tsync.c
@@ -534,24 +534,28 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
 					   -ERESTARTNOINTR);
 
 				/*
-				 * Cancel task works for tasks that did not start running yet,
-				 * and decrement all_prepared and num_unfinished accordingly.
+				 * Opportunistic improvement: try to cancel task works
+				 * for tasks that did not start running yet. We do not
+				 * have a guarantee that it cancels any of the enqueued
+				 * task works (because task_work_run() might already have
+				 * dequeued them).
 				 */
 				cancel_tsync_works(&works, &shared_ctx);
 
 				/*
-				 * The remaining task works have started running, so waiting for
-				 * their completion will finish.
+				 * Break the loop with error. The cleanup code after the loop
+				 * unblocks the remaining task_works.
 				 */
-				wait_for_completion(&shared_ctx.all_prepared);
+				break;
 			}
 		}
 	} while (found_more_threads &&
 		 !atomic_read(&shared_ctx.preparation_error));
 
 	/*
-	 * We now have all sibling threads blocking and in "prepared" state in the
-	 * task work. Ask all threads to commit.
+	 * We now have either (a) all sibling threads blocking and in
+	 * "prepared" state in the task work, or (b) the preparation error is
+	 * set. Ask all threads to commit (or abort).
 	 */
 	complete_all(&shared_ctx.ready_to_commit);
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC
  2026-02-26  1:59 ` [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC Yihan Ding
@ 2026-02-26  7:23   ` Günther Noack
  0 siblings, 0 replies; 17+ messages in thread
From: Günther Noack @ 2026-02-26  7:23 UTC (permalink / raw)
  To: Yihan Ding
  Cc: Mickaël Salaün, Paul Moore, Jann Horn,
	linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817

On Thu, Feb 26, 2026 at 09:59:03AM +0800, Yihan Ding wrote:
> In landlock_restrict_sibling_threads(), when the calling thread is
> interrupted while waiting for sibling threads to prepare, it executes
> a recovery path.
> 
> Previously, this path included a wait_for_completion() call on
> all_prepared to prevent a Use-After-Free of the local shared_ctx.
> However, this wait is redundant. Exiting the main do-while loop
> already leads to a bottom cleanup section that unconditionally waits
> for all_finished. Therefore, replacing the wait with a simple break
> is safe, prevents UAF, and correctly unblocks the remaining task_works.
> 
> Clean up the error path by breaking the loop and updating the
> surrounding comments to accurately reflect the state machine.
> 
> Suggested-by: Günther Noack <gnoack3000@gmail.com>
> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> ---
> Change in v3:
> -No change in v3
> 
> Changes in v2:
> - Replaced wait_for_completion(&shared_ctx.all_prepared) with a break
>   statement based on the realization that the bottom wait for 'all_finished'
>   already guards against UAF.
> - Updated comments for clarity.
> ---
>  security/landlock/tsync.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index 420fcfc2fe9a..9731ec7f329a 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -534,24 +534,28 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>  					   -ERESTARTNOINTR);
>  
>  				/*
> -				 * Cancel task works for tasks that did not start running yet,
> -				 * and decrement all_prepared and num_unfinished accordingly.
> +				 * Opportunistic improvement: try to cancel task works
> +				 * for tasks that did not start running yet. We do not
> +				 * have a guarantee that it cancels any of the enqueued
> +				 * task works (because task_work_run() might already have
> +				 * dequeued them).
>  				 */
>  				cancel_tsync_works(&works, &shared_ctx);
>  
>  				/*
> -				 * The remaining task works have started running, so waiting for
> -				 * their completion will finish.
> +				 * Break the loop with error. The cleanup code after the loop
> +				 * unblocks the remaining task_works.
>  				 */
> -				wait_for_completion(&shared_ctx.all_prepared);
> +				break;
>  			}
>  		}
>  	} while (found_more_threads &&
>  		 !atomic_read(&shared_ctx.preparation_error));
>  
>  	/*
> -	 * We now have all sibling threads blocking and in "prepared" state in the
> -	 * task work. Ask all threads to commit.
> +	 * We now have either (a) all sibling threads blocking and in
> +	 * "prepared" state in the task work, or (b) the preparation error is
> +	 * set. Ask all threads to commit (or abort).
>  	 */
>  	complete_all(&shared_ctx.ready_to_commit);
>  
> -- 
> 2.51.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path
  2026-02-26  1:59 [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Yihan Ding
  2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
  2026-02-26  1:59 ` [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC Yihan Ding
@ 2026-03-03 17:31 ` Mickaël Salaün
  2 siblings, 0 replies; 17+ messages in thread
From: Mickaël Salaün @ 2026-03-03 17:31 UTC (permalink / raw)
  To: Yihan Ding
  Cc: Günther Noack, Paul Moore, Jann Horn, linux-security-module,
	linux-kernel, syzbot+7ea2f5e9dfd468201817

Thanks!  It's been in -next since last week and I'll send a PR with it
soon.

On Thu, Feb 26, 2026 at 09:59:01AM +0800, Yihan Ding wrote:
> Hello,
> 
> This patch series fixes a deadlock in the Landlock TSYNC multithreading 
> support, originally reported by syzbot, and cleans up the associated 
> interrupt recovery path.
> 
> The deadlock occurs when multiple threads concurrently call 
> landlock_restrict_self() with sibling thread restriction enabled, 
> causing them to mutually queue task_works on each other and block 
> indefinitely.
> 
> * Patch 1 fixes the root cause by serializing the TSYNC operations 
>   within the same process using the exec_update_lock.
> * Patch 2 cleans up the interrupt recovery path by replacing an 
>   unnecessary wait_for_completion() with a straightforward loop break, 
>   avoiding Use-After-Free while unblocking remaining task_works.
> 
> Changes in v3:
> - Patch 1: Changed down_write_killable() to down_write_trylock() and
>   return -ERESTARTNOINTR on failure. This avoids a secondary deadlock 
>   where a blocking wait prevents a sibling thread from waking up to 
>   execute the requested TSYNC task_work. (Noted by Günther Noack. 
>   down_write_interruptible() was also suggested but is not implemented 
>   for rw_semaphores in the kernel).
> - Patch 2: No changes.
> 
> Changes in v2:
> - Split the changes into a 2-patch series.
> - Patch 1: Adopted down_write_killable() instead of down_write().
> - Patch 2: Removed wait_for_completion(&shared_ctx.all_prepared) and 
>   replaced it with a `break` to prevent UAF.
> 
> Link to v2: https://lore.kernel.org/all/20260225024734.3024732-1-dingyihan@uniontech.com/
> Link to v1: https://lore.kernel.org/all/20260224062729.2908692-1-dingyihan@uniontech.com/
> 
> Yihan Ding (2):
>   landlock: Serialize TSYNC thread restriction
>   landlock: Clean up interrupted thread logic in TSYNC
> 
>  security/landlock/tsync.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> -- 
> 2.51.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-03-04 14:08 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-26  1:59 [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Yihan Ding
2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
2026-02-26  7:23   ` Günther Noack
2026-03-03 16:20   ` Justin Suess
2026-03-03 17:47     ` Mickaël Salaün
2026-03-03 18:13       ` Justin Suess
2026-03-03 19:50     ` Günther Noack
2026-03-03 20:38       ` Tingmao Wang
2026-03-03 21:19         ` Günther Noack
2026-03-04  2:46           ` Ding Yihan
2026-03-04  7:44             ` Günther Noack
2026-03-04 14:08             ` Justin Suess
2026-03-03 21:08       ` Justin Suess
2026-03-03 17:51   ` Mickaël Salaün
2026-02-26  1:59 ` [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC Yihan Ding
2026-02-26  7:23   ` Günther Noack
2026-03-03 17:31 ` [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Mickaël Salaün

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox