* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
[not found] <69995a88.050a0220.340abe.0d25.GAE@google.com>
@ 2026-02-21 7:28 ` Ding Yihan
2026-02-21 12:00 ` Günther Noack
2026-02-24 14:43 ` [syzbot] [kernel?] INFO: task hung " Günther Noack
0 siblings, 2 replies; 13+ messages in thread
From: Ding Yihan @ 2026-02-21 7:28 UTC (permalink / raw)
To: syzbot, Mickaël Salaün
Cc: linux-security-module, Günther Noack
Hi all,
Thanks to syzbot for the testing and confirmation.
Since I am relatively new to the inner workings of this specific subsystem,
I would like to take a few days to thoroughly study the root cause
(the task_work and mutex interaction) and prepare a detailed and proper commit message.
I will send out the formal patch (v1) to the mailing list later.
Best regards,
Yihan Ding
On 2026/2/21 15:11, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger any issue:
>
> Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> Tested-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
>
> Tested on:
>
> commit: d4906ae1 Add linux-next specific files for 20260220
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13ea89e6580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=51f859f3211496bc
> dashboard link: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> patch: https://syzkaller.appspot.com/x/patch.diff?x=15f0595a580000
>
> Note: testing is done by a robot and is best-effort only.
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-21 7:28 ` [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback Ding Yihan
@ 2026-02-21 12:00 ` Günther Noack
2026-02-21 13:19 ` Günther Noack
2026-02-24 14:43 ` [syzbot] [kernel?] INFO: task hung " Günther Noack
1 sibling, 1 reply; 13+ messages in thread
From: Günther Noack @ 2026-02-21 12:00 UTC (permalink / raw)
To: Ding Yihan
Cc: syzbot, Mickaël Salaün, linux-security-module,
Jann Horn
Hello Ding!
On Sat, Feb 21, 2026 at 03:28:47PM +0800, Ding Yihan wrote:
> Since I am relatively new to the inner workings of this specific subsystem,
> I would like to take a few days to thoroughly study the root cause
> (the task_work and mutex interaction) and prepare a detailed and proper commit message.
>
> I will send out the formal patch (v1) to the mailing list later.
Thank you very much for preparing a patch, and especially also for
forwarding this to us. (The original syzkaller report was somehow not
addressed to Landlock or the LSM list. We should fix that.)
Timing-wise, the feature was picked up for the 7.0 release, so we
still have some time to fix it before this reaches a stable release.
As an early review for the patch:
Background:
We had previously convinced ourselves that grabbing the
cred_guard_mutex was not necessary. To quote the comment in
landlock_restrict_sibling_threads():
Unlike seccomp, which modifies sibling tasks directly, we do not need to
acquire the cred_guard_mutex and sighand->siglock:
- As in our case, all threads are themselves exchanging their own struct
cred through the credentials API, no locks are needed for that.
- Our for_each_thread() loops are protected by RCU.
- We do not acquire a lock to keep the list of sibling threads stable
between our for_each_thread loops. If the list of available sibling
threads changes between these for_each_thread loops, we make up for
that by continuing to look for threads until they are all discovered
and have entered their task_work, where they are unable to spawn new
threads.
The question of locking cred_guard_mutex came up in the patch
discussion multiple times as well, the most recent discussion was:
https://lore.kernel.org/all/20251020.fohbo6Iecahz@digikod.net/
If it helps, I keep some of my own notes for this particular feature
on https://wiki.gnoack.org/LandlockMultithreadedEnforcement.
(Very) tentative investigation:
In the Syzkaller report [2], it seems that the reproducer [2.1] is
creating two rulesets and then enforcing them in parallel, a scenario
which we are exercising in the TEST(competing_enablement) in
tools/testing/selftests/landlock/tsync_test.c already, but which has
not failed in my own selftest runs.
In the crash report, there are four threads in total:
* Two are stuck in the line
wait_for_completion(&ctx->ready_to_commit);
in the per-thread task work (line 128 [4.1])
* Two are stuck in the line
wait_for_completion(&shared_ctx.all_prepared)
in the calling thread's coordination logic (line 539 [4.2])
In line 539, we are already on the code path where we detected that we
are getting interrupted by another thread and where we are attempting
to deal with the scenario where two landlock_restrict_self() calls
compete. This is detected on line 523 when
wait_for_completion_interruptible() is true. The approach to handle
this is to set the overall -ERESTARTNOINTR error and cancel the work
that has been ongoing so far, by canceling the task works that did not
start running yet and waiting for the ones that did start running
(that is the step where we are blocked!). The reasoning there was
that these task works would all hit the "all_prepared" stage now, but
as we can see in the stack trace, the task works that are actively
running are already on line 128 and have passed the "all_prepared"
stage.
Differences I can see between syzkaller and our own test:
* The reproducer also calls openat() and then twice socketpair().
These syscalls should be unrelated, but it's possible that the
"async" invocation of socketpair() contributes to adding more
threads. (Assuming that "async" means "in new thread" in syzkaller)
* Syzkaller gives it more attempts. ([2.2])
I do not understand yet what went wrong in our scheme and need to look
deeper.
Ding, do you have more insights into it from your debugging?
Thanks,
–Günther
For reference:
[1] Report Mail: https://lore.kernel.org/all/69984159.050a0220.21cd75.01bb.GAE@google.com/
[2] Report: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
[2.1] Reproducer: https://syzkaller.appspot.com/text?tag=ReproSyz&x=16e41c02580000
[2.2] Reproducer (C): https://syzkaller.appspot.com/text?tag=ReproC&x=15813652580000
[3] Patch: https://lore.kernel.org/all/6999504d.a70a0220.2c38d7.0154.GAE@google.com/
[4.1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/security/landlock/tsync.c?id=635c467cc14ebdffab3f77610217c1dacaf88e8c#n128
[4.2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/security/landlock/tsync.c?id=635c467cc14ebdffab3f77610217c1dacaf88e8c#n539
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-21 12:00 ` Günther Noack
@ 2026-02-21 13:19 ` Günther Noack
2026-02-23 9:42 ` Günther Noack
0 siblings, 1 reply; 13+ messages in thread
From: Günther Noack @ 2026-02-21 13:19 UTC (permalink / raw)
To: Ding Yihan
Cc: syzbot, Mickaël Salaün, linux-security-module,
Jann Horn
On Sat, Feb 21, 2026 at 01:00:03PM +0100, Günther Noack wrote:
> (Very) tentative investigation:
>
> In the Syzkaller report [2], it seems that the reproducer [2.1] is
> creating two rulesets and then enforcing them in parallel, a scenario
> which we are exercising in the TEST(competing_enablement) in
> tools/testing/selftests/landlock/tsync_test.c already, but which has
> not failed in my own selftest runs.
>
> In the crash report, there are four threads in total:
>
> * Two are stuck in the line
> wait_for_completion(&ctx->ready_to_commit);
> in the per-thread task work (line 128 [4.1])
> * Two are stuck in the line
> wait_for_completion(&shared_ctx.all_prepared)
> in the calling thread's coordination logic (line 539 [4.2])
>
> In line 539, we are already on the code path where we detected that we
> are getting interrupted by another thread and where we are attempting
> to deal with the scenario where two landlock_restrict_self() calls
> compete. This is detected on line 523 when
> wait_for_completion_interruptible() is true. The approach to handle
> this is to set the overall -ERESTARTNOINTR error and cancel the work
> that has been ongoing so far, by canceling the task works that did not
> start running yet and waiting for the ones that did start running
> (that is the step where we are blocked!). The reasoning there was
> that these task works will all hit the "all_prepared" stage now, but
> as we can see in the stack trace, the task works that are actively
> running are already on line 128 and have passed the "all_prepared"
> stage).
>
> Differences I can see between syzkaller and our own test:
>
> * The reproducer also calls openat() and then twice socketpair().
> These syscalls should be unrelated, but it's possible that the
> "async" invocation of socketpair() contributes to adding more
> threads. (Assuming that "async" means "in new thread" in syzkaller)
> * Syzkaller gives it more attempts. ([2.2])
>
> I do not understand yet what went wrong in our scheme and need to look
> deeper.
OK, I think I understand now. Our existing recovery code for this
conflict is this:
	/*
	 * Decrement num_preparing for current, to undo that we initialized it
	 * to 1 a few lines above.
	 */
	if (atomic_dec_return(&shared_ctx.num_preparing) > 0) {
		if (wait_for_completion_interruptible(
			    &shared_ctx.all_prepared)) {
			/* In case of interruption, we need to retry the system call. */
			atomic_set(&shared_ctx.preparation_error,
				   -ERESTARTNOINTR);
			/*
			 * Cancel task works for tasks that did not start running yet,
			 * and decrement all_prepared and num_unfinished accordingly.
			 */
			cancel_tsync_works(&works, &shared_ctx);
			/*
			 * The remaining task works have started running, so waiting for
			 * their completion will finish.
			 */
			wait_for_completion(&shared_ctx.all_prepared);
		}
	}
When I wrote this, I assumed, as the last comment states, that the
task works which we could not cancel were already running.
I was wrong there, because I had misunderstood task_work_run(): when
the task works get run there, task_work_run() first *atomically
dequeues the entire queue of scheduled task works*, and only then runs
them sequentially.
That is why, if we have one task work that belongs to the first
landlock_restrict_self() call and one that belongs to the other, the
task work which is scheduled later can (a) no longer be dequeued with
cancel_tsync_works(), and (b) also has not started running yet.
Now the only thing that is necessary to produce the deadlock is a
pair of threads on which the task works for the two restriction calls
have been scheduled in opposite orders. When the two
landlock_restrict_self() calls end up in the recovery path quoted
above, each of them waits for one of its task works to run, which is
blocked from running by a task work that was scheduled before it and
does not finish either.
(Just pasting a brain dump here to save you some time hunting for the
root cause. I don't know the best solution yet either.)
–Günther
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-21 13:19 ` Günther Noack
@ 2026-02-23 9:42 ` Günther Noack
2026-02-23 11:29 ` Ding Yihan
2026-02-24 6:27 ` [PATCH] landlock: Fix deadlock " Yihan Ding
0 siblings, 2 replies; 13+ messages in thread
From: Günther Noack @ 2026-02-23 9:42 UTC (permalink / raw)
To: Ding Yihan
Cc: syzbot, Mickaël Salaün, linux-security-module,
Jann Horn, Paul Moore
On Sat, Feb 21, 2026 at 02:19:53PM +0100, Günther Noack wrote:
> OK, I think I understand now. Our existing recovery code for this
> conflict is this:
>
> /*
> * Decrement num_preparing for current, to undo that we initialized it
> * to 1 a few lines above.
> */
> if (atomic_dec_return(&shared_ctx.num_preparing) > 0) {
> if (wait_for_completion_interruptible(
> &shared_ctx.all_prepared)) {
> /* In case of interruption, we need to retry the system call. */
> atomic_set(&shared_ctx.preparation_error,
> -ERESTARTNOINTR);
>
> /*
> * Cancel task works for tasks that did not start running yet,
> * and decrement all_prepared and num_unfinished accordingly.
> */
> cancel_tsync_works(&works, &shared_ctx);
>
> /*
> * The remaining task works have started running, so waiting for
> * their completion will finish.
> */
> wait_for_completion(&shared_ctx.all_prepared);
> }
> }
>
> When I wrote this, I assumed, as the last comment states, that the
> task works which we could not cancel, are already running.
>
> I was wrong there, because I had misunderstood task_work_run(). When
> the task works get run there, it first *atomically dequeues the entire
> queue of scheduled task works*, and then runs them sequentially.
>
> That is why, if we have one task work that belongs to the first
> landlock_restrict_self() call and one which belongs to the other, the
> task work which is scheduled later can (a) not be dequeued with
> cancel_tsync_works() any more, and (b) also has not started running
> yet.
>
> Now the only thing that is necessary to produce the deadlock is that
> we have a pair of threads where the task works for the restriction
> calls have been scheduled in different order. When the two
> landlock_restrict_self() calls end up in the recovery path quoted
> above, they will wait for one of their task works to run which is
> blocked from running by another task work that is scheduled before and
> does not finish either.
>
> (Just pasting a brain dump here to save you some time hunting for the
> root cause. I don't know the best solution yet either.)
Let me propose the following fixes:
1. Immediate fix for that specific issue
----------------------------------------
Proposal:
* Remove the wait_for_completion(&shared_ctx.all_prepared)
call in the code snippet above.
* Rewrite surrounding comments: Be clear about the fact that
cancel_tsync_works() is an opportunistic improvement, but we don't
have a guarantee at all that it cancels any of the enqueued task
works (because task_work_run might already have popped them off).
This removes the hold-and-wait dependency cycle between the threads,
which produces the observed deadlock. The way that we shut down now
is that we exit the main loop (this happens already without the wait,
but we might also "break" to be explicit).
I think that this fix or an equivalent one is needed here, because
either way, our assumptions in the quoted code above were wrong.
2. Can we reason constructively about correctness?
--------------------------------------------------
The remaining question: if we cannot actually remove all the enqueued
task works on the shutdown path, under what circumstances are we even
able to interrupt and return from the landlock_restrict_self() system
call?
2.1 For n competing restrict_self calls, n-1 of them need to get interrupted
----------------------------------------------------------------------------
To answer this, consider a multithreaded process with threads named
"red", "green" and "blue" and many additional threads: When "red",
"green" and "blue" enforce landlock_restrict_self() concurrently, due
to differing iteration order, we might end up enqueueing the task
works on other threads in all of the following combinations:
t0: R G B <- front of queue
t1: R B G
t2: G R B
t3: G B R
t4: B R G
t5: B G R
In this configuration, for any of the landlock_restrict_self() system
calls to even return (successfully or unsuccessfully), at least two
threads must receive an interrupt and therefore remove their enqueued
task works from the front of the queue. Assuming those are green and
blue, we get:
t0: R <- front of queue
t1: R
t2: G R
t3: G B R
t4: B R
t5: B G R
(This works because after the patch above, all of the enqueued G and B
works finish even if there are remaining G and B works that are still
blocked by an "R" entry.)
Now, "R" is in the front of the queue, and the
landlock_restrict_self() call for the red thread can finish normally,
even without it being interrupted.
Once the "R" task works are done as well, the remaining G and B works
can run and finish as well.
This scheme generalizes: If we have n competing
landlock_restrict_self() calls, then in worst case, at least n-1 of
these system calls need to be interrupted so that they can all
terminate.
2.2 Can we guarantee that two system calls get interrupted?
-----------------------------------------------------------
In case of competing landlock_restrict_self() calls, I think it is
possible that not all of the relevant system calls get interrupted.
The scenario is one where we have a "red" and a "green" thread calling
landlock_restrict_self().
(a set of additional threads)
t0: task_works: R G
t1: task_works: G R
tR: red thread
tG: green thread
In the red thread, the following happens:
* Under RCU, count the number of total threads => get a low number
* Allocate space for that number of task_works
* Under RCU
* Enqueue "R" into t0 and t1
* Enqueue "R" for some of the "additional threads"
* But we do not have enough pre-allocated space to enqueue "R" for
the green thread tG.
The same thing happens in the green thread as well.
The result is that we still have a deadlock between t0 and t1, but
neither the red nor the green thread gets interrupted, so neither of
them can resolve it.
(FWIW, you could resolve it from the outside by sending a signal to
the red or green thread manually, but it is not guaranteed to happen
on its own.)
Caveat: I am making pessimistic assumptions about the iteration order
of the task list here, and I am assuming that the number of
"additional threads" is swinging up and down during the competing
enforcement, so that the enforcing threads are mis-approximating the
required space for memory pre-allocation.
2.3 Possible resolutions
------------------------
* We could try to interrupt all sibling threads during the teardown,
to fix the issue discussed in 2.2. (Downside: Complicated, more
expensive)
* The reason why landlock_restrict_self() can't return is because it
needs to wait until all task works are done before it can free the
memory. Alternatively, we could make the task works take ownership
of these memory structures (refcounting the shared_ctx). (Downside:
the used memory is no longer linear in the number of threads.)
Side remark: In testing, I had the impression that the
landlock_restrict_self() calls can go into a retry loop for a while
where all competing threads get interrupted all the time; in a debug
build, when the Syzkaller test prints out a line for each attempt,
sometimes it was hanging for seconds and *then* resolving itself
again.
3 Conclusion
------------
I would prefer if the final solution did not require deadlock
reasoning at that level and we could do it in a simpler way. I
therefore propose to do what Ding Yihan suggested, and what we had
also discussed previously in the code review:
* Let's serialize the landlock_restrict_self()-with-TSYNC operations
through the cred_guard_mutex.
This will resolve the issue where competing landlock_restrict_self()
calls with TSYNC can deadlock. It will also remove the jittery
behavior for that worst case where the conflict is resolved through
retry.
So in my mind, we need both patches:
* The fix to the cleanup path from 1. above, to make interruption
work more reliably and to correct the misunderstandings in the
comments.
* cred_guard_mutex to serialize the TSYNC invocations.
Please let me know what you think.
Thanks,
–Günther
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-23 9:42 ` Günther Noack
@ 2026-02-23 11:29 ` Ding Yihan
2026-02-23 15:16 ` Günther Noack
2026-02-24 6:27 ` [PATCH] landlock: Fix deadlock " Yihan Ding
1 sibling, 1 reply; 13+ messages in thread
From: Ding Yihan @ 2026-02-23 11:29 UTC (permalink / raw)
To: Günther Noack
Cc: syzbot, Mickaël Salaün, linux-security-module,
Jann Horn, Paul Moore
Hi Günther,
Thank you for the detailed analysis and the clear breakdown.
Apologies for the delayed response. I spent the last couple of days
thoroughly reading through the previous mailing list discussions. I
was trying hard to see if there was any viable pure lockless design
that could solve this concurrency issue while preserving the original
architecture.
However, after looking at the complexities you outlined, I completely
agree with your conclusion: serializing the TSYNC operations is indeed
the most robust and reasonable path forward to prevent the deadlock.
Regarding the lock choice: since 'cred_guard_mutex' is explicitly
marked as deprecated for new code in the kernel, maybe we can use its
modern replacement, 'exec_update_lock' (using down_write_trylock /
up_write on current->signal). This aligns with the current subsystem
standards and was also briefly touched upon by Jann in the older
discussions.
I fully understand the requirement for the two-part patch series:
1. Cleaning up the cancellation logic and comments.
2. Introducing the serialization lock for TSYNC.
I will take some time to draft and test this patch series properly.
I also plan to discuss this with my kernel colleagues here at
UnionTech to see if they have any additional suggestions on the
implementation details before I submit it.
I will send out the v1 patch series to the list as soon as it is
ready. Thanks again for your guidance and the great discussion!
Best regards,
Yihan Ding
On 2026/2/23 17:42, Günther Noack wrote:
> On Sat, Feb 21, 2026 at 02:19:53PM +0100, Günther Noack wrote:
>> OK, I think I understand now. Our existing recovery code for this
>> conflict is this:
>>
>> /*
>> * Decrement num_preparing for current, to undo that we initialized it
>> * to 1 a few lines above.
>> */
>> if (atomic_dec_return(&shared_ctx.num_preparing) > 0) {
>> if (wait_for_completion_interruptible(
>> &shared_ctx.all_prepared)) {
>> /* In case of interruption, we need to retry the system call. */
>> atomic_set(&shared_ctx.preparation_error,
>> -ERESTARTNOINTR);
>>
>> /*
>> * Cancel task works for tasks that did not start running yet,
>> * and decrement all_prepared and num_unfinished accordingly.
>> */
>> cancel_tsync_works(&works, &shared_ctx);
>>
>> /*
>> * The remaining task works have started running, so waiting for
>> * their completion will finish.
>> */
>> wait_for_completion(&shared_ctx.all_prepared);
>> }
>> }
>>
>> When I wrote this, I assumed, as the last comment states, that the
>> task works which we could not cancel, are already running.
>>
>> I was wrong there, because I had misunderstood task_work_run(). When
>> the task works get run there, it first *atomically dequeues the entire
>> queue of scheduled task works*, and then runs them sequentially.
>>
>> That is why, if we have one task work that belongs to the first
>> landlock_restrict_self() call and one which belongs to the other, the
>> task work which is scheduled later can (a) not be dequeued with
>> cancel_tsync_works() any more, and (b) also has not started running
>> yet.
>>
>> Now the only thing that is necessary to produce the deadlock is that
>> we have a pair of threads where the task works for the restriction
>> calls have been scheduled in different order. When the two
>> landlock_restrict_self() calls end up in the recovery path quoted
>> above, they will wait for one of their task works to run which is
>> blocked from running by another task work that is scheduled before and
>> does not finish either.
>>
>> (Just pasting a brain dump here to save you some time hunting for the
>> root cause. I don't know the best solution yet either.)
>
> Let me propose the following fixes:
>
> 1. Immediate fix for that specific issue
> ----------------------------------------
>
> Proposal:
> * Remove the wait_for_completion(&shared_ctx.all_prepared)
> call in the code snippet above.
> * Rewrite surrounding comments: Be clear about the fact that
> cancel_tsync_works() is an opportunistic improvement, but we don't
> have a guarantee at all that it cancels any of the enqueued task
> works (because task_work_run might already have popped them off).
>
> This removes the hold-and-wait dependency circle between the threads,
> which produces the observed deadlock. The way that we shut down now
> is that we exit the main loop (happens already without it, but we
> might also "break" to be explicit).
>
> I think that this fix or an equivalent one is needed here, because in
> either way, our assumptions in the quoted code above were wrong.
>
>
> 2. Can we reason constructively about correctness?
> --------------------------------------------------
>
> The remaining question: If on the shutdown path, we can not actually
> remove all the enqueued task works, under what circumstances are we
> even able to interrupt and return from the landlock_restrict_self()
> system call?
>
> 2.1 For n competing restrict_self calls, n-1 of them need to get interrupted
> ----------------------------------------------------------------------------
>
> To answer this, consider a multithreaded process with threads named
> "red", "green" and "blue" and many additional threads: When "red",
> "green" and "blue" enforce landlock_restrict_self() concurrently, due
> to differing iteration order, we might end up enqueueing the task
> works on other threads in all of the following combinations:
>
> t0: R G B <- front of queue
> t1: R B G
> t2: G R B
> t3: G B R
> t4: B R G
> t5: B G R
>
> In this configuration, for any of the landlock_restrict_self() system
> calls to even return (successfully or unsuccessfully), at least two
> threads must receive an interrupt and therefore remove their enqueued
> task works from the front of the queue. Assuming those are green and
> blue, we get:
>
> t0: R <- front of queue
> t1: R
> t2: G R
> t3: G B R
> t4: B R
> t5: B G R
>
> (This works because after the patch above, all of the enqueued G and B
> works finish even if there are remaining G and B works that are still
> blocked by an "R" entry.)
>
> Now, "R" is in the front of the queue, and the
> landlock_restrict_self() call for the red thread can finish normally,
> even without it being interrupted.
>
> Once the "R" task works are done as well, the remaining G and B works
> can run and finish as well.
>
> This scheme generalizes: If we have n competing
> landlock_restrict_self() calls, then in worst case, at least n-1 of
> these system calls need to be interrupted so that they can all
> terminate.
>
> 2.2 Can we guarantee that two system calls get interrupted?
> -----------------------------------------------------------
>
> In case of competing landlock_restrict_self() calls, I think it is
> possible that not all relevant system calls get seen. The scenario is
> one where we have a "red" and "green" thread calling
> landlock_restrict_self().
>
> (a set of additional threads)
> t0: task_works: R G
> t1: task_works: G R
> tR: red thread
> tG: green thread
>
> In the red thread, the following happens:
> * Under RCU, count the number of total threads => get a low number
> * Allocate space for that number of task_works
> * Under RCU
> * Enqueue "R" into t0 and t1
> * Enqueue "R" for some of the "additional threads"
> * But we do not have enough pre-allocated space to enqueue "R" for
> the green thread tG.
>
> The same thing happens in the green thread as well.
>
> The result is that we still have a deadlock between t0 and t1, but
> neither the red nor the green thread get interrupted so that they can
> resolve it.
>
> (FWIW, you could resolve it from the outside by sending a signal to
> the red or green thread manually, but it is not guaranteed to happen
> on its own.)
>
> Caveat: I am making pessimistic assumptions about the iteration order
> of the task list here, and I am assuming that the number of
> "additional threads" is swinging up and down during the competing
> enforcement, so that the enforcing threads are mis-approximating the
> required space for memory pre-allocation.
>
> 2.3 Possible resolutions
> ------------------------
>
> * We could try to interrupt all sibling threads during the teardown,
> to fix the issue discussed in 2.2. (Downside: Complicated, more
> expensive)
> * The reason why landlock_restrict_self() can't return is because it
> needs to wait until all task works are done before it can free the
> memory. Alternatively, we could make the task works take ownership
> of these memory structures (refcounting the shared_ctx). (Downside:
> The used memory is not linear to the number of threads any more.)
>
> Side remark: In testing, I had the impression that the
> landlock_restrict_self() calls can go into a retry loop for a while
> where all competing threads get interrupted all the time; in a debug
> build, when the Syzkaller test prints out a line for each attempt,
> sometimes it was hanging for seconds and *then* resolving itself
> again.
>
> 3 Conclusion
> ---------------
>
> I would prefer if the final solution would not require deadlock
> reasoning at that level and we could do it in simpler way. I
> therefore propose to do what Ding Yihan suggested, and what we had
> also discussed previously in the code review:
>
> * Let's serialize the landlock_restrict_self()-with-TSYNC operations
> through the cred_guard_mutex.
>
> This will resolve the issue where competing landlock_restrict_self()
> calls with TSYNC can deadlock. It will also remove the jittery
> behavior for that worst case where the conflict is resolved through
> retry.
>
>
> So in my mind, we need both patches:
>
> * The fix to the cleanup path from 1. above, to make interruption
> work more reliably and to correct the misunderstandings in the
> comments.
> * cred_guard_mutex to serialize the TSYNC invocations.
>
> Please let me know what you think.
>
> Thanks,
> –Günther
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
[not found] <69984159.050a0220.21cd75.01bb.GAE@google.com>
@ 2026-02-23 13:40 ` Frederic Weisbecker
2026-02-23 15:15 ` Günther Noack
0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2026-02-23 13:40 UTC (permalink / raw)
To: syzbot, Mickaël Salaün, Günther Noack, Paul Moore,
James Morris, Serge E. Hallyn, linux-security-module
Cc: anna-maria, linux-kernel, syzkaller-bugs, tglx
On Fri, Feb 20, 2026 at 03:11:21AM -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 635c467cc14e Add linux-next specific files for 20260213
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1452f6e6580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=61690c38d1398936
> dashboard link: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16e41c02580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15813652580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/78b3d15ca8e6/disk-635c467c.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/a95f3d108ef4/vmlinux-635c467c.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/e58086838b24/bzImage-635c467c.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
>
> INFO: task syz.0.2812:14643 blocked for more than 143 seconds.
> Not tainted syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.0.2812 state:D stack:25600 pid:14643 tgid:14643 ppid:13375 task_flags:0x400040 flags:0x00080002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5295 [inline]
> __schedule+0x1585/0x5340 kernel/sched/core.c:6907
> __schedule_loop kernel/sched/core.c:6989 [inline]
> schedule+0x164/0x360 kernel/sched/core.c:7004
> schedule_timeout+0xc3/0x2c0 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2cc/0x5e0 kernel/sched/completion.c:153
> restrict_one_thread security/landlock/tsync.c:128 [inline]
> restrict_one_thread_callback+0x320/0x570 security/landlock/tsync.c:162
Seems to be related to the Landlock security module.
Cc'ing maintainers for awareness.
Thanks.
> task_work_run+0x1d9/0x270 kernel/task_work.c:233
> get_signal+0x11eb/0x1330 kernel/signal.c:2807
> arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
> __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
> exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
> __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
> syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
> syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
> do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8d7f19bf79
> RSP: 002b:00007ffe0b192a38 EFLAGS: 00000246 ORIG_RAX: 00000000000000db
> RAX: fffffffffffffdfc RBX: 00000000000389f1 RCX: 00007f8d7f19bf79
> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f8d7f41618c
> RBP: 0000000000000032 R08: 3fffffffffffffff R09: 0000000000000000
> R10: 00007ffe0b192b40 R11: 0000000000000246 R12: 00007ffe0b192b60
> R13: 00007f8d7f41618c R14: 0000000000038a23 R15: 00007ffe0b192b40
> </TASK>
> INFO: task syz.0.2812:14644 blocked for more than 143 seconds.
> Not tainted syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.0.2812 state:D stack:28216 pid:14644 tgid:14643 ppid:13375 task_flags:0x400040 flags:0x00080002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5295 [inline]
> __schedule+0x1585/0x5340 kernel/sched/core.c:6907
> __schedule_loop kernel/sched/core.c:6989 [inline]
> schedule+0x164/0x360 kernel/sched/core.c:7004
> schedule_timeout+0xc3/0x2c0 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2cc/0x5e0 kernel/sched/completion.c:153
> restrict_one_thread security/landlock/tsync.c:128 [inline]
> restrict_one_thread_callback+0x320/0x570 security/landlock/tsync.c:162
> task_work_run+0x1d9/0x270 kernel/task_work.c:233
> get_signal+0x11eb/0x1330 kernel/signal.c:2807
> arch_do_signal_or_restart+0xbc/0x830 arch/x86/kernel/signal.c:337
> __exit_to_user_mode_loop kernel/entry/common.c:64 [inline]
> exit_to_user_mode_loop+0x86/0x480 kernel/entry/common.c:98
> __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
> syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
> syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
> do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8d7f19bf79
> RSP: 002b:00007f8d8007c0e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 00007f8d7f415fa8 RCX: 00007f8d7f19bf79
> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f8d7f415fa8
> RBP: 00007f8d7f415fa0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f8d7f416038 R14: 00007ffe0b1927f0 R15: 00007ffe0b1928d8
> </TASK>
> INFO: task syz.0.2812:14645 blocked for more than 143 seconds.
> Not tainted syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.0.2812 state:D stack:28648 pid:14645 tgid:14643 ppid:13375 task_flags:0x400140 flags:0x00080006
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5295 [inline]
> __schedule+0x1585/0x5340 kernel/sched/core.c:6907
> __schedule_loop kernel/sched/core.c:6989 [inline]
> schedule+0x164/0x360 kernel/sched/core.c:7004
> schedule_timeout+0xc3/0x2c0 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2cc/0x5e0 kernel/sched/completion.c:153
> landlock_restrict_sibling_threads+0xe9c/0x11f0 security/landlock/tsync.c:539
> __do_sys_landlock_restrict_self security/landlock/syscalls.c:574 [inline]
> __se_sys_landlock_restrict_self+0x540/0x810 security/landlock/syscalls.c:482
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8d7f19bf79
> RSP: 002b:00007f8d8005b028 EFLAGS: 00000246 ORIG_RAX: 00000000000001be
> RAX: ffffffffffffffda RBX: 00007f8d7f416090 RCX: 00007f8d7f19bf79
> RDX: 0000000000000000 RSI: 000000000000000e RDI: 0000000000000003
> RBP: 00007f8d7f2327e0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f8d7f416128 R14: 00007f8d7f416090 R15: 00007ffe0b1928d8
> </TASK>
> INFO: task syz.0.2812:14646 blocked for more than 144 seconds.
> Not tainted syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.0.2812 state:D stack:28832 pid:14646 tgid:14643 ppid:13375 task_flags:0x400140 flags:0x00080006
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5295 [inline]
> __schedule+0x1585/0x5340 kernel/sched/core.c:6907
> __schedule_loop kernel/sched/core.c:6989 [inline]
> schedule+0x164/0x360 kernel/sched/core.c:7004
> schedule_timeout+0xc3/0x2c0 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2cc/0x5e0 kernel/sched/completion.c:153
> landlock_restrict_sibling_threads+0xe9c/0x11f0 security/landlock/tsync.c:539
> __do_sys_landlock_restrict_self security/landlock/syscalls.c:574 [inline]
> __se_sys_landlock_restrict_self+0x540/0x810 security/landlock/syscalls.c:482
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f8d7f19bf79
> RSP: 002b:00007f8d8003a028 EFLAGS: 00000246 ORIG_RAX: 00000000000001be
> RAX: ffffffffffffffda RBX: 00007f8d7f416180 RCX: 00007f8d7f19bf79
> RDX: 0000000000000000 RSI: 000000000000000e RDI: 0000000000000003
> RBP: 00007f8d7f2327e0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f8d7f416218 R14: 00007f8d7f416180 R15: 00007ffe0b1928d8
> </TASK>
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/31:
> #0: ffffffff8e9602e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:312 [inline]
> #0: ffffffff8e9602e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:850 [inline]
> #0: ffffffff8e9602e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6775
> 2 locks held by getty/5581:
> #0: ffff8880328890a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
> #1: ffffc9000332b2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x45c/0x13c0 drivers/tty/n_tty.c:2211
>
> =============================================
>
> NMI backtrace for cpu 0
> CPU: 0 UID: 0 PID: 31 Comm: khungtaskd Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
> Call Trace:
> <TASK>
> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> nmi_cpu_backtrace+0x274/0x2d0 lib/nmi_backtrace.c:113
> nmi_trigger_cpumask_backtrace+0x17a/0x300 lib/nmi_backtrace.c:62
> trigger_all_cpu_backtrace include/linux/nmi.h:161 [inline]
> __sys_info lib/sys_info.c:157 [inline]
> sys_info+0x135/0x170 lib/sys_info.c:165
> check_hung_uninterruptible_tasks kernel/hung_task.c:346 [inline]
> watchdog+0xfd9/0x1030 kernel/hung_task.c:515
> kthread+0x388/0x470 kernel/kthread.c:467
> ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> </TASK>
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 UID: 0 PID: 86 Comm: kworker/u8:5 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
> Workqueue: events_unbound nsim_dev_trap_report_work
> RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:26 [inline]
> RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:109 [inline]
> RIP: 0010:arch_local_irq_save arch/x86/include/asm/irqflags.h:127 [inline]
> RIP: 0010:lock_acquire+0xab/0x2e0 kernel/locking/lockdep.c:5864
> Code: 84 c1 00 00 00 65 8b 05 73 b8 9f 11 85 c0 0f 85 b2 00 00 00 65 48 8b 05 bb 72 9f 11 83 b8 14 0b 00 00 00 0f 85 9d 00 00 00 9c <5b> fa 48 c7 c7 8f a1 02 8e e8 57 40 17 0a 65 ff 05 40 b8 9f 11 45
> RSP: 0018:ffffc9000260f498 EFLAGS: 00000246
> RAX: ffff88801df81e40 RBX: ffffffff818f9166 RCX: 0000000080000002
> RDX: 0000000000000000 RSI: ffffffff8176da62 RDI: 1ffffffff1d2c05c
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffc9000260f638 R11: ffffffff81b11580 R12: 0000000000000002
> R13: ffffffff8e9602e0 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff88812510b000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe09b2c1ff8 CR3: 000000000e74c000 CR4: 00000000003526f0
> Call Trace:
> <TASK>
> rcu_lock_acquire include/linux/rcupdate.h:312 [inline]
> rcu_read_lock include/linux/rcupdate.h:850 [inline]
> class_rcu_constructor include/linux/rcupdate.h:1193 [inline]
> unwind_next_frame+0xc2/0x23c0 arch/x86/kernel/unwind_orc.c:495
> arch_stack_walk+0x11b/0x150 arch/x86/kernel/stacktrace.c:25
> stack_trace_save+0xa9/0x100 kernel/stacktrace.c:122
> kasan_save_stack mm/kasan/common.c:57 [inline]
> kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> unpoison_slab_object mm/kasan/common.c:340 [inline]
> __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
> kasan_slab_alloc include/linux/kasan.h:253 [inline]
> slab_post_alloc_hook mm/slub.c:4501 [inline]
> slab_alloc_node mm/slub.c:4830 [inline]
> kmem_cache_alloc_node_noprof+0x384/0x690 mm/slub.c:4882
> __alloc_skb+0x1d0/0x7d0 net/core/skbuff.c:702
> alloc_skb include/linux/skbuff.h:1383 [inline]
> nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline]
> nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
> nsim_dev_trap_report_work+0x29a/0xb80 drivers/net/netdevsim/dev.c:922
> process_one_work+0x949/0x1650 kernel/workqueue.c:3279
> process_scheduled_works kernel/workqueue.c:3362 [inline]
> worker_thread+0xb46/0x1140 kernel/workqueue.c:3443
> kthread+0x388/0x470 kernel/kthread.c:467
> ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
--
Frederic Weisbecker
SUSE Labs
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-23 13:40 ` Frederic Weisbecker
@ 2026-02-23 15:15 ` Günther Noack
0 siblings, 0 replies; 13+ messages in thread
From: Günther Noack @ 2026-02-23 15:15 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: syzbot, Mickaël Salaün, Paul Moore, James Morris,
Serge E. Hallyn, linux-security-module, anna-maria, linux-kernel,
syzkaller-bugs, tglx
On Mon, Feb 23, 2026 at 02:40:15PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 20, 2026 at 03:11:21AM -0800, syzbot wrote:
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5295 [inline]
> > __schedule+0x1585/0x5340 kernel/sched/core.c:6907
> > __schedule_loop kernel/sched/core.c:6989 [inline]
> > schedule+0x164/0x360 kernel/sched/core.c:7004
> > schedule_timeout+0xc3/0x2c0 kernel/time/sleep_timeout.c:75
> > do_wait_for_common kernel/sched/completion.c:100 [inline]
> > __wait_for_common kernel/sched/completion.c:121 [inline]
> > wait_for_common kernel/sched/completion.c:132 [inline]
> > wait_for_completion+0x2cc/0x5e0 kernel/sched/completion.c:153
> > restrict_one_thread security/landlock/tsync.c:128 [inline]
> > restrict_one_thread_callback+0x320/0x570 security/landlock/tsync.c:162
>
> Seems to be related to landlock security module.
> Cc'ing maintainers for awareness.
Thank you! That is correct. We are already discussing it in
https://lore.kernel.org/all/00A9E53EDC82309F+7b1dfc69-95f8-4ffc-a67c-967de0e2dfee@uniontech.com/
—Günther
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-23 11:29 ` Ding Yihan
@ 2026-02-23 15:16 ` Günther Noack
2026-02-24 3:02 ` Ding Yihan
0 siblings, 1 reply; 13+ messages in thread
From: Günther Noack @ 2026-02-23 15:16 UTC (permalink / raw)
To: Ding Yihan
Cc: Günther Noack, syzbot, Mickaël Salaün,
linux-security-module, Jann Horn, Paul Moore
Hello!
On Mon, Feb 23, 2026 at 07:29:56PM +0800, Ding Yihan wrote:
> Thank you for the detailed analysis and the clear breakdown.
> Apologies for the delayed response. I spent the last couple of days
> thoroughly reading through the previous mailing list discussions. I
> was trying hard to see if there was any viable pure lockless design
> that could solve this concurrency issue while preserving the original
> architecture.
>
> However, after looking at the complexities you outlined, I completely
> agree with your conclusion: serializing the TSYNC operations is indeed
> the most robust and reasonable path forward to prevent the deadlock.
>
> Regarding the lock choice, since 'cred_guard_mutex' is explicitly
> marked as deprecated for new code in the kernel, maybe we can use its
> modern replacement: 'exec_update_lock' (using down_write_trylock /
> up_write on current->signal). This aligns with the current subsystem
> standards and was also briefly touched upon by Jann in the older
> discussions.
>
> I fully understand the requirement for the two-part patch series:
> 1. Cleaning up the cancellation logic and comments.
> 2. Introducing the serialization lock for TSYNC.
>
> I will take some time to draft and test this patch series properly.
> I also plan to discuss this with my kernel colleagues here at
> UnionTech to see if they have any additional suggestions on the
> implementation details before I submit it.
>
> I will send out the v1 patch series to the list as soon as it is
> ready. Thanks again for your guidance and the great discussion!
Thank you, Ding, this is much appreciated!
I agree, the `exec_update_lock` might be the better solution;
I also need to familiarize myself more with it to double-check.
—Günther
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-23 15:16 ` Günther Noack
@ 2026-02-24 3:02 ` Ding Yihan
2026-02-24 3:03 ` syzbot
0 siblings, 1 reply; 13+ messages in thread
From: Ding Yihan @ 2026-02-24 3:02 UTC (permalink / raw)
To: Günther Noack
Cc: Günther Noack, syzbot, Mickaël Salaün,
linux-security-module, Jann Horn, Paul Moore
Hi Günther,
Thank you for the detailed analysis! I completely agree that serializing the TSYNC
operations is the right way to prevent this deadlock. I have drafted a patch using
`exec_update_lock` (similar to how seccomp uses `cred_guard_mutex`).
Regarding your proposal to split this into two patches (one for the cleanup
path and one for the lock): Maybe combining them into a single patch is a better choice. Here is why:
We actually *cannot* remove `wait_for_completion(&shared_ctx.all_prepared)`
in the interrupt recovery path. Since `shared_ctx` is allocated on the local
stack of the caller, removing the wait would cause a severe Use-After-Free (UAF) if the
thread returns to userspace while sibling task_works are still executing and dereferencing `ctx`.
By adding the lock, we inherently resolve the deadlock, meaning the sibling task_works
will never get stuck. Thus, `wait_for_completion` becomes perfectly safe to keep,
and it remains strictly necessary to protect the stack memory. Therefore, the "fix" for the
cleanup path is simply updating the comments to reflect this reality, which is tightly coupled with the locking fix.
It felt more cohesive as a single patch.
I have tested the patch on my laptop, and it no longer triggers the issue. Let's have syzbot test this combined logic:
#syz test:
--- a/security/landlock/tsync.c
+++ b/security/landlock/tsync.c
@@ -447,6 +447,12 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
shared_ctx.new_cred = new_cred;
shared_ctx.set_no_new_privs = task_no_new_privs(current);
+ /*
+ * Serialize concurrent TSYNC operations to prevent deadlocks
+ * when multiple threads call landlock_restrict_self() simultaneously.
+ */
+ down_write(&current->signal->exec_update_lock);
+
/*
* We schedule a pseudo-signal task_work for each of the calling task's
* sibling threads. In the task work, each thread:
@@ -527,14 +533,17 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
-ERESTARTNOINTR);
/*
- * Cancel task works for tasks that did not start running yet,
- * and decrement all_prepared and num_unfinished accordingly.
+ * Opportunistic improvement: try to cancel task works
+ * for tasks that did not start running yet. We do not
+ * have a guarantee that it cancels any of the enqueued
+ * task works (because task_work_run() might already have
+ * dequeued them).
*/
cancel_tsync_works(&works, &shared_ctx);
/*
- * The remaining task works have started running, so waiting for
- * their completion will finish.
+ * We must wait for the remaining task works to finish to
+ * prevent a use-after-free of the local shared_ctx.
*/
wait_for_completion(&shared_ctx.all_prepared);
}
@@ -557,5 +566,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
tsync_works_release(&works);
+ up_write(&current->signal->exec_update_lock);
+
return atomic_read(&shared_ctx.preparation_error);
}
--
On 2026/2/23 23:16, Günther Noack wrote:
> Hello!
>
> On Mon, Feb 23, 2026 at 07:29:56PM +0800, Ding Yihan wrote:
>> Thank you for the detailed analysis and the clear breakdown.
>> Apologies for the delayed response. I spent the last couple of days
>> thoroughly reading through the previous mailing list discussions. I
>> was trying hard to see if there was any viable pure lockless design
>> that could solve this concurrency issue while preserving the original
>> architecture.
>>
>> However, after looking at the complexities you outlined, I completely
>> agree with your conclusion: serializing the TSYNC operations is indeed
>> the most robust and reasonable path forward to prevent the deadlock.
>>
>> Regarding the lock choice, since 'cred_guard_mutex' is explicitly
>> marked as deprecated for new code in the kernel,maybe we can use its
>> modern replacement: 'exec_update_lock' (using down_write_trylock /
>> up_write on current->signal). This aligns with the current subsystem
>> standards and was also briefly touched upon by Jann in the older
>> discussions.
>>
>> I fully understand the requirement for the two-part patch series:
>> 1. Cleaning up the cancellation logic and comments.
>> 2. Introducing the serialization lock for TSYNC.
>>
>> I will take some time to draft and test this patch series properly.
>> I also plan to discuss this with my kernel colleagues here at
>> UnionTech to see if they have any additional suggestions on the
>> implementation details before I submit it.
>>
>> I will send out the v1 patch series to the list as soon as it is
>> ready. Thanks again for your guidance and the great discussion!
>
> Thank you, Ding, this is much appreciated!
>
> I agree, the `exec_update_lock` might be the better solution;
> I also need to familiarize myself more with it to double-check.
>
> —Günther
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-24 3:02 ` Ding Yihan
@ 2026-02-24 3:03 ` syzbot
0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2026-02-24 3:03 UTC (permalink / raw)
To: dingyihan
Cc: dingyihan, gnoack3000, gnoack, jannh, linux-security-module, mic,
paul, linux-kernel, syzkaller-bugs
> Hi Günther,
>
> Thank you for the detailed analysis! I completely agree that serializing the TSYNC
> operations is the right way to prevent this deadlock. I have drafted a patch using
> `exec_update_lock` (similar to how seccomp uses `cred_guard_mutex`).
>
> Regarding your proposal to split this into two patches (one for the cleanup
> path and one for the lock): Maybe combining them into a single patch is a better choice. Here is why:
>
> We actually *cannot* remove `wait_for_completion(&shared_ctx.all_prepared)`
> in the interrupt recovery path. Since `shared_ctx` is allocated on the local
> stack of the caller, removing the wait would cause a severe Use-After-Free (UAF) if the
> thread returns to userspace while sibling task_works are still executing and dereferencing `ctx`.
>
> By adding the lock, we inherently resolve the deadlock, meaning the sibling task_works
> will never get stuck. Thus, `wait_for_completion` becomes perfectly safe to keep,
> and it remains strictly necessary to protect the stack memory. Therefore, the "fix" for the
> cleanup path is simply updating the comments to reflect this reality, which is tightly coupled with the locking fix.
> It felt more cohesive as a single patch.
>
> I have tested the patch on my laptop, and it no longer triggers the issue. Let's have syzbot test this combined logic:
>
> #syz test:
"---" does not look like a valid git repo address.
>
> --- a/security/landlock/tsync.c
>
> +++ b/security/landlock/tsync.c
>
> @@ -447,6 +447,12 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>
> shared_ctx.new_cred = new_cred;
>
> shared_ctx.set_no_new_privs = task_no_new_privs(current);
>
>
>
> + /*
>
> + * Serialize concurrent TSYNC operations to prevent deadlocks
>
> + * when multiple threads call landlock_restrict_self() simultaneously.
>
> + */
>
> + down_write(&current->signal->exec_update_lock);
>
> +
>
> /*
>
> * We schedule a pseudo-signal task_work for each of the calling task's
>
> * sibling threads. In the task work, each thread:
>
> @@ -527,14 +533,17 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>
> -ERESTARTNOINTR);
>
>
>
> /*
>
> - * Cancel task works for tasks that did not start running yet,
>
> - * and decrement all_prepared and num_unfinished accordingly.
>
> + * Opportunistic improvement: try to cancel task works
>
> + * for tasks that did not start running yet. We do not
>
> + * have a guarantee that it cancels any of the enqueued
>
> + * task works (because task_work_run() might already have
>
> + * dequeued them).
>
> */
>
> cancel_tsync_works(&works, &shared_ctx);
>
>
>
> /*
>
> - * The remaining task works have started running, so waiting for
>
> - * their completion will finish.
>
> + * We must wait for the remaining task works to finish to
>
> + * prevent a use-after-free of the local shared_ctx.
>
> */
>
> wait_for_completion(&shared_ctx.all_prepared);
>
> }
>
> @@ -557,5 +566,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>
>
>
> tsync_works_release(&works);
>
>
>
> + up_write(&current->signal->exec_update_lock);
>
> +
>
> return atomic_read(&shared_ctx.preparation_error);
>
> }
>
> --
> On 2026/2/23 23:16, Günther Noack wrote:
>> Hello!
>>
>> On Mon, Feb 23, 2026 at 07:29:56PM +0800, Ding Yihan wrote:
>>> Thank you for the detailed analysis and the clear breakdown.
>>> Apologies for the delayed response. I spent the last couple of days
>>> thoroughly reading through the previous mailing list discussions. I
>>> was trying hard to see if there was any viable pure lockless design
>>> that could solve this concurrency issue while preserving the original
>>> architecture.
>>>
>>> However, after looking at the complexities you outlined, I completely
>>> agree with your conclusion: serializing the TSYNC operations is indeed
>>> the most robust and reasonable path forward to prevent the deadlock.
>>>
>>> Regarding the lock choice, since 'cred_guard_mutex' is explicitly
>>> marked as deprecated for new code in the kernel, maybe we can use its
>>> modern replacement: 'exec_update_lock' (using down_write_trylock /
>>> up_write on current->signal). This aligns with the current subsystem
>>> standards and was also briefly touched upon by Jann in the older
>>> discussions.
>>>
>>> I fully understand the requirement for the two-part patch series:
>>> 1. Cleaning up the cancellation logic and comments.
>>> 2. Introducing the serialization lock for TSYNC.
>>>
>>> I will take some time to draft and test this patch series properly.
>>> I also plan to discuss this with my kernel colleagues here at
>>> UnionTech to see if they have any additional suggestions on the
>>> implementation details before I submit it.
>>>
>>> I will send out the v1 patch series to the list as soon as it is
>>> ready. Thanks again for your guidance and the great discussion!
>>
>> Thank you, Ding, this is much appreciated!
>>
>> I agree, the `exec_update_lock` might be the better solution;
>> I also need to familiarize myself more with it to double-check.
>>
>> —Günther
>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] landlock: Fix deadlock in restrict_one_thread_callback
2026-02-23 9:42 ` Günther Noack
2026-02-23 11:29 ` Ding Yihan
@ 2026-02-24 6:27 ` Yihan Ding
2026-02-24 8:48 ` Günther Noack
1 sibling, 1 reply; 13+ messages in thread
From: Yihan Ding @ 2026-02-24 6:27 UTC (permalink / raw)
To: Mickaël Salaün, Günther Noack
Cc: Paul Moore, Jann Horn, linux-security-module, linux-kernel,
syzbot+7ea2f5e9dfd468201817, Yihan Ding
syzbot found a deadlock in landlock_restrict_sibling_threads().
When multiple threads concurrently call landlock_restrict_self() with
sibling thread restriction enabled, they can deadlock by mutually
queueing task_works on each other and then blocking in kernel space
(waiting for the other to finish).
Fix this by serializing the TSYNC operations within the same process
using the exec_update_lock. This prevents concurrent invocations
from deadlocking.
Additionally, update the comments in the interrupt recovery path to
clarify that cancel_tsync_works() is an opportunistic cleanup and that
waiting for completion is strictly necessary to prevent a
use-after-free of the stack-allocated shared_ctx.
Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
---
security/landlock/tsync.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
index de01aa899751..4e91af271f3b 100644
--- a/security/landlock/tsync.c
+++ b/security/landlock/tsync.c
@@ -447,6 +447,12 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
shared_ctx.new_cred = new_cred;
shared_ctx.set_no_new_privs = task_no_new_privs(current);
+ /*
+ * Serialize concurrent TSYNC operations to prevent deadlocks
+ * when multiple threads call landlock_restrict_self() simultaneously.
+ */
+ down_write(&current->signal->exec_update_lock);
+
/*
* We schedule a pseudo-signal task_work for each of the calling task's
* sibling threads. In the task work, each thread:
@@ -527,14 +533,17 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
-ERESTARTNOINTR);
/*
- * Cancel task works for tasks that did not start running yet,
- * and decrement all_prepared and num_unfinished accordingly.
+ * Opportunistic improvement: try to cancel task works
+ * for tasks that did not start running yet. We do not
+ * have a guarantee that it cancels any of the enqueued
+ * task works (because task_work_run() might already have
+ * dequeued them).
*/
cancel_tsync_works(&works, &shared_ctx);
/*
- * The remaining task works have started running, so waiting for
- * their completion will finish.
+ * We must wait for the remaining task works to finish to
+ * prevent a use-after-free of the local shared_ctx.
*/
wait_for_completion(&shared_ctx.all_prepared);
}
@@ -557,5 +566,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
tsync_works_release(&works);
+ up_write(&current->signal->exec_update_lock);
+
return atomic_read(&shared_ctx.preparation_error);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] landlock: Fix deadlock in restrict_one_thread_callback
2026-02-24 6:27 ` [PATCH] landlock: Fix deadlock " Yihan Ding
@ 2026-02-24 8:48 ` Günther Noack
0 siblings, 0 replies; 13+ messages in thread
From: Günther Noack @ 2026-02-24 8:48 UTC (permalink / raw)
To: Yihan Ding
Cc: Mickaël Salaün, Paul Moore, Jann Horn,
linux-security-module, linux-kernel, syzbot+7ea2f5e9dfd468201817
Hello!
Thanks for sending the patch!
On Tue, Feb 24, 2026 at 02:27:29PM +0800, Yihan Ding wrote:
> syzbot found a deadlock in landlock_restrict_sibling_threads().
> When multiple threads concurrently call landlock_restrict_self() with
> sibling thread restriction enabled, they can deadlock by mutually
> queueing task_works on each other and then blocking in kernel space
> (waiting for the other to finish).
>
> Fix this by serializing the TSYNC operations within the same process
> using the exec_update_lock. This prevents concurrent invocations
> from deadlocking.
>
> Additionally, update the comments in the interrupt recovery path to
> clarify that cancel_tsync_works() is an opportunistic cleanup and that
> waiting for completion is strictly necessary to prevent a
> use-after-free of the stack-allocated shared_ctx.
>
> Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
> Reported-by: syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=7ea2f5e9dfd468201817
> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> ---
> security/landlock/tsync.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> index de01aa899751..4e91af271f3b 100644
> --- a/security/landlock/tsync.c
> +++ b/security/landlock/tsync.c
> @@ -447,6 +447,12 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> shared_ctx.new_cred = new_cred;
> shared_ctx.set_no_new_privs = task_no_new_privs(current);
>
> + /*
> + * Serialize concurrent TSYNC operations to prevent deadlocks
> + * when multiple threads call landlock_restrict_self() simultaneously.
> + */
> + down_write(&current->signal->exec_update_lock);
Should we use the *_killable variant of this lock acquisition?
> /*
> * We schedule a pseudo-signal task_work for each of the calling task's
> * sibling threads. In the task work, each thread:
> @@ -527,14 +533,17 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> -ERESTARTNOINTR);
>
> /*
> - * Cancel task works for tasks that did not start running yet,
> - * and decrement all_prepared and num_unfinished accordingly.
> + * Opportunistic improvement: try to cancel task works
> + * for tasks that did not start running yet. We do not
> + * have a guarantee that it cancels any of the enqueued
> + * task works (because task_work_run() might already have
> + * dequeued them).
> */
> cancel_tsync_works(&works, &shared_ctx);
>
> /*
> - * The remaining task works have started running, so waiting for
> - * their completion will finish.
> + * We must wait for the remaining task works to finish to
> + * prevent a use-after-free of the local shared_ctx.
> */
> wait_for_completion(&shared_ctx.all_prepared);
I do not think that we must wait for all_prepared here, contrary to
what your updated comment says: The landlock_restrict_sibling_threads()
function still waits for all of these task works to finish at the
bottom, where it waits for "all_finished", so there is no UAF on the
local shared context.
I would recommend replacing the
wait_for_completion(&shared_ctx.all_prepared) call and its comment
with an explicit "break":
/*
* Break the loop with error. The cleanup code after the loop
* unblocks the remaining task_works.
*/
break;
Please also update the comment above the complete_all(ready_to_commit):

/*
 * We now have either (a) all sibling threads blocking and in
 * "prepared" state in the task work, or (b) the preparation error is
 * set.  Ask all threads to commit (or abort).
 */

Then it is a bit more explicit about the error-handling variant of this.
(FYI, I have tested the patch variant where I only removed the
wait_for_completion(all_prepared) call, and where I did *not* add the
additional lock at the top. In this configuration, I was unable to
get it to hang any more, even with added mdelays. But as discussed in
section 2.2 of [1], there are still difficult to reproduce scenarios
where this can theoretically fail, and it is better to use the lock at
the top.)
[1] https://lore.kernel.org/all/20260223.52c45aed20f8@gnoack.org/
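(To make the serialization property concrete, here is a minimal
userspace pthreads analogy — invented names, not the kernel code: with
one process-wide lock around the whole section, at most one thread is
ever inside it, so two threads can never be queueing task works on each
other at the same time:)

```c
#include <pthread.h>

/* Userspace analogy of the proposed exec_update_lock serialization:
 * one process-wide lock around the whole "restrict siblings" section. */
static pthread_mutex_t tsync_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_section;     /* threads currently inside the section */
static int max_in_section; /* must never exceed 1 */

static void *restrict_self(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&tsync_lock);   /* down_write() stand-in */
	if (++in_section > max_in_section)
		max_in_section = in_section;
	/* ... queue task works on siblings and wait for them ... */
	--in_section;
	pthread_mutex_unlock(&tsync_lock); /* up_write() stand-in */
	return NULL;
}

/* Spawn several concurrent "restrict" calls and report the maximum
 * number of threads that were ever inside the section at once. */
static int run_demo(void)
{
	pthread_t t[4];
	int i;

	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, restrict_self, NULL);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return max_in_section;
}
```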
Please also feel free to split up the change into a part that adds the
exec_update_lock serialization and a part that changes the path where
the calling thread gets interrupted. Strictly speaking, the change to
the interruption logic is only a nicety once the exec_update_lock is
in place.
> }
> @@ -557,5 +566,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
>
> tsync_works_release(&works);
>
> + up_write(&current->signal->exec_update_lock);
> +
> return atomic_read(&shared_ctx.preparation_error);
> }
> --
> 2.51.0
>
Thanks,
–Günther
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
2026-02-21 7:28 ` [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback Ding Yihan
2026-02-21 12:00 ` Günther Noack
@ 2026-02-24 14:43 ` Günther Noack
1 sibling, 0 replies; 13+ messages in thread
From: Günther Noack @ 2026-02-24 14:43 UTC (permalink / raw)
To: syzbot; +Cc: linux-security-module
#syz set subsystems: lsm, kernel
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-02-24 14:43 UTC | newest]
Thread overview: 13+ messages
[not found] <69995a88.050a0220.340abe.0d25.GAE@google.com>
2026-02-21 7:28 ` [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback Ding Yihan
2026-02-21 12:00 ` Günther Noack
2026-02-21 13:19 ` Günther Noack
2026-02-23 9:42 ` Günther Noack
2026-02-23 11:29 ` Ding Yihan
2026-02-23 15:16 ` Günther Noack
2026-02-24 3:02 ` Ding Yihan
2026-02-24 3:03 ` syzbot
2026-02-24 6:27 ` [PATCH] landlock: Fix deadlock " Yihan Ding
2026-02-24 8:48 ` Günther Noack
2026-02-24 14:43 ` [syzbot] [kernel?] INFO: task hung " Günther Noack
[not found] <69984159.050a0220.21cd75.01bb.GAE@google.com>
2026-02-23 13:40 ` Frederic Weisbecker
2026-02-23 15:15 ` Günther Noack