From: "Günther Noack" <gnoack3000@gmail.com>
To: Ding Yihan <dingyihan@uniontech.com>
Cc: syzbot <syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com>,
"Mickaël Salaün" <mic@digikod.net>,
linux-security-module@vger.kernel.org,
"Jann Horn" <jannh@google.com>
Subject: Re: [syzbot] [kernel?] INFO: task hung in restrict_one_thread_callback
Date: Sat, 21 Feb 2026 14:19:53 +0100
Message-ID: <20260221.3ff0e30e4010@gnoack.org>
In-Reply-To: <20260221.5d8a306bcaf1@gnoack.org>
On Sat, Feb 21, 2026 at 01:00:03PM +0100, Günther Noack wrote:
> (Very) tentative investigation:
>
> In the Syzkaller report [2], it seems that the reproducer [2.1] is
> creating two rulesets and then enforcing them in parallel, a scenario
> which we are exercising in the TEST(competing_enablement) in
> tools/testing/selftests/landlock/tsync_test.c already, but which has
> not failed in my own selftest runs.
>
> In the crash report, there are four threads in total:
>
> * Two are stuck in the line
>       wait_for_completion(&ctx->ready_to_commit);
>   in the per-thread task work (line 128 [4.1])
> * Two are stuck in the line
>       wait_for_completion(&shared_ctx.all_prepared)
>   in the calling thread's coordination logic (line 539 [4.2])
>
> In line 539, we are already on the code path where we detected that we
> are getting interrupted by another thread and where we are attempting
> to deal with the scenario where two landlock_restrict_self() calls
> compete. This is detected on line 523, when
> wait_for_completion_interruptible() returns nonzero. The approach to handle
> this is to set the overall -ERESTARTNOINTR error and cancel the work
> that has been ongoing so far, by canceling the task works that did not
> start running yet and waiting for the ones that did start running
> (that is the step where we are blocked!). The reasoning there was
> that these task works will all hit the "all_prepared" stage now, but
> as we can see in the stack trace, the task works that are actively
> running are already on line 128 and have passed the "all_prepared"
> stage.
>
> Differences I can see between syzkaller and our own test:
>
> * The reproducer also calls openat() and then socketpair() twice.
> These syscalls should be unrelated, but it's possible that the
> "async" invocation of socketpair() contributes to adding more
> threads. (Assuming that "async" means "in new thread" in syzkaller)
> * Syzkaller gives it more attempts. ([2.2])
>
> I do not understand yet what went wrong in our scheme and need to look
> deeper.
OK, I think I understand now. Our existing recovery code for this
conflict is this:
    /*
     * Decrement num_preparing for current, to undo that we initialized it
     * to 1 a few lines above.
     */
    if (atomic_dec_return(&shared_ctx.num_preparing) > 0) {
        if (wait_for_completion_interruptible(
                &shared_ctx.all_prepared)) {
            /* In case of interruption, we need to retry the system call. */
            atomic_set(&shared_ctx.preparation_error,
                   -ERESTARTNOINTR);
            /*
             * Cancel task works for tasks that did not start running yet,
             * and decrement all_prepared and num_unfinished accordingly.
             */
            cancel_tsync_works(&works, &shared_ctx);
            /*
             * The remaining task works have started running, so waiting for
             * their completion will finish.
             */
            wait_for_completion(&shared_ctx.all_prepared);
        }
    }
When I wrote this, I assumed, as the last comment states, that the
task works which we could not cancel were already running.
I was wrong there, because I had misunderstood task_work_run(). When
the task works get run there, it first *atomically dequeues the entire
queue of scheduled task works*, and only then runs the dequeued works
sequentially.

That is why, if the same task has one task work belonging to the first
landlock_restrict_self() call and one belonging to the other, the task
work that was scheduled later (a) can no longer be dequeued with
cancel_tsync_works(), and (b) has not started running yet either.
Now all it takes to produce the deadlock is a pair of threads whose
task works for the two restriction calls were scheduled in different
orders. When the two landlock_restrict_self() calls both end up in
the recovery path quoted above, each one waits for one of its own
task works to run, but that work is blocked by another task work
that was scheduled before it in the same detached batch and that
does not finish either.
(Just pasting a brain dump here to save you some time hunting for the
root cause. I don't know the best solution yet either.)
-Günther