From: Justin Suess <utilityemal77@gmail.com>
To: Ding Yihan <dingyihan@uniontech.com>
Cc: "Günther Noack" <gnoack3000@gmail.com>,
"Tingmao Wang" <m@maowtm.org>, "Mickaël Salaün" <mic@digikod.net>,
"Paul Moore" <paul@paul-moore.com>,
"Jann Horn" <jannh@google.com>,
linux-security-module@vger.kernel.org,
linux-kernel@vger.kernel.org,
syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
Subject: Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
Date: Wed, 4 Mar 2026 09:08:38 -0500 [thread overview]
Message-ID: <aag85lfE3lfSSGEH@suesslenovo> (raw)
In-Reply-To: <D31C8311F753F56D+d07bcc15-ede7-4a04-829d-d80f69abda83@uniontech.com>
On Wed, Mar 04, 2026 at 10:46:39AM +0800, Ding Yihan wrote:
> Hi all,
>
> Thank you Justin for catching the test failure and the thorough
> investigation! And thanks Günther and Tingmao for diving into the
> syscall restart mechanics.
>
> I've evaluated both the `while` loop approach with `task_work_run()`
> and the `restart_syscall()` approach. I strongly lean towards using
> `restart_syscall()` as suggested by Tingmao.
>
> As Günther pointed out earlier, executing `task_work_run()` directly
> deep inside the syscall context can be risky. Task works often assume
> they are running at the kernel-user boundary with a specific state.
> Using `restart_syscall()` safely bounces us to that boundary, processes
> the works cleanly, and restarts the syscall via standard mechanisms.
>
> After some selftests,I will prepare the v4 patch series using `restart_syscall()`.
> I will also ensure all comments are properly wrapped to 80 columns as requested
> by Mickaël, and make sure to include the proper Reported-by and
> Suggested-by tags for everyone's excellent input here.
>
> Expect the v4 series shortly. Thanks again for the great collaboration!
>
>
> Best regards,
> Yihan Ding
>
After review, I agree Tingmao's solution is better.
Coming from a userspace background, I didn't think of that as a solution
for a lock contention, but kernel space has different needs/conventions.
I agree this is probably the right way to go. The simplest approach is
probably best here, and the restart_syscall seems better here, seeing as
task_work_run is rarely called in kernel code outside core paths.
I've learned a lot about kernel task workers and how locking is handled
as a result.
Thank you for your work with this series, this fix is useful!
> 在 2026/3/4 05:19, Günther Noack 写道:
> > On Tue, Mar 03, 2026 at 08:38:13PM +0000, Tingmao Wang wrote:
> >> On 3/3/26 19:50, Günther Noack wrote:
> >>> [...]
> >>> On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> >>>> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> >>>>> [...]
> >>>>> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> >>>>> index de01aa899751..xxxxxxxxxxxx 100644
> >>>>> --- a/security/landlock/tsync.c
> >>>>> +++ b/security/landlock/tsync.c
> >>>>> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >>>>> shared_ctx.new_cred = new_cred;
> >>>>> shared_ctx.set_no_new_privs = task_no_new_privs(current);
> >>>>>
> >>>>> + /*
> >>>>> + * Serialize concurrent TSYNC operations to prevent deadlocks
> >>>>> + * when multiple threads call landlock_restrict_self() simultaneously.
> >>>>> + */
> >>>>> + if (!down_write_trylock(¤t->signal->exec_update_lock))
> >>>>> + return -ERESTARTNOINTR;
> >>>> These two lines above introduced a test failure in tsync_test
> >>>> completing_enablement.
> >>>>
> >>>> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> >>>> (this patch) and is currently in the mic/next branch.
> >>>>
> >>>> I noticed the test failure while testing an unrelated patch.
> >>>>
> >>>> The bug is because this code never actually yields or restarts the syscall.
> >>>>
> >>>> This is the test output I observed:
> >>>>
> >>>> [+] Running tsync_test:
> >>>> TAP version 13
> >>>> 1..4
> >>>> # Starting 4 tests from 1 test cases.
> >>>> # RUN global.single_threaded_success ...
> >>>> # OK global.single_threaded_success
> >>>> ok 1 global.single_threaded_success
> >>>> # RUN global.multi_threaded_success ...
> >>>> # OK global.multi_threaded_success
> >>>> ok 2 global.multi_threaded_success
> >>>> # RUN global.multi_threaded_success_despite_diverging_domains ...
> >>>> # OK global.multi_threaded_success_despite_diverging_domains
> >>>> ok 3 global.multi_threaded_success_despite_diverging_domains
> >>>> # RUN global.competing_enablement ...
> >>>> # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> >>>
> >>> The interesting part here is when you print out the errno that is
> >>> returned from the syscall -- it is 513, the value of ERESTARTNOINTR!
> >>>
> >>> My understanding so far: Poking around in kernel/entry/common.c, it
> >>> seems that __exit_to_user_mode_loop() calls
> >>> arch_do_signal_or_restart() only when there is a pending signal
> >>> (_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL). So it was possible that the
> >>> system call returns with the (normally internal) error code
> >>> ERESTARTNOINTR, in the case where the trylock fails, but where current
> >>> has not received a signal from the other competing TSYNC thread yet.
> >>>
> >>> So with that in mind, would it work to do this?
> >>>
> >>> while (try-to-acquire-the-lock) {
> >>> if (current-has-task-works-pending)
> >>> return -ERESTARTNOINTR;
> >>>
> >>> cond_resched();
> >>> }
> >>>
> >>> Then we could avoid calling task_work_run() directly; (I find it
> >>> difficult to reason about the implications of calling taks_work_run()
> >>> directly, because these task works may make assumptions about the
> >>> context in which they are running.)
> >>
> >> I've not caught up with the full discussion so might be missing some context on why RESTARTNOINTR was used here,
> >> but wouldn't
> >>
> >> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> >> index 950b63d23729..f695fe44e2f1 100644
> >> --- a/security/landlock/tsync.c
> >> +++ b/security/landlock/tsync.c
> >> @@ -490,7 +490,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >> * when multiple threads call landlock_restrict_self() simultaneously.
> >> */
> >> if (!down_write_trylock(¤t->signal->exec_update_lock))
> >> - return -ERESTARTNOINTR;
> >> + return restart_syscall();
> >>
> >> /*
> >> * We schedule a pseudo-signal task_work for each of the calling task's
> >>
> >> achieve what the original patch intended?
> >
> > Thanks, that's an excellent point!
> >
> > restart_syscall() (a) sets TIF_SIGPENDING and then (b) returns
> > -ERESTARTNOINTR. (a) was the part that we have been missing for the
> > restart to work (see discussion above). Together, (a) and (b) cause
> > __exit_to_user_mode_loop() to restart the syscall. Given that this is
> > offered in signal.h, this seems like a clean and more "official" way
> > to do this than using the task works APIs.
> >
> > It also fixes the previously failing selftest (I tried).
> >
> > Yihan, Justin: Does that seem reasonable to you as well?
> >
> > –Günther
> >
>
next prev parent reply other threads:[~2026-03-04 14:08 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-26 1:59 [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Yihan Ding
2026-02-26 1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
2026-02-26 7:23 ` Günther Noack
2026-03-03 16:20 ` Justin Suess
2026-03-03 17:47 ` Mickaël Salaün
2026-03-03 18:13 ` Justin Suess
2026-03-03 19:50 ` Günther Noack
2026-03-03 20:38 ` Tingmao Wang
2026-03-03 21:19 ` Günther Noack
2026-03-04 2:46 ` Ding Yihan
2026-03-04 7:44 ` Günther Noack
2026-03-04 14:08 ` Justin Suess [this message]
2026-03-03 21:08 ` Justin Suess
2026-03-03 17:51 ` Mickaël Salaün
2026-02-26 1:59 ` [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC Yihan Ding
2026-02-26 7:23 ` Günther Noack
2026-03-03 17:31 ` [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Mickaël Salaün
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aag85lfE3lfSSGEH@suesslenovo \
--to=utilityemal77@gmail.com \
--cc=dingyihan@uniontech.com \
--cc=gnoack3000@gmail.com \
--cc=jannh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=m@maowtm.org \
--cc=mic@digikod.net \
--cc=paul@paul-moore.com \
--cc=syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox