Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Justin Suess <utilityemal77@gmail.com>
To: Ding Yihan <dingyihan@uniontech.com>
Cc: "Günther Noack" <gnoack3000@gmail.com>,
	"Tingmao Wang" <m@maowtm.org>, "Mickaël Salaün" <mic@digikod.net>,
	"Paul Moore" <paul@paul-moore.com>,
	"Jann Horn" <jannh@google.com>,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com
Subject: Re: [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction
Date: Wed, 4 Mar 2026 09:08:38 -0500	[thread overview]
Message-ID: <aag85lfE3lfSSGEH@suesslenovo> (raw)
In-Reply-To: <D31C8311F753F56D+d07bcc15-ede7-4a04-829d-d80f69abda83@uniontech.com>

On Wed, Mar 04, 2026 at 10:46:39AM +0800, Ding Yihan wrote:
> Hi all,
> 
> Thank you Justin for catching the test failure and the thorough
> investigation! And thanks Günther and Tingmao for diving into the
> syscall restart mechanics.
> 
> I've evaluated both the `while` loop approach with `task_work_run()`
> and the `restart_syscall()` approach. I strongly lean towards using
> `restart_syscall()` as suggested by Tingmao. 
> 
> As Günther pointed out earlier, executing `task_work_run()` directly
> deep inside the syscall context can be risky. Task works often assume
> they are running at the kernel-user boundary with a specific state.
> Using `restart_syscall()` safely bounces us to that boundary, processes
> the works cleanly, and restarts the syscall via standard mechanisms.
> 
> After some selftests,I will prepare the v4 patch series using `restart_syscall()`.
> I will also ensure all comments are properly wrapped to 80 columns as requested
> by Mickaël, and make sure to include the proper Reported-by and
> Suggested-by tags for everyone's excellent input here.
> 
> Expect the v4 series shortly. Thanks again for the great collaboration!
> 
> 
> Best regards,
> Yihan Ding
>
After review, I agree Tingmao's solution is better.

Coming from a userspace background, I didn't think of that as a solution
for a lock contention, but kernel space has different needs/conventions.

I agree this is probably the right way to go. The simplest approach is
probably best here, and the restart_syscall seems better here, seeing as
task_work_run is rarely called in kernel code outside core paths.

I've learned a lot about kernel task workers and how locking is handled
as a result.

Thank you for your work with this series, this fix is useful!
> 在 2026/3/4 05:19, Günther Noack 写道:
> > On Tue, Mar 03, 2026 at 08:38:13PM +0000, Tingmao Wang wrote:
> >> On 3/3/26 19:50, Günther Noack wrote:
> >>> [...]
> >>> On Tue, Mar 03, 2026 at 11:20:10AM -0500, Justin Suess wrote:
> >>>> On Thu, Feb 26, 2026 at 09:59:02AM +0800, Yihan Ding wrote:
> >>>>> [...]
> >>>>> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> >>>>> index de01aa899751..xxxxxxxxxxxx 100644
> >>>>> --- a/security/landlock/tsync.c
> >>>>> +++ b/security/landlock/tsync.c
> >>>>> @@ -447,6 +447,13 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >>>>>  	shared_ctx.new_cred = new_cred;
> >>>>>  	shared_ctx.set_no_new_privs = task_no_new_privs(current);
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * Serialize concurrent TSYNC operations to prevent deadlocks
> >>>>> +	 * when multiple threads call landlock_restrict_self() simultaneously.
> >>>>> +	 */
> >>>>> +	if (!down_write_trylock(&current->signal->exec_update_lock))
> >>>>> +		return -ERESTARTNOINTR;
> >>>> These two lines above introduced a test failure in tsync_test
> >>>> completing_enablement.
> >>>>
> >>>> The commit that introduced the bug is 3d6327c306b3e1356ab868bf27a0854669295a4f
> >>>> (this patch) and is currently in the mic/next branch.
> >>>>
> >>>> I noticed the test failure while testing an unrelated patch.
> >>>>
> >>>> The bug is because this code never actually yields or restarts the syscall.
> >>>>
> >>>> This is the test output I observed:
> >>>>
> >>>>   [+] Running tsync_test:
> >>>>   TAP version 13
> >>>>   1..4
> >>>>   # Starting 4 tests from 1 test cases.
> >>>>   #  RUN           global.single_threaded_success ...
> >>>>   #            OK  global.single_threaded_success
> >>>>   ok 1 global.single_threaded_success
> >>>>   #  RUN           global.multi_threaded_success ...
> >>>>   #            OK  global.multi_threaded_success
> >>>>   ok 2 global.multi_threaded_success
> >>>>   #  RUN           global.multi_threaded_success_despite_diverging_domains ...
> >>>>   #            OK  global.multi_threaded_success_despite_diverging_domains
> >>>>   ok 3 global.multi_threaded_success_despite_diverging_domains
> >>>>   #  RUN           global.competing_enablement ...
> >>>>   # tsync_test.c:156:competing_enablement:Expected 0 (0) == d[1].result (-1)
> >>>
> >>> The interesting part here is when you print out the errno that is
> >>> returned from the syscall -- it is 513, the value of ERESTARTNOINTR!
> >>>
> >>> My understanding so far: Poking around in kernel/entry/common.c, it
> >>> seems that __exit_to_user_mode_loop() calls
> >>> arch_do_signal_or_restart() only when there is a pending signal
> >>> (_TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL).  So it was possible that the
> >>> system call returns with the (normally internal) error code
> >>> ERESTARTNOINTR, in the case where the trylock fails, but where current
> >>> has not received a signal from the other competing TSYNC thread yet.
> >>>
> >>> So with that in mind, would it work to do this?
> >>>
> >>>   while (try-to-acquire-the-lock) {
> >>>     if (current-has-task-works-pending)
> >>>       return -ERESTARTNOINTR;
> >>>
> >>>     cond_resched();
> >>>   }
> >>>
> >>> Then we could avoid calling task_work_run() directly; (I find it
> >>> difficult to reason about the implications of calling taks_work_run()
> >>> directly, because these task works may make assumptions about the
> >>> context in which they are running.)
> >>
> >> I've not caught up with the full discussion so might be missing some context on why RESTARTNOINTR was used here,
> >> but wouldn't
> >>
> >> diff --git a/security/landlock/tsync.c b/security/landlock/tsync.c
> >> index 950b63d23729..f695fe44e2f1 100644
> >> --- a/security/landlock/tsync.c
> >> +++ b/security/landlock/tsync.c
> >> @@ -490,7 +490,7 @@ int landlock_restrict_sibling_threads(const struct cred *old_cred,
> >>  	 * when multiple threads call landlock_restrict_self() simultaneously.
> >>  	 */
> >>  	if (!down_write_trylock(&current->signal->exec_update_lock))
> >> -		return -ERESTARTNOINTR;
> >> +		return restart_syscall();
> >>  
> >>  	/*
> >>  	 * We schedule a pseudo-signal task_work for each of the calling task's
> >>
> >> achieve what the original patch intended?
> > 
> > Thanks, that's an excellent point!
> > 
> > restart_syscall() (a) sets TIF_SIGPENDING and then (b) returns
> > -ERESTARTNOINTR.  (a) was the part that we have been missing for the
> > restart to work (see discussion above).  Together, (a) and (b) cause
> > __exit_to_user_mode_loop() to restart the syscall.  Given that this is
> > offered in signal.h, this seems like a clean and more "official" way
> > to do this than using the task works APIs.
> > 
> > It also fixes the previously failing selftest (I tried).
> > 
> > Yihan, Justin: Does that seem reasonable to you as well?
> > 
> > –Günther
> > 
>

next prev parent reply	other threads:[~2026-03-04 14:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-26  1:59 [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Yihan Ding
2026-02-26  1:59 ` [PATCH v3 1/2] landlock: Serialize TSYNC thread restriction Yihan Ding
2026-02-26  7:23   ` Günther Noack
2026-03-03 16:20   ` Justin Suess
2026-03-03 17:47     ` Mickaël Salaün
2026-03-03 18:13       ` Justin Suess
2026-03-03 19:50     ` Günther Noack
2026-03-03 20:38       ` Tingmao Wang
2026-03-03 21:19         ` Günther Noack
2026-03-04  2:46           ` Ding Yihan
2026-03-04  7:44             ` Günther Noack
2026-03-04 14:08             ` Justin Suess [this message]
2026-03-03 21:08       ` Justin Suess
2026-03-03 17:51   ` Mickaël Salaün
2026-02-26  1:59 ` [PATCH v3 2/2] landlock: Clean up interrupted thread logic in TSYNC Yihan Ding
2026-02-26  7:23   ` Günther Noack
2026-03-03 17:31 ` [PATCH v3 0/2] landlock: Fix TSYNC deadlock and clean up error path Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aag85lfE3lfSSGEH@suesslenovo \
    --to=utilityemal77@gmail.com \
    --cc=dingyihan@uniontech.com \
    --cc=gnoack3000@gmail.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=m@maowtm.org \
    --cc=mic@digikod.net \
    --cc=paul@paul-moore.com \
    --cc=syzbot+7ea2f5e9dfd468201817@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.