From: Christian Brauner <brauner@kernel.org>
To: Andrei Vagin <avagin@gmail.com>
Cc: linux-kernel@vger.kernel.org,
Andy Lutomirski <luto@amacapital.net>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Kees Cook <keescook@chromium.org>, Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Peter Oskolkov <posk@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Tycho Andersen <tycho@tycho.pizza>,
Will Drewry <wad@chromium.org>,
Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH 4/4] seccomp: add the synchronous mode for seccomp_unotify
Date: Thu, 1 Sep 2022 15:58:02 +0200 [thread overview]
Message-ID: <20220901135802.oeefj2bmsy5gcsmy@wittgenstein> (raw)
In-Reply-To: <CANaxB-x-TAJLNVKVwDm55wVrHo0YxewuU_+7ast+63Q18eT4gQ@mail.gmail.com>
On Tue, Aug 30, 2022 at 02:23:24PM -0700, Andrei Vagin wrote:
> On Tue, Aug 30, 2022 at 3:43 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Mon, Aug 29, 2022 at 06:43:56PM -0700, Andrei Vagin wrote:
> > > seccomp_unotify allows more privileged processes does actions on behalf
> > > of less privileged processes.
> > >
> > > In many cases, the workflow is fully synchronous. It means a target
> > > process triggers a system call and passes controls to a supervisor
> > > process that handles the system call and returns controls to the target
> > > process. In this context, "synchronous" means that only one process is
> > > running and another one is waiting.
> > >
> > > There is the WF_CURRENT_CPU flag that is used to advise the scheduler to
> > > move the wakee to the current CPU. For such synchronous workflows, it
> > > makes context switches a few times faster.
> > >
> > > Right now, each interaction takes 12µs. With this patch, it takes about
> > > 3µs.
> >
> > Seems like a nice idea though I leave it to the sched people to judge
> > whether this is sane or not. So the supervisor which gets woken will be
> > moved to the current cpu in this synchronous scenario.
> >
> > I have no strong opinions on this patch. There are two things I wonder
> > about. First, how meaningful is that speed up given that the supervisor
> > will most often do a lot of heavy-handed things anyway.
>
> I would not use the "most often" phrase in this case;). It is true for LXC-like
> use cases when we need to handle rare syscalls. In this case, the performance
> of this interface doesn't play a big role. But my use case is very different. I
> have a prototype of the gVisor platform, where seccomp is used to trap
> guest system calls. In this case, the difference between 12µs and 3µs is
> tremendous.
Oh yeah, makes sense. I don't know enough about gVisor but I know we can
trust your word! :)
>
> The idea of WF_CURRENT_CPU is not mine. I spied it from the umcg series.
> I took the second patch from that series without any changes.
>
> >
> > Second, this flag is a very specific thing and I wonder how much
> > userspace will really use this and what's more use this correctly.
> >
> > Just to note that LXD - one of the biggest user of this feature - isn't
> > synchronous iiuc for example. Each container gets a separate seccomp
> > supervisor thread (well, go routine but whatever) which exposes a socket
> > that the container manager connects to and sends the seccomp
> > notifications it received from its payload according to an api we
> > established. And each notification is handled in a separate thread
> > (again, go routine but whatever).
>
> It could be synchronous if seccomp events had been handled in [lxc monitor]. But
> right now, [lxc monitor] is just a proxy. In this case, you are right, lxc will
Yep.
> not get any benefits by setting this flag. But we can look at this from another
> side. If we add these changes, we will have another big user of the interface. I
> think the number of gVisor containers that are started each day is comparable
> with the number of LXC/LXD containers.
Sure, if there's users that would benefit from this then no reason to
not consider it. It's just a lot of low-level knobs we give userspace
here but I guess for the notifier it makes sense.
>
> >
> > >
> > > This change introduce the SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP flag that
> > > it used to enable the sync mode.
> > >
> > > Signed-off-by: Andrei Vagin <avagin@gmail.com>
> > > ---
> > > include/uapi/linux/seccomp.h | 4 ++++
> > > kernel/seccomp.c | 35 +++++++++++++++++++++++++++++++++--
> > > 2 files changed, 37 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > > index 0fdc6ef02b94..dbfc9b37fcae 100644
> > > --- a/include/uapi/linux/seccomp.h
> > > +++ b/include/uapi/linux/seccomp.h
> > > @@ -115,6 +115,8 @@ struct seccomp_notif_resp {
> > > __u32 flags;
> > > };
> > >
>
> <snip>
>
> > >
> > > #ifdef SECCOMP_ARCH_NATIVE
> > > @@ -1117,7 +1120,10 @@ static int seccomp_do_user_notification(int this_syscall,
> > > INIT_LIST_HEAD(&n.addfd);
> > >
> > > atomic_add(1, &match->notif->requests);
> > > - wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
> > > + if (match->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP)
> > > + wake_up_poll_on_current_cpu(&match->wqh, EPOLLIN | EPOLLRDNORM);
> > > + else
> > > + wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
> >
> > (We're accumulating a lot of conditional wake primitives in the notifier.)
> >
>
> I am not sure that I understand what you mean here.
I just meant that we have
if (wait_killable)
err = wait_for_completion_killable(&n.ready);
else
err = wait_for_completion_interruptible(&n.ready);
and now also
if (match->notif->flags & SECCOMP_USER_NOTIF_FD_SYNC_WAKE_UP)
wake_up_poll_on_current_cpu(&match->wqh, EPOLLIN | EPOLLRDNORM);
else
wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
which is a bit unpleasant but nothing that would mean we can't do this.
prev parent reply other threads:[~2022-09-01 13:58 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-30 1:43 [PATCH 0/4] seccomp: add the synchronous mode for seccomp_unotify Andrei Vagin
2022-08-30 1:43 ` [PATCH 1/4] seccomp: don't use semaphore and wait_queue together Andrei Vagin
2022-08-30 9:57 ` Christian Brauner
2022-08-30 16:07 ` Andrei Vagin
2022-08-30 1:43 ` [PATCH 2/4] sched: add WF_CURRENT_CPU and externise ttwu Andrei Vagin
2022-08-30 1:43 ` [PATCH 3/4] sched: add a few helpers to wake up tasks on the current cpu Andrei Vagin
2022-08-30 1:43 ` [PATCH 4/4] seccomp: add the synchronous mode for seccomp_unotify Andrei Vagin
2022-08-30 10:43 ` Christian Brauner
2022-08-30 21:23 ` Andrei Vagin
2022-09-01 13:58 ` Christian Brauner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220901135802.oeefj2bmsy5gcsmy@wittgenstein \
--to=brauner@kernel.org \
--cc=avagin@gmail.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=posk@google.com \
--cc=tycho@tycho.pizza \
--cc=vincent.guittot@linaro.org \
--cc=wad@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox