From: Marcelo Tosatti <mtosatti@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: nsaenzju@redhat.com, linux-kernel@vger.kernel.org,
Nitesh Lal <nilal@redhat.com>, Christoph Lameter <cl@linux.com>,
Juri Lelli <juri.lelli@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Alex Belits <abelits@marvell.com>, Peter Xu <peterx@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [patch 1/4] add basic task isolation prctl interface
Date: Wed, 28 Jul 2021 10:21:34 -0300
Message-ID: <20210728132134.GA10515@fuller.cnet>
In-Reply-To: <20210728114548.GA293265@lothringen>

On Wed, Jul 28, 2021 at 01:45:48PM +0200, Frederic Weisbecker wrote:
> On Wed, Jul 28, 2021 at 06:37:07AM -0300, Marcelo Tosatti wrote:
> > On Wed, Jul 28, 2021 at 01:45:39AM +0200, Frederic Weisbecker wrote:
> > > On Tue, Jul 27, 2021 at 11:52:09AM -0300, Marcelo Tosatti wrote:
> > > > The meaning of isolated is specified as follows:
> > > >
> > > > Isolation features
> > > > ==================
> > > >
> > > > - prctl(PR_ISOL_GET, ISOL_SUP_FEATURES, 0, 0, 0) returns the supported
> > > > features as a return value.
> > > >
> > > > - prctl(PR_ISOL_SET, ISOL_FEATURES, bitmask, 0, 0) enables the features in
> > > > the bitmask.
> > > >
> > > > - prctl(PR_ISOL_GET, ISOL_FEATURES, 0, 0, 0) returns the currently
> > > > enabled features.
> > >
> > > So what are the ISOL_FEATURES here? A mode that we enter such as flush
> > > vmstat _everytime_ we resume to userspace after (and including) this prctl() ?
> >
> > ISOL_FEATURES is just the "command" type (which you can get and set).
> >
> > The bitmask would include ISOL_F_QUIESCE_ON_URET, so:
> >
> > - bitmask = ISOL_F_QUIESCE_ON_URET;
> > - prctl(PR_ISOL_SET, ISOL_FEATURES, bitmask, 0, 0) enables the features in
> > the bitmask.
>
> But does it quiesce once or for every further uret?

On every return to userspace, for as long as ISOL_F_QUIESCE_ON_URET is
enabled through prctl(PR_ISOL_ENABLE, enabled_bitmask, 0, 0, 0).

> > - quiesce_bitmap = prctl(PR_ISOL_GET, PR_ISOL_SUP_QUIESCE_CFG, 0, 0, 0)
> > (1)
> >
> > (returns the supported actions to be quiesced).
> >
> > - prctl(PR_ISOL_SET, PR_ISOL_QUIESCE_CFG, quiesce_bitmap, 0, 0) _sets_
> > the actions to be quiesced (2)
> >
> > If an application does not modify "quiesce_bitmap" between
> > points (1) and (2) above, it will enable quiescing of all
> > "features" the kernel supports.
>
> I don't get the difference between ISOL_FEATURES and PR_ISOL_QUIESCE_CFG.

prctl(PR_ISOL_SET, cmd, ...) is intended to accept different types of
"command" variables (including ones for new features which are not known
at this time).

- prctl(PR_ISOL_SET, ISOL_FEATURES, bitmask, 0, 0) enables the features in
  the bitmask
  (which might now be superseded by
  prctl(PR_ISOL_ENABLE, ISOL_F_QUIESCE_ON_URET, 0, 0, 0))

- prctl(PR_ISOL_SET, PR_ISOL_QUIESCE_CFG, bitmask, 0, 0) configures
  which subsystems/features are quiesced:

  #define ISOL_F_QUIESCE_VMSTAT_SYNC     (1<<0)
  #define ISOL_F_QUIESCE_NOHZ_FULL       (1<<1)
  #define ISOL_F_QUIESCE_DEFER_TLB_FLUSH (1<<2)

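
To illustrate the distinction, here is a rough userspace model (not kernel
code, and not taken from the patches). ISOL_FEATURES decides whether any
quiescing happens on return to userspace, and PR_ISOL_QUIESCE_CFG decides
which subsystems are quiesced when it does. The struct, the function names,
the numeric values of the commands and of ISOL_F_QUIESCE_ON_URET, and the
-EINVAL on unsupported bits are all assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Quiesce bits as listed in this thread. */
#define ISOL_F_QUIESCE_VMSTAT_SYNC     (1u << 0)
#define ISOL_F_QUIESCE_NOHZ_FULL       (1u << 1)
#define ISOL_F_QUIESCE_DEFER_TLB_FLUSH (1u << 2)

/* ASSUMPTION: placeholder values, only the names come from the thread. */
#define ISOL_F_QUIESCE_ON_URET         (1u << 0)
enum isol_cmd { ISOL_FEATURES = 1, PR_ISOL_QUIESCE_CFG = 2 };

/* Hypothetical per-task state; a stand-in for what the kernel would keep. */
struct isol_model {
	uint32_t sup_features; /* what PR_ISOL_GET, ISOL_SUP_FEATURES would report */
	uint32_t sup_quiesce;  /* what PR_ISOL_GET, PR_ISOL_SUP_QUIESCE_CFG would report */
	uint32_t features;     /* set through the ISOL_FEATURES command */
	uint32_t quiesce_cfg;  /* set through the PR_ISOL_QUIESCE_CFG command */
};

/* Model of prctl(PR_ISOL_SET, cmd, mask, 0, 0): one entry point, many commands. */
static int isol_set(struct isol_model *m, enum isol_cmd cmd, uint32_t mask)
{
	assert(m);
	switch (cmd) {
	case ISOL_FEATURES:
		if (mask & ~m->sup_features)
			return -EINVAL; /* ASSUMPTION: unsupported bits are rejected */
		m->features = mask;
		return 0;
	case PR_ISOL_QUIESCE_CFG:
		if (mask & ~m->sup_quiesce)
			return -EINVAL; /* ASSUMPTION: unsupported bits are rejected */
		m->quiesce_cfg = mask;
		return 0;
	}
	return -EINVAL;
}

/* Which subsystems would be quiesced on the next return to userspace. */
static uint32_t isol_work_on_uret(const struct isol_model *m)
{
	return (m->features & ISOL_F_QUIESCE_ON_URET) ? m->quiesce_cfg : 0;
}
```

In this model, setting only PR_ISOL_QUIESCE_CFG leaves isol_work_on_uret()
at 0. The configured bits take effect once ISOL_F_QUIESCE_ON_URET is also
set.
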
> > Application can, however, modify quiesce_bitmap to its preference.
> >
> > Flushing vmstat _everytime_ you resume to userspace is enabled only
> > _after_ prctl(PR_ISOL_ENTER, 0, 0, 0, 0) is performed (which happens
> > only when isolation is fully configured with the PR_ISOL_SET calls).
> > OK, will better document that.
>
> Yes please, I'm completely confused :o)

OK.

> > > If so I'd rather call that ISOL_MODE because feature is too general.
> >
> > Well, in the first patchset, there was one "mode" implemented (but
> > it was possible to implement different modes in the future).
> >
> > This would allow for example easier integration of "full task isolation"
> > patchset type of functionality, disallowing syscalls.
> >
> > I think we'd like to keep that, so i'll keep the previous distinct modes
> > (but allow configuration of individual features on the bitmap).
>
> And I also don't see how such modes differ from configuration of individual
> features on the bitmap.

Good point, they do not intersect: syscall disablement and notification of
"isolation breakage" are orthogonal to quiescing.

> > > > - prctl(PR_ISOL_GET, PR_ISOL_QUIESCE_CFG, 0, 0, 0) returns
> > > > the currently enabled actions to be quiesced.
> > > >
> > > > #define ISOL_F_QUIESCE_VMSTAT_SYNC (1<<0)
> > > > #define ISOL_F_QUIESCE_NOHZ_FULL (1<<1)
> > > > #define ISOL_F_QUIESCE_DEFER_TLB_FLUSH (1<<2)
> > >
> > > And then PR_ISOL_QUIESCE_CFG is a oneshot operation that applies only upon
> > > return to this ctrl, right? If so perhaps this should be called just
> > > ISOL_QUIESCE or ISOL_QUIESCE_ONCE or ISOL_REQ ?
> >
> > There was no one-shot operation implemented in the first patchset. What
> > application would do to achieve that is:
> >
> > 1. Configure isolation with PR_ISOL_SET (say configure mode which
> > allows system calls, and when a system call happens, flush all deferred
> > actions on return to userspace).
> >
> > 2. prctl(PR_ISOL_ENTER, 0, 0, 0, 0) (this actually enables the flushing,
> > and tags the task_struct as isolated). Here we can transfer this information
> > from per-task to per-CPU data, for example, to be able to implement
> > other features such as deferred TLB flushing.
> >
> > On return from this prctl(), deferrable actions are flushed.
> >
> > 3. latency sensitive loop, with no system calls.
> >
> > 4. some event which requires system calls is noticed:
> > prctl(PR_ISOL_EXIT, 0, 0, 0, 0)
> > (this would untag task_struct as isolated).
> >
> > 5. perform system calls A, B, C, D (with no flushing of vmstat,
> > for example).
> >
> > 6. jmp to 2.
> >
> > So there is a problem with this logic, which is that one would like
> > certain isolation functionality to remain enabled between points 4
> > and 6 (for example, blocking CPU hotplug or other blockable activities
> > that would cause interruptions).
> >
> > One way to achieve this would be to replace PR_ISOL_ENTER/PR_ISOL_EXIT
> > with PR_ISOL_ENABLE, which accepts a bitmask:
> >
> > 1. Configure isolation with PR_ISOL_SET (say configure mode which
> > allows system calls, and when a system call happens, flush all deferred
> > actions on return to userspace).
> >
> > 2. enabled_bitmask = ISOL_F_QUIESCE_ON_URET|ISOL_F_BLOCK_INTERRUPTORS;
> > prctl(PR_ISOL_ENABLE, enabled_bitmask, 0, 0, 0)
> >
> > On return from this prctl(), deferrable actions are flushed.
> >
> > 3. latency sensitive loop, with no system calls.
> >
> > 4. some event which requires system calls is noticed:
> >
> > prctl(PR_ISOL_ENABLE, ISOL_F_BLOCK_INTERRUPTORS, 0, 0, 0)
> > (this would clear ISOL_F_QUIESCE_ON_URET, so no flushing
> > is performed on return from system calls).
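
To make the PR_ISOL_ENABLE sequence above concrete, here is a small
userspace model of it (again not kernel code). It assumes that
PR_ISOL_ENABLE replaces the whole enabled mask, that every kernel entry
leaves one unit of deferred work behind, and that the flush happens on
return to userspace only while ISOL_F_QUIESCE_ON_URET is set. The struct,
the model_* helpers and the bit values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: placeholder bit values, only the names come from the thread. */
#define ISOL_F_QUIESCE_ON_URET    (1u << 0)
#define ISOL_F_BLOCK_INTERRUPTORS (1u << 1)

struct task_model {
	uint32_t enabled;     /* mask last passed to PR_ISOL_ENABLE */
	unsigned int pending; /* deferred work items, e.g. dirty vmstat counters */
	unsigned int flushes; /* how many times deferred work was flushed */
};

/* Return to userspace: flush only while ISOL_F_QUIESCE_ON_URET is enabled. */
static void model_uret(struct task_model *t)
{
	if ((t->enabled & ISOL_F_QUIESCE_ON_URET) && t->pending) {
		t->pending = 0;
		t->flushes++;
	}
}

/* Any system call: leaves one deferred item behind, then returns. */
static void model_syscall(struct task_model *t)
{
	t->pending++;
	model_uret(t);
}

/* prctl(PR_ISOL_ENABLE, mask, 0, 0, 0): replace the mask; it is a syscall too. */
static void model_isol_enable(struct task_model *t, uint32_t mask)
{
	assert(t);
	t->enabled = mask;
	model_syscall(t);
}

/* One pass over steps 2-4 plus four slow-path syscalls; returns total flushes. */
static unsigned int model_run_once(struct task_model *t)
{
	/* step 2: flushed on return from this prctl() */
	model_isol_enable(t, ISOL_F_QUIESCE_ON_URET | ISOL_F_BLOCK_INTERRUPTORS);

	/* step 3: latency sensitive loop, no kernel entry, nothing to model */

	/* step 4: keep blocking interruptors, stop flushing on uret */
	model_isol_enable(t, ISOL_F_BLOCK_INTERRUPTORS);

	/* system calls A, B, C, D: no flushing */
	for (int i = 0; i < 4; i++)
		model_syscall(t);

	return t->flushes;
}
```

In this model each pass flushes exactly once, at step 2, while
ISOL_F_BLOCK_INTERRUPTORS stays set across the slow path.
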
>
> So PR_ISOL_ENABLE is a way to perform action when some sort of kernel entry
> happens. Then we take actions when that happens (signal, warn, etc...).
>
> I guess we'll need to define what kind of kernel entry, and what kind of
> response needs to happen. Ok that's a whole issue of its own that we'll need
> to handle separately.
>
> Thanks.

In fact, why can't one use SECCOMP for syscall blocking?

Thanks.