From: Marcelo Tosatti <mtosatti@redhat.com>
To: Nitesh Lal <nilal@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Nicolas Saenz Julienne <nsaenzju@redhat.com>,
Frederic Weisbecker <frederic@kernel.org>,
Christoph Lameter <cl@linux.com>,
Juri Lelli <juri.lelli@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Alex Belits <abelits@belits.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [patch 1/4] add basic task isolation prctl interface
Date: Fri, 30 Jul 2021 21:49:51 -0300
Message-ID: <20210731004951.GA77573@fuller.cnet>
In-Reply-To: <CAFki+Lnf0cs62Se0aPubzYxP9wh7xjMXn7RXEPvrmtBdYBrsow@mail.gmail.com>
On Fri, Jul 30, 2021 at 07:36:31PM -0400, Nitesh Lal wrote:
> On Fri, Jul 30, 2021 at 4:21 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> > Add basic prctl task isolation interface, which allows
> > informing the kernel that the application is executing
> > latency-sensitive code (where interruptions are undesired).
> >
> > Interface is described by task_isolation.rst (added by this patch).
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >
> > Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/Documentation/userspace-api/task_isolation.rst
> > @@ -0,0 +1,187 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Task isolation prctl interface
> > +===============================
> > +
> > +Certain types of applications benefit from running uninterrupted by
> > +background OS activities. Realtime systems and high-bandwidth networking
> > +applications with user-space drivers can fall into this category.
> > +
> > +
> > +To create an OS-noise-free environment for the application, this
> > +interface allows userspace to inform the kernel of the start and
> > +end of the latency-sensitive application section (with configurable
> > +system behaviour for that section).
> > +
> > +The prctl options are:
> > +
> > +
> > + - PR_ISOL_FEAT: Retrieve supported features.
> > + - PR_ISOL_GET: Retrieve task isolation parameters.
> > + - PR_ISOL_SET: Set task isolation parameters.
> > + - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> > + - PR_ISOL_CTRL_SET: Set task isolation state (enable/disable task
> > +   isolation).
> > +
> >
>
> Didn't we decide to replace FEAT/FEATURES with MODE?

Searching for the definition of "mode":

  mode: one of a series of ways that a machine can be made to work
        (e.g. manual/automatic mode).
  mode: a particular way of doing something.
  mode: a way of operating, living, or behaving.

So "mode" fits the case where exactly one of several choices is
selected (exclusively).

In this case, however, what is happening is a composition of things:
quiescing, for example, might be functional with both a "syscalls
allowed" and a "syscalls not allowed" setting (for that particular
choice, "mode" does make more sense).

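To make the distinction concrete, here is a small illustrative C sketch: an exclusive choice maps naturally to an enum, where exactly one value is in effect, while composable features map to a bitmask, where any subset can be active at once. The bit values are placeholders (not the ones from include/uapi/linux/task_isolation.h), ISOL_F_NOTIFY is the hypothetical feature discussed in this thread, and the enum names are hypothetical too.

```c
#include <stdbool.h>

/* Placeholder bit values (assumption), for illustration only. */
#define ISOL_F_QUIESCE (1UL << 0)
#define ISOL_F_NOTIFY  (1UL << 1) /* hypothetical feature */

/* Exclusive ("mode"): exactly one value is in effect at a time. */
enum isol_syscall_mode {
	ISOL_SYSCALLS_ALLOWED,
	ISOL_SYSCALLS_NOT_ALLOWED,
};

/* Composable ("features"): any combination can be active together. */
static bool isol_feature_active(unsigned long active, unsigned long feat)
{
	return (active & feat) == feat;
}
```

With features as bits, ISOL_F_QUIESCE | ISOL_F_NOTIFY is a valid active set, which an enum cannot express without enumerating every combination.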
> > +The isolation parameters and state are not inherited by
> > +children created by fork(2) and clone(2). The setting is
> > +preserved across execve(2).
> > +
> > +The sequence of steps to enable task isolation is:
> > +
> > +1. Retrieve supported task isolation features (PR_ISOL_FEAT).
> > +
> > +2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
> > +
> > +3. Activate or deactivate task isolation features
> > + (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
> > +
> > +This interface is based on ideas and code from the
> > +task isolation patchset from Alex Belits:
> > +https://lwn.net/Articles/816298/
> > +
> > +--------------------
> > +Feature description
> > +--------------------
> > +
> > + - ``ISOL_F_QUIESCE``
> > +
> > + This feature allows quiescing select kernel activities on
> > + return from system calls.
> > +
> > +---------------------
> > +Interface description
> > +---------------------
> > +
> > +**PR_ISOL_FEAT**:
> > +
> > + Returns the supported features and feature
> > + capabilities, as a bitmask. Features and their capabilities
> > + are defined in include/uapi/linux/task_isolation.h::
> > +
> > + prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> > +
> > + The 'feat' argument specifies whether to return
> > + supported features (if zero), or feature capabilities
> > + (if not zero). Possible non-zero values for 'feat' are:
> >
>
> By feature capabilities you mean the kernel activities (vmstat, tlb_flush)?

Not necessarily, but in the case of ISOL_F_QUIESCE, yes: the
capabilities are the different kernel activities that might
interrupt the task.

"Feature capabilities" is a generic term. For example, one might add:

  ISOL_F_NOTIFY with an ISOL_F_NOTIFY_SIGNAL capability, or
  ISOL_F_NOTIFY with an ISOL_F_NOTIFY_EVENTFD capability, or
  ISOL_F_future_feature with ISOL_F_future_feature_capability.

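A minimal sketch of the two-level query, assuming the semantics described in the patch: prctl(PR_ISOL_FEAT, 0, ...) returns the supported features, and prctl(PR_ISOL_FEAT, feat, 0, ...) returns the capabilities of that feature. To keep it runnable on kernels without this series, the prctl() call is replaced by a function pointer; fake_isol_feat_query() is a hypothetical stand-in for a kernel that supports ISOL_F_QUIESCE with only ISOL_F_QUIESCE_VMSTATS, and the bit values are placeholders.

```c
#include <stdbool.h>

/* Placeholder bit values (assumption); the real ones would come from
 * include/uapi/linux/task_isolation.h. */
#define ISOL_F_QUIESCE         (1UL << 0)
#define ISOL_F_QUIESCE_VMSTATS (1UL << 0)

/*
 * Stand-in for prctl(PR_ISOL_FEAT, feat, 0, 0, 0): feat == 0 returns
 * the supported features, a non-zero feat returns the capabilities of
 * that feature. A negative return means error, as with prctl().
 */
typedef long (*isol_feat_query_fn)(unsigned long feat);

/* Hypothetical kernel: supports ISOL_F_QUIESCE, which can only
 * quiesce VM statistics. */
static long fake_isol_feat_query(unsigned long feat)
{
	if (feat == 0)
		return ISOL_F_QUIESCE;
	if (feat == ISOL_F_QUIESCE)
		return ISOL_F_QUIESCE_VMSTATS;
	return -1;
}

/* True if 'feat' is supported and offers every capability in 'caps'. */
static bool isol_caps_supported(isol_feat_query_fn query,
				unsigned long feat, unsigned long caps)
{
	long features = query(0);

	if (features < 0 || !((unsigned long)features & feat))
		return false;

	long feat_caps = query(feat);

	if (feat_caps < 0)
		return false;
	return ((unsigned long)feat_caps & caps) == caps;
}
```

On a patched kernel, the query function would simply wrap prctl(PR_ISOL_FEAT, feat, 0, 0, 0).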
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + If arg3 is zero, returns a bitmask containing
> > + which kernel activities are supported for quiescing.
> > +
> > + If arg3 is ISOL_F_QUIESCE_DEFMASK, returns
> > + default_quiesce_mask, a system-wide configurable value.
> > + See description of default_quiesce_mask below.
> > +
> > +**PR_ISOL_GET**:
> > +
> > + Retrieve task isolation feature configuration.
> > + The general format is::
> > +
> > + prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> > +
> > + Possible values for feat are:
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + Returns a bitmask containing which kernel
> > + activities are enabled for quiescing.
> > +
> > +
> > +**PR_ISOL_SET**:
> > +
> > + Configures task isolation features. The general format is::
> > +
> > + prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> > +
> > + The 'feat' argument specifies which feature to configure.
> > + Possible values for feat are:
> >
>
> We should be able to enable multiple features as well via this? Something
> like ISOL_F_QUIESCE|ISOL_F_BLOCK_INTERRUPTORS as you have mentioned in the
> last posting.

One would probably configure them separately (PR_ISOL_SET configures
one feature at a time), and then activate them together:

	/* Query which features the kernel supports. */
	ret = prctl(PR_ISOL_FEAT, 0, 0, 0, 0);
	if (ret == -1) {
		perror("prctl PR_ISOL_FEAT");
		return EXIT_FAILURE;
	}

	if (!(ret & ISOL_F_BLOCK_INTERRUPTORS)) {
		printf("ISOL_F_BLOCK_INTERRUPTORS feature unsupported, quitting\n");
		return EXIT_FAILURE;
	}

	/* Configure one feature; params... are feature specific. */
	ret = prctl(PR_ISOL_SET, ISOL_F_BLOCK_INTERRUPTORS, params...);
	if (ret == -1) {
		perror("prctl PR_ISOL_SET");
		return EXIT_FAILURE;
	}

	/* Configure ISOL_F_QUIESCE, ISOL_F_NOTIFY,
	 * ISOL_F_future_feature... the same way. */

	ctrl_set_mask = ISOL_F_QUIESCE | ISOL_F_BLOCK_INTERRUPTORS |
			ISOL_F_NOTIFY | ISOL_F_future_feature;

	/*
	 * Activate isolation with the features
	 * as configured above.
	 */
	ret = prctl(PR_ISOL_CTRL_SET, ctrl_set_mask, 0, 0, 0);
	if (ret == -1) {
		perror("prctl PR_ISOL_CTRL_SET");
		return EXIT_FAILURE;
	}

	/* latency sensitive loop */

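The example above exits as soon as one wanted feature is missing. A variant, sketched here under the same assumptions (placeholder bit values, ISOL_F_BLOCK_INTERRUPTORS being hypothetical), computes the PR_ISOL_CTRL_SET mask as the intersection of the wanted features and the PR_ISOL_FEAT result, and reports what is missing so the caller can decide whether to continue.

```c
#include <stddef.h>

/* Placeholder bit values (assumption). */
#define ISOL_F_QUIESCE            (1UL << 0)
#define ISOL_F_BLOCK_INTERRUPTORS (1UL << 1) /* hypothetical feature */

/*
 * Return the mask to pass to PR_ISOL_CTRL_SET: the wanted features
 * that PR_ISOL_FEAT reported as supported. Wanted but unsupported
 * features are stored in *missing (if non-NULL).
 */
static unsigned long isol_ctrl_mask(unsigned long wanted,
				    unsigned long supported,
				    unsigned long *missing)
{
	if (missing != NULL)
		*missing = wanted & ~supported;
	return wanted & supported;
}
```

A strict application can still bail out whenever *missing is non-zero, which reproduces the behaviour of the example above.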
> > +
> > + - ``ISOL_F_QUIESCE``:
> > +
> > + The 'arg3' argument is a bitmask specifying which
> > + kernel activities to quiesce. Possible bit sets are:
> > +
> > + - ``ISOL_F_QUIESCE_VMSTATS``
> > +
> > + VM statistics are maintained in per-CPU counters to
> > + improve performance. When a CPU modifies a VM statistic,
> > + this modification is kept in the per-CPU counter.
> > + Certain activities require a global count, which
> > + involves requesting each CPU to flush its local counters
> > + to the global VM counters.
> > +
> > + This flush is implemented via a workqueue item, which
> > + might schedule a workqueue on isolated CPUs.
> > +
> > + To avoid this interruption, task isolation can be
> > + configured to, upon return from system calls,
> > + synchronize the per-CPU counters to global counters, thus
> > + avoiding the interruption.
> > +
> > + To ensure the application returns to userspace
> > + with no modified per-CPU counters, it's necessary to
> > + use mlockall() in addition to this flag.
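A toy model of the per-CPU counter scheme described above may help. It is a deliberate simplification (one statistic, a fixed hypothetical CPU count, no thresholds or batching), not the mm/vmstat.c implementation: updates touch only the local per-CPU delta, and the sync step folds that delta into the global counter, so a later global flush has nothing to collect from that CPU.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4 /* hypothetical CPU count for the model */

/* One VM statistic: a global total plus per-CPU deltas that have
 * not yet been folded into it. */
struct vmstat_model {
	long global;
	long pcpu_diff[NR_MODEL_CPUS];
};

/* Fast path: a CPU only updates its own counter. */
static void vmstat_mod(struct vmstat_model *s, int cpu, long delta)
{
	s->pcpu_diff[cpu] += delta;
}

/* The quiesce step on return to userspace: fold this CPU's
 * pending delta into the global counter. */
static void vmstat_sync_cpu(struct vmstat_model *s, int cpu)
{
	s->global += s->pcpu_diff[cpu];
	s->pcpu_diff[cpu] = 0;
}

/* A CPU with a pending delta is one that a global flush would
 * have to interrupt (via a workqueue item). */
static bool vmstat_cpu_dirty(const struct vmstat_model *s, int cpu)
{
	return s->pcpu_diff[cpu] != 0;
}
```

In this model any update after the sync (for example, one caused by a page fault) makes the CPU dirty again, which is why the quiesce flag alone is not enough and mlockall() is needed as well.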
> > +
> > +**PR_ISOL_CTRL_GET**:
> > +
> > + Retrieve task isolation control.
> > +
> > + prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> > +
> > + Returns which isolation features are active.
> > +
> > +**PR_ISOL_CTRL_SET**:
> > +
> > + Activates/deactivates task isolation control.
> > +
> > + prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > +
> > + The 'mask' argument specifies which features
> > + to activate (bit set) or deactivate (bit clear).
> > +
> > + For ISOL_F_QUIESCE, quiescing of background activities
> > + happens on return to userspace from the
> > + prctl(PR_ISOL_CTRL_SET) call, and on return from
> > + subsequent system calls.
> > +
> > + Quiescing can be adjusted (while active) by
> > + prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
> >
>
> Why do we need this additional control? We should be able to enable or
> disable task isolation using the _GET_ and _SET_ calls, isn't it?

The distinction exists so that one can configure each feature
separately, and then enter isolated mode with all of them activated
at once.

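The split can be modeled as two separate pieces of per-task state, as in this hypothetical sketch (struct isol_model and the model_* helpers are illustrative names, not kernel code, and the bit value is a placeholder): PR_ISOL_SET writes a feature's configuration, PR_ISOL_CTRL_SET replaces the active-feature mask, and quiescing only takes effect while ISOL_F_QUIESCE is active. Following the patch text, the quiesce configuration may also be changed while active.

```c
/* Placeholder feature bit (assumption). */
#define ISOL_F_QUIESCE (1UL << 0)

struct isol_model {
	unsigned long quiesce_mask; /* configuration for ISOL_F_QUIESCE */
	unsigned long active;       /* features currently active */
};

/* Models PR_ISOL_SET for ISOL_F_QUIESCE: allowed while active, too. */
static void model_set_quiesce(struct isol_model *m, unsigned long mask)
{
	m->quiesce_mask = mask;
}

/* Models PR_ISOL_CTRL_SET: bit set activates, bit clear deactivates. */
static void model_ctrl_set(struct isol_model *m, unsigned long mask)
{
	m->active = mask;
}

/* Models PR_ISOL_CTRL_GET. */
static unsigned long model_ctrl_get(const struct isol_model *m)
{
	return m->active;
}

/* Activities actually quiesced: none unless the feature is active. */
static unsigned long model_effective_quiesce(const struct isol_model *m)
{
	return (m->active & ISOL_F_QUIESCE) ? m->quiesce_mask : 0;
}
```

Configuring first and activating later means a half-configured set of features is never in effect.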
> > +
> > +--------------------
> > +Default quiesce mask
> > +--------------------
> > +
> > +Applications can either explicitly specify individual
> > +background activities that should be quiesced, or
> > +obtain a system configurable value, which is to be
> > +configured by the system admin/mgmt system.
> > +
> > +/sys/kernel/task_isolation/available_quiesce lists, as
> > +one string per line, the activities which the kernel
> > +supports quiescing.
> >
>
> Probably replace 'quiesce' with 'quiesce_activities' because we are really
> controlling the kernel activities via this control and not the quiesce
> state/feature itself.
OK, makes sense.
> > +
> > +To configure the default quiesce mask, write a comma-separated
> > +list of strings (from available_quiesce) to
> > +/sys/kernel/task_isolation/default_quiesce.
> > +
> > +echo > /sys/kernel/task_isolation/default_quiesce disables
> > +all quiescing via ISOL_F_QUIESCE_DEFMASK.
> > +
> > +Using ISOL_F_QUIESCE_DEFMASK allows the application to
> > +take advantage of future quiescing capabilities without
> > +modification (provided default_quiesce is configured
> > +accordingly).
> >
>
> ISOL_F_QUIESCE_DEFMASK is really telling the kernel to quiesce all kernel
> activities, including the ones that are not currently supported, or am I
> misinterpreting something?

It's telling the kernel to quiesce the activities configured via
/sys/kernel/task_isolation/default_quiesce, including ones that are
not supported today (in the future, when a new activity is added,
/sys/kernel/task_isolation/default_quiesce will have to have the bit
for that new activity set).

So userspace can either:

	quiesce_mask = value of /sys/kernel/task_isolation/default_quiesce
	prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_mask, 0, 0);

(so that new activities might be automatically enabled by
a sysadmin), or:

	quiesce_mask = application choice of bits
	prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_mask, 0, 0);

(so that only the activities the application explicitly selected
are quiesced).
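Per the patch text, prctl(PR_ISOL_FEAT, ISOL_F_QUIESCE, ISOL_F_QUIESCE_DEFMASK, 0, 0) returns default_quiesce_mask directly. If an application reads the sysfs file instead, it has to turn the comma-separated list of names into bits. The sketch below assumes, purely for illustration, that bit i corresponds to the i-th name listed in available_quiesce; the real name-to-bit mapping would come from include/uapi/linux/task_isolation.h, and quiesce_list_to_mask() is a hypothetical helper.

```c
#include <string.h>

/*
 * Convert a comma-separated list of activity names (the format written
 * to default_quiesce) into a bitmask, assuming bit i <=> names[i].
 * A trailing newline, as read back from sysfs, is ignored.
 * Returns -1 if a name is unknown. nr_names must be below 63.
 */
static long quiesce_list_to_mask(const char *list,
				 const char *const names[], int nr_names)
{
	long mask = 0;
	const char *p = list;

	while (*p) {
		size_t len = strcspn(p, ",\n");

		if (len > 0) {
			int i;

			for (i = 0; i < nr_names; i++)
				if (strlen(names[i]) == len &&
				    strncmp(p, names[i], len) == 0)
					break;
			if (i == nr_names)
				return -1; /* not in available_quiesce */
			mask |= 1L << i;
		}
		p += len;
		if (*p)
			p++; /* skip the ',' or '\n' separator */
	}
	return mask;
}
```

With hypothetical names {"vmstat", "future_activity"}, the string "vmstat,future_activity\n" yields 3 (bits 0 and 1), while "\n", which is what "echo > default_quiesce" writes, yields 0, matching the "disables all quiescing" behaviour described in the patch.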