All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, Nitesh Lal <nilal@redhat.com>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Christoph Lameter <cl@linux.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Alex Belits <abelits@belits.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples
Date: Thu, 26 Aug 2021 11:59:58 +0200	[thread overview]
Message-ID: <20210826095958.GA908505@lothringen> (raw)
In-Reply-To: <20210824152646.706875395@fuller.cnet>

On Tue, Aug 24, 2021 at 12:24:25PM -0300, Marcelo Tosatti wrote:
> Add documentation and userspace sample code for prctl
> task isolation interface.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
>  Documentation/userspace-api/task_isolation.rst |  211 +++++++++++++++++++++++++
>  samples/Kconfig                                |    7 
>  samples/Makefile                               |    1 
>  samples/task_isolation/Makefile                |    9 +
>  samples/task_isolation/task_isol.c             |   83 +++++++++
>  samples/task_isolation/task_isol.h             |    9 +
>  samples/task_isolation/task_isol_userloop.c    |   56 ++++++
>  7 files changed, 376 insertions(+)
> 
> Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
> ===================================================================
> --- /dev/null
> +++ linux-2.6/Documentation/userspace-api/task_isolation.rst
> @@ -0,0 +1,281 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +Task isolation prctl interface
> +===============================
> +
> +Certain types of applications benefit from running uninterrupted by
> +background OS activities. Realtime systems and high-bandwidth networking
> +applications with user-space drivers can fall into the category.
> +
> +To create an OS noise free environment for the application, this
> +interface allows userspace to inform the kernel the start and
> +end of the latency sensitive application section (with configurable
> +system behaviour for that section).
> +
> +Note: the prctl interface is independent of nohz_full=.
> +
> +The prctl options are:
> +
> +
> +        - PR_ISOL_FEAT: Retrieve supported features.
> +        - PR_ISOL_GET: Retrieve task isolation parameters.
> +        - PR_ISOL_SET: Set task isolation parameters.
> +        - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> +        - PR_ISOL_CTRL_SET: Set task isolation state.
> +        - PR_ISOL_GET_INT: Retrieve internal parameters.
> +        - PR_ISOL_SET_INT: Retrieve internal parameters.

There should be some short summary here to explain the difference
between parameter, state, and internal parameter. I personally have
no clue so far.

> +
> +
> +Inheritance of the isolation parameters and state, across
> +fork(2) and clone(2), can be changed via
> +PR_ISOL_GET_INT/PR_ISOL_SET_INT.
> +
> +At a high-level, task isolation is divided in two steps:
> +
> +1. Configuration.
> +2. Activation.
> +
> +Section "Userspace support" describes how to use
> +task isolation.
> +
> +In terms of the interface, the sequence of steps to activate
> +task isolation are:
> +
> +1. Retrieve supported task isolation features (PR_ISOL_FEAT).
> +2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
> +3. Activate or deactivate task isolation features
> +   (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
> +4. Optionally configure inheritance (PR_ISOL_GET_INT/PR_ISOL_SET_INT).
> +
> +This interface is based on ideas and code from the
> +task isolation patchset from Alex Belits:
> +https://lwn.net/Articles/816298/
> +
> +--------------------
> +Feature description
> +--------------------
> +
> +        - ``ISOL_F_QUIESCE``
> +
> +        This feature allows quiescing select kernel activities on
> +        return from system calls.
> +
> +---------------------
> +Interface description
> +---------------------
> +
> +**PR_ISOL_FEAT**:
> +
> +        Returns the supported features and feature
> +        capabilities, as a bitmask::
> +
> +                prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> +
> +        The 'feat' argument specifies whether to return
> +        supported features (if zero), or feature capabilities
> +        (if not zero). Possible non-zero values for 'feat' are:
> +
> +        - ``ISOL_F_QUIESCE``:
> +
> +                Returns a bitmask containing which kernel
> +                activities are supported for quiescing.
> +
> +        Features and its capabilities are defined at
> include/uapi/linux/task_isolation.h.

Preferably have feat a parameter name. We never know if we want
to extend it in the future.

> +
> +**PR_ISOL_GET**:
> +
> +        Retrieve task isolation feature configuration.
> +        The general format is::
> +
> +                prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> +
> +        The 'feat' argument specifies whether to return
> +        configured features (if zero), or individual feature
> +        configuration (if not zero).

You might need to elaborate a bit on the "feat" behaviour difference.

> +        values for 'feat' are:
> +
> +        - ``ISOL_F_QUIESCE``:
> +
> +                Returns a bitmask containing which kernel
> +                activities are enabled for quiescing.
> +
> +
> +**PR_ISOL_SET**:
> +
> +        Configures task isolation features. The general format is::
> +
> +                prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> +
> +        The 'feat' argument specifies which feature to configure.
> +        Possible values for feat are:
> +
> +        - ``ISOL_F_QUIESCE``:
> +
> +                The 'arg3' argument is a bitmask specifying which
> +                kernel activities to quiesce. Possible bit sets are:
> +
> +                - ``ISOL_F_QUIESCE_VMSTATS``
> +
> +                  VM statistics are maintained in per-CPU counters to
> +                  improve performance. When a CPU modifies a VM statistic,
> +                  this modification is kept in the per-CPU counter.
> +                  Certain activities require a global count, which
> +                  involves requesting each CPU to flush its local counters
> +                  to the global VM counters.
> +
> +                  This flush is implemented via a workqueue item, which
> +                  might schedule a workqueue on isolated CPUs.
> +
> +                  To avoid this interruption, task isolation can be
> +                  configured to, upon return from system calls, synchronize
> +                  the per-CPU counters to global counters, thus avoiding
> +                  the interruption.
> +
> +                  To ensure the application returns to userspace
> +                  with no modified per-CPU counters, its necessary to
> +                  use mlockall() in addition to this isolcpus flag.

So prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...) will quiesce
on all subsequent return to userspace, right?

> +
> +**PR_ISOL_CTRL_GET**:
> +
> +        Retrieve task isolation control.
> +
> +                prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> +
> +        Returns which isolation features are active.
> +
> +**PR_ISOL_CTRL_SET**:
> +
> +        Activates/deactivates task isolation control.
> +
> +                prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> +
> +        The 'mask' argument specifies which features
> +        to activate (bit set) or deactivate (bit clear).
> +
> +        For ISOL_F_QUIESCE, quiescing of background activities
> +        happens on return to userspace from the
> +        prctl(PR_ISOL_CTRL_SET) call, and on return from
> +        subsequent system calls.

Now I'm lost again on the difference with PR_ISOL_SET

> +
> +        Quiescing can be adjusted (while active) by
> +        prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
> +
> +**PR_ISOL_GET_INT**:
> +
> +        Retrieves task isolation internal parameters.
> +
> +        The general format is::
> +
> +                prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
> +
> +        The 'cmd' argument specifies which parameter to configure.
> +        Possible values for cmd are:
> +
> +        - ``INHERIT_CFG``:
> +
> +                Retrieve inheritance configuration.
> +
> +                The 'arg3' argument is a pointer to a struct
> +                inherit_control::
> +
> +                        struct task_isol_inherit_control {
> +                                __u8    inherit_mask;
> +                                __u8    pad[7];
> +                        };
> +
> +                See PR_ISOL_SET_INT description below for meaning
> +                of structure fields.
> +
> +**PR_ISOL_SET_INT**:
> +
> +        Sets task isolation internal parameters.
> +
> +        The general format is::
> +
> +                prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
> +
> +        The 'cmd' argument specifies which parameter to configure.
> +        Possible values for cmd are:
> +
> +        - ``INHERIT_CFG``:
> +
> +                Set inheritance configuration when a new task
> +                is created via fork and clone.
> +
> +                The 'arg3' argument is a pointer to a struct
> +                inherit_control::
> +
> +                        struct task_isol_inherit_control {
> +                                __u8    inherit_mask;
> +                                __u8    pad[7];
> +                        };
> +
> +                inherit_mask is a bitmask that specifies which part
> +                of task isolation to be inherited:
> +
> +                - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
> +                  This is the stated written via prctl(PR_ISOL_SET, ...).
> +
> +                - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
> +                  (requires ISOL_INHERIT_CONF to be set). The new task
> +                  should behave, right after fork/clone, in the same manner
> +                  as the parent task _after_ it executed:
> +
> +                        prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> +
> +                  with a valid mask.

I'm wondering if those things shouldn't be set on arg4 for PR_ISOL_SET instead?
Arguably having a whole prctl for that makes it easier to extend. But then
PR_ISOL_SET_INT must always be called before PR_ISOL_SET, otherwise we create
noise, right?

Thanks.

  reply	other threads:[~2021-08-26 10:00 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24 15:24 [patch V3 0/8] extensible prctl task isolation interface and vmstat sync Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 1/8] add basic task isolation prctl interface Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 2/8] add prctl task isolation prctl docs and samples Marcelo Tosatti
2021-08-26  9:59   ` Frederic Weisbecker [this message]
2021-08-26 12:11     ` Marcelo Tosatti
2021-08-26 19:15       ` Christoph Lameter
2021-08-26 20:37         ` Marcelo Tosatti
2021-08-27 13:08       ` Frederic Weisbecker
2021-08-27 14:44         ` Marcelo Tosatti
2021-08-30 11:38           ` Frederic Weisbecker
2021-09-01 13:11   ` Nitesh Lal
2021-09-01 17:34     ` Marcelo Tosatti
2021-09-01 17:49       ` Nitesh Lal
2021-08-24 15:24 ` [patch V3 3/8] task isolation: sync vmstats on return to userspace Marcelo Tosatti
2021-09-10 13:49   ` nsaenzju
2021-08-24 15:24 ` [patch V3 4/8] procfs: add per-pid task isolation state Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 5/8] task isolation: sync vmstats conditional on changes Marcelo Tosatti
2021-08-25  9:46   ` Christoph Lameter
2021-08-24 15:24 ` [patch V3 6/8] KVM: x86: call isolation prepare from VM-entry code path Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 7/8] mm: vmstat: move need_update Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 8/8] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
2021-08-25  9:30   ` Christoph Lameter
2021-09-01 13:05   ` Nitesh Lal
2021-09-01 17:32     ` Marcelo Tosatti
2021-09-01 18:33       ` Marcelo Tosatti
2021-09-03 17:38         ` Nitesh Lal
2021-08-25 10:02 ` [patch V3 0/8] extensible prctl task isolation interface and vmstat sync Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210826095958.GA908505@lothringen \
    --to=frederic@kernel.org \
    --cc=abelits@belits.com \
    --cc=cl@linux.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=nilal@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.