All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, Nitesh Lal <nilal@redhat.com>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Alex Belits <abelits@belits.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [patch 0/4] extensible prctl task isolation interface and vmstat sync (v2)
Date: Tue, 10 Aug 2021 16:15:26 -0300	[thread overview]
Message-ID: <20210810191526.GA38862@fuller.cnet> (raw)
In-Reply-To: <20210810183746.GA32986@fuller.cnet>

On Tue, Aug 10, 2021 at 03:37:46PM -0300, Marcelo Tosatti wrote:
> On Tue, Aug 10, 2021 at 06:40:48PM +0200, Thomas Gleixner wrote:
> > Marcelo,
> > 
> > On Fri, Jul 30 2021 at 17:18, Marcelo Tosatti wrote:
> > 
> > can you pretty please:
> > 
> >  1) Add a version number to your patch series right where it belongs:
> > 
> >     [patch V2 N/M]
> > 
> >     Just having a (v2) at the end of the subject line of 0/M is sloppy
> >     at best.
> > 
> >  2) Provide a lore link to the previous version
> > 
> > Thanks,
> > 
> >         tglx
> 
> Thomas,
> 
> Sure, will resend -v3 once done with the following:
> 
> 1) Adding support for KVM.
> 
> 2) Adding a tool called "chisol" to util-linux, similar 
> to chrt/taskset, to prctl+exec (for unmodified applications).
> 
> This raises the question whether or not to add an option to preserve
> the task parameters across fork (i think the answer is yes).
> 
> --
> 
> But the following points are unclear to me (in quotes are earlier 
> comments you made):
> 
> 1) "It's about silencing different and largely independent parts of the OS
> on a particular CPU. Just defining upfront that there is only the choice
> of all or nothing _is_ policy.
> 
> There is a very wide range of use case scenarios out there and just
> because the ones which you care about needs X does not mean that X is
> the right thing for everybody else. You still can have X and let other
> people define their own set of things they want to be protected
> against.
> 
> Aside of that having it selectively is a plus for debugability, testing
> etc."
> 
> So for the ability to individually select what parts of the OS 
> on a particular CPU are quiesced, there is:
> 
> +	defmask = defmask | ISOL_F_QUIESCE_VMSTATS;
> +
> +	ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, defmask,
> +                   0, 0);
> +	if (ret == -1) {
> +               perror("prctl PR_ISOL_SET");
> +               return EXIT_FAILURE;
> +	}
> 
> However there is a feeling that implementation details are being exposed 
> to userspace... However that seems to be alright: what could happen is that
> the feature ceases to exist (say vmstat sync), in kernel, and the bit
> is kept for compability (but the kernel does nothing about it). 
> 
> That of course means whatever "vmstat sync" replacement comes up, it should
> avoid IPIs as well.
> 
> Any thoughts on this?
> 
> 2) "Again: I fundamentaly disagree with the proposed task isolation patches
> approach as they leave no choice at all.
> 
> There is a reasonable middle ground where an application is willing to
> pay the price (delay) until the reqested quiescing has taken place in
> order to run undisturbed (hint: cache ...) and also is willing to take
> the addtional overhead of an occacional syscall in the slow path without
> tripping some OS imposed isolation safe guard.
> 
> Aside of that such a granular approach does not necessarily require the
> application to be aware of it. If the admin knows the computational
> pattern of the application, e.g.
> 
>  1     read_data_set() <- involving syscalls/OS obviously
>  2     compute_set()   <- let me alone
>  3     save_data_set() <- involving syscalls/OS obviously
> 
>        repeat the above...
> 
> then it's at his discretion to decide to inflict a particular isolation
> set on the task which is obviously ineffective while doing #1 and #3 but
> might provide the so desired 0.9% boost for compute_set() which
> dominates the judgement.
> 
> That's what we need to think about and once we figured out how to do
> that it gives Marcelo the mechanism to solve his 'run virt undisturbed
> by vmstat or whatever' problem and it allows Alex to build his stuff on
> it.
> 
> Summary: The problem to be solved cannot be restricted to
> 
>     self_defined_important_task(OWN_WORLD);
> 
> Policy is not a binary on/off problem. It's manifold across all levels
> of the stack and only a kernel problem when it comes down to the last
> line of defence.
> 
> Up to the point where the kernel puts the line of last defence, policy
> is defined by the user/admin via mechanims provided by the kernel.
> 
> Emphasis on "mechanims provided by the kernel", aka. user API.
> 
> Just in case, I hope that I don't have to explain what level of scrunity
> and thought this requires."
> 
> OK, so perhaps a handful of use-cases can clarify whether the proposed
> interface requires changes?
> 
> The example on samples/task_isolation/ is focused on "enter task isolation
> and very rarely exit".
> 
> There are two other cases i am aware of:
> 
> A) Christoph's use-case:
> 
> 	1) Enter task-isolation.
> 	2) Latency sensitive loop begins.
> 	3) Some event interrupts latency sensitive section.
> 
> 	4) Handling of the event requires N syscalls, which the programmer
> 	   would be interested in happening without quiescing at 
> 	   every return to system call. The current scheme would be:
> 
> 
>        /*
>         * Application can either set the value from ISOL_F_QUIESCE_DEFMASK,
>         * which is configurable through
>         * /sys/kernel/task_isolation/default_quiesce_activities,
>         * or specific values.
>         *
>         * Using ISOL_F_QUIESCE_DEFMASK allows for the application to
>         * take advantage of future quiescing capabilities without
>         * modification (provided default_quiesce_activities is
>         * configured accordingly).
>         */
>        defmask = defmask | ISOL_F_QUIESCE_VMSTATS;
> 
>        ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, defmask,
>                    0, 0);
>        if (ret == -1) {
>                perror("prctl PR_ISOL_SET");
>                return EXIT_FAILURE;
>        }
> 
> lat_loop:
>        ret = prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE, 0, 0, 0);
>        if (ret == -1) {
>                perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)");
>                return EXIT_FAILURE;
>        }
> 
>        latency sensitive loop
> 
>        if (event == 1) {
> 		/* disables quiescing of all features, while maintaining
> 		 * other features such as logging and avoidance of
> 		 * interruptions enabled.
> 		 */
> 		ret = prctl(PR_ISOL_CTRL_SET, 0, 0, 0, 0);
> 		syscall1
> 		syscall2
> 		...
> 		syscallN
> 		/* reenter isolated mode with quiescing */
> 		goto lat_loop;
> 	}
> 	...
> 
> Should it be possible to modify individual quiescing parts individually
> while maintaining isolated mode? Yes, that seems to be desired.
> 
> 
> The other use-case (from you) seems to be:
> 
>  1     read_data_set() <- involving syscalls/OS obviously
>  2     compute_set()   <- let me alone
>  3     save_data_set() <- involving syscalls/OS obviously
> 
>        repeat the above...
> 
> Well, the implementation of Christoph's use above seems not
> to be that bad as well:
> 
>  1     read_data_set() <- involving syscalls/OS obviously
> 	/* disables quiescing of all (or some, if desired)
> 	 * features, while maintaining other features such
> 	 * as logging and avoidance of interruptions enabled.
> 	 */
> 	ret = prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE, 0, 0, 0);
> 
>  2     compute_set()   <- let me alone
> 
> 	ret = prctl(PR_ISOL_CTRL_SET, 0, 0, 0, 0);
> 
>  3     save_data_set() <- involving syscalls/OS obviously
> 
>        repeat the above...
> 
> What kind of different behaviour, other than enabling/disabling
> quiescing, would be desired in this use-case?
> 

And 3) Is a global ISOL_F_QUIESCE_DEFMASK sufficient, or should this 
be per-task, or cgroups?

       /*
        * Application can either set the value from ISOL_F_QUIESCE_DEFMASK,
        * which is configurable through
        * /sys/kernel/task_isolation/default_quiesce_activities,
        * or specific values.
        *
        * Using ISOL_F_QUIESCE_DEFMASK allows for the application to
        * take advantage of future quiescing capabilities without
        * modification (provided default_quiesce_activities is
        * configured accordingly).
        */
       defmask = defmask | ISOL_F_QUIESCE_VMSTATS;

       ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, defmask,
                   0, 0);
       if (ret == -1) {
               perror("prctl PR_ISOL_SET");
               return EXIT_FAILURE;
       }


      reply	other threads:[~2021-08-10 19:15 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-30 20:18 [patch 0/4] extensible prctl task isolation interface and vmstat sync (v2) Marcelo Tosatti
2021-07-30 20:18 ` [patch 1/4] add basic task isolation prctl interface Marcelo Tosatti
     [not found]   ` <CAFki+Lnf0cs62Se0aPubzYxP9wh7xjMXn7RXEPvrmtBdYBrsow@mail.gmail.com>
2021-07-31  0:49     ` Marcelo Tosatti
2021-07-31  7:47   ` kernel test robot
2021-07-31  7:47     ` kernel test robot
     [not found]   ` <CAFki+LkQVQOe+5aNEKWDvLdnjWjxzKWOiqOvBZzeuPWX+G=XgA@mail.gmail.com>
2021-08-02 14:16     ` Marcelo Tosatti
2021-07-30 20:18 ` [patch 2/4] task isolation: sync vmstats on return to userspace Marcelo Tosatti
2021-08-03 15:13   ` nsaenzju
2021-08-03 16:44     ` Marcelo Tosatti
2021-07-30 20:18 ` [patch 3/4] mm: vmstat: move need_update Marcelo Tosatti
2021-07-30 20:18 ` [patch 4/4] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
2021-08-07  2:47   ` Nitesh Lal
2021-08-09 17:34     ` Marcelo Tosatti
2021-08-09 19:13       ` Nitesh Lal
2021-08-10 16:40 ` [patch 0/4] extensible prctl task isolation interface and vmstat sync (v2) Thomas Gleixner
2021-08-10 18:37   ` Marcelo Tosatti
2021-08-10 19:15     ` Marcelo Tosatti [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210810191526.GA38862@fuller.cnet \
    --to=mtosatti@redhat.com \
    --cc=abelits@belits.com \
    --cc=cl@linux.com \
    --cc=frederic@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nilal@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.