public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Nitesh Lal <nilal@redhat.com>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Alex Belits <abelits@belits.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [patch v4 1/8] add basic task isolation prctl interface
Date: Thu, 14 Oct 2021 10:02:20 -0300	[thread overview]
Message-ID: <20211014130220.GA5812@fuller.cnet> (raw)
In-Reply-To: <20211013160630.GA106511@fuller.cnet>

<snip>

> What are the requirements of the signal exactly (and why it is popular) ?
> Because the interruption event can be due to:
> 
> * An IPI.
> * A system call.

IRQs (easy to trace), exceptions.

> In the "full task isolation mode" patchset (the one from Alex), a system call
> will automatically generate a SIGKILL once a system call is performed
> (after the prctl to enable task isolated mode, but
> before the prctl to disable task isolated mode).
> This can be implemented, if desired, by SECCOMP syscall blocking
> (which already exists).
> 
> For other interruptions, which happen through IPIs, one can print
> the stack trace of the program (or interrupt) that generated
> the IPI to find out the cause (which is what rt-trace-bpf.py is doing).
> 
> An alternative would be to add tracepoints so that one can
> find out which function in the kernel caused the CPU and
> task to become "a target for interruptions".

For example, adding a tracepoint to mark_vmstat_dirty() function
(allowing to see how that function was invoked on a given CPU, and
by whom) appears to be sufficient information to debug problems.

(mark_vmstat_dirty() from
[patch v4 5/8] task isolation: sync vmstats conditional on changes)

Instead of a coredump image with a SIGKILL sent at that point.

Looking at

https://github.com/abelits/libtmc

One can see the notification via SIGUSR1 being used.

To support something similar to it, one would add a new bit to 
flags field of:

+struct task_isol_activate_control {
+       __u64 flags;
+       __u64 quiesce_oneshot_mask;
+       __u64 pad[6];
+};

Remove 

+ 	       ret = -EINVAL;
+               if (act_ctrl.flags)
+                       goto out;

From the handler, shrink the padded space and use it.

> 
> > > > Also, see:
> > > > 
> > > >   https://lkml.kernel.org/r/20210929152429.186930629@infradead.org
> > > 
> > > As you can see from the below pseudocode, we were thinking of queueing
> > > the (invalidate icache or TLB flush) in case app is in userspace,
> > > to perform on return to kernel space, but the approach in your patch might be
> > > superior (will take sometime to parse that thread...).
> > 
> > Let me assume you're talking about kernel TLB invalidates, otherwise it
> > would be terribly broken.
> > 
> > > > Suppose:
> > > > 
> > > > 	CPU0					CPU1
> > > > 
> > > > 	sys_prctl()
> > > > 	<kernel entry>
> > > > 	  // marks task 'important'
> > > > 						text_poke_sync()
> > > > 						  // checks CPU0, not userspace, queues IPI
> > > > 	<kernel exit>
> > > > 
> > > > 	$important userspace			  arch_send_call_function_ipi_mask()
> > > > 	<IPI>
> > > > 	  // finds task is 'important' and
> > > > 	  // can't take interrupts
> > > > 	  sigkill()
> > > > 
> > > > *Whoopsie*
> > > > 
> > > > 
> > > > Fundamentally CPU1 can't elide the IPI until CPU0 is in userspace,
> > > > therefore CPU0 can't wait for quescence in kernelspace, but if it goes
> > > > to userspace, it'll get killed on interruption. Catch-22.

To reiterate on this point:

> > > >         CPU0                                    CPU1
> > > > 
> > > >         sys_prctl()
> > > >         <kernel entry>
> > > >           // marks task 'important'
> > > >                                                 text_poke_sync()
> > > >                                                   // checks CPU0, not userspace, queues IPI
> > > >         <kernel exit>

1) Such races can be fixed by proper uses of atomic variables.

2) If a signal to an application is desired, fail to see why this
interface (ignoring bugs related to the particular mechanism) does not
allow it.

So hopefully this addresses your comments.


  reply	other threads:[~2021-10-14 13:42 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-07 19:23 [patch v4 0/8] extensible prctl task isolation interface and vmstat sync Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 1/8] add basic task isolation prctl interface Marcelo Tosatti
2021-10-12 13:05   ` Peter Zijlstra
2021-10-13 10:56     ` Marcelo Tosatti
2021-10-13 11:37       ` Marcelo Tosatti
2021-10-13 15:01       ` Peter Zijlstra
2021-10-13 16:06         ` Marcelo Tosatti
2021-10-14 13:02           ` Marcelo Tosatti [this message]
2021-10-19 15:07         ` Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 2/8] add prctl task isolation prctl docs and samples Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 3/8] task isolation: sync vmstats on return to userspace Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 4/8] procfs: add per-pid task isolation state Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 5/8] task isolation: sync vmstats conditional on changes Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 6/8] KVM: x86: call isolation prepare from VM-entry code path Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 7/8] mm: vmstat: move need_update Marcelo Tosatti
2021-10-07 19:23 ` [patch v4 8/8] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211014130220.GA5812@fuller.cnet \
    --to=mtosatti@redhat.com \
    --cc=abelits@belits.com \
    --cc=cl@linux.com \
    --cc=frederic@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nilal@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox