TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases)


From: fweisbec@gmail.com (Frederic Weisbecker)
To: linux-arm-kernel@lists.infradead.org
Subject: TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases)
Date: Mon, 28 Jul 2014 21:22:13 +0200
Message-ID: <20140728192209.GA26017@localhost.localdomain>
In-Reply-To: <20140728185803.GA24663@redhat.com>

On Mon, Jul 28, 2014 at 08:58:03PM +0200, Oleg Nesterov wrote:
> Off-topic, but...
> 
> On 07/28, Oleg Nesterov wrote:
> >
> > But we should always call user_exit() unconditionally?
> 
> Frederic, don't we need the patch below? In fact clear_() can be moved
> under "if ()" too. and probably copy_process() should clear this flag...
> 
> Or. __context_tracking_task_switch() can simply do
> 
> 	 if (context_tracking_cpu_is_enabled())
> 	 	set_tsk_thread_flag(next, TIF_NOHZ);
> 	 else
> 	 	clear_tsk_thread_flag(next, TIF_NOHZ);
> 
> and then we can forget about copy_process(). Or I am totally confused?
> 
> 
> I am also wondering if we can extend user_return_notifier to handle
> enter/exit and kill TIF_NOHZ.
> 
> Oleg.
> 
> --- x/kernel/context_tracking.c
> +++ x/kernel/context_tracking.c
> @@ -202,7 +202,8 @@ void __context_tracking_task_switch(stru
>  				    struct task_struct *next)
>  {
>  	clear_tsk_thread_flag(prev, TIF_NOHZ);
> -	set_tsk_thread_flag(next, TIF_NOHZ);
> +	if (context_tracking_cpu_is_enabled())
> +		set_tsk_thread_flag(next, TIF_NOHZ);
>  }
>  
>  #ifdef CONFIG_CONTEXT_TRACKING_FORCE

Unfortunately, as long as tasks can migrate in and out of a context-tracked CPU, we
need to track all CPUs.

This is because there is always a small shift between hard and soft kernelspace
boundaries.

Hard boundaries are the real strict boundaries: between "int", "iret" or faulting
instructions for example.

Soft boundaries are the places where we put our context tracking probes. They
are just function calls, and some distance between them and the hard boundaries is
inevitable.

So here is a scenario where this is a problem: a task runs on CPU 0, passes the context
tracking call before returning from a syscall to userspace, and gets an interrupt. The
interrupt preempts the task and it migrates to CPU 1. There it returns from
preempt_schedule_irq(), after which it resumes to userspace.

In this scenario, if context tracking is only enabled on CPU 1, we have no way to know that
the task is resuming to userspace, because we passed through the context tracking probe
already and it was ignored on CPU 0.
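
To make the scenario concrete, here is a toy userspace model in C. It is only a
sketch, not kernel code: struct cpu_model, probe_user_enter(), preempt_save(),
preempt_restore() and replay_migration() are all hypothetical names invented for
illustration. The save/restore step is an assumption, loosely modeled on a
preemption path that carries the tracked state along with the task; nothing here
is taken from kernel/context_tracking.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
enum ctx_state { CTX_KERNEL, CTX_USER };

struct cpu_model {
	bool tracking_enabled;	/* is the soft probe active on this CPU? */
	enum ctx_state state;	/* what the tracker believes runs here */
};

/* Soft boundary: probe called shortly before the real return to user. */
static void probe_user_enter(struct cpu_model *cpu)
{
	assert(cpu != NULL);
	if (cpu->tracking_enabled)
		cpu->state = CTX_USER;
}

/*
 * Assumed preemption entry: remember the state the tracker had on the
 * CPU we are leaving. An untracked CPU has nothing useful to report.
 */
static enum ctx_state preempt_save(struct cpu_model *cpu)
{
	enum ctx_state prev;

	assert(cpu != NULL);
	if (!cpu->tracking_enabled)
		return CTX_KERNEL;
	prev = cpu->state;
	cpu->state = CTX_KERNEL;
	return prev;
}

/* Assumed preemption exit: replay the saved state on the new CPU. */
static void preempt_restore(struct cpu_model *cpu, enum ctx_state prev)
{
	assert(cpu != NULL);
	if (cpu->tracking_enabled && prev == CTX_USER)
		cpu->state = CTX_USER;
}

/*
 * Replay the scenario: the probe runs on the source CPU, an irq preempts
 * the task and migrates it, then the task resumes to userspace on the
 * destination CPU without passing the probe again.
 * Returns true if the destination CPU's tracker sees userspace.
 */
static bool replay_migration(bool track_src, bool track_dst)
{
	struct cpu_model src = { track_src, CTX_KERNEL };
	struct cpu_model dst = { track_dst, CTX_KERNEL };
	enum ctx_state saved;

	probe_user_enter(&src);		/* soft boundary passed on src */
	saved = preempt_save(&src);	/* irq preempts the task on src */
	preempt_restore(&dst, saved);	/* task is scheduled back on dst */
	return dst.state == CTX_USER;	/* hard boundary: resume to user */
}
```

In this model, replay_migration(true, true) corresponds to tracking on both CPUs
and returns true, while replay_migration(false, true) corresponds to tracking
enabled only on CPU 1 and returns false: the probe on CPU 0 was a no-op, so
nothing tells CPU 1 that the task is about to run in userspace.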

This might be hackable by ensuring that irqs are disabled between the context tracking
calls and the actual return to userspace. It's a nightmare to audit on all archs though,
and it makes the context tracking callers less flexible. It would also only solve the
issue for irqs: exceptions have a similar problem and we can't mask them.

