linux-arm-kernel.lists.infradead.org archive mirror
From: fweisbec@gmail.com (Frederic Weisbecker)
To: linux-arm-kernel@lists.infradead.org
Subject: TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases)
Date: Wed, 30 Jul 2014 18:35:21 +0200
Message-ID: <20140730163516.GC18158@localhost.localdomain>
In-Reply-To: <20140729175414.GA3289@redhat.com>

On Tue, Jul 29, 2014 at 07:54:14PM +0200, Oleg Nesterov wrote:
> Thanks Frederic for your explanations. Yes, I was confused. But cough, now I am
> even more confused.
> 
> I didn't even try to read this code, perhaps I'll try later, but let me ask
> another question while you are here ;)
> 
> The comment above __context_tracking_task_switch() says:
> 
> 	 * The context tracking uses the syscall slow path to implement its user-kernel
> 	 * boundaries probes on syscalls. This way it doesn't impact the syscall fast
> 	 * path on CPUs that don't do context tracking.
> 	        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indeed, the comment is confusing in the way it explains things. It suggests that
some CPUs may be doing context tracking while others can choose not to.

It's rather all or nothing. TIF_NOHZ actually optimizes systems that have
CONFIG_CONTEXT_TRACKING=y but don't need context tracking: in that case TIF_NOHZ is
clear, so the syscall fast path has no overhead.

So I should rephrase it this way:

        * The context tracking uses the syscall slow path to implement its user-kernel
        * boundaries probes on syscalls. This way it doesn't impact the syscall fast
        * path when context tracking is globally disabled.

> 
> How? Every running task should have TIF_NOHZ set if context_tracking_is_enabled() ?
> 
> 	 * But we need to clear the flag on the previous task because it may later
> 	 * migrate to some CPU that doesn't do the context tracking. As such the TIF
> 	 * flag may not be desired there.
> 
> For what? How this can help? This flag will be set again when we switch to this
> task again?

That is indeed a stale comment from an early, aborted design.

> 
> Looks like, we can kill context_tracking_task_switch() and simply change the
> "__init" callers of context_tracking_cpu_set() to do set_thread_flag(TIF_NOHZ) ?
> Then this flag will be propagated by copy_process().

Right, that would be much better. Good catch! Context tracking is enabled from
tick_nohz_init(), which runs in the init task (PID 0), so the flag would be propagated
from there.

I still think we need a for_each_process_thread() loop that sets the flag as well,
though, because some kernel threads may already have been created at this stage.
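The combination discussed above (set the flag on the init task so that children inherit it, plus one pass over the threads that already exist) can be modelled as follows. This is a hypothetical userspace sketch: struct task, fork_task() (standing in for copy_process()), create_init_task() and context_tracking_enable() are illustrative names, not kernel APIs, and the loop over a static array stands in for for_each_process_thread().

```c
#include <assert.h>
#include <stddef.h>

#define TIF_NOHZ_BIT (1UL << 2)  /* hypothetical bit value */
#define MAX_TASKS 16

struct task {
    int pid;
    unsigned long flags;
};

static struct task tasks[MAX_TASKS];
static size_t nr_tasks;

/* Models copy_process(): the child inherits the parent's thread flags. */
static struct task *fork_task(const struct task *parent)
{
    assert(nr_tasks < MAX_TASKS);
    struct task *child = &tasks[nr_tasks];
    child->pid = (int)nr_tasks;
    child->flags = parent->flags;
    nr_tasks++;
    return child;
}

/* Creates the boot task, which gets PID 0 in this model. */
static struct task *create_init_task(void)
{
    struct task none = { .pid = -1, .flags = 0 };
    return fork_task(&none);
}

/*
 * Models enabling context tracking at boot. Every existing task,
 * including the init task, gets the flag; tasks forked afterwards
 * inherit it through fork_task().
 */
static void context_tracking_enable(void)
{
    for (size_t i = 0; i < nr_tasks; i++)
        tasks[i].flags |= TIF_NOHZ_BIT;
}
```

If only the init task were flagged, a kernel thread forked before context_tracking_enable() would never see TIF_NOHZ; that is the case the extra for_each_process_thread() pass covers.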

> 
> Or I am totally confused? (quite possible).
> 
> > So here is a scenario where this is a problem: a task runs on CPU 0, passes the context
> > tracking call before returning from a syscall to userspace, and gets an interrupt. The
> > interrupt preempts the task and it moves to CPU 1. So it returns from preempt_schedule_irq()
> > after which it is going to resume to userspace.
> >
> > In this scenario, if context tracking is only enabled on CPU 1, we have no way to know that
> > the task is resuming to userspace, because we passed through the context tracking probe
> > already and it was ignored on CPU 0.
> 
> Thanks. But I still can't understand... So if we only track CPU 1, then in this
> case context_tracking.state == IN_USER on CPU 0, but it can be IN_USER or IN_KERNEL
> on CPU 1.

I'm not sure I understand your question. Context tracking is either enabled everywhere or
nowhere.

I should mention, though, that there is a per-CPU context tracking state named
context_tracking.active. The name is confusing because it suggests that context tracking
is active per CPU. Actually, the state is tracked everywhere when context tracking is
globally enabled; .active only determines whether we call the RCU and vtime callbacks.

So only nohz_full CPUs have context_tracking.active set, because only they need to call
the RCU and vtime callbacks. Other CPUs still do the context tracking, but they don't
call the RCU and vtime functions.
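The split between the globally tracked state and the per-CPU .active switch can be sketched like this. It is a simplified userspace model under stated assumptions: the ct[] array, model_user_enter() and rcu_and_vtime_hooks() are hypothetical names. The only point it shows is that every CPU updates its state on the user/kernel transition, while only CPUs with .active set run the callbacks.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

enum ctx_state { IN_KERNEL = 0, IN_USER };

struct context_tracking {
    bool active;           /* set only on nohz_full CPUs: run RCU/vtime hooks */
    enum ctx_state state;  /* tracked on every CPU when globally enabled */
};

static struct context_tracking ct[NR_CPUS];
static int callback_calls[NR_CPUS];

/* Stand-in for the RCU and vtime callbacks. */
static void rcu_and_vtime_hooks(int cpu)
{
    callback_calls[cpu]++;
}

/* Hypothetical model of the kernel -> user transition on one CPU. */
static void model_user_enter(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (ct[cpu].state == IN_USER)
        return;                   /* already in user mode, nothing to do */
    if (ct[cpu].active)
        rcu_and_vtime_hooks(cpu); /* only nohz_full CPUs pay for this */
    ct[cpu].state = IN_USER;      /* state is updated regardless of .active */
}
```

Because the state is kept up to date on every CPU, a task that migrates onto a nohz_full CPU finds a consistent state there, even if its last transition happened on a CPU that skipped the callbacks.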

> 
> Oleg.
> 

Thread overview: 40+ messages
2014-07-22  1:49 [PATCH v3 0/8] Two-phase seccomp and x86 tracing changes Andy Lutomirski
2014-07-22  1:49 ` [PATCH v3 1/8] seccomp, x86, arm, mips, s390: Remove nr parameter from secure_computing Andy Lutomirski
2014-07-22  1:49 ` [PATCH v3 2/8] seccomp: Refactor the filter callback and the API Andy Lutomirski
2014-07-22  1:49 ` [PATCH v3 3/8] seccomp: Allow arch code to provide seccomp_data Andy Lutomirski
2014-07-22  1:49 ` [PATCH v3 4/8] seccomp: Document two-phase seccomp and arch-provided seccomp_data Andy Lutomirski
2014-07-22  1:53 ` [PATCH v3 5/8] x86,x32,audit: Fix x32's AUDIT_ARCH wrt audit Andy Lutomirski
2014-07-22  1:53   ` [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases Andy Lutomirski
2014-07-28 17:37     ` Oleg Nesterov
2014-07-28 18:58       ` TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases) Oleg Nesterov
2014-07-28 19:22         ` Frederic Weisbecker
2014-07-29 17:54           ` Oleg Nesterov
2014-07-30 16:35             ` Frederic Weisbecker [this message]
2014-07-30 17:46               ` Oleg Nesterov
2014-07-31  0:30                 ` Frederic Weisbecker
2014-07-31 16:03                   ` Oleg Nesterov
2014-07-31 17:13                     ` Frederic Weisbecker
2014-07-31 18:12                       ` Oleg Nesterov
2014-07-31 18:47                         ` Frederic Weisbecker
2014-07-31 18:50                           ` Frederic Weisbecker
2014-07-31 19:05                             ` Oleg Nesterov
2014-08-02 17:30                           ` Oleg Nesterov
2014-08-04 12:02                             ` Paul E. McKenney
2014-07-28 20:23       ` [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases Andy Lutomirski
2014-07-29 16:54         ` Oleg Nesterov
2014-07-29 17:01           ` Andy Lutomirski
2014-07-29 17:31             ` Oleg Nesterov
2014-07-29 17:55               ` Andy Lutomirski
2014-07-29 18:16                 ` Oleg Nesterov
2014-07-29 18:22                   ` Andy Lutomirski
2014-07-29 18:44                     ` Oleg Nesterov
2014-07-22  1:53   ` [PATCH v3 7/8] x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls Andy Lutomirski
2014-07-22  1:53   ` [PATCH v3 8/8] x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls Andy Lutomirski
2014-07-22 19:37 ` [PATCH v3 0/8] Two-phase seccomp and x86 tracing changes Kees Cook
2014-07-23 19:20   ` Andy Lutomirski
2014-07-28 17:59     ` H. Peter Anvin
2014-07-28 23:29       ` Kees Cook
2014-07-28 23:34         ` H. Peter Anvin
2014-07-28 23:42           ` Kees Cook
2014-07-28 23:45             ` H. Peter Anvin
2014-07-28 23:54               ` Kees Cook
