From: oleg@redhat.com (Oleg Nesterov)
To: linux-arm-kernel@lists.infradead.org
Subject: TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases)
Date: Thu, 31 Jul 2014 18:03:53 +0200
Message-ID: <20140731160353.GA14772@redhat.com>
In-Reply-To: <20140731003034.GA32078@localhost.localdomain>
On 07/31, Frederic Weisbecker wrote:
>
> On Wed, Jul 30, 2014 at 07:46:30PM +0200, Oleg Nesterov wrote:
> > On 07/30, Frederic Weisbecker wrote:
> > >
> > > On Tue, Jul 29, 2014 at 07:54:14PM +0200, Oleg Nesterov wrote:
> > >
> > > >
> > > > Looks like, we can kill context_tracking_task_switch() and simply change the
> > > > "__init" callers of context_tracking_cpu_set() to do set_thread_flag(TIF_NOHZ) ?
> > > > Then this flag will be propagated by copy_process().
> > >
> > > Right, that would be much better. Good catch! context tracking is enabled from
> > > tick_nohz_init(). This is the init 0 task so the flag should be propagated from there.
> >
> > actually init 1 task, but this doesn't matter.
>
> Are you sure? It does matter because that would invalidate everything I understood
> about init/main.c :)

Sorry for the confusion ;)

> I was convinced that the very first kernel init task is PID 0 then
> it forks on rest_init() to launch the userspace init with PID 1. Then init/0 becomes the
> idle task of the boot CPU.

Yes, sure. But context_tracking_cpu_set() is called by the init task with PID 1,
not by "swapper". And we do not care about idle threads at all.

> > > I still think we need a for_each_process_thread() set as well though because some
> > > kernel threads may well have been created at this stage already.
> >
> > Yes... Or we can add set_thread_flag(TIF_NOHZ) into ____call_usermodehelper().
>
> Couldn't there be some other tasks than usermodehelper stuffs at this stage? Like workqueues
> or random kernel threads?

Sure, but we do not care. A kernel thread can never return to user space, so it
must never call user_enter/exit().
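
A minimal userspace sketch of the propagation idea, assuming only that a child
starts with a copy of its parent's thread flags. Every name with a model_ or
MODEL_ prefix is hypothetical and the bit value is made up; this is not kernel
code, just a model of why the flag set by PID 1 reaches every user task while
the earlier kernel threads can be ignored:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit value for this model; the real TIF_NOHZ is arch-specific. */
#define MODEL_TIF_NOHZ (1UL << 0)

/* Toy task: only the fields this model needs, not the kernel's task_struct. */
struct model_task {
	unsigned long flags;
	bool is_kthread;	/* kernel threads never return to user space */
};

/* Stand-in for copy_process(): the child starts with the parent's flags. */
static struct model_task model_fork(const struct model_task *parent)
{
	assert(parent);
	struct model_task child = *parent;
	return child;
}

/* Only a task that can reach user mode needs the flag to be correct. */
static bool model_flag_matters(const struct model_task *t)
{
	assert(t);
	return !t->is_kthread;
}
```

In this model, setting MODEL_TIF_NOHZ on the task that stands for PID 1 and
then calling model_fork() gives every descendant the flag. A kernel thread
created earlier keeps flags == 0, and model_flag_matters() returns false for it.
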
> > I meant that in the scenario you described above the "global" TIF_NOHZ doesn't
> > really make a difference, afaics.
> >
> > Lets assume that context tracking is only enabled on CPU 1. To simplify,
> > assume that we have a single usermode task T which sleeps in kernel mode.
> >
> > So context_tracking[0].state == context_tracking[1].state == IN_KERNEL.
> >
> > T wakes up on CPU_0, returns to user space, calls user_enter(). This sets
> > context_tracking[0].state = IN_USER but otherwise does nothing else, this
> > CPU is not tracked and .active is false.
> >
> > Right after local_irq_restore() this task can migrate to CPU_1 and finish
> > its ret-to-usermode path. But since it had already passed user_enter() we
> > do not change context_tracking[1].state and do not play with rcu/vtime.
> > (unless this task hits SCHEDULE_USER in asm).
> >
> > The same for user_exit() of course.
>
> So indeed if context tracking is enabled on CPU 1 and not in CPU 0, we risk
> such situation where CPU 1 has wrong context tracking.

OK. To simplify, let's discuss user_enter() only. So, it is actually a nop on
CPU_0, and CPU_1 can miss it anyway.
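
The CPU_0/CPU_1 scenario quoted above can be replayed in a small userspace
model. This is a sketch under stated assumptions: the struct, the counter and
all model_ names are invented for illustration, and model_user_enter() only
mirrors the behaviour described in this thread (always record the state, do
the rcu/vtime work only when the CPU is tracked).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

enum ctx_state { IN_KERNEL = 0, IN_USER };

/* Toy stand-in for the per-CPU context tracking data. */
struct model_ctx {
	bool active;		/* is this CPU tracked? */
	enum ctx_state state;	/* last state recorded on this CPU */
	int tracked_enters;	/* counts the "rcu/vtime" work done here */
};

static struct model_ctx context_tracking[NR_CPUS];

/* Simplified user_enter(): the real work happens only if the CPU is active. */
static void model_user_enter(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	struct model_ctx *ct = &context_tracking[cpu];

	if (ct->state != IN_USER) {
		if (ct->active)
			ct->tracked_enters++;
		ct->state = IN_USER;
	}
}

/*
 * T calls user_enter() on untracked CPU 0, then migrates to tracked CPU 1
 * before it reaches user mode. Returns the CPU where T runs user code.
 */
static int model_migrate_after_user_enter(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		context_tracking[cpu] = (struct model_ctx){ false, IN_KERNEL, 0 };
	context_tracking[1].active = true;

	model_user_enter(0);	/* recorded on CPU 0 only */
	return 1;		/* T finishes ret-to-usermode on CPU 1 */
}
```

After model_migrate_after_user_enter(), context_tracking[1].state is still
IN_KERNEL and tracked_enters is 0 on CPU 1, although T now runs in user mode
there.
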
> But global TIF_NOHZ should enforce context tracking everywhere.

And this is what I can't understand. Let's return to my initial question: why
can't we change __context_tracking_task_switch() to

	void __context_tracking_task_switch(struct task_struct *prev,
					    struct task_struct *next)
	{
		if (context_tracking_cpu_is_enabled())
			set_tsk_thread_flag(next, TIF_NOHZ);
		else
			clear_tsk_thread_flag(next, TIF_NOHZ);
	}

? How can the global TIF_NOHZ help?

OK, OK, a task can return to usermode on CPU_0, notice TIF_NOHZ, take the
slow path, and do the "right" thing if it migrates to CPU_1 _before_ it
comes to user_enter(). But this case is very unlikely, and it certainly can't
explain why we penalize the untracked CPUs?
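
To make the per-CPU variant concrete, here is a userspace sketch, assuming a
toy task with a flags word and a per-CPU "tracked" table. The model_ names and
the bit value are hypothetical. It only shows that with this change the flag
follows the CPU the task is scheduled on, so a task pays for the slow path
only while it runs on a tracked CPU:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical bit value for this model; the real TIF_NOHZ is arch-specific. */
#define MODEL_TIF_NOHZ (1UL << 0)

struct model_task {
	unsigned long flags;
};

/* Which CPUs have context tracking enabled (set up by the caller). */
static bool model_cpu_tracked[NR_CPUS];

/* Model of the proposed task-switch hook: flag the incoming task per CPU. */
static void model_task_switch(int cpu, struct model_task *next)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (model_cpu_tracked[cpu])
		next->flags |= MODEL_TIF_NOHZ;
	else
		next->flags &= ~MODEL_TIF_NOHZ;
}

/* The flag is what forces the slow syscall/exception path. */
static bool model_takes_slow_path(const struct model_task *t)
{
	return (t->flags & MODEL_TIF_NOHZ) != 0;
}
```

With only model_cpu_tracked[1] set, a task switched in on CPU 0 has the flag
cleared and one switched in on CPU 1 has it set. The model says nothing about
the migration window discussed above; it covers the steady state only.
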
> And also it's
> less context switch overhead.
Why???
I think I have a blind spot here. Help!

And of course I can't understand exception_enter/exit(). Not to mention that
(afaics) "prev_ctx == IN_USER" in exception_exit() can be a false positive even
if we forget that the caller can migrate in between. Just because, once again,
a tracked CPU can miss user_exit().

So, why not

	static inline void exception_enter(void)
	{
		user_exit();
	}

	static inline void exception_exit(struct pt_regs *regs)
	{
		if (user_mode(regs))
			user_enter();
	}

?
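
A userspace sketch of the false positive, assuming the current scheme saves the
per-CPU state in exception_enter() and compares it with IN_USER in
exception_exit(). All model_ names are hypothetical, and the "saved" pair is my
simplified reading of the existing code, not a copy of it:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

enum ctx_state { IN_KERNEL = 0, IN_USER };

/* Per-CPU recorded state, as in the earlier discussion. */
static enum ctx_state model_state[NR_CPUS];

static void model_user_exit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	model_state[cpu] = IN_KERNEL;
}

static void model_user_enter(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	model_state[cpu] = IN_USER;
}

/* Scheme A: trust the per-CPU state that was recorded before the exception. */
static enum ctx_state model_exception_enter_saved(int cpu)
{
	enum ctx_state prev_ctx = model_state[cpu];

	model_user_exit(cpu);
	return prev_ctx;
}

static void model_exception_exit_saved(int cpu, enum ctx_state prev_ctx)
{
	if (prev_ctx == IN_USER)
		model_user_enter(cpu);
}

/* Scheme B (the proposal): trust the interrupted context, i.e. user_mode(regs). */
static void model_exception_enter_regs(int cpu)
{
	model_user_exit(cpu);
}

static void model_exception_exit_regs(int cpu, bool from_user_mode)
{
	if (from_user_mode)
		model_user_enter(cpu);
}
```

Suppose CPU 1 missed a user_exit(), so model_state[1] is a stale IN_USER, and
an exception then hits kernel code on CPU 1. Scheme A ends with IN_USER
recorded while the CPU returns to kernel mode. Scheme B, called with
from_user_mode == false, ends with IN_KERNEL.
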
Oleg.