* [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu()
@ 2018-07-12 18:19 Yury Norov
  2018-07-16 15:31 ` Frederic Weisbecker
  0 siblings, 1 reply; 5+ messages in thread
From: Yury Norov @ 2018-07-12 18:19 UTC (permalink / raw)
To: Frederic Weisbecker, Ingo Molnar, Thomas Gleixner
Cc: Yury Norov, Goutham, Sunil, Chris Metcalf, linux-kernel

IIUC, tick_nohz_full_kick_cpu() is intended to wake up idle CPUs
that will not be poked by the scheduler because they are actually
nohz_full. But in fact this function kicks all CPUs listed in
tick_nohz_full_mask, namely:
 - idle CPUs;
 - CPUs running normal tasks;
 - CPUs running isolated tasks [1].

For normal tasks it introduces unneeded latency, and for isolated tasks
it's fatal because isolation gets broken and the task receives SIGKILL.

The patch below makes tick_nohz_full_kick_cpu() kick only idle CPUs.
Non-idle nohz_full CPUs will observe changed system settings just like
non-idle normal (i.e. not nohz_full) CPUs, at the next reschedule.

[1] https://lkml.org/lkml/2017/11/3/589

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 kernel/time/tick-sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c026145eba2f..1c24c700e75a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -247,7 +247,7 @@ static void tick_nohz_full_kick(void)
  */
 void tick_nohz_full_kick_cpu(int cpu)
 {
-	if (!tick_nohz_full_cpu(cpu))
+	if (!(tick_nohz_full_cpu(cpu) && idle_cpu(cpu)))
 		return;

 	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
--
2.17.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu()
  2018-07-12 18:19 [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu() Yury Norov
@ 2018-07-16 15:31 ` Frederic Weisbecker
  2018-07-19 20:22   ` Yury Norov
  0 siblings, 1 reply; 5+ messages in thread
From: Frederic Weisbecker @ 2018-07-16 15:31 UTC (permalink / raw)
To: Yury Norov
Cc: Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Goutham, Sunil,
	Chris Metcalf, linux-kernel

On Thu, Jul 12, 2018 at 09:19:22PM +0300, Yury Norov wrote:
> IIUC, tick_nohz_full_kick_cpu() is intended to wake up idle CPUs
> that will not be poked by the scheduler because they are actually
> nohz_full.

Not exactly. It is intended to trigger an interrupt on a nohz_full
CPU that may be running in userspace without any tick. The irq_exit()
code lets us reprogram the tick with the latest dependency updates.

>
> But in fact this function kicks all CPUs listed in tick_nohz_full_mask,
> namely:
>  - idle CPUs;
>  - CPUs running normal tasks;
>  - CPUs running isolated tasks [1].
>
> For normal tasks it introduces unneeded latency, and for isolated tasks
> it's fatal because isolation gets broken and the task receives SIGKILL.

So this patch applies on Chris' series, right? For now there is no such
distinction between normal and isolated tasks. Any task running on a
nohz_full CPU is considered to be isolated.

> The patch below makes tick_nohz_full_kick_cpu() kick only idle CPUs.
> Non-idle nohz_full CPUs will observe changed system settings just like
> non-idle normal (i.e. not nohz_full) CPUs, at the next reschedule.

That's not exactly what we want. In fact when a task runs on a nohz_full
CPU, it may not meet any reschedule interrupt for a long while. This is
why we have tick_nohz_full_kick_cpu(): to force a nohz_full CPU to see
the latest changes.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu()
  2018-07-16 15:31 ` Frederic Weisbecker
@ 2018-07-19 20:22   ` Yury Norov
  2018-07-20 17:24     ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Yury Norov @ 2018-07-19 20:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Goutham, Sunil,
	Chris Metcalf, linux-kernel

On Mon, Jul 16, 2018 at 05:31:10PM +0200, Frederic Weisbecker wrote:
> External Email
>
> On Thu, Jul 12, 2018 at 09:19:22PM +0300, Yury Norov wrote:
> > IIUC, tick_nohz_full_kick_cpu() is intended to wake up idle CPUs
> > that will not be poked by the scheduler because they are actually
> > nohz_full.
>
> Not exactly. It is intended to trigger an interrupt on a nohz_full
> CPU that may be running in userspace without any tick. The irq_exit()
> code lets us reprogram the tick with the latest dependency updates.
>
> >
> > But in fact this function kicks all CPUs listed in tick_nohz_full_mask,
> > namely:
> >  - idle CPUs;
> >  - CPUs running normal tasks;
> >  - CPUs running isolated tasks [1].
> >
> > For normal tasks it introduces unneeded latency, and for isolated tasks
> > it's fatal because isolation gets broken and the task receives SIGKILL.
>
> So this patch applies on Chris' series, right?

This patch may be applied on master. That's why I sent it to you.

> For now there is no such
> distinction between normal and isolated tasks. Any task running on a
> nohz_full CPU is considered to be isolated.
>
> > The patch below makes tick_nohz_full_kick_cpu() kick only idle CPUs.
> > Non-idle nohz_full CPUs will observe changed system settings just like
> > non-idle normal (i.e. not nohz_full) CPUs, at the next reschedule.
>
> That's not exactly what we want. In fact when a task runs on a nohz_full
> CPU, it may not meet any reschedule interrupt for a long while. This is
> why we have tick_nohz_full_kick_cpu(): to force a nohz_full CPU to see
> the latest changes.

OK, got it.
So if my understanding is correct, there is 'soft isolation', which is
nohz_full, and 'hard isolation', which is Chris' task_isolation feature.
For soft isolation, the desirable behavior is to receive interrupts
generated by tick_nohz_full_kick_cpu(), and for hard isolation it's
obviously not desirable because it kills the application.

The patch below adds a check against task isolation in
tick_nohz_full_kick_cpu(). It is on top of Chris' series. Is it OK from
the nohz point of view?

---

While here. I just wonder: on my system, IRQs are sent to nohz_full CPUs
at every incoming ssh connection. The trace is like this:

[  206.835533] Call trace:
[  206.848411] [<ffff00000889f984>] dump_stack+0x84/0xa8
[  206.853455] [<ffff0000081ea308>] _task_isolation_remote+0x130/0x140
[  206.859714] [<ffff0000081bf5ec>] irq_work_queue_on+0xcc/0xfc
[  206.865365] [<ffff0000081478ac>] tick_nohz_full_kick_cpu+0x88/0x94
[  206.871536] [<ffff000008147930>] tick_nohz_dep_set_all+0x78/0xa8
[  206.877533] [<ffff000008147b58>] tick_nohz_dep_set_signal+0x28/0x34
[  206.883792] [<ffff0000081421fc>] set_process_cpu_timer+0xd0/0x128
[  206.889876] [<ffff0000081422ac>] update_rlimit_cpu+0x58/0x7c
[  206.895528] [<ffff0000083aa3d0>] selinux_bprm_committing_creds+0x180/0x1fc
[  206.902394] [<ffff00000839e394>] security_bprm_committing_creds+0x40/0x5c
[  206.909173] [<ffff00000828c4a0>] install_exec_creds+0x20/0x6c
[  206.914911] [<ffff0000082e15b0>] load_elf_binary+0x368/0xbb8
[  206.920561] [<ffff00000828d09c>] search_binary_handler+0xb8/0x224
[  206.926645] [<ffff00000828d99c>] do_execveat_common+0x44c/0x5f0
[  206.932555] [<ffff00000828db78>] do_execve+0x38/0x44
[  206.937510] [<ffff00000828dd74>] SyS_execve+0x34/0x44

I suspect that scp, ssh tunneling and similar network activities will
trigger ticks on nohz_full CPUs as well. On a high-loaded server it may
generate significant interrupt traffic on nohz_full CPUs. Is it desirable
behavior?
---

Yury

From 9be3c9996c06319a8070ac182291d08acfdc588d Mon Sep 17 00:00:00 2001
From: Yury Norov <ynorov@caviumnetworks.com>
Date: Tue, 17 Jul 2018 12:40:49 +0300
Subject: [PATCH] task_isolation: don't kick isolated CPUs with
 tick_nohz_full_kick_cpu()
To: Chris Metcalf <cmetcalf@mellanox.com>,
    Frederic Weisbecker <frederic@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>,
    Thomas Gleixner <tglx@linutronix.de>,
    "Goutham, Sunil" <Sunil.Goutham@cavium.com>,
    linux-kernel@vger.kernel.org

On top of Chris Metcalf's series:
https://lkml.org/lkml/2017/11/3/589

tick_nohz_full_kick_cpu() currently interrupts CPUs that may run an
isolated task. That's not desirable because the kick will kill the
isolated application. The patch below adds a check against task
isolation in tick_nohz_full_kick_cpu() to prevent breaking the
isolation.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 include/linux/isolation.h | 7 +++++++
 kernel/isolation.c        | 6 ------
 kernel/time/tick-sched.c  | 5 +++--
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/isolation.h b/include/linux/isolation.h
index b7f0a9085b13..fad606cdcd5e 100644
--- a/include/linux/isolation.h
+++ b/include/linux/isolation.h
@@ -158,6 +158,12 @@ static inline void task_isolation_user_exit(void)
 #endif
 }

+static inline bool is_isolation_cpu(int cpu)
+{
+	return task_isolation_map != NULL &&
+		cpumask_test_cpu(cpu, task_isolation_map);
+}
+
 #else /* !CONFIG_TASK_ISOLATION */
 static inline int task_isolation_request(unsigned int flags) { return -EINVAL; }
 static inline void task_isolation_start(void) { }
@@ -172,6 +178,7 @@ static inline void task_isolation_remote_cpumask_interrupt(
		const struct cpumask *mask, const char *fmt, ...)
 { }
 static inline void task_isolation_signal(struct task_struct *task) { }
 static inline void task_isolation_user_exit(void) { }
+static inline bool is_isolation_cpu(int cpu) { return 0; }
 #endif

 #endif /* _LINUX_ISOLATION_H */
diff --git a/kernel/isolation.c b/kernel/isolation.c
index 1e39a1493e76..05db247924ef 100644
--- a/kernel/isolation.c
+++ b/kernel/isolation.c
@@ -41,12 +41,6 @@ static int __init task_isolation_init(void)
 }
 core_initcall(task_isolation_init)

-static inline bool is_isolation_cpu(int cpu)
-{
-	return task_isolation_map != NULL &&
-		cpumask_test_cpu(cpu, task_isolation_map);
-}
-
 /* Enable stack backtraces of any interrupts of task_isolation cores. */
 static bool task_isolation_debug;
 static int __init task_isolation_debug_func(char *str)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c026145eba2f..91928a6afd81 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -23,6 +23,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/stat.h>
 #include <linux/sched/nohz.h>
+#include <linux/isolation.h>
 #include <linux/module.h>
 #include <linux/irq_work.h>
 #include <linux/posix-timers.h>
@@ -242,12 +243,12 @@ static void tick_nohz_full_kick(void)
 }

 /*
- * Kick the CPU if it's full dynticks in order to force it to
+ * Kick the CPU if it's full dynticks and not isolated in order to force it to
  * re-evaluate its dependency on the tick and restart it if necessary.
  */
 void tick_nohz_full_kick_cpu(int cpu)
 {
-	if (!tick_nohz_full_cpu(cpu))
+	if (!tick_nohz_full_cpu(cpu) || is_isolation_cpu(cpu))
 		return;

 	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
--
2.17.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu()
  2018-07-19 20:22 ` Yury Norov
@ 2018-07-20 17:24   ` Thomas Gleixner
  2018-07-23 13:15     ` Frederic Weisbecker
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2018-07-20 17:24 UTC (permalink / raw)
To: Yury Norov
Cc: Frederic Weisbecker, Frederic Weisbecker, Ingo Molnar, Goutham, Sunil,
	Chris Metcalf, linux-kernel

On Thu, 19 Jul 2018, Yury Norov wrote:
> While here. I just wonder: on my system, IRQs are sent to nohz_full CPUs
> at every incoming ssh connection. The trace is like this:
> [  206.835533] Call trace:
> [  206.848411] [<ffff00000889f984>] dump_stack+0x84/0xa8
> [  206.853455] [<ffff0000081ea308>] _task_isolation_remote+0x130/0x140
> [  206.859714] [<ffff0000081bf5ec>] irq_work_queue_on+0xcc/0xfc
> [  206.865365] [<ffff0000081478ac>] tick_nohz_full_kick_cpu+0x88/0x94
> [  206.871536] [<ffff000008147930>] tick_nohz_dep_set_all+0x78/0xa8
> [  206.877533] [<ffff000008147b58>] tick_nohz_dep_set_signal+0x28/0x34
> [  206.883792] [<ffff0000081421fc>] set_process_cpu_timer+0xd0/0x128
> [  206.889876] [<ffff0000081422ac>] update_rlimit_cpu+0x58/0x7c
> [  206.895528] [<ffff0000083aa3d0>] selinux_bprm_committing_creds+0x180/0x1fc
> [  206.902394] [<ffff00000839e394>] security_bprm_committing_creds+0x40/0x5c
> [  206.909173] [<ffff00000828c4a0>] install_exec_creds+0x20/0x6c
> [  206.914911] [<ffff0000082e15b0>] load_elf_binary+0x368/0xbb8
> [  206.920561] [<ffff00000828d09c>] search_binary_handler+0xb8/0x224
> [  206.926645] [<ffff00000828d99c>] do_execveat_common+0x44c/0x5f0
> [  206.932555] [<ffff00000828db78>] do_execve+0x38/0x44
> [  206.937510] [<ffff00000828dd74>] SyS_execve+0x34/0x44
>
> I suspect that scp, ssh tunneling and similar network activities will
> trigger ticks on nohz_full CPUs as well. On a high-loaded server it may
> generate significant interrupt traffic on nohz_full CPUs. Is it desirable
> behavior?

Suspicions and desirability are not really technically interesting aspects.
Just from looking at the stack trace it's obvious that there is a CPU TIME
rlimit on that newly spawned sshd. That's nothing the kernel imposes;
that's what user space sets.

Now the actual mechanism which does that, i.e. set_process_cpu_timer(), ends
up IPI'ing _ALL_ nohz_full CPUs for no real good reason. In the exec path
this is really pointless because the new process is not running yet and it
is single threaded, so forcing an IPI to all CPUs achieves nothing.

In fact the state of the task/process for which update_rlimit_cpu() is
called is known, so the IPI can really be either avoided completely or
restricted to the CPUs on which this process can run or actually runs.

Frederic?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu()
  2018-07-20 17:24 ` Thomas Gleixner
@ 2018-07-23 13:15   ` Frederic Weisbecker
  0 siblings, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2018-07-23 13:15 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Yury Norov, Frederic Weisbecker, Ingo Molnar, Goutham, Sunil,
	Chris Metcalf, linux-kernel

On Fri, Jul 20, 2018 at 07:24:00PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jul 2018, Yury Norov wrote:
> > While here. I just wonder: on my system, IRQs are sent to nohz_full CPUs
> > at every incoming ssh connection. The trace is like this:
> > [  206.835533] Call trace:
> > [  206.848411] [<ffff00000889f984>] dump_stack+0x84/0xa8
> > [  206.853455] [<ffff0000081ea308>] _task_isolation_remote+0x130/0x140
> > [  206.859714] [<ffff0000081bf5ec>] irq_work_queue_on+0xcc/0xfc
> > [  206.865365] [<ffff0000081478ac>] tick_nohz_full_kick_cpu+0x88/0x94
> > [  206.871536] [<ffff000008147930>] tick_nohz_dep_set_all+0x78/0xa8
> > [  206.877533] [<ffff000008147b58>] tick_nohz_dep_set_signal+0x28/0x34
> > [  206.883792] [<ffff0000081421fc>] set_process_cpu_timer+0xd0/0x128
> > [  206.889876] [<ffff0000081422ac>] update_rlimit_cpu+0x58/0x7c
> > [  206.895528] [<ffff0000083aa3d0>] selinux_bprm_committing_creds+0x180/0x1fc
> > [  206.902394] [<ffff00000839e394>] security_bprm_committing_creds+0x40/0x5c
> > [  206.909173] [<ffff00000828c4a0>] install_exec_creds+0x20/0x6c
> > [  206.914911] [<ffff0000082e15b0>] load_elf_binary+0x368/0xbb8
> > [  206.920561] [<ffff00000828d09c>] search_binary_handler+0xb8/0x224
> > [  206.926645] [<ffff00000828d99c>] do_execveat_common+0x44c/0x5f0
> > [  206.932555] [<ffff00000828db78>] do_execve+0x38/0x44
> > [  206.937510] [<ffff00000828dd74>] SyS_execve+0x34/0x44
> >
> > I suspect that scp, ssh tunneling and similar network activities will
> > trigger ticks on nohz_full CPUs as well. On a high-loaded server it may
> > generate significant interrupt traffic on nohz_full CPUs.
> > Is it desirable behavior?
>
> Suspicions and desirability are not really technically interesting aspects.
>
> Just from looking at the stack trace it's obvious that there is a CPU TIME
> rlimit on that newly spawned sshd. That's nothing the kernel imposes;
> that's what user space sets.
>
> Now the actual mechanism which does that, i.e. set_process_cpu_timer(), ends
> up IPI'ing _ALL_ nohz_full CPUs for no real good reason. In the exec path
> this is really pointless because the new process is not running yet and it
> is single threaded, so forcing an IPI to all CPUs achieves nothing.
>
> In fact the state of the task/process for which update_rlimit_cpu() is
> called is known, so the IPI can really be either avoided completely or
> restricted to the CPUs on which this process can run or actually runs.
>
> Frederic?

Indeed, so far the tick dependency code is lazy and IPIs everywhere when we
add either a thread or a process timer. We want to make sure that any thread
target, running somewhere without a tick, sees the new tick dependency.

So in the case of a single thread, I can easily fix that and IPI the CPU
it's running on, if any.

In the case of a thread group, I'm concerned about the performance penalty
of walking through each of its threads and IPI'ing only those running. But
probably we'll have to come to that in the end.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-07-23 13:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2018-07-12 18:19 [PATCH] nohz: don't kick non-idle CPUs in tick_nohz_full_kick_cpu() Yury Norov
2018-07-16 15:31 ` Frederic Weisbecker
2018-07-19 20:22   ` Yury Norov
2018-07-20 17:24     ` Thomas Gleixner
2018-07-23 13:15       ` Frederic Weisbecker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox