From: Chris Metcalf <cmetcalf@ezchip.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gilad Ben Yossef <giladb@ezchip.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Tejun Heo <tj@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Christoph Lameter <cl@linux.com>,
Viresh Kumar <viresh.kumar@linaro.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will.deacon@arm.com>,
Andy Lutomirski <luto@amacapital.net>,
<linux-doc@vger.kernel.org>, <linux-api@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v9 04/13] task_isolation: add initial support
Date: Tue, 19 Jan 2016 15:45:04 -0500 [thread overview]
Message-ID: <569EA050.5030106@ezchip.com> (raw)
In-Reply-To: <20160119154214.GA2722@lerouge>
On 01/19/2016 10:42 AM, Frederic Weisbecker wrote:
> On Mon, Jan 04, 2016 at 02:34:42PM -0500, Chris Metcalf wrote:
>> diff --git a/kernel/isolation.c b/kernel/isolation.c
>> new file mode 100644
>> index 000000000000..68a9f7457bc0
>> --- /dev/null
>> +++ b/kernel/isolation.c
>> @@ -0,0 +1,105 @@
>> +/*
>> + * linux/kernel/isolation.c
>> + *
>> + * Implementation for task isolation.
>> + *
>> + * Distributed under GPLv2.
>> + */
>> +
>> +#include <linux/mm.h>
>> +#include <linux/swap.h>
>> +#include <linux/vmstat.h>
>> +#include <linux/isolation.h>
>> +#include <linux/syscalls.h>
>> +#include "time/tick-sched.h"
>> +
>> +cpumask_var_t task_isolation_map;
>> +
>> +/*
>> + * Isolation requires both nohz and isolcpus support from the scheduler.
>> + * We provide a boot flag that enables both for now, and which we can
>> + * add other functionality to over time if needed. Note that just
>> + * specifying "nohz_full=... isolcpus=..." does not enable task isolation.
>> + */
>> +static int __init task_isolation_setup(char *str)
>> +{
>> + alloc_bootmem_cpumask_var(&task_isolation_map);
>> + if (cpulist_parse(str, task_isolation_map) < 0) {
>> + pr_warn("task_isolation: Incorrect cpumask '%s'\n", str);
>> + return 1;
>> + }
>> +
>> + alloc_bootmem_cpumask_var(&cpu_isolated_map);
>> + cpumask_copy(cpu_isolated_map, task_isolation_map);
>> +
>> + alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
>> + cpumask_copy(tick_nohz_full_mask, task_isolation_map);
>> + tick_nohz_full_running = true;
> How about calling tick_nohz_full_setup() instead? I'd rather prefer
> that nohz full implementation details stay in tick-sched.c
>
> Also what happens if nohz_full= is given as well as task_isolation= ?
> Don't we risk a memory leak and maybe breaking the fact that
> (nohz_full & task_isolation != task_isolation) which is really a requirement?
Yeah, this is a good point. I'm not sure what the best way is to make
this happen. It's already true that we will leak memory if you
specify "nohz_full=" more than once on the command line, but it's
awkward to fix (assuming we want the last value to win) so maybe
we can just ignore this problem - it's a pretty small amount of memory
after all. If so, then making tick_nohz_full_setup() and
isolated_cpu_setup()
both non-static and calling them from task_isolation_setup() might
be the cleanest approach. What do you think?
You asked what happens if nohz_full= is given as well, which is a very
good question. Perhaps the right answer is to have an early_initcall
that suppresses task isolation on any cores that lost their nohz_full
or isolcpus status due to later boot command line arguments (and
generate a console warning, obviously).
>> +
>> + return 1;
>> +}
>> +__setup("task_isolation=", task_isolation_setup);
>> +
>> +/*
>> + * This routine controls whether we can enable task-isolation mode.
>> + * The task must be affinitized to a single task_isolation core or we will
>> + * return EINVAL. Although the application could later re-affinitize
>> + * to a housekeeping core and lose task isolation semantics, this
>> + * initial test should catch 99% of bugs with task placement prior to
>> + * enabling task isolation.
>> + */
>> +int task_isolation_set(unsigned int flags)
>> +{
>> + if (cpumask_weight(tsk_cpus_allowed(current)) != 1 ||
>> + !task_isolation_possible(smp_processor_id()))
>> + return -EINVAL;
>> +
>> + current->task_isolation_flags = flags;
>> + return 0;
>> +}
> What if we concurrently change the task's affinity? Also it seems that preemption
> isn't disabled, so we can also migrate concurrently. I'm surprised you haven't
> seen warnings with smp_processor_id().
>
> Also we should protect against task's affinity change when task_isolation_flags
> is set.
I talked about this a bit when you raised it for the v8 patch series:
http://lkml.kernel.org/r/562FA8FD.8080502@ezchip.com
I'd be curious to hear your take on the arguments I made there.
You're absolutely right about the preemption warnings, which I only fixed
a few days ago. In this case I use raw_smp_processor_id() since with a
fixed single-core cpu affinity, we're not going anywhere, so the warning
from smp_processor_id() would be bogus. And although technically it is
still correct (racing with another task resetting the task affinity on this
one), it is in any case equivalent to having that other task reset the
affinity
on return from the prctl(), which I've already claimed isn't an interesting
use case to try to handle. But let me know what you think!
>> +
>> +/*
>> + * In task isolation mode we try to return to userspace only after
>> + * attempting to make sure we won't be interrupted again. To handle
>> + * the periodic scheduler tick, we test to make sure that the tick is
>> + * stopped, and if it isn't yet, we request a reschedule so that if
>> + * another task needs to run to completion first, it can do so.
>> + * Similarly, if any other subsystems require quiescing, we will need
>> + * to do that before we return to userspace.
>> + */
>> +bool _task_isolation_ready(void)
>> +{
>> + WARN_ON_ONCE(!irqs_disabled());
>> +
>> + /* If we need to drain the LRU cache, we're not ready. */
>> + if (lru_add_drain_needed(smp_processor_id()))
>> + return false;
>> +
>> + /* If vmstats need updating, we're not ready. */
>> + if (!vmstat_idle())
>> + return false;
>> +
>> + /* Request rescheduling unless we are in full dynticks mode. */
>> + if (!tick_nohz_tick_stopped()) {
>> + set_tsk_need_resched(current);
> I'm not sure doing this will help getting the tick to get stopped.
Well, I don't know that there is anything else we CAN do, right? If there's
another task that can run, great - it may be that that's why full dynticks
isn't happening yet. Or, it might be that we're waiting for an RCU tick and
there's nothing else we can do, in which case we basically spend our time
going around through the scheduler code and back out to the
task_isolation_ready() test, but again, there's really nothing else more
useful we can be doing at this point. Once the RCU tick fires (or whatever
it was that was preventing full dynticks from engaging), we will pass this
test and return to user space.
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
next prev parent reply other threads:[~2016-01-19 20:45 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-04 19:34 [PATCH v9 00/13] support "task_isolation" mode for nohz_full Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 01/13] vmstat: provide a function to quiet down the diff processing Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 02/13] vmstat: add vmstat_idle function Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 03/13] lru_add_drain_all: factor out lru_add_drain_needed Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 04/13] task_isolation: add initial support Chris Metcalf
2016-01-19 15:42 ` Frederic Weisbecker
2016-01-19 20:45 ` Chris Metcalf [this message]
2016-01-28 0:28 ` Frederic Weisbecker
2016-01-29 18:18 ` Chris Metcalf
2016-01-30 21:11 ` Frederic Weisbecker
2016-02-11 19:24 ` Chris Metcalf
2016-03-04 12:56 ` Frederic Weisbecker
2016-03-09 19:39 ` Chris Metcalf
2016-04-08 13:56 ` Frederic Weisbecker
2016-04-08 16:34 ` Chris Metcalf
2016-04-12 18:41 ` Chris Metcalf
2016-04-22 13:16 ` Frederic Weisbecker
2016-04-25 20:36 ` Chris Metcalf
2016-05-26 1:07 ` Frederic Weisbecker
2016-06-03 19:32 ` Chris Metcalf
2016-06-29 15:18 ` Frederic Weisbecker
2016-07-01 20:59 ` Chris Metcalf
2016-07-05 14:41 ` Frederic Weisbecker
2016-07-05 17:47 ` Christoph Lameter
2016-01-04 19:34 ` [PATCH v9 05/13] task_isolation: support PR_TASK_ISOLATION_STRICT mode Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 06/13] task_isolation: add debug boot flag Chris Metcalf
2016-01-04 22:52 ` Steven Rostedt
2016-01-04 23:42 ` Chris Metcalf
2016-01-05 13:42 ` Steven Rostedt
2016-01-04 19:34 ` [PATCH v9 07/13] arch/x86: enable task isolation functionality Chris Metcalf
2016-01-04 21:02 ` [PATCH v9bis " Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 08/13] arch/arm64: adopt prepare_exit_to_usermode() model from x86 Chris Metcalf
2016-01-04 20:33 ` Mark Rutland
2016-01-04 21:01 ` Chris Metcalf
2016-01-05 17:21 ` Mark Rutland
2016-01-05 17:33 ` [PATCH 1/2] arm64: entry: remove pointless SPSR mode check Mark Rutland
2016-01-06 12:15 ` Catalin Marinas
2016-01-05 17:33 ` [PATCH 2/2] arm64: factor work_pending state machine to C Mark Rutland
2016-01-05 18:53 ` Chris Metcalf
2016-01-06 12:30 ` Catalin Marinas
2016-01-06 12:47 ` Mark Rutland
2016-01-06 13:43 ` Mark Rutland
2016-01-06 14:17 ` Catalin Marinas
2016-01-04 22:31 ` [PATCH v9 08/13] arch/arm64: adopt prepare_exit_to_usermode() model from x86 Andy Lutomirski
2016-01-05 18:01 ` Mark Rutland
2016-01-04 19:34 ` [PATCH v9 09/13] arch/arm64: enable task isolation functionality Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 10/13] arch/tile: adopt prepare_exit_to_usermode() model from x86 Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 11/13] arch/tile: move user_exit() to early kernel entry sequence Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 12/13] arch/tile: enable task isolation functionality Chris Metcalf
2016-01-04 19:34 ` [PATCH v9 13/13] arm, tile: turn off timer tick for oneshot_stopped state Chris Metcalf
2016-01-11 21:15 ` [PATCH v9 00/13] support "task_isolation" mode for nohz_full Chris Metcalf
2016-01-12 10:07 ` Will Deacon
2016-01-12 17:49 ` Chris Metcalf
2016-01-13 10:44 ` Ingo Molnar
2016-01-13 21:19 ` Chris Metcalf
2016-01-20 13:27 ` Mark Rutland
2016-01-12 10:53 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=569EA050.5030106@ezchip.com \
--to=cmetcalf@ezchip.com \
--cc=akpm@linux-foundation.org \
--cc=catalin.marinas@arm.com \
--cc=cl@linux.com \
--cc=fweisbec@gmail.com \
--cc=giladb@ezchip.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=mingo@kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=viresh.kumar@linaro.org \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).