* [PATCH V2 1/8] timer: track pinned timers with TIMER_PINNED flag
2014-04-04 8:35 [PATCH V2 0/8] cpusets: Isolate CPUs via sysfs using cpusets Viresh Kumar
@ 2014-04-04 8:35 ` Viresh Kumar
2014-04-04 8:35 ` [PATCH V2 5/8] hrtimer: don't migrate pinned timers Viresh Kumar
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
In order to quiesce a CPU on which isolation is required, we need to move away
all the timers queued on it. There are two types of timers queued on any CPU:
those pinned to that CPU, and those that can run on any CPU but happen to be
queued on the CPU in question. Only the second type must be migrated away from
a CPU entering the quiesce state.
For this, we need some basic infrastructure in the timer core to identify which
timers are pinned and which are not.
Hence, this patch adds another flag bit, TIMER_PINNED, which is set only for
timers pinned to a CPU.
It also removes the 'pinned' parameter of __mod_timer(), as it is no longer required.
NOTE: One functional change worth mentioning:
Existing behavior: add_timer_on() followed by multiple mod_timer() calls wouldn't
keep the timer pinned to the CPU passed to add_timer_on().
New behavior: add_timer_on() followed by multiple mod_timer() calls pins the
timer to the CPU running mod_timer().
I didn't give much attention to this, as mod_timer_on() should be called for
timers queued with add_timer_on(). Though, if required, we can simply clear the
TIMER_PINNED flag in mod_timer().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
include/linux/timer.h | 10 ++++++----
kernel/timer.c | 27 ++++++++++++++++++++-------
2 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..2962403 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -49,7 +49,7 @@ extern struct tvec_base boot_tvec_bases;
#endif
/*
- * Note that all tvec_bases are at least 4 byte aligned and lower two bits
+ * Note that all tvec_bases are at least 8 byte aligned and lower three bits
* of base in timer_list is guaranteed to be zero. Use them for flags.
*
* A deferrable timer will work normally when the system is busy, but
@@ -61,14 +61,18 @@ extern struct tvec_base boot_tvec_bases;
* the completion of the running instance from IRQ handlers, for example,
* by calling del_timer_sync().
*
+ * A pinned timer is allowed to run only on the cpu mentioned and shouldn't be
+ * migrated to any other CPU.
+ *
* Note: The irq disabled callback execution is a special case for
* workqueue locking issues. It's not meant for executing random crap
* with interrupts disabled. Abuse is monitored!
*/
#define TIMER_DEFERRABLE 0x1LU
#define TIMER_IRQSAFE 0x2LU
+#define TIMER_PINNED 0x4LU
-#define TIMER_FLAG_MASK 0x3LU
+#define TIMER_FLAG_MASK 0x7LU
#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
.entry = { .prev = TIMER_ENTRY_STATIC }, \
@@ -179,8 +183,6 @@ extern int mod_timer_pinned(struct timer_list *timer, unsigned long expires);
extern void set_timer_slack(struct timer_list *time, int slack_hz);
-#define TIMER_NOT_PINNED 0
-#define TIMER_PINNED 1
/*
* The jiffies value which is added to now, when there is no timer
* in the timer wheel:
diff --git a/kernel/timer.c b/kernel/timer.c
index d13eb56..e8bcaff 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -104,6 +104,11 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
}
+static inline unsigned int tbase_get_pinned(struct tvec_base *base)
+{
+ return ((unsigned int)(unsigned long)base & TIMER_PINNED);
+}
+
static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
{
return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -117,6 +122,13 @@ timer_set_base(struct timer_list *timer, struct tvec_base *new_base)
timer->base = (struct tvec_base *)((unsigned long)(new_base) | flags);
}
+static inline void
+timer_set_flags(struct timer_list *timer, unsigned int flags)
+{
+ timer->base = (struct tvec_base *)((unsigned long)(timer->base) |
+ flags);
+}
+
static unsigned long round_jiffies_common(unsigned long j, int cpu,
bool force_up)
{
@@ -742,8 +754,7 @@ static struct tvec_base *lock_timer_base(struct timer_list *timer,
}
static inline int
-__mod_timer(struct timer_list *timer, unsigned long expires,
- bool pending_only, int pinned)
+__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
{
struct tvec_base *base, *new_base;
unsigned long flags;
@@ -760,7 +771,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
debug_activate(timer, expires);
- cpu = get_nohz_timer_target(pinned);
+ cpu = get_nohz_timer_target(tbase_get_pinned(timer->base));
new_base = per_cpu(tvec_bases, cpu);
if (base != new_base) {
@@ -802,7 +813,7 @@ out_unlock:
*/
int mod_timer_pending(struct timer_list *timer, unsigned long expires)
{
- return __mod_timer(timer, expires, true, TIMER_NOT_PINNED);
+ return __mod_timer(timer, expires, true);
}
EXPORT_SYMBOL(mod_timer_pending);
@@ -877,7 +888,7 @@ int mod_timer(struct timer_list *timer, unsigned long expires)
if (timer_pending(timer) && timer->expires == expires)
return 1;
- return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
+ return __mod_timer(timer, expires, false);
}
EXPORT_SYMBOL(mod_timer);
@@ -905,7 +916,8 @@ int mod_timer_pinned(struct timer_list *timer, unsigned long expires)
if (timer->expires == expires && timer_pending(timer))
return 1;
- return __mod_timer(timer, expires, false, TIMER_PINNED);
+ timer_set_flags(timer, TIMER_PINNED);
+ return __mod_timer(timer, expires, false);
}
EXPORT_SYMBOL(mod_timer_pinned);
@@ -944,6 +956,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
+ timer_set_flags(timer, TIMER_PINNED);
spin_lock_irqsave(&base->lock, flags);
timer_set_base(timer, base);
debug_activate(timer, timer->expires);
@@ -1493,7 +1506,7 @@ signed long __sched schedule_timeout(signed long timeout)
expire = timeout + jiffies;
setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
- __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
+ __mod_timer(&timer, expire, false);
schedule();
del_singleshot_timer_sync(&timer);
--
1.7.12.rc2.18.g61b472e
* [PATCH V2 5/8] hrtimer: don't migrate pinned timers
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
migrate_hrtimers() is called when a CPU goes down and its timers need to be
migrated to some other CPU. It's the responsibility of the users of an hrtimer
to remove it before control reaches migrate_hrtimers().
For pinned hrtimers that are still queued at that point, the best we can do is
skip migrating them and warn the user.
That's all this patch does.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
kernel/hrtimer.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index c5a4bf4..853dd8c 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1640,6 +1640,7 @@ static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
{
struct hrtimer *timer;
struct timerqueue_node *node;
+ int is_pinned;
while ((node = timerqueue_getnext(&old_base->active))) {
timer = container_of(node, struct hrtimer, node);
@@ -1652,6 +1653,15 @@ static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
* under us on another CPU
*/
__remove_hrtimer(timer, HRTIMER_STATE_MIGRATE, 0);
+
+ is_pinned = timer->state & HRTIMER_STATE_PINNED;
+
+ /* Check if CPU still has pinned timers */
+ if (unlikely(WARN(is_pinned,
+ "%s: can't migrate pinned timer: %p, deactivating it\n",
+ __func__, timer)))
+ continue;
+
timer->base = new_base;
/*
* Enqueue the timers on the new cpu. This does not
--
1.7.12.rc2.18.g61b472e
* [PATCH V2 6/8] hrtimer: create hrtimer_quiesce_cpu() to isolate CPU from hrtimers
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
To isolate CPUs from hrtimers via sysfs using cpusets, we need some support
from the hrtimer core, i.e. a routine hrtimer_quiesce_cpu() which migrates away
all the unpinned hrtimers but doesn't touch the pinned ones.
This patch creates that routine.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
include/linux/hrtimer.h | 3 +++
kernel/hrtimer.c | 47 +++++++++++++++++++++++++++++++++++++++--------
2 files changed, 42 insertions(+), 8 deletions(-)
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 9fdb67b..0718753 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -350,6 +350,9 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
/* Exported timer functions: */
+/* To be used from cpusets, only */
+extern void hrtimer_quiesce_cpu(void *cpup);
+
/* Initialize timers: */
extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
enum hrtimer_mode mode);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 853dd8c..e8cd1db 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1633,17 +1633,21 @@ static void init_hrtimers_cpu(int cpu)
hrtimer_init_hres(cpu_base);
}
-#ifdef CONFIG_HOTPLUG_CPU
-
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_CPUSETS)
static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
- struct hrtimer_clock_base *new_base)
+ struct hrtimer_clock_base *new_base,
+ bool remove_pinned)
{
struct hrtimer *timer;
struct timerqueue_node *node;
+ struct timerqueue_head pinned;
int is_pinned;
+ timerqueue_init_head(&pinned);
+
while ((node = timerqueue_getnext(&old_base->active))) {
timer = container_of(node, struct hrtimer, node);
+
BUG_ON(hrtimer_callback_running(timer));
debug_deactivate(timer);
@@ -1655,6 +1659,10 @@ static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
__remove_hrtimer(timer, HRTIMER_STATE_MIGRATE, 0);
is_pinned = timer->state & HRTIMER_STATE_PINNED;
+ if (!remove_pinned && is_pinned) {
+ timerqueue_add(&pinned, &timer->node);
+ continue;
+ }
/* Check if CPU still has pinned timers */
if (unlikely(WARN(is_pinned,
@@ -1676,18 +1684,24 @@ static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
/* Clear the migration state bit */
timer->state &= ~HRTIMER_STATE_MIGRATE;
}
+
+ /* Re-queue pinned timers for non-hotplug usecase */
+ while ((node = timerqueue_getnext(&pinned))) {
+ timer = container_of(node, struct hrtimer, node);
+
+ timerqueue_del(&pinned, &timer->node);
+ enqueue_hrtimer(timer);
+ timer->state &= ~HRTIMER_STATE_MIGRATE;
+ }
}
-static void migrate_hrtimers(int scpu)
+static void __migrate_hrtimers(int scpu, bool remove_pinned)
{
struct hrtimer_cpu_base *old_base, *new_base;
struct hrtimer_clock_base *clock_base;
unsigned int active_bases;
int i;
- BUG_ON(cpu_online(scpu));
- tick_cancel_sched_timer(scpu);
-
local_irq_disable();
old_base = &per_cpu(hrtimer_bases, scpu);
new_base = &__get_cpu_var(hrtimer_bases);
@@ -1700,7 +1714,8 @@ static void migrate_hrtimers(int scpu)
for_each_active_base(i, clock_base, old_base, active_bases)
migrate_hrtimer_list(clock_base,
- &new_base->clock_base[clock_base->index]);
+ &new_base->clock_base[clock_base->index],
+ remove_pinned);
raw_spin_unlock(&old_base->lock);
raw_spin_unlock(&new_base->lock);
@@ -1709,9 +1724,25 @@ static void migrate_hrtimers(int scpu)
__hrtimer_peek_ahead_timers();
local_irq_enable();
}
+#endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */
+#ifdef CONFIG_HOTPLUG_CPU
+static void migrate_hrtimers(int scpu)
+{
+ BUG_ON(cpu_online(scpu));
+ tick_cancel_sched_timer(scpu);
+
+ __migrate_hrtimers(scpu, true);
+}
#endif /* CONFIG_HOTPLUG_CPU */
+#ifdef CONFIG_CPUSETS
+void hrtimer_quiesce_cpu(void *cpup)
+{
+ __migrate_hrtimers(*(int *)cpup, false);
+}
+#endif /* CONFIG_CPUSETS */
+
static int hrtimer_cpu_notify(struct notifier_block *self,
unsigned long action, void *hcpu)
{
--
1.7.12.rc2.18.g61b472e
* [PATCH V2 7/8] cpuset: Create sysfs file: cpusets.quiesce to isolate CPUs
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
For networking applications, platforms need to provide one CPU for each
user-space data-plane thread. These CPUs shouldn't be interrupted by the kernel
at all, unless userspace has requested some functionality. Currently,
background kernel activities run on almost every CPU (timers, hrtimers,
watchdogs, etc.), and these need to be migrated to other CPUs.
To achieve that, this patch adds another option to cpusets: 'quiesce'. Writing
'1' to this file migrates unbound/unpinned timers and hrtimers away from the
CPUs of the cpuset in question, and disallows queueing of any new unpinned
timers/hrtimers on the isolated CPUs (handled in the next patch). Writing '0'
disables isolation of the CPUs in the current cpuset, and unpinned
timers/hrtimers are allowed on them again.
Currently, only timers and hrtimers are migrated. Other kernel infrastructure
can follow later if required.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Documentation/cgroups/cpusets.txt | 19 ++++++++--
include/linux/cpuset.h | 8 +++++
kernel/cpuset.c | 76 +++++++++++++++++++++++++++++++++++++++
3 files changed, 101 insertions(+), 2 deletions(-)
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 7740038..8c1078b 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -22,7 +22,8 @@ CONTENTS:
1.6 What is memory spread ?
1.7 What is sched_load_balance ?
1.8 What is sched_relax_domain_level ?
- 1.9 How do I use cpusets ?
+ 1.9 What is quiesce?
+ 1.10 How do I use cpusets ?
2. Usage Examples and Syntax
2.1 Basic Usage
2.2 Adding/removing cpus
@@ -581,7 +582,21 @@ If your situation is:
then increasing 'sched_relax_domain_level' would benefit you.
-1.9 How do I use cpusets ?
+1.9 What is quiesce ?
+--------------------------------------
+We need to migrate away all the background kernel activities (Unbound) for
+systems requiring isolation of cores (HPC, Real time, networking, etc). After
+creating cpusets, you can write 1 or 0 to cpuset.quiesce file.
+
+Writing '1': on this file would migrate unbound/unpinned timers and hrtimers
+away from the CPUs of the cpuset in question. Also it would disallow addition of
+any new unpinned timers & hrtimers to isolated CPUs.
+
+Writing '0': will disable isolation of CPUs in current cpuset and unpinned
+timers/hrtimers would be allowed in future on these CPUs.
+
+
+1.10 How do I use cpusets ?
--------------------------
In order to minimize the impact of cpusets on critical kernel
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 3fe661f..1ce0775 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -15,6 +15,13 @@
#ifdef CONFIG_CPUSETS
+extern cpumask_var_t cpuset_quiesced_cpus_mask;
+
+static inline bool cpu_quiesced(int cpu)
+{
+ return cpumask_test_cpu(cpu, cpuset_quiesced_cpus_mask);
+}
+
extern int number_of_cpusets; /* How many cpusets are defined in system? */
extern int cpuset_init(void);
@@ -123,6 +130,7 @@ static inline void set_mems_allowed(nodemask_t nodemask)
#else /* !CONFIG_CPUSETS */
+static inline bool cpu_quiesced(int cpu) { return 0; }
static inline int cpuset_init(void) { return 0; }
static inline void cpuset_init_smp(void) {}
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4410ac6..256cf11 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -43,10 +43,12 @@
#include <linux/pagemap.h>
#include <linux/proc_fs.h>
#include <linux/rcupdate.h>
+#include <linux/tick.h>
#include <linux/sched.h>
#include <linux/seq_file.h>
#include <linux/security.h>
#include <linux/slab.h>
+#include <linux/smp.h>
#include <linux/spinlock.h>
#include <linux/stat.h>
#include <linux/string.h>
@@ -150,6 +152,7 @@ typedef enum {
CS_SCHED_LOAD_BALANCE,
CS_SPREAD_PAGE,
CS_SPREAD_SLAB,
+ CS_QUIESCE,
} cpuset_flagbits_t;
/* convenient tests for these bits */
@@ -193,6 +196,14 @@ static inline int is_spread_slab(const struct cpuset *cs)
return test_bit(CS_SPREAD_SLAB, &cs->flags);
}
+static inline int is_cpu_quiesced(const struct cpuset *cs)
+{
+ return test_bit(CS_QUIESCE, &cs->flags);
+}
+
+/* Mask of CPUs which have requested isolation */
+cpumask_var_t cpuset_quiesced_cpus_mask;
+
static struct cpuset top_cpuset = {
.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
(1 << CS_MEM_EXCLUSIVE)),
@@ -1261,6 +1272,53 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
}
/**
+ * quiesce_cpuset - Move unbound timers/hrtimers away from cpuset.cpus
+ * @cs: cpuset to be quiesced
+ *
+ * For isolating a core with cpusets we require all unbound timers/hrtimers to
+ * move away from isolated core. We migrate these to one of the CPUs which
+ * hasn't isolated itself yet. And the CPU is selected by
+ * smp_call_function_any() routine.
+ *
+ * Currently we are only migrating timers and hrtimers away.
+ */
+static int quiesce_cpuset(struct cpuset *cs, int turning_on)
+{
+ int from_cpu;
+ cpumask_t cpumask;
+
+ /* Fail if we are already in the requested state */
+ if (!(is_cpu_quiesced(cs) ^ turning_on))
+ return -EINVAL;
+
+ if (!turning_on) {
+ cpumask_andnot(cpuset_quiesced_cpus_mask,
+ cpuset_quiesced_cpus_mask, cs->cpus_allowed);
+ return 0;
+ }
+
+ cpumask_andnot(&cpumask, cpu_online_mask, cs->cpus_allowed);
+ cpumask_andnot(&cpumask, &cpumask, cpuset_quiesced_cpus_mask);
+
+ if (cpumask_empty(&cpumask)) {
+ pr_err("%s: Couldn't find a CPU to migrate to\n", __func__);
+ return -EPERM;
+ }
+
+ cpumask_or(cpuset_quiesced_cpus_mask, cpuset_quiesced_cpus_mask,
+ cs->cpus_allowed);
+
+ for_each_cpu(from_cpu, cs->cpus_allowed) {
+ smp_call_function_any(&cpumask, hrtimer_quiesce_cpu, &from_cpu,
+ 1);
+ smp_call_function_any(&cpumask, timer_quiesce_cpu, &from_cpu,
+ 1);
+ }
+
+ return 0;
+}
+
+/**
* cpuset_change_flag - make a task's spread flags the same as its cpuset's
* @tsk: task to be updated
* @data: cpuset to @tsk belongs to
@@ -1326,6 +1384,9 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
if (err < 0)
goto out;
+ if (bit == CS_QUIESCE && quiesce_cpuset(cs, turning_on))
+ goto out;
+
err = heap_init(&heap, PAGE_SIZE, GFP_KERNEL, NULL);
if (err < 0)
goto out;
@@ -1597,6 +1658,7 @@ typedef enum {
FILE_MEMORY_PRESSURE,
FILE_SPREAD_PAGE,
FILE_SPREAD_SLAB,
+ FILE_CPU_QUIESCE,
} cpuset_filetype_t;
static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
@@ -1640,6 +1702,9 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
case FILE_SPREAD_SLAB:
retval = update_flag(CS_SPREAD_SLAB, cs, val);
break;
+ case FILE_CPU_QUIESCE:
+ retval = update_flag(CS_QUIESCE, cs, val);
+ break;
default:
retval = -EINVAL;
break;
@@ -1791,6 +1856,8 @@ static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
return is_spread_page(cs);
case FILE_SPREAD_SLAB:
return is_spread_slab(cs);
+ case FILE_CPU_QUIESCE:
+ return is_cpu_quiesced(cs);
default:
BUG();
}
@@ -1908,6 +1975,13 @@ static struct cftype files[] = {
.private = FILE_MEMORY_PRESSURE_ENABLED,
},
+ {
+ .name = "quiesce",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_CPU_QUIESCE,
+ },
+
{ } /* terminate */
};
@@ -2065,6 +2139,8 @@ int __init cpuset_init(void)
if (!alloc_cpumask_var(&cpus_attach, GFP_KERNEL))
BUG();
+ BUG_ON(!zalloc_cpumask_var(&cpuset_quiesced_cpus_mask, GFP_KERNEL));
+
number_of_cpusets = 1;
return 0;
}
--
1.7.12.rc2.18.g61b472e
* [PATCH V2 8/8] sched: don't queue timers on quiesced CPUs
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
Cpusets now have a cpuset.quiesce sysfs file, with which CPUs can opt to
isolate themselves from background kernel activities like timers and hrtimers.
get_nohz_timer_target() is used to find a suitable CPU to fire a timer on. To
guarantee that new timers aren't queued on quiesced CPUs, this routine must be
modified.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
kernel/sched/core.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c0339e2..b235af2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -557,17 +557,18 @@ void resched_cpu(int cpu)
*/
int get_nohz_timer_target(int pinned)
{
- int cpu = smp_processor_id();
- int i;
+ int cpu = smp_processor_id(), i;
struct sched_domain *sd;
- if (pinned || !get_sysctl_timer_migration() || !idle_cpu(cpu))
+ if (pinned || !get_sysctl_timer_migration() ||
+ !(idle_cpu(cpu) || cpu_quiesced(cpu)))
return cpu;
rcu_read_lock();
for_each_domain(cpu, sd) {
for_each_cpu(i, sched_domain_span(sd)) {
- if (!idle_cpu(i)) {
+ /* Don't push timers to quiesced CPUs */
+ if (!(cpu_quiesced(i) || idle_cpu(i))) {
cpu = i;
goto unlock;
}
--
1.7.12.rc2.18.g61b472e
* [PATCH V2 2/8] timer: don't migrate pinned timers
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
migrate_timers() is called when a CPU goes down and its timers need to be
migrated to some other CPU. It's the responsibility of the users of a timer to
remove it before control reaches migrate_timers().
For pinned timers that are still queued at that point, the best we can do is
skip migrating them and warn the user.
That's all this patch does.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
kernel/timer.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/timer.c b/kernel/timer.c
index e8bcaff..6c3a371 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1606,11 +1606,21 @@ static int init_timers_cpu(int cpu)
static void migrate_timer_list(struct tvec_base *new_base, struct list_head *head)
{
struct timer_list *timer;
+ int is_pinned;
while (!list_empty(head)) {
timer = list_first_entry(head, struct timer_list, entry);
/* We ignore the accounting on the dying cpu */
detach_timer(timer, false);
+
+ is_pinned = tbase_get_pinned(timer->base);
+
+ /* Check if CPU still has pinned timers */
+ if (unlikely(WARN(is_pinned,
+ "%s: can't migrate pinned timer: %p, deactivating it\n",
+ __func__, timer)))
+ continue;
+
timer_set_base(timer, new_base);
internal_add_timer(new_base, timer);
}
--
1.7.12.rc2.18.g61b472e
* [PATCH V2 3/8] timer: create timer_quiesce_cpu() to isolate CPU from timers
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
To isolate CPUs from timers via sysfs using cpusets, we need some support from
the timer core, i.e. a routine timer_quiesce_cpu() which migrates away all the
unpinned timers but doesn't touch the pinned ones.
This patch creates that routine.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
include/linux/timer.h | 3 +++
kernel/timer.c | 54 ++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 46 insertions(+), 11 deletions(-)
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 2962403..1588a4f 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -196,6 +196,9 @@ extern void set_timer_slack(struct timer_list *time, int slack_hz);
*/
extern unsigned long get_next_timer_interrupt(unsigned long now);
+/* To be used from cpusets, only */
+extern void timer_quiesce_cpu(void *cpup);
+
/*
* Timer-statistics info:
*/
diff --git a/kernel/timer.c b/kernel/timer.c
index 6c3a371..4676a07 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1602,18 +1602,27 @@ static int init_timers_cpu(int cpu)
return 0;
}
-#ifdef CONFIG_HOTPLUG_CPU
-static void migrate_timer_list(struct tvec_base *new_base, struct list_head *head)
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_CPUSETS)
+static void migrate_timer_list(struct tvec_base *new_base,
+ struct list_head *head, bool remove_pinned)
{
struct timer_list *timer;
+ struct list_head pinned_list;
int is_pinned;
+ INIT_LIST_HEAD(&pinned_list);
+
while (!list_empty(head)) {
timer = list_first_entry(head, struct timer_list, entry);
- /* We ignore the accounting on the dying cpu */
- detach_timer(timer, false);
is_pinned = tbase_get_pinned(timer->base);
+ if (!remove_pinned && is_pinned) {
+ list_move_tail(&timer->entry, &pinned_list);
+ continue;
+ } else {
+ /* We ignore the accounting on the dying cpu */
+ detach_timer(timer, false);
+ }
/* Check if CPU still has pinned timers */
if (unlikely(WARN(is_pinned,
@@ -1624,15 +1633,18 @@ static void migrate_timer_list(struct tvec_base *new_base, struct list_head *hea
timer_set_base(timer, new_base);
internal_add_timer(new_base, timer);
}
+
+ if (!list_empty(&pinned_list))
+ list_splice_tail(&pinned_list, head);
}
-static void migrate_timers(int cpu)
+/* Migrate timers from 'cpu' to this_cpu */
+static void __migrate_timers(int cpu, bool remove_pinned)
{
struct tvec_base *old_base;
struct tvec_base *new_base;
int i;
- BUG_ON(cpu_online(cpu));
old_base = per_cpu(tvec_bases, cpu);
new_base = get_cpu_var(tvec_bases);
/*
@@ -1645,20 +1657,40 @@ static void migrate_timers(int cpu)
BUG_ON(old_base->running_timer);
for (i = 0; i < TVR_SIZE; i++)
- migrate_timer_list(new_base, old_base->tv1.vec + i);
+ migrate_timer_list(new_base, old_base->tv1.vec + i,
+ remove_pinned);
for (i = 0; i < TVN_SIZE; i++) {
- migrate_timer_list(new_base, old_base->tv2.vec + i);
- migrate_timer_list(new_base, old_base->tv3.vec + i);
- migrate_timer_list(new_base, old_base->tv4.vec + i);
- migrate_timer_list(new_base, old_base->tv5.vec + i);
+ migrate_timer_list(new_base, old_base->tv2.vec + i,
+ remove_pinned);
+ migrate_timer_list(new_base, old_base->tv3.vec + i,
+ remove_pinned);
+ migrate_timer_list(new_base, old_base->tv4.vec + i,
+ remove_pinned);
+ migrate_timer_list(new_base, old_base->tv5.vec + i,
+ remove_pinned);
}
spin_unlock(&old_base->lock);
spin_unlock_irq(&new_base->lock);
put_cpu_var(tvec_bases);
}
+#endif /* CONFIG_HOTPLUG_CPU || CONFIG_CPUSETS */
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void migrate_timers(int cpu)
+{
+ BUG_ON(cpu_online(cpu));
+ __migrate_timers(cpu, true);
+}
#endif /* CONFIG_HOTPLUG_CPU */
+#ifdef CONFIG_CPUSETS
+void timer_quiesce_cpu(void *cpup)
+{
+ __migrate_timers(*(int *)cpup, false);
+}
+#endif /* CONFIG_CPUSETS */
+
static int timer_cpu_notify(struct notifier_block *self,
unsigned long action, void *hcpu)
{
--
1.7.12.rc2.18.g61b472e
* [PATCH V2 4/8] hrtimer: update timer->state with 'pinned' information
From: Viresh Kumar @ 2014-04-04 8:35 UTC (permalink / raw)
To: tglx, fweisbec, peterz, mingo, tj, lizefan
Cc: linaro-kernel, linaro-networking, Arvind.Chauhan, linux-kernel,
cgroups, Viresh Kumar
'Pinned' information is now required in migrate_hrtimers(), as we can migrate
unpinned timers away without a hotplug (i.e. with cpuset.quiesce), and so we
need to identify pinned timers, which must not be migrated.
This patch reuses the timer->state variable for this flag, as it has enough
free bits available. There is no point increasing the size of struct hrtimer by
adding another field.
Signed-off-by: Viresh Kumar <viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
include/linux/hrtimer.h | 3 +++
kernel/hrtimer.c | 12 ++++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 435ac4c..9fdb67b 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -55,6 +55,7 @@ enum hrtimer_restart {
* 0x01 enqueued into rbtree
* 0x02 callback function running
* 0x04 timer is migrated to another cpu
+ * 0x08 timer is pinned to a cpu
*
* Special cases:
* 0x03 callback function running and enqueued
@@ -81,6 +82,8 @@ enum hrtimer_restart {
#define HRTIMER_STATE_ENQUEUED 0x01
#define HRTIMER_STATE_CALLBACK 0x02
#define HRTIMER_STATE_MIGRATE 0x04
+#define HRTIMER_PINNED_SHIFT 3
+#define HRTIMER_STATE_PINNED (1 << HRTIMER_PINNED_SHIFT)
/**
* struct hrtimer - the basic hrtimer structure
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d62fe32..c5a4bf4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -905,7 +905,11 @@ static void __remove_hrtimer(struct hrtimer *timer, unsigned long newstate,
hrtimer_force_reprogram(base->cpu_base, 1);
}
#endif
- timer->state = newstate;
+ /*
+ * We need to preserve PINNED state here, otherwise we may end up
+ * migrating pinned hrtimers as well.
+ */
+ timer->state = newstate | (timer->state & HRTIMER_STATE_PINNED);
}
/* remove hrtimer, called with base lock held */
@@ -970,6 +974,10 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
timer_stats_hrtimer_set_start_info(timer);
+ /* Update pinned state */
+ timer->state &= ~HRTIMER_STATE_PINNED;
+ timer->state |= !!(mode & HRTIMER_MODE_PINNED) << HRTIMER_PINNED_SHIFT;
+
enqueue_hrtimer(timer);
/*
@@ -1227,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
* hrtimer_start_range_ns() or in hrtimer_interrupt()
*/
if (restart != HRTIMER_NORESTART) {
- BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
+ BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
enqueue_hrtimer(timer);
}
--
1.7.12.rc2.18.g61b472e
* Re: [PATCH V2 0/8] cpusets: Isolate CPUs via sysfs using cpusets
[not found] ` <cover.1396599474.git.viresh.kumar-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
` (2 preceding siblings ...)
2014-04-04 8:35 ` [PATCH V2 4/8] hrtimer: update timer->state with 'pinned' information Viresh Kumar
@ 2014-04-06 8:30 ` Mike Galbraith
2014-04-07 4:11 ` Viresh Kumar
3 siblings, 1 reply; 13+ messages in thread
From: Mike Galbraith @ 2014-04-06 8:30 UTC (permalink / raw)
To: Viresh Kumar
Cc: tglx-hfZtesqFncYOwBW4kG4KsQ, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
peterz-wEGCiKHe2LqWVfeAwA7xHQ, mingo-DgEjT+Ai2ygdnm+yROfE0A,
tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
linaro-kernel-cunTk1MwBs8s++Sfvej+rw,
linaro-networking-QSEj5FYQhm4dnm+yROfE0A,
Arvind.Chauhan-5wv7dgnIgG8, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA
On Fri, 2014-04-04 at 14:05 +0530, Viresh Kumar wrote:
> We need to migrate away all the background kernel activities (Unbound) for
> systems requiring isolation of cores (HPC, Real time, networking, etc). After
> creating cpusets, you can write 1 or 0 to cpuset.quiesce file.
I wonder if adding a quiesce switch is really necessary.
Seems to me that if you don't have load balancing turned off, you can't
be very concerned about perturbation, so this should be tied into the
load balancing on/off switch as an extension to isolating cores from the
#1 perturbation source, the scheduler.
I also didn't notice a check for is_cpu_exclusive() at a glance, which
would be a bug, but one that would go away if this additional isolation
were coupled to the existing isolation switch.
-Mike
* Re: [PATCH V2 0/8] cpusets: Isolate CPUs via sysfs using cpusets
2014-04-06 8:30 ` [PATCH V2 0/8] cpusets: Isolate CPUs via sysfs using cpusets Mike Galbraith
@ 2014-04-07 4:11 ` Viresh Kumar
[not found] ` <CAKohpomW3jpQEqFxD3m_br8GchJyq9-KSxW2x7JLCRqDgf4SAg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2014-04-07 4:11 UTC (permalink / raw)
To: Mike Galbraith
Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra,
Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel,
Linaro Networking, Arvind Chauhan, Linux Kernel Mailing List,
Cgroups
Hi Mike,
On 6 April 2014 14:00, Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
> I wonder if adding a quiesce switch is really necessary.
>
> Seems to me that if you don't have load balancing turned off, you can't
> be very concerned about perturbation, so this should be tied into the
> load balancing on/off switch as an extension to isolating cores from the
> #1 perturbation source, the scheduler.
It's more about not running any background activity on these CPUs that can be avoided. So even if add_timer() is issued from an isolated CPU, the timer should go to the set chosen for background activity, unless add_timer_on() was issued, in which case the user wants that code to execute on the isolated core.
Probably, yes, people would also disable load balancing between these cpusets to avoid migration of tasks to the isolated cores. At least that's how we are using it :)
> I also didn't notice a check for is_cpu_exclusive() at a glance, which
> would be a bug, but one that would go away if this additional isolation
> were coupled to the existing isolation switch.
Yeah, there is no check for that. But I didn't get your point completely. Why do I need to check for exclusivity on the isolated CPUs? So that the same CPU isn't both isolated and non-isolated in two separate sets?
Thanks for your feedback.
--
viresh