* [PATCH v6 00/20] kthread: Use kthread worker API more widely
@ 2016-04-14 15:14 Petr Mladek
2016-04-14 15:14 ` [PATCH v6 19/20] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Petr Mladek @ 2016-04-14 15:14 UTC (permalink / raw)
To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
Peter Zijlstra
Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Vlastimil Babka,
linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Petr Mladek, Catalin Marinas,
linux-watchdog-u79uwXL29TY76Z2rM5mHXA, Corey Minyard,
openipmi-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Doug Ledford,
Sean Hefty, Hal Rosenstock, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
Maxim Levitsky, Zhang Rui, Eduardo Valentin,
Jacob Pan <jacob.jun.pa>
My intention is to make it easier to manipulate and maintain kthreads.
Especially, I want to replace all the custom main cycles with a
generic one. Also I want to make the kthreads sleep in a consistent
state in a common place when there is no work.
My first attempt was with a brand new API (iterant kthread), see
http://thread.gmane.org/gmane.linux.kernel.api/11892 . But I was
directed to improve the existing kthread worker API. This is
the 4th iteration of the new direction.
1st..10th patches: improve the existing kthread worker API
11th..16th, 18th, 20th patches: convert several kthreads into
the kthread worker API, namely: khugepaged, ring buffer
benchmark, hung_task, kmemleak, ipmi, IB/fmr_pool,
memstick/r592, intel_powerclamp
17th, 19th patches: do some preparation steps; they usually do
some clean up that makes sense even without the conversion.
Changes against v5:
+ removed spin_trylock() from delayed_kthread_work_timer_fn();
instead temporarily released worker->lock when calling
del_timer_sync(); made sure that any queueing was blocked
by work->canceling in the meantime
+ used 0th byte for KTW_FREEZABLE to reduce confusion
+ fixed warnings in comments reported by make htmldocs
+ sigh, there was no easy way to create an empty va_list
that would work on all architectures; decided to make
@namefmt generic in create_kthread_worker_on_cpu()
+ converted khungtaskd a better way; it was inspired by
the recent changes that appeared in 4.6-rc1
Changes against v4:
+ added worker->delayed_work_list; it simplified the check
for pending work; we no longer need the new timer_active()
function; also we do not need the link work->timer. On the
other hand we need to distinguish between the normal and
the delayed work by a boolean parameter passed to
the common functions, e.g. __cancel_kthread_work_sync()
+ replaced most try_lock repeat cycles with a WARN_ON();
the API does not allow using a work with more than one worker;
so such a situation would be a bug; it removed the
complex try_lock_kthread_work() function that supported
more modes;
+ renamed kthread_work_pending() to queuing_blocked();
added this function later when really needed
+ renamed try_to_cancel_kthread_work() to __cancel_kthread_work();
in fact, this is a common implementation for the async cancel()
function
+ removed a dull check for invalid cpu number in
create_kthread_worker_on_cpu(); removed some other unnecessary
code structures as suggested by Tejun
+ consistently used bool return value in all new __cancel functions
+ fixed ordering of cpu and flags parameters in
create_kthread_worker_on_cpu() vs. create_kthread_worker()
+ used memset in the init_kthread_worker()
+ updated many comments as suggested by Tejun and as
required by the above changes
+ removed obsolete patch adding timer_active()
+ removed obsolete patch for using try_lock in flush_kthread_worker()
+ double checked all existing users of kthread worker API
that they reinitialized the work when the worker was started
and would not print false warnings; all looked fine
+ added taken acks for the Intel Powerclamp conversion
Changes against v3:
+ allow to free struct kthread_work from its callback; do not touch
the struct from the worker post-mortem; as a side effect, the structure
must be reinitialized when the worker gets restarted; updated
khugepaged, and kmemleak accordingly
+ call del_timer_sync() with worker->lock; instead, detect canceling
in the timer callback and give up an attempt to get the lock there;
do busy loop with spin_is_locked() to reduce cache bouncing
+ renamed ipmi+func() -> ipmi_kthread_worker_func() as suggested
by Corey
+ added some collected Reviewed-by
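To illustrate the v3 item about freeing struct kthread_work from its
callback, here is a minimal userspace sketch in C. It is not the code
from this series; all fw_* names are hypothetical. It only shows the
idea: the worker unlinks the work and copies the function pointer
before the call, and never dereferences the work afterwards, so the
callback is free to release it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fw_work;
typedef void (*fw_func_t)(struct fw_work *work);

/* Hypothetical miniature of a work item: callback plus list link. */
struct fw_work {
	fw_func_t func;
	struct fw_work *next;
};

/* Hypothetical miniature of a worker: pending list plus a counter. */
struct fw_worker {
	struct fw_work *head;
	unsigned int executed;
};

/* Push at the head; ordering does not matter for this sketch. */
static void fw_queue(struct fw_worker *worker, struct fw_work *work)
{
	work->next = worker->head;
	worker->head = work;
}

static void fw_run_all(struct fw_worker *worker)
{
	while (worker->head) {
		struct fw_work *work = worker->head;
		fw_func_t func = work->func;	/* copy before the call */

		worker->head = work->next;	/* unlink before the call */
		func(work);			/* may free(work) */
		/* *work must not be touched from here on */
		worker->executed++;
	}
}

/* A callback that releases its own work item. */
static void fw_free_self(struct fw_work *work)
{
	free(work);
}

/* Queue n self-freeing works, run them, report how many were executed. */
static unsigned int fw_demo(unsigned int n)
{
	struct fw_worker worker = { NULL, 0 };
	unsigned int i;

	for (i = 0; i < n; i++) {
		struct fw_work *work = malloc(sizeof(*work));

		assert(work != NULL);
		work->func = fw_free_self;
		fw_queue(&worker, work);
	}
	fw_run_all(&worker);
	return worker.executed;
}
```

The side effect mentioned in the item follows from this: since the
worker keeps no state in the work after the call, a work that is reused
with a restarted worker has to be initialized again by its owner.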
Changes against v2:
+ used worker->lock to synchronize the operations with the work
instead of the PENDING bit as suggested by Tejun Heo; it simplified
the implementation in several ways
+ added timer_active(); used it together with del_timer_sync()
to cancel the work a less tricky way
+ removed the controversial conversion of the RCU kthreads
+ added several other examples: hung_task, kmemleak, ipmi,
IB/fmr_pool, memstick/r592, intel_powerclamp
+ the helper fixes for the ring buffer benchmark have been improved
as suggested by Steven; they are already in the Linus tree now
+ fixed a possible race between the check for existing khugepaged
worker and queuing the work
Changes against v1:
+ remove wrappers to manipulate the scheduling policy and priority
+ remove questionable wakeup_and_destroy_kthread_worker() variant
+ do not check for chained work when draining the queue
+ allocate struct kthread worker in create_kthread_work() and
use more simple checks for running worker
+ add support for delayed kthread works and use them instead
of waiting inside the works
+ rework the "unrelated" fixes for the ring buffer benchmark
as discussed in the 1st RFC; also sent separately
+ convert also the consumer in the ring buffer benchmark
I have tested this patch set against the stable Linus tree
for 4.6-rc3.
Comments against v5 can be found at
http://thread.gmane.org/gmane.linux.kernel.mm/146726
Petr Mladek (20):
kthread/smpboot: Do not park in kthread_create_on_cpu()
kthread: Allow to call __kthread_create_on_node() with va_list args
kthread: Add create_kthread_worker*()
kthread: Add drain_kthread_worker()
kthread: Add destroy_kthread_worker()
kthread: Detect when a kthread work is used by more workers
kthread: Initial support for delayed kthread work
kthread: Allow to cancel kthread work
kthread: Allow to modify delayed kthread work
kthread: Better support freezable kthread workers
mm/huge_page: Convert khugepaged() into kthread worker API
ring_buffer: Convert benchmark kthreads into kthread worker API
hung_task: Convert hungtaskd into kthread worker API
kmemleak: Convert kmemleak kthread into kthread worker API
ipmi: Convert kipmi kthread into kthread worker API
IB/fmr_pool: Convert the cleanup thread into kthread worker API
memstick/r592: Better synchronize debug messages in r592_io kthread
memstick/r592: convert r592_io kthread into kthread worker API
thermal/intel_powerclamp: Remove duplicated code that starts the
kthread
thermal/intel_powerclamp: Convert the kthread to kthread worker API
drivers/char/ipmi/ipmi_si_intf.c | 121 ++++----
drivers/infiniband/core/fmr_pool.c | 54 ++--
drivers/memstick/host/r592.c | 61 ++--
drivers/memstick/host/r592.h | 5 +-
drivers/thermal/intel_powerclamp.c | 302 ++++++++++--------
include/linux/kthread.h | 57 ++++
kernel/hung_task.c | 83 +++--
kernel/kthread.c | 571 +++++++++++++++++++++++++++++++----
kernel/smpboot.c | 5 +
kernel/trace/ring_buffer_benchmark.c | 133 ++++----
mm/huge_memory.c | 138 +++++----
mm/kmemleak.c | 87 +++---
12 files changed, 1106 insertions(+), 511 deletions(-)
CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
CC: linux-watchdog-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: Corey Minyard <minyard-HInyCGIudOg@public.gmane.org>
CC: openipmi-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
CC: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: Maxim Levitsky <maximlevitsky-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Zhang Rui <rui.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Eduardo Valentin <edubezval-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Jacob Pan <jacob.jun.pan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
CC: linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: Sebastian Andrzej Siewior <bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
--
1.8.5.6
* [PATCH v6 19/20] thermal/intel_powerclamp: Remove duplicated code that starts the kthread
2016-04-14 15:14 [PATCH v6 00/20] kthread: Use kthread worker API more widely Petr Mladek
@ 2016-04-14 15:14 ` Petr Mladek
2016-04-14 15:14 ` [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
2016-04-22 18:30 ` [PATCH v6 00/20] kthread: Use kthread worker API more widely Tejun Heo
2 siblings, 0 replies; 9+ messages in thread
From: Petr Mladek @ 2016-04-14 15:14 UTC (permalink / raw)
To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
Peter Zijlstra
Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
Zhang Rui, Eduardo Valentin, Jacob Pan, Sebastian Andrzej Siewior,
linux-pm
This patch removes a code duplication. It does not modify
the functionality.
Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: linux-pm@vger.kernel.org
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
drivers/thermal/intel_powerclamp.c | 45 +++++++++++++++++---------------------
1 file changed, 20 insertions(+), 25 deletions(-)
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 6c79588251d5..cb32c38f9828 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -505,10 +505,27 @@ static void poll_pkg_cstate(struct work_struct *dummy)
schedule_delayed_work(&poll_pkg_cstate_work, HZ);
}
+static void start_power_clamp_thread(unsigned long cpu)
+{
+ struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
+ struct task_struct *thread;
+
+ thread = kthread_create_on_node(clamp_thread,
+ (void *) cpu,
+ cpu_to_node(cpu),
+ "kidle_inject/%ld", cpu);
+ if (IS_ERR(thread))
+ return;
+
+ /* bind to cpu here */
+ kthread_bind(thread, cpu);
+ wake_up_process(thread);
+ *p = thread;
+}
+
static int start_power_clamp(void)
{
unsigned long cpu;
- struct task_struct *thread;
/* check if pkg cstate counter is completely 0, abort in this case */
if (!has_pkg_state_counter()) {
@@ -530,20 +547,7 @@ static int start_power_clamp(void)
/* start one thread per online cpu */
for_each_online_cpu(cpu) {
- struct task_struct **p =
- per_cpu_ptr(powerclamp_thread, cpu);
-
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%ld", cpu);
- /* bind to cpu here */
- if (likely(!IS_ERR(thread))) {
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *p = thread;
- }
-
+ start_power_clamp_thread(cpu);
}
put_online_cpus();
@@ -575,7 +579,6 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
unsigned long cpu = (unsigned long)hcpu;
- struct task_struct *thread;
struct task_struct **percpu_thread =
per_cpu_ptr(powerclamp_thread, cpu);
@@ -584,15 +587,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
switch (action) {
case CPU_ONLINE:
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%lu", cpu);
- if (likely(!IS_ERR(thread))) {
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *percpu_thread = thread;
- }
+ start_power_clamp_thread(cpu);
/* prefer BSP as controlling CPU */
if (cpu == 0) {
control_cpu = 0;
--
1.8.5.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API
2016-04-14 15:14 [PATCH v6 00/20] kthread: Use kthread worker API more widely Petr Mladek
2016-04-14 15:14 ` [PATCH v6 19/20] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
@ 2016-04-14 15:14 ` Petr Mladek
2016-08-25 8:33 ` Sebastian Andrzej Siewior
2016-04-22 18:30 ` [PATCH v6 00/20] kthread: Use kthread worker API more widely Tejun Heo
2 siblings, 1 reply; 9+ messages in thread
From: Petr Mladek @ 2016-04-14 15:14 UTC (permalink / raw)
To: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
Peter Zijlstra
Cc: Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
linux-mm, Vlastimil Babka, linux-api, linux-kernel, Petr Mladek,
Zhang Rui, Eduardo Valentin, Jacob Pan, Sebastian Andrzej Siewior,
linux-pm
Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing,
awakening. In many cases it is hard to say in which state the kthread
is, and sometimes the checks are done the wrong way.
The plan is to convert kthreads to the kthread_worker or workqueue
API. This allows splitting the functionality into separate operations
and gives the code a better structure. It also defines a clean state
in which no locks are taken and no IRQs are blocked, so the kthread
may sleep or even be safely migrated.
The kthread worker API is useful when we want to have a dedicated
single thread for the work. It helps to make sure that it is
available when needed. It also allows better control, e.g.
defining a scheduling priority.
This patch converts the intel powerclamp kthreads to the kthread
worker API because they need good control over the assigned
CPUs.
IMHO, the most natural way is to split one cycle into two works.
The first one does some balancing and lets the CPU work the normal
way for some time. The second work checks what the CPU has done
in the meantime and puts it into a C-state to reach the required
idle time ratio. The delay between the two works is achieved
by the delayed kthread work.
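As a worked example of how that delay is derived, here is a small
userspace sketch of the arithmetic used by the balancing work in the
diff below (interval, roundup, sleeptime). The function names are
hypothetical, and ratio_percent stands for target_ratio plus the
compensation value; this only mirrors the calculation, it is not the
driver code.

```c
#include <assert.h>

/* Round x up to the next multiple of step, like the kernel's roundup(). */
static unsigned long round_up_to(unsigned long x, unsigned long step)
{
	assert(step > 0);
	return ((x + step - 1) / step) * step;
}

/*
 * Length of one whole period in jiffies. Being idle for duration_jiffies
 * out of every such period gives ratio_percent of idle time.
 */
static unsigned long clamp_interval(unsigned long duration_jiffies,
				    unsigned int ratio_percent)
{
	assert(ratio_percent > 0);
	return duration_jiffies * 100 / ratio_percent;
}

/* Delay until the next aligned period start; at least one jiffy. */
static unsigned long clamp_sleeptime(unsigned long now, unsigned long interval)
{
	long sleeptime = (long)(round_up_to(now, interval) - now);

	return sleeptime <= 0 ? 1 : (unsigned long)sleeptime;
}
```

With an injection duration of 6 jiffies and a 25% target, the period
is 24 jiffies. At jiffies == 1000 the next aligned boundary is 1008,
so the idle injection work would be queued with a delay of 8 jiffies.
Aligning to multiples of the interval is what lets the per-CPU workers
idle at roughly the same time.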
The two works have to share some data that used to be local
variables of the single kthread function. This is achieved
by the new per-CPU struct powerclamp_worker_data. It might look
like a complication. On the other hand, the long original kthread
function was not nice either.
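The sharing relies on embedding both works in one structure and
recovering that structure from the work pointer with container_of().
A minimal userspace sketch of the pattern follows. The demo_* types are
hypothetical stand-ins, container_of() is re-implemented with
offsetof(), and the delayed work is simplified to a plain work.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of a work item: just a callback slot. */
struct demo_work {
	void (*func)(struct demo_work *work);
};

/* Hypothetical miniature of the data shared by the two works. */
struct demo_worker_data {
	unsigned int cpu;
	unsigned int count;		/* formerly a local loop variable */
	unsigned int seen_count;	/* what the second work observed */
	struct demo_work balancing_work;
	struct demo_work idle_injection_work;
};

static void demo_balancing_func(struct demo_work *work)
{
	struct demo_worker_data *w_data =
		container_of(work, struct demo_worker_data, balancing_work);

	w_data->count++;
}

static void demo_idle_injection_func(struct demo_work *work)
{
	struct demo_worker_data *w_data =
		container_of(work, struct demo_worker_data,
			     idle_injection_work);

	/* The second work sees the value stored by the first one. */
	w_data->seen_count = w_data->count;
}

static void demo_init(struct demo_worker_data *d, unsigned int cpu)
{
	d->cpu = cpu;
	d->count = 0;
	d->seen_count = 0;
	d->balancing_work.func = demo_balancing_func;
	d->idle_injection_work.func = demo_idle_injection_func;
}

/* Run one balancing + injection cycle directly, without any queue. */
static void demo_run_cycle(struct demo_worker_data *d)
{
	assert(d->balancing_work.func && d->idle_injection_work.func);
	d->balancing_work.func(&d->balancing_work);
	d->idle_injection_work.func(&d->idle_injection_work);
}
```

Each callback gets only a pointer to its own work, yet both end up
with the same per-CPU structure, which is why no extra argument or
global lookup is needed.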
The patch tries to avoid extra init and cleanup works. All the
actions might be done outside the thread. They are moved
to the functions that create or destroy the worker. Especially,
I checked that the timers are assigned to the right CPU.
The two works queue each other, which makes it a bit tricky to
break the cycle when we want to stop the worker. We use the global
and per-worker "clamping" variables to make sure that the re-queuing
eventually stops. We also cancel the works to make it faster.
Note that the canceling is not reliable because the handling
of the two variables and the queuing are not synchronized via a lock.
But it is not a big deal because it is just an optimization.
The job is stopped faster than before in most cases.
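The stop logic can be modeled in userspace as a tiny state machine.
This is only a sketch under simplifying assumptions: a single-slot
"queue", no delay, no real threads, and hypothetical sim_* names. It
shows why the flags alone are enough for the ping-pong to end: a work
that is already queued still runs once, but it does not queue its
partner again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_step { SIM_NONE, SIM_BALANCING, SIM_INJECTION };

/* Hypothetical model of one worker; at most one work is pending. */
struct sim_worker {
	enum sim_step pending;
	bool clamping;			/* per-worker flag */
	unsigned int balancing_runs;
	unsigned int injection_runs;
};

static bool sim_global_clamping;	/* models the global "clamping" */

static bool sim_may_requeue(const struct sim_worker *w)
{
	return sim_global_clamping && w->clamping;
}

static void sim_balancing(struct sim_worker *w)
{
	w->balancing_runs++;
	w->pending = sim_may_requeue(w) ? SIM_INJECTION : SIM_NONE;
}

static void sim_injection(struct sim_worker *w)
{
	w->injection_runs++;
	w->pending = sim_may_requeue(w) ? SIM_BALANCING : SIM_NONE;
}

static void sim_start(struct sim_worker *w)
{
	sim_global_clamping = true;
	w->clamping = true;
	w->balancing_runs = 0;
	w->injection_runs = 0;
	w->pending = SIM_BALANCING;
}

/* Process up to max_steps pending works; return how many actually ran. */
static unsigned int sim_run(struct sim_worker *w, unsigned int max_steps)
{
	unsigned int ran = 0;

	assert(w != NULL);
	while (w->pending != SIM_NONE && ran < max_steps) {
		enum sim_step step = w->pending;

		w->pending = SIM_NONE;
		if (step == SIM_BALANCING)
			sim_balancing(w);
		else
			sim_injection(w);
		ran++;
	}
	return ran;
}
```

Clearing either flag bounds the remaining work to the one item that
was already pending. In this model, canceling would only remove that
last item earlier, which matches the description of the cancel calls
as an optimization.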
Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: linux-pm@vger.kernel.org
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
drivers/thermal/intel_powerclamp.c | 287 ++++++++++++++++++++++---------------
1 file changed, 168 insertions(+), 119 deletions(-)
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index cb32c38f9828..c6f4058a572b 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -86,11 +86,27 @@ static unsigned int control_cpu; /* The cpu assigned to collect stat and update
*/
static bool clamping;
+static const struct sched_param sparam = {
+ .sched_priority = MAX_USER_RT_PRIO / 2,
+};
+struct powerclamp_worker_data {
+ struct kthread_worker *worker;
+ struct kthread_work balancing_work;
+ struct delayed_kthread_work idle_injection_work;
+ struct timer_list wakeup_timer;
+ unsigned int cpu;
+ unsigned int count;
+ unsigned int guard;
+ unsigned int window_size_now;
+ unsigned int target_ratio;
+ unsigned int duration_jiffies;
+ bool clamping;
+};
-static struct task_struct * __percpu *powerclamp_thread;
+static struct powerclamp_worker_data * __percpu worker_data;
static struct thermal_cooling_device *cooling_dev;
static unsigned long *cpu_clamping_mask; /* bit map for tracking per cpu
- * clamping thread
+ * clamping kthread worker
*/
static unsigned int duration;
@@ -368,100 +384,102 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio,
return set_target_ratio + guard <= current_ratio;
}
-static int clamp_thread(void *arg)
+static void clamp_balancing_func(struct kthread_work *work)
{
- int cpunr = (unsigned long)arg;
- DEFINE_TIMER(wakeup_timer, noop_timer, 0, 0);
- static const struct sched_param param = {
- .sched_priority = MAX_USER_RT_PRIO/2,
- };
- unsigned int count = 0;
- unsigned int target_ratio;
+ struct powerclamp_worker_data *w_data;
+ int sleeptime;
+ unsigned long target_jiffies;
+ unsigned int compensation;
+ int interval; /* jiffies to sleep for each attempt */
- set_bit(cpunr, cpu_clamping_mask);
- set_freezable();
- init_timer_on_stack(&wakeup_timer);
- sched_setscheduler(current, SCHED_FIFO, &param);
-
- while (true == clamping && !kthread_should_stop() &&
- cpu_online(cpunr)) {
- int sleeptime;
- unsigned long target_jiffies;
- unsigned int guard;
- unsigned int compensation = 0;
- int interval; /* jiffies to sleep for each attempt */
- unsigned int duration_jiffies = msecs_to_jiffies(duration);
- unsigned int window_size_now;
-
- try_to_freeze();
- /*
- * make sure user selected ratio does not take effect until
- * the next round. adjust target_ratio if user has changed
- * target such that we can converge quickly.
- */
- target_ratio = set_target_ratio;
- guard = 1 + target_ratio/20;
- window_size_now = window_size;
- count++;
+ w_data = container_of(work, struct powerclamp_worker_data,
+ balancing_work);
- /*
- * systems may have different ability to enter package level
- * c-states, thus we need to compensate the injected idle ratio
- * to achieve the actual target reported by the HW.
- */
- compensation = get_compensation(target_ratio);
- interval = duration_jiffies*100/(target_ratio+compensation);
-
- /* align idle time */
- target_jiffies = roundup(jiffies, interval);
- sleeptime = target_jiffies - jiffies;
- if (sleeptime <= 0)
- sleeptime = 1;
- schedule_timeout_interruptible(sleeptime);
- /*
- * only elected controlling cpu can collect stats and update
- * control parameters.
- */
- if (cpunr == control_cpu && !(count%window_size_now)) {
- should_skip =
- powerclamp_adjust_controls(target_ratio,
- guard, window_size_now);
- smp_mb();
- }
+ /*
+ * make sure user selected ratio does not take effect until
+ * the next round. adjust target_ratio if user has changed
+ * target such that we can converge quickly.
+ */
+ w_data->target_ratio = READ_ONCE(set_target_ratio);
+ w_data->guard = 1 + w_data->target_ratio / 20;
+ w_data->window_size_now = window_size;
+ w_data->duration_jiffies = msecs_to_jiffies(duration);
+ w_data->count++;
+
+ /*
+ * systems may have different ability to enter package level
+ * c-states, thus we need to compensate the injected idle ratio
+ * to achieve the actual target reported by the HW.
+ */
+ compensation = get_compensation(w_data->target_ratio);
+ interval = w_data->duration_jiffies * 100 /
+ (w_data->target_ratio + compensation);
+
+ /* align idle time */
+ target_jiffies = roundup(jiffies, interval);
+ sleeptime = target_jiffies - jiffies;
+ if (sleeptime <= 0)
+ sleeptime = 1;
+
+ if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+ queue_delayed_kthread_work(w_data->worker,
+ &w_data->idle_injection_work,
+ sleeptime);
+}
+
+static void clamp_idle_injection_func(struct kthread_work *work)
+{
+ struct powerclamp_worker_data *w_data;
+ unsigned long target_jiffies;
+
+ w_data = container_of(work, struct powerclamp_worker_data,
+ idle_injection_work.work);
+
+ /*
+ * only elected controlling cpu can collect stats and update
+ * control parameters.
+ */
+ if (w_data->cpu == control_cpu &&
+ !(w_data->count % w_data->window_size_now)) {
+ should_skip =
+ powerclamp_adjust_controls(w_data->target_ratio,
+ w_data->guard,
+ w_data->window_size_now);
+ smp_mb();
+ }
- if (should_skip)
- continue;
+ if (should_skip)
+ goto balance;
+
+ target_jiffies = jiffies + w_data->duration_jiffies;
+ mod_timer(&w_data->wakeup_timer, target_jiffies);
+ if (unlikely(local_softirq_pending()))
+ goto balance;
+ /*
+ * stop tick sched during idle time, interrupts are still
+ * allowed. thus jiffies are updated properly.
+ */
+ preempt_disable();
+ /* mwait until target jiffies is reached */
+ while (time_before(jiffies, target_jiffies)) {
+ unsigned long ecx = 1;
+ unsigned long eax = target_mwait;
- target_jiffies = jiffies + duration_jiffies;
- mod_timer(&wakeup_timer, target_jiffies);
- if (unlikely(local_softirq_pending()))
- continue;
/*
- * stop tick sched during idle time, interrupts are still
- * allowed. thus jiffies are updated properly.
+ * REVISIT: may call enter_idle() to notify drivers who
+ * can save power during cpu idle. same for exit_idle()
*/
- preempt_disable();
- /* mwait until target jiffies is reached */
- while (time_before(jiffies, target_jiffies)) {
- unsigned long ecx = 1;
- unsigned long eax = target_mwait;
-
- /*
- * REVISIT: may call enter_idle() to notify drivers who
- * can save power during cpu idle. same for exit_idle()
- */
- local_touch_nmi();
- stop_critical_timings();
- mwait_idle_with_hints(eax, ecx);
- start_critical_timings();
- atomic_inc(&idle_wakeup_counter);
- }
- preempt_enable();
+ local_touch_nmi();
+ stop_critical_timings();
+ mwait_idle_with_hints(eax, ecx);
+ start_critical_timings();
+ atomic_inc(&idle_wakeup_counter);
}
- del_timer_sync(&wakeup_timer);
- clear_bit(cpunr, cpu_clamping_mask);
+ preempt_enable();
- return 0;
+balance:
+ if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+ queue_kthread_work(w_data->worker, &w_data->balancing_work);
}
/*
@@ -505,22 +523,58 @@ static void poll_pkg_cstate(struct work_struct *dummy)
schedule_delayed_work(&poll_pkg_cstate_work, HZ);
}
-static void start_power_clamp_thread(unsigned long cpu)
+static void start_power_clamp_worker(unsigned long cpu)
{
- struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
- struct task_struct *thread;
-
- thread = kthread_create_on_node(clamp_thread,
- (void *) cpu,
- cpu_to_node(cpu),
- "kidle_inject/%ld", cpu);
- if (IS_ERR(thread))
+ struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+ struct kthread_worker *worker;
+
+ worker = create_kthread_worker_on_cpu(cpu, KTW_FREEZABLE,
+ "kidle_inject/%ld", cpu);
+ if (IS_ERR(worker))
return;
- /* bind to cpu here */
- kthread_bind(thread, cpu);
- wake_up_process(thread);
- *p = thread;
+ w_data->worker = worker;
+ w_data->count = 0;
+ w_data->cpu = cpu;
+ w_data->clamping = true;
+ set_bit(cpu, cpu_clamping_mask);
+ setup_timer(&w_data->wakeup_timer, noop_timer, 0);
+ sched_setscheduler(worker->task, SCHED_FIFO, &sparam);
+ init_kthread_work(&w_data->balancing_work, clamp_balancing_func);
+ init_delayed_kthread_work(&w_data->idle_injection_work,
+ clamp_idle_injection_func);
+ queue_kthread_work(w_data->worker, &w_data->balancing_work);
+}
+
+static void stop_power_clamp_worker(unsigned long cpu)
+{
+ struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+
+ if (!w_data->worker)
+ return;
+
+ w_data->clamping = false;
+ /*
+ * Make sure that all works that get queued after this point see
+ * the clamping disabled. The counterpart is not needed because
+ * there is an implicit memory barrier when the queued work
+ * is processed.
+ */
+ smp_wmb();
+ cancel_kthread_work_sync(&w_data->balancing_work);
+ cancel_delayed_kthread_work_sync(&w_data->idle_injection_work);
+ /*
+ * The balancing work still might be queued here because
+ * the handling of the "clamping" variable, cancel, and queue
+ * operations are not synchronized via a lock. But it is not
+ * a big deal. The balancing work is fast and destroy kthread
+ * will wait for it.
+ */
+ del_timer_sync(&w_data->wakeup_timer);
+ clear_bit(w_data->cpu, cpu_clamping_mask);
+ destroy_kthread_worker(w_data->worker);
+
+ w_data->worker = NULL;
}
static int start_power_clamp(void)
@@ -545,9 +599,9 @@ static int start_power_clamp(void)
clamping = true;
schedule_delayed_work(&poll_pkg_cstate_work, 0);
- /* start one thread per online cpu */
+ /* start one kthread worker per online cpu */
for_each_online_cpu(cpu) {
- start_power_clamp_thread(cpu);
+ start_power_clamp_worker(cpu);
}
put_online_cpus();
@@ -557,20 +611,17 @@ static int start_power_clamp(void)
static void end_power_clamp(void)
{
int i;
- struct task_struct *thread;
- clamping = false;
/*
- * make clamping visible to other cpus and give per cpu clamping threads
- * sometime to exit, or gets killed later.
+ * Block requeuing in all the kthread workers. They will drain and
+ * stop faster.
*/
- smp_mb();
- msleep(20);
+ clamping = false;
if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
- pr_debug("clamping thread for cpu %d alive, kill\n", i);
- thread = *per_cpu_ptr(powerclamp_thread, i);
- kthread_stop(thread);
+ pr_debug("clamping worker for cpu %d alive, destroy\n",
+ i);
+ stop_power_clamp_worker(i);
}
}
}
@@ -579,15 +630,13 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
unsigned long cpu = (unsigned long)hcpu;
- struct task_struct **percpu_thread =
- per_cpu_ptr(powerclamp_thread, cpu);
if (false == clamping)
goto exit_ok;
switch (action) {
case CPU_ONLINE:
- start_power_clamp_thread(cpu);
+ start_power_clamp_worker(cpu);
/* prefer BSP as controlling CPU */
if (cpu == 0) {
control_cpu = 0;
@@ -598,7 +647,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
if (test_bit(cpu, cpu_clamping_mask)) {
pr_err("cpu %lu dead but powerclamping thread is not\n",
cpu);
- kthread_stop(*percpu_thread);
+ stop_power_clamp_worker(cpu);
}
if (cpu == control_cpu) {
control_cpu = smp_processor_id();
@@ -785,8 +834,8 @@ static int __init powerclamp_init(void)
window_size = 2;
register_hotcpu_notifier(&powerclamp_cpu_notifier);
- powerclamp_thread = alloc_percpu(struct task_struct *);
- if (!powerclamp_thread) {
+ worker_data = alloc_percpu(struct powerclamp_worker_data);
+ if (!worker_data) {
retval = -ENOMEM;
goto exit_unregister;
}
@@ -806,7 +855,7 @@ static int __init powerclamp_init(void)
return 0;
exit_free_thread:
- free_percpu(powerclamp_thread);
+ free_percpu(worker_data);
exit_unregister:
unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
exit_free:
@@ -819,7 +868,7 @@ static void __exit powerclamp_exit(void)
{
unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
end_power_clamp();
- free_percpu(powerclamp_thread);
+ free_percpu(worker_data);
thermal_cooling_device_unregister(cooling_dev);
kfree(cpu_clamping_mask);
--
1.8.5.6
* Re: [PATCH v6 00/20] kthread: Use kthread worker API more widely
2016-04-14 15:14 [PATCH v6 00/20] kthread: Use kthread worker API more widely Petr Mladek
2016-04-14 15:14 ` [PATCH v6 19/20] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
2016-04-14 15:14 ` [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
@ 2016-04-22 18:30 ` Tejun Heo
2016-05-11 10:52 ` Petr Mladek
2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2016-04-22 18:30 UTC (permalink / raw)
To: Petr Mladek
Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
linux-mm, Vlastimil Babka, linux-api, linux-kernel,
Catalin Marinas, linux-watchdog, Corey Minyard,
openipmi-developer, Doug Ledford, Sean Hefty, Hal Rosenstock,
linux-rdma, Maxim Levitsky <maximl>
Hello, Petr.
On Thu, Apr 14, 2016 at 05:14:19PM +0200, Petr Mladek wrote:
> My intention is to make it easier to manipulate and maintain kthreads.
> Especially, I want to replace all the custom main cycles with a
> generic one. Also I want to make the kthreads sleep in a consistent
> state in a common place when there is no work.
>
> My first attempt was with a brand new API (iterant kthread), see
> http://thread.gmane.org/gmane.linux.kernel.api/11892 . But I was
> directed to improve the existing kthread worker API. This is
> the 4th iteration of the new direction.
>
> 1st..10th patches: improve the existing kthread worker API
I glanced over them and they generally look good to me. Let's see how
people respond to actual conversions.
Thanks.
--
tejun
* Re: [PATCH v6 00/20] kthread: Use kthread worker API more widely
2016-04-22 18:30 ` [PATCH v6 00/20] kthread: Use kthread worker API more widely Tejun Heo
@ 2016-05-11 10:52 ` Petr Mladek
2016-05-25 20:42 ` Tejun Heo
0 siblings, 1 reply; 9+ messages in thread
From: Petr Mladek @ 2016-05-11 10:52 UTC (permalink / raw)
To: Tejun Heo, Andrew Morton
Cc: Oleg Nesterov, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
Paul E. McKenney, Josh Triplett, Thomas Gleixner, Linus Torvalds,
Jiri Kosina, Borislav Petkov, Michal Hocko, linux-mm,
Vlastimil Babka, linux-api, linux-kernel, Catalin Marinas,
linux-watchdog, Corey Minyard, openipmi-developer, Doug Ledford,
Sean Hefty, Hal Rosenstock, linux-rdma, Maxim Levitsky, Zhang Rui
On Fri 2016-04-22 14:30:40, Tejun Heo wrote:
> Hello, Petr.
>
> On Thu, Apr 14, 2016 at 05:14:19PM +0200, Petr Mladek wrote:
> > My intention is to make it easier to manipulate and maintain kthreads.
> > Especially, I want to replace all the custom main cycles with a
> > generic one. Also I want to make the kthreads sleep in a consistent
> > state in a common place when there is no work.
> >
> > My first attempt was with a brand new API (iterant kthread), see
> > http://thread.gmane.org/gmane.linux.kernel.api/11892 . But I was
> > directed to improve the existing kthread worker API. This is
> > the 4th iteration of the new direction.
> >
> > 1st..10th patches: improve the existing kthread worker API
>
> I glanced over them and they generally look good to me. Let's see how
> people respond to actual conversions.
The part improving the kthread worker API and the intel powerclamp
conversion seem to be ready for the mainline. But it is getting too late
for 4.7.
I am going to resend this part of the patch set separately after
the 4.7 merge window finishes with the aim for 4.8. The other
conversions are spread over many subsystems, so I will send
them separately.
Tejun, may I add your ack for some of the patches, please?
Or do you want to wait for the resend?
Andrew, I wonder if it could go via the -mm tree once I get
the acks.
Best Regards,
Petr
* Re: [PATCH v6 00/20] kthread: Use kthread worker API more widely
2016-05-11 10:52 ` Petr Mladek
@ 2016-05-25 20:42 ` Tejun Heo
0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2016-05-25 20:42 UTC (permalink / raw)
To: Petr Mladek
Cc: Andrew Morton, Oleg Nesterov, Ingo Molnar, Peter Zijlstra,
Steven Rostedt, Paul E. McKenney, Josh Triplett, Thomas Gleixner,
Linus Torvalds, Jiri Kosina, Borislav Petkov, Michal Hocko,
linux-mm@kvack.org, Vlastimil Babka, Linux API, lkml,
Catalin Marinas, linux-watchdog, Corey Minyard,
openipmi-developer, Doug Ledford, Sean Hefty, Hal Rosenstock
On Wed, May 11, 2016 at 6:52 AM, Petr Mladek <pmladek@suse.com> wrote:
> Tejun, may I add your ack for some of the patches, please?
> Or do you want to wait for the resend?
When you repost, I'll explicitly ack the patches.
Thanks!
--
tejun
* Re: [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API
2016-04-14 15:14 ` [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
@ 2016-08-25 8:33 ` Sebastian Andrzej Siewior
2016-08-25 11:37 ` Petr Mladek
0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-08-25 8:33 UTC (permalink / raw)
To: Petr Mladek
Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
Peter Zijlstra, Steven Rostedt, Paul E. McKenney, Josh Triplett,
Thomas Gleixner, Linus Torvalds, Jiri Kosina, Borislav Petkov,
Michal Hocko, linux-mm, Vlastimil Babka, linux-api, linux-kernel,
Zhang Rui, Eduardo Valentin, Jacob Pan, linux-pm
On 2016-04-14 17:14:39 [+0200], Petr Mladek wrote:
> Kthreads are currently implemented as an infinite loop. Each
> has its own variant of checks for terminating, freezing,
> awakening. In many cases it is unclear to say in which state
> it is and sometimes it is done a wrong way.
What is the status of this? This is the last email I received and it is
from April.
Sebastian
* Re: [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API
2016-08-25 8:33 ` Sebastian Andrzej Siewior
@ 2016-08-25 11:37 ` Petr Mladek
2016-08-25 11:44 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 9+ messages in thread
From: Petr Mladek @ 2016-08-25 11:37 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
Peter Zijlstra, Steven Rostedt, Paul E. McKenney, Josh Triplett,
Thomas Gleixner, Linus Torvalds, Jiri Kosina, Borislav Petkov,
Michal Hocko, linux-mm, Vlastimil Babka, linux-api, linux-kernel,
Zhang Rui, Eduardo Valentin, Jacob Pan, linux-pm
On Thu 2016-08-25 10:33:17, Sebastian Andrzej Siewior wrote:
> On 2016-04-14 17:14:39 [+0200], Petr Mladek wrote:
> > Kthreads are currently implemented as an infinite loop. Each
> > has its own variant of checks for terminating, freezing,
> > awakening. In many cases it is unclear to say in which state
> > it is and sometimes it is done a wrong way.
>
> What is the status of this? This is the last email I received and it is
> from April.
There were still some discussions about the kthread worker API.
Anyway, the needed kthread API changes are in Andrew's -mm tree now
and will hopefully be included in 4.9.
I did not want to send the patches using the API before the API
changes are upstream. But I could send the two intel_powerclamp
patches now if you are comfortable with having them on top of
the -mm tree or linux-next.
Best Regards,
Petr
* Re: [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API
2016-08-25 11:37 ` Petr Mladek
@ 2016-08-25 11:44 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-08-25 11:44 UTC (permalink / raw)
To: Petr Mladek
Cc: Andrew Morton, Oleg Nesterov, Tejun Heo, Ingo Molnar,
Peter Zijlstra, Steven Rostedt, Paul E. McKenney, Josh Triplett,
Thomas Gleixner, Linus Torvalds, Jiri Kosina, Borislav Petkov,
Michal Hocko, linux-mm, Vlastimil Babka, linux-api, linux-kernel,
Zhang Rui, Eduardo Valentin, Jacob Pan, linux-pm
On 2016-08-25 13:37:08 [+0200], Petr Mladek wrote:
> There were still some discussions about the kthread worker API.
> Anyway, the needed kthread API changes are in Andrew's -mm tree now
> and will hopefully be included in 4.9.
Thanks for the update.
> I did not want to send the patches using the API before the API
> changes are upstream. But I could send the two intel_powerclamp
> patches now if you are comfortable with having them on top of
> the -mm tree or linux-next.
I am refreshing my hotplug queue and stumbled over my old powerclamp
patch. Please send them (offline) so I can have a look :) And I will add a
note for powerclamp to be v4.9 or so.
> Best Regards,
> Petr
Sebastian
end of thread, other threads:[~2016-08-25 11:44 UTC | newest]
Thread overview: 9+ messages
2016-04-14 15:14 [PATCH v6 00/20] kthread: Use kthread worker API more widely Petr Mladek
2016-04-14 15:14 ` [PATCH v6 19/20] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
2016-04-14 15:14 ` [PATCH v6 20/20] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
2016-08-25 8:33 ` Sebastian Andrzej Siewior
2016-08-25 11:37 ` Petr Mladek
2016-08-25 11:44 ` Sebastian Andrzej Siewior
2016-04-22 18:30 ` [PATCH v6 00/20] kthread: Use kthread worker API more widely Tejun Heo
2016-05-11 10:52 ` Petr Mladek
2016-05-25 20:42 ` Tejun Heo