* [RFC/PATCH 0/3] rt: workqueue PI support
@ 2007-10-22 9:50 Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 1/3] rt: rename rt_mutex_setprio to task_setprio Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 9:50 UTC (permalink / raw)
To: linux-kernel
Cc: Daniel Walker, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov
Hi,
I revived the PI-workqueue effort. I took Daniel's plist patch and went about
fixing the only outstanding issue I could remember: barriers.
This patch set compiles and boots on x86_64; not much testing has been done.
--
^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC/PATCH 1/3] rt: rename rt_mutex_setprio to task_setprio
2007-10-22 9:50 [RFC/PATCH 0/3] rt: workqueue PI support Peter Zijlstra
@ 2007-10-22 9:50 ` Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 2/3] rt: PI-workqueue support Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Peter Zijlstra
2 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 9:50 UTC (permalink / raw)
To: linux-kernel
Cc: Daniel Walker, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov, Peter Zijlstra
[-- Attachment #1: rt_mutex_setprio.patch --]
[-- Type: text/plain, Size: 2688 bytes --]
With multiple non-mutex users of this function, it's past time it
got renamed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/sched.h | 7 ++++++-
kernel/rcupreempt-boost.c | 4 ++--
kernel/sched.c | 8 ++------
3 files changed, 10 insertions(+), 9 deletions(-)
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1655,9 +1655,14 @@ extern unsigned int sysctl_sched_compat_
extern unsigned int sysctl_sched_child_runs_first;
extern unsigned int sysctl_sched_features;
+extern void task_setprio(struct task_struct *p, int prio);
+
#ifdef CONFIG_RT_MUTEXES
extern int rt_mutex_getprio(struct task_struct *p);
-extern void rt_mutex_setprio(struct task_struct *p, int prio);
+static inline void rt_mutex_setprio(struct task_struct *p, int prio)
+{
+ task_setprio(p, prio);
+}
extern void rt_mutex_adjust_pi(struct task_struct *p);
#else
static inline int rt_mutex_getprio(struct task_struct *p)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4445,10 +4445,8 @@ long __sched sleep_on_timeout(wait_queue
}
EXPORT_SYMBOL(sleep_on_timeout);
-#ifdef CONFIG_RT_MUTEXES
-
/*
- * rt_mutex_setprio - set the current priority of a task
+ * task_setprio - set the current priority of a task
* @p: task
* @prio: prio value (kernel-internal form)
*
@@ -4457,7 +4455,7 @@ EXPORT_SYMBOL(sleep_on_timeout);
*
* Used by the rt_mutex code to implement priority inheritance logic.
*/
-void rt_mutex_setprio(struct task_struct *p, int prio)
+void task_setprio(struct task_struct *p, int prio)
{
unsigned long flags;
int oldprio, prev_resched, on_rq;
@@ -4522,8 +4520,6 @@ out_unlock:
task_rq_unlock(rq, &flags);
}
-#endif
-
void set_user_nice(struct task_struct *p, long nice)
{
int old_prio, delta, on_rq;
Index: linux-2.6/kernel/rcupreempt-boost.c
===================================================================
--- linux-2.6.orig/kernel/rcupreempt-boost.c
+++ linux-2.6/kernel/rcupreempt-boost.c
@@ -233,7 +233,7 @@ static void rcu_boost_task(struct task_s
if (task->rcu_prio < task->prio) {
rcu_trace_boost_task_boosted(RCU_BOOST_ME);
- rt_mutex_setprio(task, task->rcu_prio);
+ task_setprio(task, task->rcu_prio);
}
}
@@ -325,7 +325,7 @@ void __rcu_preempt_unboost(void)
spin_lock(&curr->pi_lock);
prio = rt_mutex_getprio(curr);
- rt_mutex_setprio(curr, prio);
+ task_setprio(curr, prio);
curr->rcub_rbdp = NULL;
--
* [RFC/PATCH 2/3] rt: PI-workqueue support
2007-10-22 9:50 [RFC/PATCH 0/3] rt: workqueue PI support Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 1/3] rt: rename rt_mutex_setprio to task_setprio Peter Zijlstra
@ 2007-10-22 9:50 ` Peter Zijlstra
2007-10-22 12:00 ` Steven Rostedt
2007-10-22 9:50 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Peter Zijlstra
2 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 9:50 UTC (permalink / raw)
To: linux-kernel
Cc: Daniel Walker, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov, Peter Zijlstra
[-- Attachment #1: rt-workqeue-prio.patch --]
[-- Type: text/plain, Size: 6334 bytes --]
Add support for priority queueing and priority inheritance to the workqueue
infrastructure. This is done by replacing the linear linked worklist with a
priority-sorted plist.
The drawback is that this breaks the workqueue barrier, needed to support
flush_workqueue() and wait_on_work().
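[ Editorial aside: the queueing rule can be pictured with a minimal
user-space sketch -- a plain sorted singly-linked list standing in for the
kernel's plist, with illustrative names throughout. Kernel convention:
a lower prio value means more urgent; the worker prio of 120 used in the
usage note below is merely an assumed default. ]

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only -- not the kernel plist implementation. */
struct worklet {
	int prio;               /* lower value = more urgent */
	struct worklet *next;
};

/* Insert sorted by prio; '<=' keeps FIFO order among equal prios. */
static void worklist_add(struct worklet **head, struct worklet *w)
{
	while (*head && (*head)->prio <= w->prio)
		head = &(*head)->next;
	w->next = *head;
	*head = w;
}

/* The worker always takes the most urgent pending worklet. */
static struct worklet *worklist_pop(struct worklet **head)
{
	struct worklet *w = *head;

	if (w)
		*head = w->next;
	return w;
}

/* PI rule from the patch: boost the worker thread when the head of the
 * list is more urgent than the worker's own prio. */
static int boosted_prio(const struct worklet *head, int thread_prio)
{
	return (head && head->prio < thread_prio) ? head->prio : thread_prio;
}
```

Queueing worklets at prios 10, 99 and 50 makes the worker run them as
10, 50, 99, and a worker sitting at an assumed prio of 120 is boosted to
10 while the queue is non-empty.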
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/workqueue.h | 7 ++++---
kernel/power/poweroff.c | 1 +
kernel/sched.c | 4 ----
kernel/workqueue.c | 40 +++++++++++++++++++++++++---------------
4 files changed, 30 insertions(+), 22 deletions(-)
Index: linux-2.6/include/linux/workqueue.h
===================================================================
--- linux-2.6.orig/include/linux/workqueue.h
+++ linux-2.6/include/linux/workqueue.h
@@ -8,6 +8,7 @@
#include <linux/timer.h>
#include <linux/linkage.h>
#include <linux/bitops.h>
+#include <linux/plist.h>
#include <asm/atomic.h>
struct workqueue_struct;
@@ -26,7 +27,7 @@ struct work_struct {
#define WORK_STRUCT_PENDING 0 /* T if work item pending execution */
#define WORK_STRUCT_FLAG_MASK (3UL)
#define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)
- struct list_head entry;
+ struct plist_node entry;
work_func_t func;
};
@@ -43,7 +44,7 @@ struct execute_work {
#define __WORK_INITIALIZER(n, f) { \
.data = WORK_DATA_INIT(), \
- .entry = { &(n).entry, &(n).entry }, \
+ .entry = PLIST_NODE_INIT(n.entry, MAX_PRIO), \
.func = (f), \
}
@@ -79,7 +80,7 @@ struct execute_work {
#define INIT_WORK(_work, _func) \
do { \
(_work)->data = (atomic_long_t) WORK_DATA_INIT(); \
- INIT_LIST_HEAD(&(_work)->entry); \
+ plist_node_init(&(_work)->entry, -1); \
PREPARE_WORK((_work), (_func)); \
} while (0)
Index: linux-2.6/kernel/power/poweroff.c
===================================================================
--- linux-2.6.orig/kernel/power/poweroff.c
+++ linux-2.6/kernel/power/poweroff.c
@@ -8,6 +8,7 @@
#include <linux/sysrq.h>
#include <linux/init.h>
#include <linux/pm.h>
+#include <linux/sched.h>
#include <linux/workqueue.h>
#include <linux/reboot.h>
Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c
+++ linux-2.6/kernel/workqueue.c
@@ -44,7 +44,7 @@ struct cpu_workqueue_struct {
spinlock_t lock;
- struct list_head worklist;
+ struct plist_head worklist;
wait_queue_head_t more_work;
struct work_struct *current_work;
@@ -127,16 +127,19 @@ struct cpu_workqueue_struct *get_wq_data
static void insert_work(struct cpu_workqueue_struct *cwq,
struct work_struct *work, int tail)
{
+ int prio = current->normal_prio;
+
set_wq_data(work, cwq);
/*
* Ensure that we get the right work->data if we see the
* result of list_add() below, see try_to_grab_pending().
*/
smp_wmb();
- if (tail)
- list_add_tail(&work->entry, &cwq->worklist);
- else
- list_add(&work->entry, &cwq->worklist);
+ plist_node_init(&work->entry, prio);
+ plist_add(&work->entry, &cwq->worklist);
+
+ if (prio < cwq->thread->prio)
+ task_setprio(cwq->thread, prio);
wake_up(&cwq->more_work);
}
@@ -168,7 +171,7 @@ int fastcall queue_work(struct workqueue
int ret = 0, cpu = raw_smp_processor_id();
if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
- BUG_ON(!list_empty(&work->entry));
+ BUG_ON(!plist_node_empty(&work->entry));
__queue_work(wq_per_cpu(wq, cpu), work);
ret = 1;
}
@@ -222,7 +225,7 @@ int queue_delayed_work_on(int cpu, struc
if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
BUG_ON(timer_pending(timer));
- BUG_ON(!list_empty(&work->entry));
+ BUG_ON(!plist_node_empty(&work->entry));
/* This stores cwq for the moment, for the timer_fn */
set_wq_data(work, wq_per_cpu(wq, raw_smp_processor_id()));
@@ -264,13 +267,17 @@ static void run_workqueue(struct cpu_wor
__FUNCTION__, cwq->run_depth);
dump_stack();
}
- while (!list_empty(&cwq->worklist)) {
- struct work_struct *work = list_entry(cwq->worklist.next,
+ while (!plist_head_empty(&cwq->worklist)) {
+ struct work_struct *work = plist_first_entry(&cwq->worklist,
struct work_struct, entry);
work_func_t f = work->func;
+ if (likely(cwq->thread->prio != work->entry.prio))
+ task_setprio(cwq->thread, work->entry.prio);
+
cwq->current_work = work;
- list_del_init(cwq->worklist.next);
+ plist_del(&work->entry, &cwq->worklist);
+ plist_node_init(&work->entry, MAX_PRIO);
spin_unlock_irq(&cwq->lock);
BUG_ON(get_wq_data(work) != cwq);
@@ -283,6 +290,7 @@ static void run_workqueue(struct cpu_wor
spin_lock_irq(&cwq->lock);
cwq->current_work = NULL;
}
+ task_setprio(cwq->thread, current->normal_prio);
cwq->run_depth--;
spin_unlock_irq(&cwq->lock);
}
@@ -301,7 +309,7 @@ static int worker_thread(void *__cwq)
prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
if (!freezing(current) &&
!kthread_should_stop() &&
- list_empty(&cwq->worklist))
+ plist_head_empty(&cwq->worklist))
schedule();
finish_wait(&cwq->more_work, &wait);
@@ -354,7 +362,8 @@ static int flush_cpu_workqueue(struct cp
active = 0;
spin_lock_irq(&cwq->lock);
- if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
+ if (!plist_head_empty(&cwq->worklist) ||
+ cwq->current_work != NULL) {
insert_wq_barrier(cwq, &barr, 1);
active = 1;
}
@@ -413,7 +422,7 @@ static int try_to_grab_pending(struct wo
return ret;
spin_lock_irq(&cwq->lock);
- if (!list_empty(&work->entry)) {
+ if (!plist_node_empty(&work->entry)) {
/*
* This work is queued, but perhaps we locked the wrong cwq.
* In that case we must see the new value after rmb(), see
@@ -421,7 +430,8 @@ static int try_to_grab_pending(struct wo
*/
smp_rmb();
if (cwq == get_wq_data(work)) {
- list_del_init(&work->entry);
+ plist_del(&work->entry, &cwq->worklist);
+ plist_node_init(&work->entry, MAX_PRIO);
ret = 1;
}
}
@@ -747,7 +757,7 @@ init_cpu_workqueue(struct workqueue_stru
cwq->wq = wq;
spin_lock_init(&cwq->lock);
- INIT_LIST_HEAD(&cwq->worklist);
+ plist_head_init(&cwq->worklist, NULL);
init_waitqueue_head(&cwq->more_work);
return cwq;
--
* [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers
2007-10-22 9:50 [RFC/PATCH 0/3] rt: workqueue PI support Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 1/3] rt: rename rt_mutex_setprio to task_setprio Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 2/3] rt: PI-workqueue support Peter Zijlstra
@ 2007-10-22 9:50 ` Peter Zijlstra
2007-10-22 11:48 ` [RFC/PATCH 4/3] rt: PI-workqueue: fixup the barrier prio Peter Zijlstra
2007-10-22 17:34 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Oleg Nesterov
2 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 9:50 UTC (permalink / raw)
To: linux-kernel
Cc: Daniel Walker, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov, Peter Zijlstra
[-- Attachment #1: rt-workqueue-barrier.patch --]
[-- Type: text/plain, Size: 9468 bytes --]
The plist change to the workqueues left the barrier functionality broken.
The barrier is used for two things:
- wait_on_work(), and
- flush_cpu_workqueue().
wait_on_work() - uses the barrier to wait for completion of the currently
executing worklet. This was done by inserting a completion barrier at the very
head of the worklist. With a plist this would be the head of the highest-prio
class.
In order to do that, we extend the plist_add() functionality to allow adding to
the head of a prio class. Another noteworthy point is that this high-prio
worklet must not boost the prio further than the waiting task's prio, even
though we enqueue it at prio 99.
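[ Editorial aside: the head-of-prio insertion reduces to a 'tail' flag on
a sorted insert. A toy sketch of the intended ordering semantics -- not
the lib/plist.c code: ]

```c
#include <assert.h>
#include <stddef.h>

/* tail=1 queues behind existing entries of equal prio (the normal case),
 * tail=0 queues in front of them (the wait_on_work() completion barrier).
 * Illustrative only. */
struct wq_item {
	int prio;
	int id;                 /* just to observe the resulting order */
	struct wq_item *next;
};

static void sorted_add(struct wq_item **head, struct wq_item *n, int tail)
{
	if (tail)
		while (*head && (*head)->prio <= n->prio)
			head = &(*head)->next;
	else
		while (*head && (*head)->prio < n->prio)
			head = &(*head)->next;
	n->next = *head;
	*head = n;
}
```

With two worklets already queued at prio 10, a tail=0 insert at prio 10
lands in front of both, while tail=1 inserts keep FIFO order.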
flush_cpu_workqueue() - is a full ordering barrier, although, as the name
suggests, it is usually used to wait for the worklist to drain. We'll support
the full ordering semantics currently present. This means that:
W10, W22, W65, B, W80, B, W99
[ where Wn is a worklet at prio n, and B the barrier ]
would most likely execute in the following order:
W10@99, W65@99, W22@99, W80@99, W99
[ Wn@m is Wn executed at prio m ]
[ W10 would be first because it can start executing while the others
are being added ]
Whereas without the barriers it would be:
W10@99, W99, W80, W65, W22
The prio ordering of the plist makes it hard to impose an extra order on top.
The solution used is to nest plist structures. The example will look like:
W10, B(B(W65, W22), W80), W99
That is, the barrier will splice the worklist into itself, and enqueue itself
as the next item to run (very first item, highest prio). The barrier will then
run its own plist to completion before 'popping' back to the regular worklist.
To avoid callstack nesting, run_workqueue is taught about this barrier stack.
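[ Editorial aside: the splice-and-nest behaviour can be modelled with a
small sketch -- illustrative data structures only, and recursive where
the real worker iterates the cwq->barrier stack. In this idealized model
everything is queued before the worker runs, so the drain order is the
deterministic 10, 22, 65, 80, 99 rather than the race-dependent order
shown above: ]

```c
#include <assert.h>
#include <stddef.h>

#define MAXW 16

struct wlist;

/* An item with sub != NULL is a barrier carrying its spliced sub-list. */
struct item {
	int prio;
	struct wlist *sub;
};

struct wlist {
	struct item it[MAXW];
	int n;
};

/* Stable insert sorted by prio (lower value first). */
static void enqueue(struct wlist *l, struct item x)
{
	int i = l->n++;

	while (i > 0 && l->it[i - 1].prio > x.prio) {
		l->it[i] = l->it[i - 1];
		i--;
	}
	l->it[i] = x;
}

static void queue_work(struct wlist *l, int prio)
{
	enqueue(l, (struct item){ prio, NULL });
}

/* flush: the barrier splices the whole current worklist into its own
 * sub-list and queues itself at the head (prio 0 here for simplicity). */
static void queue_barrier(struct wlist *l, struct wlist *sub)
{
	*sub = *l;
	l->n = 0;
	enqueue(l, (struct item){ 0, sub });
}

/* Drain a worklist, running a barrier's sub-list to completion before
 * continuing with the enclosing list. */
static void run(struct wlist *l, int *order, int *k)
{
	int i;

	for (i = 0; i < l->n; i++) {
		if (l->it[i].sub)
			run(l->it[i].sub, order, k);
		else
			order[(*k)++] = l->it[i].prio;
	}
	l->n = 0;
}
```

Queueing W10, W22, W65, B, W80, B, W99 in that order produces the nested
structure described above; draining it completes each inner barrier list
before popping back out.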
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
include/linux/plist.h | 14 ++++++
kernel/workqueue.c | 104 ++++++++++++++++++++++++++++++++++++++++++--------
lib/plist.c | 10 +++-
3 files changed, 110 insertions(+), 18 deletions(-)
Index: linux-2.6/include/linux/plist.h
===================================================================
--- linux-2.6.orig/include/linux/plist.h
+++ linux-2.6/include/linux/plist.h
@@ -145,9 +145,21 @@ static inline void plist_node_init(struc
plist_head_init(&node->plist, NULL);
}
-extern void plist_add(struct plist_node *node, struct plist_head *head);
+void __plist_add(struct plist_node *node, struct plist_head *head, int tail);
+
+static inline void plist_add(struct plist_node *node, struct plist_head *head)
+{
+ __plist_add(node, head, 1);
+}
+
extern void plist_del(struct plist_node *node, struct plist_head *head);
+static inline void plist_head_splice(struct plist_head *src, struct plist_head *dst)
+{
+ list_splice_init(&src->prio_list, &dst->prio_list);
+ list_splice_init(&src->node_list, &dst->node_list);
+}
+
/**
* plist_for_each - iterate over the plist
* @pos: the type * to use as a loop counter
Index: linux-2.6/lib/plist.c
===================================================================
--- linux-2.6.orig/lib/plist.c
+++ linux-2.6/lib/plist.c
@@ -72,7 +72,7 @@ static void plist_check_head(struct plis
* @node: &struct plist_node pointer
* @head: &struct plist_head pointer
*/
-void plist_add(struct plist_node *node, struct plist_head *head)
+void __plist_add(struct plist_node *node, struct plist_head *head, int tail)
{
struct plist_node *iter;
@@ -92,7 +92,13 @@ void plist_add(struct plist_node *node,
lt_prio:
list_add_tail(&node->plist.prio_list, &iter->plist.prio_list);
eq_prio:
- list_add_tail(&node->plist.node_list, &iter->plist.node_list);
+ if (likely(tail))
+ list_add_tail(&node->plist.node_list, &iter->plist.node_list);
+ else {
+ iter = list_entry(iter->plist.prio_list.prev,
+ struct plist_node, plist.prio_list);
+ list_add(&node->plist.node_list, &iter->plist.node_list);
+ }
plist_check_head(head);
}
Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c
+++ linux-2.6/kernel/workqueue.c
@@ -36,6 +36,8 @@
#include <asm/uaccess.h>
+struct wq_full_barrier;
+
/*
* The per-CPU workqueue (if single thread, we always use the first
* possible cpu).
@@ -52,6 +54,8 @@ struct cpu_workqueue_struct {
struct task_struct *thread;
int run_depth; /* Detect run_workqueue() recursion depth */
+
+ struct wq_full_barrier *barrier;
} ____cacheline_aligned;
/*
@@ -125,10 +129,8 @@ struct cpu_workqueue_struct *get_wq_data
}
static void insert_work(struct cpu_workqueue_struct *cwq,
- struct work_struct *work, int tail)
+ struct work_struct *work, int prio, int boost_prio, int tail)
{
- int prio = current->normal_prio;
-
set_wq_data(work, cwq);
/*
* Ensure that we get the right work->data if we see the
@@ -136,10 +138,10 @@ static void insert_work(struct cpu_workq
*/
smp_wmb();
plist_node_init(&work->entry, prio);
- plist_add(&work->entry, &cwq->worklist);
+ __plist_add(&work->entry, &cwq->worklist, tail);
- if (prio < cwq->thread->prio)
- task_setprio(cwq->thread, prio);
+ if (boost_prio < cwq->thread->prio)
+ task_setprio(cwq->thread, boost_prio);
wake_up(&cwq->more_work);
}
@@ -150,7 +152,7 @@ static void __queue_work(struct cpu_work
unsigned long flags;
spin_lock_irqsave(&cwq->lock, flags);
- insert_work(cwq, work, 1);
+ insert_work(cwq, work, current->normal_prio, current->normal_prio, 1);
spin_unlock_irqrestore(&cwq->lock, flags);
}
@@ -257,8 +259,19 @@ static void leak_check(void *func)
dump_stack();
}
+struct wq_full_barrier {
+ struct work_struct work;
+ struct plist_head worklist;
+ struct wq_full_barrier *prev_barrier;
+ int prev_prio;
+ struct cpu_workqueue_struct *cwq;
+ struct completion done;
+};
+
static void run_workqueue(struct cpu_workqueue_struct *cwq)
{
+ struct plist_head *worklist = &cwq->worklist;
+
spin_lock_irq(&cwq->lock);
cwq->run_depth++;
if (cwq->run_depth > 3) {
@@ -267,16 +280,25 @@ static void run_workqueue(struct cpu_wor
__FUNCTION__, cwq->run_depth);
dump_stack();
}
- while (!plist_head_empty(&cwq->worklist)) {
- struct work_struct *work = plist_first_entry(&cwq->worklist,
+
+again:
+ while (!plist_head_empty(worklist)) {
+ int prio;
+ struct work_struct *work = plist_first_entry(worklist,
struct work_struct, entry);
work_func_t f = work->func;
- if (likely(cwq->thread->prio != work->entry.prio))
- task_setprio(cwq->thread, work->entry.prio);
+ prio = work->entry.prio;
+ if (unlikely(worklist != &cwq->worklist)) {
+ prio = min(prio, cwq->barrier->prev_prio);
+ prio = min(prio, plist_first(&cwq->worklist)->prio);
+ }
+
+ if (likely(cwq->thread->prio != prio))
+ task_setprio(cwq->thread, prio);
cwq->current_work = work;
- plist_del(&work->entry, &cwq->worklist);
+ plist_del(&work->entry, worklist);
plist_node_init(&work->entry, MAX_PRIO);
spin_unlock_irq(&cwq->lock);
@@ -289,7 +311,27 @@ static void run_workqueue(struct cpu_wor
spin_lock_irq(&cwq->lock);
cwq->current_work = NULL;
+
+ if (unlikely(cwq->barrier))
+ worklist = &cwq->barrier->worklist;
+ }
+
+ if (unlikely(worklist != &cwq->worklist)) {
+ struct wq_full_barrier *barrier = cwq->barrier;
+
+ BUG_ON(!barrier);
+ cwq->barrier = barrier->prev_barrier;
+ complete(&barrier->done);
+
+ if (unlikely(cwq->barrier))
+ worklist = &cwq->barrier->worklist;
+ else
+ worklist = &cwq->worklist;
+
+ if (!plist_head_empty(worklist))
+ goto again;
}
+
task_setprio(cwq->thread, current->normal_prio);
cwq->run_depth--;
spin_unlock_irq(&cwq->lock);
@@ -343,7 +385,38 @@ static void insert_wq_barrier(struct cpu
init_completion(&barr->done);
- insert_work(cwq, &barr->work, tail);
+ insert_work(cwq, &barr->work, 0, current->normal_prio, tail);
+}
+
+static void wq_full_barrier_func(struct work_struct *work)
+{
+ struct wq_full_barrier *barrier =
+ container_of(work, struct wq_full_barrier, work);
+ struct cpu_workqueue_struct *cwq = barrier->cwq;
+
+ spin_lock_irq(&cwq->lock);
+ barrier->prev_barrier = cwq->barrier;
+ if (cwq->barrier) {
+ barrier->prev_prio = min(cwq->barrier->prev_prio,
+ plist_first(&cwq->barrier->worklist)->prio);
+ } else
+ barrier->prev_prio = MAX_PRIO;
+ cwq->barrier = barrier;
+ spin_unlock_irq(&cwq->lock);
+}
+
+static void insert_wq_full_barrier(struct cpu_workqueue_struct *cwq,
+ struct wq_full_barrier *barr)
+{
+ INIT_WORK(&barr->work, wq_full_barrier_func);
+ __set_bit(WORK_STRUCT_PENDING, work_data_bits(&barr->work));
+
+ plist_head_init(&barr->worklist, NULL);
+ plist_head_splice(&cwq->worklist, &barr->worklist);
+ barr->cwq = cwq;
+ init_completion(&barr->done);
+
+ insert_work(cwq, &barr->work, 0, current->normal_prio, 1);
}
static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
@@ -358,13 +431,13 @@ static int flush_cpu_workqueue(struct cp
run_workqueue(cwq);
active = 1;
} else {
- struct wq_barrier barr;
+ struct wq_full_barrier barr;
active = 0;
spin_lock_irq(&cwq->lock);
if (!plist_head_empty(&cwq->worklist) ||
cwq->current_work != NULL) {
- insert_wq_barrier(cwq, &barr, 1);
+ insert_wq_full_barrier(cwq, &barr);
active = 1;
}
spin_unlock_irq(&cwq->lock);
@@ -759,6 +832,7 @@ init_cpu_workqueue(struct workqueue_stru
spin_lock_init(&cwq->lock);
plist_head_init(&cwq->worklist, NULL);
init_waitqueue_head(&cwq->more_work);
+ cwq->barrier = NULL;
return cwq;
}
--
* [RFC/PATCH 4/3] rt: PI-workqueue: fixup the barrier prio
2007-10-22 9:50 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Peter Zijlstra
@ 2007-10-22 11:48 ` Peter Zijlstra
2007-10-22 12:18 ` [RFC/PATCH 5/3] " Peter Zijlstra
2007-10-22 17:34 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Oleg Nesterov
1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 11:48 UTC (permalink / raw)
To: linux-kernel
Cc: Daniel Walker, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov
[-- Attachment #1: Type: text/plain, Size: 1738 bytes --]
Small fix to the PI stuff:
we lost the prio of the barrier waiter.
---
Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c
+++ linux-2.6/kernel/workqueue.c
@@ -264,6 +264,7 @@ struct wq_full_barrier {
struct plist_head worklist;
struct wq_full_barrier *prev_barrier;
int prev_prio;
+ int waiter_prio;
struct cpu_workqueue_struct *cwq;
struct completion done;
};
@@ -291,6 +292,7 @@ again:
prio = work->entry.prio;
if (unlikely(worklist != &cwq->worklist)) {
prio = min(prio, cwq->barrier->prev_prio);
+ prio = min(prio, cwq->barrier->waiter_prio);
prio = min(prio, plist_first(&cwq->worklist)->prio);
}
@@ -393,14 +395,15 @@ static void wq_full_barrier_func(struct
struct wq_full_barrier *barrier =
container_of(work, struct wq_full_barrier, work);
struct cpu_workqueue_struct *cwq = barrier->cwq;
+ int prio = MAX_PRIO;
spin_lock_irq(&cwq->lock);
barrier->prev_barrier = cwq->barrier;
if (cwq->barrier) {
- barrier->prev_prio = min(cwq->barrier->prev_prio,
- plist_first(&cwq->barrier->worklist)->prio);
- } else
- barrier->prev_prio = MAX_PRIO;
+ prio = min(prio, cwq->barrier->waiter_prio);
+ prio = min(prio, plist_first(&cwq->barrier->worklist)->prio);
+ }
+ barrier->prev_prio = prio;
cwq->barrier = barrier;
spin_unlock_irq(&cwq->lock);
}
@@ -415,6 +418,7 @@ static void insert_wq_full_barrier(struc
plist_head_splice(&cwq->worklist, &barr->worklist);
barr->cwq = cwq;
init_completion(&barr->done);
+ barr->waiter_prio = current->normal_prio;
insert_work(cwq, &barr->work, 0, current->normal_prio, 1);
}
* Re: [RFC/PATCH 2/3] rt: PI-workqueue support
2007-10-22 9:50 ` [RFC/PATCH 2/3] rt: PI-workqueue support Peter Zijlstra
@ 2007-10-22 12:00 ` Steven Rostedt
2007-10-22 12:15 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2007-10-22 12:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Daniel Walker, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov
--
On Mon, 22 Oct 2007, Peter Zijlstra wrote:
> Index: linux-2.6/kernel/workqueue.c
> ===================================================================
> --- linux-2.6.orig/kernel/workqueue.c
> +++ linux-2.6/kernel/workqueue.c
> @@ -44,7 +44,7 @@ struct cpu_workqueue_struct {
>
> spinlock_t lock;
>
> - struct list_head worklist;
> + struct plist_head worklist;
> wait_queue_head_t more_work;
> struct work_struct *current_work;
>
> @@ -127,16 +127,19 @@ struct cpu_workqueue_struct *get_wq_data
> static void insert_work(struct cpu_workqueue_struct *cwq,
> struct work_struct *work, int tail)
> {
> + int prio = current->normal_prio;
> +
I'm curious as to why you use normal_prio here? If the task has been boosted
by some other PI method, and this task is about to sleep, why not use the
actual current->prio?
-- Steve
> set_wq_data(work, cwq);
> /*
> * Ensure that we get the right work->data if we see the
> * result of list_add() below, see try_to_grab_pending().
> */
> smp_wmb();
> - if (tail)
> - list_add_tail(&work->entry, &cwq->worklist);
> - else
> - list_add(&work->entry, &cwq->worklist);
> + plist_node_init(&work->entry, prio);
> + plist_add(&work->entry, &cwq->worklist);
> +
> + if (prio < cwq->thread->prio)
> + task_setprio(cwq->thread, prio);
> wake_up(&cwq->more_work);
> }
>
* Re: [RFC/PATCH 2/3] rt: PI-workqueue support
2007-10-22 12:00 ` Steven Rostedt
@ 2007-10-22 12:15 ` Peter Zijlstra
2007-10-22 15:33 ` Daniel Walker
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 12:15 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Daniel Walker, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov
[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]
On Mon, 2007-10-22 at 08:00 -0400, Steven Rostedt wrote:
> --
>
> On Mon, 22 Oct 2007, Peter Zijlstra wrote:
> > Index: linux-2.6/kernel/workqueue.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/workqueue.c
> > +++ linux-2.6/kernel/workqueue.c
> > @@ -44,7 +44,7 @@ struct cpu_workqueue_struct {
> >
> > spinlock_t lock;
> >
> > - struct list_head worklist;
> > + struct plist_head worklist;
> > wait_queue_head_t more_work;
> > struct work_struct *current_work;
> >
> > @@ -127,16 +127,19 @@ struct cpu_workqueue_struct *get_wq_data
> > static void insert_work(struct cpu_workqueue_struct *cwq,
> > struct work_struct *work, int tail)
> > {
> > + int prio = current->normal_prio;
> > +
>
> I'm curious as to why you use normal_prio here? If the task has been boosted
> by some other PI method, and this task is about to sleep, why not use the
> actual current->prio?
Daniel wrote this bit; I tend to agree with him, but can't give his
rationale. Mine is that worklets are typically asynchronous and thus their
prio should not depend on temporal things like boosting.
OTOH it would probably make sense to allow it to depend on boosting through
the barrier constructs, but for that I have to hook the completions into
the PI chain. Something that needs more thought.
* [RFC/PATCH 5/3] rt: PI-workqueue: fixup the barrier prio
2007-10-22 11:48 ` [RFC/PATCH 4/3] rt: PI-workqueue: fixup the barrier prio Peter Zijlstra
@ 2007-10-22 12:18 ` Peter Zijlstra
0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 12:18 UTC (permalink / raw)
To: linux-kernel
Cc: Daniel Walker, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov
Steven is right in that I over-used normal_prio a bit.
The barriers should use the boosted prio.
---
Index: linux-2.6/kernel/workqueue.c
===================================================================
--- linux-2.6.orig/kernel/workqueue.c
+++ linux-2.6/kernel/workqueue.c
@@ -387,7 +387,7 @@ static void insert_wq_barrier(struct cpu
init_completion(&barr->done);
- insert_work(cwq, &barr->work, 0, current->normal_prio, tail);
+ insert_work(cwq, &barr->work, 0, current->prio, tail);
}
static void wq_full_barrier_func(struct work_struct *work)
@@ -418,9 +418,9 @@ static void insert_wq_full_barrier(struc
plist_head_splice(&cwq->worklist, &barr->worklist);
barr->cwq = cwq;
init_completion(&barr->done);
- barr->waiter_prio = current->normal_prio;
+ barr->waiter_prio = current->prio;
- insert_work(cwq, &barr->work, 0, current->normal_prio, 1);
+ insert_work(cwq, &barr->work, 0, current->prio, 1);
}
static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
* Re: [RFC/PATCH 2/3] rt: PI-workqueue support
2007-10-22 12:15 ` Peter Zijlstra
@ 2007-10-22 15:33 ` Daniel Walker
0 siblings, 0 replies; 11+ messages in thread
From: Daniel Walker @ 2007-10-22 15:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Thomas Gleixner,
Gregory Haskins, Oleg Nesterov
On Mon, 2007-10-22 at 14:15 +0200, Peter Zijlstra wrote:
> Daniel wrote this bit; I tend to agree with him, but can't give his
> rationale. Mine is that worklets are typically asynchronous and thus their
> prio should not depend on temporal things like boosting.
>
> OTOH it would probably make sense to allow it to depend on it through
> the barrier constructs, but for that I have to hook the completions into
> the PI chain. Something that needs more thought.
Yeah, I think Peter summarized it .. Since the task isn't waiting on
work when it's inserted, it didn't seem right to use a priority that may
be boosted, since the work isn't preventing completion .. I think the
only time you would want to transfer the boosted priority is when a task
gets blocked, which does happen when you flush the workqueue.
Although, if there is one area of this code that needs attention I think
it's the PI stuff; it wasn't my first priority at the time .. I also
recall Oleg found some issue with it ..
Daniel
* Re: [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers
2007-10-22 9:50 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Peter Zijlstra
2007-10-22 11:48 ` [RFC/PATCH 4/3] rt: PI-workqueue: fixup the barrier prio Peter Zijlstra
@ 2007-10-22 17:34 ` Oleg Nesterov
2007-10-22 19:17 ` Peter Zijlstra
1 sibling, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2007-10-22 17:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Daniel Walker, Steven Rostedt, Ingo Molnar,
Thomas Gleixner, Gregory Haskins
On 10/22, Peter Zijlstra wrote:
>
> --- linux-2.6.orig/lib/plist.c
> +++ linux-2.6/lib/plist.c
> @@ -72,7 +72,7 @@ static void plist_check_head(struct plis
> * @node: &struct plist_node pointer
> * @head: &struct plist_head pointer
> */
> -void plist_add(struct plist_node *node, struct plist_head *head)
> +void __plist_add(struct plist_node *node, struct plist_head *head, int tail)
> {
> struct plist_node *iter;
>
> @@ -92,7 +92,13 @@ void plist_add(struct plist_node *node,
> lt_prio:
> list_add_tail(&node->plist.prio_list, &iter->plist.prio_list);
> eq_prio:
> - list_add_tail(&node->plist.node_list, &iter->plist.node_list);
> + if (likely(tail))
> + list_add_tail(&node->plist.node_list, &iter->plist.node_list);
Ugh, I think this change is wrong in many ways. Just one example. Suppose
we have a plist with 2 nodes X1 and X2, both with ->prio == 10. Now we insert
the new node N, N->prio = 5.
at this point iter == &X1
> + else {
> + iter = list_entry(iter->plist.prio_list.prev,
> + struct plist_node, plist.prio_list);
iter is _still_ &X1, its ->plist.prio_list is empty.
> + list_add(&node->plist.node_list, &iter->plist.node_list);
Now, from the plist_for_each's pov, the nodes in the plist are ordered as
X1, N, X2 - bug.
There are other problems. For example, the "eq_prio:" case needs additional
attention: if tail == 0, we must remove the old node from the prio_list and
insert the new one.
> static void insert_work(struct cpu_workqueue_struct *cwq,
> - struct work_struct *work, int tail)
> + struct work_struct *work, int prio, int boost_prio, int tail)
> {
> - int prio = current->normal_prio;
> -
> set_wq_data(work, cwq);
> /*
> * Ensure that we get the right work->data if we see the
> @@ -136,10 +138,10 @@ static void insert_work(struct cpu_workq
> */
> smp_wmb();
> plist_node_init(&work->entry, prio);
> - plist_add(&work->entry, &cwq->worklist);
> + __plist_add(&work->entry, &cwq->worklist, tail);
Hmm. Not sure we really need __plist_add() here. If tail == 0, we must
insert this work (barrier) at the head of the list. Can't we do
work->entry->prio = tail ? prio : -1;
plist_add(&work->entry, &cwq->worklist);
instead?
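The suggestion works because plist keeps nodes sorted by ascending prio, with equal-prio nodes in FIFO order; a barrier given prio -1 therefore sorts ahead of every valid prio without any special-cased insert. A toy Python model (bisect-based stand-in for plist_add, not the kernel code):

```python
import bisect

def plist_add(worklist, node):
    """Stable insert ordered by ascending prio; bisect_right keeps
    equal-prio nodes in FIFO order, like the kernel plist."""
    keys = [prio for prio, _ in worklist]
    worklist.insert(bisect.bisect_right(keys, node[0]), node)

worklist = []
for node in [(10, "w1"), (10, "w2"), (5, "w3")]:
    plist_add(worklist, node)

# A barrier queued "at the head" simply gets prio -1, lower than any
# valid prio, so the ordinary insert path already places it first.
plist_add(worklist, (-1, "barrier"))
print([name for _, name in worklist])  # -> ['barrier', 'w3', 'w1', 'w2']
```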
> static void run_workqueue(struct cpu_workqueue_struct *cwq)
> {
> + struct plist_head *worklist = &cwq->worklist;
> +
> spin_lock_irq(&cwq->lock);
> cwq->run_depth++;
> if (cwq->run_depth > 3) {
> @@ -267,16 +280,25 @@ static void run_workqueue(struct cpu_wor
> __FUNCTION__, cwq->run_depth);
> dump_stack();
> }
> - while (!plist_head_empty(&cwq->worklist)) {
> - struct work_struct *work = plist_first_entry(&cwq->worklist,
> +
> +again:
> + while (!plist_head_empty(worklist)) {
> + int prio;
> + struct work_struct *work = plist_first_entry(worklist,
> struct work_struct, entry);
> work_func_t f = work->func;
>
> - if (likely(cwq->thread->prio != work->entry.prio))
> - task_setprio(cwq->thread, work->entry.prio);
> + prio = work->entry.prio;
> + if (unlikely(worklist != &cwq->worklist)) {
> + prio = min(prio, cwq->barrier->prev_prio);
> + prio = min(prio, plist_first(&cwq->worklist)->prio);
> + }
> +
> + if (likely(cwq->thread->prio != prio))
> + task_setprio(cwq->thread, prio);
>
> cwq->current_work = work;
> - plist_del(&work->entry, &cwq->worklist);
> + plist_del(&work->entry, worklist);
> plist_node_init(&work->entry, MAX_PRIO);
> spin_unlock_irq(&cwq->lock);
>
> @@ -289,7 +311,27 @@ static void run_workqueue(struct cpu_wor
>
> spin_lock_irq(&cwq->lock);
> cwq->current_work = NULL;
> +
> + if (unlikely(cwq->barrier))
> + worklist = &cwq->barrier->worklist;
> + }
At first glance this looks wrong, but I am not sure I get it right...
So, now we iterate the local worklist, not cwq->worklist. Suppose it has
the works w1 and w2.
run_workqueue() starts w1->func().
Another thread does cancel_work_sync(w1) under some LOCK. We insert the
barrier at cwq->worklist and sleep.
w1 completes, run_workqueue() fires w2, w2->func does lock(LOCK) ...
Deadlock.
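The circular wait hypothesized here (which, per the follow-up, cannot actually occur because cancel_work_sync() uses wq_barrier rather than the nesting barrier) can be sketched as a toy wait-for graph; the names are illustrative, not kernel symbols:

```python
# Wait-for graph for the scenario above: the canceller holds LOCK and
# sleeps on the barrier; the barrier is only reached after the
# snapshotted local worklist [w1, w2] is drained; w2 in turn needs LOCK.
waits_for = {
    "canceller": "barrier",   # cancel_work_sync(w1) sleeps on the barrier
    "barrier":   "w2",        # barrier sits behind the already-snapshotted w2
    "w2":        "canceller", # w2->func() does lock(LOCK) held by canceller
}

def find_cycle(graph, start):
    """Walk the single outgoing edge from each node until a repeat."""
    seen, node = [], start
    while node not in seen:
        seen.append(node)
        node = graph[node]
    return seen[seen.index(node):]

print(find_cycle(waits_for, "canceller"))  # -> ['canceller', 'barrier', 'w2']
```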
(I'll try to read this patch carefully tomorrow, but it is a bit hard to
read this series, and the very first patch has rejects. Could you make
a single patch?)
Oleg.
* Re: [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers
2007-10-22 17:34 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Oleg Nesterov
@ 2007-10-22 19:17 ` Peter Zijlstra
0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2007-10-22 19:17 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, Daniel Walker, Steven Rostedt, Ingo Molnar,
Thomas Gleixner, Gregory Haskins
On Mon, 2007-10-22 at 21:34 +0400, Oleg Nesterov wrote:
> On 10/22, Peter Zijlstra wrote:
> > @@ -136,10 +138,10 @@ static void insert_work(struct cpu_workq
> > */
> > smp_wmb();
> > plist_node_init(&work->entry, prio);
> > - plist_add(&work->entry, &cwq->worklist);
> > + __plist_add(&work->entry, &cwq->worklist, tail);
>
> Hmm. Not sure we really need __plist_add() here. If tail == 0, we must
> insert this work (barrier) at the head of the list. Can't we do
>
> work->entry->prio = tail ? prio : -1;
> plist_add(&work->entry, &cwq->worklist);
>
> instead?
Ah yes, that would work I guess. Very nice!
> > static void run_workqueue(struct cpu_workqueue_struct *cwq)
> > {
> > + struct plist_head *worklist = &cwq->worklist;
> > +
> > spin_lock_irq(&cwq->lock);
> > cwq->run_depth++;
> > if (cwq->run_depth > 3) {
> > @@ -267,16 +280,25 @@ static void run_workqueue(struct cpu_wor
> > __FUNCTION__, cwq->run_depth);
> > dump_stack();
> > }
> > - while (!plist_head_empty(&cwq->worklist)) {
> > - struct work_struct *work = plist_first_entry(&cwq->worklist,
> > +
> > +again:
> > + while (!plist_head_empty(worklist)) {
> > + int prio;
> > + struct work_struct *work = plist_first_entry(worklist,
> > struct work_struct, entry);
> > work_func_t f = work->func;
> >
> > - if (likely(cwq->thread->prio != work->entry.prio))
> > - task_setprio(cwq->thread, work->entry.prio);
> > + prio = work->entry.prio;
> > + if (unlikely(worklist != &cwq->worklist)) {
> > + prio = min(prio, cwq->barrier->prev_prio);
> > + prio = min(prio, plist_first(&cwq->worklist)->prio);
> > + }
> > +
> > + if (likely(cwq->thread->prio != prio))
> > + task_setprio(cwq->thread, prio);
> >
> > cwq->current_work = work;
> > - plist_del(&work->entry, &cwq->worklist);
> > + plist_del(&work->entry, worklist);
> > plist_node_init(&work->entry, MAX_PRIO);
> > spin_unlock_irq(&cwq->lock);
> >
> > @@ -289,7 +311,27 @@ static void run_workqueue(struct cpu_wor
> >
> > spin_lock_irq(&cwq->lock);
> > cwq->current_work = NULL;
> > +
> > + if (unlikely(cwq->barrier))
> > + worklist = &cwq->barrier->worklist;
> > + }
>
> At first glance this looks wrong, but I am not sure I get it right...
>
> So, now we iterate the local worklist, not cwq->worklist. Suppose it has
> the works w1 and w2.
>
> run_workqueue() starts w1->func().
>
> Another thread does cancel_work_sync(w1) under some LOCK. We insert the
> barrier at cwq->worklist and sleep.
>
> w1 completes, run_workqueue() fires w2, w2->func does lock(LOCK) ...
>
> Deadlock.
Ah, cancel_work_sync() will not use wq_full_barrier, but wq_barrier.
It's flush_cpu_workqueue() that will use this new nesting thing.
> (I'll try to read this patch carefully tomorrow, but it is a bit hard to
> read this series, and the very first patch has rejects. Could you make
> a single patch?)
It's against .23-rt1 and applies fine.
Thread overview: 11+ messages
2007-10-22 9:50 [RFC/PATCH 0/3] rt: workqueue PI support Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 1/3] rt: rename rt_mutex_setprio to task_setprio Peter Zijlstra
2007-10-22 9:50 ` [RFC/PATCH 2/3] rt: PI-workqueue support Peter Zijlstra
2007-10-22 12:00 ` Steven Rostedt
2007-10-22 12:15 ` Peter Zijlstra
2007-10-22 15:33 ` Daniel Walker
2007-10-22 9:50 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Peter Zijlstra
2007-10-22 11:48 ` [RFC/PATCH 4/3] rt: PI-workqueue: fixup the barrier prio Peter Zijlstra
2007-10-22 12:18 ` [RFC/PATCH 5/3] " Peter Zijlstra
2007-10-22 17:34 ` [RFC/PATCH 3/3] rt: PI-workqueue: fix barriers Oleg Nesterov
2007-10-22 19:17 ` Peter Zijlstra