* [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
@ 2014-03-18 13:10 Peng Tao
2014-03-18 13:33 ` Peter Zijlstra
0 siblings, 1 reply; 21+ messages in thread
From: Peng Tao @ 2014-03-18 13:10 UTC (permalink / raw)
To: linux-kernel
Cc: Peng Tao, Ingo Molnar, Peter Zijlstra, Oleg Drokin,
Andreas Dilger
Normally wait_queue_t is a FIFO list for exclusive waiting tasks.
As a side effect, if there are many threads waiting on the same
condition (which is common for data servers like Lustre), all
threads will be woken up again and again, causing unnecessary cache
line pollution. Instead of FIFO lists, we can use LIFO lists to always
wake up the most recently active threads.
Lustre implements this add_wait_queue_exclusive_head() privately but we
think it might be useful as a generic function. With it moved to the
generic layer, the rest of Lustre's private wrappers for wait queues can
all be removed.
Of course, there is an alternative approach of just open coding it, but
we'd like to ask first to see if there is any objection to making it
generic.
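For illustration, the FIFO vs. LIFO behaviour can be sketched with a small user-space model (illustrative only, not kernel code; the helpers below are hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Tiny user-space model of an exclusive wait queue, illustrative only.
 * q[0] is the head; wake_one() always wakes the head entry. */
static int q[16];
static int qlen;

static void add_tail(int id)		/* FIFO queuing */
{
	q[qlen++] = id;
}

static void add_head(int id)		/* LIFO queuing */
{
	memmove(q + 1, q, qlen * sizeof(int));
	q[0] = id;
	qlen++;
}

static int wake_one(void)
{
	int id = q[0];

	qlen--;
	memmove(q, q + 1, qlen * sizeof(int));
	return id;
}
```

With threads 1, 2, 3 queued FIFO, successive single wake-ups (each woken thread re-queuing itself) rotate through 1, 2, 3, 1, ...; queued LIFO, thread 3 is woken every time, so the other threads stay asleep and their cache lines stay untouched.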
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
---
.../lustre/include/linux/libcfs/libcfs_prim.h | 1 -
.../lustre/lustre/libcfs/linux/linux-prim.c | 24 --------------------
include/linux/wait.h | 2 +
kernel/sched/wait.c | 23 +++++++++++++++++++
4 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_prim.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_prim.h
index e6e417a..c23b78c 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_prim.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_prim.h
@@ -54,7 +54,6 @@ void schedule_timeout_and_set_state(long, int64_t);
void init_waitqueue_entry_current(wait_queue_t *link);
int64_t waitq_timedwait(wait_queue_t *, long, int64_t);
void waitq_wait(wait_queue_t *, long);
-void add_wait_queue_exclusive_head(wait_queue_head_t *, wait_queue_t *);
void cfs_init_timer(struct timer_list *t);
void cfs_timer_init(struct timer_list *t, cfs_timer_func_t *func, void *arg);
diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-prim.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-prim.c
index c7bc7fc..13b4a80 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-prim.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-prim.c
@@ -53,30 +53,6 @@ init_waitqueue_entry_current(wait_queue_t *link)
}
EXPORT_SYMBOL(init_waitqueue_entry_current);
-/**
- * wait_queue_t of Linux (version < 2.6.34) is a FIFO list for exclusively
- * waiting threads, which is not always desirable because all threads will
- * be waken up again and again, even user only needs a few of them to be
- * active most time. This is not good for performance because cache can
- * be polluted by different threads.
- *
- * LIFO list can resolve this problem because we always wakeup the most
- * recent active thread by default.
- *
- * NB: please don't call non-exclusive & exclusive wait on the same
- * waitq if add_wait_queue_exclusive_head is used.
- */
-void
-add_wait_queue_exclusive_head(wait_queue_head_t *waitq, wait_queue_t *link)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&waitq->lock, flags);
- __add_wait_queue_exclusive(waitq, link);
- spin_unlock_irqrestore(&waitq->lock, flags);
-}
-EXPORT_SYMBOL(add_wait_queue_exclusive_head);
-
void
waitq_wait(wait_queue_t *link, long state)
{
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 559044c..634a49c 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -105,6 +105,8 @@ static inline int waitqueue_active(wait_queue_head_t *q)
extern void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait);
extern void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait);
+extern void add_wait_queue_exclusive_head(wait_queue_head_t *q,
+ wait_queue_t *wait);
extern void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait);
static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 7d50f79..69925c3 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -30,6 +30,29 @@ void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
}
EXPORT_SYMBOL(add_wait_queue);
+/**
+ * wait_queue_t is a FIFO list for exclusively waiting threads, which is
+ * not always desirable because all threads will be woken up again and
+ * again, even if the user only needs a few of them to be active most of
+ * the time. This is not good for performance because the cache can be
+ * polluted by different threads.
+ *
+ * A LIFO list can resolve this problem because we always wake up the
+ * most recently active thread by default.
+ *
+ * NB: please don't call non-exclusive & exclusive wait on the same
+ * waitq if add_wait_queue_exclusive_head is used.
+ */
+void add_wait_queue_exclusive_head(wait_queue_head_t *q, wait_queue_t *wait)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&q->lock, flags);
+ __add_wait_queue_exclusive(q, wait);
+ spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_exclusive_head);
+
void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
{
unsigned long flags;
--
1.7.7.6
^ permalink raw reply related [flat|nested] 21+ messages in thread

* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 13:10 [PATCH RFC] sched: introduce add_wait_queue_exclusive_head Peng Tao
@ 2014-03-18 13:33 ` Peter Zijlstra
2014-03-18 13:51 ` Peng Tao
0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2014-03-18 13:33 UTC (permalink / raw)
To: Peng Tao
Cc: linux-kernel, Ingo Molnar, Oleg Drokin, Andreas Dilger,
Oleg Nesterov
On Tue, Mar 18, 2014 at 09:10:08PM +0800, Peng Tao wrote:
> Normally wait_queue_t is a FIFO list for exclusive waiting tasks.
> As a side effect, if there are many threads waiting on the same
> condition (which is common for data servers like Lustre), all
> threads will be woken up again and again, causing unnecessary cache
> line pollution. Instead of FIFO lists, we can use LIFO lists to always
> wake up the most recently active threads.
>
> Lustre implements this add_wait_queue_exclusive_head() privately but we
> think it might be useful as a generic function. With it moved to the
> generic layer, the rest of Lustre's private wrappers for wait queues can
> all be removed.
>
> Of course, there is an alternative approach of just open coding it, but
> we'd like to ask first to see if there is any objection to making it
> generic.
OK, so I don't particularly mind LIFO, but there are a few problems with
the patch.
Firstly, I think the _head postfix for LIFO is a bad name, and secondly,
and most importantly, it breaks __wake_up_common().
So waitqueue wakeups are specified as waking all !exclusive tasks and @n
exclusive tasks. The way this works is that !exclusive tasks are added
to the head (LIFO) and exclusive tasks are added to the tail
(FIFO).
We can then iterate the list until @n exclusive tasks have been
observed.
However, if you start adding exclusive tasks to the head, this all comes
apart.
If you don't mix exclusive and !exclusive tasks on the same waitqueue
this isn't a problem, but I'm sure people will eventually do this and
get a nasty surprise.
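The contract can be sketched with a tiny user-space model (illustrative only, not the kernel's __wake_up_common()):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative user-space model of the wake-up contract described
 * above: wake every non-exclusive waiter, and stop once nr_exclusive
 * exclusive waiters have been woken. Not the kernel implementation. */
struct waiter {
	int exclusive;		/* models WQ_FLAG_EXCLUSIVE */
	int woken;
	struct waiter *next;
};

static void model_wake_up(struct waiter *head, int nr_exclusive)
{
	struct waiter *w;

	for (w = head; w; w = w->next) {
		w->woken = 1;
		if (w->exclusive && !--nr_exclusive)
			break;
	}
}
```

With non-exclusive entries at the head and exclusive ones at the tail, a wake with nr_exclusive == 1 wakes all non-exclusive waiters plus one exclusive waiter. Put an exclusive entry at the head instead, and the loop stops immediately, skipping any non-exclusive waiters behind it.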
I'm not sure what the best way around this would be; but I can see two
options:
- add enough debugging bits to detect this fail case.
- extend wait_queue_head_t to keep a pointer to the first !exclusive
element and insert exclusive LIFO tasks there -- thereby keeping
!exclusive tasks at the front.
Anybody?
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 13:33 ` Peter Zijlstra
@ 2014-03-18 13:51 ` Peng Tao
2014-03-18 14:05 ` Peter Zijlstra
0 siblings, 1 reply; 21+ messages in thread
From: Peng Tao @ 2014-03-18 13:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linux Kernel Mailing List, Ingo Molnar, Oleg Drokin,
Andreas Dilger, Oleg Nesterov
On Tue, Mar 18, 2014 at 9:33 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 18, 2014 at 09:10:08PM +0800, Peng Tao wrote:
>> Normally wait_queue_t is a FIFO list for exclusive waiting tasks.
>> As a side effect, if there are many threads waiting on the same
>> condition (which is common for data servers like Lustre), all
>> threads will be woken up again and again, causing unnecessary cache
>> line pollution. Instead of FIFO lists, we can use LIFO lists to always
>> wake up the most recently active threads.
>>
>> Lustre implements this add_wait_queue_exclusive_head() privately but we
>> think it might be useful as a generic function. With it moved to the
>> generic layer, the rest of Lustre's private wrappers for wait queues can
>> all be removed.
>>
>> Of course, there is an alternative approach of just open coding it, but
>> we'd like to ask first to see if there is any objection to making it
>> generic.
>
> OK, so I don't particularly mind LIFO, but there are a few problems with
> the patch.
>
> Firstly I think the _head postfix for LIFO is a bad name,
Do you have any preference on the name? add_wait_queue_exclusive_lifo()?
> and secondly,
> and most importantly, it breaks __wake_up_common().
>
> So waitqueue wakeups are specified as waking all !exclusive tasks and @n
> exclusive tasks. The way this works is that !exclusive tasks are added
> to the head (LIFO) and exclusive tasks are added to the tail
> (FIFO).
>
> We can then iterate the list until @n exclusive tasks have been
> observed.
>
> However, if you start adding exclusive tasks to the head, this all comes
> apart.
>
> If you don't mix exclusive and !exclusive tasks on the same waitqueue
> this isn't a problem, but I'm sure people will eventually do this and
> get a nasty surprise.
>
Yes, Lustre takes care not to mix exclusive and !exclusive tasks in this case.
> I'm not sure what the best way around this would be; but I can see two
> options:
>
> - add enough debugging bits to detect this fail case.
> - extend wait_queue_head_t to keep a pointer to the first !exclusive
> element and insert exclusive LIFO tasks there -- thereby keeping
> !exclusive tasks at the front.
>
Thank you for the suggestions. Personally I am in favor of the second
one but I'll wait for others to comment first.
Thanks,
Tao
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 13:51 ` Peng Tao
@ 2014-03-18 14:05 ` Peter Zijlstra
2014-03-18 14:44 ` Peng Tao
2014-03-18 15:47 ` Oleg Nesterov
0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2014-03-18 14:05 UTC (permalink / raw)
To: Peng Tao
Cc: Linux Kernel Mailing List, Ingo Molnar, Oleg Drokin,
Andreas Dilger, Oleg Nesterov
On Tue, Mar 18, 2014 at 09:51:04PM +0800, Peng Tao wrote:
> > Firstly I think the _head postfix for LIFO is a bad name,
> Do you have any preference on the name? add_wait_queue_exclusive_lifo()?
I think we can avoid the entire function if we add
WQ_FLAG_LIFO and make prepare_to_wait_event() DTRT.
Then we only need to teach ___wait() about this; and I suppose we could
make .exclusive=-1 be the LIFO flag.
Unless you cannot use ___wait() and really need to open-code the
wait_event() stuff.
> > If you don't mix exclusive and !exclusive tasks on the same waitqueue
> > this isn't a problem, but I'm sure people will eventually do this and
> > get a nasty surprise.
> >
> Yes, Lustre takes care not to mix exclusive and !exclusive tasks in this case.
Right; I saw you had a comment to that effect after I wrote this email.
> > I'm not sure what the best way around this would be; but I can see two
> > options:
> >
> > - add enough debugging bits to detect this fail case.
> > - extend wait_queue_head_t to keep a pointer to the first !exclusive
s/!//
> > element and insert exclusive LIFO tasks there -- thereby keeping
> > !exclusive tasks at the front.
> >
> Thank you for the suggestions. Personally I am in favor of the second
> one but I'll wait for others to comment first.
Oleg, Ingo, any preferences?
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 14:05 ` Peter Zijlstra
@ 2014-03-18 14:44 ` Peng Tao
2014-03-18 16:23 ` Oleg Nesterov
2014-03-18 15:47 ` Oleg Nesterov
1 sibling, 1 reply; 21+ messages in thread
From: Peng Tao @ 2014-03-18 14:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linux Kernel Mailing List, Ingo Molnar, Oleg Drokin,
Andreas Dilger, Oleg Nesterov
On Tue, Mar 18, 2014 at 10:05 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 18, 2014 at 09:51:04PM +0800, Peng Tao wrote:
>> > Firstly I think the _head postfix for LIFO is a bad name,
>> Do you have any preference on the name? add_wait_queue_exclusive_lifo()?
>
> I think we can avoid the entire function if we add
> WQ_FLAG_LIFO and make prepare_to_wait_event() DTRT.
>
> Then we only need to teach ___wait() about this; and I suppose we could
> make .exclusive=-1 be the LIFO flag.
>
> Unless you cannot use ___wait() and really need to open-code the
> wait_event() stuff.
>
Lustre's private l_wait_event() stuff takes care to (un)mask
LUSTRE_FATAL_SIGS and always waits in TASK_INTERRUPTIBLE state.
It looks to me that we can at least wrap l_wait_event() on top of
wait_event_interruptible()/wait_event_interruptible_timeout().
Andreas, any opinions? Was l_wait_event() added just for cross-platform
support, or is there some deeper reason that I missed?
Thanks,
Tao
>> > If you don't mix exclusive and !exclusive tasks on the same waitqueue
>> > this isn't a problem, but I'm sure people will eventually do this and
>> > get a nasty surprise.
>> >
>> Yes, Lustre takes care not to mix exclusive and !exclusive tasks in this case.
>
> Right; I saw you had a comment to that effect after I wrote this email.
>
>> > I'm not sure what the best way around this would be; but I can see two
>> > options:
>> >
>> > - add enough debugging bits to detect this fail case.
>> > - extend wait_queue_head_t to keep a pointer to the first !exclusive
>
> s/!//
>
>> > element and insert exclusive LIFO tasks there -- thereby keeping
>> > !exclusive tasks at the front.
>> >
>> Thank you for the suggestions. Personally I am in favor of the second
>> one but I'll wait for others to comment first.
>
> Oleg, Ingo, any preferences?
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 14:44 ` Peng Tao
@ 2014-03-18 16:23 ` Oleg Nesterov
2014-03-19 2:22 ` Peng Tao
0 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2014-03-18 16:23 UTC (permalink / raw)
To: Peng Tao
Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
Oleg Drokin, Andreas Dilger
On 03/18, Peng Tao wrote:
>
> On Tue, Mar 18, 2014 at 10:05 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Unless you cannot use ___wait() and really need to open-code the
> > wait_event() stuff.
> >
> Lustre's private l_wait_event() stuff takes care to (un)mask
> LUSTRE_FATAL_SIGS
Hmm. This is off-topic but after the quick grep LUSTRE_FATAL_SIGS/etc
looks suspicious.
Firstly, cfs_block_sigs/cfs_block_sigsinv/etc are not exactly right,
they need set_current_blocked(). And you can read "old" lockless.
And note that cfs_block_sigsinv(0) (which should block all signals)
can't actually protect from SIGKILL (or in fact from another fatal
signal) or SIGSTOP if the caller is multithreaded. Or ptrace, or
freezer.
> and always wait in TASK_INTERRUPTIBLE state.
and it seems that __wstate passed to waitq_wait/waitq_timedwait is
simply ignored.
> It looks to me that we can at least wrap l_wait_event() on top of
> wait_event_interruptible/wait_event_timeout_interruptible.
l_wait_event looks really complicated ;) but perhaps you can rewrite
it on top of ___wait_event(), note that condition/cmd can do anything
you want.
Oleg.
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 16:23 ` Oleg Nesterov
@ 2014-03-19 2:22 ` Peng Tao
2014-03-19 17:33 ` Oleg Nesterov
0 siblings, 1 reply; 21+ messages in thread
From: Peng Tao @ 2014-03-19 2:22 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
Oleg Drokin, Andreas Dilger
On Wed, Mar 19, 2014 at 12:23 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 03/18, Peng Tao wrote:
>>
>> On Tue, Mar 18, 2014 at 10:05 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> >
>> > Unless you cannot use ___wait() and really need to open-code the
>> > wait_event() stuff.
>> >
>> Lustre's private l_wait_event() stuff takes care to (un)mask
>> LUSTRE_FATAL_SIGS
>
> Hmm. This is off-topic but after the quick grep LUSTRE_FATAL_SIGS/etc
> looks suspicious.
>
> Firstly, cfs_block_sigs/cfs_block_sigsinv/etc are not exactly right,
> they need set_current_blocked(). And you can read "old" lockless.
>
It seems that set_current_blocked() is not exported. Can we ask to export it?
And looking at other similar places like coda_block_signals(): it
doesn't call set_current_blocked() either, so it needs the same fix.
> And note that cfs_block_sigsinv(0) (which should block all signals)
> can't actually protect from SIGKILL (or in fact from another fatal
> signal) or SIGSTOP if the caller is multithreaded. Or ptrace, or
> freezer.
>
>> and always wait in TASK_INTERRUPTIBLE state.
>
> and it seems that __wstate passed to waitq_wait/waitq_timedwait is
> simply ignored.
>
Yes. That needs to be dropped.
>> It looks to me that we can at least wrap l_wait_event() on top of
>> wait_event_interruptible/wait_event_timeout_interruptible.
>
> l_wait_event looks really complicated ;) but perhaps you can rewrite
> it on top of ___wait_event(), note that condition/cmd can do anything
> you want.
>
Yeah, I meant to say __wait_event(). Thanks for correcting me.
Thanks,
Tao
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-19 2:22 ` Peng Tao
@ 2014-03-19 17:33 ` Oleg Nesterov
2014-03-19 19:44 ` Dilger, Andreas
0 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2014-03-19 17:33 UTC (permalink / raw)
To: Peng Tao
Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
Oleg Drokin, Andreas Dilger
On 03/19, Peng Tao wrote:
>
> On Wed, Mar 19, 2014 at 12:23 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Firstly, cfs_block_sigs/cfs_block_sigsinv/etc are not exactly right,
> > they need set_current_blocked(). And you can read "old" lockless.
> >
> It seems that set_current_blocked() is not exported. Can we ask to export it?
Why not. If you are going to change this code to use set_current_blocked(),
I'd suggest you send the "export set_current_blocked" patch in the series.
Otherwise, if it is sent separately, your change will depend on another
tree.
Or you can use sigprocmask(). Actually it should die, but this won't happen
soon and it is already exported.
> And looking at other similar places like coda_block_signals(),
Yes, it can have much more users.
But note that set_current_blocked() can't help you to really block
SIGKILL anyway.
Could you explain why __l_wait_event() can't use TASK_UNINTERRUPTIBLE
instead of cfs_block_sigsinv(0) ?
Oleg.
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-19 17:33 ` Oleg Nesterov
@ 2014-03-19 19:44 ` Dilger, Andreas
2014-03-19 19:55 ` Peter Zijlstra
0 siblings, 1 reply; 21+ messages in thread
From: Dilger, Andreas @ 2014-03-19 19:44 UTC (permalink / raw)
To: Oleg Nesterov, Peng Tao
Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
Drokin, Oleg
On 2014/03/19, 11:33 AM, "Oleg Nesterov" <oleg@redhat.com> wrote:
>On 03/19, Peng Tao wrote:
>>
>> On Wed, Mar 19, 2014 at 12:23 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >
>> > Firstly, cfs_block_sigs/cfs_block_sigsinv/etc are not exactly right,
>> > they need set_current_blocked(). And you can read "old" lockless.
>> >
>> It seems that set_current_blocked() is not exported. Can we ask to
>>export it?
>
>Why not. If you are going to change this code to use
>set_current_blocked(), I'd suggest you to send the "export
>set_current_blocked" patch in series. Otherwise, if it is sent
> separately, your change will depend on another tree.
>
>Or you can use sigprocmask(). Actually it should die, but this won't
>happen soon and it is already exported.
>
>> And looking at other similar places like coda_block_signals(),
>
>Yes, it can have much more users.
>
>But note that set_current_blocked() can't help you to really block
>SIGKILL anyway.
>
>Could you explain why __l_wait_event() can't use TASK_UNINTERRUPTIBLE
>instead of cfs_block_sigsinv(0) ?
The original reason for l_wait_event() not using TASK_UNINTERRUPTIBLE
is to avoid the load on the server continually being "num_service_threads"
regardless of whether they are actually doing something or not. We
added various cases for periodic wakeups and such afterward.
l_wait_event() was originally developed for 2.4 kernels, so there may
well be better primitives to use today.
I'd be happy to move toward replacing l_wait_event() with kernel
primitives if possible, but we need to ensure that this is tested
sufficiently since it can otherwise be a source of hard-to-find bugs.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-19 19:44 ` Dilger, Andreas
@ 2014-03-19 19:55 ` Peter Zijlstra
2014-03-20 7:06 ` Dilger, Andreas
2014-03-20 18:49 ` Oleg Nesterov
0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2014-03-19 19:55 UTC (permalink / raw)
To: Dilger, Andreas
Cc: Oleg Nesterov, Peng Tao, Linux Kernel Mailing List, Ingo Molnar,
Drokin, Oleg
On Wed, Mar 19, 2014 at 07:44:29PM +0000, Dilger, Andreas wrote:
> The original reason for l_wait_event() not using TASK_UNINTERRUPTIBLE
> is to avoid the load on the server continually being "num_service_threads"
> regardless of whether they are actually doing something or not. We
> added various cases for periodic wakeups and such afterward.
Hmm, maybe we should finally do the TASK_IDLE thing;
https://lkml.org/lkml/2013/11/12/710
Oleg?
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-19 19:55 ` Peter Zijlstra
@ 2014-03-20 7:06 ` Dilger, Andreas
2014-03-20 18:49 ` Oleg Nesterov
1 sibling, 0 replies; 21+ messages in thread
From: Dilger, Andreas @ 2014-03-20 7:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Oleg Nesterov, Peng Tao, Linux Kernel Mailing List, Ingo Molnar,
Drokin, Oleg
On 2014/03/19, 1:55 PM, "Peter Zijlstra" <peterz@infradead.org> wrote:
>On Wed, Mar 19, 2014 at 07:44:29PM +0000, Dilger, Andreas wrote:
>> The original reason for l_wait_event() not using TASK_UNINTERRUPTIBLE
>> is to avoid the load on the server continually being
>>"num_service_threads"
>> regardless of whether they are actually doing something or not. We
>> added various cases for periodic wakeups and such afterward.
>
>Hmm, maybe we should finally do the TASK_IDLE thing;
>
> https://lkml.org/lkml/2013/11/12/710
That looks pretty interesting, and appears to do what we need.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-19 19:55 ` Peter Zijlstra
2014-03-20 7:06 ` Dilger, Andreas
@ 2014-03-20 18:49 ` Oleg Nesterov
1 sibling, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2014-03-20 18:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Dilger, Andreas, Peng Tao, Linux Kernel Mailing List, Ingo Molnar,
Drokin, Oleg
On 03/19, Peter Zijlstra wrote:
>
> On Wed, Mar 19, 2014 at 07:44:29PM +0000, Dilger, Andreas wrote:
> > The original reason for l_wait_event() not using TASK_UNINTERRUPTIBLE
> > is to avoid the load on the server continually being "num_service_threads"
> > regardless of whether they are actually doing something or not. We
> > added various cases for periodic wakeups and such afterward.
>
> Hmm, maybe we should finally do the TASK_IDLE thing;
>
> https://lkml.org/lkml/2013/11/12/710
Agreed. And if we are not going to expose the TASK_IDLE bit in /proc (I
think we should not), the patch should be really trivial.
Oleg.
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 14:05 ` Peter Zijlstra
2014-03-18 14:44 ` Peng Tao
@ 2014-03-18 15:47 ` Oleg Nesterov
2014-03-19 2:17 ` Peng Tao
1 sibling, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2014-03-18 15:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Peng Tao, Linux Kernel Mailing List, Ingo Molnar, Oleg Drokin,
Andreas Dilger
On 03/18, Peter Zijlstra wrote:
>
> I think we can avoid the entire function if we add
> WQ_FLAG_LIFO and make prepare_to_wait_event() DTRT.
Agreed, this looks very natural.
prepare_to_wait_event(WQ_FLAG_LIFO) should probably skip all "flags == 0"
entries before list_add(). Given that users are not supposed to mix
exclusive and !exclusive waiters, perhaps the additional list_for_each()
won't hurt?
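Sketched in user space (a hypothetical model with arrays standing in for the kernel's list_head; WQ_FLAG_LIFO is the flag proposed in this thread, not an existing kernel flag):

```c
#include <assert.h>

/* Waiters with flags == 0 are non-exclusive and sit at the front of the
 * queue; a LIFO-exclusive waiter is inserted right after them, i.e. at
 * the head of the exclusive region. Illustrative sketch only. */
struct entry {
	int flags;	/* 0 == non-exclusive, non-zero == exclusive */
	int id;
};

static void insert_exclusive_lifo(struct entry *q, int *len, struct entry e)
{
	int pos, i;

	for (pos = 0; pos < *len; pos++)	/* skip "flags == 0" entries */
		if (q[pos].flags != 0)
			break;
	for (i = *len; i > pos; i--)		/* make room at pos */
		q[i] = q[i - 1];
	q[pos] = e;
	(*len)++;
}
```

The extra walk only touches the non-exclusive prefix, which is expected to be empty or short when the two kinds of waiters aren't mixed.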
> Then we only need to teach ___wait() about this; and I suppose we could
> make .exclusive=-1 be the LIFO flag.
Or we can change ___wait_event()
- if (exclusive) \
- __wait.flags = WQ_FLAG_EXCLUSIVE; \
- else \
- __wait.flags = 0; \
+ __wait.flags = exclusive; \
and obviously change the callers. Perhaps this argument should be renamed
then.
But I am fine either way, just I like the idea to extend the current
interface.
Oleg.
* Re: [PATCH RFC] sched: introduce add_wait_queue_exclusive_head
2014-03-18 15:47 ` Oleg Nesterov
@ 2014-03-19 2:17 ` Peng Tao
[not found] ` <20140319164907.GA10113@redhat.com>
0 siblings, 1 reply; 21+ messages in thread
From: Peng Tao @ 2014-03-19 2:17 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Linux Kernel Mailing List, Ingo Molnar,
Oleg Drokin, Andreas Dilger
On Tue, Mar 18, 2014 at 11:47 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 03/18, Peter Zijlstra wrote:
>>
>> I think we can avoid the entire function if we add
>> WQ_FLAG_LIFO and make prepare_to_wait_event() DTRT.
>
> Agreed, this looks very natural.
>
> prepare_to_wait_event(WQ_FLAG_LIFO) should probably skip all "flags == 0"
> entries before list_add(). Given that users are not supposed to mix
> exclusive and !exclusive waiters, perhaps the additional list_for_each()
> won't hurt?
>
So please allow me to summarize and define the semantics of wait/wake_up
w.r.t. this:
prepare_to_wait_event(0): the task is added at the head of the queue.
prepare_to_wait_event(WQ_FLAG_LIFO): the task is added at the head of the
exclusive portion of the queue.
prepare_to_wait_event(WQ_FLAG_EXCLUSIVE): the task is added at the tail of
the queue.
Maybe we should rename the flags to WQ_FLAG_EXCLUSIVE_FIFO and
WQ_FLAG_EXCLUSIVE_LIFO?
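The three cases can be modeled in a few lines of user-space C (flag names and values below are hypothetical, following the proposal in this thread rather than any existing kernel API):

```c
#include <assert.h>

enum { WQ_NONEXCL = 0, WQ_EXCL_LIFO = 1, WQ_EXCL_FIFO = 2 };

struct w {
	int flags;
	int id;
};

/* Sketch of the proposed prepare_to_wait_event() insertion rules;
 * q[0] is the head of the queue. */
static void queue_waiter(struct w *q, int *len, struct w e)
{
	int pos, i;

	if (e.flags == WQ_NONEXCL) {
		pos = 0;			/* head of the queue */
	} else if (e.flags == WQ_EXCL_LIFO) {
		for (pos = 0; pos < *len; pos++)
			if (q[pos].flags != WQ_NONEXCL)
				break;		/* head of the exclusive region */
	} else {
		pos = *len;			/* tail of the queue */
	}

	for (i = *len; i > pos; i--)
		q[i] = q[i - 1];
	q[pos] = e;
	(*len)++;
}
```

Queuing an exclusive-FIFO waiter, then a non-exclusive one, then two exclusive-LIFO waiters leaves the non-exclusive waiter in front, the LIFO waiters in reverse arrival order behind it, and the FIFO waiter at the tail.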
>> Then we only need to teach ___wait() about this; and I suppose we could
>> make .exclusive=-1 be the LIFO flag.
>
> Or we can change ___wait_event()
>
> - if (exclusive) \
> - __wait.flags = WQ_FLAG_EXCLUSIVE; \
> - else \
> - __wait.flags = 0; \
> + __wait.flags = exclusive; \
>
> and obviously change the callers. Perhaps this argument should be renamed
> then.
>
The current __wait.flags allows for future extension of the existing
interface. If we change it to __wait.flags = exclusive, we drop that
future extensibility. Is that acceptable?
Thanks,
Tao
> But I am fine either way, just I like the idea to extend the current
> interface.
>
> Oleg.
>
end of thread, other threads: [~2014-03-21 18:50 UTC | newest]
Thread overview: 21+ messages
2014-03-18 13:10 [PATCH RFC] sched: introduce add_wait_queue_exclusive_head Peng Tao
2014-03-18 13:33 ` Peter Zijlstra
2014-03-18 13:51 ` Peng Tao
2014-03-18 14:05 ` Peter Zijlstra
2014-03-18 14:44 ` Peng Tao
2014-03-18 16:23 ` Oleg Nesterov
2014-03-19 2:22 ` Peng Tao
2014-03-19 17:33 ` Oleg Nesterov
2014-03-19 19:44 ` Dilger, Andreas
2014-03-19 19:55 ` Peter Zijlstra
2014-03-20 7:06 ` Dilger, Andreas
2014-03-20 18:49 ` Oleg Nesterov
2014-03-18 15:47 ` Oleg Nesterov
2014-03-19 2:17 ` Peng Tao
[not found] ` <20140319164907.GA10113@redhat.com>
2014-03-19 16:57 ` Peter Zijlstra
2014-03-19 17:19 ` Oleg Nesterov
2014-03-20 17:51 ` [PATCH 0/2] wait: introduce WQ_FLAG_EXCLUSIVE_HEAD Oleg Nesterov
2014-03-20 17:51 ` [PATCH 1/2] wait: turn "bool exclusive" arg of __wait_event() into wflags Oleg Nesterov
2014-03-20 17:51 ` [PATCH 2/2] wait: introduce WQ_FLAG_EXCLUSIVE_HEAD Oleg Nesterov
2014-03-21 2:45 ` [PATCH 0/2] " Dilger, Andreas
2014-03-21 18:49 ` Oleg Nesterov