From: Ingo Molnar <mingo@elte.hu>
To: Patrick McHardy <kaber@trash.net>,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
David Miller <davem@davemloft.net>,
Rick Jones <rick.jones2@hp.com>,
Eric Dumazet <dada1@cosmosbay.com>,
netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
tglx@linutronix.de, Martin Josefsson <gandalf@wlug.westbo.se>
Subject: [patch] timers: add mod_timer_pending()
Date: Wed, 18 Feb 2009 13:05:08 +0100
Message-ID: <20090218120508.GB4100@elte.hu>
In-Reply-To: <499BDDFE.5010101@trash.net>
* Patrick McHardy <kaber@trash.net> wrote:
> Ingo Molnar wrote:
>>> -extern int __mod_timer(struct timer_list *timer, unsigned long expires);
>>> +extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate);
>>
>> This is not really acceptable, it slows down every single add_timer()
>> and mod_timer() call in the kernel with a flag that has one specific
>> value in all but your case. There's more than 2000 such callsites in
>> the kernel.
>>
>> Why dont you use something like this instead:
>>
>> if (del_timer(timer))
>> add_timer(timer);
>
> We need to avoid having a timer that was deleted by one CPU
> getting re-added by another, but want to avoid taking the
> conntrack lock for every timer update. The timer-internal
> locking is enough for this as long as we have a mod_timer
> variant that forwards a timer, but doesn't activate it in
> case it isn't active already.
That makes sense - but the implementation is still somewhat
ugly. How about the one below instead? Not tested.
One open question is this construct in mod_timer():
+ /*
+ * This is a common optimization triggered by the
+ * networking code - if the timer is re-modified
+ * to be the same thing then just return:
+ */
+ if (timer->expires == expires && timer_pending(timer))
+ return 1;
We've had this for ages, but it seems rather SMP-unsafe.
timer_pending(), if used in an unserialized fashion, can be any
random value in theory - there's no internal serialization here
anywhere.
We could end up with incorrectly not re-activating a timer in
mod_timer() for example - have such things never been observed
in practice?
So the original patch which added this to mod_timer_noact() was
racy, I think, and we cannot preserve this optimization outside
of the timer list lock. (We could do it inside of it.)
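To illustrate the parenthetical remark, here is a minimal userspace sketch of doing the "same expiry" check inside the lock. Everything in it is hypothetical: struct toy_base, struct locked_timer, the spinlock helpers and locked_mod_timer() are toy stand-ins, not kernel APIs, and the patch below does not do this (it keeps the check in mod_timer(), outside the lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the per-CPU timer base and its lock (hypothetical). */
struct toy_base {
	atomic_flag lock;
};

/* Toy stand-in for a timer; NOT the kernel's struct timer_list. */
struct locked_timer {
	struct toy_base *base;
	bool pending;            /* models timer_pending() */
	unsigned long expires;   /* models ->expires */
	unsigned int requeues;   /* counts how often the timer was (re)queued */
};

static inline void toy_lock(struct toy_base *b)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;
}

static inline void toy_unlock(struct toy_base *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/*
 * Returns 1 if the timer was pending, 0 otherwise. The "nothing changed"
 * shortcut is taken only while holding the lock, so ->pending and
 * ->expires are read in a serialized fashion.
 */
int locked_mod_timer(struct locked_timer *t, unsigned long expires)
{
	int ret = 0;

	toy_lock(t->base);
	if (t->pending) {
		ret = 1;
		if (t->expires == expires)
			goto out_unlock;   /* same expiry: skip the requeue */
	}
	t->expires = expires;
	t->pending = true;
	t->requeues++;
out_unlock:
	toy_unlock(t->base);
	return ret;
}

/* Small self-check: re-modifying to the same expiry does not requeue. */
int locked_demo(void)
{
	struct toy_base base = { .lock = ATOMIC_FLAG_INIT };
	struct locked_timer t = { .base = &base };

	locked_mod_timer(&t, 10);
	locked_mod_timer(&t, 10);
	assert(t.requeues == 1);
	return 0;
}
```

The trade-off is that the shortcut now costs a lock round-trip, which is what the unlocked check was meant to avoid in the first place.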
Ingo
------------------->
Subject: timers: add mod_timer_pending()
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 18 Feb 2009 12:23:29 +0100
Impact: new timer API
Based on an idea from Stephen Hemminger: introduce
mod_timer_pending(), a mod_timer() variant that leaves
already removed timers untouched.
(Regular mod_timer() re-activates non-pending timers.)
This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.
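The intended semantics can be modelled in a few lines of single-threaded userspace C. This is only a sketch: struct toy_timer and the toy_*() functions are hypothetical names that mirror the shape of the patched __mod_timer() and ignore locking, timer bases and the timer wheel entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a timer; NOT the kernel's struct timer_list. */
struct toy_timer {
	bool pending;            /* models timer_pending() */
	unsigned long expires;   /* models ->expires */
};

/* Shared helper, shaped like the patched __mod_timer(). */
int toy_mod_timer_common(struct toy_timer *t, unsigned long expires,
			 bool pending_only)
{
	int ret = 0;

	if (t->pending) {
		ret = 1;         /* was queued; it gets requeued below */
	} else if (pending_only) {
		return 0;        /* already deleted: leave it alone */
	}

	t->expires = expires;
	t->pending = true;
	return ret;
}

/* Models mod_timer(): re-activates a non-pending timer. */
int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	return toy_mod_timer_common(t, expires, false);
}

/* Models mod_timer_pending(): only forwards a pending timer. */
int toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
	return toy_mod_timer_common(t, expires, true);
}

/* Models del_timer(): returns whether the timer was pending. */
int toy_del_timer(struct toy_timer *t)
{
	int was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* Walks through arm, forward, delete, and a late forward attempt. */
int toy_demo(void)
{
	struct toy_timer t = { .pending = false, .expires = 0 };

	toy_mod_timer(&t, 100);                        /* arm */
	assert(toy_mod_timer_pending(&t, 200) == 1);   /* forward */
	toy_del_timer(&t);                             /* stop */
	assert(toy_mod_timer_pending(&t, 300) == 0);   /* stays stopped */
	assert(!t.pending);
	return 0;
}
```

In this model a late toy_mod_timer_pending() call after toy_del_timer() changes neither the pending state nor the expiry, which is the property the networking code relies on.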
Also while at it:
- optimize the regular mod_timer() path some more, the
timer-stat and a debug check were needlessly duplicated
in __mod_timer().
- make the exports come straight after the function, as
most other exports in timer.c already did.
- eliminate __mod_timer() as an external API, change the
users to mod_timer().
The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications - but performance testing would be nice
nevertheless.
Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/powerpc/platforms/cell/spufs/sched.c | 2
drivers/infiniband/hw/ipath/ipath_driver.c | 6 -
include/linux/timer.h | 22 -----
kernel/relay.c | 2
kernel/timer.c | 110 +++++++++++++++++++----------
5 files changed, 80 insertions(+), 62 deletions(-)
Index: linux/arch/powerpc/platforms/cell/spufs/sched.c
===================================================================
--- linux.orig/arch/powerpc/platforms/cell/spufs/sched.c
+++ linux/arch/powerpc/platforms/cell/spufs/sched.c
@@ -508,7 +508,7 @@ static void __spu_add_to_rq(struct spu_c
list_add_tail(&ctx->rq, &spu_prio->runq[ctx->prio]);
set_bit(ctx->prio, spu_prio->bitmap);
if (!spu_prio->nr_waiting++)
- __mod_timer(&spusched_timer, jiffies + SPUSCHED_TICK);
+ mod_timer(&spusched_timer, jiffies + SPUSCHED_TICK);
}
}
Index: linux/drivers/infiniband/hw/ipath/ipath_driver.c
===================================================================
--- linux.orig/drivers/infiniband/hw/ipath/ipath_driver.c
+++ linux/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -2715,7 +2715,7 @@ static void ipath_hol_signal_up(struct i
* to prevent HoL blocking, then start the HoL timer that
* periodically continues, then stop procs, so they can detect
* link down if they want, and do something about it.
- * Timer may already be running, so use __mod_timer, not add_timer.
+ * Timer may already be running, so use mod_timer, not add_timer.
*/
void ipath_hol_down(struct ipath_devdata *dd)
{
@@ -2724,7 +2724,7 @@ void ipath_hol_down(struct ipath_devdata
dd->ipath_hol_next = IPATH_HOL_DOWNCONT;
dd->ipath_hol_timer.expires = jiffies +
msecs_to_jiffies(ipath_hol_timeout_ms);
- __mod_timer(&dd->ipath_hol_timer, dd->ipath_hol_timer.expires);
+ mod_timer(&dd->ipath_hol_timer, dd->ipath_hol_timer.expires);
}
/*
@@ -2763,7 +2763,7 @@ void ipath_hol_event(unsigned long opaqu
else {
dd->ipath_hol_timer.expires = jiffies +
msecs_to_jiffies(ipath_hol_timeout_ms);
- __mod_timer(&dd->ipath_hol_timer,
+ mod_timer(&dd->ipath_hol_timer,
dd->ipath_hol_timer.expires);
}
}
Index: linux/include/linux/timer.h
===================================================================
--- linux.orig/include/linux/timer.h
+++ linux/include/linux/timer.h
@@ -161,8 +161,8 @@ static inline int timer_pending(const st
extern void add_timer_on(struct timer_list *timer, int cpu);
extern int del_timer(struct timer_list * timer);
-extern int __mod_timer(struct timer_list *timer, unsigned long expires);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
+extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
/*
* The jiffies value which is added to now, when there is no timer
@@ -221,25 +221,7 @@ static inline void timer_stats_timer_cle
}
#endif
-/**
- * add_timer - start a timer
- * @timer: the timer to be added
- *
- * The kernel will do a ->function(->data) callback from the
- * timer interrupt at the ->expires point in the future. The
- * current time is 'jiffies'.
- *
- * The timer's ->expires, ->function (and if the handler uses it, ->data)
- * fields must be set prior calling this function.
- *
- * Timers with an ->expires field in the past will be executed in the next
- * timer tick.
- */
-static inline void add_timer(struct timer_list *timer)
-{
- BUG_ON(timer_pending(timer));
- __mod_timer(timer, timer->expires);
-}
+extern void add_timer(struct timer_list *timer);
#ifdef CONFIG_SMP
extern int try_to_del_timer_sync(struct timer_list *timer);
Index: linux/kernel/relay.c
===================================================================
--- linux.orig/kernel/relay.c
+++ linux/kernel/relay.c
@@ -748,7 +748,7 @@ size_t relay_switch_subbuf(struct rchan_
* from the scheduler (trying to re-grab
* rq->lock), so defer it.
*/
- __mod_timer(&buf->timer, jiffies + 1);
+ mod_timer(&buf->timer, jiffies + 1);
}
old = buf->data;
Index: linux/kernel/timer.c
===================================================================
--- linux.orig/kernel/timer.c
+++ linux/kernel/timer.c
@@ -600,11 +600,14 @@ static struct tvec_base *lock_timer_base
}
}
-int __mod_timer(struct timer_list *timer, unsigned long expires)
+static inline int
+__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
{
struct tvec_base *base, *new_base;
unsigned long flags;
- int ret = 0;
+ int ret;
+
+ ret = 0;
timer_stats_timer_set_start_info(timer);
BUG_ON(!timer->function);
@@ -614,6 +617,9 @@ int __mod_timer(struct timer_list *timer
if (timer_pending(timer)) {
detach_timer(timer, 0);
ret = 1;
+ } else {
+ if (pending_only)
+ goto out_unlock;
}
debug_timer_activate(timer);
@@ -640,42 +646,28 @@ int __mod_timer(struct timer_list *timer
timer->expires = expires;
internal_add_timer(base, timer);
+
+out_unlock:
spin_unlock_irqrestore(&base->lock, flags);
return ret;
}
-EXPORT_SYMBOL(__mod_timer);
-
/**
- * add_timer_on - start a timer on a particular CPU
- * @timer: the timer to be added
- * @cpu: the CPU to start it on
+ * mod_timer_pending - modify a pending timer's timeout
+ * @timer: the pending timer to be modified
+ * @expires: new timeout in jiffies
*
- * This is not very scalable on SMP. Double adds are not possible.
+ * mod_timer_pending() is the same for pending timers as mod_timer(),
+ * but will not re-activate and modify already deleted timers.
+ *
+ * It is useful for unserialized use of timers.
*/
-void add_timer_on(struct timer_list *timer, int cpu)
+int mod_timer_pending(struct timer_list *timer, unsigned long expires)
{
- struct tvec_base *base = per_cpu(tvec_bases, cpu);
- unsigned long flags;
-
- timer_stats_timer_set_start_info(timer);
- BUG_ON(timer_pending(timer) || !timer->function);
- spin_lock_irqsave(&base->lock, flags);
- timer_set_base(timer, base);
- debug_timer_activate(timer);
- internal_add_timer(base, timer);
- /*
- * Check whether the other CPU is idle and needs to be
- * triggered to reevaluate the timer wheel when nohz is
- * active. We are protected against the other CPU fiddling
- * with the timer by holding the timer base lock. This also
- * makes sure that a CPU on the way to idle can not evaluate
- * the timer wheel.
- */
- wake_up_idle_cpu(cpu);
- spin_unlock_irqrestore(&base->lock, flags);
+ return __mod_timer(timer, expires, true);
}
+EXPORT_SYMBOL(mod_timer_pending);
/**
* mod_timer - modify a timer's timeout
@@ -699,9 +691,6 @@ void add_timer_on(struct timer_list *tim
*/
int mod_timer(struct timer_list *timer, unsigned long expires)
{
- BUG_ON(!timer->function);
-
- timer_stats_timer_set_start_info(timer);
/*
* This is a common optimization triggered by the
* networking code - if the timer is re-modified
@@ -710,12 +699,62 @@ int mod_timer(struct timer_list *timer,
if (timer->expires == expires && timer_pending(timer))
return 1;
- return __mod_timer(timer, expires);
+ return __mod_timer(timer, expires, false);
}
-
EXPORT_SYMBOL(mod_timer);
/**
+ * add_timer - start a timer
+ * @timer: the timer to be added
+ *
+ * The kernel will do a ->function(->data) callback from the
+ * timer interrupt at the ->expires point in the future. The
+ * current time is 'jiffies'.
+ *
+ * The timer's ->expires, ->function (and if the handler uses it, ->data)
+ * fields must be set prior calling this function.
+ *
+ * Timers with an ->expires field in the past will be executed in the next
+ * timer tick.
+ */
+void add_timer(struct timer_list *timer)
+{
+ BUG_ON(timer_pending(timer));
+ mod_timer(timer, timer->expires);
+}
+EXPORT_SYMBOL(add_timer);
+
+/**
+ * add_timer_on - start a timer on a particular CPU
+ * @timer: the timer to be added
+ * @cpu: the CPU to start it on
+ *
+ * This is not very scalable on SMP. Double adds are not possible.
+ */
+void add_timer_on(struct timer_list *timer, int cpu)
+{
+ struct tvec_base *base = per_cpu(tvec_bases, cpu);
+ unsigned long flags;
+
+ timer_stats_timer_set_start_info(timer);
+ BUG_ON(timer_pending(timer) || !timer->function);
+ spin_lock_irqsave(&base->lock, flags);
+ timer_set_base(timer, base);
+ debug_timer_activate(timer);
+ internal_add_timer(base, timer);
+ /*
+ * Check whether the other CPU is idle and needs to be
+ * triggered to reevaluate the timer wheel when nohz is
+ * active. We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to idle can not evaluate
+ * the timer wheel.
+ */
+ wake_up_idle_cpu(cpu);
+ spin_unlock_irqrestore(&base->lock, flags);
+}
+
+/**
* del_timer - deactive a timer.
* @timer: the timer to be deactivated
*
@@ -744,7 +783,6 @@ int del_timer(struct timer_list *timer)
return ret;
}
-
EXPORT_SYMBOL(del_timer);
#ifdef CONFIG_SMP
@@ -778,7 +816,6 @@ out:
return ret;
}
-
EXPORT_SYMBOL(try_to_del_timer_sync);
/**
@@ -816,7 +853,6 @@ int del_timer_sync(struct timer_list *ti
cpu_relax();
}
}
-
EXPORT_SYMBOL(del_timer_sync);
#endif
@@ -1314,7 +1350,7 @@ signed long __sched schedule_timeout(sig
expire = timeout + jiffies;
setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
- __mod_timer(&timer, expire);
+ __mod_timer(&timer, expire, false);
schedule();
del_singleshot_timer_sync(&timer);