From: Ingo Molnar <mingo@elte.hu>
To: Patrick McHardy <kaber@trash.net>,
Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
David Miller <davem@davemloft.net>,
Rick Jones <rick.jones2@hp.com>,
Eric Dumazet <dada1@cosmosbay.com>,
netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
tglx@linutronix.de, Martin Josefsson <gandalf@wlug.westbo.se>
Subject: [patch] timers: add mod_timer_pending()
Date: Wed, 18 Feb 2009 13:05:08 +0100 [thread overview]
Message-ID: <20090218120508.GB4100@elte.hu> (raw)
In-Reply-To: <499BDDFE.5010101@trash.net>
* Patrick McHardy <kaber@trash.net> wrote:
> Ingo Molnar wrote:
>>> -extern int __mod_timer(struct timer_list *timer, unsigned long expires);
>>> +extern int __mod_timer(struct timer_list *timer, unsigned long expires, int activate);
>>
>> This is not really acceptable, it slows down every single add_timer()
>> and mod_timer() call in the kernel with a flag that has one specific
>> value in all but your case. There's more than 2000 such callsites in
>> the kernel.
>>
>> Why dont you use something like this instead:
>>
>> if (del_timer(timer))
>> add_timer(timer);
>
> We need to avoid having a timer that was deleted by one CPU
> getting re-added by another, but want to avoid taking the
> conntrack lock for every timer update. The timer-internal
> locking is enough for this as long as we have a mod_timer
> variant that forwards a timer, but doesn't activate it in
> case it isn't active already.
that makes sense - but the implementation is still somewhat
ugly. How about the one below instead? Not tested.
One open question is this construct in mod_timer():
+ /*
+ * This is a common optimization triggered by the
+ * networking code - if the timer is re-modified
+ * to be the same thing then just return:
+ */
+ if (timer->expires == expires && timer_pending(timer))
+ return 1;
We've had this for ages, but it seems rather SMP-unsafe.
timer_pending(), if used in an unserialized fashion, can be any
random value in theory - there's no internal serialization here
anywhere.
We could end up with incorrectly not re-activating a timer in
mod_timer() for example - have such things never been observed
in practice?
So the original patch which added this to mod_timer_noact() was
racy i think, and we cannot preserve this optimization outside
of the timer list lock. (we could do it inside of it.)
Ingo
------------------->
Subject: timers: add mod_timer_pending()
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 18 Feb 2009 12:23:29 +0100
Impact: new timer API
Based on an idea from Stephen Hemminger: introduce
mod_timer_pending() which is a mod_timer() offspring
that is an invariant on already removed timers.
(regular mod_timer() re-activates non-pending timers.)
This is useful for the networking code in that it can
allow unserialized mod_timer_pending() timer-forwarding
calls, but a single del_timer*() will stop the timer
from being reactivated again.
Also while at it:
- optimize the regular mod_timer() path some more, the
timer-stat and a debug check was needlessly duplicated
in __mod_timer().
- make the exports come straight after the function, as
most other exports in timer.c already did.
- eliminate __mod_timer() as an external API, change the
users to mod_timer().
The regular mod_timer() code path is not impacted
significantly, due to inlining optimizations and due to
the simplifications - but performance testing would be nice
nevertheless.
Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/powerpc/platforms/cell/spufs/sched.c | 2
drivers/infiniband/hw/ipath/ipath_driver.c | 6 -
include/linux/timer.h | 22 -----
kernel/relay.c | 2
kernel/timer.c | 110 +++++++++++++++++++----------
5 files changed, 80 insertions(+), 62 deletions(-)
Index: linux/arch/powerpc/platforms/cell/spufs/sched.c
===================================================================
--- linux.orig/arch/powerpc/platforms/cell/spufs/sched.c
+++ linux/arch/powerpc/platforms/cell/spufs/sched.c
@@ -508,7 +508,7 @@ static void __spu_add_to_rq(struct spu_c
list_add_tail(&ctx->rq, &spu_prio->runq[ctx->prio]);
set_bit(ctx->prio, spu_prio->bitmap);
if (!spu_prio->nr_waiting++)
- __mod_timer(&spusched_timer, jiffies + SPUSCHED_TICK);
+ mod_timer(&spusched_timer, jiffies + SPUSCHED_TICK);
}
}
Index: linux/drivers/infiniband/hw/ipath/ipath_driver.c
===================================================================
--- linux.orig/drivers/infiniband/hw/ipath/ipath_driver.c
+++ linux/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -2715,7 +2715,7 @@ static void ipath_hol_signal_up(struct i
* to prevent HoL blocking, then start the HoL timer that
* periodically continues, then stop procs, so they can detect
* link down if they want, and do something about it.
- * Timer may already be running, so use __mod_timer, not add_timer.
+ * Timer may already be running, so use mod_timer, not add_timer.
*/
void ipath_hol_down(struct ipath_devdata *dd)
{
@@ -2724,7 +2724,7 @@ void ipath_hol_down(struct ipath_devdata
dd->ipath_hol_next = IPATH_HOL_DOWNCONT;
dd->ipath_hol_timer.expires = jiffies +
msecs_to_jiffies(ipath_hol_timeout_ms);
- __mod_timer(&dd->ipath_hol_timer, dd->ipath_hol_timer.expires);
+ mod_timer(&dd->ipath_hol_timer, dd->ipath_hol_timer.expires);
}
/*
@@ -2763,7 +2763,7 @@ void ipath_hol_event(unsigned long opaqu
else {
dd->ipath_hol_timer.expires = jiffies +
msecs_to_jiffies(ipath_hol_timeout_ms);
- __mod_timer(&dd->ipath_hol_timer,
+ mod_timer(&dd->ipath_hol_timer,
dd->ipath_hol_timer.expires);
}
}
Index: linux/include/linux/timer.h
===================================================================
--- linux.orig/include/linux/timer.h
+++ linux/include/linux/timer.h
@@ -161,8 +161,8 @@ static inline int timer_pending(const st
extern void add_timer_on(struct timer_list *timer, int cpu);
extern int del_timer(struct timer_list * timer);
-extern int __mod_timer(struct timer_list *timer, unsigned long expires);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
+extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
/*
* The jiffies value which is added to now, when there is no timer
@@ -221,25 +221,7 @@ static inline void timer_stats_timer_cle
}
#endif
-/**
- * add_timer - start a timer
- * @timer: the timer to be added
- *
- * The kernel will do a ->function(->data) callback from the
- * timer interrupt at the ->expires point in the future. The
- * current time is 'jiffies'.
- *
- * The timer's ->expires, ->function (and if the handler uses it, ->data)
- * fields must be set prior calling this function.
- *
- * Timers with an ->expires field in the past will be executed in the next
- * timer tick.
- */
-static inline void add_timer(struct timer_list *timer)
-{
- BUG_ON(timer_pending(timer));
- __mod_timer(timer, timer->expires);
-}
+extern void add_timer(struct timer_list *timer);
#ifdef CONFIG_SMP
extern int try_to_del_timer_sync(struct timer_list *timer);
Index: linux/kernel/relay.c
===================================================================
--- linux.orig/kernel/relay.c
+++ linux/kernel/relay.c
@@ -748,7 +748,7 @@ size_t relay_switch_subbuf(struct rchan_
* from the scheduler (trying to re-grab
* rq->lock), so defer it.
*/
- __mod_timer(&buf->timer, jiffies + 1);
+ mod_timer(&buf->timer, jiffies + 1);
}
old = buf->data;
Index: linux/kernel/timer.c
===================================================================
--- linux.orig/kernel/timer.c
+++ linux/kernel/timer.c
@@ -600,11 +600,14 @@ static struct tvec_base *lock_timer_base
}
}
-int __mod_timer(struct timer_list *timer, unsigned long expires)
+static inline int
+__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
{
struct tvec_base *base, *new_base;
unsigned long flags;
- int ret = 0;
+ int ret;
+
+ ret = 0;
timer_stats_timer_set_start_info(timer);
BUG_ON(!timer->function);
@@ -614,6 +617,9 @@ int __mod_timer(struct timer_list *timer
if (timer_pending(timer)) {
detach_timer(timer, 0);
ret = 1;
+ } else {
+ if (pending_only)
+ goto out_unlock;
}
debug_timer_activate(timer);
@@ -640,42 +646,28 @@ int __mod_timer(struct timer_list *timer
timer->expires = expires;
internal_add_timer(base, timer);
+
+out_unlock:
spin_unlock_irqrestore(&base->lock, flags);
return ret;
}
-EXPORT_SYMBOL(__mod_timer);
-
/**
- * add_timer_on - start a timer on a particular CPU
- * @timer: the timer to be added
- * @cpu: the CPU to start it on
+ * mod_timer_pending - modify a pending timer's timeout
+ * @timer: the pending timer to be modified
+ * @expires: new timeout in jiffies
*
- * This is not very scalable on SMP. Double adds are not possible.
+ * mod_timer_pending() is the same for pending timers as mod_timer(),
+ * but will not re-activate and modify already deleted timers.
+ *
+ * It is useful for unserialized use of timers.
*/
-void add_timer_on(struct timer_list *timer, int cpu)
+int mod_timer_pending(struct timer_list *timer, unsigned long expires)
{
- struct tvec_base *base = per_cpu(tvec_bases, cpu);
- unsigned long flags;
-
- timer_stats_timer_set_start_info(timer);
- BUG_ON(timer_pending(timer) || !timer->function);
- spin_lock_irqsave(&base->lock, flags);
- timer_set_base(timer, base);
- debug_timer_activate(timer);
- internal_add_timer(base, timer);
- /*
- * Check whether the other CPU is idle and needs to be
- * triggered to reevaluate the timer wheel when nohz is
- * active. We are protected against the other CPU fiddling
- * with the timer by holding the timer base lock. This also
- * makes sure that a CPU on the way to idle can not evaluate
- * the timer wheel.
- */
- wake_up_idle_cpu(cpu);
- spin_unlock_irqrestore(&base->lock, flags);
+ return __mod_timer(timer, expires, true);
}
+EXPORT_SYMBOL(mod_timer_pending);
/**
* mod_timer - modify a timer's timeout
@@ -699,9 +691,6 @@ void add_timer_on(struct timer_list *tim
*/
int mod_timer(struct timer_list *timer, unsigned long expires)
{
- BUG_ON(!timer->function);
-
- timer_stats_timer_set_start_info(timer);
/*
* This is a common optimization triggered by the
* networking code - if the timer is re-modified
@@ -710,12 +699,62 @@ int mod_timer(struct timer_list *timer,
if (timer->expires == expires && timer_pending(timer))
return 1;
- return __mod_timer(timer, expires);
+ return __mod_timer(timer, expires, false);
}
-
EXPORT_SYMBOL(mod_timer);
/**
+ * add_timer - start a timer
+ * @timer: the timer to be added
+ *
+ * The kernel will do a ->function(->data) callback from the
+ * timer interrupt at the ->expires point in the future. The
+ * current time is 'jiffies'.
+ *
+ * The timer's ->expires, ->function (and if the handler uses it, ->data)
+ * fields must be set prior calling this function.
+ *
+ * Timers with an ->expires field in the past will be executed in the next
+ * timer tick.
+ */
+void add_timer(struct timer_list *timer)
+{
+ BUG_ON(timer_pending(timer));
+ mod_timer(timer, timer->expires);
+}
+EXPORT_SYMBOL(add_timer);
+
+/**
+ * add_timer_on - start a timer on a particular CPU
+ * @timer: the timer to be added
+ * @cpu: the CPU to start it on
+ *
+ * This is not very scalable on SMP. Double adds are not possible.
+ */
+void add_timer_on(struct timer_list *timer, int cpu)
+{
+ struct tvec_base *base = per_cpu(tvec_bases, cpu);
+ unsigned long flags;
+
+ timer_stats_timer_set_start_info(timer);
+ BUG_ON(timer_pending(timer) || !timer->function);
+ spin_lock_irqsave(&base->lock, flags);
+ timer_set_base(timer, base);
+ debug_timer_activate(timer);
+ internal_add_timer(base, timer);
+ /*
+ * Check whether the other CPU is idle and needs to be
+ * triggered to reevaluate the timer wheel when nohz is
+ * active. We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to idle can not evaluate
+ * the timer wheel.
+ */
+ wake_up_idle_cpu(cpu);
+ spin_unlock_irqrestore(&base->lock, flags);
+}
+
+/**
* del_timer - deactive a timer.
* @timer: the timer to be deactivated
*
@@ -744,7 +783,6 @@ int del_timer(struct timer_list *timer)
return ret;
}
-
EXPORT_SYMBOL(del_timer);
#ifdef CONFIG_SMP
@@ -778,7 +816,6 @@ out:
return ret;
}
-
EXPORT_SYMBOL(try_to_del_timer_sync);
/**
@@ -816,7 +853,6 @@ int del_timer_sync(struct timer_list *ti
cpu_relax();
}
}
-
EXPORT_SYMBOL(del_timer_sync);
#endif
@@ -1314,7 +1350,7 @@ signed long __sched schedule_timeout(sig
expire = timeout + jiffies;
setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
- __mod_timer(&timer, expire);
+ __mod_timer(&timer, expire, false);
schedule();
del_singleshot_timer_sync(&timer);
next prev parent reply other threads:[~2009-02-18 12:05 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-02-18 5:19 [RFT 0/4] Netfilter/iptables performance improvements Stephen Hemminger
2009-02-18 5:19 ` [RFT 1/4] iptables: lock free counters Stephen Hemminger
2009-02-18 10:02 ` Patrick McHardy
2009-02-19 19:47 ` [PATCH] " Stephen Hemminger
2009-02-19 23:46 ` Eric Dumazet
2009-02-19 23:56 ` Rick Jones
2009-02-20 1:03 ` Stephen Hemminger
2009-02-20 1:18 ` Rick Jones
2009-02-20 9:42 ` Patrick McHardy
2009-02-20 22:57 ` Rick Jones
2009-02-21 0:35 ` Rick Jones
2009-02-20 9:37 ` Patrick McHardy
2009-02-20 18:10 ` [PATCH] iptables: xt_hashlimit fix Eric Dumazet
2009-02-20 18:33 ` Jan Engelhardt
2009-02-28 1:54 ` Jan Engelhardt
2009-02-28 6:56 ` Eric Dumazet
2009-02-28 8:22 ` Jan Engelhardt
2009-02-24 14:31 ` Patrick McHardy
2009-02-27 14:02 ` [PATCH] iptables: lock free counters Eric Dumazet
2009-02-27 16:08 ` [PATCH] rcu: increment quiescent state counter in ksoftirqd() Eric Dumazet
2009-02-27 16:34 ` Paul E. McKenney
2009-03-02 10:55 ` [PATCH] iptables: lock free counters Patrick McHardy
2009-03-02 17:47 ` Eric Dumazet
2009-03-02 21:56 ` Patrick McHardy
2009-03-02 22:02 ` Stephen Hemminger
2009-03-02 22:07 ` Patrick McHardy
2009-03-02 22:17 ` Paul E. McKenney
2009-03-02 22:27 ` Eric Dumazet
2009-02-18 5:19 ` [RFT 2/4] Add mod_timer_noact Stephen Hemminger
2009-02-18 9:20 ` Ingo Molnar
2009-02-18 9:30 ` David Miller
2009-02-18 11:01 ` Ingo Molnar
2009-02-18 11:39 ` Jarek Poplawski
2009-02-18 12:37 ` Ingo Molnar
2009-02-18 12:33 ` Patrick McHardy
2009-02-18 21:39 ` David Miller
2009-02-18 21:51 ` Ingo Molnar
2009-02-18 22:04 ` David Miller
2009-02-18 22:42 ` Peter Zijlstra
2009-02-18 22:47 ` David Miller
2009-02-18 22:56 ` Stephen Hemminger
2009-02-18 10:07 ` Patrick McHardy
2009-02-18 12:05 ` Ingo Molnar [this message]
2009-02-18 12:33 ` [patch] timers: add mod_timer_pending() Patrick McHardy
2009-02-18 12:50 ` Ingo Molnar
2009-02-18 12:54 ` Patrick McHardy
2009-02-18 13:47 ` Ingo Molnar
2009-02-18 17:00 ` Oleg Nesterov
2009-02-18 18:23 ` Ingo Molnar
2009-02-18 18:58 ` Oleg Nesterov
2009-02-18 19:24 ` Ingo Molnar
2009-02-18 10:29 ` [RFT 2/4] Add mod_timer_noact Patrick McHardy
2009-02-18 5:19 ` [RFT 3/4] Use mod_timer_noact to remove nf_conntrack_lock Stephen Hemminger
2009-02-18 9:54 ` Patrick McHardy
2009-02-18 11:05 ` Jarek Poplawski
2009-02-18 11:08 ` Patrick McHardy
2009-02-18 14:01 ` Eric Dumazet
2009-02-18 14:04 ` Patrick McHardy
2009-02-18 14:22 ` Eric Dumazet
2009-02-18 14:27 ` Patrick McHardy
2009-02-18 5:19 ` [RFT 4/4] netfilter: Get rid of central rwlock in tcp conntracking Stephen Hemminger
2009-02-18 9:56 ` Patrick McHardy
2009-02-18 14:17 ` Eric Dumazet
2009-02-19 22:03 ` Stephen Hemminger
2009-03-28 16:55 ` [PATCH] netfilter: finer grained nf_conn locking Eric Dumazet
2009-03-29 0:48 ` Stephen Hemminger
2009-03-30 19:57 ` Eric Dumazet
2009-03-30 20:05 ` Stephen Hemminger
2009-04-06 12:07 ` Patrick McHardy
2009-04-06 12:32 ` Jan Engelhardt
2009-04-06 17:25 ` Stephen Hemminger
2009-03-30 18:57 ` Rick Jones
2009-03-30 19:20 ` Eric Dumazet
2009-03-30 19:38 ` Jesper Dangaard Brouer
2009-03-30 19:54 ` Eric Dumazet
2009-03-30 20:34 ` Jesper Dangaard Brouer
2009-03-30 20:41 ` Eric Dumazet
2009-03-30 21:25 ` Jesper Dangaard Brouer
2009-03-30 22:44 ` Rick Jones
2009-02-18 21:55 ` [RFT 4/4] netfilter: Get rid of central rwlock in tcp conntracking David Miller
2009-02-18 23:23 ` Patrick McHardy
2009-02-18 23:35 ` Stephen Hemminger
2009-02-18 8:30 ` [RFT 0/4] Netfilter/iptables performance improvements Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090218120508.GB4100@elte.hu \
--to=mingo@elte.hu \
--cc=a.p.zijlstra@chello.nl \
--cc=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=gandalf@wlug.westbo.se \
--cc=kaber@trash.net \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=rick.jones2@hp.com \
--cc=shemminger@vyatta.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).