public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC] [PATCH] Performance of del_timer_sync
@ 2004-05-10 22:16 Geoff Gustafson
  2004-05-11  7:45 ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Geoff Gustafson @ 2004-05-10 22:16 UTC (permalink / raw)
  To: linux-kernel

I started this patch based on profiling an enterprise database application
on a 32p IA64 NUMA machine, where del_timer_sync was one of the top few
functions taking CPU time in the kernel. The problem: even though a timer
can only be scheduled on one CPU at a time, a loop was searching through
each CPU. This is particularly bad on a NUMA machine, but doesn't scale
well on a normal SMP either. With this patch, here were some results:

Before:             32p     4p
     Warm cache   29,000    505
     Cold cache   37,800   1220

After:              32p     4p
     Warm cache       95     88
     Cold cache    1,800    140

[Measurements are CPU cycles spent in a call to del_timer_sync, the average
of 1000 calls. 32p is 16-node NUMA, 4p is SMP.]

I'd be very interested for others to try out this patch, and see what kind
of results you get. Have you also observed del_timer_sync as a hotspot?

Now on to the patch. The basic idea is to remember which 'base' (i.e. CPU)
is currently executing the timer function. Ingo once sent a patch that did
this exact thing, to solve SMP races:
http://marc.theaimsgroup.com/?l=linux-kernel&m=106610555708068&w=2

Linus pointed out that you can't reset timer->running to NULL in
__run_timers because the timer might have been freed by the timer function.
The patch was then dropped. It now seems to me though that we can still get
the performance gains - I call it timer->last_running and it just does not
get reset to NULL. Thus it may be stale, but at least then you only have to
check _one_ CPU for the running timer instead of all of them.

Here are some known issues I'd like people to comment on:

1. How bad is the performance hit of adding a last_running field to
timer_list? I ifdef'd everything with CONFIG_SMP to make sure there's no
impact to uniprocessors, although that may be objectionable for its ugliness.

2. I believe there's a theoretical race; I'm not sure if it could actually
happen in practice. When the timer function gets executed in __run_timers,
the base->lock is briefly unlocked. This gives another CPU that has been
waiting in __mod_timer the opportunity to step in and reschedule the timer.
Then that new scheduled timer has to actually expire (next timer interrupt
at the earliest I believe). Then it has to get around to executing the timer
function. So now two timer functions are overlapping, and timer->last_running
only remembers the second one.

In this situation, the original del_timer_sync code would find the timer
function on the lowest numbered CPU, wait for it to finish, and break out of
the loop. If the function on the higher numbered CPU runs longer, it will
return before it is finished.

With the patch, the code will wait for the most recently executed timer
function to finish. If the older function finishes after the newer one, we
get the original problem. I'd suggest waiting for the most recent one is an
improvement, though not a fix.

If del_timer_sync returns while the timer function is still running, the
caller may do something (like free the timer) that will cause the running
timer function to go awry. Can this really happen? Would it be the timer
function's fault, for running too long?

Comments, results, bug fixes, and improvements welcome!

- Geoff Gustafson

diff -Naur fresh/include/linux/timer.h timer/include/linux/timer.h
--- fresh/include/linux/timer.h	Sat Apr  3 19:37:37 2004
+++ timer/include/linux/timer.h	Tue May  4 14:51:51 2004
@@ -18,10 +18,20 @@
  	unsigned long data;

  	struct tvec_t_base_s *base;
+
+#ifdef CONFIG_SMP
+	struct tvec_t_base_s *last_running;
+#endif
  };

  #define TIMER_MAGIC	0x4b87ad6e

+#ifdef CONFIG_SMP
+#define TIMER_INIT_LASTRUNNING .last_running = NULL,
+#else
+#define TIMER_INIT_LASTRUNNING
+#endif
+
  #define TIMER_INITIALIZER(_function, _expires, _data) {		\
  		.function = (_function),			\
  		.expires = (_expires),				\
@@ -29,6 +39,7 @@
  		.base = NULL,					\
  		.magic = TIMER_MAGIC,				\
  		.lock = SPIN_LOCK_UNLOCKED,			\
+		TIMER_INIT_LASTRUNNING				\
  	}

  /***
@@ -40,6 +51,9 @@
   */
  static inline void init_timer(struct timer_list * timer)
  {
+#ifdef CONFIG_SMP
+	timer->last_running = NULL;
+#endif
  	timer->base = NULL;
  	timer->magic = TIMER_MAGIC;
  	spin_lock_init(&timer->lock);
diff -Naur fresh/kernel/timer.c timer/kernel/timer.c
--- fresh/kernel/timer.c	Sat Apr  3 19:37:59 2004
+++ timer/kernel/timer.c	Tue May  4 11:37:36 2004
@@ -75,6 +75,14 @@
  #endif
  }

+static inline void set_last_running(struct timer_list *timer,
+				    tvec_base_t *base)
+{
+#ifdef CONFIG_SMP
+	timer->last_running = base;
+#endif
+}
+
  /* Fake initialization */
  static DEFINE_PER_CPU(tvec_base_t, tvec_bases) = { SPIN_LOCK_UNLOCKED };

@@ -325,25 +333,22 @@
  int del_timer_sync(struct timer_list *timer)
  {
  	tvec_base_t *base;
-	int i, ret = 0;
+	int ret = 0;

  	check_timer(timer);

  del_again:
  	ret += del_timer(timer);

-	for_each_cpu(i) {
-		base = &per_cpu(tvec_bases, i);
-		if (base->running_timer == timer) {
-			while (base->running_timer == timer) {
-				cpu_relax();
-				preempt_check_resched();
-			}
-			break;
+	base = timer->last_running;
+	if (base) {
+		while (base->running_timer == timer) {
+			cpu_relax();
+			preempt_check_resched();
  		}
  	}
  	smp_rmb();
-	if (timer_pending(timer))
+	if (timer_pending(timer) || timer->last_running != base)
  		goto del_again;

  	return ret;
@@ -418,6 +423,7 @@
  			set_running_timer(base, timer);
  			smp_wmb();
  			timer->base = NULL;
+			set_last_running(timer, base);
  			spin_unlock_irq(&base->lock);
  			fn(data);
  			spin_lock_irq(&base->lock);


^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC] [PATCH] Performance of del_timer_sync
@ 2004-05-11 22:46 Geoff Gustafson
  0 siblings, 0 replies; 22+ messages in thread
From: Geoff Gustafson @ 2004-05-11 22:46 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, mingo, kenneth.w.chen, hch

Andrew Morton wrote:
 > <anal>del_singleshot_timer_sync</anal>
 >
 > I vote we leave it up to Ken.  But please, not del_timer_kenneth().

Despite your markup language, I thought keeping the 'sync' sounded like
a good idea, since that's the function it's replacing.

Here's a patch. I've done a small amount of testing with a 4p machine.
I've at least seen that the average call to singleshot is down around 90
cycles, pretty similar/expected. We'll test on the original platform and
post results.

- Geoff

* I am in a transitional period w/ email clients and can't seem to use
any of them properly. Sorry for the dups, etc.

diff -Naur fresh/fs/aio.c timer/fs/aio.c
--- fresh/fs/aio.c	Sun May  9 19:32:29 2004
+++ timer/fs/aio.c	Tue May 11 15:14:47 2004
@@ -793,7 +793,7 @@

  static inline void clear_timeout(struct timeout *to)
  {
-	del_timer_sync(&to->timer);
+	del_singleshot_timer_sync(&to->timer);
  }

  static int read_events(struct kioctx *ctx,
diff -Naur fresh/include/linux/timer.h timer/include/linux/timer.h
--- fresh/include/linux/timer.h	Sun May  9 19:32:54 2004
+++ timer/include/linux/timer.h	Tue May 11 14:21:36 2004
@@ -88,8 +88,10 @@

  #ifdef CONFIG_SMP
    extern int del_timer_sync(struct timer_list * timer);
+  extern int del_singleshot_timer_sync(struct timer_list * timer);
  #else
  # define del_timer_sync(t) del_timer(t)
+# define del_singleshot_timer_sync(t) del_timer(t)
  #endif

  extern void init_timers(void);
diff -Naur fresh/kernel/timer.c timer/kernel/timer.c
--- fresh/kernel/timer.c	Sun May  9 19:33:13 2004
+++ timer/kernel/timer.c	Tue May 11 15:08:24 2004
@@ -350,6 +350,35 @@
  }

  EXPORT_SYMBOL(del_timer_sync);
+
+/***
+ * del_singleshot_timer_sync - deactivate a non-recursive timer
+ * @timer: the timer to be deactivated
+ *
+ * This function is an optimization of del_timer_sync for the case where the
+ * caller can guarantee the timer does not reschedule itself in its timer
+ * function.
+ *
+ * Synchronization rules: callers must prevent restarting of the timer,
+ * otherwise this function is meaningless. It must not be called from
+ * interrupt contexts. Upon exit the timer is not queued and the handler
+ * is not running on any CPU.
+ *
+ * The function returns whether it has deactivated a pending timer or not.
+ */
+int del_singleshot_timer_sync(struct timer_list *timer)
+{
+	int ret = del_timer(timer);
+
+	if (!ret) {
+		ret = del_timer_sync(timer);
+		BUG_ON(ret);
+	}
+
+	return ret;
+}
+
+EXPORT_SYMBOL(del_singleshot_timer_sync);
  #endif

  static int cascade(tvec_base_t *base, tvec_t *tv, int index)
@@ -1109,7 +1138,7 @@

  	add_timer(&timer);
  	schedule();
-	del_timer_sync(&timer);
+	del_singleshot_timer_sync(&timer);

  	timeout = expire - jiffies;


^ permalink raw reply	[flat|nested] 22+ messages in thread
[parent not found: <40A152A8.4080104@linux.intel.com>]

end of thread, other threads:[~2004-05-11 23:01 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-10 22:16 [RFC] [PATCH] Performance of del_timer_sync Geoff Gustafson
2004-05-11  7:45 ` Andrew Morton
2004-05-11 18:36   ` Geoff Gustafson
2004-05-11 19:11     ` Andrew Morton
2004-05-11 19:23       ` Chen, Kenneth W
2004-05-11 19:28         ` Andrew Morton
2004-05-11 19:30           ` Andrew Morton
2004-05-11 19:58       ` Ingo Molnar
2004-05-11 20:11         ` Andrew Morton
2004-05-11 20:19           ` Christoph Hellwig
2004-05-11 20:26             ` Andrew Morton
2004-05-11 20:32               ` Ingo Molnar
2004-05-11 20:55                 ` Andrew Morton
2004-05-11 20:57                   ` Ingo Molnar
2004-05-11 21:01                     ` Chen, Kenneth W
2004-05-11 20:26           ` Ingo Molnar
2004-05-11 20:27           ` Chen, Kenneth W
2004-05-11 20:30             ` Andrew Morton
2004-05-11 20:45               ` Chen, Kenneth W
2004-05-11 20:38             ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2004-05-11 22:46 Geoff Gustafson
     [not found] <40A152A8.4080104@linux.intel.com>
2004-05-11 23:03 ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox