public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: linux-ia64@vger.kernel.org
Cc: Linux Kernel list <linux-kernel@vger.kernel.org>
Subject: [PATCH] ia64: Scalability improvement of gettimeofday with jitter
Date: Tue, 12 Jun 2007 03:14:44 +0000	[thread overview]
Message-ID: <466E0FA4.2040405@jp.fujitsu.com> (raw)

Hi all,

This is a proposal with a patch to improve scalability
of time handling.

As described in comment at arch/ia64/kernel/time.c:

[arch/ia64/kernel/time.c]
> #ifdef CONFIG_SMP
>   /* On IA64 in an SMP configuration ITCs are never accurately synchronized.
>    * Jitter compensation requires a cmpxchg which may limit
>    * the scalability of the syscalls for retrieving time.
>    * The ITC synchronization is usually successful to within a few
>    * ITC ticks but this is not a sure thing. If you need to improve
>    * timer performance in SMP situations then boot the kernel with the
>    * "nojitter" option. However, doing so may result in time fluctuating (maybe
>    * even going backward) if the ITC offsets between the individual CPUs
>    * are too large.
>    */
>   if (!nojitter) itc_interpolator.jitter = 1;
> #endif

ia64 uses jitter compensation to prevent time from going backward.

This jitter compensation logic which keep track of cycle value
recently returned is provided as generic code (and copied to
arch/ia64/kernel/fsys.S).
It seems that there is no user (setting jitter = 1) other than ia64.

[kernel/timer.c]
> static inline u64 time_interpolator_get_counter(int writelock)
> {
>   unsigned int src = time_interpolator->source;
>
>   if (time_interpolator->jitter)
>   {
>     cycles_t lcycle;
>     cycles_t now;
>
>     do {
>       lcycle = time_interpolator->last_cycle;
>       now = time_interpolator_get_cycles(src);
>       if (lcycle && time_after(lcycle, now))
>         return lcycle;
>
>       /* When holding the xtime write lock, there's no need
>        * to add the overhead of the cmpxchg.  Readers are
>        * force to retry until the write lock is released.
>        */
>       if (writelock) {
>         time_interpolator->last_cycle = now;
>         return now;
>       }
>       /* Keep track of the last timer value returned. The use of cmpxchg here
>        * will cause contention in an SMP environment.
>        */
>     } while (unlikely(cmpxchg(&time_interpolator->last_cycle, lcycle, now) != lcycle));
>     return now;
>   }
>   else
>     return time_interpolator_get_cycles(src);
> }

The current logic is consist of do-while loop with cmpxchg.

The cmpxchg is known to take long time in an SMP environment but
it is easy way to guarantee atomic operation.
I think this is acceptable while there are no better alternatives.

OTOH, the do-while forces retry if cmpxchg fails (no exchanges).
This means that if there are N threads trying to do cmpxchg at
same time, only 1 can out from this loop and N-1 others will be
trapped in the loop. This also means that a thread could loop
N times in worst case.

Obviously this is a scalability issue.
To avoid this retry loop, I'd like to propose new logic that
removes do-while here.

The basic idea is "use winner's cycle instead of retrying."
Assuming that there are N threads trying to do cmpxchg, it also
be assumed that they are trying to update last_cycle by its own
new value while all values are almost same.
Therefore, it will work that treating threads as a group and
deciding a group's return value by picking up one from the group.

Fortunately, cmpxchg mechanism can help this logic. Only first
one in group can exchange the last_cycle successfully, so this
"winner" gets previous last_cycle as the return value of cmpxchg.
The rests in group will fail to exchange since last_cycle is
already updated by winner, so these "loser" gets current
last_cycle on cmpxchg's return. This means that all thread in
the group can know the winner's cycle.

  ret = cmpxchg(&last_cycle, last, new);
  if (ret = last)
    return new; /* you win! */
  else
    return ret; /* you lose. ret is winner's new */

I had a test running gettimeofday() processes at 1.5GHz*4way box.
It shows followings:

 - x1 process:
    0.15us / 1 gettimeofday() call
    0.15us / 1 gettimeofday() call with patch
 - x2 process:
    0.31us / 1 gettimeofday() call
    0.24us / 1 gettimeofday() call with patch
 - x3 process:
    1.59us / 1 gettimeofday() call
    1.11us / 1 gettimeofday() call with patch
 - x4 process:
    2.34us / 1 gettimeofday() call
    1.29us / 1 gettimeofday() call with patch

I know that this patch could not help quite huge system since
such system like having 1024CPUs should have better clocksource
instead of doing cmpxchg. Even though this patch will work good
on middle-sized box (4~8way, possibly 16~64way?).

Thanks,
H.Seto

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>

-----
 arch/ia64/kernel/fsys.S |   10 +++++++---
 kernel/timer.c          |   45 +++++++++++++++++++++++++--------------------
 2 files changed, 32 insertions(+), 23 deletions(-)

Index: linux-2.6.21/arch/ia64/kernel/fsys.S
=================================--- linux-2.6.21.orig/arch/ia64/kernel/fsys.S
+++ linux-2.6.21/arch/ia64/kernel/fsys.S
@@ -271,18 +271,22 @@
 (p6)	sub r10 = r25,r26	// time we got was less than last_cycle
 (p7)	mov ar.ccv = r25	// more than last_cycle. Prep for cmpxchg
 	;;
+(p7)	cmpxchg8.rel r3 = [r23],r2,ar.ccv
+	;;
+(p7)	cmp.ne p7,p0 = r25,r3	// if cmpxchg not successful, neighbor updated last_cycle.
+	;;
+(p7)	sub r10 = r3,r26	// use new last_cycle instead
+	;;
 	and r10 = r10,r14	// Apply mask
 	;;
 	setf.sig f8 = r10
 	nop.i 123
 	;;
-(p7)	cmpxchg8.rel r3 = [r23],r2,ar.ccv
 EX(.fail_efault, probe.w.fault r31, 3)	// This takes 5 cycles and we have spare time
 	xmpy.l f8 = f8,f7	// nsec_per_cyc*(counter-last_counter)
 (p15)	add r9 = r9,r17		// Add wall to monotonic.secs to result secs
 	;;
 (p15)	ld8 r17 = [r19],-IA64_TIMESPEC_TV_NSEC_OFFSET
-(p7)	cmp.ne p7,p0 = r25,r3	// if cmpxchg not successful redo
 	// simulate tbit.nz.or p7,p0 = r28,0
 	and r28 = ~1,r28	// Make sequence even to force retry if odd
 	getf.sig r2 = f8
@@ -295,7 +299,7 @@
 	;;		// overloaded 3 bundles!
 	// End critical section.
 	add r8 = r8,r2		// Add xtime.nsecs
-	cmp4.ne.or p7,p0 = r28,r10
+	cmp4.ne p7,p0 = r28,r10
 (p7)	br.cond.dpnt.few .time_redo	// sequence number changed ?
 	// Now r8=tv->tv_nsec and r9=tv->tv_sec
 	mov r10 = r0
Index: linux-2.6.21/kernel/timer.c
=================================--- linux-2.6.21.orig/kernel/timer.c
+++ linux-2.6.21/kernel/timer.c
@@ -1762,27 +1762,32 @@

 	if (time_interpolator->jitter)
 	{
-		cycles_t lcycle;
-		cycles_t now;
+		cycles_t lcycle, now, ret;
+
+		lcycle = time_interpolator->last_cycle;
+		now = time_interpolator_get_cycles(src);
+		if (lcycle && time_after(lcycle, now))
+			return lcycle;
+
+		/* When holding the xtime write lock, there's no need
+		 * to add the overhead of the cmpxchg.  Readers are
+		 * force to retry until the write lock is released.
+		 */
+		if (writelock) {
+			time_interpolator->last_cycle = now;
+			return now;
+		}
+
+		/*
+		 * Keep track of the last timer value returned.
+		 * In an SMP environment, you could lose out in contention of
+		 * cmpxchg. If so, your cmpxchg returns new value which the
+		 * winner of contention updated to. Use the new value instead.
+		 */
+		ret = cmpxchg(&time_interpolator->last_cycle, lcycle, now);
+		if (unlikely(ret != lcycle))
+			return ret;

-		do {
-			lcycle = time_interpolator->last_cycle;
-			now = time_interpolator_get_cycles(src);
-			if (lcycle && time_after(lcycle, now))
-				return lcycle;
-
-			/* When holding the xtime write lock, there's no need
-			 * to add the overhead of the cmpxchg.  Readers are
-			 * force to retry until the write lock is released.
-			 */
-			if (writelock) {
-				time_interpolator->last_cycle = now;
-				return now;
-			}
-			/* Keep track of the last timer value returned. The use of cmpxchg here
-			 * will cause contention in an SMP environment.
-			 */
-		} while (unlikely(cmpxchg(&time_interpolator->last_cycle, lcycle, now) != lcycle));
 		return now;
 	}
 	else

             reply	other threads:[~2007-06-12  3:14 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-12  3:14 Hidetoshi Seto [this message]
2007-06-13 12:20 ` [PATCH] ia64: Scalability improvement of gettimeofday with jitter compensation Bob Picco
2007-06-14  2:08   ` [PATCH] ia64: Scalability improvement of gettimeofday with jitter Hidetoshi Seto
2007-06-15 17:46 ` Christoph Lameter
2007-06-15 20:03 ` [PATCH] ia64: Scalability improvement of gettimeofday with jitter compensation Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=466E0FA4.2040405@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox