All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rafael Aquini <aquini@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org, walken@google.com,
	eric.dumazet@gmail.com, lwoodman@redhat.com, jeremy@goop.org,
	Jan Beulich <JBeulich@novell.com>,
	knoel@redhat.com, chegu_vinod@hp.com,
	raghavendra.kt@linux.vnet.ibm.com, mingo@redhat.com
Subject: Re: [PATCH 3/5] x86,smp: auto tune spinlock backoff delay factor
Date: Thu, 10 Jan 2013 01:13:39 -0200	[thread overview]
Message-ID: <20130110031339.GC1636@x61.redhat.com> (raw)
In-Reply-To: <20130108173029.305d99c0@annuminas.surriel.com>

On Tue, Jan 08, 2013 at 05:30:29PM -0500, Rik van Riel wrote:
> Many spinlocks are embedded in data structures; having many CPUs
> pounce on the cache line the lock is in will slow down the lock
> holder, and can cause system performance to fall off a cliff.
> 
> The paper "Non-scalable locks are dangerous" is a good reference:
> 
> 	http://pdos.csail.mit.edu/papers/linux:lock.pdf
> 
> In the Linux kernel, spinlocks are optimized for the case of
> there not being contention. After all, if there is contention,
> the data structure can be improved to reduce or eliminate
> lock contention.
> 
> Likewise, the spinlock API should remain simple, and the
> common case of the lock not being contended should remain
> as fast as ever.
> 
> However, since spinlock contention should be fairly uncommon,
> we can add functionality into the spinlock slow path that keeps
> system performance from falling off a cliff when there is lock
> contention.
> 
> Proportional delay in ticket locks is delaying the time between
> checking the ticket based on a delay factor, and the number of
> CPUs ahead of us in the queue for this lock. Checking the lock
> less often allows the lock holder to continue running, resulting
> in better throughput and preventing performance from dropping
> off a cliff.
> 
> Proportional spinlock delay with a high delay factor works well
> when there is lots contention on a lock. Likewise, a smaller
> delay factor works well when a lock is lightly contended.
> 
> Making the code auto-tune the delay factor results in a system
> that performs well with both light and heavy lock contention.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> v3: use fixed-point math for the delay calculations, suggested by Michel Lespinasse
>

Acked-by: Rafael Aquini <aquini@redhat.com>

 
>  arch/x86/kernel/smp.c |   43 +++++++++++++++++++++++++++++++++++++++----
>  1 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index aa743e9..05f828b 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -113,13 +113,34 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
>  static bool smp_no_nmi_ipi = false;
>  
>  /*
> - * Wait on a congested ticket spinlock.
> + * Wait on a congested ticket spinlock. Many spinlocks are embedded in
> + * data structures; having many CPUs pounce on the cache line with the
> + * spinlock simultaneously can slow down the lock holder, and the system
> + * as a whole.
> + *
> + * To prevent total performance collapse in case of bad spinlock contention,
> + * perform proportional backoff. The per-cpu value of delay is automatically
> + * tuned to limit the number of times spinning CPUs poll the lock before
> + * obtaining it. This limits the amount of cross-CPU traffic required to obtain
> + * a spinlock, and keeps system performance from dropping off a cliff.
> + *
> + * There is a tradeoff. If we poll too often, the whole system is slowed
> + * down. If we sleep too long, the lock will go unused for a period of
> + * time. The solution is to go for a fast spin if we are at the head of
> + * the queue, to slowly increase the delay if we sleep for too short a
> + * time, and to decrease the delay if we slept for too long.
>   */
> +#define DELAY_SHIFT 8
> +#define DELAY_FIXED_1 (1<<DELAY_SHIFT)
> +#define MIN_SPINLOCK_DELAY (1 * DELAY_FIXED_1)
> +#define MAX_SPINLOCK_DELAY (16000 * DELAY_FIXED_1)
> +DEFINE_PER_CPU(unsigned, spinlock_delay) = { MIN_SPINLOCK_DELAY };
>  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  {
>  	__ticket_t head = inc.head, ticket = inc.tail;
>  	__ticket_t waiters_ahead;
> -	unsigned loops;
> +	unsigned delay = __this_cpu_read(spinlock_delay);
> +	unsigned loops = 1;
>  
>  	for (;;) {
>  		waiters_ahead = ticket - head - 1;
> @@ -133,14 +154,28 @@ void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  			} while (ACCESS_ONCE(lock->tickets.head) != ticket);
>  			break;
>  		}
> -		loops = 50 * waiters_ahead;
> +
> +		/* Aggressively increase delay, to minimize lock accesses. */
> +		if (delay < MAX_SPINLOCK_DELAY)
> +			delay += DELAY_FIXED_1 / 7;
> +
> +		loops = (delay * waiters_ahead) >> DELAY_SHIFT;
>  		while (loops--)
>  			cpu_relax();
>  
>  		head = ACCESS_ONCE(lock->tickets.head);
> -		if (head == ticket)
> +		if (head == ticket) {
> +			/*
> +			 * We overslept, and do not know by how.
> +			 * Exponentially decay the value of delay,
> +			 * to get it back to a good value quickly.
> +			 */
> +			if (delay >= 2 * DELAY_FIXED_1)
> +				delay -= max(delay/32, DELAY_FIXED_1);
>  			break;
> +		}
>  	}
> +	__this_cpu_write(spinlock_delay, delay);
>  }
>  
>  /*
> 

  reply	other threads:[~2013-01-10  3:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-08 22:26 [PATCH 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning Rik van Riel
2013-01-08 22:30 ` [PATCH 3/5] x86,smp: auto tune spinlock backoff delay factor Rik van Riel
2013-01-10  3:13   ` Rafael Aquini [this message]
2013-01-10 12:49   ` Michel Lespinasse
2013-01-08 22:31 ` [PATCH 4/5] x86,smp: keep spinlock delay values per hashed spinlock address Rik van Riel
2013-01-10  3:14   ` Rafael Aquini
2013-01-10 13:01   ` Michel Lespinasse
2013-01-10 13:05     ` Rik van Riel
2013-01-10 13:15       ` Michel Lespinasse
2013-01-08 22:32 ` [DEBUG PATCH 5/5] x86,smp: add debugging code to track spinlock delay value Rik van Riel
2013-01-08 22:32 ` [PATCH 2/5] x86,smp: proportional backoff for ticket spinlocks Rik van Riel
2013-01-08 22:50   ` Eric Dumazet
2013-01-08 22:54     ` Rik van Riel
2013-01-10  2:30   ` Rafael Aquini
2013-01-08 22:32 ` [PATCH 1/5] x86,smp: move waiting on contended ticket lock out of line Rik van Riel
2013-01-08 22:43   ` Eric Dumazet
2013-01-10 17:38   ` Raghavendra K T
2013-01-09 12:50 ` [PATCH 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning Raghavendra K T
2013-01-10  2:27   ` Rafael Aquini
2013-01-10 17:36     ` Raghavendra K T
2013-01-11 20:11       ` Rik van Riel
2013-01-13 18:07         ` Raghavendra K T
2013-01-10 15:19 ` Mike Galbraith
2013-01-10 15:31   ` Rik van Riel
2013-01-10 19:30     ` Mike Galbraith
2013-01-24 13:28       ` Ingo Molnar
2013-01-10 22:24 ` Chegu Vinod

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130110031339.GC1636@x61.redhat.com \
    --to=aquini@redhat.com \
    --cc=JBeulich@novell.com \
    --cc=chegu_vinod@hp.com \
    --cc=eric.dumazet@gmail.com \
    --cc=jeremy@goop.org \
    --cc=knoel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lwoodman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.