From: David Daney <ddaney.cavm@gmail.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org, aquini@redhat.com,
walken@google.com, lwoodman@redhat.com, jeremy@goop.org,
Jan Beulich <JBeulich@novell.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC PATCH 3/3] x86,smp: auto tune spinlock backoff delay factor
Date: Fri, 21 Dec 2012 16:47:54 -0800
Message-ID: <50D5033A.2070309@gmail.com>
In-Reply-To: <20121221185147.4ae48ab5@annuminas.surriel.com>
On 12/21/2012 03:51 PM, Rik van Riel wrote:
> Subject: x86,smp: auto tune spinlock backoff delay factor
Nice work,
A couple of comments...
>
> Many spinlocks are embedded in data structures; having many CPUs
> pounce on the cache line the lock is in will slow down the lock
> holder, and can cause system performance to fall off a cliff.
>
> The paper "Non-scalable locks are dangerous" is a good reference:
>
> http://pdos.csail.mit.edu/papers/linux:lock.pdf
>
> In the Linux kernel, spinlocks are optimized for the uncontended
> case. After all, if there is contention, the data structure can
> be improved to reduce or eliminate lock contention.
>
> Likewise, the spinlock API should remain simple, and the
> common case of the lock not being contended should remain
> as fast as ever.
>
> However, since spinlock contention should be fairly uncommon,
> we can add functionality into the spinlock slow path that keeps
> system performance from falling off a cliff when there is lock
> contention.
>
> Proportional delay in ticket locks means scaling the time between
> polls of the ticket by both a delay factor and the number of CPUs
> ahead of us in the queue for this lock. Checking the lock less
> often allows the lock holder to continue running, resulting in
> better throughput and preventing performance from dropping
> off a cliff.
>
> Proportional spinlock delay with a high delay factor works well
> when there is lots of contention on a lock. Likewise, a smaller
> delay factor works well when a lock is lightly contended.
>
> Making the code auto-tune the delay factor results in a system
> that performs well with both light and heavy lock contention.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> arch/x86/kernel/smp.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
> 1 files changed, 46 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 1c1eb02..8786476 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -113,19 +113,62 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
> static bool smp_no_nmi_ipi = false;
>
> /*
> - * Wait on a congested ticket spinlock.
> + * Wait on a congested ticket spinlock. Many spinlocks are embedded in
> + * data structures; having many CPUs pounce on the cache line with the
> + * spinlock simultaneously can slow down the lock holder, and the system
> + * as a whole.
> + *
> + * To prevent total performance collapse in case of bad spinlock contention,
> + * perform proportional backoff. The per-cpu value of delay is automatically
> + * tuned to limit the number of times spinning CPUs poll the lock before
> + * obtaining it. This limits the amount of cross-CPU traffic required to obtain
> + * a spinlock, and keeps system performance from dropping off a cliff.
> + *
> + * There is a tradeoff. If we poll too often, the whole system is slowed
> + * down. If we sleep too long, the lock will go unused for a period of
> + * time. Adjusting "delay" to poll, on average, 2.7 times before the
> + * lock is obtained seems to result in low bus traffic. The combination
> + * of aiming for a non-integer amount of average polls, and scaling the
> + * sleep period proportionally to how many CPUs are ahead of us in the
> + * queue for this ticket lock seems to reduce the amount of time spent
> + * "oversleeping" the release of the lock.
> */
> +#define MIN_SPINLOCK_DELAY 1
> +#define MAX_SPINLOCK_DELAY 1000
> +DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };
This gives the same delay for all locks in the system, but the amount of
work done under each lock is different. So, for any given lock, the
delay is not optimal.
This is an untested idea that came to me after looking at this (a
rough sketch follows the list):
o Assume that for any given lock, the optimal delay is the same for all
CPUs in the system.
o Store a per-lock delay value in arch_spinlock_t.
o Once a CPU owns the lock it can update the delay as you do for the
per_cpu version. Tuning the delay on fewer of the locking operations
reduces bus traffic, but makes it converge more slowly.
o Bonus points if you can update the delay as part of the releasing store.
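Here is a very rough, completely untested sketch of what I mean. The
'delay' field is made up for illustration (whether arch_spinlock_t can
afford to grow, or whether spare bits can be stolen from the ticket
pair instead, is the real question); everything else reuses the names
from your patch:

	typedef struct arch_spinlock {
		union {
			__ticketpair_t head_tail;
			struct __raw_tickets {
				__ticket_t head, tail;
			} tickets;
		};
		u16 delay;	/* written only by the lock holder */
	} arch_spinlock_t;

	void ticket_spin_lock_wait(arch_spinlock_t *lock,
				   struct __raw_tickets inc)
	{
		/* Racy read; a slightly stale delay only costs accuracy. */
		int delay = ACCESS_ONCE(lock->delay);

		for (;;) {
			int loops = delay * (__ticket_t)(inc.tail - inc.head);

			while (loops--)
				cpu_relax();

			inc.head = ACCESS_ONCE(lock->tickets.head);
			if (inc.head == inc.tail) {
				/* Got the lock; we may have overslept. */
				if (delay > MIN_SPINLOCK_DELAY)
					delay--;
				break;
			}

			if (delay < MAX_SPINLOCK_DELAY &&
			    ((inc.head & 3) == 0 || (inc.head & 7) == 1))
				delay++;
		}

		/*
		 * We hold the lock now, so nobody else writes this field;
		 * the other waiters only read it.
		 */
		lock->delay = delay;
	}

For the bonus points, the delay would have to live in the same word as
the tickets so that the releasing store could update both at once,
which probably means stealing bits from the ticket space rather than
growing the struct.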
> void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
> {
> + /*
> + * Use the raw per-cpu pointer; preemption is disabled in the
> + * spinlock code. This avoids put_cpu_var once we have the lock.
> + */
> + int *delay_ptr = &per_cpu(spinlock_delay, smp_processor_id());
> + int delay = *delay_ptr;
> +
> for (;;) {
> - int loops = 50 * (__ticket_t)(inc.tail - inc.head);
> + int loops = delay * (__ticket_t)(inc.tail - inc.head);
> while (loops--)
> cpu_relax();
>
> inc.head = ACCESS_ONCE(lock->tickets.head);
> - if (inc.head == inc.tail)
> + if (inc.head == inc.tail) {
> + /* Decrease the delay, since we may have overslept. */
> + if (delay > MIN_SPINLOCK_DELAY)
> + delay--;
> break;
> + }
> +
> + /*
> + * The lock is still busy, the delay was not long enough.
> + * Going through here 2.7 times will, on average, cancel
> + * out the decrement above. Using a non-integer number
> + * gets rid of performance artifacts and reduces oversleeping.
> + */
> + if (delay < MAX_SPINLOCK_DELAY &&
> + ((inc.head & 3) == 0 || (inc.head & 7) == 1))
> + delay++;
> }
> + *delay_ptr = delay;
> }
>
> /*
>
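For what it's worth, the 2.7 seems to fall out of the increment
condition: (inc.head & 3) == 0 fires on 1/4 of the busy polls and
(inc.head & 7) == 1 on a disjoint 1/8 of them (assuming the observed
head values are roughly uniform mod 8), so delay is incremented on

	1/4 + 1/8 = 3/8

of the polls that find the lock busy. Each acquisition decrements
delay exactly once, so increments and decrements balance when a waiter
polls about 1 / (3/8) = 8/3 ~= 2.7 times per acquisition, which
matches the comment.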