All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	edumazet@google.com, darren@dvhart.com, fweisbec@gmail.com,
	sbw@mit.edu, torvalds@linux-foundation.org, walken@google.com,
	waiman.long@hp.com, davidlohr.bueso@hp.com
Subject: Re: [PATCH RFC ticketlock] v3 Auto-queued ticketlock
Date: Thu, 13 Jun 2013 08:22:38 -0700	[thread overview]
Message-ID: <20130613152238.GB5146@linux.vnet.ibm.com> (raw)
In-Reply-To: <51B934AD.1070807@cn.fujitsu.com>

On Thu, Jun 13, 2013 at 10:55:41AM +0800, Lai Jiangshan wrote:
> On 06/12/2013 11:40 PM, Paul E. McKenney wrote:
> > Breaking up locks is better than implementing high-contention locks, but
> > if we must have high-contention locks, why not make them automatically
> > switch between light-weight ticket locks at low contention and queued
> > locks at high contention?  After all, this would remove the need for
> > the developer to predict which locks will be highly contended.
> > 
> > This commit allows ticket locks to automatically switch between pure
> > ticketlock and queued-lock operation as needed.  If too many CPUs are
> > spinning on a given ticket lock, a queue structure will be allocated
> > and the lock will switch to queued-lock operation.  When the lock becomes
> > free, it will switch back into ticketlock operation.  The low-order bit
> > of the head counter is used to indicate that the lock is in queued mode,
> > which forces an unconditional mismatch between the head and tail counters.
> > This approach means that the common-case code path under conditions of
> > low contention is very nearly that of a plain ticket lock.
> > 
> > A fixed number of queueing structures is statically allocated in an
> > array.  The ticket-lock address is used to hash into an initial element,
> > but if that element is already in use, it moves to the next element.  If
> > the entire array is already in use, continue to spin in ticket mode.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > [ paulmck: Eliminate duplicate code and update comments (Steven Rostedt). ]
> > [ paulmck: Address Eric Dumazet review feedback. ]
> > [ paulmck: Use Lai Jiangshan idea to eliminate smp_mb(). ]
> > [ paulmck: Expand ->head_tkt from s32 to s64 (Waiman Long). ]
> > [ paulmck: Move cpu_relax() to main spin loop (Steven Rostedt). ]
> > [ paulmck: Reduce queue-switch contention (Waiman Long). ]
> > [ paulmck: __TKT_SPIN_INC for __ticket_spin_trylock() (Steffen Persvold). ]
> > [ paulmck: Type safety fixes (Steven Rostedt). ]
> > [ paulmck: Pre-check cmpxchg() value (Waiman Long). ]
> > [ paulmck: smp_mb() downgrade to smp_wmb() (Lai Jiangshan). ]
> 
> 
> Hi, Paul,
> 
> I simplify the code and remove the search by encoding the index of struct tkt_q_head
> into lock->tickets.head.
> 
> A) lock->tickets.head(when queued-lock):
> ---------------------------------
>  index of struct tkt_q_head |0|1|
> ---------------------------------

Interesting approach!  It might reduce queued-mode overhead a bit in
some cases, though I bet that in the common case the first queue element
examined is the right one.  More on this below.

> The bit0 = 1 for queued, and the bit1 = 0 is reserved for __ticket_spin_unlock(),
> thus __ticket_spin_unlock() will not change the higher bits of lock->tickets.head.
> 
> B) tqhp->head is for the real value of lock->tickets.head.
> if the last bit of tqhp->head is 1, it means it is the head ticket when started queuing.

But don't you also need the xadd() in __ticket_spin_unlock() to become
a cmpxchg() for this to work?  Or is your patch missing your changes to
arch/x86/include/asm/spinlock.h?  Either way, this is likely to increase
the no-contention overhead, which might be counterproductive.  Wouldn't
hurt to get measurements, though.

Given the results that Davidlohr posted, I believe that the following
optimizations would also provide some improvement:

1.	Move the call to tkt_spin_pass() from __ticket_spin_lock()
	to a separate linker section in order to reduce the icache
	penalty exacted by the spinloop.  This is likely to be causing
	some of the performance reductions in the cases where ticket
	locks are not highly contended.

2.	Limit the number of elements searched for in the array of
	queues.  However, this would help only if a number of ticket
	locks were in queued mode at the same time.

3.	Dynamically allocate the queue array at boot.  This might
	also reduce cache pressure, again, at least in cases where
	there are a number of ticket locks in queued mode at the
	same time.

Frederic just reminded me that I owe him some energy-efficiency improvements
for adaptive ticks, so I won't get to these very quickly.  Please feel free
to take these on -- the patch clearly does well under high contention, so
reducing the no-contention penalty could really help.

							Thanx, Paul

> Thanks,
> Lai
> 
>  kernel/tktqlock.c |   51 +++++++++++++--------------------------------------
>  1 files changed, 13 insertions(+), 38 deletions(-)
> 
> diff --git a/kernel/tktqlock.c b/kernel/tktqlock.c
> index 912817c..1329d0f 100644
> --- a/kernel/tktqlock.c
> +++ b/kernel/tktqlock.c
> @@ -33,7 +33,7 @@ struct tkt_q {
> 
>  struct tkt_q_head {
>  	arch_spinlock_t *ref;		/* Pointer to spinlock if in use. */
> -	s64 head_tkt;			/* Head ticket when started queuing. */
> +	__ticket_t head;		/* Real head when queued. */
>  	struct tkt_q *spin;		/* Head of queue. */
>  	struct tkt_q **spin_tail;	/* Tail of queue. */
>  };
> @@ -77,15 +77,8 @@ static unsigned long tkt_q_hash(arch_spinlock_t *lock)
>   */
>  static struct tkt_q_head *tkt_q_find_head(arch_spinlock_t *lock)
>  {
> -	int i;
> -	int start;
> -
> -	start = i = tkt_q_hash(lock);
> -	do
> -		if (ACCESS_ONCE(tkt_q_heads[i].ref) == lock)
> -			return &tkt_q_heads[i];
> -	while ((i = tkt_q_next_slot(i)) != start);
> -	return NULL;
> +	BUILD_BUG_ON(TKT_Q_NQUEUES > (1 << (TICKET_SHIFT - 2)));
> +	return &tkt_q_heads[ACCESS_ONCE(lock->tickets.head) >> 2];
>  }
> 
>  /*
> @@ -101,11 +94,11 @@ static bool tkt_q_try_unqueue(arch_spinlock_t *lock, struct tkt_q_head *tqhp)
> 
>  	/* Pick up the ticket values. */
>  	asold = ACCESS_ONCE(*lock);
> -	if ((asold.tickets.head & ~0x1) == asold.tickets.tail) {
> +	if (tqhp->head == asold.tickets.tail) {
> 
>  		/* Attempt to mark the lock as not having a queue. */
>  		asnew = asold;
> -		asnew.tickets.head &= ~0x1;
> +		asnew.tickets.head = tqhp->head;
>  		if (cmpxchg(&lock->head_tail,
>  			    asold.head_tail,
>  			    asnew.head_tail) == asold.head_tail) {
> @@ -128,12 +121,9 @@ void tkt_q_do_wake(arch_spinlock_t *lock)
>  	struct tkt_q_head *tqhp;
>  	struct tkt_q *tqp;
> 
> -	/*
> -	 * If the queue is still being set up, wait for it.  Note that
> -	 * the caller's xadd() provides the needed memory ordering.
> -	 */
> -	while ((tqhp = tkt_q_find_head(lock)) == NULL)
> -		cpu_relax();
> +	tqhp = tkt_q_find_head(lock);
> +	ACCESS_ONCE(lock->tickets.head) -= __TKT_SPIN_INC;
> +	ACCESS_ONCE(tqhp->head) = (tqhp->head & ~0x1) + __TKT_SPIN_INC;
> 
>  	for (;;) {
> 
> @@ -145,9 +135,7 @@ void tkt_q_do_wake(arch_spinlock_t *lock)
>  			return; /* No element, successfully removed queue. */
>  		cpu_relax();
>  	}
> -	if (ACCESS_ONCE(tqhp->head_tkt) != -1)
> -		ACCESS_ONCE(tqhp->head_tkt) = -1;
> -	smp_mb(); /* Order pointer fetch and assignment against handoff. */
> +	smp_mb(); /* Order modification, pointer fetch and assignment against handoff. */
>  	ACCESS_ONCE(tqp->cpu) = -1;
>  }
>  EXPORT_SYMBOL(tkt_q_do_wake);
> @@ -169,10 +157,7 @@ bool tkt_q_do_spin(arch_spinlock_t *lock, struct __raw_tickets inc)
>  	 */
>  	smp_mb();  /* See above block comment. */
> 
> -	/* If there no longer is a queue, leave. */
>  	tqhp = tkt_q_find_head(lock);
> -	if (tqhp == NULL)
> -		return false;
> 
>  	/* Initialize our queue element. */
>  	tq.cpu = raw_smp_processor_id();
> @@ -180,9 +165,8 @@ bool tkt_q_do_spin(arch_spinlock_t *lock, struct __raw_tickets inc)
>  	tq.next = NULL;
> 
>  	/* Check to see if we already hold the lock. */
> -	if (ACCESS_ONCE(tqhp->head_tkt) == inc.tail) {
> +	if (ACCESS_ONCE(tqhp->head) == (inc.tail | 0x1)) {
>  		/* The last holder left before queue formed, we hold lock. */
> -		tqhp->head_tkt = -1;
>  		return true;
>  	}
> 
> @@ -290,16 +274,14 @@ tkt_q_init_contend(int i, arch_spinlock_t *lock, struct __raw_tickets inc)
>  		 * Record the head counter in case one of the spinning
>  		 * CPUs already holds the lock but doesn't realize it yet.
>  		 */
> -		tqhp->head_tkt = asold.tickets.head;
> +		tqhp->head = asold.tickets.head | 0x1;
> 
>  		/* The low-order bit in the head counter says "queued". */
> -		asnew.tickets.head |= 0x1;
> +		asnew.tickets.head = (i << 2) + 0x1;
>  	} while (cmpxchg(&lock->head_tail,
>  		 asold.head_tail,
>  		 asnew.head_tail) != asold.head_tail);
> 
> -	/* Point the queue at the lock and go spin on it. */
> -	ACCESS_ONCE(tqhp->ref) = lock;
>  	return tkt_q_do_spin(lock, inc);
>  }
> 
> @@ -321,15 +303,8 @@ bool tkt_q_start_contend(arch_spinlock_t *lock, struct __raw_tickets inc)
>  	 * the lock with the corresponding queue.
>  	 */
>  	do {
> -		/*
> -		 * Use 0x1 to mark the queue in use, but also avoiding
> -		 * any spinners trying to use it before we get it all
> -		 * initialized.
> -		 */
>  		if (!tkt_q_heads[i].ref &&
> -		    cmpxchg(&tkt_q_heads[i].ref,
> -			    NULL,
> -			    (arch_spinlock_t *)0x1) == NULL) {
> +		    cmpxchg(&tkt_q_heads[i].ref, NULL, lock) == NULL) {
> 
>  			/* Succeeded, now go initialize it. */
>  			return tkt_q_init_contend(i, lock, inc);
> 


  reply	other threads:[~2013-06-13 15:23 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-09 19:36 [PATCH RFC ticketlock] Auto-queued ticketlock Paul E. McKenney
2013-06-10 20:47 ` Steven Rostedt
2013-06-10 20:57   ` Paul E. McKenney
2013-06-10 21:01   ` Thomas Gleixner
2013-06-10 21:15     ` Paul E. McKenney
2013-06-10 21:08 ` Steven Rostedt
2013-06-10 21:30   ` Paul E. McKenney
2013-06-10 21:35 ` Eric Dumazet
2013-06-10 21:54   ` Paul E. McKenney
2013-06-10 23:02 ` Steven Rostedt
2013-06-11  0:22   ` Paul E. McKenney
2013-06-11  0:44 ` Steven Rostedt
2013-06-11  0:51   ` Linus Torvalds
2013-06-11  7:53     ` Lai Jiangshan
2013-06-11 10:14       ` Paul E. McKenney
2013-06-11 15:22         ` Steven Rostedt
2013-06-11 16:45           ` Paul E. McKenney
2013-06-11 10:06     ` Paul E. McKenney
2013-06-11 17:53     ` Davidlohr Bueso
2013-06-11 18:05       ` Paul E. McKenney
2013-06-11 18:10       ` Steven Rostedt
2013-06-11 18:14         ` Davidlohr Bueso
2013-06-11 18:46         ` Paul E. McKenney
2013-06-12 17:50         ` Davidlohr Bueso
2013-06-12 18:15           ` Linus Torvalds
2013-06-12 20:03             ` Davidlohr Bueso
2013-06-12 20:26               ` Linus Torvalds
2013-06-12 20:40                 ` Davidlohr Bueso
2013-06-12 21:06                 ` Raymond Jennings
2013-06-12 23:32                 ` Al Viro
2013-06-13  0:01                   ` Linus Torvalds
2013-06-13  0:20                     ` Al Viro
2013-06-13  0:38                       ` Linus Torvalds
2013-06-13  0:49                         ` Al Viro
2013-06-13  0:59                           ` Linus Torvalds
2013-06-14 15:00                             ` Waiman Long
2013-06-14 15:37                               ` Linus Torvalds
2013-06-14 18:17                                 ` Waiman Long
2013-06-15  1:26                                   ` Benjamin Herrenschmidt
2013-06-15  3:36                                     ` Waiman Long
2013-06-12 20:37               ` Linus Torvalds
2013-06-12 18:18           ` Steven Rostedt
2013-06-11  9:56   ` Paul E. McKenney
2013-06-11 15:00     ` Paul E. McKenney
2013-06-11  1:04 ` Steven Rostedt
2013-06-11  9:52   ` Paul E. McKenney
2013-06-11 14:48 ` Lai Jiangshan
2013-06-11 15:10   ` Lai Jiangshan
2013-06-11 16:48     ` Paul E. McKenney
2013-06-11 17:17       ` Linus Torvalds
2013-06-11 17:30         ` Paul E. McKenney
2013-06-11 16:21   ` Paul E. McKenney
2013-06-11 15:57 ` Waiman Long
2013-06-11 16:20   ` Steven Rostedt
2013-06-11 16:43     ` Paul E. McKenney
2013-06-11 17:13       ` Steven Rostedt
2013-06-11 17:43         ` Paul E. McKenney
2013-06-11 17:35     ` Waiman Long
2013-06-11 16:36   ` Paul E. McKenney
2013-06-11 17:01     ` Steven Rostedt
2013-06-11 17:16       ` Paul E. McKenney
2013-06-11 18:41     ` Waiman Long
2013-06-11 18:54       ` Davidlohr Bueso
2013-06-11 19:49       ` Paul E. McKenney
2013-06-11 20:09         ` Steven Rostedt
2013-06-11 20:32           ` Paul E. McKenney
2013-06-11 20:53             ` Steven Rostedt
2013-06-11 20:25         ` Jason Low
2013-06-11 20:36           ` Paul E. McKenney
2013-06-11 20:56         ` Steven Rostedt
2013-06-11 21:09           ` Paul E. McKenney
2013-06-12  1:19         ` Lai Jiangshan
2013-06-12  1:58           ` Steven Rostedt
2013-06-12 10:12             ` Paul E. McKenney
2013-06-12 11:06             ` Lai Jiangshan
2013-06-12 14:21               ` Paul E. McKenney
2013-06-12 14:15         ` Lai Jiangshan
2013-06-12 14:44           ` Paul E. McKenney
2013-06-11 17:02 ` [PATCH RFC ticketlock] v2 " Paul E. McKenney
2013-06-11 17:35   ` Linus Torvalds
2013-06-11 17:49     ` Paul E. McKenney
2013-06-11 17:36   ` Steven Rostedt
2013-06-11 17:52     ` Paul E. McKenney
2013-06-12 15:40   ` [PATCH RFC ticketlock] v3 " Paul E. McKenney
2013-06-12 16:13     ` Lai Jiangshan
2013-06-12 16:59       ` Paul E. McKenney
2013-06-13  2:55     ` Lai Jiangshan
2013-06-13 15:22       ` Paul E. McKenney [this message]
2013-06-13 23:25         ` Lai Jiangshan
2013-06-13 23:57           ` Paul E. McKenney
2013-06-14  1:28             ` Lai Jiangshan
2013-06-14 23:49               ` Paul E. McKenney
2013-06-14  7:12             ` Lai Jiangshan
2013-06-14 23:46               ` Paul E. McKenney
     [not found]     ` <CAC4Lta3dpTDc19rXLVQkZrxbu8AJL+Foc6ocAktUAozCpk2-Mg@mail.gmail.com>
2013-07-01  9:19       ` Raghavendra KT
2013-07-02  5:56         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130613152238.GB5146@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=darren@dvhart.com \
    --cc=davidlohr.bueso@hp.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sbw@mit.edu \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=waiman.long@hp.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.