Re: [PATCH RFC ticketlock] Auto-queued ticketlock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Waiman Long <waiman.long@hp.com>,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, edumazet@google.com, darren@dvhart.com,
	fweisbec@gmail.com, sbw@mit.edu, torvalds@linux-foundation.org
Subject: Re: [PATCH RFC ticketlock] Auto-queued ticketlock
Date: Tue, 11 Jun 2013 09:43:13 -0700	[thread overview]
Message-ID: <20130611164313.GH5146@linux.vnet.ibm.com> (raw)
In-Reply-To: <1370967632.9844.182.camel@gandalf.local.home>

On Tue, Jun 11, 2013 at 12:20:32PM -0400, Steven Rostedt wrote:
> On Tue, 2013-06-11 at 11:57 -0400, Waiman Long wrote:
> 
> > This is an interesting patch and I think it is useful for workloads that 
> > run on systems with a large number of CPUs.
> 
> I would say it is definitely a fun academic patch, now if it is
> something for a production environment remains to be seen.

At the moment, it should be considered an intellectual exercise.  Might be
useful at some point, but I would personally rather that the offending
lock be broken up to reduce its contention.

> > > diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
> > > index ad0ad07..cdaefdd 100644
> > > --- a/arch/x86/include/asm/spinlock_types.h
> > > +++ b/arch/x86/include/asm/spinlock_types.h
> > > @@ -7,12 +7,18 @@
> > >
> > >   #include<linux/types.h>
> > >
> > > -#if (CONFIG_NR_CPUS<  256)
> > > +#if (CONFIG_NR_CPUS<  128)
> > >   typedef u8  __ticket_t;
> > >   typedef u16 __ticketpair_t;
> > > -#else
> > > +#define TICKET_T_CMP_GE(a, b) (UCHAR_MAX / 2>= (unsigned char)((a) - (b)))
> > > +#elif (CONFIG_NR_CPUS<  32768)
> > >   typedef u16 __ticket_t;
> > >   typedef u32 __ticketpair_t;
> > > +#define TICKET_T_CMP_GE(a, b) (USHRT_MAX / 2>= (unsigned short)((a) - (b)))
> > > +#else
> > > +typedef u32 __ticket_t;
> > > +typedef u64 __ticketpair_t;
> > > +#define TICKET_T_CMP_GE(a, b) (UINT_MAX / 2>= (unsigned int)((a) - (b)))
> > >   #endif
> > 
> > It is theoretically possible that a large number of CPUs (says 64 or 
> > more with CONFIG_NR_CPUS < 128) can acquire a ticket from the lock 
> > before the check for TICKET_T_CMP_GE() in tkt_spin_pass(). So the check 
> > will fail even when there is a large number of CPUs contending for the 
> > lock. The chance of this happening is, of course, extremely rare. This 
> > is not an error as the lock is still working as it should be without 
> > your change.
> 
> Can you explain this more. How can you acquire the ticket and update at
> the same time? If a queue has been set, then you can't acquire the
> ticket as the head has a 1 appended to it.

Suppose that CONFIG_NR_CPUS=127, and suppose that 65 CPUs atomically
increment ->tail before ...

Ah, good point.  If TKT_Q_SWITCH is less than 64, then at least one CPU
will see the need to switch to queued mode, and will do so regardless of
what the other CPUs think.  The key point is that each CPU will get its
ticket from the xadd(), and these will be issued in order.  I therefore
backed out my change of the limits.

> > > +/*
> > > + * TKT_Q_SWITCH is twice the number of CPUs that must be spinning on a
> > > + * given ticket lock to motivate switching to spinning on a queue.
> > > + * The reason that it is twice the number is because the bottom bit of
> > > + * the ticket is reserved for the bit that indicates that a queue is
> > > + * associated with the lock.
> > > + */
> > > +#define TKT_Q_SWITCH  (16 * 2)
> > > +
> > > +/*
> > > + * TKT_Q_NQUEUES is the number of queues to maintain.  Large systems
> > > + * might have multiple highly contended locks, so provide more queues for
> > > + * systems with larger numbers of CPUs.
> > > + */
> > > +#define TKT_Q_NQUEUES (DIV_ROUND_UP(NR_CPUS + TKT_Q_SWITCH - 1, TKT_Q_SWITCH) * 2)
> > > +
> > > +/* The queues themselves. */
> > > +struct tkt_q_head tkt_q_heads[TKT_Q_NQUEUES];
> > 
> > I am a bit concern about the size of the head queue table itself. RHEL6, 
> > for example, had defined CONFIG_NR_CPUS to be 4096 which mean a table 
> > size of 256. Maybe it is better to dynamically allocate the table at 
> > init time depending on the actual number of CPUs in the system.
> 
> Yeah, it can be allocated dynamically at boot.

But let's first demonstrate the need.  Keep in mind that an early-boot
deadlock would exercise this code.  Yes, it is just a check for NULL,
but on the other hand I didn't get the impression that you thought that
this code was too simple.  ;-)

> > > +/*
> > > + * Return a pointer to the queue header associated with the specified lock,
> > > + * or return NULL if there is no queue for the lock or if the lock's queue
> > > + * is in transition.
> > > + */
> > > +static struct tkt_q_head *tkt_q_find_head(arch_spinlock_t *asp)
> > > +{
> > > +	int i;
> > > +	int start;
> > > +
> > > +	start = i = tkt_q_hash(asp);
> > > +	do
> > > +		if (tkt_q_heads[i].ref == asp)
> > > +			return&tkt_q_heads[i];
> > > +	while ((i = tkt_q_next_slot(i)) != start);
> > > +	return NULL;
> > > +}
> > 
> > With a table size of 256 and you have to scan the whole table to find 
> > the right head queue. This can be a significant overhead. I will suggest 
> > setting a limiting of how many entries it scans before it aborts rather 
> > than checking the whole table.
> 
> We could add a limit, but in practice I'm not sure that would have any
> issue. I thought the same thing when I first saw this, but to hit most
> of the list, would require a large collision in the hash algorithm,
> would could probably be fixed with a better hash.
> 
> > 
> > > +/*
> > > + * Hand the lock off to the first CPU on the queue.
> > > + */
> > > +void tkt_q_do_wake(arch_spinlock_t *asp)
> > > +{
> > > +	struct tkt_q_head *tqhp;
> > > +	struct tkt_q *tqp;
> > > +
> > > +	/* If the queue is still being set up, wait for it. */
> > > +	while ((tqhp = tkt_q_find_head(asp)) == NULL)
> > > +		cpu_relax();
> > > +
> > > +	for (;;) {
> > > +
> > > +		/* Find the first queue element. */
> > > +		tqp = ACCESS_ONCE(tqhp->spin);
> > > +		if (tqp != NULL)
> > > +			break;  /* Element exists, hand off lock. */
> > > +		if (tkt_q_try_unqueue(asp, tqhp))
> > > +			return; /* No element, successfully removed queue. */
> > > +		cpu_relax();
> > > +	}
> > > +	if (ACCESS_ONCE(tqhp->head_tkt) != -1)
> > > +		ACCESS_ONCE(tqhp->head_tkt) = -1;
> > 
> > In case NR_CPUS is 32768 or higher, the ticket will be of type u32 and 
> > tqhp->head_tkt is s32. So -1 will be a valid ticket number. You may have 
> > to conditionally define head_tkt to be s64 when the ticket is u32.
> 
> Good point.
> 
> > 
> > Do you have any data on how much this patch can actually improve 
> > performance on certain workloads? This will help the discussion here.
> 
> Yeah, that's come up already in the thread. Linus wants to see hard
> numbers *and* an explanation of why the contended locks can't be fixed,
> before he even considers merging this type of change.

A point that I am definitely -not- arguing with.  ;-)

								Thanx, Paul

next prev parent reply	other threads:[~2013-06-11 16:44 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-09 19:36 [PATCH RFC ticketlock] Auto-queued ticketlock Paul E. McKenney
2013-06-10 20:47 ` Steven Rostedt
2013-06-10 20:57   ` Paul E. McKenney
2013-06-10 21:01   ` Thomas Gleixner
2013-06-10 21:15     ` Paul E. McKenney
2013-06-10 21:08 ` Steven Rostedt
2013-06-10 21:30   ` Paul E. McKenney
2013-06-10 21:35 ` Eric Dumazet
2013-06-10 21:54   ` Paul E. McKenney
2013-06-10 23:02 ` Steven Rostedt
2013-06-11  0:22   ` Paul E. McKenney
2013-06-11  0:44 ` Steven Rostedt
2013-06-11  0:51   ` Linus Torvalds
2013-06-11  7:53     ` Lai Jiangshan
2013-06-11 10:14       ` Paul E. McKenney
2013-06-11 15:22         ` Steven Rostedt
2013-06-11 16:45           ` Paul E. McKenney
2013-06-11 10:06     ` Paul E. McKenney
2013-06-11 17:53     ` Davidlohr Bueso
2013-06-11 18:05       ` Paul E. McKenney
2013-06-11 18:10       ` Steven Rostedt
2013-06-11 18:14         ` Davidlohr Bueso
2013-06-11 18:46         ` Paul E. McKenney
2013-06-12 17:50         ` Davidlohr Bueso
2013-06-12 18:15           ` Linus Torvalds
2013-06-12 20:03             ` Davidlohr Bueso
2013-06-12 20:26               ` Linus Torvalds
2013-06-12 20:40                 ` Davidlohr Bueso
2013-06-12 21:06                 ` Raymond Jennings
2013-06-12 23:32                 ` Al Viro
2013-06-13  0:01                   ` Linus Torvalds
2013-06-13  0:20                     ` Al Viro
2013-06-13  0:38                       ` Linus Torvalds
2013-06-13  0:49                         ` Al Viro
2013-06-13  0:59                           ` Linus Torvalds
2013-06-14 15:00                             ` Waiman Long
2013-06-14 15:37                               ` Linus Torvalds
2013-06-14 18:17                                 ` Waiman Long
2013-06-15  1:26                                   ` Benjamin Herrenschmidt
2013-06-15  3:36                                     ` Waiman Long
2013-06-12 20:37               ` Linus Torvalds
2013-06-12 18:18           ` Steven Rostedt
2013-06-11  9:56   ` Paul E. McKenney
2013-06-11 15:00     ` Paul E. McKenney
2013-06-11  1:04 ` Steven Rostedt
2013-06-11  9:52   ` Paul E. McKenney
2013-06-11 14:48 ` Lai Jiangshan
2013-06-11 15:10   ` Lai Jiangshan
2013-06-11 16:48     ` Paul E. McKenney
2013-06-11 17:17       ` Linus Torvalds
2013-06-11 17:30         ` Paul E. McKenney
2013-06-11 16:21   ` Paul E. McKenney
2013-06-11 15:57 ` Waiman Long
2013-06-11 16:20   ` Steven Rostedt
2013-06-11 16:43     ` Paul E. McKenney [this message]
2013-06-11 17:13       ` Steven Rostedt
2013-06-11 17:43         ` Paul E. McKenney
2013-06-11 17:35     ` Waiman Long
2013-06-11 16:36   ` Paul E. McKenney
2013-06-11 17:01     ` Steven Rostedt
2013-06-11 17:16       ` Paul E. McKenney
2013-06-11 18:41     ` Waiman Long
2013-06-11 18:54       ` Davidlohr Bueso
2013-06-11 19:49       ` Paul E. McKenney
2013-06-11 20:09         ` Steven Rostedt
2013-06-11 20:32           ` Paul E. McKenney
2013-06-11 20:53             ` Steven Rostedt
2013-06-11 20:25         ` Jason Low
2013-06-11 20:36           ` Paul E. McKenney
2013-06-11 20:56         ` Steven Rostedt
2013-06-11 21:09           ` Paul E. McKenney
2013-06-12  1:19         ` Lai Jiangshan
2013-06-12  1:58           ` Steven Rostedt
2013-06-12 10:12             ` Paul E. McKenney
2013-06-12 11:06             ` Lai Jiangshan
2013-06-12 14:21               ` Paul E. McKenney
2013-06-12 14:15         ` Lai Jiangshan
2013-06-12 14:44           ` Paul E. McKenney
2013-06-11 17:02 ` [PATCH RFC ticketlock] v2 " Paul E. McKenney
2013-06-11 17:35   ` Linus Torvalds
2013-06-11 17:49     ` Paul E. McKenney
2013-06-11 17:36   ` Steven Rostedt
2013-06-11 17:52     ` Paul E. McKenney
2013-06-12 15:40   ` [PATCH RFC ticketlock] v3 " Paul E. McKenney
2013-06-12 16:13     ` Lai Jiangshan
2013-06-12 16:59       ` Paul E. McKenney
2013-06-13  2:55     ` Lai Jiangshan
2013-06-13 15:22       ` Paul E. McKenney
2013-06-13 23:25         ` Lai Jiangshan
2013-06-13 23:57           ` Paul E. McKenney
2013-06-14  1:28             ` Lai Jiangshan
2013-06-14 23:49               ` Paul E. McKenney
2013-06-14  7:12             ` Lai Jiangshan
2013-06-14 23:46               ` Paul E. McKenney
     [not found]     ` <CAC4Lta3dpTDc19rXLVQkZrxbu8AJL+Foc6ocAktUAozCpk2-Mg@mail.gmail.com>
2013-07-01  9:19       ` Raghavendra KT
2013-07-02  5:56         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130611164313.GH5146@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=darren@dvhart.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sbw@mit.edu \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=waiman.long@hp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.