From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
laijs@cn.fujitsu.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
peterz@infradead.org, Valdis.Kletnieks@vt.edu,
dhowells@redhat.com, edumazet@google.com, darren@dvhart.com,
fweisbec@gmail.com, sbw@mit.edu, torvalds@linux-foundation.org
Subject: Re: [PATCH RFC ticketlock] Auto-queued ticketlock
Date: Tue, 11 Jun 2013 08:00:32 -0700 [thread overview]
Message-ID: <20130611150032.GA1762@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130611095655.GA5146@linux.vnet.ibm.com>
On Tue, Jun 11, 2013 at 02:56:55AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 10, 2013 at 08:44:40PM -0400, Steven Rostedt wrote:
> > On Sun, 2013-06-09 at 12:36 -0700, Paul E. McKenney wrote:
[ . . . ]
> > > +bool tkt_q_do_spin(arch_spinlock_t *asp, struct __raw_tickets inc)
> > > +{
> > > + struct tkt_q **oldtail;
> > > + struct tkt_q tq;
> > > + struct tkt_q_head *tqhp;
> > > +
> > > + /*
> > > + * Ensure that accesses to queue header happen after sensing
> > > + * the lock's have-queue bit.
> > > + */
> > > + smp_mb(); /* See above block comment. */
> > > +
> > > + /* If there no longer is a queue, leave. */
> > > + tqhp = tkt_q_find_head(asp);
> > > + if (tqhp == NULL)
> > > + return false;
> > > +
> > > + /* Initialize our queue element. */
> > > + tq.cpu = raw_smp_processor_id();
> > > + tq.tail = inc.tail;
> > > + tq.next = NULL;
> > > +
> > > + /* Check to see if we already hold the lock. */
> > > + if (ACCESS_ONCE(tqhp->head_tkt) == inc.tail) {
> > > + /* The last holder left before queue formed, we hold lock. */
> > > + tqhp->head_tkt = -1;
> > > + return true;
> > > + }
> > > +
> > > + /* Add our element to the tail of the queue. */
> > > + oldtail = xchg(&tqhp->spin_tail, &tq.next);
> >
> > Boy this is tricky code! I thought I found a race window here, but as I
> > went to write my email saying "Gotcha!" I found that it wasn't a race
> > after all. But as I went though the effort of writing this, I figured I
> > would send this out as documentation for others to see. Hmm, I wonder if
> > we can use this email to add more comments. Anyway, here's what I
> > thought was wrong ;-)
>
> If you didn't know any better, you might even think that I had done
> something like this before. ;-)
>
> > OK, I originally thought there was a race window here. Let's say that an
> > NMI hit right here, and it happens to be a big one, where lots of things
> > can happen on other CPUs right now.
> >
> > The scenario is that there's just one item on the queue, which is
> > waiting for the lock to be released, and is spinning below in the:
> >
> > while (ACCESS_ONCE(tq.cpu) != -1)
> > cpu_relax();
> >
> > And then the lock is released, where in tkt_q_do_wake() the following is
> > called:
> >
> > ACCESS_ONCE(tqp->cpu) = -1;
> >
> > Now the old queued task is released. But it's tq->next hasn't been set
> > yet, and is still NULL. It leaves by doing:
> >
> > ACCESS_ONCE(tqhp->spin) = tq.next;
> > return true;
> >
> > All before this task gets to set *oldtail to &tq. But, I then looked
> > below...
> >
> >
> > > + ACCESS_ONCE(*oldtail) = &tq;
> > > +
> > > + /* Spin until handoff. */
> > > + while (ACCESS_ONCE(tq.cpu) != -1)
> > > + cpu_relax();
> > > +
> > > + /*
> > > + * Remove our element from the queue. If the queue is now empty,
> > > + * update carefully so that the next acquisition will queue itself
> > > + * at the head of the list.
> > > + */
> > > + if (tq.next == NULL) {
> >
> > This checks for that scenario.
>
> Yep!
>
> > As if the old task were to come out
> > spinning, the problem would only be if it was the last one on the list,
> > and its tq.next was NULL. But if that was the case, then we set spin to
> > NULL and do the next trick, where I thought I gotcha again...
> >
> >
> > > +
> > > + /* Mark the queue empty. */
> > > + tqhp->spin = NULL;
> > > +
> > > + /* Try to point the tail back at the head. */
> > > + if (cmpxchg(&tqhp->spin_tail,
> > > + &tq.next,
> > > + &tqhp->spin) == &tq.next)
> >
> > Here, I was thinking, oh wait, what happens if this is called right
> > before the xchg() above. Then we would set spin_tail but not update the
> > old tq.next. But wait! look at what we assign spin_tail to. It's the
> > address of spin, which would be what oldtail would point to above, and
> > then above would set spin to the new tq!
>
> Yep again!
>
> > OK, I haven't found a issue here yet, but youss are beiing trickssy! We
> > don't like trickssy, and we must find precccciouss!!!
> >
> >
> > This code is starting to make me look like Gollum :-p
>
> Hmmm... The time and effort to do this might almost have been worthwhile
> just to accomplish that! ;-)
>
> But yes, this would need better comments, design documentation, or
> maybe both.
And for whatever it might be worth, here is an attempted upgrade for
comments.
First, I upgrade the comment for the xchg() that does the enqueue:
/*
* Add our element to the tail of the queue. Note that if the
* queue is empty, the ->spin_tail pointer will reference
* the queue's head pointer, namely ->spin.
*/
oldtail = xchg(&tqhp->spin_tail, &tq.next);
ACCESS_ONCE(*oldtail) = &tq;
Next, I upgrade the comment for the dequeue operation:
/*
* Remove our element from the queue. If the queue is now empty,
* update carefully so that the next acquisition will enqueue itself
* at the head of the list. Of course, the next enqueue operation
* might be happening concurrently, and this code needs to handle all
* of the possible combinations, keeping in mind that the enqueue
* operation happens in two stages: (1) update the tail pointer and
* (2) update the predecessor's ->next pointer. With this in mind,
* the following code needs to deal with three scenarios:
*
* 1. tq is the last entry. In this case, we use cmpxchg to
* point the list tail back to the list head (->spin). If
* the cmpxchg fails, that indicates that we are instead
* in scenario 2 below. If the cmpxchg succeeds, the next
* enqueue operation's tail-pointer exchange will enqueue
* the next element at the queue head, because the ->spin_tail
* pointer now references the queue head.
*
* 2. tq is the last entry, and the next entry has updated the
* tail pointer but has not yet updated tq.next. In this
* case, tq.next is NULL, the cmpxchg will fail, and the
* code will wait for the enqueue to complete before completing
* removal of tq from the list.
*
* 3. tq is not the last pointer. In this case, tq.next is non-NULL,
* so the following code simply removes tq from the list.
*/
if (tq.next == NULL) {
/* Mark the queue empty. */
tqhp->spin = NULL;
/* Try to point the tail back at the head. */
if (cmpxchg(&tqhp->spin_tail,
&tq.next,
&tqhp->spin) == &tq.next)
return true; /* Succeeded, queue is now empty. */
/* Failed, if needed, wait for the enqueue to complete. */
while (tq.next == NULL)
cpu_relax();
/* The following code will repair the head. */
}
smp_mb(); /* Force ordering between handoff and critical section. */
/*
* Advance list-head pointer. This same task will be the next to
* access this when releasing the lock, so no need for a memory
* barrier after the following assignment.
*/
ACCESS_ONCE(tqhp->spin) = tq.next;
return true;
}
Thanx, Paul
next prev parent reply other threads:[~2013-06-11 15:15 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-09 19:36 [PATCH RFC ticketlock] Auto-queued ticketlock Paul E. McKenney
2013-06-10 20:47 ` Steven Rostedt
2013-06-10 20:57 ` Paul E. McKenney
2013-06-10 21:01 ` Thomas Gleixner
2013-06-10 21:15 ` Paul E. McKenney
2013-06-10 21:08 ` Steven Rostedt
2013-06-10 21:30 ` Paul E. McKenney
2013-06-10 21:35 ` Eric Dumazet
2013-06-10 21:54 ` Paul E. McKenney
2013-06-10 23:02 ` Steven Rostedt
2013-06-11 0:22 ` Paul E. McKenney
2013-06-11 0:44 ` Steven Rostedt
2013-06-11 0:51 ` Linus Torvalds
2013-06-11 7:53 ` Lai Jiangshan
2013-06-11 10:14 ` Paul E. McKenney
2013-06-11 15:22 ` Steven Rostedt
2013-06-11 16:45 ` Paul E. McKenney
2013-06-11 10:06 ` Paul E. McKenney
2013-06-11 17:53 ` Davidlohr Bueso
2013-06-11 18:05 ` Paul E. McKenney
2013-06-11 18:10 ` Steven Rostedt
2013-06-11 18:14 ` Davidlohr Bueso
2013-06-11 18:46 ` Paul E. McKenney
2013-06-12 17:50 ` Davidlohr Bueso
2013-06-12 18:15 ` Linus Torvalds
2013-06-12 20:03 ` Davidlohr Bueso
2013-06-12 20:26 ` Linus Torvalds
2013-06-12 20:40 ` Davidlohr Bueso
2013-06-12 21:06 ` Raymond Jennings
2013-06-12 23:32 ` Al Viro
2013-06-13 0:01 ` Linus Torvalds
2013-06-13 0:20 ` Al Viro
2013-06-13 0:38 ` Linus Torvalds
2013-06-13 0:49 ` Al Viro
2013-06-13 0:59 ` Linus Torvalds
2013-06-14 15:00 ` Waiman Long
2013-06-14 15:37 ` Linus Torvalds
2013-06-14 18:17 ` Waiman Long
2013-06-15 1:26 ` Benjamin Herrenschmidt
2013-06-15 3:36 ` Waiman Long
2013-06-12 20:37 ` Linus Torvalds
2013-06-12 18:18 ` Steven Rostedt
2013-06-11 9:56 ` Paul E. McKenney
2013-06-11 15:00 ` Paul E. McKenney [this message]
2013-06-11 1:04 ` Steven Rostedt
2013-06-11 9:52 ` Paul E. McKenney
2013-06-11 14:48 ` Lai Jiangshan
2013-06-11 15:10 ` Lai Jiangshan
2013-06-11 16:48 ` Paul E. McKenney
2013-06-11 17:17 ` Linus Torvalds
2013-06-11 17:30 ` Paul E. McKenney
2013-06-11 16:21 ` Paul E. McKenney
2013-06-11 15:57 ` Waiman Long
2013-06-11 16:20 ` Steven Rostedt
2013-06-11 16:43 ` Paul E. McKenney
2013-06-11 17:13 ` Steven Rostedt
2013-06-11 17:43 ` Paul E. McKenney
2013-06-11 17:35 ` Waiman Long
2013-06-11 16:36 ` Paul E. McKenney
2013-06-11 17:01 ` Steven Rostedt
2013-06-11 17:16 ` Paul E. McKenney
2013-06-11 18:41 ` Waiman Long
2013-06-11 18:54 ` Davidlohr Bueso
2013-06-11 19:49 ` Paul E. McKenney
2013-06-11 20:09 ` Steven Rostedt
2013-06-11 20:32 ` Paul E. McKenney
2013-06-11 20:53 ` Steven Rostedt
2013-06-11 20:25 ` Jason Low
2013-06-11 20:36 ` Paul E. McKenney
2013-06-11 20:56 ` Steven Rostedt
2013-06-11 21:09 ` Paul E. McKenney
2013-06-12 1:19 ` Lai Jiangshan
2013-06-12 1:58 ` Steven Rostedt
2013-06-12 10:12 ` Paul E. McKenney
2013-06-12 11:06 ` Lai Jiangshan
2013-06-12 14:21 ` Paul E. McKenney
2013-06-12 14:15 ` Lai Jiangshan
2013-06-12 14:44 ` Paul E. McKenney
2013-06-11 17:02 ` [PATCH RFC ticketlock] v2 " Paul E. McKenney
2013-06-11 17:35 ` Linus Torvalds
2013-06-11 17:49 ` Paul E. McKenney
2013-06-11 17:36 ` Steven Rostedt
2013-06-11 17:52 ` Paul E. McKenney
2013-06-12 15:40 ` [PATCH RFC ticketlock] v3 " Paul E. McKenney
2013-06-12 16:13 ` Lai Jiangshan
2013-06-12 16:59 ` Paul E. McKenney
2013-06-13 2:55 ` Lai Jiangshan
2013-06-13 15:22 ` Paul E. McKenney
2013-06-13 23:25 ` Lai Jiangshan
2013-06-13 23:57 ` Paul E. McKenney
2013-06-14 1:28 ` Lai Jiangshan
2013-06-14 23:49 ` Paul E. McKenney
2013-06-14 7:12 ` Lai Jiangshan
2013-06-14 23:46 ` Paul E. McKenney
[not found] ` <CAC4Lta3dpTDc19rXLVQkZrxbu8AJL+Foc6ocAktUAozCpk2-Mg@mail.gmail.com>
2013-07-01 9:19 ` Raghavendra KT
2013-07-02 5:56 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130611150032.GA1762@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=Valdis.Kletnieks@vt.edu \
--cc=akpm@linux-foundation.org \
--cc=darren@dvhart.com \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=josh@joshtriplett.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mingo@elte.hu \
--cc=niv@us.ibm.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sbw@mit.edu \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.