From: Peter Zijlstra <peterz@infradead.org>
To: "Yan, Zheng" <zheng.z.yan@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Chris Mason <chris.mason@oracle.com>,
	Frank Rowand <frank.rowand@am.sony.com>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH 5/5] sched: Reduce ttwu rq->lock contention
Date: Fri, 17 Dec 2010 14:23:20 +0100
Message-ID: <1292592200.2266.220.camel@twins>
In-Reply-To: <AANLkTimXyic51Qhe_WsfFBwAw10AKdB7e-Z2q0oLRYKP@mail.gmail.com>

On Fri, 2010-12-17 at 11:06 +0800, Yan, Zheng wrote:
> On Fri, Dec 17, 2010 at 4:32 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > @@ -953,7 +955,7 @@ static inline struct rq *__task_rq_lock(
> >        for (;;) {
> >                rq = task_rq(p);
> >                raw_spin_lock(&rq->lock);
> > -               if (likely(rq == task_rq(p)))
> > +               if (likely(rq == task_rq(p)) && !task_is_waking(p))
> >                        return rq;
> >                raw_spin_unlock(&rq->lock);
> >        }
> > @@ -973,7 +975,7 @@ static struct rq *task_rq_lock(struct ta
> >                local_irq_save(*flags);
> >                rq = task_rq(p);
> >                raw_spin_lock(&rq->lock);
> > -               if (likely(rq == task_rq(p)))
> > +               if (likely(rq == task_rq(p)) && !task_is_waking(p))
> >                        return rq;
> >                raw_spin_unlock_irqrestore(&rq->lock, *flags);
> >        }
> 
> Looks like nothing prevents ttwu() from changing task's CPU while
> some one else is holding task_rq_lock(). Is this OK?

Ah, crud, good catch. No, that is not quite OK ;-)

I'm starting to think adding a per-task scheduler lock isn't such a bad
idea after all :-)

How does something like the below look? It waits for the current
task_rq(p)->lock owner to go away after we flip p->state to TASK_WAKING.
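
To make the ordering concrete, here is a rough user-space model of the
idea, not the kernel code. Everything prefixed toy_ is a made-up name,
the lock is a plain test-and-set flag rather than a ticket lock, and it
uses sequentially consistent C11 atomics throughout, so it says nothing
about which barriers the real code needs. It only shows the two halves:
a locker that backs off while the task is waking (as in the quoted
hunk), and a waker that flips the state first and then waits for
whoever already holds the old rq lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for p->state values. */
enum toy_state { TOY_SLEEPING, TOY_WAKING, TOY_RUNNING };

struct toy_rq {
	atomic_bool locked;	/* simple test-and-set lock, not a ticket lock */
};

struct toy_task {
	_Atomic int state;
	_Atomic int cpu;	/* index into toy_rqs[] */
};

static struct toy_rq toy_rqs[2];

static bool toy_rq_trylock(struct toy_rq *rq)
{
	return !atomic_exchange(&rq->locked, true);
}

static void toy_rq_unlock(struct toy_rq *rq)
{
	atomic_store(&rq->locked, false);
}

/*
 * One pass of the task_rq_lock() loop: keep the lock only if the task
 * did not move and is not in the middle of a wakeup.
 */
static struct toy_rq *toy_task_rq_trylock(struct toy_task *p)
{
	struct toy_rq *rq = &toy_rqs[atomic_load(&p->cpu)];

	if (!toy_rq_trylock(rq))
		return NULL;
	if (rq == &toy_rqs[atomic_load(&p->cpu)] &&
	    atomic_load(&p->state) != TOY_WAKING)
		return rq;
	toy_rq_unlock(rq);
	return NULL;
}

/* Waker, step 1: flip the state so that new lockers back off. */
static bool toy_wake_begin(struct toy_task *p)
{
	int expected = TOY_SLEEPING;

	return atomic_compare_exchange_strong(&p->state, &expected, TOY_WAKING);
}

/* Waker, step 2: one poll of "has the old rq lock holder gone away?" */
static bool toy_wake_may_migrate(struct toy_task *p)
{
	return !atomic_load(&toy_rqs[atomic_load(&p->cpu)].locked);
}

/* Waker, step 3: only now is it safe to change the task's CPU. */
static void toy_wake_finish(struct toy_task *p, int new_cpu)
{
	atomic_store(&p->cpu, new_cpu);
	atomic_store(&p->state, TOY_RUNNING);
}
```

A waker in this model would call toy_wake_begin(), spin until
toy_wake_may_migrate() returns true, and then call toy_wake_finish().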

It also optimizes the x86 spinlock code a bit: there is no need to wait
for all pending owners to go away, just the current one.
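
As a sketch of what "just the current one" means, here is a toy
user-space ticket lock. It assumes a 16-bit word with the ticket being
served in the low byte and the next free ticket in the high byte; the
toy_ names and the CAS-based unlock are illustrative choices, not the
x86 implementation. The point is that the lock can stay held (by the
next ticket) while the holder we originally observed is already gone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low byte = ticket being served, high byte = next ticket. */
#define TICKET_SHIFT 8
#define TICKET_MASK  ((1u << TICKET_SHIFT) - 1)

struct toy_ticket_lock {
	_Atomic uint16_t slock;
};

static bool toy_is_locked(uint16_t v)
{
	return (((v >> TICKET_SHIFT) ^ v) & TICKET_MASK) != 0;
}

/* Queue up: take the next ticket from the high byte and return it. */
static uint8_t toy_take_ticket(struct toy_ticket_lock *l)
{
	return (uint8_t)(atomic_fetch_add(&l->slock, 1u << TICKET_SHIFT)
			 >> TICKET_SHIFT);
}

static void toy_lock(struct toy_ticket_lock *l)
{
	uint8_t me = toy_take_ticket(l);

	while ((atomic_load(&l->slock) & TICKET_MASK) != me)
		;	/* the kernel would cpu_relax() here */
}

/* Serve the next ticket: bump the low byte without carrying into the high. */
static void toy_unlock(struct toy_ticket_lock *l)
{
	uint16_t old = atomic_load(&l->slock), next;

	do {
		next = (uint16_t)((old & ~TICKET_MASK) |
				  ((old + 1u) & TICKET_MASK));
	} while (!atomic_compare_exchange_weak(&l->slock, &old, next));
}

/* True once the holder recorded in 'snap' has released the lock. */
static bool toy_holder_gone(struct toy_ticket_lock *l, uint16_t snap)
{
	if (!toy_is_locked(snap))
		return true;	/* nobody held it when we looked */
	return (atomic_load(&l->slock) & TICKET_MASK) != (snap & TICKET_MASK);
}

/* Wait for the current holder only, not for everyone queued behind it. */
static void toy_unlock_wait(struct toy_ticket_lock *l)
{
	uint16_t snap = atomic_load(&l->slock);

	while (!toy_holder_gone(l, snap))
		;
}
```

In this toy model the owner byte wraps after 256 handoffs, so a waiter
that is stalled for that long could see the same value again.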

This also solves the p->cpus_allowed race.

---
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2518,6 +2518,8 @@ try_to_wake_up(struct task_struct *p, un
 			break;
 	}
 
+	raw_spin_unlock_wait(&task_rq(p)->lock);
+
 	ret = 1; /* we qualify as a proper wakeup now */
 
 	if (load) // XXX racy
@@ -2536,10 +2538,7 @@ try_to_wake_up(struct task_struct *p, un
 
 	if (p->sched_class->task_waking)
 		p->sched_class->task_waking(p);
-	/*
-	 * XXX: by having set TASK_WAKING outside of rq->lock, there
-	 * could be an in-flight change to p->cpus_allowed..
-	 */
+
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
 #endif
 	ttwu_queue(p, cpu);
Index: linux-2.6/arch/x86/include/asm/spinlock.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/spinlock.h
+++ linux-2.6/arch/x86/include/asm/spinlock.h
@@ -158,18 +158,34 @@ static __always_inline void __ticket_spi
 }
 #endif
 
+#define TICKET_MASK ((1 << TICKET_SHIFT) - 1)
+
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
 	int tmp = ACCESS_ONCE(lock->slock);
 
-	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+	return !!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
 	int tmp = ACCESS_ONCE(lock->slock);
 
-	return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+	return (((tmp >> TICKET_SHIFT) - tmp) & TICKET_MASK) > 1;
+}
+
+static inline void __ticket_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	int tmp = ACCESS_ONCE(lock->slock);
+
+	if (!(((tmp >> TICKET_SHIFT) ^ tmp) & TICKET_MASK))
+		return; /* not locked */
+
+	tmp &= TICKET_MASK;
+
+	/* wait until the current lock holder goes away */
+	while ((ACCESS_ONCE(lock->slock) & TICKET_MASK) == tmp)
+		cpu_relax();
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
@@ -206,7 +222,11 @@ static __always_inline void arch_spin_lo
 	arch_spin_lock(lock);
 }
 
-#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+	__ticket_spin_unlock_wait(lock);
+}
+#else
 
 static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 {
@@ -214,6 +234,8 @@ static inline void arch_spin_unlock_wait
 		cpu_relax();
 }
 
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
+
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.

