From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
laijs@cn.fujitsu.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
peterz@infradead.org, rostedt@goodmis.org, dhowells@redhat.com,
edumazet@google.com, darren@dvhart.com, sbw@mit.edu
Subject: Re: [PATCH RFC nohz_full v2 6/7] nohz_full: Add full-system-idle state machine
Date: Mon, 1 Jul 2013 11:10:40 -0700 [thread overview]
Message-ID: <20130701181040.GO3773@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130701163529.GO7246@somewhere.redhat.com>
On Mon, Jul 01, 2013 at 06:35:31PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 28, 2013 at 01:10:21PM -0700, Paul E. McKenney wrote:
> > /*
> > + * Unconditionally force exit from full system-idle state. This is
> > + * invoked when a normal CPU exits idle, but must be called separately
> > + * for the timekeeping CPU (tick_do_timer_cpu). The reason for this
> > + * is that the timekeeping CPU is permitted to take scheduling-clock
> > + * interrupts while the system is in system-idle state, and of course
> > + * rcu_sysidle_exit() has no way of distinguishing a scheduling-clock
> > + * interrupt from any other type of interrupt.
> > + */
> > +void rcu_sysidle_force_exit(void)
> > +{
> > + int oldstate = ACCESS_ONCE(full_sysidle_state);
> > + int newoldstate;
> > +
> > + /*
> > + * Each pass through the following loop attempts to exit full
> > + * system-idle state. If contention proves to be a problem,
> > + * a trylock-based contention tree could be used here.
> > + */
> > + while (oldstate > RCU_SYSIDLE_SHORT) {
> > + newoldstate = cmpxchg(&full_sysidle_state,
> > + oldstate, RCU_SYSIDLE_NOT);
> > + if (oldstate == newoldstate &&
> > + oldstate == RCU_SYSIDLE_FULL_NOTED) {
> > + rcu_kick_nohz_cpu(tick_do_timer_cpu);
> > + return; /* We cleared it, done! */
> > + }
> > + oldstate = newoldstate;
> > + }
> > + smp_mb(); /* Order initial oldstate fetch vs. later non-idle work. */
> > +}
> > +
> > +/*
> > * Invoked to note entry to irq or task transition from idle. Note that
> > * usermode execution does -not- count as idle here! The caller must
> > * have disabled interrupts.
> > @@ -2474,6 +2506,214 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
> > atomic_inc(&rdtp->dynticks_idle);
> > smp_mb__after_atomic_inc();
> > WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks_idle) & 0x1));
> > +
> > + /*
> > + * If we are the timekeeping CPU, we are permitted to be non-idle
> > + * during a system-idle state. This must be the case, because
> > + * the timekeeping CPU has to take scheduling-clock interrupts
> > + * during the time that the system is transitioning to full
> > + * system-idle state. This means that the timekeeping CPU must
> > + * invoke rcu_sysidle_force_exit() directly if it does anything
> > + * more than take a scheduling-clock interrupt.
> > + */
> > + if (smp_processor_id() == tick_do_timer_cpu)
> > + return;
> > +
> > + /* Update system-idle state: We are clearly no longer fully idle! */
> > + rcu_sysidle_force_exit();
> > +}
> > +
> > +/*
> > + * Check to see if the current CPU is idle. Note that usermode execution
> > + * does not count as idle. The caller must have disabled interrupts.
> > + */
> > +static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
> > + unsigned long *maxj)
> > +{
> > + int cur;
> > + int curnmi;
> > + unsigned long j;
> > + struct rcu_dynticks *rdtp = rdp->dynticks;
> > +
> > + /*
> > + * If some other CPU has already reported non-idle, if this is
> > + * not the flavor of RCU that tracks sysidle state, or if this
> > + * is an offline or the timekeeping CPU, nothing to do.
> > + */
> > + if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> > + cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> > + return;
> > + /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
> > +
> > + /*
> > + * Pick up current idle and NMI-nesting counters, check. We check
> > + * for NMIs using RCU's main ->dynticks counter. This works because
> > + * any time ->dynticks has its low bit set, ->dynticks_idle will
> > + * too -- unless the only reason that ->dynticks's low bit is set
> > + * is due to an NMI from idle. Which is exactly the case we need
> > + * to account for.
> > + */
> > + cur = atomic_read(&rdtp->dynticks_idle);
> > + curnmi = atomic_read(&rdtp->dynticks);
> > + if ((cur & 0x1) || (curnmi & 0x1)) {
>
> I think you wanted to ignore NMIs this time because they don't read walltime?
>
> By the way they can still read jiffies, but unlike irq_enter(), nmi_enter()
> don't catch up with missing jiffies update. So the behaviour doesn't change
> compared to !NO_HZ_FULL.
You are right, I missed this when ripping out NMI handling. Will fix!
> > + *isidle = 0; /* We are not idle! */
> > + return;
> > + }
> > + smp_mb(); /* Read counters before timestamps. */
> > +
> > + /* Pick up timestamps. */
> > + j = ACCESS_ONCE(rdtp->dynticks_idle_jiffies);
> > + /* If this CPU entered idle more recently, update maxj timestamp. */
> > + if (ULONG_CMP_LT(*maxj, j))
> > + *maxj = j;
>
> So I'm a bit confused with the ordering so I'm probably going to ask a silly question.
>
> What makes sure that we are not reading a stale value of rdtp->dynticks_idle
> in the following scenario:
>
> CPU 0 CPU 1
>
> //CPU 1 idle
> //rdtp(1)->dynticks_idle == 0
>
> sysidle_check_cpu(CPU 1) {
> rdtp(1)->dynticks_idle == 0
> }
> cmpxchg(full_sysidle_state,
> ...RCU_SYSIDLE_SHORT)
> rcu_irq_exit() {
rcu_irq_enter(), right?
> rdtp(1)->dynticks_idle = 1
> smp_mb()
> rcu_sysidle_force_exit() {
> full_sysidle_state == RCU_SYSIDLE_SHORT
> // no cmpxchg
> smp_mb()
> ...
>
> [1]
> sysidle_check_cpu(CPU 1) {
> rdtp(1)->dynticks_idle == 0
> }
>
> cmpxchg(RCU_SYSIDLE_FULL, ...)
You know, I had an RCU_SYSIDLE_LONG state for this purpose, but later
convinced myself that I didn't need it. :-/
Time to go put it back in, and thank you for your careful review!
Thanx, Paul
> [2]
> sysidle_check_cpu(CPU 1) {
> rdtp(1)->dynticks_idle == 0
> }
>
> cmpxchg(RCU_SYSIDLE_FULL_NOTED, ...)
>
>
> I mean in [1] and [2] I can't see something in the ordering that guarantees that we see
> the new value rdtp(1)->dynticks_idle == 1.
>
next prev parent reply other threads:[~2013-07-01 18:10 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-28 20:09 [PATCH RFC nohz_full 0/7] v2 Provide infrastructure for full-system idle Paul E. McKenney
2013-06-28 20:10 ` [PATCH RFC nohz_full v2 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state Paul E. McKenney
2013-06-28 20:10 ` [PATCH RFC nohz_full v2 2/7] nohz_full: Add rcu_dyntick data " Paul E. McKenney
2013-07-01 15:31 ` Josh Triplett
2013-07-01 15:52 ` Paul E. McKenney
2013-07-01 18:16 ` Josh Triplett
2013-07-01 18:23 ` Paul E. McKenney
2013-07-01 18:34 ` Josh Triplett
2013-07-01 19:16 ` Paul E. McKenney
2013-07-02 5:10 ` Mike Galbraith
2013-07-02 5:58 ` Paul E. McKenney
2013-06-28 20:10 ` [PATCH RFC nohz_full v2 3/7] nohz_full: Add per-CPU idle-state tracking Paul E. McKenney
2013-07-01 15:33 ` Josh Triplett
2013-06-28 20:10 ` [PATCH RFC nohz_full v2 4/7] nohz_full: Add full-system idle states and variables Paul E. McKenney
2013-06-28 20:10 ` [PATCH RFC nohz_full v2 5/7] nohz_full: Add full-system-idle arguments to API Paul E. McKenney
2013-06-28 20:10 ` [PATCH RFC nohz_full v2 6/7] nohz_full: Add full-system-idle state machine Paul E. McKenney
2013-07-01 16:35 ` Frederic Weisbecker
2013-07-01 18:10 ` Paul E. McKenney [this message]
2013-07-01 20:55 ` Frederic Weisbecker
2013-07-01 16:53 ` Frederic Weisbecker
2013-07-01 18:17 ` Paul E. McKenney
2013-07-01 21:38 ` Frederic Weisbecker
2013-07-01 22:51 ` Paul E. McKenney
2013-06-28 20:10 ` [PATCH RFC nohz_full v2 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU Paul E. McKenney
2013-07-01 15:19 ` [PATCH RFC nohz_full 0/7] v2 Provide infrastructure for full-system idle Andi Kleen
2013-07-01 16:03 ` Paul E. McKenney
2013-07-01 16:19 ` Andi Kleen
2013-07-01 19:19 ` Paul E. McKenney
2013-07-01 19:43 ` Christoph Lameter
2013-07-01 19:56 ` Paul E. McKenney
2013-07-01 20:24 ` Christoph Lameter
2013-07-01 20:43 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130701181040.GO3773@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=darren@dvhart.com \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=josh@joshtriplett.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mingo@elte.hu \
--cc=niv@us.ibm.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sbw@mit.edu \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.