From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mike Galbraith <bitbucket@online.de>, Peter Anvin <hpa@zytor.com>,
	Andi Kleen <ak@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] per-cpu preempt_count
Date: Tue, 13 Aug 2013 12:30:56 +0200
Message-ID: <20130813103056.GA2170@gmail.com>
In-Reply-To: <CA+55aFw=S4xxdkKpjQdg77CehBBW6S-13N6-7tq4=-nN_cCUKA@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Aug 12, 2013 at 10:58 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > We could still have the advantages of NEED_RESCHED in preempt_count() by
> > realizing that we only rarely actually set/clear need_resched and mostly
> > read it from the highest freq user, the preempt_enable() check.
> >
> > So we could have it atomic, but do atomic_read() in the preempt_enable()
> > hotpath which wouldn't suck donkey balls, right?
> 
> Wrong. The thing is, the common case for preempt is to increment and 
> decrement the count, not testing it. Exactly because we do this for 
> spinlocks and for rcu read-locked regions.

Indeed, I should have realized that immediately ...

> Now, what we *could* do is to say:
> 
>  - we will use the high bit of the preempt count for NEED_RESCHED
> 
>  - when we set/clear that high bit, we *always* use atomic sequences, 
> and we never change any of the other bits.
> 
>  - when we increment/decrement the other counters, we *only* do so on 
> the local CPU, and we don't use atomic accesses.
> 
> Now, the downside of that is that *because* we don't use atomic accesses 
> for the inc/dec parts, the updates to the high bit can get lost. But 
> because the high bit updates are done with atomics, we know that they 
> won't mess up the actual counting bits, so at least the count is never 
> corrupted.
> 
> And the NEED_RESCHED bit getting lost would be very unusual. That 
> clearly would *not* be acceptable for RT, but it might be acceptable 
> for "in the unusual case where we want to preempt a thread that was not 
> preemptible, *and* we ended up having the extra unusual case that 
> preemption enable ended up missing the preempt bit, we don't get 
> preempted in a timely manner". It's probably impossible to ever see in 
> practice, and considering that for non-RT use the PREEMPT bit is a 
> "strong hint" rather than anything else, it sounds like it might be 
> acceptable.
> 
> It is obviously *not* going to be acceptable for the RT people, though, 
> but since they do different code sequences _anyway_, that's not really 
> much of an issue.

Hm, this could introduce weird artifacts for code like signal delivery 
(via kick_process()), with occasional high - possibly user noticeable - 
signal delivery latencies.
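
To make the lost-update window in the quoted scheme concrete, here is a 
minimal user-space sketch, not kernel code. It replays one unlucky 
interleaving by hand on a single thread: the local CPU's non-atomic 
increment is split into its load and store halves, and the remote atomic 
OR lands in between. The bit position and all names ending in _sketch are 
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 31 is the resched flag, the low bits are the count. */
#define NEED_RESCHED_BIT 0x80000000u

static _Atomic uint32_t preempt_count_sketch;

/* Remote CPU: atomic read-modify-write that touches only the high bit. */
static void set_need_resched_sketch(void)
{
	atomic_fetch_or(&preempt_count_sketch, NEED_RESCHED_BIT);
}

/*
 * Replays the race deterministically. The separate relaxed load and
 * store stand in for a non-atomic increment on the local CPU.
 * Returns true if the flag survived, false if it was overwritten.
 */
static bool replay_lost_flag_race_sketch(void)
{
	uint32_t tmp, final;

	atomic_store(&preempt_count_sketch, 1);	/* one preempt_disable() active */

	/* local CPU: first half of the increment (load) */
	tmp = atomic_load_explicit(&preempt_count_sketch, memory_order_relaxed);

	/* remote CPU: sets the flag atomically, in the middle of the increment */
	set_need_resched_sketch();

	/* local CPU: second half of the increment (store of the stale value + 1) */
	atomic_store_explicit(&preempt_count_sketch, tmp + 1, memory_order_relaxed);

	final = atomic_load(&preempt_count_sketch);
	assert((final & ~NEED_RESCHED_BIT) == 2);	/* the count itself is intact */
	return (final & NEED_RESCHED_BIT) != 0;
}
```

In this replay the count stays correct but the flag is gone, which is 
exactly the "updates to the high bit can get lost" case described above.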

But we could perhaps do something else and push the overhead into 
resched_task(): instead of using atomics we could use the resched IPI to 
set the local preempt_count(). That way preempt_count() will only ever be 
updated CPU-locally and we could merge need_resched into preempt_count() 
just fine.
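
A rough user-space sketch of that idea, under stated assumptions: the 
"IPI" is modelled as a per-CPU pending flag, the handler is an ordinary 
function that the owning CPU is assumed to run, and the increment and 
decrement are assumed to be single instructions so the handler cannot 
land in the middle of one. Every name ending in _sketch, the bit position 
and the struct layout are hypothetical and not taken from any kernel tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH   4
#define NEED_RESCHED_BIT 0x80000000u

struct cpu_sketch {
	uint32_t preempt_count;		/* only ever written by the owning CPU */
	atomic_bool resched_ipi;	/* stand-in for a pending resched IPI */
	unsigned int schedule_calls;	/* counts entries into the "scheduler" */
};

static struct cpu_sketch cpus[NR_CPUS_SKETCH];

/* Any CPU: request a reschedule without writing the remote count. */
static void resched_cpu_sketch(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	atomic_store(&cpus[cpu].resched_ipi, true);
}

/* Target CPU, "interrupt context": fold the request into the local count. */
static void resched_ipi_handler_sketch(int cpu)
{
	if (atomic_exchange(&cpus[cpu].resched_ipi, false))
		cpus[cpu].preempt_count |= NEED_RESCHED_BIT;
}

static void preempt_disable_sketch(int cpu)
{
	cpus[cpu].preempt_count++;
}

/* One decrement plus one compare covers nested and non-nested cases. */
static void preempt_enable_sketch(int cpu)
{
	if (--cpus[cpu].preempt_count == NEED_RESCHED_BIT) {
		cpus[cpu].preempt_count &= ~NEED_RESCHED_BIT;
		cpus[cpu].schedule_calls++;	/* stand-in for schedule() */
	}
}
```

The compare only matches when the count has dropped to zero and the flag 
is set, so a nested preempt_enable_sketch() falls straight through.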

[ Some care has to be taken with polling-idle threads: those could simply
  use another signalling mechanism, another field in task struct, no need
  to abuse need_resched for that. ]

We could still _read_ the preempt count non-destructively from other CPUs, 
to avoid having to send a resched IPI for already marked tasks. [ OTOH it 
might be faster to never do that and assume that an IPI has to be sent in 
99.9% of the cases - that would have to be re-measured. ]
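
The read-only remote check could look roughly like the sketch below. It 
is a user-space illustration with hypothetical names (everything ending 
in _sketch), and the IPI itself is reduced to a counter so the effect is 
observable. Whether the extra remote cache-line read pays off is exactly 
the open measurement question above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NEED_RESCHED_BIT 0x80000000u

struct remote_cpu_sketch {
	_Atomic uint32_t preempt_count;	/* written only by its owner */
	unsigned int ipis_sent;		/* bookkeeping for this example */
};

/*
 * Requesting CPU: read the target's count without modifying it.
 * Returns true if an IPI was "sent", false if the target was already marked.
 */
static bool resched_if_unmarked_sketch(struct remote_cpu_sketch *target)
{
	uint32_t seen = atomic_load_explicit(&target->preempt_count,
					     memory_order_relaxed);

	if (seen & NEED_RESCHED_BIT)
		return false;

	target->ipis_sent++;	/* stand-in for sending the resched IPI */
	return true;
}

/* Owning CPU: what the IPI handler would do on arrival. */
static void mark_need_resched_local_sketch(struct remote_cpu_sketch *self)
{
	uint32_t v = atomic_load_explicit(&self->preempt_count,
					  memory_order_relaxed);

	atomic_store_explicit(&self->preempt_count, v | NEED_RESCHED_BIT,
			      memory_order_relaxed);
}
```

A second requester that reads the count before the first IPI has been 
handled would still send a duplicate IPI; in this sketch that is 
redundant but harmless.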

Using this method we could have a really lightweight, minimal, percpu 
based preempt count mechanism in all the default fastpath cases, both for 
nested and for non-nested preempt_enable()s.

Thanks,

	Ingo
