[PATCH] spin loop primitives for busy waiting

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] spin loop primitives for busy waiting
@ 2017-05-11 16:57 Nicholas Piggin
  2017-05-11 18:47 ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2017-05-11 16:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicholas Piggin, linux-arch, linuxppc-dev

Current busy-wait loops are implemented by repeatedly calling cpu_relax()
to give an arch option for a low-latency option to improve power and/or
SMT resource contention.

This poses some difficulties for powerpc, which has SMT priority setting
instructions (priorities determine how ifetch cycles are apportioned).
powerpc's cpu_relax() is implemented by setting a low priority then
setting normal priority. This has several problems:

 - Changing thread priority can have some execution cost and potential
   impact to other threads in the core. It's inefficient to execute them
   every time around a busy-wait loop.

 - Depending on implementation details, a `low ; medium` sequence may
   not have much if any affect. Some software with similar pattern
   actually inserts a lot of nops between, in order to cause a few fetch
   cycles with the low priority.

 - The busy-wait loop runs with regular priority. This might only be a few
   fetch cycles, but if there are several threads running such loops, they
   could cause a noticable impact on a non-idle thread.

Implement spin_begin, spin_end primitives that can be used around busy
wait loops, which default to no-ops. And spin_cpu_relax which defaults to
cpu_relax.

This will allow architectures to hook the entry and exit of busy-wait
loops, and will allow powerpc to set low SMT priority at entry, and
normal priority at exit.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

Hi Linus,

Since last discussion of this, I changed the interface to match what you
suggested (e.g., just start/end to be called as a pair from anywhere
in control flow).

If you find this acceptable, I'd like to start wiring in the powerpc
and adding the annotations to some important core spin loops (there's
not too many really). I'm hoping if you take this patch during this
merge window, I'll be able to start sending small patches to maintainers
for the next window. Unless you have a better suggestion for how to
deal with this.

Thanks,
Nick

 include/linux/processor.h | 62 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 include/linux/processor.h

diff --git a/include/linux/processor.h b/include/linux/processor.h
new file mode 100644
index 000000000000..0a058aaa9bab
--- /dev/null
+++ b/include/linux/processor.h
@@ -0,0 +1,62 @@
+/* Misc low level processor primitives */
+#ifndef _LINUX_PROCESSOR_H
+#define _LINUX_PROCESSOR_H
+
+#include <asm/processor.h>
+
+/*
+ * spin_begin is used before beginning a busy-wait loop, and must be paired
+ * with spin_end when the loop is exited. spin_cpu_relax must be called
+ * within the loop.
+ *
+ * The loop body should be as small and fast as possible, on the order of
+ * tens of instructions/cycles as a guide. It should and avoid calling
+ * cpu_relax, or any "spin" or sleep type of primitive including nested uses
+ * of these primitives. It should not lock or take any other resource.
+ * Violations of these guidelies will not cause a bug, but may cause sub
+ * optimal performance.
+ *
+ * These loops are optimized to be used where wait times are expected to be
+ * less than the cost of a context switch (and associated overhead).
+ *
+ * Detection of resource owner and decision to spin or sleep or guest-yield
+ * (e.g., spin lock holder vcpu preempted, or mutex owner not on CPU) can be
+ * tested within the loop body.
+ */
+#ifndef spin_begin
+#define spin_begin()
+#endif
+
+#ifndef spin_cpu_relax
+#define spin_cpu_relax() cpu_relax()
+#endif
+
+/*
+ * spin_cpu_yield may be called to yield (undirected) to the hypervisor if
+ * necessary. This should be used if the wait is expected to take longer
+ * than context switch overhead, but we can't sleep or do a directed yield.
+ */
+#ifndef spin_cpu_yield
+#define spin_cpu_yield() cpu_relax_yield()
+#endif
+
+#ifndef spin_end
+#define spin_end()
+#endif
+
+/*
+ * spin_on_cond can be used to wait for a condition to become true. It
+ * may be expected that the first iteration will true in the common case
+ * (no spinning).
+ */
+#ifndef spin_on_cond
+#define spin_on_cond(cond)					\
+do {								\
+	spin_begin();						\
+	while (cond)						\
+		spin_cpu_relax();				\
+	spin_end();						\
+} while (0)
+#endif
+
+#endif /* _LINUX_PROCESSOR_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] spin loop primitives for busy waiting
  2017-05-11 16:57 [PATCH] spin loop primitives for busy waiting Nicholas Piggin
@ 2017-05-11 18:47 ` Linus Torvalds
  2017-05-11 19:26   ` Nicholas Piggin
  2017-05-12 12:58   ` David Laight
  0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2017-05-11 18:47 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-arch@vger.kernel.org, ppc-dev

On Thu, May 11, 2017 at 9:57 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> If you find this acceptable, I'd like to start wiring in the powerpc
> and adding the annotations to some important core spin loops (there's
> not too many really). I'm hoping if you take this patch during this
> merge window, I'll be able to start sending small patches to maintainers
> for the next window. Unless you have a better suggestion for how to
> deal with this.

This looks fine to me as an interface, and yes, I guess I can just
take this scaffolding as-is.

The one question I have is about "spin_on_cond()": since you
explicitly document that the "no spinning" case is expected to be the
default, I really think that the default implementation should be
along the lines if

  #define spin_on_cond(cond) do { \
    if (unlikely(!(cond))) { \
        spin_begin(); do spin_cpu_relax(); while (!(cond)); spin_end(); \
    } \
  } while (0)

which will actually result in better code generation even if
spin_begin/end() are no-ops, and just generally optimizes for the
right behavior (ie do the spinning out-of-line, since by definition it
cannot be performance-critical after the first iteration).

So it's better even when you don't really have the begin/end overhead,
but that version of "spin_on_cond()" is then likely to work _much_
better if you actually do have it, when your version would seem to be
entirely unacceptable.

Hmm?

In fact, do you even want to make that "spin_on_cond()" version
conditional? I don't see how an architecture can really improve on it.
If an architecture wants to use things like "wait on this particular
variable", then that generic "cond" version is too generic an
interface anyway.

                  Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] spin loop primitives for busy waiting
  2017-05-11 18:47 ` Linus Torvalds
@ 2017-05-11 19:26   ` Nicholas Piggin
  2017-05-12 12:58   ` David Laight
  1 sibling, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2017-05-11 19:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-arch@vger.kernel.org, ppc-dev

On Thu, 11 May 2017 11:47:47 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, May 11, 2017 at 9:57 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > If you find this acceptable, I'd like to start wiring in the powerpc
> > and adding the annotations to some important core spin loops (there's
> > not too many really). I'm hoping if you take this patch during this
> > merge window, I'll be able to start sending small patches to maintainers
> > for the next window. Unless you have a better suggestion for how to
> > deal with this.  
> 
> This looks fine to me as an interface, and yes, I guess I can just
> take this scaffolding as-is.
> 
> The one question I have is about "spin_on_cond()": since you
> explicitly document that the "no spinning" case is expected to be the
> default, I really think that the default implementation should be
> along the lines if
> 
>   #define spin_on_cond(cond) do { \
>     if (unlikely(!(cond))) { \
>         spin_begin(); do spin_cpu_relax(); while (!(cond)); spin_end(); \
>     } \
>   } while (0)
> 
> which will actually result in better code generation even if
> spin_begin/end() are no-ops, and just generally optimizes for the
> right behavior (ie do the spinning out-of-line, since by definition it
> cannot be performance-critical after the first iteration).
> 
> So it's better even when you don't really have the begin/end overhead,
> but that version of "spin_on_cond()" is then likely to work _much_
> better if you actually do have it, when your version would seem to be
> entirely unacceptable.
> 
> Hmm?

I agree completely. But might it depend on architecture and so
just be something they could do themselves?

> In fact, do you even want to make that "spin_on_cond()" version
> conditional? I don't see how an architecture can really improve on it.
> If an architecture wants to use things like "wait on this particular
> variable", then that generic "cond" version is too generic an
> interface anyway.

So what I want is to be able to have architectures control the entire
loop, and you can do some interesting things with goto labels and
coding branches how you want them.

That was my motivation for doing the previous weird spin_do {} spin_while()
interface. But since you didn't like it, and I realized that kind of
control probably only matters for a very small subset of spin loops
(e.g., bit spinlock/seqlock, some idle loops in the scheduler, etc. that
do just tend to spin on a quite simple condition).

For example, if you sent the branch prediction to statically predict
the loop exits, then instead of taking a branch miss the first thing
you do when your lock gets freed, you will be doing useful instructions.

I haven't verified and done enough analysis yet to know if that's going
to be worthwhile yet, but I thought for some of those simple spins, it's
a nicer interface than the begin/relax/end.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] spin loop primitives for busy waiting
  2017-05-11 18:47 ` Linus Torvalds
  2017-05-11 19:26   ` Nicholas Piggin
@ 2017-05-12 12:58   ` David Laight
  2017-05-12 15:56     ` Linus Torvalds
  2017-05-12 15:56     ` Nicholas Piggin
  1 sibling, 2 replies; 6+ messages in thread
From: David Laight @ 2017-05-12 12:58 UTC (permalink / raw)
  To: 'Linus Torvalds', Nicholas Piggin
  Cc: linux-arch@vger.kernel.org, ppc-dev

RnJvbTogTGludXMgVG9ydmFsZHMNCj4gU2VudDogMTEgTWF5IDIwMTcgMTk6NDgNCi4uLg0KPiBU
aGUgb25lIHF1ZXN0aW9uIEkgaGF2ZSBpcyBhYm91dCAic3Bpbl9vbl9jb25kKCkiOiBzaW5jZSB5
b3UNCj4gZXhwbGljaXRseSBkb2N1bWVudCB0aGF0IHRoZSAibm8gc3Bpbm5pbmciIGNhc2UgaXMg
ZXhwZWN0ZWQgdG8gYmUgdGhlDQo+IGRlZmF1bHQsIEkgcmVhbGx5IHRoaW5rIHRoYXQgdGhlIGRl
ZmF1bHQgaW1wbGVtZW50YXRpb24gc2hvdWxkIGJlDQo+IGFsb25nIHRoZSBsaW5lcyBpZg0KPiAN
Cj4gICAjZGVmaW5lIHNwaW5fb25fY29uZChjb25kKSBkbyB7IFwNCj4gICAgIGlmICh1bmxpa2Vs
eSghKGNvbmQpKSkgeyBcDQo+ICAgICAgICAgc3Bpbl9iZWdpbigpOyBkbyBzcGluX2NwdV9yZWxh
eCgpOyB3aGlsZSAoIShjb25kKSk7IHNwaW5fZW5kKCk7IFwNCj4gICAgIH0gXA0KPiAgIH0gd2hp
bGUgKDApDQo+IA0KPiB3aGljaCB3aWxsIGFjdHVhbGx5IHJlc3VsdCBpbiBiZXR0ZXIgY29kZSBn
ZW5lcmF0aW9uIGV2ZW4gaWYNCj4gc3Bpbl9iZWdpbi9lbmQoKSBhcmUgbm8tb3BzLCBhbmQganVz
dCBnZW5lcmFsbHkgb3B0aW1pemVzIGZvciB0aGUNCj4gcmlnaHQgYmVoYXZpb3IgKGllIGRvIHRo
ZSBzcGlubmluZyBvdXQtb2YtbGluZSwgc2luY2UgYnkgZGVmaW5pdGlvbiBpdA0KPiBjYW5ub3Qg
YmUgcGVyZm9ybWFuY2UtY3JpdGljYWwgYWZ0ZXIgdGhlIGZpcnN0IGl0ZXJhdGlvbikuDQoNCkF0
IGxlYXN0IHNvbWUgdmVyc2lvbnMgb2YgZ2NjIGNvbnZlcnQgd2hpbGUgKGNvbmQpIGRvIHtib2R5
fQ0KaW50byBpZiAoY29uZCkgZG8ge2JvZHl9IHdoaWxlIChjb25kKSBldmVuIHdoZW4gJ2NvbmQn
DQppcyBhIG5vbi10cml2aWFsIGV4cHJlc3Npb24gYW5kICdib2R5JyBpcyB0cml2aWFsLg0KVGhl
IGNvZGUtYmxvYXQgaXMgc2lsbHkuDQpObyBwb2ludCBlbmZvcmNpbmcgdGhlICdvcHRpbWlzYXRp
b24nIGhlcmUuDQoNCglEYXZpZA0K

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] spin loop primitives for busy waiting
  2017-05-12 12:58   ` David Laight
@ 2017-05-12 15:56     ` Linus Torvalds
  2017-05-12 15:56     ` Nicholas Piggin
  1 sibling, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2017-05-12 15:56 UTC (permalink / raw)
  To: David Laight; +Cc: Nicholas Piggin, linux-arch@vger.kernel.org, ppc-dev

On Fri, May 12, 2017 at 5:58 AM, David Laight <David.Laight@aculab.com> wrote:
>
> At least some versions of gcc convert while (cond) do {body}
> into if (cond) do {body} while (cond) even when 'cond'
> is a non-trivial expression and 'body' is trivial.

Afaik pretty much all versions of gcc do that, unless you use -Os
(which we have effectively stopped doing because it causes so many
other problems in stupid instruction choices etc).

> The code-bloat is silly.
> No point enforcing the 'optimisation' here.

Actually, for places where we expect cold code and the loop to be
often run zero times, it actually makes sense. When we can make the
initial "if()" be unlikely, and gcc can then generate the unlikely
code all out of line, then the while -> if+do-while construction makes
sense.

It's the "normal" while loops that it's sad how big code gcc
generates, considering that most of our loop counts are vert small
(but generally nonzero).

               Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] spin loop primitives for busy waiting
  2017-05-12 12:58   ` David Laight
  2017-05-12 15:56     ` Linus Torvalds
@ 2017-05-12 15:56     ` Nicholas Piggin
  1 sibling, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2017-05-12 15:56 UTC (permalink / raw)
  To: David Laight
  Cc: 'Linus Torvalds', linux-arch@vger.kernel.org, ppc-dev

On Fri, 12 May 2017 12:58:12 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Linus Torvalds
> > Sent: 11 May 2017 19:48  
> ...
> > The one question I have is about "spin_on_cond()": since you
> > explicitly document that the "no spinning" case is expected to be the
> > default, I really think that the default implementation should be
> > along the lines if
> > 
> >   #define spin_on_cond(cond) do { \
> >     if (unlikely(!(cond))) { \
> >         spin_begin(); do spin_cpu_relax(); while (!(cond)); spin_end(); \
> >     } \
> >   } while (0)
> > 
> > which will actually result in better code generation even if
> > spin_begin/end() are no-ops, and just generally optimizes for the
> > right behavior (ie do the spinning out-of-line, since by definition it
> > cannot be performance-critical after the first iteration).  
> 
> At least some versions of gcc convert while (cond) do {body}
> into if (cond) do {body} while (cond) even when 'cond'
> is a non-trivial expression and 'body' is trivial.
> The code-bloat is silly.
> No point enforcing the 'optimisation' here.

The point is for something like this:

static inline unsigned __read_seqcount_begin(const seqcount_t *s)
{
        unsigned ret;

repeat:
        ret = READ_ONCE(s->sequence);
        if (unlikely(ret & 1)) {
                cpu_relax();
                goto repeat;
        }
        return ret;
}

to be coded as:

static inline unsigned __read_seqcount_begin(const seqcount_t *s)
{
        unsigned ret;

	spin_on_cond( !((ret = READ_ONCE(s->sequence)) & 1) );

        return ret;
}

That's about as complex as you'd want to go with this, but I think
it's a reasonable case.

Now for x86, you would want these to fall out to the same code
generated. For powerpc, you do not want those spin_begin(); spin_end();

You are right there's a bit of code bloat there. It gets moved out
of line, but gcc still isn't all that smart about it though, and
it doesn't fold the tests back nicely if I go with Linus's suggestion,
so it doesn't work so well as generic implementation.

For powerpc we have to live with it I think.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-05-12 15:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-11 16:57 [PATCH] spin loop primitives for busy waiting Nicholas Piggin
2017-05-11 18:47 ` Linus Torvalds
2017-05-11 19:26   ` Nicholas Piggin
2017-05-12 12:58   ` David Laight
2017-05-12 15:56     ` Linus Torvalds
2017-05-12 15:56     ` Nicholas Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).