public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@linux-foundation.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: mingo@elte.hu, jens.axboe@oracle.com, a.p.zijlstra@chello.nl,
	cborntra@de.ibm.com, rusty@rustcorp.com.au,
	linux-kernel@vger.kernel.org, arjan@infradead.org
Subject: Re: [PATCH RFC 1/3] Add a trigger API for efficient non-blocking waiting
Date: Wed, 20 Aug 2008 12:25:46 -0700
Message-ID: <20080820122546.6022d91d.akpm@linux-foundation.org>
In-Reply-To: <48AC6593.80505@goop.org>

On Wed, 20 Aug 2008 11:42:27 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Andrew Morton wrote:
> > On Sat, 16 Aug 2008 09:34:13 -0700 Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >
> >   
> >> There are various places in the kernel which wish to wait for a
> >> condition to come true while in a non-blocking context.  Existing
> >> examples of this are stop_machine() and smp_call_function_mask().
> >> (No doubt there are other instances of this pattern in the tree.)
> >>
> >> Thus far, the only way to achieve this is by spinning with a
> >> cpu_relax() loop.  This is fine if the condition becomes true very
> >> quickly, but it is not ideal:
> >>
> >>  - There's little opportunity to put the CPUs into a low-power state.
> >>    cpu_relax() may do this to some extent, but if the wait is
> >>    relatively long, then we can probably do better.
> >>     
> >
> > If this change saves a significant amount of power then we should fix
> > the offending callsites.
> >   
> 
> Fix them how?  In general we're talking about contexts where we can't
> block, and where the wait time is limited by some property of the
> platform, such as IPI time or interrupt latency (though doing a
> cross-cpu call of a long-running function would be something we could fix).

ah, OK, I'd failed to note that you had identified two specific culprits.

Are either of these operations executed frequently enough for there to
be significant energy savings here?
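For reference, the cpu_relax() spin-wait pattern described in the quoted
text looks roughly like the userspace sketch below. It is an illustration
under stated assumptions, not kernel code: cpu_relax() is re-implemented
locally (PAUSE on x86, only a compiler barrier elsewhere), and the
max_spins bound is a hypothetical addition so the sketch can be tested;
the kernel loops in question spin without a bound.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's cpu_relax(). On x86 the kernel
 * emits "rep; nop" (PAUSE); on other architectures this sketch only uses
 * a compiler barrier, which is an assumption for portability. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");
#else
	atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Spin until another CPU sets *cond. max_spins is a hypothetical bound
 * used only for testing. Returns true if the condition was observed,
 * false if we gave up. The CPU stays busy the whole time, which is the
 * power and vCPU cost being discussed in this thread. */
static bool spin_until(atomic_bool *cond, unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(cond, memory_order_acquire))
			return true;
		cpu_relax();
	}
	return false;
}
```

Nothing in this loop tells the hardware or a hypervisor what the CPU is
waiting for; that missing information is what the proposed API adds.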

> >>  - In a virtual environment, spinning virtual CPUs just waste CPU
> >>    resources, and may steal CPU time from vCPUs which need it to make
> >>    progress.  The trigger API allows the vCPUs to give up their CPU
> >>    entirely.  The s390 people observed a problem with stop_machine
> >>    taking a very long time (seconds) when there are more vcpus than
> >>    available cpus.
> >>     
> >
> > If this change saves a significant amount of virtual-cpu-time then we
> > should fix the offending callsites.
> >   
> 
> This case isn't particularly about saving vcpu time, but making timely
> progress.  stop_machine() gets all the cpus into a spinloop, where they
> spin waiting for an event to tell them to go to their next state-machine
> state.  By definition this can't be a blocking operation (since the
> whole point is that they're high priority threads that prevent anything
> else from running).  But in the virtual case, the fact that they're all
> spinning means that the underlying hypervisor has no idea who's just
> spinning, and who's trying to do some work needed to make overall
> progress, so the whole thing gets bogged down.

hm.  I'm surprised that stop_machine() is executed frequently enough
for you to care.  What's causing it?
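The "everyone spins until the state machine advances" structure described
above can be sketched as follows. This is a simplified userspace model,
loosely based on the sequence of states stop_machine() steps through; the
state names, struct sm and the sm_* helpers are hypothetical and are not
the kernel's implementation. Each CPU acks the current state; the last one
to ack resets the counter and advances the state, while every other CPU
spins in sm_wait_for().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical states, loosely modeled on stop_machine()'s sequence. */
enum sm_state { SM_PREPARE, SM_DISABLE_IRQ, SM_RUN, SM_EXIT };

struct sm {
	atomic_int state;    /* current state, advanced by the last acker */
	atomic_int pending;  /* CPUs that have not yet acked this state */
	int ncpus;
};

static void sm_init(struct sm *sm, int ncpus)
{
	sm->ncpus = ncpus;
	atomic_store(&sm->pending, ncpus);
	atomic_store(&sm->state, SM_PREPARE);
}

/* Called by each CPU after finishing its work for the current state.
 * The counter is reset before the state is advanced, so no CPU can ack
 * the next state before the count for it is in place. */
static void sm_ack(struct sm *sm)
{
	if (atomic_fetch_sub(&sm->pending, 1) == 1) {
		atomic_store(&sm->pending, sm->ncpus);
		atomic_fetch_add(&sm->state, 1);
	}
}

/* Spin until the machine leaves state 'cur'. Under a hypervisor, every
 * vCPU sitting here looks busy, even though only the CPU(s) yet to ack
 * are doing useful work. */
static void sm_wait_for(struct sm *sm, int cur)
{
	while (atomic_load(&sm->state) == cur)
		; /* the kernel would call cpu_relax() here */
}
```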

> Now perhaps we could solve stop_machine by modifying the scheduler in
> some way, where you can block the run queue so that you sit in the idle
> loop even though there's runnable processes waiting.  But even then,
> stop_machine requires that interrupts be disabled, which means we're
> pretty much limited to spinning.

If stop_machine() is the _only_ problematic callsite and we reasonably
expect that no new ones will pop up then sure, a
stop_machine()-specific fix might be appropriate.

Otherwise, sure, we'd need to look at something more general.

> So my proposal is to add a non-scheduler-blocking operation which is
> semantically equivalent to spinning, but gives the underlying platform
> more information about what's going on.
> 
> Arjan suggested that since this is more or less equivalent to a
> completion, we should just implement "spinpletions" - a spinning
> completion.  This should be more familiar to kernel programmers, and
> should be just as useful as triggers.
> 
> I've run out of time to work on this now, but Rusty has hinted he'll
> pick up the baton...
> 
> (I'd also like to hear from other architecture folks, particularly s390,
> to make sure this is going to be useful to them too.)
> 
>     J
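A "spinpletion" as Arjan describes it (a completion whose waiter spins
rather than sleeps) might look something like the sketch below. The thread
does not define an API, so struct spinpletion and all function names here
are hypothetical; they only mirror the shape of init_completion(),
complete() and wait_for_completion(). The comments mark where a
platform-aware version could pass information to the hardware or
hypervisor instead of blindly spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinning completion; not an existing kernel API. */
struct spinpletion {
	atomic_uint done;  /* complete() calls not yet consumed by a waiter */
};

static void init_spinpletion(struct spinpletion *x)
{
	atomic_init(&x->done, 0);
}

static void spinpletion_complete(struct spinpletion *x)
{
	/* A platform-aware version could also kick a halted (v)CPU here. */
	atomic_fetch_add_explicit(&x->done, 1, memory_order_release);
}

/* Consume one completion if available; never spins. */
static bool try_spinpletion(struct spinpletion *x)
{
	unsigned int d = atomic_load_explicit(&x->done, memory_order_acquire);

	while (d != 0) {
		/* On failure, d is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&x->done, &d, d - 1,
				memory_order_acquire, memory_order_acquire))
			return true;
	}
	return false;
}

/* Spin until a completion is available, then consume it. */
static void wait_for_spinpletion(struct spinpletion *x)
{
	while (!try_spinpletion(x))
		; /* cpu_relax() in the kernel; a hypervisor-aware version
		     could yield the vCPU here instead */
}
```

Because the waiter and completer go through named operations rather than
an open-coded loop, an architecture or hypervisor port could change what
"waiting" means without touching the callers.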


Thread overview: 9+ messages
2008-08-16 16:34 [PATCH RFC 1/3] Add a trigger API for efficient non-blocking waiting Jeremy Fitzhardinge
2008-08-16 17:47 ` Arjan van de Ven
2008-08-17 23:02 ` Randy Dunlap
2008-08-20  6:21 ` Andrew Morton
2008-08-20 18:42   ` Jeremy Fitzhardinge
2008-08-20 19:25     ` Andrew Morton [this message]
2008-08-20 20:14       ` Jeremy Fitzhardinge
2008-08-25  0:53       ` Rusty Russell
2008-08-28 12:27 ` Christian Borntraeger
