From: Rusty Russell <rusty@rustcorp.com.au>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
    Christian Borntraeger <borntraeger@de.ibm.com>,
    Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
    linux-kernel@vger.kernel.org,
    virtualization@lists.linux-foundation.org,
    Zachary Amsden <zach@vmware.com>
Subject: Re: [PATCH] stopmachine: add stopmachine_timeout
Date: Tue, 15 Jul 2008 11:14:58 +1000
Message-ID: <200807151114.59562.rusty@rustcorp.com.au>
In-Reply-To: <20080714212026.GA6705@osiris.boeblingen.de.ibm.com>

On Tuesday 15 July 2008 07:20:26 Heiko Carstens wrote:
> On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote:
> > Rusty Russell wrote:
> > > On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote:
> > >> Am Montag, 14. Juli 2008 schrieb Hidetoshi Seto:
> > >>> +	/* Wait all others come to life */
> > >>> +	while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) {
> > >>> +		if (time_is_before_jiffies(limit))
> > >>> +			goto timeout;
> > >>> +		cpu_relax();
> > >>> +	}
> > >>> +
> > >>
> > >> Hmm. I think this could become interesting on virtual machines. The
> > >> hypervisor might be too busy to schedule a specific cpu at certain load
> > >> scenarios. This would cause a failure even if the cpu is not really
> > >> locked up. We had similar problems with the soft lockup daemon on
> > >> s390.
> > >
> > > 5 seconds is a fairly long time. If all else fails we could have a
> > > config option to simply disable this code.
>
> Hmm.. probably a stupid question: but what could happen that a real cpu
> (not virtual) becomes unresponsive so that it won't schedule a
> MAX_RT_PRIO-1 prioritized task for 5 seconds?

Yes. That's exactly what we're trying to detect. Currently the entire
machine will wedge. With this patch we can often limp along.

Hidetoshi's original problem was a client whose machine had one CPU die, then
got wedged as the emergency backup tried to load a module.

Along these lines, I found VMware's relaxed co-scheduling interesting, BTW:
http://communities.vmware.com/docs/DOC-4960

> cpu_relax() translates to a hypervisor yield on s390. Probably makes sense
> if other architectures would do the same.

Yes, I think so too. Actually, doing a random yield-to-other-VCPU on
cpu_relax is arguably the right semantic (in Linux it's used for spinning,
almost exclusively to wait for other cpus).

Cheers,
Rusty.
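
For readers following the thread, the timeout-bounded spin wait from the
quoted patch, combined with the "yield while spinning" idea discussed above,
can be sketched in user space roughly as follows. This is an illustration
only and not the kernel code: wait_for_prepared(), relax() and now_seconds()
are hypothetical names, a plain atomic counter stands in for prepared_cpus,
and POSIX sched_yield() is assumed as a loose analogue of a cpu_relax() that
yields to the hypervisor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for cpu_relax(): give up the CPU so that another
 * thread (by analogy, another VCPU) gets a chance to run. */
static inline void relax(void)
{
	sched_yield();
}

/* Monotonic clock in seconds; plays the role of jiffies in this sketch. */
static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Spin until *prepared equals expected, or until timeout_s seconds pass.
 * Returns true if everyone checked in, false on timeout. */
bool wait_for_prepared(atomic_int *prepared, int expected, double timeout_s)
{
	double limit;

	assert(expected >= 0);
	limit = now_seconds() + timeout_s;

	while (atomic_load(prepared) != expected) {
		if (now_seconds() > limit)
			return false;	/* the patch's "goto timeout" */
		relax();
	}
	return true;
}
```

A caller would have each worker increment the counter once it is ready and
treat a false return as "at least one worker never showed up", which is the
condition the patch tries to survive instead of wedging the whole machine.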