From: Jan Stancek <jstancek@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: alex shi <alex.shi@intel.com>, guz fnst <guz.fnst@cn.fujitsu.com>,
mingo@redhat.com, jolsa@redhat.com, riel@redhat.com,
linux-kernel@vger.kernel.org
Subject: Re: [BUG] scheduler doesn't balance thread to idle cpu for 3 seconds
Date: Fri, 29 Jan 2016 05:33:45 -0500 (EST)
Message-ID: <654964868.14006956.1454063625314.JavaMail.zimbra@redhat.com>
In-Reply-To: <20160129101522.GF6357@twins.programming.kicks-ass.net>
----- Original Message -----
> From: "Peter Zijlstra" <peterz@infradead.org>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: "alex shi" <alex.shi@intel.com>, "guz fnst" <guz.fnst@cn.fujitsu.com>, mingo@redhat.com, jolsa@redhat.com,
> riel@redhat.com, linux-kernel@vger.kernel.org
> Sent: Friday, 29 January, 2016 11:15:22 AM
> Subject: Re: [BUG] scheduler doesn't balance thread to idle cpu for 3 seconds
>
> On Thu, Jan 28, 2016 at 01:43:13PM -0500, Jan Stancek wrote:
> > > How long should I have to wait for a fail?
> >
> > It's about 1000-2000 iterations for me, which I think you covered
> > by now in those 2 hours.
>
> So I've been running:
>
> while ! ./pthread_cond_wait_1 ; do sleep 1; done
>
> overnight on the machine, and have yet to hit a wobbly -- that is, it's
> still running.
I have seen a similar result.
Then, instead of turning CPUs off, I spawned more low-priority threads,
scaling their count with the number of CPUs on the system:
@@ -213,10 +213,14 @@
 		printf(ERROR_PREFIX "pthread_attr_setschedparam\n");
 		exit(PTS_UNRESOLVED);
 	}
-	rc = pthread_create(&low_id, &low_attr, low_priority_thread, NULL);
-	if (rc != 0) {
-		printf(ERROR_PREFIX "pthread_create\n");
-		exit(PTS_UNRESOLVED);
+
+	int i, ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+	for (i = 0; i < ncpus - 1; i++) {
+		rc = pthread_create(&low_id, &low_attr, low_priority_thread, NULL);
+		if (rc != 0) {
+			printf(ERROR_PREFIX "pthread_create\n");
+			exit(PTS_UNRESOLVED);
+		}
and let this run on 3 bare-metal x86 systems overnight (v4.5-rc1). It
failed on 2 systems (12 and 24 CPUs) with a 1:1000 chance; it never failed
on the 3rd one (4 CPUs).
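
For anyone who wants to try the same idea outside the test suite, here is a
minimal standalone sketch of "one spinner per online CPU, minus one". It is
not the test's code: spin_until_stopped, stop_spinning and the helper names
are hypothetical stand-ins, the threads use default attributes rather than
the test's low_attr, and (unlike the patch above, which reuses low_id) each
thread id is kept so the threads can be joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stop flag; the real low_priority_thread is not shown here. */
static atomic_int stop_spinning;

/* Stand-in for low_priority_thread: burn CPU until asked to stop. */
static void *spin_until_stopped(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_spinning))
		;
	return NULL;
}

/* Same arithmetic as the patch: ncpus - 1 threads, never negative. */
static int spinner_count(long ncpus)
{
	return ncpus > 1 ? (int)(ncpus - 1) : 0;
}

/* Thread count for the CPUs currently online. */
static int online_spinner_count(void)
{
	return spinner_count(sysconf(_SC_NPROCESSORS_ONLN));
}

/*
 * Start up to n spinners with default attributes (an assumption; the test
 * passes its own low_attr). Returns how many threads were started.
 */
static int spawn_spinners(pthread_t *ids, int n)
{
	int i;

	assert(ids != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int rc = pthread_create(&ids[i], NULL, spin_until_stopped, NULL);
		if (rc != 0) {
			fprintf(stderr, "pthread_create failed: %d\n", rc);
			break;
		}
	}
	return i;
}

/* Ask all spinners to stop, then join the first n of them. */
static void stop_spinners(pthread_t *ids, int n)
{
	int i;

	atomic_store(&stop_spinning, 1);
	for (i = 0; i < n; i++)
		pthread_join(ids[i], NULL);
}
```

A caller would allocate online_spinner_count() thread ids, pass them to
spawn_spinners(), run the high-priority part of the test, and finish with
stop_spinners().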
>
> Also note that I don't think failing this test is a bug per se.
> Undesirable maybe, but within spec, since SIGALRM is process wide, so it
> being delivered to the SCHED_OTHER task is accepted, and SCHED_OTHER has
> no timeliness guarantees.
>
> That said; if I could reliably reproduce I'd have a go at fixing this, I
> suspect there's a 'fun' problem at the bottom of this.
Thanks for trying. I'll see if I can find a more reliable way to reproduce it.
Regards,
Jan