From: Peter Zijlstra <peterz@infradead.org>
To: "Stephan Bärwolf" <stephan.baerwolf@tu-ilmenau.de>
Cc: linux-kernel@vger.kernel.org
Subject: Re: sched: fix/optimise some issues
Date: Thu, 21 Jul 2011 18:32:47 +0200
Message-ID: <1311265967.29152.160.camel@twins>
In-Reply-To: <4E28557A.7040704@tu-ilmenau.de>
On Thu, 2011-07-21 at 18:36 +0200, Stephan Bärwolf wrote:
> > Right, so I've often wanted a [us]128 type, and gcc has some (broken?)
> > support for that, but overhead has always kept me from it.
> 128bit sched_vruntime_t support seems to be running fine, when compiled with
> gcc (Gentoo 4.4.5 p1.2, pie-0.4.5) 4.4.5.
> Of course overhead is a problem (but there is also overhead using u64 on
> x86),
Yeah, I know, but luckily all 32bit computing shall die sooner rather
than later. But there really wasn't much choice there anyway, 32bit
simply won't do.
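
For reference, a minimal sketch of what a [us]128 type looks like with gcc's `__int128` extension, which as far as I know is only provided on 64-bit targets. The `u128` typedef and the helper names are hypothetical and are not taken from the patch under discussion:

```c
#include <stdint.h>

/* Hypothetical typedef; relies on the gcc/clang __int128 extension. */
typedef unsigned __int128 u128;

/* Add a 64-bit delta to a 128-bit accumulator; the carry out of the
 * low 64 bits is handled by the compiler. */
static u128 u128_add_u64(u128 a, uint64_t b)
{
    return a + b;
}

/* Upper 64 bits of the 128-bit value. */
static uint64_t u128_hi(u128 v)
{
    return (uint64_t)(v >> 64);
}

/* Lower 64 bits of the 128-bit value. */
static uint64_t u128_lo(u128 v)
{
    return (uint64_t)v;
}
```

Every such addition becomes an add-with-carry pair on a 64-bit machine, which is the kind of overhead being weighed above.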
> that is why it should be Kconfig selectable (for servers with many
> processes, deep cgroups and many different priorities?).
Sadly that's not how things work in practice: distros will have to
enable the option, and that means that pretty much everybody runs it. The
whole cgroup crap is already _way_ too expensive.
> But I think also abstracting the whole vruntime-stuff into a separate
> collection simplifies further evaluations and adaptations. (Think of central
> statistics collection, for example maximum timeslice seen or overflows that
> happened, without changing all the lines of code with the risk of missing
> something.)
It made rather a mess of things.
> > There's also the non-atomicity thing to consider, see min_vruntime_copy
> > etc.
> I think atomicity is not a (great) issue, for two reasons:
> a) on x86 the u64 wouldn't be atomic, either (vruntime is u64, not
> atomic64_t)
atomic64_t isn't needed in order to guarantee consistent loads; Linux
depends on the fact that all naturally aligned loads are complete loads
(no partials etc.).
> b) every operation on cfs_rq->min_vruntime should happen while
> holding the runqueue lock?
---
commit 3fe1698b7fe05aeb063564e71e40d09f28d8e80c
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Tue Apr 5 17:23:48 2011 +0200

    sched: Deal with non-atomic min_vruntime reads on 32bits

    In order to avoid reading partial updated min_vruntime values on 32bit
    implement a seqcount like solution.

    Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Nick Piggin <npiggin@kernel.dk>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20110405152729.111378493@chello.nl
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/sched.c b/kernel/sched.c
index 46f42ca..7a5eb26 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -312,6 +312,9 @@ struct cfs_rq {
 
 	u64 exec_clock;
 	u64 min_vruntime;
+#ifndef CONFIG_64BIT
+	u64 min_vruntime_copy;
+#endif
 
 	struct rb_root tasks_timeline;
 	struct rb_node *rb_leftmost;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index ad4c414f..054cebb 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -358,6 +358,10 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 	}
 
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+#ifndef CONFIG_64BIT
+	smp_wmb();
+	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
+#endif
 }
 
 /*
@@ -1376,10 +1380,21 @@ static void task_waking_fair(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 min_vruntime;
 
-	lockdep_assert_held(&task_rq(p)->lock);
+#ifndef CONFIG_64BIT
+	u64 min_vruntime_copy;
 
-	se->vruntime -= cfs_rq->min_vruntime;
+	do {
+		min_vruntime_copy = cfs_rq->min_vruntime_copy;
+		smp_rmb();
+		min_vruntime = cfs_rq->min_vruntime;
+	} while (min_vruntime != min_vruntime_copy);
+#else
+	min_vruntime = cfs_rq->min_vruntime;
+#endif
+
+	se->vruntime -= min_vruntime;
 }
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
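
The copy-and-retry scheme in the patch above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: the names (`split_u64`, `vr_pair`, `vr_publish`, `vr_read`) are hypothetical, each 64-bit value is stored as two 32-bit halves to mimic what a 32-bit CPU has to do, C11 fences stand in for smp_wmb()/smp_rmb(), and a single writer is assumed (in the kernel the writer runs under the runqueue lock).

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit value kept as two 32-bit halves, as on a 32-bit CPU. */
struct split_u64 {
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

struct vr_pair {
    struct split_u64 value; /* plays the role of min_vruntime */
    struct split_u64 copy;  /* plays the role of min_vruntime_copy */
};

/* Two separate 32-bit stores: a concurrent reader may see only one. */
static void split_store(struct split_u64 *s, uint64_t v)
{
    atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
    atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);
}

/* Two separate 32-bit loads, recombined into one 64-bit value. */
static uint64_t split_load(struct split_u64 *s)
{
    uint64_t lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
    uint64_t hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
    return (hi << 32) | lo;
}

/* Writer: update the value first, then the copy, with a barrier between. */
static void vr_publish(struct vr_pair *p, uint64_t v)
{
    split_store(&p->value, v);
    atomic_thread_fence(memory_order_release); /* stands in for smp_wmb() */
    split_store(&p->copy, v);
}

/* Lockless reader: read the copy first, then the value, and retry
 * until both agree. */
static uint64_t vr_read(struct vr_pair *p)
{
    uint64_t v, c;

    do {
        c = split_load(&p->copy);
        atomic_thread_fence(memory_order_acquire); /* stands in for smp_rmb() */
        v = split_load(&p->value);
    } while (v != c);

    return v;
}
```

The reader loads the two fields in the opposite order from the writer's stores, so a reader that overlaps an update sees mismatched values and loops instead of returning a half-written result.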