From: Peter Zijlstra <peterz@infradead.org>
To: Yuyang Du <yuyang.du@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>,
Andrey Ryabinin <aryabinin@virtuozzo.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Mike Galbraith <efault@gmx.de>,
Thomas Gleixner <tglx@linutronix.de>,
bsegall@google.com, morten.rasmussen@arm.com, pjt@google.com,
steve.muckle@linaro.org, linux-kernel@vger.kernel.org,
kernel@kyup.com
Subject: [PATCH] sched/fair: Fix cfs_rq avg tracking underflow
Date: Fri, 17 Jun 2016 11:19:48 +0200 [thread overview]
Message-ID: <20160617091948.GJ30927@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160616122504.GG30927@twins.programming.kicks-ass.net>
In any case, I've settled on the below. It generates crap code, but
hopefully GCC can be fixed sometime this century.
---
Subject: sched/fair: Fix cfs_rq avg tracking underflow
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 16 Jun 2016 10:50:40 +0200
As per commit:
b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")
> the code generated from update_cfs_rq_load_avg():
>
> if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> sa->load_avg = max_t(long, sa->load_avg - r, 0);
> sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> removed_load = 1;
> }
>
> turns into:
>
> ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax
> ffffffff8108706b: 48 85 c0 test %rax,%rax
> ffffffff8108706e: 74 40 je ffffffff810870b0 <update_blocked_averages+0xc0>
> ffffffff81087070: 4c 89 f8 mov %r15,%rax
> ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13)
> ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13)
> ffffffff8108707e: 4c 89 f9 mov %r15,%rcx
> ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx
> ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13)
> ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx
>
> Which you'll note ends up with sa->load_avg -= r in memory at
> ffffffff8108707a.
So I _should_ have looked at other unserialized users of ->load_avg, but
alas. Luckily nikbor reported a similar /0 from task_h_load() which
instantly triggered recollection of this here problem.
Aside from the intermediate value hitting memory and causing problems,
there's another problem: the underflow detection relies on the signed
bit. This reduces the effective width of the variables, iow. its
effectively the same as having these variables be of signed type.
This patches changes to a different means of unsigned underflow
detection to not rely on the signed bit. This allows the variables to
use the 'full' unsigned range. And it does so with explicit LOAD -
STORE to ensure any intermediate value will never be visible in
memory, allowing these unserialized loads.
Note: GCC generates crap code for this
Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
maybe we should do clamping on add too.
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: pjt@google.com
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: steve.muckle@linaro.org
Cc: kernel@kyup.com
Cc: morten.rasmussen@arm.com
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/fair.c | 33 +++++++++++++++++++++++++--------
1 file changed, 25 insertions(+), 8 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2878,6 +2878,23 @@ static inline void cfs_rq_util_change(st
}
}
+/*
+ * Unsigned subtract and clamp on underflow.
+ *
+ * Explicitly do a load-store to ensure the intermediate value never hits
+ * memory. This allows lockless observations without ever seeing the negative
+ * values.
+ */
+#define sub_positive(_ptr, _val) do { \
+ typeof(_ptr) ptr = (_ptr); \
+ typeof(*ptr) val = (_val); \
+ typeof(*ptr) res, var = READ_ONCE(*ptr); \
+ res = var - val; \
+ if (res > var) \
+ res = 0; \
+ WRITE_ONCE(*ptr, res); \
+} while (0)
+
/* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
static inline int
update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
@@ -2887,15 +2904,15 @@ update_cfs_rq_load_avg(u64 now, struct c
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
- sa->load_avg = max_t(long, sa->load_avg - r, 0);
- sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+ sub_positive(&sa->load_avg, r);
+ sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
removed_load = 1;
}
if (atomic_long_read(&cfs_rq->removed_util_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
- sa->util_avg = max_t(long, sa->util_avg - r, 0);
- sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+ sub_positive(&sa->util_avg, r);
+ sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
removed_util = 1;
}
@@ -2968,10 +2985,10 @@ static void detach_entity_load_avg(struc
&se->avg, se->on_rq * scale_load_down(se->load.weight),
cfs_rq->curr == se, NULL);
- cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
- cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
- cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
- cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+ sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
+ sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum);
+ sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
+ sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
cfs_rq_util_change(cfs_rq);
}
next prev parent reply other threads:[~2016-06-17 9:20 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-09 9:01 Divide-by-zero in post_init_entity_util_avg Chris Wilson
2016-06-09 1:33 ` Yuyang Du
2016-06-09 13:07 ` Peter Zijlstra
2016-06-12 22:25 ` Yuyang Du
2016-06-14 11:25 ` [tip:sched/core] sched/fair: Fix post_init_entity_util_avg() serialization tip-bot for Peter Zijlstra
2016-06-16 8:50 ` Divide-by-zero in post_init_entity_util_avg Peter Zijlstra
2016-06-16 12:25 ` Peter Zijlstra
2016-06-16 16:16 ` Peter Zijlstra
2016-06-17 8:16 ` Andrey Ryabinin
2016-06-17 8:23 ` Peter Zijlstra
2016-06-17 9:19 ` Peter Zijlstra [this message]
2016-06-17 2:01 ` [PATCH] sched/fair: Fix cfs_rq avg tracking underflow Yuyang Du
2016-06-20 13:24 ` [tip:sched/urgent] " tip-bot for Peter Zijlstra
2016-06-09 10:29 ` Divide-by-zero in post_init_entity_util_avg Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160617091948.GJ30927@twins.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=aryabinin@virtuozzo.com \
--cc=bsegall@google.com \
--cc=chris@chris-wilson.co.uk \
--cc=efault@gmx.de \
--cc=kernel@kyup.com \
--cc=linux-kernel@vger.kernel.org \
--cc=morten.rasmussen@arm.com \
--cc=pjt@google.com \
--cc=steve.muckle@linaro.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=yuyang.du@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox