From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Subject: Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily
Date: Tue, 21 Jun 2016 16:41:03 +0300
Message-ID: <576943EF.8080902@yandex-team.ru>
In-Reply-To: <146608182119.21870.8439834428248129633.stgit@buzz>
On 16.06.2016 15:57, Konstantin Khlebnikov wrote:
> A cgroup created inside a throttled group must inherit the current throttle_count.
> A broken throttle_count allows throttled entries to be nominated as the next buddy,
> which later leads to a NULL pointer dereference in pick_next_task_fair().
Here is an example of the kernel oops, to summon maintainers:
<1>[3627487.878297] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
<1>[3627487.879028] IP: [<ffffffff8109ab6c>] set_next_entity+0x1c/0x80
<4>[3627487.879837] PGD 0
<4>[3627487.880567] Oops: 0000 [#1] SMP
<4>[3627487.881292] Modules linked in: macvlan overlay ipmi_si ipmi_devintf ipmi_msghandler ip6t_REJECT nf_reject_ipv6 xt_tcpudp
ip6table_filter ip6_tables x_tables quota_v2 quota_tree cls_cgroup sch_htb bridge netconsole configfs 8021q mrp garp stp llc
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ast ttm drm_kms_helper drm ghash_clmulni_intel aesni_intel
ablk_helper sb_edac cryptd lrw lpc_ich gf128mul edac_core sysimgblt glue_helper aes_x86_64 microcode sysfillrect syscopyarea acpi_pad
tcp_htcp mlx4_en mlx4_core vxlan udp_tunnel ip6_udp_tunnel igb i2c_algo_bit isci ixgbe libsas i2c_core ahci dca ptp libahci
scsi_transport_sas pps_core mdio raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0
multipath linear [last unloaded: ipmi_msghandler]<4>[3627487.886379]
<4>[3627487.887892] CPU: 21 PID: 0 Comm: swapper/21 Not tainted 3.18.19-24 #1
<4>[3627487.889429] Hardware name: AIC 1D-HV24-02/MB-DPSB04-04, BIOS IVYBV058 07/01/2015
<4>[3627487.891008] task: ffff881fd336f540 ti: ffff881fd33a4000 task.ti: ffff881fd33a4000
<4>[3627487.892569] RIP: 0010:[<ffffffff8109ab6c>] [<ffffffff8109ab6c>] set_next_entity+0x1c/0x80
<4>[3627487.894200] RSP: 0018:ffff881fd33a7d68 EFLAGS: 00010082
<4>[3627487.895750] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff881fffdb2b70
<4>[3627487.897276] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd1193600
<4>[3627487.898793] RBP: ffff881fd33a7d88 R08: 0000000000000f6d R09: 0000000000000000
<4>[3627487.900358] R10: 0000000000000078 R11: 0000000000000000 R12: 0000000000000000
<4>[3627487.901898] R13: ffffffff8180f3c0 R14: ffff881fd33a4000 R15: ffff881fd1193600
<4>[3627487.903381] FS: 0000000000000000(0000) GS:ffff881fffda0000(0000) knlGS:0000000000000000
<4>[3627487.904920] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[3627487.906382] CR2: 0000000000000038 CR3: 0000000001c14000 CR4: 00000000001407e0
<4>[3627487.907904] Stack:
<4>[3627487.909365] ffff881fffdb2b00 ffff881fffdb2b00 0000000000000000 ffffffff8180f3c0
<4>[3627487.910837] ffff881fd33a7e18 ffffffff810a1b18 00000001360794f4 00000001760794f3
<4>[3627487.912322] ffff881fd2888000 0000000000000000 0000000000012b00 ffff881fd336f540
<4>[3627487.913770] Call Trace:
<4>[3627487.915188] [<ffffffff810a1b18>] pick_next_task_fair+0x88/0x5f0
<4>[3627487.916573] [<ffffffff816d258f>] __schedule+0x6ef/0x820
<4>[3627487.917936] [<ffffffff816d2799>] schedule+0x29/0x70
<4>[3627487.919277] [<ffffffff816d2a76>] schedule_preempt_disabled+0x16/0x20
<4>[3627487.920632] [<ffffffff810a8ddb>] cpu_startup_entry+0x14b/0x3d0
<4>[3627487.921999] [<ffffffff810ce272>] ? clockevents_register_device+0xe2/0x140
<4>[3627487.923323] [<ffffffff810463fc>] start_secondary+0x14c/0x160
<4>[3627487.924660] Code: 89 ff 48 89 e5 f0 48 0f b3 3e 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89
f3 4c 89 6d f8 <44> 8b 4e 38 49 89 fc 45 85 c9 74 17 4c 8d 6e 10 4c 39 6f 30 74
<1>[3627487.927435] RIP [<ffffffff8109ab6c>] set_next_entity+0x1c/0x80
<4>[3627487.928741] RSP <ffff881fd33a7d68>
<4>[3627487.930010] CR2: 0000000000000038
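A note for readers decoding the oops above: the fault address in CR2 (0x38), together with RSI being zero, is consistent with set_next_entity() receiving a NULL sched_entity and reading a field 0x38 bytes into it. The sketch below is a userspace model, not kernel code. The model_* names are hypothetical, and the field order is an assumption based on the v3.18 layout of struct sched_entity (load, run_node, group_node, on_rq). Under that assumption, the offset of on_rq on an LP64 target matches the faulting address.

```c
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins for the leading fields of
 * struct sched_entity. The field order is an ASSUMPTION taken from
 * the v3.18 layout; it is not part of the quoted patch.
 */
struct model_load_weight {
	unsigned long weight;
	unsigned int inv_weight;
};

struct model_rb_node {
	unsigned long parent_color;
	struct model_rb_node *right;
	struct model_rb_node *left;
};

struct model_list_head {
	struct model_list_head *next;
	struct model_list_head *prev;
};

struct model_sched_entity {
	struct model_load_weight load;
	struct model_rb_node run_node;
	struct model_list_head group_node;
	unsigned int on_rq;
};

/* Byte offset of on_rq inside the modeled entity. */
static size_t model_on_rq_offset(void)
{
	return offsetof(struct model_sched_entity, on_rq);
}
```

If the assumed layout holds, model_on_rq_offset() evaluates to 0x38 on x86-64, which would make the faulting access the se->on_rq check at the top of set_next_entity().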
>
> This patch initializes cfs_rq->throttle_count at first enqueue: laziness
> avoids locking every rq at group creation. The lazy approach also makes it
> possible to skip the full sub-tree scan when throttling a hierarchy (not in this patch).
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Stable <stable@vger.kernel.org> # v3.2+
> ---
>  kernel/sched/fair.c  | 19 +++++++++++++++++++
>  kernel/sched/sched.h |  2 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..fe809fe169d2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4185,6 +4185,25 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
>  	if (!cfs_bandwidth_used())
>  		return;
>
> +	/* synchronize hierarchical throttle counter */
> +	if (unlikely(!cfs_rq->throttle_uptodate)) {
> +		struct rq *rq = rq_of(cfs_rq);
> +		struct cfs_rq *pcfs_rq;
> +		struct task_group *tg;
> +
> +		cfs_rq->throttle_uptodate = 1;
> +		/* get closest uptodate node because leaves goes first */
> +		for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
> +			pcfs_rq = tg->cfs_rq[cpu_of(rq)];
> +			if (pcfs_rq->throttle_uptodate)
> +				break;
> +		}
> +		if (tg) {
> +			cfs_rq->throttle_count = pcfs_rq->throttle_count;
> +			cfs_rq->throttled_clock_task = rq_clock_task(rq);
> +		}
> +	}
> +
>  	/* an active group must be handled by the update_curr()->put() path */
>  	if (!cfs_rq->runtime_enabled || cfs_rq->curr)
>  		return;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 72f1f3087b04..7cbeb92a1cb9 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -437,7 +437,7 @@ struct cfs_rq {
>
>  	u64 throttled_clock, throttled_clock_task;
>  	u64 throttled_clock_task_time;
> -	int throttled, throttle_count;
> +	int throttled, throttle_count, throttle_uptodate;
>  	struct list_head throttled_list;
>  #endif /* CONFIG_CFS_BANDWIDTH */
>  #endif /* CONFIG_FAIR_GROUP_SCHED */
>
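For illustration, the lazy synchronization in the quoted hunk can be modeled in userspace roughly as follows. This is a simplified sketch, not the kernel code: struct model_cfs_rq and model_sync_throttle() are hypothetical names, a single parent pointer stands in for tg->parent->cfs_rq[cpu], and the throttled_clock_task update is left out.

```c
#include <stddef.h>

/*
 * Hypothetical single-CPU model of a group's cfs_rq. Only the two
 * fields the patch touches are kept; everything else is omitted.
 */
struct model_cfs_rq {
	struct model_cfs_rq *parent;	/* stands in for tg->parent->cfs_rq[cpu] */
	int throttle_count;
	int throttle_uptodate;
};

/*
 * Mirrors the shape of the added hunk: on first enqueue, mark the
 * queue as synchronized and inherit throttle_count from the closest
 * ancestor that is already up to date. Leaves are enqueued before
 * their parents, so intermediate ancestors may still be stale and
 * are skipped.
 */
static void model_sync_throttle(struct model_cfs_rq *cfs_rq)
{
	struct model_cfs_rq *p;

	if (cfs_rq->throttle_uptodate)
		return;

	cfs_rq->throttle_uptodate = 1;
	for (p = cfs_rq->parent; p; p = p->parent)
		if (p->throttle_uptodate)
			break;

	/* No up-to-date ancestor found: keep the initial counter. */
	if (p)
		cfs_rq->throttle_count = p->throttle_count;
}
```

In this model, a new leaf placed under a stale intermediate group whose grandparent is throttled picks up the grandparent's count on its first call. The intermediate group stays untouched until it is enqueued itself, so nothing has to be locked or scanned at group creation.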
--
Konstantin
Thread overview: 9+ messages
2016-06-16 12:57 [PATCH] sched/fair: initialize throttle_count for new task-groups lazily Konstantin Khlebnikov
2016-06-16 17:03 ` bsegall
2016-06-16 17:23 ` Konstantin Khlebnikov
2016-06-16 17:33 ` bsegall
2016-06-21 13:41 ` Konstantin Khlebnikov [this message]
2016-06-21 21:10 ` Peter Zijlstra
2016-06-22 8:10 ` Konstantin Khlebnikov
2016-06-22 8:23 ` Peter Zijlstra
2016-06-24 8:59 ` [tip:sched/urgent] sched/fair: Initialize " tip-bot for Konstantin Khlebnikov