From: Aaron Lu <ziqianlu@bytedance.com>
To: Florian Bezdeka <florian.bezdeka@siemens.com>
Cc: Valentin Schneider <vschneid@redhat.com>,
Ben Segall <bsegall@google.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Peter Zijlstra <peterz@infradead.org>,
Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Xi Wang <xii@google.com>,
linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
Chengming Zhou <chengming.zhou@linux.dev>,
Chuyi Zhou <zhouchuyi@bytedance.com>,
Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting throttle time
Date: Wed, 7 May 2025 17:09:23 +0800
Message-ID: <20250507090923.GA194948@bytedance>
In-Reply-To: <099db50ce28f8b4bde37b051485de62a8f452cc2.camel@siemens.com>

Hi Florian,

On Thu, Apr 17, 2025 at 04:06:16PM +0200, Florian Bezdeka wrote:
> Hi Aaron,
>
> On Wed, 2025-04-09 at 20:07 +0800, Aaron Lu wrote:
> > @@ -5889,27 +5943,21 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> > cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
> > cfs_rq->throttled_clock_pelt;
> >
> > - if (cfs_rq->throttled_clock_self) {
> > - u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
> > -
> > - cfs_rq->throttled_clock_self = 0;
> > -
> > - if (WARN_ON_ONCE((s64)delta < 0))
> > - delta = 0;
> > -
> > - cfs_rq->throttled_clock_self_time += delta;
> > - }
> > + if (cfs_rq->throttled_clock_self)
> > + account_cfs_rq_throttle_self(cfs_rq);
> >
> > /* Re-enqueue the tasks that have been throttled at this level. */
> > list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
> > list_del_init(&p->throttle_node);
> > - enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
> > + enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP | ENQUEUE_THROTTLE);
> > }
> >
> > /* Add cfs_rq with load or one or more already running entities to the list */
> > if (!cfs_rq_is_decayed(cfs_rq))
> > list_add_leaf_cfs_rq(cfs_rq);
> >
> > + WARN_ON_ONCE(cfs_rq->h_nr_throttled);
> > +
> > return 0;
> > }
> >
>
> I got this warning while testing in our virtual environment:
>
> Any idea?
>

I made a stupid mistake here. I assumed that once a cfs_rq gets
unthrottled, no tasks below it can still be in throttled state, so I
added this check to tg_unthrottle_up():

	WARN_ON_ONCE(cfs_rq->h_nr_throttled);

But h_nr_throttled counts throttled tasks hierarchically, which means
that if this cfs_rq has descendant cfs_rqs that are still throttled,
its h_nr_throttled can be > 0 when it gets unthrottled.

I set up a scenario emulating this and can reproduce the warning. I
guess your setup has cpu.max configured at more than one level of a
cgroup hierarchy.

Only the WARN_ON_ONCE() itself is incorrect; I'll remove it in the next
version. Thanks for the report!

Best regards,
Aaron

> [ 26.639641] ------------[ cut here ]------------
> [ 26.639644] WARNING: CPU: 5 PID: 0 at kernel/sched/fair.c:5967 tg_unthrottle_up+0x1a6/0x3d0
> [ 26.639653] Modules linked in: veth xt_nat nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc xt_recent rfkill ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt vsock_loopback vmw_vsock_virtio_transport_common ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog vmw_vsock_vmci_transport xt_comment vsock nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables intel_rapl_msr intel_rapl_common nfnetlink binfmt_misc intel_uncore_frequency_common isst_if_mbox_msr isst_if_common skx_edac_common nfit libnvdimm ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel snd_pcm crypto_simd cryptd snd_timer rapl snd soundcore vmw_balloon vmwgfx pcspkr drm_ttm_helper ttm drm_client_lib button ac drm_kms_helper sg vmw_vmci evdev joydev serio_raw drm loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 overlay nls_ascii nls_cp437 vfat fat ext4 crc16 mbcache jbd2 squashfs dm_verity dm_bufio reed_solomon dm_mod
> [ 26.639715] sd_mod ata_generic mptspi mptscsih ata_piix mptbase libata scsi_transport_spi psmouse scsi_mod vmxnet3 i2c_piix4 i2c_smbus scsi_common
> [ 26.639726] CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Not tainted 6.14.2-CFSfixes #1
> [ 26.639729] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.24224532.B64.2408191458 08/19/2024
> [ 26.639731] RIP: 0010:tg_unthrottle_up+0x1a6/0x3d0
> [ 26.639735] Code: 00 00 48 39 ca 74 14 48 8b 52 10 49 8b 8e 58 01 00 00 48 39 8a 28 01 00 00 74 24 41 8b 86 68 01 00 00 85 c0 0f 84 8d fe ff ff <0f> 0b e9 86 fe ff ff 49 8b 9e 38 01 00 00 41 8b 86 40 01 00 00 48
> [ 26.639737] RSP: 0000:ffffa5df8029cec8 EFLAGS: 00010002
> [ 26.639739] RAX: 0000000000000001 RBX: ffff981c6fcb6a80 RCX: ffff981943752e40
> [ 26.639741] RDX: 0000000000000005 RSI: ffff981c6fcb6a80 RDI: ffff981943752d00
> [ 26.639742] RBP: ffff9819607dc708 R08: ffff981c6fcb6a80 R09: 0000000000000000
> [ 26.639744] R10: 0000000000000001 R11: ffff981969936a10 R12: ffff9819607dc708
> [ 26.639745] R13: ffff9819607dc9d8 R14: ffff9819607dc800 R15: ffffffffad913fb0
> [ 26.639747] FS: 0000000000000000(0000) GS:ffff981c6fc80000(0000) knlGS:0000000000000000
> [ 26.639749] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 26.639750] CR2: 00007ff1292dc44c CR3: 000000015350e006 CR4: 00000000007706f0
> [ 26.639779] PKRU: 55555554
> [ 26.639781] Call Trace:
> [ 26.639783] <IRQ>
> [ 26.639787] ? __pfx_tg_unthrottle_up+0x10/0x10
> [ 26.639790] ? __pfx_tg_nop+0x10/0x10
> [ 26.639793] walk_tg_tree_from+0x58/0xb0
> [ 26.639797] unthrottle_cfs_rq+0xf0/0x360
> [ 26.639800] ? sched_clock_cpu+0xf/0x190
> [ 26.639808] __cfsb_csd_unthrottle+0x11c/0x170
> [ 26.639812] ? __pfx___cfsb_csd_unthrottle+0x10/0x10
> [ 26.639816] __flush_smp_call_function_queue+0x103/0x410
> [ 26.639822] __sysvec_call_function_single+0x1c/0xb0
> [ 26.639826] sysvec_call_function_single+0x6c/0x90
> [ 26.639832] </IRQ>
> [ 26.639833] <TASK>
> [ 26.639834] asm_sysvec_call_function_single+0x1a/0x20
> [ 26.639840] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [ 26.639844] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 45 c1 13 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> [ 26.639846] RSP: 0000:ffffa5df80117ed8 EFLAGS: 00000242
> [ 26.639848] RAX: 0000000000000005 RBX: ffff981940804000 RCX: ffff9819a9df7000
> [ 26.639849] RDX: 0000000000000005 RSI: 0000000000000005 RDI: 000000000005c514
> [ 26.639851] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000001
> [ 26.639852] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [ 26.639853] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 26.639858] default_idle+0x9/0x20
> [ 26.639861] default_idle_call+0x30/0x100
> [ 26.639863] do_idle+0x1fd/0x240
> [ 26.639869] cpu_startup_entry+0x29/0x30
> [ 26.639872] start_secondary+0x11e/0x140
> [ 26.639875] common_startup_64+0x13e/0x141
> [ 26.639881] </TASK>
> [ 26.639882] ---[ end trace 0000000000000000 ]---