From: Aaron Lu <ziqianlu@bytedance.com>
To: Florian Bezdeka <florian.bezdeka@siemens.com>
Cc: Valentin Schneider <vschneid@redhat.com>,
	Ben Segall <bsegall@google.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Xi Wang <xii@google.com>,
	linux-kernel@vger.kernel.org, Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Chuyi Zhou <zhouchuyi@bytedance.com>,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting throttle time
Date: Wed, 7 May 2025 17:09:23 +0800
Message-ID: <20250507090923.GA194948@bytedance>
In-Reply-To: <099db50ce28f8b4bde37b051485de62a8f452cc2.camel@siemens.com>

Hi Florian,

On Thu, Apr 17, 2025 at 04:06:16PM +0200, Florian Bezdeka wrote:
> Hi Aaron,
> 
> On Wed, 2025-04-09 at 20:07 +0800, Aaron Lu wrote:
> > @@ -5889,27 +5943,21 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> >  	cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
> >  		cfs_rq->throttled_clock_pelt;
> >  
> > -	if (cfs_rq->throttled_clock_self) {
> > -		u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
> > -
> > -		cfs_rq->throttled_clock_self = 0;
> > -
> > -		if (WARN_ON_ONCE((s64)delta < 0))
> > -			delta = 0;
> > -
> > -		cfs_rq->throttled_clock_self_time += delta;
> > -	}
> > +	if (cfs_rq->throttled_clock_self)
> > +		account_cfs_rq_throttle_self(cfs_rq);
> >  
> >  	/* Re-enqueue the tasks that have been throttled at this level. */
> >  	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
> >  		list_del_init(&p->throttle_node);
> > -		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
> > +		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP | ENQUEUE_THROTTLE);
> >  	}
> >  
> >  	/* Add cfs_rq with load or one or more already running entities to the list */
> >  	if (!cfs_rq_is_decayed(cfs_rq))
> >  		list_add_leaf_cfs_rq(cfs_rq);
> >  
> > +	WARN_ON_ONCE(cfs_rq->h_nr_throttled);
> > +
> >  	return 0;
> >  }
> >  
> 
> I got this warning while testing in our virtual environment:
> 
> Any idea?
>

I made a stupid mistake here: I assumed that when a cfs_rq gets
unthrottled, it has no tasks left in throttled state, so I added this
check in tg_unthrottle_up():
        WARN_ON_ONCE(cfs_rq->h_nr_throttled);

But h_nr_throttled is a hierarchical count: it covers throttled tasks on
this cfs_rq and on all of its descendant cfs_rqs. If any descendant
cfs_rq is still throttled, h_nr_throttled can be > 0 when this cfs_rq
gets unthrottled.
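
To make the accounting concrete, here is a toy model in plain C. It is
not kernel code: struct toy_cfs_rq and the toy_* helpers are made up for
illustration, and counting each throttled task once at every ancestor
level is a simplifying assumption about how the patch maintains
h_nr_throttled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a cfs_rq; only the fields the illustration needs. */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;   /* NULL for the top-level group */
	int throttled;               /* this level ran out of its own quota */
	unsigned int h_nr_throttled; /* throttled tasks in the whole subtree */
};

/* A task on @cfs_rq becomes throttled: count it at every level above. */
static void toy_task_throttled(struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		cfs_rq->h_nr_throttled++;
}

/* A task on @cfs_rq is re-enqueued: drop it from every level above. */
static void toy_task_unthrottled(struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		cfs_rq->h_nr_throttled--;
}

/*
 * Parent and child each have their own quota and both are throttled.
 * The parent is then unthrottled while the child stays throttled.
 * Returns the parent's h_nr_throttled at that point.
 */
unsigned int toy_nested_scenario(void)
{
	struct toy_cfs_rq parent = { .parent = NULL };
	struct toy_cfs_rq child = { .parent = &parent };

	parent.throttled = 1;
	child.throttled = 1;
	toy_task_throttled(&parent); /* one task directly in the parent */
	toy_task_throttled(&child);  /* one task in the child */

	/* Parent quota is refilled: only its own task is re-enqueued. */
	parent.throttled = 0;
	toy_task_unthrottled(&parent);

	/* The child's task is still throttled and still counted here. */
	return parent.h_nr_throttled;
}
```

In this model toy_nested_scenario() returns 1, not 0, which is the
situation the WARN_ON_ONCE() wrongly treated as impossible.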

I just made a setup to emulate this scenario and can reproduce the
warning. I guess your setup has cpu.max set at more than one level of a
cgroup hierarchy.

Only the WARN_ON_ONCE() itself is incorrect. I'll remove it in the next
version. Thanks for the report!

Best regards,
Aaron

> [   26.639641] ------------[ cut here ]------------
> [   26.639644] WARNING: CPU: 5 PID: 0 at kernel/sched/fair.c:5967 tg_unthrottle_up+0x1a6/0x3d0
> [   26.639653] Modules linked in: veth xt_nat nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc xt_recent rfkill ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt vsock_loopback vmw_vsock_virtio_transport_common ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog vmw_vsock_vmci_transport xt_comment vsock nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables intel_rapl_msr intel_rapl_common nfnetlink binfmt_misc intel_uncore_frequency_common isst_if_mbox_msr isst_if_common skx_edac_common nfit libnvdimm ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel snd_pcm crypto_simd cryptd snd_timer rapl snd soundcore vmw_balloon vmwgfx pcspkr drm_ttm_helper ttm drm_client_lib button ac drm_kms_helper sg vmw_vmci evdev joydev serio_raw drm loop efi_pstore configfs efivarfs ip_tables x_tables autofs4 overlay nls_ascii nls_cp437 vfat fat ext4 crc16 mbcache jbd2 squashfs dm_verity dm_bufio reed_solomon dm_mod
> [   26.639715]  sd_mod ata_generic mptspi mptscsih ata_piix mptbase libata scsi_transport_spi psmouse scsi_mod vmxnet3 i2c_piix4 i2c_smbus scsi_common
> [   26.639726] CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Not tainted 6.14.2-CFSfixes #1
> [   26.639729] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.24224532.B64.2408191458 08/19/2024
> [   26.639731] RIP: 0010:tg_unthrottle_up+0x1a6/0x3d0
> [   26.639735] Code: 00 00 48 39 ca 74 14 48 8b 52 10 49 8b 8e 58 01 00 00 48 39 8a 28 01 00 00 74 24 41 8b 86 68 01 00 00 85 c0 0f 84 8d fe ff ff <0f> 0b e9 86 fe ff ff 49 8b 9e 38 01 00 00 41 8b 86 40 01 00 00 48
> [   26.639737] RSP: 0000:ffffa5df8029cec8 EFLAGS: 00010002
> [   26.639739] RAX: 0000000000000001 RBX: ffff981c6fcb6a80 RCX: ffff981943752e40
> [   26.639741] RDX: 0000000000000005 RSI: ffff981c6fcb6a80 RDI: ffff981943752d00
> [   26.639742] RBP: ffff9819607dc708 R08: ffff981c6fcb6a80 R09: 0000000000000000
> [   26.639744] R10: 0000000000000001 R11: ffff981969936a10 R12: ffff9819607dc708
> [   26.639745] R13: ffff9819607dc9d8 R14: ffff9819607dc800 R15: ffffffffad913fb0
> [   26.639747] FS:  0000000000000000(0000) GS:ffff981c6fc80000(0000) knlGS:0000000000000000
> [   26.639749] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   26.639750] CR2: 00007ff1292dc44c CR3: 000000015350e006 CR4: 00000000007706f0
> [   26.639779] PKRU: 55555554
> [   26.639781] Call Trace:
> [   26.639783]  <IRQ>
> [   26.639787]  ? __pfx_tg_unthrottle_up+0x10/0x10
> [   26.639790]  ? __pfx_tg_nop+0x10/0x10
> [   26.639793]  walk_tg_tree_from+0x58/0xb0
> [   26.639797]  unthrottle_cfs_rq+0xf0/0x360
> [   26.639800]  ? sched_clock_cpu+0xf/0x190
> [   26.639808]  __cfsb_csd_unthrottle+0x11c/0x170
> [   26.639812]  ? __pfx___cfsb_csd_unthrottle+0x10/0x10
> [   26.639816]  __flush_smp_call_function_queue+0x103/0x410
> [   26.639822]  __sysvec_call_function_single+0x1c/0xb0
> [   26.639826]  sysvec_call_function_single+0x6c/0x90
> [   26.639832]  </IRQ>
> [   26.639833]  <TASK>
> [   26.639834]  asm_sysvec_call_function_single+0x1a/0x20
> [   26.639840] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [   26.639844] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 45 c1 13 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> [   26.639846] RSP: 0000:ffffa5df80117ed8 EFLAGS: 00000242
> [   26.639848] RAX: 0000000000000005 RBX: ffff981940804000 RCX: ffff9819a9df7000
> [   26.639849] RDX: 0000000000000005 RSI: 0000000000000005 RDI: 000000000005c514
> [   26.639851] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000001
> [   26.639852] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [   26.639853] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [   26.639858]  default_idle+0x9/0x20
> [   26.639861]  default_idle_call+0x30/0x100
> [   26.639863]  do_idle+0x1fd/0x240
> [   26.639869]  cpu_startup_entry+0x29/0x30
> [   26.639872]  start_secondary+0x11e/0x140
> [   26.639875]  common_startup_64+0x13e/0x141
> [   26.639881]  </TASK>
> [   26.639882] ---[ end trace 0000000000000000 ]---

