From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>, Phil Auld <pauld@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Shizhao Chen <shichen@redhat.com>,
	linux-kernel@vger.kernel.org, Omar Sandoval <osandov@fb.com>,
	Xuewen Yan <xuewen.yan@unisoc.com>
Subject: Re: sched: update_entity_lag does not handle corner case with task in PI chain
Date: Tue, 21 Oct 2025 21:35:52 -0300
Message-ID: <aPgm6KvDx5Os2oJS@uudg.org>
In-Reply-To: <c10f6fda-aa8c-4d8e-a315-3c084af08862@amd.com>

On Tue, Oct 21, 2025 at 12:38:17PM +0530, K Prateek Nayak wrote:
> Hello Peter, Luis,
> 
> On 10/19/2025 1:27 AM, Peter Zijlstra wrote:
> >> [ 1805.450470] ------------[ cut here ]------------
> >> [ 1805.450474] WARNING: CPU: 2 PID: 19 at kernel/sched/fair.c:697 update_entity_lag+0x5b/0x70
> >> [ 1805.463366] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency
> >>  intel_uncore_frequency_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal
> >>  intel_powerclamp coretemp kvm_intel kvm platform_profile dell_wmi sparse_keymap rfkill irqbypass
> >>  iTCO_wdt video mgag200 rapl iTCO_vendor_support dell_smbios ipmi_ssif intel_cstate vfat dcdbas
> >>  wmi_bmof intel_uncore dell_wmi_descriptor pcspkr fat i2c_algo_bit lpc_ich mei_me i2c_i801
> >>  i2c_smbus mei intel_pch_thermal ipmi_si acpi_power_meter acpi_ipmi ipmi_devintf ipmi_msghandler
> >>  sg fuse loop xfs sd_mod i40e ghash_clmulni_intel libie libie_adminq ahci libahci tg3 libata wmi
> >>  sunrpc dm_mirror dm_region_hash dm_log dm_mod nfnetlink
> >> [ 1805.525160] CPU: 2 UID: 0 PID: 19 Comm: rcub/0 Kdump: loaded Not tainted 6.17.1-rt5 #1 PREEMPT_RT 
> >> [ 1805.534113] Hardware name: Dell Inc. PowerEdge R440/0WKGTH, BIOS 2.21.1 03/07/2024
> >> [ 1805.541678] RIP: 0010:update_entity_lag+0x5b/0x70
> >> [ 1805.546385] Code: 42 f8 48 81 3b 00 00 10 00 75 23 48 89 fa 48 f7 da 48 39 ea 48 0f 4c d5 48 39 fd 48 0f 4d d7 48 89 53 78 5b 5d c3 cc cc cc cc <0f> 0b eb b1 48 89 de e8 b9
> >>  8c ff ff 48 89 c7 eb d0 0f 1f 40 00 90
> >> [ 1805.565130] RSP: 0000:ffffcc9e802f7b90 EFLAGS: 00010046
> >> [ 1805.570358] RAX: 0000000000000000 RBX: ffff8959080c0080 RCX: 0000000000000000
> >> [ 1805.577488] RDX: 0000000000000000 RSI: ffff8959080c0080 RDI: ffff895592cc1c00
> >> [ 1805.584622] RBP: ffff895592cc1c00 R08: 0000000000008800 R09: 0000000000000000
> >> [ 1805.591756] R10: 0000000000000001 R11: 0000000000200b20 R12: 000000000000000e
> >> [ 1805.598886] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >> [ 1805.606020] FS:  0000000000000000(0000) GS:ffff895947da2000(0000) knlGS:0000000000000000
> >> [ 1805.614107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 1805.619853] CR2: 00007f655816ed40 CR3: 00000004ab854006 CR4: 00000000007726f0
> >> [ 1805.626985] PKRU: 55555554
> >> [ 1805.629696] Call Trace:
> >> [ 1805.632150]  <TASK>
> >> [ 1805.634258]  dequeue_entity+0x90/0x4f0
> >> [ 1805.638012]  dequeue_entities+0xc9/0x6b0
> >> [ 1805.641935]  dequeue_task_fair+0x8a/0x190
> >> [ 1805.645949]  ? sched_clock+0x10/0x30
> >> [ 1805.649527]  rt_mutex_setprio+0x318/0x4b0
> > 
> > So we have:
> > 
> > rt_mutex_setprio()
> > 
> >   rq = __task_rq_lock(p, ..); // this asserts p->pi_lock is held
> > 
> >   ...
> > 
> >   queued = task_on_rq_queued(p); // basically reads p->on_rq
> >   running = task_current_donor()
> >   if (queued)
> >     dequeue_task(rq, p, queue_flags);
> >       dequeue_task_fair()
> >         dequeue_entities()
> > 	  dequeue_entity()
> > 	    update_entity_lag()
> > 	      WARN_ON_ONCE(!se->on_rq);
> > 
> > So the only way to get here is if: p->on_rq is in fact !0 *and*
> > se->on_rq is zero.
> > 
> > And I'm not at all sure how one would get into such a state.
> 
> This looks like something that can happen when a delayed task is
> dequeued from a throttled hierarchy. Matt had reported similar
> problem with wait_task_inactive() in
> https://lore.kernel.org/all/20250925133310.1843863-1-matt@readmodwrite.com/
> 
> rt_mutex_setprio()
>   ...
>   if (prev_class != next_class && p->se.sched_delayed)
>     dequeue_task(rq, p, DEQUEUE_DELAYED)
>       dequeue_entities(se = &p->se)
>         dequeue_entity(se)
>           se->on_rq = 0; /* se->on_rq turns 0 here */
>         ...
>         if (cfs_rq_throttled(cfs_rq))
>           return 0; /* Early return before __block_task() */
>   ...
> 
>   /* __block_task() not called; task_on_rq_queued() returns true. */
>   queued = task_on_rq_queued(p);
>   ...
> 
>   if (queued)
>     dequeue_task(rq, p, queue_flag)
>       dequeue_entities(se = &p->se)
>         dequeue_entity(se)
>           update_entity_lag(se)
>             WARN_ON_ONCE(!se->on_rq)
> 
> 
> v6.18 kernels get rid of this issue as part of the per-task throttle
> feature, and stable should soon pick up the fix posted on that thread.

Thank you! You were right, your patch in that thread seems to have fixed
the issue I reported.

I read the thread you mentioned, built a test kernel with the patch and have
been running tests for more than 6h now without a single backtrace. As reported
earlier, I was able to hit the bug within 15 minutes without the patch.
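
For readers following the thread, the sequence Prateek outlined can be
modelled in userspace. The sketch below is only a toy: every toy_* name is
hypothetical, the structs keep just the two on_rq flags and the delayed and
throttled bits, and a counter stands in for WARN_ON_ONCE(). It is not the
kernel code and it does not model the fix, only the state mismatch
(p->on_rq still set while se->on_rq is already 0).

```c
#include <stdbool.h>

/* Toy stand-ins: field names echo the kernel's, nothing else does. */
struct toy_se {
    int  on_rq;          /* models se->on_rq */
    bool sched_delayed;  /* models se->sched_delayed */
};

struct toy_task {
    int  on_rq;          /* models p->on_rq, read by task_on_rq_queued() */
    bool throttled;      /* models cfs_rq_throttled(cfs_rq) on p's hierarchy */
    struct toy_se se;
};

/* Counts how often the modelled WARN_ON_ONCE(!se->on_rq) would trigger. */
static int toy_warnings;

static void toy_update_entity_lag(const struct toy_se *se)
{
    if (!se->on_rq)
        toy_warnings++;
}

/* Models dequeue_task() -> dequeue_entities() -> dequeue_entity(). */
static void toy_dequeue_task(struct toy_task *p)
{
    toy_update_entity_lag(&p->se);
    p->se.on_rq = 0;              /* se->on_rq turns 0 here */
    p->se.sched_delayed = false;

    if (p->throttled)
        return;                   /* early return before the block step */

    p->on_rq = 0;                 /* stands in for __block_task() */
}

/* Models the two dequeue sites in rt_mutex_setprio(); returns warning count. */
static int toy_rt_mutex_setprio(struct toy_task *p)
{
    toy_warnings = 0;

    /* The prev_class != next_class check is assumed true in this model. */
    if (p->se.sched_delayed)
        toy_dequeue_task(p);      /* the DEQUEUE_DELAYED call */

    if (p->on_rq)                 /* queued = task_on_rq_queued(p) */
        toy_dequeue_task(p);      /* second dequeue of the same entity */

    return toy_warnings;
}
```

Calling toy_rt_mutex_setprio() on a delayed task with throttled set returns
1 (the second dequeue sees se.on_rq == 0), while the same task with
throttled clear returns 0, because the first dequeue also clears p->on_rq.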

Best regards,
Luis

> 
> -- 
> Thanks and Regards,
> Prateek
> 
---end quoted text---


