public inbox for linux-kernel@vger.kernel.org
From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	luis.machado@arm.com
Subject: Re: sched/fair: Kernel panics in pick_next_entity
Date: Tue, 1 Oct 2024 00:35:16 +0530
Message-ID: <Zvr2bLBEYyu1gtNz@linux.ibm.com>
In-Reply-To: <20240930144157.GH5594@noisy.programming.kicks-ass.net>

On Mon, Sep 30, 2024 at 04:41:57PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 26, 2024 at 06:12:19PM +0530, Vishal Chourasia wrote:
> > I've noticed a kernel panic consistently occurring on the mainline v6.11
> > kernel (see attached dmesg log below). 
> > 
> > The panic occurs almost every time I build the Linux kernel from source.
> > 
> > Steps to Reproduce:
> > 
> > make clean
> > ./scripts/config -e LOCALVERSION_AUTO
> > ./scripts/config --set-str LOCALVERSION -master-with-print
> > make localmodconfig
> > make -j8 -s vmlinux modules
> > 
> > From my investigation, it seems that the function pick_eevdf() can return NULL.
> > Commit f12e1488 ("sched/fair: Prepare pick_next_task() for delayed dequeue") 
> > introduces an access on the return value of pick_eevdf(). If 'se' was NULL, 
> > it can lead to a null pointer dereference. 
> 
> Even before that commit we relied on that thing not being NULL, notably
> f12e1488^1 has:
> 
>                 se = pick_next_entity(cfs_rq);
>                 cfs_rq = group_cfs_rq(se);
> 
> Which will similarly explode when pick_eevdf() goes wobbly.
> 
> > To determine why pick_eevdf() would return NULL, I added a few printk statements
> > Based on one of the printk logs in the shared dmesg log, it appears that if
> > pick_eevdf() is called for a 'cfs_rq' whose 'cfs_rq->curr' is NULL and there
> > are no eligible entities on that 'cfs_rq', it will return NULL. 
> 
> Right, that is not a valid state. Which seems to suggest something went
> sideways with the eligibility thing -- as Luis suggested.
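
To make the failure mode above concrete, here is a minimal userspace
sketch, not the kernel code. It assumes a toy runqueue in which
avg_vruntime is a plain cached field that can go stale, whereas the real
pick_eevdf() walks an augmented rbtree and derives eligibility from
weighted sums. All toy_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a sched_entity. */
struct toy_se {
	int64_t vruntime;
	int64_t deadline;
};

/* Hypothetical stand-in for a cfs_rq; avg_vruntime is only a cached value. */
struct toy_cfs_rq {
	struct toy_se *curr;     /* running entity, may be NULL */
	struct toy_se **queued;  /* entities waiting on this runqueue */
	size_t nr_queued;
	int64_t avg_vruntime;    /* assumption: may be stale in this toy */
};

/* An entity is eligible when it has not run ahead of the average. */
static int toy_eligible(const struct toy_cfs_rq *rq, const struct toy_se *se)
{
	return se->vruntime <= rq->avg_vruntime;
}

/*
 * Pick the eligible entity with the earliest deadline, considering curr
 * as well. Returns NULL when curr is NULL (or ineligible) and no queued
 * entity is eligible, which is the state described above.
 */
static struct toy_se *toy_pick(const struct toy_cfs_rq *rq)
{
	struct toy_se *best = NULL;
	size_t i;

	if (rq->curr && toy_eligible(rq, rq->curr))
		best = rq->curr;

	for (i = 0; i < rq->nr_queued; i++) {
		struct toy_se *se = rq->queued[i];

		if (!toy_eligible(rq, se))
			continue;
		if (!best || se->deadline < best->deadline)
			best = se;
	}

	return best;
}
```

With a consistent average, the entity with the smallest vruntime is
always eligible, so toy_pick() only returns NULL once the cached average
has drifted below every vruntime. A caller that dereferences the result
unconditionally then faults, much like the group_cfs_rq(se) call quoted
earlier.
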
> 
> > I have not been able to think of a quick reproducer to trigger a panic
> > for this case. Hoping if someone can guide me on this.
> > 
> > Note: The following dmesg log also contains a warning reported too. Panic
> > happens later.
> > 
> > ------------[ cut here ]------------
> > !se->on_rq
> > WARNING: CPU: 1 PID: 92333 at kernel/sched/fair.c:705 update_entity_lag+0xcc/0xf0
> > Modules linked in: binfmt_misc bonding tls rfkill ibmveth pseries_rng vmx_crypto nd_pmem nd_btt dax_pmem loop nfnetlink xfs sd_mod papr_scm libnvdimm ibmvscsi scsi_transport_srp pseries_wdt dm_mirror dm_region_hash dm_log dm_mod fuse
> > CPU: 1 UID: 0 PID: 92333 Comm: genksyms Tainted: G        W          6.11.0-master-with-print-10547-g684a64bf32b6-dirty #64
> > Tainted: [W]=WARN
> > Hardware name: IBM,9080-HEX POWER10 (architected) hv:phyp pSeries
> > NIP:  c0000000001cdfcc LR: c0000000001cdfc8 CTR: 0000000000000000
> > REGS: c00000005c62ee50 TRAP: 0700   Tainted: G        W           (6.11.0-master-with-print-10547-g684a64bf32b6-dirty)
> > MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER: 00000005
> > CFAR: c000000000156a10 IRQMASK: 1
> > GPR00: c0000000001cdfc8 c00000005c62f0f0 c000000001b57400 000000000000000a
> > GPR04: 00000000ffff7fff c00000005c62eee0 c00000005c62eed8 00000007fb050000
> > GPR08: 0000000000000027 0000000000000000 0000000000000000 c000000002758de0
> > GPR12: c000000002a18d88 c0000007fffef480 0000000000000000 0000000000000000
> > GPR16: c000000002c56d40 0000000000000000 c00000005c62f5b4 0000000000000000
> > GPR20: fffffffffffffdef 0000000000000000 0000000000000002 c000000003cd7300
> > GPR24: 0000000000000000 0000000000000008 c0000007fd1d3f80 0000000000000000
> > GPR28: 0000000000000001 0000000000000009 c0000007fd1d4080 c0000000656a0000
> > NIP [c0000000001cdfcc] update_entity_lag+0xcc/0xf0
> > LR [c0000000001cdfc8] update_entity_lag+0xc8/0xf0
> > Call Trace:
> > [c00000005c62f0f0] [c0000000001cdfc8] update_entity_lag+0xc8/0xf0 (unreliable)
> > [c00000005c62f160] [c0000000001cea80] dequeue_entity+0xb0/0x6d0
> > [c00000005c62f1f0] [c0000000001cf8b0] dequeue_entities+0x150/0x600
> > [c00000005c62f2c0] [c0000000001d02a8] dequeue_task_fair+0x158/0x2e0
> > [c00000005c62f300] [c0000000001b5ea4] dequeue_task+0x64/0x200
> > [c00000005c62f380] [c0000000001cc950] detach_tasks+0x140/0x420
> > [c00000005c62f3f0] [c0000000001d6044] sched_balance_rq+0x214/0x7c0
> > [c00000005c62f550] [c0000000001d6830] sched_balance_newidle+0x240/0x630
> > [c00000005c62f640] [c0000000001d6d0c] pick_next_task_fair+0x7c/0x4a0
> > [c00000005c62f6d0] [c0000000001afc50] __pick_next_task+0x60/0x2d0
> > [c00000005c62f730] [c0000000010e8ce8] __schedule+0x198/0x840
> > [c00000005c62f810] [c0000000010e93d0] schedule+0x40/0x110
> > [c00000005c62f880] [c00000000064c574] pipe_read+0x424/0x6a0
> > [c00000005c62f960] [c00000000063a0fc] vfs_read+0x30c/0x3d0
> > [c00000005c62fa10] [c00000000063adf4] ksys_read+0x104/0x160
> > [c00000005c62fa60] [c000000000031678] system_call_exception+0x138/0x2d0
> > [c00000005c62fe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
> 
> So that is a 'fun' one, I don't remember seeing that before. It says
> we're trying to dequeue a task that is not on the runqueue.
> 
> The big new thing this merge window -- I'm assuming v6.11 is good -- is
> DEQUEUE_DELAYED. Does this error go away if you flip that in
> kernel/sched/features.h ?
Yes. With the diff below applied, I didn't see any warnings or kernel
panics while running the workload:

# git diff
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 290874079f60..38bf8df813d1 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -46,7 +46,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true)
  *
  * DELAY_ZERO clips the lag on dequeue (or wakeup) to 0.
  */
-SCHED_FEAT(DELAY_DEQUEUE, true)
+SCHED_FEAT(DELAY_DEQUEUE, false)
 SCHED_FEAT(DELAY_ZERO, true)

 /*
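
For anyone who wants to try the same experiment without rebuilding, the
feature can usually also be flipped at runtime through debugfs. The
helper below is a minimal sketch under these assumptions: the kernel is
built with CONFIG_SCHED_DEBUG, debugfs is mounted at /sys/kernel/debug,
and the caller is root. The function name sched_feat_set() is
hypothetical. Writing a feature name enables it, and writing the name
with a NO_ prefix disables it.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write a feature token such as "NO_DELAY_DEQUEUE" (disable) or
 * "DELAY_DEQUEUE" (enable) to a sched features file.
 * Returns 0 on success, a negative errno value on failure.
 */
static int sched_feat_set(const char *path, const char *token)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return errno ? -errno : -EIO;

	if (fputs(token, f) == EOF)
		ret = errno ? -errno : -EIO;

	/* The write is typically flushed here, so check fclose() too. */
	if (fclose(f) == EOF && ret == 0)
		ret = errno ? -errno : -EIO;

	return ret;
}
```

A call such as
sched_feat_set("/sys/kernel/debug/sched/features", "NO_DELAY_DEQUEUE")
should have the same effect as the diff above until the next reboot.
Reading the features file back shows which spelling is currently active.
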


