From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
luis.machado@arm.com
Subject: Re: sched/fair: Kernel panics in pick_next_entity
Date: Tue, 1 Oct 2024 00:35:16 +0530 [thread overview]
Message-ID: <Zvr2bLBEYyu1gtNz@linux.ibm.com> (raw)
In-Reply-To: <20240930144157.GH5594@noisy.programming.kicks-ass.net>
On Mon, Sep 30, 2024 at 04:41:57PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 26, 2024 at 06:12:19PM +0530, Vishal Chourasia wrote:
> > I've noticed a kernel panic consistently occurring on the mainline v6.11
> > kernel (see attached dmesg log below).
> >
> > The panic occurs almost every time I build the Linux kernel from source.
> >
> > Steps to Reproduce:
> >
> > make clean
> > ./scripts/config -e LOCALVERSION_AUTO
> > ./scripts/config --set-str LOCALVERSION -master-with-print
> > make localmodconfig
> > make -j8 -s vmlinux modules
> >
> > >From my investigation, it seems that the function pick_eevdf() can return NULL.
> > Commit f12e1488 ("sched/fair: Prepare pick_next_task() for delayed dequeue")
> > introduces an access on the return value of pick_eevdf(). If 'se' was NULL,
> > it can lead to a null pointer dereference.
>
> Even before that commit we relied on that thing not being NULL, notably
> f12e1488^1 has:
>
> se = pick_next_entity(cfs_rq);
> cfs_rq = group_cfs_rq(se);
>
> Which will similarly explode when pick_eevdf() goes wobbly.
>
> > To determine why pick_eevdf() would return NULL, I added a few printk statements
> > Based on one of the printk logs in the shared dmesg log, it appears that if
> > pick_eevdf() is called for a 'cfs_rq' whose 'cfs_rq->curr' is NULL and there
> > are no eligible entities on that 'cfs_rq', it will return NULL.
>
> Right, that is not a valid state. Which seems to suggest something went
> sideways with the eligibility thing -- as Luis suggested.
>
> > I have not been able to think of a quick reproducer to trigger a panic
> > for this case. Hoping if someone can guide me on this.
> >
> > Note: The following dmesg log also contains a warning reported too. Panic
> > happens later.
> >
> > ------------[ cut here ]------------
> > !se->on_rq
> > WARNING: CPU: 1 PID: 92333 at kernel/sched/fair.c:705 update_entity_lag+0xcc/0xf0
> > Modules linked in: binfmt_misc bonding tls rfkill ibmveth pseries_rng vmx_crypto nd_pmem nd_btt dax_pmem loop nfnetlink xfs sd_mod papr_scm libnvdimm ibmvscsi scsi_transport_srp pseries_wdt dm_mirror dm_region_hash dm_log dm_mod fuse
> > CPU: 1 UID: 0 PID: 92333 Comm: genksyms Tainted: G W 6.11.0-master-with-print-10547-g684a64bf32b6-dirty #64
> > Tainted: [W]=WARN
> > Hardware name: IBM,9080-HEX POWER10 (architected) hv:phyp pSeries
> > NIP: c0000000001cdfcc LR: c0000000001cdfc8 CTR: 0000000000000000
> > REGS: c00000005c62ee50 TRAP: 0700 Tainted: G W (6.11.0-master-with-print-10547-g684a64bf32b6-dirty)
> > MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002222 XER: 00000005
> > CFAR: c000000000156a10 IRQMASK: 1
> > GPR00: c0000000001cdfc8 c00000005c62f0f0 c000000001b57400 000000000000000a
> > GPR04: 00000000ffff7fff c00000005c62eee0 c00000005c62eed8 00000007fb050000
> > GPR08: 0000000000000027 0000000000000000 0000000000000000 c000000002758de0
> > GPR12: c000000002a18d88 c0000007fffef480 0000000000000000 0000000000000000
> > GPR16: c000000002c56d40 0000000000000000 c00000005c62f5b4 0000000000000000
> > GPR20: fffffffffffffdef 0000000000000000 0000000000000002 c000000003cd7300
> > GPR24: 0000000000000000 0000000000000008 c0000007fd1d3f80 0000000000000000
> > GPR28: 0000000000000001 0000000000000009 c0000007fd1d4080 c0000000656a0000
> > NIP [c0000000001cdfcc] update_entity_lag+0xcc/0xf0
> > LR [c0000000001cdfc8] update_entity_lag+0xc8/0xf0
> > Call Trace:
> > [c00000005c62f0f0] [c0000000001cdfc8] update_entity_lag+0xc8/0xf0 (unreliable)
> > [c00000005c62f160] [c0000000001cea80] dequeue_entity+0xb0/0x6d0
> > [c00000005c62f1f0] [c0000000001cf8b0] dequeue_entities+0x150/0x600
> > [c00000005c62f2c0] [c0000000001d02a8] dequeue_task_fair+0x158/0x2e0
> > [c00000005c62f300] [c0000000001b5ea4] dequeue_task+0x64/0x200
> > [c00000005c62f380] [c0000000001cc950] detach_tasks+0x140/0x420
> > [c00000005c62f3f0] [c0000000001d6044] sched_balance_rq+0x214/0x7c0
> > [c00000005c62f550] [c0000000001d6830] sched_balance_newidle+0x240/0x630
> > [c00000005c62f640] [c0000000001d6d0c] pick_next_task_fair+0x7c/0x4a0
> > [c00000005c62f6d0] [c0000000001afc50] __pick_next_task+0x60/0x2d0
> > [c00000005c62f730] [c0000000010e8ce8] __schedule+0x198/0x840
> > [c00000005c62f810] [c0000000010e93d0] schedule+0x40/0x110
> > [c00000005c62f880] [c00000000064c574] pipe_read+0x424/0x6a0
> > [c00000005c62f960] [c00000000063a0fc] vfs_read+0x30c/0x3d0
> > [c00000005c62fa10] [c00000000063adf4] ksys_read+0x104/0x160
> > [c00000005c62fa60] [c000000000031678] system_call_exception+0x138/0x2d0
> > [c00000005c62fe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
>
> So that is a 'fun' one, I don't remember seeing that before. It says
> we're trying to dequeue a task that is not on the runqueue.
>
> The big new thing this merge window -- I'm assuming v6.11 is good -- is
> DEQUEUE_DELAYED. Does this error go away if you flip that in
> kernel/sched/features.h ?
Yes, with the below diff. I didn't see any warnings or kernel panic
while running the workload
# git diff
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 290874079f60..38bf8df813d1 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -46,7 +46,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true)
*
* DELAY_ZERO clips the lag on dequeue (or wakeup) to 0.
*/
-SCHED_FEAT(DELAY_DEQUEUE, true)
+SCHED_FEAT(DELAY_DEQUEUE, false)
SCHED_FEAT(DELAY_ZERO, true)
/*
next prev parent reply other threads:[~2024-09-30 19:05 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-26 12:42 sched/fair: Kernel panics in pick_next_entity Vishal Chourasia
2024-09-26 14:31 ` Luis Machado
2024-09-30 14:41 ` Peter Zijlstra
2024-09-30 19:05 ` Vishal Chourasia [this message]
2024-09-30 19:15 ` Vishal Chourasia
2024-10-01 8:30 ` Mike Galbraith
2024-10-01 14:08 ` Vishal Chourasia
2024-10-01 16:41 ` Mike Galbraith
2024-10-02 6:40 ` Mike Galbraith
2024-10-02 8:49 ` Peter Zijlstra
2024-10-02 18:22 ` Vishal Chourasia
2024-10-02 22:31 ` Benjamin Segall
2024-10-03 4:41 ` Mike Galbraith
2024-10-03 9:31 ` Mike Galbraith
2024-10-04 7:17 ` Vishal Chourasia
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zvr2bLBEYyu1gtNz@linux.ibm.com \
--to=vishalc@linux.ibm.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luis.machado@arm.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.