linux-pm.vger.kernel.org archive mirror
* Qemu-arm64: LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
@ 2023-09-12  7:55 Naresh Kamboju
  2023-09-13  9:30 ` Cyril Hrubis
  2023-09-14 15:18 ` LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference Yong Wang
  0 siblings, 2 replies; 3+ messages in thread
From: Naresh Kamboju @ 2023-09-12  7:55 UTC
  To: open list, LTP List, Linux PM
  Cc: Alex Bennée, Anders Roxell, Arnd Bergmann, Vincent Guittot,
	Wei Gao, Peter Zijlstra, Martin Doucha, Cyril Hrubis

The following kernel crash was noticed on Linux stable-rc 6.5.3-rc1 on
qemu-arm64 while running the LTP sched test cases.

This is not always reproducible.

Has anyone noticed LTP cfs_bandwidth01 causing a kernel crash on any
devices or qemu-* targets?

I still need to check for similar crashes on other Linux trees and branches.

Boot log and test log:
---------------------
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
[    0.000000] Linux version 6.5.3-rc1 (tuxmake@tuxmake) (Debian clang version 18.0.0 (++20230910112057+710b5a12324e-1~exp1~20230910112229.889), Debian LLD 18.0.0) #1 SMP PREEMPT @1694441978
[    0.000000] KASLR enabled
[    0.000000] random: crng init done
[    0.000000] Machine model: linux,dummy-virt
...
running LTP sched tests
...
cfs_bandwidth01.c:129: TPASS: Workers exited
cfs_bandwidth01.c:117: TPASS: Scheduled bandwidth constrained workers
cfs_bandwidth01.c:54: TINFO: Set 'level2/cpu.max' = '5000 10000'
<1>[   74.455327] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
<1>[   74.456395] Mem abort info:
<1>[   74.456639]   ESR = 0x0000000097880004
<1>[   74.458273]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[   74.458859]   SET = 0, FnV = 0
<1>[   74.459495]   EA = 0, S1PTW = 0
<1>[   74.460171]   FSC = 0x04: level 0 translation fault
<1>[   74.460799] Data abort info:
<1>[   74.461388]   Access size = 4 byte(s)
<1>[   74.462068]   SSE = 0, SRT = 8
<1>[   74.462713]   SF = 0, AR = 0
<1>[   74.463257]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[   74.463996]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[   74.465120] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001029d6000
<1>[   74.465818] [0000000000000038] pgd=0000000000000000, p4d=0000000000000000
<0>[   74.468416] Internal error: Oops: 0000000097880004 [#1] PREEMPT SMP
<4>[   74.469489] Modules linked in: fuse drm dm_mod ip_tables x_tables
<4>[   74.470964] CPU: 0 PID: 435 Comm: cfs_bandwidth01 Not tainted 6.5.3-rc1 #1
<4>[   74.471789] Hardware name: linux,dummy-virt (DT)
<4>[   74.473045] pstate: 634000c9 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[   74.473785] pc : set_next_entity+0xc0/0x1f8
<4>[   74.475461] lr : pick_next_task_fair+0x204/0x3b8
<4>[   74.476989] sp : ffff8000807eb870
<4>[   74.477346] x29: ffff8000807eb870 x28: ffff0000c4e3b750 x27: ffffcb93e8e19008
<4>[   74.478392] x26: ffff0000c4e3b0c0 x25: ffffcb93e8ab4828 x24: ffff0000c0354a00
<4>[   74.479263] x23: ffff8000807eb900 x22: 0000000000000000 x21: ffff0000ff5b1300
<4>[   74.480401] x20: ffff0000ff5b1300 x19: 0000000000000000 x18: 0000000000000000
<4>[   74.481417] x17: 000000000000ba7e x16: 0000000000000606 x15: 000000000117d17a
<4>[   74.482733] x14: 0000000000000000 x13: 0000000f0f4bc800 x12: 00000000000002b0
<4>[   74.484181] x11: 0000000f0f4bc800 x10: 0000000cf6ad6bd1 x9 : ffffcb93e6af8e4c
<4>[   74.485229] x8 : 0000000000000000 x7 : ffffcb93e8a3ccac x6 : 0000000000000003
<4>[   74.486131] x5 : 000000008040002b x4 : 0000ffffbef0c000 x3 : ffff0000ff5b1200
<4>[   74.487012] x2 : ffff0000c39efc00 x1 : 0000000000000000 x0 : ffff0000ff5b1300
<4>[   74.488236] Call trace:
<4>[   74.488608]  set_next_entity+0xc0/0x1f8
<4>[   74.489280]  pick_next_task_fair+0x204/0x3b8
<4>[   74.489987]  __schedule+0x1e0/0x9c8
<4>[   74.490903]  schedule+0x134/0x1b8
<4>[   74.491632]  schedule_preempt_disabled+0x90/0x108
<4>[   74.492392]  rwsem_down_write_slowpath+0x288/0x6f0
<4>[   74.493056]  down_write+0x48/0xb0
<4>[   74.493606]  unlink_anon_vmas+0x148/0x1b0
<4>[   74.494222]  free_pgtables+0x10c/0x200
<4>[   74.494800]  exit_mmap+0x174/0x3c0
<4>[   74.495177]  __mmput+0x48/0x150
<4>[   74.495761]  mmput+0x34/0x70
<4>[   74.496058]  exit_mm+0xbc/0x148
<4>[   74.497651]  do_exit+0x22c/0x910
<4>[   74.498212]  do_group_exit+0xa4/0xb0
<4>[   74.498870]  __arm64_sys_exit_group+0x24/0x30
<4>[   74.499484]  invoke_syscall+0x4c/0x120
<4>[   74.499834]  el0_svc_common+0xd0/0x110
<4>[   74.500196]  do_el0_svc+0x3c/0xb8
<4>[   74.500475]  el0_svc+0x30/0x90
<4>[   74.500746]  el0t_64_sync_handler+0x84/0x100
<4>[   74.501309]  el0t_64_sync+0x190/0x198
<0>[   74.502156] Code: f900293f f9403908 b5ffff48 17ffffde (b9403a68)
<4>[   74.503735] ---[ end trace 0000000000000000 ]---
<6>[   74.504727] note: cfs_bandwidth01[435] exited with irqs disabled
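
The word in parentheses on the "Code:" line is the instruction at pc.
As a rough cross-check of the fault address, the sketch below decodes
it as an A64 "LDR Wt, [Xn, #imm]" (32-bit load, unsigned offset). The
struct and function names are made up for illustration and the decoder
handles only this one encoding. For b9403a68 it yields a load through
x19 at offset 0x38, which lines up with x19 being 0 in the register
dump, the 4-byte access size, and the faulting address 0x38.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an A64 "LDR Wt, [Xn, #imm]" (hypothetical helper type). */
struct ldr_w_uimm {
	unsigned rt;      /* destination register number (Wt) */
	unsigned rn;      /* base register number (Xn) */
	unsigned offset;  /* byte offset, imm12 scaled by 4 */
};

/*
 * Decode insn if it is a 32-bit LDR (immediate, unsigned offset).
 * Returns false for every other encoding, including the 64-bit form.
 */
static bool decode_ldr_w_uimm(uint32_t insn, struct ldr_w_uimm *out)
{
	assert(out != NULL);

	/* Bits 31:22 are fixed to 1011100101 for this encoding. */
	if ((insn & 0xFFC00000u) != 0xB9400000u)
		return false;

	out->rt = insn & 0x1Fu;                     /* bits 4:0 */
	out->rn = (insn >> 5) & 0x1Fu;              /* bits 9:5 */
	out->offset = ((insn >> 10) & 0xFFFu) * 4;  /* bits 21:10, scaled */
	return true;
}
```

Calling decode_ldr_w_uimm(0xb9403a68, &d) fills in rt = 8, rn = 19 and
offset = 0x38. This only says which base register was NULL, not why.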

Links:
-----
  - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2VFpDOMEgzroNyiP9SSlxRxHsMH
  - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.5.y/build/v6.5.2-740-g7bfd1316ceae/testrun/19901770/suite/log-parser-test/tests/
  - https://storage.tuxsuite.com/public/linaro/lkft/builds/2VFpB1ieNZSp5zh0joVGtoMn7RG/

Steps to reproduce:
----------------
# To install tuxrun to your home directory at ~/.local/bin:
# pip3 install -U --user tuxrun==0.49.2
#
# Or install a deb/rpm depending on the running distribution
# See https://tuxmake.org/install-deb/ or
# https://tuxmake.org/install-rpm/
#
# See https://tuxrun.org/ for complete documentation.
#

tuxrun --runtime podman --device qemu-arm64 --boot-args rw \
  --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2VFpB1ieNZSp5zh0joVGtoMn7RG/Image.gz \
  --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2VFpB1ieNZSp5zh0joVGtoMn7RG/modules.tar.xz \
  --rootfs https://storage.tuxboot.com/debian/bookworm/arm64/rootfs.ext4.xz \
  --parameters SKIPFILE=skipfile-lkft.yaml \
  --parameters SHARD_NUMBER=4 \
  --parameters SHARD_INDEX=2 \
  --image docker.io/linaro/tuxrun-dispatcher:v0.49.2 \
  --tests ltp-sched \
  --timeouts boot=30 ltp-sched=30 \
  --overlay https://storage.tuxboot.com/overlays/debian/bookworm/arm64/ltp/20230516/ltp.tar.xz


--
Linaro LKFT
https://lkft.linaro.org


* Re: Qemu-arm64: LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
  2023-09-12  7:55 Qemu-arm64: LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038 Naresh Kamboju
@ 2023-09-13  9:30 ` Cyril Hrubis
  2023-09-14 15:18 ` LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference Yong Wang
  1 sibling, 0 replies; 3+ messages in thread
From: Cyril Hrubis @ 2023-09-13  9:30 UTC
  To: Naresh Kamboju
  Cc: open list, LTP List, Linux PM, Alex Bennée, Anders Roxell,
	Arnd Bergmann, Vincent Guittot, Wei Gao, Peter Zijlstra,
	Martin Doucha

Hi!
> The following kernel crash was noticed on Linux stable-rc 6.5.3-rc1 on
> qemu-arm64 while running the LTP sched test cases.
> 
> This is not always reproducible.

The test creates three levels of cgroups, sets CPU quotas for them,
runs busy-loop processes in the groups, and changes the quotas while
the busy processes are running.
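
For readers unfamiliar with the interface, here is a minimal sketch of
what "setting a CPU quota" means on cgroup v2: writing "<quota>
<period>" (microseconds) or "max <period>" to the group's cpu.max file,
as in the "Set 'level2/cpu.max' = '5000 10000'" line of the test log.
This is not how the LTP test is implemented; the function names and the
cgroup_dir argument are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a cgroup v2 cpu.max value into buf.
 * A negative quota_us means "no limit" and is written as "max".
 * Returns 0 on success, -1 if buf is too small.
 */
static int format_cpu_max(char *buf, size_t len, long quota_us, long period_us)
{
	int n;

	assert(buf != NULL && period_us > 0);

	if (quota_us < 0)
		n = snprintf(buf, len, "max %ld", period_us);
	else
		n = snprintf(buf, len, "%ld %ld", quota_us, period_us);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write the quota to <cgroup_dir>/cpu.max.
 * cgroup_dir is a hypothetical path to an existing cgroup directory.
 * Returns 0 on success, -1 on any error.
 */
static int write_cpu_max(const char *cgroup_dir, long quota_us, long period_us)
{
	char path[512], val[64];
	FILE *f;
	int n, rc;

	if (format_cpu_max(val, sizeof(val), quota_us, period_us) != 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/cpu.max", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	rc = (fwrite(val, 1, strlen(val), f) == strlen(val)) ? 0 : -1;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

With quota 5000 and period 10000 the group may run for at most 5 ms in
every 10 ms period. Rewriting the file while busy loops are running
makes the scheduler throttle and unthrottle the group repeatedly.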

The test is a regression test for quite a few commits:

commit 39f23ce07b9355d05a64ae303ce20d1c4b92b957
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date:   Wed May 13 15:55:28 2020 +0200

    sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list


commit b34cb07dde7c2346dec73d053ce926aeaa087303
Author: Phil Auld <pauld@redhat.com>
Date:   Tue May 12 09:52:22 2020 -0400

    sched/fair: Fix enqueue_task_fair() warning some more

commit fe61468b2cbc2b7ce5f8d3bf32ae5001d4c434e9
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date:   Fri Mar 6 14:52:57 2020 +0100

    sched/fair: Fix enqueue_task_fair warning

commit 5ab297bab984310267734dfbcc8104566658ebef
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date:   Fri Mar 6 09:42:08 2020 +0100

    sched/fair: Fix reordering of enqueue/dequeue_task_fair()

commit 6d4d22468dae3d8757af9f8b81b848a76ef4409d
Author: Vincent Guittot <vincent.guittot@linaro.org>
Date:   Mon Feb 24 09:52:14 2020 +0000

    sched/fair: Reorder enqueue/dequeue_task_fair path

commit fdaba61ef8a268d4136d0a113d153f7a89eb9984
Author: Rik van Riel <riel@surriel.com>
Date:   Mon Jun 21 19:43:30 2021 +0200

    sched/fair: Ensure that the CFS parent is added after unthrottling


Unless this is random corruption, we should look more closely at the
scheduler changes.

-- 
Cyril Hrubis
chrubis@suse.cz


* LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference
  2023-09-12  7:55 Qemu-arm64: LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038 Naresh Kamboju
  2023-09-13  9:30 ` Cyril Hrubis
@ 2023-09-14 15:18 ` Yong Wang
  1 sibling, 0 replies; 3+ messages in thread
From: Yong Wang @ 2023-09-14 15:18 UTC
  To: chrubis, naresh.kamboju
  Cc: alex.bennee, anders.roxell, arnd, linux-kernel, linux-pm, ltp,
	mdoucha, peterz, vincent.guittot, wegao, wang.yong12, yang.yang29,
	ran.xiaokai

Hello!
>The following kernel crash was noticed on Linux stable-rc 6.5.3-rc1 on
>qemu-arm64 while running the LTP sched test cases.
>
>This is not always reproducible.
I also encountered this problem on Linux 5.10 in an arm64 environment.
The log output is as follows:
[ 2893.003795] ================================================================== 
[ 2893.003822] BUG: KASAN: null-ptr-deref in pick_next_task_fair+0x130/0x4e0 
[ 2893.003880] Read of size 8 at addr 0000000000000080 by task ksoftirqd/0/12 
[ 2893.003901]  
[ 2893.003914] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: P           O      5.10.59-rt52#1 
[ 2893.003959] Call trace: 
[ 2893.003968]  dump_backtrace+0x0/0x2e8 
[ 2893.004009]  show_stack+0x18/0x28 
[ 2893.004032]  dump_stack+0x104/0x174 
[ 2893.004067]  kasan_report+0x1d0/0x258 
[ 2893.004098]  __asan_load8+0x94/0xd0 
[ 2893.004126]  pick_next_task_fair+0x130/0x4e0 
[ 2893.004164]  __schedule+0x220/0xbd0 
[ 2893.004192]  schedule+0xec/0x1a0 
[ 2893.004216]  smpboot_thread_fn+0x124/0x548 
[ 2893.004246]  kthread+0x24c/0x278 
[ 2893.004277]  ret_from_fork+0x10/0x34 
[ 2893.004306] ================================================================== 
[ 2893.004325] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080 
[ 2893.152267] Mem abort info: 
[ 2893.152639]   ESR = 0x96000004 
[ 2893.153045]   EC = 0x25: DABT (current EL), IL = 32 bits 
[ 2893.153739]   SET = 0, FnV = 0 
[ 2893.154143]   EA = 0, S1PTW = 0 
[ 2893.154560] Data abort info: 
[ 2893.154940]   ISV = 0, ISS = 0x00000004 
[ 2893.155443]   CM = 0, WnR = 0 
[ 2893.155838] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000188edb000 
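
As a cross-check of the "Mem abort info" lines, the sketch below pulls
the data-abort fields out of an ESR value. The bit positions follow the
Arm architecture's ESR_ELx layout for data aborts; the struct and
function names are made up for illustration. For 0x96000004 it gives
EC = 0x25, ISV = 0 and FSC = 0x04. For the 6.5.3-rc1 report's
0x97880004 it gives EC = 0x25, ISV = 1, a 4-byte access, SRT = 8 and
FSC = 0x04, matching what that kernel printed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selected ESR_ELx fields for a data abort (hypothetical helper type). */
struct esr_dabt {
	unsigned ec;   /* exception class, ESR[31:26] */
	unsigned il;   /* instruction length bit, ESR[25] */
	unsigned isv;  /* syndrome valid, ESR[24] */
	unsigned sas;  /* access size is 1 << sas bytes, ESR[23:22], if isv */
	unsigned srt;  /* transfer register number, ESR[20:16], if isv */
	unsigned wnr;  /* 1 = write, 0 = read, ESR[6] */
	unsigned fsc;  /* fault status code, ESR[5:0] */
};

/* Split an ESR value into the fields above. */
static void decode_esr_dabt(uint64_t esr, struct esr_dabt *out)
{
	assert(out != NULL);

	out->ec  = (unsigned)((esr >> 26) & 0x3F);
	out->il  = (unsigned)((esr >> 25) & 0x1);
	out->isv = (unsigned)((esr >> 24) & 0x1);
	out->sas = (unsigned)((esr >> 22) & 0x3);
	out->srt = (unsigned)((esr >> 16) & 0x1F);
	out->wnr = (unsigned)((esr >> 6) & 0x1);
	out->fsc = (unsigned)(esr & 0x3F);
}
```

EC = 0x25 is a data abort taken from the current exception level and
FSC = 0x04 is a level 0 translation fault, which is what a read through
a NULL-based address looks like in both reports.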

The source code where the problem occurs is:
  se = pick_next_entity(cfs_rq, curr);
  cfs_rq = group_cfs_rq(se); // se is NULL!

pick_next_entity() returns NULL, so a NULL pointer dereference occurs
when the members of se are accessed afterwards. However, it is not
clear under what circumstances pick_next_entity() returns NULL.
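
To illustrate why the faulting address is a small number such as 0x80
or 0x38 rather than 0: reading a member through a NULL struct pointer
touches address 0 plus the member's offset. The sketch below uses a
made-up struct whose layout is chosen so that the member sits at 0x38.
It is not the real struct sched_entity layout, and the guarded helper
is not a proposed fix, only a demonstration of the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_rq;                 /* hypothetical stand-in for a run queue */

struct demo_entity {
	uint64_t pad[7];        /* 56 bytes of hypothetical leading fields */
	struct demo_rq *my_q;   /* therefore at offset 0x38 */
};

static_assert(offsetof(struct demo_entity, my_q) == 0x38,
	      "layout chosen so that my_q sits at offset 0x38");

/* Address that a load of se->my_q would touch for a given se. */
static uintptr_t my_q_address(const struct demo_entity *se)
{
	return (uintptr_t)se + offsetof(struct demo_entity, my_q);
}

/* Guarded access: return NULL instead of dereferencing a NULL entity. */
static struct demo_rq *next_group_rq(const struct demo_entity *se)
{
	return se ? se->my_q : NULL;
}
```

With se == NULL, my_q_address() evaluates to 0x38, so the oops address
indicates which member was being read, not where se came from.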

In addition, in my environment, the following commands often reproduce the problem:
  stress-ng -c 8 --cpu-load 100 --sched fifo --sched-prio 1 --cpu-method pi -t 900 &
  runltp -s cfs_bandwidth01

I hope this helps to solve the problem.
Thanks.

