From: kernel test robot <oliver.sang@intel.com>
To: <kaiyang2@cs.cmu.edu>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<cgroups@vger.kernel.org>, <roman.gushchin@linux.dev>,
	<shakeel.butt@linux.dev>, <muchun.song@linux.dev>,
	<akpm@linux-foundation.org>, <mhocko@kernel.org>,
	<nehagholkar@meta.com>, <abhishekd@meta.com>,
	<hannes@cmpxchg.org>, <weixugc@google.com>, <rientjes@google.com>,
	Kaiyang Zhao <kaiyang2@cs.cmu.edu>, <oliver.sang@intel.com>
Subject: Re: [RFC PATCH 2/4] calculate memory.low for the local node and track its usage
Date: Sun, 22 Sep 2024 16:39:35 +0800
Message-ID: <202409221625.1e974ac-oliver.sang@intel.com>
In-Reply-To: <20240920221202.1734227-3-kaiyang2@cs.cmu.edu>



Hello,

kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:

commit: 6f4c005a5f8b8ff1ce674731545b302af5f28f3f ("[RFC PATCH 2/4] calculate memory.low for the local node and track its usage")
url: https://github.com/intel-lab-lkp/linux/commits/kaiyang2-cs-cmu-edu/Add-get_cgroup_local_usage-for-estimating-the-top-tier-memory-usage/20240921-061404
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20240920221202.1734227-3-kaiyang2@cs.cmu.edu/
patch subject: [RFC PATCH 2/4] calculate memory.low for the local node and track its usage

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+---------------------------------------------+------------+------------+
|                                             | 0af685cc17 | 6f4c005a5f |
+---------------------------------------------+------------+------------+
| boot_successes                              | 12         | 0          |
| boot_failures                               | 0          | 12         |
| BUG:kernel_NULL_pointer_dereference,address | 0          | 12         |
| Oops                                        | 0          | 12         |
| RIP:si_meminfo_node                         | 0          | 12         |
| Kernel_panic-not_syncing:Fatal_exception    | 0          | 12         |
+---------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202409221625.1e974ac-oliver.sang@intel.com


[   14.204830][    T1] BUG: kernel NULL pointer dereference, address: 0000000000000090
[   14.206729][    T1] #PF: supervisor read access in kernel mode
[   14.208090][    T1] #PF: error_code(0x0000) - not-present page
[   14.209393][    T1] PGD 0 P4D 0
[   14.210212][    T1] Oops: Oops: 0000 [#1] SMP PTI
[   14.211269][    T1] CPU: 1 UID: 0 PID: 1 Comm: systemd Not tainted 6.11.0-rc6-00570-g6f4c005a5f8b #1
[   14.213284][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 14.215290][ T1] RIP: 0010:si_meminfo_node (arch/x86/include/asm/atomic64_64.h:15 (discriminator 3) include/linux/atomic/atomic-arch-fallback.h:2583 (discriminator 3) include/linux/atomic/atomic-long.h:38 (discriminator 3) include/linux/atomic/atomic-instrumented.h:3189 (discriminator 3) include/linux/mmzone.h:1042 (discriminator 3) mm/show_mem.c:98 (discriminator 3)) 
[ 14.216523][ T1] Code: 90 90 66 0f 1f 00 0f 1f 44 00 00 48 63 c6 55 31 d2 4c 8b 04 c5 c0 a7 fb 8c 53 48 89 c5 48 89 fb 4c 89 c0 49 8d b8 00 1e 00 00 <48> 8b 88 90 00 00 00 48 05 00 06 00 00 48 01 ca 48 39 f8 75 eb 48
All code
========
   0:	90                   	nop
   1:	90                   	nop
   2:	66 0f 1f 00          	nopw   (%rax)
   6:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   b:	48 63 c6             	movslq %esi,%rax
   e:	55                   	push   %rbp
   f:	31 d2                	xor    %edx,%edx
  11:	4c 8b 04 c5 c0 a7 fb 	mov    -0x73045840(,%rax,8),%r8
  18:	8c 
  19:	53                   	push   %rbx
  1a:	48 89 c5             	mov    %rax,%rbp
  1d:	48 89 fb             	mov    %rdi,%rbx
  20:	4c 89 c0             	mov    %r8,%rax
  23:	49 8d b8 00 1e 00 00 	lea    0x1e00(%r8),%rdi
  2a:*	48 8b 88 90 00 00 00 	mov    0x90(%rax),%rcx		<-- trapping instruction
  31:	48 05 00 06 00 00    	add    $0x600,%rax
  37:	48 01 ca             	add    %rcx,%rdx
  3a:	48 39 f8             	cmp    %rdi,%rax
  3d:	75 eb                	jne    0x2a
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 8b 88 90 00 00 00 	mov    0x90(%rax),%rcx
   7:	48 05 00 06 00 00    	add    $0x600,%rax
   d:	48 01 ca             	add    %rcx,%rdx
  10:	48 39 f8             	cmp    %rdi,%rax
  13:	75 eb                	jne    0x0
  15:	48                   	rex.W
[   14.220364][    T1] RSP: 0018:ffffb14b40013d68 EFLAGS: 00010246
[   14.221717][    T1] RAX: 0000000000000000 RBX: ffffb14b40013d88 RCX: 00000000003a19a2
[   14.223496][    T1] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000001e00
[   14.225170][    T1] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000008
[   14.226964][    T1] R10: 0000000000000008 R11: 0fffffffffffffff R12: ffffb14b40013d88
[   14.228774][    T1] R13: 00000000003e7ac3 R14: ffffb14b40013e88 R15: ffff98ab0434f7a0
[   14.230421][    T1] FS:  00007f9569ae9940(0000) GS:ffff98adefd00000(0000) knlGS:0000000000000000
[   14.234569][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.235900][    T1] CR2: 0000000000000090 CR3: 0000000100072000 CR4: 00000000000006f0
[   14.237620][    T1] Call Trace:
[   14.238502][    T1]  <TASK>
[ 14.239254][ T1] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) 
[ 14.240189][ T1] ? page_fault_oops (arch/x86/mm/fault.c:715) 
[ 14.241254][ T1] ? exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) 
[ 14.242297][ T1] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623) 
[ 14.243313][ T1] ? si_meminfo_node (arch/x86/include/asm/atomic64_64.h:15 (discriminator 3) include/linux/atomic/atomic-arch-fallback.h:2583 (discriminator 3) include/linux/atomic/atomic-long.h:38 (discriminator 3) include/linux/atomic/atomic-instrumented.h:3189 (discriminator 3) include/linux/mmzone.h:1042 (discriminator 3) mm/show_mem.c:98 (discriminator 3)) 
[ 14.244443][ T1] ? si_meminfo_node (mm/show_mem.c:114) 
[ 14.245460][ T1] memory_low_write (mm/memcontrol.c:4088) 
[ 14.246547][ T1] kernfs_fop_write_iter (fs/kernfs/file.c:338) 
[ 14.247804][ T1] vfs_write (fs/read_write.c:497 fs/read_write.c:590) 
[ 14.248830][ T1] ksys_write (fs/read_write.c:643) 
[ 14.249783][ T1] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) 
[ 14.250800][ T1] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
[   14.252260][    T1] RIP: 0033:0x7f956a64b240
[ 14.253276][ T1] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
All code
========
   0:	40 00 48 8b          	add    %cl,-0x75(%rax)
   4:	15 c1 9b 0d 00       	adc    $0xd9bc1,%eax
   9:	f7 d8                	neg    %eax
   b:	64 89 02             	mov    %eax,%fs:(%rdx)
   e:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
  15:	eb b7                	jmp    0xffffffffffffffce
  17:	0f 1f 00             	nopl   (%rax)
  1a:	80 3d a1 23 0e 00 00 	cmpb   $0x0,0xe23a1(%rip)        # 0xe23c2
  21:	74 17                	je     0x3a
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 58                	ja     0x8a
  32:	c3                   	retq   
  33:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  3a:	48 83 ec 28          	sub    $0x28,%rsp
  3e:	48                   	rex.W
  3f:	89                   	.byte 0x89

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 58                	ja     0x60
   8:	c3                   	retq   
   9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
  10:	48 83 ec 28          	sub    $0x28,%rsp
  14:	48                   	rex.W
  15:	89                   	.byte 0x89
[   14.257195][    T1] RSP: 002b:00007ffcc66594e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[   14.259009][    T1] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f956a64b240
[   14.260848][    T1] RDX: 0000000000000002 RSI: 00007ffcc6659740 RDI: 000000000000001b
[   14.262500][    T1] RBP: 00007ffcc6659740 R08: 0000000000000000 R09: 0000000000000001
[   14.264147][    T1] R10: 00007f956a6c4820 R11: 0000000000000202 R12: 0000000000000002
[   14.265934][    T1] R13: 000055fd63872c10 R14: 0000000000000002 R15: 00007f956a7219e0
[   14.267589][    T1]  </TASK>
[   14.268340][    T1] Modules linked in: ip_tables
[   14.269410][    T1] CR2: 0000000000000090
[   14.270478][    T1] ---[ end trace 0000000000000000 ]---
[ 14.271717][ T1] RIP: 0010:si_meminfo_node (arch/x86/include/asm/atomic64_64.h:15 (discriminator 3) include/linux/atomic/atomic-arch-fallback.h:2583 (discriminator 3) include/linux/atomic/atomic-long.h:38 (discriminator 3) include/linux/atomic/atomic-instrumented.h:3189 (discriminator 3) include/linux/mmzone.h:1042 (discriminator 3) mm/show_mem.c:98 (discriminator 3)) 
[ 14.272874][ T1] Code: 90 90 66 0f 1f 00 0f 1f 44 00 00 48 63 c6 55 31 d2 4c 8b 04 c5 c0 a7 fb 8c 53 48 89 c5 48 89 fb 4c 89 c0 49 8d b8 00 1e 00 00 <48> 8b 88 90 00 00 00 48 05 00 06 00 00 48 01 ca 48 39 f8 75 eb 48
All code
========
   0:	90                   	nop
   1:	90                   	nop
   2:	66 0f 1f 00          	nopw   (%rax)
   6:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
   b:	48 63 c6             	movslq %esi,%rax
   e:	55                   	push   %rbp
   f:	31 d2                	xor    %edx,%edx
  11:	4c 8b 04 c5 c0 a7 fb 	mov    -0x73045840(,%rax,8),%r8
  18:	8c 
  19:	53                   	push   %rbx
  1a:	48 89 c5             	mov    %rax,%rbp
  1d:	48 89 fb             	mov    %rdi,%rbx
  20:	4c 89 c0             	mov    %r8,%rax
  23:	49 8d b8 00 1e 00 00 	lea    0x1e00(%r8),%rdi
  2a:*	48 8b 88 90 00 00 00 	mov    0x90(%rax),%rcx		<-- trapping instruction
  31:	48 05 00 06 00 00    	add    $0x600,%rax
  37:	48 01 ca             	add    %rcx,%rdx
  3a:	48 39 f8             	cmp    %rdi,%rax
  3d:	75 eb                	jne    0x2a
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 8b 88 90 00 00 00 	mov    0x90(%rax),%rcx
   7:	48 05 00 06 00 00    	add    $0x600,%rax
   d:	48 01 ca             	add    %rcx,%rdx
  10:	48 39 f8             	cmp    %rdi,%rax
  13:	75 eb                	jne    0x0
  15:	48                   	rex.W
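
As a reading aid for the register dump and disassembly above, the following
minimal Python sketch redoes the address arithmetic. All numeric values are
copied from the oops; the function names are hypothetical helpers introduced
only for illustration. The interpretation in the comments (a NULL per-node
base pointer selected by the index in RSI, and a loop over five 0x600-byte
entries) is an inference from the disassembly, not something the log states.

```python
# Register values copied from the oops above.
REGS = {"RAX": 0x0, "RSI": 0x1, "RDI": 0x1E00, "R08": 0x0}
CR2 = 0x90  # faulting address reported by the kernel

# Constants read off the disassembly of the faulting loop.
LOAD_DISP = 0x90      # mov 0x90(%rax),%rcx   (trapping instruction)
STRIDE = 0x600        # add $0x600,%rax       (advance to next entry)
END_OFFSET = 0x1E00   # lea 0x1e00(%r8),%rdi  (loop end = base + 0x1e00)


def fault_address(base: int, disp: int) -> int:
    """Effective address of a `mov disp(%base)` load, wrapped to 64 bits."""
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


def loop_iterations(end_offset: int, stride: int) -> int:
    """Number of fixed-size entries the loop would visit."""
    if end_offset % stride:
        raise ValueError("end offset is not a multiple of the stride")
    return end_offset // stride


def summarize() -> str:
    """Describe what the arithmetic implies (assumption-laden, see lead-in)."""
    addr = fault_address(REGS["RAX"], LOAD_DISP)
    entries = loop_iterations(END_OFFSET, STRIDE)
    return (
        f"base=0x{REGS['RAX']:x} (loaded for index {REGS['RSI']}), "
        f"first load at 0x{addr:x}, {entries} entries of 0x{STRIDE:x} bytes"
    )


if __name__ == "__main__":
    print(summarize())
```

The computed first load address matches CR2, and RDI equals R08 + 0x1e00 with
R08 = 0, which is consistent with the base pointer being NULL on the very
first loop iteration. Given RSI = 1 and the call from memory_low_write, one
plausible reading is that the per-node data for node 1 does not exist on this
test machine, so a check that the node is valid before calling
si_meminfo_node may be what is missing.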


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240922/202409221625.1e974ac-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Thread overview: 13+ messages
2024-09-20 22:11 [RFC PATCH 0/4] memory tiering fairness by per-cgroup control of promotion and demotion kaiyang2
2024-09-20 22:11 ` [RFC PATCH 1/4] Add get_cgroup_local_usage for estimating the top-tier memory usage kaiyang2
2024-09-20 22:11 ` [RFC PATCH 2/4] calculate memory.low for the local node and track its usage kaiyang2
2024-09-21 23:18   ` kernel test robot
2024-09-22  8:39   ` kernel test robot [this message]
2024-10-15 22:05   ` Gregory Price
2024-09-20 22:11 ` [RFC PATCH 3/4] use memory.low local node protection for local node reclaim kaiyang2
2024-09-22  0:51   ` kernel test robot
2024-09-22 16:31   ` kernel test robot
2024-10-15 21:52   ` Gregory Price
2024-09-20 22:11 ` [RFC PATCH 4/4] reduce NUMA balancing scan size of cgroups over their local memory.low kaiyang2
2024-10-11 20:51 ` [RFC PATCH 0/4] memory tiering fairness by per-cgroup control of promotion and demotion Kaiyang Zhao
2024-11-08 19:01 ` kaiyang2
