Linux cgroups development
* cgroup null pointer dereference
@ 2025-04-23 17:30 Kamaljit Singh
From: Kamaljit Singh @ 2025-04-23 17:30 UTC
  To: cgroups@vger.kernel.org
  Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org

Hello,

While running I/O to an NVMe fabrics target, we're hitting a NULL pointer dereference
that causes CPU hard lockups and NMIs. Before the lockups, the Medusa I/O ran successfully for ~23 hours.

I did not find any panics listing nvme or block driver calls.

RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
points to rstat.c, cgroup_rstat_push_children(), line 162, inside the second while() loop:

160                 /* updated_next is parent cgroup terminated */
161                 while (child != parent) {
162                         child->rstat_flush_next = head;
163                         head = child;
164                         crstatc = cgroup_rstat_cpu(child, cpu);
165                         grandchild = crstatc->updated_children;

In my test env I've added a NULL check on 'child' and am re-running the long-term test.
I'm wondering whether this patch is sufficient to address the underlying issue or is just a band-aid.
Please share any known patches or suggestions.
-                while (child != parent) {
+                while (child && child != parent) {
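
To make the effect of the guard concrete, here is a minimal user-space sketch of the sibling walk. It is not the kernel code: `struct node`, `push_siblings()` and the `truncated` flag are hypothetical names, and the per-CPU lookup through cgroup_rstat_cpu() is left out. It only shows that the extra check ends the walk on NULL instead of dereferencing it; it says nothing about why a NULL would appear in a list that is supposed to be terminated by the parent.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup; the per-CPU indirection is omitted. */
struct node {
	struct node *updated_next;     /* next updated sibling; last one points at the parent */
	struct node *rstat_flush_next; /* link in the flush list being built */
};

/*
 * Push each sibling starting at 'child' onto the flush list 'head' and
 * return the new head. The walk is expected to end at 'parent'. With the
 * extra NULL guard it also ends at NULL, and *truncated is set to 1 so the
 * caller can tell that the parent terminator was never reached.
 */
struct node *push_siblings(struct node *head, struct node *child,
			   struct node *parent, int *truncated)
{
	while (child && child != parent) {
		struct node *next = child->updated_next;

		child->rstat_flush_next = head;
		head = child;
		child = next;
	}
	*truncated = (child != parent);
	return head;
}
```

In this model a NULL in the chain no longer faults, but the walk stops early and any siblings that should have followed are never pushed, so the guard may hide a broken list rather than repair it.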

Reference: git://git.infradead.org/nvme.git tags/nvme-6.15-2025-04-10

===========================
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] BUG: kernel NULL pointer dereference, address: 00000000000003d8
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] #PF: supervisor read access in kernel mode
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] #PF: error_code(0x0000) - not-present page
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] PGD 0 P4D 0
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Oops: Oops: 0000 [#1] SMP NOPTI
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] CPU: 19 UID: 0 PID: 349623 Comm: kworker/u1029:0 Tainted: G            E       6.14.0+ #1 PREEMPT(voluntary)
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Tainted: [E]=UNSIGNED_MODULE
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Hardware name: Supermicro AS -1124US-TNRP/H12DSU-iN, BIOS 1.2 08/10/2020
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Workqueue: events_unbound flush_memcg_stats_dwork
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RIP: 0010:cgroup_rstat_flush+0x1d0/0x750
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Code: 0f 85 90 00 00 00 48 85 d2 0f 84 95 00 00 00 4c 8b b2 c0 00 00 00 4c 8b 82 00 04 00 00 49 39 d6 75 08 e9 d8 03 00 00 48 89 f2 <48> 8b 82 d8 03 00 00 4c 89 ba 00 04 00 00 49 81 fd 00 20 00 00 0f
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RSP: 0018:ffffd08eb9a8bd90 EFLAGS: 00010086
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RAX: ffff8eefcb7c9760 RBX: 0000000000000013 RCX: ffff8ef0dd42c000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] RBP: ffffd08eb9a8be00 R08: 0000000000000000 R09: 0000000000000000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff89bfd200
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] R13: 0000000000000013 R14: ffffffff89bfd200 R15: ffff8eb1979db000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] FS:  0000000000000000(0000) GS:ffff8ef041434000(0000) knlGS:0000000000000000
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] CR2: 00000000000003d8 CR3: 000000113b642000 CR4: 0000000000350ef0
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025] Call Trace:
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  <TASK>
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  __mem_cgroup_flush_stats+0xf6/0x100
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  flush_memcg_stats_dwork+0x1a/0x50
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  process_one_work+0x191/0x3e0
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  worker_thread+0x2e3/0x420
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? srso_return_thunk+0x5/0x5f
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? __pfx_worker_thread+0x10/0x10
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  kthread+0x10d/0x230
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? __pfx_kthread+0x10/0x10
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ret_from_fork+0x47/0x70
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ? __pfx_kthread+0x10/0x10
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  ret_from_fork_asm+0x1a/0x30
[2025-04-12 18:40:15.554] [Sat Apr 12 18:40:12 2025]  </TASK>
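
A note on reading the fault address: the marked bytes in the Code: line, <48> 8b 82 d8 03 00 00, decode to a 64-bit load from 0x3d8(%rdx), and RDX is 0 in the register dump, which matches CR2 = 0x3d8. In other words the faulting access is a field read at offset 0x3d8 from a NULL base pointer. The sketch below illustrates that arithmetic with a hypothetical layout; `struct fake_cgroup`, `field_at_0x3d8` and `access_address()` are made-up names, and which real field sits at that offset depends on the kernel config and is not determined here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the offset matters, chosen to match the oops. */
struct fake_cgroup {
	unsigned char pad[0x3d8];
	void *field_at_0x3d8; /* unnamed on purpose; the real field is not identified */
};

/* Address the CPU touches when reading that field through 'base'. */
uintptr_t access_address(uintptr_t base)
{
	return base + offsetof(struct fake_cgroup, field_at_0x3d8);
}
```

With base = 0 this yields 0x3d8, the address reported in the BUG line.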


Thread overview: 13+ messages
2025-04-23 17:30 cgroup null pointer dereference Kamaljit Singh
2025-04-23 21:26 ` Waiman Long
2025-04-25  0:53   ` Kamaljit Singh
2025-04-25  1:33     ` Waiman Long
2025-04-25  1:43       ` Waiman Long
2025-04-25  1:49       ` Waiman Long
2025-04-25  2:22         ` Kamaljit Singh
2025-04-25 14:54           ` hch
2025-04-25 15:04             ` Waiman Long
2025-04-25 15:11               ` hch
2025-04-25 15:22                 ` Waiman Long
2025-04-25 15:26                   ` hch
2025-04-25 17:20                     ` Kamaljit Singh
