* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
@ 2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
` (16 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-11-27 2:35 UTC (permalink / raw)
To: linux-nfs, trondmy, jlayton, anna, cel
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307284
lsmod
File: lsmod (text/plain)
Size: 4.96 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307284
---
lsmod
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
@ 2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
` (15 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-11-27 2:35 UTC (permalink / raw)
To: linux-nfs, trondmy, jlayton, anna, cel
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307285
/proc/meminfo
File: meminfo (text/plain)
Size: 1.53 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307285
---
/proc/meminfo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
@ 2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
` (14 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-11-27 2:35 UTC (permalink / raw)
To: linux-nfs, trondmy, jlayton, anna, cel
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307286
/proc/slabinfo
File: slabinfo (text/plain)
Size: 30.92 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307286
---
/proc/slabinfo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (2 preceding siblings ...)
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
@ 2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
` (13 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-11-27 2:35 UTC (permalink / raw)
To: linux-nfs, trondmy, jlayton, anna, cel
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307287
systemctl status
File: systemctl_status (application/octet-stream)
Size: 4.06 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307287
---
systemctl status
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (3 preceding siblings ...)
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
@ 2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
` (12 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-11-27 2:35 UTC (permalink / raw)
To: linux-nfs, trondmy, jlayton, anna, cel
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307288
/proc/vmallocinfo
File: vmallocinfo (text/plain)
Size: 170.08 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307288
---
/proc/vmallocinfo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (4 preceding siblings ...)
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
@ 2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
` (11 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-11-27 2:35 UTC (permalink / raw)
To: linux-nfs, trondmy, jlayton, anna, cel
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307289
/proc/vmstat
File: vmstat (text/plain)
Size: 3.65 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307289
---
/proc/vmstat
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (5 preceding siblings ...)
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
@ 2024-11-27 2:35 ` Chen Chen via Bugspray Bot
2024-12-07 8:35 ` Chen Chen via Bugspray Bot
` (10 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-11-27 2:35 UTC (permalink / raw)
To: linux-nfs, trondmy, jlayton, anna, cel
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307290
oom dmesg from kdump
File: vmcore-dmesg.txt (text/plain)
Size: 535.11 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307290
---
oom dmesg from kdump
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (6 preceding siblings ...)
2024-11-27 2:35 ` Chen Chen via Bugspray Bot
@ 2024-12-07 8:35 ` Chen Chen via Bugspray Bot
2024-12-07 15:30 ` Chuck Lever via Bugspray Bot
` (9 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-12-07 8:35 UTC (permalink / raw)
To: jlayton, cel, linux-nfs, anna, trondmy
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307330
dmesg of another 3 crashes
Since reporting, I have had another 3 crashes, all with nfsd as the crashing task.
First one:
[136965.765431] Out of memory and no killable processes...
[136965.765433] Kernel panic - not syncing: System is deadlocked on memory
[136965.766148] CPU: 2 PID: 1856 Comm: nfsd Kdump: loaded Tainted: G E 6.1.119-1.el9.elrepo.x86_64 #1
[136965.766852] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.22.2 09/12/2024
[136965.767546] Call Trace:
[136965.768230] <TASK>
[136965.768903] dump_stack_lvl+0x45/0x5e
[136965.769571] panic+0x10c/0x2c2
[136965.770231] out_of_memory.cold+0x2f/0x7e
[136965.770874] __alloc_pages_slowpath.constprop.0+0x707/0x9d0
[136965.771518] __alloc_pages+0x35d/0x370
[136965.772147] __alloc_pages_bulk+0x3e5/0x680
[136965.772766] svc_alloc_arg+0x81/0x1f0 [sunrpc]
[136965.773431] svc_recv+0x1f/0x190 [sunrpc]
[136965.774089] ? nfsd_inet6addr_event+0x110/0x110 [nfsd]
[136965.774726] nfsd+0x87/0xc0 [nfsd]
[136965.775347] kthread+0xe5/0x110
[136965.775926] ? kthread_complete_and_exit+0x20/0x20
[136965.776499] ret_from_fork+0x1f/0x30
[136965.777062] </TASK>
Second:
[167723.787640] WARNING: CPU: 3 PID: 1872 at mm/slab_common.c:957 free_large_kmalloc+0x5a/0x80
[167723.787667] Modules linked in: <cut here>
[167723.787874] CPU: 3 PID: 1872 Comm: nfsd Kdump: loaded Not tainted 5.14.0-503.15.1.el9_5.x86_64 #1
[167723.787882] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.22.2 09/12/2024
[167723.787886] RIP: 0010:free_large_kmalloc+0x5a/0x80
Third:
[ 3883.748094] ------------[ cut here ]------------
[ 3883.748105] WARNING: CPU: 9 PID: 1886 at mm/slab_common.c:957 free_large_kmalloc+0x5a/0x80
[ 3883.748131] Modules linked in: <cut here>
[ 3883.748339] CPU: 9 PID: 1886 Comm: nfsd Kdump: loaded Not tainted 5.14.0-503.15.1.el9_5.x86_64 #1
[ 3883.748342] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.22.2 09/12/2024
[ 3883.748344] RIP: 0010:free_large_kmalloc+0x5a/0x80
File: crash.log (text/plain)
Size: 31.77 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307330
---
dmesg of another 3 crashes
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (7 preceding siblings ...)
2024-12-07 8:35 ` Chen Chen via Bugspray Bot
@ 2024-12-07 15:30 ` Chuck Lever via Bugspray Bot
2024-12-10 5:20 ` Chen Chen via Bugspray Bot
` (8 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chuck Lever via Bugspray Bot @ 2024-12-07 15:30 UTC (permalink / raw)
To: anna, jlayton, linux-nfs, cel, trondmy
Chuck Lever writes via Kernel.org Bugzilla:
Hi Chen -
After some review, these all appear to be Red Hat Enterprise kernels. Such kernels are extensively patched and maintained exclusively by Red Hat engineers. I kindly request that you report this issue to Red Hat first and have them troubleshoot it.
If they find there is a needed upstream fix, do feel free to re-open this bug.
[I am a fan of the old ConnectX-3 cards, btw]
View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c9
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (8 preceding siblings ...)
2024-12-07 15:30 ` Chuck Lever via Bugspray Bot
@ 2024-12-10 5:20 ` Chen Chen via Bugspray Bot
2024-12-10 14:45 ` Chuck Lever via Bugspray Bot
` (7 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-12-10 5:20 UTC (permalink / raw)
To: linux-nfs, cel, anna, trondmy, jlayton
Chen Chen writes via Kernel.org Bugzilla:
Hi Mr. Lever,
I *clearly* stated I was using 6.1.119, which is the latest longterm kernel, released on 2024-11-22 and compiled by the ELRepo Project as-is from the upstream tarball.
[136965.766148] CPU: 2 PID: 1856 Comm: nfsd Kdump: loaded Tainted: G E 6.1.119-1.el9.elrepo.x86_64 #1
I encountered the problem on both the shipped RHEL kernel and the latest and second-latest LTS releases, so the bug must still exist upstream. That's why I filed this bug.
Anyway, I encountered another 2 crashes in the last two days and call stack insists nfsd caused it.
View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c10
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (9 preceding siblings ...)
2024-12-10 5:20 ` Chen Chen via Bugspray Bot
@ 2024-12-10 14:45 ` Chuck Lever via Bugspray Bot
2024-12-11 1:15 ` Chen Chen via Bugspray Bot
` (6 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chuck Lever via Bugspray Bot @ 2024-12-10 14:45 UTC (permalink / raw)
To: linux-nfs, cel, anna, trondmy, jlayton
Chuck Lever writes via Kernel.org Bugzilla:
This is what comment 0 says:
> My RHEL9 server with only NFS service often OOMed after a day or two,
> with no userspace memory usage. So I switched to elrepo kernel-lts and
> still the problem persists.
> I'm now using 6.1.119-1.el9.elrepo.x86_64. The problem also occurred on
> (RHEL) 5.14.0-427.40.1.el9_4, (RHEL) 5.14.0-503.14.1.el9_5 and
> 6.1.115-1.el9.elrepo.x86_64.
You mentioned RHEL, and RHEL 9 in particular, several times here. I have no prior knowledge of "the ELRepo Project" -- never heard of it. By "uname" these all look like distro-built kernels to me.
> Anyway, I encountered another 2 crashes in the last two days and
> call stack insists nfsd caused it.
I'm not saying this isn't an NFSD bug. But it might not be a problem in recent kernels. If I'm reading your reports correctly, you have not tested with 6.12 or newer. 6.1.anything is based on a two-year-old code base.
Any fix we create for this issue must be applied to the upstream Linus kernel first. Indeed, a fix might already exist somewhere in upstream. By upstream, I mean the "master" branch in this repo:
https://git.kernel.org./pub/scm/linux/kernel/git/torvalds/linux.git
Therefore the first task is for you to confirm by testing that this branch either still has this issue, in which case we have to troubleshoot further; or does not, in which case you can bisect to find the upstream fix that needs to be backported to the LTS kernels.
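The search described here is a bisection for the first commit where the problem is gone, rather than the first commit where it appears. A minimal Python sketch of that logic follows. It assumes a strictly linear history, which real kernel history is not, and the commit names and the `is_fixed` check are hypothetical stand-ins for building a kernel and running the NFS workload long enough to see whether it still OOMs.

```python
from typing import Callable, Sequence, Tuple


def first_fixed(commits: Sequence[str],
                is_fixed: Callable[[str], bool]) -> Tuple[str, int]:
    """Return the first fixed commit and the number of commits tested.

    Assumes commits are ordered oldest to newest, the oldest is known
    broken, the newest is known fixed, and the history flips from
    broken to fixed exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_fixed(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix is after mid
    return commits[hi], tests


# Hypothetical linear history of 40,000 commits; "c27182" is a pretend fix.
commits = [f"c{i}" for i in range(40_000)]
fix, tests = first_fixed(commits, lambda c: int(c[1:]) >= 27_182)
print(fix, tests)
```

With `git bisect` itself, the same inverted search can be expressed with alternate terms, for example `git bisect start --term-old=broken --term-new=fixed`, after which commits are marked with `git bisect broken` and `git bisect fixed`. Since the reporter saw the OOM only after a day or two of uptime, each test step may take that long.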
View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c11
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (10 preceding siblings ...)
2024-12-10 14:45 ` Chuck Lever via Bugspray Bot
@ 2024-12-11 1:15 ` Chen Chen via Bugspray Bot
2024-12-12 16:00 ` Chuck Lever via Bugspray Bot
` (5 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2024-12-11 1:15 UTC (permalink / raw)
To: cel, anna, jlayton, trondmy, linux-nfs
Chen Chen writes via Kernel.org Bugzilla:
Hi Mr. Lever,
> You mentioned RHEL, and RHEL 9 in particular, several times here.
Because I wanted to indicate that, apart from the kernel, everything else in the toolchain was the latest version from RHEL 9.
The ELRepo Project (https://elrepo.org/) is a group of people who take the latest kernel source and package it into RPMs for easy installation on the latest EL-like releases (RHEL, Oracle Linux, Rocky, Alma, etc.).
> By upstream, I mean the "master" branch in this repo
OK. I've just installed the latest stable kernel (6.12.4) and will see whether it helps.
View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c12
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (11 preceding siblings ...)
2024-12-11 1:15 ` Chen Chen via Bugspray Bot
@ 2024-12-12 16:00 ` Chuck Lever via Bugspray Bot
2024-12-12 16:15 ` Fwd: " Chuck Lever
2025-01-10 16:50 ` Chen Chen via Bugspray Bot
` (4 subsequent siblings)
17 siblings, 1 reply; 21+ messages in thread
From: Chuck Lever via Bugspray Bot @ 2024-12-12 16:00 UTC (permalink / raw)
To: jlayton, linux-nfs, trondmy, cel, anna
Chuck Lever writes via Kernel.org Bugzilla:
From attachment 307290:
[29924.805968] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service/init.scope,task=(sd-pam),pid=4503,uid=0
[29924.805991] Out of memory: Killed process 4503 ((sd-pam)) total-vm:173972kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:96kB oom_score_adj:100
[29925.425864] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[29925.425872] CPU: 0 PID: 1874 Comm: nfsd Kdump: loaded Tainted: G E 6.1.119-1.el9.elrepo.x86_64 #1
[29925.425875] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.22.2 09/12/2024
[29925.425877] Call Trace:
[29925.425880] <TASK>
[29925.425885] dump_stack_lvl+0x45/0x5e
[29925.425893] dump_header+0x4a/0x213
[29925.425897] oom_kill_process.cold+0xb/0x10
[29925.425901] out_of_memory+0xed/0x2e0
[29925.425906] __alloc_pages_slowpath.constprop.0+0x707/0x9d0
[29925.425916] __alloc_pages+0x35d/0x370
[29925.425921] __alloc_pages_bulk+0x3e5/0x680
[29925.425927] svc_alloc_arg+0x81/0x1f0 [sunrpc]
[29925.425991] svc_recv+0x1f/0x190 [sunrpc]
[29925.426043] ? nfsd_inet6addr_event+0x110/0x110 [nfsd]
[29925.426080] nfsd+0x87/0xc0 [nfsd]
[29925.426113] kthread+0xe5/0x110
[29925.426118] ? kthread_complete_and_exit+0x20/0x20
[29925.426122] ret_from_fork+0x1f/0x30
[29925.426129] </TASK>
NFSD is triggering the OOM killer because it frequently allocates up to 256 pages at a time to fill the send and receive buffers. It is not necessarily the source of a leak.
The bulk page allocator is on the slow path here, suggesting there weren't any free pages available on the lists it normally checks first. So it is doing one-at-a-time order-0 allocations, a sign that memory is short.
We see that Node 1 appears to be short on free memory, but the system has not pushed into swap at all. Kernel memory isn't swappable, so whatever is leaking is in the kernel proper.
The slab caches all look reasonably sized, so not likely a slab leak.
At this point we would want someone with some MM expertise to come in and help us nail down the leak.
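One rough way to check the reasoning above against a `/proc/meminfo` snapshot is to subtract the categories the kernel does account for from `MemTotal` and see how much is left over. The Python sketch below does that. The choice of fields is a simplifying assumption, the sample figures are invented for illustration and are not taken from attachment 307285, and a large remainder is only a hint, since pages taken directly from the page allocator are generally not itemized in `/proc/meminfo`.

```python
from typing import Dict

# Fields treated as "accounted for" in this sketch (an assumption, not an
# exhaustive list; Cached already includes shared memory).
ACCOUNTED = ("MemFree", "Buffers", "Cached", "AnonPages",
             "Slab", "KernelStack", "PageTables")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, values as given (kB)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def unaccounted_kb(fields: Dict[str, int]) -> int:
    """MemTotal minus the accounted categories; missing fields count as 0."""
    return fields["MemTotal"] - sum(fields.get(k, 0) for k in ACCOUNTED)


# Invented sample values, for illustration only.
SAMPLE = """\
MemTotal:       65536000 kB
MemFree:          512000 kB
Buffers:            4000 kB
Cached:          1024000 kB
AnonPages:        256000 kB
Slab:             800000 kB
KernelStack:       16000 kB
PageTables:         8000 kB
"""

leftover = unaccounted_kb(parse_meminfo(SAMPLE))
print(f"unaccounted: {leftover} kB")
```

On a live system the same functions could be fed the contents of `/proc/meminfo` at intervals; a remainder that keeps growing while slab and userspace stay flat would be consistent with a leak of raw pages in the kernel.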
View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c13
^ permalink raw reply [flat|nested] 21+ messages in thread
* Fwd: Possible memory leak on nfsd
2024-12-12 16:00 ` Chuck Lever via Bugspray Bot
@ 2024-12-12 16:15 ` Chuck Lever
0 siblings, 0 replies; 21+ messages in thread
From: Chuck Lever @ 2024-12-12 16:15 UTC (permalink / raw)
To: linux-mm, Linux NFS Mailing List
Hi -
An NFSD page allocation on v6.1.y is triggering OOM-killer. The reporter
has provided a lot of detail, and we need some help steering us towards
the possible leak culprit. Any takers?
(We've asked the reporter to reproduce on a more recent kernel if
possible).
-------- Forwarded Message --------
Subject: Re: Possible memory leak on nfsd
Date: Thu, 12 Dec 2024 16:00:17 +0000
From: Chuck Lever via Bugspray Bot <bugbot@kernel.org>
To: jlayton@kernel.org, linux-nfs@vger.kernel.org, trondmy@kernel.org,
cel@kernel.org, anna@kernel.org
Chuck Lever writes via Kernel.org Bugzilla:
From attachment 307290:
[29924.805968]
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service/init.scope,task=(sd-pam),pid=4503,uid=0
[29924.805991] Out of memory: Killed process 4503 ((sd-pam))
total-vm:173972kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0
pgtables:96kB oom_score_adj:100
[29925.425864] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL),
order=0, oom_score_adj=0
[29925.425872] CPU: 0 PID: 1874 Comm: nfsd Kdump: loaded Tainted: G
E 6.1.119-1.el9.elrepo.x86_64 #1
[29925.425875] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS
2.22.2 09/12/2024
[29925.425877] Call Trace:
[29925.425880] <TASK>
[29925.425885] dump_stack_lvl+0x45/0x5e
[29925.425893] dump_header+0x4a/0x213
[29925.425897] oom_kill_process.cold+0xb/0x10
[29925.425901] out_of_memory+0xed/0x2e0
[29925.425906] __alloc_pages_slowpath.constprop.0+0x707/0x9d0
[29925.425916] __alloc_pages+0x35d/0x370
[29925.425921] __alloc_pages_bulk+0x3e5/0x680
[29925.425927] svc_alloc_arg+0x81/0x1f0 [sunrpc]
[29925.425991] svc_recv+0x1f/0x190 [sunrpc]
[29925.426043] ? nfsd_inet6addr_event+0x110/0x110 [nfsd]
[29925.426080] nfsd+0x87/0xc0 [nfsd]
[29925.426113] kthread+0xe5/0x110
[29925.426118] ? kthread_complete_and_exit+0x20/0x20
[29925.426122] ret_from_fork+0x1f/0x30
[29925.426129] </TASK>
NFSD is triggering the OOM killer because it frequently allocates up to
256 pages at a time to fill the send and receive buffers. It is not
necessarily the source of a leak.
The bulk page allocator is on the slow path here, suggesting there
weren't any free pages available on the lists it normally checks first.
So it is doing one-at-a-time order-0 allocations, a sign that memory is
short.
We see that Node 1 appears to be short on free memory, but the system
has not pushed into swap at all. Kernel memory isn't swappable, so
whatever is leaking is in the kernel proper.
The slab caches all look reasonably sized, so not likely a slab leak.
At this point we would want someone with some MM expertise to come in
and help us nail down the leak.
View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c13
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (12 preceding siblings ...)
2024-12-12 16:00 ` Chuck Lever via Bugspray Bot
@ 2025-01-10 16:50 ` Chen Chen via Bugspray Bot
2025-01-10 20:35 ` Chuck Lever
2025-01-22 20:45 ` JJ Jordan via Bugspray Bot
` (3 subsequent siblings)
17 siblings, 1 reply; 21+ messages in thread
From: Chen Chen via Bugspray Bot @ 2025-01-10 16:50 UTC (permalink / raw)
To: anna, linux-nfs, linux-mm, chuck.lever, jlayton, cel, trondmy
Chen Chen writes via Kernel.org Bugzilla:
Sorry for my rudeness in my previous discussion.
After switching to 6.12.4, the server stayed stable for 30 days. So whatever caused the memleak should have been resolved between 6.1.119 to 6.12.
You might want to close this bug if backport is not worthwhile.
View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c15
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2025-01-10 16:50 ` Chen Chen via Bugspray Bot
@ 2025-01-10 20:35 ` Chuck Lever
0 siblings, 0 replies; 21+ messages in thread
From: Chuck Lever @ 2025-01-10 20:35 UTC (permalink / raw)
To: Chen Chen via Bugspray Bot, anna, linux-nfs, linux-mm, jlayton,
cel, trondmy
On 1/10/25 11:50 AM, Chen Chen via Bugspray Bot wrote:
> Chen Chen writes via Kernel.org Bugzilla:
>
> Sorry for my rudeness in my previous discussion.
>
> After switching to 6.12.4, the server stayed stable for 30 days.
That's good news!
> So whatever caused the memleak should have been resolved between 6.1.119 to 6.12.
That's tens of thousands of commits over two years. Unfortunately that
doesn't really tell us what the problem is.
> You might want to close this bug if backport is not worthwhile.
We need to know the exact commit that contains the fix before it can
be determined whether a backport is feasible.
Are you able to bisect between v6.1 and v6.12 ? If not, do you have
a simple, narrow reproducer that we can use to explore this ourselves?
--
Chuck Lever
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (13 preceding siblings ...)
2025-01-10 16:50 ` Chen Chen via Bugspray Bot
@ 2025-01-22 20:45 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
` (2 subsequent siblings)
17 siblings, 0 replies; 21+ messages in thread
From: JJ Jordan via Bugspray Bot @ 2025-01-22 20:45 UTC (permalink / raw)
To: anna, chuck.lever, cel, trondmy, jlayton, linux-nfs, linux-mm
JJ Jordan added an attachment on Kernel.org Bugzilla:
Created attachment 307525
Logs and traces from Jan-18 pt1
Here are the traces from two NFS crashes that occurred this past weekend.
Both occurred in the AM (US time) on Jan 18, a few hours apart from one
another.
I followed the instructions I found on the various threads.
There was no output from `rpcdebug -m rpc -c`; I am not sure what I did wrong
there. The syslog ought to contain the output of sysrq-trigger, however.
The output from trace-cmd captures several days' worth of logs in either
case, but not from system boot.
I have trimmed the syslogs to run from about one hour before the incident until
the system finished shutting down prior to reboot, and removed the output of
other services.
Both are VMs on GCE running the 6.1.119 kernel from Debian bookworm (6.1.0-28),
with ~60Gi memory and 16 CPUs.
File: nfs-traces-250118-pt1.tar.bz2 (application/octet-stream)
Size: 4.61 MiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307525
---
Logs and traces from Jan-18 pt1
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (14 preceding siblings ...)
2025-01-22 20:45 ` JJ Jordan via Bugspray Bot
@ 2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
17 siblings, 0 replies; 21+ messages in thread
From: JJ Jordan via Bugspray Bot @ 2025-01-22 21:25 UTC (permalink / raw)
To: trondmy, linux-mm, anna, jlayton, cel, linux-nfs, chuck.lever
JJ Jordan added an attachment on Kernel.org Bugzilla:
Created attachment 307526
Logs and traces from Jan-18 pt2
Part 2, see previous description
File: nfs-traces-250118-pt2.tar.bz2 (application/octet-stream)
Size: 601.99 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307526
---
Logs and traces from Jan-18 pt2
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (15 preceding siblings ...)
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
@ 2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
17 siblings, 0 replies; 21+ messages in thread
From: JJ Jordan via Bugspray Bot @ 2025-01-22 21:25 UTC (permalink / raw)
To: trondmy, linux-mm, anna, jlayton, cel, linux-nfs, chuck.lever
JJ Jordan added an attachment on Kernel.org Bugzilla:
Comment on attachment 307525
Logs and traces from Jan-18 pt1
This was submitted in error, apologies.
File: nfs-traces-250118-pt1.tar.bz2 (application/octet-stream)
Size: 4.61 MiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307525
---
Logs and traces from Jan-18 pt1
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Possible memory leak on nfsd
2024-11-27 2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
` (16 preceding siblings ...)
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
@ 2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
17 siblings, 0 replies; 21+ messages in thread
From: JJ Jordan via Bugspray Bot @ 2025-01-22 21:25 UTC (permalink / raw)
To: trondmy, linux-mm, anna, jlayton, cel, linux-nfs, chuck.lever
JJ Jordan added an attachment on Kernel.org Bugzilla:
Comment on attachment 307526
Logs and traces from Jan-18 pt2
Also submitted in error.
File: nfs-traces-250118-pt2.tar.bz2 (application/octet-stream)
Size: 601.99 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307526
---
Logs and traces from Jan-18 pt2
^ permalink raw reply [flat|nested] 21+ messages in thread