From: Chen Chen via Bugspray Bot <bugbot@kernel.org>
To: linux-nfs@vger.kernel.org, trondmy@kernel.org,
jlayton@kernel.org, anna@kernel.org, cel@kernel.org
Subject: Possible memory leak on nfsd
Date: Wed, 27 Nov 2024 02:35:07 +0000
Message-ID: <20241127-b219535c0-4d5445e74947@bugzilla.kernel.org>
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307283
sar -r mem usage
My RHEL9 server, which runs only the NFS service, often OOMs after a day or two, even though userspace uses almost no memory. I switched to the elrepo kernel-lts, but the problem persists.
I'm now using 6.1.119-1.el9.elrepo.x86_64. The problem also occurred on (RHEL) 5.14.0-427.40.1.el9_4, (RHEL) 5.14.0-503.14.1.el9_5 and 6.1.115-1.el9.elrepo.x86_64.
I'm not sure it is caused by NFS, but since NFS is the only service running on the server, it is my only suspect. The server has a Mellanox Technologies MT27500 Family [ConnectX-3] InfiniBand card, and NFSoRDMA is enabled. No third-party drivers are used.
The following data were gathered moments before the server OOMed and crashed.
sar shows a typical memory-leak pattern:
Time        kbmemfree   kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive  kbinact kbdirty
01:20:13 AM 390187300 388732764   3501864     0.89      4856   363952   390344    0.09   100680   358384   17148
01:30:13 AM 379492128 378312768  13642416     3.46      4856   909388   390344    0.09   108844   895740      16
01:40:13 AM 367687716 367062060  24851416     6.30      4856  1498272   390344    0.09   116736  1476672      16
01:50:50 AM 361704244 361471420  30437312     7.72      4856  1888780   390344    0.09   127888  1856036   29912
02:00:13 AM 355796296 355848120  36061648     9.15      4856  2173560   390344    0.09   131544  2137152       0
....
09:00:13 AM   1518392  18089616 373760196    94.79      4760 18648816   390344    0.09   470608 18273412      36
09:10:13 AM   1499980  17223900 374626172    95.01      4740 17801676   390344    0.09   471964 17424672    5292
09:20:13 AM   1561896   6784736 385059756    97.66      1712  7338540   423580    0.10   325452  7070372       0
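As a rough illustration of how fast memory disappears in the sar samples above, here is a minimal sketch that computes the average growth between the 01:20:13 and 02:00:13 samples. It assumes the third numeric column is kbmemused (the standard `sar -r` layout) and that both samples fall on the same day; the function names are hypothetical helpers, not part of any tool.

```python
# Two samples copied from the sar output above: (timestamp, used memory in kB).
# Assumption: the third numeric column is kbmemused, as in standard `sar -r`.
SAMPLES = [
    ("01:20:13 AM", 3501864),
    ("02:00:13 AM", 36061648),
]


def to_seconds(stamp):
    """Convert a 12-hour 'HH:MM:SS AM/PM' timestamp to seconds since midnight."""
    clock, ampm = stamp.split()
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    hours = hours % 12 + (12 if ampm.upper() == "PM" else 0)
    return hours * 3600 + minutes * 60 + seconds


def growth_rate_kb_per_s(samples):
    """Average growth of used memory in kB/s between the first and last sample."""
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    elapsed = to_seconds(t1) - to_seconds(t0)
    return (used1 - used0) / elapsed


rate = growth_rate_kb_per_s(SAMPLES)
gib_per_hour = rate * 3600 / 1024 / 1024
print(f"{rate:.0f} kB/s, about {gib_per_hour:.1f} GiB per hour")
```

At that pace a machine with roughly 376 GiB of RAM would be exhausted in about eight hours, which is in line with the 09:20:13 sample.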
meminfo also does not show what is using the RAM:
MemTotal: 394292660 kB
MemFree: 1551296 kB
MemAvailable: 6776108 kB
Buffers: 1712 kB
Cached: 7340144 kB
SwapCached: 4308 kB
Active: 325936 kB
Inactive: 7071836 kB
...
KReclaimable: 129816 kB
Slab: 331596 kB
SReclaimable: 129816 kB
SUnreclaim: 201780 kB
...
VmallocUsed: 319528 kB
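To make the gap in the meminfo excerpt above concrete, here is a rough sketch that subtracts the listed counters from MemTotal. It is only an approximation: it uses only the fields shown above, ignores the fields elided with "...", and leaves out Active/Inactive because they largely overlap with Cached. The function name is a hypothetical helper.

```python
# Values in kB, copied from the meminfo excerpt above.
# Fields elided with "..." in the report are not included here.
meminfo = {
    "MemTotal": 394292660,
    "MemFree": 1551296,
    "Buffers": 1712,
    "Cached": 7340144,
    "SwapCached": 4308,
    "Slab": 331596,
    "VmallocUsed": 319528,
}


def unaccounted_kb(info):
    """MemTotal minus every other listed counter (a rough estimate only)."""
    accounted = sum(value for key, value in info.items() if key != "MemTotal")
    return info["MemTotal"] - accounted


missing = unaccounted_kb(meminfo)
print(f"unaccounted: {missing} kB (~{missing / 1024 / 1024:.1f} GiB)")
```

Under these assumptions, well over 90% of MemTotal is not explained by any of the listed counters.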
slabinfo is low. Attached.
vmallocinfo doesn't have much. Attached.
The dmesg log shows that the OOM killer had killed nearly every userspace program:
[29960.547403] Tasks state (memory values in pages):
[29960.547404] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[29960.547412] [ 1020] 0 1020 9498 640 94208 1000 -1000 systemd-udevd
[29960.547417] [ 1247] 0 1247 105208 6888 126976 0 -1000 multipathd
[29960.547421] [ 1342] 0 1342 23190 330 65536 764 -1000 auditd
[29960.547428] [ 1472] 0 1472 4185 806 73728 357 -1000 sshd
[29960.547438] Out of memory and no killable processes...
[29960.547439] Kernel panic - not syncing: System is deadlocked on memory
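For scale, the resident memory of the tasks still listed in the log above can be summed with a small sketch like the one below. It assumes 4 KiB pages (the default on x86_64, since the table reports memory values in pages); the regular expression and function name are illustrative and written only against the lines quoted here.

```python
import re

PAGE_KIB = 4  # Assumption: 4 KiB pages, the default on x86_64.

# Task lines copied from the dmesg excerpt above.
TASK_LINES = """\
[29960.547412] [ 1020] 0 1020 9498 640 94208 1000 -1000 systemd-udevd
[29960.547417] [ 1247] 0 1247 105208 6888 126976 0 -1000 multipathd
[29960.547421] [ 1342] 0 1342 23190 330 65536 764 -1000 auditd
[29960.547428] [ 1472] 0 1472 4185 806 73728 357 -1000 sshd
"""

# Columns: [timestamp] [pid] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
TASK_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)$"
)


def total_rss_kib(text):
    """Sum the rss column (in pages) over all task lines and convert to KiB."""
    total_pages = 0
    for line in text.splitlines():
        match = TASK_RE.match(line.strip())
        if match:
            total_pages += int(match.group(5))  # group 5 is the rss column
    return total_pages * PAGE_KIB


rss_kib = total_rss_kib(TASK_LINES)
print(f"total RSS of remaining tasks: {rss_kib} KiB (~{rss_kib / 1024:.1f} MiB)")
```

The four remaining tasks together hold only a few tens of MiB, so they cannot explain the exhaustion of the machine's RAM.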
systemctl status attached. Nothing else is running.
I have a 224G vmcore dump but have no idea how to analyze it, and I think it is too big to upload anywhere.
I would appreciate any help in working out what went wrong.
File: sar (text/plain)
Size: 6.95 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307283
---
sar -r mem usage
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)