From: Chen Chen via Bugspray Bot <bugbot@kernel.org>
To: linux-nfs@vger.kernel.org, trondmy@kernel.org,
jlayton@kernel.org, anna@kernel.org, cel@kernel.org
Subject: Possible memory leak on nfsd
Date: Wed, 27 Nov 2024 02:35:07 +0000
Message-ID: <20241127-b219535c0-4d5445e74947@bugzilla.kernel.org>
Chen Chen added an attachment on Kernel.org Bugzilla:
Created attachment 307283
sar -r mem usage
My RHEL9 server, which runs only the NFS service, often OOMs after a day or two even though userspace uses almost no memory. I switched to the elrepo kernel-lts, but the problem persists.
I'm now using 6.1.119-1.el9.elrepo.x86_64. The problem also occurred on (RHEL) 5.14.0-427.40.1.el9_4, (RHEL) 5.14.0-503.14.1.el9_5 and 6.1.115-1.el9.elrepo.x86_64.
I'm not sure it is caused by NFS, but since NFS is the only service running on the server, it is my main suspect. The server has a Mellanox Technologies MT27500 Family [ConnectX-3] InfiniBand card and NFSoRDMA is enabled. No third-party drivers are used.
The following data were gathered moments before it OOMed and crashed.
sar showed a typical memory-leak pattern:
01:20:13 AM 390187300 388732764 3501864 0.89 4856 363952 390344 0.09 100680 358384 17148
01:30:13 AM 379492128 378312768 13642416 3.46 4856 909388 390344 0.09 108844 895740 16
01:40:13 AM 367687716 367062060 24851416 6.30 4856 1498272 390344 0.09 116736 1476672 16
01:50:50 AM 361704244 361471420 30437312 7.72 4856 1888780 390344 0.09 127888 1856036 29912
02:00:13 AM 355796296 355848120 36061648 9.15 4856 2173560 390344 0.09 131544 2137152 0
....
09:00:13 AM 1518392 18089616 373760196 94.79 4760 18648816 390344 0.09 470608 18273412 36
09:10:13 AM 1499980 17223900 374626172 95.01 4740 17801676 390344 0.09 471964 17424672 5292
09:20:13 AM 1561896 6784736 385059756 97.66 1712 7338540 423580 0.10 325452 7070372 0
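As a rough illustration of how the growth rate can be read off the sar rows above, here is a minimal Python sketch. The pasted rows have no header, so the column names in `COLUMNS` are an assumption (the usual `sar -r` order in sysstat 12.x, with the third value being `kbmemused`); the helper names are hypothetical.

```python
from datetime import datetime

# Assumption: the usual `sar -r` column order; the pasted rows omit the header.
COLUMNS = ["kbmemfree", "kbavail", "kbmemused", "pct_memused", "kbbuffers",
           "kbcached", "kbcommit", "pct_commit", "kbactive", "kbinact", "kbdirty"]


def parse_row(line):
    """Split one sar row into (timestamp, {column: value})."""
    parts = line.split()
    # The first two tokens are the 12-hour clock time and AM/PM.
    ts = datetime.strptime(" ".join(parts[:2]), "%I:%M:%S %p")
    values = dict(zip(COLUMNS, (float(p) for p in parts[2:])))
    return ts, values


def growth_kb_per_s(first, last):
    """Average kbmemused growth between two rows of the same day, in kB/s."""
    (t0, v0), (t1, v1) = parse_row(first), parse_row(last)
    return (v1["kbmemused"] - v0["kbmemused"]) / (t1 - t0).total_seconds()


# Two rows copied from the report above.
first = "01:20:13 AM 390187300 388732764 3501864 0.89 4856 363952 390344 0.09 100680 358384 17148"
last = "02:00:13 AM 355796296 355848120 36061648 9.15 4856 2173560 390344 0.09 131544 2137152 0"

rate = growth_kb_per_s(first, last)
print(f"kbmemused grows by roughly {rate * 3600 / 1024**2:.1f} GiB per hour")
```

The sketch only compares two samples from the same day; a fuller analysis would fit all rows in the attached sar file.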
meminfo also didn't show what was using the RAM:
MemTotal: 394292660 kB
MemFree: 1551296 kB
MemAvailable: 6776108 kB
Buffers: 1712 kB
Cached: 7340144 kB
SwapCached: 4308 kB
Active: 325936 kB
Inactive: 7071836 kB
...
KReclaimable: 129816 kB
Slab: 331596 kB
SReclaimable: 129816 kB
SUnreclaim: 201780 kB
...
VmallocUsed: 319528 kB
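To make the gap concrete, the following Python sketch subtracts the categories quoted above from MemTotal. It is only an approximation: the fields elided with "..." (for example anonymous pages and page tables) are not included because they are not shown here, so the result is an upper estimate of memory that meminfo does not attribute to anything. The function and variable names are hypothetical.

```python
MEMINFO = """\
MemTotal: 394292660 kB
MemFree: 1551296 kB
MemAvailable: 6776108 kB
Buffers: 1712 kB
Cached: 7340144 kB
...
Slab: 331596 kB
...
VmallocUsed: 319528 kB
"""


def parse_meminfo(text):
    """Return {field: value in kB} for every 'Name: value kB' line."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip the "..." elisions
        fields[key.strip()] = int(rest.split()[0])
    return fields


info = parse_meminfo(MEMINFO)

# Categories visible in the excerpt; elided fields are deliberately left out.
accounted = sum(info[k] for k in ("MemFree", "Buffers", "Cached", "Slab", "VmallocUsed"))
unaccounted = info["MemTotal"] - accounted

print(f"accounted:   {accounted} kB")
print(f"unaccounted: {unaccounted} kB (~{unaccounted / 1024**2:.1f} GiB)")
```

Memory allocated directly from the page allocator by kernel code does not appear in these categories, which is consistent with, but does not prove, a kernel-side leak.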
slabinfo is low. Attached.
vmallocinfo doesn't have much. Attached.
The dmesg log showed that the OOM killer had killed nearly every userspace program:
[29960.547403] Tasks state (memory values in pages):
[29960.547404] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[29960.547412] [ 1020] 0 1020 9498 640 94208 1000 -1000 systemd-udevd
[29960.547417] [ 1247] 0 1247 105208 6888 126976 0 -1000 multipathd
[29960.547421] [ 1342] 0 1342 23190 330 65536 764 -1000 auditd
[29960.547428] [ 1472] 0 1472 4185 806 73728 357 -1000 sshd
[29960.547438] Out of memory and no killable processes...
[29960.547439] Kernel panic - not syncing: System is deadlocked on memory
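As a quick check of how little memory the surviving tasks hold, the sketch below sums the rss column of the task lines quoted above. It assumes 4 KiB pages (the usual base page size on x86_64) and covers only the four tasks shown in this excerpt; the regular expression and names are hypothetical helpers.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB base pages on x86_64

# Matches the "[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name" rows.
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)"
)

DMESG = """\
[29960.547412] [ 1020] 0 1020 9498 640 94208 1000 -1000 systemd-udevd
[29960.547417] [ 1247] 0 1247 105208 6888 126976 0 -1000 multipathd
[29960.547421] [ 1342] 0 1342 23190 330 65536 764 -1000 auditd
[29960.547428] [ 1472] 0 1472 4185 806 73728 357 -1000 sshd
"""

tasks = [m.groupdict() for m in map(TASK_RE.search, DMESG.splitlines()) if m]
total_rss_pages = sum(int(t["rss"]) for t in tasks)

print(f"{len(tasks)} tasks, rss = {total_rss_pages} pages "
      f"(~{total_rss_pages * PAGE_KIB} KiB)")
```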
systemctl status attached. Nothing else is running.
I have a 224G vmcore dump but have no idea how to analyze it, and I think it is too big to upload anywhere.
I would appreciate any help in finding out what went wrong.
File: sar (text/plain)
Size: 6.95 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307283
---
sar -r mem usage
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)