public inbox for linux-nfs@vger.kernel.org
* Possible memory leak on nfsd
From: Chen Chen via Bugspray Bot @ 2024-11-27  2:35 UTC
  To: linux-nfs, trondmy, jlayton, anna, cel

Chen Chen added an attachment on Kernel.org Bugzilla:

Created attachment 307283
sar -r mem usage

My RHEL 9 server, which runs only the NFS service, often OOMs after a day or two with almost no userspace memory usage. I switched to the elrepo kernel-lts kernel, but the problem persists.

I'm now using 6.1.119-1.el9.elrepo.x86_64. The problem also occurred on (RHEL) 5.14.0-427.40.1.el9_4, (RHEL) 5.14.0-503.14.1.el9_5 and 6.1.115-1.el9.elrepo.x86_64.

I'm not sure the problem is caused by NFS, but since NFS is the only service running on the server, it is my only suspect. The server has a Mellanox Technologies MT27500 Family [ConnectX-3] InfiniBand card, and NFSoRDMA is enabled. No third-party drivers are used.

The following data were gathered moments before the server ran out of memory and crashed.

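sar -r shows a typical memory-leak pattern (values in kB):
Time        kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty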
01:20:13 AM 390187300 388732764   3501864      0.89      4856    363952    390344      0.09    100680    358384     17148
01:30:13 AM 379492128 378312768  13642416      3.46      4856    909388    390344      0.09    108844    895740        16
01:40:13 AM 367687716 367062060  24851416      6.30      4856   1498272    390344      0.09    116736   1476672        16
01:50:50 AM 361704244 361471420  30437312      7.72      4856   1888780    390344      0.09    127888   1856036     29912
02:00:13 AM 355796296 355848120  36061648      9.15      4856   2173560    390344      0.09    131544   2137152         0
....
09:00:13 AM   1518392  18089616 373760196     94.79      4760  18648816    390344      0.09    470608  18273412        36
09:10:13 AM   1499980  17223900 374626172     95.01      4740  17801676    390344      0.09    471964  17424672      5292
09:20:13 AM   1561896   6784736 385059756     97.66      1712   7338540    423580      0.10    325452   7070372         0
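
For a rough idea of how fast the memory disappears, the first five samples above can be reduced to an average growth rate. This is only a sketch: it assumes the third numeric column is kbmemused in kB (the usual sar -r layout), and the names samples and growth_kb_per_min are made up for illustration.

```python
from datetime import datetime

# (timestamp, kbmemused) pairs copied from the first five sar samples above.
# Assumption: the third numeric column is kbmemused, in kB.
samples = [
    ("01:20:13 AM", 3501864),
    ("01:30:13 AM", 13642416),
    ("01:40:13 AM", 24851416),
    ("01:50:50 AM", 30437312),
    ("02:00:13 AM", 36061648),
]


def growth_kb_per_min(rows):
    """Average growth between the first and last sample, in kB per minute."""
    t0 = datetime.strptime(rows[0][0], "%I:%M:%S %p")
    t1 = datetime.strptime(rows[-1][0], "%I:%M:%S %p")
    minutes = (t1 - t0).total_seconds() / 60
    return (rows[-1][1] - rows[0][1]) / minutes


rate = growth_kb_per_min(samples)
print(f"kbmemused grows by about {rate / 1024:.0f} MiB per minute")
```

The growth is not perfectly linear, so the number is only an estimate, but a rate of this size would fill the machine in roughly eight hours, which matches the 09:20 sample.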

meminfo also shows nothing that accounts for the RAM in use:
MemTotal:       394292660 kB
MemFree:         1551296 kB
MemAvailable:    6776108 kB
Buffers:            1712 kB
Cached:          7340144 kB
SwapCached:         4308 kB
Active:           325936 kB
Inactive:        7071836 kB
...
KReclaimable:     129816 kB
Slab:             331596 kB
SReclaimable:     129816 kB
SUnreclaim:       201780 kB
...
VmallocUsed:      319528 kB
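
To put a number on "nothing accounts for the RAM", the counters visible in the excerpt can be subtracted from MemTotal. This is a sketch under a loud assumption: the fields hidden behind "..." are not known here and are not subtracted, so the result is an upper bound on the unaccounted memory. The helper name unaccounted_kb is hypothetical.

```python
# Values in kB, copied from the /proc/meminfo excerpt above.
# Assumption: fields elided by "..." in the report are unknown and are
# therefore not subtracted, so the result is an upper bound.
meminfo = {
    "MemTotal": 394292660,
    "MemFree": 1551296,
    "Buffers": 1712,
    "Cached": 7340144,
    "Slab": 331596,
    "VmallocUsed": 319528,
}


def unaccounted_kb(info):
    """MemTotal minus every other counter that is visible in the excerpt."""
    accounted = sum(v for k, v in info.items() if k != "MemTotal")
    return info["MemTotal"] - accounted


gap = unaccounted_kb(meminfo)
share = 100 * gap / meminfo["MemTotal"]
print(f"not attributed to any listed counter: {gap} kB ({share:.1f}% of MemTotal)")
```

Slab is used rather than SReclaimable plus SUnreclaim, since it is already the sum of the two and counting all three would subtract slab memory twice.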

slabinfo shows low slab usage, and vmallocinfo shows little as well. Both are attached.

The dmesg log shows that the OOM killer had already killed nearly every userspace program:
[29960.547403] Tasks state (memory values in pages):
[29960.547404] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[29960.547412] [   1020]     0  1020     9498      640    94208     1000         -1000 systemd-udevd
[29960.547417] [   1247]     0  1247   105208     6888   126976        0         -1000 multipathd
[29960.547421] [   1342]     0  1342    23190      330    65536      764         -1000 auditd
[29960.547428] [   1472]     0  1472     4185      806    73728      357         -1000 sshd
[29960.547438] Out of memory and no killable processes...
[29960.547439] Kernel panic - not syncing: System is deadlocked on memory
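
The task dump can be totalled to confirm that the surviving userspace processes hold almost no memory. This is a minimal sketch that parses only the four rows quoted above. It assumes 4 KiB pages (the x86_64 default), and the regular expression and the helper rss_by_task are illustrative, not part of any kernel tooling.

```python
import re

# The task rows quoted above from the OOM killer's dump (rss is in pages).
dump = """\
[29960.547404] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[29960.547412] [   1020]     0  1020     9498      640    94208     1000         -1000 systemd-udevd
[29960.547417] [   1247]     0  1247   105208     6888   126976        0         -1000 multipathd
[29960.547421] [   1342]     0  1342    23190      330    65536      764         -1000 auditd
[29960.547428] [   1472]     0  1472     4185      806    73728      357         -1000 sshd
"""

# One data row: dmesg timestamp, [pid], then the numeric columns and the name.
ROW = re.compile(
    r"^\[\s*[\d.]+\]\s+\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<pgtables>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<adj>-?\d+)\s+(?P<name>\S+)$"
)

PAGE_KIB = 4  # assumption: 4 KiB pages, the x86_64 default


def rss_by_task(text):
    """Map task name to resident set size in KiB; the header row is skipped."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            result[m["name"]] = int(m["rss"]) * PAGE_KIB
    return result


rss = rss_by_task(dump)
total_kib = sum(rss.values())
print(f"{len(rss)} tasks, {total_kib} KiB resident in total")
```

The total comes to a few tens of MiB, which is negligible next to the MemTotal reported in meminfo.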

The systemctl status output is attached. Nothing else is running.

I have a 224G vmcore dump but do not know how to analyze it, and I think it is too big to upload anywhere.

I would appreciate any help in finding out what went wrong.

File: sar (text/plain)
Size: 6.95 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307283
---
sar -r mem usage

You can reply to this message to join the discussion.
-- 
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)




Thread overview: 21+ messages
2024-11-27  2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-12-07  8:35 ` Chen Chen via Bugspray Bot
2024-12-07 15:30 ` Chuck Lever via Bugspray Bot
2024-12-10  5:20 ` Chen Chen via Bugspray Bot
2024-12-10 14:45 ` Chuck Lever via Bugspray Bot
2024-12-11  1:15 ` Chen Chen via Bugspray Bot
2024-12-12 16:00 ` Chuck Lever via Bugspray Bot
2024-12-12 16:15   ` Fwd: " Chuck Lever
2025-01-10 16:50 ` Chen Chen via Bugspray Bot
2025-01-10 20:35   ` Chuck Lever
2025-01-22 20:45 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
