* Possible memory leak on nfsd
@ 2024-11-27  2:35 Chen Chen via Bugspray Bot
From: Chen Chen via Bugspray Bot @ 2024-11-27  2:35 UTC
  To: linux-nfs, trondmy, jlayton, anna, cel

Chen Chen added an attachment on Kernel.org Bugzilla:

Created attachment 307283
sar -r mem usage

My RHEL9 server, which runs only the NFS service, often runs out of memory (OOM) after a day or two, even though userspace memory usage is negligible. I switched to the elrepo kernel-lts, but the problem persists.

I'm now using 6.1.119-1.el9.elrepo.x86_64. The problem also occurred on (RHEL) 5.14.0-427.40.1.el9_4, (RHEL) 5.14.0-503.14.1.el9_5 and 6.1.115-1.el9.elrepo.x86_64.

I'm not certain NFS is the cause, but since it is the only service running on the server, it is my main suspect. The server has a Mellanox Technologies MT27500 Family [ConnectX-3] InfiniBand card, and NFSoRDMA is enabled. No third-party drivers are used.

The following data were gathered moments before the server ran out of memory and crashed:

sar -r shows the typical signature of a memory leak: kbmemused climbs from under 1% to almost 98% of RAM, while kbcommit stays nearly flat.
            kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
01:20:13 AM 390187300 388732764   3501864      0.89      4856    363952    390344      0.09    100680    358384     17148
01:30:13 AM 379492128 378312768  13642416      3.46      4856    909388    390344      0.09    108844    895740        16
01:40:13 AM 367687716 367062060  24851416      6.30      4856   1498272    390344      0.09    116736   1476672        16
01:50:50 AM 361704244 361471420  30437312      7.72      4856   1888780    390344      0.09    127888   1856036     29912
02:00:13 AM 355796296 355848120  36061648      9.15      4856   2173560    390344      0.09    131544   2137152         0
...
09:00:13 AM   1518392  18089616 373760196     94.79      4760  18648816    390344      0.09    470608  18273412        36
09:10:13 AM   1499980  17223900 374626172     95.01      4740  17801676    390344      0.09    471964  17424672      5292
09:20:13 AM   1561896   6784736 385059756     97.66      1712   7338540    423580      0.10    325452   7070372         0
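To put a rough number on the growth rate, here is a small Python sketch that uses only three of the kbmemused values quoted above (the intermediate rows are elided, so these are interval averages, not a per-sample trace). sar derives kbmemused by subtracting kbmemfree, kbbuffers, kbcached and slab from total RAM, so page cache is already excluded from these figures.

```python
from datetime import datetime

# (timestamp, kbmemused) pairs copied from the sar -r rows quoted above.
# kbmemused already excludes free memory, buffers, page cache and slab.
SAMPLES = [
    ("01:20:13 AM", 3501864),
    ("02:00:13 AM", 36061648),
    ("09:20:13 AM", 385059756),
]

TIME_FMT = "%I:%M:%S %p"  # sar's 12-hour clock format


def gib_per_hour(earlier, later):
    """Average kbmemused growth between two samples, in GiB per hour."""
    (t0, kb0), (t1, kb1) = earlier, later
    elapsed = datetime.strptime(t1, TIME_FMT) - datetime.strptime(t0, TIME_FMT)
    hours = elapsed.total_seconds() / 3600
    return (kb1 - kb0) / 1024**2 / hours


# Print the average rate for each consecutive pair of samples.
for earlier, later in zip(SAMPLES, SAMPLES[1:]):
    rate = gib_per_hour(earlier, later)
    print(f"{earlier[0]} -> {later[0]}: {rate:.1f} GiB/h")
```

Both intervals come out at roughly 45 to 47 GiB per hour, i.e. the growth looks steady rather than bursty.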

/proc/meminfo likewise doesn't show what is using the RAM:
MemTotal:       394292660 kB
MemFree:         1551296 kB
MemAvailable:    6776108 kB
Buffers:            1712 kB
Cached:          7340144 kB
SwapCached:         4308 kB
Active:           325936 kB
Inactive:        7071836 kB
...
KReclaimable:     129816 kB
Slab:             331596 kB
SReclaimable:     129816 kB
SUnreclaim:       201780 kB
...
VmallocUsed:      319528 kB
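A quick way to see the gap is to subtract the memory consumers listed in the excerpt above from the memory in use. The sketch below is only a rough estimate: it uses just the quoted fields, and anything hidden behind the "..." lines (for example AnonPages or PageTables) is not counted.

```python
# Values (kB) copied from the /proc/meminfo excerpt above. Fields elided
# with "..." in the report are NOT included, so this is only a rough estimate.
MEMINFO_KB = {
    "MemTotal": 394292660,
    "MemFree": 1551296,
    "Buffers": 1712,
    "Cached": 7340144,
    "Slab": 331596,
    "VmallocUsed": 319528,
}


def unaccounted_kb(m):
    """Memory in use that the quoted consumer fields do not explain."""
    in_use = m["MemTotal"] - m["MemFree"]
    explained = m["Buffers"] + m["Cached"] + m["Slab"] + m["VmallocUsed"]
    return in_use - explained


gap = unaccounted_kb(MEMINFO_KB)
total = MEMINFO_KB["MemTotal"]
print(f"unexplained: {gap / 1024**2:.1f} GiB of {total / 1024**2:.1f} GiB")
```

With these numbers, roughly 367 GiB of the ~376 GiB total is not attributed to page cache, buffers, slab or vmalloc.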

slabinfo shows low slab usage (attached).

vmallocinfo doesn't show much either (attached).

The dmesg log shows that the OOM killer had already killed nearly every userspace program:
[29960.547403] Tasks state (memory values in pages):
[29960.547404] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[29960.547412] [   1020]     0  1020     9498      640    94208     1000         -1000 systemd-udevd
[29960.547417] [   1247]     0  1247   105208     6888   126976        0         -1000 multipathd
[29960.547421] [   1342]     0  1342    23190      330    65536      764         -1000 auditd
[29960.547428] [   1472]     0  1472     4185      806    73728      357         -1000 sshd
[29960.547438] Out of memory and no killable processes...
[29960.547439] Kernel panic - not syncing: System is deadlocked on memory
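The four tasks left in the OOM report all have oom_score_adj -1000, which exempts them from the OOM killer, hence "no killable processes". Their combined resident memory is tiny. The sketch below sums the rss column (reported in pages) from the rows quoted above; the 4 KiB page size is an assumption (the x86_64 default), not something printed in the log.

```python
# Task rows copied verbatim from the OOM report above.
OOM_TASK_ROWS = [
    "[29960.547412] [   1020]     0  1020     9498      640    94208     1000         -1000 systemd-udevd",
    "[29960.547417] [   1247]     0  1247   105208     6888   126976        0         -1000 multipathd",
    "[29960.547421] [   1342]     0  1342    23190      330    65536      764         -1000 auditd",
    "[29960.547428] [   1472]     0  1472     4185      806    73728      357         -1000 sshd",
]

# Assumption: 4 KiB pages (x86_64 default); the OOM report prints rss in pages.
PAGE_SIZE_KIB = 4


def parse_task(row):
    """Return (name, rss_pages) from one 'Tasks state' row."""
    # Text after the last ']' is:
    # uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
    fields = row.rsplit("]", 1)[1].split()
    return fields[-1], int(fields[3])


tasks = [parse_task(r) for r in OOM_TASK_ROWS]
total_kib = sum(rss for _, rss in tasks) * PAGE_SIZE_KIB

for name, rss in tasks:
    print(f"{name:15} {rss * PAGE_SIZE_KIB:>8} KiB")
print(f"{'total':15} {total_kib:>8} KiB (~{total_kib / 1024:.1f} MiB)")
```

Under that page-size assumption the survivors hold only about 34 MiB between them, which is consistent with the memory being held outside userspace.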

The systemctl status output is attached; nothing else is running.

I have a 224 GB vmcore dump but don't know how to analyze it, and I think it is too large to upload anywhere.

I'd appreciate any help figuring out what went wrong.

File: sar (text/plain)
Size: 6.95 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307283
---
sar -r mem usage

You can reply to this message to join the discussion.
-- 
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)




Thread overview: 21+ messages
2024-11-27  2:35 Possible memory leak on nfsd Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-11-27  2:35 ` Chen Chen via Bugspray Bot
2024-12-07  8:35 ` Chen Chen via Bugspray Bot
2024-12-07 15:30 ` Chuck Lever via Bugspray Bot
2024-12-10  5:20 ` Chen Chen via Bugspray Bot
2024-12-10 14:45 ` Chuck Lever via Bugspray Bot
2024-12-11  1:15 ` Chen Chen via Bugspray Bot
2024-12-12 16:00 ` Chuck Lever via Bugspray Bot
2024-12-12 16:15   ` Fwd: " Chuck Lever
2025-01-10 16:50 ` Chen Chen via Bugspray Bot
2025-01-10 20:35   ` Chuck Lever
2025-01-22 20:45 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
2025-01-22 21:25 ` JJ Jordan via Bugspray Bot
