From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail.haak.id.au (mail.haak.id.au [172.105.183.32])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C3D90275861
	for ; Fri, 5 Dec 2025 22:27:09 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=172.105.183.32
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1764973633; cv=none;
	b=r6TTp2YPZ7qisb4/urN3OYa6D9otplI9Orp1EaQotxIqtNB19eq1WGPgzwdFHedo7m5PGUEbicWuOaYF8Ams9ZX92gvWjimvvvbWhkVB59tZr5QTixBsnaBmEf4svqgdcZl7Icy+btstogT8dxxp5h1Glm0/ddddHb8DrTgsGvI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1764973633; c=relaxed/simple;
	bh=W45P5MNABbQIVqkKcs2dY/zeDcCfCWHdf4kuPeTpARU=;
	h=Date:From:To:Subject:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=tpGgCJ5uY/ack75xfMep/oGsBxDGTU2xBNqIZ8ZcM2UE/6T/HHAAfkNIVjP/l2BMVy7LdT+dFHMeCdvoNFjIZnbeDozpN+/cVQh0dJuhcP5ItJ+AJBfRa/IjSPGGGzwoisQdgYQ6b1xhD6/s+n1Eb+pQqFoGfZGy6ywL49Gdyc0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=quarantine dis=none) header.from=haak.id.au;
	spf=pass smtp.mailfrom=haak.id.au;
	dkim=pass (2048-bit key) header.d=haak.id.au header.i=@haak.id.au
	header.b=OZJzCCCk; arc=none smtp.client-ip=172.105.183.32
Authentication-Results: smtp.subspace.kernel.org;
	dmarc=pass (p=quarantine dis=none) header.from=haak.id.au
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=haak.id.au
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=haak.id.au header.i=@haak.id.au
	header.b="OZJzCCCk"
Received: from xps15mal (180-150-104-78.b49668.bne.static.aussiebb.net [180.150.104.78])
	by mail.haak.id.au (Postfix) with ESMTPSA id 795BF8325D
	for ; Sat, 06 Dec 2025 08:23:40 +1000 (AEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=haak.id.au;
	s=202002; t=1764973420;
	bh=W45P5MNABbQIVqkKcs2dY/zeDcCfCWHdf4kuPeTpARU=;
	h=Date:From:To:Subject:From;
	b=OZJzCCCkP/W5KvQ/hFvKeFWj9YCReBVVndw36/7ORI0dOs6vxrIABMMq4knd8vFJg
	 g07E+36qORAmY3/00ZdmHMyndibLs6kGCU5I9G7bOIeJ6+1Eiyht6IstCdoZp0ZwAv
	 An+3GovDNplUhloInV7ZAFU9CdU+Fy/GtGSeUAIrT44F3nkP6RYmWn5GYzLm4FBspD
	 TaV8KUPtJqNQZXRpGE76Wm0N1JoEHv4BcP9Ji59bGFXoHihBoR2XLcR1En22NVKEqG
	 Ta41C72wIHNB5v6loMO/JHXaXVS8WL6DpDrcqj/ydumgS7MascnFAQ7GdBGirvRJgQ
	 7nM119yzBJH+A==
Date: Sat, 6 Dec 2025 08:23:36 +1000
From: Mal Haak
To: linux-kernel@vger.kernel.org
Subject: Re: Possible memory leak in 6.17.7
Message-ID: <20251206082336.6e04a1ac@xps15mal>
In-Reply-To: <20251120122351.231513e1@xps15mal>
References: <20251110182008.71e0858b@xps15mal>
	<20251120122351.231513e1@xps15mal>
X-Mailer: Claws Mail 4.3.1 (GTK 3.24.51; x86_64-pc-linux-gnu)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I have a reproducer. It's slow, but it works: I kept rsync running for
two days moving 5TB of files.

smem -wp:

Area                           Used      Cache   Noncache
firmware/hardware             0.00%      0.00%      0.00%
kernel image                  0.00%      0.00%      0.00%
kernel dynamic memory        98.81%      1.69%     97.13%
userspace memory              0.08%      0.05%      0.03%
free memory                   1.11%      1.11%      0.00%

[root@kerneltest ~]# uname -a
Linux kerneltest 6.18.0-1-mainline #1 SMP PREEMPT_DYNAMIC Tue, 11 Nov 2025 00:02:22 +0000 x86_64 GNU/Linux

The issue is still present in 6.18.

On Thu, 20 Nov 2025 12:23:51 +1000
Mal Haak wrote:

> On Mon, 10 Nov 2025 18:20:08 +1000
> Mal Haak wrote:
>
> > Hello,
> >
> > I have found a memory leak in 6.17.7, but I am unsure how to track
> > it down effectively.
> >
> > I am running a server (a VM) with a heavy read/write workload on a
> > cephfs file system.
> >
> > Over time, the non-cache usage of kernel dynamic memory increases.
> > The kernel seems to think the pages are reclaimable, but nothing
> > appears to trigger the reclaim.
> > This leads to workloads being killed by the OOM killer.
> >
> > smem -wp output:
> >
> > Area                           Used      Cache   Noncache
> > firmware/hardware             0.00%      0.00%      0.00%
> > kernel image                  0.00%      0.00%      0.00%
> > kernel dynamic memory        88.21%     36.25%     51.96%
> > userspace memory              9.49%      0.15%      9.34%
> > free memory                   2.30%      2.30%      0.00%
> >
> > free -h output:
> >
> >                total        used        free      shared  buff/cache   available
> > Mem:            31Gi       3.6Gi       500Mi       4.0Mi        11Gi        27Gi
> > Swap:          4.0Gi       179Mi       3.8Gi
> >
> > Reverting to the previous LTS kernel fixes the issue.
> >
> > smem -wp output:
> >
> > Area                           Used      Cache   Noncache
> > firmware/hardware             0.00%      0.00%      0.00%
> > kernel image                  0.00%      0.00%      0.00%
> > kernel dynamic memory        80.22%     79.32%      0.90%
> > userspace memory             10.48%      0.20%     10.28%
> > free memory                   9.30%      9.30%      0.00%
>
> I have more information. The leaking of kernel memory only starts
> once there is a lot of data in buffers/cache, and only once it has
> been in that state for several hours.
>
> In my search for a reproducer, I have found that downloading and then
> seeding multiple torrents of Linux distribution ISOs will replicate
> the issue, but it only begins leaking at around the 6-9 hour mark.
>
> It does not appear to be dependent on cephfs, but due to its use of
> sockets I believe cephfs makes the situation worse.
>
> I cannot replicate it at all with the LTS kernel release, but the
> current RC releases do appear to have this issue.
>
> I was looking at doing a kernel build with CONFIG_DEBUG_KMEMLEAK
> enabled, and will if it's thought this would find the issue. However,
> as the memory usage is still tracked and clearly marked as
> reclaimable, it feels more like something in the reclaim logic is
> breaking.
>
> I do wonder, given that it only happens after RAM is mostly consumed
> by cache, and even then only after it has been that way for hours,
> whether the issue is memory-fragmentation related.
>
> Regardless, some advice on how to narrow this down faster than a git
> bisect would be welcome, as 9 hours just to confirm reproduction of
> the issue makes git bisect painfully slow.
>
> Thanks in advance
>
> Mal Haak
>
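[Editor's note: a rough monitoring sketch for anyone reproducing this; it is not from the thread, and the choice of /proc/meminfo fields is an assumption about where "kernel dynamic noncache" growth would show up.]

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): snapshot the
# /proc/meminfo counters where unreclaimed kernel memory is most likely
# to appear, so the leak can be correlated with a specific counter over
# the 6-9 hour reproduction run. Run it periodically (e.g. from cron or
# a sleep loop) while rsync or the torrent workload is active.
snapshot() {
    # Timestamp each sample so growth can be plotted against wall time.
    date '+%s'
    grep -E '^(MemFree|MemAvailable|Buffers|Cached|Slab|SReclaimable|SUnreclaim|PageTables|VmallocUsed|Percpu):' /proc/meminfo
}

snapshot
```

If one counter (e.g. SUnreclaim) accounts for the growth, or none of them does, that already narrows the suspects before committing to a full CONFIG_DEBUG_KMEMLEAK build and another 9-hour cycle.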