Date: Thu, 20 Nov 2025 12:23:51 +1000
From: Mal Haak
To: linux-kernel@vger.kernel.org
Subject: Re: Possible memory leak in 6.17.7
Message-ID: <20251120122351.231513e1@xps15mal>
In-Reply-To: <20251110182008.71e0858b@xps15mal>
References: <20251110182008.71e0858b@xps15mal>

On Mon, 10 Nov 2025 18:20:08 +1000 Mal Haak wrote:

> Hello,
>
> I have found a memory leak in 6.17.7 but I am unsure how to track it
> down effectively.
>
> I am running a server that has a heavy read/write workload to a cephfs
> file system. It is a VM.
>
> Over time it appears that the non-cache usage of kernel dynamic
> memory increases. The kernel seems to think the pages are reclaimable,
> however nothing appears to trigger the reclaim. This leads to
> workloads getting killed via the OOM killer.
>
> smem -wp output:
>
> Area                       Used      Cache   Noncache
> firmware/hardware         0.00%      0.00%      0.00%
> kernel image              0.00%      0.00%      0.00%
> kernel dynamic memory    88.21%     36.25%     51.96%
> userspace memory          9.49%      0.15%      9.34%
> free memory               2.30%      2.30%      0.00%
>
> free -h output:
>
>                total        used        free      shared  buff/cache   available
> Mem:            31Gi       3.6Gi       500Mi       4.0Mi        11Gi        27Gi
> Swap:          4.0Gi       179Mi       3.8Gi
>
> Reverting to the previous LTS fixes the issue
>
> smem -wp output:
>
> Area                       Used      Cache   Noncache
> firmware/hardware         0.00%      0.00%      0.00%
> kernel image              0.00%      0.00%      0.00%
> kernel dynamic memory    80.22%     79.32%      0.90%
> userspace memory         10.48%      0.20%     10.28%
> free memory               9.30%      9.30%      0.00%
>

I have more information. The leaking of kernel memory only starts once
there is a lot of data in buffers/cache, and only once it has been in
that state for several hours.

In my search for a reproducer I have found that downloading and then
seeding multiple torrents of Linux distribution ISOs replicates the
issue, but the leak only begins at around the 6-9 hour mark. It does not
appear to be dependent on cephfs, though I believe its heavy use of
sockets is making the situation worse. I cannot replicate it at all with
the LTS kernel release, but the current RC releases do appear to have
the issue.

I was considering a kernel build with CONFIG_DEBUG_KMEMLEAK enabled, and
will do one if it is thought this would find the issue. However, since
the memory usage is still tracked and clearly marked as reclaimable, it
feels more like something in the reclaim logic is getting broken. Given
that the leak only starts after RAM is mostly consumed by cache, and
even then only after it has been that way for hours, I wonder whether
the issue is related to memory fragmentation.

Regardless, I would appreciate advice on how to narrow this down faster
than a git bisect, as taking 9 hours just to confirm replication makes
bisecting painfully slow.

Thanks in advance,
Mal Haak
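P.S. In case it is useful context, here is a rough sketch of the checks
I plan to run next to test the "reclaimable but never reclaimed" theory.
The procfs paths are standard, but the drop_caches and compact_memory
writes need root, so they are commented out here:

```shell
#!/bin/sh
# Snapshot the reclaim-related counters, then (as root) force a reclaim
# and a compaction pass and diff the counters.  If KReclaimable and
# SReclaimable barely move after a forced drop, that points at broken
# reclaim rather than a classic leak that kmemleak would catch.

grep -E 'MemFree|MemAvailable|Buffers|^Cached|KReclaimable|SReclaimable|SUnreclaim' /proc/meminfo

# Per-zone free page counts by order; plenty of order-0 pages but nothing
# at the higher orders would suggest external fragmentation.
cat /proc/buddyinfo

# Root-only follow-up (commented out so this is safe to run unprivileged):
# echo 3 > /proc/sys/vm/drop_caches     # drop clean page cache + slab caches
# echo 1 > /proc/sys/vm/compact_memory  # request a full compaction pass
```

Running the grep again a minute after the drop_caches write should show
whether the "reclaimable" memory actually moved.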