Date: Thu, 9 Apr 2026 14:09:06 -0700
From: Boris Burkov
To: linux-fsdevel@vger.kernel.org
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
    linux-btrfs@vger.kernel.org
Subject: [LSF/MM/BPF TOPIC] Direct Reclaim and Filesystems
Message-ID: <20260409210906.GA881465@zen.localdomain>

Hello,

A theme that we (Shakeel, JP, myself, and others at Meta) have observed
in the fleet is a tension between btrfs and direct reclaim. It has
manifested in a variety of ways, and every situation must also be
considered with respect to both memcg reclaim and global reclaim. No
overall "assignment of blame" is intended; the goal is to build a deeper
understanding of best practices and paths forward for all the components
involved. I work on btrfs and have minimal direct experience with how
other filesystems handle such challenges, but I imagine there is
significant overlap.

This is probably too large a topic for a single session, but I am curious
whether any of the following categories of issues are broadly
interesting. Personally, I think the one that cuts across the most groups
is the question of reclaim CPU usage.
- The filesystem triggering direct reclaim [2]

  Especially when the filesystem is holding a lock such as the inode
  rwsem or a filesystem-internal lock (like the btrfs btree locks), this
  results in unexpectedly high latency for the filesystem user. In the
  case of memcg reclaim under a held lock, it also unfairly affects the
  latency of other cgroups that are not under reclaim. We are working on
  categorizing and reducing these case by case, but a clearer statement
  about valid allocation contexts and GFP flags could be broadly useful.

- Reclaim freeing metadata and/or forcing metadata writeback [1][3][4]

  In btrfs, this results in redundant work re-fetching and re-writing
  btree nodes when it hits hot nodes in the btree. Should we be trying
  to lock some of these nodes down from reclaim? If so, how many is
  appropriate/safe?

- High reclaim CPU usage [1][4][6]

  It is possible to rapidly generate a very large amount of direct
  reclaim, for example by doing parallel page cache reads larger than
  the cgroup limit from many tasks in a memory.[high|max] constrained
  cgroup. This then burns a great deal of CPU attempting the direct
  reclaim. The CPU usage can become so extreme (and is exacerbated by
  cpuset cgroups) that we end up unable to schedule tasks holding
  important shared locks, massively tanking the throughput of the
  system. I have reproduced conditions where even killing the offending
  cgroup can take minutes. Some crude early experiments have shown that
  throttling reclaim CPU usage reduces the intensity of some of these
  problems. Can this also be attacked via cgroup cpu throttling? Proxy
  execution? What about the same issues under significant global direct
  reclaim?

- Filesystem doing expensive work while in direct reclaim [5]

  In btrfs, compression can result in relatively expensive work while
  trying to do writeback urgently. Jan Kara has already raised the
  related issue of synchronous expensive work in inode reclaim as an
  LSF/MM/BPF topic.
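For the first category, one existing convention worth noting is the
scoped NOFS API (memalloc_nofs_save()/memalloc_nofs_restore()), which
prevents allocations in a marked section from recursing into filesystem
reclaim. It does not remove reclaim latency, but it does bound what
reclaim can do while a filesystem lock is held. A minimal sketch of the
pattern, kernel context only; the btree type, lock helpers, and
fs_do_locked_work() are hypothetical stand-ins, not real btrfs code:

```c
/*
 * Sketch: scoping allocations under a held filesystem lock so that
 * direct reclaim entered from here cannot call back into the
 * filesystem. struct my_btree, my_btree_lock/unlock(), and
 * fs_do_locked_work() are hypothetical.
 */
#include <linux/sched/mm.h>
#include <linux/slab.h>

static int fs_do_locked_work(struct my_btree *tree)
{
	unsigned int nofs_flags;
	void *buf;
	int ret = 0;

	my_btree_lock(tree);
	/*
	 * Every allocation in this section now behaves as if GFP_NOFS
	 * were passed, so reclaim triggered here cannot re-enter the
	 * filesystem while we hold the btree lock.
	 */
	nofs_flags = memalloc_nofs_save();

	buf = kmalloc(4096, GFP_KERNEL); /* implicitly NOFS in this scope */
	if (!buf)
		ret = -ENOMEM;
	else
		kfree(buf);

	memalloc_nofs_restore(nofs_flags);
	my_btree_unlock(tree);
	return ret;
}
```

The scoped form is preferred over passing GFP_NOFS explicitly because it
covers allocations made by helpers the lock holder calls indirectly; it
does not, however, address the CPU-usage or latency questions above.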
Thanks for reading, and thanks in advance for any feedback and thoughts,
Boris

Links:
[1] btrfs memcg accounting separation (AS_KERNEL_FILE)
    https://lore.kernel.org/linux-btrfs/f09c4e2c90351d4cb30a1969f7a863b9238bd291.1755812945.git.boris@bur.io/
[2] btrfs readahead direct reclaim reduction
    https://lore.kernel.org/linux-btrfs/9fd974c2-00aa-4906-8cab-ec0d85750c4b@gmx.com/
[3] btrfs re-cowing inhibition
    https://lore.kernel.org/linux-btrfs/cover.1772097864.git.loemra.dev@gmail.com/
[4] btrfs csum tree write locking reduction
    https://lore.kernel.org/linux-btrfs/aa5a3d849cb093a767e08616258c03c7eec8fe26.1753806780.git.boris@bur.io/#r
[5] Jan Kara's proposal to discuss complex cleanup in reclaim
    https://lore.kernel.org/linux-fsdevel/c18f8189b755c13064f51d93bfcaddb15300f9f8.camel@kernel.org/T/#m319eb6245485bb7c71171a55bf700cc1409a144d
[6] LPC discussion of CPU hogging and locks (unrelated to reclaim)
    https://www.youtube.com/watch?v=_N-nXJHiDNo