From: Roman Gushchin <roman.gushchin@linux.dev>
To: Jan Kara
Cc: Andrew Morton, Matthew Wilcox, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Liu Shixin
Subject: Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing
In-Reply-To: <875xffsxj4.fsf@linux.dev> (Roman Gushchin's message of "Fri, 25 Jul 2025 15:42:07 -0700")
References: <20250710195232.124790-1-roman.gushchin@linux.dev> <875xffsxj4.fsf@linux.dev>
Date: Fri, 25 Jul 2025 16:25:49 -0700
Message-ID: <87jz3vdf9e.fsf@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain
Roman Gushchin writes:

> Jan Kara writes:
>
>> On Thu 10-07-25 12:52:32, Roman Gushchin wrote:
>>> We've noticed in production that under very heavy memory pressure
>>> the readahead behavior becomes unstable, causing spikes in memory
>>> pressure and CPU contention on zone locks.
>>>
>>> The current mmap_miss heuristic considers minor pagefaults a
>>> good reason to decrease mmap_miss and conditionally start async
>>> readahead.
>>> This creates a vicious cycle: asynchronous readahead loads more
>>> pages, which in turn causes more minor pagefaults. This problem is
>>> especially pronounced when multiple threads of an application fault
>>> on consecutive pages of an evicted executable, aggressively lowering
>>> the mmap_miss counter and preventing readahead from being disabled.
>>
>> I think you're talking about the filemap_map_pages() logic of handling
>> mmap_miss. It would be nice to mention it in the changelog. There's one
>> thing that doesn't quite make sense to me: when there's memory pressure,
>> I'd expect the pages to be reclaimed from memory, not just unmapped.
>> Also, given that your solution uses !uptodate folios, it suggests the
>> pages were actually fully reclaimed, and the problem really is that
>> filemap_map_pages() treats as a minor page fault (i.e., cache hit) what
>> is in fact a major page fault (i.e., cache miss)?
>>
>> Actually, now that I dug deeper, I remembered that based on Liu
>> Shixin's report
>> (https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/),
>> which sounds a lot like what you're reporting, we eventually merged his
>> fixes (ended up as commits 0fd44ab213bc ("mm/readahead: break read-ahead
>> loop if filemap_add_folio return -ENOMEM") and 5c46d5319bde ("mm/filemap:
>> don't decrease mmap_miss when folio has workingset flag")). Did you test
>> a kernel with these fixes (6.10 or later)? In particular, after these
>> fixes the !folio_test_workingset() check in filemap_map_folio_range() and
>> filemap_map_order0_folio() should make sure we don't decrease mmap_miss
>> when faulting in fresh pages. Or was the page in your case evicted so
>> long ago that the workingset bit is already clear?
>>
>> Once we better understand the situation, let me also mention that I have
>> two patches which I originally proposed to fix Liu's problems.
>> They didn't quite fix them, so his patches got merged in the end, but
>> the problems described there are still somewhat valid:

> Ok, I got a better understanding of the situation now. Basically we have
> a multi-threaded application which is under very heavy memory pressure.
> If multiple threads are faulting simultaneously into the same page,
> do_async_mmap_readahead() can be called multiple times for the same
> page. This creates negative pressure on the mmap_miss counter, which
> can't be matched by do_sync_mmap_readahead(), which is called only once
> for every page. This basically keeps the readahead on, despite the
> heavy memory pressure.
>
> The following patch solves the problem, at least in my test scenario.
> Wdyt?

Actually, a better version is below. We don't have to avoid the actual
readahead, just not decrease mmap_miss if the page is locked.

--

diff --git a/mm/filemap.c b/mm/filemap.c
index 0d0369fb5fa1..1756690dd275 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3323,9 +3323,15 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
 	if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
 		return fpin;
 
-	mmap_miss = READ_ONCE(ra->mmap_miss);
-	if (mmap_miss)
-		WRITE_ONCE(ra->mmap_miss, --mmap_miss);
+	/* If the folio is locked, we're likely racing against another fault;
+	 * don't decrease the mmap_miss counter to avoid decreasing it
+	 * multiple times for the same page and breaking the balance.
+	 */
+	if (likely(!folio_test_locked(folio))) {
+		mmap_miss = READ_ONCE(ra->mmap_miss);
+		if (mmap_miss)
+			WRITE_ONCE(ra->mmap_miss, --mmap_miss);
+	}
 
 	if (folio_test_readahead(folio)) {
 		fpin = maybe_unlock_mmap_for_io(vmf, fpin);