Date: Sat, 25 Apr 2026 01:04:01 +0800
From: Kairui Song
To: linux-mm@kvack.org
Cc: Kairui Song, Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
 Johannes Weiner, David Hildenbrand, Michal Hocko, Qi Zheng, Shakeel Butt,
 Lorenzo Stoakes, Barry Song, David Stevens, Chen Ridong, Leno Hou,
 Yafang Shao, Yu Zhao, Zicheng Wang, Kalesh Singh, Suren Baghdasaryan,
 Chris Li, Vernon Yang, linux-kernel@vger.kernel.org, Baolin Wang
Subject: Re: [PATCH v6 04/14] mm/mglru: restructure the reclaim loop
References: <20260424-mglru-reclaim-v6-0-a57622d770c3@tencent.com>
 <20260424-mglru-reclaim-v6-4-a57622d770c3@tencent.com>
In-Reply-To: <20260424-mglru-reclaim-v6-4-a57622d770c3@tencent.com>

Forwarding sashiko review:

> commit b0a64f8c3f76ce5cebb30c1caa6305fc617888cf
> Author: Kairui Song
>
> mm/mglru: restructure the reclaim loop
>
> The current loop calculates the scan number on each iteration, which
> couples it with aging and rotation. This patch decouples aging and
> rotation by calculating the scan number once at the beginning of
> reclaim, respecting reclaim priority, and making aging explicit.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 3c5a6ae9244099..757beb605980d4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> [ ...
> ]
>
> > @@ -4940,11 +4932,10 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
> >  		return nr_to_scan;
> >
> >  	nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
> > +	nr_to_scan >>= sc->priority;
> >
> > -	/* try to get away with not aging at the default priority */
> > -	if (!need_aging || sc->priority == DEF_PRIORITY)
> > -		return nr_to_scan >> sc->priority;
> > +	if (!nr_to_scan && sc->priority < DEF_PRIORITY)
> > +		nr_to_scan = min(evictable, SWAP_CLUSTER_MAX);
>
> Does this code introduce a priority inversion where higher reclaim
> priorities result in fewer pages being scanned than lower priorities?
>
> If a small memcg has an evictable size of 100 and SWAP_CLUSTER_MAX is
> 32, at sc->priority == 7, 100 >> 7 is 0, so the fallback sets
> nr_to_scan to 32. At sc->priority == 6 (higher urgency), 100 >> 6 is 1,
> so the fallback is bypassed and nr_to_scan remains 1.
>
> Would it be better for this fallback to apply a floor regardless of
> whether the shifted result is exactly zero?

Right, sashiko has raised concerns about this part several times; it
wasn't happy with any of the designs (unconditional floor, no handling
at all, or this version). But this is not a functional regression. For
tiny cgroups, the old code before this patch could over-reclaim. Now the
over-reclaiming is gone, but a trivial priority inversion is possible,
which I don't think is a practically observable problem.

To be academically ideal, though, the following change would be better:

	nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
	nr_to_scan >>= sc->priority;
	/* make escalated reclaim progress and balance with slab shrinking */
	if (sc->priority < DEF_PRIORITY)
		nr_to_scan = max(nr_to_scan, min(evictable, SWAP_CLUSTER_MAX));

No inversion, and no under-reclaiming or over-reclaiming.

> [ ...
> ]
>
> > @@ -4985,31 +4977,44 @@ static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
> >  		return true;
> >  }
> >
> > +/*
> > + * For future optimizations:
> > + * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> > + *    reclaim.
> > + */
> >  static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> >  {
> > +	bool need_rotate = false;
> >  	long nr_batch, nr_to_scan;
> > -	unsigned long scanned = 0;
> >  	int swappiness = get_swappiness(lruvec, sc);
> > +	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> >
> > -	while (true) {
> > +	nr_to_scan = get_nr_to_scan(lruvec, sc, memcg, swappiness);
> > +	while (nr_to_scan > 0) {
> >  		int delta;
> > +		DEFINE_MAX_SEQ(lruvec);
> >
> > -		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
> > -		if (nr_to_scan <= 0)
> > +		if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) {
> > +			need_rotate = true;
> >  			break;
> > +		}
> > +
> > +		if (should_run_aging(lruvec, max_seq, sc, swappiness)) {
> > +			if (try_to_inc_max_seq(lruvec, max_seq, swappiness, false))
> > +				need_rotate = true;
> > +			/* stop scanning as it's low on cold folios */
> > +			break;
> > +		}
>
> Can this bypass critical memcg protection checks and aging when
> nr_to_scan is 0?
>
> If get_nr_to_scan() returns 0 (which happens if evictable == 0 or when
> nr_to_scan is shifted to 0 at DEF_PRIORITY), the loop body is bypassed
> entirely. This means mem_cgroup_below_min() is never evaluated in those
> cases, preventing need_rotate from being set to true for protected
> memcgs. The reclaimer relies on this flag to rotate away from protected
> memcgs.
>
> Additionally, does this skip aging for empty offline memcgs (where
> evictable == 0), since they never enter the loop where
> should_run_aging() is called? The commit message states that offline
> memcgs should now be aged to enable further reclaim, but this bypass
> seems to prevent that.

Pretty much a false positive.
Bypassing the loop body is not a problem at DEF_PRIORITY, as there was
no aging there before either. The rotation for below-min memcgs is
already handled by shrink_one before entering the loop in the global
reclaim path, and is inconsequential in the memcg reclaim path.

And for an offline memcg, nr_to_scan is never zero unless that memcg is
completely empty. If a lot of empty memcgs were slowing down the memcg
iterator due to missing rotation, that would be a different issue, not
related to or introduced by this patch.