Date: Tue, 11 Jul 2023 11:28:10 -0400
From: Johannes Weiner
To: Efly Young
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm:vmscan: fix inaccurate reclaim during proactive reclaim
Message-ID: <20230711152810.GA2627@cmpxchg.org>
References: <20230707103226.38496-1-yangyifei03@kuaishou.com>
In-Reply-To: <20230707103226.38496-1-yangyifei03@kuaishou.com>

On Fri, Jul 07, 2023 at 06:32:26PM +0800, Efly Young wrote:
> With commit f53af4285d77 ("mm: vmscan: fix extreme overreclaim
> and swap floods"), proactive reclaim still seems inaccurate.
>
> Our problematic scene also are almost anon pages. Request 1G
> by writing memory.reclaim will reclaim 1.7G or other values
> more than 1G by swapping.
>
> This try to fix the inaccurate reclaim problem.

I can see how this happens. Direct and kswapd reclaim have much
smaller nr_to_reclaim targets, so it's less noticeable when we loop a
few times. Proactive reclaim can come in with a rather large value.

What does the reproducer setup look like? Are you calling reclaim on a
higher level cgroup with several children? Or is the looping coming
from having multiple zones alone?

> Signed-off-by: Efly Young
> ---
>  mm/vmscan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9c1c5e8b..2aea8d9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6208,7 +6208,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  	unsigned long nr_to_scan;
>  	enum lru_list lru;
>  	unsigned long nr_reclaimed = 0;
> -	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> +	unsigned long nr_to_reclaim = (sc->nr_to_reclaim - sc->nr_reclaimed);

This can underflow. shrink_list() eats SWAP_CLUSTER_MAX batches out of
lru_pages >> priority, and only checks reclaimed > to_reclaim
after. This will then disable the bailout mechanism entirely.

In general, I'm not sure this is the best spot to fix the problem:

- During reclaim/compaction, should_continue_reclaim() may decide that
  more reclaim is required before compaction can proceed. But the
  second cycle might not do anything now, since you remember the work
  done by the previous one.

- shrink_node_memcgs() might do the full batch against the first
  cgroup and not touch the second one anymore. This will result in
  super lopsided behavior when you target a tree of multiple groups.

There might be other spots that break, I haven't checked.

You could go through them one by one, of course. But the truth is,
larger reclaim targets are the rare exception. Trying to support them
at the risk of breaking all other reclaim users seems ill-advised.

A better approach might be to just say: "don't call reclaim with large
numbers".
Have proactive reclaim code handle the batching into smaller
chunks:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e8ca4bdcb03c..4b016806dcc7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6696,7 +6696,7 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			lru_add_drain_all();
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg,
-						nr_to_reclaim - nr_reclaimed,
+						min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
 						GFP_KERNEL, reclaim_options);
 
 		if (!reclaimed && !nr_retries--)
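
To illustrate the two points above outside the kernel, here is a
rough userspace sketch. It is not kernel code and makes loud
assumptions: fake_reclaim() is a hypothetical stand-in for
try_to_free_mem_cgroup_pages() that always frees 50% more than it was
asked for (an invented model, not measured behavior), SWAP_CLUSTER_MAX
is hard-coded to 32, and every helper name is made up for the example.
It shows that the unsigned subtraction wraps once nr_reclaimed exceeds
the target, and that chunking the request bounds the overshoot to a
single batch's worth.

```c
#include <assert.h>

/* Assumed batch size for this sketch; hard-coded, not taken from kernel headers. */
#define SWAP_CLUSTER_MAX 32UL

/* Hypothetical helper: smaller of two unsigned longs. */
unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical stand-in for the reclaim call. The 50% overshoot is an
 * invented model, chosen only to make the effect visible.
 */
unsigned long fake_reclaim(unsigned long nr_pages)
{
	return nr_pages + nr_pages / 2;
}

/* Unsigned subtraction: wraps to a huge value when done > target. */
unsigned long remaining_naive(unsigned long target, unsigned long done)
{
	return target - done;
}

/* Same computation, clamped at zero so a bailout check still works. */
unsigned long remaining_clamped(unsigned long target, unsigned long done)
{
	return done >= target ? 0 : target - done;
}

/* Ask for the whole target at once: the overshoot scales with the target. */
unsigned long reclaim_one_shot(unsigned long total)
{
	return fake_reclaim(total);
}

/* Ask in SWAP_CLUSTER_MAX chunks: the overshoot is bounded by one chunk's. */
unsigned long reclaim_chunked(unsigned long total)
{
	unsigned long reclaimed = 0;

	while (reclaimed < total) {
		unsigned long want = min_ul(total - reclaimed, SWAP_CLUSTER_MAX);
		unsigned long got = fake_reclaim(want);

		assert(got > 0);	/* the model always makes progress */
		reclaimed += got;
	}
	return reclaimed;
}
```

With a target of 262144 pages (1G of 4K pages), the one-shot variant
returns 393216 under this model, while the chunked variant stops at
262152. The real numbers depend entirely on how reclaim behaves on a
given system; only the shape of the difference is the point here.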