From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 3 Feb 2023 10:11:39 -0500
From: Johannes Weiner
To: Yosry Ahmed
Cc: Dave Chinner, Alexander Viro, "Darrick J. Wong", Christoph Lameter,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, "Matthew Wilcox (Oracle)",
	Miaohe Lin, David Hildenbrand, Peter Xu, NeilBrown, Shakeel Butt,
	Michal Hocko, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim
References: <20230202233229.3895713-1-yosryahmed@google.com>
	<20230203000057.GS360264@dread.disaster.area>

On Thu, Feb 02, 2023 at 04:17:18PM -0800, Yosry Ahmed wrote:
> On Thu, Feb 2, 2023 at 4:01 PM Dave Chinner wrote:
> > > Patch 1 is just refactoring updating reclaim_state into a helper
> > > function, and renames reclaimed_slab to just reclaimed, with a comment
> > > describing its true purpose.
> > >
> > > Patch 2 ignores pages reclaimed outside of LRU reclaim in memcg reclaim.
> > >
> > > The original draft was a little bit different. It also kept track of
> > > uncharged objcg pages, and reported them only in memcg reclaim and only
> > > if the uncharged memcg is in the subtree of the memcg under reclaim.
> > > This was an attempt to make reporting of memcg reclaim even more
> > > accurate, but was dropped due to questionable complexity vs benefit
> > > tradeoff. It can be revived if there is interest.
> > >
> > > Yosry Ahmed (2):
> > >   mm: vmscan: refactor updating reclaimed pages in reclaim_state
> > >   mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
> > >
> > >  fs/inode.c | 3 +--
> >
> > Inodes and inode mapping pages are directly charged to the memcg
> > that allocated them and the shrinker is correctly marked as
> > SHRINKER_MEMCG_AWARE.
> > Freeing the pages attached to the inode will
> > account them correctly to the related memcg, regardless of which
> > memcg is triggering the reclaim. Hence I'm not sure that skipping
> > the accounting of the reclaimed memory is even correct in this case;
>
> Please note that we are not skipping any accounting here. The pages
> are still uncharged from the memcgs they are charged to (the allocator
> memcgs as you pointed out). We just do not report them in the return
> value of try_to_free_mem_cgroup_pages(), to avoid over-reporting.

I was wondering the same thing as Dave, reading through this. But
you're right, we'll catch the accounting during uncharge. Can you
please add a comment on the !cgroup_reclaim() explaining this?

There is one wrinkle with this, though. We have the following
(simplified) sequence during charging:

	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
						    gfp_mask, reclaim_options);

	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
		goto retry;

	/*
	 * Even though the limit is exceeded at this point, reclaim
	 * may have been able to free some pages. Retry the charge
	 * before killing the task.
	 *
	 * Only for regular pages, though: huge pages are rather
	 * unlikely to succeed so close to the limit, and we fall back
	 * to regular pages anyway in case of failure.
	 */
	if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
		goto retry;

So in the unlikely scenario where the first call doesn't make the
necessary headroom, and the shrinkers are the only thing that made
forward progress, we would OOM prematurely.

Not that an OOM would seem that far away in that scenario, anyway. But I
remember long discussions with DavidR on probabilistic OOM regressions ;)

> > I think the code should still be accounting for all pages that
> > belong to the memcg being scanned that are reclaimed, not ignoring
> > them altogether...
>
> 100% agree.
> Ideally I would want to:
> - For pruned inodes: report all freed pages for global reclaim, and
> only report pages charged to the memcg under reclaim for memcg
> reclaim.

This only happens on highmem systems at this point, as elsewhere
populated inodes aren't on the shrinker LRUs anymore. We'd probably be
ok with a comment noting the inaccuracy in the proactive reclaim stats
for the time being, until somebody actually cares about that combination.

> - For slab: report all freed pages for global reclaim, and only report
> uncharged objcg pages from the memcg under reclaim for memcg reclaim.
>
> The only problem is that I thought people would think this is too much
> complexity and not worth it. If people agree this should be the
> approach to follow, I can prepare patches for this. I originally
> implemented this for slab pages, but held off on sending it.

I'd be curious to see the code!
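
For illustration only, the charge-path wrinkle described above can be
modelled in a few lines of standalone C. This is a sketch, not kernel
code: struct reclaim_result, reported_reclaimed() and charge_retries()
are hypothetical names invented here, and the margin argument merely
stands in for mem_cgroup_margin(mem_over_limit). It assumes that one
reclaim pass can be summarised as "pages freed from the LRUs" plus
"pages freed by shrinkers", and it shows how dropping the shrinker pages
from the reported count changes the outcome of the second retry check.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as in the kernel; restated so the sketch is standalone. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical summary of one reclaim pass, split by where pages came from. */
struct reclaim_result {
	unsigned long lru_pages;	/* freed through LRU reclaim */
	unsigned long shrinker_pages;	/* freed by shrinkers (slab, pruned inodes) */
};

/*
 * Hypothetical stand-in for the value returned to the charge path.
 * With report_non_lru == false, only LRU pages are reported, which is
 * the behaviour under discussion for memcg reclaim.
 */
static unsigned long reported_reclaimed(const struct reclaim_result *r,
					bool report_non_lru)
{
	return r->lru_pages + (report_non_lru ? r->shrinker_pages : 0);
}

/*
 * Mirrors the two retry checks from the simplified charging sequence.
 * Returns true if the charge would be retried, false if it would fall
 * through towards the OOM path.
 */
static bool charge_retries(unsigned long margin, unsigned long nr_reclaimed,
			   unsigned long nr_pages)
{
	assert(nr_pages > 0);

	/* First check: reclaim made enough headroom below the limit. */
	if (margin >= nr_pages)
		return true;

	/* Second check: some progress was reported, and the request is small. */
	if (nr_reclaimed && nr_pages <= (1UL << PAGE_ALLOC_COSTLY_ORDER))
		return true;

	return false;
}
```

With lru_pages = 0, shrinker_pages = 4, no margin and a one-page charge,
charge_retries() returns true when shrinker pages are reported and false
when they are not, which is the premature-OOM case in question.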