From: "T.J. Mercier"
Date: Tue, 23 Jan 2024 05:58:05 -0800
Subject: Re: [PATCH] Revert "mm:vmscan: fix inaccurate reclaim during proactive reclaim"
To: Michal Hocko
Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
 Andrew Morton, android-mm@google.com, yuzhao@google.com,
 yangyifei03@kuaishou.com, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20240121214413.833776-1-tjmercier@google.com>

On Tue, Jan 23, 2024 at 1:33 AM Michal Hocko wrote:
>
> On Sun 21-01-24 21:44:12, T.J. Mercier wrote:
> > This reverts commit 0388536ac29104a478c79b3869541524caec28eb.
> >
> > Proactive reclaim on the root cgroup is 10x slower after this patch when
> > MGLRU is enabled, and completion times for proactive reclaim on much
> > smaller non-root cgroups take ~30% longer (with or without MGLRU).
>
> What is the reclaim target in these pro-active reclaim requests?

Two targets:
1) /sys/fs/cgroup/memory.reclaim
2) /sys/fs/cgroup/uid_0/memory.reclaim (a bunch of Android system services)

Note that lru_gen_shrink_node is used for 1, but shrink_node_memcgs is
used for 2.

The 10x comes from the rate of reclaim (~70k pages/sec vs ~6.6k
pages/sec) for 1. After this revert the root reclaim took only about
10 seconds. Before the revert it was still running after about 3
minutes, using a core at 100% the whole time, and I was too impatient
to wait longer to record times for comparison.

The 30% comes from the average of a few runs for 2:

Before revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
restarting adbd as root
    0m09.69s real     0m00.00s user     0m09.19s system

After revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
    0m07.31s real     0m00.00s user     0m06.44s system

It's actually a bigger difference for smaller reclaim amounts:

Before revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "3G" > /sys/fs/cgroup/uid_0/memory.reclaim'
    0m12.04s real     0m00.00s user     0m11.48s system

After revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "3G" > /sys/fs/cgroup/uid_0/memory.reclaim'
    0m06.65s real     0m00.00s user     0m05.91s system

> > With
> > root reclaim before the patch, I observe average reclaim rates of
> > ~70k pages/sec before try_to_free_mem_cgroup_pages starts to fail and
> > the nr_retries counter starts to decrement, eventually ending the
> > proactive reclaim attempt.
>
> Do I understand correctly that the reclaim target is over estimated and
> you expect that the reclaim process breaks out early

Yes. When I specify a large amount to reclaim, I expect memory_reclaim
to fail at some point, once it becomes difficult or impossible to
reclaim more pages.
The ask here is, "please reclaim as much as possible from this
cgroup, but don't take all day". But it takes minutes to get there on
the root cgroup, working SWAP_CLUSTER_MAX pages at a time.

> > After the patch the reclaim rate is
> > consistently ~6.6k pages/sec due to the reduced nr_pages value causing
> > scan aborts as soon as SWAP_CLUSTER_MAX pages are reclaimed. The
> > proactive reclaim doesn't complete after several minutes because
> > try_to_free_mem_cgroup_pages is still capable of reclaiming pages in
> > tiny SWAP_CLUSTER_MAX page chunks and nr_retries is never decremented.
>
> I do not understand this part. How does a smaller reclaim target manages
> to have reclaimed > 0 while larger one doesn't?

They both are able to make progress. The main difference is that a
single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
after it reclaims nr_to_reclaim, and before it touches all memcgs. So
a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
pages with MGLRU. Without MGLRU the memcg walk is not aborted
immediately after nr_to_reclaim is reached, so a single call to
try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
even when sc->nr_to_reclaim is 32 (i.e. MGLRU overreclaims less).
https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@google.com/
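
To make the difference concrete, here is a toy Python model of a single
reclaim call. It is only a sketch and not kernel code: the function
one_reclaim_call, the flag abort_when_target_met, and the workload of
100 memcgs with 1000 reclaimable pages each are all made up for
illustration, and the per-memcg batch of SWAP_CLUSTER_MAX pages is a
simplifying assumption. It only shows why stopping the memcg walk as
soon as the target is met reclaims far fewer pages per call than
finishing the walk.

```python
SWAP_CLUSTER_MAX = 32  # batch size named in this thread


def one_reclaim_call(memcg_pages, nr_to_reclaim, abort_when_target_met):
    """Toy stand-in for one try_to_free_mem_cgroup_pages() call.

    memcg_pages holds one reclaimable page count per memcg and is
    modified in place. Returns the pages reclaimed by this call.
    """
    reclaimed = 0
    for i, available in enumerate(memcg_pages):
        # Assumption: each visited memcg gives up at most one batch.
        batch = min(available, SWAP_CLUSTER_MAX)
        memcg_pages[i] -= batch
        reclaimed += batch
        if abort_when_target_met and reclaimed >= nr_to_reclaim:
            break  # stop before visiting the remaining memcgs
    return reclaimed


# Hypothetical workload: 100 memcgs with 1000 reclaimable pages each.
early = one_reclaim_call([1000] * 100, SWAP_CLUSTER_MAX, True)
full = one_reclaim_call([1000] * 100, SWAP_CLUSTER_MAX, False)
print(early, full)
```

In this model the early-abort walk returns after the first memcg, while
the full walk visits every memcg, so the same nr_to_reclaim of 32 yields
very different amounts of work per call.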