Date: Wed, 18 Aug 2021 21:18:10 +0100
From: Chris Down
To: Johannes Weiner
Cc: Andrew Morton, Leon Yang, Roman Gushchin, Michal Hocko,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: memcontrol: fix occasional OOMs due to
 proportional memory.low reclaim
References: <20210817180506.220056-1-hannes@cmpxchg.org>
In-Reply-To: <20210817180506.220056-1-hannes@cmpxchg.org>
User-Agent: Mutt/2.1.1 (e2a89abc) (2021-07-12)

Johannes Weiner writes:
>We've noticed occasional OOM killing when memory.low settings are in
>effect for cgroups. This is unexpected and undesirable as memory.low
>is supposed to express non-OOMing memory priorities between cgroups.
>
>The reason for this is proportional memory.low reclaim. When cgroups
>are below their memory.low threshold, reclaim passes them over in the
>first round, and then retries if it couldn't find pages anywhere else.
>But when cgroups are slightly above their memory.low setting, page scan
>force is scaled down and diminished in proportion to the overage, to
>the point where it can cause reclaim to fail as well - only in that
>case we currently don't retry, and instead trigger OOM.
>
>To fix this, hook proportional reclaim into the same retry logic we
>have in place for when cgroups are skipped entirely. This way if
>reclaim fails and some cgroups were scanned with diminished pressure,
>we'll try another full-force cycle before giving up and OOMing.
>
>Reported-by: Leon Yang
>Signed-off-by: Johannes Weiner

Thanks for tracking this down! Agreed that this looks like a good stable
candidate.

Acked-by: Chris Down

>---
> include/linux/memcontrol.h | 29 +++++++++++++++--------------
> mm/vmscan.c                | 27 +++++++++++++++++++--------
> 2 files changed, 34 insertions(+), 22 deletions(-)
>
>diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>index bfe5c486f4ad..24797929d8a1 100644
>--- a/include/linux/memcontrol.h
>+++ b/include/linux/memcontrol.h
>@@ -612,12 +612,15 @@ static inline bool mem_cgroup_disabled(void)
> 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>-						  struct mem_cgroup *memcg,
>-						  bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+					 struct mem_cgroup *memcg,
>+					 unsigned long *min,
>+					 unsigned long *low)
> {
>+	*min = *low = 0;
>+
> 	if (mem_cgroup_disabled())
>-		return 0;
>+		return;
>
> 	/*
> 	 * There is no reclaim protection applied to a targeted reclaim.
>@@ -653,13 +656,10 @@ static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
> 	 *
> 	 */
> 	if (root == memcg)
>-		return 0;
>-
>-	if (in_low_reclaim)
>-		return READ_ONCE(memcg->memory.emin);
>+		return;
>
>-	return max(READ_ONCE(memcg->memory.emin),
>-		   READ_ONCE(memcg->memory.elow));
>+	*min = READ_ONCE(memcg->memory.emin);
>+	*low = READ_ONCE(memcg->memory.elow);
> }
>
> void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>@@ -1147,11 +1147,12 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> {
> }
>
>-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
>-						  struct mem_cgroup *memcg,
>-						  bool in_low_reclaim)
>+static inline void mem_cgroup_protection(struct mem_cgroup *root,
>+					 struct mem_cgroup *memcg,
>+					 unsigned long *min,
>+					 unsigned long *low)
> {
>-	return 0;
>+	*min = *low = 0;
> }
>
> static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index 4620df62f0ff..701106e1829c 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -100,9 +100,12 @@ struct scan_control {
> 	unsigned int may_swap:1;
>
> 	/*
>-	 * Cgroups are not reclaimed below their configured memory.low,
>-	 * unless we threaten to OOM. If any cgroups are skipped due to
>-	 * memory.low and nothing was reclaimed, go back for memory.low.
>+	 * Cgroup memory below memory.low is protected as long as we
>+	 * don't threaten to OOM. If any cgroup is reclaimed at
>+	 * reduced force or passed over entirely due to its memory.low
>+	 * setting (memcg_low_skipped), and nothing is reclaimed as a
>+	 * result, then go back for one more cycle that reclaims
>+	 * the protected memory (memcg_low_reclaim) to avert OOM.
> 	 */
> 	unsigned int memcg_low_reclaim:1;
> 	unsigned int memcg_low_skipped:1;
>@@ -2537,15 +2540,14 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> 	for_each_evictable_lru(lru) {
> 		int file = is_file_lru(lru);
> 		unsigned long lruvec_size;
>+		unsigned long low, min;
> 		unsigned long scan;
>-		unsigned long protection;
>
> 		lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
>-		protection = mem_cgroup_protection(sc->target_mem_cgroup,
>-						   memcg,
>-						   sc->memcg_low_reclaim);
>+		mem_cgroup_protection(sc->target_mem_cgroup, memcg,
>+				      &min, &low);
>
>-		if (protection) {
>+		if (min || low) {
> 			/*
> 			 * Scale a cgroup's reclaim pressure by proportioning
> 			 * its current usage to its memory.low or memory.min
>@@ -2576,6 +2578,15 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> 			 * hard protection.
> 			 */
> 			unsigned long cgroup_size = mem_cgroup_size(memcg);
>+			unsigned long protection;
>+
>+			/* memory.low scaling, make sure we retry before OOM */
>+			if (!sc->memcg_low_reclaim && low > min) {
>+				protection = low;
>+				sc->memcg_low_skipped = 1;
>+			} else {
>+				protection = min;
>+			}
>
> 			/* Avoid TOCTOU with earlier protection check */
> 			cgroup_size = max(cgroup_size, protection);
>-- 
>2.32.0
>
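
For anyone following along who wants to poke at the numbers, here is a
rough user-space model of the behaviour the changelog describes. Only
pick_protection() follows the hunk added to get_scan_count() in the
patch. Everything else is an assumption made for illustration: struct
scan_ctl, scaled_scan() and reclaim_with_retry() are hypothetical names,
the scaling formula is a simplified guess at the proportional reduction
(it omits any minimum-scan floor), and "scanned enough pages" stands in
for "reclaim succeeded", which the kernel decides quite differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two scan_control bits the patch uses. */
struct scan_ctl {
	bool memcg_low_reclaim;	/* retry pass: memory.low no longer honoured */
	bool memcg_low_skipped;	/* some cgroup was spared due to memory.low */
};

/* Follows the selection the patch adds to get_scan_count(). */
static unsigned long pick_protection(struct scan_ctl *sc,
				     unsigned long min, unsigned long low)
{
	if (!sc->memcg_low_reclaim && low > min) {
		sc->memcg_low_skipped = true;
		return low;
	}
	return min;
}

/* Assumed scaling: scan only the share of the LRU above the protection. */
static unsigned long scaled_scan(unsigned long lru_size,
				 unsigned long cgroup_size,
				 unsigned long protection)
{
	unsigned long scan;

	/* Same clamp as the TOCTOU guard in the patch context. */
	if (cgroup_size < protection)
		cgroup_size = protection;
	if (!cgroup_size)
		return lru_size;

	scan = lru_size - lru_size * protection / cgroup_size;
	assert(scan <= lru_size);
	return scan;
}

/* Toy two-pass loop: retry at full force if memory.low held us back. */
static unsigned long reclaim_with_retry(unsigned long lru_size,
					unsigned long cgroup_size,
					unsigned long min, unsigned long low,
					unsigned long needed)
{
	struct scan_ctl sc = { false, false };
	unsigned long scan;

	scan = scaled_scan(lru_size, cgroup_size,
			   pick_protection(&sc, min, low));
	if (scan >= needed || !sc.memcg_low_skipped)
		return scan;

	/* Second pass: only memory.min still protects the cgroup. */
	sc.memcg_low_reclaim = true;
	return scaled_scan(lru_size, cgroup_size,
			   pick_protection(&sc, min, low));
}
```

With a 1000-page LRU in a 1000-page cgroup and memory.low at 990 pages,
the first pass in this model scans only 10 pages. If 32 are needed, that
pass falls short, memcg_low_skipped is set, and the retry scans the full
1000 instead of giving up.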