From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.0 required=3.0 tests=BAYES_00,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C29CC433E0 for ; Tue, 14 Jul 2020 12:37:31 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 66E742242C for ; Tue, 14 Jul 2020 12:37:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 66E742242C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DC2906B0002; Tue, 14 Jul 2020 08:37:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D9AE38D0002; Tue, 14 Jul 2020 08:37:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C89D38D0001; Tue, 14 Jul 2020 08:37:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0074.hostedemail.com [216.40.44.74]) by kanga.kvack.org (Postfix) with ESMTP id B3F7F6B0002 for ; Tue, 14 Jul 2020 08:37:30 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 43B1D8248047 for ; Tue, 14 Jul 2020 12:37:30 +0000 (UTC) X-FDA: 77036632260.02.scene23_6306c0e26ef1 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 2444E500019DD48C for ; Tue, 14 Jul 2020 12:37:30 +0000 (UTC) X-HE-Tag: scene23_6306c0e26ef1 X-Filterd-Recvd-Size: 6427 Received: from mail-wr1-f68.google.com (mail-wr1-f68.google.com [209.85.221.68]) by imf13.hostedemail.com (Postfix) with ESMTP for ; Tue, 14 Jul 2020 12:37:29 +0000 (UTC) Received: by mail-wr1-f68.google.com with SMTP id r12so21247402wrj.13 for ; Tue, 14 Jul 2020 05:37:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=VmX+2fvGRcl2Dt2SBrz7iGqBoseKtysUI54ildUIvvM=; b=fSReZh1ttYn84GeBuYW95fa/q45Tk/Vmb+uUPES5bfPEirVoRPgxAZVmIkd04rf5Ke iBDSzLhSsmt2lFumbI/2FF+nZDgYHLbYLIsPTmJ6Tw8I9uAbogU15nD+V31Z+M2DFdOc xGnjVkdWMMnpPNeg9TkxtKOzufI9VoH6DhS1dGXnxzMt+9olr4Tk4N2UTVCeMxbWnbb5 KocwwdoNBV+0UJCQJdaYYTDsqEbff3IGFvKq2/KAnGyXsFaAfw9WOWV/UEG/fpMQXlBz LyrpACRgTHhJjBAT+zLO7bRGQ91MBnnDoIYIiW/CSuze7/MAsqiyXVSlDxDP81TvmkD1 ND7A== X-Gm-Message-State: AOAM532k7gxalULD/u/AbrnyglcPfcDYS6DDgrubzIIBlMPwGryCmIMG dY79HD6FspIp58iPB1hxy9M= X-Google-Smtp-Source: ABdhPJzG6j2zvcivXh1CdiYACDhwUmBnJvMM1LaGlNzqr1IoKj8q0XURH/WRwFJYpYAcgZH9Kgi0bQ== X-Received: by 2002:adf:dcd0:: with SMTP id x16mr4895428wrm.387.1594730248535; Tue, 14 Jul 2020 05:37:28 -0700 (PDT) Received: from localhost (ip-37-188-148-171.eurotel.cz. [37.188.148.171]) by smtp.gmail.com with ESMTPSA id u8sm28910756wrt.28.2020.07.14.05.37.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Jul 2020 05:37:27 -0700 (PDT) Date: Tue, 14 Jul 2020 14:37:26 +0200 From: Michal Hocko To: Yafang Shao Cc: penguin-kernel@i-love.sakura.ne.jp, rientjes@google.com, akpm@linux-foundation.org, linux-mm@kvack.org Subject: Re: [PATCH] mm, oom: check memcg margin for parallel oom Message-ID: <20200714123726.GI24642@dhcp22.suse.cz> References: <1594728512-18969-1-git-send-email-laoar.shao@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1594728512-18969-1-git-send-email-laoar.shao@gmail.com> X-Rspamd-Queue-Id: 2444E500019DD48C X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue 14-07-20 08:08:32, Yafang Shao wrote: > The commit 7775face2079 ("memcg: killed threads should not invoke memcg OOM > killer") resolves the problem that different threads in a multi-threaded > task doing parallel memcg oom, but it doesn't solve the problem that > different tasks doing parallel memcg oom. > > It may happens that many different tasks in the same memcg are waiting > oom_lock at the same time, if one of them has already made progress and > freed enough available memory, the others don't need to trigger the oom > killer again. By checking memcg margin after hold oom_lock can help > achieve it. While the changelog makes sense it I believe it can be improved. I would use something like the following. Feel free to use its parts. " Memcg oom killer invocation is synchronized by the global oom_lock and tasks are sleeping on the lock while somebody is selecting the victim or potentially race with the oom_reaper is releasing the victim's memory. This can result in a pointless oom killer invocation because a waiter might be racing with the oom_reaper P1 oom_reaper P2 oom_reap_task mutex_lock(oom_lock) out_of_memory # no victim because we have one already __oom_reap_task_mm mute_unlock(oom_lock) mutex_lock(oom_lock) set MMF_OOM_SKIP select_bad_process # finds a new victim The page allocator prevents from this race by trying to allocate after the lock can be acquired (in __alloc_pages_may_oom) which acts as a last minute check. Moreover page allocator simply doesn't block on the oom_lock and simply retries the whole reclaim process. Memcg oom killer should do the last minute check as well. Call mem_cgroup_margin to do that. Trylock on the oom_lock could be done as well but this doesn't seem to be necessary at this stage. " > Suggested-by: Michal Hocko > Signed-off-by: Yafang Shao > Cc: Michal Hocko > Cc: Tetsuo Handa > Cc: David Rientjes > --- > mm/memcontrol.c | 19 +++++++++++++++++-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 1962232..df141e1 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, > .gfp_mask = gfp_mask, > .order = order, > }; > - bool ret; > + bool ret = true; > > if (mutex_lock_killable(&oom_lock)) > return true; > + > /* > * A few threads which were not waiting at mutex_lock_killable() can > * fail to bail out. Therefore, check again after holding oom_lock. > */ > - ret = should_force_charge() || out_of_memory(&oc); > + if (should_force_charge()) > + goto out; > + > + /* > + * Different tasks may be doing parallel oom, so after hold the > + * oom_lock the task should check the memcg margin again to check > + * whether other task has already made progress. > + */ > + if (mem_cgroup_margin(memcg) >= (1 << order)) > + goto out; Is there any reason why you simply haven't done this? (+ your comment which is helpful). diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 248e6cad0095..2c176825efe3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1561,15 +1561,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, .order = order, .chosen_points = LONG_MIN, }; - bool ret; + bool ret = true; - if (mutex_lock_killable(&oom_lock)) + if (!mutex_trylock(&oom_lock)) return true; + + if (mem_cgroup_margin(memcg) >= (1 << order)) + goto unlock; + /* * A few threads which were not waiting at mutex_lock_killable() can * fail to bail out. Therefore, check again after holding oom_lock. */ ret = should_force_charge() || out_of_memory(&oc); + +unlock: mutex_unlock(&oom_lock); return ret; } > + > + ret = out_of_memory(&oc); > + > +out: > mutex_unlock(&oom_lock); > + > return ret; > } > > -- > 1.8.3.1 -- Michal Hocko SUSE Labs