Date: Mon, 24 Jan 2022 14:22:03 -0800
From: Minchan Kim
To: Michal Hocko
Cc: Andrew Morton, David Hildenbrand, linux-mm, LKML, Suren Baghdasaryan, John Dias
Subject: Re: [RESEND][PATCH v2] mm: don't call lru draining in the nested lru_cache_disable
References: <20211230193627.495145-1-minchan@kernel.org>
On Mon, Jan 24, 2022 at 10:57:36AM +0100, Michal Hocko wrote:
> On Fri 21-01-22 13:56:31, Minchan Kim wrote:
> > On Fri, Jan 21, 2022 at 10:59:32AM +0100, Michal Hocko wrote:
> > > On Thu 20-01-22 13:07:55, Minchan Kim wrote:
> > > > On Thu, Jan 20, 2022 at 09:24:22AM +0100, Michal Hocko wrote:
> > > > > On Wed 19-01-22 20:25:54, Minchan Kim wrote:
> > > > > > On Wed, Jan 19, 2022 at 10:20:22AM +0100, Michal Hocko wrote:
> > > > > [...]
> > > > > > > What prevents you from calling lru_cache_{disable,enable} this way
> > > > > > > with the existing implementation? AFAICS calls can be nested just
> > > > > > > fine. Or am I missing something?
> > > > > >
> > > > > > It just issues more IPI calls, since we drain the lru cache at both
> > > > > > the upper layer and the lower layer. That's what I'd like to avoid
> > > > > > with this patch: disable the lru cache only once for the entire
> > > > > > allocation path.
> > > > >
> > > > > I do not follow. Once you call lru_cache_disable at the higher level,
> > > > > no new pages are going to be added to the pcp caches. At the same
> > > > > time, existing caches are flushed, so the inner lru_cache_disable
> > > > > will not trigger any new IPIs.
> > > >
> > > > lru_cache_disable calls __lru_add_drain_all with force_all_cpus set
> > > > unconditionally, so it keeps issuing the IPIs.
> > >
> > > OK, this is something I have missed. Why can't we remove the force_all
> > > mode for lru_disable_count > 0 when there are no pcp caches populated?
> >
> > We couldn't guarantee that the IPIs have finished with only the atomic
> > counter:
> >
> >       CPU 0                             CPU 1
> >
> >    lru_cache_disable                lru_cache_disable
> >      ret = atomic_inc_return
> >                                       ret = atomic_inc_return
> >      lru_add_drain_all(ret == 1);
> >                                       lru_add_drain_all(ret == 1)
> >      IPI ongoing                        skip IPI
> >                                       alloc_contig_range
> >                                         fail
> >      ..
> >      ..
> >
> >      IPI done
>
> But __lru_add_drain_all uses a local mutex while the IPI flushing is
> done, so the racing lru_cache_disable would block until
> flush_work(&per_cpu(lru_add_drain_work, cpu)) completes and all IPIs
> are handled. Or am I missing something?

Not entirely. Consider this interleaving:

      CPU 0                             CPU 1

   lru_cache_disable                lru_cache_disable
     ret = atomic_inc_return; (ret = 1)
                                      ret = atomic_inc_return; (ret = 2)
     lru_add_drain_all(true);
                                      lru_add_drain_all(false)
                                        mutex_lock() is holding
     mutex_lock() is waiting
                                        IPI with !force_all_cpus
                                        ...
                                        ...
                                        IPI done, but it skipped some CPUs
     ..
     ..

Thus, lru_cache_disable on CPU 1 does not drain every CPU, which
introduces a race on lru_disable_count: CPUs that never received the
IPI can still put upcoming pages into their per-cpu caches.
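To make that interleaving concrete, below is a minimal user-space model of
the suggested lru_add_drain_all(ret == 1) scheme. It is purely an
illustration, not the mm/swap.c code: the pthread mutex and the printf
stand in for the local mutex and the per-CPU drain work, and the "cpu"
argument is just a thread label.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int lru_disable_count;
static pthread_mutex_t drain_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for __lru_add_drain_all(): the real function queues per-CPU
 * work and flush_work()s it under a local mutex; the non-forced mode
 * only drains CPUs whose pagevecs look populated. */
static void lru_add_drain_all(bool force_all_cpus, int cpu)
{
	pthread_mutex_lock(&drain_lock);
	printf("cpu%d: draining, force_all_cpus=%d\n", cpu, force_all_cpus);
	pthread_mutex_unlock(&drain_lock);
}

static void *lru_cache_disable(void *arg)
{
	int cpu = (int)(long)arg;
	/* atomic_inc_return() equivalent: yields the new counter value. */
	int ret = atomic_fetch_add(&lru_disable_count, 1) + 1;

	/* The window: the thread that observed ret == 2 may win drain_lock
	 * first, run its non-forced drain, and return before the ret == 1
	 * thread's forced drain has executed anywhere. */
	lru_add_drain_all(ret == 1, cpu);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, lru_cache_disable, (void *)0L);
	pthread_create(&t1, NULL, lru_cache_disable, (void *)1L);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}

Built with gcc -pthread, whichever thread wins drain_lock first decides
the outcome: if the ret == 2 thread wins, its non-forced drain returns
before the forced drain has run anywhere, which is exactly the window
shown in the diagram above.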