From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 7 Jul 2020 01:06:51 +0000
From: Dennis Zhou
To: Feng Tang
Cc: Qian Cai, Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
 kernel test robot, Michal Hocko, Johannes Weiner, Matthew Wilcox,
 Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
 andi.kleen@intel.com, tim.c.chen@intel.com, dave.hansen@intel.com,
 ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 lkp@lists.01.org
Subject: Re: [mm] 4e2c82a409: ltp.overcommit_memory01.fail
Message-ID: <20200707010651.GA2384124@google.com>
References: <20200705044454.GA90533@shbuild999.sh.intel.com>
 <20200705125854.GA66252@shbuild999.sh.intel.com>
 <20200705155232.GA608@lca.pw>
 <20200706014313.GB66252@shbuild999.sh.intel.com>
 <20200706023614.GA1231@lca.pw>
 <20200706132443.GA34488@shbuild999.sh.intel.com>
In-Reply-To: <20200706132443.GA34488@shbuild999.sh.intel.com>

On Mon, Jul 06, 2020 at 09:24:43PM +0800, Feng Tang wrote:
> Hi All,
>
> Please help to review this fix patch, thanks!
>
> It is against today's linux-mm tree. For easy review, I put the fix
> into one patch, and I could split it into 2 parts for percpu-counter
> and mm/util.c if that's preferred.
>
> From 593f9dc139181a7c3bb1705aacd1f625f400e458 Mon Sep 17 00:00:00 2001
> From: Feng Tang
> Date: Mon, 6 Jul 2020 14:48:29 +0800
> Subject: [PATCH] mm/util.c: sync vm_committed_as when changing memory policy
>  to OVERCOMMIT_NEVER
>
> With the patch to improve the scalability of vm_committed_as [1], 0day
> reported that the ltp overcommit_memory test case could fail (the fail rate
> is about 5/50) [2]. The root cause: when the system runs with a loose memory
> overcommit policy such as OVERCOMMIT_GUESS/ALWAYS, the deviation of
> vm_committed_as can be large. Once the policy is changed at runtime to
> OVERCOMMIT_NEVER, vm_committed_as's batch is decreased to 1/64 of the
> original, but the accumulated deviation is not compensated accordingly, so
> the following __vm_enough_memory() check for vm overcommit can be wrong,
> which breaks the ltp overcommit_memory case.
>
> Fix it by forcing a sync of the percpu counter vm_committed_as when the
> overcommit policy is changed to OVERCOMMIT_NEVER (sysctl -w
> vm.overcommit_memory=2). The sync itself is not a fast operation, but that
> is tolerable given that users are not expected to change the policy to
> OVERCOMMIT_NEVER frequently.
>
> [1] https://lore.kernel.org/lkml/1592725000-73486-1-git-send-email-feng.tang@intel.com/
> [2] https://marc.info/?l=linux-mm&m=159367156428286 (can't find a link in lore.kernel.org)
>
> Reported-by: kernel test robot
> Signed-off-by: Feng Tang
> ---
>  include/linux/percpu_counter.h |  4 ++++
>  lib/percpu_counter.c           | 14 ++++++++++++++
>  mm/util.c                      | 11 ++++++++++-
>  3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index 0a4f54d..01861ee 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -44,6 +44,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
>  			      s32 batch);
>  s64 __percpu_counter_sum(struct percpu_counter *fbc);
>  int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
> +void percpu_counter_sync(struct percpu_counter *fbc);
>  
>  static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
>  {
> @@ -172,6 +173,9 @@ static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
>  	return true;
>  }
>  
> +static inline void percpu_counter_sync(struct percpu_counter *fbc)
> +{
> +}
>  #endif /* CONFIG_SMP */
>  
>  static inline void percpu_counter_inc(struct percpu_counter *fbc)
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index a66595b..02d87fc 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -98,6 +98,20 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
>  }
>  EXPORT_SYMBOL(percpu_counter_add_batch);
>  
> +void percpu_counter_sync(struct percpu_counter *fbc)
> +{
> +	unsigned long flags;
> +	s64 count;
> +
> +	raw_spin_lock_irqsave(&fbc->lock, flags);
> +	count = __this_cpu_read(*fbc->counters);
> +	fbc->count += count;
> +	__this_cpu_sub(*fbc->counters, count);
> +	raw_spin_unlock_irqrestore(&fbc->lock, flags);
> +}
> +EXPORT_SYMBOL(percpu_counter_sync);
> +
>  /*
>   * Add up all the per-cpu counts, return
>   * the result. This is a more accurate
>   * but much slower version of percpu_counter_read_positive()
> diff --git a/mm/util.c b/mm/util.c
> index 52ed9c1..5fb62c0 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -746,14 +746,23 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
>  	return ret;
>  }
>  
> +static void sync_overcommit_as(struct work_struct *dummy)
> +{
> +	percpu_counter_sync(&vm_committed_as);
> +}
> +

This seems like a rather niche use case as it's currently coupled with a
schedule_on_each_cpu(). I can't imagine a use case where you'd want to do
this without being called by schedule_on_each_cpu().

Would it be better to modify or introduce something akin to
percpu_counter_sum() which sums and folds in the counter state? I'd be
curious to see what the cost of always folding would be, as this is already
considered the cold path and it would help with the next batch too.

>  int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
>  		size_t *lenp, loff_t *ppos)
>  {
>  	int ret;
>  
>  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> -	if (ret == 0 && write)
> +	if (ret == 0 && write) {
> +		if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
> +			schedule_on_each_cpu(sync_overcommit_as);
> +
>  		mm_compute_batch();
> +	}
>  
>  	return ret;
>  }
> --
> 2.7.4
>
>
> On Sun, Jul 05, 2020 at 10:36:14PM -0400, Qian Cai wrote:
> > > In my last email, I was not saying OVERCOMMIT_NEVER is not a normal case,
> > > but I don't think users will change the overcommit policy at runtime very
> > > frequently. And the fix of syncing 'vm_committed_as' is only invoked when
> > > the user runs 'sysctl -w vm.overcommit_memory=2'.
> > >
> > > > The question is now if any of those regression fixes would now regress
> > > > performance of OVERCOMMIT_NEVER workloads or stay on par with the data
> > > > before the patchset?
> > >
> > > The original patchset keeps vm_committed_as's batch unchanged for the
> > > OVERCOMMIT_NEVER policy and enlarges it for the other 2 loose policies,
> > > OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS, so I don't expect the performance
> > > of "OVERCOMMIT_NEVER workloads" to be impacted. If you have suggestions
> > > for this kind of benchmark, I can run them to better verify the patchset,
> > > thanks!
> >
> > Then, please capture that information in a proper commit log when you
> > submit the regression fix on top of the patchset, and CC the PER-CPU MEMORY
> > ALLOCATOR maintainers, so they can review it properly.

Thanks,
Dennis