From: Yafang Shao <laoar.shao@gmail.com>
Date: Fri, 12 Jul 2024 15:36:15 +0800
Subject: Re: [PATCH 3/3] mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
To: "Huang, Ying" <ying.huang@intel.com>
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, linux-mm@kvack.org, Matthew Wilcox, David Rientjes
In-Reply-To: <874j8v851a.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Fri, Jul 12, 2024 at 3:06 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Yafang Shao writes:
>
> > On Fri, Jul 12, 2024 at 2:18 PM Huang, Ying wrote:
> >>
> >> Yafang Shao writes:
> >>
> >> > On Fri, Jul 12, 2024 at 1:26 PM Huang, Ying wrote:
> >> >>
> >> >> Yafang Shao writes:
> >> >>
> >> >> > On Fri, Jul 12, 2024 at 11:07 AM Huang, Ying wrote:
> >> >> >>
> >> >> >> Yafang Shao writes:
> >> >> >>
> >> >> >> > On Fri, Jul 12, 2024 at 9:21 AM Huang, Ying wrote:
> >> >> >> >>
> >> >> >> >> Yafang Shao writes:
> >> >> >> >>
> >> >> >> >> > On Thu, Jul 11, 2024 at 6:51 PM Huang, Ying wrote:
> >> >> >> >> >>
> >> >> >> >> >> Yafang Shao writes:
> >> >> >> >> >>
> >> >> >> >> >> > On Thu, Jul 11, 2024 at 4:20 PM Huang, Ying wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> Yafang Shao writes:
> >> >> >> >> >> >>
> >> >> >> >> >> >> > On Thu, Jul 11, 2024 at 2:44 PM Huang, Ying <ying.huang@intel.com> wrote:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> Yafang Shao writes:
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> > On Wed, Jul 10, 2024 at 10:51 AM Huang, Ying wrote:
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> Yafang Shao writes:
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> > The configuration parameter PCP_BATCH_SCALE_MAX poses challenges for
> >> >> >> >> >> >> >> >> > quickly experimenting with specific workloads in a production
> >> >> >> >> >> >> >> >> > environment, particularly when monitoring latency spikes caused by
> >> >> >> >> >> >> >> >> > contention on the zone->lock. To address this, a new sysctl parameter
> >> >> >> >> >> >> >> >> > vm.pcp_batch_scale_max is introduced as a more practical alternative.
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> In general, I'm neutral to the change. I can understand that a kernel
> >> >> >> >> >> >> >> >> config option isn't as flexible as a sysctl knob. But a sysctl knob is
> >> >> >> >> >> >> >> >> ABI too.
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> > To ultimately mitigate the zone->lock contention issue, several
> >> >> >> >> >> >> >> >> > suggestions have been proposed. One approach involves dividing large
> >> >> >> >> >> >> >> >> > zones into multiple smaller zones, as suggested by Matthew [0], while
> >> >> >> >> >> >> >> >> > another entails splitting the zone->lock using a mechanism similar to
> >> >> >> >> >> >> >> >> > memory arenas and shifting away from relying solely on zone_id to
> >> >> >> >> >> >> >> >> > identify the range of free lists a particular page belongs to [1].
> >> >> >> >> >> >> >> >> > However, implementing these solutions is likely to necessitate a more
> >> >> >> >> >> >> >> >> > extended development effort.
> >> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> >> Per my understanding, the change will hurt instead of improve zone->lock
> >> >> >> >> >> >> >> >> contention. Instead, it will reduce page allocation/freeing latency.
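(For anyone skimming the thread: the knob under discussion bounds how
many pages are moved between a per-CPU page list and the buddy free
lists while zone->lock is held. A simplified sketch of the idea, not
the actual mm/page_alloc.c code, with illustrative names:

static unsigned int pcp_free_chunk(unsigned int pcp_count,
                                   unsigned int batch,
                                   unsigned int scale_max)
{
        /*
         * Cap the work done in one zone->lock critical section.
         * With a typical batch of 63, scale_max = 5 allows up to
         * 63 << 5 = 2016 pages per lock hold, while scale_max = 0
         * keeps each hold down to 63 pages: shorter lock hold
         * times at the cost of more acquisitions.
         */
        unsigned int cap = batch << scale_max;

        return pcp_count < cap ? pcp_count : cap;
}

So a lower value does not reduce total contention; it shortens the
worst-case time any single allocation or free spends waiting behind
the lock, which is consistent with Ying's point above.)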
> >> >> >> >> >> >> >> >
> >> >> >> >> >> >> >> > I'm quite perplexed by your recent comment. You introduced a
> >> >> >> >> >> >> >> > configuration that has proven to be difficult to use, and you have
> >> >> >> >> >> >> >> > been resistant to suggestions for modifying it to a more user-friendly
> >> >> >> >> >> >> >> > and practical tuning approach. May I inquire about the rationale
> >> >> >> >> >> >> >> > behind introducing this configuration in the beginning?
> >> >> >> >> >> >> >>
> >> >> >> >> >> >> >> Sorry, I don't understand your words. Do you need me to explain what
> >> >> >> >> >> >> >> "neutral" is?
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > No, thanks.
> >> >> >> >> >> >> > After consulting with ChatGPT, I received a clear and comprehensive
> >> >> >> >> >> >> > explanation of what "neutral" means, providing me with a better
> >> >> >> >> >> >> > understanding of the concept.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > So, can you explain why you introduced it as a config in the beginning?
> >> >> >> >> >> >>
> >> >> >> >> >> >> I think I explained that in the log of commit 52166607ecc9 ("mm:
> >> >> >> >> >> >> restrict the pcp batch scale factor to avoid too long latency"), which
> >> >> >> >> >> >> introduced the config.
> >> >> >> >> >> >
> >> >> >> >> >> > What specifically are your expectations for how users should utilize
> >> >> >> >> >> > this config in real production workloads?
> >> >> >> >> >> >
> >> >> >> >> >> >>
> >> >> >> >> >> >> A sysctl knob is ABI, which needs to be maintained forever. Can you
> >> >> >> >> >> >> explain why you need it? Why can't you use a fixed value after the
> >> >> >> >> >> >> initial experiments?
> >> >> >> >> >> >
> >> >> >> >> >> > Given the extensive scale of our production environment, with hundreds
> >> >> >> >> >> > of thousands of servers, it begs the question: how do you propose we
> >> >> >> >> >> > efficiently manage the various workloads that remain unaffected by the
> >> >> >> >> >> > sysctl change implemented on just a few thousand servers? Is it
> >> >> >> >> >> > feasible to expect us to recompile and release a new kernel for every
> >> >> >> >> >> > instance where the default value falls short? Surely, there must be
> >> >> >> >> >> > more practical and efficient approaches we can explore together to
> >> >> >> >> >> > ensure optimal performance across all workloads.
> >> >> >> >> >> >
> >> >> >> >> >> > When making improvements or modifications, kindly ensure that they are
> >> >> >> >> >> > not solely confined to a test or lab environment. It's vital to also
> >> >> >> >> >> > consider the needs and requirements of our actual users, along with
> >> >> >> >> >> > the diverse workloads they encounter in their daily operations.
> >> >> >> >> >>
> >> >> >> >> >> Have you found that your different systems already require different
> >> >> >> >> >> CONFIG_PCP_BATCH_SCALE_MAX values?
> >> >> >> >> >
> >> >> >> >> > For specific workloads that introduce latency, we set the value to 0.
> >> >> >> >> > For other workloads, we keep it unchanged until we determine that the
> >> >> >> >> > default value is also suboptimal. What is the issue with this
> >> >> >> >> > approach?
> >> >> >> >>
> >> >> >> >> Firstly, this is a system-wide configuration, not workload-specific,
> >> >> >> >> so other workloads running on the same system will be impacted too.
> >> >> >> >> Will you run only one workload on one system?
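(As an aside: with the proposed knob, the per-host tuning described
above becomes a single runtime write, equivalent to
"sysctl -w vm.pcp_batch_scale_max=0". A minimal sketch, assuming this
patch is applied; the /proc/sys path below is hypothetical until then,
since mainline only offers the compile-time CONFIG_PCP_BATCH_SCALE_MAX:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Hypothetical path; present only with this patch applied. */
        int fd = open("/proc/sys/vm/pcp_batch_scale_max", O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, "0", 1) != 1) {
                perror("write");
                close(fd);
                return 1;
        }
        close(fd);
        return 0;
}

Without the patch, changing the value means rebuilding and rebooting
every affected host.)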
> >> >> >> >
> >> >> >> > It seems we're living on different planets. You're happily working in
> >> >> >> > your lab environment, while I'm struggling with real-world production
> >> >> >> > issues.
> >> >> >> >
> >> >> >> > For servers:
> >> >> >> >
> >> >> >> >   Server 1 to 10,000: vm.pcp_batch_scale_max = 0
> >> >> >> >   Server 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
> >> >> >> >   Server 1,000,001 and beyond: happy with all values
> >> >> >> >
> >> >> >> > Is this hard to understand?
> >> >> >> >
> >> >> >> > In other words, for applications:
> >> >> >> >
> >> >> >> >   Application 1 to 10,000: vm.pcp_batch_scale_max = 0
> >> >> >> >   Application 10,001 to 1,000,000: vm.pcp_batch_scale_max = 5
> >> >> >> >   Application 1,000,001 and beyond: happy with all values
> >> >> >>
> >> >> >> Good to know this. Thanks!
> >> >> >>
> >> >> >> >>
> >> >> >> >> Secondly, we need some evidence to introduce a new system ABI. For
> >> >> >> >> example, evidence that we need different configurations on different
> >> >> >> >> systems, otherwise some workloads will be hurt. Can you provide some
> >> >> >> >> evidence to support your change? IMHO, it's not good enough to say "I
> >> >> >> >> don't know why, I just don't want to change existing systems". If so,
> >> >> >> >> it may be better to wait until you have more evidence.
> >> >> >> >
> >> >> >> > It seems the community encourages developers to experiment with their
> >> >> >> > improvements in lab environments using meticulously designed test
> >> >> >> > cases A, B, C, and as many others as they can imagine, ultimately
> >> >> >> > obtaining perfect data. However, it discourages developers from
> >> >> >> > directly addressing real-world workloads. Sigh.
> >> >> >>
> >> >> >> You cannot know whether your workloads benefit from or are hurt by the
> >> >> >> different batch numbers, and how, in your production environment? If you
> >> >> >> cannot, how do you decide which workload deploys on which system (with
> >> >> >> which batch number configuration)? If you can, can you provide such
> >> >> >> information to support your patch?
> >> >> >
> >> >> > We leverage a meticulous selection of network metrics, particularly
> >> >> > focusing on TcpExt indicators, to keep a close eye on application
> >> >> > latency. This includes metrics such as TcpExt.TCPTimeouts,
> >> >> > TcpExt.RetransSegs, TcpExt.DelayedACKLost, TcpExt.TCPSlowStartRetrans,
> >> >> > TcpExt.TCPFastRetrans, TcpExt.TCPOFOQueue, and more.
> >> >> >
> >> >> > In instances where a problematic container terminates, we've noticed a
> >> >> > sharp spike in TcpExt.TCPTimeouts, reaching over 40 occurrences per
> >> >> > second, which serves as a clear indication that other applications are
> >> >> > experiencing latency issues. By fine-tuning the vm.pcp_batch_scale_max
> >> >> > parameter to 0, we've been able to drastically reduce the maximum
> >> >> > frequency of these timeouts to less than one per second.
> >> >>
> >> >> Thanks a lot for sharing this. I learned much from it!
> >> >>
> >> >> > At present, we're selectively applying this adjustment to clusters
> >> >> > that exclusively host the identified problematic applications, and
> >> >> > we're closely monitoring their performance to ensure stability. To
> >> >> > date, we've observed no network latency issues as a result of this
> >> >> > change.
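(For concreteness: the TcpExt counters above are exported through
/proc/net/netstat, which pairs a "TcpExt:" header line of field names
with a "TcpExt:" line of values. A minimal sketch of how such sampling
can be done, with error handling trimmed; in practice one watches
per-second deltas rather than raw totals:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the current TcpExt.TCPTimeouts count, or -1 on failure. */
static long read_tcp_timeouts(void)
{
        char names[4096], values[4096];
        FILE *fp = fopen("/proc/net/netstat", "r");

        if (!fp)
                return -1;

        /* The file alternates header/value line pairs per group. */
        while (fgets(names, sizeof(names), fp) &&
               fgets(values, sizeof(values), fp)) {
                char *np, *vp, *n, *v;

                if (strncmp(names, "TcpExt:", 7) != 0)
                        continue;

                /* Walk the name and value tokens in lockstep. */
                n = strtok_r(names + 7, " \n", &np);
                v = strtok_r(values + 7, " \n", &vp);
                while (n && v) {
                        if (strcmp(n, "TCPTimeouts") == 0) {
                                fclose(fp);
                                return atol(v);
                        }
                        n = strtok_r(NULL, " \n", &np);
                        v = strtok_r(NULL, " \n", &vp);
                }
        }
        fclose(fp);
        return -1;
}

Sampling this once per second and alerting when the delta exceeds a
threshold, e.g. the 40/s mentioned above, is enough to catch the
spikes described here.)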
> >> >> > However, we remain cautious about extending this optimization to other
> >> >> > clusters, as the decision ultimately depends on a variety of factors.
> >> >> >
> >> >> > It's important to note that we're not eager to implement this change
> >> >> > across our entire fleet, as we recognize the potential for unforeseen
> >> >> > consequences. Instead, we're taking a cautious approach by initially
> >> >> > applying it to a limited number of servers. This allows us to assess
> >> >> > its impact and make informed decisions about whether or not to expand
> >> >> > its use in the future.
> >> >>
> >> >> So, you haven't observed any performance hurt yet. Right?
> >> >
> >> > Right.
> >> >
> >> >> If you
> >> >> haven't, I suggest you keep the patch in your downstream kernel for a
> >> >> while. In the future, if you find that the performance of some workloads
> >> >> is hurt by the new batch number, you can repost the patch with the
> >> >> supporting data. If, in the end, the performance of more and more
> >> >> workloads is good with the new batch number, you may consider making 0
> >> >> the default value :-)
> >> >
> >> > That is not how the real world works.
> >> >
> >> > In the real world:
> >> >
> >> > - No one knows what may happen in the future.
> >> >   Therefore, if possible, we should make systems flexible, unless
> >> >   there is a strong justification for using a hard-coded value.
> >> >
> >> > - Minimize changes whenever possible.
> >> >   These systems have been working fine in the past, even if with lower
> >> >   performance. Why make changes just for the sake of improving
> >> >   performance? Does the key metric of your performance data truly matter
> >> >   for their workload?
> >>
> >> These are good policies in your organization and business. But they are
> >> not necessarily the policies that the upstream Linux kernel should adopt.
> >
> > You mean the upstream Linux kernel is only designed for the lab?
> >
> >>
> >> The community needs to consider long-term maintenance overhead, so it
> >> adds new ABI (such as a sysfs knob) to the kernel only with the necessary
> >> justification. In general, it prefers a good default value or an
> >> automatic algorithm that works for everyone. The community tries to avoid
> >> (or fix) regressions as much as possible, but this will not stop the
> >> kernel from changing, even if the change is big.
> >
> > Please explain to me why the kernel config is not ABI, but the sysctl is ABI.
>
> The Linux kernel will not break ABI until the last user stops using it.

However, you haven't given a clear reference for why the sysctl is an ABI.

> This usually means tens of years, if not forever. Kernel config options
> aren't considered ABI; they are used by developers and distributions.
> They come and go from version to version.
>
> >>
> >> IIUC, because of the different requirements, there are upstream and
> >> downstream kernels.
> >
> > Downstream developers backport features from the upstream kernel, and
> > if they find issues in the upstream kernel, they should contribute
> > fixes back. That is how the Linux community works, right?
>
> Yes. If they are issues for the upstream kernel too.
>
> --
> Best Regards,
> Huang, Ying

--
Regards
Yafang